1935
FAQ: IDL Machine Learning
- Does IDL include Machine Learning functionality?
Yes. It is called the IDL Machine Learning framework. It provides a set of routines for applying machine learning to numerical data: you can create and train models, then apply them to classification, clustering, or regression problems.
See the links below for details and examples:
https://www.l3harrisgeospatial.com/docs/Machine_Learning.html
https://www.l3harrisgeospatial.com/docs/idlml_workflow.html
- What type of data can I use with machine learning?
The input data should be a set of numerical attributes that are fed to the model. The data are used to create, train, and test the model. Afterwards, the trained model can be used to classify other sets of similar data.
Example: the table below shows a data sample of 3 seed types (Kama, Canadian, and Rosa), each described by 7 numerical attributes:
Area | Perimeter | Compactness | Length of Kernel | Width of Kernel | Asymmetry Coefficient | Length of Kernel Groove | Seed Type
15.26 | 14.84 | 0.871 | 5.763 | 3.312 | 2.221 | 5.22 | Kama
14.88 | 14.57 | 0.8811 | 5.554 | 3.333 | 1.018 | 4.956 | Kama
13.34 | 13.95 | 0.862 | 5.389 | 3.074 | 5.995 | 5.307 | Canadian
12.22 | 13.32 | 0.8652 | 5.224 | 2.967 | 5.469 | 5.221 | Canadian
21.18 | 17.21 | 0.8989 | 6.573 | 4.033 | 5.78 | 6.231 | Rosa
20.88 | 17.05 | 0.9031 | 6.45 | 4.032 | 5.016 | 6.321 | Rosa
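The sketch below makes the table concrete. It splits each row into a vector of numerical attributes (the features) and a seed type (the label), then lists the distinct labels. Those distinct labels play the role of the uniqueLabels argument in the classifier example further down. The code is Python, used purely as an illustration. The function name split_features_labels is hypothetical, and the sketch does not show the array layout IDL expects for its feature arrays.

```python
# Rows copied from the seed table: seven numeric attributes, then the seed type.
rows = [
    (15.26, 14.84, 0.871, 5.763, 3.312, 2.221, 5.22, "Kama"),
    (14.88, 14.57, 0.8811, 5.554, 3.333, 1.018, 4.956, "Kama"),
    (13.34, 13.95, 0.862, 5.389, 3.074, 5.995, 5.307, "Canadian"),
    (12.22, 13.32, 0.8652, 5.224, 2.967, 5.469, 5.221, "Canadian"),
    (21.18, 17.21, 0.8989, 6.573, 4.033, 5.78, 6.231, "Rosa"),
    (20.88, 17.05, 0.9031, 6.45, 4.032, 5.016, 6.321, "Rosa"),
]


def split_features_labels(rows):
    """Separate the numeric attributes (features) from the class label."""
    features = [list(row[:-1]) for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels


features, labels = split_features_labels(rows)
unique_labels = sorted(set(labels))  # distinct classes, like uniqueLabels
n_attributes = len(features[0])
print(n_attributes, unique_labels)  # 7 ['Canadian', 'Kama', 'Rosa']
```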
- What is the standard workflow to work with Machine Learning in IDL?
- Data preparation, which includes:
- data normalization, because machine learning algorithms work best with numerical values scaled between 0 and 1
- data separation: the input set is divided into two groups, one to train the model and one to test the model and evaluate its accuracy
- Classification: different methods are available (neural network, Softmax, support vector machine). The first step is to create and train the model; new datasets can then be classified with this model.
- Clustering and/or regression may complete the workflow.
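The two preparation steps can be sketched as follows. The sketch assumes min-max scaling, where each attribute is rescaled linearly so that its smallest value becomes 0 and its largest becomes 1. It also assumes an 80/20 split. This is a Python illustration of the idea, not the IDL routines. The names min_max_normalize and split_train_test, and the 0.8 fraction, are hypothetical choices. The split simply cuts the list, so the samples should be shuffled first, for the reason explained in the next question.

```python
def min_max_normalize(features):
    """Rescale every attribute (column) linearly into the range [0, 1]."""
    n_attr = len(features[0])
    mins = [min(row[j] for row in features) for j in range(n_attr)]
    maxs = [max(row[j] for row in features) for j in range(n_attr)]
    return [
        # A constant column has no range, so it is mapped to 0.0.
        [(row[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
         for j in range(n_attr)]
        for row in features
    ]


def split_train_test(features, labels, train_fraction=0.8):
    """Keep the first train_fraction of the samples for training and the
    rest for testing. The samples are assumed to be shuffled already."""
    n_train = round(len(features) * train_fraction)
    return (features[:n_train], labels[:n_train],
            features[n_train:], labels[n_train:])


# Area and perimeter of five samples from the seed table.
features = [[15.26, 14.84], [13.34, 13.95], [21.18, 17.21],
            [14.88, 14.57], [12.22, 13.32]]
labels = ["Kama", "Canadian", "Rosa", "Kama", "Canadian"]
normalized = min_max_normalize(features)
train_x, train_y, test_x, test_y = split_train_test(normalized, labels)
print(len(train_x), len(test_x))  # 4 1
```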
- Why does running the classification twice on the same dataset, with exactly the same parameters, produce different outputs?
The data used to create and train the model may, for example, be ordered with all class 1 samples first, then all class 2 samples, and so on. Such an ordering can bias the training, for example by giving more weight to the first class.
To prevent such bias, the machine learning routines feed the data samples to the model in random order by default. The input order therefore differs from one run to the next, which leads to different outputs. However, every run aims at the same ideal model, so the outputs should be similar.
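Here is a minimal sketch of random ordering. It assumes the shuffle is a single random permutation applied to both the features and the labels, so that every sample keeps its label. It is written in Python for illustration only and does not describe how the IDL framework works internally. The name shuffle_together is hypothetical.

```python
import random


def shuffle_together(features, labels):
    """Shuffle the samples with one random permutation so that each
    feature row stays paired with its label."""
    order = list(range(len(features)))
    random.shuffle(order)  # unseeded: a different order on each run
    return [features[i] for i in order], [labels[i] for i in order]


# Class-sorted input: both Kama samples first, then Canadian, then Rosa
# (area and perimeter taken from the seed table).
features = [[15.26, 14.84], [14.88, 14.57], [13.34, 13.95],
            [12.22, 13.32], [21.18, 17.21], [20.88, 17.05]]
labels = ["Kama", "Kama", "Canadian", "Canadian", "Rosa", "Rosa"]
shuffled_features, shuffled_labels = shuffle_together(features, labels)
print(shuffled_labels)  # the order changes from run to run
```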
- Is there a way to make the machine learning workflow reproducible?
Yes. To reproduce exactly the same machine learning workflow, you must of course use the same input dataset, methods, and parameters. In addition, you need to set the SEED keyword when running:
- data shuffle with IDLmlShuffle
https://www.l3harrisgeospatial.com/docs/idlmlshuffle.html
- the classification method
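The effect of a seed can be sketched in Python. A generator created with a fixed seed produces the same permutation on every run, and that is what makes a seeded shuffle reproducible. This is an analogy under stated assumptions, not IDL code. The name shuffle_together and the seed value 42 are hypothetical. IDL's random number generator will not produce the same permutation as Python's for the same seed.

```python
import random


def shuffle_together(features, labels, seed=None):
    """Shuffle features and labels with one permutation.
    With a fixed seed, the permutation is identical on every run."""
    rng = random.Random(seed)  # private generator, seeded like a SEED keyword
    order = list(range(len(features)))
    rng.shuffle(order)
    return [features[i] for i in order], [labels[i] for i in order]


features = [[15.26], [14.88], [13.34], [12.22], [21.18], [20.88]]
labels = ["Kama", "Kama", "Canadian", "Canadian", "Rosa", "Rosa"]
run1 = shuffle_together(features, labels, seed=42)
run2 = shuffle_together(features, labels, seed=42)
print(run1 == run2)  # True
```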
- How do you monitor the convergence of the model?
One way to analyze how a model converges to the ideal solution is to look at the plot of the loss over the training iterations.
Ideally, this plot shows the loss decreasing and converging toward 0.
If the loss stagnates after a certain number of iterations (i.e., it no longer decreases), the classification parameters should be adjusted.
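Stagnation can also be checked numerically rather than by eye. The sketch below assumes the loss value from each training iteration has been collected in a list. It flags a plateau when the loss improved by less than a small threshold over the last few iterations. It is a Python illustration. The name has_stagnated, the 10-iteration window, and the 1e-4 threshold are hypothetical choices that you would tune for your data.

```python
def has_stagnated(losses, window=10, min_improvement=1e-4):
    """Return True if the loss improved by less than min_improvement
    over the last `window` iterations."""
    if len(losses) <= window:
        return False  # not enough history to judge yet
    return losses[-window - 1] - losses[-1] < min_improvement


decreasing = [1.0 / (i + 1) for i in range(50)]  # still improving
plateau = [1.0, 0.8, 0.6, 0.5] + [0.5] * 20      # stuck at 0.5
print(has_stagnated(decreasing), has_stagnated(plateau))  # False True
```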
- How do you adjust the classification parameters during model training?
- Increase the number of training iterations to make sure the model reaches its best convergence.
- Adjust the optimization parameter, which controls the size of each step toward the ideal model. If it is too large, the model may fail to converge to the ideal model by the end of the training iterations.
One adjustment consists of running several sets of iterations with a decreasing optimization parameter. For example:
Optimizer = IDLmloptAdam(0.1)
; then n iterations of model training
Optimizer = IDLmloptAdam(0.05)
; then n iterations of model training
Optimizer = IDLmloptAdam(0.02)
; then n iterations of model training
; ….
- Add intermediate layers to the classifier
Some classifiers allow you to define intermediate layers, each composed of a given number of nodes. For example, this network has one intermediate layer of 8 nodes:
Classifier = IDLmlFeedForwardNeuralNetwork([nAttributes, 8, nLabels], uniqueLabels)
To find the best model for your data, you can increase the number of intermediate layers and the number of nodes in each layer. For example, to add 3 intermediate layers with 16 nodes each:
Classifier = IDLmlFeedForwardNeuralNetwork([nAttributes, 16, 16, 16, nLabels], uniqueLabels)
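A toy problem shows why a smaller step helps near the end of training. The code below uses plain (sub)gradient descent on the one-dimensional loss |w - 3.03| as a stand-in for Adam. It assumes that the value passed to IDLmloptAdam acts as a step size. With a fixed step of 0.1, the parameter keeps overshooting the minimum. Running stages with steps of 0.1, 0.05, and 0.02 ends much closer. The function train, the toy loss, and the iteration counts are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1: the (sub)gradient of |x|."""
    return (x > 0) - (x < 0)


def train(w, learning_rates, iterations_per_stage, target):
    """Subgradient descent on the toy loss |w - target|,
    running one stage of iterations per learning rate."""
    for lr in learning_rates:
        for _ in range(iterations_per_stage):
            w -= lr * sign(w - target)
    return w


TARGET = 3.03
fixed = train(0.0, [0.1], 300, TARGET)               # one stage, fixed step
staged = train(0.0, [0.1, 0.05, 0.02], 100, TARGET)  # three decreasing steps
print(round(abs(fixed - TARGET), 3), round(abs(staged - TARGET), 3))  # 0.03 0.01
```

Each stage can only oscillate around the minimum within its own step size, so the final, smallest step decides how close the staged run can get.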
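Adding layers or nodes increases the number of values the model must learn. The sketch below counts them for a fully connected network. It assumes one weight per connection and one bias per non-input node; IDL's internal parameterization may differ. It uses the seed example (7 attributes, 3 seed types) and is a Python illustration with a hypothetical count_parameters function.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network.
    Assumes one weight per connection and one bias per non-input node."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


n_attributes, n_labels = 7, 3  # seed example: 7 attributes, 3 seed types
print(count_parameters([n_attributes, 8, n_labels]))           # 91
print(count_parameters([n_attributes, 16, 16, 16, n_labels]))  # 723
```

Going from one 8-node layer to three 16-node layers raises the count from 91 to 723. The larger network therefore typically needs more training iterations to converge.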
----------------------------------
created by BC on 10/27/2021
reviewed by IE and TS