FAQ: IDL Machine Learning

Does IDL include Machine Learning functionality?

Yes. It is called the IDL Machine Learning framework. It provides a set of routines for applying machine learning to numerical data: you can create and train models, then apply them in classification, clustering, or regression applications.

See the links below for details and examples:

https://www.l3harrisgeospatial.com/docs/Machine_Learning.html

https://www.l3harrisgeospatial.com/docs/idlml_workflow.html

What type of data can I use with machine learning?

Input data should be a set of numerical attributes that feed the model. The data is used to create, train, and test the model. The trained model can then be used to classify other sets of similar data.

Example: the sample below describes 3 different seed types using 7 numerical attributes.

Area	Perimeter	Compactness	Length of Kernel	Width of Kernel	Asymmetry Coefficient	Length of Kernel Groove	Seed Type
15.26	14.84	0.871	5.763	3.312	2.221	5.22	Kama
14.88	14.57	0.8811	5.554	3.333	1.018	4.956	Kama
13.34	13.95	0.862	5.389	3.074	5.995	5.307	Canadian
12.22	13.32	0.8652	5.224	2.967	5.469	5.221	Canadian
21.18	17.21	0.8989	6.573	4.033	5.78	6.231	Rosa
20.88	17.05	0.9031	6.45	4.032	5.016	6.321	Rosa

What is the standard workflow to work with Machine Learning in IDL?

Data preparation, which includes:
- data normalization, because machine learning algorithms work best with numerical values between 0 and 1
- data separation: the input set is split into 2 groups, one to train the model and one to test it and evaluate its accuracy
Classification: several methods are available (Neural Network, Softmax, Support Vector Machine). The first step is to create and train the model; new datasets can then be classified using this model.
Clustering and/or regression may complete the workflow.
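The two data preparation steps can be sketched outside IDL as follows. This is a minimal, language-neutral illustration in Python, not the IDL Machine Learning API: the function names min_max_normalize and split are hypothetical, and the sample uses only the first three attributes of the seed table above.

```python
import random

# Two samples per seed type, first three attributes only (area, perimeter, compactness).
samples = [
    ([15.26, 14.84, 0.871], "Kama"),
    ([14.88, 14.57, 0.8811], "Kama"),
    ([13.34, 13.95, 0.862], "Canadian"),
    ([12.22, 13.32, 0.8652], "Canadian"),
    ([21.18, 17.21, 0.8989], "Rosa"),
    ([20.88, 17.05, 0.9031], "Rosa"),
]


def min_max_normalize(rows):
    """Rescale every attribute (column) to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def split(rows, labels, train_fraction, seed):
    """Shuffle the sample indices, then cut them into a training and a test group."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(round(train_fraction * len(order)))
    train = [(rows[i], labels[i]) for i in order[:cut]]
    test = [(rows[i], labels[i]) for i in order[cut:]]
    return train, test


features = min_max_normalize([f for f, _ in samples])
labels = [label for _, label in samples]
train_set, test_set = split(features, labels, train_fraction=2 / 3, seed=0)
```

After this runs, every attribute value lies between 0 and 1, and the six samples are divided into four for training and two for testing.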

Why does running the classification twice on the same dataset - with the exact same parameters - provide different outputs?

The data used to create and train the model may be organized with, for example, all samples for class 1 first, then all samples for class 2, and so on. Such an ordering can bias the training, for example by giving more weight to the first class.

To prevent this bias, the machine learning algorithms feed the data samples into the model in random order by default. The input order therefore differs from one run to the next, which produces differences in the outputs. However, every run aims for the same ideal model, so the outputs should be similar.

Is there a way to force the machine learning workflow to be reproducible?

Yes. To reproduce the exact same workflow, you must of course input the same dataset and use the same methods and parameters. In addition, you need to set the SEED keyword when running:

- the data shuffle with IDLmlShuffle

https://www.l3harrisgeospatial.com/docs/idlmlshuffle.html

- the classification method
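The effect of a fixed seed on shuffling can be illustrated with a short sketch. It uses Python's standard random module rather than IDLmlShuffle, so it shows the principle only; the helper name shuffled is hypothetical.

```python
import random


def shuffled(items, seed=None):
    """Return a shuffled copy; a fixed seed makes the order repeatable."""
    copy = list(items)
    random.Random(seed).shuffle(copy)
    return copy


# Sample indices ordered by class, as in the biased layout described earlier.
sample_ids = list(range(12))

run_a = shuffled(sample_ids, seed=42)
run_b = shuffled(sample_ids, seed=42)
same_order = run_a == run_b  # same seed, so both runs see the same order

unseeded = shuffled(sample_ids)  # no seed: the order may change on every run
```

Every source of randomness in the workflow needs its own fixed seed for the final outputs to match exactly, which is why the seed applies to both the shuffle and the classification method.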

How do you check the convergence of the model?

One way to analyze how a model converges toward the ideal solution is to look at the loss plot over the training iterations.

Ideally, this plot shows a loss that decreases and converges toward 0.

If the loss stagnates after a certain number of iterations (that is, it no longer decreases), the classification parameters should be adjusted.
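Stagnation can also be checked numerically instead of by eye. The sketch below is one possible approach in Python, assuming the loss returned at each training iteration has been collected in a list. The function name, the window size, and the improvement threshold are illustrative choices, not values from the IDL documentation.

```python
def has_stagnated(losses, window=5, min_improvement=1e-3):
    """Return True if the loss improved by less than min_improvement
    over the last `window` iterations."""
    if len(losses) <= window:
        return False  # not enough history yet
    return losses[-window - 1] - losses[-1] < min_improvement


# Hypothetical loss histories, one value per training iteration.
converging = [1.0, 0.6, 0.35, 0.2, 0.12, 0.07, 0.04, 0.02]
stalled = [1.0, 0.7, 0.55, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

converging_flag = has_stagnated(converging)
stalled_flag = has_stagnated(stalled)
```

Here the first history is still improving, while the second has been flat for more than five iterations and would be flagged.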

How do you adjust classification parameters during model training?

Increase the number of iterations for model training, to be sure to reach the best convergence of the model.

Adjust the optimization parameter, which controls the size of each step toward the ideal model. If it is too large, the model may not have converged to the ideal model by the end of the training iterations.

One adjustment is to run several sets of iterations with a decreasing optimization parameter. For example:

Optimizer = IDLmloptAdam(0.1)

> then n iterations of model training

Optimizer = IDLmloptAdam(0.05)

> then n iterations of model training

Optimizer = IDLmloptAdam(0.02)

> then n iterations of model training

….
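Why a decreasing step size helps can be seen on a toy problem. The sketch below uses plain gradient descent on the loss f(w) = w², written in Python. It is not the Adam optimizer and not IDL code, and the step sizes are chosen to make the effect visible rather than taken from the example above.

```python
def train(w, learning_rate, n_iterations):
    """Run gradient descent on the toy loss f(w) = w**2 (gradient 2*w)."""
    for _ in range(n_iterations):
        w = w - learning_rate * 2 * w
    return w


w = 4.0
history = []  # (step size, loss at the end of the stage)
for learning_rate in (1.0, 0.6, 0.3):  # decreasing step size per stage
    w = train(w, learning_rate, n_iterations=5)
    history.append((learning_rate, w * w))
```

With a step size of 1.0, each update jumps from w to -w, so the loss stays at its starting value. The smaller step sizes in the following stages let the loss fall toward 0.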

Add intermediate layers to the classifier

Classifier = IDLmlFeedForwardNeuralNetwork([nAttributes, 8, nLabels], uniqueLabels)

Some classifiers allow you to define intermediate layers, each composed of a given number of nodes. To find the best model for your data, you can increase the number of intermediate layers and the number of nodes in each layer.

For example, to use 3 intermediate layers with 16 nodes each:

Classifier = IDLmlFeedForwardNeuralNetwork([nAttributes, 16,16,16, nLabels], uniqueLabels)
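To get a feel for how much larger the second model is, the number of trainable values can be estimated from the layer sizes. The Python sketch below assumes a standard fully connected network with one bias per node. The internal layout of IDLmlFeedForwardNeuralNetwork may differ, so treat the counts as an order of magnitude. It uses the 7 attributes and 3 seed types from the example data.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output node
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


n_attributes, n_labels = 7, 3  # seed example: 7 attributes, 3 seed types

small = count_parameters([n_attributes, 8, n_labels])           # 91
large = count_parameters([n_attributes, 16, 16, 16, n_labels])  # 723
```

Under these assumptions the larger layout has roughly eight times as many values to fit, so it typically needs more training iterations and more data to converge.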

----------------------------------

created by BC on 10/27/2021

reviewed by IE and TS
