The IDL Machine Learning framework provides a powerful and flexible way to run machine learning applications on numerical data. You can create and train models and apply them in classification, clustering, or regression applications.

For a detailed example of how to use the IDL Machine Learning Framework to train a model that learns to recognize hand-written digits, compile and run classify_digits.pro, which is located in the examples/machine_learning directory of your IDL installation.

This topic provides basic concepts and workflows to get started with IDL Machine Learning. The following steps are described here:

  • Prepare Data: How to get data ready to use in machine learning.
  • Classification: Use examples to train a model that will predict a discrete output class (only a finite number of outputs is possible).
  • Clustering: Use examples to train a model that will cluster a dataset into a given number of groups or clusters.
  • Regression: Use examples to train a model that will predict a continuous output value (an infinite number of output values is possible).

Prepare Data


The first step in machine learning is to identify your input data. The input data is a set of numerical attributes that will feed your model to produce an output. There are two important considerations when preparing your data:

  1. Data normalization: Machine learning algorithms work best if the data is constrained to a [0, 1] or [-1, 1] range. Use a normalizer to perform this task, as shown in the examples that follow and in the sketch after this list. The available normalizers are: IDLmlLinearNormalizer, IDLmlRangeNormalizer, IDLmlTanHNormalizer, IDLmlUnitNormalizer, and IDLmlVarianceNormalizer.
  2. Data separation: Prepare two groups of data: one group is used to train the model, the other to test its accuracy. It is recommended that you not use the same data for training and testing. Depending on how you read your data, two helper routines can split the data: IDLmlShuffle and IDLmlPartition. Shuffling randomizes the order of the features and values; partitioning separates the data into two or more groups, where each group contains a specified fraction of the elements.
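For example, here is a minimal sketch of the two-step normalizer pattern used throughout this topic: construct a normalizer from the data, then normalize the data in place. The toy array and the choice of IDLmlVarianceNormalizer are illustrative only.

data = [2.0, 4.0, 6.0, 8.0]
Normalizer = IDLmlVarianceNormalizer(data)   ; compute statistics from the data
Normalizer.Normalize, data                   ; data is rescaled in place
print, data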

To illustrate these steps, start by reading some data:

read_seeds_example_data, features, labels

This routine reads a CSV file that ships with IDL and returns two arrays: Features and Labels. Features is an array of 7 x 210 elements. Each column represents a different attribute (area, perimeter, compactness, length, width, asymmetry coefficient, and length of kernel groove), and there are 210 different instances. Labels is a string array of 210 elements that contain the type of seed (Kama, Rosa, or Canadian).
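You can confirm the array dimensions with HELP (the FLOAT type shown here is an assumption about how the example routine returns the data):

IDL> help, features, labels
FEATURES        FLOAT     = Array[7, 210]
LABELS          STRING    = Array[210]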

Below is a preview of 13 rows (out of 210) of the CSV file:

Area     Perimeter   Compactness   Length      Width       Asymmetry     Length of        Seed
                                   of Kernel   of Kernel   Coefficient   Kernel Groove    Type

15.26    14.84       0.871         5.763       3.312       2.221         5.22             Kama
14.88    14.57       0.8811        5.554       3.333       1.018         4.956            Kama
14.29    14.09       0.905         5.291       3.337       2.699         4.825            Kama
13.84    13.94       0.8955        5.324       3.379       2.259         4.805            Kama
...
13.07    13.92       0.848         5.472       2.994       5.304         5.395            Canadian
13.32    13.94       0.8613        5.541       3.073       7.035         5.44             Canadian
13.34    13.95       0.862         5.389       3.074       5.995         5.307            Canadian
12.22    13.32       0.8652        5.224       2.967       5.469         5.221            Canadian
...
21.18    17.21       0.8989        6.573       4.033       5.78          6.231            Rosa
20.88    17.05       0.9031        6.45        4.032       5.016         6.321            Rosa
20.1     16.99       0.8746        6.581       3.785       1.955         6.449            Rosa
18.76    16.2        0.8984        6.172       3.796       3.12          6.053            Rosa
18.81    16.29       0.8906        6.272       3.693       3.237         6.053            Rosa

Below is the output if you print the Features array. Only the first few lines of the output are shown:

IDL> print, features
  15.260000    14.840000    0.87100000    5.7630000    3.3120000    2.2210000    5.2200000
  14.880000    14.570000    0.88110000    5.5540000    3.3330000    1.0180000    4.9560000
  14.290000    14.090000    0.90500000    5.2910000    3.3370000    2.6990000    4.8250000
  13.840000    13.940000    0.89550000    5.3240000    3.3790000    2.2590000    4.8050000
  16.140000    14.990000    0.90340000    5.6580000    3.5620000    1.3550000    5.1750000
  14.380000    14.210000    0.89510000    5.3860000    3.3120000    2.4620000    4.9560000
  14.690000    14.490000    0.87990000    5.5630000    3.2590000    3.5860000    5.2190000
  14.110000    14.100000    0.89110000    5.4200000    3.3020000    2.7000000    5.0000000
  16.630000    15.460000    0.87470000    6.0530000    3.4650000    2.0400000    5.8770000
  16.440000    15.250000    0.88800000    5.8840000    3.5050000    1.9690000    5.5330000
  15.260000    14.850000    0.86960000    5.7140000    3.2420000    4.5430000    5.3140000

The steps below prepare the data for use in the classification example in the next section. Start by normalizing the data:

Normalizer = IDLmlVarianceNormalizer(features)
Normalizer.Normalize, features

Next, shuffle the data, then split it in two groups. One group will have 80% of the samples (for training), the other group will have 20% of the samples (for testing).

IDLmlShuffle, features, labels
part = IDLmlPartition({train:80, test:20}, features, labels)

The following variables are now ready for use:

part.train.features: A 7 x 168 array of features used for training.

part.train.labels: A string array of 168 labels used for training.

part.test.features: A 7 x 42 array of features used for testing.

part.test.labels: A string array of 42 labels used for testing.

Classification


IDL provides three models you can use for classification purposes: IDLmlFeedForwardNeuralNetwork, IDLmlSoftmax, and IDLmlSupportVectorMachineClassification.

The following provides a simple example of performing classification using a Support Vector Machine model.

After loading and preparing your data as described previously, the next step is to define a classifier. From the data description, there are seven input attributes and three possible string outputs, so define the model as follows:

Classifier = IDLmlSupportVectorMachineClassification(7, ['Kama', 'Rosa', 'Canadian'])

To train the model, call the Train method and pass in training features and labels:

loss = Classifier.Train(part.train.features, LABELS=part.train.labels)

Notice that SVM is unique among models in that you only need to invoke the Train method once. Other classifiers require iterative training in a loop. For example, with the Softmax classifier:

SoftmaxClassifier = IDLmlSoftmax(7, ['Kama', 'Rosa', 'Canadian'])
p = Plot(Fltarr(2), title='Loss')
Loss = List()
for i=1, 200 do begin
  Loss.Add, SoftmaxClassifier.Train(part.train.features, $
    LABELS=part.train.labels)
  p.SetData, Loss.ToArray()
endfor

Within the loop, you can check the value of Loss. When it stabilizes at its lowest possible value, the model has been fully trained.
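For example, here is a minimal sketch of a stopping test you could add inside the training loop; the window size of 10 iterations and the 1e-4 tolerance are illustrative choices, not framework requirements:

; Stop when the loss has changed very little over the last 10 iterations
if (Loss.Count() gt 10) then begin
  recent = (Loss.ToArray())[-10:*]
  if (Max(recent) - Min(recent)) lt 1e-4 then break
endif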

To assess the quality of a model trained for classification, take the portion of the data allocated for testing and run it through another helper function, IDLmlTestClassifier. This returns a confusion matrix along with several indicators of the model's accuracy:

confMatrix = IDLmlTestClassifier(Classifier, $
  part.test.features, part.test.labels, $
  ACCURACY=accuracy)
print, accuracy

The result is 0.928571, which indicates a 92.86% classification accuracy using test data.
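You can also print the returned confusion matrix to see how the test samples were distributed among the actual and predicted classes:

print, confMatrix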

Now the model is ready to classify data.

print, Classifier.Classify(part.test.features[*,0])
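To spot-check a prediction, print it next to the known label for the same sample:

; Predicted class for the first test sample, followed by its actual label
print, Classifier.Classify(part.test.features[*,0]), part.test.labels[0]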

To use the model in a future IDL session without having to retrain it, save it to a file. Also save the normalizer, since you will have to renormalize the data too:

Classifier.Save, 'c:\tmp\myclassifier.sav'
Normalizer.Save, 'c:\tmp\mynormalizer.sav'

To restore these in a future IDL session:

Classifier = IDLmlModel.Restore('c:\tmp\myclassifier.sav')
Normalizer = IDLmlNormalizer.Restore('c:\tmp\mynormalizer.sav')
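The restored objects can then be applied directly to new data. In this sketch, newData is a hypothetical 7 x n array of raw (unnormalized) feature values:

; newData is a hypothetical array of new, unnormalized samples
Normalizer.Normalize, newData
print, Classifier.Classify(newData)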

Clustering


IDL provides two models you can use for clustering purposes: IDLmlAutoEncoder and IDLmlKMeans.

This section provides a simple example of performing clustering using an autoencoder. An autoencoder is a type of neural network that specializes in learning a representation of the data, which can be used to group or cluster a dataset into a small set of categories.

This example uses a small JPEG file and clusters the image into five categories based on the RGB pixel values. To read the image:

file = Filepath('rose.jpg', subdirectory=['examples', 'data'])
data = Read_Image(file)

IDL machine learning algorithms require the input data to be stored in a 2D array of size n x m, where n is the number of attributes and m is the number of examples. This example has three attributes (the red, green, and blue values per pixel), where the number of examples is the total number of pixels in the image. The data is a 3 x 227 x 149 array that you can reform to 3 attributes by 33823 examples by doing the following:

features = Reform(data, 3, (data.dim)[1]*(data.dim)[2])
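A quick HELP call confirms the reshaped dimensions (the BYTE type assumes the JPEG was read as 8-bit data):

IDL> help, features
FEATURES        BYTE      = Array[3, 33823]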

As discussed in Prepare Data, define a normalizer that will transform the data. You can use any normalizer; this example uses IDLmlVarianceNormalizer:

Normalizer = IDLmlVarianceNormalizer(features)
Normalizer.Normalize, features

Now define the autoencoder and determine how many layers to use. In this case, use three layers: since there are three attributes, the first and last layers must have a size of 3, and to cluster into five categories, the middle layer has a size of 5.

Classifier = IDLmlAutoEncoder([3, 5, 3])

Note: To fine-tune your autoencoder, you can define an Activation Function for each layer. See Defining Activation Functions for more information.

Now train the model. During training, an autoencoder learns to reproduce its input from a compact internal representation; that internal representation effectively becomes the clustered image:

Optimizer = IDLmloptGradientDescent(0.01)
p = Plot(Fltarr(2), title='Loss')
Loss = List()
for i=1, 300 do begin
  Loss.Add, Classifier.Train(features, OPTIMIZER=Optimizer)
  p.SetData, Loss.ToArray()
endfor

Note: Training a neural network requires the use of an optimizer. An optimizer helps the neural network adjust the learning rate during training based on how quickly the model converges to a solution. The optimizers are: IDLmloptAdam, IDLmloptGradientDescent, IDLmloptMomentum, IDLmloptQuickProp, and IDLmloptRMSProp.

Tip: An instance of an optimizer should not be reused to train a different model.
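For example, to train with the Adam optimizer instead of gradient descent, construct a fresh instance (the 0.1 learning rate here is only an illustrative starting point):

Optimizer = IDLmloptAdam(0.1)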

To obtain the clustered image, classify the input data:

result = Classifier.Classify(features)

Display the original image next to the cluster result:

!null = Image(data, title='Original Image', layout=[2,1,1])
!null = Image(Reform(result, (data.dim)[1], (data.dim)[2]), $
  rgb_table=25, title='Clustered Image', layout=[2,1,2], /current)

The result is a side-by-side display of the original image and the five-category clustered image.

Regression


IDL provides three classes you can use for regression purposes, including IDLmlFeedForwardNeuralNetwork and IDLmlSupportVectorMachineRegression.

This section provides a simple example of performing regression using a Feed Forward Neural Network model.

First, define an array with two attributes (x and y), and a particular shape for the model to learn:

size = 100
x = Findgen(size) / (size-1) * 4 - 2
y = Fltarr(size) + 1.0
xx = x # y
yy = Transpose(xx)
zz = 2.0/(exp((xx-0.5)^2+yy^2))-2.0/(exp((xx+0.5)^2+yy^2))
s = Surface(zz)

IDL machine learning algorithms require storing the input data in a 2D array of size n x m, where n is the number of attributes and m is the number of examples. In the example below, the number of attributes is 2 (the x and y components), and the number of examples is the total number of values in the array. Combine the xx and yy arrays into a 2D array with the following:

features = Transpose([[Reform(xx, size^2)], [Reform(yy, size^2)]])
scores = Reform(zz, size^2)
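With size set to 100 as above, HELP should report 2 attributes by 10,000 examples:

IDL> help, features, scores
FEATURES        FLOAT     = Array[2, 10000]
SCORES          FLOAT     = Array[10000]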

Now shuffle and partition the data into two groups: one group will be used for training, the other for testing. Make the groups the same size:

IDLmlShuffle, features, scores
part = IDLmlPartition({train:50, test:50}, features, scores)

As discussed in Prepare Data, define a normalizer that will transform the data. You can use any normalizer; this example uses IDLmlVarianceNormalizer:

Normalizer1 = IDLmlVarianceNormalizer(features)
Normalizer1.Normalize, features

The same applies to the model outputs. Unlike classification, where there is a finite, fixed set of outputs, the output of a regression is continuous, and it is important to keep it normalized:

Normalizer2 = IDLmlVarianceNormalizer(scores)
Normalizer2.Normalize, scores

Now define the neural network and determine how many layers to use. In this case, use two hidden layers, each of size 7. Since there are two attributes, the size of the first layer must be 2. For a regression problem, the size of the last layer is 1, since you are predicting a single continuous value rather than classifying into discrete classes.

Model = IDLmlFeedForwardNeuralNetwork([2, 7, 7, 1], $
  ACTIVATION_FUNCTIONS=[IDLmlafArcTan(), IDLmlafArcTan(), $
    IDLmlafArcTan()])

Defining Activation Functions

Activation functions are important in machine learning. They introduce non-linearities into neural networks (which by themselves are only linear systems), making them able to model more complex functions. IDL includes a number of activation functions, such as the IDLmlafArcTan function used in the model definition above.

Choosing the right activation function for each layer is crucial to the accuracy of the neural network. There are very few guidelines, so in most cases the choice comes down to trial and error.

Returning to the regression example, train the model:

Optimizer = IDLmloptAdam(0.1)
Loss = List()
p = Plot(Fltarr(10), title='Loss')
for i=1, 300 do begin
  Loss.Add, Model.Train(part.train.features, $
    SCORES=part.train.scores, OPTIMIZER=Optimizer)
  p.SetData, Loss.ToArray()
endfor

Note: Training a neural network requires the use of an optimizer. An optimizer helps the neural network adjust the learning rate during training based on how quickly the model converges to a solution. The following optimizers are available: IDLmloptAdam, IDLmloptGradientDescent, IDLmloptMomentum, IDLmloptQuickProp, and IDLmloptRMSProp.

Tip: An optimizer that is used to train one model should not be reused to train a different model.

Now pass the entire x, y dataset through the model and see how well the output resembles the data it tried to learn:

data = Transpose([[Reform(xx, size^2)], [Reform(yy, size^2)]])
Normalizer1.Normalize, data
result = Model.Evaluate(data)
 
result2D = Reform(result, size, size)
!null = Surface(result2D, xx, yy, color=[255,0,0], style='mesh')

You can also use the Evaluate method to assess the accuracy of the model against the test data. The LOSS keyword returns the RMSE (root mean square error) of the calculated results compared to the actual scores:

!null = Model.Evaluate(part.test.features, SCORES=part.test.scores, $
  LOSS=loss)
print, loss

The result is 0.0981930.
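Since LOSS reports the RMSE, you can verify it by computing the error directly. This is a sketch that assumes Evaluate returns predicted scores in the same normalized units as part.test.scores:

; Compute the root mean square error by hand
predicted = Reform(Model.Evaluate(part.test.features))
print, Sqrt(Mean((predicted - part.test.scores)^2))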

To use the model in a future IDL session without having to retrain it, you can save it to a file. Also save the normalizer, since you will have to renormalize your data too:

Model.Save, 'c:\tmp\model.sav'
Normalizer1.Save, 'c:\tmp\normalizer.sav'

To restore these in a future IDL session:

Model = IDLmlModel.Restore('c:\tmp\model.sav')
Normalizer = IDLmlNormalizer.Restore('c:\tmp\normalizer.sav')

See Also


IDL Machine Learning list of routines