Use this procedure to perform classification using any of the following methods:

  • Supervised classification: parallelepiped, minimum distance, maximum likelihood, Spectral Angle Mapper (SAM), Spectral Information Divergence (SID), Mahalanobis, or Binary Encoding
  • Unsupervised classification: ISODATA or K-Means

All classification methods use a combination of common keywords and those for specific classification methods.

Syntax


ENVI_DOIT, 'CLASS_DOIT', [Keywords=value]

Keywords:

CHANGE_THRESH=floating point

CLASS_NAMES=array

COV=array

DATA_SCALE=floating point

DIMS=array

FID=file ID

/IN_MEMORY

ITERATIONS=integer

ISO_MERGE_DIST=floating point

ISO_MERGE_PAIRS=integer

ISO_MIN_PIXELS=integer

ISO_SPLIT_SMULT=floating point

ISO_SPLIT_STD=floating point

LOOKUP=array

M_FID=file ID

M_POS=value

MEAN=array

METHOD=integer

MIN_CLASSES=integer

NPTS=array

NUM_CLASSES=integer

OUT_BNAME=string

OUT_NAME=string

POS=array

R_FID=variable

RULE_FID=file ID

/RULE_IN_MEMORY

RULE_OUT_BNAME=array

RULE_OUT_NAME=string

STDV=array

STD_MULT=value

THRESH=value

Keywords


CHANGE_THRESH

(Required for K-Means and ISODATA unsupervised methods)

Specify a floating-point number between 0.0 and 1.0 that sets the change threshold, expressed as a fraction of the total number of pixels (a value of 1.0 means 100%). After each iteration, the fraction of pixels that changed class is compared with CHANGE_THRESH. If it is greater than CHANGE_THRESH, another iteration is performed, provided that the maximum number of iterations (ITERATIONS) is not exceeded. If it is less than CHANGE_THRESH, the classification is complete.
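The stopping rule shared by CHANGE_THRESH and ITERATIONS can be illustrated with a toy one-dimensional K-Means loop. This is a minimal Python sketch, not ENVI code; the function name kmeans_1d and its arguments are hypothetical, and ENVI's own K-Means implementation is not reproduced here.

```python
def kmeans_1d(values, centers, change_thresh, iterations):
    """Toy 1-D K-Means showing how CHANGE_THRESH and ITERATIONS end the loop.

    Returns (labels, centers, iterations_run). Illustrative only, not ENVI code.
    """
    centers = list(centers)          # work on a copy of the starting means
    labels = [None] * len(values)    # no pixel has a class before iteration 1
    iterations_run = 0
    for iterations_run in range(1, iterations + 1):
        changed = 0
        for i, v in enumerate(values):
            # Assign each value to the nearest current center.
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            if nearest != labels[i]:
                changed += 1
                labels[i] = nearest
        # Move each center to the mean of its members (keep it if the class is empty).
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
        # Stop once the fraction of pixels that changed class drops below the threshold.
        if changed / len(values) < change_thresh:
            break
    return labels, centers, iterations_run


labels, centers, n = kmeans_1d([1, 2, 3, 10, 11, 12], [0, 5], 0.05, 10)
print(labels, centers, n)
```

With these inputs the second iteration still moves one of six pixels (about 17%), so a third iteration runs, during which no pixel changes class and the loop stops. Lowering the iteration limit to 1 ends the loop after the first pass regardless of CHANGE_THRESH.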

CLASS_NAMES

(Required; common to all classification methods)

Specify names for each output class for the supervised classification methods; the unsupervised methods generate their own CLASS_NAMES based on the color of the class. CLASS_NAMES is an array of strings with num_classes+1 elements. The first element (Class 0) is “Unclassified.” The order of the other classes is determined by the order of the classification data specified in the keyword MEAN.

COV

(Required for maximum likelihood and Mahalanobis methods)

Specify a floating-point or double-precision array with dimensions [num_bands, num_bands, num_classes] containing the covariance matrix of each class, typically computed from the training region (ROI) of that class.
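To show what one [num_bands, num_bands] slice of COV represents, the Python sketch below computes a class mean and covariance matrix from training pixels and evaluates the Mahalanobis distance of a pixel to that class. It is illustrative only: the function names are hypothetical, the n - 1 (sample) normalization is an assumption, and ENVI's exact decision rule for the Mahalanobis and maximum likelihood methods is not reproduced.

```python
def class_covariance(samples):
    """Mean vector and sample covariance matrix (n - 1 denominator) of one class."""
    n, nb = len(samples), len(samples[0])
    mean = [sum(s[b] for s in samples) / n for b in range(nb)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(nb)] for i in range(nb)]
    return mean, cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                         # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis(pixel, mean, cov):
    """sqrt(d^T C^-1 d) with d = pixel - mean, computed without forming C^-1."""
    d = [p - mu for p, mu in zip(pixel, mean)]
    y = solve(cov, d)
    return sum(di * yi for di, yi in zip(d, y)) ** 0.5


# Four training pixels for one two-band class.
mean, cov = class_covariance([[0, 0], [2, 0], [0, 2], [2, 2]])
print(mean, cov, mahalanobis([3, 1], mean, cov))
```

Solving the linear system instead of inverting the matrix is a common numerical choice; a singular covariance (for example, too few training pixels) would make the distance undefined in either approach.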

DATA_SCALE

(Optional for minimum distance method; Required for maximum likelihood method)

Specify a floating-point value representing the data scale factor, which is a division factor used to convert integer-scaled reflectance or radiance data into floating-point values. For example, for reflectance data scaled into the range of 0 to 10,000, set the scale factor to 10,000. For uncalibrated integer data, set the scale factor to the maximum value the instrument can measure, (2^n) - 1, where n is the bit depth of the instrument. Set the scale factor to 255 for 8-bit instruments (such as Landsat-4), 1023 for 10-bit instruments (such as NOAA-12 AVHRR), and 2047 for 11-bit instruments (such as IKONOS).
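The scale-factor arithmetic can be checked with a short Python sketch. The helper names are hypothetical and only reproduce the (2^n) - 1 rule and the division described above.

```python
def data_scale_for_bit_depth(n_bits):
    """Largest value an n-bit instrument can record: (2**n) - 1."""
    return (2 ** n_bits) - 1


def apply_data_scale(dn_values, data_scale):
    """Divide integer-scaled values by DATA_SCALE to get floating-point values."""
    if data_scale <= 0:
        raise ValueError("data_scale must be positive")
    return [dn / data_scale for dn in dn_values]


print(data_scale_for_bit_depth(8), data_scale_for_bit_depth(10), data_scale_for_bit_depth(11))
# 255 1023 2047
print(apply_data_scale([0, 5000, 10000], 10000))
# [0.0, 0.5, 1.0]
```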

DIMS

(Required; common to all classification methods)

The “dimensions” keyword is a five-element array of long integers that defines the spatial subset (of a file or array) to use for processing. Nearly every time you specify the keyword FID, you must also specify the spatial subset of the corresponding file (even if the entire file, with no spatial subsetting, is to be processed).

  • DIMS[0]: A pointer to an open ROI; use only in cases where ROIs define the spatial subset. Otherwise, set to -1L.
  • DIMS[1]: The starting sample number. The first x pixel is 0.
  • DIMS[2]: The ending sample number.
  • DIMS[3]: The starting line number. The first y pixel is 0.
  • DIMS[4]: The ending line number.

To process an entire file (with no spatial subsetting), define DIMS as shown in the following code example. This example assumes you have already opened a file using ENVI_SELECT or ENVI_PICKFILE:

  envi_file_query, fid, dims=dims
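ENVI_FILE_QUERY fills in DIMS for the whole file. For a spatial subset, the five elements follow the layout listed above; the Python sketch below builds that list from the image size so the element order is explicit. The function make_dims and its parameter names are hypothetical; in IDL the equivalent is a long-integer array such as [-1L, 0, ns-1, 0, nl-1].

```python
def make_dims(ns, nl, x0=0, x1=None, y0=0, y1=None, roi_ptr=-1):
    """Build a five-element DIMS-style list [roi, x_start, x_end, y_start, y_end].

    ns and nl are the number of samples and lines. The end positions default
    to the last pixel, so make_dims(ns, nl) describes the whole image.
    """
    x1 = ns - 1 if x1 is None else x1
    y1 = nl - 1 if y1 is None else y1
    if not (0 <= x0 <= x1 < ns and 0 <= y0 <= y1 < nl):
        raise ValueError("subset lies outside the image")
    return [roi_ptr, x0, x1, y0, y1]


print(make_dims(512, 400))                              # whole image
print(make_dims(512, 400, x0=10, x1=99, y0=20, y1=119))  # 90 x 100 subset
```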

FID

(Required; common to all classification methods)

The file ID (FID) is a long-integer scalar with a value greater than 0; an invalid FID has a value of -1. The FID is returned as a named variable by any routine used to open or select a file. Often, the FID is returned from the keyword R_FID in the ENVI_OPEN_FILE procedure, or by the ENVIRasterToFID function. Files are processed by referring to their FIDs. If you work directly with the file in IDL, note that the FID is not equivalent to a logical unit number (LUN).

IN_MEMORY

(Boolean; optional; common to all classification methods)

Set this keyword to specify that output should be stored in memory. If you do not set IN_MEMORY, output will be stored on disk and you must specify OUT_NAME (see below).

ITERATIONS

(Required for K-Means and ISODATA unsupervised methods)

Specify an integer value for the maximum number of iterations.

ISO_MERGE_DIST

(Required for ISODATA unsupervised method)

Specify a floating-point number greater than 0.0 that indicates the class merge distance (in DN). If the distance between class means is less than ISO_MERGE_DIST, the classes will be merged. The maximum number of pairs merged in any loop is determined by ISO_MERGE_PAIRS.

ISO_MERGE_PAIRS

(Required for ISODATA unsupervised method)

Specify a long-integer value that indicates the maximum number of class pairs that can be merged in a single iteration.
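How ISO_MERGE_DIST and ISO_MERGE_PAIRS interact can be sketched in Python: pairs of classes whose means are closer than the merge distance are candidates, and at most ISO_MERGE_PAIRS of them are merged per iteration. Taking the closest pairs first and letting each class join only one merge per iteration are assumptions of this sketch, not documented ENVI behavior; the function name is hypothetical.

```python
import math


def merge_candidates(means, merge_dist, merge_pairs):
    """Pick class pairs to merge in one ISODATA iteration (illustrative only).

    Pairs whose means are closer than merge_dist are taken closest-first,
    at most merge_pairs of them, and each class joins at most one merge.
    """
    close = []
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            d = math.dist(means[i], means[j])   # Euclidean distance in DN
            if d < merge_dist:
                close.append((d, i, j))
    close.sort()
    chosen, used = [], set()
    for _, i, j in close:
        if len(chosen) == merge_pairs:
            break
        if i in used or j in used:
            continue
        chosen.append((i, j))
        used.update((i, j))
    return chosen


print(merge_candidates([[0], [1], [5], [5.5]], 2.0, 2))   # two pairs allowed
print(merge_candidates([[0], [1], [5], [5.5]], 2.0, 1))   # only the closest pair
```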

ISO_MIN_PIXELS

(Required for ISODATA unsupervised method)

Specify a long-integer value that indicates the minimum number of pixels needed to form a class. If a class contains fewer pixels than this value, that class is deleted.

ISO_SPLIT_SMULT

(Required for ISODATA unsupervised method)

Specify a floating-point number greater than 0.0 that indicates the standard deviation multiplier used to calculate the mean of split classes. The new means are calculated as follows:

class_1_mean = class_mean + ISO_SPLIT_SMULT * current_stdv
class_2_mean = class_mean - ISO_SPLIT_SMULT * current_stdv

The default value is 1.0.

ISO_SPLIT_STD

(Required for ISODATA unsupervised method)

Specify a floating-point number greater than 0.0 that indicates the minimum class standard deviation value (in DN). If a class standard deviation is greater than ISO_SPLIT_STD, the class is split into two classes.
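ISO_SPLIT_STD decides whether a class is split, and ISO_SPLIT_SMULT decides where the two new means go (class_mean plus or minus ISO_SPLIT_SMULT times the standard deviation). The Python sketch below combines the two. Using the largest per-band standard deviation as "the class standard deviation" is an assumption, and the function name is hypothetical.

```python
def split_class(mean, stdv, split_std, split_smult=1.0):
    """Split one class if its spread exceeds split_std (illustrative only).

    Uses the largest per-band standard deviation as the class standard
    deviation (an assumption). Returns (class_1_mean, class_2_mean) or None.
    """
    if max(stdv) <= split_std:
        return None                        # class is compact enough; keep it
    class_1 = [m + split_smult * s for m, s in zip(mean, stdv)]
    class_2 = [m - split_smult * s for m, s in zip(mean, stdv)]
    return class_1, class_2


print(split_class([10, 20], [1, 4], 3.0))        # band 2 std 4 > 3, so split
print(split_class([10, 20], [1, 2], 3.0))        # None: no band exceeds 3
print(split_class([10, 20], [1, 4], 3.0, 0.5))   # smaller multiplier, closer means
```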

LOOKUP

(Required for any supervised method)

Specify an array of long integers representing class RGB values for the supervised methods only; the unsupervised methods generate their own LOOKUP based on the number of output classes. The LOOKUP array contains an RGB triplet for the “Unclassified” class plus one RGB triplet for each output class. The “Unclassified” class typically uses the RGB triplet [0, 0, 0] for black. The dimensions of the array are [3, num_classes+1], and the RGB triplet is ordered [r, g, b]. LOOKUP[*, 0] is the “Unclassified” class, and the order of the other classes is determined by the order of the classification data in keyword MEAN.
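Because CLASS_NAMES and LOOKUP both reserve index 0 for Unclassified and then follow the class order of MEAN, they are easiest to build together. The Python sketch below does this; the function name is hypothetical, colors are assumed to be 8-bit (0 to 255), and lookup[k] here plays the role of LOOKUP[*, k] in the IDL [3, num_classes+1] array.

```python
def build_class_table(names, colors):
    """Return (class_names, lookup) with the Unclassified class at index 0.

    names and colors must follow the same class order as MEAN.
    lookup[k] is the (r, g, b) triplet that LOOKUP[*, k] holds in IDL.
    """
    if len(names) != len(colors):
        raise ValueError("need one color per class")
    class_names = ["Unclassified"] + list(names)
    lookup = [(0, 0, 0)]                 # black for Unclassified
    for rgb in colors:
        if len(rgb) != 3 or not all(0 <= c <= 255 for c in rgb):
            raise ValueError("each color must be an (r, g, b) triplet in 0-255")
        lookup.append(tuple(rgb))
    return class_names, lookup


names, lut = build_class_table(["Water", "Forest"], [(0, 0, 255), (0, 128, 0)])
print(names)   # ['Unclassified', 'Water', 'Forest']
print(lut)     # [(0, 0, 0), (0, 0, 255), (0, 128, 0)]
```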

M_FID

(Optional; common to all classification methods)

Use this keyword to specify the file ID of the mask file. This value is returned from the keyword R_FID in the ENVI_OPEN_FILE procedure. M_FID is a long integer with a value greater than 0. An invalid file ID has a value of -1.

M_POS

(Optional; common to all classification methods)

Use this keyword to specify the band position of the mask band. M_POS is a long integer with a value greater than or equal to 0.

MEAN

(Optional; use with any supervised method)

Specify the mean spectral values for each class when performing supervised classification. MEAN is a floating-point or double-precision array of [num_bands, num_classes] values. The spectral mean of each class (for supervised methods) is commonly computed from the spectral mean of the ROI representing the training region of the class. The actual number of output classes, NUM_CLASSES, is computed from the number of spectral means plus one for the Unclassified class.
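The MEAN array can be derived from training pixels as described above. In the Python sketch below, training[k] stands in for the pixels of class k's ROI and means[k] plays the role of MEAN[*, k]; the function name and input layout are hypothetical. The returned class count includes the Unclassified class, matching the NUM_CLASSES description.

```python
def class_means(training):
    """Per-class mean spectra from training pixels (illustrative only).

    training[k] is a list of pixel spectra (one value per band) for class k.
    Returns (means, num_classes), where num_classes counts Unclassified.
    """
    means = []
    for pixels in training:
        n_bands = len(pixels[0])
        means.append([sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)])
    num_classes = len(means) + 1         # plus one for the Unclassified class
    return means, num_classes


means, num_classes = class_means([[[1, 2], [3, 4]], [[10, 10]]])
print(means, num_classes)   # [[2.0, 3.0], [10.0, 10.0]] 3
```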

For the unsupervised methods of ISODATA and K-Means, the initial starting classes are calculated automatically from the mean of the input data and do not require the MEAN keyword.

METHOD

(Required; common to all classification methods)

Set this keyword to one of the following values to specify the classification method:

  • 0: Parallelepiped (supervised)
  • 1: Minimum distance (supervised)
  • 2: Maximum likelihood (supervised)
  • 3: SAM (supervised)
  • 4: ISODATA (unsupervised)
  • 5: Mahalanobis (supervised)
  • 6: Binary Encoding (supervised)
  • 7: K-Means (unsupervised)
  • 8: SID (supervised)

MIN_CLASSES

(Required for ISODATA unsupervised method)

Specify the minimum number of output classes.

NPTS

(Required for Mahalanobis method)

Specify an array of long integers representing the number of points in each ROI, with one element per ROI. Use the ENVI_GET_ROI_INFORMATION routine to return the number of points in each ROI.

NUM_CLASSES

(Required for K-Means and ISODATA unsupervised methods)

Specify the desired number of output classes.

OUT_BNAME

(Optional; common to all classification methods)

Specify a string with an output band name for the classification image.

OUT_NAME

(Required unless IN_MEMORY is set; common to all classification methods)

Use this keyword to specify a string with the output filename for the resulting data. If you set the keyword IN_MEMORY, you do not need to specify OUT_NAME.

POS

Use this keyword to specify an array of band positions, indicating the band numbers on which to perform the operation. This keyword indicates the spectral subset of bands to use in processing. POS is an array of long integers, ranging from 0 to the number of bands minus 1. Specify bands starting with zero (Band 1=0, Band 2=1, etc.). For example, to process only Bands 3 and 4 of a multi-band file, set POS=[2, 3].

POS is typically used with individual files. The example code below illustrates the use of POS for a single file with four bands of data:

  pos = [0, 1, 2, 3]
  envi_doit, 'envi_stats_doit', dims=dims, fid=fid, pos=pos, $
    comp_flag=3, dmin=dmin, dmax=dmax, mean=mean, stdv=stdv, hist=hist

But what if you need to create an output file consisting of data from different bands, each from a different file? Library routines such as CF_DOIT and ENVI_LAYER_STACKING_DOIT can accomplish this, but they use the POS keyword differently: FID is an array of file IDs, and each element of POS gives the band position to take from the file at the same index in the FID array. Suppose you have four files, test1, test2, test3, and test4, with corresponding FIDs of fid1, fid2, fid3, and fid4, respectively. In the following example, you want Band 3 from test1 in the first position, Band 2 from test2 in the second position, Band 6 from test3 in the third position, and Band 4 from test4 in the fourth position. The code should be as follows:

  fid_array = [fid1, fid2, fid3, fid4]
  pos = [2, 1, 5, 3]
  envi_doit, 'cf_doit', dims=dims, fid=fid_array, pos=pos, $
    out_name='test_composite_file'

R_FID

(Optional; common to all classification methods)

ENVI Classic library routines that result in new images also have an R_FID, or “returned FID.” This is simply a named variable containing the file ID to access the processed data. Specifying this keyword saves you the step of opening the new file from disk.

RULE_FID

(Optional; use with any supervised method)

Specify a named variable that contains the file ID for the processed rule image. This file ID can be used to access the processed data.

RULE_IN_MEMORY

(Boolean; optional; use with any supervised method)

Set this keyword to store output rule images in memory.

RULE_OUT_BNAME

(Optional; use with any supervised method)

Specify a string array that contains the output band names for the rule image.

RULE_OUT_NAME

(Optional; use with any supervised method)

Specify an output filename for the rule image. If you set this keyword, the rule image is automatically saved.

STDV

(Required for parallelepiped and maximum likelihood methods)

Specify a floating-point or double-precision array with the dimensions [num_bands, num_classes] containing the standard deviation for each of the spectral classes.

STD_MULT

(Optional for parallelepiped, K-Means, and ISODATA methods; Required for minimum distance method)

Specify a floating-point or double-precision multiplication factor or array of factors (one for each class) that sets how many standard deviations a spectrum may lie from the class mean and still be classified into that class. If you specify an array, each class is tested with its corresponding factor. If you use STD_MULT, you must set the keyword STDV. The default value is 1.0.
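For the parallelepiped method, STDV and STD_MULT together define a box of mean plus or minus STD_MULT times STDV in every band. The Python sketch below applies that test. It is illustrative only: the function name is hypothetical, and a pixel that falls inside several boxes is given to the first matching class here, which may not match ENVI's own tie-breaking rule.

```python
def parallelepiped(pixel, means, stdvs, std_mult=1.0):
    """Return the 1-based class whose box mean +/- std_mult*stdv holds the pixel.

    std_mult may be one number or a list with one factor per class.
    Overlaps go to the first matching class (an assumption). 0 = Unclassified.
    """
    if isinstance(std_mult, (list, tuple)):
        mults = list(std_mult)
    else:
        mults = [std_mult] * len(means)
    for k, (mean, stdv, mult) in enumerate(zip(means, stdvs, mults), start=1):
        # The pixel must lie inside the box in every band.
        if all(m - mult * s <= p <= m + mult * s for p, m, s in zip(pixel, mean, stdv)):
            return k
    return 0


means = [[10, 10], [20, 20]]
stdvs = [[2, 2], [3, 3]]
print(parallelepiped([15, 15], means, stdvs))        # 0: outside both boxes
print(parallelepiped([15, 15], means, stdvs, 2.0))   # 2: wider boxes
```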

THRESH

This keyword has multiple definitions, depending on the method used:

Minimum Distance (required), Mahalanobis (optional), K-Means (optional), and ISODATA (optional)

This value represents the maximum distance error by which the spectral value can differ from the mean value. Specify one of the following:

  • Single floating-point or double-precision value, applied class-by-class, not as the total error.
  • Array of values, one for each class. Each class is tested against its corresponding error.
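For minimum distance, THRESH acts as a per-class ceiling on the distance to the nearest class mean. The Python sketch below uses plain Euclidean distance and ignores STD_MULT; both are simplifications, and the function name is hypothetical.

```python
import math


def minimum_distance(pixel, means, thresh=None):
    """Return the 1-based class of the nearest mean (Euclidean), or 0.

    thresh may be None (no limit), one number applied to every class, or a
    list with one maximum distance per class.
    """
    dists = [math.dist(pixel, m) for m in means]
    k = min(range(len(dists)), key=dists.__getitem__)
    if thresh is not None:
        limit = thresh[k] if isinstance(thresh, (list, tuple)) else thresh
        if dists[k] > limit:
            return 0                      # too far from every mean: Unclassified
    return k + 1


means = [[0, 0], [10, 0]]
print(minimum_distance([3, 4], means))         # 1 (distance 5)
print(minimum_distance([3, 4], means, 4.0))    # 0 (5 > 4)
print(minimum_distance([3, 4], means, [6, 1])) # 1 (5 <= 6)
```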

Maximum Likelihood (optional keyword)

This value represents the minimum probability that a class must have in order to be classified. Values range from 0 to 1. Specify one of the following:

  • Single floating-point or double-precision value
  • Array of values, one for each class. Each class is tested against its corresponding probability.

Spectral Angle Mapper (optional keyword)

This value represents the maximum spectral angle in radians to classify. The default value is π/2. Specify one of the following:

  • Single floating-point or double-precision value, ranging from 0 to π/2.
  • Array of values, one for each class. Each class is tested against its corresponding maximum angle.
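The spectral angle is the arccosine of the normalized dot product between a pixel spectrum and a reference spectrum, so it ignores overall brightness. The Python sketch below computes it and applies THRESH as a maximum angle. The function names are hypothetical, and the clamp before acos only guards against rounding error.

```python
import math


def spectral_angle(t, r):
    """Angle in radians between pixel spectrum t and reference spectrum r."""
    dot = sum(a * b for a, b in zip(t, r))
    norm = math.sqrt(sum(a * a for a in t)) * math.sqrt(sum(b * b for b in r))
    cos_angle = max(-1.0, min(1.0, dot / norm))   # guard against rounding past +/-1
    return math.acos(cos_angle)


def sam_classify(pixel, references, thresh=math.pi / 2):
    """Return the 1-based class with the smallest angle, or 0 if it exceeds thresh."""
    angles = [spectral_angle(pixel, r) for r in references]
    k = min(range(len(angles)), key=angles.__getitem__)
    limit = thresh[k] if isinstance(thresh, (list, tuple)) else thresh
    return k + 1 if angles[k] <= limit else 0


print(spectral_angle([1, 2, 3], [2, 4, 6]))   # about 0: same shape, different brightness
print(sam_classify([1, 1], [[1, 0], [0, 1], [1, 1.1]], 0.1))   # 3
```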

Spectral Information Divergence (optional keyword)

This value represents the maximum spectral divergence. The default value is 0.05, but it can vary substantially given the nature of this similarity measure. SID is based upon a dimensionless metric involving the logarithm of a ratio of probabilities based upon each spectral vector. Therefore, a threshold that discriminates well for one pair of spectral vectors may be either too sensitive or not sensitive enough for another pair due to the similar/dissimilar nature of their probability distributions. Specify one of the following:

  • Single floating-point or double-precision value
  • Array of values, one for each class. Each class is tested against its corresponding maximum spectral divergence.
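The "logarithm of a ratio of probabilities" can be made concrete. In the usual formulation of SID, each spectrum is normalized to sum to 1, and the divergence is the sum of the two relative entropies D(p||q) + D(q||p). The Python sketch below follows that formulation with the default threshold of 0.05; it assumes strictly positive band values, the function names are hypothetical, and ENVI's internal handling of zeros is not reproduced.

```python
import math


def sid(x, y):
    """Spectral Information Divergence between two strictly positive spectra."""
    sx, sy = sum(x), sum(y)
    p = [v / sx for v in x]              # treat each spectrum as a probability vector
    q = [v / sy for v in y]
    d_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    d_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return d_pq + d_qp                    # symmetric sum of relative entropies


def sid_classify(pixel, references, thresh=0.05):
    """Return the 1-based class with the smallest SID, or 0 if it exceeds thresh."""
    scores = [sid(pixel, r) for r in references]
    k = min(range(len(scores)), key=scores.__getitem__)
    limit = thresh[k] if isinstance(thresh, (list, tuple)) else thresh
    return k + 1 if scores[k] <= limit else 0


print(sid([1, 2], [2, 1]))                                 # about 0.462
print(sid_classify([1, 2, 3], [[3, 2, 1], [1, 2, 3.1]]))   # 2
```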

Binary Encoding (optional keyword)

This value represents the minimum match percentage, ranging from 0 to 1.0. A value of 1.0 means that all bands must match, and a value of 0.4 means that at least 40% of the bands must match. Specify one of the following:

  • Single floating-point or double-precision value
  • Array of values, one for each class. Each class is tested against its corresponding minimum match percentage.
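Binary Encoding reduces each spectrum to one bit per band (1 where the band is above the spectrum's own mean, 0 otherwise) and compares the bit patterns. The Python sketch below computes the fraction of matching bands and applies THRESH as a minimum match. Treating a band equal to the mean as 0, and using 0.0 as the default threshold, are assumptions of this sketch; the function names are hypothetical.

```python
def binary_encode(spectrum):
    """Encode each band as 1 if above the spectrum's own mean, else 0."""
    mean = sum(spectrum) / len(spectrum)
    return [1 if v > mean else 0 for v in spectrum]


def match_fraction(a, b):
    """Fraction of bands whose binary codes agree (0.0 to 1.0)."""
    ea, eb = binary_encode(a), binary_encode(b)
    return sum(x == y for x, y in zip(ea, eb)) / len(ea)


def binary_encoding_classify(pixel, references, thresh=0.0):
    """Return the 1-based best-matching class, or 0 below the minimum match."""
    scores = [match_fraction(pixel, r) for r in references]
    k = max(range(len(scores)), key=scores.__getitem__)
    limit = thresh[k] if isinstance(thresh, (list, tuple)) else thresh
    return k + 1 if scores[k] >= limit else 0


print(binary_encode([1, 5, 2, 8]))                        # [0, 1, 0, 1]
print(match_fraction([1, 5, 2, 8], [9, 6, 1, 2]))         # 0.5
print(binary_encoding_classify([1, 5, 2, 8], [[9, 6, 1, 2], [2, 6, 1, 9]], 0.75))  # 2
```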

API Version


4.2