Related tutorials: Extract One Feature, Extract Multiple Features

Once you have labeled your training rasters, you can train a TensorFlow pixel segmentation model so that it learns what the features look like. The model can be empty or previously trained.

Training a deep learning model involves a number of stochastic processes, so training runs with identical parameters will yield different models. This randomness comes from the way the algorithm converges to a solution, and it is a fundamental part of the training process.

You can also write a script to train a model using the TrainTensorFlowPixelModel task.
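For example, the following IDL sketch shows the general pattern, using hypothetical file paths. The task name comes from this topic, but the property names (TRAINING_RASTERS, VALIDATION_RASTERS, EPOCHS) are assumptions for illustration; verify them in the TrainTensorFlowPixelModel task documentation. Later sketches in this topic continue this script, reusing the e and Task variables.

    ; Hedged sketch: train a pixel model from a script. Property names
    ; are assumptions; verify them in the TrainTensorFlowPixelModel help.
    e = ENVI()                                                 ; launch ENVI
    TrainRaster = e.OpenRaster('C:\labels\train_labels.dat')   ; hypothetical path
    ValRaster = e.OpenRaster('C:\labels\val_labels.dat')       ; hypothetical path
    Task = ENVITask('TrainTensorFlowPixelModel')
    Task.TRAINING_RASTERS = [TrainRaster]
    Task.VALIDATION_RASTERS = [ValRaster]
    Task.EPOCHS = 25                        ; default noted later in this topic
    Task.Execute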

See the following sections:

• Select Training and Validation Rasters
• Set Model Parameters
• Set Training Parameters
• Set Advanced Parameters

Select Training and Validation Rasters


  1. Choose one of the following options to start the Train TensorFlow Pixel Model tool:
    • In the ENVI Toolbox, select Deep Learning > Pixel Segmentation > Train TensorFlow Pixel Model.
    • In the Deep Learning Guide Map, click this sequence of buttons: Pixel Segmentation > Train a New Pixel Model > Train Model.

    The Train TensorFlow Pixel Model dialog appears with the Main tab selected. Use this tab to specify the training rasters, validation rasters, and output models.

  2. Click the Add File(s) button next to the Training Rasters field. Select one or more labeled rasters to use for training, and click Open. They are added to the Training Rasters field. If you have not yet created labeled rasters, use the Build Label Raster from Classification or Build Label Raster from ROI tool to create them.

    By default, all bands are used. To train on a subset of bands, click the Spectral Subset button and select the bands to use as input. The number of bands you select becomes the number of bands required for all training rasters and for any raster you later classify with the model.

  3. Click the Add File(s) button next to the Validation Rasters field. These are separate labeled rasters that can be used to evaluate the accuracy of a TensorFlow model. Although classification accuracy is typically better if you use different rasters for training and validation, you can still use the same raster for both.

  4. Specify a filename (.h5) and location for the Output Model. This will be the "best" trained model: the one from the epoch with the lowest validation loss. By default, the tool saves both the best and the last model. The best model usually outperforms the last model, but not always; having both outputs lets you choose whichever works best for your scenario.
  5. Specify a filename (.h5) and location for the Output Last Model. This will be the trained model from the last epoch. A scripted sketch of these input and output settings follows this list.
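Continuing the script sketch above, the same inputs and outputs might be set as follows before calling Task.Execute. ENVISubsetRaster is part of the ENVI API; the two output property names are assumptions, so confirm them in the task documentation.

    ; Hedged sketch: spectral subset and output model paths.
    Subset = ENVISubsetRaster(TrainRaster, BANDS=[0, 1, 2])   ; first three bands
    Task.TRAINING_RASTERS = [Subset]
    ; Output property names are assumptions; check the task help:
    Task.OUTPUT_MODEL_URI = 'C:\models\best_model.h5'         ; hypothetical path
    Task.OUTPUT_LAST_MODEL_URI = 'C:\models\last_model.h5'    ; hypothetical path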

Set Model Parameters


Select the Model tab. These parameters are optional.

  1. Enter a Model Name. If you do not provide one, "ENVI Deep Learning" will be used as the model name.

  2. Enter a Model Description in the field provided.

  3. An architecture is a set of parameters that defines the underlying convolutional neural network. From the Model Architecture drop-down list, select the model architecture to use during training. The options are:

    • SegUNet++: (default) Recommended for training on structural objects, such as vehicles, buildings, and shipping containers.

    • SegUNet: Recommended for training on features that are more inconsistent in appearance, such as debris and clouds.

    • DeepLabV3+: A fast option for training that provides good results.

    SegUNet and SegUNet++ are based on the U-Net architecture; SegUNet follows the work of Ronneberger, Fischer, and Brox (2015). Like U-Net, they are mask-based, encoder-decoder architectures that classify every pixel in the image. DeepLabV3+ is based on ResNet50.

  4. From the Patch Size drop-down list, select the edge length, in pixels, of the square patches used for training.

  5. Optionally, in the Trained Model field, provide a trained pixel segmentation model to use as a starting point. If you do not specify one, the model initializes with random weights. A scripted sketch of these model parameters follows this list.
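Continuing the script sketch, these choices map to task properties that you would set before calling Task.Execute. The property names and values below are assumptions for illustration; confirm them in the task documentation.

    ; Hedged sketch: model parameters (property names are assumptions).
    Task.MODEL_ARCHITECTURE = 'SegUNet++'    ; or 'SegUNet', 'DeepLabV3+'
    Task.MODEL_NAME = 'Building Extractor'   ; hypothetical name
    Task.MODEL_DESCRIPTION = 'SegUNet++ model trained on rooftop labels'
    Task.PATCH_SIZE = 464                    ; example edge length, in pixels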

Set Training Parameters


Select the Training tab in the Train TensorFlow Pixel Model dialog. These parameters are for users who want more control over how a TensorFlow model learns to recognize features during training. They are all optional. See the Pixel Segmentation Training Background topic to learn more about them.

If you are not sure how to set the values for these fields, you can use the ENVI Modeler to create random values. See Train Deep Learning Models Using the ENVI Modeler for details.

  1. Set the Augment Scale option to Yes to augment the training data with resized (scaled) versions of the data. See Data Augmentation.
  2. Set the Augment Rotation option to Yes to augment the training data with rotated versions of the data. See Data Augmentation.
  3. In the Number of Epochs field, enter the number of epochs to run. An epoch is a full pass of the entire training dataset through the algorithm's learning process. Training inputs are adjusted at the end of each epoch. The default value is 25. See Epochs and Batches for more information.
  4. In the Patches per Batch field, leave the default value of 3, which is based on a GPU with 8 GB of RAM. If your GPU has more than 8 GB of RAM, you may be able to use a larger batch size.
  5. In the Feature Patch Percentage field, specify the percentage of patches that contain labeled features to use during training. Values should range from 0 to 1. This applies to both the training and validation datasets. The default value is 1, which means that 100% of the patches that contain features will be used for training. The resulting patches are then used as input to the Background Patch Ratio, described in the next step.

    Example: Suppose that a label raster has 50 patches that contain labeled features. A Feature Patch Percentage value of 0.4 means that 20 of those patches will be used for training (20/50 = 0.4, or 40%).

    The default value of 1 ensures that you are training on all of the features that you labeled. In general, if you have a large training dataset (hundreds of images), lowering the Feature Patch Percentage will decrease training time.

  6. In the Background Patch Ratio field, enter the ratio of background patches (those that contain no labeled features) to patches with features. For example, a ratio of 1.0 for 100 patches with features would provide 100 patches without features. The default value is 0.15.

    When features are sparse in a training raster, training can be biased by the many empty patches. The Background Patch Ratio parameter lets you restrict the number of empty patches relative to those that contain features. Increasing the value tends to reduce false positives, particularly when features are sparse. The sketch that follows this list makes the patch arithmetic concrete.
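The patch counts from the two examples above can be computed directly, and the same settings added to the script sketch. The training property names are assumptions; verify them in the task documentation.

    ; Worked patch arithmetic from the examples above:
    nFeaturePatches = 50
    nUsed = ROUND(0.4 * nFeaturePatches)   ; Feature Patch Percentage 0.4 -> 20
    nBackground = ROUND(0.15 * nUsed)      ; Background Patch Ratio 0.15 -> 3
    PRINT, nUsed, nBackground              ; prints 20 and 3

    ; Hedged property names (verify in the task help):
    Task.AUGMENT_SCALE = !TRUE
    Task.AUGMENT_ROTATION = !TRUE
    Task.EPOCHS = 25
    Task.PATCHES_PER_BATCH = 3
    Task.FEATURE_PATCH_PERCENTAGE = 1.0
    Task.BACKGROUND_PATCH_RATIO = 0.15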

Set Advanced Parameters


Select the Advanced tab. These parameters give you more control over how a TensorFlow model learns to recognize features during training. They are all optional.

  1. Enter one or more Class Names that represent the features of interest.
    • If the class names are already defined in an ROI file, click the Import Names from ROIs button. In the ROI Selection dialog, select the ROIs that contain the class names, and click OK.
    • Click the Synchronize Parameters button to populate the Solid Distance and Blur Distance fields with the class names so that you can specify different values for each class.
  2. The Solid Distance field pertains to point and polyline labels only. For each class, enter the number of pixels surrounding point or polyline labels that should be considered part of the target feature. See the Solid Distance background discussion for more information.
  3. Blur Distance is used in conjunction with Solid Distance. You can optionally blur features of interest that vary in size. Blurring the edges of features and decreasing the blur during training can help the model gradually focus on the feature of interest. In most cases, you can leave this field blank; however, it is available for you to experiment with. See the Blur Distance background discussion for more information.
  4. In the Class Weight field, enter the minimum and maximum weights used to achieve a more even balance of classes (including background) when sampling. Sampling diversity is weighted by the maximum value at the beginning of training and decreases to the minimum value by the end of training. The useful range for the maximum value is between 1 and 6. A general recommendation is to set the Min to 2 and the Max to 3 when your features of interest are sparse in your training rasters; otherwise, set the Min to 0 and the Max to 1. See Training Parameters.
  5. In the Loss Weight field, enter a value between 0 and 1.0. A value of 0, which is a good starting point for most cases, means the model treats feature and background pixels equally. Increasing the value biases the loss function to place more emphasis on correctly identifying feature pixels than on identifying background pixels. This is useful when features are sparse or when not all of the features are labeled. A scripted sketch of these advanced parameters follows this list.
  6. To reuse these task settings in future ENVI sessions, save them to a file. Click the down arrow next to the OK button and select Save Parameter Values, then specify the location and filename to save to. Note that some parameter types, such as rasters, vectors, and ROIs, will not be saved with the file. To apply the saved task settings, click the down arrow and select Restore Parameter Values, then select the file where you previously stored your settings.

  7. To run the process in the background, click the down arrow and select Run Task in the Background. If an ENVI Server has been set up on the network, an option to run the task on that server is also available. The ENVI Server Job Console will show the progress of the job and provide a link to display the result when processing is complete. See the ENVI Servers topic in ENVI Help for more information.

  8. Click OK.
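Continuing the script sketch, the advanced parameters might be set as shown below, followed by the call that starts training. All property names are assumptions; confirm them in the task documentation.

    ; Hedged sketch: advanced parameters (property names are assumptions).
    Task.CLASS_NAMES = ['Vehicle', 'Building']   ; hypothetical classes
    Task.SOLID_DISTANCE = [3, 0]    ; pixels around point/line labels, per class
    Task.CLASS_WEIGHT = [2, 3]      ; [min, max]; suggested when features are sparse
    Task.LOSS_WEIGHT = 0.0          ; treat feature and background pixels equally
    Task.Execute                    ; the scripted equivalent of clicking OK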

Training a model takes a significant amount of time due to the computations involved. Depending on your system and graphics hardware, processing can take several minutes to several hours. While training is in progress, a dialog shows a progress bar, the current Epoch and Step, and the updated validation loss value.

At the same time, a TensorBoard page opens in a new web browser window. TensorBoard is a visualization toolkit included with TensorFlow. It reports real-time metrics such as Loss, Accuracy, Precision, and Recall during training. See View Training Metrics for details.

With each epoch, the label rasters are passed through the model again and the model's weights are adjusted to improve its predictions. The weights from the epoch that produces the lowest validation loss are used in the final trained model. For example, if you train for 25 epochs and the lowest validation loss is achieved during epoch 20, ENVI retains the weights from that epoch in the trained model.

When training is complete, you can pass the trained model to the TensorFlow Pixel Classification tool.
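In a script, the trained model can be handed to a companion classification task in the same way. A hedged sketch, assuming the task name TensorFlowPixelClassification and the property names shown (verify both in the help):

    ; Hedged sketch: classify a raster with the trained model.
    ClassifyTask = ENVITask('TensorFlowPixelClassification')
    ClassifyTask.INPUT_RASTER = e.OpenRaster('C:\data\scene.dat')  ; hypothetical path
    ClassifyTask.INPUT_MODEL = Task.OUTPUT_MODEL   ; output object of the training task
    ClassifyTask.Execute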

References:

Ronneberger, O., P. Fischer, and T. Brox. "U-Net: Convolutional Neural Networks for Biomedical Image Segmentation." In: Navab N., Hornegger J., Wells W., Frangi A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science 9351. Springer, Cham.

See Also


TensorFlow Pixel Classification, Train TensorFlow Object Models, Train TensorFlow Grid Models, Train Deep Learning Models Using the ENVI Modeler