In this tutorial you will use ENVI Deep Learning to identify shipping containers in a high-resolution orthophoto of a seaport. The tutorial demonstrates how to extract a single feature of interest from imagery using a pixel segmentation model. The result is a class activation raster that shows the relative probability of pixels belonging to the "Shipping Containers" class. The tutorial can be used with ENVI Deep Learning 2.1 or later.

See the following sections:

System Requirements


Refer to the System Requirements topic.

Files Used in This Tutorial


Sample data files are available on our ENVI Tutorials web page. Click the "Deep Learning" link in the ENVI Tutorial Data section to download a .zip file containing the data. Extract the contents to a local directory.

The files for this tutorial are in the shipping_containers subdirectory.

File

Description

OaklandPortOrthophoto1.dat
OaklandPortOrthophoto2.dat

Subsets of public-domain High Resolution Orthoimagery (HRO), 4 bands, 0.3-meter spatial resolution, acquired 20 February 2015, ENVI raster format. Images are courtesy of the U.S. Geological Survey.

ShippingContainerROIs.xml

Point-based ROIs used to create a label image

Background


A common use of deep learning in remote sensing is pixel-based feature extraction; that is, identifying specific features in imagery such as vehicles, road centerlines, or utility equipment. Through a process called labeling, you mark the locations of each feature in one or more images. You can train a deep learning model to find all instances of the feature (for example, conducting an inventory of assets) or to draw approximate boundaries around the features. With training, you pass the labeled data to a deep learning model so that it learns what the feature looks like. Finally, the trained model can be used to look for more of the same feature in the same image or in other images with similar properties. This is the classification step.

In this tutorial, you will train a deep learning model to extract intermodal shipping containers from an aerial orthophoto of a seaport. While you will be using a nadir-looking orthophoto in this tutorial, you can also use oblique imagery from drones and other unmanned aerial vehicles (UAVs) for deep learning feature extraction.

Open and Display an Orthophoto


  1. Click the Open button in the ENVI toolbar.
  2. Go to the location where you downloaded the tutorial data.
  3. Use the Ctrl key on your keyboard to select both OaklandPortOrthophoto1.dat and ShippingContainerROIs.xml. Click Open. This is a high-resolution (0.3-meter) orthophoto of a small area in the Oakland Seaport, California, USA from February 2015. The orthophoto is displayed at 100% resolution.
  4. Press the F12 key on your keyboard to zoom to the full extent of the orthophoto.
  5. Click the Data Manager button in the ENVI toolbar. The Data Manager shows that this orthophoto contains four bands. Some USGS High Resolution Orthoimages like this one contain a near-infrared band (Band 4) in addition to red, green, and blue bands.
  6. Close the Data Manager.
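
If you prefer to work programmatically, you can open and display the same data with the ENVI API in IDL. The following is a minimal sketch that assumes the tutorial files are in your current working directory; adjust the paths as needed.

  ; Launch ENVI and open the tutorial data
  e = ENVI()
  raster = e.OpenRaster('OaklandPortOrthophoto1.dat')
  rois = e.OpenROI('ShippingContainerROIs.xml')

  ; Display the orthophoto and zoom to its full extent
  view = e.GetView()
  layer = view.CreateLayer(raster)
  view.Zoom, /FULL_EXTENT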

Build a Label Raster


Training a model to identify features requires one or more images containing labeled pixel data. These are called label rasters. The labeled pixel data can come from regions of interest (ROIs) or existing classification maps. You can use the ENVI ROI Tool to draw points, lines, or polygons on the features you are interested in. This is the labeling process.

In this section, you will restore point ROIs that were already created for you. The ROIs identify the locations of shipping containers in the image. The orthophoto and ROIs will be used to create a label raster that provides the examples needed for a deep learning model to learn what shipping containers look like. You can create as many label rasters as you want for training; however, this tutorial just uses one label raster.

  1. In the ENVI Toolbox, select Deep Learning > Deep Learning Guide Map. As its name implies, this tool guides you through various steps in ENVI Deep Learning based on your objective.
  2. In the Deep Learning Guide Map, click the following sequence of buttons: Pixel Segmentation > Train a New Model > Build Label Raster from ROI.
  3. In the Build Label Raster from ROI dialog, the Input Raster field is automatically populated with OaklandPortOrthophoto1.dat. You do not need to change that entry.
  4. Click the Browse button next to Input ROI. The ROI Selection dialog appears.
  5. Select Shipping Containers and click OK.
  6. Click the Browse button next to Output Raster.
  7. Choose a folder where you want to save output files for this tutorial.
  8. Name the output file LabelRasterContainers.dat. Then click Save.
  9. Enable the Display result option in the Build Label Raster from ROI dialog.
  10. Click OK. The label raster is added to the Data Manager and Layer Manager, and it is displayed in the view.
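
You can also build the label raster from a script. The sketch below uses the BuildLabelRasterFromROI ENVITask; the parameter names shown are assumptions based on the dialog's fields, so verify them against the API documentation for your ENVI Deep Learning version.

  ; Get the current ENVI session and open the inputs
  e = ENVI(/CURRENT)
  raster = e.OpenRaster('OaklandPortOrthophoto1.dat')
  rois = e.OpenROI('ShippingContainerROIs.xml')

  ; Rasterize the point ROIs into a label raster
  task = ENVITask('BuildLabelRasterFromROI')
  task.INPUT_RASTER = raster
  task.INPUT_ROI = rois
  task.OUTPUT_RASTER_URI = 'LabelRasterContainers.dat'
  task.Execute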

Train a Deep Learning Model


Training involves repeatedly exposing a model to the label raster. Over time, the model learns to translate the spectral and spatial information in the label raster into a class activation raster that highlights the features it was shown during training. Class activation rasters are discussed later.

  1. In the Deep Learning Guide Map, click Pixel Segmentation > Train a New Pixel Model > Train Model. The Train TensorFlow Pixel Model dialog appears with the Main tab selected. The first step in the training process is to set up a new deep learning model. ENVI uses TensorFlow technology for deep learning.

  2. The file LabelRasterContainers.dat appears in the Training Rasters field. You do not need to change it.
  3. Validation rasters are separate label rasters that can be used to evaluate the accuracy of a TensorFlow model. Classification accuracy is typically better if the validation rasters are different from the label rasters used for training; however, you can also use the same raster for both, as you will do here. Click the Add Files button next to Validation Rasters. A file selection dialog appears. Select LabelRasterContainers.dat and click Open.
  4. In the Output Model field, choose an output folder and name the model file TrainedModelContainers.h5, then click Save. This will be the "best" trained model, which is the model from the epoch with the lowest validation loss.
  5. In the Output Last Model field, choose an output folder and name the model file TrainedModelContainersLast.h5, then click Save. This will be the trained model from the last epoch.
  6. Select the Model tab.
  7. Keep the default Model Name.
  8. Select SegUNet from the Model Architecture drop-down list.
  9. Keep the Patch Size default of 464.
  10. Select the Training tab. The following parameters are advanced settings for users who want more control over the training process. For this tutorial, use the following values:

    • Augment Scale: No
    • Augment Rotation: Yes
    • Number of Epochs: 25
    • Patches per Batch: 3
    • Feature Patch Percentage: 1
    • Background Patch Ratio: 0.15
  11. Select the Advanced tab to set additional advanced parameters. Use the following settings:
    • Solid Distance: 4.0
    • Blur Distance: Min 0, Max 20
    • Class Weight: Min 1, Max 3
    • Loss Weight: Leave blank; it will default to 1.0
  12. Click OK.
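
The training run can be scripted as well. This sketch uses the TrainTensorFlowPixelModel ENVITask with parameter names inferred from the dialog labels above; treat the exact names and value formats (for example, whether Blur Distance and Class Weight are passed as [min, max] arrays) as assumptions to confirm in the API documentation.

  ; Get the current ENVI session and the label raster built earlier
  e = ENVI(/CURRENT)
  labelRaster = e.OpenRaster('LabelRasterContainers.dat')

  ; Train a SegUNet pixel segmentation model
  task = ENVITask('TrainTensorFlowPixelModel')
  task.TRAINING_RASTERS = [labelRaster]
  task.VALIDATION_RASTERS = [labelRaster]   ; same raster used for validation in this tutorial
  task.MODEL_ARCHITECTURE = 'SegUNet'
  task.PATCH_SIZE = 464
  task.AUGMENT_SCALE = !FALSE
  task.AUGMENT_ROTATION = !TRUE
  task.EPOCHS = 25
  task.PATCHES_PER_BATCH = 3
  task.FEATURE_PATCH_PERCENTAGE = 1
  task.BACKGROUND_PATCH_RATIO = 0.15
  task.SOLID_DISTANCE = [4.0]               ; one value per class
  task.BLUR_DISTANCE = [0, 20]              ; [min, max]; assumed format
  task.CLASS_WEIGHT = [1, 3]                ; [min, max]; assumed format
  task.OUTPUT_MODEL_URI = 'TrainedModelContainers.h5'
  task.OUTPUT_LAST_MODEL_URI = 'TrainedModelContainersLast.h5'
  task.Execute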

As training begins, a TensorBoard page displays in a new web browser. This is described next in View Training and Accuracy Metrics. Training a model takes a significant amount of time due to the computations involved. Depending on your system and graphics hardware, processing can take several minutes to a half hour. A Training Model dialog shows the progress of training, along with the updated training loss value. For example:

View Training and Accuracy Metrics

TensorBoard is a visualization toolkit included with TensorFlow. It reports real-time metrics such as Loss, Accuracy, Precision, and Recall during training. Refer to the TensorBoard online documentation for details on its use.

Follow these steps:

  1. Click the Settings (gear) icon in the TensorBoard toolbar. The Settings dialog appears.
  2. Enable the Reload Data option.
  3. Set the Reload Period to a minimum of 15 seconds.
  4. Dismiss the Settings dialog by clicking anywhere in the TensorBoard web page.
  5. Set the Smoothing value on the left side of TensorBoard to 0.999.
  6. After the first few epochs are complete, click Loss in TensorBoard to expand it and view a plot of Loss values for each epoch. Loss is a unitless number that indicates how closely the classifier fits the training and validation data. A value of 0 represents a perfect fit; the further the value is from 0, the less accurate the fit. Loss plots are provided per batch (batch loss) and per epoch, for both the training and validation datasets (training loss and validation loss). Ideally, the Loss values reported in the Loss plot should decrease rapidly during the first few epochs, then converge toward 0 as the number of epochs increases. The following image shows a sample Loss plot from a training session of 25 epochs. Your plot might be different.

  7. Wait for training to finish, then note the point on the validation loss plot at which the lowest Loss value occurred. In this example, it occurred during the last step of the last epoch. Move your cursor over this data point. A black popup window appears. The Value field shows the Loss value during this epoch. In this example, the lowest Loss value is 0.02723. Your plot might be different.
  8. Note which epoch produced the lowest Loss value. By default, ENVI will output model files from the "best" epoch and the last epoch so that you can use either one for classification later.
  9. Expand the accuracy section to view a plot of overall accuracies achieved during each epoch.

    Note: Consistently high Loss values accompanied by low accuracy values across multiple epochs typically indicate a bad training run. If this happens, cancel the training and restart it.

  10. You can also view plots of Precision and Recall by expanding the Precision and Recall sections, respectively.
  11. When you are finished evaluating training and accuracy metrics, close TensorBoard in your web browser.

Perform Classification


Now that you have a model that was trained to find shipping containers in one orthophoto, you will use the same model to find containers in another orthophoto from the seaport. This is one of the benefits of ENVI Deep Learning; you can train a model once and apply it multiple times to other images that are spatially and spectrally similar.

  1. Click the Open button in the ENVI toolbar. The Open dialog appears.
  2. Go to the folder containing the tutorial data and select OaklandPortOrthophoto2.dat. Click Open. The image appears in the display.
  3. Press the F12 key on your keyboard to zoom to the full extent of the image. You will perform classification on this image.
  4. In the Deep Learning Guide Map, click Pixel Segmentation > Classify Raster Using a Trained Model. The TensorFlow Pixel Classification dialog appears.
  5. The Input Raster field is populated with the file OaklandPortOrthophoto2.dat. The Input Model field is populated with the file TrainedModelContainers.h5. You do not need to change anything in these fields.
  6. Leave the Output Classification Raster field blank and disable the Display result option for this raster. You will not create a classification raster in this tutorial.
  7. In the Output Class Activation Raster field, choose an output folder and name the output file ClassActivationContainers.dat, then click Save.
  8. Enable the Display result option for the Output Class Activation Raster.
  9. Click OK. A Classifying Raster dialog shows the progress of classification.

When processing is complete, the class activation raster is added to the Layer Manager.
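
If you are scripting the workflow, the classification step looks like the sketch below. It assumes a TensorFlowPixelClassification ENVITask and an ENVITensorFlowModel function for opening the trained .h5 file; both names are assumptions to check against your version's API documentation.

  ; Get the current ENVI session and open the inputs
  e = ENVI(/CURRENT)
  raster2 = e.OpenRaster('OaklandPortOrthophoto2.dat')
  model = ENVITensorFlowModel('TrainedModelContainers.h5')   ; assumed way to open a trained model

  ; Generate a class activation raster for the second orthophoto
  task = ENVITask('TensorFlowPixelClassification')
  task.INPUT_RASTER = raster2
  task.INPUT_MODEL = model
  task.OUTPUT_CLASS_ACTIVATION_RASTER_URI = 'ClassActivationContainers.dat'
  task.Execute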

A class activation raster is a grayscale image whose pixel values roughly represent the probability of belonging to the feature of interest. In this case, bright pixels indicate strong matches to features labeled as shipping containers. The following figure shows an example displayed at full extent:

Your results may be slightly different from what is shown here. Training a deep learning model involves a number of stochastic processes, so you will never get exactly the same result from multiple training runs.

If you get a class activation raster that is completely black, the model may not have been able to accurately reproduce the training data, or the training may have converged to an incorrect solution. If this happens, rerun the training step to see if it produces a valid result. Also try increasing the Max values for Class Weight and/or Blur Distance.

Viewing the grayscale image by itself makes it difficult to identify shipping containers relative to the other objects in the scene. In the next few steps, you will visualize the results in a more meaningful way.

In the Layer Manager, right-click on the following layers and select Remove:

  • LabelRasterContainers.dat
  • OaklandPortOrthophoto1.dat

Apply a Raster Color Slice to the Class Activation Raster

To better visualize the class activation raster, you can apply a raster color slice to it. A color slice divides the pixel values of an image into discrete ranges with different colors for each range. Then you can view only the ranges of data you are interested in.

  1. In the Layer Manager, right-click on ClassActivationContainers.dat and select New Raster Color Slice. The Data Selection dialog appears.
  2. Select the Shipping Containers band under ClassActivationContainers.dat and click OK. The Edit Raster Color Slices dialog appears. The pixel values are divided into equal increments, each with a different color.
  3. Click OK in the Edit Raster Color Slices dialog to accept the default categories and colors.
  4. In the Layer Manager, uncheck the ClassActivationContainers.dat layer to hide it.
  5. In the Raster Color Slice layer in the Layer Manager, uncheck the purple color slice range with the lowest data values. This is the background. The remaining colors (blue to red) identify pixels with increasing probabilities of matching shipping containers, according to the training data you provided. The following image shows an example:

Getting decent results with deep learning-based feature extraction is not always a quick and simple process. It often involves several iterations to yield the most accurate results. One way you can refine a deep learning model is to create ROIs of the highest pixel values in the class activation raster, then use the ROIs to build a new label raster. Refine the trained model using the new label raster. This is purely optional, but it can help to improve classification results. The next section describes how to create ROIs of high class activation values.

Convert Class Activation Values to ROIs


  1. In the Deep Learning Guide Map, click Refine a Trained Model, followed by Class Activation to Pixel ROI. The Class Activation to Pixel ROI dialog appears.
  2. The Input Raster field already lists ClassActivationContainers.dat. You do not need to change anything here.
  3. The Threshold slider lets you choose the minimum data value in the class activation raster to retain for the output ROI. Remember that the pixel values range from 0 to 1.0, representing the probability of a match to shipping containers. Change the Threshold value to 0.60. Only pixels whose class activation values fall in the top 40% of that range (0.60 and higher) will be kept for the output ROI.
  4. Accept the default value of Otsu for Automatic Threshold Method. Automatic threshold methods apply only when the Threshold is set to 0.
  5. In the Output ROI field, accept the default output filename of ClassActivationToPixelROI.xml.
  6. Click OK.
  7. When processing is complete, uncheck the Raster Color Slice layer in the Layer Manager to hide that layer.
  8. From the ENVI main menu bar, select File > Open.
  9. Select ClassActivationToPixelROI.xml and click Open. The Select Base ROI Visualization Layer dialog appears.
  10. Select OaklandPortOrthophoto2.dat and click OK. The pixel ROIs are displayed over the orthophoto.
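
This conversion can also be scripted. The sketch below assumes a ClassActivationToPixelROI ENVITask whose parameter names mirror the dialog fields; confirm them in the API documentation.

  ; Get the current ENVI session and open the class activation raster
  e = ENVI(/CURRENT)
  caRaster = e.OpenRaster('ClassActivationContainers.dat')

  ; Keep pixels with activation values of 0.60 or higher as a pixel ROI
  task = ENVITask('ClassActivationToPixelROI')
  task.INPUT_RASTER = caRaster
  task.THRESHOLD = 0.60
  task.OUTPUT_ROI_URI = 'ClassActivationToPixelROI.xml'
  task.Execute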

These steps demonstrated how to apply a threshold to the class activation raster and how to convert the highest pixel values to a pixel-based ROI. If your ROIs do an effective job at identifying features you are interested in, you can use them to create a new label raster and refine the TensorFlow model. However, you do not have to perform those steps in this tutorial.

Final Comments


In this tutorial you learned how to use ENVI Deep Learning to extract a single feature type (shipping containers) from imagery using point ROIs to label the features. Point ROIs provide a quick and efficient way to identify all instances of a certain feature in your training images. Once you determine the best parameters to train a deep learning model, you only need to train the model once. Then you can use the trained model to extract the same feature in other, similar images. Deep learning technology provides a robust solution for learning complex spatial and spectral patterns in data, meaning that it can extract features from a complex background, regardless of their shape, color, size, and other attributes.

For more information about the capabilities presented here, refer to the ENVI Deep Learning Help.