
Using Simulated Data To Train Deep Learning Models

David Starbuck Tuesday, August 29, 2023

My colleague, Atle Borsholm, and I had the opportunity to attend the DIRSIG Basic Training in October 2022, hosted by the Rochester Institute of Technology (RIT). DIRSIG, a pioneering technology developed by RIT, generates synthetic remote sensing data. Our goal was to use DIRSIG to create training data for deep learning models that can classify real images. Building deep learning models from synthetically created objects can significantly reduce the cost of acquiring and labeling imagery for model training. It also opens opportunities to detect features for which no specific training data exists (for example, a particular object model number or type).

During the training, we also had the privilege to connect with Emily Travinsky of Lockheed Martin, who provided insights into her group's experiments on this subject. The paper for this experiment can be found using the following link: https://ieeexplore.ieee.org/document/9174596

First Reproduction

We attempted a test similar to Lockheed’s study, with the following differences:

DIRSIG’s Chipmaker plugin was used to generate the positive image patches (images that contain the target object)
DIRSIG’s simple atmosphere was used instead of MODTRAN when generating the patches
Real data patches were used for the negatives (images that do not contain the target object)
We tested the trained deep learning model on a real scene that was not used as training or validation data while the model was being built.

The image we used for the test was a WorldView-3 panchromatic image of an airfield. We created a model to identify the following classes: Tu-160, Tu-95, Misc. airplanes, and helicopters. We used CAD models of the Tu-160 and Tu-95 to generate image chips containing those aircraft. Using the B737, Badger, and helicopter models that come with DIRSIG, we generated image chips for those aircraft types as well. A separate WorldView image was used to generate image chips empty of any aircraft. We used these chips to train a multiclass ResNet-50 model and used this model to classify patches within the test image. The results are shown below:

class	ap	tp	fp	fn	recall	prec	F1
Tu-160	9	8	2	1	0.889	0.800	0.842
Tu-95	15	14	2	1	0.933	0.875	0.903
Misc.	23	10	0	13	0.435	1.000	0.606
helicopter	19	17	0	2	0.895	1.000	0.944
total	66	49	4	17	0.742	0.925	0.824

ap – total number of actual positives (tp + fn)

tp – true positives

fp – false positives

fn – false negatives

recall – ratio of true positives to total actual positives (tp/(tp+fn))

prec – ratio of true positives to total number of predicted positives (tp/(tp+fp))

F1 – harmonic mean of recall and precision = 2 / (1/recall + 1/precision)
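To make these definitions concrete, here is a minimal Python sketch that recomputes the recall, precision, and F1 columns from the tp, fp, and fn counts in the table. The function and variable names are ours for illustration and are not part of DIRSIG or any deep learning library; the total row is computed from the summed counts, which is how the table's total row appears to be derived.

```python
from typing import Dict, Tuple


def detection_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (recall, precision, F1) from tp, fp and fn counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of recall and precision.
    # It is defined as 0 here when either value is 0.
    f1 = 2 / (1 / recall + 1 / precision) if recall and precision else 0.0
    return recall, precision, f1


# (tp, fp, fn) per class, copied from the results table.
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "Tu-160": (8, 2, 1),
    "Tu-95": (14, 2, 1),
    "Misc.": (10, 0, 13),
    "helicopter": (17, 0, 2),
}


def summarize(counts: Dict[str, Tuple[int, int, int]]) -> Dict[str, Tuple[float, float, float]]:
    """Compute metrics per class plus a 'total' row from the summed counts."""
    rows = {name: detection_metrics(*c) for name, c in counts.items()}
    totals = tuple(sum(c[i] for c in counts.values()) for i in range(3))
    rows["total"] = detection_metrics(*totals)
    return rows


for name, (recall, precision, f1) in summarize(COUNTS).items():
    print(f"{name:<11}{recall:.3f}  {precision:.3f}  {f1:.3f}")
```

Running this reproduces the three metric columns of the table to three decimal places, which is a quick way to check a results table before publishing it.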

Even though we used synthetic data for training, the F1 score was above 0.84 for every class other than miscellaneous planes. The CAD models used to generate the miscellaneous planes most likely did not match the planes in the test image well enough.

Second Reproduction

To increase the visuals that can be shown in this blog post, we performed a simplified reproduction using NAIP data. Some of the differences between this reproduction and the first reproduction are:

Only the B737 CAD model that comes with DIRSIG was used to generate the positive chips
The classification only determined whether or not a plane was present in a patch
DIRSIG was used to generate empty chips
ENVI’s Deep Learning module was used to train the model and perform the classification

A screenshot of the B737 CAD model as well as a sample of the chips generated by DIRSIG from the model are shown below:

We used about 500 positive image chips and about 100 negative image chips to train a “Grid” model with the ENVI Deep Learning module. The “Grid” model is a new feature that will be included in the next version of ENVI Deep Learning. It will allow users to generate ResNet-50 or ResNet-101 models, and it produces an output grid of positive patches.
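To illustrate what an output grid of patches means, here is a minimal Python sketch that tiles an image into non-overlapping square patches and records a yes/no answer for each one. This is not ENVI's implementation or API: the patch size, the function names, and the brightness-based stand-in classifier are all assumptions made for illustration, and a trained model would take the place of the stand-in.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Image = Sequence[Sequence[float]]
Window = Tuple[int, int, int, int]  # (row_start, col_start, row_stop, col_stop)


def patch_windows(height: int, width: int, patch: int) -> Iterator[Tuple[int, int, Window]]:
    """Yield (grid_row, grid_col, window) for non-overlapping square patches."""
    for grid_row, r0 in enumerate(range(0, height - patch + 1, patch)):
        for grid_col, c0 in enumerate(range(0, width - patch + 1, patch)):
            yield grid_row, grid_col, (r0, c0, r0 + patch, c0 + patch)


def classify_grid(image: Image, patch: int,
                  is_positive: Callable[[List[Sequence[float]]], bool]) -> List[List[bool]]:
    """Run a per-patch classifier and return a boolean grid of results."""
    height, width = len(image), len(image[0])
    grid = [[False] * (width // patch) for _ in range(height // patch)]
    for grid_row, grid_col, (r0, c0, r1, c1) in patch_windows(height, width, patch):
        chip = [row[c0:c1] for row in image[r0:r1]]
        grid[grid_row][grid_col] = is_positive(chip)
    return grid


# Toy demo: a dark 13x21-patch image with one bright patch.
PATCH = 4  # hypothetical patch size in pixels, chosen only for the demo
ROWS, COLS = 13, 21
image = [[0] * (COLS * PATCH) for _ in range(ROWS * PATCH)]
for r in range(8, 12):
    for c in range(12, 16):
        image[r][c] = 255


def bright_patch(chip: List[Sequence[float]]) -> bool:
    """Stand-in for a trained model: flag patches with high mean brightness."""
    pixels = [value for row in chip for value in row]
    return sum(pixels) / len(pixels) > 128


grid = classify_grid(image, PATCH, bright_patch)
print(len(grid), "x", len(grid[0]), "patches,", sum(map(sum, grid)), "positive")
```

Each cell of the grid corresponds to one patch of the scene, so the output can be overlaid on the image to show where positives were found.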

We used a NAIP image of Denver International Airport (DIA) to test the generated model. A sample of the image is shown below:

We ran the grid classification on the DIA scene and the output grid is shown below:

We took a closer look at a section of the result covering 13x21 patches (273 in total).

In the image below, true positives are highlighted in green, true negatives are highlighted in blue, false positives are highlighted in orange, and false negatives are highlighted in red.

In this section there are 59 true positives, 9 false positives, 1 false negative, and 204 true negatives. The accuracy, (59 + 204) / 273, is about 96.3%. This is a very high level of accuracy for a model built with synthetic data and applied to a real image.
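The accuracy figure follows directly from the four counts above. Here is a short Python sketch of that arithmetic; the function name is ours for illustration. It also derives precision and recall from the same counts, which we did not report above but which are useful alongside accuracy because most of the patches in this section are empty.

```python
from typing import Dict


def confusion_summary(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Summarize a binary confusion matrix given as four counts."""
    total = tp + fp + fn + tn
    return {
        "patches": total,
        "accuracy": (tp + tn) / total,   # share of patches labeled correctly
        "precision": tp / (tp + fp),     # share of predicted planes that are real
        "recall": tp / (tp + fn),        # share of real planes that were found
    }


# Counts from the 13x21 section of the Denver International Airport result.
summary = confusion_summary(tp=59, fp=9, fn=1, tn=204)
print(f"patches:   {summary['patches']}")        # 273
print(f"accuracy:  {summary['accuracy']:.1%}")   # 96.3%
print(f"precision: {summary['precision']:.1%}")  # 86.8%
print(f"recall:    {summary['recall']:.1%}")     # 98.3%
```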

Using synthetic data to build models has a great deal of value in detecting targets when there simply isn’t sufficient representative training data. Whether you're a researcher, student, or technology enthusiast, the exploration into the world of synthetic remote sensing data promises to be an exciting journey.

Click here to learn more about DIRSIG, and get more detailed information on ENVI Deep Learning here.