User-Defined ENVITask That Masks Some Output Pixels
Anonym
My last Data Point blog post User-Defined ENVITasks in ENVI 5.2 SP1 introduced how you can use
the new ENVITaskFromProcedure class to create your own custom ENVITasks that
wrap any IDL procedure you have. Today I’m going to explore a concrete example
that combines a few more advanced topics:
- Raster Statistics
- Threshold ROI Masking
- Saving the results with correct pixels masked out
This example is motivated by a recent Tech Support question I was asked to assist on, but I am fleshing it out more to be a little more useful to you the reader. The question was about how to use ENVIROIMaskRaster() to mask pixels out of a raster and then export the results so that the masked pixels would persist on disk. I could just focus on this part, but I thought it would be useful to also show how to build the ROI using the ENVI API, instead of just loading one from a file.
The intent of this task is to take an input raster, evaluate its histogram, and then mask out any pixels in the top or bottom N% of the histogram. This is similar to what our linear percent stretch modes do, but in that case pixels in the top and bottom of the are clamped to a 0 or 255 value in their respective band, with the values in between linearly scaled in that range. This task will make those extreme pixels invalid, so they are transparent when visualized and ignored by our analytics.
You might be worried that we need to iterate over all the pixels in the raster to build the histograms, but fear not, the ENVIRasterStatistics() function was introduced in ENVI 5.2, and it provides an optimized implementation of the histogram calculations. So we can utilize this function with the HISTOGRAMS and HISTOGRAM_BINSIZE keywords to get the histogram for each band. By default this function will return a 256 bin histogram, but for this task we will set the bin size to 1 so that we get (max – min + 1) bins for each band.
Once we have the histogram, we pass it into Total() with the /CUMULATIVE keyword. This will result in an array of the same size as the histogram, where each element is the sum of all the input elements with an index less than or equal to the output index. The last element of each output array will be equal to the number of pixels in the raster. So if we divide by the last element (or number of pixels), then we will get the cumulative distribution function (CDF) of the pixels for each band.
We can then use a WHERE test to find out how many bins have a CDF value less than N%, or greater than (1 – N%). This will tell us what the range of pixel values will be if we want to eliminate the top and bottom N% of the pixels. We use these values to create a threshold ROI, using the ENVIROI::AddThreshold procedure.
We have two options for how to use the ROIs to mask the raster. We can create 1 ROI and add separate thresholds to it for each band, or we can create multiple ROIs that each have one band threshold. The results will be the same when we use the ENVIROIMaskRaster() virtual raster function, only pixels that are outside of every threshold are masked out. This is a logical AND operation, but I want a logical OR operation, where every pixel that is outside of any threshold will be masked out. To do this I need to create separate ROIs for each band threshold and apply them consecutively using the ENVIROIMaskRaster() function. The nice benefit of the virtual raster functions is that they don’t commit anything to disk, they build up a filter chain of raster operations that are executed on the pixels on a per tile basis, so there is very little performance overhead when doing this.
Once we have the masked raster, we want to save it to disk so that the masked pixels will stay masked. We don’t want to waste disk space by writing out are array of Boolean values, so the “data ignore value” is used to mark the masked pixels as invalid. Unfortunately there isn’t a one line solution to this in ENVI 5.2 SP1, so we have to “burn in” the data ignore value for the masked pixels. In ENVI 5.3 we have added this feature with a DATA_IGNORE_VALUE keyword on ENVIRaster::Export, but in the meantime we need to create a new ENVIRaster object and explicitly change the masked pixel values so that we can persist the masking. So we use the ENVIRaster() factory function, and use the INHERITS_FROM keyword to copy over all the metadata and the spatial reference object from our input raster. We will also supply the URI that this output raster will be written to, and specify the DATA_IGNORE_VALUE we want to use, so that the ENVI header file will contain that metadata field and properly mask out the pixels for us.
The ENVIRasterIterator class makes the pixel processing a lot simpler. We first need to get a raster iterator, which is via the ENVIRaster::CreateTileIterator() function. We’ll omit the keywords, as we want to process the whole raster and will let it decide the optimal tile size to use. There are a few ways to use the iterator, and while I’m usually partial to foreach loops over for loops, we can’t use the foreach syntax this time because we need to get the PIXEL_STATE output from the ENVIRasterIterator::Next() fuction. Once we have the PIXEL_STATE array, we use WHERE to find all the non-zero values (pixels that either already have the data ignore value or are masked out), and then we replace all the corresponding pixels with the data ignore value. We then use the ENVIRaster::SetTile procedure to copy the pixels to the output raster, which takes the current raster iterator object so that it knows the line/sample/band extents of the pixel array and commits it to disk correctly. Once we’ve finished iterating over the raster and copying the masked pixels to the output raster, all we need to do is call ENVIRaster::Save on the output raster to finish committing the pixels to disk and to generate the header file for us.
Here is the final version of my procedure, which uses keywords for the input raster, the threshold percentage, the data ignore value, and the output filename to use:
pro maskExtremePixels, INPUT_RASTER=inputRaster, THRESHOLD_PERCENT=percent, $
DATA_IGNORE_VALUE=dataIgnoreValue, OUTPUT_FILENAME=outFilename
compile_opt idl2
; calculate raster statistics and histograms
stats = ENVIRasterStatistics(inputRaster, /HISTOGRAMS, HISTOGRAM_BINSIZE=1)
; create local variable to represent current virtual raster, starting with input raster
currRaster = inputRaster
; iterate over list of histogram hashes
foreach hist, stats['HISTOGRAMS'], band do begin
; calculate cumulative distribution function from histogram
cdf = Total(hist['COUNTS'], /Cumulative)
cdf /= cdf[-1]
; find CDF bins that are above/below threshold
lowInd = Where(cdf lt percent, lowCount)
highInd = Where(cdf gt (1.0 - percent), highCount)
; calculate digital number that corresponds to threshold bins
minVal = stats['MIN', band] + lowCount
maxVal = stats['MAX', band] - highCount
; create ROI to mask the input raster
roi = ENVIRoi(NAME='Band Thresholds', COLOR='Blue')
; add threshold definition for current band to ROI
roi.AddThreshold, inputRaster, band, MIN_VALUE=minVal, MAX_VALUE=maxVal
; apply ROI to raster to mask out extreme pixels, and update current reference
currRaster = ENVIRoiMaskRaster(currRaster, roi)
endforeach
; create raster to commit masked pixels to disk
outRaster = ENVIRaster(INHERITS_FROM=inputRaster, $
DATA_IGNORE_VALUE=dataIgnoreValue, $
URI=outFilename)
; create a raster iterator to access data
iterator = currRaster.CreateTileIterator()
for i = 1, iterator.nTiles do begin
; get the current tile's pixel and pixel value state arrays
tile = iterator.Next(PIXEL_STATE=pixelState)
; identify any pixels which should be masked out
maskInd = where(pixelState ne 0, maskCount)
; apply pixel mask if any pixels in this tile are masked
if (maskCount gt 0) then begin
tile[maskInd] = dataIgnoreValue
endif
; push masked pixels into output raster
outRaster.SetTile, tile, iterator
endfor
; save output raster and header to disk
outRaster.Save
end
Now we want to turn this into an ENVITask, so that we can run this on the desktop and in the cloud using ENVI Services Engine. All we need to do is build a task template file that describes the input parameters to this procedure:
{
"name": "MaskExtremePixels",
"baseClass": "ENVITaskFromProcedure",
"routine": "MaskExtremePixels",
"displayName": "Mask Extreme Pixels",
"description": "This task processes the histogram of each band on the input ENVIRaster and builds pixel masks that exclude all pixels that are in the top or bottom percent of the histogram.",
"version": "5.2.1",
"parameters": [
{
"name": "INPUT_RASTER",
"displayName": "Input Raster",
"dataType": "ENVIRASTER",
"direction": "input",
"parameterType": "required",
"description": "Specify the raster to mask."
},
{
"name": "THRESHOLD_PERCENT",
"displayName": "Threshold Percent",
"dataType": "FLOAT",
"direction": "input",
"parameterType": "required",
"description": "Percent of histogram to mask out from top and bottom."
},
{
"name": "DATA_IGNORE_VALUE",
"displayName": "Data Ignore Value",
"dataType": "DOUBLE",
"direction": "input",
"parameterType": "required",
"description": "Value to use for masked out pixels in OUTPUT_RASTER."
},
{
"name": "OUTPUT_RASTER",
"keyword": "OUTPUT_FILENAME",
"displayName": "Output Raster",
"dataType": "ENVIRASTER",
"direction": "output",
"parameterType": "required",
"description": "Masked version of INPUT_RASTER."
}
]
}
For the most part, this task template is pretty straightforward. We provide a name and description, then since we are wrapping a procedure we use the ENVITaskFromProcedure class for baseClass and the procedure name for routine. Then we have the list of parameters, with their name, displayName, dataType, direction, parameterType, and description attributes. The only anomaly to this is the OUTPUT_RASTER parameter. If we look at the procedure signature, you’ll see the keyword OUTPUT_FILENAME, which is an input string telling the procedure where to write the raster to disk. When we wrap it as a task, we want to make life easier for task consumers, so we make the output parameter of type ENVIRaster, and set its keyword to OUTPUT_FILENAME. When this task template is loaded, the task framework will automatically add an input parameter of type ENVIURI that wraps this same keyword, appending the “_URI” suffix to the original parameter’s name. So the user could explicitly set the OUTPUT_RASTER_URI parameter value to control where the masked raster is written to, or they can leave it unset in which case the task framework will generate a temporary filename for that keyword.
Once you copy this task template to a file with the extension .task and the procedure PRO file to the custom code folder (C:\Program Files\Exelis\ENVI52\custom_code on Windows), or one of the other documented locations you can then use it. Here is an example showing how:
e = envi()
file = FilePath('qb_boulder_msi', ROOT_DIR=e.ROOT_DIR, SUBDIR='data')
oRaster = e.OpenRaster(file)
oTask = enviTask('maskExtremePixels')
oTask.INPUT_RASTER = oRaster
oTask.THRESHOLD_PERCENT = 0.02
oTask.DATA_IGNORE_VALUE = -1
oTask.Execute
oView = e.getView()
oLayer = oView.CreateLayer(oTask.OUTPUT_RASTER)