HISTOGRAM

The HISTOGRAM function computes the density function of Array. In the simplest case, the density function, at subscript i, is the number of Array elements in the argument with a value of i.

Let F_i= the value of element i, 0 ≤ i < n. Let H_v = result of histogram function, an integer vector. The definition of the histogram function becomes:

Note: There may not always be enough virtual memory available to find the density functions of arrays that contain a large number of bins.

Note: The data type of the values specified for BINSIZE, MAX, and MIN should match the data type of the Array argument. Since these keyword values are converted to the data type of Array, specifying mismatched data types may produce undesired results. For example, if you have a byte image as your input array, but you want to use a fractional binsize, you should convert your input array to floating-point before calling HISTOGRAM.

Tip: For bivariate probability distributions, use the HIST_2D function.

Tip: HISTOGRAM can optionally return an array containing a list of the original array subscripts that contributed to each histogram bin. This list, commonly called the reverse (or backwards) index list, efficiently determines which array elements are accumulated in a set of histogram bins. A typical application of the reverse index list is reverse histogram or scatter plot interrogation—a histogram bin or 2D scatter plot location is marked with the cursor and the original data items within that bin are highlighted.

Example

This example computes image histograms and plots the frequencies of their pixel values. Copy and paste the code into the IDL Workbench to run it.

; Read a flat binary file with an image.

file = FILE_WHICH('ctscan.dat')

scan = READ_BINARY(file, DATA_DIMS=[256,256])

; Compute the image histogram, using the default bin size of 1.

pdf = HISTOGRAM(scan, LOCATIONS=xbin)

cdf = TOTAL(pdf, /CUMULATIVE) / N_ELEMENTS(scan)

; Display the image and its histogram, setting MAX_VALUE to exclude the spike

; at 0 caused by the black pixels.

pimage = IMAGE(scan, /ORDER, LAYOUT=[3,1,1], DIMENSIONS=[4*256,256], $

   TITLE='Image')

phisto = PLOT(xbin, pdf, LAYOUT=[3,1,2], /CURRENT, XRANGE=[0,255], $

   TITLE='Histogram', XTITLE='Pixel value', YTITLE='Frequency', $

   MAX_VALUE=5e3, AXIS_STYLE=1, COLOR='red')

pcumul = PLOT(xbin, cdf, LAYOUT=[3,1,3], /CURRENT, XRANGE=[0,255], $

   TITLE='Cumulative histogram', XTITLE='Pixel value', $

   YTITLE='Cumulative frequency', $

   AXIS_STYLE=1, COLOR='blue')

The Additional Examples section contains more information on using HISTOGRAM.

Syntax

Result = HISTOGRAM( Array [, BINSIZE=value] [, INPUT=variable] [, LOCATIONS=variable] [, MAX=value] [, MIN=value] [, /NAN] [, NBINS=value] [, OMAX=variable] [, OMIN=variable] [, /L64 | REVERSE_INDICES=variable] )

Return Value

Returns a 32-bit or a 64-bit integer vector equal to the density function of the input Array.

Arguments

Array

The vector or array for which the density function is to be computed.

Keywords

BINSIZE

Set this keyword to the size of the bin to use. If this keyword is not specified, and NBINS is not set, then a bin size of 1 is used. If NBINS is set, the default is BINSIZE = (MAX – MIN) / (NBINS – 1).

Note: The data type of the value specified for BINSIZE should match the data type of the Array argument. Since BINSIZE is converted to the data type of Array, specifying mismatched data types may produce undesired results.

INPUT

Set this keyword to a named variable that contains an array to be added to the output of HISTOGRAM. The density function of Array is added to the existing contents of INPUT and returned as the result. The array is converted to longword type if necessary and must have at least as many elements as are required to form the histogram. Multiple histograms can be efficiently accumulated by specifying partial sums via this keyword.

L64

By default, the return value of HISTOGRAM is 32-bit integer when possible, and 64-bit integer if the number of elements being processed requires it. Set L64 to force 64-bit integers to be returned in all cases. L64 controls the type of Result as well as the output from the REVERSE_INDICES keyword.

Note: Only 64-bit versions of IDL are capable of creating variables requiring a 64-bit result. Check the value of !VERSION.MEMORY_BITS to see if your IDL is 64-bit or not.

LOCATIONS

Set this keyword to a named variable in which to return the starting locations for each bin. The starting locations are given by MIN + v*BINSIZE, with v = 0,1,...,NBINS-1. LOCATIONS has the same number of elements as the Result, and has the same type as the input Array.

MAX

Set this keyword to the maximum value to consider. If this keyword is not specified, Array is searched for its largest value. If this keyword is not specified, and Array is of type byte, 255 is used.

Note: The data type of the value specified for MAX should match the data type of the input array. Since MAX is converted to the data type of the input array, specifying mismatched data types may produce undesired results.

Note: If NBINS is specified, the value for MAX will be adjusted to NBINS*BINSIZE + MIN. This ensures that the last bin has the same width as the other bins. Note that this may cause additional data values (beyond the originally-specified MAX) to be included in the result.

MIN

Set this keyword to the minimum value to consider. If this keyword is not specified, and Array is of type byte, 0 is used. If this keyword is not specified and Array is not of byte type, Array is searched for its smallest value.

Note: The data type of the value specified for MIN should match the data type of the input array. Since MIN is converted to the data type of the input array, specifying mismatched data types may produce undesired results.

NAN

Set this keyword to cause the routine to check for occurrences of the IEEE floating-point values NaN or Infinity in the input data. Elements with the value NaN or Infinity are treated as missing data.

NBINS

Set this keyword to the number of bins to use. If BINSIZE is specified, the number of bins in Result is NBINS, starting at MIN and ending at MIN+(NBINS–1)*BINSIZE. If MAX is specified, the bins will be evenly spaced between MIN and MAX. It is an error to specify NBINS with both BINSIZE and MAX.

If you define NBINS without specifying MIN, MAX, and BINSIZE, the values are distributed among the (NBINS-1) with the last data value being placed in the last bin.

OMAX

Set this keyword to a named variable that will contain the starting location of the last bin in the histogram.

OMIN

Set this keyword to a named variable that will contain the starting location of the first bin in the histogram.

REVERSE_INDICES

Set this keyword to a named variable in which the list of reverse indices is returned. When possible, this list is returned as a 32-bit integer vector whose number of elements is the sum of the number of elements in the histogram, N, and the number of array elements included in the histogram, plus one. If the number of elements is too large to be contained in a 32-bit integer, or if the L64 keyword is set, REVERSE_INDICES is returned as a 64-bit integer.

The subscripts of the original array elements falling in the i^thbin, 0 ≤ i < N, are given by the expression: R(R[i] : R[i+1]-1), where R is the reverse index list. If R[i] is equal to R[i+1], no elements are present in the i^th bin.

For example, make the histogram of array A:

H = HISTOGRAM(A, REVERSE_INDICES = R)

;Set all elements of A that are in the ith bin of H to 0.

IF R[i] NE R[i+1] THEN A[R[R[I] : R[i+1]-1]] = 0

The above is usually more efficient than the following:

bini = WHERE(A EQ i, count)

IF count NE 0 THEN A[bini] = 0

Additional Examples

; Read the ENSO (El Nino Southern Oscillation) time series

;  and set up a time vector.

enso = READ_BINARY(FILE_WHICH('elnino.dat'), DATA_TYPE=4, ENDIAN='little')

delta = 0.25 ; years

time = FINDGEN(N_ELEMENTS(enso))*delta + 1871

; Calculate histogram of series using bins of given width.

binsize = 0.1 ; in dimensionless units of ENSO index.

h_enso = HISTOGRAM(enso, BINSIZE=binsize, LOCATIONS=binvals)

; Display times series and histogram.

winsize = 500

w = WINDOW(DIMENSIONS=[2*winsize, winsize])

series = PLOT(time, enso, $

  /CURRENT, $

  POSITION=[0.10, 0.10, 0.65, 0.90], $

  XSTYLE=3, $

  XTITLE='Time (years)', $

  YTITLE='ENSO Index', $

  TITLE='El Ni!Sn!R!U~!No - Southern Oscillation (ENSO) Index (1871-1996)')

; Add a dotted line to indicate the zero value.

zero1 = PLOT(series.xrange, [0,0], LINESTYLE='dotted', /OVERPLOT)

histoplot = PLOT(h_enso, binvals, $

  /CURRENT, $

  POSITION=[0.70, 0.10, 0.95, 0.90], $

  /STAIRSTEP, $

  XTITLE='Frequency', $

  TITLE='Histogram of ENSO Index Values')

; Add a dotted line to indicate the zero value.

zero2 = PLOT(histoplot.xrange, [0,0], LINESTYLE='dotted', /OVERPLOT)

Examples using NBINS:

data = FINDGEN(100)

myHist = HISTOGRAM(data, NBINS=2)

myHist2 = HISTOGRAM(data, NBINS=10)

PRINT, myHist

PRINT, myHist2

IDL displays results similar to (notice that the last bin contains just a single element):

   99            1

   11           11           11           11           11           11           11           11           11            1

Examples using NBINS with BINSIZE:

data = FINDGEN(100)

myHist = HISTOGRAM(data, BINSIZE=50, NBINS=2)

myHist2 = HISTOGRAM(data, BINSIZE=10, NBINS=10)

PRINT, myHist

PRINT, myHist2

IDL displays results similar to:

   50                    50

   10           10           10           10           10           10           10  10           10           10

Other Examples

; Create a simple, 2D dataset:
D = DIST(200)
; Plot the histogram of D with a bin size of 1 and the default
; minimum and maximum:
PLOT, HISTOGRAM(D)
; Plot a histogram considering only those values from 10 to 50 
; using a bin size of 4:
PLOT, HISTOGRAM(D, MIN = 10, MAX = 50, BINSIZE = 4)

The HISTOGRAM function can also be used to increment the elements of one vector whose subscripts are contained in another vector. To increment those elements of vector A indicated by vector B, use the command:

A = HISTOGRAM(B, INPUT=A, MIN=0, MAX=N_ELEMENTS(A)-1)

This method works for duplicate subscripts, whereas the following statement never adds more than 1 to any element, even if that element is duplicated in vector B:

A[B] = A[B]+1

For example, for the following commands:

A = LONARR(5)
B = [2,2,3]
PRINT, HISTOGRAM(B, INPUT=A, MIN=0, MAX=4)

IDL prints:

   0  0  2  1  0

The commands:

A = LONARR(5)
A[B] = A[B]+1
PRINT, A

give the result:

   0  0  1  1  0

The following example demonstrates how to use HISTOGRAM:

PRO t_histogram
data = [[-5,  4,   2,  -8,  1], $
        [ 3,  0,   5,  -5,  1], $
        [ 6, -7,   4,  -4, -8], $
        [-1, -5, -14,   2,  1]]
hist = HISTOGRAM(data)
bins = FINDGEN(N_ELEMENTS(hist)) + MIN(data)
PRINT, MIN(hist)
PRINT, bins
PLOT, bins, hist, YRANGE = [MIN(hist)-1, MAX(hist)+1], PSYM = 10, $
   XTITLE = 'Bin Number', YTITLE = 'Density per Bin'
END