IDL's HISTOGRAM function

Anonym

IDL's HISTOGRAM function is one of the most versatile functions I can think of. It can be very fast and efficient for a number of common tasks.

1. Plotting a histogram is an effective way to investigate the statistical properties of data. The plot quickly shows the shape of the distribution:

IDL> a = randomn(seed, 100000)+15.5+3*randomn(seed, 100000)

IDL> h = histogram(a, nbins=1000, locations=location)

IDL> p = plot(location, h, 'r')

[Figure: histogram plot]

From the histogram, you could quickly conclude that a suitable range for BYTSCL might be MIN=10.0, MAX=20.0.
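Applying that range might look like the following minimal sketch. It regenerates the same kind of data so it stands alone; `b` is an illustrative variable name. BYTSCL maps values at or below MIN to 0 and values above MAX to TOP (255 by default), scaling linearly in between, so the printed fractions show how much of the data saturates at each end.

```idl
; Regenerate the example data (same recipe as above)
a = randomn(seed, 100000) + 15.5 + 3*randomn(seed, 100000)
; Scale 10.0..20.0 onto 0..255; values outside that range are clipped
b = bytscl(a, min=10.0, max=20.0)
; Fraction of values below MIN and above MAX (these saturate at 0 and 255)
print, total(a lt 10.0) / a.length, total(a gt 20.0) / a.length
```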

2. Finding percentiles programmatically can be done with lookups in the cumulative histogram. For example, to find the 5th and 95th percentiles of a dataset:

IDL> a = randomn(seed, 100000)+15.5+3*randomn(seed, 100000)

IDL> h = histogram(a, nbins=1000, locations=location)

IDL> cdf = total(h, /cumulative) / a.length   ; cdf[i] = fraction of values in bins 0 through i

IDL> location[value_locate(cdf, [0.05, 0.95])]

      10.285119       20.638901

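This shows that about 90% of the values lie between 10.29 and 20.64. Each result is the start of a bin, and VALUE_LOCATE picks the last bin whose cumulative fraction does not exceed the target, so the estimates sit slightly below the true percentiles (by between one and two bin widths).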
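Because the histogram lookup returns bin starts, its precision is tied to the bin width. A slower cross-check that does not depend on binning is to sort the data and index it directly. The sketch below assumes a simple convention, taking the sorted value at subscript round(p*(N-1)); other percentile definitions (interpolating between neighbors, for instance) give slightly different numbers. `p_hist`, `p_sort` and `w` are illustrative names.

```idl
; Regenerate the example data (same recipe as above)
a = randomn(seed, 100000) + 15.5 + 3*randomn(seed, 100000)
; Histogram-based estimate: bin start found by the cumulative lookup
h = histogram(a, nbins=1000, locations=location)
cdf = total(h, /cumulative) / a.length
p_hist = location[value_locate(cdf, [0.05, 0.95])]
; Sort-based estimate: the sorted value at subscript round(p*(N-1))
s = a[sort(a)]
p_sort = s[round([0.05, 0.95] * (a.length - 1))]
; Bin width, for judging how closely the two estimates should agree
w = location[1] - location[0]
print, p_hist, p_sort, w
```

The two estimates should differ by no more than a couple of bin widths; the sort costs O(N log N) and a full sorted copy of the data, which is what the histogram approach avoids.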

3. Finding the most common value (the mode) in an array of integers:

IDL> arr = [3,7,34,5,8,8,5,31,5,8]

IDL> h = histogram(arr, locations=location)

IDL> location[where(h eq max(h))]

       5       8

This shows a tie between 5 and 8 for the most abundant value in the array.
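When a single answer is enough, a minimal sketch can use the second argument of MAX, which receives the subscript of the first bin holding the largest count. Ties therefore resolve toward the smallest value. `m` and `imax` are illustrative names.

```idl
arr = [3,7,34,5,8,8,5,31,5,8]
; With the default BINSIZE of 1 there is one bin per integer from 3 to 34
h = histogram(arr, locations=location)
; m = largest count; imax = subscript of the FIRST bin with that count
m = max(h, imax)
print, location[imax], m
```

Since HISTOGRAM allocates one bin per integer between the minimum and maximum, this approach suits integer data with a modest value range; a few very large outliers can make the histogram array much larger than the data itself.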

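4. Sorting data into bins can also be performed with HISTOGRAM, by way of its REVERSE_INDICES output. For example, here is 2-D sorting of scattered points into a grid of 1 x 1 tiles, followed by computing the mean "F" value for each grid tile: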

IDL> x = 45*randomu(seed, 100000)

IDL> y = 32*randomu(seed, 100000)

IDL> f = 5.5*randomn(seed, 100000) + 16

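IDL> grid_index = floor(x + floor(y)*ceil(max(x)))   ; 1-D tile index, column-major: floor(x) + 45*floor(y) here

IDL> h = histogram(grid_index, min=0, binsize=1, reverse_indices=rev)   ; h[i] = number of points in tile i

IDL> f_means = dblarr(ceil([max(x),max(y)]))   ; 45 x 32 array here, one mean per tile

IDL> for i=0,h.length-1 do if h[i] gt 0 then f_means[i] = mean(f[rev[rev[i]:rev[i+1]-1]])   ; rev[rev[i]:rev[i+1]-1] = subscripts of the points in tile i; empty tiles stay 0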

Check one of the values using the slower "WHERE" approach:

IDL> f_means[6,8]

      15.784905433654785

IDL> mean(f[where(floor(x) eq 6 and floor(y) eq 8)])

       15.784905
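The loop above calls MEAN once per tile. A loop-free variant, offered here as a sketch rather than the method from this article, takes the bin-ordered tail of REVERSE_INDICES (the point subscripts grouped tile by tile), forms a running sum with TOTAL(/CUMULATIVE), and differences that sum at each tile's boundaries. It assumes the same synthetic data and hardcodes the 45 x 32 grid; `nx`, `ny`, `fs`, `cs`, `lo`, `hi` and `f_means_fast` are illustrative names.

```idl
; Same synthetic data as above
x = 45*randomu(seed, 100000)
y = 32*randomu(seed, 100000)
f = 5.5*randomn(seed, 100000) + 16
; Grid size fixed at 45 x 32 tiles (an assumption matching the data ranges)
nx = 45
ny = 32
grid_index = floor(x) + floor(y)*nx
; MIN= and MAX= pin the histogram at exactly nx*ny bins
h = histogram(grid_index, min=0, max=nx*ny-1, binsize=1, reverse_indices=rev)
n = h.length
; rev[n+1:*] holds the point subscripts grouped tile by tile
fs = f[rev[n+1:*]]
; Running sum with a leading zero: cs[k] is the sum of the first k grouped values
cs = [0d, total(fs, /cumulative, /double)]
; Start and one-past-end position of each tile's group within fs
lo = rev[0:n-1] - (n+1)
hi = rev[1:n] - (n+1)
; Sum per tile divided by count per tile; (h > 1) avoids 0/0 for empty tiles
f_means_fast = reform((cs[hi] - cs[lo]) / (h > 1), nx, ny)
print, f_means_fast[6,8], mean(f[where(floor(x) eq 6 and floor(y) eq 8)])
```

Because the bin count is fixed at nx*ny, REFORM always receives exactly 1440 values. The trade-off against the loop is a second copy of F (`fs`) held in memory.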