X
11831 Rate this article:
No rating

Finding the Nth ordered element in a large array

Anonym

A common task when working with large arrays is to find the Nth array value in the ordered array. This can be useful for finding the Nth smallest or largest pixel value, as well as for statistical analysis of floating point data samples, (i.e. find the 95% percentile or similar). The shortest IDL code for finding the Nth value in the ordered sequence is only 2 lines of actual code. Here is a short function that accomplishes this:

 

;+

; Returns the Nth number in theordered sequence

;-

function ordinal_1, array, N

 compile_opt idl2,logical_predicate

 s = sort(array)

 return, array[s[N]]

end

 

However, because sort is an expensive computation, it runs fairly slow, especially, when the array gets larger. I did the following time test.

IDL> a = total(read_image(filepath('ohare.jpg',subdir=['examples','data'])),1)

IDL> tic & x = ordinal_1(a, 123456) & toc & print, x

% Time elapsed: 3.7200000 seconds.

      150.000

In my case it took 3.72 seconds to find the 123456th smallest array element. The MEDIAN function in IDL, returns the central element in the ordered sequence without doing a full sorting. It is much faster than sorting, because it doesn't have to keep track of all elements and their ordered positions. It only cares about element N/2 in the ordered array. In the following example, repeated calls to MEDIAN and reducing the array size in half every iteration, is used to find the Nth element in the ordered sequence. The code is much longer than the code above, but it does end up running faster:

;+

; Returns the Nth number in theordered sequence.

;

; Uses repeated median.

;-

function ordinal_2, array, N

 compile_opt idl2,logical_predicate

 na = n_elements(array)

 type =size(array, /type)

 target_index = N

 tmp = arg_present(array) ? array : temporary(array)

 ntmp = na

 while ntmp ne target_index do begin

   ntmp = n_elements(tmp)

   val = fix(median(tmp), type=type)

   if target_index gt ntmp/2 then begin

     tmp = tmp[where(tmp gt val, count)]

     target_index -= ntmp-count

   endif else if target_index lt ntmp+1/2 then begin

     tmp = tmp[where(tmp lt val, count)]

   endif else break

   if target_index lt 0 then break

   if target_index ge count then break

   if target_index eq 0 then begin

     val = min(tmp)

     break

   endif

   if target_index eq count-1 then begin

     val = max(tmp)

     break

   endif

 endwhile

 return, val

end

This is the same time test as with the short code:

IDL> tic & x = ordinal_2(a, 123456) & toc & print, x

% Time elapsed: 0.57999992 seconds.

      150.000

 

As can be seen here, the time saving is significant, it goes from 3.72 to 0.58 seconds, and as the array grows larger, the savings can get more significant. This function works for numeric data types such as floating point and integer arrays.