# Finding the Nth ordered element in a large array

Anonym

A common task when working with large arrays is finding the Nth value of the array in sorted order. This is useful for finding the Nth smallest or largest pixel value, and for statistical analysis of floating-point data samples (e.g., finding the 95th percentile). The shortest IDL code for finding the Nth value in the ordered sequence takes only two lines of actual code. Here is a short function that accomplishes this:

    ;+
    ; Returns the Nth number in the ordered sequence
    ; (N is a zero-based index into the sorted array).
    ;-
    function ordinal_1, array, N
      compile_opt idl2,logical_predicate
      s = sort(array)          ; indices that put array in ascending order
      return, array[s[N]]
    end
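The percentile use case mentioned above can be sketched as follows. This is a minimal illustration, not part of the original article: the variable names `data`, `k`, and `p95` and the random test data are hypothetical, and the truncating rule used to turn a percentile into an index is only one of several common conventions.

```idl
; Hypothetical test data: 100000 uniform random values in [0, 1)
data = randomu(seed, 100000)

; Zero-based index of the 95th percentile in the ordered sequence.
; Assumption: truncate 0.95*(n-1) toward zero; other conventions exist.
n = n_elements(data)
k = long(0.95 * (n - 1))

; Look up the value at that position in sorted order
p95 = ordinal_1(data, k)
print, p95
```

For uniform data in [0, 1) the printed value should lie close to 0.95, which gives a quick plausibility check of the index arithmetic.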

However, because SORT is an expensive computation, ordinal_1 runs fairly slowly, especially as the array gets larger. I ran the following timing test:

    IDL> tic & x = ordinal_1(a, 123456) & toc & print, x
    % Time elapsed: 3.7200000 seconds.
          150.000

In my case it took 3.72 seconds to find the element at position 123456 of the ordered sequence. The MEDIAN function in IDL returns the central element of the ordered sequence without performing a full sort. It is much faster than sorting because it doesn't have to keep track of all elements and their ordered positions; it only cares about element n/2 of the ordered array, where n is the number of elements. The following example finds the Nth element of the ordered sequence by calling MEDIAN repeatedly and discarding roughly half of the array on every iteration. The code is much longer than the code above, but it does end up running faster:

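    ;+
    ; Returns the Nth number in the ordered sequence.
    ;
    ; Uses repeated median.
    ;-
    function ordinal_2, array, N
      compile_opt idl2,logical_predicate
      na = n_elements(array)
      type = size(array, /type)
      target_index = N
      ; Copy the input when the caller passed a named variable;
      ; otherwise reuse the temporary without copying
      tmp = arg_present(array) ? array : temporary(array)
      ntmp = na
      while ntmp ne target_index do begin
        ntmp = n_elements(tmp)
        ; MEDIAN can return a floating-point value; convert back to the input type
        val = fix(median(tmp), type=type)
        if target_index gt ntmp/2 then begin
          ; Target lies above the median: keep only the larger values
          ; and shift the index by the number of discarded elements
          tmp = tmp[where(tmp gt val, count)]
          target_index -= ntmp-count
        endif else if target_index lt (ntmp+1)/2 then begin
          ; Target lies below the median: keep only the smaller values
          tmp = tmp[where(tmp lt val, count)]
        endif else break        ; target is the median itself
        ; val is the answer when the target falls outside the kept subset
        if target_index lt 0 then break
        if target_index ge count then break
        ; Shortcuts for the first and last element of the kept subset
        if target_index eq 0 then begin
          val = min(tmp)
          break
        endif
        if target_index eq count-1 then begin
          val = max(tmp)
          break
        endif
      endwhile
      return, val
    end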
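A single halving step of the function above can be traced by hand on a tiny array. The snippet below is only an illustrative sketch: the seven values and the names `upper` and `new_index` are made up for this example and do not come from the article.

```idl
; Seven values; in sorted order they are 1 3 4 6 7 8 9
tmp = [7, 1, 9, 4, 6, 3, 8]
target_index = 5                 ; zero-based, so the value we want is 8

; Central element of the ordered sequence (6 for this odd-length array)
val = median(tmp)

; The target lies above the median, so keep only the larger values
upper = tmp[where(tmp gt val, count)]        ; 7, 9, 8 -> count is 3

; Shift the index by the number of discarded elements: 5 - (7 - 3) = 1
new_index = target_index - (n_elements(tmp) - count)

print, val, count, new_index
```

The search continues in `upper` for index 1. Its sorted order is 7 8 9, so index 1 is 8, which matches index 5 of the original array.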

This is the same time test as with the short code:

    IDL> tic & x = ordinal_2(a, 123456) & toc & print, x
    % Time elapsed: 0.57999992 seconds.
          150.000

As can be seen here, the time saving is significant: the run time drops from 3.72 to 0.58 seconds, and the savings can become even larger as the array grows. This function works for numeric data types such as floating-point and integer arrays.
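The array `a` used in the timings is not shown in the article. To repeat the comparison and check that both functions return the same value, something like the following sketch could be used. The array size, the value range, and the names `n`, `k`, `x1`, and `x2` are assumptions chosen for illustration, so the timings will differ from the ones reported above.

```idl
; Hypothetical test array: 5 million random floats in [0, 1000)
n = 5000000L
a = randomu(seed, n) * 1000.0

; Same position in the ordered sequence as in the article's test
k = 123456L

; Time the full-sort version
tic & x1 = ordinal_1(a, k) & toc

; Time the repeated-median version
tic & x2 = ordinal_2(a, k) & toc

; Both functions should report the same element
print, x1, x2
print, (x1 eq x2) ? 'results agree' : 'results differ'
```

Running the check for several values of `k`, including 0 and `n-1`, exercises the minimum and maximum shortcuts in ordinal_2 as well as the main loop.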
