Finding the Nth ordered element in a large array
Anonym
A common task when working with large arrays is to find the Nth array value in the ordered array. This can be useful for finding the Nth smallest or largest pixel value, as well as for statistical analysis of floating point data samples, (i.e. find the 95% percentile or similar). The shortest IDL code for finding the Nth value in the ordered sequence is only 2 lines of actual code. Here is a short function that accomplishes this:
;+
; Returns the Nth number in theordered sequence
;-
function ordinal_1, array, N
compile_opt idl2,logical_predicate
s = sort(array)
return, array[s[N]]
end
However, because sort is an expensive computation, it runs fairly slow, especially, when the array gets larger. I did the following time test.
IDL> a = total(read_image(filepath('ohare.jpg',subdir=['examples','data'])),1)
IDL> tic & x = ordinal_1(a, 123456) & toc & print, x
% Time elapsed: 3.7200000 seconds.
150.000
In my case it took 3.72 seconds to find the 123456th smallest array element. The MEDIAN function in IDL, returns the central element in the ordered sequence without doing a full sorting. It is much faster than sorting, because it doesn't have to keep track of all elements and their ordered positions. It only cares about element N/2 in the ordered array. In the following example, repeated calls to MEDIAN and reducing the array size in half every iteration, is used to find the Nth element in the ordered sequence. The code is much longer than the code above, but it does end up running faster:
;+
; Returns the Nth number in theordered sequence.
;
; Uses repeated median.
;-
function ordinal_2, array, N
compile_opt idl2,logical_predicate
na = n_elements(array)
type =size(array, /type)
target_index = N
tmp = arg_present(array) ? array : temporary(array)
ntmp = na
while ntmp ne target_index do begin
ntmp = n_elements(tmp)
val = fix(median(tmp), type=type)
if target_index gt ntmp/2 then begin
tmp = tmp[where(tmp gt val, count)]
target_index -= ntmp-count
endif else if target_index lt ntmp+1/2 then begin
tmp = tmp[where(tmp lt val, count)]
endif else break
if target_index lt 0 then break
if target_index ge count then break
if target_index eq 0 then begin
val = min(tmp)
break
endif
if target_index eq count-1 then begin
val = max(tmp)
break
endif
endwhile
return, val
end
This is the same time test as with the short code:
IDL> tic & x = ordinal_2(a, 123456) & toc & print, x
% Time elapsed: 0.57999992 seconds.
150.000
As can be seen here, the time saving is significant, it goes from 3.72 to 0.58 seconds, and as the array grows larger, the savings can get more significant. This function works for numeric data types such as floating point and integer arrays.