Finding the Nth ordered element in a large array

Anonym

A common task when working with large arrays is to find the Nth value in the ordered (sorted) array. This is useful for finding the Nth smallest or largest pixel value, as well as for statistical analysis of floating-point data samples (e.g., finding the 95th percentile). The shortest IDL code for finding the Nth value in the ordered sequence takes only two lines of actual code. Here is a short function that does it:
 
;+ 
; Returns the Nth number (zero-based index) in the ordered sequence 
;- 
function ordinal_1, array, N 
 compile_opt idl2,logical_predicate 
 
 s = sort(array) 
 return, array[s[N]] 
end
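
Since percentiles were mentioned above, here is a minimal sketch of how a percentile could be mapped onto the index N. The function name percentile_value and the nearest-index rounding rule are my own assumptions for illustration; percentile definitions vary, so adjust the mapping to whichever convention you need.

```idl
;+
; Hypothetical helper (not part of the original code):
; returns the value at percentile pct (0-100) of array.
;-
function percentile_value, array, pct
  compile_opt idl2, logical_predicate

  na = n_elements(array)
  ; Assumed convention: round to the nearest zero-based index,
  ; then clamp the index to the valid range [0, na-1]
  n = (round(pct / 100d * (na - 1)) > 0) < (na - 1)
  return, ordinal_1(array, n)
end
```

For example, percentile_value(a, 95) would return the array value that sits 95% of the way through the ordered sequence.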
 
However, because sorting is an expensive computation, this runs fairly slowly, especially as the array gets larger. I ran the following timing test:
IDL> a = total(read_image(filepath('ohare.jpg',subdir=['examples','data'])),1) 
IDL> tic & x = ordinal_1(a, 123456) & toc & print, x 
% Time elapsed: 3.7200000 seconds. 
      150.000
In my case it took 3.72 seconds to find the element at index 123456 of the ordered array. The MEDIAN function in IDL returns the central element of the ordered sequence without doing a full sort. It is much faster than sorting because it doesn't have to keep track of all elements and their ordered positions; it only cares about the middle element of the ordered array. The following example calls MEDIAN repeatedly, discarding roughly half of the array on every iteration, to find the Nth element in the ordered sequence. The code is much longer than the code above, but it does end up running faster:
;+ 
; Returns the Nth number (zero-based index) in the ordered sequence. 
;  
; Uses repeated median. 
;- 
function ordinal_2, array, N 
 compile_opt idl2,logical_predicate 
 
 na = n_elements(array) 
 type =size(array, /type) 
 target_index = N 
 tmp = arg_present(array) ? array : temporary(array) 
 ntmp = na 
 while ntmp ne target_index do begin 
   ntmp = n_elements(tmp) 
   val = fix(median(tmp), type=type) 
   if target_index gt ntmp/2 then begin 
     tmp = tmp[where(tmp gt val, count)] 
     target_index -= ntmp-count 
   endif else if target_index lt (ntmp+1)/2 then begin 
     tmp = tmp[where(tmp lt val, count)] 
   endif else break 
   if target_index lt 0 then break 
   if target_index ge count then break 
   if target_index eq 0 then begin 
     val = min(tmp) 
     break 
   endif 
   if target_index eq count-1 then begin 
     val = max(tmp) 
     break 
   endif 
 endwhile 
 return, val 
end
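
To make the halving step more concrete, here is a small worked example of a single pass through the loop body, typed as plain statements. The array values and the target index are made up for illustration; the statements themselves mirror the ones in ordinal_2.

```idl
; One halving step on a small array (values chosen for illustration)
tmp = [7, 3, 9, 1, 5, 8, 2]        ; ordered: 1 2 3 5 7 8 9
target_index = 5                   ; zero-based, so the answer is 8
ntmp = n_elements(tmp)

; Central value of the 7 elements, converted back to the array's type: 5
val = fix(median(tmp), type=size(tmp, /type))

; target_index is greater than ntmp/2 (= 3), so keep only the
; values above the median and shift the index accordingly
tmp = tmp[where(tmp gt val, count)]   ; leaves 7 9 8, count = 3
target_index -= ntmp - count          ; 5 - (7 - 3) = 1

print, val, count, target_index
```

After this step the problem has shrunk to finding the element at index 1 of the three remaining values, which is still 8.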
This is the same time test as with the short code:
IDL> tic & x = ordinal_2(a, 123456) & toc & print, x 
% Time elapsed: 0.57999992 seconds. 
      150.000
 
As can be seen here, the time saving is significant: the run time drops from 3.72 to 0.58 seconds, and the savings can become larger as the array grows. This function works for numeric data types such as floating-point and integer arrays.
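
If you want to convince yourself that the two functions agree, a quick check along these lines could be used. This is only a sketch: the procedure name ordinal_check, the random test data and the chosen indices are my own assumptions, and it covers just a handful of positions, including both ends of the array.

```idl
;+
; Hypothetical check (not part of the original code): compares
; ordinal_1 and ordinal_2 at a few zero-based indices.
;-
pro ordinal_check
  compile_opt idl2, logical_predicate

  seed = 42L
  data = randomu(seed, 100001)            ; random floats in [0, 1)
  idx = [0L, 1L, 50000L, 99999L, 100000L] ; first, middle and last positions

  foreach n, idx do begin
    v1 = ordinal_1(data, n)
    v2 = ordinal_2(data, n)
    print, n, v1, v2
    if v1 ne v2 then message, 'Mismatch at index ' + strtrim(n, 2)
  endforeach
end
```

Both functions return an actual array element, so an exact comparison is reasonable here even for floating-point data.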