Optimizing Max Kernel Operation in IDL
Anonym
I found this optimization question on the
comp.lang.idl-pvwave, and decided to give it a try. The question was to implement
an algorithm that replaces every element in a 2-D array with its neighborhood
maximum value. In this case the neighborhood size was 101x101 (i.e. -50 to +50),
and the array size was 3200x3248. The nested FOR loop approach looks like the
following code snippet.
data = randomu(seed,3200,3248)
dim = size(data,/dimension)
nx = dim[0]
ny = dim[1]
t0 = tic('Nested FOR')
result2 = data
for i=0,nx-1 do begin
for j=0,ny-1 do begin
result2[i,j] = max(data[(i-50)>0:(i+50)<(nx-1),(j-50)>0:(j+50)<(ny-1)])
endfor
endfor
toc,t0
My first thought was to use the > operator which returns
the maximum of 2 arguments. It operates on arrays, and in conjunction with the
SHIFT function it serves to return the larger of 2 neighbors. The other trick
here is that since we are looking for a 101x101 neighborhood maximum, we can
use a combination of smaller neighborhood maxima as input in a structured way
in order to achieve the exact 101x101 neighborhood size. The code that I ended
up with after some trial and error was the following.
t0 = tic('Iterative >')
; Using SHIFT and > in an iterative way
padded = replicate(min(data),size(data,/dimension)+100)
padded[50,50] = data
tmp3 = shift(padded,1,0) > padded > shift(padded,-1,0)
tmp9 = shift(tmp3,3,0) > tmp3 > shift(tmp3,-3,0)
tmp27 = shift(tmp9,9,0) > tmp9 > shift(tmp9,-9,0)
tmp81 = shift(tmp27,27,0) > tmp27 > shift(temporary(tmp27),-27,0)
tmp99 = shift(tmp9,44,0) > temporary(tmp81) > shift(temporary(tmp9),-44,0)
tmp101 = shift(tmp3,49,0) > temporary(tmp99) > shift(temporary(tmp3),-49,0)
; Same for Y-dim
tmp3 = shift(tmp101,0,1) > tmp101 > shift(temporary(tmp101),0,-1)
tmp9 = shift(tmp3,0,3) > tmp3 > shift(tmp3,0,-3)
tmp27 = shift(tmp9,0,9) > tmp9 > shift(tmp9,0,-9)
tmp81 = shift(tmp27,0,27) > tmp27 > shift(temporary(tmp27),0,-27)
tmp99 = shift(tmp9,0,44) > temporary(tmp81) > shift(temporary(tmp9),0,-44)
tmp101 = shift(tmp3,0,49) > temporary(tmp99) > shift(temporary(tmp3),0,-49)
result1 = (temporary(tmp101))[50:50+nx-1,50:50+ny-1]
toc,t0
I didn’t say that optimized code always looks pretty, but
the goal here is to run fast. Adding in some result comparison checking to make
sure the results are equivalent.
print, array_equal(result1,result2)
? 'Results are matching' : 'SOMETHING
went wrong'
Finally, here are the results, which yielded an impressive
amount of speed-up, the execution time went from 189.4 seconds to 1.4 seconds,
and the results are identical:
%
Time elapsed Nested FOR: 189.39470 seconds.
%
Time elapsed Iterative >: 1.4241931 seconds.
Results
are matching