7463 Rate this article:
No rating

Optimizing Max Kernel Operation in IDL


I found this optimization question on the comp.lang.idl-pvwave, and decided to give it a try. The question was to implement an algorithm that replaces every element in a 2-D array with its neighborhood maximum value. In this case the neighborhood size was 101x101 (i.e. -50 to +50), and the array size was 3200x3248. The nested FOR loop approach looks like the following code snippet.

  data = randomu(seed,3200,3248)
  dim = size(data,/dimension)
  nx = dim[0]
  ny = dim[1]
  t0 = tic('Nested FOR')
  result2 = data
  for i=0,nx-1 do begin
    for j=0,ny-1 do begin
      result2[i,j] = max(data[(i-50)>0:(i+50)<(nx-1),(j-50)>0:(j+50)<(ny-1)])

My first thought was to use the > operator which returns the maximum of 2 arguments. It operates on arrays, and in conjunction with the SHIFT function it serves to return the larger of 2 neighbors. The other trick here is that since we are looking for a 101x101 neighborhood maximum, we can use a combination of smaller neighborhood maxima as input in a structured way in order to achieve the exact 101x101 neighborhood size. The code that I ended up with after some trial and error was the following.

  t0 = tic('Iterative >')
  ; Using SHIFT and > in an iterative way
  padded = replicate(min(data),size(data,/dimension)+100)
  padded[50,50] = data
  tmp3 = shift(padded,1,0) > padded > shift(padded,-1,0)
  tmp9 = shift(tmp3,3,0) > tmp3 > shift(tmp3,-3,0)
  tmp27 = shift(tmp9,9,0) > tmp9 > shift(tmp9,-9,0)
  tmp81 = shift(tmp27,27,0) > tmp27 > shift(temporary(tmp27),-27,0)
  tmp99 = shift(tmp9,44,0) > temporary(tmp81) > shift(temporary(tmp9),-44,0)
  tmp101 = shift(tmp3,49,0) > temporary(tmp99) > shift(temporary(tmp3),-49,0)
  ; Same for Y-dim
  tmp3 = shift(tmp101,0,1) > tmp101 > shift(temporary(tmp101),0,-1)
  tmp9 = shift(tmp3,0,3) > tmp3 > shift(tmp3,0,-3)
  tmp27 = shift(tmp9,0,9) > tmp9 > shift(tmp9,0,-9)
  tmp81 = shift(tmp27,0,27) > tmp27 > shift(temporary(tmp27),0,-27)
  tmp99 = shift(tmp9,0,44) > temporary(tmp81) > shift(temporary(tmp9),0,-44)
  tmp101 = shift(tmp3,0,49) > temporary(tmp99) > shift(temporary(tmp3),0,-49)
  result1 = (temporary(tmp101))[50:50+nx-1,50:50+ny-1]

I didn’t say that optimized code always looks pretty, but the goal here is to run fast. Adding in some result comparison checking to make sure the results are equivalent.

  print, array_equal(result1,result2) ? 'Results are matching' : 'SOMETHING went wrong'

Finally, here are the results, which yielded an impressive amount of speed-up, the execution time went from 189.4 seconds to 1.4 seconds, and the results are identical:

  % Time elapsed Nested FOR: 189.39470 seconds.
  % Time elapsed Iterative >: 1.4241931 seconds.
  Results are matching