X

NV5 Geospatial Blog

Each month, NV5 Geospatial posts new blog content across a variety of categories. Browse our latest posts below to learn about important geospatial information or use the search bar to find a specific topic or author. Stay informed of the latest blog posts, events, and technologies by joining our email list!



Not All Supernovae Are Created Equal: Rethinking the Universe’s Measuring Tools

Not All Supernovae Are Created Equal: Rethinking the Universe’s Measuring Tools

6/3/2025

Rethinking the Reliability of Type 1a Supernovae   How do astronomers measure the universe? It all starts with distance. From gauging the size of a galaxy to calculating how fast the universe is expanding, measuring cosmic distances is essential to understanding everything in the sky. For nearby stars, astronomers use... Read More >

Using LLMs To Research Remote Sensing Software: Helpful, but Incomplete

Using LLMs To Research Remote Sensing Software: Helpful, but Incomplete

5/26/2025

Whether you’re new to remote sensing or a seasoned expert, there is no doubt that large language models (LLMs) like OpenAI’s ChatGPT or Google’s Gemini can be incredibly useful in many aspects of research. From exploring the electromagnetic spectrum to creating object detection models using the latest deep learning... Read More >

From Image to Insight: How GEOINT Automation Is Changing the Speed of Decision-Making

From Image to Insight: How GEOINT Automation Is Changing the Speed of Decision-Making

4/28/2025

When every second counts, the ability to process geospatial data rapidly and accurately isn’t just helpful, it’s critical. Geospatial Intelligence (GEOINT) has always played a pivotal role in defense, security, and disaster response. But in high-tempo operations, traditional workflows are no longer fast enough. Analysts are... Read More >

Thermal Infrared Echoes: Illuminating the Last Gasp of a Dying Star

Thermal Infrared Echoes: Illuminating the Last Gasp of a Dying Star

4/24/2025

This blog was written by Eli Dwek, Emeritus, NASA Goddard Space Flight Center, Greenbelt, MD and Research Fellow, Center for Astrophysics, Harvard & Smithsonian, Cambridge, MA. It is the fifth blog in a series showcasing our IDL® Fellows program which supports passionate retired IDL users who may need support to continue their work... Read More >

A New Era of Hyperspectral Imaging with ENVI® and Wyvern’s Open Data Program

A New Era of Hyperspectral Imaging with ENVI® and Wyvern’s Open Data Program

2/25/2025

This blog was written in collaboration with Adam O’Connor from Wyvern.   As hyperspectral imaging (HSI) continues to grow in importance, access to high-quality satellite data is key to unlocking new insights in environmental monitoring, agriculture, forestry, mining, security, energy infrastructure management, and more.... Read More >

1345678910Last
13530 Rate this article:
3.0

An LSD radix sort algorithm in IDL

Anonym

(Note: I'm at the HDF and HDF-EOS Workshop XV this week, so today I have a guest post by Atle Borsholm, a Senior Consultant in the Exelis VIS Professional Services Group. Atle is an IDL master. I’m trying to persuade him to share some of the cool programs he’s written on this blog. Here’s one.  –MP) The IDL SORT function uses the quicksort (or qsort) algorithm. This is a memory-efficient sorting algorithm that handles any data type that supports comparison (e.g., operators like LT, GT, EQ). However, it can be slow for large input data. Wikipedia says the [proportional] speed of quicksort can range from n*log(n) to n^2 (where n is the number of elements). My main motivation for implementing a least significant digit (LSD) radix sort algorithm in IDL was to see if I could get a speed improvement, although a bonus with this algorithm is that it’s a so-called “stable” sort algorithm, meaning that the relative order of equal entries is retained. This is unfortunately not the case with IDL’s SORT. For example:

IDL> print, sort([2,2,3,2,3,3]), format='(6(i0,x))'
1 3 0 4 5 2

The first three indices in the result [1,3,0] all correspond to the number 2, which in a stable sort would be returned as [0,1,3]. The same goes for the 3s, which should be returned as [2,4,5]. So, a stable sort would give a result of [0,1,3,2,4,5] for this array. An LSD radix sort is designed primarily for arrays of unsigned integer types of any length. The idea is to start sorting on the least significant digit of each array element. Since the digits have a fixed range, a histogram is used to place the entries into bins without regard for the other entries (except retaining the original order within each bin for stability). The algorithm continues by sorting on the next significant digit and so on until there are no more digits. Here’s the first pass:

function radix_sort_r1, data, radix=radix
   compile_opt idl2, logical_predicate

   ; default radix if not specified
   if ~keyword_set(radix) then radix = 256
   radix = long64(radix)

   sorted = data
   mx = max(sorted)
   factor = 1ull
   repeat begin
      mx /= radix
      rem = sorted / factor
      digit = rem mod radix
      factor *= radix
      h = histogram(digit, min=0, max=radix-1, binsize=1, reverse_indices=ri)
      sorted = sorted[ri[radix+1:*]]
   endrep until mx eq 0

   return, sorted
end

Test RADIX_SORT_R1 on the array used above:

IDL> print, radix_sort_r1([2,2,3,2,3,3]), format='(6(i0,x))'
2 2 2 3 3 3

Sorting an array in-place is nice, but in most cases we want the list of indices — the way IDL’s SORT function returns its result — so we can use the indices on other arrays. So, adding a few lines of code will also support the index as an output:

function radix_sort_r2, data, radix=radix, index=index
   compile_opt idl2, logical_predicate

   ; default radix if not specified
   if ~keyword_set(radix) then radix = 256
   radix = long64(radix)

   ; index output is requested
   if arg_present(index) then begin
      index = lindgen(n_elements(data))
   endif

   sorted = data
   mx = max(sorted)
   factor = 1ull
   repeat begin
      mx /= radix
      rem = sorted / factor
      digit = rem mod radix
      factor *= radix
      h = histogram(digit, min=0, max=radix-1, binsize=1, reverse_indices=ri)
      sorted = sorted[ri[radix+1:*]]
      if arg_present(index) then index = index[ri[radix+1:*]]
   endrep until mx eq 0

   return, sorted
end

Test on the array used above:

IDL> void = radix_sort_r2([2,2,3,2,3,3], index=index)
IDL> print, index, format='(6(i0,x))'
0 1 3 2 4 5

This is the result as expected from a stable sort algorithm. There’s a problem with data type support, though; the above sort only works for non-negative integers. There are some tricks that can be performed on the data to expand support to include negative integers. Basically, we can achieve this by converting to unsigned integer type and shifting by half the integer range:

   ; pre-processing to support signed ints
   case typename(data) of
      'INT': sorted = uint(data) + ishft(1us, 15)
      'LONG': sorted = ulong(data) + ishft(1ul, 31)
      'LONG64': sorted = ulong64(data) + ishft(1ull, 63)
      else: sorted = data
   endcase

And, at the end, insert:

   case typename(data) of
      'INT': sorted = fix(sorted - ishft(1us, 15))
      'LONG': sorted = long(sorted - ishft(1ul, 31))
      'LONG64': sorted = long64(sorted - ishft(1ull, 63))
      else:
   endcase

These changes allow negative integers to be sorted. But what about floating point numbers? There’s a more complicated trick that will allow support for floats, at least finite ones. The IEEE standard floating point numbers are ordered in a fairly logical bitwise fashion, so by “casting” the data from double to unsigned 64-bit integers we can manipulate the bits to achieve integers that retain the same order. Here’s the code I added to the CASE statement for pre-processing:

   
      'DOUBLE': begin
         sorted = ulong64(data,0,n_elements(data))
         ; IEEE does not use 2's complement for negative numbers, so flip the
         ; bits for the negative numbers with this trick. This may not be
         ; 100% IEEE compliant sorting, but it works well for finite numbers.
         sorted xor= ((sorted and ishft(1ull,63)) ne 0) * (not ishft(1ull,63))
         sorted += ishft(1ull, 63)
      end
      'FLOAT': begin
         sorted = ulong(data,0,n_elements(data))
         sorted xor= ((sorted and ishft(1ul,31)) ne 0) * (not ishft(1ul,31))
         sorted += ishft(1ul, 31)
      end

And for post processing back into the floating point types:

      'DOUBLE': begin
         sorted -= ishft(1ull, 63)
         sorted xor= ((sorted and ishft(1ull,63)) ne 0) * (not ishft(1ull,63))
         sorted = double(sorted, 0, n_elements(data))
      end
      'FLOAT': begin
         sorted -= ishft(1ul, 31)
         sorted xor= ((sorted and ishft(1ul,31)) ne 0) * (not ishft(1ul,31))
         sorted = float(sorted, 0, n_elements(data))
      end

Now that the LSD radix sort program is complete, we can test its performance:

IDL> data = randomu(1, 50000000, /long) ; 50 million long integers
IDL> t0 = systime(1) & sort_output = sort(data) & print, systime(1)-t0
      31.062000
IDL> t0 = systime(1) & radix_output = radix_sort(data, index=index) & print, systime(1)-t0
      13.468000
IDL> print, array_equal(data[sort_output], radix_output), array_equal(sort_output, index)
   1   1

RADIX_SORT is more than twice as fast as SORT in this example. The results are identical, even the indices, because RANDOMU doesn’t produce duplicates. If you want to trade memory for speed, you can up the radix from 256 (the default) to 65536. Then RADIX_SORT is even faster:

IDL> t0=systime(1) & radix_output = radix_sort(data, index=index, radix=65536) & print, systime(1)-t0
      8.9219999

Here’s a test with floating-point numbers:

IDL> data = randomn(1, 10000000) ; 10 million floats
IDL> t0=systime(1) & sort_output = sort(data) & print, systime(1)-t0
      4.8600001
IDL> t0=systime(1) & radix_output = radix_sort(data, index=index, radix=65536) & print, systime(1)-t0
      2.0630000
IDL> print, array_equal(data[sort_output], radix_output), array_equal(sort_output, index)
   1   0

Note that the RADIX_SORT output may have duplicate indices, but the sorted outputs are the same. Here’s a final test using integers with lots of duplicates:

IDL> data = long(randomn(1, 30000000)*300) ; 30 million longs with duplicates
IDL> t0=systime(1) & sort_output = sort(data) & print, systime(1)-t0
      10.750000
IDL> t0=systime(1) & radix_output = radix_sort(data, index=index, radix=65536) & print, systime(1)-t0
       2.7970002
IDL> print, array_equal(data[sort_output], data[index])
   1

In this last example we got greater than 3x speed improvement with the added bonus of a stable sort. The final code listing can be downloaded here. (Thanks, Atle! –MP)

Please login or register to post comments.