
NV5 Geospatial Blog


String processing performance in IDL

Anonym

IDL performs array-based operations very efficiently, but most processing tasks also require some amount of string parsing and manipulation. I selected three common string processing tasks and analyzed each in depth to find the best strategy for it. As test data, every example uses the lines of amoeba.pro, a library routine that ships with IDL. The first task is to find all strings that start with a given substring. IDL 8.4 adds many new intrinsic methods for string variables, and one of them is "StartsWith". Here is the code I used to compare four different approaches to finding which strings in a string array start with the word "end".

pro StrTest_StartsWith
  compile_opt idl2,logical_predicate

  ; Read every line of amoeba.pro into a string array
  f = file_which('amoeba.pro')
  str = strarr(file_lines(f))
  openr, lun, f, /get_lun
  readf, lun, str
  free_lun, lun

  ; Reference answer used to verify each method
  first = str.StartsWith('end')
  n = 50000
  times = dblarr(4)
  methods = ['StartsWith','STRCMP','STREGEX','STRPOS']
  for method=0,3 do begin
    t0 = tic()
    case method of
      0: for i=0, n-1 do x = str.StartsWith('end')
      1: for i=0, n-1 do x = strcmp(str,'end',3)          ; compare first 3 characters only
      2: for i=0, n-1 do x = stregex(str,'^end',/boolean) ; regex anchored at the start
      3: for i=0, n-1 do x = strpos(str,'end') eq 0       ; first match at position 0
    endcase
    times[method] = toc(t0)
    print, array_equal(x,first) ? 'Same answer' : 'Different answer'
  endfor

  ; Print the methods from fastest to slowest
  print, string(methods[sort(times)] + ':', format='(a-15)') + $
    string(times[sort(times)], format='(g0)'), $
    format='(a)'
end

The first method uses the new intrinsic "StartsWith" method, and the second uses STRCMP with a third argument specifying how many characters to compare. The third method uses a regular expression with STREGEX. The final method uses STRPOS and compares the result to 0, meaning the pattern was found starting at position 0. Each method runs 50,000 times on the full string array. This is the result I get when I run the code in IDL 8.4 (times in seconds):

Same answer
Same answer
Same answer
Same answer
STRCMP:        0.128
StartsWith:    0.147
STRPOS:        0.91
STREGEX:       1.497

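All methods return a byte array of zeros and ones indicating where the matches are. STRCMP with three arguments ended up being the fastest, with the new "StartsWith" method a close second. STRPOS is about seven times slower than STRCMP, likely because it searches each entire string for "end" instead of checking only the first three characters. STREGEX was the slowest by far. Avoid it unless a more complex pattern actually requires a regular expression.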
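The following sketch makes the byte-array result concrete. It runs on a made-up four-element array; the values are hypothetical and do not come from amoeba.pro. It uses STRCMP with N=3, the fastest method above. It also shows STRCMP's FOLD_CASE keyword, which turns the test into a case-insensitive prefix match:

```idl
; Hypothetical sample lines, made up for illustration
s = ['endif', 'end', 'Endfor', 'begin']

; Compare only the first 3 characters of each element with 'end';
; the result is a byte array with 1 where the prefix matches
print, strcmp(s, 'end', 3)              ; expected values: 1 1 0 0

; FOLD_CASE ignores case, so 'Endfor' matches as well
print, strcmp(s, 'end', 3, /FOLD_CASE)  ; expected values: 1 1 1 0

; The byte result can go straight into WHERE to pick out the matching lines
w = where(strcmp(s, 'end', 3), count)
if count gt 0 then print, s[w]
```

The result is a plain byte mask, so it combines with WHERE, TOTAL, or logical operators like any other mask array.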

In the second example, the goal is to replace the first equal sign (=) with a colon (:) on every line that contains at least one equal sign. Any additional equal signs should remain unchanged. This is useful, for example, for converting the format of name/value pairs stored in a text file. I used four different methods to achieve the same result:

pro StrTest_Substring
  compile_opt idl2,logical_predicate

  ; Read every line of amoeba.pro into a string array
  f = file_which('amoeba.pro')
  str = strarr(file_lines(f))
  openr, lun, f, /get_lun
  readf, lun, str
  free_lun, lun

  n = 2000
  ; Reference answer: replace the first '=' with ':' on lines that contain one
  index = str.IndexOf('=')
  w = where(index ne -1)
  index = index[w]
  first = str
  first[w] = str[w].Substring(0,index-1)+':'+str[w].Substring(index+1)

  methods = ['Substring','STRPUT','Split/Join','BYTARR']
  times = dblarr(4)
  for method=0,3 do begin
    t0 = tic()
    case method of
      ; IndexOf + Substring, vectorized over all lines that contain '='
      0: for i=0, n-1 do begin
        index = str.IndexOf('=')
        w = where(index ne -1)
        index = index[w]
        y = str[w]
        x = str
        x[w] = y.SubString(0,index-1)+':'+y.SubString(index+1)
      endfor
      ; STRPOS + STRPUT, one line at a time. STRPUT modifies a named
      ; variable, so each element is copied into xx and then written back
      1: for i=0, n-1 do begin
        x = str
        pos = strpos(str,'=')
        foreach xx, x, j do begin
          if pos[j] ne -1 then begin
            strput, xx, ':', pos[j]
            x[j] = xx
          endif
        endforeach
      endfor
      ; Split on '=', then join the first part to the rest (rejoined with '=') using ':'
      2: for i=0, n-1 do begin
        x = str
        foreach xx, x, j do begin
          parts = xx.Split('=')
          if parts.length gt 1 then x[j] = ([parts[0],parts[1:*].join('=')]).join(':')
        endforeach
      endfor
      ; Byte array: 61b is '=' and 58b is ':'. MAX along dimension 1 flags
      ; lines containing '=', and maxInd holds the subscript of each first '='
      3: for i=0, n-1 do begin
        b = byte(str)
        b[maxInd[where(max(b eq 61b, dimension=1, maxInd))]] = 58b
        x = string(b)
      endfor
    endcase
    times[method] = toc(t0)
    print, array_equal(x,first) ? 'Same answer' : 'Different answer'
  endfor

  ; Print the methods from fastest to slowest
  print, string(methods[sort(times)] + ':', format='(a-15)') + $
    string(times[sort(times)], format='(g0)'), $
    format='(a)'
end

Same answer
Same answer
Same answer
Same answer
BYTARR:        0.148
STRPUT:        0.187
Substring:     0.188
Split/Join:    1.456

The cryptic byte array method ended up being the fastest, even though it performs a lot of copying and doesn't contain any obvious string processing functions. This is because IDL runs operations on whole arrays very efficiently; for example, the internal array indexing gives good, predictable memory access patterns. However, I would not recommend this approach here, since the code is very hard to understand and to modify if needed. I would also avoid the Split/Join approach, as it is roughly eight to ten times slower than the other three.

Using "IndexOf" and "Substring" works nicely here. In particular, notice that the "Substring" method is similar to STRMID but can take an array of positions matching the size of the string array, which is a significant improvement over the old STRMID. For example, to extract the beginning of every string up to and including the first "e", you could use:

IDL> a=['!Hello!', 'test','this one!']
IDL> a.Substring(0,a.IndexOf('e'))
!He
te
this one

Or, to extract everything from the first colon onward:

IDL> x = ((orderedhash(!cpu))._overloadPrint())
IDL> x
HW_VECTOR:            0
VECTOR_ENABLE:            0
HW_NCPU:            6
TPOOL_NTHREADS:            6
TPOOL_MIN_ELTS:                 100000
TPOOL_MAX_ELTS:                      0
IDL> x.Substring(x.IndexOf(':'))
:            0
:            0
:            6
:            6
:                 100000
:                     0
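Back to the byte-array method from the second example: its one-line trick can be unpacked step by step. The sketch below runs it on a small made-up array. It relies on two assumptions, both used by the benchmark:

- BYTE turns a string array into a 2-D byte array of size [longest length, number of strings], padding shorter strings with zeros.
- With DIMENSION=1, the Max_Subscript argument of MAX returns, for each string, the one-dimensional subscript into the input array of its first maximum.

```idl
; Hypothetical input lines, made up for illustration
str = ['a=1', 'no equals', 'b=2=3']

; BYTE converts the strings into a 2-D byte array of size
; [longest string length, number of strings]; short strings are zero-padded
b = byte(str)

; (b eq 61b) is 1 wherever a character is '=' (ASCII 61).
; MAX over dimension 1 gives one value per string: 1 if it contains '='.
; maxInd receives the 1-D subscript into b of the first maximum for each
; string, which is its first '=' whenever the string contains one
hasEq = max(b eq 61b, dimension=1, maxInd)

; Keep only the strings that contain '=' and overwrite the first '='
; in each of them with ':' (ASCII 58)
w = where(hasEq, count)
if count gt 0 then b[maxInd[w]] = 58b

; STRING converts the bytes back to one string per element, dropping the padding
x = string(b)
print, x, format='(a)'   ; expected lines: a:1, no equals, b:2=3
```

Naming the intermediate results makes each step easier to follow. The COUNT check also guards against the -1 that WHERE returns when no string contains an equal sign.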

The final example replaces every occurrence of "=" with "=>". I used two different methods for this: the new "Replace" method on string variables, and STRSPLIT/STRJOIN. The results show that the new Replace method is about five times faster in this test.

pro StrTest_Replace
  compile_opt idl2,logical_predicate

  ; Read every line of amoeba.pro into a string array
  f = file_which('amoeba.pro')
  str = strarr(file_lines(f))
  openr, lun, f, /get_lun
  readf, lun, str
  free_lun, lun

  n = 5000
  ; Reference answer: every '=' replaced with '=>'
  first = str.Replace('=', '=>')
  methods = ['Replace','STRSPLIT']
  times = dblarr(2)
  for method=0,1 do begin
    t0 = tic()
    case method of
      ; Intrinsic Replace method, applied to the whole array at once
      0: for i=0, n-1 do begin
        x = str.Replace('=','=>')
      endfor
      ; Split each line on '=' and join the pieces back with '=>'
      1: for i=0, n-1 do begin
        x = str
        foreach xx, x, j do x[j] = strjoin(strsplit(xx,'=',/extract),'=>')
      endfor
    endcase
    times[method] = toc(t0)
    print, array_equal(x,first) ? 'Same answer' : 'Different answer'
  endfor

  ; Print the methods from fastest to slowest
  print, string(methods[sort(times)] + ':', format='(a-15)') + $
    string(times[sort(times)], format='(g0)'), $
    format='(a)'
end

Same answer
Same answer
Replace:       0.545
STRSPLIT:      2.778
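There is one caveat when choosing between the two. By default STRSPLIT drops empty fields, so STRSPLIT/STRJOIN is not an exact substitute for Replace when a line starts or ends with "=" or contains two "=" in a row. Both methods print "Same answer" on amoeba.pro, but a quick check on made-up edge cases shows where they can diverge:

```idl
; Hypothetical edge cases, made up for illustration
s = ['a==b', '=x', 'y=']

; Replace converts every '=' to '=>'
r = s.Replace('=', '=>')
print, r, format='(a)'      ; expected lines: a=>=>b, =>x, y=>

; STRSPLIT (default keywords) discards the empty fields next to these
; delimiters, so those '=' signs are lost instead of being converted
j = s
foreach ss, s, k do j[k] = strjoin(strsplit(ss, '=', /extract), '=>')
print, j, format='(a)'      ; expected lines: a=>b, x, y
```

If you do need STRSPLIT for this kind of task, its PRESERVE_NULL keyword keeps zero-length fields and is worth testing against cases like these.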
