Data Structure Analysis

Anonym Thursday, February 12, 2015

One of the main questions anybody using a programming language has to ask themselves is "what data structures should I be using?" This can be a complicated and difficult question as there are many trade-offs to consider. If it is desired to have a dynamic data type, IDL provides multiple options. In this analysis we will consider: dynamic arrays, lists, hashes, and ordered hashes and their ability to insert and delete elements. To start, let's review our data structures. Dynamic arrays are based on the ability for IDL arrays to resize themselves. For example:

array = [1,2,3,4] ; Declaration

array = [array, 5] ; Insert

array = [array[0:1],array[3:*]] ; Remove

Lists use the IDL object LIST:

list = LIST(1,2,3,4) ; Declaration

list.add,5 ; Insert

list.remove, 2 ; Remove

Hashes use the IDL object HASH:

hash = HASH([1,2,3,4],[1,2,3,4]) ; Declaration

hash[5] = 5 ; Insert

hash.remove, 2 ; Remove

Ordered hashes use the IDL object ORDEREDHASH:

ohash = ORDEREDHASH([1,2,3,4],[1,2,3,4]) ; Declaration

ohash[5] = 5 ; Insert

ohash.remove, 2 ; Remove

For each data structure we will time how long it takes to insert and remove n elements (Note: the inserts/removals are done inside of a FOR loop, one insert/removal per iteration. This is done to simulate an application which expects a high degree of volatility in the use of their data structures. However, since IDL is a vectorized language, it is always best to try to group multiple operations into a single call). Please see the attached plots for the results of the runs. The results are what we would expect from a simple big-O analysis. Dynamic arrays are comparable for small input sizes, however, as soon as the size of the input grows, it becomes much faster to use a hash (either type) or a list. In terms of pure speed for any arbitrary input size, list is the fastest. However, if you know your input bounded to a few elements, all of the proposed data structures can offer a similar performance.

Note: For this analysis, all the data structures had similar performance up to 10,000 elements. This is in no way a comprehensive test and the results may differ on your system. However, the rule of thumb I follow is, if your input is less than 10,000 elements choose the data structure which is the easiest for you to work with.

New Features Coming Soon to IDL - With a Few Brief Examples The Real Value of Remotely Sensed Data