Data Structure Analysis
Anonym
One of the main questions anybody using a
programming language has to ask themselves is "what data structures should
I be using?" This can be a complicated and difficult question as there
are many trade-offs to consider. If it is desired to have a dynamic data type,
IDL provides multiple options. In this analysis we will consider: dynamic
arrays, lists, hashes, and ordered hashes and their ability to insert and
delete elements. To start, let's review our data structures. Dynamic arrays
are based on the ability for IDL arrays to resize themselves. For example:
array = [1,2,3,4] ; Declaration
array = [array, 5] ; Insert
array = [array[0:1],array[3:*]] ; Remove
Lists use the IDL object LIST:
list = LIST(1,2,3,4) ; Declaration
list.add,5 ; Insert
list.remove, 2 ; Remove
Hashes use the IDL object HASH:
hash = HASH([1,2,3,4],[1,2,3,4]) ; Declaration
hash[5] = 5 ; Insert
hash.remove, 2 ; Remove
Ordered hashes use the IDL object ORDEREDHASH:
ohash = ORDEREDHASH([1,2,3,4],[1,2,3,4]) ; Declaration
ohash[5] = 5 ; Insert
ohash.remove, 2 ; Remove
For each data structure we will time how long it takes to insert
and remove n elements (Note: the inserts/removals are done inside of a
FOR loop, one insert/removal per iteration. This is done to simulate an
application which expects a high degree of volatility in the use of their data
structures. However, since IDL is a vectorized language, it is always best to
try to group multiple operations into a single call). Please see the attached
plots for the results of the runs. The results are what we would expect from a
simple big-O analysis. Dynamic arrays are comparable for small input sizes,
however, as soon as the size of the input grows, it becomes much faster to use
a hash (either type) or a list. In terms of pure speed for any arbitrary input
size, list is the fastest. However, if you know your input bounded to a few
elements, all of the proposed data structures can offer a similar performance.
Note: For this analysis, all the data structures had similar
performance up to 10,000 elements. This is in no way a comprehensive test and
the results may differ on your system. However, the rule of thumb I follow is,
if your input is less than 10,000 elements choose the data structure which is
the easiest for you to work with.