X
PrevPrev Go to previous topic
NextNext Go to next topic
Last Post 31 Jan 2018 03:10 AM by  Niklas Keck
Performance Issues with OrderedHash
 2 Replies
Sort:
You are not authorized to post a reply.
Author Messages

Niklas Keck



New Member


Posts:15
New Member


--
16 Jan 2018 06:40 AM
    Good day,

    The time required to access the keys and values of an Orderedhash
    rises exponentially with the number of elements contained in the Orderedhash.
    This makes it unusable for large collections.

    The same is !not! true for ordinary Hash objects.
    With a hash the time required to access the keys and values
    rises approximatly linearly with the number of elements contained within the hash.
    This is what I would expect for an OrderedHash as well.

    Is this expected behaviour? Will this be fixed in future versions of IDL?
    If this is expected: Please make a notice in the documentation so others won't make the same mistake
    as I did: Using OrderedHash for large numbers of elements.

    ; =========================================================================================================
    ; Code to visualize the issue (CAUTION: May take some minutes to execute)
    ; =========================================================================================================
    lowest = 10L^4
    largest = 10L^5
    keys = INDGEN(largest)
    values = FINDGEN(largest)

    ; =========================================================================================================
    ; HASH
    ht = LIST()
    hi = LIST()
    FOR i=lowest,largest-1,1000 DO BEGIN &$
    h = HASH(keys[0:i],values[0:i]) &$
    t0=SYSTIME(/SECONDS) &$
    k = h.keys() &$
    ht.add,SYSTIME(/SECONDS)-t0 &$
    hi.add,i &$
    ENDFOR

    time_per_element = ht.toarray()/hi.toarray()
    number_of_elements = hi.toarray()
    plot(number_of_elements,time_per_element,XTITLE="number of elements in HASH",YTITLE="time required for each element",TITLE="HASH performance")

    ; =========================================================================================================
    ; OrderedHash
    oht = LIST()
    ohi = LIST()

    ; for orderedhash execute everything one magintude smaller
    FOR i=lowest*0.1,largest*0.1-1,1000*0.1 DO BEGIN &$
    h = ORDEREDHASH(keys[0:i],values[0:i]) &$
    t0=SYSTIME(/SECONDS) &$
    k = h.keys() &$
    oht.add,SYSTIME(/SECONDS)-t0 &$
    ohi.add,i &$
    ENDFOR
    time_per_element_oh = oht.toarray()/ohi.toarray()
    number_of_elements_oh = ohi.toarray()
    plot(number_of_elements_oh,time_per_element_oh,XTITLE="number of elements in ORDEREDHASH",YTITLE="time required for each element",TITLE="ORDEREDHASH performance")

    David Starbuck



    Basic Member


    Posts:143
    Basic Member


    --
    23 Jan 2018 02:25 PM
    The OrderedHash contains additional information (the order in which things are added the hash) that is not stored by a regular hash. Thus the OrderedHash object is less efficient than the HASH object. If you are trying to store a very large data set, I would recommend using the HASH instead of the OrderedHash object.

    David
    -HGS

    Niklas Keck



    New Member


    Posts:15
    New Member


    --
    31 Jan 2018 03:10 AM
    Dear David,
    thank you for the reply.
    I understand that an OrderedHash has to do additional work when retrieving its values / keys.
    The only thing that I do not understand is, why there is an exponential growth in work with a growing number of items for an OrderedHash as opposed to a linear growth for an ordinary Hash.
    I would be totally happy with an OrderedHash being lets say 10 times slower than a Hash but with any number of items.
    If this is not fixable, it would be great to have a note about this in the documentation.

    Have a great day
    Niklas



    Anyway, probably need to refacture my code than

    You are not authorized to post a reply.