On the dangers of keyword inheritance

Anonym Thursday, December 1, 2011

IDL gives you the tools to write highly efficient code very quickly. It also gives you many, many opportunities to write very, very bad code. Here's one problem:

"I love _EXTRA. I use it everywhere in my code to cut down on my typing, but my applications run more slowly and sometimes I run out of memory for reasons that I don't understand. What's up with that?"

Imagine a chain of routines. Here, A calls B, then B calls C.

pro C, FOO = foo, BAR = bar
   HELP, foo, bar
end 

pro B, Z = z, _EXTRA =_e2
   HELP, z
   C, _EXTRA = _e2
end 

pro A, _EXTRA = _e1
   B, _EXTRA = _e1
end

(NB: I've left out any error handling to keep the code simple. Remember to execute RETALL if you encounter any runtime errors.)

The routine A may be, say, a data ingest routine, B may be a data conditioning routine, and C may be a display routine. As my chain of procedures is called, each routine pulls off the keywords it needs, then passes anything unknown down to the next routine in the chain. (NB: We're ignoring the case where identical keywords may appear in multiple routines; that's a bookkeeping nightmare left for another day.)

In theory, I've made my life a lot easier by using _EXTRA to package up all unneeded keywords in one routine and pass them as a bundle down to the next routine in the chain. There's a lot less typing than if I'm explicit about including all the keywords in B that are used by C, or all those in A that are used by B and C. Look how snappy this is! I can use the keywords in any combination to affect the behavior of routines way down in my call chain:

IDL> A
IDL> A, FOO = 20
IDL> A, Z = 10, FOO = 20, BAR = 3

Wasn't that convenient? IDL's the best thing EVER!

Let's try something a little different now. What if FOO is a big array? This could, for example, be a VERT_COLORS array eventually sent to the SURFACE function or the AUXDATA keyword to IDLgrTessellator. (NB: The array size needed to test your system's limits depends heavily on your host's configuration, but the takeaway message is the same on all platforms. If this array size exhausts your memory, try something smaller.)

IDL> foo = FLTARR(10000, 10000)
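For a sense of scale, here is a back-of-the-envelope sketch of how much memory that one array occupies. It assumes only that FLTARR elements are 4-byte single-precision floats; the variable name nbytes is made up for illustration.

```idl
; FLTARR elements are 4-byte single-precision floats.
; Use 64-bit integers (LL suffix) so the product cannot overflow.
nbytes = 10000LL * 10000LL * 4LL
PRINT, nbytes                ; size of the array data in bytes
PRINT, nbytes / 1048576d     ; the same size in MiB (1 MiB = 1048576 bytes)
```

That works out to 400,000,000 bytes, so every additional copy of the array costs roughly another 400 MB.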

First, let's ignore the routines A and B for the moment and just pass the variable ‘foo’ to C instead of A.

IDL> C, FOO = foo

Okee dokee! No issues. Let's pass this array as a keyword that'll be passed silently through B but will be picked up in C.

IDL> A, FOO = foo

Did you run out of memory? If not, did you see an unexpected delay before the command prompt returned? Do you now think IDL is the worst thing ever? Since the routines A and B are essentially no-ops in this case, on the surface it seems like we're basically just calling

IDL> C, FOO = foo

That call alone is not slow and doesn't chew up our memory, so what's going on here? Let's throw a call to this new routine in each.

pro SHOWMEM
   routine = (SCOPE_TRACEBACK(/STRUCTURE))[-2].Routine
   PRINT, 'memory use at start of routine ' + $
      routine + ' = ' + STRTRIM((MEMORY())[0], 2)
end

This little utility is going to show us how much memory IDL has allocated when it's called from another routine, printing out the name of the routine as well. Add a call to this routine at the start of A, B, and C:

pro C, FOO = foo, BAR = bar
   SHOWMEM
   HELP, foo, bar
end 

pro B, Z = z, _EXTRA =_e2
   SHOWMEM
   HELP, z
   C, _EXTRA = _e2
end 

pro A, _EXTRA = _e1
   SHOWMEM
   B, _EXTRA = _e1
end

Call A again:

IDL> A, FOO = foo

Can you see what happened here? With each call, the memory allocated increased by essentially the size of our variable named ‘foo’. Zoinks! This isn't a coincidence.

Why is this? _EXTRA isn't just a keyword. It's a mechanism that copies the contents of unconsumed keywords in a procedure or function call into an anonymous structure. Another way to describe this is to say that _EXTRA passes data by value. Add some HELP calls to the routines to see the contents of our _EXTRA variable at each step.

pro C, FOO = foo, BAR = bar
   HELP, foo
end 

pro B, Z = z, _EXTRA =_e2
   HELP, /STRUCTURE, _e2
   C, _EXTRA = _e2
end 

pro A, _EXTRA = _e1
   HELP, /STRUCTURE, _e1
   B, _EXTRA = _e1
end

This time, let's execute it with an argument to the Z keyword that will be consumed by the routine B.

IDL> A, Z = 5, FOO = foo

First, notice that the contents of our _EXTRA variables, ‘_e1’ and ‘_e2’, are different in the scope of each routine. In A, both Z and FOO have been copied to the structure ‘_e1’. When routine B is called from A, before it executes any instructions in the procedure, IDL's interpreter extracts the contents of the _EXTRA structure into the named keywords B knows about (here, Z). It then packages the remaining keywords into a new _EXTRA structure, ‘_e2’, by copying their values.

The important thing to notice is that when we're in routine B with ‘_e2’, the variable ‘_e1’ still exists in the scope of routine A, which is waiting on the call stack. Therefore, ‘_e1.FOO’ is one copy of ‘foo’ in the $MAIN$ scope and ‘_e2.FOO’ is another copy of ‘foo’. When you finally make it into C, the local keyword variable ‘foo’ is yet another copy.

A solution that is useful in many cases is to simply replace _EXTRA in the procedure definitions with _REF_EXTRA, the "by-reference" mechanism for passing keywords.

pro C, FOO = foo, BAR = bar
   SHOWMEM
   HELP, foo, bar
end 

pro B, Z = z, _REF_EXTRA =_e2
   SHOWMEM
   C, _EXTRA = _e2
end 

pro A, _REF_EXTRA = _e1
   SHOWMEM
   B, _EXTRA = _e1
end

Call A again:

IDL> A, FOO = foo

Notice that memory didn't increase in any egregious way between the procedure calls. Don't you think IDL is the best thing ever again?

A general discussion on this topic is outside the scope of this blog post, but suffice it to say that _REF_EXTRA passes a list of variable names from one routine to the next rather than their contents. Via the _REF_EXTRA mechanism, IDL's interpreter knows that it needs to look up one or more levels in the call stack for the actual data storage.

You may find that you are limited in the extent to which _REF_EXTRA will save your hide, however, since not all of IDL's internal library routines are set up to use _REF_EXTRA. Also, since _REF_EXTRA is pass-by-reference, modifying the contents of the variable ‘foo’ in C would propagate back up through B and A to the variable ‘foo’ at the $MAIN$ level, which you may or may not intend. FOO is "read/write". This would not be the case if the variable were passed via _EXTRA: any changes to the copy of ‘foo’ in C would be discarded when the routine exits. FOO is "read only".
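To make the "read only" versus "read/write" distinction concrete, here is a minimal sketch, assuming it is saved as a file and run with .RUN. The routines SET_FOO, VIA_EXTRA, and VIA_REF_EXTRA and the variable x are hypothetical names invented for this illustration; they are not part of the A, B, C chain above.

```idl
; Hypothetical leaf routine: overwrites whatever it receives in FOO.
pro SET_FOO, FOO = foo
   foo = 42
end

; Forwards keywords by value: SET_FOO only ever sees a copy.
pro VIA_EXTRA, _EXTRA = e
   SET_FOO, _EXTRA = e
end

; Forwards keywords by reference: SET_FOO writes to the caller's variable.
pro VIA_REF_EXTRA, _REF_EXTRA = e
   SET_FOO, _EXTRA = e
end

; Main-level program
x = 0
VIA_EXTRA, FOO = x
HELP, x        ; x is unchanged (0): the change was made to a copy
VIA_REF_EXTRA, FOO = x
HELP, x        ; x is now 42: the change came back up the call chain
end
```

If a downstream routine must never alter the caller's data, that is an argument for keeping _EXTRA on that particular link in the chain and accepting the cost of the copy.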
