Reliable Scalar vs Array Determination

Anonym Thursday, March 31, 2016

In the past I’ve blogged about identifying undefined variables vs !NULL and about the perils of overriding IDL_Object methods. Today, in a related vein, I want to talk about how to identify whether a variable is a scalar or an array.

One may wonder why you would care, but there are certain situations where a scalar value and a 1-element array behave the same, and there are other situations where they don’t. A very simple case is with the comparison operators and WHERE clauses:

IDL> a = IndGen(10)
IDL> Where(a GT 5)
6 7 8 9
IDL> Where(a GT [5])
-1

When I use a scalar threshold in the GT comparison, I get the expected results. If that threshold is turned into a 1-element array, then I get -1 signaling no good values. If we look at what the comparison expression inside the WHERE clause returns, we’ll see why:

IDL> a GT 5
0 0 0 0 0 0 1 1 1 1
IDL> a GT [5]
0

The scalar comparison returns an array of Boolean values with the same dimensions as the “a” variable, while the array comparison returns only a single value. This is because the GT comparison operator operates on subsets of the operands based on the smaller dimension:

IDL> a GT [0]
   0
IDL> a GT [-1]
   1
IDL> a GT [-1, 0, 1]
   1   1   1
IDL> a GT [-1, 0, 2]
   1   1   0

The GT operator is vectorized, and performs pairwise comparisons on the values at each corresponding index, up to the smaller maximum index. This behavior is easy to identify when we use numeric literals, but if we were using a variable instead, we would have to inspect that variable to know which behavior we would get. One common source of 1-element arrays in my work is the WHERE clause itself: the -1 failure return value is a scalar, but a successful return is always an array, be it one element or many. A simple solution is to always index your variables at index 0 to make them scalar, but if you truly want an array then you have to be careful where you use that index to “scalarify” the variable.

IDL> a = 5
IDL> b = [a]
IDL> help, a, b, a[0], b[0]
A               INT       =        5
B               INT       = Array[1]
<Expression>    INT       =        5
<Expression>    INT       =        5
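
As a minimal sketch of that “scalarify” pattern applied to a WHERE result, the procedure below reuses the comparison from the start of this post. The procedure name and the variables idx, count, and threshold are only illustrative.

```idl
pro scalarify_example
  compile_opt idl2

  a = IndGen(10)

  ; Exactly one value matches, so Where() hands back a 1-element array
  idx = Where(a EQ 5, count)

  if (count gt 0) then begin
    ; Indexing with a 1-element array yields a 1-element array
    threshold = a[idx]
    help, threshold

    ; Array vs 1-element array: same situation as Where(a GT [5]) above
    print, Where(a GT threshold)

    ; Indexing at [0] makes the threshold a scalar again,
    ; which is the same situation as Where(a GT 5) above
    print, Where(a GT threshold[0])
  endif
end
```

Checking count before indexing keeps the -1 failure value from being used as an index by accident.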

List and Hash Woes

Another common place where the difference between a scalar and a 1-element array causes mistakes is when working with List and Hash objects. When you use the square bracket index operator with a scalar value, it returns the element that corresponds to that index, but if you use an array, it returns a new List or Hash containing copies of the corresponding element(s):

IDL> l = List(1,2,3,4)
IDL> help, l[2]
<Expression>    INT       =        3
IDL> help, l[[2]]
<Expression>    LIST <ID=10 NELEMENTS=1>
IDL> print, l[[2]]
       3

This makes a big difference, and the mistake is easy to make if you use a WHERE clause or the List or Hash Where() method, as they always return arrays, even in the 1-element case. This behavior is documented towards the bottom of the List and Hash help pages, in the discussion of how to set or get a single element versus many.
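
To make that concrete, here is a small sketch using the List Where() method on the same list as above. The variable name idx is just for illustration.

```idl
l = List(1,2,3,4)

; Where() gives back an array of matching indices, even for a single match
idx = l.Where(3)
help, idx

; Array index: a new List comes back
help, l[idx]

; Scalar index: the element itself comes back
help, l[idx[0]]
```

If there is any chance that nothing matches, test N_Elements(idx) before using idx[0].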

Identifying a Scalar

So now that we have covered why it is important to know whether a given variable is a scalar or an array, the question arises of how to do this reliably. The classic answer is to test the return value of N_ELEMENTS() or to call ISA(/SCALAR), but there are times when this will yield incorrect results. With the introduction of IDL_Object in 8.0, classes can inherit and override the _overloadSize() method, which is just what List and Hash objects do. When you call N_ELEMENTS() or even SIZE(/DIMENSIONS) on a List or Hash, you get back the number of elements in the object, not the number of objects in your variable.

IDL> l = List(1,2,3,4)
IDL> N_Elements(l)
           4
IDL> Size(l, /DIMENSIONS)
           4
IDL> Size(l, /N_DIMENSIONS)
           1
IDL> ISA(l, /SCALAR)
   0

This is even more confusing with an empty List or Hash:

IDL> l = List()
IDL> N_Elements(l)
           0
IDL> Size(l, /DIMENSIONS)
           0
IDL> Size(l, /N_DIMENSIONS)
           0
IDL> ISA(l, /SCALAR)
   1

As I showed in the blog about IDL_Object caveats, you can scope a function call to the base class to get the expected result, but it is awkward looking:

IDL> l.IDL_Object::_overloadSize()
1

I’ve had to deal with this over and over again, and the best solution I’ve come up with relies on the fact that the Obj_Valid() function returns a Boolean value with the same dimensions as the input argument. So if I pass in a scalar object, a scalar Boolean is returned; a 1-element array of objects returns a 1-element array of Booleans; and an N-dimensional array of objects returns an N-dimensional array of Booleans. I can then take the return of Obj_Valid() and pass it into ISA(/SCALAR). Thanks to a ternary operator, this can be coded up as a one-line function:

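function isScalar, value
  compile_opt idl2

  ; Type code 11 is an object reference. For objects, test the Obj_Valid()
  ; result instead, since it has the same dimensions as the variable itself.
  return, ISA((Size(value, /TYPE) eq 11) $
              ? Obj_Valid(value) : value, $
              /SCALAR)
end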
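
A quick way to try the function out, assuming it has been compiled, is something like the calls below. Following the reasoning above, the first and third calls should report a scalar (1) and the other two should not (0).

```idl
print, isScalar(5)            ; scalar number
print, isScalar([5])          ; 1-element array
print, isScalar(List(1,2,3))  ; one List object that holds three elements
print, isScalar(ObjArr(2))    ; 2-element array of null objects
```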

I have tested this with scalar, 1-element, and multi-element arrays of numbers, strings, pointers (null and valid), List and Hash objects, and normal objects (null and valid). The only time you might get an unexpected result from this function is with a “scalar struct” variable, which is a construct that IDL doesn’t support. Even if you create a single anonymous struct with only one tag, it is considered a 1-element array:

IDL> s = { foo : 1 }
IDL> N_Elements(s)
           1
IDL> Size(s, /DIMENSIONS)
           1
IDL> Size(s, /N_DIMENSIONS)
           1
IDL> ISA(s, /SCALAR)
   0

So if you need to differentiate between a single struct and an array of many structs, then testing the N_ELEMENTS() return is the way to go, and could be added to the isScalar() function.
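
One possible way to fold that in is sketched below. The function name isScalarOrSingleStruct is hypothetical, and treating a 1-element struct array as “scalar” is an assumption about what you want, not something IDL itself defines.

```idl
function isScalarOrSingleStruct, value
  compile_opt idl2

  ; Type code 8 is a structure. IDL always reports a struct as an array,
  ; so fall back to counting elements (assumption: one struct counts as scalar).
  if (Size(value, /TYPE) eq 8) then return, N_Elements(value) eq 1

  ; Everything else goes through the same test as isScalar() above
  return, ISA((Size(value, /TYPE) eq 11) $
              ? Obj_Valid(value) : value, $
              /SCALAR)
end
```

Note that this cannot tell a single struct apart from a deliberate 1-element array of structs, since IDL represents both the same way.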

Note: I could also have used Obj_ISA(), but timing tests revealed that Obj_Valid() is faster.
