Reliable Scalar vs Array Determination
In the past I’ve blogged about identifying undefined variables vs !NULL
and the perils of overriding IDL_Object methods. Today, in a related vein, I want to talk about how to identify whether a variable
is a scalar or an array.
You may wonder why this matters. There are certain situations
where a scalar value and a 1-element array behave the same, and there are other
situations where they don’t. A very simple case is with the comparison
operators and WHERE clauses:
IDL> a = IndGen(10)
IDL> Where(a GT 5)
6 7 8 9
IDL> Where(a GT [5])
-1
When I use a scalar threshold in the GT comparison, I get
the expected results. If that threshold is turned into a 1-element array, then
I get -1, signaling that nothing matched. If we look at what the comparison
expression inside the WHERE clause returns, we’ll see why:
IDL> a GT 5
0 0 0 0 0 0 1 1 1 1
IDL> a GT [5]
0
The scalar comparison returns an array of Boolean values with the
same dimensions as the “a” variable, while the array comparison returns only a
single value. This is because the GT comparison operator only compares as many
elements as the smaller operand contains:
IDL> a GT [0]
0
IDL> a GT [-1]
1
IDL> a GT [-1, 0, 1]
1 1 1
IDL> a GT [-1, 0, 2]
1 1 0
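Counting the elements of each result is another way to see the truncation. This is only a small sketch that reuses the same “a” variable and thresholds as above:

```idl
a = IndGen(10)
; A scalar threshold is compared against every element of a.
print, N_Elements(a GT 5)           ; 10
; An array threshold limits the result to the shorter operand's length.
print, N_Elements(a GT [5])         ; 1
print, N_Elements(a GT [-1, 0, 2])  ; 3
```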
The GT operator is vectorized and performs pairwise
comparisons on the values at each corresponding index, up to the smaller
maximum index. This behavior is easy to spot when we use numeric literals,
but when the operand is a variable you have to inspect that
variable to be sure you get the desired behavior. One common source of 1-element arrays in
my work is the WHERE clause itself: the -1 failure return value is a scalar,
but a successful return is always an array, be it one element or many. A
simple solution to this is to always index your variables at index 0 to make
them scalar, but if you truly want an array then you have to be careful where
you use that index to “scalarify” the variable.
IDL> a = 5
IDL> b = [a]
IDL> help, a, b, a[0], b[0]
A INT = 5
B INT = Array[1]
<Expression> INT = 5
<Expression> INT = 5
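As a small sketch of how this plays out with a WHERE result, the lines below feed a single match back into a second comparison. The variable names idx and count are only illustrative:

```idl
a = IndGen(10)
; One match, but Where() still returns a 1-element array.
idx = Where(a EQ 5, count)
; a[idx] is a 1-element array, so GT only compares the first element of a.
print, Where(a GT a[idx])       ; -1
; Indexing with idx[0] gives a scalar threshold and the full comparison.
print, Where(a GT a[idx[0]])    ; 6 7 8 9
```

Checking count before indexing with idx[0] avoids treating the -1 failure value as a real index.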
List and Hash Woes
Another common place where the scalar vs 1-element array
difference causes mistakes is when working with List and Hash objects. When you use the
square bracket index operator with a scalar value, it returns the element that
corresponds to that index, but if you use an array, it returns a new List or
Hash containing copies of the appropriate element(s):
IDL> l = List(1,2,3,4)
IDL> help, l[2]
<Expression> INT = 3
IDL> help, l[[2]]
<Expression> LIST <ID=10 NELEMENTS=1>
IDL> print, l[[2]]
3
This makes a big difference, and it is an easy mistake to make if you use a
WHERE clause or the List or Hash Where() method, as they always return arrays,
even in the 1-element case. This behavior is documented towards the bottom of
the List and Hash help pages, where they describe how to set or get a single
element versus many.
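A minimal sketch of the pitfall, assuming the List Where() method returns an array of matching indices as described above (the variable name idx is illustrative):

```idl
l = List(1, 2, 3, 4)
; Where() finds the value 3 at index 2, but returns it as a 1-element array.
idx = l.Where(3)
; Array index: the result is a new List holding one element.
help, l[idx]
; Scalar index: the result is the stored element itself.
help, l[idx[0]]
```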
Identifying a Scalar
So now that we have covered why it is important to know whether
a given variable is a scalar or an array, the question arises of how to
reliably do this. The classic answer is to test the return value of N_ELEMENTS()
or to call ISA(/SCALAR), but there are times when this will yield incorrect
results. With the introduction of IDL_Object in 8.0, classes can inherit and
override the _overloadSize() method, which is just what List and Hash objects do. When you call
N_ELEMENTS() or even SIZE(/DIMENSIONS) on a List or Hash, you get back the
number of elements in the object, not the number of objects in your variable.
IDL> l = List(1,2,3,4)
IDL> N_Elements(l)
4
IDL> Size(l, /DIMENSIONS)
4
IDL> Size(l, /N_DIMENSIONS)
1
IDL> ISA(l, /SCALAR)
0
This is even more confusing with an empty List or Hash:
IDL> l = List()
IDL> N_Elements(l)
0
IDL> Size(l, /DIMENSIONS)
0
IDL> Size(l, /N_DIMENSIONS)
0
IDL> ISA(l, /SCALAR)
1
As I showed in the blog post about IDL_Object caveats, you can
scope a function call to the base class to get the expected result, but it is
awkward-looking:
IDL> l.IDL_Object::_overloadSize()
1
I’ve had to deal with this over and over again, and the best
solution I’ve come up with relies on the fact that the Obj_Valid() function
will return a Boolean value with the same dimensions as the input argument. So
if I pass in a scalar object, a scalar Boolean is returned; a 1-element array
of objects returns a 1-element array of Booleans; and an N-dimensional array of
objects returns an N-dimensional array of Booleans. I can then take the return
of Obj_Valid() and pass it into ISA(/SCALAR). Thanks to the ternary operator, this can be coded up as a one-line
function:
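function isScalar, value
  compile_opt idl2
  ; Object references (type code 11) may overload size queries, so test
  ; the shape of Obj_Valid()'s result instead of the object itself.
  return, ISA((Size(value, /TYPE) eq 11) $
    ? Obj_Valid(value) : value, $
    /SCALAR)
end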
I have tested this with scalar, 1-element, and multi-element
arrays of numbers, strings, pointers (null and valid), List and Hash objects,
and normal objects (null and valid). The only time you might get an unexpected
result from this function is with a “scalar struct” variable, which is a
construct that IDL doesn’t support. Even if you create a single anonymous
struct with only one tag, it is considered a 1-element array:
IDL> s = { foo : 1 }
IDL> N_Elements(s)
1
IDL> Size(s, /DIMENSIONS)
1
IDL> Size(s, /N_DIMENSIONS)
1
IDL> ISA(s, /SCALAR)
0
So if you need to differentiate between a single struct and
an array of many structs, then testing the N_ELEMENTS() return is the way to
go, and could be added to the isScalar() function.
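One way that addition might look is sketched below. The function name isScalarOrSingleStruct is hypothetical, and the struct branch simply applies the N_ELEMENTS() test described above before falling back to the same Obj_Valid() logic:

```idl
; Hypothetical variant of isScalar() that also reports a single struct as scalar.
function isScalarOrSingleStruct, value
  compile_opt idl2
  ; Structs (type code 8) are always arrays, so use the element count instead.
  if (Size(value, /TYPE) eq 8) then return, N_Elements(value) eq 1
  ; Object references (type code 11): test the shape of Obj_Valid()'s result.
  return, ISA((Size(value, /TYPE) eq 11) ? Obj_Valid(value) : value, /SCALAR)
end
```

With this sketch, { foo : 1 } is reported as scalar while Replicate({ foo : 1 }, 3) is not. A 1-element array of structs is still indistinguishable from a single struct, since IDL treats them the same way.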
Note: I could also have used Obj_ISA(), but timing tests
revealed that Obj_Valid() is faster.