CMSET_OP
Name
CMSET_OP
Author
Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770
craigm@lheamail.gsfc.nasa.gov
Purpose
Performs an AND, OR, or XOR operation between two sets
Calling Sequence
SET = CMSET_OP(A, OP, B)
Description
CMSET_OP performs three common operations between two sets. The
three supported values of OP are:
OP Meaning
'AND' - to find the intersection of A and B;
'OR' - to find the union of A and B;
'XOR' - to find those elements that are members of A or B
but not both.
Sets as defined here are one dimensional arrays composed of
numeric or string types. Comparisons of equality between elements
are done using the IDL EQ operator.
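As a small illustration of the three operations, the following sketch
uses two made-up integer arrays (the variable names and values are
assumptions for this example only). It assumes CMSET_OP is compiled or
on the IDL path.

```idl
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
; Intersection: the elements common to A and B (3 and 4)
print, cmset_op(a, 'AND', b)
; Union: every element that appears in A or B (1 through 6)
print, cmset_op(a, 'OR', b)
; Exclusive or: elements in exactly one of the sets (1, 2, 5 and 6)
print, cmset_op(a, 'XOR', b)
```

Since element order may not be preserved, the comments list only which
elements each result contains, not the order in which they print.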
The complements of either set can be taken as well, by using the
NOT1 and NOT2 keywords. For example, it may be desirable to find
the elements in A but not B, or B but not A (they are different!).
The following IDL expressions achieve each of those effects:
SET = CMSET_OP(A, 'AND', /NOT2, B) ; A but not B
SET = CMSET_OP(/NOT1, A, 'AND', B) ; B but not A
Note the distinction between NOT1 and NOT2. NOT1 refers to the
first set (A) and NOT2 refers to the second (B). Their ordered
placement in the calling sequence is entirely optional, but the
above ordering makes the logical meaning explicit.
NOT1 and NOT2 can only be set for the 'AND' operator, and never
simultaneously. This is because the results of an operation with
'OR' or 'XOR' and any combination of NOTs -- or with 'AND' and
both NOTs -- formally cannot produce a defined result.
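A worked instance of the two complement forms may help. The arrays and
the output variable names (ONLY_A, ONLY_B, NA_ONLY, NB_ONLY) are
assumptions chosen for illustration; the COUNT keyword is used to guard
against an empty result.

```idl
a = [1, 2, 3, 4]
b = [3, 4, 5, 6]
; A but not B: the result contains the elements 1 and 2
only_a = cmset_op(a, 'AND', /not2, b, count=na_only)
; B but not A: the result contains the elements 5 and 6
only_b = cmset_op(/not1, a, 'AND', b, count=nb_only)
; Print each result only when it is not the empty set
if na_only gt 0 then print, only_a
if nb_only gt 0 then print, only_b
```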
The implementation depends on the type of operands. For integer
types, a fast technique using HISTOGRAM is used. However, this
algorithm becomes inefficient when the dynamic range in the data
is large. For those cases, and for other data types, a technique
based on SORT() is used. Thus the compute time should scale
roughly as (NA+NB)*ALOG(NA+NB) or better, rather than (NA*NB) for
the brute force approach, where NA and NB are the numbers of
elements in A and B. For large arrays this is a significant
benefit.
Inputs
A, B - the two sets to be operated on. Each is a one-dimensional
array of either numeric or string type. A and B must be of the
same type. Empty sets are permitted, and are represented either
as an undefined variable or by setting EMPTY1 or EMPTY2.
OP - a string, the operation to be performed. Must be one of
'AND', 'OR' or 'XOR' (lower or mixed case is permitted).
Other operations will cause an error message to be produced.
Keywords
NOT1, NOT2 - if set and OP is 'AND', then the complement of A (for
NOT1) or B (for NOT2) will be used in the operation.
NOT1 and NOT2 cannot be set simultaneously.
EMPTY1, EMPTY2 - if set, then A (for EMPTY1) or B (for EMPTY2) is
assumed to be the empty set. The actual values
passed as A or B are then ignored.
INDEX - if set, then return a list of indices instead of the array
values themselves. The "slower" set operations are always
performed in this case.
The indices refer to the *combined* array [A,B]. To
clarify, in the following call: I = CMSET_OP(..., /INDEX);
returned values from 0 to NA-1 refer to A[I], and values
from NA to NA+NB-1 refer to B[I-NA].
COUNT - upon return, the number of elements in the result set.
This is only important when the result set is the empty
set, in which case COUNT is set to zero.
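The INDEX and COUNT keywords can be combined as in the sketch below.
It is only an illustration: the procedure name CMSET_OP_INDEX_DEMO, the
arrays, and the local variable names are assumptions made for this
example. Because the description above does not say whether the indices
returned for 'AND' point into A or into B, the sketch handles both
ranges.

```idl
pro cmset_op_index_demo
  a = [10, 20, 30, 40]
  b = [30, 40, 50]
  na = n_elements(a)

  ; Ask for indices into the combined array [A,B] instead of values
  i = cmset_op(a, 'AND', b, /index, count=ct)

  ; COUNT is the only reliable test for the empty set
  if ct eq 0 then begin
    print, 'Intersection is empty'
    return
  endif

  ; The indices refer to the combined array [A,B]
  ab = [a, b]
  print, 'Values via combined array: ', ab[i]

  ; Indices below NA refer to A directly
  wa = where(i lt na, nwa)
  if nwa gt 0 then print, 'From A: ', a[i[wa]]

  ; Indices from NA upward refer to B after subtracting NA
  wb = where(i ge na, nwb)
  if nwb gt 0 then print, 'From B: ', b[i[wb] - na]
end
```

Checking COUNT before indexing avoids treating the -1L empty-set
return value as a valid index.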
Returns
The resulting set as a one-dimensional array. The set may be
represented by either an array of data values (default), or an
array of indices (if INDEX is set). Duplicate elements, if any,
are removed, and element order may not be preserved.
The empty set is represented as a return value of -1L, and COUNT
is set to zero. Note that the only way to recognize the empty set
is to examine COUNT.
See Also
SET_UTILS.PRO by RSI
Modification History
Written, CM, 23 Feb 2000
Added empty set capability, CM, 25 Feb 2000
Documentation clarification, CM 02 Mar 2000
Incompatible but more consistent reworking of EMPTY keywords, CM,
04 Mar 2000
Minor documentation clarifications, CM, 26 Mar 2000
Corrected bug in empty_arg special case, CM 06 Apr 2000
Add INDEX keyword, CM 31 Jul 2000
Clarify INDEX keyword documentation, CM 06 Sep 2000
Made INDEX keyword always force SLOW_SET_OP, CM 06 Sep 2000
Added CMSET_OP_UNIQ, and ability to select FIRST_UNIQUE or
LAST_UNIQUE values, CM, 18 Sep 2000
Removed FIRST_UNIQUE and LAST_UNIQUE, and streamlined
CMSET_OP_UNIQ until problems with SORT can be understood, CM, 20
Sep 2000 (thanks to Ben Tupper)
Still trying to get documentation of INDEX and NOT right, CM, 28
Sep 2000 (no code changes)
Correct bug for AND case, when input sets A and B each only have
one unique value, and the values are equal. CM, 04 Mar 2004
(thanks to James B. jbattat at cfa dot harvard dot edu)
Add support for the cases where the input data types are mixed,
but still compatible; also, attempt to return the same data
type that was passed in; CM, 05 Feb 2005
Fix bug in type checking (thanks to "marit"), CM, 10 Dec 2005
Work around a stupidity in the built-in IDL HISTOGRAM routine,
which tries to "help" you by restricting the MIN/MAX to the
range of the input variable (thanks to Will Maddox), CM, 16 Jan 2006