## Name

CMSET_OP

## Author

Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770

craigm@lheamail.gsfc.nasa.gov

## Purpose

Performs an AND, OR, or XOR operation between two sets

## Calling Sequence

SET = CMSET_OP(A, OP, B)

## Description

CMSET_OP performs three common operations between two sets. The three supported values of OP and their meanings are:

'AND' - to find the intersection of A and B;

'OR' - to find the union of A and B;

'XOR' - to find those elements that are members of A or B but not both.
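As an illustration of the three operators, here is a minimal sketch using made-up integer sets (the variable names and values are hypothetical, not from the original documentation). Element order in the results is not guaranteed, so the comments list membership only.

```idl
; Hypothetical example sets; any one-dimensional arrays of one type would do.
a = [1, 2, 3, 4, 5]
b = [4, 5, 6, 7]

both    = CMSET_OP(a, 'AND', b)   ; intersection: holds 4 and 5
either  = CMSET_OP(a, 'OR',  b)   ; union: holds 1 through 7, each once
onlyone = CMSET_OP(a, 'XOR', b)   ; in exactly one set: holds 1, 2, 3, 6, 7

PRINT, both
PRINT, either
PRINT, onlyone
```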

Sets as defined here are one-dimensional arrays composed of numeric or string types. Comparisons of equality between elements are done using the IDL EQ operator.

The complement of either set can be taken as well, by using the NOT1 and NOT2 keywords. For example, it may be desirable to find the elements in A but not B, or in B but not A (they are different!). The following IDL expressions achieve each of those effects:

SET = CMSET_OP(A, 'AND', /NOT2, B) ; A but not B

SET = CMSET_OP(/NOT1, A, 'AND', B) ; B but not A

Note the distinction between NOT1 and NOT2: NOT1 refers to the first set (A) and NOT2 refers to the second (B). Where the keywords are placed in the calling sequence is entirely optional, but the ordering above makes the logical meaning explicit.

NOT1 and NOT2 can only be set for the 'AND' operator, and never simultaneously. This is because an operation with 'OR' or 'XOR' and any combination of NOTs, or with 'AND' and both NOTs, formally cannot produce a defined result: the result would have to include elements that belong to neither A nor B, and those cannot be enumerated.
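The two set differences can be tried on small inputs. This sketch reuses the call forms given in this section. The sample arrays are hypothetical and the comments list membership only.

```idl
; Hypothetical example sets.
a = [1, 2, 3, 4, 5]
b = [4, 5, 6, 7]

; A AND (NOT B): elements of a that do not appear in b.
a_not_b = CMSET_OP(a, 'AND', /NOT2, b)   ; holds 1, 2, 3

; (NOT A) AND B: elements of b that do not appear in a.
b_not_a = CMSET_OP(/NOT1, a, 'AND', b)   ; holds 6, 7

PRINT, a_not_b
PRINT, b_not_a
```

Together the two differences hold the same elements as the 'XOR' result for the same inputs.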

The implementation depends on the type of the operands. For integer types, a fast technique using HISTOGRAM is used. However, this algorithm becomes inefficient when the dynamic range of the data is large. For those cases, and for other data types, a technique based on SORT() is used. Thus the compute time should scale roughly as (A+B)*ALOG(A+B) or better, where A and B here denote the number of elements in each set, rather than as (A*B) for the brute force approach. For large arrays this is a significant benefit.
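To make the HISTOGRAM idea concrete, here is a minimal sketch of a histogram-based intersection for integer sets. It is not the CMSET_OP source; the function name HIST_INTERSECT is hypothetical, and the code only illustrates the general technique and why a large dynamic range hurts.

```idl
; Sketch only: HIST_INTERSECT is a hypothetical name, not part of CMSET_OP.
; Assumes a and b are non-empty arrays of an integer type.
FUNCTION HIST_INTERSECT, a, b, COUNT=count
  count = 0L

  ; Only the overlapping value range can hold common elements.
  lo = MIN(a, MAX=amax) > MIN(b, MAX=bmax)
  hi = amax < bmax
  IF hi LT lo THEN RETURN, -1L

  ; One bin per integer value in [lo, hi] (default BINSIZE is 1).
  ; The bin count grows with (hi - lo), which is why a large
  ; dynamic range makes this approach inefficient.
  ha = HISTOGRAM([a], MIN=lo, MAX=hi)
  hb = HISTOGRAM([b], MIN=lo, MAX=hi)

  ; A value is common to both sets when both of its bins are populated.
  wh = WHERE((ha NE 0) AND (hb NE 0), count)
  IF count EQ 0 THEN RETURN, -1L

  ; Convert bin numbers back to data values.
  RETURN, wh + lo
END
```

After compiling, `PRINT, HIST_INTERSECT([1, 2, 3, 4, 5], [4, 5, 6, 7])` should list the values 4 and 5. Each value is reported once however often it occurs in the inputs, because a bin is either populated or not.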

## Inputs

A, B - the two sets to be operated on. Each is a one-dimensional array of either numeric or string type. A and B should be of the same type (mixed but compatible types are also accepted as of the 05 Feb 2005 revision). Empty sets are permitted, and are represented either as an undefined variable or by setting EMPTY1 or EMPTY2.

OP - a string, the operation to be performed. Must be one of 'AND', 'OR' or 'XOR' (lower or mixed case is permitted). Other operations will cause an error message to be produced.

## Keywords

NOT1, NOT2 - if set and OP is 'AND', then the complement of A (for NOT1) or B (for NOT2) will be used in the operation. NOT1 and NOT2 cannot be set simultaneously.

EMPTY1, EMPTY2 - if set, then A (for EMPTY1) or B (for EMPTY2) is assumed to be the empty set. The actual values passed as A or B are then ignored.

INDEX - if set, then return an array of indices instead of the array values themselves. The "slower" set operations are always performed in this case.

The indices refer to the *combined* array [A,B]. To clarify, in the call I = CMSET_OP(..., /INDEX), with NA and NB the numbers of elements in A and B, returned values from 0 to NA-1 refer to A[I], and values from NA to NA+NB-1 refer to B[I-NA].

COUNT - upon return, the number of elements in the result set. This is only important when the result set is the empty set, in which case COUNT is set to zero.
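The INDEX and COUNT keywords can be combined as in the sketch below, which splits the returned indices by the set they came from. The sample arrays and the names `from_a` and `from_b` are hypothetical. The code is written as a main-level program (note the final END), because block statements cannot be entered line by line at the IDL prompt.

```idl
; Hypothetical example sets.
a  = [1, 2, 3, 4, 5]
b  = [4, 5, 6, 7]
na = N_ELEMENTS(a)

; Indices into the combined array [a, b] for the XOR result.
i = CMSET_OP(a, 'XOR', b, /INDEX, COUNT=ct)

IF ct GT 0 THEN BEGIN
  ; Values recovered from the combined array.
  ab   = [a, b]
  vals = ab[i]

  ; Indices below NA point into a; the rest point into b, offset by NA.
  wa = WHERE(i LT na, nfroma, COMPLEMENT=wb, NCOMPLEMENT=nfromb)
  IF nfroma GT 0 THEN from_a = a[i[wa]]
  IF nfromb GT 0 THEN from_b = b[i[wb] - na]
ENDIF ELSE BEGIN
  PRINT, 'Result is the empty set'
ENDELSE

END
```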

## Returns

The resulting set as a one-dimensional array. The set may be represented by either an array of data values (default), or an array of indices (if INDEX is set). Duplicate elements, if any, are removed, and element order may not be preserved.

The empty set is represented as a return value of -1L, and COUNT is set to zero. Note that the only way to recognize the empty set is to examine COUNT.
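A short sketch of the recommended check follows, using hypothetical sample arrays. Testing COUNT rather than the return value matters because -1 can also be a legitimate element of a set.

```idl
; Hypothetical disjoint sets: their intersection is empty.
a = [1, 2, 3]
b = [4, 5, 6]

common = CMSET_OP(a, 'AND', b, COUNT=ct)

; Test COUNT, not the value of COMMON, to detect the empty set.
IF ct EQ 0 THEN PRINT, 'No common elements' $
ELSE PRINT, 'Common elements: ', common

; With EMPTY2 set, the second argument is ignored and treated as empty,
; so this union holds just the unique elements of a.
u = CMSET_OP(a, 'OR', /EMPTY2, b, COUNT=nu)
PRINT, nu
```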

## See Also

SET_UTILS.PRO by RSI

## Modification History

Written, CM, 23 Feb 2000

Added empty set capability, CM, 25 Feb 2000

Documentation clarification, CM 02 Mar 2000

Incompatible but more consistent reworking of EMPTY keywords, CM, 04 Mar 2000

Minor documentation clarifications, CM, 26 Mar 2000

Corrected bug in empty_arg special case, CM 06 Apr 2000

Add INDEX keyword, CM 31 Jul 2000

Clarify INDEX keyword documentation, CM 06 Sep 2000

Made INDEX keyword always force SLOW_SET_OP, CM 06 Sep 2000

Added CMSET_OP_UNIQ, and ability to select FIRST_UNIQUE or LAST_UNIQUE values, CM, 18 Sep 2000

Removed FIRST_UNIQUE and LAST_UNIQUE, and streamlined CMSET_OP_UNIQ until problems with SORT can be understood, CM, 20 Sep 2000 (thanks to Ben Tupper)

Still trying to get documentation of INDEX and NOT right, CM, 28 Sep 2000 (no code changes)

Correct bug for AND case, when input sets A and B each only have one unique value, and the values are equal. CM, 04 Mar 2004 (thanks to James B. jbattat at cfa dot harvard dot edu)

Add support for the cases where the input data types are mixed, but still compatible; also, attempt to return the same data type that was passed in; CM, 05 Feb 2005

Fix bug in type checking (thanks to "marit"), CM, 10 Dec 2005

Work around a stupidity in the built-in IDL HISTOGRAM routine, which tries to "help" you by restricting the MIN/MAX to the range of the input variable (thanks to Will Maddox), CM, 16 Jan 2006