The IMSL_SORTDATA function sorts observations by specified keys, with the option to tally cases into a multiway frequency table.

This routine requires an IDL Advanced Math and Stats license. For more information, contact your sales or technical support representative.

The IMSL_SORTDATA function can perform a key sort, a tabulation of frequencies into a multiway frequency table, or both.

Sorting

The IMSL_SORTDATA function sorts the rows of real matrix x using particular columns in x as the keys. The sort is algebraic with the first key as the most significant, the second key as the next most significant, etc. When x is sorted in ascending order, the resulting sorted array is such that the following is true:

  • For i = 0, 1, ..., N_ELEMENTS (x(*, 0)) – 2, x(i, INDICES_KEYS(0)) ≤ x(i + 1, INDICES_KEYS(0))
  • For k = 1, ..., n_keys – 1, if x(i, INDICES_KEYS(j)) = x(i + 1, INDICES_KEYS(j)) for j = 0, 1, ..., k – 1, then x(i, INDICES_KEYS(k)) ≤ x(i + 1, INDICES_KEYS(k))

The observations can also be sorted in descending order.

The rows of x containing the missing value code NaN in at least one of the specified columns are considered as an additional group. These rows are moved to the end of the sorted x.
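
The ascending-order condition can be checked directly: two adjacent rows are in order if, on the first key where they differ, the earlier row has the smaller value. The following is a minimal sketch in plain IDL. The helper name is_key_sorted is hypothetical and not part of the library, and the sketch assumes that any rows with NaN in a key column have been excluded first.

```idl
; Hypothetical helper, not part of IMSL: returns 1 if the rows of x are in
; ascending order on the given key columns, 0 otherwise.
; Assumes x is laid out as in the examples (x[i, j] = observation i, column j)
; and that the key columns contain no NaN values.
FUNCTION is_key_sorted, x, keys
  n_obs = (SIZE(x, /DIMENSIONS))[0]
  FOR i = 0L, n_obs - 2 DO BEGIN
    FOR k = 0, N_ELEMENTS(keys) - 1 DO BEGIN
      a = x[i, keys[k]]
      b = x[i + 1, keys[k]]
      IF a GT b THEN RETURN, 0   ; rows i and i+1 are out of order
      IF a LT b THEN BREAK       ; ordered on this key; later keys are irrelevant
      ; a EQ b: tie on this key, so compare on the next key
    ENDFOR
  ENDFOR
  RETURN, 1
END
```

For a NaN-free matrix x, a call such as PRINT, is_key_sorted(IMSL_SORTDATA(x, 2), [0, 1]) would be expected to report that the result is in order.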

The sorting algorithm is based on a quicksort method given by Singleton (1969) with modifications by Griffin and Redish (1970) and Petro (1970).

Frequency Tabulation

The IMSL_SORTDATA function determines the distinct values in multivariate data and computes frequencies for the data. This function accepts the data in the matrix x but performs computations only for the variables (columns) in the first n_keys columns of x (Exception: see the optional keyword INDICES_KEYS). In general, the variables for which frequencies should be computed are discrete; they should take on a relatively small number of different values. Variables that are continuous can be grouped first. The IMSL_FREQTABLE function can be used to group variables and determine the frequencies of groups.

When the TABLE_N, TABLE_VALUES, and TABLE_BAL keywords are specified, IMSL_SORTDATA fills the vector TABLE_VALUES with the unique values of the variables and tallies the number of unique values of each variable in the vector TABLE_N. Each combination of one value from each variable forms a cell in a multiway table. The frequencies of these cells are entered in TABLE_BAL so that the first variable cycles through its values exactly once and the last variable cycles through its values most rapidly. Some cells may not correspond to any observations in the data; in other words, “missing cells” are included in the TABLE_BAL table and have a value of zero.
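
As an illustration of the balanced table, the following sketch tabulates a small hypothetical data set with two discrete variables. The data values and variable names are assumptions made for this sketch. The output is not shown because it was not verified here.

```idl
; Hypothetical data: six observations on two discrete variables.
c0 = [1.0, 2.0, 1.0, 1.0, 2.0, 1.0]
c1 = [1.0, 1.0, 2.0, 1.0, 3.0, 2.0]
x = [ [c0], [c1] ]
; The three TABLE_* keywords must be supplied together.
y = IMSL_SORTDATA(x, 2, Table_N = table_n, $
   Table_Values = table_values, Table_Bal = table_bal)
PM, table_n, Title = 'Number of levels per variable'
PM, table_values, Title = 'Distinct values, variable by variable'
PM, table_bal, Title = 'Cell frequencies (empty cells are 0)'
```

In this data the first variable takes two distinct values and the second takes three, so by the description above the balanced table should have six cells. The combinations (1, 3) and (2, 2) never occur, so those two cells would be expected to hold zero.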

When N_LIST_CELLS, LIST_CELLS, and TABLE_UNBAL are specified, the frequency of each cell is entered in TABLE_UNBAL so that the first variable cycles through its values exactly once and the last variable cycles through its values most rapidly. All cells have a frequency of at least 1, i.e., there is no “missing cell.” The array LIST_CELLS can be considered “parallel” to TABLE_UNBAL because row i of LIST_CELLS is the set of n_keys values that describes the cell for which row i of TABLE_UNBAL contains the corresponding frequency.
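
The unbalanced form can be sketched the same way. The data below is hypothetical and the output is not shown because it was not verified here.

```idl
; Hypothetical data: six observations on two discrete variables.
c0 = [1.0, 2.0, 1.0, 1.0, 2.0, 1.0]
c1 = [1.0, 1.0, 2.0, 1.0, 3.0, 2.0]
x = [ [c0], [c1] ]
; The three keywords below must be supplied together.
y = IMSL_SORTDATA(x, 2, N_List_Cells = n_list_cells, $
   List_Cells = list_cells, Table_Unbal = table_unbal)
PRINT, 'Number of nonempty cells: ', n_list_cells
; Row i of list_cells describes the cell counted in element i of table_unbal.
PM, list_cells, Title = 'Levels describing each nonempty cell'
PM, table_unbal, Title = 'Frequency of each nonempty cell'
```

Only four distinct combinations of the two variables occur in this data, so by the description above only four cells would be expected in the list, each with a frequency of at least 1.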

Examples


Example 1

The rows of a 10 x 3 matrix x are sorted in ascending order using Columns 0 and 1 as the keys. There are two missing values (NaNs) in the keys. The observations containing these values are moved to the end of the sorted array.

f = IMSL_MACHINE(/Float)
c0 =[1.0, 2.0, 1.0, 1.0, 2.0, 1.0, f.NaN, 1.0, 2.0, 1.0]
c1 =[1.0, 1.0, 1.0, 1.0, f.NaN, 2.0, 2.0, 1.0, 2.0, 1.0]
c2 =[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0]
x = [ [c0], [c1], [c2] ]
PM, x, Title = 'Unsorted Matrix'
Unsorted Matrix
1.00000    1.00000    1.00000
2.00000    1.00000    2.00000
1.00000    1.00000    3.00000
1.00000    1.00000    4.00000
2.00000        NaN    5.00000
1.00000    2.00000    6.00000
    NaN    2.00000    7.00000
1.00000    1.00000    8.00000
2.00000    2.00000    9.00000
1.00000    1.00000    9.00000
PM, IMSL_SORTDATA(x, 2), Title = 'Sorted Matrix'
Sorted Matrix
1.00000    1.00000    1.00000
1.00000    1.00000    9.00000
1.00000    1.00000    3.00000
1.00000    1.00000    4.00000
1.00000    1.00000    8.00000
1.00000    2.00000    6.00000
2.00000    1.00000    2.00000
2.00000    2.00000    9.00000
    NaN    2.00000    7.00000
2.00000        NaN    5.00000

Example 2

This example uses the same data as the previous example. The permutation of the rows is returned through the PERMUTATION keyword.

f = IMSL_MACHINE(/Float)
c0 =[1.0, 2.0, 1.0, 1.0, 2.0, 1.0, f.NaN, 1.0, 2.0, 1.0]
c1 =[1.0, 1.0, 1.0, 1.0, f.NaN, 2.0, 2.0, 1.0, 2.0, 1.0]
c2 =[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0]
; Fill up a matrix, including some missing values.
x = [ [c0], [c1], [c2] ]
PM, x, Title = 'Unsorted Matrix'
; Output the unsorted matrix.
Unsorted Matrix
1.00000    1.00000    1.00000
2.00000    1.00000    2.00000
1.00000    1.00000    3.00000
1.00000    1.00000    4.00000
2.00000        NaN    5.00000
1.00000    2.00000    6.00000
    NaN    2.00000    7.00000
1.00000    1.00000    8.00000
2.00000    2.00000    9.00000
1.00000    1.00000    9.00000
y = IMSL_SORTDATA(x, 2, Permutation = permutation)
; Use IMSL_SORTDATA to sort x.
PM, y, Title = 'Sorted Matrix:'
Sorted Matrix:
1.00000    1.00000    1.00000
1.00000    1.00000    9.00000
1.00000    1.00000    3.00000
1.00000    1.00000    4.00000
1.00000    1.00000    8.00000
1.00000    2.00000    6.00000
2.00000    1.00000    2.00000
2.00000    2.00000    9.00000
    NaN    2.00000    7.00000
2.00000        NaN    5.00000
PM, permutation, Title = 'Permutation Matrix:'
; Print the permutation vector.
Permutation Matrix:
0
9
2
3
7
5
1
8
6
4
z = x(permutation, *)
PM, z, Title = 'Sorted Matrix'
; Use the permutation vector to sort the data.
Sorted Matrix
1.00000    1.00000    1.00000
1.00000    1.00000    9.00000
1.00000    1.00000    3.00000
1.00000    1.00000    4.00000
1.00000    1.00000    8.00000
1.00000    2.00000    6.00000
2.00000    1.00000    2.00000
2.00000    2.00000    9.00000
    NaN    2.00000    7.00000
2.00000        NaN    5.00000

Syntax


Result = IMSL_SORTDATA(X, N_Keys [, ASCENDING=value] [, DESCENDING=value] [, /DOUBLE] [, FREQUENCIES=array] [, INDICES_KEYS=array] [, LIST_CELLS=variable] [, N_CELLS=variable] [, N_LIST_CELLS=variable] [, PERMUTATION=variable] [, TABLE_BAL=variable] [, TABLE_N=variable] [, TABLE_VALUES=variable] [, TABLE_UNBAL=variable])

Return Value


The sorted array.

Arguments


N_Keys

Number of columns of x on which to sort. The first N_Keys columns of X are used as the sorting keys. (Exception: See INDICES_KEYS).

X

One- or two-dimensional array containing the observations to be sorted.

Keywords


ASCENDING (optional)

If present and nonzero, the sort is in ascending order. This is the default. The keywords ASCENDING and DESCENDING cannot be used together.

DESCENDING (optional)

If present and nonzero, the sort is in descending order. The keywords ASCENDING and DESCENDING cannot be used together.

DOUBLE (optional)

If present and nonzero, double precision is used.

FREQUENCIES (optional)

One-dimensional array containing the frequency for each observation in x. Default: FREQUENCIES(*) = 1

INDICES_KEYS (optional)

One-dimensional array of length n_keys giving the column numbers of x that are to be used in the sort. Default: INDICES_KEYS(*) = 0, 1, ..., n_keys – 1
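
A minimal sketch of sorting on non-default key columns in descending order, assuming the INDICES_KEYS and DESCENDING keywords behave as described in this section. The data values are hypothetical and the output is not shown because it was not verified here.

```idl
; Hypothetical 5 x 3 data set: five observations on three variables.
c0 = [3.0, 1.0, 2.0, 1.0, 3.0]
c1 = [10.0, 20.0, 10.0, 10.0, 20.0]
c2 = [5.0, 4.0, 3.0, 2.0, 1.0]
x = [ [c0], [c1], [c2] ]
; Use column 1 as the most significant key and column 0 as the next,
; and sort in descending order.
y = IMSL_SORTDATA(x, 2, Indices_Keys = [1, 0], /Descending)
PM, y, Title = 'Sorted on columns 1 and 0, descending'
```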

LIST_CELLS (optional)

Named variable into which a two-dimensional array of size N_LIST_CELLS x n_keys is stored. Each row lists the levels of the n_keys classification variables that describe one cell. The keywords N_LIST_CELLS, LIST_CELLS, and TABLE_UNBAL must be used together.

N_CELLS (optional)

Named variable into which a one-dimensional array containing the number of observations per group is stored. A group contains observations (rows) in x that are equal with respect to the method of comparison. The first N_Cells(0) rows of the sorted x are in group number 1. The next N_Cells(1) rows of the sorted x are in group number 2, etc. The last N_Cells(N_ELEMENTS(N_Cells) – 1) rows of the sorted x are in group number N_ELEMENTS(N_Cells).
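
Because the group sizes are listed in sorted order, a running sum of N_CELLS gives the row range of each group in the sorted array. The following is a sketch under that reading. The data is hypothetical and the loop variable names are assumptions made for this sketch.

```idl
; Hypothetical data: six observations on two discrete variables.
c0 = [1.0, 2.0, 1.0, 1.0, 2.0, 1.0]
c1 = [1.0, 1.0, 2.0, 1.0, 3.0, 2.0]
x = [ [c0], [c1] ]
y = IMSL_SORTDATA(x, 2, N_Cells = n_cells)
; Walk through the groups, tracking the first and last sorted row of each.
first = 0L
FOR g = 0, N_ELEMENTS(n_cells) - 1 DO BEGIN
  last = first + n_cells[g] - 1
  PRINT, 'Group ', g + 1, ': sorted rows ', first, ' to ', last
  first = last + 1
ENDFOR
```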

N_LIST_CELLS (optional)

Named variable into which the number of nonempty cells is stored. The keywords N_LIST_CELLS, LIST_CELLS, and TABLE_UNBAL must be used together.

PERMUTATION (optional)

Named variable into which a one-dimensional array containing the rearrangement (permutation) of the observations (rows) is stored.

TABLE_BAL (optional)

Named variable into which an array of length Table_N(0) x Table_N(1) x ... x Table_N(N_Keys – 1), containing the frequencies in the cells of the table to be fit, is stored. Empty cells are included in TABLE_BAL, and each element of TABLE_BAL is nonnegative. The cells of TABLE_BAL are sequenced so that the first variable cycles through its Table_N(0) categories one time, the second variable cycles through its Table_N(1) categories Table_N(0) times, the third variable cycles through its Table_N(2) categories Table_N(0) x Table_N(1) times, etc., up to the N_Keys-th variable, which cycles through its Table_N(N_Keys – 1) categories Table_N(0) x Table_N(1) x ... x Table_N(N_Keys – 2) times. The keywords TABLE_N, TABLE_VALUES, and TABLE_BAL must be used together.
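
Since the last variable cycles fastest, the position of a cell in TABLE_BAL can be computed from the zero-based level index of each variable. The following is a sketch in plain IDL under that assumption. The function name table_bal_index is hypothetical and not part of the library.

```idl
; Hypothetical helper, not part of IMSL: returns the zero-based position in
; TABLE_BAL of the cell whose zero-based level indices are given in levels.
; levels[k] is the index of the k-th variable's level within that variable's
; block of TABLE_VALUES; table_n is the array returned through TABLE_N.
FUNCTION table_bal_index, levels, table_n
  idx = 0L
  ; The first variable varies slowest and the last varies fastest.
  FOR k = 0, N_ELEMENTS(table_n) - 1 DO idx = idx * table_n[k] + levels[k]
  RETURN, idx
END
```

For a table with two levels of the first variable and three of the second, table_bal_index([1, 2], [2, 3]) evaluates to 5, the last of the six cells.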

TABLE_N (optional)

Named variable into which a one-dimensional array of length n_keys, containing in its i-th element (i = 0, 1, ..., (n_keys – 1)) the number of levels or categories of the i-th classification variable (column), is stored. The keywords TABLE_N, TABLE_VALUES, and TABLE_BAL must be used together.

TABLE_VALUES (optional)

Named variable into which an array of length Table_N(0) + Table_N(1) + ... + Table_N(N_Keys – 1), containing the values of the classification variables, is stored. The first Table_N(0) elements of TABLE_VALUES contain the values for the first classification variable. The next Table_N(1) contain the values for the second variable. The last Table_N(N_Keys – 1) positions contain the values for the last classification variable. The keywords TABLE_N, TABLE_VALUES, and TABLE_BAL must be used together.

TABLE_UNBAL (optional)

Named variable into which the one-dimensional array of length N_LIST_CELLS containing the frequency for each cell is stored. The keywords N_LIST_CELLS, LIST_CELLS, and TABLE_UNBAL must be used together.

Version History


6.4

Introduced