IMSL_REGRESSORS

The IMSL_REGRESSORS function generates regressors for a general linear model.

This routine requires an IDL Advanced Math and Stats license. For more information, contact your sales or technical support representative.

The IMSL_REGRESSORS function generates regressors for a general linear model from a data matrix. The data matrix can contain classification variables as well as continuous variables. Regressors for effects composed solely of continuous variables are generated as powers and crossproducts. Consider a data matrix containing continuous variables in Columns 3 and 4. The effect indices (3, 3) generate a regressor whose i-th value is the square of the i-th value in Column 3. The effect indices (3, 4) generate a regressor whose i-th value is the product of the i-th value in Column 3 and the i-th value in Column 4.
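The powers-and-crossproducts rule can be illustrated with a minimal sketch in plain Python (not IDL, and not the IMSL routine itself). The helper name continuous_effect and the data values are hypothetical and exist only for illustration.

```python
from math import prod

def continuous_effect(rows, effect_indices):
    """Return one regressor column: the row-wise product of the listed columns."""
    return [prod(row[j] for j in effect_indices) for row in rows]

# Hypothetical data: columns 0-2 are placeholders, columns 3 and 4 are continuous.
rows = [
    [0, 0, 0, 2.0, 5.0],
    [0, 0, 0, 3.0, 7.0],
]

squares = continuous_effect(rows, (3, 3))  # square of Column 3
cross = continuous_effect(rows, (3, 4))    # Column 3 times Column 4
print(squares, cross)
```

Repeating a column index in the effect raises that variable to a power, while listing two different columns yields their crossproduct.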

Regressors for an effect (source of variation) composed of a single classification variable are generated using indicator variables. Let the classification variable A take on values a₁, a₂, ..., a_n. From this classification variable, IMSL_REGRESSORS creates n indicator variables. For k = 1, 2, ..., n:

I_k = 1 if A = a_k, and I_k = 0 otherwise.

For each classification variable, another set of variables is created from the indicator variables. These new variables are called dummy variables. Dummy variables are generated from the indicator variables in one of three manners:

The dummies are the n indicator variables. (Default method)
The dummies are the first n – 1 indicator variables. (DUMMY_METHOD = 1)
The n – 1 dummies are defined in terms of the indicator variables so that for balanced data, the usual summation restrictions are imposed on the regression coefficients. (DUMMY_METHOD = 2)

In particular, for the default case, the dummy variables are A_k = I_k (k = 1, 2, ..., n). For DUMMY_METHOD = 1, the dummy variables are A_k = I_k (k = 1, 2, ..., n – 1). For DUMMY_METHOD = 2, the dummy variables are A_k = I_k – I_n (k = 1, 2, ..., n – 1). The regressors generated for an effect composed of a single classification variable are the associated dummy variables.
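The three dummy-variable schemes can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the IMSL implementation: the function names are hypothetical, the value 0 stands in for the default method, levels are assumed to be taken in ascending order (consistent with Example 1), and DUMMY_METHOD = 2 is assumed to use the difference form I_k – I_n, which matches the regressors A₁ – A₂ and B₁ – B₃ listed in the Remarks.

```python
def indicators(values, levels):
    """I_k for each level a_k: 1.0 where the observation equals a_k, else 0.0."""
    return [[1.0 if v == a else 0.0 for a in levels] for v in values]

def dummies(values, dummy_method=0):
    """Dummy variables for one classification variable, one row per observation."""
    levels = sorted(set(values))      # assumption: levels in ascending order
    ind = indicators(values, levels)
    if dummy_method == 0:             # stand-in for the default: all n indicators
        return ind
    if dummy_method == 1:             # first n - 1 indicators
        return [row[:-1] for row in ind]
    if dummy_method == 2:             # I_k - I_n for k = 1, ..., n - 1
        return [[i_k - row[-1] for i_k in row[:-1]] for row in ind]
    raise ValueError("dummy_method must be 0, 1, or 2")

b = [5, 15, 10, 10, 15, 5]            # hypothetical class variable with 3 levels
print(dummies(b))                     # three columns
print(dummies(b, 1))                  # two columns
print(dummies(b, 2))                  # two columns; last level maps to -1, -1
```

Under method 2, an observation at the last level contributes –1 to every dummy, which is what imposes the sum-to-zero restriction for balanced data.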

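Let m_j be the number of dummies generated for the j-th classification variable. Suppose there are two classification variables A and B with dummies:

A₁, A₂, ..., A_{m_1}

and

B₁, B₂, ..., B_{m_2}

The regressors generated for an effect composed of two classification variables A and B are:

A₁B₁, A₁B₂, ..., A₁B_{m_2}, A₂B₁, A₂B₂, ..., A₂B_{m_2}, ..., A_{m_1}B₁, A_{m_1}B₂, ..., A_{m_1}B_{m_2}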

More generally, the regressors generated for an effect composed of several classification variables and several continuous variables are given by the Kronecker products of variables, where the order of the variables is specified in INDICES_EFFECTS. Consider a data matrix containing classification variables in Columns 0 and 1 and continuous variables in Columns 2 and 3. Label these four columns A, B, X1, and X2.

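The regressors generated by the effect indices (0, 1, 2, 2, 3) are:

A₁ B₁ X1 X1 X2, A₁ B₂ X1 X1 X2, ..., A₁ B_{m_2} X1 X1 X2, A₂ B₁ X1 X1 X2, ..., A_{m_1} B_{m_2} X1 X1 X2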
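A row-wise Kronecker product of this kind can be sketched in plain Python. The function name effect_regressors and the two-observation data set are hypothetical; each variable is passed as a list of rows, where a classification variable contributes its dummies and a continuous variable contributes a single value.

```python
from itertools import product
from math import prod

def effect_regressors(blocks):
    """Row-wise Kronecker product of the variables in one effect.

    blocks holds one entry per variable, in INDICES_EFFECTS order. Each entry
    is a list of rows; a row holds that variable's dummies or its single value.
    """
    out = []
    for i in range(len(blocks[0])):
        factors = [block[i] for block in blocks]
        # itertools.product varies the last factor fastest, giving the
        # ordering A1B1, A1B2, ..., A2B1, ...
        out.append([prod(combo) for combo in product(*factors)])
    return out

# Hypothetical data: A has 2 dummies, B has 3, X1 and X2 are continuous.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
X1 = [[2.0], [3.0]]
X2 = [[5.0], [7.0]]

reg = effect_regressors([A, B, X1, X1, X2])  # effect indices (0, 1, 2, 2, 3)
print(reg)
```

Each observation yields m₁ × m₂ = 6 regressors here; the only nonzero entry in a row is the one for that observation's (A, B) cell, scaled by X1² · X2.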

Remarks

Let the data matrix x = (A, B, X₁), where A and B are classification variables and X₁ is a continuous variable. The model containing the effects A, B, AB, X₁, AX₁, BX₁, and ABX₁ is specified as follows (use the optional keywords VAR_EFFECTS and INDICES_EFFECTS):

N_Class = 2

N_Continuous = 1

VAR_EFFECTS = [1, 1, 2, 1, 2, 2, 3]

INDICES_EFFECTS = [0, 1, 0, 1, 2, 0, 2, 1, 2, 0, 1, 2]

For this model, suppose that variable A has two levels, A₁ and A₂, and that variable B has three levels, B₁, B₂, and B₃. For each DUMMY_METHOD option, the regressors in their order of appearance in IMSL_REGRESSORS are given below.

(Default): A₁, A₂, B₁, B₂, B₃, A₁ B₁, A₁ B₂, A₁ B₃, A₂ B₁, A₂ B₂, A₂ B₃, X₁, A₁ X₁, A₂ X₁, B₁ X₁, B₂ X₁, B₃ X₁, A₁ B₁ X₁, A₁ B₂ X₁, A₁ B₃ X₁, A₂ B₁ X₁, A₂ B₂ X₁, A₂ B₃ X₁
DUMMY_METHOD = 1: A₁, B₁, B₂, A₁ B₁, A₁ B₂, X₁, A₁ X₁, B₁ X₁, B₂ X₁, A₁ B₁ X₁, A₁ B₂ X₁
DUMMY_METHOD = 2: A₁ – A₂, B₁ – B₃, B₂ – B₃, (A₁ – A₂) (B₁ – B₃), (A₁ – A₂) (B₂ – B₃), X₁, (A₁ – A₂) X₁, (B₁ – B₃) X₁, (B₂ – B₃) X₁, (A₁ – A₂) (B₁ – B₃) X₁, (A₁ – A₂) (B₂ – B₃) X₁

Within a group of regressors corresponding to an interaction effect, the indicator variables composing the regressors vary most rapidly for the last classification variable, next most rapidly for the next to last classification variable, etc.
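The way VAR_EFFECTS partitions INDICES_EFFECTS, and the resulting regressor order, can be sketched in plain Python. The helper names split_effects and regressor_names are hypothetical, and the names dictionary simply hard-codes the DUMMY_METHOD = 1 dummies for the model in these Remarks.

```python
from itertools import product

def split_effects(var_effects, indices_effects):
    """Cut the flat INDICES_EFFECTS array into one tuple of columns per effect."""
    if sum(var_effects) != len(indices_effects):
        raise ValueError("INDICES_EFFECTS length must equal sum(VAR_EFFECTS)")
    effects, start = [], 0
    for n in var_effects:
        effects.append(tuple(indices_effects[start:start + n]))
        start += n
    return effects

def regressor_names(effects, names):
    """names[col] lists the dummy (or variable) names for that column of x."""
    return [" ".join(combo)
            for eff in effects
            for combo in product(*(names[c] for c in eff))]

var_effects = [1, 1, 2, 1, 2, 2, 3]
indices_effects = [0, 1, 0, 1, 2, 0, 2, 1, 2, 0, 1, 2]
effects = split_effects(var_effects, indices_effects)

# Dummies kept under DUMMY_METHOD = 1: first n - 1 indicators per class variable.
names = {0: ["A1"], 1: ["B1", "B2"], 2: ["X1"]}
labels = regressor_names(effects, names)
print(effects)
print(labels)
```

The seven effects come out as (0), (1), (0, 1), (2), (0, 2), (1, 2), (0, 1, 2), and the eleven labels follow the DUMMY_METHOD = 1 ordering, with the last classification variable varying fastest inside each interaction.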

By default, IMSL_REGRESSORS internally generates values for VAR_EFFECTS and INDICES_EFFECTS that correspond to a first-order model with NEF = N_Continuous + N_Class effects. These values are then used to create the regressor variables. The effects are ordered such that the first effect corresponds to the first column of x, the second effect corresponds to the second column of x, etc. A second-order model corresponding to the columns (variables) of x is generated if ORDER = 2 is specified.

There are:

NEF = N_Class + 2 * N_Continuous + NVAR * (NVAR – 1) / 2

effects, where NVAR = N_Continuous + N_Class. The first NVAR effects correspond to the columns of x, such that the first effect corresponds to the first column of x, the second effect corresponds to the second column of x, ..., the NVAR-th effect corresponds to the NVAR-th column of x (i.e., column NVAR – 1 of x). The next N_Continuous effects correspond to squares of the continuous variables. The last:

NVAR * (NVAR – 1) / 2

effects correspond to the two-variable interactions.

Let the data matrix x = (A, B, X₁), where A and B are classification variables and X₁ is a continuous variable. The effects generated, in order of appearance, are A, B, X₁, X₁², AB, AX₁, BX₁.
Let the data matrix x = (A, X₁, X₂), where A is a classification variable and X₁ and X₂ are continuous variables. The effects generated, in order of appearance, are A, X₁, X₂, X₁², X₂², AX₁, AX₂, X₁X₂.
Let the data matrix x = (X₁, A, X₂) (see CLASS_COLUMNS), where A is a classification variable and X₁ and X₂ are continuous variables. The effects generated, in order of appearance, are X₁, A, X₂, X₁², X₂², X₁A, X₁X₂, AX₂.

Higher-order and more complicated models can be specified using INDICES_EFFECTS.
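The second-order effect ordering described above (one effect per column, then squares of the continuous variables, then one interaction per pair of columns) can be sketched in plain Python. The function name second_order_effects is hypothetical, and the sketch only enumerates effects as tuples of column indices; it does not build regressors.

```python
from itertools import combinations

def second_order_effects(n_class, n_continuous, class_columns=None):
    """Effects for ORDER = 2 as tuples of column indices, in order of appearance."""
    nvar = n_class + n_continuous
    if class_columns is None:
        class_columns = range(n_class)   # default: class variables come first
    class_set = set(class_columns)
    continuous = [c for c in range(nvar) if c not in class_set]

    effects = [(c,) for c in range(nvar)]          # one effect per column of x
    effects += [(c, c) for c in continuous]        # squares of continuous variables
    effects += list(combinations(range(nvar), 2))  # two-variable interactions
    return effects

# x = (X1, A, X2): the classification variable sits in column 1.
effects = second_order_effects(1, 2, class_columns=[1])
print(effects)
```

For x = (X₁, A, X₂) this gives eight effects in the order X₁, A, X₂, X₁², X₂², X₁A, X₁X₂, AX₂; with two class variables and one continuous variable in the default layout it gives seven.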

Examples

Example 1

In the following example, there are two classification variables, A and B, with two and three values, respectively. Regressors for a one-way model (the default model order) are generated using the ALL dummy method (the default dummy method). The five regressors generated are A₁, A₂, B₁, B₂, B₃.

labels = ['A1', 'A2', 'B1', 'B2', 'B3']

; Define some labels for printing later.

RM, x, 6, 2

; Enter the data.

row 0: 10 5

row 1: 20 15

row 2: 20 10

row 3: 10 10

row 4: 10 15

row 5: 20 5

reg = IMSL_REGRESSORS(x, 2, 0)

; Call IMSL_REGRESSORS.

PM, labels, reg, FORMAT = '(5a8, /, 6(5f8.1, /))'

; Print the results.

   A1     A2     B1     B2     B3

  1.0    0.0    1.0    0.0    0.0

  0.0    1.0    0.0    0.0    1.0

  0.0    1.0    0.0    1.0    0.0

  1.0    0.0    0.0    1.0    0.0

  1.0    0.0    0.0    0.0    1.0

  0.0    1.0    1.0    0.0    0.0

Example 2

In this example, regressors are generated for a two-way analysis of covariance model containing all the interaction terms. IMSL_REGRESSORS is called to produce a matrix of regressors, reg, from the data x. The regressors, generated using DUMMY_METHOD = 1, are for the model whose mean function is:

µ + α_i + β_j + γ_ij + δ x_ij + ζ_i x_ij + η_j x_ij + θ_ij x_ij,   i = 1, 2; j = 1, 2, 3

where α₂ = β₃ = γ₁₃ = γ₂₁ = γ₂₂ = γ₂₃ = ζ₂ = η₃ = θ₁₃ = θ₂₁ = θ₂₂ = θ₂₃ = 0.

labels = ['Alpha1', 'Beta1', 'Beta2', 'Gamma11', 'Gamma12', $
   'Delta', 'Zeta1', 'Eta1', 'Eta2', 'Theta11', 'Theta12']

; Define some labels to use in printing the results.

x = transpose([ [1.0, 1.0, 1.11], [1.0, 1.0, 2.22], $
   [1.0, 1.0, 3.33], [1.0, 2.0, 1.11], [1.0, 2.0, 2.22], $
   [1.0, 2.0, 3.33], [1.0, 3.0, 1.11], [1.0, 3.0, 2.22], $
   [1.0, 3.0, 3.33], [2.0, 1.0, 1.11], [2.0, 1.0, 2.22], $
   [2.0, 1.0, 3.33], [2.0, 2.0, 1.11], [2.0, 2.0, 2.22], $
   [2.0, 2.0, 3.33], [2.0, 3.0, 1.11], [2.0, 3.0, 2.22], $
   [2.0, 3.0, 3.33]])

; Enter the data: columns are A, B, and the continuous covariate.

Var_Effects = [1, 1, 2, 1, 2, 2, 3]

Indices_Effects = [0, 1, 0, 1, 2, 0, 2, 1, 2, 0, 1, 2]

; Specify the effects A, B, AB, X, AX, BX, and ABX.

reg = IMSL_REGRESSORS(x, 2, 1, Dummy_Method = 1, $
   Var_Effects = var_effects, Indices_Effects = indices_effects)

; Call IMSL_REGRESSORS.

PM, labels(0:5), reg(*, 0:5), FORMAT = '(6a9, /, 18(6f9.2, /))'

; Output the results.

  Alpha1  Beta1  Beta2  Gamma11  Gamma12  Delta

  1.00    1.00    0.00    1.00    0.00    1.11

  1.00    1.00    0.00    1.00    0.00    2.22

  1.00    1.00    0.00    1.00    0.00    3.33

  1.00    0.00    1.00    0.00    1.00    1.11

  1.00    0.00    1.00    0.00    1.00    2.22

  1.00    0.00    1.00    0.00    1.00    3.33

  1.00    0.00    0.00    0.00    0.00    1.11

  1.00    0.00    0.00    0.00    0.00    2.22

  1.00    0.00    0.00    0.00    0.00    3.33

  0.00    1.00    0.00    0.00    0.00    1.11

  0.00    1.00    0.00    0.00    0.00    2.22

  0.00    1.00    0.00    0.00    0.00    3.33

  0.00    0.00    1.00    0.00    0.00    1.11

  0.00    0.00    1.00    0.00    0.00    2.22

  0.00    0.00    1.00    0.00    0.00    3.33

  0.00    0.00    0.00    0.00    0.00    1.11

  0.00    0.00    0.00    0.00    0.00    2.22

  0.00    0.00    0.00    0.00    0.00    3.33

PM, labels(6:10), reg(*, 6:10), FORMAT = '(5a9, /, 18(5f9.2, /))'

  Zeta1   Eta1    Eta2  Theta11  Theta12

  1.11    1.11    0.00    1.11    0.00

  2.22    2.22    0.00    2.22    0.00

  3.33    3.33    0.00    3.33    0.00

  1.11    0.00    1.11    0.00    1.11

  2.22    0.00    2.22    0.00    2.22

  3.33    0.00    3.33    0.00    3.33

  1.11    0.00    0.00    0.00    0.00

  2.22    0.00    0.00    0.00    0.00

  3.33    0.00    0.00    0.00    0.00

  0.00    1.11    0.00    0.00    0.00

  0.00    2.22    0.00    0.00    0.00

  0.00    3.33    0.00    0.00    0.00

  0.00    0.00    1.11    0.00    0.00

  0.00    0.00    2.22    0.00    0.00

  0.00    0.00    3.33    0.00    0.00

  0.00    0.00    0.00    0.00    0.00

  0.00    0.00    0.00    0.00    0.00

  0.00    0.00    0.00    0.00    0.00

Syntax

Result = IMSL_REGRESSORS(X, N_Class, N_Continuous [, CLASS_COLUMNS=array] [, /DOUBLE] [, DUMMY_METHOD=variable] [, INDICES_EFFECTS=array] [, ORDER=value] [, VAR_EFFECTS=array])

Return Value

A two-dimensional array containing the regressor variables generated from X.

Arguments

X

Two-dimensional array containing the data. The columns must be ordered such that the first N_Class columns contain the class variables and the next N_Continuous columns contain the continuous variables. (Exception: See the keyword CLASS_COLUMNS.)

N_Class

Number of classification variables.

N_Continuous

Number of continuous variables.

Keywords

CLASS_COLUMNS (optional)

One-dimensional array of length N_Class containing the column numbers of x that are the classification variables. The remaining N_Continuous variables are assumed to correspond to the columns of x in the range 0, ..., N_Class + N_Continuous – 1 that are not listed in CLASS_COLUMNS. Default: [0, 1, ..., N_Class – 1].

DOUBLE (optional)

If present and nonzero, double precision is used.

DUMMY_METHOD (optional)

Dummy variable option. Indicator variables are defined for each class variable as described in the discussion above. Dummy variables are then generated from the n indicator variables in one of the following three ways:

(Default): The n indicator variables are the dummy variables.
1: Dummies are the first n – 1 indicator variables.
2: The n – 1 dummies are defined in terms of the indicator variables so that for balanced data, the usual summation restrictions are imposed on the regression coefficients.

INDICES_EFFECTS (optional)

One-dimensional array of length VAR_EFFECTS(0) + VAR_EFFECTS(1) + ... + VAR_EFFECTS(N_ELEMENTS(VAR_EFFECTS) – 1). The first VAR_EFFECTS(0) elements give the column numbers of x for each variable in the first effect. The next VAR_EFFECTS(1) elements give the column numbers for each variable in the second effect. The last VAR_EFFECTS(N_ELEMENTS(VAR_EFFECTS) – 1) elements give the column numbers for each variable in the last effect. The keywords VAR_EFFECTS and INDICES_EFFECTS must be used together.

ORDER (optional)

Order of the model. Model order can be specified as 1 or 2. Use the keyword INDICES_EFFECTS to specify more complicated models. The keywords VAR_EFFECTS and INDICES_EFFECTS must be used together. Default: 1

VAR_EFFECTS (optional)

One-dimensional array containing the number of variables associated with each effect in the model. The keywords VAR_EFFECTS and INDICES_EFFECTS must be used together.

Version History

6.4	Introduced

Module	Math&Stats

Version	9.2