IMSL_EXACT_NETWORK

The IMSL_EXACT_NETWORK function computes Fisher exact probabilities and a hybrid approximation of the Fisher exact method for a two-way contingency table using the network algorithm.

This routine requires an IDL Advanced Math and Stats license. For more information, contact your sales or technical support representative.

The IMSL_EXACT_NETWORK function computes Fisher exact probabilities, or a hybrid algorithm approximation to Fisher exact probabilities, for a N_rows by N_columns contingency table with fixed row and column marginals. A marginal is the number of counts in a row or column.

Let f_ij denote the count in row i and column j of a table, and let f_i and f_•j denote the row and column marginals. Under the hypothesis of independence, the (conditional) probability of the fixed marginals of the observed table is given by:

Where:

f_•• is the total number of counts in the table
P_f corresponds to the output keyword PROB_TABLE.

A more extreme table X is defined in the probabilistic sense as more extreme than the observed table if the conditional probability computed for table X (for the same marginal sums) is less than the conditional probability computed for the observed table. Note that this definition can be considered two-sided in the cell counts.

Example

This example demonstrates various methods of computing chi-squared p-values with respect to accuracy. As seen in the output of this example, the Fisher exact probability and the usual asymptotic chi-squared probability (generated using IMSL_CONTINGENCY) can be different.

.RUN

PRO print_results, p, p2, p3, p4

PRINT, 'Asymptotic Chi-Squared p-value'

PRINT, 'p-value =', p

PRINT, 'Network Algorithm with Approximation'

PRINT, 'p-value =', p2

PRINT, 'Network Algorithm without Approximation'

PRINT, 'p-value =', p3

PRINT, 'Total Enumeration Method'

PRINT, 'p-value =', p4

END

table = TRANSPOSE([[20, 20, 0, 0, 0], [10, 10, 2, 2, 1], $

  [20, 20, 0, 0, 0]])

p	= IMSL_CONTINGENCY(table)

p2 = IMSL_EXACT_NETWORK(table)

p3 = IMSL_EXACT_NETWORK(table, /NO_APPROX)

p4 = IMSL_EXACT_ENUM(table)

print_results, p, p2, p3, p4

IDL prints:

Asymptotic Chi-Squared p-value

p-value =	0.0322604

Network Algorithm with Approximation

p-value =	0.0601165

Network Algorithm without Approximation

p-value =	0.0598085

Total Enumeration Method

p-value =	0.0597294

Errors

Warning Errors

STAT_HASH_TABLE_ERROR_2: The value "ldkey" = # is too small. "ldkey" is calculated as Wk_Params(0)*pow(10, N_Attempts−1) ending this execution attempt.

STAT_HASH_TABLE_ERROR_3: The value "ldstp" = # is too small. "ldstp" is calculated as Wk_Params(1)*pow(10, N_Attempts−1) ending this execution attempt.

Fatal Errors

STAT_HASH_TABLE_ERROR_1: The hash table key cannot be computed because the largest key is larger than the largest representable integer. The algorithm cannot proceed.

Syntax

Result = IMSL_EXACT_NETWORK(Table [, APPROX_PARAMS=array] [, /DOUBLE] [, /NO_APPROX] [, P_VALUE=variable] [, PROB_TABLE=variable] [, WK_PARAMS=array])

Return Value

The p-value for independence of rows and columns. The p-value represents the probability of a more extreme table where "extreme" is taken in the Neyman-Pearson sense. The p-value is "two-sided".

Arguments

Table

Two-dimensional array containing the observed counts in the contingency table.

Keywords

APPROX_PARAMS (optional)

One-dimensional array of size 3. APPROX_PARAMS(0) is the expected value used in the hybrid approximation to Fisher’s exact test algorithm for deciding when to use asymptotic probabilities when computing path lengths. APPROX_PARAMS(1) is the percentage of remaining cells that must have estimated expected values greater than APPROX_PARAMS(0) before asymptotic probabilities can be used in computing path lengths. APPROX_PARAMS(2) is the minimum cell estimated value allowed for asymptotic chi-squared probabilities to be used.

Asymptotic probabilities are used in computing path lengths whenever APPROX_PARAMS(1) or more of the cells in the table have estimated expected values of APPROX_PARAMS(0) or more, with no cell having expected value less than APPROX_PARAMS(2). See the description at the beginning of this topic for details.

Defaults:

APPROX_PARAMS(0) = 5.0

APPROX_PARAMS(1) = 80.0

APPROX_PARAMS(2) = 1.0

These defaults correspond to the "Cochran" condition.

DOUBLE (optional)

If present and nonzero, double precision is used.

NO_APPROX (optional)

If present and nonzero, the Fisher exact test is used and APPROX_PARAMS is ignored.

P_VALUE (optional)

Named variable containing the p-value for independence of rows and columns. The p-value represents the probability of a more extreme table where "extreme" is in the Neyman-Pearson sense. The P_Value is "two-sided." The p-value is also returned in functional form.

A table is more extreme if its probability (for fixed marginals) is less than or equal to PROB_TABLE.

PROB_TABLE (optional)

Named variable containing the probability of the observed table occurring, given that the null hypothesis of independent rows and columns is true.

WK_PARAMS (optional)

One-dimensional array of size 3. The network algorithm requires a large amount of workspace. Some of the workspace requirements are well-defined, while most of the workspace requirements can only be estimated. The estimate is based primarily on table size.

The IMSL_EXACT_ENUM function allocates a default amount of workspace suitable for small problems. If the algorithm determines that this initial allocation of workspace is inadequate, the memory is freed, a larger amount of memory allocated (twice as much as the previous allocation), and the network algorithm is re-started. The algorithm allows for up to WK_PARAMS(2) attempts to complete the algorithm.

Because each attempt requires computer time, it is suggested that WK_PARAMS(0) and WK_PARAMS(1) be set to some large numbers (like 1,000 and 30,000) if the problem to be solved is large. It is suggested that WK_PARAMS(1) be 30 times larger than WK_PARAMS(0). Although IMSL_EXACT_ENUM will eventually work its way up to a large enough memory allocation, it is quicker to allocate enough memory initially.

The known (well-defined) workspace requirements are as follows:

f_•• = ΣΣf_ij is equal to the sum of all cell frequencies in the observed table
nt = f_•• + 1
mx = max(N_rows, N_columns)
mn = min(N_rows, N_columns)
t₁ = max(800 + 7mx, (5 +2mx) (N_rows + N_columns + 1) )
t₂ = max(400 + mx, + 1, N_rows + N_columns + 1)

IN IDL terms:

N_rows = N_ELEMENTS(Table(*,0))
N_columns = N_ELEMENTS(Table(0,*)).

The amount of integer workspace allocated is 3mx + 2mn + t₁.

The amount of real workspace allocated is nt + t₂.

The remainder of workspace that is required must be estimated and allocated based on WK_PARAMS(0) and WK_PARAMS(1).

The amount of integer workspace allocated is 6n * (WK_PARAMS(0) + WK_PARAMS(1)).
The amount of real workspace allocated is n, where n is (6*WK_PARAMS(0) + 2* Wk_Params(1)).
Variable n is the index for the attempt, 1 < n ≤ Wk_Params(2).

Defaults:

WK_PARAMS(0) = 100

WK_PARAMS(1) = 3000

WK_PARAMS(2) = 10

Version History

6.4	Introduced

Module	Math&Stats

Version	9.1