The IMSL_EXACT_NETWORK function computes Fisher exact probabilities and a hybrid approximation of the Fisher exact method for a two-way contingency table using the network algorithm.
This routine requires an IDL Advanced Math and Stats license. For more information, contact your sales or technical support representative.
The IMSL_EXACT_NETWORK function computes Fisher exact probabilities, or a hybrid algorithm approximation to Fisher exact probabilities, for a N_rows by N_columns contingency table with fixed row and column marginals. A marginal is the number of counts in a row or column.
Let fij denote the count in row i and column j of a table, and let fi and f•j denote the row and column marginals. Under the hypothesis of independence, the (conditional) probability of the fixed marginals of the observed table is given by:
Where:
- f•• is the total number of counts in the table
- Pf corresponds to the output keyword PROB_TABLE.
A more extreme table X is defined in the probabilistic sense as more extreme than the observed table if the conditional probability computed for table X (for the same marginal sums) is less than the conditional probability computed for the observed table. Note that this definition can be considered two-sided in the cell counts.
Example
This example demonstrates various methods of computing chi-squared p-values with respect to accuracy. As seen in the output of this example, the Fisher exact probability and the usual asymptotic chi-squared probability (generated using IMSL_CONTINGENCY) can be different.
.RUN
PRO print_results, p, p2, p3, p4
PRINT, 'Asymptotic Chi-Squared p-value'
PRINT, 'p-value =', p
PRINT, 'Network Algorithm with Approximation'
PRINT, 'p-value =', p2
PRINT, 'Network Algorithm without Approximation'
PRINT, 'p-value =', p3
PRINT, 'Total Enumeration Method'
PRINT, 'p-value =', p4
END
table = TRANSPOSE([[20, 20, 0, 0, 0], [10, 10, 2, 2, 1], $
[20, 20, 0, 0, 0]])
p = IMSL_CONTINGENCY(table)
p2 = IMSL_EXACT_NETWORK(table)
p3 = IMSL_EXACT_NETWORK(table, /NO_APPROX)
p4 = IMSL_EXACT_ENUM(table)
print_results, p, p2, p3, p4
IDL prints:
Asymptotic Chi-Squared p-value
p-value = 0.0322604
Network Algorithm with Approximation
p-value = 0.0601165
Network Algorithm without Approximation
p-value = 0.0598085
Total Enumeration Method
p-value = 0.0597294
Errors
Warning Errors
STAT_HASH_TABLE_ERROR_2: The value "ldkey" = # is too small. "ldkey" is calculated as Wk_Params(0)*pow(10, N_Attempts−1) ending this execution attempt.
STAT_HASH_TABLE_ERROR_3: The value "ldstp" = # is too small. "ldstp" is calculated as Wk_Params(1)*pow(10, N_Attempts−1) ending this execution attempt.
Fatal Errors
STAT_HASH_TABLE_ERROR_1: The hash table key cannot be computed because the largest key is larger than the largest representable integer. The algorithm cannot proceed.
Syntax
Result = IMSL_EXACT_NETWORK(Table [, APPROX_PARAMS=array] [, /DOUBLE] [, /NO_APPROX] [, P_VALUE=variable] [, PROB_TABLE=variable] [, WK_PARAMS=array])
Return Value
The p-value for independence of rows and columns. The p-value represents the probability of a more extreme table where "extreme" is taken in the Neyman-Pearson sense. The p-value is "two-sided".
Arguments
Table
Two-dimensional array containing the observed counts in the contingency table.
Keywords
APPROX_PARAMS (optional)
One-dimensional array of size 3. APPROX_PARAMS(0) is the expected value used in the hybrid approximation to Fisher’s exact test algorithm for deciding when to use asymptotic probabilities when computing path lengths. APPROX_PARAMS(1) is the percentage of remaining cells that must have estimated expected values greater than APPROX_PARAMS(0) before asymptotic probabilities can be used in computing path lengths. APPROX_PARAMS(2) is the minimum cell estimated value allowed for asymptotic chi-squared probabilities to be used.
Asymptotic probabilities are used in computing path lengths whenever APPROX_PARAMS(1) or more of the cells in the table have estimated expected values of APPROX_PARAMS(0) or more, with no cell having expected value less than APPROX_PARAMS(2). See the description at the beginning of this topic for details.
Defaults:
APPROX_PARAMS(0) = 5.0
APPROX_PARAMS(1) = 80.0
APPROX_PARAMS(2) = 1.0
These defaults correspond to the "Cochran" condition.
DOUBLE (optional)
If present and nonzero, double precision is used.
NO_APPROX (optional)
If present and nonzero, the Fisher exact test is used and APPROX_PARAMS is ignored.
P_VALUE (optional)
Named variable containing the p-value for independence of rows and columns. The p-value represents the probability of a more extreme table where "extreme" is in the Neyman-Pearson sense. The P_Value is "two-sided." The p-value is also returned in functional form.
A table is more extreme if its probability (for fixed marginals) is less than or equal to PROB_TABLE.
PROB_TABLE (optional)
Named variable containing the probability of the observed table occurring, given that the null hypothesis of independent rows and columns is true.
WK_PARAMS (optional)
One-dimensional array of size 3. The network algorithm requires a large amount of workspace. Some of the workspace requirements are well-defined, while most of the workspace requirements can only be estimated. The estimate is based primarily on table size.
The IMSL_EXACT_ENUM function allocates a default amount of workspace suitable for small problems. If the algorithm determines that this initial allocation of workspace is inadequate, the memory is freed, a larger amount of memory allocated (twice as much as the previous allocation), and the network algorithm is re-started. The algorithm allows for up to WK_PARAMS(2) attempts to complete the algorithm.
Because each attempt requires computer time, it is suggested that WK_PARAMS(0) and WK_PARAMS(1) be set to some large numbers (like 1,000 and 30,000) if the problem to be solved is large. It is suggested that WK_PARAMS(1) be 30 times larger than WK_PARAMS(0). Although IMSL_EXACT_ENUM will eventually work its way up to a large enough memory allocation, it is quicker to allocate enough memory initially.
The known (well-defined) workspace requirements are as follows:
- f•• = ΣΣfij is equal to the sum of all cell frequencies in the observed table
- nt = f•• + 1
- mx = max(N_rows, N_columns)
- mn = min(N_rows, N_columns)
- t1 = max(800 + 7mx, (5 +2mx) (N_rows + N_columns + 1) )
- t2 = max(400 + mx, + 1, N_rows + N_columns + 1)
IN IDL terms:
- N_rows = N_ELEMENTS(Table(*,0))
- N_columns = N_ELEMENTS(Table(0,*)).
The amount of integer workspace allocated is 3mx + 2mn + t1.
The amount of real workspace allocated is nt + t2.
The remainder of workspace that is required must be estimated and allocated based on WK_PARAMS(0) and WK_PARAMS(1).
- The amount of integer workspace allocated is 6n * (WK_PARAMS(0) + WK_PARAMS(1)).
- The amount of real workspace allocated is n, where n is (6*WK_PARAMS(0) + 2* Wk_Params(1)).
- Variable n is the index for the attempt, 1 < n ≤ Wk_Params(2).
Defaults:
WK_PARAMS(0) = 100
WK_PARAMS(1) = 3000
WK_PARAMS(2) = 10
Version History