IMSL_NORMALITY

The IMSL_NORMALITY function performs a test for normality.

This routine requires an IDL Advanced Math and Stats license. For more information, contact your sales or technical support representative.

Three methods are provided for testing normality: the Chi-Squared test, the Shapiro-Wilk W test, and the Lilliefors test.

Chi-Squared Test

This function computes the chi-squared statistic, its p-value, and the degrees of freedom of the test. The keyword NCAT specifies the number of intervals into which the observations are to be divided. The intervals are equiprobable except for the first and last intervals, which are infinite in length. If more flexibility is desired in specifying the intervals, the same test can be performed with a call to IMSL_CHISQTEST using the optional arguments described for that function.
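As a minimal sketch of selecting the chi-squared test, the call below passes NCAT, DF, and CHISQ together, as the keyword descriptions require. The data are the 50 values used in the Examples section; the choice of five intervals and the variable names result, chisq, and df are illustrative assumptions, not recommended settings. Because the Return Value section describes the function result only for the Shapiro-Wilk and Lilliefors tests, the sketch prints only the keyword outputs.

```idl
; Data from the Examples section (Conover 1980).
x = [23, 36, 54, 61, 73, 23, 37, 54, 61, 73, $
     24, 40, 56, 62, 74, 27, 42, 57, 63, 75, $
     29, 43, 57, 64, 77, 31, 43, 58, 65, 81, $
     32, 44, 58, 66, 87, 33, 45, 58, 68, 89, $
     33, 48, 58, 68, 93, 35, 48, 59, 70, 97]

; NCAT = 5 is an assumed, illustrative number of intervals.
; Supplying NCAT, DF, and CHISQ together requests the chi-squared test.
result = IMSL_NORMALITY(x, NCAT = 5, CHISQ = chisq, DF = df)

PRINT, 'Chi-squared statistic = ', chisq
PRINT, 'Degrees of freedom    = ', df
```

With 50 observations, five equiprobable intervals give an expected count of 10 per interval; a different NCAT may suit other sample sizes.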

Shapiro-Wilk W Test

D’Agostino and Stevens (1986, p. 406) refer to the Shapiro-Wilk W test as one of the best omnibus tests of normality. The function is based on the approximations and code given by Royston (1982a, b, c). It can be used in samples as large as 2,000 or as small as 3. In the Shapiro-Wilk test, W is given by:

$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}$$

where x_(i) is the i-th smallest order statistic and:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

is the sample mean. Royston (1982) gives approximations and tabled values that can be used to compute the coefficients a_i, i = 1, ..., n, and to obtain the significance level of the W statistic.

Lilliefors Test

This function performs the Lilliefors test and computes its p-value for a normal distribution in which both the mean and variance are estimated. The one-sample, two-sided Kolmogorov-Smirnov statistic D is first computed. The p-value is then computed using an analytic approximation given by Dallal and Wilkinson (1986). Because Dallal and Wilkinson give approximations only in the range (0.01, 0.10), a note is issued when the computed probability of a greater D falls outside that range: if it is less than 0.01, the p-value is set to 0.01, and if it is greater than 0.10, the p-value is set to 0.50. Note that because parameters are estimated, p-values in the Lilliefors test are not the same as in the Kolmogorov-Smirnov test.

Observations should not be tied. If tied observations are found, an informational message is printed. A general reference for the Lilliefors test is Conover (1980). The original reference for the test for normality is Lilliefors (1967).
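The following sketch shows one way to request the Lilliefors test and read its result, assuming the reporting behavior described in the Return Value section (probabilities below 0.01 reported as 0.01, probabilities above 0.10 reported as 0.5). The data are the 50 values used in the Examples section, and the variable names p and d are illustrative. Those data contain repeated values, so the informational message about tied observations may appear.

```idl
; Data from the Examples section (Conover 1980).
x = [23, 36, 54, 61, 73, 23, 37, 54, 61, 73, $
     24, 40, 56, 62, 74, 27, 42, 57, 63, 75, $
     29, 43, 57, 64, 77, 31, 43, 58, 65, 81, $
     32, 44, 58, 66, 87, 33, 45, 58, 68, 89, $
     33, 48, 58, 68, 93, 35, 48, 59, 70, 97]

; Supplying LILLIEFORS requests the Lilliefors test; d receives the
; maximum absolute difference D between the empirical and theoretical
; distributions.
p = IMSL_NORMALITY(x, LILLIEFORS = d)
PRINT, 'Lilliefors D = ', d

; Interpret the reported p-value, which is limited to 0.01, 0.5, or an
; approximate value between 0.01 and 0.10.
IF (p LE 0.01) THEN PRINT, 'p-value is at most 0.01' $
ELSE IF (p GE 0.5) THEN PRINT, 'p-value is greater than 0.10' $
ELSE PRINT, 'Approximate p-value = ', p
```

A reported value of 0.5 therefore means only that the p-value exceeds 0.10, not that it equals 0.5.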

Examples

Example 1

The following example is taken from Conover (1980, pp. 195, 364). The data consist of 50 two-digit numbers taken from a telephone book. The W test fails to reject the null hypothesis of normality at the 0.05 level of significance.

x = [23,	36,	54,	61,	73,	23,	37,	54,	61,	73,	$

  24,	40,	56,	62,	74,	27,	42,	57,	63,	75,	$

  29,	43,	57,	64,	77,	31,	43,	58,	65,	81,	$

  32,	44,	58,	66,	87,	33,	45,	58,	68,	89,	$

  33,	48,	58,	68,	93,	35,	48,	59,	70,	97]

p = IMSL_NORMALITY(x)

PRINT, 'P-Value = ', p

IDL prints:

P-Value =	0.230858

Example 2

The following example uses the same data as the previous example. Here, the Shapiro-Wilk W statistic is output.

p = IMSL_NORMALITY(x, SHAPIRO_WILK = sw)

PRINT, 'p-Value	= ', p

PRINT, 'Shapiro Wilk W Statistic = ', sw

IDL prints:

p-Value                  = 0.230858

Shapiro Wilk W Statistic = 0.964217

Errors

Warning Errors

STAT_ALL_OBS_TIED: All observations in x are tied.

Fatal Errors

STAT_NEED_AT_LEAST_5: All but # elements of x are missing. At least five nonmissing observations are necessary to continue.

STAT_NEG_IN_EXPONENTIAL: In testing the exponential distribution, an invalid element in x is found (x[ ] = #). Negative values are not possible in exponential distributions.

STAT_NO_VARIATION_INPUT: There is no variation in the input data. All nonmissing observations are tied.

Syntax

Result = IMSL_NORMALITY(X [, CHISQ=variable] [, DF=variable] [, /DOUBLE] [, LILLIEFORS=variable] [, NCAT=value] [, SHAPIRO_WILK=variable])

Return Value

The p-value for the Shapiro-Wilk W test or the Lilliefors test for normality. The Shapiro-Wilk test is the default. If the Lilliefors test is used, probabilities less than 0.01 are reported as 0.01, and probabilities greater than 0.10 for the normal distribution are reported as 0.5; otherwise, an approximate probability is computed.

Arguments

X

One-dimensional array containing the observations.

Keywords

CHISQ (optional)

Specifies a variable into which the chi-squared statistic is stored. The keywords NCAT, DF, and CHISQ must be used together and indicate that the chi-squared goodness-of-fit test is to be performed.

DF (optional)

Specifies a variable into which the degrees of freedom for the test are stored. The keywords NCAT, DF, and CHISQ must be used together and indicate that the chi-squared goodness-of-fit test is to be performed.

DOUBLE (optional)

If present and nonzero, double precision is used.

LILLIEFORS (optional)

Named variable into which the maximum absolute difference between the empirical and the theoretical distributions is stored. If LILLIEFORS is present, the Lilliefors test is performed.

NCAT (optional)

An integer specifying the number of cells into which the observations are to be tallied. The keywords NCAT, DF, and CHISQ must be used together and indicate that the chi-squared goodness-of-fit test is to be performed.

SHAPIRO_WILK (optional)

Named variable into which the Shapiro-Wilk W statistic is stored. If SHAPIRO_WILK is present, the Shapiro-Wilk W test is performed. Default: the Shapiro-Wilk W test is performed.

Version History

6.4	Introduced

Module	Math&Stats

Version	9.2