The IMSL_KOLMOGOROV2 function performs a Kolmogorov-Smirnov two-sample test.

This routine requires an IDL Advanced Math and Stats license. For more information, contact your sales or technical support representative.

The IMSL_KOLMOGOROV2 function computes Kolmogorov-Smirnov two-sample test statistics for testing whether two continuous cumulative distribution functions (CDFs) are identical, based on two random samples. One- or two-sided alternatives are allowed. If n_observations_x = N_ELEMENTS(x) and n_observations_y = N_ELEMENTS(y), then the exact p-values are computed for the two-sided test when n_observations_x * n_observations_y is less than 10^4.

Let Fn(x) denote the empirical CDF in the X sample and let Gm(y) denote the empirical CDF in the Y sample, where n = n_observations_x - NMISSINGX and m = n_observations_y - NMISSINGY, and let the corresponding population distribution functions be denoted by F(x) and G(y), respectively. Then, the hypotheses tested by IMSL_KOLMOGOROV2 are as follows:

  • H0: F(x) = G(x)     H1: F(x) ≠ G(x)
  • H0: F(x) ≥ G(x)     H1: F(x) < G(x)
  • H0: F(x) ≤ G(x)     H1: F(x) > G(x)

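The test statistics are given as follows:

  D  = max(D+, D-)
  D+ = sup over x of [Fn(x) - Gm(x)]
  D- = sup over x of [Gm(x) - Fn(x)]

Asymptotically, the distribution of the statistic

  Z = D * SQRT(m * n / (m + n))

(returned in Result(0)) converges to a distribution given by Smirnov (1939).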
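As an illustration of how these statistics relate to the empirical CDFs, the following is a minimal sketch in plain IDL that does not call the IMSL routine. It assumes the usual definitions D+ = sup [Fn(x) - Gm(x)], D- = sup [Gm(x) - Fn(x)], D = max(D+, D-), and Z = D * SQRT(m * n / (m + n)). The function name ks2_sketch is hypothetical, and the sketch assumes the samples contain no missing values and no ties.

```idl
; Hypothetical helper (not part of IMSL): two-sample KS statistics
; computed directly from the empirical CDFs of x and y.
FUNCTION ks2_sketch, x, y
  n = N_ELEMENTS(x)
  m = N_ELEMENTS(y)
  xs = x[SORT(x)]
  ys = y[SORT(y)]
  ; The empirical CDFs are step functions that jump only at observed
  ; values, so it is enough to evaluate them on the pooled sample.
  pooled = [xs, ys]
  ; VALUE_LOCATE returns the index j with v[j] <= value < v[j+1]
  ; (-1 below v[0]), so j + 1 counts the observations <= value.
  fn = (VALUE_LOCATE(xs, pooled) + 1) / FLOAT(n)
  gm = (VALUE_LOCATE(ys, pooled) + 1) / FLOAT(m)
  dplus  = MAX(fn - gm)
  dminus = MAX(gm - fn)
  d = dplus > dminus                    ; IDL maximum operator
  z = d * SQRT(FLOAT(m) * n / (m + n))
  RETURN, [d, dplus, dminus, z]
END
```

With D = 0.18, n = 100, and m = 60, as in the example on this page, the last formula gives Z = 0.18 * SQRT(37.5), or about 1.10227, which agrees with the printed Z value.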

Exact probabilities for the two-sided test are computed when m * n is less than or equal to 10^4, according to an algorithm given by Kim and Jennrich (1973). When m * n is greater than 10^4, the very good approximations given by Kim and Jennrich are used to obtain the two-sided p-values. The one-sided probability is taken as one half the two-sided probability. This is a very good approximation when the p-value is small (say, less than 0.10) and not very good for large p-values.
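For comparison, the limiting distribution of Z can be evaluated with the classical series P(Z > z) = 2 * SUM over k >= 1 of (-1)^(k-1) * EXP(-2 * k^2 * z^2). The sketch below is not the Kim and Jennrich algorithm used by IMSL_KOLMOGOROV2, so its values should not be expected to match the routine's p-values. The function name ks_asymptotic_p and the NTERMS keyword are hypothetical.

```idl
; Hypothetical helper (not part of IMSL): large-sample two-sided tail
; probability of Z from the classical alternating series.
FUNCTION ks_asymptotic_p, z, NTERMS = nterms
  ; Ten terms are ample for moderate z; very small z needs more.
  IF N_ELEMENTS(nterms) EQ 0 THEN nterms = 10
  k = DINDGEN(nterms) + 1d
  sgn = 1d - 2d * ((k MOD 2) EQ 0)      ; +1 for odd k, -1 for even k
  p2 = 2d * TOTAL(sgn * EXP(-2d * k^2 * z^2))
  RETURN, 0d > p2 < 1d                  ; clamp to [0, 1]
END

; Main-level usage with the Z value printed in the example on this page
p2 = ks_asymptotic_p(1.10227d)
PRINT, 'Asymptotic two-sided p-value =', p2
PRINT, 'Halved (one-sided approximation) =', p2 / 2d
END
```

For z = 1.10227 the series gives a two-sided value of roughly 0.176, noticeably larger than the exact 0.144021 reported in the example, where m * n = 6000 and exact probabilities are therefore used.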

Example


The following example illustrates the IMSL_KOLMOGOROV2 routine with two randomly generated samples from a uniform(0,1) distribution. Since the two theoretical distributions are identical, we would not expect to reject the null hypothesis.

IMSL_RANDOMOPT, set = 123457
x = IMSL_RANDOM(100, /Uniform)
y = IMSL_RANDOM(60, /Uniform)
stats = IMSL_KOLMOGOROV2(x, y, DIFFERENCES = d, $
  NMISSINGX = nmx, NMISSINGY = nmy)
PRINT, 'D  =', d(0)
PRINT, 'D+ =', d(1)
PRINT, 'D- =', d(2)
PRINT, 'Z  =', stats(0)
PRINT, 'Prob greater D one sided =', stats(1)
PRINT, 'Prob greater D two sided =', stats(2)
PRINT, 'Missing X =', nmx
PRINT, 'Missing Y =', nmy

D  =    0.180000
D+ =    0.180000
D- =    0.0100001
Z  =    1.10227
Prob greater D one sided =    0.0720105
Prob greater D two sided =    0.144021
Missing X =    0
Missing Y =    0
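To turn these results into a decision, the two-sided p-value can be compared with a chosen significance level. The sketch below copies the printed values into an array so that it runs without the IMSL license. The variable alpha and its value 0.05 are illustrative assumptions, not part of the routine.

```idl
; Values copied from the example output: Z, one-sided p, two-sided p
stats = [1.10227, 0.0720105, 0.144021]
alpha = 0.05                            ; illustrative significance level
IF stats[2] LT alpha THEN $
  PRINT, 'Reject H0: the two distributions differ' $
ELSE $
  PRINT, 'Fail to reject H0'
```

Because 0.144021 exceeds 0.05, this prints 'Fail to reject H0', in line with the expectation stated for two samples drawn from the same uniform distribution.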

Syntax


Result = IMSL_KOLMOGOROV2(X, Y [, DIFFERENCES=variable] [, /DOUBLE] [, NMISSINGX=variable] [, NMISSINGY=variable])

Return Value


One-dimensional array of length 3 containing Z, p1, and p2: the test statistic Z, the one-sided p-value p1, and the two-sided p-value p2.

Arguments


X

One-dimensional array containing the observations from sample one.

Y

One-dimensional array containing the observations from sample two.

Keywords


DIFFERENCES (optional)

Named variable into which a one-dimensional array containing D, D+, and D- (in that order) is stored.

DOUBLE (optional)

If present and nonzero, then double precision is used.
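A minimal sketch of a double-precision call follows. It uses the built-in RANDOMU function rather than IMSL_RANDOM to generate the samples, which is a choice made for this illustration only. The call itself requires the IDL Advanced Math and Stats license.

```idl
; Generate two double-precision uniform samples with built-in RANDOMU
seed = 123457L
x = RANDOMU(seed, 100, /DOUBLE)
y = RANDOMU(seed, 60, /DOUBLE)
; Request double-precision computation
stats = IMSL_KOLMOGOROV2(x, y, /DOUBLE)
HELP, stats                             ; shows the type and size of the result
```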

NMISSINGX (optional)

Named variable into which the number of missing values in the x sample is stored.

NMISSINGY (optional)

Named variable into which the number of missing values in the y sample is stored.
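The sketch below shows how a missing-value count relates to the effective sample size n = n_observations_x - NMISSINGX. It assumes that missing observations are coded as NaN, which this page does not state, and it counts them in plain IDL rather than through the routine.

```idl
; Assumption: missing observations are coded as NaN
x = [0.2, !VALUES.F_NAN, 0.7, 0.4]
nmx = LONG(TOTAL(FINITE(x, /NAN)))      ; number of NaN entries: 1
n = N_ELEMENTS(x) - nmx                 ; effective sample size: 3
PRINT, 'Missing X =', nmx
PRINT, 'Effective n =', n
```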

Version History


6.4

Introduced

See Also


IMSL_KOLMOGOROV1