The RUNNING_STATS function computes the mean and unbiased sample variance of an array without overflow. It can also combine previously computed statistics with those of new data, which lets you compute the mean and variance of data sets that are too large to fit into memory.

RUNNING_STATS uses Welford's "online" algorithm to compute the running mean and variance in a single pass through the data. This approach is numerically stable, is significantly faster than the VARIANCE function, and, unlike VARIANCE, does not require any additional memory.
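To illustrate the single-pass update, here is a minimal Python sketch of Welford's algorithm. It is not IDL's implementation: the function name `welford_stats` is hypothetical, and returning NaN for the variance of fewer than two values is an assumption of this sketch, not documented RUNNING_STATS behavior. Each new value updates the mean and a running sum of squared deviations (`m2`), so the method never subtracts two large sums (the sum of squares minus n times the squared mean), which is where the textbook one-pass formula loses precision.

```python
import math


def welford_stats(values):
    """Single-pass mean and unbiased sample variance (Welford's algorithm).

    Returns [mean, variance, count] as floats, mirroring the layout that
    RUNNING_STATS documents. Illustrative sketch only.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / count     # update the mean incrementally
        m2 += delta * (x - mean)  # uses both the old and the new mean
    # Assumption: with fewer than two values the unbiased variance is
    # undefined, so this sketch returns NaN for it.
    variance = m2 / (count - 1) if count > 1 else math.nan
    return [mean, variance, float(count)]


print(welford_stats(range(1, 11)))
```

Applied to the values 1 through 10, it yields a mean of 5.5, a variance of 82.5/9, and a count of 10, matching the first example.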

## Examples

`; Define a vector of sample data:`

`IDL> A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]`

`; Compute the [mean, variance, count]:`

`IDL> result = RUNNING_STATS(A)`

`IDL> result`

IDL prints:

5.5000000000000000 9.1666666666666661 10.000000000000000

## Syntax

*Result* = RUNNING_STATS( *X* [, /NAN] [, PREVIOUS=*value*] [, Thread pool keywords] )

## Return Value

Returns the statistics of the array *X* in the form [*mean*, *variance*, *count*] in double precision.
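For reference, the three returned values correspond to the standard definitions below, where n is the number of elements used (excluding missing values when /NAN is set). The worked numbers reproduce the first example; the notation is this sketch's own, not taken from the IDL source.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2

\text{For } A = [1, 2, \ldots, 10]:\quad
n = 10,\quad
\bar{x} = \tfrac{55}{10} = 5.5,\quad
\sum_{i=1}^{10} \left(x_i - 5.5\right)^2 = 82.5,\quad
s^2 = \tfrac{82.5}{9} \approx 9.1667
```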

## Arguments

### X

The array to be processed. This array can be any numeric type other than complex or double complex.

## Keywords

### NAN

Set this keyword to cause the routine to check for occurrences of the IEEE floating-point values *NaN* or *Infinity* in the input data. Elements with the value *NaN* or *Infinity* are treated as missing data.

*Note:* Since the value NaN is treated as missing data, if you set /NAN and *X* contains only NaN values, the routine returns NaN for the *mean* and *variance*, and zero for the *count*.
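A minimal Python sketch of the /NAN behavior described above, assuming missing values are simply skipped during a Welford update. The name `stats_skip_nonfinite` is hypothetical, and returning NaN for the variance when only one finite value remains is this sketch's own choice, not documented RUNNING_STATS behavior.

```python
import math


def stats_skip_nonfinite(values):
    """Welford update that skips NaN and +/-Infinity, mimicking /NAN.

    Sketch only: returns [mean, variance, count]; when no finite values
    remain, the mean and variance are NaN and the count is zero.
    """
    count = 0
    mean = 0.0
    m2 = 0.0
    for x in values:
        if not math.isfinite(x):  # treat NaN and Infinity as missing data
            continue
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count == 0:
        return [math.nan, math.nan, 0.0]
    # Assumption: a single value has no defined unbiased variance.
    variance = m2 / (count - 1) if count > 1 else math.nan
    return [mean, variance, float(count)]


print(stats_skip_nonfinite([1.0, float("nan"), 3.0, float("inf")]))
print(stats_skip_nonfinite([float("nan"), float("nan")]))
```

The first call reduces the input to the two finite values 1.0 and 3.0, giving a mean of 2.0, a variance of 2.0, and a count of 2; the second has no usable values and returns NaN, NaN, and a zero count.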

### PREVIOUS

Set this keyword to a three-element array containing the [*mean*, *variance*, *count*] from a previous calculation. These three values are combined with the new statistics computed from the input array. If this keyword is omitted or is set to [0, 0, 0], a new calculation is started.

*Tip:* See below for examples of chaining together multiple calls to RUNNING_STATS using the PREVIOUS keyword.

*Note:* If the *count* from a previous calculation is zero, a new calculation is started, regardless of the *mean* or *variance* values.
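This section does not say exactly how PREVIOUS is merged. A standard way to combine two [mean, variance, count] summaries is the pairwise update formula of Chan, Golub, and LeVeque, sketched below in Python as an assumption rather than IDL's source code. The name `combine_stats` is hypothetical. The key step converts each unbiased variance back to a sum of squared deviations, M2 = variance × (count − 1), and then adds a cross term for the difference between the two means.

```python
def combine_stats(prev, new):
    """Merge two [mean, variance, count] triples (unbiased variance).

    Sketch of the Chan-Golub-LeVeque pairwise update; an assumption about
    how PREVIOUS could be combined, not IDL's implementation.
    """
    mean_a, var_a, n_a = prev
    mean_b, var_b, n_b = new
    if n_a == 0:  # as documented: a zero previous count starts a new calculation
        return [float(v) for v in new]
    if n_b == 0:  # nothing new to add
        return [float(v) for v in prev]
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # Recover each sum of squared deviations (M2) from its unbiased variance.
    m2_a = var_a * (n_a - 1) if n_a > 1 else 0.0
    m2_b = var_b * (n_b - 1) if n_b > 1 else 0.0
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return [mean, m2 / (n - 1), float(n)]


stats_a = [3.0, 2.5, 5.0]  # stats of [1, 2, 3, 4, 5], computed by hand
stats_b = [8.0, 2.5, 5.0]  # stats of [6, 7, 8, 9, 10], computed by hand
print(combine_stats(stats_a, stats_b))
print(combine_stats([0, 0, 0], stats_a))  # zero count: start fresh
```

Merging the two halves gives a mean of 5.5, a variance of 82.5/9, and a count of 10, the same values as computing the statistics of the combined array in one call.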

### Thread Pool Keywords

This routine is written to make use of IDL’s *thread pool*, which can increase execution speed on systems with multiple CPUs. The values stored in the !CPU system variable control whether IDL uses the thread pool for a given computation. In addition, you can use the thread pool keywords TPOOL_MAX_ELTS, TPOOL_MIN_ELTS, and TPOOL_NOTHREAD to override the defaults established by !CPU for a single invocation of this routine. See Thread Pool Keywords for details.

When computing the statistics for a large number of values, the result depends on the order in which the values are combined. Because the thread pool combines values in a different order, you may obtain a different, but equally correct, result than the one produced by the standard non-threaded implementation. This happens because RUNNING_STATS uses floating-point arithmetic, and the mantissa of a floating-point value has a fixed number of significant digits. For more information on floating-point numbers, see Accuracy and Floating Point Operations.
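The order dependence can be reproduced outside IDL. The Python sketch below simulates a thread pool by reducing four contiguous chunks separately and merging the partial results in reverse order, then compares the outcome with a single serial pass. The chunking scheme, the merge order, and the helper names `partial_stats` and `merge` are assumptions for illustration; this section does not describe how IDL's thread pool partitions the data.

```python
import random


def partial_stats(values):
    """Welford pass returning (count, mean, M2), M2 = sum of squared deviations."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        n += 1
        d = x - mean
        mean += d / n
        m2 += d * (x - mean)
    return n, mean, m2


def merge(a, b):
    """Combine two (count, mean, M2) partial results into one."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    if n_a == 0:
        return b
    if n_b == 0:
        return a
    n = n_a + n_b
    d = mean_b - mean_a
    return n, mean_a + d * n_b / n, m2_a + m2_b + d * d * n_a * n_b / n


rng = random.Random(12345)  # fixed seed so the run is repeatable
data = [rng.random() for _ in range(100_000)]

# "Non-threaded": one pass over the data in its natural order.
serial = partial_stats(data)

# "Threaded" (simulated): four chunks reduced independently, then merged
# in reverse order. Reverse order is only an example ordering.
size = len(data) // 4
partials = [partial_stats(data[i * size:(i + 1) * size]) for i in range(4)]
merged = (0, 0.0, 0.0)
for p in reversed(partials):
    merged = merge(merged, p)

var_serial = serial[2] / (serial[0] - 1)
var_merged = merged[2] / (merged[0] - 1)
print("mean difference:    ", serial[1] - merged[1])
print("variance difference:", var_serial - var_merged)
```

The two results agree to many significant digits; whether their final bits differ depends on the data and on the merge order, which is exactly the effect described above.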

## Additional Examples

`IDL> A = [1, 2, 3, 4, 5]`

`IDL> B = [6, 7, 8, 9, 10]`

`; First compute the stats for the combined array:`

`IDL> RUNNING_STATS([A, B])`

`; 5.5000000000000000 9.1666666666666661 10.000000000000000`

`; Now compute the stats of just A, then combine them with B using the PREVIOUS keyword:`

`IDL> Stats_of_A = RUNNING_STATS(A)`

`IDL> Stats_of_A`

`; 3.0000000000000000 2.5000000000000000 5.0000000000000000`

`IDL> RUNNING_STATS(B, PREVIOUS = Stats_of_A)`

`; 5.5000000000000000 9.1666666666666661 10.000000000000000`

`; Use the PREVIOUS keyword to compute stats on 1e9 values, 1e7 at a time, without holding them all in memory:`

`IDL> stats = [0, 0, 0]`

`IDL> for i=0,99 do stats = RUNNING_STATS(randomu(seed, 1e7), PREVIOUS=stats)`

`IDL> stats`

IDL prints:

0.50000184809149439 0.083333037727096743 1000000000.0000000

## Version History

| Version | Change |
| --- | --- |
| 8.8.3 | Introduced |