The CREATEBOXPLOTDATA function takes raw input data and generates the values needed as input to the BOXPLOT function.

CREATEBOXPLOTDATA returns five values for each input dataset: the minimum (excluding possible outliers), the lower quartile, the median, the upper quartile, and the maximum (excluding possible outliers). If neither outliers nor suspected outliers are calculated, the returned minimum and maximum are the minimum and maximum of the dataset. If outliers or suspected outliers are calculated, the returned minimum and maximum are, respectively, the smallest and largest values in the dataset that are not included in the outlier or suspected outlier data.
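The rule for choosing the minimum and maximum can be illustrated with a short Python sketch. It is not the IDL routine: the helper `whisker_ends` and its `low_cut`/`high_cut` parameters are hypothetical, and values outside those cutoffs simply stand in for whatever CREATEBOXPLOTDATA classifies as outliers or suspected outliers.

```python
def whisker_ends(values, low_cut=None, high_cut=None):
    """Return (minimum, maximum) of the values that are not flagged.

    Hypothetical helper: values below low_cut or above high_cut play the
    role of outlier data. With no cutoffs, this is min/max of the dataset.
    """
    kept = [v for v in values
            if (low_cut is None or v >= low_cut)
            and (high_cut is None or v <= high_cut)]
    return min(kept), max(kept)


data = [10.2, 10.7, 10.9, 11.6, 25.0]
print(whisker_ends(data))              # (10.2, 25.0): nothing excluded
print(whisker_ends(data, 8.0, 15.0))   # (10.2, 11.6): 25.0 is excluded
```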

## Examples

Copy and paste the following code to the IDL command line to create data for use in BOXPLOT.

```idl
; Create an array of average speeds on two different bicycles
; to use in CREATEBOXPLOTDATA
bike_mph = [ $
  [12.2, 16.2], $
  [12.1, 16.4], $
  [10.7, 16.9], $
  [11.6, 17.0], $
  [10.2, 16.5], $
  [10.9, 16.1], $
  [11.8, 17.1], $
  [10.9, 16.0], $
  [12.4, 16.8], $
  [12.9, 16.9], $
  [13.1, 17.5], $
  [13.0, 17.4]]

; Create the data and store mean and outlier values
bpd = CREATEBOXPLOTDATA(bike_mph, MEAN_VALUES=means, OUTLIER_VALUES=outliers)

; Display the data created to be used in BOXPLOT
PRINT, bpd
```

IDL displays:

```
10.200000 16.000000
11.250000 16.450001
12.150000 16.900000
12.950000 17.250000
13.100000 17.500000
```

```idl
; Display the mean values created
PRINT, means
```

IDL displays:

```
11.8167 16.7333
```

```idl
; Display the outlier values created
PRINT, outliers
```

IDL displays:

```
!NULL
```

## Syntax

*result* = CREATEBOXPLOTDATA(*data* [, IGNORE=*value*] [, CI_VALUES=*variable*] [, FINITE_INDICES=*variable*] [, MEAN_VALUES=*variable*] [, OUTLIER_VALUES=*variable*] [, SUSPECTED_OUTLIER_VALUES=*variable*])

## Return Value

An *M* x 5 element array, where *M* is the number of distinct datasets. For each dataset, the five values are in the order BOXPLOT expects: minimum, lower quartile, median, upper quartile, and maximum.

## Arguments

### Data

The input data used to generate the results for the BOXPLOT function. The input data may be any of the following:

- an *M* x *N* array of data, where *M* is the number of distinct datasets and *N* is the number of data values in each dataset.
- an *M*-element array of pointers. Each pointer denotes one dataset.
- an *M*-element list. Each list element denotes one dataset.
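To make the *M* x *N* orientation concrete, the following Python sketch mirrors the `bike_mph` example. IDL lists the column dimension first, so the array typed in the example is 2 x 12: each column is one bicycle's dataset (*M* = 2) holding 12 values (*N* = 12). The helper `datasets_from_rows` is hypothetical; it transposes the rows as typed into one list per dataset, and the per-dataset means it then computes agree with the MEAN_VALUES output shown in the example.

```python
def datasets_from_rows(rows):
    """Transpose rows as typed (one row per observation, one column per
    dataset) into one list per dataset. Hypothetical helper."""
    return [list(column) for column in zip(*rows)]


bike_mph = [
    [12.2, 16.2], [12.1, 16.4], [10.7, 16.9], [11.6, 17.0],
    [10.2, 16.5], [10.9, 16.1], [11.8, 17.1], [10.9, 16.0],
    [12.4, 16.8], [12.9, 16.9], [13.1, 17.5], [13.0, 17.4],
]

datasets = datasets_from_rows(bike_mph)
means = [sum(d) / len(d) for d in datasets]

print(len(datasets), [len(d) for d in datasets])   # 2 [12, 12]
print(["%.4f" % m for m in means])                 # ['11.8167', '16.7333']
```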

## Keywords

### IGNORE

Set this keyword to a value that should be treated as bad data and ignored when calculating the results.
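The effect described for IGNORE can be sketched in Python as a filter applied before any statistics are computed. The helper `drop_ignored`, the sentinel value -999.0, and the use of exact equality to match the ignored value are all illustrative assumptions, not details of the IDL routine.

```python
def drop_ignored(values, ignore=None):
    """Return the values with every element equal to `ignore` removed.

    Sketch only: exact equality is assumed, so the ignored value should be
    one that is stored exactly (such as an integer-valued sentinel).
    """
    if ignore is None:
        return list(values)
    return [v for v in values if v != ignore]


readings = [12.2, -999.0, 12.1, 10.7, -999.0]
clean = drop_ignored(readings, ignore=-999.0)
print(clean)                                 # [12.2, 12.1, 10.7]
print(round(sum(clean) / len(clean), 4))     # 11.6667, sentinel has no effect
```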

### CI_VALUES

Set this keyword to a named variable to return an *M*-element array containing the confidence interval value around the median for each box. BOXPLOT uses these values for the boundaries of the notch, if a notch is displayed.

### FINITE_INDICES

Set this keyword to a named variable to return a vector containing the indices of the datasets for which valid data was returned. This is useful when your data contains NaN or infinite values, because some datasets may then be unusable for creating the five values that BOXPLOT needs.
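A Python sketch of the kind of check FINITE_INDICES reports: it returns the indices of datasets that still contain finite values once NaN and infinite entries are set aside. The helper name and the criterion (at least one finite value remains) are assumptions; this section does not spell out the exact validity test the routine applies.

```python
import math


def finite_indices(datasets):
    """Indices of datasets that still contain at least one finite value.

    Assumed criterion: a dataset counts as usable if any value remains
    after NaN and +/-Inf are discarded.
    """
    return [i for i, d in enumerate(datasets)
            if any(math.isfinite(v) for v in d)]


data = [
    [1.0, 2.0, float("nan")],      # usable: two finite values
    [float("nan"), float("inf")],  # unusable: nothing finite
    [3.0, 4.0, 5.0],               # usable
]
print(finite_indices(data))        # [0, 2]
```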

### MEAN_VALUES

Set this keyword to a named variable to return an *M*-element vector containing the mean values for each input dataset.

### OUTLIER_VALUES

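Set this keyword to a named variable to return a 2 x *N*-element array containing any outliers from each input dataset, where *N* is the total number of outliers. For each value [*x, y*], *x* represents the box location and *y* represents the value at that location. If no outliers are found, the variable is set to !NULL, as in the example above.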

### SUSPECTED_OUTLIER_VALUES

Set this keyword to a named variable to return a 2 x *N*-element array containing any suspected outliers from each input dataset, where *N* is the total number of suspected outliers. For each value [*x, y*], *x* represents the box location and *y* represents the value at that location.
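The [*x, y*] layout shared by OUTLIER_VALUES and SUSPECTED_OUTLIER_VALUES can be sketched in Python as a list of pairs, one per flagged value. The helper `outlier_pairs`, the per-dataset (low, high) cutoffs, and the use of the 0-based dataset index as the box location *x* are all assumptions made for illustration.

```python
def outlier_pairs(datasets, cutoffs):
    """Collect [x, y] pairs for values outside each box's cutoffs.

    x is the box location (assumed here to be the 0-based dataset index)
    and y is the flagged value. `cutoffs` holds one hypothetical
    (low, high) pair per dataset; values strictly outside it are reported.
    """
    pairs = []
    for x, (values, (low, high)) in enumerate(zip(datasets, cutoffs)):
        for y in values:
            if y < low or y > high:
                pairs.append([x, y])
    return pairs  # N pairs, which IDL would hold as a 2 x N array


datasets = [[10.2, 11.0, 30.0], [16.0, 2.0, 16.5]]
cutoffs = [(5.0, 20.0), (10.0, 25.0)]
print(outlier_pairs(datasets, cutoffs))   # [[0, 30.0], [1, 2.0]]
```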

## Notes on CREATEBOXPLOTDATA Calculations

Values returned by CREATEBOXPLOTDATA are calculated using the conventions outlined below. Given an ordered dataset with *n* elements:

- The position of the lower quartile (Q1) is 0.25 * (*n* + 1). If this position is not an integer, the weighted average of the values at the two surrounding positions is used.
- The position of the median is 0.50 * (*n* + 1). If this position is not an integer, the weighted average of the values at the two surrounding positions is used.
- The position of the upper quartile (Q3) is 0.75 * (*n* + 1). If this position is not an integer, the weighted average of the values at the two surrounding positions is used.
- The interquartile range (IQR) is Q3 - Q1.
- Suspected outliers are values that fall within either of the following ranges: [Q1 - 3 * IQR, Q1 - 1.5 * IQR] or [Q3 + 1.5 * IQR, Q3 + 3 * IQR].
- Outliers are values that are either less than Q1 - 3 * IQR or greater than Q3 + 3 * IQR.
- The confidence interval (CI) value is calculated as (1.57 * IQR) / sqrt(*n*). When this value is passed into BOXPLOT, a notch is displayed around the median, spanning median +/- CI.
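The conventions above can be collected into one Python sketch. It reads "weighted average" as linear interpolation between the values at the two surrounding 1-based positions, clamps positions that fall outside the data, and treats the 1.5 * IQR boundaries as exclusive (with closed ranges, a dataset whose IQR is 0 would flag its own quartile as a suspected outlier). The helper names are hypothetical. Applied to the first bicycle dataset from the example, the sketch gives Q1 = 10.9, median = 11.95, and Q3 = 12.775, which differ from the 11.25, 12.15, and 12.95 that IDL displays above, so treat it as a reading of the documented conventions rather than a re-implementation of CREATEBOXPLOTDATA.

```python
import math


def value_at_position(sorted_vals, pos):
    """Value at 1-based position `pos` in an ascending list.

    A fractional position is resolved by linear interpolation (a weighted
    average of the two neighbouring values). Positions before the first or
    after the last element are clamped to the ends (an assumption).
    """
    n = len(sorted_vals)
    if pos <= 1:
        return sorted_vals[0]
    if pos >= n:
        return sorted_vals[-1]
    lo = math.floor(pos)          # 1-based index of the lower neighbour
    frac = pos - lo               # weight given to the upper neighbour
    lower, upper = sorted_vals[lo - 1], sorted_vals[lo]
    return lower + frac * (upper - lower)


def box_stats(values):
    """Five-number summary, outlier lists, and notch CI per the notes above."""
    s = sorted(values)
    n = len(s)
    q1 = value_at_position(s, 0.25 * (n + 1))
    median = value_at_position(s, 0.50 * (n + 1))
    q3 = value_at_position(s, 0.75 * (n + 1))
    iqr = q3 - q1

    # Outliers: more than 3 * IQR beyond the quartiles.
    outliers = [v for v in s if v < q1 - 3 * iqr or v > q3 + 3 * iqr]
    # Suspected outliers: between 1.5 * IQR (exclusive) and 3 * IQR (inclusive).
    suspected = [v for v in s
                 if q1 - 3 * iqr <= v < q1 - 1.5 * iqr
                 or q3 + 1.5 * iqr < v <= q3 + 3 * iqr]

    # Minimum and maximum come from the values that were not flagged.
    flagged = set(outliers) | set(suspected)
    kept = [v for v in s if v not in flagged]

    return {
        "min": kept[0], "q1": q1, "median": median, "q3": q3, "max": kept[-1],
        "iqr": iqr,
        "ci": 1.57 * iqr / math.sqrt(n),   # notch spans median +/- ci
        "outliers": outliers,
        "suspected": suspected,
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 100])
print(stats["q1"], stats["median"], stats["q3"])   # 3.0 6.0 9.0
print(stats["suspected"], stats["outliers"])       # [20] [100]
print(stats["min"], stats["max"])                  # 1 9
```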

## Version History

| Version | History |
| --- | --- |
| 8.2.2 | Introduced |