The CREATEBOXPLOTDATA function takes a raw input dataset and generates the data needed as input to the BOXPLOT function.
CREATEBOXPLOTDATA returns five values for each input dataset: the minimum (excluding possible outliers), the lower quartile, the median, the upper quartile, and the maximum (excluding possible outliers). If neither outliers nor suspected outliers are calculated, the returned minimum and maximum are the minimum and maximum of the dataset. If outliers or suspected outliers are calculated, the returned minimum and maximum are the smallest and largest values (respectively) in the dataset that are not included in the outlier or suspected outlier data.
Examples
Copy and paste the following code to the IDL command line to create data for use in BOXPLOT.
bike_mph = [ $
[12.2, 16.2], $
[12.1, 16.4], $
[10.7, 16.9], $
[11.6, 17.0], $
[10.2, 16.5], $
[10.9, 16.1], $
[11.8, 17.1], $
[10.9, 16.0], $
[12.4, 16.8], $
[12.9, 16.9], $
[13.1, 17.5], $
[13.0, 17.4]]
bpd = CREATEBOXPLOTDATA(bike_mph, MEAN_VALUES=means, OUTLIER_VALUES=outliers)
PRINT, bpd
IDL displays:
10.200000 16.000000
11.250000 16.450001
12.150000 16.900000
12.950000 17.250000
13.100000 17.500000
PRINT, means
IDL displays:
11.8167 16.7333
PRINT, outliers
IDL displays:
!NULL
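To continue the example, the result can be handed to BOXPLOT. This is a minimal sketch, assuming BOXPLOT accepts the returned array as its values argument and has a MEAN_VALUES keyword that mirrors the one used here; check the BOXPLOT documentation for the exact keywords.

```idl
; Assumes bpd and means from the example above are still defined.
bp = BOXPLOT(bpd, MEAN_VALUES=means)
```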
Syntax
result = CREATEBOXPLOTDATA(data [, IGNORE=value] [, CI_VALUES=variable] [, FINITE_INDICES=variable] [, MEAN_VALUES=variable] [, OUTLIER_VALUES=variable] [, SUSPECTED_OUTLIER_VALUES=variable])
Return Value
An M x 5 element array, where M is the number of distinct datasets containing data for use in BOXPLOT. IDL creates data in the order needed for BOXPLOT: minimum, lower quartile, median, upper quartile, and maximum values.
Arguments
Data
The input data used to generate the results for the BOXPLOT function. The input data may be any of the following:
- an M x N array of data where M is the number of distinct datasets and N is the number of data values for each dataset.
- an M-element array of pointers. Each pointer denotes one dataset.
- an M-element list. Each list element denotes one dataset.
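The three input forms listed above can be sketched as follows. The variable names and sample values are hypothetical and chosen only for illustration; each call should describe the same two datasets.

```idl
; Hypothetical sample data: two datasets of six values each.
set_a = [12.2, 12.1, 10.7, 11.6, 10.2, 10.9]
set_b = [16.2, 16.4, 16.9, 17.0, 16.5, 16.1]

; 1. A 2 x 6 array: 2 datasets with 6 values each.
;    [[set_a], [set_b]] is 6 x 2, so TRANSPOSE puts the datasets first.
arr = TRANSPOSE([[set_a], [set_b]])
bpd_arr = CREATEBOXPLOTDATA(arr)

; 2. An array of pointers, one pointer per dataset.
ptrs = [PTR_NEW(set_a), PTR_NEW(set_b)]
bpd_ptr = CREATEBOXPLOTDATA(ptrs)
PTR_FREE, ptrs                      ; release the heap variables when done

; 3. A list, one element per dataset.
bpd_list = CREATEBOXPLOTDATA(LIST(set_a, set_b))
```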
Keywords
IGNORE
Set this keyword to a value to treat as bad data and to ignore when calculating the results.
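As an illustration, suppose missing readings are recorded with a sentinel value. The data and the choice of -999.0 as the sentinel are hypothetical.

```idl
; Hypothetical data in which -999.0 marks a missing reading.
data = LIST([12.2, 12.1, -999.0, 11.6, 10.2, 10.9, 11.8], $
            [16.2, 16.4, 16.9, -999.0, 16.5, 16.1, 17.1])

; Exclude the sentinel from the calculations.
bpd = CREATEBOXPLOTDATA(data, IGNORE=-999.0)
PRINT, bpd
```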
CI_VALUES
Set this keyword to a named variable to return an M-element array containing the confidence interval value around the median for each box. These values are used for the boundaries of the notch in the BOXPLOT function, if displayed.
FINITE_INDICES
Set this keyword to a named variable to return a vector containing the indices of the datasets for which valid data was returned. This is useful when your data contains NaNs or infinite values, in which case some datasets cannot be used to create the five values needed for BOXPLOT.
MEAN_VALUES
Set this keyword to a named variable to return an M-element vector containing the mean values for each input dataset.
OUTLIER_VALUES
Set this keyword to a named variable to return a 2 x N array containing any outliers from each input dataset, where N is the number of outliers found. For each pair [x, y], x represents the box location and y represents the value at that location.
SUSPECTED_OUTLIER_VALUES
Set this keyword to a named variable to return a 2 x N array containing any suspected outliers from each input dataset, where N is the number of suspected outliers found. For each pair [x, y], x represents the box location and y represents the value at that location.
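The following sketch requests every output keyword in one call and uses HELP to inspect what comes back. The data values and variable names are hypothetical; a keyword with nothing to report (for example, no outliers) may come back as !NULL, as in the Examples section.

```idl
; Hypothetical data: the second dataset contains one unusually large value.
data = LIST([12.2, 12.1, 10.7, 11.6, 10.2, 10.9, 11.8, 10.9], $
            [16.2, 16.4, 16.9, 17.0, 16.5, 16.1, 17.1, 45.0])

bpd = CREATEBOXPLOTDATA(data, CI_VALUES=ci, FINITE_INDICES=good, $
                        MEAN_VALUES=means, OUTLIER_VALUES=outliers, $
                        SUSPECTED_OUTLIER_VALUES=suspected)

; HELP reports the type and dimensions of each returned variable.
HELP, bpd, ci, good, means, outliers, suspected
```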
Notes on CREATEBOXPLOTDATA Calculations
Values returned by CREATEBOXPLOTDATA are calculated using the conventions outlined below. Given an ordered dataset with n elements:
- The position of the lower quartile (Q1) is 0.25 * (n + 1). If this position is not an integer then the weighted average of the two surrounding positions is used.
- The position of the median is 0.50 * (n + 1). If this position is not an integer then the weighted average of the two surrounding positions is used.
- The position of the upper quartile (Q3) is 0.75 * (n + 1). If this position is not an integer then the weighted average of the two surrounding positions is used.
- The Interquartile Range (IQR) = Q3 - Q1.
- Suspected outliers are those values that fall within the following ranges: [Q1 - 3 * IQR, Q1 - 1.5 * IQR] or [Q3 + 1.5 * IQR, Q3 + 3 * IQR].
- Outliers are those values that are either less than Q1 - 3 * IQR or greater than Q3 + 3 * IQR.
- The Confidence Interval (CI) value is calculated as (1.57 * IQR) / sqrt(n). When this value is passed into BOXPLOT, a notch will be displayed around the median using the values of median +/- CI.
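The conventions above can be sketched by hand for a single dataset. This follows only the written rules in this section; it is not the routine's internal code, so CREATEBOXPLOTDATA may not return exactly these values for the same input. The dataset and all variable names are hypothetical.

```idl
; Hypothetical dataset; 30.0 is deliberately far from the rest.
data = [4.0, 8.0, 1.0, 30.0, 6.0, 2.0, 12.0, 5.0, 3.0, 7.0]
n = N_ELEMENTS(data)
s = data[SORT(data)]                      ; ordered dataset

; 1-based positions of Q1, the median, and Q3
pos = [0.25, 0.50, 0.75] * (n + 1)
lo = (FLOOR(pos) > 1) < n                 ; element at or below each position
hi = (lo + 1) < n                         ; element above each position
frac = pos - FLOOR(pos)                   ; weight of the upper element
q = (1.0 - frac) * s[lo - 1] + frac * s[hi - 1]   ; here q = [2.75, 5.5, 9.0]

iqr = q[2] - q[0]
ci = (1.57 * iqr) / SQRT(n)               ; notch spans q[1] - ci to q[1] + ci

; Indices into s; WHERE returns -1 when nothing matches.
suspected = WHERE(((s GE q[0] - 3*iqr) AND (s LE q[0] - 1.5*iqr)) OR $
                  ((s GE q[2] + 1.5*iqr) AND (s LE q[2] + 3*iqr)), n_suspected)
outliers = WHERE((s LT q[0] - 3*iqr) OR (s GT q[2] + 3*iqr), n_outliers)

; Minimum and maximum of the values that are neither outliers nor suspected
inside = WHERE((s GT q[0] - 1.5*iqr) AND (s LT q[2] + 1.5*iqr))

PRINT, q, iqr, ci
PRINT, n_suspected, n_outliers            ; 30.0 is the only outlier here
PRINT, MIN(s[inside]), MAX(s[inside])
```

For this dataset the positions are 2.75, 5.5, and 8.25, so each quartile is a weighted average of two neighboring values, and 30.0 exceeds Q3 + 3 * IQR = 27.75.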
Version History
See Also
BOXPLOT