PCA
Name
PCA
Purpose
Carry out a Principal Components Analysis (Karhunen-Loeve Transform)
Explanation
Results can be directed to the screen, a file, or output variables
See notes below for comparison with the intrinsic IDL function PCOMP.
Calling Sequence
PCA, data, eigenval, eigenvect, percentages, proj_obj, proj_atr,
[MATRIX =, TEXTOUT = ,/COVARIANCE, /SSQ, /SILENT ]
Input Parameters
data - 2-d data matrix, data(i,j) contains the jth attribute value
for the ith object in the sample. If N_OBJ is the total
number of objects (rows) in the sample, and N_ATTRIB is the
total number of attributes (columns) then data should be
dimensioned N_OBJ x N_ATTRIB.
Optional Input Keyword Parameters
/COVARIANCE - if this keyword is set, then the PCA will be carried out
on the covariance matrix (rare), the default is to use the
correlation matrix
/SILENT - If this keyword is set, then no output is printed
/SSQ - if this keyword is set, then the PCA will be carried out on
the sums-of-squares & cross-products matrix (rare)
TEXTOUT - Controls print output device, defaults to !TEXTOUT
textout=1 TERMINAL using /more option
textout=2 TERMINAL without /more option
textout=3 <program>.prt
textout=4 laser.tmp
textout=5 user must open file
textout = filename (default extension of .prt)
Optional Output Parameters
eigenval - N_ATTRIB element vector containing the sorted eigenvalues
eigenvect - N_ATTRIB x N_ATTRIB matrix containing the corresponding
eigenvectors
percentages - N_ATTRIB element vector containing the cumulative percentage
variances associated with the principal components
proj_obj - N_OBJ by N_ATTRIB matrix containing the projections of the
objects on the principal components
proj_atr - N_ATTRIB by N_ATTRIB matrix containing the projections of
the attributes on the principal components
Optional Output Keyword Parameter
MATRIX = analysed matrix, either the covariance matrix if /COVARIANCE
is set, the "sum of squares and cross-products" matrix if
/SSQ is set, or the (by default) correlation matrix. Matrix
will have dimensions N_ATTRIB x N_ATTRIB
Notes
This procedure performs Principal Components Analysis (Karhunen-Loeve
Transform) according to the method described in "Multivariate Data
Analysis" by Murtagh & Heck [Reidel : Dordrecht 1987], pp. 33-48.
See http://astro.u-strasbg.fr/~fmurtagh/mda-sw/
Keywords /COVARIANCE and /SSQ are mutually exclusive.
The printout contains only (at most) the first seven principal
eigenvectors. However, the output variable EIGENVECT contains
all the eigenvectors.
Different authors scale the covariance matrix in different ways.
The eigenvalues output by PCA may have to be scaled by 1/N_OBJ or
1/(N_OBJ-1) to agree with other calculations when /COVAR is set.
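As a rough illustration of the scaling remark above, the following sketch
prints the /COVARIANCE eigenvalues under both common normalizations. The
random test array and the variable name n_obj are assumptions made for the
example; which scaling matches another package has to be checked against
that package.

```idl
; Sketch only: synthetic data, 20 objects by 4 attributes (an assumption)
ASTROLIB                               ; adds !TEXTOUT and !TEXTUNIT to the session
data = RANDOMN(seed, 20, 4)            ; N_OBJ x N_ATTRIB
PCA, data, eigenval, /COVARIANCE, /SILENT
n_obj = (SIZE(data, /DIMENSIONS))[0]   ; number of objects
PRINT, 'Scaled by 1/N_OBJ:     ', eigenval / n_obj
PRINT, 'Scaled by 1/(N_OBJ-1): ', eigenval / (n_obj - 1)
```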
PCA uses the non-standard system variables !TEXTOUT and !TEXTUNIT.
These can be added to one's session using the procedure ASTROLIB.
The intrinsic IDL function PCOMP duplicates
most of the functionality of PCA, but uses different conventions and
normalizations. Note the following:
(1) PCOMP requires a N_ATTRIB x N_OBJ input array; this is the transpose
of what PCA expects
(2) PCA uses standardized variables for the correlation matrix: the input
vectors are set to a mean of zero and variance of one and divided by
sqrt(n); use the /STANDARDIZE keyword to PCOMP for a direct comparison.
(3) PCA (unlike PCOMP) normalizes the eigenvectors by the square root
of the eigenvalues.
(4) PCA returns cumulative percentages; the VARIANCES keyword of PCOMP
returns the variance in each variable
(5) PCOMP scales the eigenvalues by 1/(N_OBJ-1) when the covariance matrix
is used.
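The differences listed above can be explored with a side-by-side call such
as the sketch below. The synthetic data and the variable names (proj,
pc_coef, pc_eval, pc_var) are assumptions for the example. No numerical
agreement is claimed: the eigenvector normalization and the units of the
variance output differ between the two routines, so the printed values
should be compared with points (2)-(5) in mind.

```idl
; Sketch only: compare PCA with the intrinsic PCOMP on the same data
ASTROLIB                               ; adds !TEXTOUT and !TEXTUNIT to the session
data = RANDOMN(seed, 20, 4)            ; N_OBJ x N_ATTRIB, as PCA expects
PCA, data, eigenval, eigenvect, percentages, /SILENT
; PCOMP expects N_ATTRIB x N_OBJ, so pass the transpose (point 1)
proj = PCOMP(TRANSPOSE(data), /STANDARDIZE, COEFFICIENTS=pc_coef, $
             EIGENVALUES=pc_eval, VARIANCES=pc_var)
PRINT, 'PCA eigenvalues:    ', eigenval
PRINT, 'PCOMP eigenvalues:  ', pc_eval
PRINT, 'PCA cumulative %:   ', percentages
; PCOMP returns per-component values, so accumulate them (point 4)
PRINT, 'PCOMP cumulative:   ', TOTAL(pc_var, /CUMULATIVE)
```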
Example
Perform a PCA on the covariance matrix of a data matrix, DATA,
and write the results to a file (T is an abbreviation of TEXTOUT)
IDL> PCA, data, /COVAR, t = 'pca.dat'
Perform a PCA analysis on the correlation matrix. Suppress all
printing, and save the eigenvectors and eigenvalues in output variables
IDL> PCA, data, eigenval, eigenvect, /SILENT
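Because PERCENTAGES is cumulative, the share of each individual component
has to be recovered by differencing. A minimal sketch, assuming synthetic
data and the hypothetical variable names per_comp and corr:

```idl
; Sketch only: per-component percentages and the analysed matrix
ASTROLIB                               ; adds !TEXTOUT and !TEXTUNIT to the session
data = RANDOMN(seed, 20, 4)            ; N_OBJ x N_ATTRIB
PCA, data, eigenval, eigenvect, percentages, MATRIX=corr, /SILENT
n = N_ELEMENTS(percentages)
per_comp = percentages                 ; first element is already non-cumulative
IF n GT 1 THEN per_comp[1:*] = percentages[1:*] - percentages[0:n-2]
PRINT, per_comp
HELP, corr                             ; correlation matrix, N_ATTRIB x N_ATTRIB
```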
Procedures Called
TEXTOPEN, TEXTCLOSE
Revision History
Immanuel Freedman (after Murtagh F. and Heck A.). December 1993
Wayne Landsman, modified I/O December 1993
Fix MATRIX output, remove GOTO statements W. Landsman August 1998
Changed some index variable to type LONG W. Landsman March 2000
Fix error in computation of proj_atr, see Jan 1990 fix in
http://astro.u-strasbg.fr/~fmurtagh/mda-sw/pca.f W. Landsman Feb 2008