LINMIX_ERR
Name
LINMIX_ERR
Purpose
Bayesian approach to linear regression with errors in both X and Y
Explanation
Perform linear regression of y on x when there are measurement
errors in both variables. the regression assumes :
ETA = ALPHA + BETA * XI + EPSILON
X = XI + XERR
Y = ETA + YERR
Here, (ALPHA, BETA) are the regression coefficients, EPSILON is the
intrinsic random scatter about the regression, XERR is the
measurement error in X, and YERR is the measurement error in
Y. EPSILON is assumed to be normally-distributed with mean zero and
variance SIGSQR. XERR and YERR are assumed to be
normally-distributed with means equal to zero, variances XSIG^2 and
YSIG^2, respectively, and covariance XYCOV. The distribution of XI
is modelled as a mixture of normals, with group proportions PI,
mean MU, and variance TAUSQR. Bayesian inference is employed, and
a structure containing random draws from the posterior is
returned. Convergence of the MCMC to the posterior is monitored
using the potential scale reduction factor (RHAT, Gelman et
al.2004). In general, when RHAT < 1.1 then approximate convergence
is reached.
Simple non-detections on y may also be included.
Calling Sequence
LINMIX_ERR, X, Y, POST, XSIG=, YSIG=, XYCOV=, DELTA=, NGAUSS=, /SILENT,
/METRO, MINITER= , MAXITER=
Inputs
X - THE OBSERVED INDEPENDENT VARIABLE. THIS SHOULD BE AN
NX-ELEMENT VECTOR.
Y - THE OBSERVED DEPENDENT VARIABLE. THIS SHOULD BE AN NX-ELEMENT
VECTOR.
Optional Inputs
XSIG - THE 1-SIGMA MEASUREMENT ERRORS IN X, AN NX-ELEMENT VECTOR.
YSIG - THE 1-SIGMA MEASUREMENT ERRORS IN Y, AN NX-ELEMENT VECTOR.
XYCOV - THE COVARIANCE BETWEEN THE MEASUREMENT ERRORS IN X AND Y,
AND NX-ELEMENT VECTOR.
DELTA - AN NX-ELEMENT VECTOR INDICATING WHETHER A DATA POINT IS
CENSORED OR NOT. IF DELTA[i] = 1, THEN THE SOURCE IS
DETECTED, ELSE IF DELTA[i] = 0 THE SOURCE IS NOT DETECTED
AND Y[i] SHOULD BE AN UPPER LIMIT ON Y[i]. NOTE THAT IF
THERE ARE CENSORED DATA POINTS, THEN THE
MAXIMUM-LIKELIHOOD ESTIMATE (THETA) IS NOT VALID. THE
DEFAULT IS TO ASSUME ALL DATA POINTS ARE DETECTED, IE,
DELTA = REPLICATE(1, NX).
METRO - IF METRO = 1, THEN THE MARKOV CHAINS WILL BE CREATED USING
THE METROPOLIS-HASTINGS ALGORITHM INSTEAD OF THE GIBBS
SAMPLER. THIS CAN HELP THE CHAINS CONVERGE WHEN THE SAMPLE
SIZE IS SMALL OR IF THE MEASUREMENT ERRORS DOMINATE THE
SCATTER IN X AND Y.
SILENT - SUPPRESS TEXT OUTPUT.
MINITER - MINIMUM NUMBER OF ITERATIONS PERFORMED BY THE GIBBS
SAMPLER OR METROPOLIS-HASTINGS ALGORITHM. IN GENERAL,
MINITER = 5000 SHOULD BE SUFFICIENT FOR CONVERGENCE. THE
DEFAULT IS MINITER = 5000. THE MCMC IS STOPPED AFTER
RHAT < 1.1 FOR ALL PARAMETERS OF INTEREST, AND THE
NUMBER OF ITERATIONS PERFORMED IS GREATER THAN MINITER.
MAXITER - THE MAXIMUM NUMBER OF ITERATIONS PERFORMED BY THE
MCMC. THE DEFAULT IS 1D5. THE MCMC IS STOPPED
AUTOMATICALLY AFTER MAXITER ITERATIONS.
NGAUSS - THE NUMBER OF GAUSSIANS TO USE IN THE MIXTURE
MODELLING. THE DEFAULT IS 3. IF NGAUSS = 1, THEN THE
PRIOR ON (MU, TAUSQR) IS ASSUMED TO BE UNIFORM.
Output
POST - A STRUCTURE CONTAINING THE RESULTS FROM THE MCMC. EACH
ELEMENT OF POST IS A DRAW FROM THE POSTERIOR DISTRIBUTION
FOR EACH OF THE PARAMETERS.
ALPHA - THE CONSTANT IN THE REGRESSION.
BETA - THE SLOPE OF THE REGRESSION.
SIGSQR - THE VARIANCE OF THE INTRINSIC SCATTER.
PI - THE GAUSSIAN WEIGHTS FOR THE MIXTURE MODEL.
MU - THE GAUSSIAN MEANS FOR THE MIXTURE MODEL.
TAUSQR - THE GAUSSIAN VARIANCES FOR THE MIXTURE MODEL.
MU0 - THE HYPERPARAMETER GIVING THE MEAN VALUE OF THE
GAUSSIAN PRIOR ON MU. ONLY INCLUDED IF NGAUSS >
1.
USQR - THE HYPERPARAMETER DESCRIBING FOR THE PRIOR
VARIANCE OF THE INDIVIDUAL GAUSSIAN CENTROIDS
ABOUT MU0. ONLY INCLUDED IF NGAUSS > 1.
WSQR - THE HYPERPARAMETER DESCRIBING THE `TYPICAL' SCALE
FOR THE PRIOR ON (TAUSQR,USQR). ONLY INCLUDED IF
NGAUSS > 1.
XIMEAN - THE MEAN OF THE DISTRIBUTION FOR THE
INDEPENDENT VARIABLE, XI.
XISIG - THE STANDARD DEVIATION OF THE DISTRIBUTION FOR
THE INDEPENDENT VARIABLE, XI.
CORR - THE LINEAR CORRELATION COEFFICIENT BETWEEN THE
DEPENDENT AND INDEPENDENT VARIABLES, XI AND ETA.
Called Routines
RANDOMCHI, MRANDOMN, RANDOMGAM, RANDOMDIR, MULTINOM
References
Carroll, R.J., Roeder, K., & Wasserman, L., 1999, Flexible
Parametric Measurement Error Models, Biometrics, 55, 44
Kelly, B.C., 2007, Some Aspects of Measurement Error in
Linear Regression of Astronomical Data, The Astrophysical
Journal, 665, 1489 (arXiv:0705.2774)
Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B., 2004,
Bayesian Data Analysis, Chapman & Hall/CRC
Revision History
AUTHOR : BRANDON C. KELLY, STEWARD OBS., JULY 2006
- MODIFIED PRIOR ON MU0 TO BE UNIFORM OVER [MIN(X),MAX(X)] AND
PRIOR ON USQR TO BE UNIFORM OVER [0, 1.5 * VARIANCE(X)]. THIS
TENDS TO GIVE BETTER RESULTS WITH FEWER GAUSSIANS. (B.KELLY, MAY
2007)
- FIXED BUG SO THE ITERATION COUNT RESET AFTER THE BURNIN STAGE
WHEN SILENT = 1 (B. KELLY, JUNE 2009)
- FIXED BUG WHEN UPDATING MU VIA THE METROPOLIS-HASTING
UPDATE. PREVIOUS VERSIONS DID NO INDEX MUHAT, SO ONLY MUHAT[0]
WAS USED IN THE PROPOSAL DISTRIBUTION. THANKS TO AMY BENDER FOR
POINTING THIS OUT. (B. KELLY, DEC 2011)