MPFITPEAK Name
MPFITPEAK
Author
Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770
craigm@lheamail.gsfc.nasa.gov
UPDATED VERSIONs can be found on my WEB PAGE:
http://cow.physics.wisc.edu/~craigm/idl/idl.html
Purpose
Fit a gaussian, lorentzian or Moffat model to data
Major Topics
Curve and Surface Fitting
Calling Sequence
yfit = MPFITPEAK(X, Y, A, NTERMS=nterms, ...)
Description
MPFITPEAK fits a gaussian, lorentzian or Moffat model using the
non-linear least squares fitter MPFIT. MPFITPEAK is meant to be a
drop-in replacement for IDL's GAUSSFIT function (and requires
MPFIT and MPFITFUN).
The choice of the fitting function is determined by the keywords
GAUSSIAN, LORENTZIAN and MOFFAT. By default the gaussian model
function is used. [ The Moffat function is a modified Lorentzian
with variable power law index. (Moffat, A. F. J. 1969, Astronomy &
Astrophysics, v. 3, p. 455-461) ]
The functional form of the baseline is determined by NTERMS and
the function to be fitted. NTERMS represents the total number of
parameters, A, to be fitted. The functional forms and the
meanings of the parameters are described in this table:
GAUSSIAN# Lorentzian# Moffat#
Model A[0]*exp(-0.5*u^2) A[0]/(u^2 + 1) A[0]/(u^2 + 1)^A[3]
A[0] Peak Value Peak Value Peak Value
A[1] Peak Centroid Peak Centroid Peak Centroid
A[2] Gaussian Sigma HWHM% HWHM%
A[3] + A[3] * + A[3] * Moffat Index
A[4] + A[4]*x * + A[4]*x * + A[4] *
A[5] + A[5]*x *
Notes: # u = (x - A[1])/A[2]
% Half-width at half maximum
* Optional depending on NTERMS
By default the initial starting values for the parameters A are
estimated from the data. However, explicit starting values can be
supplied using the ESTIMATES keyword. Also, error or weighting
values can optionally be provided; otherwise the fit is
unweighted.
MPFITPEAK fits the peak value of the curve. The area under a
gaussian peak is A[0]*A[2]*SQRT(2*!DPI); the area under a
lorentzian peak is A[0]*A[2]*!DPI.
Data values of NaN or Infinity for "Y", "ERROR" or "WEIGHTS" will
be ignored as missing data if the NAN keyword is set. Otherwise,
they may cause the fitting loop to halt with an error message.
Note that the fit will still halt if the model function, or its
derivatives, produces infinite or NaN values, or if an "X" value is
missing. Restrictions
If no starting parameter ESTIMATES are provided, then MPFITPEAK
attempts to estimate them from the data. This is not a perfect
science; however, the author believes that the technique
implemented here is more robust than the one used in IDL's
GAUSSFIT. The author has tested cases of strong peaks, noisy
peaks and broad peaks, all with success.
Users should be aware that if the baseline term contains a strong
linear component then the automatic estimation may fail. For
automatic estimation to work the peak amplitude should dominate
over the the maximum baseline.
Compatibility
This function is designed to work with IDL 5.0 or greater.
Because TIED parameters rely on the EXECUTE() function, they cannot
be used with the free version of the IDL Virtual Machine.
Inputs
X - Array of independent variable values, whose values should
monotonically increase.
Y - Array of "measured" dependent variable values. Y should have
the same data type and dimension as X.
NOTE: the following special cases apply:
* if Y is NaN or Infinite, and the NAN keyword is
set, then the corresponding data point is ignored
Outputs
A - Upon return, an array of NTERMS best fit parameter values.
See the table above for the meanings of each parameter
element.
Returns
Returns the best fitting model function.
Keywords
** NOTE ** Additional keywords such as PARINFO, BESTNORM, and
STATUS are accepted by MPFITPEAK but not documented
here. Please see the documentation for MPFIT for the
description of these advanced options.
AUTODERIV - Set to 1 to have MPFIT compute the derivatives numerically.
Default is 0 - derivatives are computed analytically, which is
generally faster. (Prior to Jan 2011, the default was 1)
CHISQ - the value of the summed squared residuals for the
returned parameter values.
DOF - number of degrees of freedom, computed as
DOF = N_ELEMENTS(DEVIATES) - NFREE
Note that this doesn't account for pegged parameters (see
NPEGGED).
ERROR - upon input, the measured 1-sigma uncertainties in the "Y"
values. If no ERROR or WEIGHTS are given, then the fit is
unweighted.
NOTE: the following special cases apply:
* if ERROR is zero, then the corresponding data point
is ignored
* if ERROR is NaN or Infinite, and the NAN keyword is
set, then the corresponding data point is ignored
* if ERROR is negative, then the absolute value of
ERROR is used.
ESTIMATES - Array of starting values for each parameter of the
model. The number of parameters should at least be
three (four for Moffat), and if less than NTERMS, will
be extended with zeroes. If ESTIMATES is not set,
then the starting values are estimated from the data
directly, before fitting. (This also means that
PARINFO.VALUES is ignored.)
Default: not set - parameter values are estimated from data.
GAUSSIAN - if set, fit a gaussian model function. The Default.
LORENTZIAN - if set, fit a lorentzian model function.
MOFFAT - if set, fit a Moffat model function.
MEASURE_ERRORS - synonym for ERRORS, for consistency with built-in
IDL fitting routines.
NAN - ignore infinite or NaN values in the Y, ERR or WEIGHTS
parameters. These values will be treated as missing data.
However, the fit will still halt with an error condition if
the model function becomes infinite, or if X has missing
values.
NEGATIVE / POSITIVE - if set, and ESTIMATES is not provided, then
MPFITPEAK will assume that a
negative/positive peak is present.
Default: determined automatically
NFREE - the number of free parameters in the fit. This includes
parameters which are not FIXED and not TIED, but it does
include parameters which are pegged at LIMITS.
NO_FIT - if set, then return only the initial estimates without
fitting. Useful to find out what the estimates the
automatic guessing algorithm produced. If NO_FIT is set,
then SIGMA and CHISQ values are not produced. The
routine returns, NAN, and STATUS=5.
NTERMS - An integer describing the number of fitting terms.
NTERMS must have a minimum value, but can optionally be
larger depending on the desired baseline.
For gaussian and lorentzian models, NTERMS must be three
(zero baseline), four (constant baseline) or five (linear
baseline). Default: 4
For the Moffat model, NTERMS must be four (zero
baseline), five (constant baseline), or six (linear
baseline). Default: 5
PERROR - upon return, the 1-sigma uncertainties of the parameter
values A. These values are only meaningful if the ERRORS
or WEIGHTS keywords are specified properly.
If the fit is unweighted (i.e. no errors were given, or
the weights were uniformly set to unity), then PERROR
will probably not represent the true parameter
uncertainties.
*If* you can assume that the true reduced chi-squared
value is unity -- meaning that the fit is implicitly
assumed to be of good quality -- then the estimated
parameter uncertainties can be computed by scaling PERROR
by the measured chi-squared value.
DOF = N_ELEMENTS(X) - N_ELEMENTS(PARMS) ; deg of freedom
PCERROR = PERROR * SQRT(BESTNORM / DOF) ; scaled uncertainties
QUIET - if set then diagnostic fitting messages are suppressed.
Default: QUIET=1 (i.e., no diagnostics)
SIGMA - synonym for PERROR (1-sigma parameter uncertainties), for
compatibility with GAUSSFIT. Do not confuse this with the
Gaussian "sigma" width parameter.
WEIGHTS - Array of weights to be used in calculating the
chi-squared value. If WEIGHTS is specified then the ERROR
keyword is ignored. The chi-squared value is computed
as follows:
CHISQ = TOTAL( (Y-MYFUNCT(X,P))^2 * ABS(WEIGHTS) )
Here are common values of WEIGHTS:
1D/ERR^2 - Normal weighting (ERR is the measurement error)
1D/Y - Poisson weighting (counting statistics)
1D - Unweighted
The ERROR keyword takes precedence over any WEIGHTS
keyword values. If no ERROR or WEIGHTS are given, then
the fit is unweighted.
NOTE: the following special cases apply:
* if WEIGHTS is zero, then the corresponding data point
is ignored
* if WEIGHTS is NaN or Infinite, and the NAN keyword is
set, then the corresponding data point is ignored
* if WEIGHTS is negative, then the absolute value of
WEIGHTS is used.
YERROR - upon return, the root-mean-square variance of the
residuals.
Example
; First, generate some synthetic data
npts = 200
x = dindgen(npts) * 0.1 - 10. ; Independent variable
yi = gauss1(x, [2.2D, 1.4, 3000.]) + 1000 ; "Ideal" Y variable
y = yi + randomn(seed, npts) * sqrt(1000. + yi); Measured, w/ noise
sy = sqrt(1000.D + y) ; Poisson errors
; Now fit a Gaussian to see how well we can recover the original
yfit = mpfitpeak(x, y, a, error=sy)
print, p
Generates a synthetic data set with a Gaussian peak, and Poisson
statistical uncertainty. Then the same function is fitted to the
data. References
MINPACK-1, Jorge More', available from netlib (www.netlib.org).
"Optimization Software Guide," Jorge More' and Stephen Wright,
SIAM, *Frontiers in Applied Mathematics*, Number 14.
Modification History
New algorithm for estimating starting values, CM, 31 Oct 1999
Documented, 02 Nov 1999
Small documentation fixes, 02 Nov 1999
Slight correction to calculation of dx, CM, 02 Nov 1999
Documented PERROR for unweighted fits, 03 Nov 1999, CM
Copying permission terms have been liberalized, 26 Mar 2000, CM
Change requirements on # elements in X and Y, 20 Jul 2000, CM
(thanks to David Schlegel <schlegel@astro.princeton.edu>)
Added documentation on area under curve, 29 Aug 2000, CM
Added POSITIVE and NEGATIVE keywords, 17 Nov 2000, CM
Added reference to Moffat paper, 10 Jan 2001, CM
Added usage message, 26 Jul 2001, CM
Documentation clarification, 05 Sep 2001, CM
Make more consistent with comparable IDL routines, 30 Jun 2003, CM
Assumption of sorted data was removed, CM, 06 Sep 2003, CM
Add some defensive code against divide by zero, 30 Nov 2005, CM
Add some defensive code against all Y values equal to each other,
17 Apr 2005, CM
Convert to IDL 5 array syntax (!), 16 Jul 2006, CM
Move STRICTARR compile option inside each function/procedure, 9 Oct 2006
Add COMPATIBILITY section, CM, 13 Dec 2007
Missed some old IDL 4 () array syntax, now corrected, 13 Jun 2008
Slightly more error checking for pathalogical case, CM, 11 Nov 2008
Clarify documentation regarding what happens when ESTIMATES is not
set, CM, 14 Dec 2008
Add the NAN keyword, document how NAN, WEIGHTS and ERROR interact,
CM, 30 Mar 2009
Correct one case of old IDL 4 () array syntax (thanks to I. Urra),
CM, 25 Jan 2010
Improve performance by analytic derivative computation, added AUTODERIV
keyword, W. Landsman, 2011-01-21
Move estimation code to its own function; allow the user to compute
only the estimate and return immediately without fitting,
C. Markwardt, 2011-07-12