[SciPy-User] Return sigmas from curve_fit

Gökhan Sever gokhansever at gmail.com
Wed Oct 17 12:12:36 EDT 2012


On Tue, Oct 16, 2012 at 7:55 PM, Matt Newville
<newville at cars.uchicago.edu> wrote:

> Hi Gökhan,
>
> On Tue, Oct 16, 2012 at 2:32 PM, Gökhan Sever <gokhansever at gmail.com>
> wrote:
> > Thanks Travis.
> >
> > Indeed it doesn't; I missed that part of the curve_fit return is the
> > variance of the estimate(s).
> >
> > I am comparing IDL's curvefit and SciPy's curve_fit, and got slightly
> > different results for the same data using the same fit function. I guess
> > IDL's result is slightly off when its default tol value of 1.0e-3 is
> > used, compared to SciPy's default ftol of 1.49012e-08.
>
> IDL's curvefit procedure is an implementation of the
> Levenberg-Marquardt algorithm in IDL, but not using or calling
> MINPACK-1 of Garbow, Hillstrom, and More.  I believe curvefit.pro is
> based on the implementation from Bevington's book, though it may have
> evolved over the years from that.  Scipy's leastsq() (and so
> curve_fit()) calls MINPACK-1, which is generally more stable and
> robust.  Thus, it is perfectly reasonable for there to be small
> differences in results even though the two naively claim to be using
> the same algorithm.  Certainly having tolerances as high as 1.e-3 can
> also affect the results.
>
> The IDL mpfit.pro
> (http://cow.physics.wisc.edu/~craigm/idl/fitting.html) procedure is a
> bit closer to scipy.optimize.leastsq(), being a translation of MINPACK
> to IDL, and might be a better comparison.  The mpfit.pro adds a few
> bells and whistles (parameter bounds) which MINPACK-1 does not have,
> but ignoring these gives (in my experience) very similar results to
> MINPACK-1.  In general,  scipy.optimize.leastsq() will be faster, and
> also has the good fortune of not being IDL.
>
> Daπid warned that the estimated uncertainties from the covariance
> matrix (which is automatically returned from leastsq() and
> curve_fit()) assume that the errors are normally distributed, and
> that this assumption is questionable.  This is equally true for all
> the implementations in question, so I doubt it would explain any
> differences you see.  I would also implore all to recognize that even
> if the estimated uncertainties from scipy.optimize.leastsq() are
> imperfect, they are far better than having none at all.  It is very
> easy for the armchair analyst to claim that errors are not normally
> distributed (especially when the problem at hand hasn't even been
> identified!), and a bit harder for the practicing analyst to show that
> the errors are significantly non-normal in practice.  Even when this
> claim is borne out, it does not necessarily imply that the simple
> estimate of uncertainties is significantly wrong.  Rather, it implies
> that one statistic (the standard deviation) is not the whole story.
>
> You may find lmfit-py (http://newville.github.com/lmfit-py/) useful.
> This is built on top of scipy.optimize.leastsq(), and adds the ability
> to apply bounds, fix parameters, and place algebraic constraints
> between parameters (IMHO in a manner easier and more robust than IDL's
> mpfit.pro, and more Pythonic).  It also provides functions to walk
> through the parameter space to more explicitly determine confidence
> intervals, and to test whether errors are non-normally distributed
> (see http://newville.github.com/lmfit-py/confidence.html).  The
> example there shows a case with clearly non-normal distribution of
> uncertainties, and a very skewed parameter space.  The automatically
> estimated uncertainties are off by 20% or less from the more carefully
> (and slowly) found values.  I would say that's a pretty good
> endorsement of the automatic estimate from the covariance matrix, but
> it's good to be able to check this out yourself even if only to show
> that the automatic estimate is close enough and a more careful
> analysis doesn't change your conclusions.
>
> Hope that helps,
>
> --Matt Newville

Thanks for the detailed input Matt.

Linked at the end of this message you can find the script that I use to
estimate the fit parameters by fitting a function of the form N = C*s^k.
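
In case it is useful to other readers, here is a minimal sketch of that
kind of fit with curve_fit (the data and starting values below are made
up for illustration; the real data are in the linked script):

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical supersaturation (s) and droplet concentration (N) values,
# standing in for the measured data in the linked script.
s = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
N = np.array([150., 260., 350., 420., 480.])

def power_law(s, C, k):
    # The fit form described above: N = C * s**k
    return C * s**k

# popt holds the best-fit (C, k); pcov is their covariance matrix.
popt, pcov = curve_fit(power_law, s, N, p0=(100.0, 0.5))

# One-sigma parameter uncertainties are the square roots of the
# diagonal elements of the covariance matrix.
perr = np.sqrt(np.diag(pcov))
print("C = %.1f +/- %.1f" % (popt[0], perr[0]))
print("k = %.2f +/- %.2f" % (popt[1], perr[1]))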

Perhaps the function I use is not the best one for constructing the fit,
or the lack of data at lower supersaturations results in the curve kinking
at lower values. Other than this point, the best-fit and sigma estimates
that I get from curve_fit are useful to me. I am not worrying too much
about the error distribution at this point, since most of the data points
lie within +/- 1 sigma of the fit.
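
One way to make that check concrete is to propagate the parameter
covariance to a rough one-sigma band around the fitted curve by
first-order error propagation. This is my own quick construction using
the same made-up data as above, not something taken from the linked
script:

import numpy as np
from scipy.optimize import curve_fit

s = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
N = np.array([150., 260., 350., 420., 480.])

def power_law(s, C, k):
    return C * s**k

popt, pcov = curve_fit(power_law, s, N, p0=(100.0, 0.5))
C, k = popt

# Jacobian of the model with respect to the parameters at each point:
# dN/dC = s**k,  dN/dk = C * s**k * log(s)
J = np.column_stack([s**k, C * s**k * np.log(s)])

# var(N_fit) ~= diag(J pcov J^T), a first-order (delta-method) estimate
band = np.sqrt(np.sum(np.dot(J, pcov) * J, axis=1))

resid = N - power_law(s, *popt)
inside = np.abs(resid) <= band
print("points within +/- 1 sigma of the fit: %d of %d"
      % (inside.sum(), len(N)))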

If there is any interest I can provide a similar script written in IDL to
see the differences.

Thanks again.

http://atmos.uwyo.edu/~gsever/data/test/curvefit_test.py
http://atmos.uwyo.edu/~gsever/data/test/curvefit_test.png

-- 
Gökhan