[SciPy-User] Revisit Unexpected covariance matrix from scipy.optimize.curve_fit

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Feb 22 13:43:30 EST 2013


On Fri, Feb 22, 2013 at 1:30 PM, Christoph Deil
<deil.christoph at googlemail.com> wrote:
> (I posted an hour ago, but my message apparently didn't get through, but
> also didn't bounce … trying again.)
>
> Hi Tom,
>
> I think I understood what scipy.optimize.curve_fit is doing thanks to
> Josef's comments in the previous thread you mentioned.
>
> It scales the covariance matrix (i.e. the inverse of the HESSE matrix of
> second derivatives of the chi2 fit statistic) from the fit by a factor.
> If you want to get the covariance matrix that e.g. sherpa
> (http://cxc.harvard.edu/sherpa/index.html) or MINUIT
> (https://github.com/iminuit/iminuit) would return and that e.g. physicists /
> astronomers expect, you can re-compute this factor and divide the scaled
> covariance matrix from curve_fit like this:
>
> # Define inputs: model, x, y, p0 and sigma
>>
> # Compute best-fit values and "scaled covariance matrix" with curve_fit
> popt, pcov = curve_fit(model, x, y, p0=p0, sigma=sigma)
>
> # Undo the scale factor to get the "real covariance matrix", which was
> automatically applied by curve_fit
> chi = (y - model(x, *popt)) / sigma
> chi2 = (chi ** 2).sum()
> dof = len(x) - len(popt)
> factor = (chi2 / dof)
> pcov /= factor
>
> (I haven't checked if this is equivalent to the code Eric gave above.)
>
> If I understand correctly, the motivation for multiplying pcov by this
> factor in curve_fit is that this was written by an economist, and there it

I have never seen it outside of economics either. I doubt you find it
in any (general) statistics package.

However, when curve_fit got introduced and until the first round of
this discussion started, I didn't know that there is any field that
does not standardize by actual noise sigma.

> is common to interpret the sigma not as errors on measurement points, but as
> relative weights between measurement points, with no meaning for the
> absolute scale of these weights.
> I think applying this scale factor to pcov is equivalent to re-scaling the
> sigmas to achieve a chi2 / dof of 1, which is a reasonable thing to do if
> you say the sigmas are only relative weights.
>
> Does this make sense?
>
> How about adding an option "scale_pcov" to curve_fit whether this scale
> factor should be applied.
> To keep backward compatibility the default would have to be scale_pcov=True.
> The advantage of this option would be that the issue is explained in the
> docstring and that people with "sigma=error" instead of "sigma=relative
> weight" can easily get what they want.
> Should I make a pull request?

I'm +1, both for clarification and for users who really have "sigma"
and not "weights"

(aside:
in statsmodels we also have generalized least squares which allows for
error correlation, in this case (capital) Sigma is the covariance
matrix of the error, but it is still scaled by the (lower case) sigma
which is estimated from the residuals as in curve_fit)

Josef

>
> Christoph
>
>
> On Feb 22, 2013, at 7:27 PM, Tom Aldcroft <aldcroft at head.cfa.harvard.edu>
> wrote:
>
> The 0.11 documentation on curve_fit says:
>
> sigma : None or N-length sequence
>  If not None, it represents the standard-deviation of ydata. This
> vector, if given, will be used as weights in the least-squares
> problem.
>
> It unambiguously states that sigma is the standard deviation of ydata,
> which is different from a relative weight.  That gives a clear
> implication that increasing the standard deviation of all the data
> points by some factor should change the parameter covariance.
>
> Can the doc string be changed to say "If not None, it represents the
> relative weighting of data points."  I would say that most astronomers
> and physicists are likely to be tripped up by this otherwise because
> "sigma" has such a well-understood meaning.
>
> - Tom
>
>
> On Fri, Feb 22, 2013 at 1:03 PM, Pierre Barbier de Reuille
> <pierre at barbierdereuille.net> wrote:
>
> I don't know about this result I must say, do you have a reference?
>
> But intuitively, perr shouldn't change when applying the same weight to all
> the values.
>
> --
> Barbier de Reuille Pierre
>
>
> On 22 February 2013 17:12, Moore, Eric (NIH/NIDDK) [F] <eric.moore2 at nih.gov>
> wrote:
>
>
> -----Original Message-----
> From: Tom Aldcroft [mailto:aldcroft at head.cfa.harvard.edu]
> Sent: Friday, February 22, 2013 10:42 AM
> To: SciPy Users List
> Subject: [SciPy-User] Revisit Unexpected covariance matrix from
> scipy.optimize.curve_fit
>
> In Aug 2011 there was a thread [Unexpected covariance matrix from
> scipy.optimize.curve_fit](http://mail.scipy.org/pipermail/scipy-
> user/2011-August/030412.html)
> where Christoph Deil reported that "scipy.optimize.curve_fit returns
> parameter errors that don't scale with sigma, the standard deviation
> of ydata, as I expected."  Today I independently came to the same
> conclusion.
>
> This thread generated some discussion but seemingly no agreement that
> the covariance output of `curve_fit` is not what would be expected.  I
> think the discussion wasn't as focused as possible because the example
> was too complicated.  With that I provide here about the simplest
> possible example, which is fitting a constant to a constant dataset,
> aka computing the mean and error on the mean.  Since we know the
> answers we can compare the output of `curve_fit`.
>
> To illustrate things more easily I put the examples into an IPython
> notebook which is available at:
>
>  http://nbviewer.ipython.org/5014170/
>
> This was run using scipy 0.11.0 by the way.  Any further discussion on
> this topic to come to an understanding of the covariance output from
> `curve_fit` would be appreciated.
>
> Thanks,
> Tom
> _______________________________________________
>
>
> chi2 = np.sum(((yn-const(x, *popt))/sigma)**2)
> perr = np.sqrt(np.diag(pcov)/(chi2/(x.shape[0]-1)))
>
> Perr is then the actual error in the fit parameter. No?
>
> -Eric
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>



More information about the SciPy-User mailing list