[Numpy-discussion] Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit

Tue May 29 16:54:52 EDT 2018

On Tue, May 29, 2018 at 2:21 PM, Jonathan Tammo Siebert <
jotasi_numpy_scipy at posteo.de> wrote:

> On Tue, 2018-05-29 at 10:47 -0400, josef.pktd at gmail.com wrote:
> > On Tue, May 29, 2018 at 9:14 AM, Jonathan Tammo Siebert <
> > jotasi_numpy_scipy at posteo.de> wrote:
> >
> > > Hi,
> > >
> > > I hope this is the appropriate place to ask something like
> > > this, otherwise please let me know (or feel free to ignore
> > > this). Also I hope that I do not misunderstood something or
> > > did some silly mistake. If so, please let me know as well!
> > >
> > > TLDR:
> > > When scaling the covariance matrix based on the residuals,
> > > scipy.optimize.curve_fit uses a factor of chisq(popt)/(M-N)
> > > (with M=number of point, N=number of parameters) and
> > > numpy.polyfit uses chisq(popt)/(M-N-2). I am wondering which
> > > is correct.
> > >
> > > I am somewhat confused about different results I am getting
> > > for the covariance matrix of a simple linear fit, when
> > > comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I
> > > am aware, that `curve_fit` solves the more general non-linear
> > > problem numerically, while `polyfit` finds an analytical
> > > solution to the linear problem. However, both converge to the
> > > same solution, so I suspect that this difference is not
> > > important here. The difference, I am curious about is not in
> > > the returned parameters but in the estimate of the
> > > corresponding covariance matrix. As I understand, there are
> > > two different ways to estimate it, based either on the
> > > absolute values of the provided uncertainties or by
> > > interpreting those only as weights and then scaling the
> > > matrix to produce an appropriate reduced chisq. To that end,
> > > curve_fit has the parameter:
> > > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize
> > > .cur
> > > ve_fit.html:
> > > "absolute_sigma : bool, optional
> > > If True, sigma is used in an absolute sense and the
> > > estimated parameter covariance pcov reflects these absolute
> > > values.
> > > If False, only the relative magnitudes of the sigma values
> > > matter. The returned parameter covariance matrix pcov is
> > > based on scaling sigma by a constant factor. This constant
> > > is set by demanding that the reduced chisq for the optimal
> > > parameters popt when using the scaled sigma equals unity. In
> > > other words, sigma is scaled to match the sample variance of
> > > the residuals after the fit. Mathematically,
> > > pcov(absolute_sigma=False) = pcov(absolute_sigma=True) *
> > > chisq(popt)/(M-N)"
> > > https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.
> > > html
> > > on the other hand, does not say anything about how the
> > > covariance matrix is estimated. To my understanding, its
> > > default should correspond to `absolute_sigma=False` for
> > > `curve_fit`. As `polyfit` has a weight parameter instead of
> > > an uncertainty parameter, I guess the difference in default
> > > behavior is not that surprising.
> > > However, even when specifying `absolute_sigma=False`,
> > > `curve_fit` and `polyfit` produce different covariance
> > > matrices as the applied scaling factors are
> > > chisq(popt)/(M-N-2) for `polyfit`
> > > (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2
> > > ec82
> > > 21348b/numpy/lib/polynomial.py#L598-L605)
> > > and chisq(popt)/(M-N) for `curve_fit`
> > > (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994
> > > 137a
> > > 3ccd5b/scipy/optimize/minpack.py#L781-L782).
> > > The argument given in a comment to the scaling `polyfit` is:
> > > "Some literature ignores the extra -2.0 factor in the
> > > denominator, but it is included here because the covariance
> > > of Multivariate Student-T (which is implied by a Bayesian
> > > uncertainty analysis) includes it. Plus, it gives a slightly
> > > more conservative estimate of uncertainty.",
> > > but honestly, in a quick search, I was not able to find any
> > > literature not ignoring the extra "factor". But obviously, I
> > > could very well be misunderstanding something.
> > > Nonetheless, as `curve_fit` ignores it as well, I was
> > > wondering whether those two shouldn't give consistent
> > > results and if so, which would be the correct solution.
> > >
> >
> >
> > I've never seen the -2 in any literature, and there is no reference
> > in the
> > code comment.
> > (I would remove it as a bug-fix. Even if there is some Bayesian
> > interpretation, it is not what users would expect.)
>
> That would be my preferred fix as well. If there aren't any objections,
> I'll open a corresponding issue and PR.
>
> > There was a similar thread in 2013
> > https://mail.scipy.org/pipermail/numpy-discussion/2013-
> > February/065664.html
>
> Thanks for the link. I must've somehow missed that earlier discussion.
> Would it be appropriate to also add an additional parameter along the
> lines of curve_fit's `absolute_sigma` with default `False` to keep it
> consistent? I already felt that something like this was missing for
> cases where proper standard errors are known for the data points and it
> was apparently already discussed in 2013. As far as I can see, the main
> reason against that is the fact that `polyfit` accepts `w`
> (weights->`1/sigma`) as a parameter and not `sigma`, which would make
> the documentation somewhat less intuitive than in the case of
> `curve_fit`.
>

It would work with "absolute_weights"

After the long discussions for scaling in curve_fit, I think it's fine to
add it.

asides:

I still don't really understand why users would want it for the covariance
of the parameter estimates.
However, I also added an option to statsmodels OLS and WLS to keep the
scale fixed instead of using the estimated scale.

There is a reason that polyfit might have different, larger standard errors
than curve_fit, if we assume that curve_fit has a correctly specified mean
function and polyfit is just a low order approximation to an arbitrary
non-linear function. That is polyfit combines a functional approximation
error with the stochastic error from the random observations. (which is
also ignore if scale is fixed.)
(But I never tried to figure out those non- or semi-parametric
approximation details.)

aside further away: In Poisson regression, we go the opposite way and use
an estimated residual scale instead of a fixed scale = 1 so we can correct
the standard errors for the parameters when there is over-dispersion. (I
just ran a Monte Carlo example where a hypothesis test under Poisson
assumption is very wrong because of the over dispersion. I.e. if the chi2
is far away from 1, then the standard errors for the parameters are
"useless" if scale is assumed to be 1.)

Josef

>
>
> Jonathan
>
> >
> > Josef
> >
> >
> > >
> > >
> > > Best,
> > >
> > > Jonathan
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180529/e80253fe/attachment.html>