[Numpy-discussion] Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit

Tue May 29 09:14:29 EDT 2018

Hi,

I hope this is the appropriate place to ask something like
this, otherwise please let me know (or feel free to ignore
this). Also I hope that I do not misunderstood something or
did some silly mistake. If so, please let me know as well!

TLDR:
When scaling the covariance matrix based on the residuals,
scipy.optimize.curve_fit uses a factor of chisq(popt)/(M-N)
(with M=number of point, N=number of parameters) and
numpy.polyfit uses chisq(popt)/(M-N-2). I am wondering which
is correct.

I am somewhat confused about different results I am getting
for the covariance matrix of a simple linear fit, when
comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I
am aware, that `curve_fit` solves the more general non-linear
problem numerically, while `polyfit` finds an analytical
solution to the linear problem. However, both converge to the
same solution, so I suspect that this difference is not
important here. The difference, I am curious about is not in
the returned parameters but in the estimate of the
corresponding covariance matrix. As I understand, there are
two different ways to estimate it, based either on the
absolute values of the provided uncertainties or by
interpreting those only as weights and then scaling the
matrix to produce an appropriate reduced chisq. To that end,
curve_fit has the parameter:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.cur
ve_fit.html:
"absolute_sigma : bool, optional
If True, sigma is used in an absolute sense and the
estimated parameter covariance pcov reflects these absolute
values.
If False, only the relative magnitudes of the sigma values
matter. The returned parameter covariance matrix pcov is
based on scaling sigma by a constant factor. This constant
is set by demanding that the reduced chisq for the optimal
parameters popt when using the scaled sigma equals unity. In
other words, sigma is scaled to match the sample variance of
the residuals after the fit. Mathematically,
pcov(absolute_sigma=False) = pcov(absolute_sigma=True) *
chisq(popt)/(M-N)"
https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html
on the other hand, does not say anything about how the
covariance matrix is estimated. To my understanding, its
default should correspond to `absolute_sigma=False` for
`curve_fit`. As `polyfit` has a weight parameter instead of
an uncertainty parameter, I guess the difference in default
behavior is not that surprising.
However, even when specifying `absolute_sigma=False`,
`curve_fit` and `polyfit` produce different covariance
matrices as the applied scaling factors are
chisq(popt)/(M-N-2) for `polyfit`
(https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2ec82
21348b/numpy/lib/polynomial.py#L598-L605)
and chisq(popt)/(M-N) for `curve_fit`
(https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994137a
3ccd5b/scipy/optimize/minpack.py#L781-L782).
The argument given in a comment to the scaling `polyfit` is:
"Some literature ignores the extra -2.0 factor in the
denominator, but it is included here because the covariance
of Multivariate Student-T (which is implied by a Bayesian
uncertainty analysis) includes it. Plus, it gives a slightly
more conservative estimate of uncertainty.",
but honestly, in a quick search, I was not able to find any
literature not ignoring the extra "factor". But obviously, I
could very well be misunderstanding something.
Nonetheless, as `curve_fit` ignores it as well, I was
wondering whether those two shouldn't give consistent
results and if so, which would be the correct solution.

Best, 

Jonathan