[SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Sep 1 21:31:22 EDT 2011


On Thu, Sep 1, 2011 at 5:29 PM, Christoph Deil
<deil.christoph at googlemail.com> wrote:
>
> On Sep 1, 2011, at 9:45 PM, Matt Newville wrote:
>
>> Christoph,
>>
>> I ran some tests with the scaling of the covariance matrix from
>> scipy.optimize.leastsq.  Using lmfit-py, and scipy.optimize.leastsq, I
>> did fits with a sample function of a Gaussian + line + simulated noise
>> (random.normal(scale=0.023)).  The "data" had 201 points, and the fit
>> had 5 variables.  I ran fits with covariance scaling on and off, and
>> "data sigma" = 0.1, 0.2, and 0.5.  The full results are below, and the
>> code to run this is
>>
>>   https://github.com/newville/lmfit-py/blob/master/tests/test_covar.py
>>
>> You'll need the latest git version of lmfit-py for this to be able to
>> turn on/off the covariance scaling.
>>
>> As you can see from the results below, by scaling the covariance, the
>> estimated uncertainties in the parameters are independent of sigma,
>> the estimated uncertainty in the data.  In many cases this is a
>> fabulously convenient feature.
>>
>> As expected, when the covariance matrix is not scaled by sum_sqr /
>> nfree, the estimated uncertainties in the variables depend greatly on
>> the value of sigma.  For the "correct" sigma of 0.23 in this one test
>> case, the scaled and unscaled values are very close:
>>
>>   amp_g = 20.99201 +/- 0.05953 (unscaled) vs +/- 0.05423 (scaled) [True=21.0]
>>   cen_g =  8.09857 +/- 0.00435 (unscaled) vs +/- 0.00396 (scaled) [True= 8.1]
>>
>> and so on.  The scaled uncertainties appear to be about 10% larger
>> than the unscaled ones.  Since this was just one set of data (ie, one
>> set of simulated noise), I'm not sure whether this difference is
>> significant or important to you.  Interestingly, these two cases are
>> in better agreement than comparing sigma=0.20 and sigma=0.23 for the
>> unscaled covariance matrix.
>>
>> For myself, I much prefer having estimated uncertainties that may be
>> off by 10% to being expected to know the uncertainties in the data
>> to 10%.  But then, I work in a field where we have lots of data and
>> systematic errors in collection and processing swamp any simple
>> estimate of the noise in the data.
>>
>> As a result, I think the scaling should stay the way it is.
>
> I think there are use cases for scaling and for not scaling.
> Adding an option to scipy.optimize.curve_fit, as you did for lmfit, is a nice solution.
> Returning the scaling factor s = chi2 / ndf (or chi2 and ndf independently) would be another option to let the user decide what she wants.
>
> The differences you see in your example are small because your chi2 / ndf is approximately one, so your scaling factor is approximately one.
> If the model doesn't represent the data well, then chi2 / ndf is larger than one and the differences in estimated parameter errors become larger.
>
> IMO if the user does give sigmas to curve_fit, it means that she has reason to believe that these are the errors on the data points,
> and thus the default should be not to apply the scaling factor in that case.
> On the other hand, at the moment the scaling factor is always applied, so making a keyword option
> scale_covariance=True the default would preserve backwards compatibility.
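
Just to make the suggestion of returning s = chi2 / ndf concrete: the
scaling factor is easy to recover from the current curve_fit output, so
a user can already undo the scaling by hand. A rough, untested sketch
(f, xdata, ydata, sig and p0 are placeholders for the model, the data,
the data errors and the starting values):

    import numpy as np
    from scipy.optimize import curve_fit

    popt, pcov = curve_fit(f, xdata, ydata, p0=p0, sigma=sig)

    # scaling factor s = chi2 / ndf from the weighted residuals
    resid = (ydata - f(xdata, *popt)) / sig
    chi2 = np.sum(resid ** 2)
    ndf = len(ydata) - len(popt)
    s = chi2 / ndf

    # pcov comes back already multiplied by s, so dividing by s
    # recovers the covariance for the case that sig are absolute errors
    pcov_unscaled = pcov / s
    perr_unscaled = np.sqrt(np.diag(pcov_unscaled))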

I think it depends on the expectation of the user, and whether the
options and descriptions are confusing.

From what I have seen, I would always expect that the estimator also
estimates the error variance. Given noise, functional approximation
error, and left-out variables (the environment), I have never seen a
case where the error variance is specified in advance (although there
may be different ways to estimate the scale, as in robust models, and
there are two-stage estimators).

Before curve_fit there was quite a bit of confusion on the mailing
list, if I remember correctly, about whether cov_x from leastsq has to
be scaled or not. If there are no weights/sigma, then cov_x is not the
covariance matrix of the parameter estimates (unless y is scaled such
that the error variance is 1).
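
Roughly, the relation looks like this (an untested sketch; model, p0,
x, y and sig are placeholders, sig being the per-point standard
deviations if the user has them):

    import numpy as np
    from scipy.optimize import leastsq

    def resid(p, x, y, sig):
        # weighted residuals; model is the function being fit
        return (y - model(x, p)) / sig

    p_fit, cov_x, infodict, mesg, ier = leastsq(
        resid, p0, args=(x, y, sig), full_output=True)

    # cov_x is only (J'J)^{-1}; it becomes the covariance of the
    # parameter estimates only after multiplying by an estimate of the
    # error variance of the (weighted) residuals
    s_sq = np.sum(infodict['fvec'] ** 2) / (len(y) - len(p_fit))
    pcov_scaled = cov_x * s_sq   # what curve_fit returns at the moment
    pcov_unscaled = cov_x        # appropriate if sig are absolute errors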

Will it confuse many users of curve_fit if they have to figure out
whether they need the scaled or the unscaled covariance? Also, if the
default treatment differs depending on whether sigma is specified or
not, that might be confusing as well.

The terminology for "sigma" might be a bit confusing. In statsmodels
the analogous model uses "weights", and weighted least squares (WLS)
has, at least in my textbooks, a clear definition, in contrast to
"curve_fit".

sigma might indicate already scaled variances, but I had interpreted
it just as 1./weights for WLS.
In statsmodels, we use sigma for the unscaled covariance matrix of the
entire vector of errors in generalized least squares, GLS. Textbooks
often write something like cov_errors = \sigma * \Sigma. The sigma in
statsmodels GLS is just used to transform the variables to standard
OLS.
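
To pin down the terminology with code, roughly (an untested sketch; f,
x, y are placeholders, X is a design matrix, sig are per-point standard
deviations, and the import paths may differ between versions):

    import numpy as np
    import statsmodels.api as sm
    from scipy.optimize import curve_fit

    # curve_fit: sigma divides the residuals, i.e. it weights
    # (y - f(x, *popt)) by 1/sig
    popt, pcov = curve_fit(f, x, y, sigma=sig)

    # statsmodels WLS: weights are (proportional to) the inverse error
    # variance, so if sig really are standard deviations this would be
    res_wls = sm.WLS(y, X, weights=1.0 / sig ** 2).fit()

    # statsmodels GLS: sigma is the covariance structure of the whole
    # error vector (the \Sigma above), used to transform to standard OLS
    res_gls = sm.GLS(y, X, sigma=np.diag(sig ** 2)).fit()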

I'm in favor of keeping the current scaled cov_params as the default.
curve_fit now produces the same results as Minuit, except for
numerical problems.

Additional options are fine as long as they are clearly explained.

Josef
(not a user of curve_fit but of leastsq)

>
>>
>> Cheers,
>>
>> --Matt Newville <newville at cars.uchicago.edu> 630-252-0431


