[SciPy-user] incorrect variance in optimize.curvefit and leastsq

Fri Feb 13 11:21:41 EST 2009

> I think for the weighted least squares problem the weights should go
> into the SSE calculation. I didn't find directly the reference, but I
> am somewhat confident that this is correct, from the analogy to the
> transformed
> model ynew = y*weight where weight_i = 1/sigma_i in the linear case.
> But it's too late today to try to check this.
>
> SSE = np.sum((err*weight)**2)
>

I looked at the weighted function some more:

Since the error calculation for the s_sq uses `func =
_weighted_general_function`
the above weighting for the SSE is automatically done. But then, in
this case there is no renormalization
necessary when calculation s_sq. The special casing of the weighted
function should be dropped (the commented out part below)

    if (len(ydata) > len(p0)) and pcov is not None:
        s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
##        if sigma is not None:
##            s_sq /= (args[-1]**2).sum()
        pcov = pcov * s_sq
    else:
        pcov = inf

Below is the test for  the weighted case, rewritten from the
unweighted NIST example.

Note: I multiplied y and and f(x) by the weight, which then is
corrected again by the weighted least squares, and so I get the
original NIST estimates back. This test passes with the above
correction but fails in the current version, r5551.

Josef
-----------------------

def test_curvefit_weights():
    '''test against NIST certified case at
    http://www.itl.nist.gov/div898/strd/nls/data/misra1b.shtml
    add weights to test curvefit with weights'''

    data =  array([[  10.07,   77.6 ],
                   [  14.73,  114.9 ],
                   [  17.94,  141.1 ],
                   [  23.93,  190.8 ],
                   [  29.61,  239.9 ],
                   [  35.18,  289.  ],
                   [  40.02,  332.8 ],
                   [  44.82,  378.4 ],
                   [  50.76,  434.8 ],
                   [  55.05,  477.3 ],
                   [  61.01,  536.8 ],
                   [  66.4 ,  593.1 ],
                   [  75.47,  689.1 ],
                   [  81.78,  760.  ]])
    y,x = data[:,0],data[:,1]

    pstd_c = [3.1643950207E+00, 4.2547321834E-06]
    popt_c = [3.3799746163E+02, 3.9039091287E-04]
    SSE_c = 7.5464681533E-02
    Rstd_c = 7.9301471998E-02

    decimal = 5 #accuracy of parameter estimate and standard deviation

    w = 1.0 + np.arange(len(x))  # 1/weights  = sigma
    print w
    def funct(x, b1, b2):
        #weighted function for y
        return w * b1 * (1-(1+b2*x/2)**(-2))
    y = w * y #weighted y
    start1 = [500, 0.0001]
    start2 = [300, 0.0002]
    for start in [start1, start2]:
        popt, pcov = curve_fit(funct, x, y, p0=start, sigma=w)
        pstd = np.sqrt(np.diag(pcov))
        assert_almost_equal(popt, popt_c, decimal=decimal)
        assert_almost_equal(pstd, pstd_c, decimal=decimal)

---------------------------