[SciPy-User] How to estimate error in polynomial coefficients from scipy.polyfit?

Anne Archibald peridot.faceted at gmail.com
Thu Mar 25 18:21:31 EDT 2010


On 25 March 2010 17:20, Jeremy Conlin <jlconlin at gmail.com> wrote:
> On Thu, Mar 25, 2010 at 2:52 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>>
>>
>> On Thu, Mar 25, 2010 at 2:32 PM, Jeremy Conlin <jlconlin at gmail.com> wrote:
>>>
>>> I am using scipy.polyfit to fit a curve to my data.  Members of this
>>> list have been integral in my understanding of how to use this
>>> function.  Now I would like to know how I can get the uncertainties
>>> (standard deviations) of polynomial coefficients from the returned
>>> values from scipy.polyfit.  If I understand correctly, the residuals
>>> are sometimes called the R^2 error, right?  That gives an estimate of
>>> how well we have fit the data.  I don't know how to use rank or any of
>>> the other returned values to get the uncertainties.
>>>
>>> Can someone please help?
>>
>> You want the covariance of the coefficients, (A.T * A)^-1/var, where A is
>> the design matrix. I'd have to see what the scipy fit returns to tell you
>> more. In anycase, from that you can plot curves at +/- sigma to show the
>> error bounds on the result. I can be more explicit if you want.
>>
>> Chuck
>
> Thanks Chuck.  That seems to get closer to what I need.  I am just
> fitting data to a polynomial, nothing too fancy.  I would like the
> variance (not the covariance) of the coeffficients.  As far as what is
> returned from scipy.polyfit this is what the documentation says:

Just an important warning: for polynomials, especially high degree
polynomials, the coefficients are an awful way to specify them. If
what you really need is the coefficients, then you're stuck with it,
but if what you need is to estimate interpolated values, there are
better approaches.

I mention this in particular because you talk about needing "variance
rather than covariance". When you do a linear fit of a
several-parameter model to data with normal errors, the uncertainty in
the fitted parameters is a multidimensional Gaussian - that is, the
(say) one-sigma error region is a multidimensional ellipsoid. If you
chose a good parameterization for your model, this ellipsoid will line
up nicely with the axes, and you can describe it by its size along
each axis: that is, you only need a variance in each parameter.

But if your parameterization is not good, then this ellipse will be
some tilted long skinny shape, and taking its size along one axis can
drastically underestimate its size. You also have the problem that
there are correlations in your parameters: if one is high, say, then
the other is also likely to be high. The covariance matrix captures
all this information along with the variances you asked for. As you
might expect, life is vastly simpler if the ellipsoid is aligned with
the axes, so that this matrix is nearly diagonal and you can just read
the variances off the diagonal.

Unfortunately, the parameterization of polynomials by their
coefficients is not "good" in this sense: you almost always get very
long skinny non-aligned error ellipses. An easy example is if you're
fitting positive x,y pairs with a straight line y=mx+b. Then if m is
too low, b is almost certainly too high, since the uncertainty pivots
the line around the middle of the data set. With linear data you can
help this by using y = m(x-x_middle)+b, but when you go to
higher-order polynomials it all becomes messier and you can't usually
cure the problems this easily.

I would say, look closely at your problem and think about whether you
really need the polynomial coefficients.

Anne

> residuals, rank, singular_values, rcond : present only if `full` = True
>        Residuals of the least-squares fit, the effective rank of the scaled
>        Vandermonde coefficient matrix, its singular values, and the specified
>        value of `rcond`. For more details, see `linalg.lstsq`.
>
> I don't think any of these things are "design matrix" as you have
> indicated I need.  The documentation for linalg.lstsq does not say
> what rcond is unfortunately.   Any ideas?
>
> Jeremy
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>



More information about the SciPy-User mailing list