[SciPy-User] How to estimate error in polynomial coefficients from scipy.polyfit?

Anne Archibald peridot.faceted at gmail.com
Thu Mar 25 18:45:56 EDT 2010


On 25 March 2010 18:40, Jeremy Conlin <jlconlin at gmail.com> wrote:
> On Thu, Mar 25, 2010 at 4:21 PM, Anne Archibald
> <peridot.faceted at gmail.com> wrote:
>> On 25 March 2010 17:20, Jeremy Conlin <jlconlin at gmail.com> wrote:
>>> On Thu, Mar 25, 2010 at 2:52 PM, Charles R Harris
>>> <charlesr.harris at gmail.com> wrote:
>>>>
>>>>
>>>> On Thu, Mar 25, 2010 at 2:32 PM, Jeremy Conlin <jlconlin at gmail.com> wrote:
>>>>>
>>>>> I am using scipy.polyfit to fit a curve to my data.  Members of this
>>>>> list have been integral in my understanding of how to use this
>>>>> function.  Now I would like to know how I can get the uncertainties
>>>>> (standard deviations) of polynomial coefficients from the returned
>>>>> values from scipy.polyfit.  If I understand correctly, the residuals
>>>>> are sometimes called the R^2 error, right?  That gives an estimate of
>>>>> how well we have fit the data.  I don't know how to use rank or any of
>>>>> the other returned values to get the uncertainties.
>>>>>
>>>>> Can someone please help?
>>>>
>>>> You want the covariance of the coefficients, var * (A.T A)^-1, where A is
>>>> the design matrix and var is the residual variance. I'd have to see what
>>>> the scipy fit returns to tell you more. In any case, from that you can
>>>> plot curves at +/- sigma to show the error bounds on the result. I can be
>>>> more explicit if you want.
>>>>
>>>> Chuck
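
To make that recipe concrete, here is a rough sketch of doing it by
hand (the data, the degree, and the noise level below are made up; the
design matrix is rebuilt with np.vander, whose column order matches
polyfit's highest-power-first coefficient order):

    import numpy as np

    # made-up data: a noisy quadratic
    x = np.linspace(0.0, 10.0, 50)
    y = 2.0 - 0.5 * x + 0.1 * x**2 + np.random.normal(scale=0.3, size=x.size)

    deg = 2
    coeffs = np.polyfit(x, y, deg)      # highest power first

    # design (Vandermonde) matrix, same column order as the coefficients
    A = np.vander(x, deg + 1)

    # residual variance, with deg+1 fitted parameters removed from the dof
    resid = y - A.dot(coeffs)
    var = resid.dot(resid) / (len(x) - (deg + 1))

    # covariance of the coefficients: var * (A.T A)^-1
    cov = var * np.linalg.inv(A.T.dot(A))
    sigma = np.sqrt(np.diag(cov))       # one-sigma uncertainty per coefficient

The square roots of the diagonal are the standard deviations you asked
about, but see below for why the off-diagonal terms matter here.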
>>>
>>> Thanks Chuck.  That seems to get closer to what I need.  I am just
>>> fitting data to a polynomial, nothing too fancy.  I would like the
>>> variance (not the covariance) of the coefficients.  As for what is
>>> returned from scipy.polyfit, this is what the documentation says:
>>
>> Just an important warning: for polynomials, especially high degree
>> polynomials, the coefficients are an awful way to specify them. If
>> what you really need is the coefficients, then you're stuck with it,
>> but if what you need is to estimate interpolated values, there are
>> better approaches.
>>
>> I mention this in particular because you talk about needing "variance
>> rather than covariance". When you do a linear fit of a
>> several-parameter model to data with normal errors, the uncertainty in
>> the fitted parameters is described by a multidimensional Gaussian - that
>> is, the (say) one-sigma error region is a multidimensional ellipsoid. If
>> you choose a good parameterization for your model, this ellipsoid will
>> line up nicely with the axes, and you can describe it by its size along
>> each axis: that is, you only need a variance for each parameter.
>>
>> But if your parameterization is not good, then this ellipse will be
>> some tilted, long, skinny shape, and taking its size along one axis can
>> drastically underestimate its true extent. You also have the problem that
>> there are correlations in your parameters: if one is high, say, then
>> the other is also likely to be high. The covariance matrix captures
>> all this information along with the variances you asked for. As you
>> might expect, life is vastly simpler if the ellipsoid is aligned with
>> the axes, so that this matrix is nearly diagonal and you can just read
>> the variances off the diagonal.
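
As a concrete illustration of that last point (the numbers here are
invented), the correlation matrix derived from a covariance matrix
tells you immediately whether quoting only the diagonal is safe:

    import numpy as np

    # an invented covariance matrix for two fitted parameters
    cov = np.array([[ 4.0, -3.8],
                    [-3.8,  4.0]])

    sigma = np.sqrt(np.diag(cov))        # per-parameter standard deviations
    corr = cov / np.outer(sigma, sigma)  # correlation matrix

    print(sigma)  # [ 2.  2.]
    print(corr)   # off-diagonal ~ -0.95: the parameters are strongly
                  # correlated, so the two variances alone are a
                  # misleading summary of the error ellipse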
>>
>> Unfortunately, the parameterization of polynomials by their
>> coefficients is not "good" in this sense: you almost always get very
>> long skinny non-aligned error ellipses. An easy example is if you're
>> fitting positive x,y pairs with a straight line y=mx+b. Then if m is
>> too low, b is almost certainly too high, since the uncertainty pivots
>> the line around the middle of the data set. With linear data you can
>> help this by using y = m(x-x_middle)+b, but when you go to
>> higher-order polynomials it all becomes messier and you can't usually
>> cure the problems this easily.
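
For the straight-line case you can see the effect directly; a quick
sketch (again with invented data) comparing the coefficient covariance
before and after shifting x to its midpoint:

    import numpy as np

    x = np.linspace(1.0, 10.0, 20)       # all-positive abscissas
    y = 3.0 * x + 1.0 + np.random.normal(scale=0.5, size=x.size)

    def fit_cov(xs, ys):
        # covariance of (slope, intercept), same recipe as above
        A = np.vander(xs, 2)
        c = np.polyfit(xs, ys, 1)
        resid = ys - A.dot(c)
        var = resid.dot(resid) / (len(xs) - 2)
        return var * np.linalg.inv(A.T.dot(A))

    print(fit_cov(x, y))             # large off-diagonal term: m and b correlated
    print(fit_cov(x - x.mean(), y))  # nearly diagonal: centering decouples them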
>>
>> I would say, look closely at your problem and think about whether you
>> really need the polynomial coefficients.
>
> Yikes!  This sounds like it may be more trouble than it's worth.  I
> have a few sets of statistical data that each need to have curves fit
> to them.  I can see what the parameters are and they are similar
> between each set.  I want to know if the coefficients are within a few
> standard deviations of each other.  That's what I was hoping to get
> at, but it seems to be more trouble than I'm willing to accept.
> Unless anyone has a simpler solution.

You could try using the Chebyshev polynomials; they're in the new
numpy (numpy.polynomial.chebyshev), or you can just use the formulas
(the degree-n one is cos(n*arccos(x)), and they're sensible on
[-1, 1]). They're not perfect, but they're a much better basis for
representing polynomials and they get rid of most of the problems I
described above (the covariances between coefficients tend to be much
smaller).
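
If your numpy has the new numpy.polynomial package, a rough sketch of
what that looks like in practice (the data below are invented, and you
have to map your x values onto [-1, 1] yourself):

    import numpy as np
    from numpy.polynomial import chebyshev as cheb

    x = np.linspace(0.0, 10.0, 50)
    y = np.exp(-0.3 * x) + np.random.normal(scale=0.01, size=x.size)

    # map the abscissas onto [-1, 1], where the Chebyshev basis behaves well
    t = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

    c = cheb.chebfit(t, y, 5)     # Chebyshev-series coefficients, degree 5
    yfit = cheb.chebval(t, c)     # evaluate the fit back at the data points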

Anne

> Thanks,
> Jeremy
>>
>> Anne
>>
>>> residuals, rank, singular_values, rcond : present only if `full` = True
>>>        Residuals of the least-squares fit, the effective rank of the scaled
>>>        Vandermonde coefficient matrix, its singular values, and the specified
>>>        value of `rcond`. For more details, see `linalg.lstsq`.
>>>
>>> I don't think any of these things is the "design matrix" you have
>>> indicated I need.  The documentation for linalg.lstsq does not say
>>> what rcond is, unfortunately.  Any ideas?
>>>
>>> Jeremy
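
A couple of points that might help here: rcond is just the cutoff used
in the underlying least-squares solve (singular values smaller than
rcond times the largest one are treated as zero), and while polyfit
does not return the design matrix, it is only the Vandermonde matrix of
your x values, so you can rebuild it yourself. A sketch (invented data,
degree 1 for brevity):

    import numpy as np

    x = np.linspace(0.0, 10.0, 50)
    y = 1.0 + 0.2 * x + np.random.normal(scale=0.1, size=x.size)

    deg = 1
    coeffs, resids, rank, sv, rcond = np.polyfit(x, y, deg, full=True)

    # resids is the sum of squared residuals, so the residual variance is
    var = resids[0] / (len(x) - (deg + 1))

    # rebuild the design matrix and apply the covariance recipe from above
    A = np.vander(x, deg + 1)
    cov = var * np.linalg.inv(A.T.dot(A))
    print(np.sqrt(np.diag(cov)))   # standard deviations of the coefficients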


