[SciPy-user] linear regression

Robert Kern robert.kern at gmail.com
Wed May 27 15:37:14 EDT 2009


On Wed, May 27, 2009 at 14:22,  <josef.pktd at gmail.com> wrote:
> On Wed, May 27, 2009 at 3:03 PM, Robert Kern <robert.kern at gmail.com> wrote:
>> On Wed, May 27, 2009 at 13:28,  <josef.pktd at gmail.com> wrote:
>>> On Wed, May 27, 2009 at 12:35 PM, ms <devicerandom at gmail.com> wrote:
>>>> josef.pktd at gmail.com wrote:
>>>>>> Have a look here <http://www.scipy.org/Cookbook/LinearRegression>
>>>>>
>>>>> y = Beta0 + Beta1 * x + Beta2 * x**2   is the second order polynomial.
>>>>>
>>>>> I also should have looked: polyfit returns the polynomial coefficients
>>>>> but doesn't calculate the variance-covariance matrix or standard
>>>>> errors of the OLS estimate.
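
For what it's worth, the missing pieces can be computed by hand from the
design matrix; a minimal numpy sketch (the data and noise level below are
made up for illustration):

import numpy as np

x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + np.random.normal(scale=0.3, size=x.shape)

X = np.column_stack([np.ones_like(x), x, x**2])    # design matrix for Beta0..Beta2
beta, res, rank, sv = np.linalg.lstsq(X, y)        # OLS coefficients
resid = y - X.dot(beta)
dof = X.shape[0] - X.shape[1]                      # n - number of parameters
sigma2 = resid.dot(resid) / dof                    # residual variance
cov_beta = sigma2 * np.linalg.inv(np.dot(X.T, X))  # variance-covariance matrix
se_beta = np.sqrt(np.diag(cov_beta))               # standard errors
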
>>>>
>>>> AFAIK, the ODR fitting routines return all these parameters, so one
>>>> could perhaps use those for linear fitting too.
>>>
>>> you mean scipy.odr?
>>>
>>> I never looked at it in detail. Conceptually it is very similar to
>>> standard regression, but I've never seen an application for it, nor do
>>> I know its probability-theoretic or econometric background.
>>
>> ODR is nonlinear least-squares with errors in both variables (e.g.
>> minimizing the weighted sum of squared distances from each data point
>> to the closest point on the curve, rather than "straight down" as in
>> OLS). scipy.odr implements both ODR and OLS. It also implements
>> implicit regression, where the relationship between the variables is
>> not expressed as "y=f(x)" but as "f(x,y)=0", such as when fitting
>> an ellipse.
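
For reference, a minimal sketch of the explicit case with scipy.odr (the
model, data, starting values, and error sizes below are made up for
illustration):

import numpy as np
from scipy import odr

def poly2(beta, x):
    # y = Beta0 + Beta1*x + Beta2*x**2
    return beta[0] + beta[1] * x + beta[2] * x**2

x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + np.random.normal(scale=0.3, size=x.shape)

data = odr.RealData(x, y, sx=0.1, sy=0.3)   # uncertainties in both variables
model = odr.Model(poly2)
out = odr.ODR(data, model, beta0=[1.0, 1.0, 1.0]).run()

print(out.beta)      # estimated coefficients
print(out.sd_beta)   # standard errors
print(out.cov_beta)  # parameter covariance matrix (see the ODRPACK guide
                     # for how it is scaled)

# the same machinery can do plain OLS: ODRPACK job digit 2
ols = odr.ODR(data, model, beta0=[1.0, 1.0, 1.0])
ols.set_job(fit_type=2)
print(ols.run().beta)
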
>>
>>> In many cases the results will be relatively close to standard least
>>> squares.
>>> A Google search shows links to curve fitting but not to any
>>> econometric theory. On the other hand, there is a very large
>>> literature on how to treat measurement errors and endogeneity of
>>> regressors for (standard) least squares and maximum likelihood.
>>
>> The extension is straightforward. ODR is really just a generalization
>> of least-squares. Unfortunately, the links to the relevant papers seem
>> to have died. I've put them up here:
>>
>> http://www.mechanicalkern.com/static/odr_vcv.pdf
>> http://www.mechanicalkern.com/static/odr_ams.pdf
>> http://www.mechanicalkern.com/static/odrpack_guide.pdf
>>
>
> Thanks for the links. I finally also found out that on Wikipedia it is
> covered under "Total Regression". Under "Errors-in-Variables model" it says:
>
> "
> Error-in-variables models can be estimated in several different ways.
> Besides those outlined here, see:
>        * total least squares for a method of fitting which does not
> arise from a statistical model;
> "
>
> From a brief reading, I think that the main limitation is that it
> doesn't allow you to explicitly model the joint error structure. It
> looks like this will be done implicitly by the scaling factors and
> other function parameters. But this is just my first impression.

For "y=f(x)" models, this is true. Both y and x can be multivariate,
and you can express the covariance of the uncertainties for each, but
not covariance between the y and x uncertainties. This is because of
the numerical tricks used for efficient implementation. However,
"f(x)=0" models can express covariances between all dimensions of x.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


