Confused about np.polyfit()

duncan smith duncan at invalid.invalid
Sun Jul 19 12:39:03 EDT 2020


On 19/07/2020 16:11, Dino wrote:
> On 7/19/2020 4:54 PM, duncan smith wrote:
>>
>> It depends on what you expect the result to be. There's nothing
>> inherently wrong with transforming variables before using least squares
>> fitting. Whether it gives you the "best" estimates for the coefficients
>> is a different issue.
> 
> Thanks a lot for this, Duncan. I guess I have two follow-up questions
> at this point:
> 
> 1) in which ways is this approach sub-optimal?
> 
> 2) what's the "right" way to do it?
> 
> Thank you
> 

You'll have to read up a bit on ordinary least squares (e.g.
https://en.wikipedia.org/wiki/Ordinary_least_squares). It is based on
assumptions that might not hold for a given model / dataset. Depending
on which assumptions are violated, the estimates can be affected in
different ways. The usual practice is to fit the model, then check the
residuals to see whether the assumptions (approximately) hold. If they
don't, that might indicate a poor model fit, or it might suggest fitting
a transformed model (to estimate the same coefficients while satisfying
the assumptions). For the latter case, e.g.

Y = a + bX

has the same coefficients as

Y/X = a * (1/X) + b

but the latter regression might satisfy the assumption of constant
variance for the errors.
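
To make that concrete with np.polyfit (the function from the subject
line), here is a rough sketch on simulated data where the error spread
grows with X. The data and variable names are made up purely for
illustration:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
a, b = 2.0, 3.0                            # "true" intercept and slope
y = a + b * x + rng.normal(scale=0.5 * x)  # error spread grows with x

# Direct fit of Y = a + b*X; np.polyfit returns [slope, intercept].
b_hat, a_hat = np.polyfit(x, y, 1)

# Transformed fit of Y/X = a*(1/X) + b; here the fitted "slope"
# estimates a and the "intercept" estimates b, i.e. the same
# coefficients as the direct fit.
a_hat_t, b_hat_t = np.polyfit(1.0 / x, y / x, 1)

print("direct fit:      a =", a_hat, "  b =", b_hat)
print("transformed fit: a =", a_hat_t, "  b =", b_hat_t)

# Plotting (y/x - (a_hat_t/x + b_hat_t)) against 1/x is one way to
# eyeball whether the transformed model's residuals have roughly
# constant spread.

Both fits estimate the same a and b; the transformed one just weights
the observations differently, which is the point of the exercise.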

Regression analysis is a bit of an art, and it's a long time since I did
any. Ordinary least squares is optimal in a certain sense when the
assumptions hold (under the Gauss-Markov assumptions it is the best
linear unbiased estimator). When they don't, there's no single answer to
what the best alternative is (unless it's "employ a good statistician").

Duncan

