[SciPy-user] error estimate in stats.linregress

Bruce Southey bsouthey at gmail.com
Mon Feb 23 16:45:02 EST 2009


josef.pktd at gmail.com wrote:
> On Mon, Feb 23, 2009 at 3:01 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>   
>> Hi,
>> Yes, the formula is incorrect. The reason is that the sum of squares
>> terms are not corrected by the means because the ss function just
>> computes the uncorrected sum of squares.
>>
>> Thus the correct formula should :
>> sterrest = np.sqrt(((1-r*r)*(ss((y-ymean))))/(df*(ss(x-xmean))))
>> Alternatively:
>> sterrest = np.sqrt((1-r*r)*(ss(y)-n*ymean*ymean)/ (ss(x)-n*xmean*xmean)
>> / df)
>>
>> Note the formula is derived using the definition of R-squared:
>> The estimated variance of the slope = MSE/Sxx= ((1-R*R)*Syy)/(df*Sxx)
>> where Syy and Sxx are the corrected sums of squares for Y and X,
>> respectively,
>>
>> Regards
>> Bruce
>>
>> Hi all,
>> I was working with linear regression in scipy and met some problems
>> with value of standard error of the estimate returned by
>> scipy.stats.linregress() function. I could not compare it to similar
>> outputs of other linear regression routines (for example in Origin),
>> so I took a look in the source (stats.py).
>>
>> In the source it is defined as
>> sterrest = np.sqrt((1-r*r)*ss(y) / ss(x) / df)
>> where r is correlation coefficient, df is degrees of freedom (N-2) and
>> ss() is sum of squares of elements.
>>
>> After digging through literature the only formula looking somewhat the
>> same was found to be
>> stderrest = np.sqrt((1-r*r)*ss(y-y.mean())/df)
>> which gives the same result as a standard definition (in notation of
>> the source of linregress)
>> stderrest = np.sqrt(ss(y-slope*x-intercept)/df)
>> but the output of linregress is different.
>>
>> I humbly suppose this is a bug, but maybe somebody could explain me
>> what is it if I'm wrong...
>> Pavlo.
>>
>>     
>
> Thank you for reporting and checking this.
>
> I fixed it in trunk, but still have to add a test.
> There are still small  (1e-4) numerical differences to the
> multivariate ols version in the example I tried
>
> Josef
> _______________________________________________
> SciPy-user mailing list
> SciPy-user at scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>   

Assuming that linregress is different from expected, linregress is just 
a scalar implementation so it is probably prone to rounding error in 
numerous places.
If you want, I can look at the example and see.

Bruce



More information about the SciPy-User mailing list