[SciPy-user] error estimate in stats.linregress

Sat Feb 21 14:40:47 EST 2009

Hi all,
I was working with linear regression in scipy and ran into a problem
with the value of the standard error of the estimate returned by the
scipy.stats.linregress() function. I could not reconcile it with the
corresponding output of other linear regression routines (for example
in Origin), so I took a look at the source (stats.py).

In the source it is defined as
sterrest = np.sqrt((1-r*r)*ss(y) / ss(x) / df)
where r is the correlation coefficient, df is the number of degrees of
freedom (N-2) and ss() is the sum of squares of the elements.

After digging through the literature, the only formula I found that
looks somewhat similar is
stderrest = np.sqrt((1-r*r)*ss(y-y.mean())/df)
which gives the same result as the standard definition (in the
notation of the linregress source)
stderrest = np.sqrt(ss(y-slope*x-intercept)/df)
but the output of linregress is different.
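
To make the comparison concrete, here is a minimal sketch that evaluates
the three expressions above side by side. It assumes only NumPy: ss() is
re-implemented here rather than imported from stats.py, the helper name
compare_stderr_formulas and the sample data are made up for illustration,
and the fit uses np.polyfit instead of linregress itself.

```python
import numpy as np


def ss(a):
    """Sum of squares of the elements (stand-in for ss() in stats.py)."""
    a = np.asarray(a, dtype=float)
    return np.sum(a * a)


def compare_stderr_formulas(x, y):
    """Return the three candidate values discussed in the post.

    Hypothetical helper, not part of scipy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    df = len(x) - 2  # degrees of freedom, N-2

    # Ordinary least-squares line and correlation coefficient.
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]

    # Expression as quoted from the linregress source.
    from_source = np.sqrt((1 - r * r) * ss(y) / ss(x) / df)
    # Formula found in the literature.
    from_r = np.sqrt((1 - r * r) * ss(y - y.mean()) / df)
    # Standard definition based on the residuals.
    from_resid = np.sqrt(ss(y - slope * x - intercept) / df)
    return from_source, from_r, from_resid


# Made-up sample data, roughly y = 2*x with some noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

from_source, from_r, from_resid = compare_stderr_formulas(x, y)
print("expression from the source:", from_source)
print("literature formula:        ", from_r)
print("residual-based definition: ", from_resid)
```

On this data the last two values agree to rounding error, while the
first one is noticeably smaller.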

I humbly suppose this is a bug, but maybe somebody could explain to me
what it is if I'm wrong...
Pavlo.