[SciPy-User] scipy.stats.mstats.linregress bug?

Skipper Seabold jsseabold at gmail.com
Fri Jun 24 12:10:43 EDT 2011


On Fri, Jun 24, 2011 at 11:59 AM,  <josef.pktd at gmail.com> wrote:
> On Fri, Jun 24, 2011 at 12:02 PM, Andreas <lists at hilboll.de> wrote:
>>>>> Try to rescale (take away the e15); small numerical differences are
>>>>> possible because the results are calculated in different ways.
>>>>> There might still be a difference in the definition of the returned
>>>>> values, but I haven't checked recently.
>>>>
>>>> Rescaling doesn't change a thing (see below). And we're not talking
>>>> about small numerical differences here. The problem is the last return
>>>> value, stderr. It differs by a factor of more than 15!
>>>>
>>>> Cheers,
>>>> Andreas.
>>>>
>>>> In [15]: scipy.stats.linregress(x,data/1E15)
>>>> Out[15]:
>>>> (0.14916317817857139,
>>>>  4.8326781674166659,
>>>>  0.53093100793359616,
>>>>  0.041709303490157057,
>>>>  0.066031024254034967)
>>>>
>>>> In [16]: scipy.stats.mstats.linregress(x,data/1E15)
>>>> Out[16]:
>>>> (0.14916317817857139,
>>>>  4.8326781674166659,
>>>>  0.53093100793359627,
>>>>  masked_array(data = 0.0417093034902,
>>>>             mask = False,
>>>>       fill_value = 1e+20)
>>>> ,
>>>>  1.0286155756515489)
>>>>
>>>>
>>>
>>> mstats.linregress (the ma version):
>>> sterrest = ma.sqrt(1.-r*r) * y.std()
>>>
>>> stats.linregress:
>>> sterrest = np.sqrt((1-r*r)*ssym / ssxm / df)
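A minimal sketch evaluating the two quoted sterrest expressions side by side. It assumes that ssxm and ssym are the divisor-n variances of x and y (as returned by np.cov(x, y, bias=True)), that df = n - 2, and that r is the Pearson correlation; the helper name linregress_stderr_both is hypothetical. Dividing one expression by the other shows the ratio is x.std() * sqrt(n - 2), which depends only on x, so rescaling y (e.g. data/1E15) cannot remove the discrepancy.

```python
import numpy as np

def linregress_stderr_both(x, y):
    """Evaluate the two quoted `sterrest` expressions (hypothetical helper).

    Assumption: ssxm and ssym are divisor-n variances taken from
    np.cov(x, y, bias=True), and df = n - 2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    # Divisor-n covariance matrix, flattened: [ssxm, ssxym, ssyxm, ssym]
    ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=True).flat
    df = n - 2
    # stats.linregress expression: standard error of the slope estimate
    stats_sterr = np.sqrt((1 - r * r) * ssym / ssxm / df)
    # mstats.linregress expression: divisor-n std. deviation of the residuals
    ma_sterr = np.sqrt(1.0 - r * r) * y.std()
    return stats_sterr, ma_sterr

rng = np.random.default_rng(0)
x = np.arange(20.0)
y = 0.15 * x + 4.8 + rng.normal(scale=1.0, size=x.size)
s, m = linregress_stderr_both(x, y)
# Both numbers below are equal: the ratio depends on x alone.
print(m / s, x.std() * np.sqrt(x.size - 2))
```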
>>
>> So why is it treated differently in two functions that everyone would
>> expect to behave identically? What's the mathematical background? What
>> are ssym, ssxm and df?
>>
>> And which one is the better estimate? (In my case, the stats.linregress
>> one seems a lot more reasonable...)
>
> stats.stats reports the standard error of the estimate of the slope parameter b;
> stats.mstats reports the standard error of the regression error (residual), y - (a + bx).
>
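A minimal sketch checking this interpretation by recomputing both quantities directly from the least-squares residuals (using np.polyfit for the fit; the helper name stderr_from_residuals is hypothetical). With SSE the residual sum of squares and Sxx = sum((x - xbar)**2), the slope standard error is sqrt(SSE / (n - 2) / Sxx) and the divisor-n residual standard deviation is sqrt(SSE / n); these should match the stats and mstats expressions respectively.

```python
import numpy as np

def stderr_from_residuals(x, y):
    """Slope standard error and divisor-n residual std from an OLS fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    b, a = np.polyfit(x, y, 1)            # slope b, intercept a
    resid = y - (a + b * x)               # regression residuals
    sse = np.sum(resid ** 2)
    sxx = np.sum((x - x.mean()) ** 2)
    slope_se = np.sqrt(sse / (n - 2) / sxx)   # matches the stats expression
    resid_sd = np.sqrt(sse / n)               # matches the mstats expression
    return slope_se, resid_sd

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 0.15 * x + 4.8 + rng.normal(size=x.size)
r = np.corrcoef(x, y)[0, 1]
slope_se, resid_sd = stderr_from_residuals(x, y)
print(np.isclose(slope_se, np.sqrt((1 - r**2) * y.var() / x.var() / (x.size - 2))))
print(np.isclose(resid_sd, np.sqrt(1 - r**2) * y.std()))
```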

By the look of it, the mstats version is a biased estimate as well, since y.std() divides by n rather than n - 2?
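A minimal sketch of the divisor question, using the identity SSE = (1 - r**2) * sum((y - ybar)**2); the helper name residual_sd and its ddof_model parameter are hypothetical. ddof_model=0 reproduces the divisor-n value from the mstats expression, while ddof_model=2 (two fitted parameters) gives the usual standard error of the regression, whose square is unbiased for the error variance. The two differ only by sqrt(n / (n - 2)), which is small for moderate n, so the divisor alone does not account for a factor of 15.

```python
import numpy as np

def residual_sd(x, y, ddof_model=2):
    """Residual standard deviation sqrt(SSE / (n - ddof_model)).

    ddof_model=0: divisor n, as in the mstats expression.
    ddof_model=2: divisor n - 2; its square is unbiased for the error
    variance (the square root itself is still slightly biased).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    sse = n * (1.0 - r * r) * y.var()   # SSE = (1 - r^2) * sum((y - ybar)^2)
    return np.sqrt(sse / (n - ddof_model))

x = np.arange(10.0)
y = 0.15 * x + 4.8 + np.random.default_rng(2).normal(size=x.size)
biased = residual_sd(x, y, ddof_model=0)
corrected = residual_sd(x, y, ddof_model=2)
print(corrected / biased)   # sqrt(10 / 8), about 1.118
```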

> stats.stats got changed by accident, and mstats didn't follow.
>

Either way, the docs need to be fixed at the least.

Skipper


