[SciPy-user] Limits of linregress - underflow encountered in stdtr

wierob wierob83 at googlemail.com
Tue Jun 9 07:57:18 EDT 2009


Hi,

for z = 30 my code sample prints

===== dependency_with_noise =====
slope: 2.0022556391
intercept: -0.771428571429
r^2: 0.953601402677
p-value: 0.0
stderr: 0.0258507089053
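
In case it matters, the call in question is essentially the following (a
simplified sketch with made-up placeholder data, not my actual script; the
real x and y are built elsewhere):

import numpy as np
from scipy import stats

# Placeholder data: a linear dependency plus some noise. My real script
# builds x and y differently; this only shows the linregress call itself.
rng = np.random.RandomState(0)
x = np.linspace(0.0, 20.0, 600)
y = 2.0 * x - 0.77 + rng.normal(scale=3.65, size=x.size)

slope, intercept, r, p_value, stderr = stats.linregress(x, y)
print("slope: %s" % slope)
print("intercept: %s" % intercept)
print("r^2: %s" % (r ** 2))
print("p-value: %s" % p_value)   # 0.0 here as well, presumably the same underflow
print("stderr: %s" % stderr)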

so I'm just confused that the p-value claims the fit is absolutely
perfect while it is not (although it is pretty close to perfect). I compared
this result to R (www.r-project.org):

> summary(lm(y~x))

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.2624   0.7325   0.7477   0.7635   7.7511 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.77143    0.28728  -2.685  0.00745 ** 
x            2.00226    0.02585  77.455  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 3.651 on 598 degrees of freedom
Multiple R-squared: 0.9094,     Adjusted R-squared: 0.9092 
F-statistic:  5999 on 1 and 598 DF,  p-value: < 2.2e-16 

> summary(lm(y~x))$coefficients
              Estimate Std. Error   t value      Pr(>|t|)
(Intercept) -0.7714286 0.28728036 -2.685281  7.447975e-03
x            2.0022556 0.02585071 77.454574 6.009953e-314


The intercept, slope (x) and stderr values agree, but R reports a p-value of
6.009953e-314, and the r-squared values differ. (The r-squared difference may
just be my mistake: 0.9536 squared is about 0.9094, so it looks like linregress
returns r and I forgot to square it.) While 6.009953e-314 is small enough to
call it 0 and the result is highly significant either way, I wonder whether
SciPy decides it is small enough to return 0.0, or whether it returns 0.0
because it can't actually compute it. If 0.0 is returned deliberately, what is
the threshold for that decision? Maybe this behavior should be documented.
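
To illustrate what I mean, I suspect something like the following happens (a
sketch using the t value from the R output above; I have not looked at the
linregress source, and the route through scipy.special.stdtr is only my guess
from the warning in the subject line):

from scipy import special, stats

# t statistic and degrees of freedom taken from the R output above
t, df = 77.454574, 598

# Two-sided p-value from the Student t distribution. The true tail
# probability (about 6e-314) is below the smallest normal double
# (about 2.2e-308), so the computation underflows in double precision.
print(2 * stats.t.sf(abs(t), df))        # 0.0 here (or a denormal number)

# The warning points at scipy.special.stdtr, so presumably the same
# underflow happens on this path inside linregress:
print(2 * special.stdtr(df, -abs(t)))    # 0.0 as well, I assume

If so, 0.0 would not be a deliberate cut-off but simply what is left after the
intermediate computation underflows.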


regards
robert

josef.pktd at gmail.com wrote:
> On Mon, Jun 8, 2009 at 5:16 PM, wierob<wierob83 at googlemail.com> wrote:
>   
>> Hi,
>>
>>     
>>> turn off numpy.seterr(all="raise")
>>> as explained in the reply to your previous messages
>>>
>>> Josef
>>>
>>>       
>> turning off the error reporting doesn't prevent the error. Thus the
>> result may be wrong, mightn't it? E.g. a p-value of 0.0 looks suspicious.
>>
>>     
>
> Anything other than a p-value of 0 would be suspicious: you have a
> perfect fit, and the probability is zero that we observe a slope equal
> to the estimated slope under the null hypothesis (that the slope is
> zero). So (loosely speaking) we can reject the null of zero slope with
> probability 1.
> The result is not "maybe" wrong, it is correct. Your r_square is 1,
> and the standard error of the slope estimate is zero.
>
>
> Floating point calculations with inf are correct (if they don't have a
> definite answer we get a nan). Dividing a non-zero number by zero has
> a well-defined result, even though plain Python raises a ZeroDivisionError.
>
> >>> np.array(1)/0.
> inf
> >>> 1/(np.array(1)/0.)
> 0.0
> >>> np.seterr(all="raise")
> {'over': 'ignore', 'divide': 'ignore', 'invalid': 'ignore', 'under': 'ignore'}
> >>> 1/(np.array(1)/0.)
> Traceback (most recent call last):
>   File "<pyshell#39>", line 1, in <module>
>     1/(np.array(1)/0.)
> FloatingPointError: divide by zero encountered in divide
>
> Josef


