[SciPy-User] scipy.stats.kendalltau bug?

Jeffrey zfyuan at mail.ustc.edu.cn
Sun Jul 29 05:42:52 EDT 2012


On 07/29/2012 03:47 PM, Nathaniel Smith wrote:
> On Sun, Jul 29, 2012 at 8:27 AM, Jeffrey <zfyuan at mail.ustc.edu.cn> wrote:
>> Thanks eat. I found the reason: numpy.sqrt cannot handle integers that
>> are too large. When calculating kendalltau, let n=len(x); then the total
>> number of pairs is 'tot' below:
>>
>>      tot=(n-1)*n//2
>>
>> When calculating tau, the denominator is as below:
>>
>>      np.sqrt((tot-u)*(tot-v))
>>
>> u and v stand for the ties in x[] and y[perm[]], and are zero if the two
>> arrays are sampled from continuous distributions. Hence (tot-u)*(tot-v)
>> may be out of range for the C-written ufunc 'np.sqrt', and an error is
>> then raised.
>>
>> What about using math.sqrt here, or multiplying two np.sqrt calls in the
>> denominator? Big data sets are common these days.
> It seems like the bug is that np.sqrt is raising an AttributeError on
> valid input... can you give an example of a value that np.sqrt fails
> on? Like

Assume the input arrays x and y each have length n=100000, which is
common, and assume there are no ties in either x or y, hence u=0, v=0 and
t=0 in the scipy.stats.kendalltau subroutine. The denominator of the
expression for calculating tau is then:

     np.sqrt( (tot-u) * (tot-v) )

Here tot = n * (n-1) // 2 = 4999950000, and (tot-u) * (tot-v) = tot*tot
= 24999500002500000000L. This long int no longer fits in a 64-bit integer,
and np.sqrt raises an error when applied to it. I think a type conversion,
like 'float()', should be done before np.sqrt, or the expression should be
written as np.sqrt(tot-u) * np.sqrt(tot-v) to avoid the long integer.
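
To make the arithmetic concrete, here is a small standalone sketch of the
problem and of the workarounds suggested above. It is not the actual
scipy.stats.kendalltau code; the names product, denom_float, denom_split
and denom_math are only for illustration, and I am assuming the exception
raised by np.sqrt may differ between NumPy versions, so it is caught
broadly.

```python
import math
import numpy as np

n = 100000
tot = n * (n - 1) // 2            # total number of pairs
u = v = 0                         # assumed: no ties in x or y

# Python ints have arbitrary precision, so the product itself is exact,
# but it no longer fits in a 64-bit integer.
product = (tot - u) * (tot - v)
too_big_for_int64 = product > np.iinfo(np.int64).max

# Applying np.sqrt directly to the oversized int is the failing case.
# The exception type is an assumption and may depend on the NumPy version.
try:
    np.sqrt(product)
    direct_sqrt_failed = False
except (AttributeError, TypeError):
    direct_sqrt_failed = True

# Workaround 1: convert to float before taking the square root.
denom_float = np.sqrt(float(product))

# Workaround 2: take the square root of each factor separately, so the
# large product is never formed.
denom_split = np.sqrt(tot - u) * np.sqrt(tot - v)

# Workaround 3: math.sqrt converts its argument to a float itself.
denom_math = math.sqrt(product)

print(tot, product, too_big_for_int64, direct_sqrt_failed)
print(denom_float, denom_split, denom_math)
```

With no ties, all three denominators should agree with tot up to
floating-point rounding, so any of them would avoid the error.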

Thanks a lot : )

> >>> np.sqrt(<something>)
> AttributeError
>
> -n


-- 
袁振飞
Department of Statistics and Finance,
University of Science and Technology of China
Hefei, Anhui Province, 230026
Phone: 13155190081




