[SciPy-User] scipy.stats.kendalltau bug?

Ralf Gommers ralf.gommers at googlemail.com
Wed Aug 1 15:03:46 EDT 2012


On Sun, Jul 29, 2012 at 12:30 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Sun, Jul 29, 2012 at 10:42 AM, Jeffrey <zfyuan at mail.ustc.edu.cn> wrote:
> > On 07/29/2012 03:47 PM, Nathaniel Smith wrote:
> >> On Sun, Jul 29, 2012 at 8:27 AM, Jeffrey <zfyuan at mail.ustc.edu.cn> wrote:
> >>> Thanks, eat. I found that the cause is numpy.sqrt failing on very large
> >>> numbers. When calculating kendalltau with n=len(x), the total number of
> >>> pairs is 'tot' below:
> >>>
> >>>      tot=(n-1)*n//2
> >>>
> >>> When calculating tau, the denominator is:
> >>>
> >>>      np.sqrt((tot-u)*(tot-v))
> >>>
> >>> u and v stand for the ties in x[] and y[perm[]], which are zero if the two
> >>> arrays are sampled from continuous distributions. Hence (tot-u)*(tot-v) may
> >>> be out of range for the C-written ufunc 'np.sqrt', and an error is raised.
> >>>
> >>> What about using math.sqrt here, or multiplying two np.sqrt calls in the
> >>> denominator? Big data sets are common these days.
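A quick check of the arithmetic Jeffrey describes, in plain Python. This is a sketch using only the standard library; `pair_count` and `denominator_product` are hypothetical helper names, not part of scipy.stats. It shows that the exact product exceeds both the signed and unsigned 64-bit integer ranges, while `math.sqrt`, which converts its argument to a C double, still accepts it.

```python
import math

INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1

def pair_count(n):
    """Total number of pairs among n observations: n*(n-1)/2 (hypothetical helper)."""
    return n * (n - 1) // 2

def denominator_product(n, u=0, v=0):
    """Exact Python int (tot-u)*(tot-v); u and v are the tie counts (hypothetical helper)."""
    tot = pair_count(n)
    return (tot - u) * (tot - v)

n = 100000
prod = denominator_product(n)
print(pair_count(n))        # 4999950000
print(prod)                 # 24999500002500000000
print(prod > INT64_MAX)     # True
print(prod > UINT64_MAX)    # True: too big even for an unsigned 64-bit integer
print(math.sqrt(prod))      # converted to float first, so no error is raised
```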
> >> It seems like the bug is that np.sqrt is raising an AttributeError on
> >> valid input... can you give an example of a value that np.sqrt fails
> >> on? Like
> >
> > Assume the input arrays x and y have length n=100000, which is common,
> > and assume there are no ties in either x or y, so u=0, v=0 and t=0
> > in the scipy.stats.kendalltau subroutine. The denominator of the
> > expression for tau is then:
> >
> >      np.sqrt( (tot-u) * (tot-v) )
> >
> > Here tot = n * (n-1) // 2 = 4999950000, and (tot-u) * (tot-v) = tot*tot
> > = 24999500002500000000L; this long int raises an error when np.sqrt is
> > applied. I think a type conversion such as 'float()' should be done before
> > np.sqrt, or it should be written as np.sqrt(tot-u) * np.sqrt(tot-v) to
> > avoid the long integer.
> >
> > Thanks a lot : )
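Both workarounds Jeffrey proposes can be sketched with the standard library. Here `math.sqrt` stands in for `np.sqrt` so the sketch runs without NumPy, and the function names are hypothetical, not scipy.stats internals. The first converts the exact integer product to a float before taking the root; the second never forms the product at all.

```python
import math

def tau_denominator_float(tot, u, v):
    # Workaround 1: convert the exact integer product to float before sqrt.
    return math.sqrt(float((tot - u) * (tot - v)))

def tau_denominator_split(tot, u, v):
    # Workaround 2: take each square root separately, so no huge integer is formed.
    return math.sqrt(tot - u) * math.sqrt(tot - v)

tot = 100000 * 99999 // 2  # 4999950000 pairs for n = 100000

# No ties (u = v = 0) and a case with some ties; both forms agree to rounding.
for u, v in [(0, 0), (1000, 250)]:
    a = tau_denominator_float(tot, u, v)
    b = tau_denominator_split(tot, u, v)
    print(u, v, a, b)
```

The two forms differ only in floating-point rounding; the split form has the small advantage that its intermediate values stay of order `tot` rather than `tot**2`.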
>
> Thanks, that clarifies things: https://github.com/numpy/numpy/issues/368
>
> For now, yeah, some sort of workaround makes sense, though... in
> addition to the ones you mention, I noticed that this also seems to
> work:
>
> np.sqrt(bignum, dtype=float)
>
> You should submit a pull request :-).
>

This was already fixed for 0.10.x:
https://github.com/scipy/scipy/commit/ce14ddb

Ralf