[SciPy-User] bug in rankdata?

Chris Rodgers xrodgers at gmail.com
Fri Feb 15 13:29:38 EST 2013


Thanks very much! I discovered this bug because the Mann-Whitney U
test was giving me bizarre results, like a negative U statistic. My
data are a large number of integer counts, mostly zeros, which is the
worst case for ties.
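
For reference, the 32-bit overflow diagnosed in the quoted reply below
can be checked with plain integer arithmetic. This is only a sketch: it
assumes the sum of ranks is accumulated in a signed 32-bit integer that
wraps around (the reply says only "32 bit integer"; signedness and
wraparound are my assumptions), and wrapped_average_rank is a
hypothetical helper, not a scipy function.

```python
def wrapped_average_rank(n):
    """Average rank of n tied values if the rank sum wraps at 32 bits.

    Assumption: the accumulator is a signed 32-bit integer with
    two's-complement wraparound.
    """
    true_sum = n * (n + 1) // 2  # exact sum of the ranks 1..n
    # Emulate signed 32-bit wraparound using Python's unbounded ints.
    wrapped = (true_sum + 2**31) % 2**32 - 2**31
    return wrapped / float(n)


for n in (10000, 100000, 1000000):
    # Columns: n, average rank after wraparound, correct average rank
    print(n, wrapped_average_rank(n), (n + 1) / 2.0)
```

Under that assumption the wrapped averages come out to 5000.5,
7050.82704 and 1784.293664, which are the values rankdata returned in
the session below. The sum 1 + 2 + ... + n first exceeds 2**31 - 1 at
n = 65536, so a single run of ties has to be at least that long before
the sum wraps.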

Until I can update scipy, I'll either write my own rankdata function,
which will be very slow, or use the R equivalent, which has more
features (but then I have to figure out rpy2, which will also be
slow).
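
A stopgap rankdata does not have to be slow if it is written with
numpy array operations. The following is a minimal sketch, not scipy's
implementation: rankdata_average is a hypothetical name, it handles
only the "average" tie method on flattened input, and it makes no
attempt to treat NaNs specially. Average ranks are computed from the
start and end positions of each run of ties, so no long sum of ranks
is ever accumulated.

```python
import numpy as np


def rankdata_average(a):
    """Rank data from 1..n, giving tied values their average rank."""
    a = np.asarray(a).ravel()
    n = a.size
    if n == 0:
        return np.empty(0, dtype=np.float64)

    order = np.argsort(a, kind="mergesort")  # stable sort
    sorted_a = a[order]

    # True wherever a new run of equal values starts in sorted order.
    is_start = np.r_[True, sorted_a[1:] != sorted_a[:-1]]
    # Index of the tie group that each sorted element belongs to.
    group = np.cumsum(is_start) - 1
    # Start offset of every group, plus n as the final end offset.
    bounds = np.r_[np.nonzero(is_start)[0], n]

    # A group occupying sorted positions s..e-1 holds ranks s+1..e,
    # whose mean is (s + e + 1) / 2.
    avg = 0.5 * (bounds[:-1] + bounds[1:] + 1)

    ranks = np.empty(n, dtype=np.float64)
    ranks[order] = avg[group]
    return ranks
```

For an all-ones array of length 100000 this returns 50000.5 for every
element, i.e. len(data) / 2 + 0.5.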

On Fri, Feb 15, 2013 at 10:22 AM, Warren Weckesser
<warren.weckesser at gmail.com> wrote:
>
>
> On Fri, Feb 15, 2013 at 10:32 AM, Warren Weckesser
> <warren.weckesser at gmail.com> wrote:
>>
>> On 2/14/13, Chris Rodgers <xrodgers at gmail.com> wrote:
>> > The results I'm getting from rankdata seem completely wrong for large
>> > datasets. I'll illustrate with a case where all data are equal, so
>> > every rank should be len(data) / 2 + 0.5.
>> >
>> > In [220]: rankdata(np.ones((10000,), dtype=np.int))
>> > Out[220]: array([ 5000.5,  5000.5,  5000.5, ...,  5000.5,  5000.5,
>> > 5000.5])
>> >
>> > In [221]: rankdata(np.ones((100000,), dtype=np.int))
>> > Out[221]:
>> > array([ 7050.82704,  7050.82704,  7050.82704, ...,  7050.82704,
>> >         7050.82704,  7050.82704])
>> >
>> > In [222]: rankdata(np.ones((1000000,), dtype=np.int))
>> > Out[222]:
>> > array([ 1784.293664,  1784.293664,  1784.293664, ...,  1784.293664,
>> >         1784.293664,  1784.293664])
>> >
>> > In [223]: scipy.__version__
>> > Out[223]: '0.11.0'
>> >
>> > In [224]: numpy.__version__
>> > Out[224]: '1.6.1'
>> >
>> >
>> > The results are completely off for N>10000 or so. Am I doing something
>> > wrong?
>>
>>
>> Looks like a bug.  The code that accumulates the ranks of the tied
>> values is using a 32-bit integer for the sum of the ranks, and this is
>> overflowing.  I'll see if I can get this fixed for the imminent
>> release of 0.12.
>>
>> Warren
>>
>
>
> A pull request with the fix is here:
> https://github.com/scipy/scipy/pull/436
>
>
> Warren
>
>
>>
>> > _______________________________________________
>> > SciPy-User mailing list
>> > SciPy-User at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/scipy-user
>> >
>
>
>


