[SciPy-User] bug in rankdata?
Warren Weckesser
warren.weckesser at gmail.com
Fri Feb 15 13:22:09 EST 2013
On Fri, Feb 15, 2013 at 10:32 AM, Warren Weckesser <warren.weckesser at gmail.com> wrote:
> On 2/14/13, Chris Rodgers <xrodgers at gmail.com> wrote:
> > The results I'm getting from rankdata seem completely wrong for large
> > datasets. I'll illustrate with a case where all data are equal, so
> > every rank should be len(data) / 2 + 0.5.
> >
> > In [220]: rankdata(np.ones((10000,), dtype=np.int))
> > Out[220]: array([ 5000.5, 5000.5, 5000.5, ..., 5000.5, 5000.5,
> > 5000.5])
> >
> > In [221]: rankdata(np.ones((100000,), dtype=np.int))
> > Out[221]:
> > array([ 7050.82704, 7050.82704, 7050.82704, ..., 7050.82704,
> > 7050.82704, 7050.82704])
> >
> > In [222]: rankdata(np.ones((1000000,), dtype=np.int))
> > Out[222]:
> > array([ 1784.293664, 1784.293664, 1784.293664, ..., 1784.293664,
> > 1784.293664, 1784.293664])
> >
> > In [223]: scipy.__version__
> > Out[223]: '0.11.0'
> >
> > In [224]: numpy.__version__
> > Out[224]: '1.6.1'
> >
> >
> > The results are completely off for N>10000 or so. Am I doing something
> > wrong?
>
>
> Looks like a bug. The code that accumulates the ranks of the tied
> values is using a 32-bit integer for the sum of the ranks, and this is
> overflowing. I'll see if I can get this fixed for the imminent
> release of 0.12.
>
> Warren
>
>
A pull request with the fix is here:
https://github.com/scipy/scipy/pull/436
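
For anyone who wants to see the overflow itself, here is a rough sketch in
plain Python. It is not the rankdata code: the helper names (wrap_int32,
tied_rank_with_int32_sum, tied_rank_exact) are made up for illustration, and
it assumes the sum of ranks is held in a signed 32-bit integer that wraps
around silently.

```python
def wrap_int32(value):
    """Reduce a Python int to what a signed 32-bit integer would hold."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def tied_rank_with_int32_sum(n):
    """Average rank of n tied values when the rank sum wraps at 32 bits.

    Hypothetical helper: emulates the overflow, it is not SciPy's code.
    """
    rank_sum = n * (n + 1) // 2  # exact sum of the ranks 1..n
    return wrap_int32(rank_sum) / n


def tied_rank_exact(n):
    """Average rank of n tied values, computed without overflow."""
    return (n + 1) / 2


for n in (10000, 100000, 1000000):
    print(n, tied_rank_with_int32_sum(n), tied_rank_exact(n))
```

For n = 10000 the sum of ranks (50005000) still fits in 32 bits, so both
columns give 5000.5. For n = 100000 and n = 1000000 the wrapped version gives
7050.82704 and 1784.293664, the same values Chris reported, while the exact
ranks are 50000.5 and 500000.5.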
Warren