[SciPy-User] bug in rankdata?

Thu Feb 14 20:46:31 EST 2013

The results I'm getting from rankdata look wrong for large datasets.
I'll illustrate with a case where all data are equal, so every rank
should be the average of ranks 1 through len(data), i.e.
len(data) / 2 + 0.5.
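
For reference, the expected tied rank at each size used below can be
worked out without SciPy. This is only a sketch of the arithmetic; the
helper name expected_tied_rank is made up for illustration.

```python
def expected_tied_rank(n):
    """Mean of the ranks 1..n, which every element gets when all n values tie.

    Hypothetical helper for illustration; not part of SciPy.
    """
    return (n + 1) / 2.0


# The three sizes used in the session below.
for n in (10000, 100000, 1000000):
    print(n, expected_tied_rank(n))
```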

In [220]: rankdata(np.ones((10000,), dtype=np.int))
Out[220]: array([ 5000.5,  5000.5,  5000.5, ...,  5000.5,  5000.5,  5000.5])

In [221]: rankdata(np.ones((100000,), dtype=np.int))
Out[221]:
array([ 7050.82704,  7050.82704,  7050.82704, ...,  7050.82704,
        7050.82704,  7050.82704])

In [222]: rankdata(np.ones((1000000,), dtype=np.int))
Out[222]:
array([ 1784.293664,  1784.293664,  1784.293664, ...,  1784.293664,
        1784.293664,  1784.293664])

In [223]: scipy.__version__
Out[223]: '0.11.0'

In [224]: numpy.__version__
Out[224]: '1.6.1'

The result is correct for N=10000 but far off for N=100000 and
N=1000000, where every rank should be 50000.5 and 500000.5
respectively. Am I doing something wrong?
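
One way to cross-check the numbers independently of SciPy is a small
pure-Python average-rank routine. The sketch below is an assumption
about what "average" ranking should produce. It is not how rankdata is
implemented, and the name average_ranks is hypothetical.

```python
def average_ranks(values):
    """Rank values starting from 1; tied values share the mean of their ranks.

    Hypothetical reference routine for cross-checking; not SciPy's code.
    """
    n = len(values)
    # Indices of the values in ascending order (sorted() is stable).
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend `end` to cover the whole run of values equal to the current one.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) hold ranks start+1..end+1.
        mean_rank = (start + 1 + end + 1) / 2.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# All-equal input of the size that goes wrong above.
ranks = average_ranks([1] * 100000)
print(ranks[0], ranks[-1])
```

If rankdata disagrees with a routine like this on the same input, that
points at rankdata rather than at the calling code.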