[SciPy-User] bug in rankdata?
Chris Rodgers
xrodgers at gmail.com
Thu Feb 14 20:46:31 EST 2013
The results I'm getting from rankdata seem completely wrong for large
datasets. I'll illustrate with a case where all data are equal, so
every rank should be the mean of the ranks 1..len(data), i.e.
len(data) / 2 + 0.5.
In [220]: rankdata(np.ones((10000,), dtype=np.int))
Out[220]: array([ 5000.5, 5000.5, 5000.5, ..., 5000.5, 5000.5, 5000.5])
In [221]: rankdata(np.ones((100000,), dtype=np.int))
Out[221]:
array([ 7050.82704, 7050.82704, 7050.82704, ..., 7050.82704,
        7050.82704, 7050.82704])
In [222]: rankdata(np.ones((1000000,), dtype=np.int))
Out[222]:
array([ 1784.293664, 1784.293664, 1784.293664, ..., 1784.293664,
        1784.293664, 1784.293664])
In [223]: scipy.__version__
Out[223]: '0.11.0'
In [224]: numpy.__version__
Out[224]: '1.6.1'
The result is correct for N = 10000 but completely off for N = 100000
and N = 1000000. Am I doing something wrong?
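
For reference, here is a minimal pure-Python sketch of what average ranking should return, so the expected values can be checked without rankdata. The names average_ranks and wrapped_tie_rank are made up for this illustration and are not part of SciPy. The second function is only a guess at the cause, not a confirmed diagnosis: it models the sum of ranks N * (N + 1) / 2 wrapping around in a 32-bit integer before the division by N, which happens to reproduce the numbers above.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        # Sorted positions start..end are 0-based; ranks are 1-based.
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wrapped_tie_rank(n, bits=32):
    """Mean rank of n tied values if the rank sum wrapped modulo 2**bits.

    Hypothetical model of an integer overflow, not SciPy's actual code.
    """
    rank_sum = n * (n + 1) // 2
    return (rank_sum % 2**bits) / n


if __name__ == "__main__":
    print(average_ranks([10, 20, 10, 30]))
    print(average_ranks([1] * 10000)[0])
    for n in (10000, 100000, 1000000):
        print(n, n / 2 + 0.5, wrapped_tie_rank(n))
```

For N = 10000 the rank sum still fits in 32 bits, so both columns agree. For the two larger sizes the wrapped value matches the output I posted, which would be consistent with an integer overflow somewhere in the tie handling.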