[SciPy-user] scipy.stats.scoreatpercentile(...)

Robert Kern rkern at ucsd.edu
Mon Aug 15 19:15:37 EDT 2005


David K wrote:
> Hi,
> 
> I was trying the scipy.stats.scoreatpercentile function:
> 
> >>> import scipy
> >>> a = scipy.arange(1,11)
> >>> a
> array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
> >>> scipy.stats.scoreatpercentile( a, 50 ) # find the 50th percentile
> 5.9050000000000002
> 
> Shouldn't the result be 5.5?  Or perhaps I've misunderstood something?

I think scoreatpercentile() is broken for this input. It uses histogram()
under the covers, and the defaults for histogram() (e.g. 10 bins) and its
heuristics for placing the bin boundaries are a bit pathological here: with
only ten integers, 7 and 8 share a bin and the last bin is empty (see Out[8]
below). The heuristics aren't so bad by themselves, but scoreatpercentile()
interpolates within those bins and trusts them more than is wise in this case.
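For comparison, a percentile can be computed straight from the sorted data without any histogram. The sketch below is a minimal illustration, not part of scipy.stats; the name `percentile_linear` is hypothetical. It interpolates linearly between neighbouring order statistics at the 0-based rank p/100 * (n - 1), the same rule modern NumPy's `numpy.percentile` uses by default, and it gives the 5.5 David expected.

```python
def percentile_linear(data, p):
    """Return the p-th percentile (0 <= p <= 100) of data.

    Hypothetical helper: interpolates linearly between neighbouring order
    statistics at the 0-based fractional rank p/100 * (n - 1). No histogram.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    pos = (p / 100.0) * (n - 1)   # fractional rank
    lo = int(pos)                 # floor, since pos >= 0
    hi = min(lo + 1, n - 1)       # clamp at the largest value
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


a = list(range(1, 11))
print(percentile_linear(a, 50))   # 5.5
```

With ten values the median falls halfway between the 5th and 6th smallest, 5 and 6, so no binning choice can move it.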

In [3]: a = arange(1,11)

In [4]: a
Out[4]: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [5]: stats.scoreatpercentile(a, 100.0)
Out[5]: 10.265000000000001

In [6]: stats.scoreatpercentile(a, 10.0)
Out[6]: -9.3550000000000004

In [7]: stats.scoreatpercentile(a, 11.0)
Out[7]: 1.6540000000000001

In [8]: stats.histogram(a)
Out[8]:
(array([1, 1, 1, 1, 1, 1, 2, 1, 1, 0]),
  0.45499999999999996,
  1.0900000000000001,
  0)
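The numbers above can be reproduced from the histogram() output alone. The sketch below is a reconstruction inferred from these outputs, not SciPy's actual source, and `score_from_histogram` is a hypothetical name. It reads the Out[8] tuple as (bin counts, lower edge of the first bin, bin width, points outside the bins), finds the first bin whose cumulative count reaches percent/100 * n, and interpolates linearly inside that bin. In this reconstruction the 10th-percentile target is reached in the very first bin, so the "cumulative count before this bin" is read from index -1, the grand total, instead of 0, which drives the result negative.

```python
def score_from_histogram(counts, lowerlimit, binsize, percent):
    """Hypothetical reconstruction of how scoreatpercentile() turns
    histogram() output into a score; inferred, not SciPy's actual code."""
    targetcf = (percent / 100.0) * sum(counts)   # target cumulative count
    cumhist = []
    running = 0
    for c in counts:
        running += c
        cumhist.append(running)
    # First bin whose cumulative count reaches the target.
    i = next(k for k, cf in enumerate(cumhist) if cf >= targetcf)
    # When i == 0, cumhist[i - 1] is cumhist[-1] (the grand total), not 0.
    # That off-by-one reproduces the negative 10th percentile.
    below = cumhist[i - 1]
    # Linear interpolation inside bin i, starting from its lower edge.
    return binsize * (targetcf - below) / counts[i] + (lowerlimit + binsize * i)


# histogram() output for arange(1, 11), copied from Out[8].
counts = [1, 1, 1, 1, 1, 1, 2, 1, 1, 0]
lower, width = 0.45499999999999996, 1.0900000000000001
for p in (50.0, 100.0, 10.0, 11.0):
    print(p, score_from_histogram(counts, lower, width, p))
```

Up to floating-point rounding this prints 5.905, 10.265, -9.355 and 1.654, matching David's result and Out[5] through Out[7]. That agreement supports the reconstruction but does not prove SciPy's code is written this way.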

By contrast, with 1000 draws from a continuous distribution the results look
much more sensible:

In [9]: a = stats.uniform.rvs(1.0, 9.0, size=1000)

In [10]: stats.histogram(a)
Out[10]:
(array([ 49, 124, 123, 118, 117, 124, 115, 137,  93,   0]),
  0.46266254770569504,
  1.0880739132501185,
  0)

In [11]: stats.scoreatpercentile(a, 50.0)
Out[11]: 5.6147390258301879

In [12]: stats.scoreatpercentile(a, 10.0)
Out[12]: 1.9982507317280398
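With many distinct values spread across the range, interpolating inside histogram bins is a reasonable estimate, which fits the sensible numbers above. A minimal sketch of that comparison in plain Python follows. `histogram_percentile` and `sorted_percentile` are hypothetical helpers, the equal-width binning over [min, max] is an assumption (not the rule scipy.stats.histogram() uses), and `random.uniform` stands in for `stats.uniform.rvs`. On 1000 uniform draws the two estimates should typically agree to within a small fraction of a bin width.

```python
import random


def histogram_percentile(data, p, numbins=10):
    """p-th percentile estimated from equal-width bins spanning [min, max].

    Hypothetical binning for illustration, not scipy.stats.histogram()'s.
    Assumes at least two distinct values. The running count starts at 0.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / numbins
    counts = [0] * numbins
    for x in data:
        counts[min(int((x - lo) / width), numbins - 1)] += 1  # max goes in last bin
    target = p / 100.0 * len(data)
    below = 0                                   # points in bins before bin i
    for i, c in enumerate(counts):
        if c > 0 and below + c >= target:
            # Interpolate linearly inside bin i.
            return lo + width * i + width * (target - below) / c
        below += c
    return hi


def sorted_percentile(data, p):
    """Linear interpolation between order statistics, for reference."""
    xs = sorted(data)
    pos = p / 100.0 * (len(xs) - 1)
    i = int(pos)
    j = min(i + 1, len(xs) - 1)
    return xs[i] + (pos - i) * (xs[j] - xs[i])


random.seed(0)  # fixed seed so the run is repeatable
sample = [random.uniform(1.0, 10.0) for _ in range(1000)]
for p in (10.0, 50.0):
    print(p, histogram_percentile(sample, p), sorted_percentile(sample, p))
```

The bins here hold roughly 100 points each, so the within-bin interpolation has plenty of data to work with; with ten integers and one or two points per bin, the same scheme has almost nothing to go on.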

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
  Are the graves of dreams allowed to die."
   -- Richard Harter



