[SciPy-dev] percentileofscore

josef.pktd at gmail.com josef.pktd at gmail.com
Sun Nov 16 22:44:47 EST 2008


What is percentileofscore supposed to do?
I did not find a good interpretation of what the numbers
are supposed to mean.

From statistics, I am used to a definition based on the
cdf, i.e. the fraction of elements weakly smaller than (less than
or equal to) the "score".
A strictly smaller definition could also be useful, as
used e.g. in the ranking of schools.
The current implementation, which uses histogram, does not give
results that I can easily interpret.
The proposed implementation still has one error, as mentioned
by Stefan. It uses the mean when multiple matching elements are present.

I looked at 3 cases:
* the score element is uniquely present in array
* multiple elements in the array are equal to the score
* no element in the array is equal to the score

I tried out 5 different definitions:
percentileofscore_proposed: taken from google review with correction
percentileofscore_mean: similar to proposed, gives mean rank if multiple present
     This just adds another correction to the proposed version (start
     index at one instead of zero)
percentileofscore_meaninterp: similar to proposed, interpolates if missing
percentileofscore_strict: one liner, Fraction(x<score)
percentileofscore_weak: one liner, Fraction(x<=score)
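
For readers without the attachment, here is a minimal sketch of how the two
one-liners could look. It assumes that Fraction(x<score) means the share of
elements satisfying the condition, scaled to 0..100; it is written in plain
Python and is not the code from percofscore.py.

```python
def percentileofscore_strict(a, score):
    """Percentage of elements in `a` strictly smaller than `score`."""
    a = list(a)
    return 100.0 * sum(x < score for x in a) / len(a)


def percentileofscore_weak(a, score):
    """Percentage of elements in `a` smaller than or equal to `score`.

    This is the empirical cdf evaluated at `score`, times 100.
    """
    a = list(a)
    return 100.0 * sum(x <= score for x in a) / len(a)
```

The two versions differ only when at least one element equals the score; if
the score is missing from the array they return the same number.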

This is what I get:

#unique element
>>> percentileofscore_proposed([1,2,3,4,5,6,7,8,9,10],4)
30.0
>>> percentileofscore_mean([1,2,3,4,5,6,7,8,9,10],4)
40.0
>>> percentileofscore_meaninterp([1,2,3,4,5,6,7,8,9,10],4)
40.0
>>> percentileofscore_strict([1,2,3,4,5,6,7,8,9,10],4)
30.0
>>> percentileofscore_weak([1,2,3,4,5,6,7,8,9,10],4)
40.0


#multiple elements
>>> percentileofscore_proposed([1,2,3,4,4,5,6,7,8,9],4)
35.0
>>> percentileofscore_mean([1,2,3,4,4,5,6,7,8,9],4)
45.0
>>> percentileofscore_meaninterp([1,2,3,4,4,5,6,7,8,9],4)
45.0
>>> percentileofscore_weak([1,2,3,4,4,5,6,7,8,9],4)
50.0
>>> percentileofscore_strict([1,2,3,4,4,5,6,7,8,9],4)
30.0

#missing elements
>>> percentileofscore_proposed([1,2,3,5,6,7,8,9,10,11],4)
30.0
>>> percentileofscore_mean([1,2,3,5,6,7,8,9,10,11],4)
30.0
>>> percentileofscore_meaninterp([1,2,3,5,6,7,8,9,10,11],4)
35.0
>>> percentileofscore_weak([1,2,3,5,6,7,8,9,10,11],4)
30.0
>>> percentileofscore_strict([1,2,3,5,6,7,8,9,10,11],4)
30.0
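
The "proposed" and "mean" numbers above are consistent with averaging the
ranks of the elements that tie with the score, counting ranks from zero or
from one respectively. The sketch below is one reading that reproduces those
numbers; the function name and the `start` parameter are hypothetical, and
the fallback to the strict count when the score is missing is an assumption
inferred from the outputs, not taken from percofscore.py.

```python
def percentileofscore_meanrank(a, score, start=1):
    """Mean percentage rank of the elements equal to `score`.

    start=1 counts ranks from one (matches the "mean" outputs above),
    start=0 counts ranks from zero (matches the "proposed" outputs).
    """
    a = list(a)
    n = len(a)
    below = sum(x < score for x in a)         # elements strictly smaller
    at_or_below = sum(x <= score for x in a)  # elements weakly smaller
    if at_or_below == below:
        # Score not present: assumed to fall back to the strict count.
        return 100.0 * below / n
    # Zero-based ranks of the tied elements run from `below` to
    # `at_or_below - 1`; average them and shift by `start`.
    mean_rank = (below + at_or_below - 1) / 2.0 + start
    return 100.0 * mean_rank / n
```

With start=1 a unique match gives the same result as the weak definition,
and with start=0 it gives the same result as the strict definition; the
variants only disagree with both when there are ties.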

What's the use case for percentileofscore?
I just use Fraction(x<=score) or Fraction(x<score).

Josef
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: percofscore.py
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20081116/b4aa8ce0/attachment.ksh>

