[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)
Sturla Molden
sturla at molden.no
Wed Mar 18 08:11:35 EDT 2009
On 3/18/2009 12:09 AM, Sturla Molden wrote:
>> Hollander, M., and D. A. Wolfe. 1999. Nonparametric statistical methods
>> is supposed to have a discussion on tie handling for kendall's tau,
>> but I don't have access to it.
>
> I have this book in my book shelf at work.
Here is a completely naïve Kendall's tau based on Hollander & Wolfe's book:
    def tau(x, y):
        """
        Kendall's tau according to Hollander, M.,
        and D. A. Wolfe. 1999. Nonparametric statistical
        methods. 2nd edition. New York: Wiley. Page 382.
        """
        def Q(p, q):
            # Score one pair of observations: +1 if concordant,
            # -1 if discordant, 0 if tied in x or in y.
            (a, b), (c, d) = p, q
            s = (d - b) * (c - a)
            if s > 0: return 1
            if s == 0: return 0
            if s < 0: return -1
            raise ValueError('this should never occur')
        assert len(x) == len(y)
        n = len(x)
        K = 0
        # Sum the scores over all n*(n-1)/2 distinct pairs.
        for i in range(n - 1):
            for j in range(i + 1, n):
                K += Q((x[i], y[i]), (x[j], y[j]))
        return 2.0 * K / (n * (n - 1))  # Eq 8.34
And with this:
>>> tau([1,1,2],[1,1,2])
0.66666666666666663
So it seems Hollander & Wolfe and Minitab say 0.67, whereas Numerical
Recipes says 1.0. Intuitively, a vector should be perfectly
correlated with itself, but I am inclined to trust Hollander & Wolfe
more than Numerical Recipes. For example, if we use this probability
definition of tau:
    tau = P{concordant pair} - P{discordant pair}
then tau should indeed be 0.67: of the three pairs in [1,1,2], two are
concordant, none are discordant, and one is tied, so tau = 2/3 - 0.
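To make the pair counting explicit, here is a rough sketch that tallies concordant, discordant and tied pairs and then evaluates the probability definition directly. The names pair_counts, tau_a and tau_b are only illustrative and do not come from Hollander & Wolfe or from SciPy. The second function uses the usual tie-corrected denominator (often called tau-b). That it yields 1.0 here may explain the other figure, but that is an assumption I have not checked against Numerical Recipes.

```python
from itertools import combinations
from math import sqrt

def pair_counts(x, y):
    """Return (concordant, discordant, tied_in_x, tied_in_y) pair counts."""
    conc = disc = tied_x = tied_y = 0
    for (a, b), (c, d) in combinations(zip(x, y), 2):
        s = (c - a) * (d - b)
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
        # A pair can be tied in x, in y, or in both.
        if a == c:
            tied_x += 1
        if b == d:
            tied_y += 1
    return conc, disc, tied_x, tied_y

def tau_a(x, y):
    # P{concordant pair} - P{discordant pair} over all n*(n-1)/2 pairs;
    # tied pairs count in the denominator but not in the numerator.
    conc, disc, _, _ = pair_counts(x, y)
    n = len(x)
    return (conc - disc) / (n * (n - 1) / 2.0)

def tau_b(x, y):
    # Tie-corrected variant (assumed form): pairs tied in x or in y are
    # removed from the respective factor of the denominator.
    # Undefined (division by zero) if every pair is tied in x or in y.
    conc, disc, tied_x, tied_y = pair_counts(x, y)
    n = len(x)
    n0 = n * (n - 1) / 2.0
    return (conc - disc) / sqrt((n0 - tied_x) * (n0 - tied_y))

if __name__ == "__main__":
    x = [1, 1, 2]
    print(pair_counts(x, x))  # counts for the example above
    print(tau_a(x, x))
    print(tau_b(x, x))
```

For [1,1,2] against itself the counts are two concordant pairs, no discordant pairs, and one pair tied in both variables, so tau_a gives 2/3 while tau_b gives 2/sqrt(2*2) = 1.0. The two answers differ only in how the tied pair is treated in the denominator.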
Best regards,
Sturla Molden