[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)

Wed Mar 18 08:11:35 EDT 2009

On 3/18/2009 12:09 AM, Sturla Molden wrote:
>> Hollander, M., and D. A. Wolfe. 1999. Nonparametric statistical methods
>> is supposed to have a discussion on tie handling for kendall's tau,
>> but I don't have access to it.
> 
> I have this book in my book shelf at work.

Here is a completely naïve Kendall's tau based on Hollander & Wolfe's book:

def tau(x, y):
    """
    Kendall's tau according to Hollander, M.,
    and D. A. Wolfe. 1999. Nonparametric statistical
    methods. 2nd edition. New York: Wiley. Page 382.
    """

    def Q(p, q):
        # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
        (a, b), (c, d) = p, q
        s = (d - b) * (c - a)
        if s > 0: return 1
        if s == 0: return 0
        if s < 0: return -1
        raise ValueError('this should never occur')

    assert len(x) == len(y)
    n = len(x)
    K = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            K += Q((x[i], y[i]), (x[j], y[j]))

    return 2.0 * K / (n * (n - 1))  # Eq 8.34

And with this:

>>> tau([1, 1, 2], [1, 1, 2])
0.6666666666666666

So it seems Hollander & Wolfe and Minitab say 0.67, whereas Numerical
Recipes says 1.0. Intuitively a vector should be perfectly correlated
with itself, but I am inclined to trust Hollander & Wolfe more than
Numerical Recipes. For example, if we use this probability definition
of tau:

    tau = P{concordant pair} - P{discordant pair}

then tau should indeed be 0.67: of the three pairs, two are concordant,
one is tied and none is discordant, so tau = 2/3 - 0.
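
The probability definition can be checked by counting pairs directly.
The sketch below is only an illustration: the names pair_counts,
tau_probability and tau_tie_corrected are made up for this example and
are not part of scipy. The tie-corrected variant (usually called
tau-b) is included on the assumption that a normalization of this kind
is what yields 1.0; that is a guess, not something verified against
Numerical Recipes.

```python
from itertools import combinations
from math import sqrt


def pair_counts(x, y):
    """Return (concordant, discordant, tied in x only, tied in y only,
    tied in both) over all n*(n-1)/2 pairs of observations."""
    conc = disc = tie_x = tie_y = tie_xy = 0
    for (a, b), (c, d) in combinations(zip(x, y), 2):
        dx, dy = c - a, d - b
        if dx == 0 and dy == 0:
            tie_xy += 1
        elif dx == 0:
            tie_x += 1
        elif dy == 0:
            tie_y += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    return conc, disc, tie_x, tie_y, tie_xy


def tau_probability(x, y):
    # P{concordant pair} - P{discordant pair}; tied pairs stay in the
    # denominator, so ties pull the result towards zero.
    conc, disc, tie_x, tie_y, tie_xy = pair_counts(x, y)
    total = conc + disc + tie_x + tie_y + tie_xy
    return (conc - disc) / total


def tau_tie_corrected(x, y):
    # Assumed tau-b style normalization: pairs tied in x are dropped
    # from one factor and pairs tied in y from the other.
    # Divides by zero if either variable is constant.
    conc, disc, tie_x, tie_y, tie_xy = pair_counts(x, y)
    return (conc - disc) / sqrt((conc + disc + tie_y) * (conc + disc + tie_x))


print(pair_counts([1, 1, 2], [1, 1, 2]))
print(tau_probability([1, 1, 2], [1, 1, 2]))
print(tau_tie_corrected([1, 1, 2], [1, 1, 2]))
```

For [1,1,2] against itself the counts are two concordant pairs, no
discordant pairs and one pair tied in both variables. The first
definition therefore gives 2/3, while the tie-corrected one divides by
sqrt(2*2) = 2 and gives 1.0, so the whole disagreement comes down to
whether tied pairs are kept in the denominator.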

Best regards,
Sturla Molden