[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)

Wed Mar 18 09:19:38 EDT 2009

Hello,

On Wed, 2009-03-18 at 13:11 +0100, Sturla Molden wrote:
> So it seems Hollander & Wolfe and Minitab say 0.67, whereas Numerical 
> Recipes says 1.0. Intuitively a vector should be exactly 
> correlated with itself, but I am inclined to trust Hollander & Wolfe 
> more than Numerical Recipes.

Ah, I was under the impression you had already checked Hollander & Wolfe.
Anyway, it seems my initial interpretation was right, then. Repeating the
formula here (augmented) for future reference:

Kendall's tau-b (tie handling):
-------------------------------
Given two rankings R1 and R2, Kendall's tau-b is calculated by:
        t = (P - Q) / SQRT((P + Q + T) * (P + Q + U))
where P is the number of concordant pairs, Q the number of discordant
pairs, T the number of tied pairs in R1 and U the number of tied pairs
in R2.
[A tie is always counted, regardless of whether it occurs for the same
pair in both R1 and R2 or for different pairs.]
-------------------------------
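
For future reference, the formula above can be sketched in Python as
follows. This is only a minimal illustration, not the code in
scipy.stats.stats: the function name kendall_tau_b and the switch
count_joint_ties are hypothetical names made up here. The switch
selects between counting a pair tied in both R1 and R2 in T and U (as
in the bracketed note) and leaving such pairs out.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(r1, r2, count_joint_ties=True):
    """Kendall's tau-b from the formula above, using an O(n^2) pair loop.

    count_joint_ties is a hypothetical switch, not a SciPy parameter:
    True  -> a pair tied in both R1 and R2 is added to both T and U;
    False -> such a pair is left out of T and U.
    """
    if len(r1) != len(r2):
        raise ValueError("rankings must have the same length")
    p = q = t = u = 0
    for (a1, a2), (b1, b2) in combinations(zip(r1, r2), 2):
        tied1 = a1 == b1  # pair is tied in R1
        tied2 = a2 == b2  # pair is tied in R2
        if tied1 and tied2:
            if count_joint_ties:
                t += 1
                u += 1
        elif tied1:
            t += 1
        elif tied2:
            u += 1
        elif (a1 < b1) == (a2 < b2):
            p += 1  # concordant: same order in both rankings
        else:
            q += 1  # discordant: opposite order
    denom = sqrt((p + q + t) * (p + q + u))
    return (p - q) / denom if denom else float("nan")


ranking = [1, 2, 2, 3]
print(kendall_tau_b(ranking, ranking, count_joint_ties=True))   # 5/6
print(kendall_tau_b(ranking, ranking, count_joint_ties=False))  # 1.0
```

For this small made-up ranking compared with itself, P = 5, Q = 0 and
one pair is tied in both rankings, so counting that pair gives 5/6 and
leaving it out gives 1.0.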

Some tests I ran today with the R implementation of Kendall's Tau(-a)
and the original implementation in scipy.stats.stats (Kendall's Tau-b)
seem to suggest that if we do NOT count ties on the same pair (the
current situation in scipy.stats.stats), Kendall's Tau-b effectively
gives the same outcomes as Kendall's Tau-a for about 36 test cases.

This seems to suggest that Kendall's Tau-b (tie correction) as currently
implemented in SciPy behaves like Kendall's Tau-a (no tie correction),
possibly because ties on identical pairs are left out of T and U above.
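
For anyone who wants to check this on their own data, here is a rough
Python sketch that computes Kendall's Tau-a next to a Tau-b variant
that leaves ties on identical pairs out of T and U. All function names
(pair_counts, tau_a, tau_b_without_joint_ties, compare) are
hypothetical, the code is neither the R nor the SciPy implementation,
and the sample cases are made up; the 36 test cases mentioned above
are not reproduced here.

```python
from itertools import combinations
from math import isclose, sqrt


def pair_counts(r1, r2):
    """Return (P, Q, ties only in R1, ties only in R2, ties in both)."""
    p = q = only1 = only2 = both = 0
    for (a1, a2), (b1, b2) in combinations(zip(r1, r2), 2):
        tied1, tied2 = a1 == b1, a2 == b2
        if tied1 and tied2:
            both += 1
        elif tied1:
            only1 += 1
        elif tied2:
            only2 += 1
        elif (a1 < b1) == (a2 < b2):
            p += 1
        else:
            q += 1
    return p, q, only1, only2, both


def tau_a(r1, r2):
    """Kendall's Tau-a: (P - Q) over all n(n-1)/2 pairs, no tie correction."""
    n = len(r1)
    if n < 2:
        return float("nan")
    p, q, _, _, _ = pair_counts(r1, r2)
    return (p - q) / (n * (n - 1) / 2)


def tau_b_without_joint_ties(r1, r2):
    """Tau-b variant in which pairs tied in both rankings are ignored."""
    p, q, only1, only2, _ = pair_counts(r1, r2)
    denom = sqrt((p + q + only1) * (p + q + only2))
    return (p - q) / denom if denom else float("nan")


def compare(cases):
    """Return (r1, r2, tau_a, tau_b) for every case where the two differ."""
    mismatches = []
    for r1, r2 in cases:
        a, b = tau_a(r1, r2), tau_b_without_joint_ties(r1, r2)
        if not isclose(a, b, abs_tol=1e-12):
            mismatches.append((r1, r2, a, b))
    return mismatches


# Made-up sample input: one tie-free case and one case with ties.
cases = [([1, 2, 3, 4], [1, 3, 2, 4]), ([1, 2, 2, 3], [1, 3, 2, 3])]
for r1, r2, a, b in compare(cases):
    print(r1, r2, a, b)
```

On tie-free input the two functions agree by construction, since
T = U = 0 and P + Q equals n(n-1)/2, so the cases with ties are the
ones worth feeding to compare().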

Unfortunately, I do not have the time right now to mathematically prove
(or disprove) the equivalence of Kendall's Tau-a and the current SciPy
implementation, but I thought it'd be useful to mention these test
results.

-- 
With kind regards,

Almer S. Tigelaar
University of Twente