[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)

Almer S. Tigelaar almer at gnome.org
Wed Mar 18 05:12:42 EDT 2009


Hi Josef,

On Tue, 2009-03-17 at 16:41 -0400, josef.pktd at gmail.com wrote:
> The problem, I had with Kendalls tau was that I didn't find a good,
> non-ambiguous reference, also with the hints to different versions of
> kendalls tau, it wasn't clear to me what exactly is implemented or how
> the different versions are defined.

For clarity (and future reference), here are the three versions I know
of. I give each one in full, repeating some text across the definitions:

Kendall tau-a (with NO handling for ties):
------------------------------------------
	t = (P - Q) / (0.5 * n * (n - 1))
where P is the number of concordant pairs, Q the number of discordant
pairs and n is the number of items to rank; the denominator is equal to
the total number of pairs. A pair tied in either ranking counts towards
neither P nor Q, but it still counts in the denominator.
[I have verified this with multiple academic sources and implementations
and am quite sure that this definition is correct]
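A minimal Python sketch of tau-a as defined above may help pin down the counting. It assumes the two rankings are given as equal-length sequences of numbers. `count_pairs` and `kendall_tau_a` are hypothetical helper names, not SciPy API. The sketch is O(n^2), so it is meant for checking small cases by hand rather than as an efficient implementation.

```python
from itertools import combinations


def count_pairs(x, y):
    """Count concordant (P) and discordant (Q) pairs between two rankings.

    Each unordered pair (i, j), i < j, is examined once; a pair tied in
    either ranking is counted in neither P nor Q.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    P = Q = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            P += 1  # both rankings order the pair the same way
        elif s < 0:
            Q += 1  # the rankings order the pair in opposite ways
    return P, Q


def kendall_tau_a(x, y):
    """Kendall tau-a: (P - Q) divided by the number of pairs, n(n-1)/2."""
    n = len(x)
    P, Q = count_pairs(x, y)
    return (P - Q) / (0.5 * n * (n - 1))


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # P = 5, Q = 1: (5 - 1) / 6
```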

Kendall tau-b (tie handling):
-----------------------------
	t = (P - Q) / SQRT((P + Q + T) * (P + Q + U))
where P is the number of concordant pairs, Q the number of discordant
pairs, T the number of pairs tied in the first ranking (R1), and U the
number of pairs tied in the second ranking (R2). (We are still discussing
whether T and U should also be incremented when a pair is tied in both.)
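A sketch of tau-b in Python, assuming one particular answer to the open question: a pair tied in both rankings is added to neither T nor U, so T and U count pairs tied in only one ranking. Under that convention the denominator equals SQRT((n0 - n1) * (n0 - n2)), where n0 = n(n-1)/2 and n1, n2 are the total numbers of tied pairs in R1 and R2, which is the form many references give. `kendall_tau_b` is a hypothetical name.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b: (P - Q) / sqrt((P + Q + T) * (P + Q + U)).

    Assumed convention: T counts pairs tied in x (R1) only, U counts pairs
    tied in y (R2) only; a pair tied in both is added to neither.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    P = Q = T = U = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: excluded under this convention
        elif dx == 0:
            T += 1  # tied in R1 only
        elif dy == 0:
            U += 1  # tied in R2 only
        elif dx * dy > 0:
            P += 1  # concordant
        else:
            Q += 1  # discordant
    denom = sqrt((P + Q + T) * (P + Q + U))
    if denom == 0:
        raise ValueError("tau-b is undefined when either ranking is constant")
    return (P - Q) / denom


print(kendall_tau_b([1, 2, 3, 4], [1, 1, 2, 2]))  # P = 4, Q = 0, T = 0, U = 2
```

Without ties, T = U = 0 and P + Q = n(n-1)/2, so tau-b reduces to tau-a.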

Kendall tau-c (alternative tie handling):
-----------------------------------------
(also called Stuart's tau-c or Kendall-Stuart's tau-c)
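	t = (m * (P - Q)) / (n^2 * (m - 1))
where P is the number of concordant pairs and Q the number of discordant
pairs, here counted over ordered pairs (each pair counted twice, so both
are double the counts used for tau-a and tau-b), n is the number of items
and m = min(r, s), where r and s are the number of rows and columns of the
contingency table formed by the two rankings.

[Note that some definitions of Kendall tau-c write 2m instead of m in the
 numerator. That form is equivalent when P and Q count each pair only once,
 as in tau-a and tau-b. Combining 2m with the doubled counts used here,
 however, can yield values outside the [-1, +1] range, which is obviously
 wrong.]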
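A sketch of tau-c in Python, assuming P and Q are counted over ordered pairs (each unordered pair contributes twice); with that counting the formula as written stays within [-1, +1]. It also assumes r and s are the numbers of distinct values in each ranking, i.e. the dimensions of their contingency table. `kendall_tau_c` is a hypothetical name.

```python
from itertools import permutations


def kendall_tau_c(x, y):
    """Stuart's tau-c: m * (P - Q) / (n^2 * (m - 1)).

    P and Q are counted over ordered pairs (i, j), i != j, so each unordered
    pair contributes twice; m = min(r, s), with r and s taken as the numbers
    of distinct values in x and y (rows and columns of the table).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    m = min(len(set(x)), len(set(y)))
    if m < 2:
        raise ValueError("tau-c is undefined when either ranking is constant")
    P = Q = 0
    for (xi, yi), (xj, yj) in permutations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            P += 1  # concordant (seen once per ordering)
        elif s < 0:
            Q += 1  # discordant (seen once per ordering)
    return m * (P - Q) / (n ** 2 * (m - 1))


print(kendall_tau_c([1, 2, 3, 4], [1, 3, 2, 4]))  # m = 4, P = 10, Q = 2
```

When neither ranking has ties, m = n and the result coincides with tau-a; for x = [1, 2, 3, 4] and y = [1, 3, 2, 4] both give 4/6.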

---

Some sources state that Kendall tau-b is more appropriate for square
tables and Kendall tau-c for rectangular ones. However, this is an
argument I admittedly do not yet fully grasp.
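A small worked example of the rationale usually given for this: take R1 = (1, 2, 3, 4) and R2 = (1, 1, 2, 2), which agree perfectly in order but form a 4 x 2 table (m = 2, n = 4). Of the six pairs, four are concordant, none are discordant and two are tied in R2 only (P = 4, Q = 0, T = 0, U = 2). For tau-c, counting over ordered pairs gives P' = 8 and Q' = 0:

```latex
\tau_b = \frac{P - Q}{\sqrt{(P + Q + T)(P + Q + U)}}
       = \frac{4}{\sqrt{4 \cdot 6}} \approx 0.816,
\qquad
\tau_c = \frac{m (P' - Q')}{n^2 (m - 1)}
       = \frac{2 \cdot 8}{4^2 \cdot 1} = 1 .
```

Whenever r and s differ, some pair must differ in the ranking with more distinct values while being tied in the other (by the pigeonhole principle), so T or U is positive and |tau-b| stays strictly below 1 even for a perfect monotone relationship. The m/(m - 1) factor in tau-c is meant to correct for the attainable maximum on such rectangular tables.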

> I'm a bit surprised that R doesn't do the tie handling.

I might be wrong, as my knowledge of R is not (yet) very thorough, but this
is the conclusion I drew from the comments in this file:
https://svn.r-project.org/R/trunk/src/library/stats/src/kendall.c

With kind regards,

Almer S. Tigelaar



