[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)
Almer S. Tigelaar
almer at gnome.org
Wed Mar 18 05:12:42 EDT 2009
Hi Josef,
On Tue, 2009-03-17 at 16:41 -0400, josef.pktd at gmail.com wrote:
> The problem, I had with Kendalls tau was that I didn't find a good,
> non-ambiguous reference, also with the hints to different versions of
> kendalls tau, it wasn't clear to me what exactly is implemented or how
> the different versions are defined.
For clarity (and future reference), there are three versions that I know
of (I will give them in full, repeating some text for each definition):
Kendall tau-a (with NO handling for ties):
------------------------------------------
t = (P - Q) / (0.5 * n * (n - 1))
where P is the number of concordant pairs, Q the number of discordant
pairs and n is the number of items to rank (the denominator is actually
equal to the number of pairs).
[I have verified this with multiple academic sources and implementations
and am quite sure that this definition is correct]
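
As a minimal sketch of the tau-a definition above (not the scipy
implementation), the pairs can be counted by brute force. The function
names count_pairs and kendall_tau_a are made up for this illustration,
and the sketch assumes each unordered pair is counted exactly once and
that tied pairs contribute to neither P nor Q:

```python
from itertools import combinations


def count_pairs(x, y):
    """Return (P, Q), counting each unordered pair of items once.

    Hypothetical helper for illustration only. Pairs tied in x or in y
    are neither concordant nor discordant and are skipped.
    """
    p = q = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            p += 1  # both rankings order the pair the same way
        elif s < 0:
            q += 1  # the rankings disagree on the order of the pair
    return p, q


def kendall_tau_a(x, y):
    """tau-a = (P - Q) / (0.5 * n * (n - 1)); needs at least two items."""
    n = len(x)
    p, q = count_pairs(x, y)
    return (p - q) / (0.5 * n * (n - 1))
```

This is O(n^2) in the number of items, which is fine for checking a
definition by hand but not meant for large inputs.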
Kendall tau-b (tie handling):
-----------------------------
t = (P - Q) / SQRT((P + Q + T) * (P + Q + U))
where P is the number of concordant pairs, Q the number of discordant
pairs, T the number of ties in R1 and U the number of ties in R2
(we are still discussing whether T and U should both be incremented
when a pair is tied in both rankings).
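
To make the open question about pairs tied in both rankings concrete,
here is a rough sketch of the tau-b formula above. The function name
kendall_tau_b and the flag count_joint_ties are hypothetical, invented
for this example. The default leaves pairs tied in both rankings out of
T and U, which is my understanding of the usual textbook form, but
please treat that default as an assumption rather than a settled
answer:

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(r1, r2, count_joint_ties=False):
    """tau-b = (P - Q) / sqrt((P + Q + T) * (P + Q + U)).

    Illustration only. T counts pairs tied in r1, U counts pairs tied
    in r2. count_joint_ties (hypothetical flag) decides whether a pair
    tied in BOTH rankings is added to T and U.
    """
    p = q = t = u = 0
    for (a1, b1), (a2, b2) in combinations(zip(r1, r2), 2):
        da, db = a1 - a2, b1 - b2
        if da == 0 and db == 0:
            if count_joint_ties:
                t += 1
                u += 1
        elif da == 0:
            t += 1  # tied in r1 only
        elif db == 0:
            u += 1  # tied in r2 only
        elif da * db > 0:
            p += 1  # concordant
        else:
            q += 1  # discordant
    denom = sqrt((p + q + t) * (p + q + u))
    # With no usable pairs the statistic is undefined.
    return (p - q) / denom if denom else float("nan")
```

Calling it on r1 = [1, 1, 2] and r2 = [1, 1, 2] with the flag off and
then on shows how much the choice matters for identical rankings.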
Kendall tau-c (alternative tie handling):
-----------------------------------------
(also called Stuart's tau-c or Kendall-Stuart's tau-c)
t = (m * (P - Q)) / (n^2 * (m - 1))
where P is the number of concordant pairs, Q the number of discordant
pairs, n the number of items and m = min(r,s) where r and s are the
number of rows and columns in the data.
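[Note that some definitions of Kendall tau-c floating around use 2m
instead of m in the numerator. Which one is right depends on how P and
Q are counted: if every pair is counted twice (once in each order), m
is correct and 2m can yield values outside of the [-1, +1] range; if
every pair is counted once, as in the tau-a definition above, 2m is the
factor that lets the statistic reach -1 and +1]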
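
A small sketch of the tau-c formula as written above, under one
explicit assumption: P and Q are counted over ordered pairs, so every
pair of items contributes twice. With that counting, m (not 2m) in the
numerator gives +1 for a perfectly concordant square table. The
function name kendall_tau_c is made up for this example, and r and s
are taken to be the number of distinct values in each variable:

```python
from itertools import permutations


def kendall_tau_c(x, y):
    """tau-c = m * (P - Q) / (n^2 * (m - 1)), with m = min(r, s).

    Illustration only. ASSUMPTION: P and Q count ordered pairs, i.e.
    each pair of items is visited in both orders and counted twice.
    """
    n = len(x)
    m = min(len(set(x)), len(set(y)))  # min(rows, columns) of the table
    if m < 2:
        return float("nan")  # undefined when a variable is constant
    p = q = 0
    for (x1, y1), (x2, y2) in permutations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            p += 1
        elif s < 0:
            q += 1
    return (m * (p - q)) / (n ** 2 * (m - 1))
```

If P and Q are instead counted once per pair, the same code would need
a factor of 2 in the numerator to cover the same range.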
---
Some sources state that Kendall tau-b is more appropriate for square
tables and Kendall tau-c for rectangular ones. However, this is an
argument I admittedly do not yet fully grasp.
> I'm a bit surprised that R doesn't do the tie handling.
I might be wrong, my knowledge of R is not (yet) very thorough, but this
is the conclusion I drew from the comments in this file:
https://svn.r-project.org/R/trunk/src/library/stats/src/kendall.c
With kind regards,
Almer S. Tigelaar