[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)

Almer S. Tigelaar almer at gnome.org
Wed Mar 18 05:36:05 EDT 2009


Hi Josef,

On Tue, 2009-03-17 at 19:11 -0400, josef.pktd at gmail.com wrote:
> I saw it mentioned somewhere, that Kendall's tau is the correlation
> coefficient of pairwise ranking indicators.
> I think, this wouldn't hold if we don't exclude matching ties in the
> counts for the denominator as is done with the current implementation.

Okay, given this and the other posts (from Sturla), my initial
interpretation was probably incorrect. To close, we can agree that the
theoretical definition should be as follows:

Kendall's tau-b (tie handling):
-------------------------------
        t = (P - Q) / SQRT((P + Q + T) * (P + Q + U))
where P is the number of concordant pairs, Q the number of discordant
pairs, T the number of pairs tied only in R1 and U the number of pairs
tied only in R2. If a pair is tied in both R1 and R2, it is not added to
either T or U.
-------------------------------
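
As an illustration of the definition above, here is a minimal O(n^2)
sketch in Python. The function name kendall_tau_b and the arguments r1
and r2 (the two rankings) are hypothetical names chosen for this
example; this is not the scipy.stats implementation, only a direct
transcription of the formula, and returning NaN for a zero denominator
is an assumption of the sketch.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(r1, r2):
    """Naive Kendall's tau-b: (P - Q) / sqrt((P + Q + T) * (P + Q + U))."""
    if len(r1) != len(r2):
        raise ValueError("r1 and r2 must have the same length")
    p = q = t = u = 0
    for i, j in combinations(range(len(r1)), 2):
        d1 = r1[i] - r1[j]
        d2 = r2[i] - r2[j]
        if d1 == 0 and d2 == 0:
            continue      # tied in both rankings: counted in neither T nor U
        elif d1 == 0:
            t += 1        # tied only in r1
        elif d2 == 0:
            u += 1        # tied only in r2
        elif (d1 > 0) == (d2 > 0):
            p += 1        # concordant pair
        else:
            q += 1        # discordant pair
    denom = sqrt((p + q + t) * (p + q + u))
    if denom == 0:
        return float("nan")  # assumption: undefined when a ranking is constant
    return (p - q) / denom
```

For r1 = [1, 1, 2, 3] and r2 = [1, 2, 2, 3] this gives P = 4, Q = 0,
T = 1, U = 1, hence t = 4 / SQRT(5 * 5) = 0.8.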

I have not yet been able to find a source myself that unambiguously
gives precisely this definition. But given Sturla's interpretation,
his access to the books by Hollander & Wolfe and to Numerical Recipes,
and your confirmation using the correlation coefficient, I am
inclined to accept this definition.
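
The correlation-coefficient check can be sketched roughly as follows.
This assumes that "pairwise ranking indicators" means the sign of the
difference within each ordered pair (i, j); the helper names sign and
pairwise_sign_correlation are hypothetical. Because the indicators are
antisymmetric their means are zero, so the Pearson correlation reduces
to a ratio of sums, and pairs tied in both rankings contribute nothing
to the numerator or the denominator.

```python
from math import sqrt


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pairwise_sign_correlation(r1, r2):
    """Pearson correlation of the pairwise sign indicators of r1 and r2."""
    n = len(r1)
    a = [sign(r1[i] - r1[j]) for i in range(n) for j in range(n)]
    b = [sign(r2[i] - r2[j]) for i in range(n) for j in range(n)]
    # Both indicator vectors have mean zero, so no centering is needed.
    num = sum(x * y for x, y in zip(a, b))               # 2 * (P - Q)
    den = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else float("nan")
```

With r1 = [1, 1, 2, 3] and r2 = [1, 2, 2, 3] the numerator is 8 and the
denominator is SQRT(10 * 10) = 10, giving 0.8, which agrees with the
tau-b formula for the same data.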

Thanks all for your feedback and help!




