[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)

Wed Mar 18 12:30:51 EDT 2009

On 3/18/2009 4:55 PM, Sturla Molden wrote:

> The idea is that Kendall's tau works on an ordinal scale, not a rank scale 
> like Spearman's r. You can use as many categories for X and Y as you like, 
> but the categories have to be ordered. You thus get a table of counts. 
> If, for example, you use two categories (small, big) for X and four 
> categories (tiny, small, big, huge) for Y, the table is 2 x 4. If you go 
> all the way up to rank scale, you get a very sparse table with a lot of 0 
> counts. With few categories, ties will be quite common, and that is the 
> justification for tau-b instead of gamma.
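The tie argument can be made concrete with a small sketch. It assumes a hypothetical 2 x 4 table of counts for the small/big by tiny/small/big/huge example; the counts are invented for illustration. Both statistics share the numerator P - Q (concordant minus discordant pairs). Gamma divides by P + Q and so ignores tied pairs, while tau-b divides by sqrt((n0 - n1)(n0 - n2)), where n0 is the number of pairs and n1, n2 count pairs tied in X and in Y. The function names are hypothetical, and the pair counting here is deliberately brute force over cells, only to show the definitions.

```python
from math import sqrt

def concordance_counts(table):
    """Count concordant (P) and discordant (Q) pairs in an ordered R x C table."""
    R, C = len(table), len(table[0])
    P = Q = 0
    for i in range(R):
        for j in range(C):
            for k in range(i + 1, R):      # pair with a strictly later X category
                for l in range(C):
                    if l > j:              # Y also increases: concordant
                        P += table[i][j] * table[k][l]
                    elif l < j:            # Y decreases: discordant
                        Q += table[i][j] * table[k][l]
    return P, Q                            # pairs tied in X or Y are not counted

def gamma_and_tau_b(table):
    P, Q = concordance_counts(table)
    n = sum(map(sum, table))
    n0 = n * (n - 1) / 2                                      # all pairs
    n1 = sum(r * (r - 1) / 2 for r in map(sum, table))        # pairs tied in X
    n2 = sum(c * (c - 1) / 2 for c in map(sum, zip(*table)))  # pairs tied in Y
    gamma = (P - Q) / (P + Q)                      # tied pairs simply dropped
    tau_b = (P - Q) / sqrt((n0 - n1) * (n0 - n2))  # tied pairs shrink the estimate
    return gamma, tau_b

# Hypothetical counts: rows X = (small, big), columns Y = (tiny, small, big, huge)
table = [[5, 3, 1, 0],
         [0, 2, 4, 5]]
gamma, tau_b = gamma_and_tau_b(table)
print(f"gamma = {gamma:.3f}, tau-b = {tau_b:.3f}")  # prints: gamma = 0.955, tau-b = 0.698
```

With only two X categories most pairs are tied in X, so gamma comes out much closer to 1 than tau-b does.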

One very important aspect of this is that it can reduce the 
computational burden substantially. If you, e.g., know that 100 categories 
is sufficient resolution, you get a 100 x 100 contingency table, and tau-b 
can be computed directly from the table. So for large data sets, this 
avoids the O(N**2) pairwise comparisons of tau. The cost of tau-b becomes 
O(N) to build the table plus O(C*D) to evaluate it, with C and D the 
number of categories in X and Y.
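A minimal sketch of the two-step cost, assuming the data are already coded as integer category indices 0..C-1 and 0..D-1 in category order. The names `contingency_table` and `tau_b_from_table` are hypothetical and not part of scipy.stats. Building the table is one pass over the data (O(N)). The concordant and discordant pair counts are then obtained from cumulative sums over the table. This is one way to reach O(C*D) (my choice here, not necessarily how a scipy implementation would do it), instead of comparing every pair of cells.

```python
import numpy as np

def contingency_table(x, y, C, D):
    """Count observations per (X category, Y category) cell in one O(N) pass.

    Assumes x and y are integer codes in 0..C-1 and 0..D-1, in category order.
    """
    table = np.zeros((C, D), dtype=np.int64)
    np.add.at(table, (np.asarray(x), np.asarray(y)), 1)
    return table

def tau_b_from_table(table):
    """Kendall's tau-b from an ordered C x D contingency table in O(C*D)."""
    t = np.asarray(table, dtype=np.float64)

    # below[i, j]: observations with X category > i and Y category == j
    below = np.zeros_like(t)
    below[:-1] = t[::-1].cumsum(axis=0)[::-1][1:]

    # right[i, j]: X > i and Y > j;  left[i, j]: X > i and Y < j
    right = np.zeros_like(t)
    right[:, :-1] = below[:, ::-1].cumsum(axis=1)[:, ::-1][:, 1:]
    left = np.zeros_like(t)
    left[:, 1:] = below.cumsum(axis=1)[:, :-1]

    P = (t * right).sum()   # concordant pairs
    Q = (t * left).sum()    # discordant pairs

    n = t.sum()
    rows, cols = t.sum(axis=1), t.sum(axis=0)
    n0 = n * (n - 1) / 2                  # all pairs
    n1 = (rows * (rows - 1) / 2).sum()    # pairs tied in X
    n2 = (cols * (cols - 1) / 2).sum()    # pairs tied in Y
    denom = np.sqrt((n0 - n1) * (n0 - n2))
    return (P - Q) / denom if denom > 0 else float("nan")

# Usage with synthetic data: 5 ordered X categories, 7 ordered Y categories
rng = np.random.default_rng(0)
x = rng.integers(0, 5, size=10_000)
y = x + rng.integers(0, 3, size=10_000)   # y in 0..6, positively associated with x
print(tau_b_from_table(contingency_table(x, y, 5, 7)))
```

Only the first step depends on N. For the 100 x 100 case above, the second step touches 10,000 cells no matter how many observations went into the table.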

So having a contingency-table version of tau-b would be very useful.

Sturla Molden