[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)
Sturla Molden
sturla at molden.no
Wed Mar 18 12:30:51 EDT 2009
On 3/18/2009 4:55 PM, Sturla Molden wrote:
> The idea is that Kendall's tau works on ordinal scale, not rank scale as
> Spearman's r. You can use as many categories for X and Y as you like,
> but the categories have to be ordered. You thus get a table of counts.
> If you for example use two categories (small or big) in X and four
> categories (tiny, small, big, huge) in Y, the table is 2 x 4. If you go
> all the way up to rank scale, you get a very sparse table with a lot of 0
> counts. With few categories, ties will be quite common, and that is the
> justification for tau-b instead of gamma.
One very important aspect of this is that it can reduce the
computational burden substantially. If you e.g. know that 100 categories
give sufficient resolution, you get a 100 x 100 contingency table. tau-b
can be computed directly from the table. So for large data sets, this
avoids the O(N**2) cost of comparing all pairs of observations. The cost
becomes O(N) to build the table plus O(C*D) to compute tau-b from it,
with C and D the number of categories in X and Y.
So having a contingency-table version of tau-b would be very useful.
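
A minimal sketch of what such a contingency-table version might look like,
in plain Python. The names contingency_table and tau_b_from_table are
hypothetical (they are not existing scipy.stats functions), and x and y are
assumed to be already coded as ordered integer categories 0..C-1 and 0..D-1.
It uses the usual tau-b definition, (P - Q) / sqrt((n0 - n1) * (n0 - n2)),
and running column sums so that the table is only visited once.

```python
from math import sqrt


def contingency_table(x, y, n_rows, n_cols):
    """Count the pairs (x[i], y[i]) into an n_rows x n_cols table in O(N).

    Assumption: x and y hold integer category codes in range(n_rows) and
    range(n_cols), with larger codes meaning "higher" categories.
    """
    table = [[0] * n_cols for _ in range(n_rows)]
    for xi, yi in zip(x, y):
        table[xi][yi] += 1
    return table


def tau_b_from_table(table):
    """Kendall's tau-b from a table of counts in O(C*D)."""
    n_rows, n_cols = len(table), len(table[0])

    # below[l] = sum of the counts in column l over all rows k > i,
    # where i is the row currently being processed (bottom to top).
    below = [0] * n_cols
    concordant = 0
    discordant = 0
    for i in range(n_rows - 1, -1, -1):
        total_below = sum(below)
        left = 0  # counts in rows k > i and columns l < j
        for j in range(n_cols):
            right = total_below - left - below[j]  # rows k > i, columns l > j
            concordant += table[i][j] * right
            discordant += table[i][j] * left
            left += below[j]
        for j in range(n_cols):
            below[j] += table[i][j]

    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    n = sum(row_totals)

    n0 = n * (n - 1) // 2                             # all pairs
    n1 = sum(t * (t - 1) // 2 for t in row_totals)    # pairs tied in X
    n2 = sum(t * (t - 1) // 2 for t in col_totals)    # pairs tied in Y

    denom = sqrt((n0 - n1) * (n0 - n2))
    if denom == 0:
        return float("nan")  # undefined if all data fall in one row or column
    return (concordant - discordant) / denom
```

For example, tau_b_from_table([[2, 0], [0, 2]]) gives 1.0 and
tau_b_from_table([[0, 2], [2, 0]]) gives -1.0. With NumPy the two inner
loops could be replaced by cumulative sums over the table, but the plain
loops keep the O(C*D) bookkeeping visible.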
Sturla Molden