[SciPy-dev] Fast (O(n log(n)) ) implementation of Kendall Tau

Enzo Michelangeli enzomich at gmail.com
Thu Oct 1 00:07:42 EDT 2009


From: "Sturla Molden" <sturla at molden.no>
Sent: Thursday, October 01, 2009 6:09 AM

> Enzo Michelangeli skrev:
>> Dear all,
>>
>> A few days ago I posted to http://projects.scipy.org/scipy/ticket/999 a
>> drop-in replacement for scipy.stats.kendalltau.
> Why do you re-implement mergesort in pure Python?
>
> ndarrays have a sort method that can use mergesort.
>
> Python lists has the same (timsort is mergesort on steroids).

Because Knight's algorithm needs to count the number of swaps (or, to be
more precise, the number of swaps that would be performed by an equivalent
bubblesort). In the code, that's the purpose of the variable exchcnt .

An alternative algorithm for the Kendall Tau developed by David Christensen,
and described at http://www.springerlink.com/content/p33qu44058984082/ (with
a Delphi implementation), replaces the mergesort step with one based on
balanced binary trees (AVL in his case, but I guess RBT would also work).
Unfortunately, neither the standard Python library nor NumPy/SciPy appear to
implement such useful data structures, and what is available either doesn't
allow O(log(n)) random access (heapq) or lacks a O(log(n)) insert (sorted
lists accessed through bisect).

Enzo




More information about the SciPy-Dev mailing list