[SciPy-Dev] Suggestion: Fast hierarchical clustering

Daniel Müllner fastcluster at math.stanford.edu
Wed May 9 14:51:30 EDT 2012


Dear SciPy developers,

I am the author of a package for fast hierarchical clustering:

http://cran.r-project.org/web/packages/fastcluster/

(The C++ library has two interfaces, one for R and one for Python, 
which is why the source code is published on CRAN.)

The package improves the time complexity of the current algorithms in 
scipy.cluster.hierarchy from O(N^3) to O(N^2), where N is the number of 
observations. The syntax of the Python interface matches that of the 
SciPy functions, so users can switch to the faster algorithms quickly. 
Ideally, though, the code would be incorporated into SciPy itself so 
that everyone gets the faster algorithms by default.
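
To illustrate why quadratic time is achievable at all, here is a minimal 
sketch for the single-linkage case only. It is an illustration under 
stated assumptions, not code taken from the package: the function name 
single_linkage_merge_heights is hypothetical, the metric is assumed to 
be Euclidean, and the sketch returns only the merge heights rather than 
a full linkage matrix. It relies on the known fact that the 
single-linkage merge heights are the edge weights of a minimum spanning 
tree of the complete distance graph, which Prim's algorithm finds in 
O(N^2) time.

```python
import math


def single_linkage_merge_heights(points):
    """Return the N-1 single-linkage merge heights in ascending order.

    Hypothetical helper for illustration. Runs Prim's minimum spanning
    tree algorithm on the complete Euclidean distance graph, so it needs
    O(N^2) time and O(N) extra memory.
    """
    n = len(points)
    if n < 2:
        return []

    in_tree = [False] * n
    # best[j]: distance from point j to its nearest point already in the tree
    best = [math.inf] * n
    heights = []

    current = 0
    in_tree[current] = True
    for _ in range(n - 1):
        nxt, nxt_dist = -1, math.inf
        for j in range(n):
            if in_tree[j]:
                continue
            # Only the most recently added point can improve best[j].
            d = math.dist(points[current], points[j])
            if d < best[j]:
                best[j] = d
            if best[j] < nxt_dist:
                nxt, nxt_dist = j, best[j]
        heights.append(nxt_dist)
        in_tree[nxt] = True
        current = nxt

    # Sorted MST edge weights are the dendrogram's merge heights.
    heights.sort()
    return heights


print(single_linkage_merge_heights([(0, 0), (1, 0), (5, 0), (5, 2)]))
# [1.0, 2.0, 4.0]
```

The outer loop runs N-1 times and the inner loop visits at most N 
points, which gives the quadratic bound. Other linkage methods need 
different techniques to reach it.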

Here are my questions:

(1) Is there sufficient interest in replacing 
scipy.cluster.hierarchy.linkage with faster code to make the effort 
worthwhile for me?

(2) To whom can I submit the suggested changes?  Creating a SciKit, as 
described at http://www.scipy.org/Developer_Zone, does not seem to be 
the right approach here, since I am not offering a new, independent 
package but a replacement for certain code within the existing module 
scipy.cluster.hierarchy.

(3) Who decides whether the suggested changes are accepted or not?

Best,

Daniel


Here are a few facts that you might want to know:

* The core library is written in C++ since I use templates a lot. 
However, I have already made sure that the R interface compiles on a 
variety of systems; see:

http://cran.r-project.org/web/checks/check_results_fastcluster.html

Therefore, I don't expect compilation issues for the Python interface.

* The license is currently GPL. I am willing to publish the code under a 
different license if this is required for SciPy.

* The latest version (not published yet) compiles and works under both 
Python 2 and Python 3, so there are no compatibility issues here either.


