[SciPy-Dev] Suggestion: Fast hierarchical clustering
Daniel Müllner
fastcluster at math.stanford.edu
Wed May 9 14:51:30 EDT 2012
Dear SciPy developers,
I am the author of a package for fast hierarchical clustering:
http://cran.r-project.org/web/packages/fastcluster/
(The C++ library has two interfaces, one for R and one for Python, hence
the source code is published on CRAN.)
The package improves the time complexity of the current algorithms in
scipy.cluster.hierarchy from O(N^3) to O(N^2). The syntax of the Python
interface agrees with the SciPy methods, so that users can quickly
switch to the faster algorithms. But really, the best home for this
code would be SciPy itself, so that everyone could use the faster
algorithms by default.
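A minimal sketch of what such a switch might look like, assuming
fastcluster exposes a linkage function with the same calling convention
as scipy.cluster.hierarchy.linkage, as described above. The random data
and the fallback import are purely illustrative and not part of either
package:

```python
import numpy as np
from scipy.spatial.distance import pdist

try:
    # Assumed drop-in replacement: same name and arguments as SciPy's linkage
    from fastcluster import linkage
except ImportError:
    # Fall back to SciPy so the sketch still runs without fastcluster
    from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.random((50, 3))            # 50 hypothetical observations in 3 dimensions
D = pdist(X)                       # condensed pairwise distance matrix
Z = linkage(D, method="average")   # (N-1) x 4 linkage matrix in SciPy's format

print(Z.shape)
```

Only the import line changes; the rest of the calling code, including
anything that consumes the linkage matrix Z, stays the same.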
Here are my questions:
(1) Is there sufficient interest in replacing
scipy.cluster.hierarchy.linkage with faster code to make the effort
worthwhile for me?
(2) To whom can I submit the suggested changes? Making a SciKit as
described on http://www.scipy.org/Developer_Zone does not seem to be the
right approach here, since I am not offering a new, independent package
but a replacement for certain code within the existing module
scipy.cluster.hierarchy.
(3) Who decides whether the suggested changes are accepted or not?
Best,
Daniel
Here are a few facts that you might want to know:
* The core library is in C++ since I use templates a lot. However, I
already took care that the R interface compiles on a variety of systems,
see:
http://cran.r-project.org/web/checks/check_results_fastcluster.html
Therefore, I don't expect compilation issues for the Python interface.
* The license is currently GPL. I am willing to publish the code under a
different license if this is required for SciPy.
* The latest version (not published yet) compiles and works under both
Python 2 and Python 3, so there are no issues here either.