[SciPy-Dev] Faster implementation of cluster.hierarchy

Conrad Lee conradlee at gmail.com
Wed Oct 12 07:12:18 EDT 2011


A mathematician at Stanford named Daniel Müllner recently came up with a
package that implements the hierarchical clustering methods found in
scipy.cluster.hierarchy.  His implementation is in C++, but includes a
Python API that uses the same interface as scipy.cluster.hierarchy.
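
To illustrate what "same interface" would mean in practice, here is a minimal sketch, not taken from Müllner's documentation. It assumes the package is importable as `fastcluster` (a name inferred from the webpage URL) and that it exposes a `linkage` function accepting the same arguments as `scipy.cluster.hierarchy.linkage`. If the import fails, the sketch falls back to scipy, so it runs either way. The data and parameter choices are arbitrary.

```python
import numpy as np
from scipy.cluster import hierarchy as scipy_hier

# Assumption: the package installs as "fastcluster" and mirrors scipy's
# linkage() signature. If it is not installed, use scipy itself.
try:
    import fastcluster as hier_impl
except ImportError:
    hier_impl = scipy_hier

# Arbitrary toy data: 20 observations in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Same call regardless of which implementation was imported.
Z = hier_impl.linkage(X, method="average", metric="euclidean")

# The linkage matrix is in scipy's format, so scipy's own tools
# (fcluster, dendrogram, ...) can consume it directly.
labels = scipy_hier.fcluster(Z, t=3, criterion="maxclust")
print(Z.shape, labels.shape)
```

If the interfaces really do match, swapping implementations is a one-line change to the import, which is what makes upstream integration attractive.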

Müllner has posted benchmarks as well as algorithmic explanations of why his
implementation is faster in a paper on arXiv <http://arxiv.org/abs/1109.2378>.
He also has a webpage that describes the package:
<http://math.stanford.edu/~muellner/fastcluster.html>

Because the results of the benchmarks look good, I am interested in getting
the scikit-learn package to use this implementation for the hierarchical
clustering provided by that package.  Rather than integrate the code in
scikit-learn, it seems more appropriate to integrate it upstream in
scipy.cluster.hierarchy.  Is there anyone who is interested in this
integration?  I am inexperienced with integrating C++ code and Python code,
and also with how things work in the scipy project, so I'm not sure how to
proceed.

Note: Although Müllner's code is currently under a GPL license, he has
stated to me in e-mail that he would be willing to put it under the BSD-2
license if somebody put in the time to integrate it into scipy.

Best regards,

Conrad Lee