[SciPy-Dev] GSoC Draft Proposal: Rewrite and improve cluster package in Cython

Richard Tsai richard9404 at gmail.com
Mon Mar 17 04:31:38 EDT 2014


Hi,

I looked at several ML packages and found that ELKI implements an
optimized single-linkage algorithm called SLINK[1][2]. I also found a
similar algorithm called CLINK[3], which handles complete linkage. Both
run in O(n^2) time with O(n) working memory, so they appear to be much
faster and lighter on memory than the naive algorithms we currently use
in cluster.hierarchy.
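
To make the idea concrete, here is a rough pure-Python sketch of SLINK's
pointer representation as described in Sibson's paper[2]. The function name
slink and the dist callback are placeholders of mine, not an existing SciPy
or ELKI API, and a real implementation would be written in Cython and
converted to the linkage matrix format.

```python
import math


def slink(points, dist):
    """Return the pointer representation (pi, lam) of the single-linkage
    dendrogram: point j joins the cluster containing pi[j] at height lam[j].

    Hypothetical helper for illustration; O(n^2) time, O(n) extra memory.
    """
    n = len(points)
    pi = [0] * n
    lam = [math.inf] * n
    m = [0.0] * n  # scratch row: distances from the new point to earlier ones

    for i in range(n):
        # The newest point starts as its own (top-level) cluster.
        pi[i] = i
        lam[i] = math.inf

        for j in range(i):
            m[j] = dist(points[i], points[j])

        # Fold the new point into the existing hierarchy.
        for j in range(i):
            if lam[j] >= m[j]:
                m[pi[j]] = min(m[pi[j]], lam[j])
                lam[j] = m[j]
                pi[j] = i
            else:
                m[pi[j]] = min(m[pi[j]], m[j])

        # Repair pointers that now point below their own merge height.
        for j in range(i):
            if lam[j] >= lam[pi[j]]:
                pi[j] = i

    return pi, lam


# Usage: four points on a line with absolute difference as the distance.
pi, lam = slink([0, 1, 3, 7], lambda a, b: abs(a - b))
merge_heights = sorted(lam[:-1])  # the last point keeps lam = inf
```

Only three length-n arrays are kept, which is where the memory saving over
a full distance matrix comes from.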

I have also been reading IPython notebooks and StackOverflow posts
recently, and many people are asking how to plot a heatmap ordered by a
hierarchical clustering. I think integrating such a plot into
cluster.hierarchy would complement hierarchy.dendrogram well.
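
For reference, the recipe people usually end up with looks roughly like the
sketch below. It uses only existing functions (linkage, leaves_list and
dendrogram from scipy.cluster.hierarchy, plus matplotlib); the names
clustered_order and plot_clustered_heatmap are hypothetical and only
illustrate what an integrated helper might do.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, leaves_list, linkage


def clustered_order(data, method="average"):
    """Return the row indices of `data` in dendrogram leaf order."""
    Z = linkage(data, method=method)
    return leaves_list(Z)


def plot_clustered_heatmap(data, method="average"):
    """Draw a dendrogram next to a heatmap whose rows follow its leaves."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to plot

    data = np.asarray(data, dtype=float)
    Z = linkage(data, method=method)

    fig, (ax_tree, ax_heat) = plt.subplots(
        1, 2, figsize=(8, 4), gridspec_kw={"width_ratios": [1, 3]}
    )
    tree = dendrogram(Z, orientation="left", ax=ax_tree, no_labels=True)
    order = tree["leaves"]

    # origin="lower" puts the first reordered row at the bottom, matching
    # the bottom-to-top leaf order of a horizontally drawn dendrogram.
    ax_heat.imshow(data[order], aspect="auto", origin="lower",
                   interpolation="nearest")
    ax_heat.set_yticks([])
    return fig
```

Calling plot_clustered_heatmap(X) on an (n_samples, n_features) array X
returns a matplotlib figure; the reordering itself is just
X[clustered_order(X)].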

Besides, I noticed that the cluster package is currently single-threaded.
I don't know whether parallelizing at the SciPy level, rather than at the
BLAS level, is appropriate, but at the least we could rely on the BLAS
library (if it supports multithreading) to parallelize the kmeans
algorithm.
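
One way this could work, sketched in NumPy under my own assumptions: the
assignment step of kmeans can be written so that almost all of its work is
a single matrix product, using ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2.
The function name assign_labels is hypothetical, and whether the product
actually runs on several threads depends on the BLAS that NumPy is linked
against.

```python
import numpy as np


def assign_labels(obs, centroids):
    """Assign each observation to its nearest centroid.

    The dominant cost is obs.dot(centroids.T), a matrix product that NumPy
    hands to BLAS; a multithreaded BLAS can parallelize it.
    """
    obs = np.asarray(obs, dtype=float)
    centroids = np.asarray(centroids, dtype=float)

    obs_sq = np.einsum("ij,ij->i", obs, obs)[:, np.newaxis]
    cen_sq = np.einsum("ij,ij->i", centroids, centroids)[np.newaxis, :]

    # Squared Euclidean distances, shape (n_obs, n_centroids).
    sq_dist = obs_sq - 2.0 * obs.dot(centroids.T) + cen_sq
    return np.argmin(sq_dist, axis=1)


# Usage: two well-separated groups and two centroids.
obs = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.9, 5.2]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
labels = assign_labels(obs, centroids)
```

The expanded form can lose a little precision to cancellation when points
sit very close to a centroid, so this is a trade-off rather than a free win.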

[1]: http://elki.dbs.ifi.lmu.de/releases/release0.6.0/doc/de/lmu/ifi/dbs/elki/algorithm/clustering/hierarchical/SLINK.html
[2]: http://www.cs.ucsb.edu/~veronika/MAE/SLINK_sibson.pdf
[3]: http://comjnl.oxfordjournals.org/content/20/4/364.abstract

Regards,
Richard