[SciPy-Dev] GSoC Draft Proposal: Rewrite and improve cluster package in Cython
Richard Tsai
richard9404 at gmail.com
Mon Mar 17 04:31:38 EDT 2014
Hi,
I looked at several ML packages and found that ELKI implements an
optimized single-linkage algorithm called SLINK [1][2]. I also found a
similar algorithm, CLINK [3], for complete linkage. Both run in O(n^2)
time and O(n) memory, so it seems they would be much faster and use less
memory than the naive algorithms we currently use in cluster.hierarchy.
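
To make the idea concrete, here is a minimal pure-Python sketch of SLINK as
described in Sibson's paper [2]. It is not ELKI's code and not a proposed
SciPy API; the function name `slink` and its `dist` parameter are made up
for illustration. It builds the "pointer representation" (pi, lam): lam[j]
is the height at which point j is merged into a cluster containing a
higher-indexed point, and pi[j] is the highest-indexed point of that cluster.
Only three length-n arrays are kept, never the full distance matrix.

```python
import math


def slink(points, dist=math.dist):
    """Return the pointer representation (pi, lam) of single linkage.

    Hypothetical helper for illustration; `points` is a sequence of
    coordinate tuples and `dist` any symmetric distance function.
    """
    n = len(points)
    pi = [0] * n            # pi[j]: last point of the cluster j joins
    lam = [math.inf] * n    # lam[j]: height at which that join happens
    m = [0.0] * n           # scratch: distances to the point being added

    for i in range(n):
        # Start with point i as its own, never-merged cluster.
        pi[i] = i
        lam[i] = math.inf

        # Distances from every earlier point to the new point.
        for j in range(i):
            m[j] = dist(points[j], points[i])

        # Update the merge heights of earlier points.
        for j in range(i):
            if lam[j] >= m[j]:
                # j now merges with i earlier than with its old target.
                m[pi[j]] = min(m[pi[j]], lam[j])
                lam[j] = m[j]
                pi[j] = i
            else:
                m[pi[j]] = min(m[pi[j]], m[j])

        # Relabel pointers whose target was absorbed no later than j was.
        for j in range(i):
            if lam[j] >= lam[pi[j]]:
                pi[j] = i

    return pi, lam


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (3.0,), (7.0,)]
    print(slink(pts))
```

Converting (pi, lam) into the linkage matrix that cluster.hierarchy returns
would be a separate step and is left out here.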
I also read some IPython notebooks and StackOverflow posts recently and
found that many people are discussing how to plot a heatmap whose rows are
ordered by a hierarchical clustering. I think integrating this into
cluster.hierarchy would be a good complement to hierarchy.dendrogram.
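
For reference, this is roughly what people do by hand today, using only
functions that already exist in scipy.cluster.hierarchy plus matplotlib. The
helper names `clustered_row_order` and `plot_clustered_heatmap` are
hypothetical, not an existing or proposed API, and the average-linkage
default is just an assumption for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, leaves_list, linkage


def clustered_row_order(data, method="average"):
    """Cluster the rows of `data` and return (Z, leaf order)."""
    Z = linkage(data, method=method)
    return Z, leaves_list(Z)


def plot_clustered_heatmap(data, method="average"):
    """Draw a row dendrogram next to a heatmap with matching row order."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to plot

    Z = linkage(data, method=method)
    fig, (ax_tree, ax_heat) = plt.subplots(
        1, 2, figsize=(8, 4), gridspec_kw={"width_ratios": [1, 3]}
    )
    # Use the leaf order reported by dendrogram itself so the heatmap
    # rows follow the drawn tree.
    info = dendrogram(Z, orientation="left", ax=ax_tree, no_labels=True)
    order = info["leaves"]
    # origin="lower" puts the first leaf at the bottom, which is assumed
    # here to match how a left-oriented dendrogram lays out its leaves.
    ax_heat.imshow(np.asarray(data)[order], aspect="auto", origin="lower")
    ax_heat.set_yticks(range(len(order)))
    ax_heat.set_yticklabels(order)
    return fig
```

Calling `plot_clustered_heatmap(np.random.rand(20, 5))` followed by
`matplotlib.pyplot.show()` should display the combined figure.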
Besides, I noticed that the cluster package is currently single-threaded. I
don't know whether parallelization at the SciPy level, rather than at the
BLAS level, is appropriate, but at least we could make use of the BLAS
library (if it supports multithreading) to parallelize the kmeans algorithm.
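
One possible way to lean on BLAS, sketched here with NumPy only: the
assignment step of kmeans can be written around a single matrix product,
using ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2. The product of the
observations with the transposed centroids is a matrix-matrix multiply that
NumPy passes to BLAS for floating-point arrays, so it runs on several threads
if the linked BLAS is a multithreaded build. The function name
`assign_labels` is hypothetical and this is not how cluster.vq is written.

```python
import numpy as np


def assign_labels(obs, centroids):
    """Return the index of the nearest centroid for each observation."""
    obs = np.asarray(obs, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Matrix-matrix product: the part a threaded BLAS can parallelize.
    cross = obs @ centroids.T
    # Squared norm of each centroid.
    c_norms = np.einsum("ij,ij->i", centroids, centroids)
    # ||x||^2 is the same for every centroid, so it can be dropped
    # without changing which centroid is nearest.
    scores = c_norms[np.newaxis, :] - 2.0 * cross
    return np.argmin(scores, axis=1)


if __name__ == "__main__":
    obs = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.9, 5.2]])
    cents = np.array([[0.0, 0.0], [5.0, 5.0]])
    print(assign_labels(obs, cents))
```

Whether this actually speeds anything up depends on the BLAS build and the
problem size, so it would need benchmarking.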
[1]: http://elki.dbs.ifi.lmu.de/releases/release0.6.0/doc/de/lmu/ifi/dbs/elki/algorithm/clustering/hierarchical/SLINK.html
[2]: http://www.cs.ucsb.edu/~veronika/MAE/SLINK_sibson.pdf
[3]: http://comjnl.oxfordjournals.org/content/20/4/364.abstract
Regards,
Richard