[SciPy-Dev] GSoC Draft Proposal: Rewrite and improve cluster package in Cython

Ralf Gommers ralf.gommers at gmail.com
Mon Mar 17 17:29:56 EDT 2014


On Mon, Mar 17, 2014 at 9:31 AM, Richard Tsai <richard9404 at gmail.com> wrote:

> Hi,
>
> I looked at several ML packages and I found that ELKI has implemented a
> optimized single linkage algorithm called SLINK[1][2]. And I also found a
> similar algorithm called CLINK[3], which is for complete linkage. It seems
> that these two algorithms are much faster and use less memory than the
> naive algorithms we are using in cluster.hierarchy currently.
>

Looks like ELKI as a mix of licenses including BSD, but the default is
AGPL. Did you check for this algorithm? Not entirely clear from the above
if you planned to integrate or reimplement it, for the former it matters.


> I also read some IPython notebooks and StackOverflow posts recently and I
> found that many people are discussing how to plot a heatmap of hierarchical
> clustering. I think if we integrate it into cluster.hierarchy, it will be a
> good complement to hierarchy.dendrogram.
>

Assuming that it doesn't take too much time to implement (plotting
shouldn't be a focus), that sounds fine. There are some more functions in
several packages that optionally use MPL.


> Besides, I noticed that the cluster package is single-threaded currently.
> I don't know if parallelization in scipy level rather than BLAS level is
> proper,
>

That has been out of scope for scipy until now.


> but at least we can just make use of the BLAS library (if it supports) to
> parallelize the kmeans algorithm.
>

Are you talking about the for-loop over observations in vq()? I don't see
any linalg going on in kmeans.

Ralf



>
> [1]:
> http://elki.dbs.ifi.lmu.de/releases/release0.6.0/doc/de/lmu/ifi/dbs/elki/algorithm/clustering/hierarchical/SLINK.html
> [2]: http://www.cs.ucsb.edu/~veronika/MAE/SLINK_sibson.pdf
> [3]: http://comjnl.oxfordjournals.org/content/20/4/364.abstract
>
> Regards,
> Richard
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20140317/07014411/attachment.html>


More information about the SciPy-Dev mailing list