[SciPy-User] K-means clustering algorithm

Wed Feb 9 06:44:26 EST 2011

Tobjan,

  roughly how many data points do you have, in how many dimensions, and what k?
One size cannot fit all.

Plain scipy.cluster http://docs.scipy.org/doc/scipy/reference/cluster.html
has hierarchical clustering, good for large k,
but its kmeans calls cholesky on a matrix that may be singular;
try cluster.vq.kmeans2( data, k, minit="points" ),
which starts from k of the data points instead.
(Be aware that k-means can be noisy: results vary with the starting
points, and measuring "quality" is tough.)
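
A rough sketch of that call, not a recommendation for your data: the
synthetic blobs, the helper name sum_sq_dist, and the choice of 5
restarts are all made up here for illustration. It reruns kmeans2 a few
times and keeps the run with the smallest total squared distance, which
is only one crude way to compare runs.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Synthetic stand-in for the real data (an assumption for this sketch):
# 300 points in 2 dimensions, drawn around 3 well-separated centers.
rng = np.random.RandomState(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
data = np.vstack([c + rng.randn(100, 2) for c in blob_centers])
k = 3


def sum_sq_dist(data, centroids, labels):
    """Total squared distance from each point to its assigned centroid.

    Hypothetical helper, not part of scipy.
    """
    return float(((data - centroids[labels]) ** 2).sum())


# k-means depends on its starting points, so run it a few times
# and keep the run with the smallest total squared distance.
best = None
for _ in range(5):
    # minit="points" picks k of the data points as initial centroids.
    centroids, labels = kmeans2(data, k, minit="points")
    score = sum_sq_dist(data, centroids, labels)
    if best is None or score < best[0]:
        best = (score, centroids, labels)

score, centroids, labels = best
print(centroids.shape, labels.shape, score)
```

kmeans2 returns the centroids (k rows) and one integer label per data
point; it may warn if a cluster ends up empty.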

As Gael says, scikits.learn has a number of clustering methods.
pycluster is, as far as I know, designed for low-dimensional gene data.
See also http://stackoverflow.com/questions/tagged/k-means .

cheers
  -- denis

On Feb 7, 6:17 pm, Tobjan Brejicz <toba... at gmail.com> wrote:
> Hello Scipy List:
>
> I would like to know about good implementations of clustering-type algorithms
> in scipy, or maybe in a related package.   Specifically, I want to do k-means
> clustering.