[SciPy-User] kmeans

Fri Jul 23 16:18:58 EDT 2010

On Jul 23, 2010, at 12:40 PM, Benjamin Root wrote:
> 
> Just to be clear, the C Clustering library's implementation of kmeans is entirely
> different from SciPy's implementation.  While I am certainly no expert in determining
> which approach is better than another, I can say that I have used it before and it has
> worked very nicely for me and my uses.

I am not sure the implementations are so different (possible bugs notwithstanding ;). The implementation in the C clustering library does the following:

1. Start with an initial guess of the cluster assignments
2. Compute means for each cluster
3. Assign each data point to the nearest cluster mean.
4. If the cost function did not decrease, or the maximum number of iterations has been reached, exit.
5. Go to 2.

This algorithm finds a local minimum, and can be repeated a number of times with different initial clusterings to select the best of a number of locally optimal solutions.
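
For illustration, here is a minimal pure-Python sketch of the five steps above plus the restart loop. It is not the code from the C clustering library or from scipy; the function names (sq_dist, kmeans_once, kmeans_restarts), the squared Euclidean cost, and the defaults are my own assumptions. A cluster that loses all its points is simply skipped in this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(data, k, rng, max_iter=100):
    """One run from a random initial clustering; returns (cost, assignment)."""
    # Step 1: initial guess of the cluster assignments (every label used once
    # before shuffling, so no cluster starts out empty when len(data) >= k).
    assign = [i % k for i in range(len(data))]
    rng.shuffle(assign)
    best_cost, best_assign = float("inf"), assign
    for _ in range(max_iter):
        # Step 2: compute the mean of each non-empty cluster.
        means = {}
        for c in set(assign):
            members = [p for p, a in zip(data, assign) if a == c]
            means[c] = [sum(col) / len(members) for col in zip(*members)]
        # Step 3: assign each data point to the nearest cluster mean.
        assign = [min(means, key=lambda c: sq_dist(p, means[c])) for p in data]
        # Step 4: exit if the cost function did not decrease.
        cost = sum(sq_dist(p, means[a]) for p, a in zip(data, assign))
        if cost >= best_cost:
            break
        best_cost, best_assign = cost, assign
        # Step 5: go to step 2 (next loop iteration).
    return best_cost, best_assign


def kmeans_restarts(data, k, n_restarts=10, seed=0):
    """Repeat kmeans_once and keep the locally optimal solution of lowest cost."""
    rng = random.Random(seed)
    runs = (kmeans_once(data, k, rng) for _ in range(n_restarts))
    return min(runs, key=lambda run: run[0])
```

Calling kmeans_restarts(points, 2) on a list of coordinate tuples returns the lowest cost found and the matching list of cluster labels.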

I am less familiar with the k-means implementation in scipy, but at first glance it seems pretty similar. However, the implementation in the C clustering library is more robust in that it detects cycles in the iteration process, and it makes sure that each cluster contains at least one data point.
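
To make the two robustness checks concrete, here is a small hypothetical sketch. It is not how the C clustering library actually does it; it only shows the idea. The names has_empty_cluster and iterate_with_cycle_check are made up for this example, and `step` stands for any function that maps one assignment to the next (steps 2 and 3 above).

```python
def has_empty_cluster(assign, k):
    """True if any label in range(k) has no data point assigned to it."""
    return bool(set(range(k)) - set(assign))


def iterate_with_cycle_check(assign, step, max_iter=100):
    """Apply `step` until an assignment repeats or max_iter is reached.

    Returns (assignment, iterations used). A fixed point counts as a
    cycle of length one, so ordinary convergence is detected as well.
    """
    seen = {tuple(assign)}  # every assignment visited so far
    for n_iter in range(1, max_iter + 1):
        assign = step(assign)
        key = tuple(assign)
        if key in seen:
            # The iteration has returned to an earlier state: stop here.
            return assign, n_iter
        seen.add(key)
    return assign, max_iter
```

Remembering every visited assignment costs memory proportional to the number of iterations; that is fine for a sketch but may not be what a production implementation would choose.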

  Lutz