[SciPy-User] kmeans

Keith Goodman kwgoodman at gmail.com
Fri Jul 23 19:48:52 EDT 2010


On Fri, Jul 23, 2010 at 4:00 PM, Benjamin Root <ben.root at ou.edu> wrote:

> The stopping condition uses the change in the distortion, not a non-squared
> distance.  The distortion is already a sum of squares.  The only place that
> a non-squared distance is used is in _py_vq_1d() which appears to be very
> old code and it has a raise error at the very first statement.

That's good news.
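
For anyone following along, here is roughly what a stopping rule based on the change in distortion looks like. This is only a sketch in plain Python for the 1-D case, not scipy's code: kmeans_1d, thresh, max_iter and the two distortion helpers are names I made up for illustration, and the distortion is passed in as a function so that either definition can be tried.

```python
def mean_dist(dists):
    # Hypothetical helper: mean of the (non-squared) distances.
    return sum(dists) / len(dists)


def mean_sq_dist(dists):
    # Hypothetical helper: mean of the squared distances.
    return sum(d * d for d in dists) / len(dists)


def kmeans_1d(obs, centroids, distortion, thresh=1e-5, max_iter=100):
    """1-D Lloyd iterations; stop when the distortion changes by <= thresh."""
    prev = float("inf")
    cur = float("inf")
    for _ in range(max_iter):
        # Assign each observation to its nearest centroid.
        groups = [[] for _ in centroids]
        dists = []
        for x in obs:
            j = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            groups[j].append(x)
            dists.append(abs(x - centroids[j]))
        cur = distortion(dists)
        # Stopping condition: the change in distortion, not the distortion itself.
        if abs(prev - cur) <= thresh:
            break
        prev = cur
        # Move each centroid to the mean of its group (unchanged if the group is empty).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, cur


obs = [1.0, 2.0, 3.0, 4.0, 10.0]
print(kmeans_1d(obs, [1.0], mean_dist))
print(kmeans_1d(obs, [1.0], mean_sq_dist))
```

With a single centroid both calls settle on 4.0; only the reported distortion differs depending on which helper is plugged in.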

Another place where a non-squared distance is used is the distortion that
kmeans returns:

>> import numpy as np
>> from scipy import cluster
>> v = np.array([1,2,3,4,10],dtype=float)
>> cluster.vq.kmeans(v, 1)
   (array([ 4.]), 2.3999999999999999)

>> np.sqrt(np.dot(v-4, v-4) / 5.0)  # RMS distance to the centroid
   3.1622776601683795  # Nope, not returned
>> np.absolute(v - 4).mean()  # mean absolute distance to the centroid
   2.3999999999999999  # Yep, this one is returned

Is that a code bug or a doc bug?
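
To spell out the candidates, here is a small plain-Python sketch (no scipy needed) that computes three possible "distortion" values for the same data and centroid. The variable names are mine, and which of these the docs intend is exactly the open question.

```python
import math

v = [1.0, 2.0, 3.0, 4.0, 10.0]
centroid = sum(v) / len(v)  # 4.0, the single centroid kmeans found above

diffs = [x - centroid for x in v]

# Mean absolute distance: matches the value kmeans returned above.
mean_abs = sum(abs(d) for d in diffs) / len(v)

# Root-mean-square distance: the value that was not returned.
rms = math.sqrt(sum(d * d for d in diffs) / len(v))

# Plain sum of squared distances, as the quoted message describes.
sum_sq = sum(d * d for d in diffs)

print(mean_abs, rms, sum_sq)
```

Only the first of the three agrees with the number kmeans gave back for this example.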


