[SciPy-user] Mysterious kmeans() error

Fri Feb 6 19:35:08 EST 2009

On Fri, Feb 6, 2009 at 7:14 PM, David Cournapeau <cournape at gmail.com> wrote:
> On Sat, Feb 7, 2009 at 1:25 AM,  <josef.pktd at gmail.com> wrote:
>> On Fri, Feb 6, 2009 at 11:05 AM, David Cournapeau <cournape at gmail.com> wrote:
>>> On Fri, Feb 6, 2009 at 11:37 PM, Roy H. Han
>>> <starsareblueandfaraway at gmail.com> wrote:
>>>> Well, I feel like there are numerical problems with scipy's kmeans2(),
>>>> at least in the 0.6.0 version of scipy.
>>>
>>> kmeans and kmeans2 are fairly low level; they will indeed fail if you
>>> have an empty cluster.
>>
>> I thought that the test test_kmeans_lost_cluster(self) verifies that
>> empty clusters are handled.
>
> Actually, it tests that a warning/exception is raised instead of failing
> silently, so you can, for example, repeat the kmeans procedure with
> different initialization values (that's how I use kmeans in the em
> toolbox).
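
The "detect the empty cluster, then retry with new initial values" strategy can be sketched as below. This is a toy 1-D Lloyd's k-means in plain Python, not SciPy's implementation; `EmptyClusterError`, `lloyd_1d` and `kmeans_with_restarts` are hypothetical names, and drawing initial centroids uniformly over the data range is an assumption made for the example. With SciPy itself, the analogous idea would be calling `scipy.cluster.vq.kmeans2` with `missing='raise'` and catching `scipy.cluster.vq.ClusterError`, which exist in newer SciPy releases (not necessarily in 0.6.0).

```python
import random


class EmptyClusterError(Exception):
    """Hypothetical error: a centroid ended up with no assigned points."""


def lloyd_1d(data, init, n_iter=20):
    """Toy Lloyd's k-means on 1-D data that raises instead of failing silently."""
    centroids = list(init)
    k = len(centroids)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # (ties go to the lowest index).
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # A centroid with no points has no mean to move to, so report it.
        if any(not c for c in clusters):
            raise EmptyClusterError("a cluster lost all of its points")
        # Update step: move each centroid to the mean of its points.
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids


def kmeans_with_restarts(data, k, max_tries=20, seed=0):
    """Retry with fresh random initial centroids whenever a cluster empties."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    for _ in range(max_tries):
        # Assumed init scheme: k values drawn uniformly over the data range.
        init = [rng.uniform(lo, hi) for _ in range(k)]
        try:
            return lloyd_1d(data, init)
        except EmptyClusterError:
            continue  # the rng state has advanced, so the next init differs
    raise EmptyClusterError(f"all {max_tries} initializations lost a cluster")
```

For example, `lloyd_1d([0, 0, 10, 10], [1.0, 5.1, 9.0])` raises because the middle centroid attracts no points, while `sorted(kmeans_with_restarts([0, 0, 10, 10], 2))` gives `[0.0, 10.0]`. Raising rather than returning a partial answer is what lets the caller decide whether to restart.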

Doesn't random initialization automatically restart with different
random values? When I ran the example in test_kmeans_lost_cluster, it
seemed to produce reasonable results after the warning, but I didn't
verify any numbers. Also, the follow-up error that the OP got was in
the cov calculation for the random init. So it seems to me that
reinitializing the process is failing.
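
SciPy documents `kmeans2(..., minit='random')` as generating the k initial centroids from a Gaussian whose mean and variance are estimated from the data. The toy 1-D sketch below (plain Python, not SciPy's code; `gaussian_init_1d` is a hypothetical name) illustrates why a failure in that estimation step would not be cured by restarting: the mean and spread are computed before any random number is drawn, so every retry on the same data repeats the same computation. A single observation leaves the sample standard deviation undefined, and identical observations give zero spread, so all k centroids coincide. Whether either case matches the OP's data is an assumption, not something established in this thread.

```python
import random
import statistics


def gaussian_init_1d(data, k, rng):
    """Draw k initial centroids from a Gaussian fitted to the data (1-D toy).

    The mean and standard deviation are estimated first; only then are the
    random draws made, so a failing estimate fails on every retry.
    """
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)  # sample std dev; raises with < 2 points
    return [rng.gauss(mu, sigma) for _ in range(k)]


rng = random.Random(0)
print(gaussian_init_1d([1.0, 2.0, 3.0], 3, rng))  # three spread-out values
print(gaussian_init_1d([5.0, 5.0, 5.0], 3, rng))  # zero spread: [5.0, 5.0, 5.0]
try:
    gaussian_init_1d([1.0], 3, rng)
except statistics.StatisticsError as err:
    print("init failed:", err)
```

With zero spread, all centroids start at the same value, so the nearest-centroid step sends every point to one of them and the others are empty on every restart.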

(But, I only looked at the source for this part and don't know how the
cluster analysis in scipy is constructed overall.)

Josef

>
> But again, a better kmeans implementation would be nice; I am just
> not sure it should be in scipy, though.
>
> David
> _______________________________________________
> SciPy-user mailing list
> SciPy-user at scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>