[SciPy-user] 2D clustering question

Hazen Babcock hbabcock at mac.com
Mon May 4 19:06:07 EDT 2009


Hello,

I've been using scipy.cluster.hierarchy.fclusterdata() to cluster groups 
of points based on their x and y position. This works well for data sets 
without too many points, but seems to get pretty slow as the number 
of points gets into the high thousands (i.e. 6000+). Does anyone know of 
a more specialized clustering algorithm that might be able to handle 
even larger numbers of points, i.e. up to 10e6 or so? The points are 
spread out over 0 - 200 or so in X and Y and I'm clustering with a 0.5 
cutoff. One approach is to break the data set down into smaller sections 
based on X,Y coordinate, but perhaps something like this already exists?
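
To make the sectioning idea concrete, here is a rough sketch of what I have in mind. It is not a drop-in replacement for fclusterdata(): it assumes the 0.5 cutoff is a plain Euclidean distance threshold with single linkage, so that two points end up in the same cluster whenever a chain of neighbours no more than the cutoff apart connects them. The function name grid_cluster is made up for illustration, and it uses only the standard library.

```python
from collections import defaultdict
from itertools import product


def grid_cluster(points, cutoff):
    """Return one integer label per point (hypothetical helper).

    Points joined by a chain of neighbours at most `cutoff` apart share a
    label, i.e. single-linkage clustering cut at `cutoff` (an assumption).
    """
    # Bin points into square cells of side `cutoff`, so any pair within
    # `cutoff` sits in the same cell or in two adjacent cells.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cutoff), int(y // cutoff))].append(idx)

    parent = list(range(len(points)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    cutoff_sq = cutoff * cutoff
    for (cx, cy), members in cells.items():
        # Only the 3x3 block of cells around (cx, cy) can hold neighbours.
        for dx, dy in product((-1, 0, 1), repeat=2):
            neighbours = cells.get((cx + dx, cy + dy))
            if not neighbours:
                continue
            for i in members:
                xi, yi = points[i]
                for j in neighbours:
                    if j <= i:
                        continue  # each pair is tested once, from the lower index
                    xj, yj = points[j]
                    if (xi - xj) ** 2 + (yi - yj) ** 2 <= cutoff_sq:
                        union(i, j)

    # Relabel the union-find roots as 0, 1, 2, ... in order of appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


points = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (5.0, 5.0), (5.2, 5.1)]
print(grid_cluster(points, 0.5))  # prints [0, 0, 0, 1, 1]
```

Each point is compared only against points in its own and the eight surrounding cells, so the cost should depend on the local point density and not on the square of the total number of points. I have not benchmarked this.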

thanks,
-Hazen



