[Numpy-discussion] MemoryError : with scipy.spatial.distance

Abhishek Pratap apratap at lbl.gov
Thu Apr 5 16:05:01 EDT 2012


Also, in my case I don't really have a good estimate of the value of K for K-means.

-A

On Thu, Apr 5, 2012 at 8:06 AM, Abhishek Pratap <apratap at lbl.gov> wrote:
> Hi Gael
>
> The MemoryError exception I am getting comes from scikit-learn's DBSCAN
> implementation. I can check the mini-batch implementation of KMeans.
>
> Best,
> -Abhi
>
> On Wed, Apr 4, 2012 at 10:33 PM, Gael Varoquaux
> <gael.varoquaux at normalesup.org> wrote:
>> On Wed, Apr 04, 2012 at 04:41:51PM -0700, Abhishek Pratap wrote:
>>> Thanks Chris. So I guess the question becomes how can I efficiently
>>> cluster 1 million x,y coordinates.
>>
>> Did you try scikit-learn's implementation of DBSCAN:
>> http://scikit-learn.org/stable/modules/clustering.html#dbscan
>> ? I am not sure that it scales, but it's worth trying.
>>
>> Alternatively, the best way to cluster massive datasets is to use the
>> mini-batch implementation of KMeans:
>> http://scikit-learn.org/stable/modules/clustering.html#mini-batch-k-means
>>
>> Hope this helps,
>>
>> Gael
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
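
For readers following the thread, here is a rough sketch of the mini-batch KMeans suggestion, combined with one common heuristic for the "no good estimate of K" problem: fit several candidate values of K and compare the resulting inertia (within-cluster sum of squares). Nobody in the thread proposed this procedure. The synthetic data, the candidate K values, the batch size and the helper name inertia_by_k are all assumptions made for illustration. Unlike a full pairwise distance matrix, mini-batch KMeans only touches a small batch of points per update, so memory use stays roughly linear in the number of points.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic stand-in for the (x, y) coordinates discussed in the thread.
# Three well-separated blobs; the real data and its structure are unknown.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.concatenate(
    [c + rng.normal(scale=1.0, size=(20_000, 2)) for c in centers]
)


def inertia_by_k(data, candidate_ks, batch_size=1024, seed=0):
    """Fit MiniBatchKMeans once per candidate K and return {K: inertia}.

    Hypothetical helper for illustration; batch_size and seed are
    arbitrary choices, not tuned values.
    """
    results = {}
    for k in candidate_ks:
        model = MiniBatchKMeans(
            n_clusters=k, batch_size=batch_size, n_init=3, random_state=seed
        )
        model.fit(data)
        # inertia_ is the sum of squared distances to the nearest centroid.
        results[k] = model.inertia_
    return results


scores = inertia_by_k(points, candidate_ks=[1, 2, 3, 4, 5])
for k, inertia in scores.items():
    print(k, round(inertia))
```

Inertia always tends to decrease as K grows, so the usual reading is to look for the K after which the improvement flattens out (the "elbow"). This is only a heuristic, and it may not give a clear answer on data without compact, roughly spherical clusters.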


