[scikit-learn] DBScan freezes my computer !!!

Andreas Mueller t3kcit at gmail.com
Wed May 16 13:37:36 EDT 2018


You might also consider looking at hdbscan:

https://github.com/scikit-learn-contrib/hdbscan


On 05/13/2018 11:07 PM, Joel Nothman wrote:
> Note that this has long been documented under "Memory consumption for 
> large sample sizes" at 
> http://scikit-learn.org/stable/modules/clustering.html#dbscan
>
> On 14 May 2018 at 12:59, Joel Nothman <joel.nothman at gmail.com 
> <mailto:joel.nothman at gmail.com>> wrote:
>
>     This is quite a common issue with our implementation of DBSCAN,
>     and improvements to documentation would be very, very welcome.
>
>     The high memory cost comes from constructing the pairwise radius
>     neighbors for all points. If using a distance metric that cannot
>     be indexed with a KD-tree or Ball Tree, this results in n^2 floats
>     being stored in memory even before the radius neighbors are computed.
>
>     You have the following strategies available to you currently:
>
>     1. Calculate the radius neighborhoods using radius_neighbors_graph
>     in chunks, so as to avoid all pairs being calculated and stored at
>     once. This produces a sparse graph representation, which can be
>     passed into dbscan with metric='precomputed'. (I've just seen
>     Sebastian suggested the same.)
>     2. Reduce the number of samples in your dataset and represent
>     (near-)duplicate points with sample_weight (i.e. two identical
>     points would be merged but would have a sample_weight of 2).
>
>     There is also a proposal to offer an alternative memory-efficient
>     mode at https://github.com/scikit-learn/scikit-learn/pull/6813
>     <https://github.com/scikit-learn/scikit-learn/pull/6813>. Feedback
>     is welcome.
>
>     Cheers,
>
>     Joel
>
>
>
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20180516/07f64d1f/attachment.html>


More information about the scikit-learn mailing list