[scikit-learn] Finding a single cluster in 1d data

Pedro Pazzini pedropazzini at gmail.com
Thu Apr 12 21:19:54 EDT 2018


Hi Raphael.

An option to highlight a dense region in your vector is to use a density
estimator (http://scikit-learn.org/stable/modules/density.html).
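
For illustration, here is a minimal sketch of the density estimator idea using sklearn.neighbors.KernelDensity on data simulated as in your message. The bandwidth (5.0) and the cutoff (a quarter of the peak density) are assumptions I picked by eye for this simulation, not recommended values, and the helper name simulate is just for the example.

```python
import random

import numpy as np
from sklearn.neighbors import KernelDensity

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate(n=100):
    """Sparse, dense, sparse runs of points, as in the quoted simulation."""
    points, start = [], 0.0
    for rate, count in [(0.1, n), (10, n * 10), (0.1, n)]:
        for _ in range(count):
            points.append(start)
            start += random.expovariate(rate)
    return np.array(points)


X = simulate()

# Assumption: a Gaussian kernel with bandwidth 5.0 suits this simulation;
# real data will need its own bandwidth.
kde = KernelDensity(kernel="gaussian", bandwidth=5.0).fit(X.reshape(-1, 1))

# score_samples returns the log-density at each point.
density = np.exp(kde.score_samples(X.reshape(-1, 1)))

# Assumption: call a point "dense" if its estimated density is at least
# a quarter of the peak density.
dense = X[density >= 0.25 * density.max()]
lo, hi = dense.min(), dense.max()
print("dense region: [%.1f, %.1f], %d points" % (lo, hi, dense.size))
```

Both the bandwidth and the cutoff would need tuning on the real data, for example by looking at the estimated density next to the histogram.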

But I think the Python module jenkspy (
https://pypi.python.org/pypi/jenkspy and https://github.com/mthh/jenkspy)
can also help you. It finds the natural breaks of 1d data (
https://en.wikipedia.org/wiki/Jenks_natural_breaks_optimization). I think
that if you find a good value for the 'nb_class' parameter, you can separate
the dense region of your data from the sparse one.

K-means is a generalization of Jenks natural breaks optimization to
multivariate data, so you could probably also use the K-means module of
scikit-learn for this. Personally, though, I find the jenkspy module more
straightforward for this approach.
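
As a rough sketch of the K-means variant, the snippet below clusters the simulated points with sklearn.cluster.KMeans and takes the cluster with the most points as the dense one. The number of clusters (5) plays the role of 'nb_class' and is only a guess for this simulation. Note that the cluster boundaries fall between cluster centers, so the selected cluster will probably also pick up some sparse points near the dense region.

```python
import random

import numpy as np
from sklearn.cluster import KMeans

random.seed(0)  # fixed seed so the sketch is reproducible

# Same sparse, dense, sparse simulation as in the quoted message.
points, start = [], 0.0
for rate, count in [(0.1, 100), (10, 1000), (0.1, 100)]:
    for _ in range(count):
        points.append(start)
        start += random.expovariate(rate)
X = np.array(points)

# Assumption: 5 clusters is a reasonable guess for this simulation;
# other data will need a different value.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X.reshape(-1, 1))

# Take the most populated cluster as the dense region.
biggest = np.bincount(km.labels_).argmax()
dense = X[km.labels_ == biggest]
lo, hi = dense.min(), dense.max()
print("largest cluster: [%.1f, %.1f], %d points" % (lo, hi, dense.size))
```

Picking the cluster by point count is a simplification that works here because the dense run holds most of the points. If that is not true for the real data, points per unit of cluster width may be a better criterion.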

I hope it helps.

Pedro Pazzini

2018-04-12 16:22 GMT-03:00 Raphael C <drraph at gmail.com>:

> I have a set of points in 1d represented by a list X of floating point
> numbers.  The list has one dense section and the rest is sparse and I
> want to find the dense part. I can't release the actual data but here
> is a simulation:
>
> import random
> import matplotlib.pyplot as plt
>
> N = 100
>
> start = 0
> points = []
> rate = 0.1
> for i in range(N):
>     points.append(start)
>     start = start + random.expovariate(rate)
> rate = 10
> for i in range(N*10):
>     points.append(start)
>     start = start + random.expovariate(rate)
> rate = 0.1
> for i in range(N):
>     points.append(start)
>     start = start + random.expovariate(rate)
> plt.hist(points, bins = 100)
> plt.show()
>
> I would like to use scikit learn to find the dense region. This feels
> a little like outlier detection or the task of finding one cluster
> with noise.
>
> Is there a suitable method in scikit learn for this task?
>
> Raphael