[scikit-learn] Finding a single cluster in 1d data

Raphael C drraph at gmail.com
Thu Apr 12 15:22:44 EDT 2018


I have a set of points in 1d represented by a list X of floating-point
numbers. The list has one dense section and the rest is sparse, and I
want to find the dense part. I can't release the actual data, but here
is a simulation (the list is called points in the code):

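import random

import matplotlib.pyplot as plt

N = 100

# Gaps between consecutive points are exponential with the given rate,
# so the mean gap is 1/rate.
start = 0
points = []
rate = 0.1  # sparse section: mean gap 10
for i in range(N):
    points.append(start)
    start = start + random.expovariate(rate)
rate = 10  # dense section: mean gap 0.1
for i in range(N*10):
    points.append(start)
    start = start + random.expovariate(rate)
rate = 0.1  # sparse section again
for i in range(N):
    points.append(start)
    start = start + random.expovariate(rate)
plt.hist(points, bins=100)
plt.show()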

I would like to use scikit-learn to find the dense region. This feels
a little like outlier detection, or like the task of finding one cluster
with noise.
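
To make the "one cluster with noise" framing concrete, here is a rough
sketch using DBSCAN from sklearn.cluster. It is only an illustration,
not a recommendation: the eps and min_samples values are guesses tuned
to the simulated gaps, and simulate and dense_interval are hypothetical
helper names, not part of scikit-learn.

```python
import random

import numpy as np
from sklearn.cluster import DBSCAN


def simulate(n=100, seed=0):
    """Sparse, dense, sparse sections with exponential gaps (seeded)."""
    rng = random.Random(seed)
    points, start = [], 0.0
    for rate, count in [(0.1, n), (10, n * 10), (0.1, n)]:
        for _ in range(count):
            points.append(start)
            start += rng.expovariate(rate)
    return np.array(points)


def dense_interval(points, eps=1.0, min_samples=10):
    """Return (low, high) of the largest DBSCAN cluster, or None if all noise.

    eps and min_samples are assumptions: with a mean gap of 0.1 in the
    dense section, a window of radius 1.0 holds about 20 points there,
    while the sparse sections (mean gap 10) almost never reach 10.
    """
    # scikit-learn expects a 2d array of shape (n_samples, n_features).
    X = np.asarray(points, dtype=float).reshape(-1, 1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    clustered = labels[labels >= 0]  # label -1 marks noise
    if clustered.size == 0:
        return None
    biggest = np.bincount(clustered).argmax()
    members = X[labels == biggest, 0]
    return members.min(), members.max()


points = simulate()
low, high = dense_interval(points)
print("dense region from", low, "to", high)
```

On real data eps would have to be chosen from the data itself, for
example from the distribution of gaps between sorted neighbouring
points, which is the part I am unsure about.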

Is there a suitable method in scikit-learn for this task?

Raphael

