[scikit-learn] fit before partial_fit ?

Christian Braune christian.braune79 at gmail.com
Mon Jun 10 00:25:24 EDT 2019


The clusters produced by your examples are actually the same (despite the
different labels).

I'd guess that "fit" and "partial_fit" draw a different number of random
values before actually assigning a label to the first (randomly drawn)
sample from "x" (in your code). This is why the labeling is permuted.
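
You can verify this: adjusted_rand_score is invariant under label
permutations, so a score of 1.0 means two partitions agree up to a
relabeling. A minimal sketch, reusing the x and y from your code below:

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.metrics import adjusted_rand_score

    x = np.array([[1, 2], [2, 3]])
    y = np.array([[3, 4], [4, 5], [5, 6]])

    # run 1: fit on x, then partial_fit on y
    m1 = MiniBatchKMeans(random_state=0, n_clusters=2)
    m1.fit(x)
    a1 = m1.labels_.copy()      # labels for the x batch
    m1.partial_fit(y)
    b1 = m1.labels_.copy()      # labels for the y batch

    # run 2: partial_fit on x, then partial_fit on y
    m2 = MiniBatchKMeans(random_state=0, n_clusters=2)
    m2.partial_fit(x)
    a2 = m2.labels_.copy()
    m2.partial_fit(y)
    b2 = m2.labels_.copy()

    # 1.0 for both comparisons: same clusters, permuted labels
    print(adjusted_rand_score(a1, a2), adjusted_rand_score(b1, b2))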

Best regards
  Christian

On Mon, Jun 10, 2019 at 04:12, lampahome <pahome.chen at mirlab.org> wrote:

>
>
> On Fri, Jun 7, 2019 at 1:08 AM, federico vaggi <vaggi.federico at gmail.com> wrote:
>
>> k-means isn't a convex problem; unless you freeze the initialization, you
>> are going to get very different solutions (depending on the dataset) with
>> different initializations.
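>>
>> For example, a minimal sketch (make_blobs here is just a stand-in
>> dataset; with init="random" and n_init=1 the local optimum reached can
>> change with the seed):
>>
>>     from sklearn.cluster import KMeans
>>     from sklearn.datasets import make_blobs
>>
>>     X, _ = make_blobs(n_samples=300, centers=4, cluster_std=3.0,
>>                       random_state=42)
>>     for seed in (0, 1, 2):
>>         km = KMeans(n_clusters=4, init="random", n_init=1,
>>                     random_state=seed)
>>         km.fit(X)
>>         print(seed, km.inertia_)  # inertia of the local optimum found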
>>
>>
> Nope, I specified random_state=0. You can try it:
>
> >>> x = np.array([[1,2],[2,3]])
> >>> y = np.array([[3,4],[4,5],[5,6]])
> >>> z = np.append(x,y, axis=0)
> >>> from sklearn.cluster import MiniBatchKMeans as MBK
> >>> m = MBK(random_state=0, n_clusters=2)
> >>> m.fit(x) ; m.labels_
> array([1,0], dtype=int32)  <-- (1-a)
> >>> m.partial_fit(y) ; m.labels_
> array([0,0,0], dtype=int32)  <-- (1-b)
>
> >>> m = MBK(random_state=0, n_clusters=2)
> >>> m.partial_fit(x) ; m.labels_
> array([0,1], dtype=int32)  <-- (2-a)
> >>> m.partial_fit(y) ; m.labels_
> array([1,1,1], dtype=int32)  <-- (2-b)
>
> 1-a/1-b and 2-a/2-b are all different, especially the members of each
> cluster.
> I'm just confused: which usage of partial_fit and fit is the suitable
> (reasonable?) way to cluster incrementally?
>
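> For concreteness, the pattern I have in mind is roughly this sketch
> (stream_of_chunks is a hypothetical iterable of 2-D arrays):
>
>     from sklearn.cluster import MiniBatchKMeans
>
>     m = MiniBatchKMeans(random_state=0, n_clusters=2)
>     for chunk in stream_of_chunks:  # e.g. batches read from disk
>         m.partial_fit(chunk)        # update the centers with each batch
>     labels = m.predict(z)           # label data consistently afterwards
>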
> thx
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>