[SciPy-Dev] ENH: add Lloyd's algorithm

Andras Deak deak.andris at gmail.com
Thu Aug 26 16:41:54 EDT 2021


On Thu, Aug 26, 2021 at 10:25 PM Tyler Reddy <tyler.je.reddy at gmail.com> wrote:
>
> It may be good to clarify why it goes in SciPy vs. say scikit-learn next to k-means clustering. Their docs for k-means ( https://scikit-learn.org/stable/modules/clustering.html#k-means ) even mention several things that sound
> very similar to what I saw in the PR when I took a quick look:
>
> "K-means is often referred to as Lloyd’s algorithm."
>
> and the description of the algo steps sounds very similar:
>
> "The algorithm can also be understood through the concept of Voronoi diagrams. First the Voronoi diagram of the points is calculated using the current centroids. Each segment in the Voronoi diagram becomes a separate cluster. Secondly, the centroids are updated to the mean of each segment. The algorithm then repeats this until a stopping criterion is fulfilled."

That sounds like (ab)using K-means clustering by requesting n clusters
for n points: you want each point to be its own "cluster" so that they
each get updated. Except I suspect that this could mess with stopping
criteria (assuming there's one for the makeup of each cluster in each
iteration). I wonder if there would also be overhead from targetting
what is probably a pathological case for clustering.
I guess it sounds more like that a k-means implementation could make
use of this new functionality (wherever it goes) to handle the
centroid updates.

András


>
> Anyway, not saying SciPy shouldn't take it, but if it isn't tightly coupled to the stats qmc stuff, it might be well suited to be placed next to its closest siblings in the ecosystem?
>
> If it is mostly for qmc *internals* in SciPy and we can't depend on scikit-learn, I suppose that might be a reason, though if qmc only could it be private and used as a flag/option where needed in qmc?
>
> If it is used for i.e., *input* to qmc, or on the *output* from qmc stuff, then having an end-user use scikit-learn seems less of an ask.
>
> Tyler
>
> On Thu, 26 Aug 2021 at 13:25, Pamphile Roy <roy.pamphile at gmail.com> wrote:
>>
>> Hi everyone,
>>
>> I opened a PR which proposes to add Lloyd’s algorithm.
>>
>> https://github.com/scipy/scipy/pull/14642
>>
>> The TL;DR is: Lloyd's algorithm, or Central Voronoi Tesselation (CVT), post-process a sample to uniformize the inter-sample distances.
>> It's an iterative process. From a sample, the Voronoi cells are calculated, then each sample is moved toward the centroid of its cell.
>>
>> Initially I though about adding this to scipy.stats.qmc as I intended to use this to improve a sample.
>> But Matt rightly pointed out that this could just be put in scipy.spatial.
>>
>> What do you think?
>>
>> Thanks in advance for your inputs!
>>
>> Cheers,
>> Pamphile
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev


More information about the SciPy-Dev mailing list