[SciPy-Dev] ENH: add Lloyd's algorithm

Pamphile Roy roy.pamphile at gmail.com
Fri Aug 27 01:58:22 EDT 2021


Thank you both for your inputs.

K-means and Lloyd’s algorithm are very different and do not serve the same purpose.

First of, let’s just focus on K-means and Voronoi diagrams.
K-means will find the centroids of clusters of points. Hence all clusters would have a centroid and a boundary, or cell (based on L2 usually).
Now, Voronoi diagrams separate each single point into a cell. And then you can compute the center of mass of the cell to get a centroid.
This is why it can be viewed as K-means, but it’s very exaggerated. Also Voronoi diagram is deterministic and unique.

Lloyd’s algorithm is a way to post process a sample of points in terms of L2. It’s used in a lot of fields to make blue noise, better meshes, integration, etc.
It’s an iterative process where we build a Voronoi map and move the current points towards their centroid. This process tends to lower the variance in the volume of each cells,
hence lowering the variance on the L2 between points.

I would argue that it fits SciPy because it’s a slight extension of Voronoi diagrams (which we have). We don’t have much in spatial right now and that might be a good candidate for
a new feature.

Pamphile



> On 26.08.2021, at 22:41, Andras Deak <deak.andris at gmail.com> wrote:
> 
> On Thu, Aug 26, 2021 at 10:25 PM Tyler Reddy <tyler.je.reddy at gmail.com <mailto:tyler.je.reddy at gmail.com>> wrote:
>> 
>> It may be good to clarify why it goes in SciPy vs. say scikit-learn next to k-means clustering. Their docs for k-means ( https://scikit-learn.org/stable/modules/clustering.html#k-means <https://scikit-learn.org/stable/modules/clustering.html#k-means> ) even mention several things that sound
>> very similar to what I saw in the PR when I took a quick look:
>> 
>> "K-means is often referred to as Lloyd’s algorithm."
>> 
>> and the description of the algo steps sounds very similar:
>> 
>> "The algorithm can also be understood through the concept of Voronoi diagrams. First the Voronoi diagram of the points is calculated using the current centroids. Each segment in the Voronoi diagram becomes a separate cluster. Secondly, the centroids are updated to the mean of each segment. The algorithm then repeats this until a stopping criterion is fulfilled."
> 
> That sounds like (ab)using K-means clustering by requesting n clusters
> for n points: you want each point to be its own "cluster" so that they
> each get updated. Except I suspect that this could mess with stopping
> criteria (assuming there's one for the makeup of each cluster in each
> iteration). I wonder if there would also be overhead from targetting
> what is probably a pathological case for clustering.
> I guess it sounds more like that a k-means implementation could make
> use of this new functionality (wherever it goes) to handle the
> centroid updates.
> 
> András
> 
> 
>> 
>> Anyway, not saying SciPy shouldn't take it, but if it isn't tightly coupled to the stats qmc stuff, it might be well suited to be placed next to its closest siblings in the ecosystem?
>> 
>> If it is mostly for qmc *internals* in SciPy and we can't depend on scikit-learn, I suppose that might be a reason, though if qmc only could it be private and used as a flag/option where needed in qmc?
>> 
>> If it is used for i.e., *input* to qmc, or on the *output* from qmc stuff, then having an end-user use scikit-learn seems less of an ask.
>> 
>> Tyler
>> 
>> On Thu, 26 Aug 2021 at 13:25, Pamphile Roy <roy.pamphile at gmail.com> wrote:
>>> 
>>> Hi everyone,
>>> 
>>> I opened a PR which proposes to add Lloyd’s algorithm.
>>> 
>>> https://github.com/scipy/scipy/pull/14642
>>> 
>>> The TL;DR is: Lloyd's algorithm, or Central Voronoi Tesselation (CVT), post-process a sample to uniformize the inter-sample distances.
>>> It's an iterative process. From a sample, the Voronoi cells are calculated, then each sample is moved toward the centroid of its cell.
>>> 
>>> Initially I though about adding this to scipy.stats.qmc as I intended to use this to improve a sample.
>>> But Matt rightly pointed out that this could just be put in scipy.spatial.
>>> 
>>> What do you think?
>>> 
>>> Thanks in advance for your inputs!
>>> 
>>> Cheers,
>>> Pamphile
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at python.org
>>> https://mail.python.org/mailman/listinfo/scipy-dev
>> 
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org <mailto:SciPy-Dev at python.org>
> https://mail.python.org/mailman/listinfo/scipy-dev <https://mail.python.org/mailman/listinfo/scipy-dev>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scipy-dev/attachments/20210827/9f9129f3/attachment-0001.html>


More information about the SciPy-Dev mailing list