[scikit-learn] A basic question about kmeans algorithms elkan and lloyd

Wed Mar 25 22:16:03 EDT 2020

Hi admins,

My team is currently working on optimizing scikit-learn. For k-means, I see there are two algorithms: lloyd, and elkan, which is an optimized variant of lloyd that uses the triangle inequality. In older versions of scikit-learn, elkan supported only dense datasets, not sparse ones; in the latest version it supports both. So my question is: why are both algorithms kept in KMeans, given that they do almost the same thing and elkan is the optimized one? Is there any difference in precision between the two, and how should I decide which one to use?
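
To make the triangle-inequality idea concrete, here is a minimal sketch of a single assignment step in plain Python. It is not scikit-learn's implementation: the function names (`assign_full`, `assign_pruned`) and the toy data are made up for illustration, and a full elkan implementation also keeps per-point distance bounds across iterations, which this sketch leaves out. It only shows that skipping a center c_j whenever d(c_best, c_j) >= 2 * d(x, c_best) should not change the resulting labels, because in that case d(x, c_j) >= d(x, c_best).

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_full(points, centers):
    """Plain (lloyd-style) assignment: compare every point with every center."""
    labels, n_dist = [], 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        n_dist += 1
        for j in range(1, len(centers)):
            d = dist(p, centers[j])
            n_dist += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, n_dist


def assign_pruned(points, centers):
    """Assignment with a triangle-inequality skip test (elkan-style idea).

    If d(c_best, c_j) >= 2 * d(p, c_best), then d(p, c_j) >= d(p, c_best),
    so c_j cannot be strictly closer and its distance is not computed.
    """
    k = len(centers)
    # Center-to-center distances, computed once per assignment step.
    cc = [[dist(ci, cj) for cj in centers] for ci in centers]
    labels, n_dist = [], 0  # n_dist counts point-to-center distances only
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        n_dist += 1
        for j in range(1, k):
            if cc[best][j] >= 2 * best_d:
                continue  # pruned by the triangle inequality
            d = dist(p, centers[j])
            n_dist += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, n_dist


# Toy data (an assumption for illustration): three well-separated 2-D blobs.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in centers for _ in range(100)]

labels_full, n_full = assign_full(points, centers)
labels_pruned, n_pruned = assign_pruned(points, centers)
print(labels_full == labels_pruned, n_full, n_pruned)
```

In scikit-learn itself the choice is made through the `algorithm` parameter of `KMeans`; the accepted values depend on the installed version, so check the documentation for that version.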

Best regards,
George Fan