[scikit-learn] How to determine suitable cluster algo
Matti Viljamaa
matti.v.viljamaa at gmail.com
Fri Jan 25 15:31:20 EST 2019
Also,
remember that some algorithms have “sweet spots” in the trade-off between computation time and accuracy gained.
So you might want to keep measuring “explained variance” (or whatever quality score suits the model) while you add complexity to your models, and then plot model complexity against that score.
E.g. with MLPClassifier you’d plot the number of hidden layers against the score to see where adding hidden layers starts to yield diminishing gains.
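A minimal sketch of that idea, assuming a small synthetic dataset, three hypothetical layer configurations, and cross-validated accuracy as the score (a stand-in for "explained variance", since MLPClassifier is a classifier). It records both the score and the wall-clock time for each configuration, so the two can be compared or plotted:

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic data, for illustration only (an assumption, not from the thread).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Hypothetical complexity steps: 1, 2 and 3 hidden layers of 16 units each.
layer_options = [(16,), (16, 16), (16, 16, 16)]

mean_scores = []
fit_seconds = []
for layers in layer_options:
    clf = MLPClassifier(hidden_layer_sizes=layers, max_iter=500, random_state=0)
    start = time.perf_counter()
    # 3-fold cross-validated accuracy for this architecture.
    scores = cross_val_score(clf, X, y, cv=3, scoring="accuracy")
    fit_seconds.append(time.perf_counter() - start)
    mean_scores.append(scores.mean())

for layers, score, secs in zip(layer_options, mean_scores, fit_seconds):
    print(f"{len(layers)} hidden layer(s): accuracy={score:.3f}, time={secs:.2f}s")
```

Plotting `mean_scores` (and `fit_seconds`) against the number of hidden layers, e.g. with matplotlib, should show where the curve flattens. The numbers depend on the data and the machine, so none are quoted here.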
Sent from Mail for Windows 10
From: Matti Viljamaa
Sent: Friday, 25 January 2019 13.43
To: Scikit-learn mailing list
Subject: RE: [scikit-learn] How to determine suitable cluster algo
For determining what one can afford computationally, see e.g.:
https://stackoverflow.com/questions/22443041/predicting-how-long-an-scikit-learn-classification-will-take-to-run
https://www.reddit.com/r/scikit_learn/comments/a746h0/is_there_any_way_to_estimate_how_long_a_given/
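One rough way to estimate this yourself, sketched under assumptions: time the fit on a few growing subsamples and extrapolate with a power-law fit on a log-log scale. KMeans, the synthetic data, the subsample sizes and the helper `predict_seconds` are all illustrative choices, not something taken from the links above, and timings on small samples can be noisy.

```python
import time

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data, for illustration only; make_blobs shuffles the rows,
# so X[:n] behaves like a random subsample.
X, _ = make_blobs(n_samples=8000, n_features=8, centers=5, random_state=0)

sizes = [1000, 2000, 4000, 8000]
timings = []
for n in sizes:
    start = time.perf_counter()
    KMeans(n_clusters=5, n_init=1, random_state=0).fit(X[:n])
    timings.append(time.perf_counter() - start)

# Fit log(time) = a * log(n) + b, i.e. time ~ exp(b) * n**a.
a, b = np.polyfit(np.log(sizes), np.log(timings), 1)


def predict_seconds(n_samples):
    """Extrapolated fit time in seconds (hypothetical helper, rough estimate)."""
    return float(np.exp(b) * n_samples ** a)


print(f"estimated exponent a = {a:.2f}")
print(f"estimated fit time for 1,000,000 samples: {predict_seconds(1_000_000):.1f}s")
```

The extrapolation only holds while the data still fits in memory and the algorithm's scaling does not change, so treat the result as an order-of-magnitude guide.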
From: lampahome
Sent: Friday, 25 January 2019 3.42
To: Scikit-learn mailing list
Subject: Re: [scikit-learn] How to determine suitable cluster algo
Maybe the suitable way is trial and error?
My concern is that my dataset is very large, so I can't try every number of clusters from 1 to N when I have N samples.
That would cost too much time for me.
Maybe I should choose the initial number of clusters based on execution time?
And then decide in the next step whether to increase or decrease the number of clusters?
thx
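One possible way to approach this, sketched under assumptions: instead of trying every k from 1 to N, cluster a random subsample with MiniBatchKMeans over a coarse geometric grid of k, stop once a time budget is used up, and keep the k with the best silhouette score. The data, the subsample size, the grid and `time_budget` are all hypothetical values chosen for illustration.

```python
import time

import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for a large dataset (an assumption for this sketch).
X, _ = make_blobs(n_samples=20000, n_features=8, centers=6, random_state=0)

# Work on a random subsample to keep each trial cheap.
rng = np.random.default_rng(0)
sample = X[rng.choice(len(X), size=2000, replace=False)]

candidates = [2, 4, 8, 16, 32]  # coarse geometric grid instead of 1..N
time_budget = 30.0              # seconds; hypothetical limit
spent = 0.0
results = []                    # (k, silhouette, seconds) per trial

for k in candidates:
    start = time.perf_counter()
    model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0).fit(sample)
    score = silhouette_score(sample, model.labels_)
    elapsed = time.perf_counter() - start
    spent += elapsed
    results.append((k, score, elapsed))
    if spent > time_budget:
        break  # stop early once the budget is exhausted

best_k = max(results, key=lambda r: r[1])[0]
print("best k on the coarse grid:", best_k)
```

A second pass can then try the values between the neighbours of `best_k` on the grid, which is the "increase or decrease" step, at a small fraction of the cost of a full scan.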