[scikit-learn] Scaling model selection on a cluster
Vlad Ionescu
ionescu.vlad1 at gmail.com
Sun Aug 7 04:42:03 EDT 2016
Hello,
I am interested in scaling grid searches on an HPC LSF cluster with about
60 nodes, each with 20 cores. I thought I could just set n_jobs=1000 and
submit a job with bsub -n 1000, but on digging deeper I realized that
joblib, which scikit-learn uses underneath, starts all of those worker
processes on a single node, so there is no performance benefit. As a
result, I am stuck using a single node.
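One workaround that needs no cluster library is to split the parameter grid by hand and run one LSF job per node, each using only its local cores. The sketch below assumes the work is submitted as an LSF job array, where LSF sets LSB_JOBINDEX (starting at 1) for each element. The names expand_grid and shard and the N_SHARDS variable are hypothetical, chosen for illustration only; expand_grid mimics what scikit-learn's ParameterGrid produces.

```python
import itertools
import os


def expand_grid(param_grid):
    """Expand a dict of lists into a list of parameter dicts (keys sorted)."""
    keys = sorted(param_grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(param_grid[k] for k in keys))]


def shard(candidates, n_shards, shard_index):
    """Return every n_shards-th candidate, starting at shard_index (0-based)."""
    return candidates[shard_index::n_shards]


if __name__ == "__main__":
    grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}
    # N_SHARDS is a hypothetical variable; set it to the job array size.
    n_shards = int(os.environ.get("N_SHARDS", "60"))
    # LSF job array indices start at 1; fall back to the first shard otherwise.
    shard_index = max(int(os.environ.get("LSB_JOBINDEX", "1")) - 1, 0)
    for params in shard(expand_grid(grid), n_shards, shard_index):
        # Each node would fit and score this candidate here, using its own
        # cores (e.g. n_jobs=20), and write the score to a per-job file.
        print(params)
```

Submission might look like `bsub -J "grid[1-60]" -n 20 -R "span[hosts=1]" python grid_shard.py`, with the per-job result files merged afterwards to pick the best candidate. This keeps each joblib pool on one node, which is what joblib supports, at the cost of doing the cross-node bookkeeping yourself.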
Some time ago I read a lengthy discussion about adding something like
this to scikit-learn:
https://sourceforge.net/p/scikit-learn/mailman/scikit-learn-general/thread/4F26C3CB.8070603@ais.uni-bonn.de/
However, as far as I can tell, it hasn't materialized.
Do you know of any way to do this, or of any modern cluster computing
libraries for Python that might help me write something myself? I found
many, but it is hard to tell which are considered good or are even still
under development.
Also, are there still plans to implement this in scikit-learn? You seemed
to like the idea back then.