[scikit-learn] Scikit Learn in a Cray computer

Brown J.B. jbbrown at kuhp.kyoto-u.ac.jp
Fri Jun 28 05:29:42 EDT 2019


>
> where you can see "ncpus = 1" (I still do not know why 4 lines were
> printed -
>
> (total of 40 nodes) and each node has 1 CPU and 1 GPU!
>


> #PBS -l select=1:ncpus=8:mpiprocs=8
> aprun -n 4 p.sh ./ncpus.py
>

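You can request 8 CPUs from the job scheduler, but if each node the script
runs on contains only one (virtual or physical) core, cpu_count() will still
return 1, because it reports the processors of the node the process is running
on, not the number you requested.
If that core supports simultaneous multithreading (e.g., Intel Hyper-Threading),
you would typically get 2.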
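A small diagnostic sketch, using only the Python standard library and assuming Linux for `os.sched_getaffinity`, that reports both what the node offers and what the process is actually allowed to use. Because `aprun -n 4` launches four instances of the script, it prints one line per instance; the host name shows which node each one landed on. The function name `visible_cpus` is illustrative, not part of any library.

```python
import os
import socket


def visible_cpus():
    """Return (cpus_on_node, cpus_this_process_may_use)."""
    on_node = os.cpu_count()  # logical CPUs the OS reports; may be None
    try:
        # Linux: number of CPUs in this process's affinity mask
        usable = len(os.sched_getaffinity(0))
    except AttributeError:  # sched_getaffinity is not available on every platform
        usable = on_node
    return on_node, usable


if __name__ == "__main__":
    on_node, usable = visible_cpus()
    # One line per launched instance; the host name identifies the node.
    print(f"{socket.gethostname()}: cpu_count={on_node} usable={usable}")
```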

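For example, on my workstation (two physical cores, each with two hardware
threads, hence four logical processors; note how each core id appears twice):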
`--> egrep "processor|model name|core id" /proc/cpuinfo
processor : 0
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 0
processor : 1
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 1
processor : 2
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 0
processor : 3
model name : Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz
core id : 1
`--> python3 -c "from sklearn.externals import joblib; print(joblib.cpu_count())"
4
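The distinction between logical processors and physical cores can be read straight out of `/proc/cpuinfo`. The sketch below is a minimal parser, assuming the Linux `/proc/cpuinfo` layout, where a physical core is identified by its (`physical id`, `core id`) pair; on the excerpt above it gives 4 logical processors on 2 physical cores. Some platforms (e.g. certain ARM kernels) omit `core id`, in which case the physical count comes out as 0. (Separately, `sklearn.externals.joblib` was later deprecated and removed from scikit-learn; on current versions, `import joblib` directly.)

```python
import os


def count_cpus(cpuinfo_text):
    """Return (logical_processors, physical_cores) parsed from /proc/cpuinfo text.

    A physical core is identified by its (physical id, core id) pair; if the
    text has no "physical id" lines (e.g. a grep'd excerpt), socket "0" is assumed.
    """
    logical = 0
    cores = set()
    sock = "0"
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
            sock = "0"  # reset until this block's "physical id" is seen
        elif key == "physical id":
            sock = value
        elif key == "core id":
            cores.add((sock, value))
    return logical, len(cores)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        logical, physical = count_cpus(f.read())
    print(f"{logical} logical processors on {physical} physical cores")
```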

It seems that in this situation, if you want to parallelize *independent*
sklearn calculations (e.g., varying the dataset or random seed), you would
request MPI processes through PBS as you already do, but you will need to put
the sklearn computation in a function and then distribute the calls to that
function across the MPI processes yourself.
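A minimal sketch of that approach, assuming the `mpi4py` package is available on the Cray (it falls back to a single process if not) and launched with something like `aprun -n 4 python3 sweep.py`, where `sweep.py` is a hypothetical file name. `run_one` is a hypothetical stand-in for your own sklearn computation (here a logistic-regression cross-validation on synthetic data), and seeds are dealt out round-robin so each rank gets an equal share; results are collected on rank 0 with `comm.gather`.

```python
# Sketch: spread independent sklearn runs (one per random seed) over MPI ranks.
try:
    from mpi4py import MPI  # assumption: mpi4py is installed
    COMM = MPI.COMM_WORLD
except ImportError:  # no mpi4py: run everything in this single process
    COMM = None

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def my_tasks(tasks, rank, size):
    """Round-robin share of `tasks` for `rank`: items rank, rank+size, ..."""
    return list(tasks)[rank::size]


def run_one(seed):
    """Hypothetical unit of work: one independent model evaluation for `seed`."""
    X, y = make_classification(n_samples=500, random_state=seed)
    clf = LogisticRegression(max_iter=1000)
    return seed, float(cross_val_score(clf, X, y, cv=5).mean())


def main(seeds):
    rank = COMM.Get_rank() if COMM is not None else 0
    size = COMM.Get_size() if COMM is not None else 1
    local = [run_one(s) for s in my_tasks(seeds, rank, size)]
    # Collect every rank's list on rank 0; gather returns None on other ranks.
    parts = COMM.gather(local, root=0) if COMM is not None else [local]
    if rank != 0:
        return None
    results = sorted(r for part in parts for r in part)
    for seed, score in results:
        print(f"seed={seed}: mean CV accuracy {score:.3f}")
    return results


if __name__ == "__main__":
    main(range(10))
```

With one CPU per node, it is probably also worth keeping each rank single-threaded (e.g. `OMP_NUM_THREADS=1` in the job environment) so the ranks do not oversubscribe their cores.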

Then again, if the runs are independent, it's much easier to write a for
loop in a shell script that changes the dataset/seed and submits one job per
iteration, letting the job scheduler take care of the parallel distribution.
(I do this when performing 10+ independent runs of sklearn modeling, where the
models use multiple threads during calculations; in my case, SLURM then takes
care of finding available nodes to distribute the work to.)
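The same submission loop, sketched in Python rather than shell. It assumes a hypothetical job script `run_sklearn.sh` that reads `SEED` and `DATASET` from its environment and runs one sklearn model; `qsub -v` (PBS) and `sbatch --export` (SLURM) are the standard ways to pass such variables. It defaults to a dry run that only prints the commands; set `dry_run=False` to actually submit. Note that values containing commas would break the variable list.

```python
import shlex
import subprocess


def submit_command(seed, dataset, scheduler="pbs", job_script="run_sklearn.sh"):
    """Build the submission command for one independent run.

    `run_sklearn.sh` is a hypothetical job script that reads SEED and DATASET
    from its environment and runs the sklearn model once.
    """
    env = f"SEED={seed},DATASET={dataset}"
    if scheduler == "pbs":
        return ["qsub", "-v", env, job_script]  # PBS: -v passes variables
    if scheduler == "slurm":
        return ["sbatch", f"--export=ALL,{env}", job_script]  # SLURM equivalent
    raise ValueError(f"unknown scheduler: {scheduler!r}")


def submit_all(seeds, datasets, scheduler="pbs", dry_run=True):
    """Submit one job per (dataset, seed) pair; the scheduler places them."""
    cmds = [submit_command(s, d, scheduler) for d in datasets for s in seeds]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))  # show what would be submitted
        else:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    submit_all(range(10), ["train_a.csv"], scheduler="pbs", dry_run=True)
```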

Hope this helps.
J.B.