[SciPy-Dev] General discussion on parallelisation

Ralf Gommers ralf.gommers at gmail.com
Mon Sep 3 16:10:50 EDT 2018


On Mon, Sep 3, 2018 at 1:05 PM Gael Varoquaux <gael.varoquaux at normalesup.org>
wrote:

> On Mon, Sep 03, 2018 at 10:16:51AM -0700, Ralf Gommers wrote:
> >     So, the `concurrent.futures.ProcessPoolExecutor` documentation
> >     indicates that it is resistant to this issue. concurrent.futures is
> >     available in Python 3, but wasn't ported to 2.7.
>
>
> > The PR now uses multiprocessing on Python 2.7 and concurrent.futures on
> > 3.x - this seems fine to me. We're not supporting 2.7 for that much
> > longer, so the code can be simplified a bit when we drop 2.7.
>
> OK. I can think of two quite useful features that joblib adds:
>
> * Support of dask.distributed as a backend, to distribute code across
>   computers.
>
> * Fallback to threading in case of nested parallelism, and in case of two
>   levels of nesting, fall back to sequential to avoid overcommitting.
>
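As a side note on the PR mechanics quoted at the top: the 2.7/3.x split amounts
to roughly the following (a rough sketch with a made-up helper name, not the
actual PR code):

    import sys

    def _run_parallel(func, iterable, workers):
        # Made-up helper illustrating the version split: on Python 3 use
        # concurrent.futures (ProcessPoolExecutor is documented to be
        # resistant to the issue discussed earlier in the thread); on 2.7,
        # where concurrent.futures is not in the stdlib, use multiprocessing.
        if sys.version_info[0] >= 3:
            from concurrent.futures import ProcessPoolExecutor
            with ProcessPoolExecutor(max_workers=workers) as executor:
                return list(executor.map(func, iterable))
        else:
            from multiprocessing import Pool
            pool = Pool(processes=workers)
            try:
                return pool.map(func, iterable)
            finally:
                pool.close()
                pool.join()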

Those two features are both quite useful. How would you add an API for them to
SciPy functions (if that's necessary - I assume the threading fallback is
automatic)?
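By "automatic" I'm picturing something roughly like this (purely a sketch with
invented names; joblib's actual mechanism is more involved):

    import threading

    # Made-up illustration: track how deeply nested we are in parallel
    # calls and pick a cheaper backend as the depth grows, so nested
    # calls don't oversubscribe the CPUs.
    _nesting = threading.local()

    def _choose_backend():
        depth = getattr(_nesting, "depth", 0)
        if depth == 0:
            return "processes"    # top level: process-based parallelism
        elif depth == 1:
            return "threads"      # one level of nesting: fall back to threads
        else:
            return "sequential"   # deeper nesting: run sequentially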

Right now we have a single keyword `workers=1`, which behaves like
scikit-learn's `n_jobs` (an integer number of CPUs to use, with -1 meaning all
of them), and also accepts objects with a map() method, like
multiprocessing.Pool.
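In other words, roughly this kind of dispatch (a sketch only; `_parallel_map`
is a hypothetical helper, not the actual SciPy code):

    import os
    from concurrent.futures import ProcessPoolExecutor

    def _parallel_map(func, iterable, workers=1):
        # Hypothetical helper, not the actual SciPy code.  An object with
        # a map() method (e.g. a multiprocessing.Pool) is used as-is; an
        # int gives the number of processes, with -1 meaning all CPUs and
        # 1 (the default) meaning no parallelism.
        if hasattr(workers, "map"):
            return list(workers.map(func, iterable))
        if workers == 1:
            return [func(x) for x in iterable]
        n = os.cpu_count() if workers == -1 else workers
        with ProcessPoolExecutor(max_workers=n) as executor:
            return list(executor.map(func, iterable))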

Ralf