[SciPy-Dev] General discussion on parallelisation

Gael Varoquaux gael.varoquaux at normalesup.org
Tue Jan 9 08:34:48 EST 2018


> 5. joblib.Parallel doesn't have a map method (desirable to allow 3) so a small

joblib has a custom backend framework that can be used for this purpose
(if I understand you correctly):
https://pythonhosted.org/joblib/parallel.html#custom-backend-api-experimental

There are currently Yarn and dask.distributed backends, both of which
are steadily improving.

> 6. joblib.Parallel creates/destroys a multiprocessing.Pool each time the
> Parallel object is `__call__`ed. This leads to significant overhead. One can
> use the Parallel object with a context manager, which allows reuse of the Pool,
> but I don't think that's do-able in the context of using the
> DifferentialEvolutionSolver (DES) object as an iterator:

This is evolving. However, the reason behind this is that a Pool can get
corrupted, which leads to deadlocks. Olivier Grisel and Thomas Moreau are
working on fixing this in the Python standard library (the first PR was
merged recently)!
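
For reference, here is a minimal sketch of the context-manager reuse
mentioned in the quoted point, so that repeated calls share one set of
workers instead of creating and destroying them each time. The batch loop
and the use of math.sqrt are illustrative assumptions, and the threading
backend is chosen only to keep the sketch light; the same pattern should
apply to process-based backends.

```python
from math import sqrt

from joblib import Parallel, delayed

# Entering the context manager starts the workers once; each call to
# `parallel(...)` inside the block reuses them instead of building a
# fresh pool per call.
with Parallel(n_jobs=2, backend="threading") as parallel:
    results = []
    for start in range(0, 9, 3):  # three hypothetical batches of work
        batch = parallel(delayed(sqrt)(x * x) for x in range(start, start + 3))
        results.append(batch)

print(results)
```

The workers are released when the `with` block exits, so this only helps
when all the calls can live inside a single block.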

Part of the vision for joblib is to provide a very light mid-layer that can
connect to multiprocessing and threading (though we are considering
switching to concurrent.futures) as well as other backends. Hopefully
this common language makes it easier to do things like embedding dask in
numerical algorithms without a hard dependency (yes, we are working with
the dask team on this).
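
To make the mid-layer idea concrete, here is a rough sketch, not a
definitive recipe: the numerical routine is written only against joblib,
and the caller picks the backend from outside. The function names
`square` and `sum_of_squares` are hypothetical, and "threading" stands in
for whichever backend is registered (a dask.distributed one would be
selected the same way, assuming it is installed and registered).

```python
from joblib import Parallel, delayed, parallel_backend


def square(x):
    # Hypothetical stand-in for an expensive objective function.
    return x * x


def sum_of_squares(values, n_jobs=2):
    # Library code: depends on joblib only, and names no backend.
    return sum(Parallel(n_jobs=n_jobs)(delayed(square)(v) for v in values))


# Caller code: the backend is chosen here, outside the algorithm.
with parallel_backend("threading"):
    total = sum_of_squares(range(10))

print(total)
```

The point of the design is that `sum_of_squares` never imports the
backend it ends up running on.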

Gaël

