[SciPy-Dev] General discussion on parallelisation

Ralf Gommers ralf.gommers at gmail.com
Mon Sep 3 16:44:44 EDT 2018


On Mon, Sep 3, 2018 at 1:10 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Mon, Sep 3, 2018 at 1:05 PM Gael Varoquaux <
> gael.varoquaux at normalesup.org> wrote:
>
>> On Mon, Sep 03, 2018 at 10:16:51AM -0700, Ralf Gommers wrote:
>> >     So, reading the `concurrent.futures.ProcessPoolExecutor`
>> documentation
>> >     indicates that it is resistant to this issue. concurrent.futures is
>> >     available in Python3, but wasn't ported to 2.7.
>>
>>
>> > The PR now uses multiprocessing on Python 2.7 and concurrent.futures on
>> 3.x -
>> > this seems fine to me. We're not supporting 2.7 for that much longer,
>> so the
>> > code can be simplified a bit when we drop 2.7.
>>
>> OK. I can think of two quite useful features that joblib adds:
>>
>> * Support of dask.distributed as a backend, to distribute code across
>>   computers.
>>
>> * Fallback to threading in case of nested parallelism, and in case of two
>>   levels of nesting, fall back to sequential to avoid overcommitting.
>>
>
> Those are both quite useful. How would you add an API for those to SciPy
> functions (if that's necessary - I assume threading fallback is automatic)?
>
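One way such a fallback could work in principle — purely a sketch, not joblib's actual mechanism, and `get_executor` is a hypothetical helper name:

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def get_executor(workers):
    # Daemonic pool workers are not allowed to spawn children of their
    # own, so if we are already running inside a worker process, fall
    # back to threads instead of creating a second level of processes.
    if multiprocessing.current_process().daemon:
        return ThreadPoolExecutor(max_workers=workers)
    return ProcessPoolExecutor(max_workers=workers)
```

Called from the main process this returns a process pool; called from inside a daemonic worker it would return a thread pool instead.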

Okay found the answer I think (from
http://matthewrocklin.com/blog/work/2017/02/07/dask-sklearn-simple):

from joblib import parallel_backend
with parallel_backend('dask.distributed',
                      scheduler_host='scheduler-address:8786'):
    some_scipy_func(..., workers=N)

That would be quite nice. If we vendor joblib, the import could then become:

    from scipy import parallel_backend

Ralf

