Scheduling used in the multiprocessing.Pool.map() function

rpg rpg.314 at gmail.com
Sat Oct 31 11:12:02 EDT 2009


Hi all,

I have been using the map() function in the multiprocessing module to
parallelize my tasks on a dual-core CPU. My tasks are embarrassingly
parallel, shared-nothing tasks. In one of my runs, I found that this
function interleaves the execution of two processes over a single
list.

So far, so good. The problem is that the last leftover job is
executed serially. The job scheduling seems to be essentially static,
so the last piece does not execute in parallel.
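
To make the effect concrete, here is a minimal simulation sketch of
chunked scheduling. It does not call multiprocessing at all; the
makespan() helper, the task durations, and the chunk sizes are all
hypothetical and chosen only for illustration. It assumes consecutive
chunks are handed, in order, to whichever worker becomes free first.

```python
import heapq

def makespan(durations, workers, chunksize):
    """Simulated wall-clock time when consecutive chunks of tasks are
    handed, in order, to whichever worker becomes free first."""
    # Split the task list into consecutive chunks of at most chunksize.
    chunks = [durations[i:i + chunksize]
              for i in range(0, len(durations), chunksize)]
    # Min-heap holding the time at which each worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for chunk in chunks:
        start = heapq.heappop(free_at)          # earliest idle worker
        heapq.heappush(free_at, start + sum(chunk))
    return max(free_at)

# Six hypothetical 30-minute tasks on two workers.
tasks = [30] * 6
print(makespan(tasks, workers=2, chunksize=4))  # coarse chunks: 120.0
print(makespan(tasks, workers=2, chunksize=1))  # one task at a time: 90.0
```

With coarse chunks, one worker ends up holding four tasks while the
other finishes its two and sits idle. Note that with an odd number of
equal-length tasks on two workers, the final task runs alone under any
schedule, since there is nothing left to pair it with.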

Why can't there be a task-stealing scheduler in multiprocessing? Each
individual function call in my map() takes over half an hour (each
call internally calls out to C++ code). Such a scheduler could be a
very useful addition to multiprocessing.
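
Short of a true task-stealing scheduler, one way to approximate
dynamic scheduling with the existing API is to dispatch a single item
at a time, so that an idle worker always picks up the next pending
item. The sketch below uses Pool.imap_unordered() with chunksize=1.
The run_dynamic() helper is a hypothetical name, and math.sqrt is only
a stand-in for the long-running call described above.

```python
import math
from multiprocessing import Pool

def run_dynamic(func, items, processes=2):
    """Apply func to items, handing one item at a time to idle workers."""
    with Pool(processes=processes) as pool:
        # chunksize=1: a worker fetches the next item only after it has
        # finished its current one. Results arrive in completion order,
        # not in input order.
        return list(pool.imap_unordered(func, items, chunksize=1))

if __name__ == "__main__":
    # math.sqrt stands in for the half-hour C++-backed function.
    print(sorted(run_dynamic(math.sqrt, [1, 4, 9, 16, 25])))
```

Pool.map() also accepts a chunksize argument if the results must come
back in input order. Per-item dispatch adds some communication
overhead, which should be negligible when each call runs for many
minutes.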

Thanks.


