Adding a Par construct to Python?

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Tue May 19 23:29:22 EDT 2009


On Tue, 19 May 2009 05:52:04 -0500, Grant Edwards wrote:

> On 2009-05-19, Steven D'Aprano <steven at REMOVE.THIS.cybersource.com.au>
> wrote:
>> On Mon, 18 May 2009 02:27:06 -0700, jeremy wrote:
>>
>>> Let me clarify what I think par, pmap, pfilter and preduce would mean
>>> and how they would be implemented.
>> [...]
>>
>> Just for fun, I've implemented a parallel-map function, and done a
>> couple of tests. Comments, criticism and improvements welcome!
> 
> My only comment would be that your "slow function" might not be a very
> good simulation of the general case, since it uses time.sleep(), which
> releases the GIL:


I didn't expect my code to magically overcome fundamental limitations of 
the CPython interpreter :)



>> def f(arg):  # Simulate a slow function.
>>     time.sleep(0.5)
>>     return 3*arg-2
> 
> Any Python function that isn't calling a library function written in C
> that releases the GIL won't show any speedup will it?

Not necessarily. Here's another function that uses a loop instead of
sleep().

def g(arg, SIZE=8*10**6):
    # Default SIZE is chosen so that on my machine, the loop 
    # takes approximately 0.5 second.
    for x in xrange(SIZE):
        pass
    return 3*arg-2


>>> from timeit import Timer
>>> setup = 'from __main__ import pmap, g; data = range(50)'
>>> min(Timer('map(g, data)', setup).repeat(repeat=5, number=3))
65.093590974807739
>>> min(Timer('pmap(g, data)', setup).repeat(repeat=5, number=3))
20.268381118774414


However, if the function is so fast that the GIL rarely gets a chance to
be released, then pmap() is a pessimization:

>>> def g(arg):
...     return 3*arg+2
...
>>> min(Timer('map(g, data)', setup).repeat(repeat=5, number=3))
0.00012803077697753906
>>> min(Timer('pmap(g, data)', setup).repeat(repeat=5, number=3))
19.960626125335693

Concurrency doesn't come for free. There are setup and teardown costs,
and management costs. Presumably if pmap() were written more cleverly, in
C, those costs could be reduced, but not eliminated.
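
To make those costs concrete, here is a minimal sketch of what a
thread-based parallel map might look like. It is not the pmap() timed
above, which isn't reproduced in this message; the name pmap_sketch and
the one-thread-per-item design are assumptions made purely for
illustration.

```python
import threading

def pmap_sketch(func, items):
    """Hypothetical parallel map: one thread per item, results in input order."""
    items = list(items)
    results = [None] * len(items)

    def worker(index, value):
        # Each thread writes only to its own slot, so no lock is needed.
        results[index] = func(value)

    # Setup cost: one Thread object is created per item.
    threads = [threading.Thread(target=worker, args=(i, v))
               for i, v in enumerate(items)]
    for t in threads:
        t.start()
    # Teardown cost: wait for every thread to finish.
    for t in threads:
        t.join()
    return results

def g(arg):
    return 3*arg + 2

data = range(50)
assert pmap_sketch(g, data) == [g(x) for x in data]
```

For a function as cheap as this g(), creating, starting and joining
fifty threads is far more work than the fifty multiplications
themselves, which is the kind of overhead described above. Note that
this sketch silently drops any exception raised inside a worker.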



> I don't have a
> multi-core machine to try it on, but what happens when you replace your
> "slow function" code with something that actually burns CPU using
> pure-Python code instead of blocking on a timer in the OS?

Two simple work-arounds are:

* use Jython or IronPython; or

* insert time.sleep(0.000001) into your function at various points.
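
As a rough sketch of the second work-around, assuming a loop-based
function like g() above: the name g_yielding and the yield_every
parameter are hypothetical, and how often to sleep is something you
would have to tune by experiment.

```python
import time

def g_yielding(arg, size=8*10**6, yield_every=10**5):
    """Busy loop that periodically sleeps so other threads get a turn."""
    for x in range(size):
        if x % yield_every == 0:
            # time.sleep() releases the GIL while it blocks.
            time.sleep(0.000001)
    return 3*arg - 2
```

Sleeping on every iteration would swamp the loop with system calls,
which is why this sketch sleeps only once every yield_every iterations.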


-- 
Steven


