Adding a Par construct to Python?

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Tue May 19 22:43:26 EDT 2009


On Tue, 19 May 2009 03:57:43 -0700, jeremy wrote:

>> you want it so simple to use that amateurs can mechanically replace
>> 'for' with 'par' in their code and everything will Just Work, no effort
>> or thought required.
> 
> Yes I do want the par construction to be simple, but of course you can't
> just replace a for loop with a par loop in the general case.

But that's exactly what you said you wanted people to be able to do:

"with my suggestion they could potentially get a massive speed up just by 
changing 'for' to 'par' or 'map' to 'pmap'."

I am finding this conversation difficult because it seems to me you don't 
have a consistent set of requirements.



> This issue
> arises when people use OpenMP: you can take a correct piece of code, add
> a comment to indicate that a loop is 'parallel', and if you get it wrong
> the code will no longer work correctly.

How will 'par' be any different? It won't magically turn code with 
deadlocks into bug-free code.


> With my 'par' construct the
> programmer's intention is made explicit in the code, rather than by a
> compiler directive and so I think that is clearer than OpenMP.

A compiler directive is just as clear about the programmer's intention as 
a keyword. Possibly even more so.

#$ PARALLEL-LOOP
for x in seq:
    do(x)

Seems pretty obvious to me. (Not that I'm suggesting compiler directives 
are a good solution to this problem.)


> As I wrote before, concurrency is one of the hardest things for
> professional programmers to grasp. For 'amateur' programmers we need to
> make it as simple as possible, 

The problem is that "as simple as possible" is Not Very Simple. There's 
no getting around the fact that concurrency is inherently complex. In 
some special cases, you can keep it simple, e.g. parallel-map with a 
function that has no side-effects. But in the general case, no, you can't 
avoid dealing with the complexity, at least a little bit.
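
To make the simple special case concrete, here is a minimal sketch of a 
parallel map over a side-effect-free function. The names pmap and square 
are made up for illustration (this is not the proposed 'pmap' builtin), 
and I'm assuming a thread pool from concurrent.futures as the backend:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Pure function: it reads only its argument and touches no shared state.
    return x * x

def pmap(func, seq, workers=4):
    # Hypothetical helper, not a real builtin. Executor.map returns the
    # results in input order, whatever order the calls actually ran in.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, seq))

result = pmap(square, range(10))
```

This only stays simple because square never touches anything outside its 
own arguments. Give func a side-effect on shared data and you are back in 
the general case.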


> and I think that a parallel loop
> construction and the dangers that lurk within would be reasonably
> straightforward to explain: there are no locks to worry about, no
> message passing.

It's *already* easy to explain. And having explained it, you still need 
to do something about it. You can't just say "Oh well, I've had all the 
pitfalls explained to me, so now I don't have to actually do anything 
about avoiding those pitfalls". You still need to actually avoid them. 
For example, you can choose one of four tactics:

(1) the loop construct deals with locking;

(2) the caller deals with locking;

(3) nobody deals with locking, therefore the code is buggy and risks 
deadlocks; or

(4) the caller is responsible for making sure he never shares data while 
looping over it.

I don't think I've missed any possibilities. You have to pick one of 
those four. 
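
Here is a rough sketch of what tactics (2) and (4) look like from the 
caller's side, using threads that exist today rather than any 'par' 
keyword. The function names are invented for the example, and the thread 
pool is my assumption about how such a loop might be run:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def total_with_lock(seq, workers=4):
    # Tactic (2): the loop body updates shared state, so the caller
    # supplies the lock and must remember to hold it on every update.
    total = 0
    lock = threading.Lock()

    def body(x):
        nonlocal total
        with lock:
            total += x * x

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, seq))  # consume the iterator so errors surface
    return total

def total_without_sharing(seq, workers=4):
    # Tactic (4): each task returns its own value and nothing is shared.
    # The results are combined only after the parallel part is over.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda x: x * x, seq))
```

Either way somebody had to think about the shared data. Drop the lock 
from the first version and you have tactic (3).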


> The only advanced concept is the 'sync' keyword, which
> would be used to rendezvous all the threads. That would only be used to
> speed up certain codes in order to avoid having to repeatedly shut down
> and start up gangs of threads.

So now you want a second keyword as well.
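
For comparison, a rendezvous of the kind you describe can already be 
written with an ordinary library object. This is only a sketch of my 
reading of 'sync', assuming it means "wait here until every thread in 
the gang arrives"; run_two_phases is a made-up name, and threading.Barrier 
requires Python 3.2 or later:

```python
import threading

def run_two_phases(n=4):
    barrier = threading.Barrier(n)
    first = [0] * n
    second = [0] * n

    def worker(i):
        first[i] = i * i        # phase 1: each thread writes only its own slot
        barrier.wait()          # rendezvous: nobody continues until all arrive
        second[i] = sum(first)  # phase 2: every slot of `first` is now filled

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return second
```

The same gang of threads runs both phases, so nothing is shut down and 
restarted in between.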



-- 
Steven


