Adding a Par construct to Python?
Gary Herron
gherron at islandtraining.com
Sun May 17 13:56:18 EDT 2009
MRAB wrote:
> Steven D'Aprano wrote:
>> On Sun, 17 May 2009 09:26:35 -0500, Grant Edwards wrote:
>>
>>> On 2009-05-17, Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au>
>>> wrote:
>>>> On Sun, 17 May 2009 05:05:03 -0700, jeremy wrote:
>>>>
>>>>> From a user point of view I think that adding a 'par' construct to
>>>>> Python for parallel loops would add a lot of power and simplicity,
>>>>> e.g.
>>>>>
>>>>> par i in list:
>>>>>     updatePartition(i)
>>>>>
>>>>> There would be no locking and it would be the programmer's
>>>>> responsibility to ensure that the loop was truly parallel and
>>>>> correct.
>>>> What does 'par' actually do there?
>>> My reading of the OP is that it tells the interpreter that it can
>>> execute any/all iterations of updatePartition(i) in parallel (or
>>> presumably serially in any order) rather than serially in a strict
>>> sequence.
>>>
>>>> Given that it is the programmer's responsibility to ensure that
>>>> updatePartition was actually parallelized, couldn't that be written
>>>> as:
>>>>
>>>> for i in list:
>>>>     updatePartition(i)
>>>>
>>>> and save a keyword?
>>> No, because a "for" loop is defined to execute its iterations serially
>>> in a specific order. OTOH, a "par" loop is required to execute once for
>>> each value, but those executions could happen in parallel or in any
>>> order.
>>>
>>> At least that's how I understood the OP.
>>
>> I can try guessing what the OP is thinking just as well as anyone
>> else, but "in the face of ambiguity, refuse the temptation to guess" :)
>>
>> It isn't clear to me what the OP expects the "par" construct to
>> actually do. Does it create a thread for each iteration?
>> A process? Something else? Given that the rest of Python will be
>> sequential (apart from explicitly parallelized functions), and that
>> the OP specifies that updatePartition still needs to handle its own
>> parallelization, does it really matter if the calls to
>> updatePartition happen sequentially?
>>
>> If it's important to make the calls in arbitrary order,
>> random.shuffle will do that. If there's some other non-sequential and
>> non-random order to the calls, the OP should explain what it is. What
>> else, if anything, does par do, that it needs to be a keyword and
>> statement rather than a function? What does it do that (say) a
>> parallel version of map() wouldn't do?
>>
>> The OP also suggested:
>>
>> "There could also be parallel versions of map, filter and reduce
>> provided."
>>
>> It makes sense to talk about parallelizing map(), because you can
>> allocate a list of the right size to slot the results into as they
>> become available. I'm not so sure about filter(), unless you give up
>> the requirement that the filtered results occur in the same order as
>> the originals.
>>
>> But reduce()? I can't see how you can parallelize reduce(). By its
>> nature, it has to run sequentially: it can't operate on the nth item
>> until it has operated on the (n-1)th item.
>>
> It can calculate the items in parallel, but the final result must be
> calculated in sequence, although if the final operation is commutative
> then some of the steps could be done in parallel.
That should read "associative", not "commutative".

For instance, A+B+C+D could be calculated sequentially, as implied by
    ((A+B)+C)+D
or with some parallelism, as implied by
    (A+B)+(C+D)
That's an application of the associativity of addition.
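
As a rough illustration of that regrouping, here is a minimal sketch of a pairwise ("tree") reduce. The name tree_reduce is made up for this example and is not anything in the standard library, and concurrent.futures is only available in present-day Python. Threads are used just to show which calls are independent of each other; in CPython they will not speed up CPU-bound work, so a process pool with a picklable function might be needed for a real gain.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def tree_reduce(func, items, executor):
    """Reduce items pairwise, level by level (hypothetical helper).

    func must be associative; it does not need to be commutative,
    because neighbours are always combined in their original order.
    """
    items = list(items)
    if not items:
        raise ValueError("tree_reduce() of empty sequence")
    while len(items) > 1:
        # Pair up neighbours: (A, B), (C, D), ...
        pairs = list(zip(items[0::2], items[1::2]))
        # The calls within one level do not depend on each other,
        # so they can be handed to the executor together.
        reduced = list(executor.map(lambda p: func(*p), pairs))
        if len(items) % 2:
            # An odd item at the end carries over to the next level.
            reduced.append(items[-1])
        items = reduced
    return items[0]


with ThreadPoolExecutor() as pool:
    # Grouped as (1+2)+(3+4) rather than ((1+2)+3)+4.
    total = tree_reduce(operator.add, [1, 2, 3, 4], pool)
    # String concatenation is associative but not commutative.
    letters = ["a", "b", "c", "d", "e"]
    joined = tree_reduce(operator.add, letters, pool)
    sequential = reduce(operator.add, letters)
```

String concatenation is the interesting case: it is associative but not commutative, and the grouped version still agrees with the plain sequential reduce().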
Gary Herron