[Cython] CEP: prange for parallel loops

Stefan Behnel stefan_ml at behnel.de
Mon Apr 4 15:04:11 CEST 2011


Dag Sverre Seljebotn, 04.04.2011 13:53:
> On 04/04/2011 01:23 PM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 04.04.2011 12:17:
>>> CEP up at http://wiki.cython.org/enhancements/prange
>>
>> """
>> Variable handling
>>
>> Rather than explicit declaration of shared/private variables we rely on
>> conventions:
>>
>> * Thread-shared: Variables that are only read and not written in the loop
>> body are shared across threads. Variables that are only used in the else
>> block are considered shared as well.
>>
>> * Thread-private: Variables that are assigned to in the loop body are
>> thread-private. Obviously, the iteration counter is thread-private as well.
>>
>> * Reduction: Variables that are only used on the LHS of an in-place operator,
>> such as s above, are marked as targets for reduction. If the variable is
>> also used in other ways (LHS of assignment or in an expression), it
>> instead turns into a thread-private variable. Note: This means that if
>> one, e.g., inserts printf(... s) above, s is turned into a thread-private
>> variable. OTOH, there is simply no way to correctly emulate the effect
>> printf(... s) would have in a sequential loop, so such code must be
>> discouraged anyway.
>> """
>>
>> What about simply (ab-)using Python semantics and creating a new inner
>> scope for the prange loop body? That would basically make the loop behave
>> like a closure function, but with the looping header at the 'right' place
>> rather than after the closure.
>
> I'm not quite sure what concrete changes to the CEP this would lead to
> (assuming you mean this as a proposal for alternative semantics, and not an
> implementation detail).
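The variable-handling conventions quoted above can be made concrete with a rough Python sketch that classifies the names in a loop body using the standard `ast` module. The function `classify` and its `counter` parameter are hypothetical, and the sketch ignores the else block and "lastprivate"; it only encodes the shared / private / reduction rules as quoted.

```python
import ast


def classify(body_src, counter="i"):
    """Sketch of the CEP's conventions: read-only names are shared,
    assigned names are private, names used only as the target of an
    in-place operator are reductions (otherwise they become private)."""
    tree = ast.parse(body_src)
    reads, assigns, inplace = set(), set(), set()
    aug_target_nodes = set()

    # First pass: record targets of in-place operators such as "s += ...".
    for node in ast.walk(tree):
        if isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            inplace.add(node.target.id)
            aug_target_nodes.add(id(node.target))

    # Second pass: every other name is either stored (assigned) or loaded (read).
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and id(node) not in aug_target_nodes:
            if isinstance(node.ctx, ast.Store):
                assigns.add(node.id)
            else:
                reads.add(node.id)

    # A reduction target must not be used in any other way.
    reduction = {n for n in inplace if n not in assigns and n not in reads}
    # Assigned names, demoted reduction candidates and the counter are private.
    private = assigns | (inplace - reduction) | {counter}
    shared = reads - private
    return {
        "shared": sorted(shared),
        "private": sorted(private),
        "reduction": sorted(reduction),
    }


print(classify("tmp = alpha * i\ns += x[i] * tmp"))
```

Adding a read of `s` (the `printf(... s)` case from the quote) moves `s` from the reduction set to the private set.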

What I would like to avoid is having to tell users "and now for something 
completely different". It looks like a loop, but then there's a whole page 
of new semantics for it. And this also cannot be used in plain Python code 
due to the differing scoping behaviour.
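For comparison, a plain Python `for` loop does not open a new scope at all: names bound in the body, including the counter, remain visible after the loop and hold their values from the last iteration. A small illustration (the function name is made up for this example):

```python
def sequential_scoping(n):
    # No new scope is created by the loop: 'i' and 'tmp' are ordinary
    # function locals that outlive the loop and keep the last values.
    for i in range(n):
        tmp = i * 10
    return i, tmp


print(sequential_scoping(3))  # (2, 20)
```

Under the quoted CEP conventions, `tmp` would instead be thread-private inside a prange body, which is the kind of difference a plain Python run of the same code cannot reproduce.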


> How would we treat reduction variables? They need to be supported, and
> there's nothing in Python semantics to support reduction variables, they
> are a rather special case everywhere. I suppose by keeping the reduction
> clause above, or by using the "nonlocal" keyword in the loop body...

That's what I thought, yes. It looks unexpected, sure. That's the clear 
advantage of using inner functions, which do not add anything new at all. 
But if we want to add something that looks more like a loop, we should at 
least make it behave like something that's easy to explain.

Sorry for not taking the opportunity to articulate my scepticism in the 
workshop discussion. Skimming through the CEP now, I think this feature
adds quite a bit of complexity to the language, and I'm not sure it's worth
that when compared to the existing closures. The equivalent
closure+decorator syntax is certainly easier to explain, and could
translate into exactly the same code, but with the clear advantage that the
scope of local, nonlocal and thread-configuring variables is immediately
obvious.

Basically, your example would become

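cimport cython
cimport numpy as np

def f(np.ndarray[double] x, double alpha):
    cdef double s = 0

    with cython.nogil:
        # Proposed decorator (not an existing Cython API): it would call
        # the inner function once per index, spread across threads.
        @cython.run_parallel_for_loop( range(x.shape[0]) )
        cdef void threaded_loop(Py_ssize_t i):    # 'nogil' is inherited
            cdef double tmp = alpha * i    # local, hence thread-private
            nonlocal s                     # reduction target
            s += x[i] * tmp
        s += alpha * (x.shape[0] - 1)    # runs once, after the loop
    return s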

We likely agree that this is not beautiful. It's also harder to implement 
than a "simple" for-in-prange loop. But I find it at least easier to 
explain and semantically 'obvious'. And it would allow us to write a pure 
mode implementation for this based on the threading module.


> Also there's the else:-block, although we could make that part of the
> scope.

Since that's supposed to run single-threaded anyway, it can be written 
after the loop, right? Or is there really a use case where one of the 
threads has to do something in parallel, especially based on its local 
thread state, that the others don't do?
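In sequential Python, the `else` block of a `for` loop runs once the loop finishes without `break`. Assuming the loop body never breaks out early, it is therefore equivalent to code placed right after the loop, as this small check illustrates (both function names are made up for the example):

```python
def with_else(n):
    done = []
    for i in range(n):
        done.append(i)
    else:
        # Runs once the loop completes without a 'break'.
        done.append("finished")
    return done


def after_loop(n):
    done = []
    for i in range(n):
        done.append(i)
    # Same effect, since nothing in the body can break out of the loop.
    done.append("finished")
    return done


print(with_else(3) == after_loop(3))  # True
```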


> And the "lastprivate" functionality, although that could be dropped
> without much loss.

I'm not sure how the "else" block and "lastprivate" could be integrated 
into the closures approach.
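One conceivable option for "lastprivate", sketched in plain Python under the assumption that the decorator is allowed to rebind the decorated name: bind it to the value the body returned for the sequentially last item. The decorator `run_parallel_for_loop_lastprivate` and this behaviour are hypothetical, and the sketch does not address the else block.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel_for_loop_lastprivate(iterable, num_threads=4):
    """Hypothetical variant: the decorated name is bound to what the body
    returned for the last item in iteration order, regardless of which
    thread happened to finish last (mimicking OpenMP's lastprivate)."""
    items = list(iterable)

    def decorator(body):
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            # Executor.map yields results in input order, not completion order.
            results = list(pool.map(body, items))
        return results[-1] if results else None

    return decorator


def g(n):
    @run_parallel_for_loop_lastprivate(range(n))
    def last_tmp(i):
        tmp = i * i      # private to each call
        return tmp       # exported as the "lastprivate" value

    return last_tmp


print(g(5))  # 16
```

Since no state is shared between calls here, no lock is needed; the ordering guarantee of `map` is what makes the "last" value deterministic.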

Stefan

