[Cython] CEP: prange for parallel loops
Stefan Behnel
stefan_ml at behnel.de
Mon Apr 4 15:04:11 CEST 2011

Dag Sverre Seljebotn, 04.04.2011 13:53:
> On 04/04/2011 01:23 PM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 04.04.2011 12:17:
>>> CEP up at http://wiki.cython.org/enhancements/prange
>>
>> """
>> Variable handling
>>
>> Rather than explicit declaration of shared/private variables we rely on
>> conventions:
>>
>> * Thread-shared: Variables that are only read and not written in the loop
>> body are shared across threads. Variables that are only used in the else
>> block are considered shared as well.
>>
>> * Thread-private: Variables that are assigned to in the loop body are
>> thread-private. Obviously, the iteration counter is thread-private as well.
>>
>> * Reduction: Variables that are only used on the LHS of an inplace operator,
>> such as s above, are marked as targets for reduction. If the variable is
>> also used in other ways (LHS of assignment or in an expression) it does
>> instead turn into a thread-private variable. Note: This means that if
>> one, e.g., inserts printf(... s) above, s is turned into a thread-local
>> variable. OTOH, there is simply no way to correctly emulate the effect
>> printf(... s) would have in a sequential loop, so such code must be
>> discouraged anyway.
>> """
>>
>> What about simply (ab-)using Python semantics and creating a new inner
>> scope for the prange loop body? That would basically make the loop behave
>> like a closure function, but with the looping header at the 'right' place
>> rather than after the closure.
>
> I'm not quite sure what the concrete changes to the CEP this would lead to
> (assuming you mean this as a proposal for alternative semantics, and not an
> implementation detail).

What I would like to avoid is having to tell users "and now for something
completely different". It looks like a loop, but then there's a whole page
of new semantics for it. And this also cannot be used in plain Python code
due to the differing scoping behaviour.
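
To make the scoping point concrete, here is a minimal plain-Python sketch (the function names are made up for illustration). In an ordinary for loop, a name assigned in the body belongs to the enclosing function and survives the loop. In a closure it stays local unless it is declared nonlocal.

```python
def plain_loop(n):
    s = 0
    for i in range(n):
        tmp = 2 * i      # assigned in the loop body, yet a function-level local
        s += tmp
    return s, tmp        # tmp is still visible after the loop (for n > 0)


def closure_loop(n):
    s = 0

    def body(i):
        nonlocal s       # explicit: s lives in the enclosing scope
        tmp = 2 * i      # local to body(), invisible outside of it
        s += tmp

    for i in range(n):
        body(i)
    return s, "tmp" in locals()
```

Under the CEP conventions, tmp in a prange loop would be thread-private, which is closer to the second function than to the first.
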
> How would we treat reduction variables? They need to be supported, and
> there's nothing in Python semantics to support reduction variables, they
> are a rather special case everywhere. I suppose keeping the reduction
> clause above, or use the "nonlocal" keyword in the loop body...

That's what I thought, yes. It looks unexpected, sure. That's the clear
advantage of using inner functions, which do not add anything new at all.
But if we want to add something that looks more like a loop, we should at
least make it behave like something that's easy to explain.
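
As a rough illustration of why reduction variables are a special case, the sketch below (plain Python; `parallel_sum` and its `num_threads` parameter are hypothetical names, not part of the CEP) gives each worker a private accumulator and combines the partial results once at the end. That is roughly what a reduction does, as opposed to all threads updating one shared variable.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(x, alpha, num_threads=4):
    n = len(x)

    def partial(start):
        acc = 0.0                          # thread-private accumulator
        for i in range(start, n, num_threads):
            tmp = alpha * i                # thread-private temporary
            acc += x[i] * tmp
        return acc

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(partial, range(num_threads)))
    return sum(partials)                   # the reduction step itself
```
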
Sorry for not taking the opportunity to articulate my scepticism in the
workshop discussion. Skimming through the CEP now, I think this feature
adds quite a bit of complexity to the language, and I'm not sure it's worth
that when compared to the existing closures. The equivalent
closure+decorator syntax is certainly easier to explain, and could
translate into exactly the same code. But with the clear advantage that the
scope of local, nonlocal and thread-configuring variables is immediately
obvious.

Basically, your example would become

    def f(np.ndarray[double] x, double alpha):
        cdef double s = 0

        with cython.nogil:
            @cython.run_parallel_for_loop( range(x.shape[0]) )
            cdef threaded_loop(i):    # 'nogil' is inherited
                cdef double tmp = alpha * i
                nonlocal s
                s += x[i] * tmp
            s += alpha * (x.shape[0] - 1)
        return s

We likely agree that this is not beautiful. It's also harder to implement
than a "simple" for-in-prange loop. But I find it at least easier to
explain and semantically 'obvious'. And it would allow us to write a pure
mode implementation for this based on the threading module.
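
A pure-mode version along these lines might look like the following sketch. The decorator name is taken from the example above, but its behaviour here (running the body immediately, striding the iterations over a fixed number of threads, and guarding the shared update with a lock) is only an assumption for illustration, not an existing Cython API.

```python
import threading


def run_parallel_for_loop(iterable, num_threads=4):
    """Hypothetical decorator: run the body once per item, spread over threads."""
    items = list(iterable)

    def decorator(body):
        def worker(offset):
            # each thread takes every num_threads-th item
            for item in items[offset::num_threads]:
                body(item)

        threads = [threading.Thread(target=worker, args=(k,))
                   for k in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return body

    return decorator


def f(x, alpha):
    s = 0.0
    lock = threading.Lock()   # plain Python has no reduction clause, so guard s

    @run_parallel_for_loop(range(len(x)))
    def threaded_loop(i):
        nonlocal s
        tmp = alpha * i       # local to the closure, hence thread-private
        with lock:
            s += x[i] * tmp

    # the single-threaded part simply follows the loop
    s += alpha * (len(x) - 1)
    return s
```

The lock serializes the update of s, so this shows the scoping rather than any speed-up. A compiled version could replace it with a real reduction.
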
> Also there's the else:-block, although we could make that part of the
> scope.

Since that's supposed to run single-threaded anyway, it can be written
after the loop, right? Or is there really a use case where one of the
threads has to do something in parallel, especially based on its local
thread state, that the others don't do?

> And the "lastprivate" functionality, although that could be dropped
> without much loss.

I'm not sure how the "else" block and "lastprivate" could be integrated
into the closures approach.

Stefan