[Cython] CEP: prange for parallel loops

Tue Apr 5 18:32:25 CEST 2011

On 04/05/2011 05:26 PM, Robert Bradshaw wrote:
> On Tue, Apr 5, 2011 at 8:02 AM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> On 04/05/2011 04:58 PM, Dag Sverre Seljebotn wrote:
>>> On 04/05/2011 04:53 PM, Robert Bradshaw wrote:
>>>> On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel<stefan_ml at behnel.de>
>>>>   wrote:
>>>>> mark florisson, 04.04.2011 21:26:
>>>>>> For clarity, I'll add an example:
>>>>>>
>>>>>> cimport numpy as np
>>>>>> from cython.parallel import prange
>>>>>> from libc.stdio cimport printf
>>>>>>
>>>>>> def f(np.ndarray[double] x, double alpha):
>>>>>>      cdef double s = 0
>>>>>>      cdef double tmp = 2
>>>>>>      cdef double other = 6.6
>>>>>>      cdef Py_ssize_t i
>>>>>>
>>>>>>      with nogil:
>>>>>>          for i in prange(x.shape[0]):
>>>>>>              # reading 'tmp' makes it firstprivate in addition to lastprivate
>>>>>>              # 'other' is only ever read, so it's shared
>>>>>>              printf("%lf %lf %lf\n", tmp, s, other)
>>>>> So, adding a printf() to your code can change the semantics of your
>>>>> variables? That sounds like a really bad design to me.
>>>> That's what I was thinking. Basically, if you do an inplace operation,
>>>> then it's a reduction variable, no matter what else you do to it
>>>> (including possibly a direct assignment, though we could make that a
>>>> compile-time error).
>>> -1, I think that's too obscure. Not being able to use inplace operators
>>> for certain variables will at the very least be nagging.
> You could still use inplace operators to your heart's content; just
> don't bother using the reduced variable outside the loop. (I guess I'm
> assuming reducing a variable has negligible performance overhead,
> which it should.) For the rare cases where you want the non-aggregated
> private value, make an assignment to another variable, or use non-inplace
> operations.

Ahh! Of course! With some control flow analysis we could even eliminate 
the reduction if the variable isn't used after the loop, although I 
agree the cost should be trivial.
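For concreteness, here is a rough Python simulation of what an implicit reduction on `s += x[i]` amounts to. It is a sketch, not Cython's actual code generation: the `reduce_sum` name, the cyclic schedule, and the `nthreads` parameter are assumptions for illustration, and no real threads are started. Each thread accumulates into a private copy that starts at 0, and the copies are combined once at the end. That final combine is the whole overhead, and it is the step control flow analysis could drop when `s` is never read after the loop.

```python
from typing import List, Sequence


def reduce_sum(x: Sequence[float], s: float, nthreads: int) -> float:
    """Sequential simulation of `for i in prange(len(x)): s += x[i]`
    lowered as a reduction (hypothetical lowering, no real threads)."""
    partials: List[float] = []
    for t in range(nthreads):
        private_s = 0.0  # each thread's copy starts at the identity of +
        for v in x[t::nthreads]:  # cyclic schedule: thread t gets t, t+nthreads, ...
            private_s += v
        partials.append(private_s)
    # The only extra work versus the plain loop: one combine per thread.
    # If `s` is never read after the loop, this step is dead code.
    return s + sum(partials)
```

With small integer-valued inputs the result matches the sequential loop exactly, whatever the thread count.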

> Not being able to mix inplace operators might be an annoyance. We
> could also allow explicit declarations, as per Pauli's suggestion, but
> not require them. Essentially, as long as we have

I think you should be able to mix them, but if you do, a reduction 
doesn't happen. This is slightly uncomfortable, but I believe control 
flow analysis and disabling firstprivate can solve it (see below).
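A rough sketch of the inference rule described above, written in Python with the standard `ast` module over a Python-syntax loop body. The `classify` helper and its three labels are made up for illustration; Cython would apply such a rule to its own parse tree, and would also have to treat the loop index, branches, and nested loops specially.

```python
import ast
import textwrap
from typing import Dict, Set


def classify(loop_body: str) -> Dict[str, str]:
    """Hypothetical rule: names only updated in place -> 'reduction';
    names ever plainly assigned -> 'lastprivate' (mixing plain and in-place
    assignment disables the reduction); names only read -> 'shared'.
    The prange index itself is not handled here."""
    tree = ast.parse(textwrap.dedent(loop_body))
    # Remember the Name nodes that are targets of `a op= b`.
    aug_targets = {id(n.target) for n in ast.walk(tree)
                   if isinstance(n, ast.AugAssign)}
    inplace: Set[str] = set()
    assigned: Set[str] = set()
    read: Set[str] = set()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Name):
            continue
        if id(node) in aug_targets:
            inplace.add(node.id)
        elif isinstance(node.ctx, ast.Store):
            assigned.add(node.id)
        else:
            read.add(node.id)
    kinds: Dict[str, str] = {}
    for name in inplace | assigned | read:
        if name in assigned:
            kinds[name] = "lastprivate"  # any plain assignment wins
        elif name in inplace:
            kinds[name] = "reduction"
        else:
            kinds[name] = "shared"
    return kinds
```

For the body `s += x[i]; tmp = x[i] * alpha; t += tmp; t = 0.0`, this yields a reduction on `s`, plain privates for `tmp` and `t` (since `t` mixes both kinds of assignment), and shared `x` and `alpha`.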

I believe I'm back in the implicit camp. And the CEP can probably be 
simplified a bit too; I'll try to do that tomorrow.

Two things:

  * It'd still be nice to have something like a parallel block for thread 
setup/teardown rather than "if firstthreaditeration():". So: prange 
for the simplest 50% of cases, and a parallel block for the next 30%.

  * Control flow analysis can help us tighten it up a bit: for loops where 
you actually depend on values of thread-private variables computed in 
the previous iteration (beyond reduction), it'd be nice to raise a 
warning unless the variable is explicitly declared thread-local or 
similar. There are uses for such variables, but they'd be rather rare, 
and such a hint could be very helpful.
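The warning in the second point could look roughly like the following Python sketch, again using `ast` on a Python-syntax loop body as a stand-in for Cython's own control flow analysis. `carried_reads` and its `reductions` and `thread_local` parameters are hypothetical names, and only straight-line bodies are modelled. It flags any variable that an iteration reads before assigning it, unless the variable is a reduction or is explicitly declared thread-local.

```python
import ast
import textwrap
from typing import Iterable, List, Set


def _names(node: ast.AST, ctx: type) -> Set[str]:
    """All Name identifiers under `node` used in the given context."""
    return {n.id for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ctx)}


def carried_reads(loop_body: str, reductions: Iterable[str] = (),
                  thread_local: Iterable[str] = ()) -> List[str]:
    """Names an iteration reads before assigning them itself, i.e. values
    carried over from the previous iteration. Reductions and names marked
    thread-local are exempt. Branches and nested loops are not modelled."""
    tree = ast.parse(textwrap.dedent(loop_body))
    exempt = set(reductions) | set(thread_local)
    written = _names(tree, ast.Store)  # assigned somewhere in the body
    defined: Set[str] = set()  # assigned earlier in this iteration
    warnings: List[str] = []
    for stmt in tree.body:
        reads = _names(stmt, ast.Load)
        if isinstance(stmt, ast.AugAssign) and isinstance(stmt.target, ast.Name):
            reads.add(stmt.target.id)  # `a += b` reads `a` before writing it
        for name in sorted(reads):
            if (name in written and name not in defined
                    and name not in exempt and name not in warnings):
                warnings.append(name)
        defined |= _names(stmt, ast.Store)
    return warnings
```

For `prev = cur; cur = x[i]; s += cur - prev` with `s` as a reduction, only `cur` is flagged. Declaring `cur` thread-local silences the warning.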

I'm still not sure if we want firstprivate, even if we can do it. It'd 
be good to see a use case for it. Personally, I'd rather have privates 
start out as NaN (or 0x7FFFFFFF for integers), as relying on the 
firstprivate value is likely a bug. Yes, it makes the sequential case 
work, but that is exactly the case where parallelizing the sequential 
case would be wrong!!
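To make the firstprivate argument concrete, here is a Python simulation under stated assumptions: a static schedule, a single private float, and `run_prange`, `accumulate` and `overwrite` as invented names. A body that reads its private before writing it gives the right answer with one thread when privates start from the pre-loop value, silently gives a different answer with two threads, and yields NaN when privates start poisoned, so the bug becomes visible. A body that writes first does not care about the starting value.

```python
import math
from typing import Callable


def run_prange(body: Callable[[int, float], float], n: int,
               nthreads: int, start: float) -> float:
    """Sequentially simulate a static-schedule prange over range(n) with one
    private variable (nthreads >= 1). Each simulated thread gets its own
    copy starting at `start`; the copy from the chunk holding iteration n-1
    is returned, as lastprivate would."""
    chunk = -(-n // nthreads)  # ceiling division
    result = start
    for t in range(nthreads):
        lo, hi = t * chunk, min(n, (t + 1) * chunk)
        if lo >= hi:
            continue  # this thread got no iterations
        value = start  # fresh private copy for this thread
        for i in range(lo, hi):
            value = body(i, value)
        result = value
    return result


def accumulate(i: int, v: float) -> float:
    return v + i  # reads the private before writing it: hidden dependency


def overwrite(i: int, v: float) -> float:
    return 2.0 * i  # writes without reading: safe to parallelize


print(run_prange(accumulate, 4, 1, 0.0))       # sequential: "works"
print(run_prange(accumulate, 4, 2, 0.0))       # firstprivate: silently wrong
print(run_prange(accumulate, 4, 2, math.nan))  # poisoned: bug shows up as nan
```

The integer case would work the same way with 0x7FFFFFFF as the poison value, although an out-of-range result is less conspicuous than NaN.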

Grepping through 30,000 lines of heavily OpenMP-ified Fortran code here, 
I find no mention of firstprivate or lastprivate (although we certainly 
want lastprivate to align with the sequential case).

Dag Sverre