[Cython] CEP: prange for parallel loops
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Tue Apr 5 18:32:25 CEST 2011
On 04/05/2011 05:26 PM, Robert Bradshaw wrote:
> On Tue, Apr 5, 2011 at 8:02 AM, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no> wrote:
>> On 04/05/2011 04:58 PM, Dag Sverre Seljebotn wrote:
>>> On 04/05/2011 04:53 PM, Robert Bradshaw wrote:
>>>> On Tue, Apr 5, 2011 at 3:51 AM, Stefan Behnel<stefan_ml at behnel.de>
>>>> wrote:
>>>>> mark florisson, 04.04.2011 21:26:
>>>>>> For clarity, I'll add an example:
>>>>>>
>>>>>> def f(np.ndarray[double] x, double alpha):
>>>>>>     cdef double s = 0
>>>>>>     cdef double tmp = 2
>>>>>>     cdef double other = 6.6
>>>>>>
>>>>>>     with nogil:
>>>>>>         for i in prange(x.shape[0]):
>>>>>>             # reading 'tmp' makes it firstprivate in addition to lastprivate
>>>>>>             # 'other' is only ever read, so it's shared
>>>>>>             printf("%lf %lf %lf\n", tmp, s, other)
>>>>> So, adding a printf() to your code can change the semantics of your
>>>>> variables? That sounds like a really bad design to me.
>>>> That's what I was thinking. Basically, if you do an inplace operation,
>>>> then it's a reduction variable, no matter what else you do to it
>>>> (including possibly a direct assignment, though we could make that a
>>>> compile-time error).
>>> -1, I think that's too obscure. Not being able to use inplace operators
>>> for certain variables will at the very least be nagging.
> You could still use inplace operators to your heart's content--just
> don't bother using the reduced variable outside the loop. (I guess I'm
> assuming reducing a variable has negligible performance overhead,
> which it should.) For the rare cases that you want the non-aggregated
> private, make an assignment to another variable, or use non-inplace
> operations.
Ahh! Of course! With some control flow analysis we could even eliminate
the reduction if the variable isn't used after the loop, although I
agree the cost should be trivial.
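
As a rough illustration of the reduction semantics discussed here, the
following plain-Python sketch emulates what an implicit "+=" reduction in a
prange loop would mean. It is not Cython's implementation: the function name
prange_sum, the num_threads parameter and the interleaved schedule are all
assumptions made for the example, and the "threads" run one after another.

```python
# Emulation only: plain Python standing in for an implicit '+=' reduction.
# prange_sum, num_threads and the interleaved schedule are hypothetical.
def prange_sum(x, num_threads=4):
    # One private accumulator per emulated thread, starting at the
    # identity element of '+'.
    partials = [0.0] * num_threads
    for t in range(num_threads):
        # Emulated thread t handles iterations t, t + num_threads, ...
        for i in range(t, len(x), num_threads):
            partials[t] += x[i]  # the in-place operation inside the loop
    # Combine step after the loop: this is the "reduction" itself.
    s = 0.0
    for p in partials:
        s += p
    return s


total = prange_sum([1.0, 2.0, 3.0, 4.0, 5.0])
```

If the reduced value is never read after the loop, the combine step at the
end is the only part that could be dropped, which is why its cost is small.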
> Not being able to mix inplace operators might be an annoyance. We
> could also allow explicit declarations, as per Pauli's suggestion, but
> not require them. Essentially, as long as we have
I think you should be able to mix them, but if you do, a reduction
doesn't happen. This is slightly uncomfortable, but I believe control
flow analysis and disabling firstprivate can solve it, see below.
I believe I'm back in the implicit-camp. And the CEP can probably be
simplified a bit too, I'll try to do that tomorrow.
Two things:
* It'd still be nice with something like a parallel block for thread
setup/teardown rather than "if firstthreaditeration():". So, a prange
for the 50% simplest cases, followed by a parallel-block for the next 30%.
* Control flow analysis can help us tighten it up a bit: For loops where
you actually depend on values of thread-private variables computed in
the previous iteration (beyond reduction), it'd be nice to raise a
warning unless the variable is explicitly declared thread-local or
similar. There are uses for such variables but they'd be rather rare,
and such a hint could be very helpful.
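
To make the second point concrete, here is a small plain-Python emulation of
a loop whose body reads a value that the previous iteration wrote. The
function names, the block-wise split and num_threads are assumptions for
illustration only; real threads are replaced by consecutive blocks.

```python
# Emulation only: shows why a loop-carried private variable deserves a warning.
# sequential_diff, chunked_diff and num_threads are hypothetical names.
def sequential_diff(x):
    prev = 0.0
    out = []
    for v in x:
        out.append(v - prev)  # reads the value written by the previous iteration
        prev = v
    return out


def chunked_diff(x, num_threads=2):
    n = len(x)
    out = [None] * n
    block = (n + num_threads - 1) // num_threads
    for t in range(num_threads):
        prev = 0.0  # each emulated thread starts with its own private copy
        for i in range(t * block, min((t + 1) * block, n)):
            out[i] = x[i] - prev
            prev = x[i]
    return out


data = [1.0, 3.0, 6.0, 10.0]
seq = sequential_diff(data)      # [1.0, 2.0, 3.0, 4.0]
par = chunked_diff(data)         # [1.0, 2.0, 6.0, 4.0]
```

The first iteration of the second block sees the initial value of prev
instead of the value from the preceding iteration, so the result differs
from the sequential one. That is the pattern a control-flow-based warning
would be meant to catch.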
I'm still not sure if we want firstprivate, even if we can do it. It'd
be good to see a use case for it. I'd rather initialize private variables
to NaN and 0x7FFFFFFF personally, as relying on the firstprivate value is
likely a bug -- yes, it makes the sequential case work, but that is exactly
the case where parallelizing the sequential loop would be wrong!!
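
The argument for NaN over firstprivate can be sketched in plain Python. This
is an emulation under stated assumptions, not what Cython generates: the
function run_blocks, its private_init parameter and the block-wise split are
invented for the example.

```python
import math


# Emulation only: compares a firstprivate-style initial value with a NaN
# "poison" value. run_blocks and private_init are hypothetical names.
def run_blocks(x, private_init, num_threads=2):
    n = len(x)
    out = [0.0] * n
    block = (n + num_threads - 1) // num_threads
    for t in range(num_threads):
        tmp = private_init  # value of the private variable on thread entry
        for i in range(t * block, min((t + 1) * block, n)):
            out[i] = tmp + x[i]  # reads tmp before this iteration assigns it
            tmp = x[i]
    return out


data = [1.0, 5.0, 3.0, 4.0]
outer_tmp = 2.0  # value of tmp before the loop

sequential = run_blocks(data, outer_tmp, num_threads=1)  # [3.0, 6.0, 8.0, 7.0]
firstprivate = run_blocks(data, outer_tmp)               # [3.0, 6.0, 5.0, 7.0]
poisoned = run_blocks(data, math.nan)                    # NaN at indices 0 and 2
```

With the firstprivate-style value the two-block result looks plausible but
silently disagrees with the sequential one at index 2. With NaN, every
iteration that relied on the initial value is visibly marked.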
Grepping through 30000 lines of heavily OpenMP-ified Fortran code here
there's no mention of firstprivate or lastprivate (although we certainly
want lastprivate to align with the sequential case).
Dag Sverre