[Cython] CEP: prange for parallel loops

mark florisson markflorisson88 at gmail.com
Tue Apr 5 15:56:55 CEST 2011


On 5 April 2011 14:55, Pauli Virtanen <pav at iki.fi> wrote:
>
> Mon, 04 Apr 2011 21:26:34 +0200, mark florisson wrote:
> [clip]
> > For clarity, I'll add an example:
> [clip]
>
> How about making all the special declarations explicit? The automatic
> inference of variables has a problem in that a small change in a part of
> the code can have somewhat unintuitive non-local effects, as the private/
> shared/reduction status of the variable changes in the whole function
> scope (if Python scoping is retained).
>
> Like so with explicit declarations:
>
> def f(np.ndarray[double] x, double alpha):
>    cdef double alpha = 6.6
>    cdef char *ptr = something()
>
>    # Parallel variables are declared beforehand;
>    # the exact syntax could also be something else
>    cdef cython.parallel.private[int] tmp = 2, tmp2
>    cdef cython.parallel.reduction[int] s = 0
>
>    # Act like ordinary cdef outside prange(); in the prange they are
>    # firstprivate if initialized or written to outside the loop anywhere
>    # in the scope. Or, they could be firstprivate always, if this
>    # has a negligible performance impact.
>    tmp = 3

The problem with firstprivate() is that it doesn't give you the same
semantics as in the sequential version. That's why I think it would be
best to forget about firstprivate entirely and allow reading of
private variables only after they are assigned to in the loop body.
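
To make the difference concrete, here is a minimal sketch in plain Python. It only simulates threads by splitting the iteration range into chunks; the function names and the static chunking are assumptions made for illustration, not anything Cython or OpenMP provides.

```python
# Pure-Python simulation; no real threads or OpenMP are involved.

def sequential(n, tmp):
    """Ordinary loop: tmp carries over from one iteration to the next."""
    for i in range(n):
        tmp += 2 * tmp
    return tmp


def simulated_firstprivate(n, tmp, num_threads):
    """Each simulated thread starts from its own copy of the initial tmp."""
    chunk = (n + num_threads - 1) // num_threads  # static schedule (assumed)
    finals = []
    for t in range(num_threads):
        local = tmp  # firstprivate: copy of the value from before the loop
        for i in range(t * chunk, min((t + 1) * chunk, n)):
            local += 2 * local
        finals.append(local)
    return finals


seq_result = sequential(4, 3)                              # 243
par_results = simulated_firstprivate(4, 3, num_threads=2)  # [27, 27]
```

With one simulated thread the two versions agree; with two, each thread restarts from the initial value, so no thread ever sees the sequential result.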

>
>    with nogil:
>        s = 9
>
>        for i in prange(x.shape[0]):
>            if cython.parallel.first_iteration(i):
>                # whatever initialization; Cython is in principle allowed
>                # to move this outside the loop, at least if it is
>                # the first thing here
>                pass

For this I prefer the aforementioned 'with cython.parallel:' block.

>
>            # tmp2 is not firstprivate, as it's not written to outside
>            # the loop body; also, it's also not lastprivate as it's not
>            # read outside the loop
>            tmp2 = 99
>
>            # Increment a private variable
>            tmp += 2*tmp
>
>            # Add stuff to reduction
>            s += alpha*i
>
>            # The following raise a compilation error -- the reduction
>            # variable cannot be assigned to, and can be only operated on
>            # with only a single reduction operation inside prange
>            s *= 9
>            s = 8

I think OpenMP allows arbitrary assignments and expressions on the
reduction variable; all the spec says is that "usually it will be of
the form 'x <binop>= ...'".
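
As a rough illustration of why the 'x <binop>= ...' form is the useful one, the sketch below simulates a '+' reduction in plain Python: each simulated thread gets a private copy initialized to 0, and the copies are added to the original value at the end. The helper names and the chunking are assumptions for illustration only.

```python
# Pure-Python simulation of reduction(+: s); no real threads are involved.

def simulate_add_reduction(n, initial, num_threads, body):
    """Run body(s, i) on per-thread private copies of s, then sum them."""
    chunk = (n + num_threads - 1) // num_threads  # static schedule (assumed)
    total = initial
    for t in range(num_threads):
        s = 0  # private copy starts at the identity for '+'
        for i in range(t * chunk, min((t + 1) * chunk, n)):
            s = body(s, i)
        total += s  # combine with the original value after the loop
    return total


def accumulate(s, i):
    return s + i  # corresponds to: s += i


def overwrite(s, i):
    return 8      # corresponds to: s = 8


sums = [simulate_add_reduction(8, 9, k, accumulate) for k in (1, 2, 4)]
overwrites = [simulate_add_reduction(8, 9, k, overwrite) for k in (1, 2, 4)]
# sums       == [37, 37, 37]   independent of the thread count
# overwrites == [17, 25, 41]   depends on the thread count
```

So a plain assignment may be accepted, but its result depends on how the iterations are split, which is an argument for rejecting it at compile time as proposed.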

>
>            # It can be read, however, provided openmp supports this
>            tmp = s
>
>            # Assignment to non-private variables causes a compile-time
>            # error; this avoids common mistakes, such as forgetting to
>            # declare the reduction variable.
>            alpha += 42
>            alpha123 = 9
>            ptr = 94
>
>            # These, however, need to be allowed:
>            # the users are on their own to make sure they don't clobber
>            # non-local variables
>            x[i] = 123
>            (ptr + i)[0] = 123
>            some_routine(x, ptr, i)

Indeed. They could be either shared or firstprivate (as the pointer
would be firstprivate, and not the entire array, unless it was
declared as a C array of a certain size).
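
A loose analogy in plain Python, assuming a list stands in for the pointed-to memory: copying the reference (like a firstprivate pointer) still writes into the shared storage, while copying the contents (like a firstprivate fixed-size C array) does not. The function names are hypothetical.

```python
# Analogy only: Python references stand in for C pointers here.

def worker_with_pointer_copy(ptr, i):
    """ptr is a private copy of the reference; the storage is still shared."""
    ptr[i] = 123  # writes through to the caller's list
    ptr = None    # rebinding affects only this private copy


def worker_with_array_copy(arr, i):
    """The whole contents are copied, as for a fixed-size C array."""
    local = list(arr)
    local[i] = 123  # modifies the private copy only
    return local


shared = [0, 0, 0, 0]
for i in range(len(shared)):
    worker_with_pointer_copy(shared, i)

original = [0, 0, 0, 0]
private_copy = worker_with_array_copy(original, 1)
```

After this runs, every element of `shared` has been overwritten, while `original` is untouched and only `private_copy` holds the new value.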

>        else:
>            # private variables are lastprivate if read outside the loop
>            foo = tmp
>
>        # The else: block can be added, but actually has no effect
>        # as it is always executed --- the code here could as well
>        # be written after the for loop
>        foo = tmp  # <- same result
>
>    with nogil:
>        # Suppose Cython allowed cdef inside blocks with usual scoping
>        # rules
>        cdef cython.parallel.reduction[double] r = 0
>
>        # the same variables can be used again in a second parallel loop
>        for i in prange(x.shape[0]):
>            r += 1.5
>            s -= i
>            tmp = 9
>
>        # also the iteration variable is available after the loop
>        count = i
>
>    # As per usual Cython scoping rules
>    return r, s
>
> What did I miss here? As far as I see, the above would have the same
> semantics and scoping as a single-threaded Python implementation.
>
> The only change required to make things parallel is replacing range() by
> prange() and adding the variable declarations.

Basically, I like your approach. It's only slightly more verbose than
the implicit way, as you need to declare the type of each variable
anyway.

I also still like the implicit way, but it has a couple of problems:
     - in-place operators suddenly declare a reduction
     - assigning to a variable has implicit (last)private semantics,
       whereas assigning to an element in a buffer has shared semantics

Your explicit version solves both these problems. So I'm +1.
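
For illustration, the implicit rules listed above can be sketched as a small classifier over a loop body, using Python's standard `ast` module. This is a toy under stated assumptions (the function name and the status labels are made up here), not how Cython implements or would implement inference.

```python
import ast


def classify_loop_variables(loop_body_source):
    """Guess a sharing status from how each name is assigned in a loop body."""
    status = {}
    for node in ast.walk(ast.parse(loop_body_source)):
        if isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            # An in-place operator on a plain name implies a reduction.
            status[node.target.id] = "reduction"
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    # Plain assignment implies (last)private.
                    status.setdefault(target.id, "private")
                elif (isinstance(target, ast.Subscript)
                      and isinstance(target.value, ast.Name)):
                    # Assigning to an element leaves the buffer shared.
                    status.setdefault(target.value.id, "shared")
    return status


inferred = classify_loop_variables("tmp = 9\ns += i\nx[i] = 123\n")
flipped = classify_loop_variables("tmp = 9\ntmp += 1\n")
```

Here `inferred` maps `tmp` to private, `s` to reduction and `x` to shared, while in `flipped` a single added `tmp += 1` turns `tmp` into a reduction. That is the kind of non-local surprise the explicit declarations avoid.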

> --
> Pauli Virtanen
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

