[Numpy-discussion] Fast threading solution thoughts

Thu Feb 12 09:27:51 EST 2009

On 2/12/2009 12:34 PM, Dag Sverre Seljebotn wrote:

> FYI, I am one of the core Cython developers and can make such 
> modifications in Cython itself as long as there's consensus on how it 
> should look on the Cython mailing list.  My problem is that I don't 
> really know OpenMP and have little experience with it, so I'm not the 
> best person for creating a draft for how such high-level OpenMP 
> constructs should look like in Cython.

I don't know the Cython internals, but I do know OpenMP. I mostly use it 
with Fortran.

The question is: should OpenMP directives be written as comments in the 
Cython code (as they are in Fortran; C uses #pragma lines), or should 
OpenMP be exposed through special objects?

As for the GIL: no, I don't think nogil should be implied. But Python 
objects should only be allowed as shared variables. Synchronization will 
then be as usual for shared variables in OpenMP (#pragma omp critical).

Here is my suggestion for syntax. If you just follow a consistent 
translation scheme, you don't need to know OpenMP in detail:

with openmp('parallel for', argument=iterable, ...):
    --> insert pragma directly above for

with openmp(directive, argument=iterable, ...):
    --> insert pragma and brackets

with openmp('atomic'): --> insert pragma directly

openmp('barrier') --> insert pragma directly

This, by the way, covers all of OpenMP. Here is how it should translate:

with openmp('parallel for', private=(i,), shared=(n,),
            schedule='dynamic'):

    for i in range(n):
        pass

Compiles to:

#pragma omp parallel for \
private(i) \
shared(n) \
schedule(dynamic)
for(i=0; i<n; i++) {
   /* whatever */
}
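
The clause-by-clause translation can be sketched in plain Python. This is 
only an illustration of the scheme, not Cython code: format_pragma is a 
hypothetical helper, and variable names are passed as strings here because 
there is no compiler to resolve them.

```python
def format_pragma(directive, **clauses):
    """Render an OpenMP pragma from a directive name and keyword clauses.

    Hypothetical helper: variable names are given as strings.
    """
    parts = ["#pragma omp " + directive]
    for name, value in clauses.items():
        if name == "reduction":
            # reduction=('+', 'k') becomes reduction(+:k)
            op, *names = value
            parts.append("reduction(%s:%s)" % (op, ",".join(names)))
        elif isinstance(value, (tuple, list)):
            # private=('i', 'j') becomes private(i,j)
            parts.append("%s(%s)" % (name, ",".join(value)))
        else:
            # schedule='dynamic' becomes schedule(dynamic)
            parts.append("%s(%s)" % (name, value))
    # " \" plus newline keeps a space before each line continuation
    return " \\\n".join(parts)


if __name__ == "__main__":
    print(format_pragma("parallel for", private=("i",), shared=("n",),
                        schedule="dynamic"))
```

Each keyword argument maps to exactly one clause, which is why the 
translator does not need to understand what the clauses mean.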

with openmp('parallel sections',
            reduction=('+',k), private=(i,j)):

    with openmp('section'):
        i = foobar()
        k += i

    with openmp('section'):
        j = foobar()
        k += j

Compiles to:

#pragma omp parallel sections \
reduction(+:k) \
private(i,j)
{
    #pragma omp section
    {
        i = foobar();
        k += i;
    }

    #pragma omp section
    {
        j = foobar();
        k += j;
    }
}
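
As a rough illustration of what a reduction(+:k) clause means: each thread 
works on its own copy of k, initialized to zero, and the copies are added 
to the original k when the construct ends. The sketch below mimics that 
with Python threads. It only shows the semantics, it is not how the 
generated C would run, and foobar here is a hypothetical stand-in that 
returns a constant.

```python
from concurrent.futures import ThreadPoolExecutor


def foobar():
    # Hypothetical stand-in for the foobar() used in the example.
    return 21


def section():
    k_private = 0            # private copy starts at the identity for '+'
    k_private += foobar()
    return k_private


def run_sections(k=0):
    # Run two sections concurrently, one per worker thread.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(section) for _ in range(2)]
        partials = [f.result() for f in futures]
    # Combine the private copies with the original value of k.
    return k + sum(partials)


if __name__ == "__main__":
    print(run_sections())
```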

With Python objects, the programmer must synchronize access:

with openmp('parallel for', shared=(pyobj,n), private=(i,)):
     for i in range(n):
         with openmp('critical'):
             pyobj += i

Compiles to:

#pragma omp parallel for \
shared(pyobj,n) \
private(i)
for (i=0; i<n; i++) {
    #pragma omp critical
    {
        pyobj += i;
    }
}
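
For readers who know Python threading better than OpenMP, a critical 
section plays roughly the role of a lock around the update. A minimal 
sketch of the same loop, assuming a strided split of the iterations over 
four threads (locked_sum and its parameters are made up for illustration):

```python
import threading


def locked_sum(n, num_threads=4):
    total = [0]               # shared state, plays the role of pyobj
    lock = threading.Lock()   # plays the role of '#pragma omp critical'

    def worker(start):
        # Each thread takes every num_threads-th iteration of range(n).
        for i in range(start, n, num_threads):
            with lock:
                total[0] += i

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


if __name__ == "__main__":
    print(locked_sum(1000))
```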

Atomic and barriers:

with openmp('atomic'): i += j

#pragma omp atomic
i += j;

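A barrier cannot be placed inside a work-sharing loop, so it goes in a 
plain parallel region where every thread runs the whole loop:

with openmp('parallel', private=(i,), shared=(n,)):
    for i in range(n):
        openmp('barrier')

#pragma omp parallel \
private(i) \
shared(n)
{
    for (i=0; i<n; i++) {
        #pragma omp barrier
    }
}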
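
The effect of a barrier can be sketched with threading.Barrier from the 
Python standard library: no thread starts the next iteration until all 
threads have finished the current one. This is an analogy only, and 
run_phases is a hypothetical name.

```python
import threading


def run_phases(num_threads=4, n=3):
    barrier = threading.Barrier(num_threads)
    log = []
    log_lock = threading.Lock()

    def worker():
        for i in range(n):
            with log_lock:
                log.append(i)   # record which iteration this thread is in
            barrier.wait()      # wait until every thread has finished i

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_phases())
```

Because of the barrier, all entries for one iteration appear in the log 
before any entry for the next.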

That is my suggestion. It is easy to implement, as you don't need to learn 
OpenMP first (not that it is difficult).

Sturla Molden