[Numpy-discussion] Fast threading solution thoughts
Sturla Molden
sturla at molden.no
Thu Feb 12 09:27:51 EST 2009
On 2/12/2009 12:34 PM, Dag Sverre Seljebotn wrote:
> FYI, I am one of the core Cython developers and can make such
> modifications in Cython itself as long as there's consensus on how it
> should look on the Cython mailing list. My problem is that I don't
> really know OpenMP and have little experience with it, so I'm not the
> best person for creating a draft for what such high-level OpenMP
> constructs should look like in Cython.
I don't know the Cython internals, but I do know OpenMP. I mostly use it
with Fortran.
The question is: should OpenMP directives be comments in the Cython code
(like the !$omp comment lines in Fortran, or the #pragma lines in C), or
should they be special objects?
As for the GIL: no, I don't think nogil should be implied. But Python
objects should only be allowed as shared variables. Synchronization is
then done as usual for shared variables in OpenMP (#pragma omp critical).
Here is my suggestion for the syntax. If you follow a consistent
translation scheme, you don't need to know OpenMP in detail:
with openmp('parallel for', argument=iterable, ...):
    --> insert pragma directly above the for loop
with openmp(directive, argument=iterable, ...):
    --> insert pragma and braces around the block
with openmp('atomic'): --> insert pragma directly
openmp('barrier') --> insert pragma directly
This, by the way, covers all of OpenMP. This is how it should translate:
with openmp('parallel for', private=(i,), shared=(n,),
            schedule='dynamic'):
    for i in range(n):
        pass
Compiles to:
#pragma omp parallel for \
        private(i) \
        shared(n) \
        schedule(dynamic)
for (i=0; i<n; i++) {
    /* whatever */
}
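The translation rules above are mechanical enough to sketch in a few lines of Python. This is a minimal sketch, assuming clause values are given as variable names (strings or tuples of strings) and that a reduction is written as (operator, name, ...); openmp_pragma, translate, NO_BRACES and STANDALONE are hypothetical helpers for illustration, not part of Cython or any OpenMP tool:

```python
# Directives whose pragma attaches to the next statement, so no braces are added.
NO_BRACES = {"parallel for", "for", "atomic"}
# Directives that stand alone and have no body at all.
STANDALONE = {"barrier"}


def openmp_pragma(directive, **clauses):
    """Render openmp(directive, **clauses) as a single '#pragma omp' line."""
    parts = ["#pragma omp", directive]
    for name, value in clauses.items():  # keyword order is preserved (Python 3.6+)
        if name == "reduction":
            op, *names = value  # e.g. ("+", "k") -> reduction(+:k)
            parts.append(f"reduction({op}:{','.join(names)})")
        elif isinstance(value, (tuple, list)):
            parts.append(f"{name}({','.join(value)})")  # e.g. private(i,j)
        else:
            parts.append(f"{name}({value})")  # e.g. schedule(dynamic)
    return " ".join(parts)


def translate(directive, body, **clauses):
    """Return the C lines for one openmp(...) construct wrapping `body`."""
    pragma = openmp_pragma(directive, **clauses)
    if directive in STANDALONE:
        return [pragma]  # openmp('barrier'): pragma only
    if directive in NO_BRACES:
        return [pragma] + body  # pragma directly above the statement
    # Any other directive gets its own block in braces.
    return [pragma, "{"] + ["    " + line for line in body] + ["}"]
```

For example, translate('atomic', ['i += j;']) returns ['#pragma omp atomic', 'i += j;'], and translate('barrier', []) returns ['#pragma omp barrier']. The sketch emits each pragma on one line; the backslash continuations in the example above are only for readability.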
with openmp('parallel sections',
            reduction=('+',k), private=(i,j)):
    with openmp('section'):
        i = foobar()
        k += i
    with openmp('section'):
        j = foobar()
        k += j
Compiles to:
#pragma omp parallel sections \
        reduction(+:k) \
        private(i,j)
{
    #pragma omp section
    {
        i = foobar();
        k += i;
    }
    #pragma omp section
    {
        j = foobar();
        k += j;
    }
}
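The reduction(+:k) clause gives each thread a private copy of k initialised to 0 (the identity for +) and adds those copies into the original k when the region ends. Here is a pure-Python analogue using the standard threading module, offered as an illustration of the semantics rather than what Cython would generate; foobar is a hypothetical stand-in returning a fixed value, and parallel_sections_sum is an illustrative name:

```python
import threading


def foobar():
    return 21  # hypothetical stand-in for the foobar() in the example


def parallel_sections_sum(sections, k=0):
    """Run each section in its own thread and combine results as reduction(+:k) would."""
    private_k = [0] * len(sections)  # one private copy per section, starting at 0

    def run(idx, section):
        private_k[idx] += section()  # each thread touches only its own copy

    threads = [
        threading.Thread(target=run, args=(idx, section))
        for idx, section in enumerate(sections)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return k + sum(private_k)  # combine step at the end of the region


k = parallel_sections_sum([foobar, foobar])
```

Because every thread writes only its own slot, no lock is needed until the single combine step after all threads have joined.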
With Python objects, the programmer must synchronize access:
with openmp('parallel for', shared=(pyobj,n), private=(i,)):
    for i in range(n):
        with openmp('critical'):
            pyobj += i
Compiles to:
#pragma omp parallel for \
        shared(pyobj,n) \
        private(i)
for (i=0; i<n; i++) {
    #pragma omp critical
    {
        pyobj += i;
    }
}
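The same discipline applies in plain Python threads: even with the GIL, total += i on a shared object is a read-modify-write that another thread can interleave with, so it needs a lock. In this sketch threading.Lock stands in for an unnamed omp critical section; the round-robin split of range(N) and the names N, NUM_THREADS and worker are illustrative assumptions:

```python
import threading

N = 1000
NUM_THREADS = 4
total = 0                    # shared Python object, like pyobj
critical = threading.Lock()  # plays the role of '#pragma omp critical'


def worker(tid):
    global total
    for i in range(tid, N, NUM_THREADS):  # this thread's share of range(N)
        with critical:  # one thread at a time updates the shared object
            total += i


threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock the final total could come out too small on some runs; with it, the result always equals sum(range(N)).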
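Atomic and barriers:
with openmp('atomic'): i += j
Compiles to:
#pragma omp atomic
i += j;
A barrier may not be nested directly inside a worksharing loop, so it
goes in a plain parallel region (and in C, default() only accepts shared
or none):
with openmp('parallel', default='none', shared=(n,), private=(i,)):
    for i in range(n):
        pass  # every thread runs the whole loop
    openmp('barrier')
    pass  # reached only after all threads finish the loop
Compiles to:
#pragma omp parallel \
        default(none) \
        shared(n) \
        private(i)
{
    for (i=0; i<n; i++) {
        /* every thread runs the whole loop */
    }
    #pragma omp barrier
    /* reached only after all threads finish the loop */
}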
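A barrier means that no thread continues until every thread in the team has reached it. Python's threading.Barrier gives the same guarantee, so this small analogue (names are illustrative, and it is not what Cython would generate) checks that after the barrier every thread sees all of the work done before it:

```python
import threading

NUM_THREADS = 4
barrier = threading.Barrier(NUM_THREADS)  # analogue of '#pragma omp barrier'
phase1_done = [False] * NUM_THREADS       # work done before the barrier
seen_all = [False] * NUM_THREADS          # what each thread observes after it


def worker(tid):
    phase1_done[tid] = True          # this thread's work before the barrier
    barrier.wait()                   # block until all NUM_THREADS threads arrive
    seen_all[tid] = all(phase1_done)  # every thread now sees all of phase 1


threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```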
That is my suggestion. It should be easy to implement, as you don't need
to learn OpenMP first (not that it is difficult).
Sturla Molden