[Cython] OpenMP support

mark florisson markflorisson88 at gmail.com
Tue Mar 8 18:00:18 CET 2011


On 8 March 2011 17:38, Sturla Molden <sturla at molden.no> wrote:
> Den 08.03.2011 17:10, skrev mark florisson:
>>
>> But how useful is it to parallelize CPU-bound code while holding the
>> GIL? Or do you mean to run the CPU-intensive section in a 'with nogil'
>> block and reacquire the GIL only when you need to do locking or to
>> deal with Python objects?
>
> The Python C API is not thread-safe, so we cannot allow concurrent access
> to it from multiple threads.
>
> It does not matter whether we use the GIL or something else as the mutex. A
> user-space spinlock would probably be faster than kernel objects like the
> GIL. But that is just an implementation detail.

Right, but if you want to deal with Python objects, then you can't
just use a lock other than the GIL, because there might still be other
Python threads. But perhaps you were talking about non-Python-object
synchronization in nogil blocks.

>> The point of OpenMP is convenience, i.e., having your CPU-bound
>> algorithm parallelized with just a few annotations. If you rewrite
>> your code as a closure for, say, a parallel for construct, you'd have
>> to call your closure at every iteration.
>
> No, that would be hidden away with a decorator.
>
> for i in range(n):
>     <suite>
>
> becomes
>
> @openmp.parallel_range(n)
> def loop(i):
>     <suite>
>
Sure, but that's not what I was hinting at. What I meant was that the
wrapper returned by the decorator would have to call the closure for
every iteration, which introduces function-call overhead.
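
To make that overhead concrete, here is a minimal pure-Python sketch of such
a decorator, built on plain threads rather than real OpenMP (parallel_range,
its num_threads argument and the strided chunking are all made up for
illustration):

    import threading

    def parallel_range(n, num_threads=4):
        """Hypothetical stand-in for openmp.parallel_range, built on threads."""
        def decorator(loop_body):
            def worker(offset):
                # Each thread walks its own slice of 0..n-1 and has to call
                # the closure once per iteration -- that per-iteration call is
                # the overhead a native OpenMP 'parallel for' would not pay.
                for i in range(offset, n, num_threads):
                    loop_body(i)
            threads = [threading.Thread(target=worker, args=(t,))
                       for t in range(num_threads)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            return loop_body
        return decorator

    @parallel_range(1000)
    def loop(i):
        pass  # <suite>

(Under CPython the GIL still serializes the Python-level work here, so the
sketch only shows where the per-iteration call sits; it says nothing about
actual speedup.)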

>
>> And then you still have to
>> take care of any reduction and corresponding synchronization (unless
>> you have the GIL already). And then there's still the issue of
>> ordered, single and master constructs.
>
> Yes, and this is not difficult to implement. Ordered can be implemented with
> a Queue; master is just a check on the thread id. Single can be implemented
> with an atomic CAS operation. This is just a line or two of library code each.
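
For what it's worth, a rough pure-Python sketch of the master and single
helpers described above might look like this (is_master and Single.try_enter
are invented names, ordered is left out, and real library code would use an
atomic CAS where a lock stands in here):

    import threading

    # The spawning thread stands in for OpenMP's thread 0 in this sketch.
    _master_ident = threading.current_thread().ident

    def is_master():
        # 'master' construct: just a check on the thread id.
        return threading.current_thread().ident == _master_ident

    class Single(object):
        # 'single' construct: the first thread to arrive runs the block,
        # the rest skip it.  In C this would be one atomic compare-and-swap.
        def __init__(self):
            self._done = False
            self._lock = threading.Lock()

        def try_enter(self):
            with self._lock:
                if self._done:
                    return False
                self._done = True
                return True

Inside a worker, "if is_master(): ..." or "if some_single.try_enter(): ..."
would then guard the corresponding block.
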
>>
>> Of course, using threads manually is always possible, it's just not
>> very convenient.
>
> No, it's not, but I am not talking about that. I am talking about how best
> to map OpenMP to Python.
>
> A closure is one method; another that might be possible is a context manager
> (with-statement). I am not sure whether this would be doable:
>
> with OpenMP( private=(i,j,k), shared=(x,y,z) ) as openmp:
>     <suite>
>
> instead of #pragma omp parallel.
>
> But should we care if this is implemented with OpenMP or Python threads?
> It's just an implementation detail in the library, not visible to the user.

Indeed. I guess we just have to establish what we want to do: do we
want to support code with Python objects (and exceptions, etc.), or just
C code written in Cython? If it's the latter, then I still think using
OpenMP directly would be easier to implement and more convenient for
the user than decorators with closures, but maybe I'm too focused on
OpenMP. And indeed, the syntax I proposed (apart from maybe
cython.openmp.(p)range) does not look very attractive.
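
Purely to make that last spelling concrete, a prange-based loop might look
something like this (a hypothetical sketch: neither cython.openmp nor prange
exists, and the exact names and reduction handling are precisely what would
have to be designed):

    from cython.openmp import prange   # hypothetical module and name

    def total(n):
        s = 0
        # The compiler would turn this loop into an OpenMP 'parallel for'
        # and would have to treat s as a reduction variable.
        for i in prange(n, nogil=True):
            s += i
        return s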

> Also, I am not against OpenMP; I use it all the time in Fortran :-)
>
> Another problem with using OpenMP inside the compiler, as opposed to an
> external library, is that it depends on a stable ABI. If an ABI change is
> made to Cython's generated C code, even a minor one, OpenMP support will
> be broken.
>
>
> Sturla

