[Cython] prange CEP updated

Wed Apr 13 21:57:07 CEST 2011

On 04/13/2011 09:31 PM, mark florisson wrote:
> On 5 April 2011 22:29, Dag Sverre Seljebotn<d.s.seljebotn at astro.uio.no>  wrote:
>> I've done a pretty major revision to the prange CEP, bringing in a lot of
>> the feedback.
>>
>> Thread-private variables are now split in two cases:
>>
>>   i) The safe cases, which really require very little technical knowledge ->
>> automatically inferred
>>
>>   ii) As an advanced feature, unsafe cases that require some knowledge of
>> threading ->  must be explicitly declared
>>
>> I think this split simplifies things a great deal.
>>
>> I'm rather excited over this now; this could turn out to be a really
>> user-friendly and safe feature that would not only allow us to support
>> OpenMP-like threading, but be more convenient to use in a range of common
>> cases.
>>
>> http://wiki.cython.org/enhancements/prange
>>
>> Dag Sverre
>>
>
> If we want to support cython.parallel.threadsavailable outside of
> parallel regions (which does not depend on the schedule used for
> worksharing constructs!), then we have to disable dynamic scheduling.
> For instance, if OpenMP sees some OpenMP threads are already busy,
> then with dynamic scheduling it dynamically establishes how many
> threads to use for any parallel region.
> So basically, if you put omp_get_num_threads() in a parallel region,
> you have a race when you depend on that result in a subsequent
> parallel region, because the number of busy OpenMP threads may have
> changed.

Ah, I don't know why I thought there wouldn't be a race condition. I
wonder if the whole threadsavailable() idea should just be ditched and
we should think of something else. It's not a very common use case.
Starting to disable some forms of scheduling just to, essentially,
shoehorn in one particular syntax doesn't seem like the way to go.

Perhaps this calls for support for the critical(?) block then, after
all. I'm at least +1 on dropping threadsavailable() and instead requiring
that you call numthreads() in a critical block:

with parallel:
     with critical:
         # call numthreads() and allocate global buffer
         # calling threadid() not allowed, if we can manage that
     # get buffer slice for each thread

> So basically, to make threadsavailable() work outside parallel
> regions, we'd have to disable dynamic scheduling (omp_set_dynamic(0)).
> Of course, when OpenMP cannot request the number of threads desired
> (because they are bounded by a configurable thread limit (and the OS
> of course)), the behaviour will be implementation defined. So then we
> could just put a warning in the docs for that, and users can check for
> this in the parallel region using threadsavailable() if it's really
> important.

Do you have any experience with what actually happens with, say, GNU
OpenMP? I blindly assumed from the specs that it was an error condition 
("flag an error any way you like"), but I guess that may be wrong.

Just curious; I think we can just fall back to OpenMP behaviour, unless
it terminates the interpreter in an error condition, in which case we
should look into how expensive it is to check for the condition up front...

Dag Sverre