[Cython] prange CEP updated

Thu Apr 14 21:58:52 CEST 2011

On 14 April 2011 21:37, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
> On 04/14/2011 09:08 PM, mark florisson wrote:
>>
>> On 14 April 2011 20:58, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>>>
>>> On 04/14/2011 08:42 PM, mark florisson wrote:
>>>>
>>>> On 14 April 2011 20:29, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>>>>>
>>>>> On 04/13/2011 11:13 PM, mark florisson wrote:
>>>>>>
>>>>>> Although there is omp_get_max_threads():
>>>>>>
>>>>>> "The omp_get_max_threads routine returns an upper bound on the number
>>>>>> of threads that could be used to form a new team if a parallel region
>>>>>> without a num_threads clause were encountered after execution returns
>>>>>> from this routine."
>>>>>>
>>>>>> So we could have threadsavailable() evaluate to that if encountered
>>>>>> outside a parallel region. Inside, it would evaluate to
>>>>>> omp_get_num_threads(). At worst, people would over-allocate a bit.
>>>>>
>>>>> Well, over-allocating could well mean 1 GB, which could well mean
>>>>> getting an unnecessary MemoryError (or, as in my case, if I'm not
>>>>> careful to set ulimit, getting a SIGKILL sent to you 2 minutes after
>>>>> the fact by the cluster patrol process...)
>>>>>
>>>>> But even ignoring this, we also have to plan for people misusing the
>>>>> feature. If we put it in there, somebody somewhere *will* write code
>>>>> like this:
>>>>>
>>>>> nthreads = threadsavailable()
>>>>> with parallel:
>>>>>    for i in prange(nthreads):
>>>>>        for j in range(100*i, 100*(i+1)): [...]
>>>>>
>>>>> (Yes, they shouldn't. Yes, they will.)
>>>>>
>>>>> Combined with a race condition that will only very seldom trigger,
>>>>> this starts to sound like a very bad idea indeed.
>>>>>
>>>>> So I agree with you that we should just leave it for now, and do
>>>>> single/barrier later.
>>>>
>>>> omp_get_max_threads() doesn't have a race, as it returns the upper
>>>> bound. So e.g. if between your call and your parallel section fewer
>>>> OpenMP threads become available, then you might get fewer threads, but
>>>> never more.
>>>
>>> Oh, now I'm following you.
>>>
>>> Well, my argument was that I think erring in that direction is pretty
>>> bad as well.
>>>
>>> Also, even if we're not making it available in cython.parallel, we're not
>>> stopping people from calling omp_get_max_threads directly themselves,
>>> which should be OK for the people who know enough to do this safely...
>>
>> True, but it wouldn't be as easy to wrap in a #ifdef _OPENMP. In any
>> event, we could just put a warning in the docs stating that using
>> threadsavailable outside parallel sections returns an upper bound on
>> the actual number of threads in a subsequent parallel section.
>
> I don't think outside or within makes a difference -- what about nested
> parallel sections? At least my intention in the CEP was that
> threadsavailable was always for the next section (so often it would be 1
> after entering the section).
>
> Perhaps just calling it "maxthreads" instead solves the issue.
>
> (Still, I favour just dropping threadsavailable/maxthreads for the time
> being. It is much simpler to add something later, when we've had some time
> to use it and reflect about it, than to remove something that shouldn't have
> been added.)
>
> Dag Sverre

Definitely true, I'll disable it for now.
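
For reference, a minimal C sketch of the two quantities discussed above, assuming a plain OpenMP build. The helper names max_threads(), team_size() and observe_team() are hypothetical and are not part of the CEP or of cython.parallel; only omp_get_max_threads() and omp_get_num_threads() are real OpenMP calls. The #ifdef _OPENMP guards show the kind of wrapping mentioned in the thread, so the same file still compiles (and reports 1 thread) without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: upper bound on the team size of the next
 * parallel region that has no num_threads clause. */
static int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1; /* no OpenMP: everything runs on one thread */
#endif
}

/* Hypothetical helper: number of threads in the current team.
 * Outside any parallel region this is 1. */
static int team_size(void)
{
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

/* Hypothetical helper: record the bound seen before the parallel
 * region and the team size actually obtained inside it. */
static void observe_team(int *bound, int *actual)
{
    *bound = max_threads();
    *actual = 0;

    #pragma omp parallel
    {
        /* Exactly one thread of the team stores the team size. */
        #pragma omp single
        *actual = team_size();
    }
}
```

Calling observe_team(&bound, &actual) should give actual <= bound: the team may be smaller than the value read beforehand, but not larger, so buffers sized from the bound can only be over-allocated.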