[Cython] cython.parallel tasks, single, master, critical, barriers

mark florisson markflorisson88 at gmail.com
Sun Oct 9 15:44:05 CEST 2011


On 9 October 2011 14:39, mark florisson <markflorisson88 at gmail.com> wrote:
> On 9 October 2011 14:30, mark florisson <markflorisson88 at gmail.com> wrote:
>> On 9 October 2011 13:57, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no> wrote:
>>> On 10/09/2011 02:18 PM, Dag Sverre Seljebotn wrote:
>>>>
>>>> On 10/09/2011 02:11 PM, mark florisson wrote:
>>>>>
>>>>> Hey,
>>>>>
>>>>> So far people have been enthusiastic about the cython.parallel features,
>>>>> so I think we should introduce some new ones. I propose the following,
>>>>
>>>> Great!!
>>>>
>>>> I only have time for a very short feedback now, perhaps more will follow.
>>>>
>>>>> assume parallel has been imported from cython:
>>>>>
>>>>> with parallel.master():
>>>>>     this is executed in the master thread in a parallel (non-prange)
>>>>>     section
>>>>>
>>>>> with parallel.single():
>>>>>     same as master, except any thread may do the execution
>>>>>
>>>>> An optional keyword argument 'nowait' specifies whether there will be a
>>>>> barrier at the end. The default is to wait.
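>>>>>
>>>>> As a rough usage sketch (assuming the proposed names and the 'nowait'
>>>>> spelling; printf comes from libc.stdio):
>>>>>
>>>>> from libc.stdio cimport printf
>>>>>
>>>>> with nogil, parallel.parallel():
>>>>>     with parallel.master():
>>>>>         # only the master thread executes this block
>>>>>         printf("set up by the master thread\n")
>>>>>     # barrier here by default; the proposed nowait=True would skip it
>>>>>
>>>>>     with parallel.single(nowait=True):
>>>>>         # exactly one (arbitrary) thread executes this, no barrier after
>>>>>         printf("done by exactly one thread\n")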
>>>
>>> I like
>>>
>>> if parallel.is_master():
>>>    ...
>>> explicit_barrier_somehow() # see below
>>>
>>> better as a Pythonization. One could easily support using is_master in
>>> other contexts as well, simply by assigning a status flag in the master
>>> block.
>>>
>>> Using an if-test flows much better with Python I feel, but that naturally
>>> leads to making the barrier explicit. But I like the barrier always being
>>> explicit, rather than having it as a predicate on all the different
>>> constructs like in OpenMP....
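>>>
>>> For instance (just a sketch of the suggested spelling, writing the
>>> explicit barrier as parallel.barrier() for concreteness; printf is
>>> from libc.stdio):
>>>
>>> from libc.stdio cimport printf
>>>
>>> with nogil, parallel.parallel():
>>>     if parallel.is_master():
>>>         printf("master sets up shared state here\n")
>>>     parallel.barrier()   # explicit: everyone waits for the master
>>>     printf("all threads continue from a common point\n")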
>>
>> Hmm, that might mean you also want the barrier for a prange in a
>> parallel to be explicit. I like the 'if' test though, although it
>> wouldn't make sense for 'single'.
>>
>>> I'm less sure about single, since making it a function indicates one could
>>> use it in other contexts and the whole thing becomes too magic (since it's
>>> tied to the position of invocation). I'm tempted to suggest
>>>
>>> for _ in prange(1):
>>>    ...
>>>
>>> as our syntax for single.
>>
>> I think that syntax is absolutely terrible :) Perhaps single is not so
>> important and one can just use master instead (or, if really needed,
>> master + a task with the actual work).
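>>
>> I.e. something like (a sketch using the proposed syntax; do_the_work()
>> is just a placeholder):
>>
>> with parallel.master():
>>     with parallel.task():
>>         do_the_work()   # picked up by whichever thread takes the task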
>>
>>>>>
>>>>> with parallel.task():
>>>>>     create a task to be executed by some thread in the team;
>>>>>     once a thread takes up the task it shall only be executed by that
>>>>>     thread and no other thread (so the task will be tied to the thread)
>>>>>
>>>>>     C variables will be firstprivate
>>>>>     Python objects will be shared
>>>>>
>>>>> parallel.taskwait() # wait on any direct descendant tasks to finish
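>>>>>
>>>>> For example (a sketch of the proposed syntax; process() is a placeholder
>>>>> nogil function):
>>>>>
>>>>> cdef int i, n = 100
>>>>>
>>>>> with nogil, parallel.parallel():
>>>>>     with parallel.single():
>>>>>         for i in range(n):
>>>>>             with parallel.task():
>>>>>                 process(i)      # i is firstprivate, each task sees its own value
>>>>>         parallel.taskwait()     # wait for the tasks created above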
>>>>
>>>> Regarding tasks, I think this is mapping OpenMP too closely onto Python.
>>>> Closures are excellent for the notion of a task, so I think something
>>>> based on the futures API would work better. I realize that makes the
>>>> mapping to OpenMP and the implementation a bit more difficult, but I think
>>>> it is worth it in the long run.
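>>>>
>>>> Purely as a hypothetical sketch of that direction (submit() and result()
>>>> are made-up names, loosely following concurrent.futures; process() and
>>>> do_other_work() are placeholders):
>>>>
>>>> with parallel.parallel():
>>>>     fut = parallel.submit(process, 10)   # hand a closure/task to the team
>>>>     do_other_work()
>>>>     res = fut.result()                   # wait for just this task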
>>>>
>>>>>
>>>>> with parallel.critical():
>>>>>     this section of code is mutually exclusive with other critical sections
>>>>>     optional keyword argument 'name' specifies a name for the critical
>>>>>     section, which means all sections with that name will exclude each
>>>>>     other, but not critical sections with different names
>>>>>
>>>>>     Note: all threads that encounter the section will execute it, just
>>>>>     not at the same time
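>>>>>
>>>>> For example (a sketch of the proposed syntax; compute_part() is a
>>>>> placeholder nogil function, printf is from libc.stdio):
>>>>>
>>>>> from libc.stdio cimport printf
>>>>>
>>>>> cdef double result
>>>>>
>>>>> with nogil, parallel.parallel():
>>>>>     result = compute_part()              # per-thread work
>>>>>     with parallel.critical(name='report'):
>>>>>         # only one thread at a time enters sections named 'report'
>>>>>         printf("thread %d: %f\n", parallel.threadid(), result)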
>>>
>>> Yes, this works well as a with-statement...
>>>
>>> ...except that it is slightly magic in that it binds to call position (unlike
>>> anything in Python). I.e. this would be more "correct", or at least
>>> Pythonic:
>>>
>>> with parallel.critical(__file__, __line__):
>>>    ...
>>>
>>
>> I'm not entirely sure what you mean here. Critical is really about the
>> block contained within, not about a position in a file. Not all
>> threads have to encounter the critical region, and not specifying a
>> name means you exclude with *all other* unnamed critical sections (not
>> just this one).
>>
>>>>>
>>>>> with parallel.barrier():
>>>>>     all threads wait until everyone has reached the barrier
>>>>>     either no one or everyone should encounter the barrier
>>>>>     shared variables are flushed
>>>
>>> I have problems with requiring a no-op with block...
>>>
>>> I'd much rather write
>>>
>>> parallel.barrier()
>>
>> In OpenMP it doesn't have any associated code, but we could give it
>> those semantics: apply the barrier at the end of the block of code.
>> The downside is that the barrier is written at the top while it only
>> takes effect on leaving the block; you would write:
>>
>> with parallel.barrier():
>>    if rand() > .5:
>>        ...
>>    else:
>>        ...
>> # the barrier is here
>>
>>> However, that ties a function call to the place of invocation, and suggests
>>> that one could do
>>>
>>> if rand() > .5:
>>>    barrier()
>>> else:
>>>    i += 3
>>>    barrier()
>>>
>>> and have the same barrier in each case. Again,
>>>
>>> barrier(__file__, __line__)
>>>
>>> gets us purity at the cost of practicality.
>>
>> In this case (unlike the critical construct), yes. I think a warning
>> in the docs stating that either all or none of the threads must
>> encounter the barrier should suffice.
>>
>>> Another way is the pthreads
>>> approach (although one may have to use pthreads rather than OpenMP to get it,
>>> unless there are named barriers?):
>>>
>>> barrier_a = parallel.barrier()
>>> barrier_b = parallel.barrier()
>>> with parallel:
>>>    barrier_a.wait()
>>>    if rand() > .5:
>>>        barrier_b.wait()
>>>    else:
>>>        i += 3
>>>        barrier_b.wait()
>>>
>>>
>>> I'm really not sure here.
>>
>> I think we should really just say to the user: "don't do this". There
>> are no named barriers, and implementing this wouldn't be easy at all (in
>> fact, I'm not sure you can specify sane semantics for this if you have
>> more branches and some do not contain the same barrier). The block
>> structure for barriers would help here, as such blocks are inconvenient
>> to write:
>>
>> if C:
>>    with barrier(): ...
>> else:
>>    with barrier(): ...
>>
>> is just not nice to write; you would instead write
>>
>> with barrier():
>>    if C:
>>        ...
>>    else:
>>        ...
>
> This would also allow one to write
>
> with barrier(), master():
>    ...
>
> Basically it's up to the user to use it sensibly. Usually you want a
> barrier to ensure that you have a well-defined state set by some code.
> One could (correctly) put only the last line of such code in the with
> block, but it would make more sense to put all the associated code in
> there.
>
> If there isn't really any associated code, you could just put 'pass'
> in the block.
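>
> E.g. simply
>
> with parallel.barrier():
>     pass   # a pure synchronization point, with no associated code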
>
> Does that make sense? I haven't even convinced myself of it yet.
>
>>>>>
>>>>> Unfortunately, gcc again manages to horribly break master and single
>>>>> constructs in loops (versions 4.2 through 4.6), so I suppose I'll
>>>>> first file a bug report. Other (better) compilers like Portland (and I'm
>>>>> sure Intel) work fine. I suppose a warning in the documentation will
>>>>> suffice there.
>>>>>
>>>>> If we at some point implement vector/SIMD operations we could also try
>>>>> out the Fortran OpenMP workshare construct.
>>>>
>>>> I'm starting to teach myself OpenCL as part of a course. It's very neat
>>>> for some kinds of parallelism. What I'm saying is that, at least in the
>>>> case of SIMD, we should not lock ourselves into Fortran+OpenMP thinking
>>>> too early, but also look forward to coming architectures (e.g., AMD's
>>>> GPU-and-CPU-on-same-die design).
>>>>
>>>> Dag Sverre
>
> Of course, a 'with barrier():' means you can apply it anywhere:
>
> with parallel():
>    lots of code
>
>    with barrier():
>        single line of code
>
> But the trick for readable programs would be to find the section of code that is
>

It seems I didn't finish my last mail. I wanted to say that in readable
programs you would try to find a logical block of code which you're
synchronizing on with the barrier.

