[Numpy-discussion] Improving Python+MPI import performance

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Sat Jan 14 02:21:32 EST 2012


On 01/14/2012 12:28 AM, Sturla Molden wrote:
> On 13.01.2012 22:42, Sturla Molden wrote:
>> On 13.01.2012 22:24, Robert Kern wrote:
>>> Do these systems have a ramdisk capability?
>> I assume you have seen this as well :)
>>
>> http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf
>>
>
> This paper also repeats a common mistake about the GIL:
>
> "A future challenge is the increasing number of CPU cores per node,
> which is normally addressed by hybrid thread and message passing based
> parallelization. Whereas message passing can be used transparently
> both on Python and C level, the global interpreter lock in CPython
> limits the thread based parallelization to the C-extensions only. We are
> currently investigating hybrid OpenMP/MPI implementation with the hope
> that limiting threading to only C-extension provides enough performance."
>
> This is NOT true.
>
> Python threads are native OS threads. They can be used for parallel
> computing on multi-core CPUs. The only requirement is that the Python
> code calls a C extension that releases the GIL. We can use threads in C
> or Python code: OpenMP and threading.Thread perform equally well, but if
> we use threading.Thread the GIL must be released for parallel execution.
> OpenMP is typically better for fine-grained parallelism in C code and
> threading.Thread is better for coarse-grained parallelism in Python
> code. The latter is also where mpi4py and multiprocessing can be used.
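
For reference, a minimal sketch of that pattern in plain Python, assuming
the numpy.dot call hands the work to a BLAS routine that releases the GIL
(the array sizes and thread count here are arbitrary):

    import threading
    import numpy as np

    a = np.random.rand(1500, 1500)
    b = np.random.rand(1500, 1500)
    results = [None, None]

    def work(i):
        # np.dot runs in compiled code; if the GIL is dropped there,
        # the two threads can execute on separate cores at the same time.
        results[i] = np.dot(a, b)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()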

I don't see how you contradict their statement. The only code that can 
run without the GIL is in C-extensions (even if it is written in, say, 
Cython).
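
A quick sketch of that point: the same threading pattern wrapped around a
pure-Python loop, where the interpreter holds the GIL while executing
bytecode, so the two threads take turns rather than running in parallel:

    import threading

    def py_work(n=2000000):
        # Pure Python bytecode: the GIL is held for the whole loop,
        # so threading gives concurrency but no parallel speedup here.
        total = 0
        for i in range(n):
            total += i
        return total

    threads = [threading.Thread(target=py_work) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()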

Dag Sverre


