Parallelization in Python 2.6

sturlamolden sturlamolden at yahoo.no
Wed Aug 19 08:16:17 EDT 2009


On 18 Aug, 11:19, Robert Dailey <rcdai... at gmail.com> wrote:

> I'm looking for a way to parallelize my python script without using
> typical threading primitives. For example, C++ has pthreads and TBB to
> break things into "tasks".

In C++, parallelization without "typical threading primitives" usually
means one of three things:

- OpenMP pragmas
- the POSIX function fork(), unless you are using Windows
- MPI

In Python, you have the function os.fork and wrappers for MPI, and
they are used much as in C++. With os.fork, I like to use a context
manager, putting the calls to fork in __enter__ and the calls to
sys.exit in __exit__. Then I can just write code like this:

with parallel():
   # parallel block here
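
That parallel() is of course not a library function; it is a dozen
lines of code. A rough sketch of what I mean (POSIX only, no error
handling, worker count passed as an argument) could look like this:

import os, sys

class parallel(object):
    """Fork-based context manager: call fork in __enter__,
    call sys.exit for the children in __exit__, and let the
    parent wait for them."""
    def __init__(self, nprocs=2):
        self.nprocs = nprocs
        self.rank = 0                 # 0 = the parent process
    def __enter__(self):
        for rank in range(1, self.nprocs):
            if os.fork() == 0:        # child: stop forking, run the block
                self.rank = rank
                break
        return self
    def __exit__(self, *exc_info):
        if self.rank != 0:
            sys.exit(0)               # children leave the block here
        for rank in range(1, self.nprocs):
            os.wait()                 # the parent reaps its children
        return False

with parallel(4) as p:
    print("hello from process %d" % p.rank)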

You can also program in the same style as OpenMP using closures. Just
wrap whatever loop or block you want to execute in parallel in a
closure. It requires minimal editing of the serial code. Instead of

def foobar():
   for i in iterable:
       #whatever

you can add a closure (internal function) and do this:

def foobar():
   def section(): # add a closure
       for i in scheduled(iterable): # balance load
           #whatever
   parallel(section) # spawn off threads
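
Here parallel() and scheduled() are not library functions either,
just a few lines of Python. A rough sketch using threading and
round-robin load balancing (no error handling, and the two nthreads
defaults must agree) could be:

import threading, itertools

_worker = threading.local()               # which worker a thread is

def scheduled(iterable, nthreads=4):
    """Round-robin load balancing: worker k gets items
    k, k + nthreads, k + 2*nthreads, ... of the iterable
    (assumed here to be a sequence, e.g. a list)."""
    return itertools.islice(iterable, _worker.rank, None, nthreads)

def parallel(func, nthreads=4):
    """Run func in nthreads threads and wait for all of them."""
    def run(rank):
        _worker.rank = rank               # remember the worker index
        func()
    threads = [threading.Thread(target=run, args=(k,))
               for k in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Because of the GIL (more on that below), this only buys you anything
if the loop body spends its time in I/O or in C extensions that
release the GIL.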

Programs written in C++ are much more difficult to parallelize with
threads because C++ does not have closures. This is why pragma-based
parallelization (OpenMP) was invented:

#pragma omp parallel for private(i)
for (i=0; i<n; i++) {
   // whatever
}

You should know about the GIL. It prevents multiple threads from
using the Python interpreter simultaneously. For parallel computing,
this is a blessing and a curse. Only C extensions can release the
GIL; this includes the I/O routines in Python's standard library. If
the GIL is not released, the C library call is guaranteed to be
thread-safe, but the Python interpreter is blocked while waiting for
the call to return. If the GIL is released, parallelization works as
expected; you can also utilise multi-core CPUs (it is a common
misconception that Python cannot do this).
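
You can see this for yourself: time.sleep releases the GIL just like
the blocking I/O calls do, while a pure Python loop holds it. A toy
timing (the numbers are approximate, and the helper names are mine):

import time, threading

def wait_one_second():
    time.sleep(1.0)                       # blocking call, releases the GIL

def count_down(n=10**7):
    while n:                              # pure Python, holds the GIL
        n -= 1

def wall_time(func, nthreads=2):
    threads = [threading.Thread(target=func) for i in range(nthreads)]
    t0 = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - t0

print("2 x sleep:      %.2f s" % wall_time(wait_one_second))  # ~1 s, not 2
print("2 x count_down: %.2f s" % wall_time(count_down))       # no faster than serial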

What the GIL prevents you from doing is writing parallel compute-
bound code in "pure Python" using threads. Most likely, you don't
want to do this. There is a speed penalty on the order of 200x from
using pure Python over a C extension, so if you care enough about
speed to program for parallel execution, you should always use some
C. If you still want to do it, you can use processes instead
(os.fork, multiprocessing, MPI), as the GIL only affects threads
within a single process.
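
With the multiprocessing module (new in 2.6) the compute-bound case
looks something like this; the work() function is just a toy example
of mine, each call runs in a separate process with its own GIL:

import multiprocessing

def work(n):
    """A toy compute-bound function."""
    return sum(i * i for i in xrange(n))

if __name__ == '__main__':
    pool = multiprocessing.Pool()          # one worker process per core
    results = pool.map(work, [10**6] * 8)  # distributed over the processes
    pool.close()
    pool.join()
    print(results[0])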

It should be mentioned that compute-bound code is very rare, and
typically involves scientific computing. The only everyday example
is 3D graphics, and that is taken care of by the GPU and libraries
like OpenGL and Direct3D. Most parallel code you will want to write
is I/O bound. You can use the Python standard library and threads
for this, as the standard library releases the GIL whenever a
blocking call is made.
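
For example, fetching a handful of pages with threads works fine,
because urllib2 blocks in socket calls that release the GIL (the
URLs below are just placeholders):

import threading, urllib2

def fetch(url, results):
    # urlopen and read block on the network and release the GIL,
    # so the downloads really do overlap
    results[url] = len(urllib2.urlopen(url).read())

urls = ['http://www.python.org/', 'http://docs.python.org/']
results = {}
threads = [threading.Thread(target=fetch, args=(u, results))
           for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)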

I program Python for scientific computing daily (computational
neuroscience), and I have yet to find the GIL hindering my work. This
is because whenever I run into a computational bottleneck I cannot
solve with NumPy, putting that tiny piece of code in Fortran, C or
Cython involves very little work; 95% is still written in plain
Python. The human brain is bad at detecting computational
bottlenecks, though, so it almost always pays off to write everything
in Python first and use the profiler to locate the worst offenders.
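
The profiling itself is little more than a one-liner with cProfile
(this assumes your script has a main() function to profile):

import cProfile, pstats

cProfile.run('main()', 'profile.out')              # profile the whole run
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(10)     # the ten worst offenders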

Regards,
Sturla Molden


