Is this a bug in multiprocessing or in my script?

sturlamolden sturlamolden at yahoo.no
Wed Aug 5 01:21:20 EDT 2009


On Aug 5, 4:37 am, erikcw <erikwickst... at gmail.com> wrote:

> It's not always the same traceback, but they are always short like
> this.  I'm running Python 2.6.2 on Ubuntu 9.04.
>
> Any idea how I can debug this?

In my experience, multiprocessing is fragile. Scripts tend to fail for
no obvious reason, leave orphaned processes lingering, leak system-
wide resources, etc. For example, multiprocessing uses os._exit
to stop a spawned process, even though this inevitably results in
resource leaks on Linux (it should use sys.exit). Gaël Varoquaux and I
noticed this when we implemented shared-memory ndarrays for NumPy; we
consistently got memory leaks with System V IPC for no obvious reason.
Even after Jesse Noller was informed of the problem (about half a year
ago), the bug still lingers. It is easy to edit multiprocessing's
forking.py file on your own, but bugs like this are a pain in the ass,
and I suspect multiprocessing has many of them.

Of course, unless you show us your whole script, identifying the source
of your bug will be impossible. But it may very well be in
multiprocessing. The quality of this module is not impressive. I am
beginning to think that multiprocessing should never have made it into
the Python standard library. The GIL cannot be that bad! If you can't
stand the GIL, get a Unix (Mac, Linux, Cygwin) and use os.fork. Or
simply switch to a Python without a GIL: IronPython or Jython.
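
To make the os._exit point concrete, here is a minimal sketch of my own
(the shared-memory cleanup is only imagined here): anything registered
with atexit is silently skipped when a process dies through os._exit,
whereas a normal sys.exit shutdown runs it.

import atexit, os, sys

def cleanup():
    # imagine this detaches and removes a System V shared memory segment
    sys.stdout.write("cleaning up shared resources\n")

atexit.register(cleanup)

pid = os.fork()
if pid == 0:
    os._exit(0)       # child dies here; atexit handlers (cleanup) never run
else:
    os.waitpid(pid, 0)
    sys.exit(0)       # parent shuts down normally; cleanup() does run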

Allow me to show you something better. With os.fork we can write code
like this (a minimal two-process sketch; a real version would fork
several children and handle errors):

import os, sys

class parallel(object):

    def __enter__(self):
        # call os.fork; the parent gets the child's pid, the child gets 0
        self.child_pid = os.fork()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # call sys.exit in the child process and os.waitpid in the parent
        if self.child_pid == 0:
            sys.exit(0)
        os.waitpid(self.child_pid, 0)

    def __call__(self, iterable):
        # return different subsequences depending on child or parent status
        rank = 0 if self.child_pid else 1   # parent takes even items, child odd
        return (x for i, x in enumerate(iterable) if i % 2 == rank)


with parallel() as p:
    # parallel block starts here

    for item in p(iterable):
        # whatever

    # parallel block ends here

This makes parallel code a lot cleaner than anything you can do with
multiprocessing, and lets you use constructs similar to OpenMP.
Further, if you swap in a dummy 'parallel' context manager, you can
develop and test the algorithm serially. The only drawback is that
you have to use Cygwin to get os.fork on Windows, and forking will be
less efficient there (Cygwin's fork has no copy-on-write optimization).
Well, this is just one example of why Windows sucks from the
programmer's perspective. But it also shows that you can do much better
without using multiprocessing at all.
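
For instance, the dummy replacement could be as simple as this (my own
sketch, not part of any library); using serial() instead of parallel()
in the with-block exercises exactly the same loop body in one process:

class serial(object):
    # same interface as 'parallel', but everything runs in one process

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return False

    def __call__(self, iterable):
        return iter(iterable)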

The only case I can think of where multiprocessing would be useful
is CPU-bound code on Windows. But there you will almost always resort
to C extension modules anyway. For CPU-bound code, Python tends to give
you something like a 200x speed penalty over C. If you are resorting to
C anyway, you can just use OpenMP in C for your parallel processing. We
can thus forget about multiprocessing here as well, given that we have
access to the C code. If we don't, it is still very likely that the C
code releases the GIL, in which case we can get away with using Python
threads instead of multiprocessing.
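
To illustrate that last point with an example of my own (I am assuming
here that zlib.compress, being implemented in C, releases the GIL while
it compresses): plain Python threads are then enough to run the work in
parallel, no multiprocessing required.

import threading, zlib

def compress_chunks(chunks):
    # one thread per chunk; the C-level compression runs outside the GIL
    results = [None] * len(chunks)
    def worker(i, data):
        results[i] = zlib.compress(data)
    threads = [threading.Thread(target=worker, args=(i, chunk))
               for i, chunk in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results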

IMHO, if you are using multiprocessing, you are very likely to have a
design problem.

Regards,
Sturla




