Is this a bug in multiprocessing or in my script?

Jesse Noller jnoller at gmail.com
Wed Aug 5 09:40:57 EDT 2009


On Aug 5, 1:21 am, sturlamolden <sturlamol... at yahoo.no> wrote:
> On Aug 5, 4:37 am, erikcw <erikwickst... at gmail.com> wrote:
>
> > It's not always the same traceback, but they are always short like
> > this.  I'm running Python 2.6.2 on Ubuntu 9.04.
>
> > Any idea how I can debug this?
>
> In my experience, multiprocessing is fragile. Scripts tend to fail
> for no obvious reason, cause processes to be orphaned and linger,
> leak resources system-wide, etc. For example, multiprocessing uses
> os._exit to stop a spawned process, even though it inevitably
> results in resource leaks on Linux (it should use sys.exit). Gaël
> Varoquaux and I noticed this when we implemented shared memory
> ndarrays for numpy; we consistently got memory leaks with System V
> IPC for no obvious reason. Even after Jesse Noller was informed of
> the problem (about half a year ago), the bug still lingers. It is
> easy to edit multiprocessing's forking.py file on your own, but
> bugs like this are a pain in the ass, and I suspect multiprocessing
> has many of them. Of course, unless you show us your whole script,
> identifying the source of your bug will be impossible. But it may
> very well be in multiprocessing too. The quality of this module is
> not impressive. I am beginning to think that multiprocessing should
> never have made it into the Python standard library. The GIL cannot
> be that bad! If you can't stand the GIL, get a Unix (or Mac, Linux,
> Cygwin) and use os.fork. Or simply switch to a GIL-free Python:
> IronPython or Jython.
>
> Allow me to show you something better. With os.fork we can write code
> like this:
>
> import os, sys
>
> class parallel(object):
>
>     def __init__(self, nprocs=2):
>         self.nprocs = nprocs
>
>     def __enter__(self):
>         # fork nprocs-1 children; each process learns its rank
>         self.rank, self.pids = 0, []
>         for rank in range(1, self.nprocs):
>             pid = os.fork()
>             if pid == 0:            # in a child: stop forking
>                 self.rank, self.pids = rank, []
>                 break
>             self.pids.append(pid)   # in the parent: remember child
>         return self
>
>     def __exit__(self, exc_type, exc_value, traceback):
>         # call sys.exit in the child processes and
>         # os.waitpid in the parent
>         if self.rank != 0:
>             sys.exit(0 if exc_type is None else 1)
>         for pid in self.pids:
>             os.waitpid(pid, 0)
>
>     def __call__(self, iterable):
>         # yield a different sub-sequence depending on
>         # child or parent status (here: round-robin by rank)
>         for i, item in enumerate(iterable):
>             if i % self.nprocs == self.rank:
>                 yield item
>
> iterable = range(10)    # some example data
>
> with parallel() as p:
>     # parallel block starts here
>
>     for item in p(iterable):
>         pass  # whatever
>
>     # parallel block ends here
>
> This makes parallel code a lot cleaner than anything you can do
> with multiprocessing, allowing you to use constructs similar to
> OpenMP. Further, if you make 'parallel' a dummy context manager
> (see the sketch below), you can develop and test the algorithms
> serially. The only drawback is that you have to use Cygwin to get
> os.fork on Windows, and forking will be less efficient there (no
> copy-on-write optimization). Well, this is just one example of why
> Windows sucks from the programmer's perspective. But it also shows
> that you can do much better by not using multiprocessing at all.
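>
> A dummy version with the same interface might be as simple as this
> sketch:
>
> class serial(object):
>
>     def __enter__(self):
>         return self
>
>     def __exit__(self, exc_type, exc_value, traceback):
>         return False    # let exceptions propagate
>
>     def __call__(self, iterable):
>         return iter(iterable)   # the full sequence, unsplit
>
> Swap 'serial' in for 'parallel' and the same block runs in a
> single process.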
>
> The only case I can think of where multiprocessing would be useful
> is I/O-bound code on Windows. But there you will almost always
> resort to C extension modules. For I/O-bound code, Python tends to
> give you a 200x speed penalty over C. If you are resorting to C
> anyway, you can just use OpenMP in C for your parallel processing.
> We can thus forget about multiprocessing here as well, given that
> we have access to the C code. If we don't, it is still very likely
> that the C code releases the GIL, and we can get away with using
> Python threads instead of multiprocessing.
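>
> To illustrate the latter: blocking socket calls release the GIL,
> so plain threads overlap I/O-bound downloads just fine (the URLs
> here are hypothetical):
>
> import threading, urllib2
>
> def fetch(url):
>     urllib2.urlopen(url).read()
>
> urls = ["http://example.com/a", "http://example.com/b"]
> threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
> for t in threads: t.start()
> for t in threads: t.join()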
>
> IMHO, if you are using multiprocessing, you are very likely to have a
> design problem.
>
> Regards,
> Sturla

Sturla;

That bug was fixed unless I'm missing something. Also, patches and
continued bug reports are welcome.

jesse
