[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

Greg Brockman report at bugs.python.org
Tue Jul 27 11:05:38 CEST 2010


Greg Brockman <gdb at ksplice.com> added the comment:

> You can't have a sensible default timeout, because the worker may be
> processing something important...
In my case, the jobs are either functional or idempotent anyway, so aborting halfway through isn't a problem.  In general though, I'm not sure what kinds of use cases would tolerate silently-dropped jobs.  And for example, if an OOM kill has just occurred, then you're already in a state where a job was unexpectedly terminated... you wouldn't be violating any more contracts by aborting.
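The situation described above — a worker terminated unexpectedly, e.g. by the OOM killer — is detectable in principle, since a child killed by a signal leaves a negative exitcode. A minimal sketch (using a bare Process rather than Pool, and simulating the kill with SIGKILL) of how a supervising parent could notice and abort rather than hang:

```python
import multiprocessing
import os
import signal

def worker():
    # Simulate a worker terminated unexpectedly (e.g. by the OOM killer).
    os.kill(os.getpid(), signal.SIGKILL)

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
    # On Unix, a negative exitcode means the child died from a signal.
    # A parent in "abort on error" mode could treat this as fatal
    # instead of silently losing the job.
    if p.exitcode is not None and p.exitcode < 0:
        print("worker killed by signal", -p.exitcode)
```

This is only an illustration of the detection mechanism, not of Pool's internal bookkeeping, which is what the thread is actually debating.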

In general, I can't help but feel that the approach of "ignore errors and keep going" leads to rather unexpected bugs (and in this case, it leads to infinite hangs).  But even in languages where errors are ignored by default (e.g. sh), there are mechanisms for turning on abort-on-error handlers (e.g. set -e).

So my response is yes, you're right that there's no great default here.  However, I think it'd be worth (at least) letting the user specify "if something goes wrong, then abort".  Keep in mind that this will only happen in very exceptional circumstances anyway.

> Not everything can be simple.
Sure, but given the choice between a simple solution and a complex one, all else being equal, the simple one is preferable.  And in this case, the more complicated mechanism seems to introduce subtle race conditions and failure modes.

Anyway, Jesse, it's been a while since we've heard anything from you... do you have thoughts on these issues?  It would probably be useful to get a fresh opinion :).

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9205>
_______________________________________
