[issue22393] multiprocessing.Pool shouldn't hang forever if a worker process dies unexpectedly

Mon Nov 12 17:57:52 EST 2018

Oscar Esteban <oesteban at stanford.edu> added the comment:

I tried to reuse as much as I could from the patch, but it didn't solve the issue at first.

I have changed which thread is responsible for detecting that a worker got killed and deciding how to respond. In the proposed patch, the thread handling results (i.e. tasks that a worker has queued as done) was responsible. In the PR, that responsibility is reassigned to the thread handling workers, since the underlying event is that one or more workers suddenly die.

The patch defined a new BROKEN state that was assigned to the results handler thread. I transferred this behavior to the worker handler thread. I suspect that, to be semantically accurate, the BROKEN state should be assigned to the Pool object instead. However, that would require passing a reference to the Pool object around and would complicate the implementation unnecessarily. Happy to reconsider, though.
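To make the idea concrete, here is a minimal sketch of the kind of check a worker handler could run. It is not the code from the patch or the PR: the names RUN, BROKEN, dead_workers and next_state are hypothetical, and the only library behavior relied on is that a Process object's exitcode is None while it runs and negative when it was killed by a signal.

```python
# Hypothetical state labels; the real pool implementation defines its own.
RUN = "RUN"
BROKEN = "BROKEN"


def dead_workers(workers):
    """Return the workers that exited abnormally.

    Each worker only needs an `exitcode` attribute, as
    multiprocessing.Process has: None while running, 0 after a clean
    exit, negative when terminated by a signal.
    """
    return [w for w in workers if w.exitcode is not None and w.exitcode != 0]


def next_state(state, workers):
    """Move from RUN to BROKEN once any worker has died unexpectedly."""
    if state == RUN and dead_workers(workers):
        return BROKEN
    return state
```

The point of putting the check on the worker side is that the worker handler already inspects the worker processes in order to replace the ones that exited, so it is the first place where an abnormal exit becomes visible.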

I added three tests: one that was already included in the patch, a variation of it that waits before killing the worker, and the one that Francis Bolduc posted here (https://bugs.python.org/issue22393#msg294968).
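For readers who want to reproduce the scenario, the following is a rough, POSIX-only sketch of what such a test exercises. It is not one of the three tests: the helper names _kill_self and kill_worker_outcome are made up for illustration, the worker kills itself rather than being killed from the parent, and the "fork" start method is chosen only to keep the example short. A timeout on get() is used so that the sketch itself cannot hang.

```python
import multiprocessing as mp
import os
import signal
import time


def _kill_self(delay):
    """Task that simulates an unexpected worker death (hypothetical helper)."""
    time.sleep(delay)
    os.kill(os.getpid(), signal.SIGKILL)


def kill_worker_outcome(delay=0.0, timeout=2.0):
    """Submit a task that kills its worker and report what the parent sees."""
    ctx = mp.get_context("fork")  # assumption: POSIX with fork available
    pool = ctx.Pool(processes=2)
    try:
        result = pool.apply_async(_kill_self, (delay,))
        try:
            result.get(timeout=timeout)
        except mp.TimeoutError:
            # The result never arrived: without a timeout this would block.
            return "timed out"
        except Exception as exc:
            # A pool that detects the broken worker could raise here instead.
            return "raised " + type(exc).__name__
        return "completed"
    finally:
        pool.terminate()
        pool.join()


if __name__ == "__main__":
    print(kill_worker_outcome(delay=0.0))
    print(kill_worker_outcome(delay=0.5))  # variation: wait before the kill
```

On an interpreter without a fix for this issue I would expect "timed out", which is the hang this report is about. With a fix along the lines discussed here, the call should instead fail promptly with an exception.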

Please let me know whether any further conversation about this bug should take place on GitHub, in the PR, instead of here.

Thanks a lot for the guidance, Antoine.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue22393>
_______________________________________