Multiprocessing, join(), and crashed processes

Cameron Simpson cs at cskk.id.au
Thu Feb 6 00:53:23 EST 2020


On 05Feb2020 15:48, Israel Brewster <ijbrewster at alaska.edu> wrote:
>In a number of places I have constructs where I launch several 
>processes using the multiprocessing library, then loop through said 
>processes calling join() on each one to wait until they are all 
>complete. In general, this works well, with the *apparent* exception of 
>when something causes one of the child processes to crash (not throw an 
>exception, but actually crash). In that event, it appears that the call to 
>join() hangs indefinitely. How can I best handle this? Should I put a 
>timeout on the join, and put it in a loop, such that every 5 seconds or 
>so it breaks, checks to see if the process is still actually running, 
>and if so goes back and calls join again? Or is there a better option 
>to say “wait until this process is done, however long that may be, 
>unless it crashes”?

What's your platform/OS? And what does "crash" mean, precisely?

If a subprocess exits, for whatever reason, join() should return.
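
As a rough check of that claim, here is a minimal sketch in which the 
child kills itself with SIGKILL to imitate a hard crash. The names 
crash() and run_and_join() are made up for illustration, and I am 
assuming a POSIX system where the "fork" start method is available; 
your platform may behave differently.

```python
import multiprocessing
import os
import signal


def crash():
    # Hypothetical stand-in for a hard crash: the child sends itself
    # SIGKILL, so no Python-level exception handling or cleanup runs.
    os.kill(os.getpid(), signal.SIGKILL)


def run_and_join(target):
    # Assumption: POSIX with the "fork" start method available.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=target)
    proc.start()
    proc.join()  # returns once the child is gone, however it died
    return proc.exitcode


if __name__ == "__main__":
    print("child exit code:", run_and_join(crash))
```

A negative exitcode of -N means the child was terminated by signal N, 
so the parent can tell a crash from a normal exit after join() returns.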

If the subprocess _hangs_, then join() will not see it exit, because it 
hasn't, and so join() will block as well.
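
For the hung case, a join() with a timeout in a loop, roughly as you 
describe, is one workable approach. The following is only a sketch: 
hang() and join_with_deadline() are hypothetical names, the interval 
and deadline values are arbitrary, and whether terminating a stuck 
child is acceptable depends on your application.

```python
import multiprocessing
import time


def hang():
    # Hypothetical stand-in for a child that never finishes.
    while True:
        time.sleep(1)


def join_with_deadline(proc, poll_interval=0.1, deadline=1.0):
    """Wait for proc; terminate it if it is still alive after `deadline` seconds.

    Returns True if the process ended on its own, False if it had to be
    terminated.
    """
    give_up_at = time.monotonic() + deadline
    while proc.is_alive():
        if time.monotonic() >= give_up_at:
            proc.terminate()  # sends SIGTERM on POSIX
            proc.join()       # reap the terminated child
            return False
        proc.join(timeout=poll_interval)  # returns early if the child exits
    return True


if __name__ == "__main__":
    # Assumption: POSIX with the "fork" start method available.
    ctx = multiprocessing.get_context("fork")
    child = ctx.Process(target=hang)
    child.start()
    print("exited on its own:", join_with_deadline(child))
    print("exit code:", child.exitcode)
```

Note that join(timeout=...) returns None whether or not the child has 
finished, which is why the loop checks is_alive() rather than the 
return value.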

You'll need to pin down what actually happens when your subprocesses 
crash: do they exit, or do they hang?

Cheers,
Cameron Simpson <cs at cskk.id.au>

