[issue34059] multiprocessing deadlock

Guillaume Perrault-Archambault report at bugs.python.org
Fri Jul 6 13:24:58 EDT 2018


Guillaume Perrault-Archambault <gperr050 at uottawa.ca> added the comment:

Hi Victor and Yang,

Thanks for your fast replies.

I did initially think it could be a torch issue. Indeed, I have an
equivalent numpy testcase that does not deadlock. However, the fact that it
gets stuck inside a multiprocessing wait statement makes me think it's
still a multiprocessing issue.

I've spent two weeks full time on this issue. On the PyTorch forums I've
had no replies (
https://discuss.pytorch.org/t/multiprocessing-code-works-using-numpy-but-deadlocked-using-pytorch/20473
).

On Stack Overflow I only got a workaround suggestion that works sporadically
(
https://stackoverflow.com/questions/51093970/multiprocessing-code-works-using-numpy-but-deadlocked-using-pytorch).
Basically, I can get rid of the deadlock (sometimes) if I impose only one
thread per process. But this is not a real solution anyway.
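
For context, a minimal sketch of what "one thread per process" could look
like is below. It is an illustration under assumptions, not necessarily the
exact suggestion from the linked answer: OMP_NUM_THREADS and MKL_NUM_THREADS
are the environment variables read by OpenMP and MKL builds,
torch.set_num_threads is PyTorch's own setting, and limit_threads is a
hypothetical helper name.

```python
import os

# Assumption: the numeric library reads these variables when it is first
# loaded, so they are set before any such import happens.
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")


def limit_threads():
    """Hypothetical helper: cap PyTorch's intra-op threads to one.

    Returns True if torch was importable and the cap was applied,
    False if torch is not installed.
    """
    try:
        import torch
    except ImportError:
        return False
    torch.set_num_threads(1)
    return True


if __name__ == "__main__":
    print("thread cap applied:", limit_threads())
```

The helper could also be passed as the initializer of a multiprocessing.Pool
so that every worker applies the same cap when it starts.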

I have tried stepping through the code, but because it is multiprocessed,
you cannot step through it (at least not in the conventional way, since the
main thread is not doing the heavy lifting).

I've tried adding print statements in the multiprocessing library and mucking
around with it a bit, but debugging multi-processed code in this way is an
absolute nightmare because you can't even trust the order in which print
statements display on the screen. And probably more relevant, I'm out of my
league here.
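
One way to make this kind of debugging less painful might be sketched as
follows. It relies only on the standard library: each message is tagged with
the process name and PID so interleaved output can still be attributed, and
faulthandler.dump_traceback_later writes the tracebacks of all threads to
stderr if a task is still running after a timeout. The names tagged,
run_with_watchdog and square are hypothetical, and the 30-second timeout is
an arbitrary choice.

```python
import faulthandler
import multiprocessing as mp
import os


def tagged(message):
    """Prefix a message with the current process name and PID."""
    proc = mp.current_process()
    return f"[{proc.name} pid={os.getpid()}] {message}"


def run_with_watchdog(task, arg, timeout=30.0):
    """Hypothetical wrapper around a single task.

    If task(arg) has not finished after `timeout` seconds, faulthandler
    dumps the tracebacks of all threads in this process to stderr. The
    timer is cancelled as soon as the task returns or raises.
    """
    faulthandler.dump_traceback_later(timeout)
    try:
        return task(arg)
    finally:
        faulthandler.cancel_dump_traceback_later()


def square(x):
    # flush=True so the line is written immediately, not when a buffer fills
    print(tagged(f"squaring {x}"), flush=True)
    return x * x


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        pending = [pool.apply_async(run_with_watchdog, (square, n))
                   for n in range(4)]
        print([p.get() for p in pending])
```

A dump produced this way shows where each worker is blocked, which can help
tell a wait inside multiprocessing apart from a wait inside a third-party
library.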

I'm really at a complete dead end. I'm blocked and my work cannot progress
without fixing this issue. I'd be very grateful if you could try to
reproduce and rule out the multiprocessing library. If you need help
reproducing I can send a different testcase that deadlocked on my friend's
Mac (for him, the original testcase did not deadlock).

The testcase I attached in my original post sometimes deadlocks and
sometimes doesn't, depending on the machine I run it on. So I'm not surprised
you got no deadlock when you tried to reproduce it.

I can always get it deadlocking on Linux/Mac though, by tweaking the code.

To give you a sense of how unreliably it deadlocks, just removing the for
loop in the code (which is outside the multiprocessing portion of the
code!) somehow gets rid of the deadlock. Also, it never deadlocks on
Windows.
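
One difference between the platforms that may be relevant (an assumption,
not a diagnosis): Windows only supports the "spawn" start method, while
Linux defaults to "fork", and a forked child inherits a copy of the
parent's memory, including any locks held by threads that do not exist in
the child. A minimal sketch for trying the same start method on Linux/Mac
is below; square and run_with are hypothetical stand-ins for the real
workload.

```python
import multiprocessing as mp


def square(x):
    return x * x


def run_with(start_method, values):
    """Run a small pool using an explicit start method.

    get_context returns an object with the same API as the
    multiprocessing module, bound to the requested start method.
    """
    ctx = mp.get_context(start_method)
    with ctx.Pool(2) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard matters with "spawn": children re-import this module.
    print(mp.get_all_start_methods())
    print(run_with("spawn", [1, 2, 3]))
```

If the deadlock disappears under "spawn", that would point towards state
inherited across fork rather than towards the pool logic itself.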

If you could provide any help on this issue I'd be very grateful.

Regards,
Guillaume.

On Fri, Jul 6, 2018 at 11:21 AM STINNER Victor <report at bugs.python.org>
wrote:

>
> STINNER Victor <vstinner at redhat.com> added the comment:
>
> IMHO it's an issue with your usage of the torch module which is not part
> of the Python stdlib, so I suggest to close this issue as "third party" or
> "not a bug".
>
> ----------
> nosy: +vstinner
>
> _______________________________________
> Python tracker <report at bugs.python.org>
> <https://bugs.python.org/issue34059>
> _______________________________________
>


