[issue46726] Thread spuriously marked dead after interrupting a join call

Tim Peters report at bugs.python.org
Sat Feb 12 21:06:38 EST 2022


Tim Peters <tim at python.org> added the comment:

Na, we've been doing excruciatingly clever stuff to deal with thread shutdown for decades, and it always proves to be wrong in some way. Even if code like

try:
    if lock.acquire(block, timeout):
        lock.release()
        self._stop()
except:
    if lock.locked():
        lock.release()
        self._stop()
    raise

did work as hoped for, it would still be broken, but in a different way: suppose we really did acquire the tstate lock because the thread actually ended while we were in acquire(). But then a signal dumped us into the `except` block before doing the release. Oops! There's nothing I can see to stop _another_ (say) KeyboardInterrupt from preventing us from doing the "failsafe" release too. So the lock remains locked forever after (we hold the lock now, and missed our chances to release it). And I think that's pretty likely: if I don't see an instant response to Ctrl-C, I'm likely to do another very soon after.

So I don't think catching exceptions can be made to work for this. Or `finally` blocks either. Indeed, it appears that any way whatsoever of spelling `lock.release()` in Python can be defeated by an unfortunately timed signal.

Which isn't unique to this code, of course. The failure modes of this code just happen to be unusually visible ;-)
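
The double-interrupt scenario can be reproduced deterministically without real signals. The following is only a rough simulation: `InterruptingLock`, `wait_with_failsafe` and `lock_leaked` are hypothetical names invented for illustration, the `self._stop()` call is left out, and the fake lock raises KeyboardInterrupt at just two hand-picked points, whereas a real signal can land between any two bytecodes.

```python
import threading


class InterruptingLock:
    """Hypothetical test double wrapped around a real threading.Lock.

    Raises KeyboardInterrupt at named points to mimic a badly timed Ctrl-C.
    """

    def __init__(self, interrupt_points):
        self._lock = threading.Lock()
        self._points = set(interrupt_points)

    def acquire(self, blocking=True, timeout=-1):
        acquired = self._lock.acquire(blocking, timeout)
        if "after_acquire" in self._points:
            # The lock is now held, but the caller never sees the result.
            raise KeyboardInterrupt("first Ctrl-C")
        return acquired

    def locked(self):
        if "in_except" in self._points:
            # A second Ctrl-C arrives inside the failsafe handler.
            raise KeyboardInterrupt("second Ctrl-C")
        return self._lock.locked()

    def release(self):
        self._lock.release()

    def still_held(self):
        # Inspect the real lock, bypassing the simulated interrupts.
        return self._lock.locked()


def wait_with_failsafe(lock):
    """Same shape as the try/except pattern under discussion."""
    try:
        if lock.acquire():
            lock.release()
    except:
        if lock.locked():
            lock.release()
        raise


def lock_leaked(interrupt_points):
    """Run the pattern once and report whether the lock stayed locked."""
    lock = InterruptingLock(interrupt_points)
    try:
        wait_with_failsafe(lock)
    except KeyboardInterrupt:
        pass
    return lock.still_held()


if __name__ == "__main__":
    print("one interrupt leaks the lock: ", lock_leaked({"after_acquire"}))
    print("two interrupts leak the lock:", lock_leaked({"after_acquire", "in_except"}))
```

With a single simulated interrupt the failsafe release does its job; with a second one landing inside the `except` block the lock is never released.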

Two other approaches come to mind:

- Wrap everything needed in a custom C function. CPython can't interfere with its control flow.

- Add new sys gimmicks to suppress and re-enable raising Python level exceptions for signals. Then, e.g., something here like:

with sys.delay_signal_exceptions():
    # Dead simple code, like _before_ we "fixed" it ;-)
    # In particular, while Ctrl-C may terminate the `acquire()` call,
    # KeyboardInterrupt will not be raised until the `with` block
    # exits.
    # Possibly intractable: arranging then for the traceback to
    # point at the code where the exception would have been raised
    # had temporary suspension not been enabled. Then again, since
    # it's not _actually_ raised there at the Python level, maybe
    # it's a Good Thing to ignore.
    if lock.acquire(block, timeout):
        lock.release()
        self._stop()

The second way is more general, but would probably require a PEP.
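
For comparison, something loosely in the spirit of the second approach can be approximated today with the existing `signal` module. This is a minimal sketch under stated assumptions, not the proposed `sys` facility: `delay_keyboard_interrupt` is a hypothetical name, it defers only SIGINT, it assumes a Python-level SIGINT handler is already installed (the usual case), and `signal.signal()` may only be called from the main thread. Being written in Python, it is also still exposed to the very problem described above: a signal arriving between installing the handler and entering the `try`, or during the cleanup, is not covered.

```python
import contextlib
import signal


@contextlib.contextmanager
def delay_keyboard_interrupt():
    """Hypothetical helper: hold back Ctrl-C until the `with` block exits."""
    pending = []

    def remember(signum, frame):
        # Record the signal instead of raising KeyboardInterrupt right away.
        pending.append(signum)

    # Assumption: getsignal() returns a real handler here, not None.
    previous = signal.getsignal(signal.SIGINT)
    signal.signal(signal.SIGINT, remember)
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, previous)
        if pending:
            # Deliver the deferred interrupt now that the block is done.
            raise KeyboardInterrupt


if __name__ == "__main__":
    steps = []
    try:
        with delay_keyboard_interrupt():
            signal.raise_signal(signal.SIGINT)  # stand-in for a Ctrl-C
            steps.append("critical section finished")
    except KeyboardInterrupt:
        steps.append("interrupt delivered afterwards")
    print(steps)
```

Note that the traceback of the deferred exception points at the helper rather than at the code that was running when the signal arrived, which is the traceback question raised in the comments of the `with` block above.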

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue46726>
_______________________________________

