Unnoticed traceback in a thread

Fri Mar 9 01:55:14 EST 2018

Skip Montanaro <skip.montanaro at gmail.com> writes:

> I have a program which is almost always running in a single thread. It
> uses a daemon thread (via threading.Timer) to periodically checkpoint
> some state. The program runs for days at a time.
>
> Over the past couple days, two instances of the subthread croaked with
> tracebacks because while they were iterating over the checkpointable
> data, the main thread modified the data. Blammo! It was an easy fix
> (threading.Lock) and everything is back on an even keel.
>
> After the bug was tickled, the program continued to do its thing. It
> just stopped checkpointing. Given the way the program is managed, the
> traceback wound up in an obscure log file, and it was a couple days
> before I noticed it. I've seen this sort of
> thread-dies-but-program-doesn't-crash situation before. I can look at
> ways to more actively monitor the contents of the obscure log file,
> but is there some non-hackish way that the demise of the daemon thread
> can take down the entire program? (Sometimes it's good that it doesn't
> crash. Other times it would at least be handy if it did.)

I approach situations like this by running the thread function
inside a "try: ... except: ..." block. In the "except" handler,
I can then so whatever is necessary if the thread function has died
unexpectedly -- e.g. kill the complete process.

Concretely: instead of "start_new_thread(my_thread_function, ...)",
I use

        def wrapped_thread_function(*args, **kw):
          try:
            my_thread_function(*args, **kw)
          except:
            ... do whatever is necessary should "my_thread_function" fails ...

        start_new_thread(wrapped_thread_function, ...)

Similar, should you use the "Thread" class (instead of "start_new_thread").