[Python-Dev] Evil reference cycles caused by Exception.__traceback__

Victor Stinner victor.stinner at gmail.com
Mon Sep 18 05:31:12 EDT 2017


Hi,

Python 3 added a __traceback__ attribute to exception objects. I guess
that it was added to be able to get the original traceback when an
exception is re-raised. Artificial example (real code is more complex,
with subfunctions and conditional code):

try:
    ...
except Exception as exc:
    ...
    raise exc   # keep original traceback


The problem is that Exception.__traceback__ creates reference cycles.
If you store an exception in a local variable, suddenly all local
variables of all frames (of the current traceback) become part of a
giant reference cycle. Sometimes all these variables are kept alive
for a very long time (until you exit Python?), or at least much longer
than their "expected" lifetime (local variables are normally destroyed
"magically" when you exit the function).

The reference cycle is:

1) frame -> local variable -> variable which stores the exception
2) exception -> traceback -> frame

exception -> ... -> frame -> ... -> same exception
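
A minimal sketch of this cycle, assuming CPython (reference counting plus
a cycle collector). Payload, store_exception() and the variable names are
hypothetical and only serve the illustration:

```python
import gc
import weakref


class Payload:
    """Stand-in for a large local object expected to die with its frame."""


def store_exception():
    payload = Payload()
    payload_ref = weakref.ref(payload)
    try:
        raise ValueError("boom")
    except ValueError as exc:
        # frame -> saved -> exception -> __traceback__ -> frame
        saved = exc
    return payload_ref


gc.disable()  # rely on reference counting only, to make the effect visible
try:
    payload_ref = store_exception()
    alive_after_return = payload_ref() is not None
    gc.collect()  # the cycle collector is what finally frees the frame
    alive_after_collect = payload_ref() is not None
finally:
    gc.enable()

print(alive_after_return, alive_after_collect)
```

On CPython, payload is still alive after store_exception() returns and
only goes away once gc.collect() runs, even though nothing outside the
function ever referenced it.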


Breaking the reference cycle manually is complex. First, you must be
aware of the reference cycle! Second, you have to identify which
functions of your large application create the reference cycle: this
task is long and painful even with good tooling. Finally, you have to
explicitly clear variables or attributes to break the reference cycle
manually.
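
The last step can be sketched as follows, again assuming CPython. The
function and variable names are hypothetical; the point is only that
clearing the local variable in a "finally" block removes the
frame -> exception link, so reference counting alone frees everything:

```python
import gc
import weakref


class Payload:
    """Stand-in for a large local object."""


def store_then_clear():
    payload = Payload()
    payload_ref = weakref.ref(payload)
    saved = None
    try:
        raise ValueError("boom")
    except ValueError as exc:
        saved = exc
    try:
        message = str(saved)  # ... use the saved exception here ...
    finally:
        # Break the cycle: the frame no longer references the exception.
        saved = None
    return payload_ref, message


gc.disable()  # show that the cycle collector is not needed
try:
    payload_ref, message = store_then_clear()
    freed_without_gc = payload_ref() is None
finally:
    gc.enable()

print(freed_without_gc, message)
```

Another option is to set saved.__traceback__ = None, which cuts the
exception -> traceback link instead, at the cost of losing the traceback.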


asyncio.Future.set_exception() keeps an exception object and its
traceback object alive: asyncio creates reference cycles *by design*.
Enjoy! asyncio tries hard to reduce the consequences of reference
cycles, or even tries to break cycles, using hacks like "self = None" in
methods... Setting self to None is really surprising and requires a
comment explaining the hack.
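
To illustrate why such a hack helps, here is a sketch that does not use
asyncio at all. Holder is a hypothetical future-like class that stores an
exception raised inside one of its own methods; it is not asyncio's
actual code:

```python
import gc
import weakref


class Holder:
    """Hypothetical future-like object that stores an exception."""

    def __init__(self):
        self.exception = None

    def run(self, clear_self):
        try:
            raise ValueError("boom")
        except ValueError as exc:
            # holder -> exception -> traceback -> this frame -> self (holder)
            self.exception = exc
        if clear_self:
            # Break the cycle: this frame no longer references the holder.
            self = None


def holder_survives(clear_self):
    holder = Holder()
    holder_ref = weakref.ref(holder)
    holder.run(clear_self)
    del holder
    return holder_ref() is not None


gc.disable()  # rely on reference counting only
try:
    survives_without_hack = holder_survives(clear_self=False)
    survives_with_hack = holder_survives(clear_self=True)
finally:
    gc.enable()
    gc.collect()  # clean up the cycle left by the first call

print(survives_without_hack, survives_with_hack)
```

Without the hack, the holder stays alive after its last external
reference is dropped; with "self = None", the frame kept alive by the
traceback no longer points back at the holder.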


Over the last few years, I fixed many reference cycles in various parts of the
Python 3 standard library. Sometimes, it takes years to become aware
of the reference cycle and finally fix it.

For example, recently, I worked on fixing all "dangling threads"
leaked by tests of the Python test suite, and I found and fixed many
reference cycles which have probably existed since Python 3 was created
(forked from Python 2):

* socket.create_connection(): commit
acb9fa79fa6453c2bbe3ccfc9cad2837feb90093, bpo-31234

* concurrent.futures.ThreadPoolExecutor: commit
bc61315377056fe362b744d9c44e17cd3178ce54, bpo-31249

* pydoc: commit 4cab2cd0c05fcda5fcb128c9eb230253fff88c21, bpo-31238

* xmlrpc.server: commit 84524454d0ba77d299741c47bd0f5841ac3f66ce, bpo-31247

Other examples:

* test_ssl: commit 868710158910fa38e285ce0e6d50026e1d0b2a8c, bpo-31323

* test_threading: commit 3d284c081fc3042036adfe1bf2ce92c34d743b0b, bpo-31234

Another example of a recent change fixing a reference cycle, by Antoine Pitrou:

* multiprocessing: commit 79d37ae979a65ada0b2ac820279ccc3b1cd41ba6, bpo-30775


For socket.create_connection(), I discovered the reference cycle
because a test started to log a warning about a dangling thread. The
warning was introduced indirectly by a change which modified the
support.HOST value from '127.0.0.1' to 'localhost'... It's hard to see
the link between the support.HOST value and a reference cycle. Full story:

https://bugs.python.org/issue29639#msg302087


Again, it's just yet another random example of a very tricky reference
cycle bug caused by Exception.__traceback__.


Ideally, CPython 3.x should never create reference cycles. Removing
Exception.__traceback__ is the obvious "fix" for the issue. But I
expect that slowly, a lot of code started to rely on the attribute,
maybe even for good reasons :-)

A more practical solution would be to log a warning. Maybe the garbage
collector can emit a warning if it detects an exception that is part of
a reference cycle? Or maybe detect frames?

If the GC cannot do it, maybe we could use a debug thread (enabled
manually) which uses gc.get_objects() to check whether an exception is
part of a reference cycle: for example, check whether an exception
remains alive longer than X seconds? I had the same idea for asyncio, to
detect reference cycles or tasks that are never "awaited", but I never
implemented the idea.
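
A rough sketch of what such a check might look like. This is a simplified
variant, not the debug thread described above: it is a one-shot scan with
no timer, and it only detects the specific pattern "exception stored in a
local variable of a frame of its own traceback". The function names are
hypothetical:

```python
import gc


def exceptions_in_frame_cycles():
    """Return exceptions stored in a local variable of a frame that is
    referenced by their own traceback."""
    found = []
    for obj in gc.get_objects():
        # type() avoids triggering __class__ lookups on proxy objects.
        if not issubclass(type(obj), BaseException):
            continue
        tb = obj.__traceback__
        while tb is not None:
            if any(value is obj for value in tb.tb_frame.f_locals.values()):
                found.append(obj)
                break
            tb = tb.tb_next
    return found


def leaky():
    try:
        raise ValueError("boom")
    except ValueError as exc:
        saved = exc  # exception -> traceback -> frame -> saved


gc.disable()  # keep the cycle around until we have scanned for it
try:
    leaky()
    suspects = exceptions_in_frame_cycles()
finally:
    gc.enable()

print([repr(exc) for exc in suspects])
```

A real tool would probably run this periodically and report only
exceptions that stay alive across several scans, to limit false
positives from exceptions that are legitimately being handled.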

Victor

