[Python-Dev] refleaks & test_tcl & threads

Tim Peters tim.peters at gmail.com
Mon Apr 17 23:05:18 CEST 2006


[Thomas Wouters]
>> test_threading_local is not entirely consistent, but it looks a lot more
>> reliable on my box than on Neal's automated mails:
>>
>> test_threading_local
>> beginning 11 repetitions
>> 12345678901
>> ...........
>> test_threading_local leaked [34, 34, 34, 34, 34, 26, 26, 22, 34]
>> references

[also Thomas]
> This is caused by _threading_local.local's __del__ method, or rather the
> fact that it's part of a closure enclosing threading.enumerate . Fixing the
> inner __del__ to call enumerate (which is 'threading.enumerate') directly,
> rather than through the cellvar 'threading_enumerate', makes the leak go
> away. The fact that the leakage is inconsistent is actually easily
> explained: the 'local' instances are stored on the 'currentThread' object
> indexed by 'key', and keys sometimes get reused (since they're basically
> id(self)), so sometimes an old reference is overwritten. It doesn't quite
> explain why using the cellvar causes the cycle, nor does it explain why
> gc.garbage remains empty. I guess some Thread objects linger in threading's
> _active or _limbo dicts, but I have no idea why having a cellvar in the
> cycle matters; they seem to be participating in GC just fine, and I cannot
> reproduce the leak with a simpler test.
>
> And on top of that, I'm not sure *why* _threading_local.local is doing the
> song and dance to get a cellvar. If the global 'enumerate' (which is
> threading.enumerate) disappears, it will be because Python is cleaning up.
> Even if we had a need to clean up the local dict at that time (which I don't
> believe we do), using a cellvar doesn't guarantee anything more than using a
> global name.

The

            threading_enumerate = enumerate

line creates a persistent local variable at module import time, which
(unlike a global name) can't get "None'd out" at shutdown time.  BTW,
it's very easy to miss that this line _is_ executed at module import
time, and is executed only once over the life of the interpreter; more
on that below.

> Chances are very good that the 'threading' module has also been
> cleared, meaning that while we still have a reference to
> threading.enumerate, it cannot use the three globals it uses (_active,
> _limbo, _active_limbo_lock.) All in all, I think matters improve
> significantly if it just deals with the NameError it'll get at cleanup
> (which it already does.)

Well, you missed something obvious :-):  the code is so clever now
that its __del__ doesn't actually do anything.  In outline:

""|"
...
# Threading import is at end

...

class local(_localbase):

    ...
    def __del__():
        threading_enumerate = enumerate
        ...

        def __del__(self):
            try:
                threads = list(threading_enumerate())
            except:
                # if enumerate fails, as it seems to do during
                # shutdown, we'll skip cleanup under the assumption
                # that there is nothing to clean up
                return

        ...

        return __del__
    __del__ = __del__()

from threading import currentThread, enumerate, RLock
"""

Lots of questions pop to mind, from why the import is at the bottom of
the file, to why it's doing this seemingly bizarre nested-__del__
dance.  I don't have good answers to any of them <0.1 wink>, but this
is the bottom line:  at the time

            threading_enumerate = enumerate

is executed, `enumerate` is bound to __builtin__.enumerate, not to
threading.enumerate.  That line is executed during the _execution_ of
the "class local" statement, the first time the module is imported,
and the import at the bottom of the file has not been executed by that
time.  So there is no global `enumerate` at the time, but there is one
in __builtin__ so that's the one it captures.  As a result, "the real"
__del__'s

                threads = list(threading_enumerate())

line _always_ raises an exception, namely the TypeError "enumerate()
takes exactly 1 argument (0 given)", which is swallowed by the bare
"except:", and __del__ then returns having accomplished nothing.

Of course that's the real reason it never cleans anything up now --
while virtually any way of rewriting it causes it to get the
_intended_ threading.enumerate, and then the leaks stop.

I'll check one of those in.


More information about the Python-Dev mailing list