Can someone explain this weakref behavior?

David MacQuigg dmq at gain.com
Mon Jun 14 19:25:57 EDT 2004


On Fri, 11 Jun 2004 18:09:32 -0400, "Tim Peters" <tim.one at comcast.net>
wrote:

>[David MacQuigg, trying to keep track of how many instances of a class
> currently exist]
>
>...
>
>> Seems like we could do this more easily with a function that lists
>> instances, like __subclasses__() does with subclasses.  This doesn't have
>> to be efficient, just reliable.  So when I call cls.__instances__(), I
>> get a current list of all instances in the class.
>>
>> Maybe we could implement this function using weak references.  If I
>> understand the problem with weak references, we could have a
>> WeakValueDictionary with references to objects that actually have a
>> refcount of zero.
>
>Not in CPython today (and in the presence of cycles, the refcount on an
>object isn't related to whether it's garbage).
>
>> There may be too many entries in the dictionary, but never too few.
>
>Right!
>
>> In that case, maybe I could just loop over every item in
>> my WeakValueDictionary, and ignore any with a refcount of zero.
>>
>>     def _getInstances(cls):
>>         d1 = cls.__dict__.get('_instances' , {})
>>         d2 = {}
>>         for key in d1:
>>             if sys.getrefcount(d1[key]) > 0:
>>                 d2[key] = d1[key]
>>         return d2
>>     _getInstances = staticmethod(_getInstances)
>>
>> I'm making some assumptions here that may not be valid, like
>> sys.getrefcount() for a particular object really will be zero immediately
>> after all normal references to it are gone. i.e. we don't have any
>> temporary "out-of-sync" problems like with the weak references
>> themselves.
>>
>> Does this seem like a safe strategy?
>
>An implementation of Python that doesn't base its garbage collection
>strategy on reference counting won't *have* a getrefcount() function, so if
>you're trying to guard against Python switching gc strategies, this is a
>non-starter (it solves the problem for, and only for, implementations of
>Python that don't have the problem to begin with <wink>).
>
>Note that CPython's getrefcount() can't return 0 (see the docs).  Maybe
>comparing against 1 would capture your intent.
>
>Note this part of the weakref docs:
>
>    Caution: Because a WeakValueDictionary is built on top of a Python
>    dictionary, it must not change size when iterating over it. This can be
>    difficult to ensure for a WeakValueDictionary because actions performed by
>    the program during iteration may cause items in the dictionary to vanish
>    "by magic" (as a side effect of garbage collection).
>
>If you have threads too, it can be worse than just that.
>
>Bottom line:  if you want semantics that depend on the implementation using
>refcounts, you can't worm around that.  Refcounts are the only way to know
>"right away" when an object has become trash, and even that doesn't work in
>the presence of cycles.  Short of that, you can settle for an upper bound on
>the # of objects "really still alive" across implementations by using weak
>dicts, and you can increase the likely precision of that upper bound by
>forcing a run of garbage collection immediately before asking for the
>number.  In the absence of cycles, none of that is necessary in CPython
>today (or likely ever).
>
>Using a "decrement count in a __del__" approach isn't better:  only a
>reference-counting based implementation can guarantee to trigger __del__
>methods as soon as an object (not involved in a cycle) becomes unreachable.
>Under any other implementation, you'll still just get an upper bound.
>
>Note that all garbage collection methods are approximations to true
>lifetimes anyway.  Even refcounting in the absence of cycles:  just because
>the refcount on an object is 10 doesn't mean that any of the 10 ways to
>reach the object *will* get used again.  An object may in reality be dead as
>a doorknob no matter how high its refcount.  Refcounting is a conservative
>approximation too (it can call things "live" that will in fact never be used
>again, but won't call things "dead" that will in fact be used again).
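
For reference, the weak-dictionary approach described above can be
sketched roughly as follows.  This is only a minimal sketch, not code
from the thread: the class name Tracked, the method live_instances(),
and the use of id(self) as the dictionary key are assumptions made for
illustration, and it is written for current Python 3.  It takes a
snapshot of the weak references instead of iterating over the
dictionary, and optionally forces a collection first to tighten the
upper bound.

```python
import gc
import weakref


class Tracked:
    """Hypothetical class that keeps weak references to its own instances."""

    # Maps id(instance) -> instance; an entry goes away once the instance
    # has actually been collected.
    _instances = weakref.WeakValueDictionary()

    def __init__(self, name):
        self.name = name
        Tracked._instances[id(self)] = self

    @classmethod
    def live_instances(cls, collect=True):
        """Return a list of the instances still reachable via the weak dict.

        The result is an upper bound on the instances that are really
        still alive, not an exact answer.
        """
        if collect:
            gc.collect()  # clear cyclic garbage first to tighten the bound
        # Snapshot the weak references so we never iterate over the
        # dictionary itself while entries may vanish.
        refs = list(cls._instances.valuerefs())
        return [obj for obj in (ref() for ref in refs) if obj is not None]


a = Tracked("a")
b = Tracked("b")
c = Tracked("c")
del b  # on CPython this frees the instance immediately

names = sorted(obj.name for obj in Tracked.live_instances())
print(names)
```

On CPython this should report only "a" and "c".  Without the collect
step, an implementation that does not use reference counting may still
report "b" for a while.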

Thank you for this very thorough answer to my questions.  I have a
much better understanding of the limitations of weakrefs now.  I also
see my suggestion of using sys.getrefcount() suffers from the same
limitations.  I've decided to leave my code as is, but put some
prominent warnings in the documentation:
'''
[Note 1]  As always, there are limitations.  Nothing is ever absolute
when it comes to reliability.  In this case we are depending on the
Python interpreter to immediately delete a weak reference when the
normal reference count goes to zero.  This depends on the
implementation details of the interpreter, and is *not* guaranteed by
the language.  Currently, it works in CPython, but not in Jython.
For further discussion, see the post under {"... weakref behavior" by
Tim Peters in comp.lang.python, 6/11/04}.
-- One other limitation: if there is any possibility of an instance
you are tracking with a weakref being included in a cycle (a group of
objects that reference each other, but have no references from
anything outside the group), then the count can stay too high for a
while.  Cyclic garbage remains in memory until the cyclic garbage
collector gets around to finding it.
'''
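
The cycle limitation in the note can be seen with a small experiment.
This is a hedged sketch for CPython under Python 3; the Node class and
the registry dictionary are hypothetical names used only for this
illustration.  Automatic collection is switched off during the demo so
that the only cyclic collection is the explicit one.

```python
import gc
import weakref


class Node:
    """Hypothetical tracked class whose instances can refer to each other."""

    def __init__(self):
        self.partner = None


registry = weakref.WeakValueDictionary()

gc.disable()  # keep the automatic cyclic collector out of the demo

x, y = Node(), Node()
x.partner, y.partner = y, x          # x and y now form a reference cycle
registry["x"], registry["y"] = x, y

del x, y                             # no outside references remain
count_before = len(registry)         # the cycle keeps both entries alive

gc.collect()                         # the cyclic collector reclaims the pair
count_after = len(registry)

gc.enable()
print(count_before, count_after)
```

On CPython the count is still 2 right after the del, and drops to 0
only once gc.collect() has run.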

You might want to put something similar in the Library Reference.

-- Dave




