Can someone explain this weakref behavior?

Fri Jun 11 18:09:32 EDT 2004

[David MacQuigg, trying to keep track of how many instances of a class
 currently exist]

...

> Seems like we could do this more easily with a function that lists
> instances, like __subclasses__() does with subclasses.  This doesn't have
> to be efficient, just reliable.  So when I call cls.__instances__(), I
> get a current list of all instances in the class.
>
> Maybe we could implement this function using weak references.  If I
> understand the problem with weak references, we could have a
> WeakValueDictionary with references to objects that actually have a
> refcount of zero.

Not in CPython today (and in the presence of cycles, the refcount on an
object isn't related to whether it's garbage).
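
A minimal sketch of the cycle point, assuming CPython with its cyclic
collector switched off for the duration of the test.  The Node class and
the function name are made up for the illustration:

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self):
        self.partner = None


def cycle_survives_until_collected():
    """Return (alive_before_gc, alive_after_gc) for a two-object cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector from running on its own
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a      # a <-> b reference cycle
        probe = weakref.ref(a)
        del a, b                         # unreachable, yet refcounts stay nonzero
        alive_before = probe() is not None
        gc.collect()                     # the cycle detector reclaims the pair
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


print(cycle_survives_until_collected())
```

On CPython the pair should still be visible through the weak reference
until gc.collect() runs, even though nothing can reach it any more.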

> There may be too many entries in the dictionary, but never too few.

Right!

> In that case, maybe I could just loop over every item in
> my WeakValueDictionary, and ignore any with a refcount of zero.
>
>     def _getInstances(cls):
>         d1 = cls.__dict__.get('_instances' , {})
>         d2 = {}
>         for key in d1:
>             if sys.getrefcount(d1[key]) > 0:
>                 d2[key] = d1[key]
>         return d2
>     _getInstances = staticmethod(_getInstances)
>
> I'm making some assumptions here that may not be valid, like
> sys.getrefcount() for a particular object really will be zero immediately
> after all normal references to it are gone. i.e. we don't have any
> temporary "out-of-sync" problems like with the weak references
> themselves.
>
> Does this seem like a safe strategy?

An implementation of Python that doesn't base its garbage collection
strategy on reference counting won't *have* a getrefcount() function, so if
you're trying to guard against Python switching gc strategies, this is a
non-starter (it solves the problem for, and only for, implementations of
Python that don't have the problem to begin with <wink>).

Note that CPython's getrefcount() can't return 0 (see the docs).  Maybe
comparing against 1 would capture your intent.
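
A small sketch of why a "> 0" test can't filter anything, assuming
CPython (the only place this function is expected to exist).  Thing and
observed_refcount are hypothetical names:

```python
import sys


class Thing:
    """Hypothetical placeholder class for the illustration."""


def observed_refcount():
    obj = Thing()
    # To ask about obj at all you must hold a reference to it, so the
    # reported count cannot be 0 for any object you are able to pass in.
    return sys.getrefcount(obj)


print(observed_refcount())
```

The exact number printed varies between CPython versions, so it is not
shown here.  Note too that fetching d1[key] from a weak dict hands back a
strong reference, so any value you managed to fetch is alive by the time
you test it.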

Note this part of the weakref docs:

    Caution: Because a WeakValueDictionary is built on top of a Python
    dictionary, it must not change size when iterating over it. This can be
    difficult to ensure for a WeakValueDictionary because actions performed by
    the program during iteration may cause items in the dictionary to vanish
    "by magic" (as a side effect of garbage collection).

If you have threads too, it can be worse than just that.
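
One hedged way to sidestep the resizing problem in single-threaded code is
to copy the entries into a list first, so that the loop runs over a
snapshot that holds strong references.  Item and names_via_snapshot are
hypothetical names, and this sketch does nothing about threads:

```python
import weakref


class Item:
    """Hypothetical class; instances must support weak references."""

    def __init__(self, name):
        self.name = name


def names_via_snapshot(registry):
    # list() copies the (key, value) pairs up front.  The copy holds strong
    # references, so nothing in it can vanish while the loop below runs.
    snapshot = list(registry.items())
    return sorted(obj.name for _key, obj in snapshot)


keep_alive = [Item("a"), Item("b")]
registry = weakref.WeakValueDictionary()
for item in keep_alive:
    registry[item.name] = item

print(names_via_snapshot(registry))
```

The cost is that every object in the snapshot is kept alive until the
snapshot itself goes away.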

Bottom line:  if you want semantics that depend on the implementation using
refcounts, you can't worm around that.  Refcounts are the only way to know
"right away" when an object has become trash, and even that doesn't work in
the presence of cycles.  Short of that, you can settle for an upper bound on
the number of objects "really still alive" across implementations by using weak
dicts, and you can increase the likely precision of that upper bound by
forcing a run of garbage collection immediately before asking for the
number.  In the absence of cycles, none of that is necessary in CPython
today (or likely ever).

Using a "decrement count in a __del__" approach isn't better:  only a
reference-counting based implementation can guarantee to trigger __del__
methods as soon as an object (not involved in a cycle) becomes unreachable.
Under any other implementation, you'll still just get an upper bound.
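
For comparison, a minimal sketch of the __del__ counting approach, using a
hypothetical class Counted.  The prompt decrement shown here is a property
of CPython's reference counting, not something the language promises:

```python
class Counted:
    """Hypothetical class that keeps a running count of its instances."""

    live = 0

    def __init__(self):
        Counted.live += 1

    def __del__(self):
        # Runs whenever the implementation decides to finalize the object.
        # Only refcounting makes that "as soon as it becomes unreachable".
        Counted.live -= 1


x = Counted()
y = Counted()
del x
print(Counted.live)      # an upper bound on the instances still alive
```

Under an implementation without reference counting, the decrement for x
may happen much later, so the printed value is again only an upper bound.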

Note that all garbage collection methods are approximations to true
lifetimes anyway.  Even refcounting in the absence of cycles:  just because
the refcount on an object is 10 doesn't mean that any of the 10 ways to
reach the object *will* get used again.  An object may in reality be dead as
a doorknob no matter how high its refcount.  Refcounting is a conservative
approximation too (it can call things "live" that will in fact never be used
again, but won't call things "dead" that will in fact be used again).