how to count lines in a file ?

Tim Peters tim.one at comcast.net
Sat Jul 27 01:04:35 EDT 2002


[Jonathan Hogg]
> Not quite. Any cycle containing an object with a __del__ method will be
> skipped and left to languish in memory-limbo forever.

[Steve Holden]
> You mean the garbage collector doesn't *collect* data that appears in
> cycles? That's not my understanding (which doesn't necessarily mean it's
> dissonant with reality).

The key here is "a __del__ method", meaning an object of a user-defined
class that explicitly defines a method named "__del__".  If a cycle contains
at least one such object, gc will not collect that object, or the cycle it's
in.  Instead the object is placed in a list of unreachable yet uncollectable
trash, which you can access as gc.garbage.  Here's an example:

>>> class A:
...      def __del__(self):
...          self.x.remove()
...      def remove(self):
...          print "oops!"
...
>>> i = A()
>>> j = A()
>>> i.x = j
>>> j.x = i
>>> import gc
>>> gc.garbage
[]
>>> del i, j
>>> gc.collect()
4
>>> gc.garbage
[<__main__.A instance at 0x0065B558>, <__main__.A instance at 0x0065B5A8>]
>>>

The difficulty is that when objects with __del__ methods are in a cycle,
it's impossible to guess a safe order to tear them down:  any object in a
cycle is -- by definition -- reachable from every other object in the cycle,
so no matter which object X Python decides to tear down first, some other
object Y in the cycle may try to use X from Y's own __del__ method.  Random
errors are the best you can hope to get from that, and system crashes the
worst (the latter can be prevented, but only with a silly level of
difficulty).

So Python punts in this case -- if you create such beasts, it's your problem
to tear them down in a safe order (which only you can know).  At least you
can scan gc.garbage for such unreachable beasts, and break the cycles in an
order that you know (or hope) is safe.  Better never to create such things
to begin with, of course.
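
A minimal sketch of that kind of cleanup, for illustration only.  The class
A, the torn_down list and the break_cycles() helper are names made up for
this example, and "set x to None" stands in for whatever order you actually
know to be safe.  On interpreters that finalize such cycles themselves
(CPython 3.4 and later, as far as I know), gc.garbage stays empty and the
loop simply does nothing.

```python
import gc

torn_down = []  # records which finalizers actually ran

class A:
    def __init__(self, name):
        self.name = name
        self.x = None

    def __del__(self):
        torn_down.append(self.name)

def break_cycles():
    # Cut the link that closes each cycle, then drop gc's own references
    # so plain reference counting can finish the job.
    # Note: this empties gc.garbage entirely, not just the A instances.
    for obj in gc.garbage:
        if isinstance(obj, A):
            obj.x = None
    del gc.garbage[:]

i = A("i")
j = A("j")
i.x = j
j.x = i          # i and j now form a cycle
del i, j         # unreachable, but the refcounts never reach zero
gc.collect()     # either collects the cycle or parks it in gc.garbage
break_cycles()   # a no-op if gc.garbage is already empty
```

Either way both finalizers have run by the time break_cycles() returns; the
order they ran in is the part only you can decide.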

> ...
> >>> a = [1,2,3,4]
> >>> a.append(a)
> >>> a
> [1, 2, 3, 4, [...]]
> >>>
>
> Notice how the interpreter cleverly avoids an infinite recursion in the
> repr() of that list. The fact that it contains references to itself need
> not stop its collection, and that is precisely why [I understand] the
> new GC scheme was introduced to supplement reference counting.

Right on all counts.  That isn't an instance of an object with a __del__
method, and it's only __del__ methods that inhibit cyclic gc.
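
A small sketch of that point, assuming CPython's gc module: the
self-referential list can never be freed by reference counting alone, but
the cyclic collector finds it.  The variable name "found" is just for
illustration.

```python
import gc

gc.disable()          # keep automatic collections out of the way
gc.collect()          # flush anything already pending
a = [1, 2, 3, 4]
a.append(a)           # the list now refers to itself
del a                 # unreachable, yet its refcount is still 1
found = gc.collect()  # number of unreachable objects the collector found
gc.enable()
```

Here gc.collect() reports at least one unreachable object (the list), and
nothing is left behind in gc.garbage on its account.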

> ...
> So, what you *seem* to be saying is that it would be perfectly
> possible to collect objects that don't have __del__() methods, but
> once they acquire such they are no longer collectible even when only
> cyclic references exist? That seems an odd assertion.

Nevertheless, a true one <wink>.  Objects with finalizers in cycles are a
problem for all automatic memory recycling schemes.  Python's approach here
is an instance of refusing to guess in the face of ambiguity.  Other
approaches that have been implemented in other languages/systems include
running finalizers in an arbitrary order (and hoping things don't blow up);
running finalizers in order of object creation, or the reverse of that order
(and hoping the user has a good enough handle on that so they don't write
code that blows up); simply letting the objects leak; reclaiming the memory
but not running the finalizers at all; defining incomprehensible schemes
where an object's finalizer is run at most once no matter how often the
object may resurrect itself, or be resurrected by other objects from the
cycle(s) it's in (that's a technical hack Java uses to ensure that Java
itself doesn't die with NULL-pointer errors (etc) after running a finalizer
in an unreachable cycle:  note that if there are two objects A and B in a
cycle with __del__ methods, and if A is torn down first, B's __del__ method
may resurrect A (e.g., make A reachable from a global again), and that's a
recipe for system disasters; Java is careful to avoid that, via a mass of
technical rules that only make sense to implementers); assorted registration
schemes for explicitly informing the system of the order you want cycles
torn down; and severely limiting what kind of code *can* appear in
user-defined finalizers so as to prevent any possibility of a finalizer doing
harm to the system.  They all suck.  Python's sucks the least <wink>.
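
For anyone who hasn't seen resurrection in action, here is a minimal sketch
(no cycle needed).  The class name Phoenix and the global list survivors are
invented for the example, and it assumes CPython, where dropping the last
reference runs __del__ right away.

```python
survivors = []  # a global that the finalizer can reach

class Phoenix:
    def __del__(self):
        # Make the dying object reachable from a global again.
        survivors.append(self)

p = Phoenix()
del p  # refcount hits zero, __del__ runs, and the object comes back
```

After the del, the object is alive again inside survivors.  Now picture the
same trick played by one object in a cycle on a neighbor that has already
been torn down.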




