how to count lines in a file ?

Delaney, Timothy tdelaney at avaya.com
Wed Jul 24 19:18:58 EDT 2002


> From: Gerhard Häring [mailto:gerhard.haering at gmx.de]
> 
> * Delaney, Timothy <tdelaney at avaya.com> [2002-07-25 08:31 +1000]:
> > > From: Bo M. Maryniuck [mailto:b.maryniuk at forbis.lt]
> > > 
> > > print len(open('/etc/passwd').readlines())
> > 
> > There is currently discussion on python-dev of the file object possibly
> > becoming collectable by GC (and hence not going away immediately the last
> > reference you know about disappears).
> 
> :-/
> 
> I thought that Python had a reference-counting garbage collector
> combined with a mark-and-sweep gc?! So that if a refcount goes to zero,
> I can count on the object being collected immediately?
> 
> If this is not true, then some of my own code is buggy, too :-(

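In CPython 2.0+, an object that is not part of a reference cycle is still
released as soon as its refcount reaches zero. Any code which creates cycles,
however, will lead to garbage which is *not* released immediately when the
last reference goes away. The cycle collector (not a separate thread; it runs
periodically, triggered by allocations) breaks those cycles, allowing the
objects to then be freed - but only when it next runs.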
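To make the delay concrete, here is a minimal sketch, assuming a CPython with
the gc and weakref modules available. Node and demo_cycle are names invented
for this illustration only, and automatic collection is switched off so the
timing is under our control:

```python
import gc
import weakref


class Node(object):
    """Plain object used only to build a two-element reference cycle."""


def demo_cycle():
    # Hypothetical helper: reports whether a cyclic object is still alive
    # before and after an explicit run of the cycle collector.
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        a = Node()
        b = Node()
        a.other = b  # a -> b
        b.other = a  # b -> a, closing the cycle
        probe = weakref.ref(a)  # lets us observe a without keeping it alive
        del a, b  # the last references we know about are gone
        alive_before = probe() is not None
        gc.collect()  # explicitly run the cycle collector
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()
```

On CPython I would expect the object to survive the del and to go away only
after gc.collect(); other implementations may behave differently.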

However, not all objects are collectable by the cycle collector - if such
objects end up in a cycle they will just sit around as garbage forever.

A patch was proposed to fix iterator semantics on files, but it would have
resulted in the file object being in a cycle. Currently file objects are not
collectable (since there was no way to put them in a cycle), so that would
have been the worst case: all opened files would sit around forever, never
being closed. It was then proposed that file objects be made collectable.
That would still mean that files would not be closed immediately (or
possibly ever).

Another patch is being worked on because putting a file in a cycle would
kill *lots* of (broken) code written as above. Reading from a file and not
closing it is not as bad as *writing* to a file and not closing it, but the
principle is the same.
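For what it's worth, a version of the one-liner that does not depend on when
the file object is collected might look like the sketch below. count_lines is
just a name picked for this example, and try/finally is only one way of
making sure the file gets closed:

```python
def count_lines(path):
    # Keep our own reference to the file so we can close it ourselves.
    f = open(path)
    try:
        # Same counting approach as the quoted one-liner.
        return len(f.readlines())
    finally:
        # Runs whether or not readlines() raised, so closing the file
        # never depends on refcounting or on the garbage collector.
        f.close()
```

Calling count_lines('/etc/passwd') then gives the same count as the original
one-liner, but the file is closed at a point the code controls.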

The Python developers don't want to break code if they can avoid it.
Personally, I would be in favour of files not being collected immediately, as
it would remove *any* legitimacy from arguments that "this is OK because it's
what the implementation does".

Tim Delaney
