[Python-Dev] Single- vs. Multi-pass iterability

Alex Martelli aleax@aleax.it
Wed, 17 Jul 2002 19:46:42 +0200


On Wednesday 17 July 2002 07:38 pm, Guido van Rossum wrote:
	...
> But leaving the file object as an exception to the rule helps as a
> reminder that it's just a rule of thumb and cannot be taken as
> absolute law.

The sublunar world has enough reminders of its imperfections that
we need not strive to add more.


> > Still, it doesn't solve the reference-loop-between-two-deuced-things-
> > that-don't-cooperate-with-gc problem.  And I can't see how either
> > could be made into a WEAK reference given that xreadlines objects
> > in other contexts need to hold a strong ref to the file they work on --
> > we'd have to refactor xreadlines objects too, a core part holding a
> > weak ref and a shell around it (holding a strong ref to the file) to
> > support ordinary calls to xreadlines.xreadlines.  Messy:-(.
>
> I don't think that a weak ref to the file would be sufficient for
> xreadlines -- e.g.
>
>     for line in open(filename):
>         print line,
>
> would close the file right away.

If the iterator were the file itself, no, it wouldn't, whatever kind of
ref the xreadlines object had to the file.
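
As a small illustration of "the iterator is the file itself", here is a
sketch in Python 3, using io.StringIO as a stand-in for a real file (an
assumption for illustration; it is not the patch under discussion).
Because iter(f) returns f, the for loop's reference to the iterator is a
reference to the file, and dropping the name "it" loses no data.

```python
import io

# io.StringIO stands in for a real file object here (illustrative assumption).
f = io.StringIO("a\nb\nc\n")

it = iter(f)
same_object = it is f      # the file acts as its own iterator

first = next(it)           # consumes the first line through the iterator
del it                     # dropping this name does not discard any state
rest = list(f)             # the file continues from where the iterator stopped
```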

What would break without refactoring would be:

    for line in xreadlines.xreadlines(open(filename)):
        ...

The refactoring would be to have, say, an _xreadlines object with
the functionality of today's xreadlines object BUT only a weak ref to
the file, plus an xreadlines object that holds strong refs to both the
file and the _xreadlines object and delegates its functionality to the
latter.  A bit of a mess.
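
The core-plus-shell split described above might look roughly like the
following Python 3 sketch. All names here (LineSource, _XReadlinesCore,
XReadlines) are hypothetical stand-ins, not the real xreadlines module;
LineSource plays the role of the file so that the weak reference
behaviour can be shown without touching the filesystem.

```python
import weakref


class LineSource:
    """Hypothetical stand-in for a file object."""

    def __init__(self, lines):
        self._lines = list(lines)
        self._pos = 0

    def readline(self):
        if self._pos >= len(self._lines):
            return ""
        line = self._lines[self._pos]
        self._pos += 1
        return line


class _XReadlinesCore:
    """Does the real work, but holds only a weak ref to the source."""

    def __init__(self, source):
        self._source_ref = weakref.ref(source)

    def __iter__(self):
        return self

    def __next__(self):
        source = self._source_ref()
        if source is None:
            raise ReferenceError("underlying source no longer exists")
        line = source.readline()
        if not line:
            raise StopIteration
        return line


class XReadlines:
    """Shell: strong refs to the source and the core; delegates to the core."""

    def __init__(self, source):
        self._source = source                  # keeps the source alive
        self._core = _XReadlinesCore(source)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._core)


# The shell keeps a temporary source alive for the whole iteration.
lines = list(XReadlines(LineSource(["one\n", "two\n"])))

# The bare core does not: on CPython the temporary source is reclaimed
# as soon as the constructor call returns, so the first next() fails.
core = _XReadlinesCore(LineSource(["one\n"]))
try:
    next(core)
    core_failed = False
except ReferenceError:
    core_failed = True
```

The immediate failure of the bare core relies on CPython's reference
counting; other implementations may reclaim the source later.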


> Likewise, the file needs a strong ref to the xreadlines, otherwise the

Definitely!  Otherwise nothing keeps the xreadlines (or _xreadlines)
object around _at all_ -- it's even worse than you indicate below, it
seems to me:

> following would create a new iterator in the second for loop, and lose
> data buffered by the first iterator.
>
>     f = open(filename)
>     it = iter(f)

...with the patch it would be "it is f", and so, I don't really get it...

>     for i in range(10):
>         it.next()
>     del it
>     for line in f:
>         print line,
>
> I think I will have to reject Oren's patch because of this, and the
> situation with file iterators will remain as it is: once you've asked
> for the iterator, all operations on the file are unsafe, and the only
> way to get back to using the file is to abandon the file and do an

Abandon the iterator, you mean?  Or am I hopelessly confused?

> absolute seek on the file.  (This is sort of like switching between
> the raw integer file descriptor and the stream object in C -- or in
> Python if you care to use f.fileno() and os.read() etc.)

In these cases you do get some control over the buffering, though,
if you care to exercise it.
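
The hazard in Guido's example (a separate iterator that buffers ahead,
is then abandoned, and takes its buffered lines with it) can be sketched
in Python 3 as follows. ReadAheadLines is a hypothetical iterator written
for this illustration, with a read-ahead of 8 lines chosen arbitrarily;
io.StringIO stands in for the file. Python 3's own file objects do not
behave this way, since they act as their own iterators.

```python
import io


class ReadAheadLines:
    """Hypothetical iterator that reads `chunk` lines ahead of its caller."""

    def __init__(self, f, chunk=8):
        self._f = f
        self._chunk = chunk
        self._buffer = []

    def __iter__(self):
        return self

    def __next__(self):
        if not self._buffer:
            # Refill: pull up to `chunk` lines from the underlying file.
            for _ in range(self._chunk):
                line = self._f.readline()
                if not line:
                    break
                self._buffer.append(line)
            if not self._buffer:
                raise StopIteration
        return self._buffer.pop(0)


f = io.StringIO("".join("line %d\n" % i for i in range(20)))

it = ReadAheadLines(f, chunk=8)
first_ten = [next(it) for _ in range(10)]   # lines 0..9 reach the caller
del it                                      # lines 10..15 were buffered; now lost

after_switch = f.readline()   # the file itself is already past line 15

f.seek(0)                     # an absolute seek restores a known position
after_seek = f.readline()
```

After the switch, the file hands back "line 16" rather than "line 10",
which is the data loss described in the quoted text; only the absolute
seek puts the file back into a predictable state.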


Alex