[Python-Dev] Single- vs. Multi-pass iterability

Just van Rossum just@letterror.com
Thu, 11 Jul 2002 09:44:13 +0200


Oren Tirosh wrote:

> Xreadlines is buffered and therefore leaves the file position of the file 
> in an unexpected state.  If you use xreadlines explicitly you should expect 
> that. The fact that file.__iter__ returns an xreadlines object implicitly is 
> therefore a bit surprising. 
> 
> What's the reason for using xreadlines as a file iterator?  Was it 
> performance or was it just the easiest way to implement it using an existing 
> object?

The rationale was something like "the simplest way to iterate over the lines
in a file should be the fastest". I'd agree with that, but not at the expense of
the surprises mentioned in the bug. It would perhaps help if the file object
cached the xreadlines iterator; that would limit the scope of the problem to
the case where iteration and explicit .read() calls are mixed.
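To make the surprise concrete, here is a minimal sketch of a read-ahead line
iterator. ChunkedLines is a hypothetical stand-in for xreadlines (the real one
is written in C), and chunk_size is an assumed parameter. It pulls data in large
chunks and hands out lines from a private buffer, so the underlying file
position runs ahead of the lines actually consumed:

```python
import io


class ChunkedLines:
    """Hypothetical read-ahead line iterator standing in for xreadlines.

    Reads the file in fixed-size chunks and serves lines from its own
    buffer, so the file position is ahead of the lines handed out.
    """

    def __init__(self, f, chunk_size=8192):
        self.f = f
        self.chunk_size = chunk_size
        self.buf = ""

    def __iter__(self):
        return self

    def __next__(self):
        # Fill the private buffer until it holds a full line or the file is exhausted.
        while "\n" not in self.buf:
            chunk = self.f.read(self.chunk_size)
            if not chunk:
                break
            self.buf += chunk
        if not self.buf:
            raise StopIteration
        idx = self.buf.find("\n")
        if idx == -1:
            # Last line without a trailing newline.
            line, self.buf = self.buf, ""
        else:
            line, self.buf = self.buf[:idx + 1], self.buf[idx + 1:]
        return line


f = io.StringIO("one\ntwo\nthree\n")
it = ChunkedLines(f)
first = next(it)      # "one\n"
rest = f.readline()   # "": the first chunk read already consumed the whole file
leftover = list(it)   # ["two\n", "three\n"] are only reachable via the iterator
```

A file that cached such an iterator would at least make repeated `for` loops
share one buffer, but an explicit readline() or read() would still see the
file position after the whole chunk, not after the last line returned.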

> "Files support the iterator protocol. Each iteration returns the same
> result as file.readline()"
> 
> This is not correct. Files support what I call the iterable protocol. Objects 
> supporting the iterator protocol have a .next() method, files don't. While 
> it's true that each iteration has the same result as readline it doesn't 
> have the same side effects.
> 
> Proposal: make files really support the iterator protocol. __iter__ would
> return self and next() would call readline and raise StopIteration if ''.
> If anyone wants the xreadlines performance improvement it should be explicit.

+1

(But since the bug has been closed as "won't fix", I doubt this has much chance
of happening.)
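For illustration, a minimal sketch of what the proposal amounts to, written as
a hypothetical LineIterator wrapper rather than as a change to the file object
itself. In current Python the iterator method is spelled `__next__`; the alias
`next` mirrors the name used in the proposal. Because every step delegates to
readline(), iteration and explicit readline() calls share a single file
position:

```python
import io


class LineIterator:
    """Hypothetical wrapper implementing the proposal: __iter__ returns self
    and each step calls readline(), raising StopIteration on ''."""

    def __init__(self, f):
        self.f = f

    def __iter__(self):
        return self

    def __next__(self):
        line = self.f.readline()
        if line == "":
            raise StopIteration
        return line

    # Alias under the method name used in the proposal (Python 2 spelling).
    next = __next__


f = io.StringIO("one\ntwo\nthree\n")
lines = LineIterator(f)
first = next(lines)      # "one\n"
second = f.readline()    # "two\n": position is right after the consumed line
remaining = list(lines)  # ["three\n"]
```

The trade-off is the one discussed above: one readline() call per line gives
predictable side effects, while the buffered approach is faster but leaves
the file position somewhere the caller cannot easily predict.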

Just