[Python-Dev] lifting of prohibition against readlines inside a "for line in file" in Py3?

Thu Feb 19 22:41:03 CET 2009

On Wed, 18 Feb 2009 at 20:31, Guido van Rossum wrote:
> On Wed, Feb 18, 2009 at 6:38 PM,  <rdmurray at bitdance.com> wrote:
>> On Wed, 18 Feb 2009 at 21:25, Antoine Pitrou wrote:
>>>
>>> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>>>>
>>>> I *think* the 2.x system had an internal buffer that was used by the
>>>> file iterator, but not by the file methods. With the new IO stack for
>>>> 3.0, there is now a common buffer shared by all the file operations
>>>> (including iteration).
>>>>
>>>> However, given that the lifting of the restriction is currently
>>>> undocumented, I wouldn't want to see a commitment to keeping it lifted
>>>> until we know that it won't cause any problems for the io-in-c rewrite
>>>> for 3.1 (hopefully someone with more direct involvement with that
>>>> rewrite will chime in, since they'll know a lot more about it than I do).
>>>
>>> As you said, there is no special buffering for the file iterator in 3.x,
>>> which means the restriction could be lifted (actually there is nothing
>>> relying on this restriction in the current code, except perhaps the
>>> "telling" flag in TextIOWrapper).
>>
>> Currently I have Python (2.x) code that uses 'readline' instead of 'for
>> x in myfile' in order to avoid the 'for' buffering (i.e., to be presented
>> with the next line as soon as it is written to the other end of a pipe,
>> instead of waiting for the buffer to fill).  Does "no special buffering"
>> mean that 'for' doesn't use a read-ahead buffer in p3k, or that readline
>> does use such a buffer?  The latter could make my code break
>> unexpectedly when porting to p3k.
>
> Have a look at the code in io.py (class TextIOWrapper):
>
> http://svn.python.org/view/python/branches/py3k/Lib/io.py?view=log
>
> I believe it doesn't force reading ahead more than necessary. If a
> single low-level read() returns enough data to satisfy the __next__()
> or readline() (or it can be satisfied from data already buffered) then
> it won't force reading more.

Hmm.  I'm not sure I'm reading the code right, but it looks from the
docstrings like TextIOWrapper expects to read from a BufferedIOBase
object, whose docstring contains this comment:

         If the argument is positive, and the underlying raw stream is
         not 'interactive', multiple raw reads may be issued to satisfy
         the byte count (unless EOF is reached first).  But for
         interactive raw streams (XXX and for pipes?), at most one raw
         read will be issued, and a short result does not imply that
         EOF is imminent.

Since the 'pipe' comment is an XXX, it is not clear that my use case
is covered.  However, the actual implementation of readinto seems to
only call 'read' once, so as long as the 'read' of the subclass returns
whatever bytes are available, then it looks good to me :)

Since TextIOWrapper is careful to call 'read1' on the wrapped buffer
object, and the one place that 'read1' has a docstring clearly indicates
that it does at most one read and returns whatever data is ready, it
seems that the _intent_ of the code is as you expressed.
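
To make the pipe case concrete, here is a minimal sketch, assuming a
Python 3 interpreter and an os.pipe() standing in for the real producer
process (the variable names are made up for illustration).  It only
shows what one would expect if 'read1' behaves as its docstring says;
it is not a statement of what the io library guarantees.

```python
import io
import os

# Stand-in for a producer at the other end of a pipe: it has written one
# short line and then gone quiet, leaving the write end open.
r_fd, w_fd = os.pipe()
os.write(w_fd, b"first line\n")

# os.fdopen() in binary mode with default buffering gives a BufferedReader.
reader = os.fdopen(r_fd, "rb")

# read1() issues at most one raw read and returns whatever is ready,
# even though far fewer than 4096 bytes are available.
chunk = reader.read1(4096)

# The producer writes another line; wrap the same buffer in a
# TextIOWrapper and iterate.  The line is available without waiting
# for a read-ahead buffer to fill.
os.write(w_fd, b"second line\n")
text = io.TextIOWrapper(reader, encoding="ascii")
line = next(text)

text.close()      # also closes the underlying reader and r_fd
os.close(w_fd)
```

If iteration forced a full read-ahead, the next() call would block here,
because the write end of the pipe is still open and no more data is
coming.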

I'm a Python programmer first, and my C is pretty rusty, so I'm not
sure if I'm up to looking through the new C code to see how this got
translated.  I think that both my use case (where 'for' should now work
for me) and the OP's reflect the way the code is intended to work, but
documenting this seems like a good idea.
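
For reference, the OP's case (calling readline() inside a 'for line in
file' loop) can be sketched as below.  This assumes Python 3 and uses an
in-memory BytesIO wrapped in a TextIOWrapper in place of a real file;
the data and names are invented for the example, and the behavior shown
is what the shared buffer implies rather than anything documented.

```python
import io

# In-memory stand-in for a text file with a header line and two body lines.
raw = io.BytesIO(b"header\nbody 1\nbody 2\n")
f = io.TextIOWrapper(raw, encoding="ascii")

seen = []
for line in f:
    seen.append(line)
    if line == "header\n":
        # Mixing readline() into the iteration: with one buffer shared
        # by __next__() and readline(), this picks up the very next line.
        seen.append(f.readline())

f.close()
```

With the shared buffer no line is skipped or repeated, so 'seen' ends up
holding the three lines in file order.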

Since the OP doesn't seem to have opened a ticket, I did so:
http://bugs.python.org/issue5323.  As I said there, I'm willing to work
on doc and test patches if this is the behavior the io library is required
to have in 3.x.

--RDM