iterating over the lines of a file - difference between Python 2.7 and 3?

Terry Reedy tjreedy at udel.edu
Thu Jan 17 08:39:52 EST 2013


On 1/17/2013 7:04 AM, Peter Otten wrote:
> Wolfgang Maier wrote:
>
>> I just came across an unexpected behavior in Python 3.3 that has to do
>> with file iterators and their interplay with other file/IO class
>> methods, like readline() and tell(). I had grown used to the fact that
>> it is a bad idea to mix them, because the iterator uses a hidden
>> read-ahead buffer: subsequent calls to readline() or tell() reflected
>> what lay beyond that buffer, not the position right after the line the
>> iterator had just returned.
>>
>> Example:
>>
>> in_file_object = open('some_file', 'rb')
>>
>> for line in in_file_object:
>>     print(line)
>>     if in_file_object.tell() > 300:
>>         # assuming that individual lines are shorter than 300 bytes
>>         break
>>
>> This would print only the first line in Python 2.7, since
>> next(in_file_object) reads ahead beyond position 300 immediately, as
>> evidenced by a subsequent call to in_file_object.tell() (returning 8192
>> on my system).
>>
>> However, under Python 3.3 the same code works: it prints some lines
>> from my file, and after the loop in_file_object.tell() returns a quite
>> reasonable 314 as the current position in the file.
>>
>> I couldn't find this difference anywhere in the documentation. Is the 3.3
>> behavior official, and if so, when was it introduced and how is it
>> implemented? I assume the read-ahead buffer still exists?
>
> You can get the Python 3 behaviour with io.open() in Python 2.7. There is a
> pure-Python implementation in _pyio.py:
>
>      def tell(self):
>          # raw position minus the buffered bytes not yet handed out
>          return (_BufferedIOMixin.tell(self) - len(self._read_buf)
>                  + self._read_pos)
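
To see why the _pyio.py formula yields a sensible position, here is a simplified, hypothetical model. ToyBufferedReader and its raw_tell() method are illustrative names, not part of the io module; only the _read_buf/_read_pos arithmetic mirrors the snippet. raw_tell() stands in for what a thin stdio-style wrapper would report (the read-ahead position), while tell() subtracts the part of the buffer that has not been handed out yet.

```python
class ToyBufferedReader:
    """Hypothetical, simplified model of a buffered reader's tell().

    Not the real io.BufferedReader: it only mirrors the arithmetic of the
    _pyio.py snippet (raw position - len(buffer) + offset into buffer).
    """

    def __init__(self, data, buffer_size=8192):
        self._data = data              # stands in for the file contents
        self._raw_pos = 0              # how far the "raw" stream has read
        self._read_buf = b""           # bytes fetched by the last read-ahead
        self._read_pos = 0             # bytes of _read_buf already returned
        self._buffer_size = buffer_size

    def _fill(self):
        # Read ahead a whole chunk, like a buffered file does.
        chunk = self._data[self._raw_pos:self._raw_pos + self._buffer_size]
        self._raw_pos += len(chunk)
        self._read_buf = chunk
        self._read_pos = 0
        return chunk

    def readline(self):
        line = b""
        while True:
            if self._read_pos >= len(self._read_buf) and not self._fill():
                return line  # EOF: return whatever was collected
            end = self._read_buf.find(b"\n", self._read_pos)
            if end == -1:
                # No newline in the rest of the buffer; take it all, refill.
                line += self._read_buf[self._read_pos:]
                self._read_pos = len(self._read_buf)
            else:
                line += self._read_buf[self._read_pos:end + 1]
                self._read_pos = end + 1
                return line

    def raw_tell(self):
        # What a wrapper that ignores the buffer would report.
        return self._raw_pos

    def tell(self):
        # Logical position: discount the unread tail of the buffer.
        return self._raw_pos - len(self._read_buf) + self._read_pos
```

With the default 8192-byte buffer and a 900-byte input of 9-byte lines, the first readline() moves raw_tell() to 900 (the whole input was read ahead) while tell() reports 9, the same kind of contrast as the 8192 versus 314 figures in the question.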

In 2.7, open returns a file object, which is a thin wrapper around the 
particular (proprietary) C compiler's stdio library. These libraries vary 
because the C standard leaves some behavior implementation-defined, 
because implementers interpret it differently (there was no official test 
suite, at least not originally), and because people make mistakes. The io 
module is intended to bring more uniformity, and it has a test suite 
against which other implementations can check their actual behavior.
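
A minimal sketch of the Python 3 behaviour from the original question, assuming Python 3.5 or later (for bytes %-formatting) and a throwaway file in a temporary directory; read_until() and the sample data are hypothetical. In Python 3 the builtin open is io.open, so opening in "rb" mode gives an io.BufferedReader whose tell() accounts for the read-ahead buffer.

```python
import os
import tempfile


def read_until(path, limit):
    """Iterate over a binary file, stopping once tell() exceeds `limit`.

    Returns the lines read and the position tell() reports afterwards.
    Binary mode on purpose: in text mode, Python 3 disables tell() during
    for-loop iteration (it raises OSError).
    """
    lines = []
    with open(path, "rb") as f:
        for line in f:
            lines.append(line)
            if f.tell() > limit:
                break
        position = f.tell()
    return lines, position


if __name__ == "__main__":
    # Hypothetical sample data: 100 lines of 9 bytes each (b"line 000\n", ...).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "some_file")
        with open(path, "wb") as out:
            for i in range(100):
                out.write(b"line %03d\n" % i)
        lines, position = read_until(path, 300)
        print(len(lines), position)  # 34 306
```

Because tell() reflects the logical position, it always equals the total length of the lines consumed so far, and the loop stops at the first line boundary past 300 bytes.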

-- 
Terry Jan Reedy





More information about the Python-list mailing list