tail

Cameron Simpson cs at cskk.id.au
Mon May 9 01:14:16 EDT 2022


On 08May2022 22:48, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
>On Sun, 8 May 2022 at 22:34, Barry <barry at barrys-emacs.org> wrote:
>> >> In text mode you can only seek to a value returned from f.tell() 
>> >> otherwise the behaviour is undefined.
>> >
>> > Why? I don't see any recommendation about it in the docs:
>> > https://docs.python.org/3/library/io.html#io.IOBase.seek
>>
>> What does adding 1 to a pos mean?
>> If it’s binary it means 1 byte further down the file but in text mode it may need to
>> move the pointer 1, 2 or 3 bytes down the file.
>
>Emh. I re-quote
>
>seek(offset, whence=SEEK_SET)
>Change the stream position to the given byte offset.
>
>And so on. No mention of differences between text and binary mode.

You're looking at IOBase, the _binary_ basis of low level common file 
I/O. Compare with: https://docs.python.org/3/library/io.html#io.TextIOBase.seek
The positions are "opaque numbers", which means you should not ascribe 
any deeper meaning to them except that they represent a point in the 
file. It clearly says "offset must either be a number returned by 
TextIOBase.tell(), or zero. Any other offset value produces undefined 
behaviour."
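
As a minimal sketch of what the docs permit (the file name and contents 
here are made up for illustration), the only position handed to seek() 
is one that tell() returned earlier:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical sample file containing multi-byte UTF-8 characters.
    path = os.path.join(tmpdir, "sample.txt")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("naïve\ncafé\n")

    with open(path, encoding="utf-8", newline="") as f:
        first = f.readline()
        pos = f.tell()        # opaque cookie: remember it, do no arithmetic on it
        second = f.readline()
        f.seek(pos)           # fine: pos came from tell() on this same stream
        again = f.readline()

print(repr(second), repr(again))
```

Something like f.seek(pos + 1) is exactly the case the docs call 
undefined, so the sketch never does that.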

The point here is that text is a very different thing, because you 
cannot seek to an absolute number of characters in an encoding with 
variable sized characters. _If_ you did a seek to an arbitrary number 
you could end up in the middle of some character. And there are encodings 
where you cannot inspect the data to find a character boundary in the 
byte stream.
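
To illustrate landing in the middle of a character, here is a small 
sketch using UTF-8 (chosen only as an example encoding; the sample 
string is made up). Slicing the bytes at an arbitrary offset stands in 
for an arbitrary seek in the underlying file:

```python
data = "héllo".encode("utf-8")   # 'é' is two bytes: 0xC3 0xA9

on_boundary = data[1:]   # starts at the first byte of 'é'
mid_char = data[2:]      # starts at the second byte of 'é'

decoded = on_boundary.decode("utf-8")

try:
    mid_char.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    # The decoder sees a continuation byte where a character should start.
    decoded_ok = False

print(len(data), repr(decoded), decoded_ok)
```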

Reading text files backwards is not a well defined thing without 
additional criteria:
- knowing the text file actually ended on a character boundary
- knowing how to find a character boundary
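
For UTF-8 specifically, both criteria can be met, because continuation 
bytes always look like 0b10xxxxxx and so a character boundary can be 
found by inspection. A rough sketch, assuming the data is valid UTF-8 
and ends on a character boundary (decode_from() is a hypothetical 
helper name, not anything from the standard library):

```python
def decode_from(data: bytes, start: int) -> str:
    """Decode data[start:] as UTF-8, first moving start forward past
    any continuation bytes so that it sits on a character boundary."""
    # In UTF-8 every continuation byte has its top two bits set to 10.
    while start < len(data) and (data[start] & 0xC0) == 0x80:
        start += 1
    return data[start:].decode("utf-8")


data = "héllo wörld".encode("utf-8")
print(repr(decode_from(data, 9)))   # offset 9 is the second byte of 'ö'
print(repr(decode_from(data, 8)))   # offset 8 is the first byte of 'ö'
```

This does not generalise: it leans on a property of UTF-8, and in 
encodings without that property the byte stream alone does not tell 
you where a character starts.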

Cheers,
Cameron Simpson <cs at cskk.id.au>

