tail

Cameron Simpson cs at cskk.id.au
Sat Apr 23 18:09:44 EDT 2022


On 24Apr2022 07:15, Chris Angelico <rosuav at gmail.com> wrote:
>On Sun, 24 Apr 2022 at 07:13, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
>> Emh, why chunks? My function simply reads byte per byte and compares 
>> it to b"\n". When it finds it, it stops and does a readline():
[...]
>> This is only for one line and in utf8, but it can be generalised.

For some encodings that generalisation might be hard. But mostly, yes.

>Ah. Well, then, THAT is why it's inefficient: you're seeking back one
>single byte at a time, then reading forwards. That is NOT going to
>play nicely with file systems or buffers.

An approach I think you both may have missed: mmap the file and use 
mmap.rfind(b'\n') to locate line delimiters.
https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind

This avoids sucking the whole file into memory in the usual sense; instead 
the file is paged in as needed. Far more efficient than a seek/read 
single byte approach.
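
A minimal sketch of the idea, not tested against your code. The function
name tail_lines and its count parameter are made up for illustration; the
only library calls are mmap.mmap() and mmap.rfind(). It assumes b"\n" line
endings and an encoding where that byte cannot occur inside another
character (true of UTF-8).

```python
import mmap
import os


def tail_lines(path, count=1):
    """Return the last `count` lines of `path` as a list of bytes objects.

    Hypothetical helper: assumes lines end with b"\n".
    """
    # mmap cannot map an empty file, so handle that case up front.
    if count <= 0 or os.path.getsize(path) == 0:
        return []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)
            # Ignore a single trailing newline so it does not count as a line.
            if mm[end - 1:end] == b"\n":
                end -= 1
            pos = end
            for _ in range(count):
                # Search backwards for the previous delimiter before `pos`.
                pos = mm.rfind(b"\n", 0, pos)
                if pos == -1:
                    break  # fewer than `count` lines in the file
            start = pos + 1  # 0 when no delimiter was found
            return mm[start:end].split(b"\n")
```

Only the pages that rfind() and the final slice touch get read from disk,
so for a large file this looks at the tail end only.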

If the file's growing you can do this to start with, then do a normal 
file open from your end point to follow accruing text. (Or reuse the 
descriptor you used for the mmap, but using os.read().)
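
A rough sketch of the "normal file open from your end point" part, under
some assumptions: the name follow and the parameters poll_interval and
max_idle_polls are invented here, and polling with time.sleep() is just one
simple way to wait for new data. It does not handle the file being
truncated or rotated.

```python
import time


def follow(path, offset, poll_interval=0.5, max_idle_polls=None):
    """Yield chunks of bytes appended to `path` after byte `offset`.

    Hypothetical helper: `offset` is the end point found earlier, for
    example the size of the file at the time it was mmapped.
    """
    idle = 0
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            chunk = f.read(65536)
            if chunk:
                idle = 0
                yield chunk
                continue
            # Nothing new yet: give up after max_idle_polls empty reads,
            # or keep polling forever when max_idle_polls is None.
            idle += 1
            if max_idle_polls is not None and idle >= max_idle_polls:
                return
            time.sleep(poll_interval)
```

With the default max_idle_polls=None this behaves like "tail -f" and never
returns; pass a small number if you want it to stop once the file goes
quiet.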

Cheers,
Cameron Simpson <cs at cskk.id.au>

