tail

Chris Angelico rosuav at gmail.com
Sat Apr 23 18:21:34 EDT 2022


On Sun, 24 Apr 2022 at 08:18, Cameron Simpson <cs at cskk.id.au> wrote:
>
> On 24Apr2022 07:15, Chris Angelico <rosuav at gmail.com> wrote:
> >On Sun, 24 Apr 2022 at 07:13, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
> >> Emh, why chunks? My function simply reads byte by byte and compares
> >> each one to b"\n". When it finds one, it stops and does a readline():
> [...]
> >> This is only for one line and in utf8, but it can be generalised.
>
> For some encodings that generalisation might be hard. But mostly, yes.
>
> >Ah. Well, then, THAT is why it's inefficient: you're seeking back one
> >single byte at a time, then reading forwards. That is NOT going to
> >play nicely with file systems or buffers.
>
> An approach I think you both may have missed: mmap the file and use
> mmap.rfind(b'\n') to locate line delimiters.
> https://docs.python.org/3/library/mmap.html#mmap.mmap.rfind

Yeah, I made a vague allusion to using mmap, but didn't elaborate
because I actually have zero idea of how efficient it would be.
Would it be functionally equivalent to the chunking, but with the
chunk size chosen by the system as whatever's optimal? It would
need to be tested.
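
For anyone who wants to test it, here is a rough sketch of what the
mmap.rfind() idea might look like. The function name tail_mmap and its
parameters are made up for illustration; it works on raw bytes, assumes
b"\n" line endings, and leaves decoding to the caller. No claims about
its speed.

```python
import mmap


def tail_mmap(path, n=1):
    """Return the last n lines of the file at path as one bytes object."""
    if n <= 0:
        return b""
    with open(path, "rb") as f:
        # An empty file cannot be mapped, so check the size first.
        f.seek(0, 2)
        if f.tell() == 0:
            return b""
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)
            # Skip a single trailing newline so it is not counted
            # as the start of an extra (empty) last line.
            if mm[end - 1:end] == b"\n":
                end -= 1
            pos = end
            for _ in range(n):
                # Search backwards for the previous line delimiter.
                pos = mm.rfind(b"\n", 0, pos)
                if pos == -1:
                    break  # fewer than n lines: return the whole file
            # pos is -1 (start of file) or the index of a newline.
            return mm[pos + 1:]
```

Something like tail_mmap("some.log", 10).decode("utf-8") would then give
the last ten lines as text, assuming the file really is UTF-8.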

I've never used mmap for this kind of job, so it's not something I'm
comfortable predicting the performance of.
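
To have something to compare against in such a test, the chunked
approach could be sketched roughly as below. Again, tail_chunked and
the default chunk_size of 8192 are arbitrary choices for illustration,
not anything posted earlier in the thread, and it assumes b"\n" line
endings.

```python
import os


def tail_chunked(path, n=1, chunk_size=8192):
    """Return the last n lines of the file at path as one bytes object."""
    if n <= 0:
        return b""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Read backwards one chunk at a time. More than n newlines are
        # needed, because the file may end with one.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Ignore one trailing newline, then walk back n line delimiters.
    end = len(data) - 1 if data.endswith(b"\n") else len(data)
    start = end
    for _ in range(n):
        start = data.rfind(b"\n", 0, start)
        if start == -1:
            break  # fewer than n lines: return everything read
    return data[start + 1:]
```

Timing the two on a large file (timeit would do) is the only way I'd
trust to tell which one wins on a given system.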

ChrisA

