tail

Sat Apr 23 16:41:18 EDT 2022

On Sat, 23 Apr 2022 at 20:59, Chris Angelico <rosuav at gmail.com> wrote:
>
> On Sun, 24 Apr 2022 at 04:37, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
> >
> > What about introducing a method for text streams that reads the lines
> > from the bottom? Java has also a ReversedLinesFileReader with Apache
> > Commons IO.
>
> It's fundamentally difficult to get precise. In general, there are
> three steps to reading the last N lines of a file:
>
> 1) Find out the size of the file (currently, if it's being grown)
> 2) Seek to the end of the file, minus some threshold that you hope
> will contain a number of lines
> 3) Read from there to the end of the file, split it into lines, and
> keep the last N
>
> Reading the preceding N lines is basically a matter of repeating the
> same exercise, but instead of "end of the file", use the byte position
> of the line you last read.
>
> The problem is, seeking around in a file is done by bytes, not
> characters. So if you know for sure that you can resynchronize
> (possible with UTF-8, not possible with some other encodings), then
> you can do this, but it's probably best to build it yourself (opening
> the file in binary mode).

Well, indeed I have an implementation that does more or less what you
described for utf8 only. The only difference is that I just started
from the end of file -1. I'm just wondering if this will be useful in
the stdlib. I think it's not too difficult to generalise for every
encoding.

> This is quite inefficient in general.

Why inefficient? I think that readlines() will be much slower, not
only more time consuming.