tail

Chris Angelico rosuav at gmail.com
Sat Apr 23 16:58:29 EDT 2022


On Sun, 24 Apr 2022 at 06:41, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
>
> On Sat, 23 Apr 2022 at 20:59, Chris Angelico <rosuav at gmail.com> wrote:
> >
> > On Sun, 24 Apr 2022 at 04:37, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
> > >
> > > What about introducing a method for text streams that reads the lines
> > > from the bottom? Java also has a ReversedLinesFileReader, in Apache
> > > Commons IO.
> >
> > It's fundamentally difficult to do precisely. In general, there are
> > three steps to reading the last N lines of a file:
> >
> > 1) Find out the current size of the file (it may still be growing)
> > 2) Seek to the end of the file, minus some threshold that you hope
> > will contain a number of lines
> > 3) Read from there to the end of the file, split it into lines, and
> > keep the last N
> >
> > Reading the preceding N lines is basically a matter of repeating the
> > same exercise, but instead of "end of the file", use the byte position
> > of the line you last read.
> >
> > The problem is, seeking around in a file is done by bytes, not
> > characters. So if you know for sure that you can resynchronize
> > (possible with UTF-8, not possible with some other encodings), then
> > you can do this, but it's probably best to build it yourself (opening
> > the file in binary mode).
>
> Well, indeed I have an implementation that does more or less what you
> described, for UTF-8 only. The only difference is that I just started
> from the end of the file minus 1. I'm just wondering if this will be useful in
> the stdlib. I think it's not too difficult to generalise for every
> encoding.
>
> > This is quite inefficient in general.
>
> Why inefficient? I think that readlines() will be much slower, not
> only more time consuming.

It depends on which is more costly: reading the whole file (cost
depends on the size of the file) or reading chunks and splitting them
into lines (cost depends on how well you guess the chunk size). If the
lines are all *precisely* the same number of bytes, you can pick a
chunk size and step backwards with near-perfect efficiency. That's
still likely to be less efficient than reading the file forwards on
most file systems, but it'll be close. If you have to guess, adjust,
and keep going, though, you lose efficiency there.
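
For illustration, here is a minimal sketch of the chunked approach
described above. It is not a stdlib API or anyone's actual
implementation. The name tail_lines and its chunk_size parameter are
made up for this example, and it assumes the file is UTF-8 (or another
encoding in which the byte 0x0A only ever means a newline), so splitting
on newlines in binary mode can't cut through a character.

```python
import os

def tail_lines(path, n, chunk_size=4096, encoding="utf-8"):
    """Return the last n lines of a file, reading backwards in chunks.

    Hypothetical helper; assumes an encoding such as UTF-8 where the
    newline byte never occurs inside a multi-byte character.
    """
    if n <= 0:
        return []
    with open(path, "rb") as f:
        # Step 1: find the current size of the file.
        pos = f.seek(0, os.SEEK_END)
        buf = b""
        # Steps 2 and 3: step backwards one chunk at a time. We want more
        # than n newlines in the buffer so that the last n lines are
        # complete even if the buffer starts in the middle of a line.
        while pos > 0 and buf.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
    # Decode only whole lines, never a fragment that starts mid-character.
    return [line.decode(encoding) for line in buf.splitlines()[-n:]]
```

The guessing cost shows up in the loop: a chunk_size that is too small
means many seeks and buffer copies, and one that is too large means
reading far more of the file than the last n lines need.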

I don't think this is necessary in the stdlib. If anything, it might
be good on PyPI, but I for one have literally never wanted this.

ChrisA