tail

Chris Angelico rosuav at gmail.com
Sat Apr 23 14:57:20 EDT 2022


On Sun, 24 Apr 2022 at 04:37, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
>
> What about introducing a method for text streams that reads the lines
> from the bottom? Java also has a ReversedLinesFileReader in Apache
> Commons IO.

It's fundamentally difficult to do this precisely. In general, there are
three steps to reading the last N lines of a file:

1) Find out the size of the file (its current size, if it's still
growing)
2) Seek to the end of the file, minus some threshold that you hope
will contain at least N lines
3) Read from there to the end of the file, split it into lines, and
keep the last N
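
A minimal sketch of those three steps, working in binary mode. The
name tail_lines and the chunk_size parameter are made up for
illustration, and it assumes UTF-8 text with "\n" line endings. Rather
than guessing one threshold, it keeps stepping further back until the
buffer holds enough newlines:

```python
import os

def tail_lines(path, n, chunk_size=4096, encoding="utf-8"):
    """Return the last n lines of the file at path as a list of str."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        # Step 1: find the current size of the file.
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Step 2: step back one chunk at a time until the buffer holds
        # more than n newlines (so the oldest line we keep is complete)
        # or we reach the start of the file.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Step 3: split into lines and keep the last n. Decoding happens
    # only after splitting, so a chunk boundary in the middle of a
    # character does no harm.
    return [line.decode(encoding) for line in data.splitlines()[-n:]]
```

Something like tail_lines("app.log", 10) would then give the last ten
lines, where "app.log" is just a placeholder file name.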

Reading the preceding N lines is basically a matter of repeating the
same exercise, but instead of "end of the file", use the byte position
of the line you last read.

The problem is, seeking around in a file is done by bytes, not
characters, so a seek can land in the middle of a multi-byte character.
If you know for sure that you can resynchronize to a character boundary
(possible with UTF-8, not possible with some other encodings), then
you can do this, but it's probably best to build it yourself (opening
the file in binary mode).
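
Here is a rough sketch of the build-it-yourself version, assuming
UTF-8 and "\n" line endings. The names reversed_lines and chunk_size
are hypothetical. It relies on the fact that the byte 0x0A never
occurs inside a multi-byte UTF-8 sequence, so splitting the raw bytes
on b"\n" always lands on character boundaries; that is not true of,
say, UTF-16.

```python
import os

def reversed_lines(path, chunk_size=4096, encoding="utf-8"):
    """Yield the lines of a file from last to first, without newlines."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""        # bytes of a line whose start is not yet known
        at_end = True    # True until the first chunk has been read
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
            if at_end:
                # A final newline terminates the last line; it does not
                # start an extra empty one.
                if buf.endswith(b"\n"):
                    buf = buf[:-1]
                at_end = False
            pieces = buf.split(b"\n")
            # pieces[0] may be cut off at the chunk boundary, so carry
            # it over to the next round instead of yielding it.
            buf = pieces[0]
            for piece in reversed(pieces[1:]):
                yield piece.decode(encoding)
        if not at_end:
            # Whatever is left is the first line of the file.
            yield buf.decode(encoding)
```

Because it is a generator, stopping after a few lines reads only the
tail of the file. One known weakness of this sketch: a single very
long line makes buf grow through repeated concatenation.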

This is quite inefficient in general. It would be far FAR easier to do
this instead:

1) Read the entire file and decode bytes to text
2) Split into lines
3) Iterate backwards over the lines

Tada! Done. And in Python, quite easy. The downside, of course, is
that you have to store the entire file in memory.
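
For comparison, the easy version might look like this. The function
name lines_backwards is made up, and the UTF-8 default is an
assumption:

```python
def lines_backwards(path, encoding="utf-8"):
    """Yield the lines of a text file from last to first."""
    with open(path, encoding=encoding) as f:
        # Steps 1 and 2: read everything, decode it, split into lines.
        lines = [line.rstrip("\n") for line in f]
    # Step 3: iterate backwards over the lines.
    yield from reversed(lines)
```

The whole file sits in the lines list for as long as the generator is
alive, which is exactly the memory price.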

So it's up to you: pay the memory price, or pay the complexity price.

Personally, unless the file is tremendously large and I know for sure
that I'm not going to end up iterating over it all, I would pay the
memory price.

ChrisA

