tail

Peter J. Holzer hjp-python at hjp.at
Sat Apr 23 18:02:29 EDT 2022


On 2022-04-24 04:57:20 +1000, Chris Angelico wrote:
> On Sun, 24 Apr 2022 at 04:37, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
> > What about introducing a method for text streams that reads the lines
> > from the bottom? Java has also a ReversedLinesFileReader with Apache
> > Commons IO.
> 
> It's fundamentally difficult to get precise. In general, there are
> three steps to reading the last N lines of a file:
> 
> 1) Find out the size of the file (currently, if it's being grown)
> 2) Seek to the end of the file, minus some threshold that you hope
> will contain a number of lines
> 3) Read from there to the end of the file, split it into lines, and
> keep the last N
[...]
> This is quite inefficient in general. It would be far FAR easier to do
> this instead:
> 
> 1) Read the entire file and decode bytes to text
> 2) Split into lines
> 3) Iterate backwards over the lines

Which one is more efficient depends very much on the size of the file.
For a file of a few kilobytes, the second approach is probably more
efficient. For a file of a few gigabytes, it almost certainly isn't.
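The first approach Chris outlines can be sketched as follows. It doubles the threshold whenever the tail read holds fewer than N complete lines. This is a minimal sketch, not a library API: `tail_lines` and its `block_size` parameter are hypothetical. It assumes UTF-8 text, where the newline byte never occurs inside a multibyte character, so splitting the raw bytes before decoding is safe.

```python
import os

def tail_lines(path, n, block_size=4096):
    """Return the last n lines of a file, reading only the end of it.

    Hypothetical helper (assumes UTF-8 text): seek to size - threshold,
    read to the end, and double the threshold until n complete lines
    have been found or the whole file has been read.
    """
    if n <= 0:
        return []
    with open(path, "rb") as f:
        # Step 1: size of the file right now; data appended later is ignored.
        size = f.seek(0, os.SEEK_END)
        threshold = block_size
        while True:
            # Step 2: seek to the end minus a threshold we hope is big enough.
            start = max(0, size - threshold)
            f.seek(start)
            # Step 3: read to the recorded end and split into lines.
            lines = f.read(size - start).splitlines()
            # If we started mid-file, lines[0] may be partial, so we need
            # n + 1 lines to be sure the last n are complete.
            if start == 0 or len(lines) > n:
                break
            threshold *= 2
    return [line.decode("utf-8") for line in lines[-n:]]
```

With small files the loop runs once or twice; with large files it reads only a few blocks near the end instead of the whole file.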

> Tada! Done. And in Python, quite easy. The downside, of course, is
> that you have to store the entire file in memory.

Not just memory: you have to read the whole file in the first place,
which is hardly efficient if you only need a tiny fraction of it.
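The usual streaming idiom illustrates the point. A `collections.deque` with `maxlen` bounds memory to N lines, but the loop still reads and decodes every byte of the file. A sketch, assuming UTF-8 text; `tail_by_scanning` is a hypothetical name.

```python
from collections import deque

def tail_by_scanning(path, n, encoding="utf-8"):
    """Return the last n lines, keeping at most n lines in memory.

    The whole file is still read and decoded, so the I/O cost grows
    with the file size even though memory use does not.
    """
    if n <= 0:
        return []
    with open(path, encoding=encoding) as f:
        # A deque with maxlen discards the oldest line as each new one arrives.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```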

> Personally, unless the file is tremendously large and I know for sure
> that I'm not going to end up iterating over it all, I would pay the
> memory price.

Me, too. The problem with a library function (as Marco proposes) is that
you don't know how it will be used.
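One way a library function could avoid guessing how it will be used is to be lazy. It would yield lines from the end one at a time, reading fixed-size blocks backward only as the caller asks for more. A caller who wants three lines reads only the last block or two, while a caller who iterates over everything pays roughly the cost of reading the whole file. This is a hypothetical sketch (`reverse_lines` is not a standard-library function). It assumes `\n` line endings and UTF-8 or another encoding in which the byte `\n` never appears inside a multibyte character.

```python
import os

def reverse_lines(path, block_size=8192, encoding="utf-8"):
    """Yield the lines of a file from last to first, reading blocks backward.

    Hypothetical helper: splits on b"\\n" only, so "\\r\\n" files would
    keep a trailing "\\r" on each line.
    """
    with open(path, "rb") as f:
        size = pos = f.seek(0, os.SEEK_END)
        remainder = b""  # start of a line whose beginning lies further back
        first = True
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            chunk = f.read(step) + remainder
            if first:
                first = False
                # A final newline ends the last line; it does not start a new one.
                if chunk.endswith(b"\n"):
                    chunk = chunk[:-1]
            lines = chunk.split(b"\n")
            # lines[0] may continue into the previous (earlier) block.
            remainder = lines.pop(0)
            for line in reversed(lines):
                yield line.decode(encoding)
        if size > 0:
            yield remainder.decode(encoding)  # the file's first line
```

For example, `list(itertools.islice(reverse_lines(path), 10))` stops after the last ten lines without touching the rest of the file.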

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"