tail

Chris Angelico rosuav at gmail.com
Sat Apr 23 17:15:58 EDT 2022


On Sun, 24 Apr 2022 at 07:13, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
>
> On Sat, 23 Apr 2022 at 23:00, Chris Angelico <rosuav at gmail.com> wrote:
> > > > This is quite inefficient in general.
> > >
> > > Why inefficient? I think that readlines() will be much slower, not
> > > only more time consuming.
> >
> > It depends on which is more costly: reading the whole file (cost
> > depends on size of file) or reading chunks and splitting into lines
> > (cost depends on how well you guess at chunk size). If the lines are
> > all *precisely* the same number of bytes each, you can pick a chunk
> > size and step backwards with near-perfect efficiency (it's still
> > likely to be less efficient than reading a file forwards, on most file
> > systems, but it'll be close); but if you have to guess, adjust, and
> > keep going, then you lose efficiency there.
>
> Emh, why chunks? My function simply reads byte by byte and compares each one to b"\n". When it finds one, it stops and does a readline():
>
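> import os
>
>
> def tail(filepath):
>     """
>     Return the last line of a file as bytes.
>
>     @author Marco Sulla
>     @date May 31, 2016
>     """
>
>     # Accept either a path string or a pathlib.Path-like object.
>     try:
>         filepath.is_file
>         fp = str(filepath)
>     except AttributeError:
>         fp = filepath
>
>     with open(fp, "rb") as f:
>         size = os.stat(fp).st_size
>         start_pos = 0 if size - 1 < 0 else size - 1
>
>         if start_pos != 0:
>             f.seek(start_pos)
>             char = f.read(1)
>
>             # Ignore a trailing newline at the end of the file.
>             if char == b"\n":
>                 start_pos -= 1
>                 f.seek(start_pos)
>
>             if start_pos == 0:
>                 f.seek(start_pos)
>             else:
>                 # Walk backwards one byte at a time to the previous newline.
>                 for pos in range(start_pos, -1, -1):
>                     f.seek(pos)
>
>                     char = f.read(1)
>
>                     if char == b"\n":
>                         break
>                 else:
>                     # No newline found: the file is a single line, so
>                     # rewind past the byte that was just read.
>                     f.seek(0)
>
>         return f.readline()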
>
> This is only for one line and in utf8, but it can be generalised.
>

Ah. Well, then, THAT is why it's inefficient: you're seeking back one
single byte at a time, then reading forwards. That is NOT going to
play nicely with file systems or buffers.

Compare reading line by line over the file with readlines() and you'll
see how abysmal this is.
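
A rough way to run that comparison is sketched below. The function names, the sample file layout, and the repeat count are all made up for illustration, and the byte-at-a-time function is a compact stand-in for the quoted tail(), not a copy of it. No timings are claimed here, since the result will depend on file size, line length, and the file system.

```python
import os
import tempfile
import timeit


def last_line_readlines(path):
    """Read the whole file forwards and keep only the final line."""
    with open(path, "rb") as f:
        lines = f.readlines()
    return lines[-1] if lines else b""


def last_line_bytewise(path):
    """Seek backwards one byte at a time until a newline is found."""
    with open(path, "rb") as f:
        pos = os.stat(path).st_size - 1
        if pos < 0:
            return b""
        # Skip a trailing newline so the search finds the one before it.
        f.seek(pos)
        if f.read(1) == b"\n":
            pos -= 1
        while pos >= 0:
            f.seek(pos)
            if f.read(1) == b"\n":
                break
            pos -= 1
        # pos is the index of the newline found, or -1 if there was none.
        f.seek(pos + 1)
        return f.readline()


if __name__ == "__main__":
    # Hypothetical sample data: 20,000 lines of varying length.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for i in range(20_000):
            tmp.write(b"line %d %s\n" % (i, b"x" * (i % 200)))
    try:
        for func in (last_line_readlines, last_line_bytewise):
            seconds = timeit.timeit(lambda: func(tmp.name), number=20)
            print(f"{func.__name__}: {seconds:.4f}s for 20 calls")
    finally:
        os.unlink(tmp.name)
```

Try it with a few different file sizes and last-line lengths before drawing conclusions from a single run.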

If you really only need one line (which isn't what your original post
suggested), I would recommend starting with a chunk that is likely to
include a full line, and expanding the chunk until you have that
newline. Much more efficient than one byte at a time.
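
Something along these lines, as a minimal sketch rather than a tuned implementation. The name last_line, the 4096-byte starting chunk, and the doubling strategy are my assumptions for illustration; pick a starting size that suits your typical line length.

```python
import os


def last_line(path, chunk_size=4096):
    """Return the last line of a file as bytes, reading from the end in chunks."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        while True:
            # Read the final chunk_size bytes (or the whole file if smaller).
            start = max(0, size - chunk_size)
            f.seek(start)
            data = f.read(size - start)
            # Ignore one trailing newline so the search finds the newline
            # that precedes the last line, not the one that ends it.
            body = data[:-1] if data.endswith(b"\n") else data
            idx = body.rfind(b"\n")
            if idx != -1 or start == 0:
                # Either a newline was found, or the whole file has been
                # read (idx == -1 then yields the entire content).
                return data[idx + 1:]
            # Guessed too small: expand the chunk and try again.
            chunk_size *= 2
```

Doubling keeps the number of seeks small even when the first guess is far too low, at the cost of re-reading the bytes already seen.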

ChrisA


More information about the Python-list mailing list