tail

Marco Sulla Marco.Sulla.Python at gmail.com
Thu May 19 13:50:16 EDT 2022


On Wed, 18 May 2022 at 23:32, Cameron Simpson <cs at cskk.id.au> wrote:
>
> On 17May2022 22:45, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
> >Well, I've done a benchmark.
> >>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, number=100000)
> >1.5963431186974049
> >>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, number=100000)
> >2.5240604374557734
> >>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", globals={"tail":tail}, number=100000)
> >1.8944984432309866
>
> This suggests that the file size does not dominate your runtime.

Yes, this is what I wanted to test and it seems good.

> Ah.
> _Or_ that there are similar numbers of newlines vs text in the files so
> reading similar amounts of data from the end. If the "line density" of
> the files were similar you would hope that the runtimes would be
> similar.

No, well, small.txt has very short lines. lorem.txt is a lorem ipsum,
so it has really long lines. Indeed, I get better results by tuning
chunk_size. Anyway, even with the default value the performance is not
bad at all.
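
For readers without the earlier messages: the code below is only a
minimal sketch of a chunked tail(), not necessarily the implementation
benchmarked above. The chunk_size parameter comes from this thread; the
n parameter, the default values and the "\n"-only line endings are
assumptions made for illustration. It reads backwards from the end of
the file until it has seen more than n newlines, which is why line
length and chunk_size interact.

```python
import os
import tempfile


def tail(path, n=10, chunk_size=100):
    """Return the last n lines of the file at path, as bytes.

    Hypothetical sketch: assumes "\n" line endings; n and the default
    values are illustrative, not taken from the thread.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Walk backwards one chunk at a time. Once data holds more than
        # n newlines, the last n lines are guaranteed to be complete.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    return b"\n".join(data.splitlines()[-n:])


# Small demo on a throwaway file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"one\ntwo\nthree\nfour\n")
demo_path = tmp.name
last_two = tail(demo_path, n=2, chunk_size=4)
everything = tail(demo_path, n=50)
os.remove(demo_path)
print(last_two)
```

With long lines a small chunk_size means many seek/read round trips per
call, while a larger one reaches the required newlines in fewer steps.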

> >But the time of Linux tail surprises me:
> >
> >marco at buzz:~$ time tail lorem.txt
> >[text]
> >
> >real    0m0.004s
> >user    0m0.003s
> >sys    0m0.001s
> >
> >It's strange that it's so slow. I thought it was because it decodes
> >and prints the result, but I timed
>
> You're measuring different things. timeit() tries hard to measure just
> the code snippet you provide. It doesn't measure the startup cost of the
> whole python interpreter. Try:
>
>     time python3 your-tail-prog.py /home/marco/lorem.txt

Well, I'll try it, but isn't it a bit unfair to compare Python startup with C?

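
One way to keep the comparison fair is to measure the interpreter
startup on its own and look at it separately. A rough sketch, assuming
a helper named wall_time() (hypothetical, not from this thread) that
runs a command as a subprocess and keeps the best wall-clock time:

```python
import subprocess
import sys
import time


def wall_time(cmd, repeat=5):
    """Best wall-clock time, in seconds, of running cmd as a subprocess."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        # Discard stdout so terminal speed does not affect the result.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


# Cost of starting the interpreter and doing nothing at all.
startup = wall_time([sys.executable, "-c", "pass"])
print(f"bare interpreter startup: {startup:.4f}s")

# The same helper could time the two programs being compared, e.g.
#   wall_time(["tail", "/home/marco/lorem.txt"])
#   wall_time([sys.executable, "your-tail-prog.py", "/home/marco/lorem.txt"])
```

Subtracting the bare startup figure from the Python run gives a rough
idea of how much time the tail logic itself takes.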
> BTW, does your `tail()` print output? If not, again not measuring the
> same thing.
> [...]
> Also: does tail(1) do character set / encoding stuff? Does your Python
> code do that? Might be apples and oranges.

Well, as I wrote I also timed

timeit.timeit("print(tail('/home/marco/lorem.txt').decode('utf-8'))",
globals={"tail":tail}, number=100000)

and I got ~36 seconds.
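
To see where that time goes, the decode and the print can be timed
separately. This is only a sketch under assumptions: the sample bytes
stand in for the real output of tail(), and printing goes to an
in-memory buffer, so the cost of writing to a terminal is deliberately
left out.

```python
import io
import timeit

# Stand-in for the bytes returned by tail(); not the real lorem.txt.
data = (b"lorem ipsum dolor sit amet " * 40 + b"\n") * 10

# Cost of decoding alone.
decode_t = timeit.timeit(lambda: data.decode("utf-8"), number=1000)

sink = io.StringIO()


def print_to_memory():
    # Reset the buffer so it does not grow across iterations.
    sink.seek(0)
    sink.truncate()
    print(data.decode("utf-8"), file=sink)


# Cost of decoding plus formatting through print(), without a terminal.
print_t = timeit.timeit(print_to_memory, number=1000)

print(f"decode only:      {decode_t:.4f}s")
print(f"decode and print: {print_t:.4f}s")
```

If both figures are small compared with the run that prints to the
screen, the difference is probably the terminal itself.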

> If you have the source of tail(1) to hand, consider getting to the core
> and measuring `time()` immediately before and immediately after the
> central tail operation and printing the result.

IMHO this is a very good idea, but I have to find the time(). Ahah. Emh.
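
On the Python side the same idea, timing only the central operation, is
easy to sketch. The timed() helper below is hypothetical and uses
time.perf_counter(); sum() is just a placeholder for the call to
tail().

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds) for that call only."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Placeholder workload; in practice this would be something like
#   timed(tail, "/home/marco/lorem.txt")
result, elapsed = timed(sum, range(1000))
print(f"result={result}, elapsed={elapsed:.6f}s")
```

This excludes interpreter startup, imports and output, which is what a
time() pair around the core of tail(1) would exclude as well.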

