tail

Cameron Simpson cs at cskk.id.au
Wed May 18 17:30:20 EDT 2022


On 17May2022 22:45, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
>Well, I've done a benchmark.
>>>> timeit.timeit("tail('/home/marco/small.txt')", globals={"tail":tail}, number=100000)
>1.5963431186974049
>>>> timeit.timeit("tail('/home/marco/lorem.txt')", globals={"tail":tail}, number=100000)
>2.5240604374557734
>>>> timeit.timeit("tail('/home/marco/lorem.txt', chunk_size=1000)", globals={"tail":tail}, number=100000)
>1.8944984432309866

This suggests that the file size does not dominate your runtime. Ah.
_Or_ that the two files have similar proportions of newlines to text, so
you are reading similar amounts of data from the end. If the "line
density" of the files were similar, you would hope that the runtimes
would be similar.
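
To make the "line density" point concrete, here is a minimal sketch of a
chunked, read-from-the-end tail. It is a hypothetical helper
(`tail_lines` is not Marco's actual `tail()`), assuming the file is read
in binary and that lines end in b"\n". It also returns how many bytes it
read, which depends on how far back the last n newlines are, not on the
total file size.

```python
import os

def tail_lines(path, n=10, chunk_size=1000):
    """Return (last n lines as bytes, number of bytes read from the end).

    Hypothetical sketch, not Marco's actual tail(): seek to the end of the
    file and read backwards in chunk_size blocks until more than n newlines
    have been seen or the start of the file is reached.
    """
    if n <= 0:
        return [], 0
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # seek() returns the new absolute offset
        data = b""
        # Seeing more than n newlines guarantees the last n lines are
        # complete, whether or not the file ends with a newline.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Any partial first line is dropped by keeping only the last n lines.
    lines = data.splitlines(keepends=True)[-n:]
    return lines, len(data)
```

With short lines at the end, a 1 MB file and a 150-byte file both need
only one 1000-byte read, so their runtimes should be comparable.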

>small.txt is a text file of 1.3 KB. lorem.txt is a lorem ipsum of 1.2
>GB. It seems the performance is good, thanks to the chunk suggestion.
>
>But the time of Linux tail surprises me:
>
>marco at buzz:~$ time tail lorem.txt
>[text]
>
>real    0m0.004s
>user    0m0.003s
>sys    0m0.001s
>
>It's strange that it's so slow. I thought it was because it decodes
>and prints the result, but I timed

You're measuring different things. timeit() tries hard to measure just
the code snippet you provide. It doesn't measure the startup cost of the
whole Python interpreter. Try:

    time python3 your-tail-prog.py /home/marco/lorem.txt
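
It is also worth remembering that timeit.timeit() returns the total
time for all `number` executions, not the time per call. The sketch
below just does that division on the figures quoted above (the helper
name `per_call_us` is made up for illustration): roughly 16 us per call
for small.txt and 25 us for lorem.txt, which is not comparable with the
0.004s wall-clock time of a whole tail(1) process.

```python
# Figures copied from the benchmark quoted above. timeit.timeit() returns
# the total seconds for all `number` runs, not the time per call.
NUMBER = 100_000
totals = {
    "small.txt": 1.5963431186974049,
    "lorem.txt": 2.5240604374557734,
    "lorem.txt, chunk_size=1000": 1.8944984432309866,
}

def per_call_us(total_s, number=NUMBER):
    """Convert a timeit total (seconds) into microseconds per call."""
    return total_s / number * 1e6

for name, total in totals.items():
    print(f"{name}: {per_call_us(total):.1f} us per call")
```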

BTW, does your `tail()` print output? If not, again you're not measuring
the same thing.

If you have the source of tail(1) to hand, consider getting to the core 
and measuring `time()` immediately before and immediately after the 
central tail operation and printing the result.
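
The same idea applies on the Python side. A minimal sketch, assuming a
hypothetical `time_core()` helper: read time.perf_counter() immediately
before and after the one call you care about, so interpreter startup,
imports and printing are all left out of the number.

```python
import time

def time_core(op, *args, **kwargs):
    """Run op(*args, **kwargs) once and return (result, elapsed_seconds).

    time.perf_counter() is read immediately before and immediately after
    the call, so only the central operation itself is timed.
    """
    start = time.perf_counter()
    result = op(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `lines, secs = time_core(tail, "/home/marco/lorem.txt")`,
assuming `tail` is imported from your script; print the result outside
the timed region.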

Also: does tail(1) do character set / encoding stuff? Does your Python 
code do that? Might be apples and oranges.
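
If your `tail()` returns str rather than bytes, decoding is an extra step
that a tool which only copies bytes does not pay for. A minimal sketch,
assuming UTF-8 and a hypothetical `decode_lines()` helper: since the
newline byte 0x0A never occurs inside a multi-byte UTF-8 character, it is
safe to find lines in the raw bytes first and decode only the complete
lines you keep.

```python
def decode_lines(raw_lines, encoding="utf-8", errors="strict"):
    """Decode complete lines that were located by scanning raw bytes.

    Splitting at b"\\n" before decoding avoids cutting a multi-byte UTF-8
    character in half; decoding is the extra work a str-returning tail()
    does compared with one that writes bytes straight through.
    """
    return [line.decode(encoding, errors) for line in raw_lines]
```

Decoding an arbitrary chunk boundary instead can fail:
`"é".encode("utf-8")[1:].decode("utf-8")` raises UnicodeDecodeError,
because the chunk starts in the middle of a character.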

Cheers,
Cameron Simpson <cs at cskk.id.au>

