[issue36103] Increase shutil.COPY_BUFSIZE

Inada Naoki report at bugs.python.org
Tue Feb 26 02:12:05 EST 2019


Inada Naoki <songofacandy at gmail.com> added the comment:

>
> desbma <dutch109 at gmail.com> added the comment:
>
> If you do a benchmark by reading from a file, and then writing to /dev/null several times, without clearing caches, you are measuring *only* the syscall overhead:
> * input data is read from the Linux page cache, not the file on your SSD itself

Yes.  I measured syscall overhead to determine a reasonable buffer size.
shutil may be used when the page cache is warm.
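
For instance, a minimal sketch of that kind of measurement (not my
exact benchmark; "testfile" is a hypothetical path, assumed large and
already in the page cache):

    import shutil
    import time

    # Read a cache-warm file and write to /dev/null: neither the
    # disk's read nor its write speed is measured, so the timing
    # differences come from per-syscall overhead alone.
    SRC = "testfile"  # hypothetical; large and cache-warm

    for bufsize in (16 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024):
        with open(SRC, "rb") as fsrc, open("/dev/null", "wb") as fdst:
            start = time.perf_counter()
            shutil.copyfileobj(fsrc, fdst, bufsize)
            elapsed = time.perf_counter() - start
        print(f"bufsize={bufsize // 1024:>5} KiB: {elapsed:.4f}s")

A larger buffer means fewer read()/write() syscalls for the same
amount of data, which is exactly the overhead in question.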

> * no data is written (obviously because output is /dev/null)

As I said before, my SSD doesn't have stable write performance (which
is typical for a consumer SSD), so this is intentional.
And there are use cases that copy from/to io.BytesIO or other
file-like objects.
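
For example (a small illustration; with in-memory objects only the
read/write loop and its buffer size matter, and sendfile could not
help because there are no file descriptors involved):

    import io
    import shutil

    # Copy between in-memory file-like objects; no disk at all.
    src = io.BytesIO(b"x" * (8 * 1024 * 1024))  # 8 MiB of data
    dst = io.BytesIO()
    shutil.copyfileobj(src, dst)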

>
> Your current command line also measures open/close timings; without that, I think the speed should increase linearly when doubling the buffer size, but of course this is misleading, because it's a synthetic benchmark.

I'm not measuring the speed of my cheap SSD.  The goal of this
benchmark is to find a reasonable buffer size.
Real usage varies widely, so reducing syscall overhead with a
reasonable buffer size is worthwhile.

>
> Also, if you clear caches in between tests and write the output file to the SSD itself, sendfile will be used, and should be even faster.

No.  sendfile is not used by shutil.copyfileobj, even if dst is a
real file on disk.
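
copyfileobj only sees .read()/.write() on file-like objects, not file
descriptors.  Roughly, it is just this loop (a sketch of the
pure-Python implementation in Lib/shutil.py):

    def copyfileobj_sketch(fsrc, fdst, length=64 * 1024):
        # Plain read/write loop: works for arbitrary file-like
        # objects, but each chunk costs one read() and one write()
        # syscall when fsrc/fdst are real files.
        fsrc_read = fsrc.read
        fdst_write = fdst.write
        while True:
            buf = fsrc_read(length)
            if not buf:
                break
            fdst_write(buf)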

>
> So again I'm not sure this means much compared to real world usage.
>

"Real world usage" is vary.  Sometime it is not affected.  Sometime it affects.

On the other hand, what are the cons of changing 16 KiB to 64 KiB?
Windows already uses 1 MiB.  And the CPython runtime itself uses a few
MiB of memory anyway.
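
For reference, the constant would end up in Lib/shutil.py roughly as
(the Windows value was already 1 MiB):

    import os

    _WINDOWS = os.name == "nt"
    # 1 MiB on Windows (unchanged); 64 KiB elsewhere, up from 16 KiB.
    COPY_BUFSIZE = 1024 * 1024 if _WINDOWS else 64 * 1024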

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36103>
_______________________________________

