[Python-ideas] speeding up shutil.copy*()

Charles-François Natali cf.natali at gmail.com
Sun Mar 3 19:40:04 CET 2013


> This allocates and frees a lot of buffers, and could be optimized with
> readinto().
> Unfortunately, I don't think we can change copyfileobj(), because it
> might be passed objects that don't implement readinto().

Or we could just use:
if hasattr(fileobj, 'readinto'):

hoping that readinto() is really a readinto() implementation and not
an unrelated method :-)
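
Something like this, say (only a sketch: copyfileobj_readinto is a
made-up name, the buffer size is an arbitrary choice, and binary-mode
file objects are assumed):

import shutil

def copyfileobj_readinto(fsrc, fdst, length=64 * 1024):
    # Sketch only: name and buffer size are made up for illustration.
    if not hasattr(fsrc, 'readinto'):
        # No readinto(): keep the existing read()/write() behaviour.
        shutil.copyfileobj(fsrc, fdst, length)
        return
    buf = bytearray(length)
    view = memoryview(buf)
    while True:
        n = fsrc.readinto(buf)
        if not n:
            break
        # Write only the bytes actually read, reusing the same buffer
        # instead of allocating a fresh bytes object per iteration.
        fdst.write(view[:n])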

> sendfile() is a Linux-only syscall. It's also limited to certain kinds
> of file descriptors. The limitations have been lifted in recent kernel
> versions.

No, it's not Linux-only: many BSDs also have it, although not all of
them support an arbitrary output file descriptor (Solaris does allow
regular files, too). It would be possible to catch EINVAL/EBADF and
fall back to a regular copy loop.
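
For example (again only a sketch: copyfile_sendfile is a made-up name,
and it assumes os.sendfile() from Python 3.3 and a regular, seekable
source file):

import errno
import os
import shutil

def copyfile_sendfile(src, dst):
    # Sketch only: assumes os.sendfile() is available (Python 3.3+, POSIX).
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
        in_fd, out_fd = fsrc.fileno(), fdst.fileno()
        size = os.fstat(in_fd).st_size
        offset = 0
        while offset < size:
            try:
                sent = os.sendfile(out_fd, in_fd, offset, size - offset)
            except OSError as e:
                if offset == 0 and e.errno in (errno.EINVAL, errno.EBADF):
                    # sendfile() refuses this fd pair (e.g. a non-socket
                    # destination on some BSDs): regular copy loop instead.
                    shutil.copyfileobj(fsrc, fdst)
                    return
                raise
            if sent == 0:
                break
            offset += sent

Restricting the fallback to offset == 0 keeps the failure mode simple:
if sendfile() fails after the first chunk, something else is wrong and
the error should propagate.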

Note that the above benchmark is really biased by writing the data to
/dev/null: with a real target file, zero-copy wouldn't bring such a
large gain, because the bottleneck will really be the I/O devices
(also, a read()/write() loop is more expensive in Python than in C).
But I see at least two cases where it could be interesting: when
reading/writing from/to a tmpfs partition, or when the source and
target files are on different disks.

I'm not sure it's worth it, though; that's why I'm asking here :-)
(but I do think readinto() is interesting).


