[New-bugs-announce] [issue43317] python -m gzip could use a larger buffer
Ruben Vorderman
report at bugs.python.org
Wed Feb 24 10:06:00 EST 2021
New submission from Ruben Vorderman <r.h.p.vorderman at lumc.nl>:
python -m gzip reads in chunks of 1024 bytes: https://github.com/python/cpython/blob/1f433406bd46fbd00b88223ad64daea6bc9eaadc/Lib/gzip.py#L599
This hurts performance somewhat; using io.DEFAULT_BUFFER_SIZE improves it. Also, 'io.DEFAULT_BUFFER_SIZE' is better than 'ARBITRARY_NUMBER_WITH_NO_COMMENT_EXPLAINING_WHY'.
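The proposed change amounts to parameterizing the read/write loop that the command-line interface runs. The sketch below is not the actual gzip.py code: `copy_stream`, `compress_bytes` and `decompress_bytes` are hypothetical helpers that mimic the shape of the CLI loop on in-memory data, with compresslevel=1 assumed to stand in for --fast.

```python
import gzip
import io


def copy_stream(src, dst, chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Copy src to dst in fixed-size chunks (hypothetical helper).

    Same shape as the chunked loop in `python -m gzip`, but the chunk
    size defaults to io.DEFAULT_BUFFER_SIZE instead of a bare 1024.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        dst.write(chunk)


def compress_bytes(data, chunk_size=io.DEFAULT_BUFFER_SIZE, compresslevel=1):
    """Compress data through the chunked loop (assumption: level 1 ~ --fast)."""
    out = io.BytesIO()
    # GzipFile does not close a fileobj it was handed, so `out` stays readable.
    with gzip.GzipFile(filename="", mode="wb", fileobj=out,
                       compresslevel=compresslevel) as g:
        copy_stream(io.BytesIO(data), g, chunk_size)
    return out.getvalue()


def decompress_bytes(blob, chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Decompress a gzip blob through the same chunked loop."""
    out = io.BytesIO()
    with gzip.GzipFile(filename="", mode="rb", fileobj=io.BytesIO(blob)) as f:
        copy_stream(f, out, chunk_size)
    return out.getvalue()
```

The chunk size only changes how many read()/write() calls the loop makes, not the output: data compressed or decompressed with chunk_size=1024 or with the default round-trips to the same bytes. On the CPython versions listed in this report, io.DEFAULT_BUFFER_SIZE is 8192.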
With 1024-byte chunks:
Decompression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null
Time (mean ± σ): 926.9 ms ± 7.7 ms [User: 901.2 ms, System: 59.1 ms]
Range (min … max): 913.3 ms … 939.4 ms 10 runs
Compression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null
Time (mean ± σ): 2.514 s ± 0.030 s [User: 2.469 s, System: 0.125 s]
Range (min … max): 2.472 s … 2.563 s 10 runs
With io.DEFAULT_BUFFER_SIZE (8192 bytes):
Decompression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq.gz | ./prefix/bin/python3 -m gzip -d > /dev/null
Time (mean ± σ): 839.9 ms ± 7.3 ms [User: 816.0 ms, System: 57.3 ms]
Range (min … max): 830.1 ms … 851.3 ms 10 runs
Compression:
$ hyperfine -r 10 -w 3 'cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null'
Benchmark #1: cat ~/test/500000reads.fastq | ./prefix/bin/python3 -m gzip --fast > /dev/null
Time (mean ± σ): 2.275 s ± 0.024 s [User: 2.247 s, System: 0.096 s]
Range (min … max): 2.254 s … 2.322 s 10 runs
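To see the effect without hyperfine or a shell pipeline, a rough in-process sketch could time only the decompression read loop at both chunk sizes. It assumes a synthetic FASTQ-like payload (not the 500000reads.fastq file used above), skips the write to stdout, and uses hypothetical helper names `drain` and `time_decompression`; its absolute numbers will not match the CLI measurements.

```python
import gzip
import io
import time


def drain(fileobj, chunk_size):
    """Read fileobj to EOF in chunk_size pieces; return total bytes read."""
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total


def time_decompression(blob, chunk_size, repeats=5):
    """Best-of-N wall-clock seconds to decompress blob with a given read size."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as f:
            drain(f, chunk_size)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Synthetic payload (assumption), compressed at level 1 like --fast.
    data = b"@read\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n" * 100_000
    blob = gzip.compress(data, compresslevel=1)
    for size in (1024, io.DEFAULT_BUFFER_SIZE):
        print(f"chunk={size:>5}: {time_decompression(blob, size) * 1000:.1f} ms")
```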
Speedups:
- Decompression: 840 / 927 = 0.906, a ~9% reduction in runtime.
- Compression: 2.275 / 2.514 = 0.905, a ~9% reduction in runtime.
It is not stellar, but it is quite a nice improvement for such a tiny change.
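The runtime ratios above can be recomputed from the hyperfine means with a little arithmetic. This is a small sketch; `runtime_reduction` is a hypothetical helper name and the timings are copied from the runs in this report.

```python
def runtime_reduction(before_s, after_s):
    """Fraction of runtime saved when going from before_s to after_s seconds."""
    return 1 - after_s / before_s


# Mean wall-clock times (seconds) from the hyperfine runs above.
timings = {
    "decompression": (0.9269, 0.8399),  # 1024-byte chunks, DEFAULT_BUFFER_SIZE
    "compression": (2.514, 2.275),
}

for name, (before, after) in timings.items():
    ratio = after / before
    print(f"{name}: {ratio:.3f} of the old runtime, "
          f"{runtime_reduction(before, after):.1%} saved")
```

Both cases work out to roughly a 9 to 10% reduction, matching the rounded figures given above.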
----------
components: Library (Lib)
messages: 387624
nosy: rhpvorderman
priority: normal
severity: normal
status: open
title: python -m gzip could use a larger buffer
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43317>
_______________________________________