[issue45509] Gzip header corruption not properly checked.
Ruben Vorderman
report at bugs.python.org
Mon Nov 22 05:44:01 EST 2021
Ruben Vorderman <r.h.p.vorderman at lumc.nl> added the comment:
1. Quite a lot
I tested it for the two most common use cases.
import timeit
import statistics

WITH_FNAME = """
from gzip import GzipFile, decompress
import io
fileobj = io.BytesIO()
g = GzipFile(fileobj=fileobj, mode='wb', filename='compressable_file')
g.write(b'')
g.close()
data = fileobj.getvalue()
"""

WITH_NO_FLAGS = """
from gzip import decompress
import zlib
data = zlib.compress(b'', wbits=31)
"""


def benchmark(name, setup, loops=10000, runs=10):
    print(f"{name}")
    results = [timeit.timeit("decompress(data)", setup, number=loops)
               for _ in range(runs)]
    # Convert total seconds per run to microseconds per call.
    results = [(result / loops) * 1_000_000 for result in results]
    print(f"average: {round(statistics.mean(results), 2)}, "
          f"range: {round(min(results), 2)}-{round(max(results), 2)} "
          f"stdev: {round(statistics.stdev(results), 2)}")


if __name__ == "__main__":
    benchmark("with_fname", WITH_FNAME)
    benchmark("with_noflags", WITH_NO_FLAGS)
BEFORE:
with_fname
average: 3.27, range: 3.21-3.36 stdev: 0.05
with_noflags
average: 3.24, range: 3.14-3.37 stdev: 0.07
AFTER:
with_fname
average: 4.98, range: 4.85-5.14 stdev: 0.1
with_noflags
average: 4.87, range: 4.69-5.05 stdev: 0.1
That is a dramatic increase in overhead: roughly 50% more time per call. (Granted, the decompressed data is empty, so this measures header-parsing cost almost exclusively, but still.)
2. I haven't tested that yet, but the regression above is already quite unacceptable.
3. Not that I know of. But if it is set, it is safe to assume the writer cares about it. Nevertheless, this is a bit of an edge case.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45509>
_______________________________________