[issue45509] Gzip header corruption not properly checked.

Ruben Vorderman report at bugs.python.org
Mon Nov 22 05:44:01 EST 2021


Ruben Vorderman <r.h.p.vorderman at lumc.nl> added the comment:

1. Quite a lot.

I tested it for the two most common use cases.
import timeit
import statistics

WITH_FNAME = """
from gzip import GzipFile, decompress
import io
fileobj = io.BytesIO()
g = GzipFile(fileobj=fileobj, mode='wb', filename='compressable_file')
g.write(b'')
g.close()
data = fileobj.getvalue()
"""
WITH_NO_FLAGS = """
from gzip import decompress
import zlib
data = zlib.compress(b'', wbits=31)
"""

def benchmark(name, setup, loops=10000, runs=10):
    print(f"{name}")
    results = [timeit.timeit("decompress(data)", setup, number=loops) for _ in range(runs)]
    # Convert total loop time to microseconds per call
    results = [(result / loops) * 1_000_000 for result in results]
    print(f"average: {round(statistics.mean(results), 2)}, "
          f"range: {round(min(results), 2)}-{round(max(results),2)} "
          f"stdev: {round(statistics.stdev(results),2)}")


if __name__ == "__main__":
    benchmark("with_fname", WITH_FNAME)
    benchmark("with_noflags", WITH_NO_FLAGS)

BEFORE:

with_fname
average: 3.27, range: 3.21-3.36 stdev: 0.05
with_noflags
average: 3.24, range: 3.14-3.37 stdev: 0.07

AFTER:
with_fname
average: 4.98, range: 4.85-5.14 stdev: 0.1
with_noflags
average: 4.87, range: 4.69-5.05 stdev: 0.1

That is a dramatic increase in overhead. (Admittedly the decompressed data is empty, but still.)

2. I haven't tested this yet, but the regression is already quite unacceptable.

3. Not that I know of. But if it is set, it is safe to assume the producer cares. Nevertheless, this is a bit of an edge case.
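For what the check involves: per RFC 1952, FHCRC means the header is followed by a CRC16, defined as the low 16 bits of the CRC-32 over all preceding header bytes. A simplified sketch of the verification (my own function name; it assumes no FEXTRA/FNAME/FCOMMENT fields are present, which the real parser would have to handle):

```python
import struct
import zlib

FHCRC = 0x02  # bit 1 of the FLG byte (RFC 1952)

def header_crc_ok(data: bytes) -> bool:
    """Verify the optional FHCRC field: the low 16 bits of the CRC-32
    over all header bytes that precede it.  Simplified: assumes the
    fixed 10-byte header is immediately followed by the CRC16."""
    if not data[3] & FHCRC:
        return True  # no header CRC present, nothing to check
    fixed = data[:10]                       # fixed 10-byte member header
    stored, = struct.unpack("<H", data[10:12])
    return (zlib.crc32(fixed) & 0xFFFF) == stored

# Hand-built header with FHCRC set (mtime=0, XFL=0, OS=3/Unix).
hdr = b'\x1f\x8b\x08\x02\x00\x00\x00\x00\x00\x03'
good = hdr + struct.pack("<H", zlib.crc32(hdr) & 0xFFFF)
corrupt = hdr + struct.pack("<H", (zlib.crc32(hdr) & 0xFFFF) ^ 1)
print(header_crc_ok(good), header_crc_ok(corrupt))  # prints True False
```

Python's own gzip module never writes FHCRC, which is why producers that do set it are such an edge case.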

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45509>
_______________________________________
