[New-bugs-announce] [issue28436] GzipFile doesn't properly handle short reads and writes on the underlying stream

Evgeny Kapun report at bugs.python.org
Thu Oct 13 16:29:24 EDT 2016


New submission from Evgeny Kapun:

GzipFile's underlying stream can be a raw stream (such as FileIO), and such streams can return short reads and writes at any time (e.g. due to signals). The correct behavior in case of a short read or write is to retry the call in order to read or write the remaining data.
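
As a rough sketch of what such retrying could look like: the helper names read_exactly and write_all, and the ShortIO class used to provoke short transfers, are illustrative assumptions and not part of gzip or io.

```python
import io


def read_exactly(raw, n):
    """Read up to n bytes, retrying after short reads; stops early at EOF."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = raw.read(remaining)
        if not chunk:
            # b'' means EOF; None means a non-blocking stream has no data.
            # This sketch simply stops in both cases.
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def write_all(raw, data):
    """Write all of data, retrying after short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = raw.write(view)
        if not written:
            # None or 0: no progress was made, so give up instead of spinning.
            raise OSError("write made no progress")
        view = view[written:]


class ShortIO(io.BytesIO):
    """Hypothetical stream that transfers at most one byte per call."""

    def read(self, n=-1):
        return super().read(1 if n != 0 else 0)

    def write(self, data):
        return super().write(bytes(data[:1]))


src = ShortIO(b"hello world")
first = read_exactly(src, 5)     # needs five one-byte reads
rest = read_exactly(src, 100)    # stops at EOF before 100 bytes

dst = ShortIO()
write_all(dst, b"abcdef")        # needs six one-byte writes
```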

GzipFile doesn't do this. This program demonstrates the problem with reading:

    import io, gzip

    class MyFileIO(io.FileIO):
        def read(self, n):
            # Emulate short read
            return super().read(1)

    raw = MyFileIO('test.gz', 'rb')
    gzf = gzip.open(raw, 'rb')
    gzf.read()

Output:

    $ gzip -c /dev/null > test.gz
    $ python3 test.py
    Traceback (most recent call last):
      File "test.py", line 10, in <module>
        gzf.read()
      File "/usr/lib/python3.5/gzip.py", line 274, in read
        return self._buffer.read(size)
      File "/usr/lib/python3.5/gzip.py", line 461, in read
        if not self._read_gzip_header():
      File "/usr/lib/python3.5/gzip.py", line 409, in _read_gzip_header
        raise OSError('Not a gzipped file (%r)' % magic)
    OSError: Not a gzipped file (b'\x1f')

And this shows the problem with writing:

    import io, gzip

    class MyIO(io.RawIOBase):
        def write(self, data):
            print(data)
            # Emulate short write
            return 1

    raw = MyIO()
    gzf = gzip.open(raw, 'wb')
    gzf.close()

Output:

    $ python3 test.py 
    b'\x1f\x8b'
    b'\x08'
    b'\x00'
    b'\xb9\xea\xffW'
    b'\x02'
    b'\xff'
    b'\x03\x00'
    b'\x00\x00\x00\x00'
    b'\x00\x00\x00\x00'

It can be seen that there is no attempt to write the remaining data after a short write. Indeed, the return value of the write() method is completely ignored.

I think that either the gzip module should be changed to handle short reads and writes properly, or its documentation should state that it cannot be used with raw streams.
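
In the meantime, a possible workaround (a sketch, not a fix to gzip itself) is to wrap the raw stream in io.BufferedReader or io.BufferedWriter before handing it to GzipFile. As far as I understand the io module, the buffered classes issue repeated raw calls until the request is satisfied or EOF is reached. The ShortRaw class below is a hypothetical in-memory raw stream that transfers one byte per call; the explicit flush() is there on the assumption that GzipFile does not flush a caller-supplied fileobj on close.

```python
import gzip
import io


class ShortRaw(io.RawIOBase):
    """Hypothetical raw stream: every read or write moves at most one byte."""

    def __init__(self, data=b""):
        super().__init__()
        self._data = bytearray(data)
        self._pos = 0

    def readable(self):
        return True

    def writable(self):
        return True

    def readinto(self, b):
        # Emulate a short read: copy at most one byte into the caller's buffer.
        chunk = self._data[self._pos:self._pos + 1]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

    def write(self, data):
        # Emulate a short write: accept at most one byte.
        chunk = bytes(data[:1])
        self._data += chunk
        return len(chunk)


# Writing: BufferedWriter keeps calling write() until everything is out.
sink = ShortRaw()
buffered = io.BufferedWriter(sink)
with gzip.GzipFile(fileobj=buffered, mode="wb") as gz:
    gz.write(b"payload")
buffered.flush()  # push any bytes still held in the buffer to the raw stream
compressed = bytes(sink._data)

# Reading: BufferedReader keeps calling readinto() until it has enough data.
source = ShortRaw(compressed)
with gzip.GzipFile(fileobj=io.BufferedReader(source), mode="rb") as gz:
    result = gz.read()
```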

----------
components: Library (Lib)
messages: 278606
nosy: abacabadabacaba
priority: normal
severity: normal
status: open
title: GzipFile doesn't properly handle short reads and writes on the underlying stream
type: behavior
versions: Python 3.5

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28436>
_______________________________________

