[issue23529] Limit decompressed data when reading from LZMAFile, BZ2File, GzipFile

Martin Panter report at bugs.python.org
Mon Mar 9 01:51:17 CET 2015


Martin Panter added the comment:

I have decided not to reuse the same _DecompressReader for the "gzip" module. This is because the code for parsing the "gzip" header pulls in data by calling read(), and I would have to adapt this code to accept whatever data is pushed to it via a decompress() method. However, I have rewritten the "gzip" module to use BufferedReader, removing its own readline() etc. implementations.

I have split the BaseStream and DecompressReader classes into a new file called Lib/_compression.py. Let me know if there is anything else to be done to add a new file in the Lib/ directory, or if you have any better ideas where these classes should live.
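To illustrate the general arrangement described above (a raw decompressing reader with BufferedReader layered on top to supply readline(), peek() and so on), here is a minimal sketch. It is not the code from the patch: the class name SketchDecompressReader and the chunk_size parameter are made up for illustration, it handles only a single bz2 stream, and it assumes a decompressor whose decompress() accepts max_length and exposes needs_input and eof, as bz2.BZ2Decompressor does from Python 3.5 on.

```python
import bz2
import io


class SketchDecompressReader(io.RawIOBase):
    """Hypothetical raw reader; NOT the DecompressReader from the patch."""

    def __init__(self, fp, chunk_size=8192):
        super().__init__()
        self._fp = fp                      # underlying compressed binary file
        self._chunk_size = chunk_size      # how much compressed data to pull at once
        self._decomp = bz2.BZ2Decompressor()

    def readable(self):
        return True

    def readinto(self, b):
        with memoryview(b) as view, view.cast("B") as byte_view:
            want = len(byte_view)
            if want == 0:
                return 0
            data = b""
            # Loop until some output is produced or the stream ends.
            while not data and not self._decomp.eof:
                if self._decomp.needs_input:
                    raw = self._fp.read(self._chunk_size)
                    if not raw:
                        raise EOFError("compressed stream ended early")
                else:
                    # The decompressor still holds buffered input.
                    raw = b""
                # max_length caps the output, so at most `want` bytes
                # are ever decompressed per call.
                data = self._decomp.decompress(raw, max_length=want)
            byte_view[:len(data)] = data
            return len(data)


def open_sketch(fp):
    """Wrap the raw reader so callers get readline(), peek(), etc. for free."""
    return io.BufferedReader(SketchDecompressReader(fp))
```

With this layering, the amount of decompressed data held in memory is bounded by the BufferedReader buffer size rather than by the compression ratio of the input.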

Other changes in the new patch (LZMAFile-etc.v4.patch):
* Fix documented GzipFile.peek(n) signature to match implementation
* Remove unused and inconsistent code paths in gzip._PaddedFile
* New code assumes zlib.decompressobj().flush() returns a limited amount of data, since it has no "max_length" parameter. In reality I think flush() does not do anything at all and should be deprecated or removed; see Issue 23200.
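For context on the last point, the following sketch shows how the existing max_length argument of zlib's decompress() method, together with unconsumed_tail, bounds the output of each call, while flush() offers no such limit. The function name iter_decompressed and the max_chunk parameter are invented for this example; it is not taken from the patch.

```python
import zlib


def iter_decompressed(compressed, max_chunk=64 * 1024):
    """Yield decompressed data in pieces of at most max_chunk bytes.

    Hypothetical helper for illustration only.
    """
    decomp = zlib.decompressobj()
    pending = compressed
    while pending:
        # The second argument is max_length: output is capped, and any
        # input that could not be processed lands in unconsumed_tail.
        chunk = decomp.decompress(pending, max_chunk)
        pending = decomp.unconsumed_tail
        if chunk:
            yield chunk
    # flush() takes no max_length, so this sketch simply assumes that
    # whatever it returns is small.
    tail = decomp.flush()
    if tail:
        yield tail
```

A caller that stops iterating early never pays for decompressing the rest of the input, which is the property needed to resist gzip bombs.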

I hope that the patch does not need any more major changes, so it should be ready for review. There are a few more suggested enhancements that I have not implemented. While they would be nice, I would prefer to handle them in separate bugs and patches. I believe strengthening Python against gzip bombs is important, and any extra refactoring of the code will probably make things harder to review, more likely to break, and less likely to make it into 3.5.

Some possible enhancements I did not do:
* Support open(buffering=...) parameter, passing through to the [...]File(buffer_size=...) parameter
* open(buffering=0) returning an unbuffered reader object, probably just a direct DecompressReader instance
* detach() method on the BufferedIOBase file classes
* Further factoring out of a common CompressedFile base class
* Apply the buffer size parameter to write mode
* Rewrite the gzip module to use the common DecompressReader base class

----------
title: Limit decompressed data when reading from LZMAFile and BZ2File -> Limit decompressed data when reading from LZMAFile, BZ2File, GzipFile
Added file: http://bugs.python.org/file38397/LZMAFile-etc.v4.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23529>
_______________________________________

