[Python-checkins] bpo-21417: Add compresslevel= to the zipfile module (GH-5385)

Gregory P. Smith webhook-mailer at python.org
Tue Jan 30 00:54:10 EST 2018


https://github.com/python/cpython/commit/ce237c7d58ba207575cdfb0195a58a6407fbf717
commit: ce237c7d58ba207575cdfb0195a58a6407fbf717
branch: master
author: Bo Bayles <bbayles at gmail.com>
committer: Gregory P. Smith <greg at krypto.org>
date: 2018-01-29T21:54:07-08:00
summary:

bpo-21417: Add compresslevel= to the zipfile module (GH-5385)

This allows the compression level to be specified when writing zipfiles
(as a default for the entire archive *and* overridden on a per-file basis).

Contributed by Bo Bayles
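
A minimal usage sketch of the new keyword, assuming Python 3.7 or later
and an available zlib module. The member names and the payload are
made up for illustration and are not part of the commit; only the
compresslevel= arguments to ZipFile() and writestr() come from the
change itself.

```python
import io
import zipfile

# Hypothetical, highly repetitive payload so that DEFLATE has something to do.
payload = b"hello world " * 1000

buf = io.BytesIO()
# compresslevel=1 becomes the archive-wide default for members written below.
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED,
                     compresslevel=1) as zf:
    # No per-call level given: the constructor's level (1) is used.
    zf.writestr("level1.txt", payload)
    # Per-member override of the constructor's level.
    zf.writestr("level9.txt", payload, compresslevel=9)
    # ZIP_STORED members are written uncompressed; the level has no effect.
    zf.writestr("stored.txt", payload, compress_type=zipfile.ZIP_STORED)

# Read the archive back to inspect compressed sizes and verify the contents.
with zipfile.ZipFile(buf) as zf:
    sizes = {info.filename: info.compress_size for info in zf.infolist()}
    contents = {name: zf.read(name) for name in zf.namelist()}

for name, size in sizes.items():
    print(name, size)
```

The level is a writing-time setting only; reading an archive does not
need it, which is why the second ZipFile() call omits it.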

files:
A Misc/NEWS.d/next/Library/2018-01-28-07-55-10.bpo-21417.JFnV99.rst
M Doc/library/zipfile.rst
M Lib/test/test_zipfile.py
M Lib/zipfile.py

diff --git a/Doc/library/zipfile.rst b/Doc/library/zipfile.rst
index 5b8c776ed648..d58efe0b4175 100644
--- a/Doc/library/zipfile.rst
+++ b/Doc/library/zipfile.rst
@@ -130,10 +130,12 @@ ZipFile Objects
 ---------------
 
 
-.. class:: ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True)
+.. class:: ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, \
+                   compresslevel=None)
 
    Open a ZIP file, where *file* can be a path to a file (a string), a
    file-like object or a :term:`path-like object`.
+
    The *mode* parameter should be ``'r'`` to read an existing
    file, ``'w'`` to truncate and write a new file, ``'a'`` to append to an
    existing file, or ``'x'`` to exclusively create and write a new file.
@@ -145,16 +147,27 @@ ZipFile Objects
    adding a ZIP archive to another file (such as :file:`python.exe`).  If
    *mode* is ``'a'`` and the file does not exist at all, it is created.
    If *mode* is ``'r'`` or ``'a'``, the file should be seekable.
+
    *compression* is the ZIP compression method to use when writing the archive,
    and should be :const:`ZIP_STORED`, :const:`ZIP_DEFLATED`,
    :const:`ZIP_BZIP2` or :const:`ZIP_LZMA`; unrecognized
-   values will cause :exc:`NotImplementedError` to be raised.  If :const:`ZIP_DEFLATED`,
-   :const:`ZIP_BZIP2` or :const:`ZIP_LZMA` is specified but the corresponding module
-   (:mod:`zlib`, :mod:`bz2` or :mod:`lzma`) is not available, :exc:`RuntimeError`
-   is raised. The default is :const:`ZIP_STORED`.  If *allowZip64* is
-   ``True`` (the default) zipfile will create ZIP files that use the ZIP64
-   extensions when the zipfile is larger than 4 GiB. If it is  false :mod:`zipfile`
-   will raise an exception when the ZIP file would require ZIP64 extensions.
+   values will cause :exc:`NotImplementedError` to be raised.  If
+   :const:`ZIP_DEFLATED`, :const:`ZIP_BZIP2` or :const:`ZIP_LZMA` is specified
+   but the corresponding module (:mod:`zlib`, :mod:`bz2` or :mod:`lzma`) is not
+   available, :exc:`RuntimeError` is raised. The default is :const:`ZIP_STORED`.
+
+   If *allowZip64* is ``True`` (the default) zipfile will create ZIP files that
+   use the ZIP64 extensions when the zipfile is larger than 4 GiB. If it is
+   ``false`` :mod:`zipfile` will raise an exception when the ZIP file would
+   require ZIP64 extensions.
+
+   The *compresslevel* parameter controls the compression level to use when
+   writing files to the archive.
+   When using :const:`ZIP_STORED` or :const:`ZIP_LZMA` it has no effect.
+   When using :const:`ZIP_DEFLATED` integers ``0`` through ``9`` are accepted
+   (see :class:`zlib <zlib.compressobj>` for more information).
+   When using :const:`ZIP_BZIP2` integers ``1`` through ``9`` are accepted
+   (see :class:`bz2 <bz2.BZ2File>` for more information).
 
    If the file is created with mode ``'w'``, ``'x'`` or ``'a'`` and then
    :meth:`closed <close>` without adding any files to the archive, the appropriate
@@ -187,6 +200,9 @@ ZipFile Objects
    .. versionchanged:: 3.6.2
       The *file* parameter accepts a :term:`path-like object`.
 
+   .. versionchanged:: 3.7
+      Add the *compresslevel* parameter.
+
 
 .. method:: ZipFile.close()
 
@@ -351,13 +367,15 @@ ZipFile Objects
       :exc:`ValueError`.  Previously, a :exc:`RuntimeError` was raised.
 
 
-.. method:: ZipFile.write(filename, arcname=None, compress_type=None)
+.. method:: ZipFile.write(filename, arcname=None, compress_type=None, \
+                          compresslevel=None)
 
    Write the file named *filename* to the archive, giving it the archive name
    *arcname* (by default, this will be the same as *filename*, but without a drive
    letter and with leading path separators removed).  If given, *compress_type*
    overrides the value given for the *compression* parameter to the constructor for
-   the new entry.
+   the new entry. Similarly, *compresslevel* will override the constructor if
+   given.
    The archive must be open with mode ``'w'``, ``'x'`` or ``'a'``.
 
    .. note::
@@ -383,7 +401,8 @@ ZipFile Objects
       a :exc:`RuntimeError` was raised.
 
 
-.. method:: ZipFile.writestr(zinfo_or_arcname, data[, compress_type])
+.. method:: ZipFile.writestr(zinfo_or_arcname, data, compress_type=None, \
+                             compresslevel=None)
 
    Write the string *data* to the archive; *zinfo_or_arcname* is either the file
    name it will be given in the archive, or a :class:`ZipInfo` instance.  If it's
@@ -393,7 +412,8 @@ ZipFile Objects
 
    If given, *compress_type* overrides the value given for the *compression*
    parameter to the constructor for the new entry, or in the *zinfo_or_arcname*
-   (if that is a :class:`ZipInfo` instance).
+   (if that is a :class:`ZipInfo` instance). Similarly, *compresslevel* will
+   override the constructor if given.
 
    .. note::
 
diff --git a/Lib/test/test_zipfile.py b/Lib/test/test_zipfile.py
index 3bc867ea51c9..94db858a1517 100644
--- a/Lib/test/test_zipfile.py
+++ b/Lib/test/test_zipfile.py
@@ -53,9 +53,10 @@ def setUp(self):
         with open(TESTFN, "wb") as fp:
             fp.write(self.data)
 
-    def make_test_archive(self, f, compression):
+    def make_test_archive(self, f, compression, compresslevel=None):
+        kwargs = {'compression': compression, 'compresslevel': compresslevel}
         # Create the ZIP archive
-        with zipfile.ZipFile(f, "w", compression) as zipfp:
+        with zipfile.ZipFile(f, "w", **kwargs) as zipfp:
             zipfp.write(TESTFN, "another.name")
             zipfp.write(TESTFN, TESTFN)
             zipfp.writestr("strfile", self.data)
@@ -63,8 +64,8 @@ def make_test_archive(self, f, compression):
                 for line in self.line_gen:
                     f.write(line)
 
-    def zip_test(self, f, compression):
-        self.make_test_archive(f, compression)
+    def zip_test(self, f, compression, compresslevel=None):
+        self.make_test_archive(f, compression, compresslevel)
 
         # Read the ZIP archive
         with zipfile.ZipFile(f, "r", compression) as zipfp:
@@ -297,6 +298,22 @@ def test_writestr_compression(self):
         info = zipfp.getinfo('b.txt')
         self.assertEqual(info.compress_type, self.compression)
 
+    def test_writestr_compresslevel(self):
+        zipfp = zipfile.ZipFile(TESTFN2, "w", compresslevel=1)
+        zipfp.writestr("a.txt", "hello world", compress_type=self.compression)
+        zipfp.writestr("b.txt", "hello world", compress_type=self.compression,
+                       compresslevel=2)
+
+        # Compression level follows the constructor.
+        a_info = zipfp.getinfo('a.txt')
+        self.assertEqual(a_info.compress_type, self.compression)
+        self.assertEqual(a_info._compresslevel, 1)
+
+        # Compression level is overridden.
+        b_info = zipfp.getinfo('b.txt')
+        self.assertEqual(b_info.compress_type, self.compression)
+        self.assertEqual(b_info._compresslevel, 2)
+
     def test_read_return_size(self):
         # Issue #9837: ZipExtFile.read() shouldn't return more bytes
         # than requested.
@@ -370,6 +387,21 @@ def test_repr(self):
                 self.assertIn('[closed]', repr(zipopen))
             self.assertIn('[closed]', repr(zipfp))
 
+    def test_compresslevel_basic(self):
+        for f in get_files(self):
+            self.zip_test(f, self.compression, compresslevel=9)
+
+    def test_per_file_compresslevel(self):
+        """Check that files within a Zip archive can have different
+        compression levels."""
+        with zipfile.ZipFile(TESTFN2, "w", compresslevel=1) as zipfp:
+            zipfp.write(TESTFN, 'compress_1')
+            zipfp.write(TESTFN, 'compress_9', compresslevel=9)
+            one_info = zipfp.getinfo('compress_1')
+            nine_info = zipfp.getinfo('compress_9')
+            self.assertEqual(one_info._compresslevel, 1)
+            self.assertEqual(nine_info._compresslevel, 9)
+
     def tearDown(self):
         unlink(TESTFN)
         unlink(TESTFN2)
diff --git a/Lib/zipfile.py b/Lib/zipfile.py
index 37ce3281e092..f9db45f58a2b 100644
--- a/Lib/zipfile.py
+++ b/Lib/zipfile.py
@@ -295,6 +295,7 @@ class ZipInfo (object):
         'filename',
         'date_time',
         'compress_type',
+        '_compresslevel',
         'comment',
         'extra',
         'create_system',
@@ -334,6 +335,7 @@ def __init__(self, filename="NoName", date_time=(1980,1,1,0,0,0)):
 
         # Standard values:
         self.compress_type = ZIP_STORED # Type of compression for the file
+        self._compresslevel = None      # Level for the compressor
         self.comment = b""              # Comment for each file
         self.extra = b""                # ZIP extra data
         if sys.platform == 'win32':
@@ -654,12 +656,16 @@ def _check_compression(compression):
         raise NotImplementedError("That compression method is not supported")
 
 
-def _get_compressor(compress_type):
+def _get_compressor(compress_type, compresslevel=None):
     if compress_type == ZIP_DEFLATED:
-        return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
-                                zlib.DEFLATED, -15)
+        if compresslevel is not None:
+            return zlib.compressobj(compresslevel, zlib.DEFLATED, -15)
+        return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
     elif compress_type == ZIP_BZIP2:
+        if compresslevel is not None:
+            return bz2.BZ2Compressor(compresslevel)
         return bz2.BZ2Compressor()
+    # compresslevel is ignored for ZIP_LZMA
     elif compress_type == ZIP_LZMA:
         return LZMACompressor()
     else:
@@ -963,7 +969,8 @@ def __init__(self, zf, zinfo, zip64):
         self._zinfo = zinfo
         self._zip64 = zip64
         self._zipfile = zf
-        self._compressor = _get_compressor(zinfo.compress_type)
+        self._compressor = _get_compressor(zinfo.compress_type,
+                                           zinfo._compresslevel)
         self._file_size = 0
         self._compress_size = 0
         self._crc = 0
@@ -1035,7 +1042,8 @@ def close(self):
 class ZipFile:
     """ Class with methods to open, read, write, close, list zip files.
 
-    z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True)
+    z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True,
+                compresslevel=None)
 
     file: Either the path to the file, or a file-like object.
           If it is a path, the file will be opened and closed by ZipFile.
@@ -1046,13 +1054,19 @@ class ZipFile:
     allowZip64: if True ZipFile will create files with ZIP64 extensions when
                 needed, otherwise it will raise an exception when this would
                 be necessary.
+    compresslevel: None (default for the given compression type) or an integer
+                   specifying the level to pass to the compressor.
+                   When using ZIP_STORED or ZIP_LZMA this keyword has no effect.
+                   When using ZIP_DEFLATED integers 0 through 9 are accepted.
+                   When using ZIP_BZIP2 integers 1 through 9 are accepted.
 
     """
 
     fp = None                   # Set here since __del__ checks it
     _windows_illegal_name_trans_table = None
 
-    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=True):
+    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=True,
+                 compresslevel=None):
         """Open the ZIP file with mode read 'r', write 'w', exclusive create 'x',
         or append 'a'."""
         if mode not in ('r', 'w', 'x', 'a'):
@@ -1066,6 +1080,7 @@ def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=True):
         self.NameToInfo = {}    # Find file info given name
         self.filelist = []      # List of ZipInfo instances for archive
         self.compression = compression  # Method of compression
+        self.compresslevel = compresslevel
         self.mode = mode
         self.pwd = None
         self._comment = b''
@@ -1342,6 +1357,7 @@ def open(self, name, mode="r", pwd=None, *, force_zip64=False):
         elif mode == 'w':
             zinfo = ZipInfo(name)
             zinfo.compress_type = self.compression
+            zinfo._compresslevel = self.compresslevel
         else:
             # Get info object for name
             zinfo = self.getinfo(name)
@@ -1575,7 +1591,8 @@ def _writecheck(self, zinfo):
                 raise LargeZipFile(requires_zip64 +
                                    " would require ZIP64 extensions")
 
-    def write(self, filename, arcname=None, compress_type=None):
+    def write(self, filename, arcname=None,
+              compress_type=None, compresslevel=None):
         """Put the bytes from filename into the archive under the name
         arcname."""
         if not self.fp:
@@ -1597,6 +1614,11 @@ def write(self, filename, arcname=None, compress_type=None):
             else:
                 zinfo.compress_type = self.compression
 
+            if compresslevel is not None:
+                zinfo._compresslevel = compresslevel
+            else:
+                zinfo._compresslevel = self.compresslevel
+
         if zinfo.is_dir():
             with self._lock:
                 if self._seekable:
@@ -1617,7 +1639,8 @@ def write(self, filename, arcname=None, compress_type=None):
             with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
                 shutil.copyfileobj(src, dest, 1024*8)
 
-    def writestr(self, zinfo_or_arcname, data, compress_type=None):
+    def writestr(self, zinfo_or_arcname, data,
+                 compress_type=None, compresslevel=None):
         """Write a file into the archive.  The contents is 'data', which
         may be either a 'str' or a 'bytes' instance; if it is a 'str',
         it is encoded as UTF-8 first.
@@ -1629,6 +1652,7 @@ def writestr(self, zinfo_or_arcname, data, compress_type=None):
             zinfo = ZipInfo(filename=zinfo_or_arcname,
                             date_time=time.localtime(time.time())[:6])
             zinfo.compress_type = self.compression
+            zinfo._compresslevel = self.compresslevel
             if zinfo.filename[-1] == '/':
                 zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x
                 zinfo.external_attr |= 0x10           # MS-DOS directory flag
@@ -1648,6 +1672,9 @@ def writestr(self, zinfo_or_arcname, data, compress_type=None):
         if compress_type is not None:
             zinfo.compress_type = compress_type
 
+        if compresslevel is not None:
+            zinfo._compresslevel = compresslevel
+
         zinfo.file_size = len(data)            # Uncompressed size
         with self._lock:
             with self.open(zinfo, mode='w') as dest:
diff --git a/Misc/NEWS.d/next/Library/2018-01-28-07-55-10.bpo-21417.JFnV99.rst b/Misc/NEWS.d/next/Library/2018-01-28-07-55-10.bpo-21417.JFnV99.rst
new file mode 100644
index 000000000000..50207a0e4c33
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2018-01-28-07-55-10.bpo-21417.JFnV99.rst
@@ -0,0 +1 @@
+Added support for setting the compression level for zipfile.ZipFile.


