[Python-ideas] zipfile refactor and AES

Steve Barnes GadgetSteve at live.co.uk
Mon Jun 3 17:24:03 EDT 2019


One specific pain point with zipfile: if you zip a directory that contains the target zip file, you end up adding the target file to itself, which leads to a rapidly growing archive.
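
A minimal sketch of one way to sidestep this with the current module, assuming it is enough to compare resolved paths and skip the archive while walking the directory. The helper name `zip_directory` is hypothetical and not part of zipfile:

```python
import os
import zipfile


def zip_directory(src_dir, archive_path):
    """Zip the files under src_dir into archive_path, skipping the archive itself."""
    archive_real = os.path.realpath(archive_path)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # The archive already exists on disk at this point, so it
                # shows up in the walk. Skip it rather than add it to itself.
                if os.path.realpath(full) == archive_real:
                    continue
                arcname = os.path.relpath(full, src_dir)
                zf.write(full, arcname)
                added.append(arcname)
    return added
```

Calling `zip_directory("project", "project/backup.zip")` would then archive everything under `project` except `backup.zip`.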

From: Python-ideas <python-ideas-bounces+gadgetsteve=live.co.uk at python.org> On Behalf Of Robert Collins
Sent: 03 June 2019 21:12
To: Daniel Hillier <dhilliercode at gmail.com>
Cc: Python-Ideas <python-ideas at python.org>
Subject: Re: [Python-ideas] zipfile refactor and AES

This sounds like a valuable refactoring to me.

Is it API compatible with the current zipfile module docs?

On Mon, 3 Jun 2019, 20:23 Daniel Hillier, <dhilliercode at gmail.com> wrote:
Hi,

I've written a package that can read and write zip files encrypted with
Winzip's AES encryption scheme (https://github.com/danifus/pyzipper/). It is
based on Python's zipfile module which I refactored to facilitate adding the
AES code as subclasses of the classes defined in the zipfile module.

I would like to explore integrating some of the refactoring into Python
if it is wanted. While the AES-specific code probably isn't suitable for
integration due to its dependence on a crypto package, my hope is that the
refactor of the zipfile project may be beneficial and, in particular, help
development on other features of the zip spec. I'm happy to rework the
changes for inclusion in Python.

The general goals of the refactor were:
- Keep the AES implementation changes in a separate file so changes to
  zipfile.py could potentially be merged into Python. (The cryptically named
  "minimal_pep8" branch makes only minimal cosmetic changes, leaving code
  style alone unless the surrounding lines were changed. The master branch
  contains everything in minimal_pep8 plus pep8 cleanup and a few other
  changes.)
- Add hooks for extending the way zipfile works to enable the addition of AES
  encryption without having to duplicate most of the zipfile module. This
  included adding hooks to:
    - Select and call encrypt and decrypt methods.
    - Read and write new "extra" data records in the central file directory
      and local header records.
    - Provide a mechanism to substitute ZipInfo, ZipExtFile and ZipWriteFile
      classes used in a subclass of ZipFile to ease use of subclassed ZipInfo,
      ZipExtFile or ZipWriteFile. This avoids having to rewrite large parts of
      the zipfile module if we only want to change the behaviour of a small
      part of one of those classes.
- Contain all code that reads the header, contents and tail of a file in the
  archive within ZipExtFile. Previously, reading the header and some other
  things were done in the ZipFile class before handing the rest of the
  processing to ZipExtFile.
- Contain all code that writes the header, contents and tail of a file in the
  archive within ZipWriteFile. Previously, writing the header and some other
  things were done in the ZipFile class before handing the rest of the
  processing to ZipWriteFile.
- Move generation of local file header and central directory record content to
  the ZipInfo class to be alongside the data that it is packing.
- Add comments to provide context from the zip spec. Replace explicit numbers
  with variables that have explanatory names, or add comments.
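
As a rough illustration of the class-substitution hook described above, here is a self-contained toy sketch. It does not use pyzipper's or zipfile's actual attribute names: `Archive`, `Info`, `info_cls` and `make_info` are all hypothetical stand-ins, assuming the hook is a class attribute that a subclass overrides:

```python
class Info:
    """Stand-in for ZipInfo: holds metadata for one archive member."""

    def __init__(self, filename):
        self.filename = filename

    def describe(self):
        return "plain:" + self.filename


class Archive:
    """Stand-in for ZipFile.

    The helper class is looked up through a class attribute instead of
    being hard-coded, so a subclass can substitute its own.
    """

    info_cls = Info  # hypothetical hook name

    def make_info(self, filename):
        # Always go through the hook rather than naming Info directly.
        return self.info_cls(filename)


class AESInfo(Info):
    """Changes one small piece of behaviour of Info."""

    def describe(self):
        return "aes:" + self.filename


class AESArchive(Archive):
    # Swapping the helper class is the only change the subclass needs.
    info_cls = AESInfo


plain = Archive().make_info("a.txt")
encrypted = AESArchive().make_info("a.txt")
```

The point of the design is that `AESArchive` inherits every method of `Archive` unchanged and only names a different helper class.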

Concerns:
- The change set is not small. I've attempted to keep each commit in the
  minimal_pep8 branch focused on a single change to simplify review and kept
  the unrelated changes, like pep8, to a minimum. I'm happy to put in the
  work to get these changes into a patch for the Python project, if it is
  deemed useful.
- This could move the internals of zipfile towards a public API (maybe it
  would be?) which brings additional complexity in managing future changes.
- Are the hooks in line with Python's coding style? Is there a different
  approach to extensibility preferred within the Python project?

Let me know any improvements, suggestions or concerns to this refactoring
approach you may have. Happy for feedback even if it isn't about integrating the
code into Python.

Thanks,
Dan