[Python-Dev] Lack of sequential decompression in the zipfile module

Brett Cannon brett at python.org
Fri Feb 16 22:09:10 CET 2007


On 2/16/07, Derek Shockey <derek.shockey at gmail.com> wrote:
> Though I am an avid Python programmer, I've never forayed into the area of
> developing Python itself, so I'm not exactly sure how all this works.
>
> I was confused (and somewhat disturbed) to discover recently that the
> zipfile module offers only one-shot decompression of files, accessible only
> via the read() method. It is my understanding that the module will handle
> files of up to 4 GB in size, and the idea of decompressing 4 GB directly
> into memory makes me a little queasy. Other related modules (zlib, tarfile,
> gzip, bzip2) all offer sequential decompression, but this does not seem to
> be the case for zipfile (even though the underlying zlib makes it easy to
> do).
>
> Since I was writing a script to work with potentially very large zipped
> files, I took it upon myself to write an extract() method for zipfile, which
> is essentially an adaptation of the read() method modeled after tarfile's
> extract(). I feel that this is something that should really be provided in
> the zipfile module to make it more usable. I'm wondering if this has been
> discussed before,

Not that I know of, but searching Google would better answer that question.

> or if anyone has ever viewed this as a problem.

Not that I know of.

> I can post
> the code I wrote as a patch, though I'm not sure if my file IO handling is
> as robust as it needs to be for the stdlib. I'd appreciate any insight into
> the issue or direction on where I might proceed from here so as to fix what
> I see as a significant problem.
>
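Derek's point that the underlying zlib makes sequential decompression easy can be illustrated with a minimal sketch. It assumes Python 3 and a file object already positioned at the start of a deflated member's compressed bytes. Parsing the zip local file header is omitted, and the helper name `iter_inflate` and the chunk size are hypothetical. Zip archives store deflated members as raw DEFLATE data with no zlib header, which is why the decompressor is created with a negative `wbits` value.

```python
import io
import zlib


def iter_inflate(compressed_stream, chunk_size=64 * 1024):
    """Hypothetical helper: yield decompressed chunks from a raw DEFLATE stream."""
    # A negative wbits selects raw deflate (no zlib header or trailer),
    # the format zip archives use for "deflated" members.
    decomp = zlib.decompressobj(-zlib.MAX_WBITS)
    while True:
        chunk = compressed_stream.read(chunk_size)
        if not chunk:
            break
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered inside the decompressor.
    tail = decomp.flush()
    if tail:
        yield tail


# Usage: build some raw-deflate data, then inflate it back chunk by chunk.
payload = b"spam and eggs " * 100000
comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = comp.compress(payload) + comp.flush()
restored = b"".join(iter_inflate(io.BytesIO(raw), chunk_size=4096))
assert restored == payload
```

Because the decompressor object keeps its state between calls, only one input chunk and its decompressed output need to be in memory at a time, rather than the whole member.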

Best way is to post it as a patch to the SF tracker for Python
(http://sourceforge.net/patch/?group_id=5470).  Then hopefully someone
will eventually get to it and have a look.  Just please understand
that it might be a while, as it requires someone to take enough interest
in your patch to put in the time and effort to make sure it is ready for
inclusion.

To help your chances of getting it included, make sure you do the following:

1. Make it match PEP 7/8 style guidelines.
2. Have unit tests.
3. Have proper documentation.  It is okay if it is not in LaTeX if you
don't already know the language.
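As an illustration of item 2, a unit test for chunked extraction might look like the following sketch. It assumes Python 3, where `ZipFile.open()` (added to zipfile after this thread, in Python 2.6) returns a file-like object that decompresses lazily as it is read. `extract_streaming` and `ExtractStreamingTest` are hypothetical names, not part of the zipfile module.

```python
import io
import shutil
import unittest
import zipfile


def extract_streaming(zf, name, dest, chunk_size=64 * 1024):
    """Hypothetical helper: copy one archive member into dest in chunks."""
    # zf.open() yields a file-like object that decompresses on demand,
    # so copyfileobj never holds the whole member in memory.
    with zf.open(name) as src:
        shutil.copyfileobj(src, dest, chunk_size)


class ExtractStreamingTest(unittest.TestCase):
    def test_round_trip(self):
        payload = b"x" * (1024 * 1024)
        # Build a small deflated archive entirely in memory.
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("big.bin", payload)
        buf.seek(0)
        # Extract it in small chunks and compare with the original bytes.
        out = io.BytesIO()
        with zipfile.ZipFile(buf) as zf:
            extract_streaming(zf, "big.bin", out, chunk_size=4096)
        self.assertEqual(out.getvalue(), payload)
```

The test can be run with the standard `unittest` runner, e.g. `python -m unittest` pointed at the module that contains it.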

-Brett


> Thanks,
> Derek

