Memory errors with large zip files

John Machin sjmachin at lexicon.net
Sat May 21 05:51:27 EDT 2005


On 20 May 2005 18:04:22 -0700, "Lorn" <efoda5446 at yahoo.com> wrote:

>Ok, I'm not sure if this helps any, but in debugging it a bit I see the
>script stalls on:
>
>newFile.write (zf.read (zfilename))
>
>The memory error generated references line 357 of the zipfile.py
>program at the point of decompression:
>
>elif zinfo.compress_type == ZIP_DEFLATED:
>    if not zlib:
>        raise RuntimeError, \
>              "De-compression requires the (missing) zlib module"
>    # zlib compress/decompress code by Jeremy Hylton of CNRI
>    dc = zlib.decompressobj(-15)
>    bytes = dc.decompress(bytes)  ###  <------ right here
>


The basic problem is that the zipfile module asks the "dc" object to
decompress the whole member at once -- so you need (at least) enough
memory to hold both the compressed data (C) and the uncompressed data
(U) at the same time. Peak usage could even rise to about 2U instead of
U+C, because a few lines further on the result is concatenated, which
builds a new string while the old one is still alive:

bytes = bytes + ex
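For readers on a newer Python: version 2.6 (which postdates this thread) added ZipFile.open(), which returns a file-like object that decompresses on demand. Copying from it in fixed-size pieces avoids both the C+U and the 2U peaks described above. A minimal sketch, assuming Python 3; the function name extract_member_streaming and the 1 MiB default chunk size are illustrative choices, not part of any library:

```python
import shutil
import zipfile


def extract_member_streaming(zip_path, member_name, out_path, chunk_size=1 << 20):
    """Copy one archive member to out_path without holding it all in memory.

    Hypothetical helper: ZipFile.open() returns a file-like object that reads
    and decompresses incrementally, so peak memory is roughly chunk_size plus
    zlib's internal buffers, not the size of the member.
    """
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member_name) as src, open(out_path, "wb") as dst:
            # copyfileobj moves at most chunk_size bytes per read/write
            shutil.copyfileobj(src, dst, chunk_size)
```

Called as extract_member_streaming("big.zip", zfilename, zfilename), it would take the place of the newFile.write(zf.read(zfilename)) line quoted earlier.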

>Is there any way to modify how my code is approaching this

You're doing the best you can, as far as I can tell.

> or perhaps
>how the zipfile code is handling it

Read this:
http://docs.python.org/lib/module-zlib.html

If you think you can work out how to modify zipfile.py to feed
dc.decompress() one chunk of data at a time, properly handling
dc.unconsumed_tail and keeping memory usage to a minimum, then go for
it :-)
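The approach suggested above can be sketched as a standalone function rather than a patch to zipfile.py. This is a minimal Python 3 sketch, assuming the caller has already positioned a file object at the start of one member's raw deflate data and knows its compressed size; inflate_raw_stream, read_size and max_out are hypothetical names. It feeds dc.decompress() one chunk at a time, caps the output of each call with the max_length argument, and hands any input left in dc.unconsumed_tail back to the next call:

```python
import zlib


def inflate_raw_stream(src, dst, compressed_size, read_size=64 * 1024,
                       max_out=256 * 1024):
    """Inflate a raw deflate stream (no zlib header, as stored in a zip
    member) from file object src to file object dst with bounded memory.

    Hypothetical helper: assumes src is already positioned at the start of
    the compressed data and that compressed_size is known (for example from
    the member's ZipInfo.compress_size).
    """
    dc = zlib.decompressobj(-15)  # negative wbits = raw deflate, as in zipfile.py
    remaining = compressed_size
    while remaining > 0:
        chunk = src.read(min(read_size, remaining))
        if not chunk:
            raise EOFError("compressed data ended early")
        remaining -= len(chunk)
        data = chunk
        while data:
            # Emit at most max_out bytes per call; input zlib has not
            # consumed yet is kept in unconsumed_tail and fed back in.
            dst.write(dc.decompress(data, max_out))
            data = dc.unconsumed_tail
    # Write whatever output zlib is still holding internally.
    dst.write(dc.flush())
```

Memory use then stays around read_size + max_out plus zlib's own buffers, whatever the size of the member. Locating the member's data (skipping its local file header) is the part zipfile.py would still have to do.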

Reading the source of the Python zlib module, plus this page from the
zlib website, could be helpful, perhaps even necessary:
http://www.gzip.org/zlib/zlib_how.html

See also the following post to this newsgroup:
From: John Goerzen <jgoer... at complete.org>
Newsgroups: comp.lang.python
Subject: Fixes to zipfile.py [PATCH]
Date: Fri, 07 Mar 2003 16:39:25 -0600

... his patch obviously wasn't accepted :-(


> or do I need to just invest in more
>RAM? I currently have 512 MB and thought that would be plenty....
>perhaps I was wrong :-(.

Before you do anything rash (hacking zipfile.py or buying more
memory), take a step back for a moment:

Is this a one-off exercise or a regular one? Does it *really* need to
be done programmatically? There will be at least one command-line
unzipper program for your platform.
One-off requirement: do it manually.
Regular requirement: try the unzipper manually first; if every
available unzipper on your platform dies with a memory allocation
problem, then you really do have a problem. If one works, then instead
of using the zipfile module, run that unzipper from your Python code
via a subprocess.
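A minimal sketch of the subprocess route, assuming Python 3.7+ and Info-ZIP's unzip on the PATH (-o overwrites existing files without prompting, -d names the output directory); the function name unzip_with_external_tool and its command override are hypothetical:

```python
import subprocess


def unzip_with_external_tool(zip_path, dest_dir, command=None):
    """Extract zip_path into dest_dir by running an external unzipper.

    Hypothetical helper. By default it assumes Info-ZIP's `unzip` is
    installed and on PATH; pass a different argument list in `command`
    to use another tool.
    """
    if command is None:
        command = ["unzip", "-o", zip_path, "-d", dest_dir]
    # check=True raises CalledProcessError if the tool exits non-zero or is
    # killed, e.g. if it also fails to allocate memory.
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

On a system without unzip, passing [sys.executable, "-m", "zipfile", "-e", zip_path, dest_dir] as command uses Python 3's own zipfile command-line interface instead.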

HTH,
John




