Memory errors with large zip files
John Machin
sjmachin at lexicon.net
Sat May 21 05:51:27 EDT 2005
On 20 May 2005 18:04:22 -0700, "Lorn" <efoda5446 at yahoo.com> wrote:
>Ok, I'm not sure if this helps any, but in debugging it a bit I see the
>script stalls on:
>
>newFile.write (zf.read (zfilename))
>
>The memory error generated references line 357 of the zipfile.py
>program at the point of decompression:
>
>elif zinfo.compress_type == ZIP_DEFLATED:
>    if not zlib:
>        raise RuntimeError, \
>              "De-compression requires the (missing) zlib module"
>    # zlib compress/decompress code by Jeremy Hylton of CNRI
>    dc = zlib.decompressobj(-15)
>    bytes = dc.decompress(bytes) ### <------ right here
>
The basic problem is that the zipfile module is asking the "dc" object
to decompress the whole file at once -- so you would need (at least)
enough memory to hold both the compressed file (C) and the
uncompressed file (U). There is also a possibility that this could
rise to 2U instead of U+C -- read a few lines further on:
    bytes = bytes + ex
>Is there any way to modify how my code is approaching this
You're doing the best you can, as far as I can tell.
> or perhaps
>how the zipfile code is handling it
Read this:
http://docs.python.org/lib/module-zlib.html
If you think you can work out how to modify zipfile.py to feed
dc.decompress a chunk of data at a time, properly manipulating
dc.unconsumed_tail, and keeping memory usage to a minimum, then go for
it :-)
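
As a rough illustration of the chunk-at-a-time idea, here is a minimal
sketch. It is not a patch to zipfile.py: the function name
inflate_in_chunks and the chunk_size parameter are made up for the
example, it is written for current Python 3, and it assumes src is a
file-like object positioned at the start of a raw deflate stream and
containing nothing after it (in a real zip file you would also have to
stop after the member's compressed size).

```python
import zlib

def inflate_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy a raw deflate stream from src to dst, decompressing as we go.

    Hypothetical helper: src and dst are binary file-like objects.
    """
    dc = zlib.decompressobj(-15)  # -15: raw deflate, as in zipfile.py
    while True:
        data = src.read(chunk_size)
        if not data:
            break
        while data:
            # The second argument (max_length) caps the output of this
            # call; input that was not consumed because of the cap is
            # left in dc.unconsumed_tail and must be fed back in.
            dst.write(dc.decompress(data, chunk_size))
            data = dc.unconsumed_tail
    # Collect whatever output is still pending inside the object.
    dst.write(dc.flush())
```

With this shape, memory use is governed by chunk_size rather than by
the size of the compressed or uncompressed file.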
Reading the source of the Python zlib module, plus this page from the
zlib website could be helpful, perhaps even necessary:
http://www.gzip.org/zlib/zlib_how.html
See also the following post to this newsgroup:
From: John Goerzen <jgoer... at complete.org>
Newsgroups: comp.lang.python
Subject: Fixes to zipfile.py [PATCH]
Date: Fri, 07 Mar 2003 16:39:25 -0600
... his patch obviously wasn't accepted :-(
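
A note for anyone reading this with a later Python: the read-it-all-at-once
behaviour described above is that of zf.read(). From Python 2.6 onwards
the zipfile module also has ZipFile.open(), which returns a file-like
object that decompresses incrementally as you read from it. A minimal
sketch for Python 3, assuming such a version is available (the function
name extract_member_streaming and its parameters are invented for the
example):

```python
import shutil
import zipfile

def extract_member_streaming(zip_path, member_name, out_path,
                             chunk_size=64 * 1024):
    """Write one archive member to out_path without holding it all in memory.

    Hypothetical helper built on ZipFile.open() and shutil.copyfileobj().
    """
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member_name) as src, open(out_path, "wb") as dst:
            # copyfileobj reads and writes chunk_size bytes at a time.
            shutil.copyfileobj(src, dst, chunk_size)
```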
> or do I need to just invest in more
>RAM? I currently have 512 MB and thought that would be plenty....
>perhaps I was wrong :-(.
Before you do anything rash (hacking zipfile.py or buying more
memory), take a step back for a moment:
Is this a one-off exercise or a regular exercise? Does it *really*
need to be done programmatically? There will be at least one
command-line unzipper program for your platform. One-off requirement:
do it manually.
Regular: Try using the unzipper manually; if all the available
unzippers on your platform die with a memory allocation problem then
you really have a problem. If it works, then instead of using the
zipfile module, use the unzipper program from your Python code via a
subprocess.
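
Something along these lines might do for the subprocess route. It is
only a sketch: it assumes the Info-ZIP "unzip" program is installed and
on your PATH (substitute whatever unzipper your platform has, with its
own options), it uses subprocess.run from current Python 3, and the two
function names are made up for the example.

```python
import subprocess

def build_unzip_command(archive, dest_dir):
    """Return the argument list for Info-ZIP's unzip (assumed installed).

    -o overwrites existing files without prompting, -q keeps it quiet,
    -d names the directory to extract into.
    """
    return ["unzip", "-o", "-q", archive, "-d", dest_dir]

def extract_with_unzip(archive, dest_dir):
    """Run the external unzipper; raises CalledProcessError on failure."""
    subprocess.run(build_unzip_command(archive, dest_dir), check=True)
```

The decompression then happens in the external program's memory, not in
your Python process.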
HTH,
John