Unzipping a .zip properly, and from a remote URL

Tino Wildenhain tino at wildenhain.de
Tue Feb 3 08:32:55 EST 2009


Hi,

Christopher Culver wrote:
> Returning to Python after several years away, I'm working on a little
> script that will download a ZIP archive from a website and unzip it to
> a mounted filesystem. The code is below, and it works so far, but I'm
> unsure of a couple of things.
> 
> The first is, is there a way to read the .zip into memory without the
> use of a temporary file? If I do archive = zipfile.ZipFile(remotedata.read())
> directly without creating a temporary file, the zipfile module
> complains that the data is in the wrong string type.

Which makes sense given the documentation (note you can either browse
the HTML online/offline or just use help() within the interpreter/IDE):

Help on class ZipFile in module zipfile:

class ZipFile
  |  Class with methods to open, read, write, close, list zip files.
  |
  |  z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True)
  |
  |  file: Either the path to the file, or a file-like object.
  |        If it is a path, the file will be opened and closed by ZipFile.
  |  mode: The mode can be either read "r", write "w" or append "a".
  |  compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
  |  allowZip64: if True ZipFile will create files with ZIP64 extensions when
  |              needed, otherwise it will raise an exception when this would
  |              be necessary.
  |
...

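so instead you would pass a file-like object, not a string. Note that
ZipFile needs to seek() in it, which the object returned by urlopen()
cannot do, so wrap the data in a StringIO to avoid the temporary file:
archive = zipfile.ZipFile(StringIO.StringIO(remotedata.read()))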
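
A minimal sketch of reading the archive without a temporary file, in
Python 3 syntax where io.BytesIO is the in-memory buffer. ZipFile has to
seek in whatever it is given, so the downloaded bytes are wrapped in a
buffer first. The helper name open_zip_from_bytes and the sample archive
are made up for illustration; the payload stands in for what
remotedata.read() would return.

```python
import io
import zipfile


def open_zip_from_bytes(data):
    """Open a ZIP archive held entirely in memory (hypothetical helper)."""
    # ZipFile looks for the central directory at the end of the file,
    # so it needs a seekable object; BytesIO provides one.
    return zipfile.ZipFile(io.BytesIO(data))


# Build a small archive in memory to stand in for the downloaded data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("hello.txt", "hello world")
payload = buffer.getvalue()

archive = open_zip_from_bytes(payload)
names = archive.namelist()
content = archive.read("hello.txt")
```

This keeps the whole archive in RAM, so for very large downloads a
temporary file may still be the better choice.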


> The second issue is that I don't know if this is the correct way to
> unpack a file onto the filesystem. It's strange that the zipfile
> module has no one simple function to unpack a zip onto the disk. Does
> this code seem especially liable to break?
> 
>     try:
>         remotedata = urllib2.urlopen(theurl)
>     except IOError:
>         print("Network down.")
>         sys.exit()
>     data = os.tmpfile()
>     data.write(remotedata.read())
> 
>     archive = zipfile.ZipFile(data)
>     if archive.testzip() != None:
>         print "Invalid zipfile"
>         sys.exit()
>     contents = archive.namelist()
> 
>     for item in contents:
...

Here you should check the ZipInfo entry and normalize and clean the
path, just in case, to avoid unpacking a zipfile with specially
crafted paths (like /etc/passwd and such).
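
One way such a path check could look, as a rough sketch in Python 3.
The function name safe_member_path and the directory name unpack_here
are assumptions for the example, not something from the zipfile module.
It rejects absolute names and anything that resolves outside the target
directory.

```python
import os


def safe_member_path(target_dir, member_name):
    """Return the destination path for a ZIP member, or None if unsafe.

    Hypothetical helper: target_dir is where the archive is unpacked,
    member_name is an entry from ZipFile.namelist().
    """
    root = os.path.realpath(target_dir)
    # ZIP names use "/", but treat backslashes as separators as well.
    name = member_name.replace("\\", "/")
    if name.startswith("/"):
        return None  # absolute path such as /etc/passwd
    # Resolve ".." components (and existing symlinks) before comparing.
    dest = os.path.realpath(os.path.join(root, *name.split("/")))
    # The result must lie strictly below the target directory.
    if dest == root or not dest.startswith(os.path.join(root, "")):
        return None
    return dest


# The target directory does not need to exist for the check itself.
target = os.path.join(os.getcwd(), "unpack_here")
good = safe_member_path(target, "docs/readme.txt")
bad_absolute = safe_member_path(target, "/etc/passwd")
bad_relative = safe_member_path(target, "../../etc/passwd")
```

Entries for which the function returns None would simply be skipped or
reported instead of being written.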

It may also make sense to check for the various encodings (like UTF-8)
in path names.

The directory creation could be put into a class that caches the
subdirectories it has already created, recursively creates missing
subdirectories, and makes sure you do not ascend out of your target
directory by accident (or through a crafted zip, see above).
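
A sketch of what such a class might look like in Python 3. The names
SafeExtractor and make_archive are invented for the example, and the
in-memory archives only stand in for a real download. Newer Python
versions also have ZipFile.extractall(), which may already be enough
for simple cases.

```python
import io
import os
import shutil
import tempfile
import zipfile


class SafeExtractor:
    """Unpack archive members below one target directory (hypothetical)."""

    def __init__(self, target_dir):
        self.root = os.path.realpath(target_dir)
        self._known_dirs = set()  # directories already created by us

    def _ensure_dir(self, path):
        # Only touch the filesystem for directories not handled before.
        if path not in self._known_dirs:
            os.makedirs(path, exist_ok=True)  # creates missing parents too
            self._known_dirs.add(path)

    def _destination(self, name):
        parts = name.replace("\\", "/").split("/")
        dest = os.path.realpath(os.path.join(self.root, *parts))
        # Refuse anything that does not end up below the target directory.
        if not dest.startswith(os.path.join(self.root, "")):
            raise ValueError("member escapes target directory: %r" % name)
        return dest

    def extract_all(self, archive):
        written = []
        for info in archive.infolist():
            dest = self._destination(info.filename)
            if info.filename.endswith("/"):  # directory entry
                self._ensure_dir(dest)
                continue
            self._ensure_dir(os.path.dirname(dest))
            with archive.open(info) as src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(dest)
        return written


def make_archive(files):
    """Build an in-memory ZipFile from a name -> bytes mapping (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    buf.seek(0)
    return zipfile.ZipFile(buf)


extractor = SafeExtractor(tempfile.mkdtemp())
written = extractor.extract_all(
    make_archive({"a/b/one.txt": b"one", "a/two.txt": b"two"})
)

# A crafted member name that tries to climb out of the target directory.
try:
    extractor.extract_all(make_archive({"../evil.txt": b"x"}))
    rejected = False
except ValueError:
    rejected = True
```

Raising on the first bad entry is only one possible policy; skipping
the entry and continuing would work just as well.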

Regards
Tino