[Patches] New module zipfile.py

James C. Ahlstrom jim@interet.com
Fri, 11 Feb 2000 11:23:38 -0500


This is a multi-part message in MIME format.
--------------D45D15E3F3D3AD5E54347B06
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

This new module is a single Python file that reads and writes
ZIP files.  It requires binascii.crc32, which is submitted as
a separate patch.  See the docs below.

LEGAL
=====
I confirm that, to the best of my knowledge and belief, this
contribution is free of any claims of third parties under
copyright, patent or other rights or interests ("claims").  To
the extent that I have any such claims, I hereby grant to CNRI a
nonexclusive, irrevocable, royalty-free, worldwide license to
reproduce, distribute, perform and/or display publicly, prepare
derivative versions, and otherwise use this contribution as part
of the Python software and its related documentation, or any
derivative versions thereof, at no cost to CNRI or its licensed
users, and to authorize others to do so.

I acknowledge that CNRI may, at its sole discretion, decide
whether or not to incorporate this contribution in the Python
software and its related documentation.  I further grant CNRI
permission to use my name and other identifying information
provided to CNRI by me for use in connection with the Python
software and its related documentation.

DOCS
====

zipfile -- Read and write files in ZIP format

The ZIP file format is a common archive and compression standard.  This
module provides tools to create, read, write, append, and list a ZIP
file.


The available attributes of this module are:

error 
            The error raised for bad ZIP files. 
_debug
            Level of diagnostic printing, default 1.  Zero suppresses
            the messages; higher values print more.
class ZipFile 
            The class for reading and writing ZIP files. 
is_zipfile(path) 
         Return 1 if "path" is a valid ZIP file, judged by its magic
         number, and 0 otherwise.  This module does not currently
         handle ZIP files that have appended comments.
zip2date(zdate) 
         Return (year, month, day) for a zip date code. 
zip2time(ztime) 
         Return (hour, minute, second) for a zip time code. 
date2zip(year, month, day) 
         Return a zip date code. 
time2zip(hour, minute, second) 
         Return a zip time code.
ZIP_STORED
         The numeric constant (zero) for an uncompressed archive.
ZIP_DEFLATED
         The numeric constant for the usual ZIP compression
         method.  This requires the zlib module.  No other
         compression methods are currently supported.
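
The four date and time helpers above pack a timestamp into two 16-bit
DOS fields.  The sketch below restates that bit layout.  It assumes a
current Python 3 interpreter rather than the dialect of the patch (hence
"//" for integer division), and the function names are illustrative
stand-ins, not the module's own.

```python
def date_to_dos(year, month, day):
    """Pack a date: 7 bits of (year - 1980), 4 bits of month, 5 bits of day."""
    return (year - 1980) << 9 | month << 5 | day

def dos_to_date(d):
    """Unpack a DOS date code into (year, month, day)."""
    return (d >> 9) + 1980, (d >> 5) & 0xF, d & 0x1F

def time_to_dos(hour, minute, second):
    """Pack a time of day; seconds are stored in 2-second units."""
    return hour << 11 | minute << 5 | second // 2

def dos_to_time(t):
    """Unpack a DOS time code into (hour, minute, second)."""
    # The parentheses matter: "*" binds more tightly than "&".
    return t >> 11, (t >> 5) & 0x3F, (t & 0x1F) * 2

d = date_to_dos(2000, 2, 11)
t = time_to_dos(11, 23, 38)
print(dos_to_date(d), dos_to_time(t))
```

Because seconds are halved on the way in, an odd second such as 39 comes
back as 38.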

An instance of class ZipFile has self.TOC, a read-only dictionary whose
keys are the names in the archive, and whose values are tuples as
follows:
 
0:  File data seek offset. 
1:  Zip file "extra" data as a string. 
2:  Zip file bit flags. 
3:  Zip file compression type. 
4:  File modification time in DOS format. 
5:  File modification date in DOS format. 
6:  The CRC-32 of the uncompressed data. 
7:  The compressed size of the file. 
8:  The uncompressed size of the file. 
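
Entries 2 through 8 of each tuple come from the 30-byte file header
record that precedes the file data.  The following is a minimal sketch
of that mapping, assuming Python 3.  The member name and payload are
made up, and the three 32-bit fields are packed unsigned ("3I") because
Python 3's crc32 returns an unsigned value, whereas the patch uses "3i".

```python
import binascii
import struct

HEADER = "<4s2B4H3I2H"   # 12 items, 30 bytes (unsigned variant of the patch's format)
payload = b"hello, zip"
name = b"hello.txt"

header = struct.pack(HEADER, b"PK\003\004",
                     10, 0,          # version needed to extract (two bytes)
                     0, 0,           # bit flags, compression (0 = stored)
                     23283, 10315,   # DOS time and date: 11:23:38 on 2000-02-11
                     binascii.crc32(payload), len(payload), len(payload),
                     len(name), 0)   # name length, extra length
record = header + name + payload

fields = struct.unpack(HEADER, record[:30])
name_len, extra_len = fields[10], fields[11]
data_offset = 30 + name_len + extra_len        # where the file bytes start
extra = record[30 + name_len:data_offset]
toc_entry = (data_offset, extra) + fields[3:10]
print(toc_entry)
```

Slicing record[toc_entry[0]:] then yields the stored bytes of the
member.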

The class ZipFile has these methods: 

__init__(self, filename, mode = "r", compression = 0) 
Open a ZIP file named "filename".  Mode is "r" to read an existing file,
"w" to truncate and write a new file, or "a" to append.  For mode "a",
if filename is a ZIP file, then additional files are added to it.
If filename is not a ZIP file, then a new ZIP archive is appended to the
file.  This is meant for adding a ZIP archive to another file such as
python.exe, although "cat myzip.zip >> python.exe" also works, and at
least WinZip can read such files.  The "compression" is the ZIP
compression method to use when writing the archive.
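
Mode "a" has to decide whether the target already ends in a ZIP archive,
and an archive glued onto another file has every stored offset shifted
by the length of what precedes it.  The sketch below is one way to
picture both steps.  It assumes Python 3, builds an empty archive by
hand, and uses a made-up 100-byte prefix in place of python.exe; the
function name is hypothetical.

```python
import struct

END_FORMAT = "<4s4H2lH"      # end-of-archive record, 22 bytes
END_MAGIC = b"PK\005\006"

def locate_central_dir(blob):
    """Return (start_of_directory, concat), or None without a trailing end record."""
    endrec = blob[-22:]
    if len(endrec) < 22 or endrec[:4] != END_MAGIC or endrec[-2:] != b"\000\000":
        return None          # not a ZIP file, or it ends with a comment
    fields = struct.unpack(END_FORMAT, endrec)
    size_cd, offset_cd = fields[5], fields[6]
    # Bytes in front of the archive shift every stored offset by concat.
    concat = len(blob) - 22 - size_cd - offset_cd
    return offset_cd + concat, concat

empty_zip = struct.pack(END_FORMAT, END_MAGIC, 0, 0, 0, 0, 0, 0, 0)
prefix = b"MZ" + b"\x90" * 98                   # stand-in for an executable
print(locate_central_dir(empty_zip))            # archive on its own
print(locate_central_dir(prefix + empty_zip))   # archive appended to another file
print(locate_central_dir(prefix))               # no archive at all
```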

listdir(self) 
Return a list of names in the archive.  Equivalent to self.TOC.keys(). 

printdir(self) 
Print a table of contents for the archive to stdout. 

read(self, name) 
Return the bytes of the file in the archive.  The archive must be open
for read or append. 
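
For a deflated member, reading means inflating a raw deflate stream (one
with no zlib header) and then checking the CRC-32 against the stored
value.  A minimal round-trip sketch under Python 3, using only standard
zlib and binascii calls; the sample data is invented.

```python
import binascii
import zlib

original = b"ZIP members are raw deflate streams. " * 20
crc = binascii.crc32(original)

# Negative window bits ask zlib for a raw stream with no header or
# trailer, which is the form stored inside a ZIP archive.
co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
stored = co.compress(original) + co.flush()

dc = zlib.decompressobj(-15)
restored = dc.decompress(stored) + dc.flush()

if binascii.crc32(restored) != crc:
    raise ValueError("Bad CRC-32")
print(len(original), "->", len(stored), "bytes")
```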

writestr(self, bytes, arcname, year, month, day, hour, minute, second,
extra = "") 
Write the string "bytes" and the other data to the archive, and give it
the name "arcname".  The "extra" is the ZIP extra
data string.  The archive must be open with mode "w" or "a". 

write(self, filename, arcname, extra = "") 
Write the file named "filename" to the archive, and give it the archive
name "arcname".  The "extra" is the ZIP extra data
string.  The archive must be open with mode "w" or "a". 
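
write() has no date arguments because the attached code takes the
timestamp from the file's modification time and hands the six fields on
to writestr().  A small sketch of that step, assuming Python 3; the
helper name is hypothetical and a temporary file stands in for a real
input.

```python
import os
import tempfile
import time

def file_timestamp(filename):
    """Return (year, month, day, hour, minute, second) from the local mtime."""
    mtime = os.stat(filename).st_mtime
    return tuple(time.localtime(mtime)[0:6])

# Create a throwaway file so the example has something to inspect.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sample")
path = tmp.name
try:
    stamp = file_timestamp(path)
    print(stamp)
finally:
    os.remove(path)
```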

writepy(self, pathname, basename = "") 
Search for files *.py and add the corresponding file to the archive.
The corresponding file is a *.pyo file if available, else a *.pyc file,
compiling if necessary.  If the pathname is a file, the file must end
with ".py", and just the corresponding *.py[oc] file is added at the
top level (no path information).  If it is a directory, and the
directory is not a package directory, then all the files *.py[oc] are
added at the top level.  If the directory is a package directory, then
all *.py[oc] are added under the package name as a file path, and if
any subdirectories are package directories, all of these are added
recursively.  The "basename" is intended for internal use only.
The writepy() method makes archives with file names like this:
    string.pyc                  # Top level name
    test/__init__.pyc           # Package directory
    test/testall.pyc            # Package "test.testall" file
    test/bogus/__init__.pyc     # Subpackage directory
    test/bogus/myfile.pyc       # Subpackage "test.bogus.myfile" file

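The naming rule behind the writepy() listing is that the directory part
of the compiled file is dropped and the package path, if any, is put in
front with "/" separators.  A sketch of that rule, assuming Python 3;
the function name and sample paths are hypothetical.

```python
import os

def archive_name(compiled_path, basename=""):
    """Map a compiled module path to its name inside the archive."""
    filename = os.path.split(compiled_path)[1]   # drop the directory part
    if basename:
        return "%s/%s" % (basename, filename)    # package path, always "/"
    return filename

print(archive_name("/python/lib/string.pyc"))
print(archive_name("/python/lib/test/__init__.pyc", "test"))
print(archive_name("/python/lib/test/bogus/myfile.pyc", "test/bogus"))
```
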
close(self) 
Close the archive file.  You must call close() before exiting your
program, or essential records will not be written.

PATCH
=====
See attached

JimA
--------------D45D15E3F3D3AD5E54347B06
Content-Type: text/plain; charset=us-ascii;
 name="zipfile.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="zipfile.py"

"Read and write ZIP files"
# Written by James C. Ahlstrom jim@interet.com
# All rights transferred to CNRI pursuant to the Python contribution agreement

import struct, os, time
import binascii, py_compile

try:
  import zlib	# We may need its compression method
except ImportError:
  pass		# zlib is optional; only ZIP_DEFLATED needs it

class BadZipfile(Exception):	# Name must match the "raise BadZipfile" statements below
  pass
error = BadZipfile	# The exception raised by this module

_debug = 1

# constants for Zip file compression methods
ZIP_STORED = 0
ZIP_DEFLATED = 8
# other ZIP compression methods not supported

def is_zipfile(filename):
  """Quickly see if file is a ZIP file by checking the magic number.
Will not accept a ZIP archive with an ending comment."""
  try:
    fpin = open(filename, "rb")
    fpin.seek(-22, 2)		# Seek to end-of-file record
    endrec = fpin.read()
    fpin.close()
    if endrec[0:4] == "PK\005\006" and endrec[-2:] == "\000\000":
      return 1	# file has correct magic number
  except IOError:
    pass
  return 0	# unreadable, too short, or wrong magic number

def zip2date(d):
  "Return (year, month, day) for a date in zip format"
  return (d>>9)+1980, (d>>5)&0xF, d&0x1F

def zip2time(t):
  "Return (hour, minute, second) for a time in zip format"
  return t>>11, (t>>5)&0x3F, (t&0x1F) * 2	# parenthesized: "*" binds tighter than "&"

def date2zip(year, month, day):
  "Return 16-bit zip date for year, month, day"
  return (year - 1980) << 9 | month << 5 | day

def time2zip(hour, minute, second):
  "Return 16-bit zip time for hour, minute, second"
  return hour << 11 | minute << 5 | second / 2

class ZipFile:
  "Class with methods to open, read, write, close, list zip files"
  # Here are some struct module formats for reading headers
  structEndArchive = "<4s4H2lH"		# 8 items, end of archive, 22 bytes
  stringEndArchive = "PK\005\006"	# magic number for end of archive record
  structCentralDir = "<4s4B4H3i5H2i"	# 19 items, central directory, 46 bytes
  stringCentralDir = "PK\001\002"	# magic number for central directory
  structFileHeader = "<4s2B4H3i2H"	# 12 items, file header record, 30 bytes
  stringFileHeader = "PK\003\004"	# magic number for file header
  def __init__(self, filename, mode = "r", compression = 0):
    """Construct a ZipFile instance and open the ZIP file named "filename" with
mode read "r", write "w" or append "a"."""
    if compression == ZIP_STORED:
      pass
    elif compression == ZIP_DEFLATED:
      try:
        import zlib
      except ImportError:
        raise RuntimeError, "Compression requires the (missing) zlib module"
    else:
      raise RuntimeError, "Compression method must be 0 or 8"
    self.TOC = {}	# Table of contents for the archive
    self.compression = compression	# Method of compression
    self.filename = filename
    self.mode = key = mode[0]
    if key == 'r':
      self.fp = open(filename, "rb")
      self._getTOC()
    elif key == 'w':
      self.fp = open(filename, "wb")
    elif key == 'a':
      fp = self.fp = open(filename, "r+b")
      fp.seek(-22, 2)		# Seek to end-of-file record
      endrec = fp.read()
      if endrec[0:4] == self.stringEndArchive and endrec[-2:] == "\000\000":
        self._getTOC()	# file is a zip file
        fp.seek(self.start_dir, 0)	# seek to start of directory and overwrite
      else:		# file is not a zip file, just append
        fp.seek(0, 2)
    else:
      raise RuntimeError, 'Mode must be "r", "w" or "a"'
  def _getTOC(self):
    "Read in the table of contents for the zip file"
    fp = self.fp
    fp.seek(-22, 2)		# Start of end-of-archive record
    filesize = fp.tell() + 22	# Get file size
    endrec = fp.read(22)	# Archive must not end with a comment!
    if endrec[0:4] != self.stringEndArchive or endrec[-2:] != "\000\000":
      raise BadZipfile, "File is not a zip file, or ends with a comment"
    endrec = struct.unpack(self.structEndArchive, endrec)
    if _debug > 2:
      print endrec
    size_cd = endrec[5]		# bytes in central directory
    offset_cd = endrec[6]	# offset of central directory
    x = filesize - 22 - size_cd
    concat = x - offset_cd	# zero, unless zip was concatenated to another file
    if _debug > 2:
      print "given, inferred, offset", offset_cd, x, concat
    self.start_dir = offset_cd + concat	# Position of start of central directory
    fp.seek(self.start_dir, 0)
    total = 0
    flist = []		# List of file header offsets
    while total < size_cd:
      centdir = fp.read(46)
      total = total + 46
      if centdir[0:4] != self.stringCentralDir:
        raise BadZipfile, "Bad magic number for central directory"
      centdir = struct.unpack(self.structCentralDir, centdir)
      if _debug > 2:
        print centdir
      fname = fp.read(centdir[12])
      extra = fp.read(centdir[13])
      comment = fp.read(centdir[14])
      total = total + centdir[12] + centdir[13] + centdir[14]
      if _debug > 2:
        print "total", total
      flist.append(centdir[18])	# Offset of file header record
    if _debug > 2:
      print flist
    toc = self.TOC	# Table of contents
    for offset in flist:
      fp.seek(offset + concat, 0)
      fheader = fp.read(30)
      if fheader[0:4] != self.stringFileHeader:
        raise BadZipfile, "Bad magic number for file header"
      fheader = struct.unpack(self.structFileHeader, fheader)
      fname = fp.read(fheader[10])
      if _debug > 1:
        print "File", fname, fheader
      extra = fp.read(fheader[11])
      toc[fname] = (fp.tell(), extra) + fheader[3:10]
      # toc key is the file name, value is:
      # 0:file offset, 1:extra data as string, 2:bit flags, 3:compression type,
      # 4:file time, 5:file date, 6:CRC-32, 7:compressed size, 
      # 8:uncompressed size.
  def listdir(self):
    "Return a list of the names in the archive"
    return self.TOC.keys()
  def printdir(self):
    "Print table of contents for zip file"
    toc = self.TOC
    if _debug > 2:
      print toc
    print "%-30s %19s %12s" % ("File Name", "Modified    ", "Size")
    for name, data in toc.items():
      bytes = self.read(name)	# Just to check CRC-32
      d = data[5]	# Date
      t = data[4]	# Time
      date = "%d-%02d-%02d %02d:%02d:%02d" % (zip2date(d) + zip2time(t))
      print "%-30s %s %12d" % (name, date, data[8])
  def read(self, name):
    "Return file bytes (as a string) for name"
    if self.mode not in ("r", "a"):
      raise RuntimeError, 'read() requires mode "r" or "a"'
    if not self.fp:
      raise RuntimeError, "Attempt to read ZIP archive that was already closed"
    data = self.TOC[name]
    filepos = self.fp.tell()
    self.fp.seek(data[0], 0)
    bytes = self.fp.read(data[7])
    self.fp.seek(filepos, 0)
    if data[3] == ZIP_STORED:		# Compression method: none
      pass
    elif data[3] == ZIP_DEFLATED:	# Compression method: deflation
      # zlib compress/decompress code by Jeremy Hylton of CNRI
      dc = zlib.decompressobj(-15)
      bytes = dc.decompress(bytes)
      # need to feed in unused pad byte so that zlib won't choke
      ex = dc.decompress('Z') + dc.flush()
      if ex:
        bytes = bytes + ex
    else:
      raise BadZipfile, "Unsupported compression method %d for file %s" % (data[3], name)
    crc = binascii.crc32(bytes)
    if crc != data[6]:
      raise BadZipfile, "Bad CRC-32 for file %s" % name
    return bytes
  def write(self, filename, arcname, extra = ""):
    """Put the bytes from filename into the archive under the name arcname.
The "extra" is the extra data string."""
    mtime = os.stat(filename)[8]
    mtime = time.localtime(mtime)
    year, month, day, hour, minute, second = mtime[0:6]
    fp = open(filename, "rb")
    bytes = fp.read()
    fp.close()
    self.writestr(bytes, arcname, year, month, day, hour, minute, second, extra)
  def writestr(self, bytes, arcname, year, month, day, hour, minute, second, extra = ""):
    """Write bytes and other data into the archive under the name arcname.
The "extra" is the extra data string."""
    if self.TOC.has_key(arcname):	# Warning for duplicate names
      if _debug:
        print "Duplicate name:", arcname
    if self.mode not in ("w", "a"):
      raise RuntimeError, 'write() requires mode "w" or "a"'
    if not self.fp:
      raise RuntimeError, "Attempt to write ZIP archive that was already closed"
    dosdate = date2zip(year, month, day)
    dostime = time2zip(hour, minute, second)
    compression = self.compression	# Method of compression
    u_size = len(bytes)		# Uncompressed size
    crc = binascii.crc32(bytes)		# CRC-32 checksum
    if compression == ZIP_DEFLATED:
      co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
      bytes = co.compress(bytes) + co.flush()
      c_size = len(bytes)	# Compressed size
    else:
      c_size = u_size
    header = struct.pack(self.structFileHeader, self.stringFileHeader,
       10, 0, 0, compression, dostime, dosdate,
       crc, c_size, u_size, len(arcname), len(extra))
    self.fp.write(header)
    self.fp.write(arcname)
    self.fp.write(extra)
    offset = self.fp.tell()	# Start of file bytes
    self.fp.write(bytes)
    self.TOC[arcname] = (offset, extra, 0, compression,
       dostime, dosdate, crc, c_size, u_size)
  def writepy(self, pathname, basename = ""):
    """If pathname is a package directory, search the directory and all package
subdirectories recursively for all *.py and enter the modules into the archive.
If pathname is a plain directory, listdir *.py and enter all modules.
Else, pathname must be a Python *.py file and the module will be put into the archive.
Added modules are always module.pyo or module.pyc.
This method will compile the module.py into module.pyc if necessary."""
    dir, name = os.path.split(pathname)
    if os.path.isdir(pathname):
      initname = os.path.join(pathname, "__init__.py")
      if os.path.isfile(initname):
        # This is a package directory, add it
        if basename:
          basename = "%s/%s" % (basename, name)
        else:
          basename = name
        if _debug:
          print "Adding package in", pathname, "as", basename
        fname, arcname = self._get_codename(initname[0:-3], basename)
        if _debug:
          print "Adding", arcname
        self.write(fname, arcname)
        dirlist = os.listdir(pathname)
        dirlist.remove("__init__.py")
        # Add all *.py files and package subdirectories in the directory
        for filename in dirlist:
          path = os.path.join(pathname, filename)
          root, ext = os.path.splitext(filename)
          if os.path.isdir(path):
            if os.path.isfile(os.path.join(path, "__init__.py")):
              # This is a package directory, add it
              self.writepy(path, basename)	# Recursive call
          elif ext == ".py":
            fname, arcname = self._get_codename(path[0:-3], basename)
            if _debug:
              print "Adding", arcname
            self.write(fname, arcname)
      else:
        # This is NOT a package directory, add its files at top level
        if _debug:
          print "Adding files from directory", pathname
        for filename in os.listdir(pathname):
          path = os.path.join(pathname, filename)
          root, ext = os.path.splitext(filename)
          if ext == ".py":
            fname, arcname = self._get_codename(path[0:-3], basename)
            if _debug:
              print "Adding", arcname
            self.write(fname, arcname)
    else:
      if pathname[-3:] != ".py":
        raise RuntimeError, 'Files added with writepy() must end with ".py"'
      fname, arcname = self._get_codename(pathname[0:-3], basename)
      if _debug:
        print "Adding file", arcname
      self.write(fname, arcname)
  def _get_codename(self, pathname, basename):
    """Given a module name path, return the correct file path
and archive name, compiling if necessary.
For example, for /python/lib/string, return (/python/lib/string.pyc, string.pyc)"""
    file_py  = pathname + ".py"
    file_pyc = pathname + ".pyc"
    file_pyo = pathname + ".pyo"
    if os.path.isfile(file_pyo) and os.stat(file_pyo)[8] >= os.stat(file_py)[8]:
      fname = file_pyo	# Use .pyo file
    elif not os.path.isfile(file_pyc) or os.stat(file_pyc)[8] < os.stat(file_py)[8]:
      if _debug:
        print "Compiling", file_py
      py_compile.compile(file_py, file_pyc)
      fname = file_pyc
    else:
      fname = file_pyc
    archivename = os.path.split(fname)[1]
    if basename:
      archivename = "%s/%s" % (basename, archivename)
    return (fname, archivename)
  def __del__(self):	# User should have called close(), but just in case
    if self.fp:
      self.fp.close()
      self.fp = None
  def close(self):
    if self.mode in ("w", "a"):		# write ending records
      attrib = 0666 << 16		# file attributes
      count = 0
      pos1 = self.fp.tell()
      for name, data in self.TOC.items():	# write central directory
        count = count + 1
        namesize = len(name)
        extrasize = len(data[1])
        centdir = struct.pack(self.structCentralDir, self.stringCentralDir,
            20, 3, 10, 0, data[2], data[3], data[4], data[5], data[6], data[7],
            data[8], namesize, extrasize, 0, 0, 0, attrib,
            data[0] - namesize - extrasize - 30)
        self.fp.write(centdir)
        self.fp.write(name)
        self.fp.write(data[1])
      pos2 = self.fp.tell()
      endrec = struct.pack(self.structEndArchive, self.stringEndArchive,
             0, 0, count, count, pos2 - pos1, pos1, 0)
      self.fp.write(endrec)
    self.fp.close()
    self.fp = None

def test():
  if 1:
    # Run from Python home directory
    z = ZipFile("temp.zip", "w")
    z.writepy("Lib")	# Directory ./Lib must exist
    z.close()
    z = ZipFile("temp.zip", "r")
    z.printdir()
    z.close()
  if 0:
    z = ZipFile("jim1.zip", "w")
    z.write("jim.1", "jim.1")
    z.write("jim.2", "users/jim/jim.2")
    z.close()
  if 0:
    z = ZipFile("jim2.zip", "w")
    z.writepy("N:/prd/winlease/vest")
    z.write("jim.1", "jim.1")
    z.writepy("N:/python/Python-1.5.2/Lib/string.py")
    z.writepy("N:/python/Python-1.5.2/Lib/test")
    z.close()
  if 0:
    fp = open("python.exe", "rb")
    bytes = fp.read()
    fp.close()
    fp = open("python2.exe", "wb")
    fp.write(bytes)
    fp.close()
    z = ZipFile("python2.exe", "a")
    z.write("jim.2", "append/users/jim/jim.2")
    z.close()
  if 0:
    z = ZipFile("python2.exe", "r")
    z.printdir()
    z.close()
  if 0:
    z = ZipFile("jimcomp.zip", "w", ZIP_DEFLATED)
    z.write("jim.1", "jim.1")
    z.write("jim.2", "compress/jim.2")
    z.close()
  if 0:
    z = ZipFile("winzip.zip", "r")
    z.printdir()
    z.close()

--------------D45D15E3F3D3AD5E54347B06--