BadZipfile "file is not a zip file"

Carl Banks pavlovevidence at gmail.com
Fri Jan 9 03:46:27 EST 2009


On Jan 9, 2:16 am, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> On Thu, 08 Jan 2009 16:47:39 -0800, webcomm wrote:
> > The error...
> ...
> > BadZipfile: File is not a zip file
>
> > When I look at data.zip in Windows, it appears to be a valid zip file.
> > I am able to uncompress it in Windows XP, and can also uncompress it
> > with 7-Zip.  It looks like zipfile is not able to read a "table of
> > contents" in the zip file.  That's not a concept I'm familiar with.
>
> No, ZipFile can read table of contents:
>
>     Help on method printdir in module zipfile:
>
>     printdir(self) unbound zipfile.ZipFile method
>         Print a table of contents for the zip file.
>
> In my experience, zip files originating from Windows sometimes have
> garbage at the end of the file. WinZip just ignores the garbage, but
> other tools sometimes don't -- if I recall correctly, Linux unzip
> successfully unzips the file but then complains that the file was
> corrupt. It's possible that you're running into a similar problem.


The zipfile format is kind of brain dead: you can't tell where the end
of the file is supposed to be by looking at the header.  If the end of
file hasn't yet been reached, there could be more data.  To make
matters worse, somehow zip files came to have text comments simply
appended to the end of them.  (Probably this was for the benefit of
people who would cat them to the terminal.)

Anyway, if you see something that doesn't adhere to the zipfile
format, you don't have any foolproof way to know if it's because the
file is corrupted or if it's just an appended comment.

Most zipfile readers use a heuristic to distinguish the two cases.
Python's zipfile module just assumes the file is corrupted and raises
BadZipfile.

The following post from a while back gives a solution that tries to
snip the comment off so that the zipfile module can handle it.  It might
help you out.

http://groups.google.com/group/comp.lang.python/msg/c2008e48368c6543
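
For illustration, here is a minimal sketch of the snipping idea.  It is
not the code from the linked post.  It assumes the archive is small
enough to read into memory, that it is not a zip64 archive, and that the
last occurrence of the end-of-central-directory signature is the real
one (trailing garbage that happens to contain the same four bytes would
fool it).  The names strip_trailing_data and open_zip_lenient are made
up for this example.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory signature
EOCD_SIZE = 22                  # size of that record with no comment


def strip_trailing_data(raw):
    """Return raw cut off right after the end-of-central-directory record."""
    pos = raw.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_SIZE > len(raw):
        raise zipfile.BadZipFile("no end-of-central-directory record found")
    record = bytearray(raw[pos:pos + EOCD_SIZE])
    # The last two bytes of the record hold the comment length.  Zero it,
    # because everything after the record is being thrown away.
    record[20:22] = struct.pack("<H", 0)
    return raw[:pos] + bytes(record)


def open_zip_lenient(path):
    """Open path as a ZipFile after dropping anything past the record."""
    with open(path, "rb") as f:
        raw = f.read()
    return zipfile.ZipFile(io.BytesIO(strip_trailing_data(raw)))
```

With that in place, open_zip_lenient("data.zip").printdir() should list
the contents if trailing bytes were the only problem.  If the file is
damaged somewhere else, zipfile will still complain.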


Carl Banks


