zipfile [module and file format, both] stupidly broken

Sat May 19 13:00:01 EDT 2007

Larry Bates <larry.bates at websafe.com> bristled:
> Are you serious? A zipfile with a comment > 4Kbytes.  I've never encountered
> such a beast.

If I hadn't run into one I would never have had a clue that Python's
zipfile module had this silly bug.

> As with any open source product it is much better to roll up your sleeves
> and pitch in to fix a problem than to rail about "how it is stupidly
> broken".  You are welcome to submit a patch or at the very least a good
> description of the problem and possible solutions.  If you have gotten a
> lot of value out of Python, you might consider this "giving back".  You
> haven't paid anything for the value it has provided.

Ah yes, the old "well, if you found it you should fix it" meme -
another reason I found it pretty easy to stop reading this group.  It's
as stupid a position as it ever was (and FWIW I don't believe I've ever
seen any of the real Python developers mouth this crap).

Now, I have learned somewhat more than I knew (or ever wanted to know)
about zipfiles since I smacked headfirst into this bug, and I've
changed the subject line to reflect my current understanding.  :-/  Back
then it had already occurred to me that *just* changing the size of the
step back seemed an incomplete fix: after all, that leaves you scanning
through random binary glop looking for the signature.  With the
signature being four bytes, okay, it will *nearly* always work (just as
the existing 4K scan does), but... well, from what I've read in the
format specs that's about as good as it gets.  The alternative, some
sort of backwards scan, would avoid the binary glop but has much the
same problem, in principle, with finding the signature embedded in the
archive comment.  Even worse, arguably, since that comment is
apparently entirely up to the archive creator, so if there's a way to
use a fake central directory for nefarious purposes, that would make it
trivial to do.  Which is the point where I decided that the file format
itself is broken...  (oh, and then I came across something from the
info-zip crew that said much the same thing, though they didn't mention
this particular design, uhm, shortcoming.)
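
To make that ambiguity concrete, here is a minimal sketch of a backwards
scan. It is not the zipfile module's actual code; the names
find_eocd_candidates and make_eocd are hypothetical, made up for this
illustration. It relies only on the record layout from the format spec: a
four-byte signature, a 22-byte fixed part, and a 16-bit comment length in
the last two bytes of that fixed part. A candidate is accepted when its
comment length accounts exactly for the rest of the file.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"   # End Of Central Directory signature
EOCD_FIXED_SIZE = 22             # fixed part of the record, before the comment
MAX_COMMENT = 0xFFFF             # the comment length is a 16-bit field


def find_eocd_candidates(data):
    """Return offsets (last one first) of every plausible EOCD record.

    "Plausible" means the record's comment-length field accounts exactly
    for the bytes between the end of the fixed part and the end of data.
    """
    start = max(0, len(data) - (EOCD_FIXED_SIZE + MAX_COMMENT))
    candidates = []
    pos = data.rfind(EOCD_SIGNATURE, start)
    while pos != -1:
        if pos + EOCD_FIXED_SIZE <= len(data):
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            if pos + EOCD_FIXED_SIZE + comment_len == len(data):
                candidates.append(pos)
        # Continue with matches that start strictly before pos.
        pos = data.rfind(EOCD_SIGNATURE, start, pos + 3)
    return candidates


def make_eocd(comment):
    """Hypothetical helper: an EOCD record with zeroed directory fields."""
    return (EOCD_SIGNATURE + b"\0" * 16
            + struct.pack("<H", len(comment)) + comment)


# A real record whose comment happens to end with a complete fake record.
fake = make_eocd(b"")
data = b"junk" + make_eocd(b"hello" + fake)
print(find_eocd_candidates(data))   # two equally plausible records
```

Both records pass the only consistency check the trailer offers, so a
reader that takes the first hit from the end picks the one inside the
comment, and a reader scanning forward picks the other.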

So I guess that perhaps the stupidly obvious fix:

-     END_BLOCK = min(filesize, 1024 * 4)
+     END_BLOCK = min(filesize, 1024 * 64 + 22)

is after all about the best that can be done.  (The extra 22 is the fixed
size of the End Of Central Directory record that sits in front of the
comment.  Its absence from the existing code isn't a separate bug, but if
we're going to pretend we accommodate all valid zipfiles it wouldn't do to
overlook it.)
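
As a sanity check on those numbers, the following sketch builds a small
archive in memory and measures how far before end-of-file the End Of
Central Directory record starts. It assumes a current Python 3 zipfile
module, where the comment can be set through the ZipFile.comment
attribute; eocd_distance_from_end is a hypothetical name used only here.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"


def eocd_distance_from_end(comment):
    """Build a one-member archive with the given comment (bytes) and
    return how many bytes before EOF its EOCD record starts."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
        zf.comment = comment
    data = buf.getvalue()
    # The filler comments used below contain no signature bytes, so the
    # last match is the real record.
    return len(data) - data.rfind(EOCD_SIGNATURE)


print(eocd_distance_from_end(b"x" * 5000))    # already past a 4K window
print(eocd_distance_from_end(b"x" * 65535))   # worst case: longest comment
print(1024 * 64 + 22)                         # window size from the patch
```

A 5000-byte comment already puts the record beyond a 4096-byte window,
and even the longest comment the 16-bit length field allows stays inside
the patched one.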

So now you may imagine that your rudeness has had the result you
intended after all, and I guess it has, though at a cost - well, you
probably never cared what I thought about you anyway.

BTW, thanks for the pointer someone else gave to the proper place for
posting bugs.  I'd had the silly idea that I would be able to find that
easily at www.python.org, but if I had then I'd not have posted here
and had so much fun.

-- 
The most effective way to get information from usenet is not to ask
a question; it is to post incorrect information.  -- Aahz's Law

Apparently denigrating the bug reporter can sometimes result in a
patch, too, but I don't think that's in the same spirit.