Another Windows zipfile encoding problem, with patch

Martin v. Löwis martin at v.loewis.de
Mon May 5 01:31:49 EDT 2003


Vepxistqaosani <vepxistqaosani at netscape.net> writes:

> I created it in June 2000 using some command-line version of pkzip
> (I've always been allergic to WinZip); almost certainly v. 2.50.

That indicates a bug in pkzip to me. I couldn't reproduce the same
problem in WinZip. I don't think zipfile.py should work around this
bug.

> The error message zipfile.py gives is
> 'File name in directory "SendTo/3½ Floppy (A).lnk" and header
> "SendTo/3+ Floppy (A).lnk" differ.'

And what byte is ½?

> The former string is data.filename; the latter, fname -- and '+' is a
> DOS high-bit character, hex BD (a single horizontal bottom rule
> abutting a double vertical right rule).

It seems WinZip has taken the position that file names in a zipfile
are always cp850 or cp437 encoded (which of these, I don't know). It
then seems that pkzip uses cp850 in one place, and cp1252 in the other
place. zipfile.py would use cp1252 (*) in both places.
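
To make the code-page question concrete, here is a small sketch that
decodes the byte 0xBD under the three code pages mentioned, and encodes
½ under each. It is written for a current Python 3 and is not taken from
zipfile.py; the helper name decode_under is made up for illustration.

```python
# Illustrative only: how a single byte reads under the code pages discussed.
CODE_PAGES = ("cp437", "cp850", "cp1252")


def decode_under(byte_value, code_pages=CODE_PAGES):
    """Return {code page: character} for one byte value (hypothetical helper)."""
    raw = bytes([byte_value])
    return {cp: raw.decode(cp) for cp in code_pages}


# The byte reported in the header name.
decoded = decode_under(0xBD)

# The byte that each code page uses for the vulgar fraction one half.
encoded_half = {cp: "\u00bd".encode(cp) for cp in CODE_PAGES}

for cp in CODE_PAGES:
    print(cp, repr(decoded[cp]), encoded_half[cp].hex())
```

Under cp1252 the byte 0xBD is ½; under cp437 it is a box-drawing
character, and under cp850 it is the cent sign. Both DOS code pages put
½ at 0xAB instead, so a name written with one code page and read with
another will not compare equal.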

Of these three alternatives, the WinZip and zipfile.py approaches are
at least consistent; the pkzip approach is clearly bogus. Neither
approach scales beyond Latin, so I'm unwilling to modify zipfile.py to
follow the limited view of some commercial product; using the limited
view of free software is not necessarily better, but isn't worse,
either.

I've tried to contact Winzip.com, to find out what their official
position on this matter is, but no luck so far. If they publish a
strategy to support all of Unicode in Winzip, I'll happily implement
it (or accept patches).

Regards,
Martin

(*) Actually, it uses the platform native encoding. So it supports
other code pages as well, but you can't exchange zipfiles across
Windows installations with different code pages correctly.
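
A rough sketch of that exchange problem, assuming one machine whose
native code page is cp1252 and another whose native code page is cp1251
(both chosen only as examples). The roundtrip function is hypothetical
and merely mimics "encode with the writer's code page, decode with the
reader's"; it does not call zipfile.py.

```python
# Assumption: the writer and the reader each use their own native code page.
name = "SendTo/3\u00bd Floppy (A).lnk"


def roundtrip(filename, writer_cp, reader_cp):
    """Encode a name as the writer would, then decode it as the reader would."""
    stored = filename.encode(writer_cp)
    return stored.decode(reader_cp)


same = roundtrip(name, "cp1252", "cp1252")   # same code page on both ends
mixed = roundtrip(name, "cp1252", "cp1251")  # different code pages

print(same == name, mixed == name)
```

With matching code pages the name survives; with different ones the
high-bit byte comes back as some other character.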



