Unicode File Names

John Machin sjmachin at lexicon.net
Fri Oct 17 05:02:33 EDT 2008


On Oct 17, 6:32 pm, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> > Step 4: Either wait for Python 2.7 or apply the patch to your own copy
> > of zipfile ...
>
> Actually, this is released in Python 2.6, see r62724.

Hi Martin,

That's good. I was led astray by the fact that the 2.6 docs still
contain the note that the OP asked about: "There is no official file
name encoding for ZIP files. If you have unicode file names, you must
convert them to byte strings in your desired encoding before passing
them to write(). WinZip interprets all file names as encoded in CP437,
also known as DOS Latin."

The first sentence was and is bafflegab, the second didn't mention the
portability issues arising from its suggestion (and is now not true),
and the third needs explanation or omission. I believe that WinZip has
supported utf8 since v11.2.
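A quick illustration of the portability problem with the quoted advice. This is modern Python 3, not 2.6-era code, and the choice of cp1252 as the writer's "desired encoding" and the sample names are assumptions for illustration only. The archive records only bytes, so a reader that assumes CP437 garbles a name the writer encoded in some other code page, and some names cannot be expressed in CP437 at all.

```python
# Sketch: the writer picks an encoding, the reader guesses cp437 (DOS Latin).
name = "café.txt"

raw = name.encode("cp1252")        # writer's "desired encoding" (assumed cp1252 here)
as_read = raw.decode("cp437")      # a reader that assumes cp437
print(as_read)                     # cafΘ.txt  (0xE9 is é in cp1252 but Θ in cp437)


def fits_cp437(filename: str) -> bool:
    """Return True if the name can be stored losslessly as cp437 bytes."""
    try:
        filename.encode("cp437")
    except UnicodeEncodeError:
        return False
    return True


print(fits_cp437("café.txt"))      # True
print(fits_cp437("€uro.txt"))      # False: cp437 has no euro sign
print(fits_cp437("日本.txt"))       # False
```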

Should the note be removed, or should it say something like "Unicode
file names are supported. New in Python 2.6."? Is there anything else
that should be mentioned?
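For comparison, a minimal sketch of how the zipfile module in current Python 3 handles this (2.6 details may differ): a name that encodes as ASCII is stored as-is, and anything else is stored as UTF-8 with bit 11 (0x800) of the general-purpose flag set. The helper `utf8_flags` is hypothetical; it round-trips names through an in-memory archive and reports the flag that was read back from the central directory.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: "file name is encoded as UTF-8"


def utf8_flags(names):
    """Hypothetical helper: write one empty member per name, reopen the
    archive, and report whether bit 11 was set for each member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}


print(utf8_flags(["readme.txt", "café.txt", "日本.txt"]))
# {'readme.txt': False, 'café.txt': True, '日本.txt': True}
```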

More on cp437: I see where you mentioned to the patch author that a
unicode string should be encoded in cp437 if possible, but this was
not done -- it first tries ascii. What are your views on what encoding
should be assumed if the utf8 flag is not set?
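One possible answer, sketched as a decode rule: UTF-8 when bit 11 is set, otherwise fall back to CP437. Current Python 3 zipfile uses cp437 as its fallback (and since 3.11 lets the caller override it with the `metadata_encoding` argument). The helper `decode_member_name` and its `fallback` parameter are hypothetical, written only to show the rule. The last example shows what a reader that ignores the flag would display for a UTF-8 name.

```python
UTF8_FLAG = 0x800  # general-purpose bit 11 (language encoding flag)


def decode_member_name(raw: bytes, flag_bits: int, fallback: str = "cp437") -> str:
    """Hypothetical helper: decode a raw ZIP member name.

    If bit 11 is set the name is declared UTF-8; otherwise assume the
    legacy DOS code page unless the caller knows better.
    """
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")
    # cp437 maps all 256 byte values, so this decode never raises.
    return raw.decode(fallback)


print(decode_member_name("café".encode("utf-8"), 0x800))  # café
print(decode_member_name(b"caf\x82", 0))                   # café (0x82 is é in cp437)
print(decode_member_name("café".encode("utf-8"), 0))      # caf├⌐ (flag ignored)
```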

Cheers,
John



More information about the Python-list mailing list