zipfile module: problems with filename having non ascii characters

"Martin v. Löwis" martin at v.loewis.de
Sun Aug 22 04:39:23 EDT 2004


vincent_delft at yahoo.com wrote:
> Is that limitation only valid for zip files?

It appears that WinZip and other tools interpret the file names in a
zipfile in CP437. So to properly put non-ASCII file names into a
zipfile, you need to convert them into CP437. If the file name
contains a character which is not available in CP437, you cannot
save the file in a zipfile (without renaming it).
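
As a rough illustration of the check described above, here is a minimal
sketch in Python. The helper name cp437_safe is hypothetical; it only
relies on the standard "cp437" codec, and it assumes the file name is
already available as a Unicode string.

```python
def cp437_safe(name):
    """Return True if every character of `name` can be encoded in CP437."""
    try:
        name.encode("cp437")
        return True
    except UnicodeEncodeError:
        # At least one character has no CP437 equivalent, so the name
        # would have to be changed before it goes into the zipfile.
        return False


if __name__ == "__main__":
    for candidate in ["Löwis.txt", "日本語.txt"]:
        print(candidate, cp437_safe(candidate))
```

A name that fails this check is the case where the file cannot be stored
without renaming it.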

Not really a Unicode problem, but rather a problem that Unicode
tries to solve.

> Is there another "compression tool" that doesn't have such a limitation
> (tgz?, bz2?, ???)

tar, traditionally, is also unaware of character sets. The Single UNIX
Specification, version 3 (and I believe earlier versions as well) ended
the tar wars with the introduction of the pax utility, which does allow
a character set to be specified in a pax file; the supported character
sets include ISO-8859-n and UTF-8.
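
For what it is worth, the tarfile module in current Python versions can
write the pax format, so a small round trip can show a non-ASCII name
surviving. This is only a sketch: the function name roundtrip_pax_name
and the sample data are made up for illustration, and the archive is
kept in memory rather than written to disk.

```python
import io
import tarfile


def roundtrip_pax_name(name, data=b"hello"):
    """Store `data` under `name` in an in-memory pax archive, then
    read the archive back and return the member names it contains."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()


if __name__ == "__main__":
    print(roundtrip_pax_name("日本語.txt"))
```

Whether other tar implementations read such an archive correctly depends
on their pax support.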

Jörg Schilling's star(1) also uses UTF-8 for file names.

On the non-tar side of the world, WinRAR supports Unicode in archives.
For compatibility, they also put a non-Unicode name into the archive,
but the Unicode name, if present, is meant to take precedence.

Regards,
Martin


