[issue15602] zipfile: wrong encoding charset of member filename

Martin v. Löwis report at bugs.python.org
Thu Aug 9 10:47:49 CEST 2012


Martin v. Löwis added the comment:

You are mistaken: there *is* a character set specification for file names in zip files; see

http://www.pkware.com/documents/casestudies/APPNOTE.TXT

Appendix D says

"The ZIP format has historically supported only the original IBM PC character encoding set, commonly referred to as IBM Code Page 437.  This limits storing file name characters to only those within the original MS-DOS range of values and does not properly support file names in other character encodings, or languages."
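
To illustrate what the quoted passage means in practice, here is a minimal sketch of how a name written in some other legacy encoding looks once it is decoded as Code Page 437, and how the original bytes can be recovered. It assumes the archive creator used GBK; that choice, the sample name, and the helper name redecode are hypothetical and only serve the example.

```python
def redecode(name: str, actual_encoding: str) -> str:
    """Undo a CP437 decode and decode the raw bytes with another codec.

    Python's cp437 codec maps every byte value 0x00-0xFF, so encoding
    the mis-decoded string back to cp437 restores the original bytes.
    """
    return name.encode("cp437").decode(actual_encoding)


# Assumption: the archive creator stored the name as GBK bytes
# without setting general purpose bit 11.
original = "中文.txt"
raw_bytes = original.encode("gbk")

# What a reader that follows the historical rule would produce.
mojibake = raw_bytes.decode("cp437")

recovered = redecode(mojibake, "gbk")
```

This only works when the reader already knows which encoding the writer used; the archive itself does not record it, which is why the choice has to come from outside the file.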

Using bytes objects for file names is not acceptable; in Python 3, file names are (unicode) strings.

Adding a new parameter is an option, and is already discussed in issue 10614.

People using code pages other than 437 should really start using UTF-8 encoded file names in their zip files and set general purpose bit 11.
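
As a small sketch of the recommendation above: Python 3's own zipfile module stores a non-ASCII member name as UTF-8 and sets general purpose bit 11 when writing, which can be checked through ZipInfo.flag_bits. The member names and the UTF8_FLAG constant below are made up for the example; 0x800 is simply bit 11 written as a mask.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11 (1 << 11)

# Write a small archive in memory with one non-ASCII and one ASCII name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("grüße.txt", b"data")
    zf.writestr("plain.txt", b"data")

# Read it back and record whether bit 11 is set for each member.
with zipfile.ZipFile(buf) as zf:
    flags = {
        info.filename: bool(info.flag_bits & UTF8_FLAG)
        for info in zf.infolist()
    }
```

The non-ASCII name comes back intact with the flag set, while the pure ASCII name is stored without it, since ASCII bytes read the same under both encodings.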

Closing this report as a duplicate.

----------
nosy: +loewis
resolution:  -> duplicate
status: open -> closed
superseder:  -> ZipFile: add a filename_encoding argument

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue15602>
_______________________________________

