[issue28080] Allow reading member names with bogus encodings in zipfile

Serhiy Storchaka report at bugs.python.org
Sun Mar 20 10:22:14 EDT 2022


Serhiy Storchaka <storchaka+cpython at gmail.com> added the comment:

I experimented with this a lot. There is a problem with append mode. An archive opened in append mode can also be read, so an encoding is needed. But when a ZipFile is closed after appending, non-ASCII file names are encoded as UTF-8 in the central directory. The next time the archive is opened for reading with a different encoding, we get an error because the file names in the central directory and in the local headers differ. To keep the data self-consistent, we would need to write non-ASCII file names back with the specified encoding.
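A minimal sketch of the write-side behaviour described above, assuming only the documented zipfile API: when zipfile writes a member whose name is not pure ASCII, it stores the name as UTF-8 and sets general-purpose flag bit 11 (0x800). The member names and the UTF8_FLAG constant are made up for illustration.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: file name is stored as UTF-8

# Write one ASCII and one non-ASCII member into an in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ascii.txt", b"a")
    zf.writestr("привет.txt", b"b")

# Read the archive back and record which members carry the UTF-8 flag.
with zipfile.ZipFile(buf) as zf:
    utf8_flagged = {
        info.filename: bool(info.flag_bits & UTF8_FLAG)
        for info in zf.infolist()
    }
```

Only the non-ASCII name gets the flag. That is why appending to an archive whose existing names use a legacy encoding can leave the central directory and the local headers disagreeing.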

In the end I left this as it was initially. We can return to the append mode problem later.

The differences between PR 32007 and your patches:

* The parameter was renamed to metadata_encoding to avoid confusion with the existing encoding parameter of ZipFile.open(). In the future I am going to use it for comments as well. The attribute and the CLI option were renamed correspondingly.
* --metadata-encoding can also be used with the -t option.
* "surrogateescape" is no longer used. If the encoding is not suitable, you will get an error. Use the default and decode filenames manually in such cases. We can change this in the future.
* Updated documentation.
* Tests were significantly rewritten. Now they test the behavior with wrong metadata_encoding, mixed UTF-8 and legacy encodings, and reading after append.
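A rough sketch of the two ways of reading legacy-encoded names mentioned in the list above: decoding manually after opening with the default, and passing metadata_encoding (Python 3.11 and later). The cp866 encoding, the member name, and the helper names are assumptions chosen for illustration. The archive is built by patching an ASCII placeholder, because zipfile itself always writes non-ASCII names as UTF-8.

```python
import io
import sys
import zipfile

LEGACY_ENCODING = "cp866"   # assumed encoding of the names in the archive
REAL_NAME = "привет.txt"    # 10 bytes when encoded as cp866
PLACEHOLDER = "XXXXXX.txt"  # ASCII stand-in with the same byte length


def make_legacy_archive():
    """Build an in-memory ZIP whose member name is stored as cp866 bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(PLACEHOLDER, b"hello")
    # Swap the placeholder for the cp866 bytes in both the local header and
    # the central directory. The lengths match, so all offsets stay valid.
    old = PLACEHOLDER.encode("ascii")
    new = REAL_NAME.encode(LEGACY_ENCODING)
    assert len(old) == len(new)
    return buf.getvalue().replace(old, new)


def names_decoded_manually(data, encoding):
    """Open with the default (names decoded as cp437), then re-decode."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [n.encode("cp437").decode(encoding) for n in zf.namelist()]


def names_with_metadata_encoding(data, encoding):
    """Let zipfile decode the names itself (needs Python 3.11 or later)."""
    with zipfile.ZipFile(io.BytesIO(data), metadata_encoding=encoding) as zf:
        return zf.namelist()


archive = make_legacy_archive()
manual = names_decoded_manually(archive, LEGACY_ENCODING)
if sys.version_info >= (3, 11):
    direct = names_with_metadata_encoding(archive, LEGACY_ENCODING)
else:
    direct = manual  # the parameter does not exist before 3.11
```

The manual route relies on Python's cp437 codec mapping every byte value, so encoding the default-decoded name back to cp437 recovers the original bytes.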

I was going to make more changes, but left them for the future.

----------
versions: +Python 3.11 -Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue28080>
_______________________________________

