[Python-Dev] zipfile and unicode filenames

Alexey Borzenkov snaury at gmail.com
Sat Jun 9 22:23:20 CEST 2007


Hi everyone,

Today I stumbled upon a bug in my program that wasn't very
straightforward to understand. I was passing unicode filenames to
zipfile.ZipFile.write while sys.setdefaultencoding() was in effect.
As a result, most of the bytes generated in zipfile.ZipInfo.FileHeader
passed through unchanged, but a few of them caused a codec error on
another machine (where the filenames got infectiously upgraded to
unicode). The problem is that at first it was not at all clear that
write was receiving unicode filenames, and it silently accepted them
when it should not have. Is it worth submitting a bug report on this?
The desired behavior would be either a) to disallow unicode strings as
arcname and raise an exception (since arcname is concatenated with raw
data, it is likely to cause problems when that raw data is
automatically upgraded to unicode), or b) to silently encode unicode
strings to raw strings (something like if isinstance(filename,
unicode): filename = filename.encode() in the zipfile.ZipInfo
constructor).
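The two options can be sketched in modern Python 3 terms as follows. This is a hypothetical illustration, not zipfile's actual code: normalize_arcname, build_header and the policy parameter are made-up names, the "header" is reduced to a 2-byte length prefix plus the name, and UTF-8 is assumed as the explicit encoding for option b).

```python
import struct


def normalize_arcname(name, policy="encode", encoding="utf-8"):
    """Hypothetical helper: return the archive name as raw bytes.

    policy="reject" models option a): text names raise instead of
    being accepted silently.
    policy="encode" models option b): text names are encoded
    explicitly (UTF-8 is an assumption here) before any concatenation.
    """
    if isinstance(name, bytes):
        return name  # already raw bytes, safe to concatenate
    if not isinstance(name, str):
        raise TypeError(f"arcname must be bytes or str, not {type(name).__name__}")
    if policy == "reject":
        raise TypeError("text arcname not allowed; encode it to bytes first")
    if policy == "encode":
        return name.encode(encoding)
    raise ValueError(f"unknown policy: {policy!r}")


def build_header(arcname):
    """Hypothetical, heavily simplified header: length prefix + name bytes.

    Because the name is normalized to bytes first, the concatenation
    below never mixes bytes with text.
    """
    name = normalize_arcname(arcname)
    return struct.pack("<H", len(name)) + name


print(build_header("caf\u00e9.txt"))
```

Encoding up front keeps every piece of the header as bytes, so nothing gets implicitly coerced later on. For comparison, Python 3's own zipfile encodes non-ASCII str names as UTF-8 and sets general-purpose flag bit 0x800.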

So, should I submit a bug report, and which behavior would actually be correct?

