Unicode and Zipfile problems

Peter Otten __peter__ at web.de
Wed Nov 5 16:23:03 EST 2003


Gerson Kurz wrote:

>>Of course, this codec does not work for your original problem: Just
>>try it on your original data, and then see how Winzip
>>misinterprets the file names.
> 
> You are of course right (although my original problem was not with the
> filenames - I am using an English version of Windows - but with the
> header information).
> 
> But, if you look at the number of people that have run into problems
> with Python's strictness in Unicode matters - isn't it time to offer
> some more "relaxed" way of handling all this? I mean, come on, Python

So you've never run into trouble with different encodings and editors not
aware of the encoding? For me this annoyance dates back to Windows 3.1 vs
DOS - RIP. I do remember a book on Apple Pascal that had all y and z
characters accidentally switched.

I don't know whether this has been discussed before, but I think it would be
worth the overhead if every string were aware of its encoding, so that
together with the -*- coding -*- comment in the script you'd never again have
to explicitly go through the decode/encode routine *and* could avoid those
funny filenames - or two-character umlauts that accidentally made it into
your ISO-8859-1 files.
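
To make the "two-character umlaut" concrete, here is a minimal sketch of the
explicit decode/encode routine. It assumes Python 3 (which postdates this
thread), and the sample name "Müller" is only an illustration, written with an
escape so the script does not depend on the source file's own encoding.

```python
# Illustrative sample text: "Müller", with the umlaut as an escape.
text = "M\u00fcller"

# The same text as bytes in two different encodings.
utf8_bytes = text.encode("utf-8")          # the umlaut takes two bytes
latin1_bytes = text.encode("iso-8859-1")   # the umlaut takes one byte

# Reading UTF-8 bytes as if they were ISO-8859-1 never fails, because every
# byte is a valid ISO-8859-1 character. The umlaut silently turns into two
# characters.
garbled = utf8_bytes.decode("iso-8859-1")

# The damage can be undone only by reversing the wrong step explicitly.
restored = garbled.encode("iso-8859-1").decode("utf-8")

# The opposite mistake is caught: the ISO-8859-1 byte for the umlaut is not
# valid UTF-8, so the strict decoder raises instead of guessing.
try:
    latin1_bytes.decode("utf-8")
    strict_error = False
except UnicodeDecodeError:
    strict_error = True

print(repr(garbled), repr(restored), strict_error)
```

The silent case is the one that produces the funny filenames; the strict case
is the annoyance being complained about, but it is also the only one of the
two that tells you something went wrong.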

> is the language that will change the meaning of 7/3 because people had
> problems with integer division. And, although some people use Python
> for large multilanguage applications - I would bet that by far more
> people use Python for lots of small utility scripts, and couldn't care
> less about whether or not it's "international". It just has to work, on
> my machine(s). It's not rocket science.

One byte per character encodings might or might not vanish, but Unicode
*will* stay, so you had better be aware of the pitfalls.
Remember that Python is also the language of "explicit is better than
implicit".

Peter



