[Python-Dev] EuroPython Language Summit report

Victor Stinner victor.stinner at haypocalc.com
Fri Jun 24 13:18:44 CEST 2011


On Friday 24 June 2011 at 10:52 +0200, Mark Dickinson wrote:
>   - [Armin Ronacher] Python 3's Unicode support still has some dark areas.

What? Unicode support is perfect in Python 3!

>   One example: when opening a text file for reading and writing, the default
>   encoding used depends on the platform and on various environment variables.

... oh, I agree. This choice is a big portability issue. Mac OS X, most
Linux distributions and the BSD systems use UTF-8 as the locale encoding,
whereas Windows uses legacy code pages like cp1252 (similar to ISO-8859-1)
or cp932 (Shift JIS). But sometimes the locale is "C" (e.g. on our
buildbots), and programs start to fail with Unicode errors...
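
To make the portability problem concrete, here is a small sketch. The
sample string and the temporary file are only illustrations, not taken from
any real bug report. It shows that the same text maps to different bytes
under UTF-8 and cp1252, that ASCII (what a "C" locale typically gives you)
cannot encode it at all, and that passing encoding explicitly makes the
round trip independent of the platform.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)

text = "caf\u00e9"  # sample text with one non-ASCII character

# The same text produces different bytes depending on the encoding.
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")

# ASCII cannot represent the text at all.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An explicit encoding makes writing and reading platform independent.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)
```

Without the encoding argument, the bytes written to the file would depend
on default_encoding, which is exactly the value that changes from one
machine to the next.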

I see two options to improve the situation.


(1) hard way: change the open() API to make encoding a mandatory argument.
Problem: it breaks compatibility with Python 3.0, 3.1 and 3.2 (ooops!).
Also, encoding is the 4th argument, so you have to pass it as a keyword or
choose a value for the buffering argument. I proposed to change the open()
API in Python 3.1 to make all arguments except the filename and the
mode keyword-only, but Guido rejected my idea:

"Remember, for 3.0 we're trying to get a release out of the door, not
cram in new features, no matter how small."

http://bugs.python.org/issue4121
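
As an illustration of the argument-order problem, the sketch below passes
the encoding both ways. It assumes the documented signature
open(file, mode, buffering, encoding, ...) and uses -1, the documented
default, as the buffering value. The temporary file is only there to make
the example runnable.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Keyword form: the intent is obvious and buffering keeps its default.
    with open(path, "w", encoding="utf-8") as f:
        keyword_encoding = f.encoding
        f.write("hello")

    # Positional form: encoding is the 4th argument, so a buffering value
    # (-1 means "use the default policy") has to be supplied first.
    with open(path, "r", -1, "utf-8") as f:
        positional_encoding = f.encoding
        content = f.read()
finally:
    os.remove(path)
```

The positional form works, but it is easy to misread, which is part of the
reason keyword-only arguments looked attractive.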


(2) soft way: emit a warning if the encoding is implicit (locale
encoding). I don't know which warning type is best, or whether it should
be displayed always, only once, or not at all by default. Even if it is
hidden by default, a careful developer will be able to use -Werror to find
and fix bugs... I suspect that most tests would fail if open() raised an
exception when the encoding is not specified (e.g. see the
distutils/packaging issues about the file encoding).
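
A rough sketch of what the soft way could look like, written as a wrapper
rather than a change to the builtin. The names open_warn and
ImplicitEncodingWarning are hypothetical, and since the warning type is
still an open question, the category here is just a plain Warning subclass
chosen for illustration.

```python
import builtins
import os
import tempfile
import warnings


class ImplicitEncodingWarning(Warning):
    """Hypothetical category; the proposal does not pick one."""


def open_warn(file, mode="r", buffering=-1, encoding=None, **kwargs):
    """Like open(), but warn when a text-mode file gets no explicit encoding."""
    if encoding is None and "b" not in mode:
        warnings.warn(
            "open() called without encoding; the locale encoding is used",
            ImplicitEncodingWarning,
            stacklevel=2,
        )
    return builtins.open(file, mode, buffering, encoding, **kwargs)


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with warnings.catch_warnings():
        # Roughly what -W error does, limited to this one category.
        warnings.simplefilter("error", ImplicitEncodingWarning)
        try:
            open_warn(path, "w").close()
            raised = False
        except ImplicitEncodingWarning:
            raised = True
        # An explicit encoding, or binary mode, passes silently.
        open_warn(path, "w", encoding="utf-8").close()
        open_warn(path, "rb").close()
finally:
    os.remove(path)
```

With the filter set to "error", the implicit call fails immediately, which
is how a careful developer would track down every place that relies on the
locale encoding.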

Victor


