[Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?

Victor Stinner victor.stinner at haypocalc.com
Wed Jun 29 11:50:57 CEST 2011


On Wednesday, June 29, 2011 at 10:18 +0200, M.-A. Lemburg wrote:
> Victor Stinner wrote:
> > On Tuesday, June 28, 2011 at 16:02 +0200, M.-A. Lemburg wrote:
> >> How about a more radical change: have open() in Py3 default to
> >> opening the file in binary mode, if no encoding is given (even
> >> if the mode doesn't include 'b') ?
> > 
> > I tried your suggested change: Python doesn't start.
> 
> No surprise there: it's an incompatible change, but one that undoes
> a wart introduced in the Py3 transition. Guessing encodings should
> be avoided whenever possible.
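A minimal sketch of why the binary-by-default proposal breaks existing code. `open_binary_default()` is a hypothetical wrapper that mimics the proposal (binary mode whenever no encoding is given); it is not a real or planned API. Code written for Python 3.0-3.2 expects `str` from a text-mode `open()`, so it now receives `bytes` and ordinary string operations fail:

```python
import os
import tempfile


def open_binary_default(file, mode="r", encoding=None, **kwargs):
    """Hypothetical: open() as proposed, binary mode unless an encoding is given."""
    if encoding is None:
        if "b" not in mode:
            # Drop any explicit text flag and force binary mode.
            mode = mode.replace("t", "") + "b"
        return open(file, mode, **kwargs)
    return open(file, mode, encoding=encoding, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "setup.cfg")
    with open(path, "w", encoding="ascii") as f:
        f.write("[metadata]\nname = demo\n")

    # Code written for Python 3.0-3.2 expects str here...
    with open_binary_default(path) as f:
        data = f.read()
    print(type(data))  # <class 'bytes'>

    # ...so ordinary str operations on the result now raise TypeError.
    try:
        found = "name" in data
        str_ops_fail = False
    except TypeError:
        str_ops_fail = True
```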

It means that all programs written for Python 3.0, 3.1 and 3.2 would stop
working with the new 3.x version (let's say 3.3). Users would have to
migrate from Python 2 to Python 3.2, and then migrate again from Python 3.2
to Python 3.3 :-(

I would prefer a ResourceWarning (emitted if the encoding is not
specified), hidden by default: it doesn't break compatibility, and
-Werror gives exactly the behaviour that you expect.
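A minimal sketch of the warning approach, assuming a hypothetical wrapper `open_checked()` (not an existing function) that warns when a text-mode file is opened without an explicit encoding. Because ResourceWarning is ignored by the default warning filters, nothing changes for normal runs; turning warnings into errors makes the call fail instead:

```python
import warnings


def open_checked(file, mode="r", buffering=-1, encoding=None, **kwargs):
    """Hypothetical open() wrapper that flags text-mode calls without an explicit encoding."""
    if "b" not in mode and encoding is None:
        # ResourceWarning is ignored by the default warning filters,
        # so this is silent unless warnings are enabled (-W / -Werror).
        warnings.warn("open() called without an explicit encoding",
                      ResourceWarning, stacklevel=2)
    return open(file, mode, buffering, encoding, **kwargs)
```

Running a script with `python -W error::ResourceWarning script.py` (or plain `-Werror`) would turn every such call into an exception, while passing `encoding=...` explicitly keeps the call silent in both cases.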

> This demonstrates that Python's stdlib is still not being explicit
> about the encoding issues. I suppose that things just happen to work
> because we mostly use ASCII files for configuration and setup.

I did more tests. I found some mistakes, and in some places binary mode
can be used, but most functions really do expect the locale encoding
(which is the correct encoding for reading and writing files). I agree
that it would be nice to have an explicit encoding="locale", but making
it mandatory is a little bit rude.
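Without a dedicated `encoding="locale"` spelling, the locale encoding can still be passed explicitly. A minimal sketch, assuming a hypothetical helper `open_locale()`; `locale.getpreferredencoding(False)` approximates what a text-mode `open()` uses when no encoding is given (the exact rule varies across Python versions and UTF-8 mode):

```python
import locale


def open_locale(file, mode="r", **kwargs):
    """Hypothetical helper: open a text file with the locale encoding spelled out."""
    # Query the locale's preferred encoding without calling setlocale(),
    # and pass it explicitly so the choice is visible at the call site.
    return open(file, mode, encoding=locale.getpreferredencoding(False), **kwargs)
```

The point of the design is readability: the reader of `open_locale(path)` knows the locale encoding was chosen on purpose rather than by accident.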

> > Then I tried my suggestion (use "utf-8" by default): Python starts
> > correctly, I can build it (run "make") and... the full test suite passes
> > without any change. (I'm testing on Linux, my locale encoding is UTF-8.)
> 
> I bet it would also work with "ascii" in most cases. Which then just
> means that the Python build process and test suite is not a good
> test case for choosing a default encoding.
> 
> Linux is also a poor test candidate for this, since most user setups
> will use UTF-8 as locale encoding. Windows, OTOH, uses all sorts of
> code page encodings (usually not UTF-8), so you are likely to hit
> the real problem cases a lot easier.
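A small illustration of the problem cases on Windows: the same bytes read back with a different encoding. cp1252 is used here only as one example of a Windows code page. Decoding UTF-8 data with it does not raise; it silently produces mojibake, whereas a strict ASCII codec fails loudly on the same text:

```python
# Text saved by a program that used UTF-8...
text = "caf\u00e9"                  # 'café'
data = text.encode("utf-8")         # b'caf\xc3\xa9'

# ...and read back with a Windows code page.
# Decoding "succeeds", but the result is silently wrong.
mojibake = data.decode("cp1252")    # 'cafÃ©'

# A strict ASCII codec instead fails loudly on the same text.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc.reason        # 'ordinal not in range(128)'
```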

I also ran the test suite on my patched Python (open() uses UTF-8 by
default) with an ASCII locale encoding (LANG=C): the test suite also
passes. Many tests use non-ASCII characters; some of them are skipped if
the locale encoding is unable to encode the tested text.
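A minimal sketch of that skipping pattern with `unittest`. The helper `locale_can_encode()` and the test class are hypothetical illustrations, not the actual helpers used in CPython's test suite:

```python
import locale
import os
import tempfile
import unittest

# Sample text with non-ASCII characters (e-acute and the euro sign).
TEXT = "caf\u00e9 \u20ac"


def locale_can_encode(text):
    """Hypothetical helper: True if the locale encoding can represent text."""
    try:
        text.encode(locale.getpreferredencoding(False))
    except UnicodeEncodeError:
        return False
    return True


class NonAsciiFileTest(unittest.TestCase):
    @unittest.skipUnless(locale_can_encode(TEXT),
                         "locale encoding cannot encode the test text")
    def test_roundtrip_with_default_encoding(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "sample.txt")
            # No encoding argument: open() uses its default text encoding.
            with open(path, "w") as f:
                f.write(TEXT)
            with open(path) as f:
                self.assertEqual(f.read(), TEXT)


if __name__ == "__main__":
    unittest.main()
```

Under LANG=C the decorator skips the test instead of failing it, which is why such a run can pass without exercising every non-ASCII case.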

Victor


