[Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?

Victor Stinner victor.stinner at haypocalc.com
Wed Jun 29 00:00:45 CEST 2011


> > I don't think that Windows developers even know that they are writing
> > files into the ANSI code page. The MSDN documentation of
> > WideCharToMultiByte() warns developers that the ANSI code page is not
> > portable, even across Windows computers:
> 
> Probably true. But for many uses they also don't care. If you're
> writing something solely for a one-off job on your own PC, the ANSI
> code page is fine, and provides interoperability with other programs
> on your PC, which is really what you care about. (UTF-8 without BOM
> displays incorrectly in Vim, wordpad, and powershell get-content.

I tried to open a text file encoded in UTF-8 (without BOM) on Windows
Seven.

The default application, the well-known built-in Notepad program,
displays it correctly.

gvim is unable to detect the encoding: it reads the file using the ANSI
code page (WTF? UTF-8 is correctly detected on Linux!?).

Wordpad reads the file using the ANSI code page, it is unable to detect
the UTF-8 encoding.

The "type" command in an MS-DOS shell (cmd.exe) doesn't display the UTF-8
file correctly, but a file encoded in the ANSI code page is also displayed
incorrectly.
I suppose that the problem is that the terminal uses the OEM code page,
not the ANSI code page.
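
The OEM/ANSI mismatch can be reproduced outside of Windows. The sketch
below is only an illustration: it assumes cp1252 as the ANSI code page and
cp850 as the OEM code page (a common Western European pairing, not
something I checked on the test machine), and decodes bytes the way a
console using the OEM code page would.

```python
# Assumption: ANSI code page = cp1252, OEM code page = cp850.
ANSI = "cp1252"
OEM = "cp850"

text = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# A file written with the ANSI code page, then shown by a console that
# decodes bytes with the OEM code page.
ansi_as_oem = text.encode(ANSI).decode(OEM)

# A UTF-8 file shown by the same console: the two-byte sequence for the
# accented letter becomes two unrelated characters.
utf8_as_oem = text.encode("utf-8").decode(OEM)

# Only a file written in the OEM code page itself displays correctly.
oem_as_oem = text.encode(OEM).decode(OEM)

print(repr(ansi_as_oem), repr(utf8_as_oem), repr(oem_as_oem))
```

Both the ANSI and the UTF-8 file come out wrong, just in different ways,
which matches what "type" shows.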

Visual C++ 2008 detects the UTF-8 encoding.

I don't have other applications to test on my Windows Seven. I agree
that UTF-8 is not well supported by "standard" Windows applications. I
would at least have expected Wordpad and gvim to be able to detect the
UTF-8 encoding.
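
For reference, detecting UTF-8 without a BOM is not hard in the common
case. The following is a minimal sketch of one possible heuristic, not a
description of what Notepad or Visual C++ actually do: check for a BOM, try
a strict UTF-8 decode, and only then fall back to the ANSI code page. The
function name decode_text and the cp1252 fallback are assumptions made for
the example.

```python
import codecs

def decode_text(data, fallback="cp1252"):
    """Return (text, encoding_name) guessed from raw bytes.

    Order: UTF-8 BOM, then strict UTF-8, then the fallback code page
    (cp1252 here is an assumed ANSI code page).
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    try:
        # Non-UTF-8 text with non-ASCII bytes rarely forms valid UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback), fallback

sample = "caf\u00e9"
utf8_text, utf8_guess = decode_text(sample.encode("utf-8"))
ansi_text, ansi_guess = decode_text(sample.encode("cp1252"))
bom_text, bom_guess = decode_text(codecs.BOM_UTF8 + b"abc")
```

The heuristic can still misfire on short ANSI files that happen to be valid
UTF-8, and pure ASCII input is reported as UTF-8, which is harmless.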

> MBCS works fine in all of these. It also displays incorrectly in CMD type,
> but in a less familiar form than the incorrect display mbcs produces,
> for what that's worth...)

True, a text file encoded in the ANSI code page is read correctly by all
of these applications (except "type" in a shell, which is probably the
OEM/ANSI code page conflict again).

> IMHO, you missed another option - open() does not need improving, the
> current behaviour is better than any of the 3 options noted.

My original need is to detect where my program will behave differently on
Linux and Windows because open() uses the implicit locale encoding.
Antoine suggested that I monkeypatch __builtins__.open to do that.

Victor


