[Python-ideas] Adding an 'errors' argument to print

eryk sun eryksun at gmail.com
Mon Mar 27 21:09:08 EDT 2017


On Mon, Mar 27, 2017 at 8:52 PM, Barry <barry at barrys-emacs.org> wrote:
> I took to using
>
>      chcp 65001
>
> This puts cmd.exe into unicode mode.

conhost.exe hosts the console, and chcp.com is a console app that
calls GetConsoleCP, SetConsoleCP and SetConsoleOutputCP to show or
modify the console's input and output codepages. chcp.com always sets
both to the same value; it doesn't support changing them separately.
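
For illustration, here is a minimal sketch of calling the same console
functions from Python via ctypes. The helper names console_codepages
and set_console_codepages are made up for this example, and on anything
other than Windows the helpers simply do nothing. Unlike chcp.com, the
underlying API lets the input and output codepages be set
independently.

```python
import ctypes
import sys


def console_codepages():
    """Return (input_cp, output_cp) for the attached console.

    Returns None when not running on Windows. The API reports 0 for a
    codepage when the process has no console attached.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


def set_console_codepages(input_cp=None, output_cp=None):
    """Set the input and/or output codepage; either may be left alone.

    Returns True only if every requested change succeeded. Always
    returns False when not running on Windows.
    """
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    ok = True
    if input_cp is not None:
        ok = bool(kernel32.SetConsoleCP(input_cp)) and ok
    if output_cp is not None:
        ok = bool(kernel32.SetConsoleOutputCP(output_cp)) and ok
    return ok


if __name__ == "__main__":
    print(console_codepages())
```

For example, set_console_codepages(output_cp=65001) would change only
the output codepage and leave the input codepage as it was.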

cmd.exe is just another console client, no different from python.exe
or powershell.exe in this regard. Also, it's unrelated to how Python
uses the console, but for the record, cmd has used the console's
wide-character API since it was ported from OS/2 in the early 90s.

Back then the console was hosted using threads in the csrss.exe system
process, which made sense because the windowing system was hosted
there. When they moved most of the window manager to kernel mode in NT
4 (1996), the console was mostly left behind in csrss.exe. It wasn't
until Windows 7 that it found a new home in conhost.exe. In Windows 8
it got a real device driver instead of using fake file handles. In
Windows 10 it was updated to be less of a franken-window -- e.g. now
it has line-wrapped selection and text reflowing.

Using codepage 65001 (UTF-8) in a console app runs into a couple of
annoying bugs in the console itself, and another due to flushing of C
FILE streams. For example, reading text that has even a single
non-ASCII character will fail because conhost's encoding buffer is too
small. Conhost handles the error by returning a read of 0 bytes. A
0-byte read is how EOF is signaled, so Python's REPL quits, input()
raises EOFError, and stdin.read() returns an empty string. Microsoft
should fix this in Windows 10, and probably will eventually. The Linux
subsystem needs UTF-8, and it's silly that the console doesn't allow
entering non-ASCII text in Linux programs.
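
To see why a 0-byte read has those effects, here is a small simulation
that runs on any platform. It does not touch the real console:
ZeroByteConsole is a hypothetical stand-in for a console read that
fails and reports 0 bytes, and make_stdin and input_hits_eof are names
made up for this sketch.

```python
import io
import sys


class ZeroByteConsole(io.RawIOBase):
    """Hypothetical raw stream whose every read reports 0 bytes."""

    def readable(self):
        return True

    def readinto(self, buffer):
        # 0 bytes read is indistinguishable from end-of-file.
        return 0


def make_stdin():
    """Wrap the raw stream the way a text-mode stdin is layered."""
    raw = ZeroByteConsole()
    return io.TextIOWrapper(io.BufferedReader(raw), encoding="utf-8")


def input_hits_eof():
    """Return True if input() raises EOFError on the simulated stdin."""
    saved = sys.stdin
    sys.stdin = make_stdin()
    try:
        input()
    except EOFError:
        return True
    finally:
        sys.stdin = saved
    return False


if __name__ == "__main__":
    print(repr(make_stdin().readline()))  # empty string, i.e. EOF
    print(repr(make_stdin().read()))      # empty string
    print(input_hits_eof())
```

The I/O layers above the raw stream have no way to tell a failed read
from a genuine end of input, which is why the failure looks like the
user closing stdin.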

As was already recommended, I suggest using the wide-character API via
win_unicode_console in Python 2.7 and 3.5. In 3.6 we get the
wide-character API automatically thanks to Steve Dower's
io._WindowsConsoleIO class.
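
As a rough way to check whether a given stream is backed by that
class, something like the following sketch could be used. The function
name uses_wide_console is made up for this example, and it assumes the
usual text -> buffer -> raw layering of the standard streams. Since
io._WindowsConsoleIO is a private name that exists only on Windows in
3.6 and later, the lookup is guarded.

```python
import io
import sys


def uses_wide_console(stream):
    """Return True if stream's raw layer is io._WindowsConsoleIO."""
    console_cls = getattr(io, "_WindowsConsoleIO", None)
    if console_cls is None:
        # Not Windows, or a Python older than 3.6.
        return False
    buffer = getattr(stream, "buffer", None)
    raw = getattr(buffer, "raw", None)
    return isinstance(raw, console_cls)


if __name__ == "__main__":
    for name in ("stdin", "stdout", "stderr"):
        print(name, uses_wide_console(getattr(sys, name)))
```

A stream that has been redirected to a file or pipe should report
False, because in that case it is an ordinary file stream rather than
the console.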

