[issue27179] subprocess uses wrong encoding on Windows

Fri Jun 3 21:49:20 EDT 2016

Steve Dower added the comment:

> so ANSI is the natural default for a detached process

To clarify - ANSI is the natural default *for programs that don't support Unicode*.

Unfortunately, since "Unicode" on Windows is an incompatible data type (wchar_t rather than char), targeting Unicode rather than a code page requires completely different API calls. Switching to those calls would make Python's implementation much more complicated, as well as breaking some scripts and existing packages. Forcing the use of UTF-8 as the code page is the easiest way for us to support Unicode.

I think Eryk clearly proved that we can't reliably assume or infer the right encoding for a subprocess. (When you use the ANSI APIs to print to the console, the console converts to Unicode for rendering. If you use the Unicode APIs there is no conversion, and so any code page can be used internally without affecting what is displayed to the user.)
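
A small sketch of why the encoding can't be inferred from the output alone: the same byte is valid in more than one code page and decodes to different text in each. The byte value and the two code pages (cp1252 as a typical ANSI code page, cp437 as a typical OEM one) are picked purely for illustration and are not taken from Eryk's analysis.

```python
# Illustrative only: the byte and both code pages are assumptions chosen
# to show the ambiguity, not values from any particular subprocess.
raw = b"\xe9"

# A child using the ANSI code page (cp1252 here) would mean U+00E9.
as_ansi = raw.decode("cp1252")

# A child using the OEM code page (cp437 here) would mean U+0398.
as_oem = raw.decode("cp437")

# Both decodes succeed, so nothing in the byte stream says which is right.
print(ascii(as_ansi), ascii(as_oem))
```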

In short: the best available fix is to expose encoding arguments in subprocess and to fix any calls within the stdlib that need to specify them. (When we decide to separate Python's API from the C Runtime API we can break file descriptors, which will let us use Unicode APIs throughout, but that's a little way off.)
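
For illustration, here is roughly what passing the encoding explicitly looks like. This is a sketch that assumes the `encoding` and `errors` keyword arguments that `subprocess.run` accepts in Python 3.6 and later; `run_child` and the child command are hypothetical names made up for the example.

```python
import subprocess
import sys


def run_child(encoding):
    """Run a child that emits UTF-8 bytes and decode them as `encoding`.

    Hypothetical helper for illustration; not part of the stdlib.
    """
    # The child writes raw UTF-8 bytes, so its output does not depend on
    # the console or ANSI code page of the machine it runs on.
    child_code = (
        "import sys; sys.stdout.buffer.write('caf\\u00e9'.encode('utf-8'))"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        encoding=encoding,   # the caller states the encoding explicitly
        errors="replace",    # do not raise on undecodable bytes
    )
    return result.stdout


# The caller knows what the child emits, so it says so.
text = run_child("utf-8")

# Decoding the same bytes with the wrong code page yields mojibake.
wrong = run_child("cp1252")
```

The point is that the caller, not subprocess, is the one who knows what the child writes, so the caller supplies the encoding rather than relying on a guessed default.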

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue27179>
_______________________________________