[Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Martin Panter vadmium+py at gmail.com
Sat Sep 3 10:01:49 EDT 2016


On 1 September 2016 at 23:28, Random832 <random832 at fastmail.com> wrote:
> On Thu, Sep 1, 2016, at 18:28, Steve Dower wrote:
>> This is a raw (bytes) IO class that requires text to be passed encoded
>> with utf-8, which will be decoded to utf-16-le and passed to the Windows APIs.
>> Similarly, bytes read from the class will be provided by the operating
>> system as utf-16-le and converted into utf-8 when returned to Python.
>
> What happens if a character is broken across a buffer boundary? e.g. if
> someone tries to read or write one byte at a time (you can't do a
> partial read of zero bytes, there's no way to distinguish that from an
> EOF.)
>
> Is there going to be a higher-level text I/O class that bypasses the
> UTF-8 encoding step when the underlying bytes stream is a console? What
> if we did that but left the encoding as mbcs? I.e. the console is text
> stream that can magically handle characters that aren't representable in
> its encoding. Note that if anything does os.read/write to the console's
> file descriptors, they're gonna get MBCS and there's nothing we can do
> about it.

Maybe it is too complicated and impractical, but I have imagined that
the sys.stdin/stdout/stderr could be custom TextIOBase objects. They
would not be wrappers or do much encoding (other than maybe newline
encoding). To solve the compatibility problems with code that uses
stdout.buffer or whatever, you could add a custom “buffer” object,
something like my AsciiBufferMixin class:
https://gist.github.com/vadmium/d1b07d771fbf4347683c005c40991c02

Just putting this idea out there, but maybe Steve’s UTF-8 encoding
solution is good enough.


More information about the Python-Dev mailing list