[Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Paul Moore p.f.moore at gmail.com
Mon Sep 5 05:10:01 EDT 2016


On 5 September 2016 at 06:54, Steve Dower <steve.dower at python.org> wrote:
> +Using the raw object with small buffers
> +---------------------------------------
> +
> +Code that uses the raw IO object and attempts to read less than four characters
> +will now receive an error. Because it's possible that any single character may
> +require up to four bytes when represented in utf-8, requests must fail.

I'm very concerned about this statement. It's clearly not true that
the request *must* fail, as reading 1 byte from a UTF-8 enabled Linux
console stream currently works (at least I believe it does). And there
is code in the wild that first checks that input is available (using
kbhit on Windows and select on Unix) and then calls read(1), so that
the read never blocks (the pyinvoke code I referenced earlier). If
we're going to break this behaviour, I'd argue that we need to provide
a working alternative.
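
A minimal sketch of that check-then-read(1) pattern, for reference. This
is not the invoke code itself: the helper names input_ready and
read_byte_if_ready are made up for illustration, and the sketch reads
from a raw file descriptor with os.read rather than from sys.stdin.

```python
import os


def input_ready(fd):
    """Return True if a read on fd should not block (hypothetical helper)."""
    if os.name == "nt":
        # msvcrt.kbhit() only reports pending console keypresses; it
        # ignores fd, so this branch is only meaningful for console stdin.
        import msvcrt
        return msvcrt.kbhit()
    import select
    # A timeout of 0 makes select() poll and return immediately.
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)


def read_byte_if_ready(fd):
    """Return one byte if input is waiting, otherwise None."""
    if input_ready(fd):
        return os.read(fd, 1)
    return None
```

Typical use would be something like read_byte_if_ready(sys.stdin.fileno())
inside a polling loop. It is exactly this one-byte read that the quoted
PEP text says would start failing on the raw Windows console object.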

At a minimum, can the PEP include a recommended cross-platform means
of implementing a non-blocking read from standard input, to replace
the current approach? (If the recommendation is to read into a larger
4-byte buffer and manage the retained unused bytes yourself, then
that's quite a major change to at least the code I'm thinking of in
invoke, and I'm not sure read(4) is guaranteed *not* to block when
only 1 byte is available...)
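
To make the "retain unused bytes yourself" option concrete, here is a
rough sketch of the bookkeeping it would involve, using the standard
library's incremental UTF-8 decoder. The class name Utf8CharReader and
its read_bytes callable are hypothetical, not anything proposed in the
PEP. The sketch only shows the buffering; it does nothing to settle
whether the underlying read(4) can block.

```python
import codecs


class Utf8CharReader:
    """Hand out one decoded character at a time from 4-byte reads."""

    def __init__(self, read_bytes):
        # read_bytes is an assumed callable: read_bytes(n) -> bytes,
        # returning b"" at end of input.
        self._read_bytes = read_bytes
        # The incremental decoder keeps incomplete trailing bytes itself.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""  # decoded characters not yet handed out

    def read_char(self):
        """Return the next character, or "" at end of input."""
        while not self._pending:
            chunk = self._read_bytes(4)
            if not chunk:
                # End of input: flush the decoder. This raises
                # UnicodeDecodeError if a character was cut short.
                self._pending = self._decoder.decode(b"", final=True)
                break
            self._pending += self._decoder.decode(chunk)
        ch, self._pending = self._pending[:1], self._pending[1:]
        return ch
```

With a raw binary stream called raw, this would be used as
Utf8CharReader(raw.read).read_char(). Every caller that currently does
read(1) would have to carry a stateful object like this around, which is
the "major change" referred to above.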

Paul

