[Python-Dev] PEP 528: Change Windows console encoding to UTF-8

Martin Panter vadmium+py at gmail.com
Mon Sep 5 05:37:36 EDT 2016


On 5 September 2016 at 09:10, Paul Moore <p.f.moore at gmail.com> wrote:
> On 5 September 2016 at 06:54, Steve Dower <steve.dower at python.org> wrote:
>> +Using the raw object with small buffers
>> +---------------------------------------
>> +
>> +Code that uses the raw IO object and attempts to read less than four characters
>> +will now receive an error. Because it's possible that any single character may
>> +require up to four bytes when represented in utf-8, requests must fail.
>
> I'm very concerned about this statement. It's clearly not true that
> the request *must* fail, as reading 1 byte from a UTF-8 enabled Linux
> console stream currently works (at least I believe it does). And there
> is code in the wild that works by doing a test that "there's input
> available" (using kbhit on Windows and select on Unix) and then doing
> read(1) to ensure a non-blocking read (the pyinvoke code I referenced
> earlier). If we're going to break this behaviour, I'd argue that we
> need to provide a working alternative.
>
> At a minimum, can the PEP include a recommended cross-platform means
> of implementing a non-blocking read from standard input, to replace
> the current approach? (If the recommendation is to read a larger
> 4-byte buffer and manage the process of retaining unused bytes
> yourself, then that's quite a major change to at least the code I'm
> thinking of in invoke, and I'm not sure read(4) guarantees that it
> *won't* block if only 1 byte is available without blocking...)

FWIW, on Linux and Unix in general, if select() or similar indicates
that data is available to read, calling raw read() with any buffer
size should return whatever is available (at least one byte) without
blocking. If the user has only typed one byte, read(4) would return
that one byte immediately.
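
A minimal sketch of that behaviour, assuming a Unix-like system and
using a pipe to stand in for the console (select() does not work on
pipes on Windows). The helper name read_available is made up for
illustration; it is not taken from invoke or from the PEP.

```python
import os
import select

def read_available(fd, bufsize=4, timeout=0.0):
    """Return up to bufsize bytes if select() reports fd readable, else None."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return None            # nothing to read; a raw read() here could block
    return os.read(fd, bufsize)  # raw read: returns whatever is available

r, w = os.pipe()
empty = read_available(r)      # nothing has been written yet
os.write(w, b"x")              # simulate the user typing a single byte
data = read_available(r)       # asks for 4 bytes, gets the 1 that is there
os.close(r)
os.close(w)
```

Here read_available(r) asks for four bytes but returns as soon as the
single available byte has been read.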

But if you are using a BufferedReader (stdin.buffer rather than
stdin.buffer.raw), then this guarantee no longer holds: read(4) will
block until it gets 4 bytes, or until EOF.
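
A rough illustration of the buffered case, again with a pipe standing
in for stdin (an assumption for the sake of a self-contained example).
It also shows BufferedReader.read1(), which is not mentioned above but
makes at most one raw read and so returns a short result instead of
waiting for the full count.

```python
import os

# Pipe 1: the write end stays open, so buffered read(4) would block
# here waiting for three more bytes. read1(4) does not.
r1, w1 = os.pipe()
buffered1 = os.fdopen(r1, "rb")   # a BufferedReader wrapping a raw FileIO
os.write(w1, b"x")
partial = buffered1.read1(4)      # at most one raw read
os.close(w1)
buffered1.close()

# Pipe 2: the write end is closed, so read(4) hits EOF and returns short.
r2, w2 = os.pipe()
buffered2 = os.fdopen(r2, "rb")
os.write(w2, b"x")
os.close(w2)
at_eof = buffered2.read(4)        # returns the one byte once EOF is seen
buffered2.close()
```

Both calls return the single byte; the difference is that read(4) only
does so because EOF was reached.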

