[issue42707] Python uses ANSI CP for stdio on Windows console instead of using console or OEM CP
Eryk Sun
report at bugs.python.org
Tue Dec 22 19:50:37 EST 2020
Eryk Sun <eryksun at gmail.com> added the comment:
> How about treating only UTF-8 and leave legacy environment as-is?
> * When GetConsoleCP() returns CP_UTF8, use UTF-8 for stdin.
> Otherwise, use ANSI.
Okay, and also when GetConsoleCP() fails because there's no console (e.g. python.exe run with the DETACHED_PROCESS creation flag, or pythonw.exe).
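As a rough sketch of the selection rule under discussion (not CPython's actual implementation), the decision can be written as a pure function of the value returned by GetConsoleCP(), which is 0 on failure. The names choose_stdin_encoding and get_console_input_cp are hypothetical, and the encoding to use when there is no console is left as an explicit parameter, since that choice is an assumption the caller has to make:

```python
import ctypes
import sys

CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def choose_stdin_encoding(console_cp, ansi_encoding, no_console_encoding):
    """Pick a stdin encoding from the value returned by GetConsoleCP().

    console_cp: result of GetConsoleCP(); 0 means the call failed,
        for example because the process has no console.
    ansi_encoding: name of the ANSI code page encoding, e.g. "cp1252".
    no_console_encoding: what to use when there is no console
        (an assumption supplied by the caller).
    """
    if console_cp == 0:
        return no_console_encoding
    if console_cp == CP_UTF8:
        return "utf-8"
    return ansi_encoding


def get_console_input_cp():
    """Return GetConsoleCP() on Windows, or 0 on other platforms."""
    if sys.platform != "win32":
        return 0
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    return kernel32.GetConsoleCP()
```

For instance, choose_stdin_encoding(get_console_input_cp(), "cp1252", "cp1252") keeps the legacy ANSI behavior everywhere except in a console whose input code page is 65001.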
However, using UTF-8 for the input code page is currently broken in many cases, so it should not be promoted as a recommended solution until Microsoft fixes their broken code (which should have been fixed 20 years ago; it's ridiculous). Legacy console applications rely on ReadFile and ReadConsoleA. With the input code page set to UTF-8, these calls can only read 7-bit ASCII (ordinals 0-127); other characters get converted to null bytes. For example:
>>> import ctypes, os
>>> kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
>>> kernel32.SetConsoleCP(65001)
1
>>> os.read(0, 10)
ab¡¢£¤cd
b'ab\x00\x00\x00\x00cd\r\n'
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42707>
_______________________________________