[Python-ideas] Fix default encodings on Windows

Adam Bartoš drekin at gmail.com
Thu Aug 11 14:34:05 EDT 2016


>
> On 11 August 2016 at 04:10, Steve Dower <steve.dower at python.org> wrote:
> > I suspect there's a lot of discussion to be had around this topic, so I want to get it started. There are some fairly drastic ideas here and I need help figuring out whether the impact outweighs the value.
>
> My main reaction would be that if Drekin (Adam Bartoš) agrees the
> changes natively solve the problems that https://pypi.python.org/pypi/win_unicode_console works around, it's
> probably a good idea.
>
> The status quo is also sufficiently broken from both a native Windows
> perspective and a cross-platform compatibility perspective that your
> proposals are highly unlikely to make things *worse* :)
>
> Cheers,
> Nick.
>
>
The main idea of win_unicode_console is simple: use the WinAPI functions
ReadConsoleW and WriteConsoleW to communicate with the interactive console
on Windows, and wrap them in the standard Python IO hierarchy. That's why
sys.std*.encoding is 'utf-16-le': it corresponds to the wide-character
strings used by the Windows wide APIs. But this only concerns
sys.std*.encoding, which I think is not so important. AFAIK
sys.std*.encoding should be used only when you want to communicate in bytes
(which I think is not a good idea): it tells you which encoding
sys.std*.buffer assumes. In fact, sys.std* may not even have a buffer
attribute, in which case its encoding attribute would be useless.
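A minimal sketch of that layering, runnable on any platform. `FakeConsoleRaw` is a hypothetical stand-in for a console handle: a real implementation would call ReadConsoleW/WriteConsoleW (for example via ctypes), while this one just keeps UTF-16-LE bytes, the wide-char form those APIs use, in memory. The standard `io.BufferedReader`/`io.BufferedWriter` and `io.TextIOWrapper(..., encoding="utf-16-le")` sit on top, so `encoding` only describes what `.buffer` expects.

```python
import io


class FakeConsoleRaw(io.RawIOBase):
    """Hypothetical stand-in for a Windows console handle.

    A real implementation would call ReadConsoleW/WriteConsoleW; this one
    keeps UTF-16-LE bytes (the wide-char form those APIs use) in memory.
    """

    def __init__(self, pending_input=""):
        super().__init__()
        # What the user "types", in the form ReadConsoleW would deliver it.
        self._input = io.BytesIO(pending_input.encode("utf-16-le"))
        # What gets "displayed"; WriteConsoleW would receive len(b) // 2 wchars.
        self.screen = bytearray()

    def readable(self):
        return True

    def writable(self):
        return True

    def readinto(self, b):
        data = self._input.read(len(b))
        b[:len(data)] = data
        return len(data)

    def write(self, b):
        self.screen += b
        return len(b)


raw = FakeConsoleRaw(pending_input="naïve ☃ input\n")
# Standard IO hierarchy on top: raw -> buffered -> text.
stdin = io.TextIOWrapper(io.BufferedReader(raw), encoding="utf-16-le")
# newline="\n": no os.linesep translation, so the sketch behaves the same everywhere.
stdout = io.TextIOWrapper(io.BufferedWriter(raw), encoding="utf-16-le",
                          newline="\n", line_buffering=True)

line = stdin.readline()          # a str; the UTF-16-LE detail stays hidden
stdout.write("echo: " + line)
stdout.flush()
echoed = bytes(raw.screen).decode("utf-16-le")
```

Code that only writes str to the text layer never needs `stdout.encoding`; only code that writes bytes to `stdout.buffer` has to honour it.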

Unfortunately, sys.std*.encoding is used in some other places: namely, the
consumers of the old PyOS_Readline API (the tokenizer and input()) use it
to decode the bytes returned. Worse, the two consumers assume different
encodings (sys.stdin.encoding vs. sys.stdout.encoding), so it is impossible
to write a correct readline hook when those encodings differ. So I think it
would be nice to have a Python-level, string-based implementation of
readline hooks: a sys.readlinehook attribute, which would use sys.std* by
default on Windows and GNU readline on Unix.
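A sketch of both halves of the argument. `sys.readlinehook` is only a proposal and does not exist, so every name below (`bytes_hook_roundtrip`, `make_stream_readline_hook`, `input_via_hook`) is hypothetical. The first function shows why a bytes-returning hook breaks when the hook encodes with one encoding and the consumer decodes with another ('cp437' is just an assumed example of a console code page); the rest models a str-based hook that reads and writes through text streams, so no encoding choice is needed at all.

```python
import io
import sys
from typing import Callable, Optional, TextIO


def bytes_hook_roundtrip(text: str, hook_encoding: str, consumer_encoding: str) -> str:
    """Encode as a bytes-level readline hook would, decode as its consumer would."""
    return text.encode(hook_encoding).decode(consumer_encoding, errors="replace")


# Hypothetical str-based hook: takes the prompt, returns the line (with "\n").
ReadlineHook = Callable[[str], str]


def make_stream_readline_hook(stdin: TextIO, stdout: TextIO) -> ReadlineHook:
    """Default hook in the spirit of the proposal: just use the text streams."""
    def hook(prompt: str) -> str:
        stdout.write(prompt)
        stdout.flush()
        return stdin.readline()
    return hook


def input_via_hook(prompt: str = "", hook: Optional[ReadlineHook] = None) -> str:
    """input()-like helper that only ever deals in str."""
    if hook is None:
        hook = make_stream_readline_hook(sys.stdin, sys.stdout)
    line = hook(prompt)
    if not line:
        raise EOFError
    return line[:-1] if line.endswith("\n") else line


# Usage with in-memory streams standing in for the console.
fake_in = io.StringIO("naïve ☃\n")
fake_out = io.StringIO()
answer = input_via_hook(">>> ", make_stream_readline_hook(fake_in, fake_out))
```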

Nevertheless, I think it is a good idea to have more 'utf-8' defaults (or
'utf-8-readsig' for open()). I don't know whether opening the standard
streams in 'utf-8' helps with the console issue.
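As far as I know, Python ships 'utf-8' and 'utf-8-sig' but no 'utf-8-readsig'. Assuming the name means "accept a BOM when reading, never write one", here is a minimal sketch that registers such a codec with `codecs.register`, reusing the decoder of 'utf-8-sig' and the encoder of plain 'utf-8'. The codec name and the search function are hypothetical.

```python
import codecs
import encodings.utf_8
import encodings.utf_8_sig
import os
import tempfile


def _search_utf_8_readsig(name):
    # Hypothetical codec: decode like 'utf-8-sig' (strip a leading BOM if
    # present), encode like plain 'utf-8' (never write a BOM).
    # Search functions get a lower-cased name; since Python 3.9 hyphens are
    # also converted to underscores, so accept both spellings.
    if name not in ("utf-8-readsig", "utf_8_readsig"):
        return None
    return codecs.CodecInfo(
        name="utf-8-readsig",
        encode=encodings.utf_8.encode,
        decode=encodings.utf_8_sig.decode,
        incrementalencoder=encodings.utf_8.IncrementalEncoder,
        incrementaldecoder=encodings.utf_8_sig.IncrementalDecoder,
        streamwriter=encodings.utf_8.StreamWriter,
        streamreader=encodings.utf_8_sig.StreamReader,
    )


codecs.register(_search_utf_8_readsig)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    # A file saved by a BOM-writing editor...
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write("héllo\n")
    # ...is read back without a stray '\ufeff'...
    with open(path, encoding="utf-8-readsig") as f:
        read_back = f.read()
    # ...and is written back without a BOM.
    with open(path, "w", encoding="utf-8-readsig") as f:
        f.write(read_back)
    with open(path, "rb") as f:
        raw_bytes = f.read()
```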

Adam Bartoš

