[Python-ideas] Fix default encodings on Windows

Paul Moore p.f.moore at gmail.com
Wed Aug 10 14:44:02 EDT 2016


On 10 August 2016 at 19:10, Steve Dower <steve.dower at python.org> wrote:
> To summarise the proposals (remembering that these would only affect Python
> 3.6 on Windows):
>
> * change sys.getfilesystemencoding() to return 'utf-8'
> * automatically decode byte paths assuming they are utf-8
> * remove the deprecation warning on byte paths
> * make the default open() encoding check for a BOM or else use utf-8
> * [ALTERNATIVE] make the default open() encoding check for a BOM or else use
> sys.getpreferredencoding()
> * force the console encoding to UTF-8 on initialize and revert on finalize
>
> So what are your concerns? Suggestions?

I presume you'd be targeting 3.7 for this change. Broadly, I'm +1 on
all of this. Personally, I'm moving to UTF-8 everywhere, so it seems
OK to me, but I suspect defaulting open() to UTF-8 in the absence of a
BOM might cause issues for people. Most text editors still (AFAIK) use
the ANSI codepage by default, and that's the one encoding where an
identifying BOM isn't possible. So your alternative may be a safer
choice. On the other hand, files from Unix (via, say, GitHub) would
typically be UTF-8 without a BOM, so it becomes a question of choosing
the best compromise. I'm inclined to go for cross-platform and UTF-8
and clearly document the change. We might want a more convenient short
form for open(filename, "r", encoding=locale.getpreferredencoding()),
though, to ease the transition... We'd also need to consider how the
new default encoding would interact with PYTHONIOENCODING.
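
To make the trade-off concrete, here's a rough sketch of what "check for
a BOM, else fall back" could look like in pure Python. The names
sniff_encoding, open_text and open_ansi are made up for illustration and
are not anything proposed in this thread; the real change would
presumably live inside open() itself. The only thing that differs
between the main proposal and the alternative is the fallback argument.

```python
import codecs
import locale

# Longer BOMs come first: the UTF-32 LE BOM starts with the UTF-16 LE BOM,
# so checking UTF-16 first would misidentify UTF-32 files.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_encoding(filename, fallback="utf-8"):
    """Return a codec name based on the file's BOM, or `fallback` if none."""
    with open(filename, "rb") as f:
        head = f.read(4)  # the longest BOM checked here is 4 bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return fallback


def open_text(filename, fallback="utf-8"):
    """Hypothetical helper: open for reading, honouring a BOM if present."""
    return open(filename, "r", encoding=sniff_encoding(filename, fallback))


def open_ansi(filename):
    """Hypothetical short form: BOM check, else the locale's preferred encoding."""
    return open_text(filename, fallback=locale.getpreferredencoding(False))
```

With something like this, open_text(path) behaves like the main proposal
and open_ansi(path) like the alternative. The codec names "utf-8-sig",
"utf-16" and "utf-32" all consume the BOM when decoding, so the caller
never sees it in the text.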

For the console, does this mean that the win_unicode_console module
will no longer be needed when these changes go in?

Sorry, not much in the way of direct experience or information I can
add, but a strong +1 on the change (and I'd be happy to help where
needed).

Paul

