[issue24968] Python 3 raises Unicode errors with the xxx.UTF-8 locale

Nick Coghlan report at bugs.python.org
Tue Sep 1 02:05:15 CEST 2015


Nick Coghlan added the comment:

Looking again at the *specific* bug report here, I'm moving the resolution to "out of date", as it's actually the one we addressed in 3.5 by enabling surrogateescape by default on all of the standard streams when the OS claims the locale encoding is ASCII, not just stderr: http://bugs.python.org/issue19977

That allows us to at least correctly roundtrip data, even if the OS has given us bad encoding settings.
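
As a small illustration of the roundtrip property (a sketch using only the built-in codec machinery, not the actual standard stream setup code), bytes that ASCII cannot decode are mapped to lone surrogates on the way in and restored unchanged on the way out:

```python
# UTF-8 encoded "café": the last two bytes are not valid ASCII.
raw = b"caf\xc3\xa9"

# Strict ASCII decoding fails on the non-ASCII bytes.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With surrogateescape, each undecodable byte 0xNN becomes U+DCNN.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same error handler turns the surrogates back into bytes.
restored = text.encode("ascii", errors="surrogateescape")
```

The decoded string is not meaningful text, but no information is lost, so the original bytes can still be written back out intact.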

The problem with forcing UTF-8 more generally when the OS claims ASCII is that it may be the wrong thing to do and result in data corruption, especially on systems using East Asian codecs. Querying /etc/locale.conf [1] instead of relying on the nominal glibc locale settings should reliably give us correct encoding/locale information on modern Linux systems in cases like this one, where SSH has forwarded mismatched locale settings from a client system to a server shell session.
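
A rough sketch of what such a query might look like is below. The helper names parse_locale_conf and configured_codeset are hypothetical, the parsing is deliberately simplified (plain KEY=value lines, optional quotes, no shell escapes), and checking LC_CTYPE before LANG is an assumption rather than anything CPython does:

```python
def parse_locale_conf(text):
    """Parse locale.conf-style text into a dict of KEY -> value.

    Hypothetical helper: handles simple KEY=value lines and '#' comments only.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def configured_codeset(settings):
    """Return the codeset part of LC_CTYPE or LANG, or None if absent.

    Locale names look like language_TERRITORY.codeset@modifier.
    """
    for key in ("LC_CTYPE", "LANG"):
        value = settings.get(key)
        if value and "." in value:
            codeset = value.partition(".")[2].partition("@")[0]
            if codeset:
                return codeset
    return None


# Example with inline text; a real caller would read /etc/locale.conf
# and handle the file being missing.
example = '# system locale\nLANG="ja_JP.eucJP"\nLC_MESSAGES=C\n'
codeset = configured_codeset(parse_locale_conf(example))
```

In this example the configured codeset is eucJP rather than UTF-8, which is the kind of system where blindly forcing UTF-8 could corrupt data.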

Another issue with relevant background discussion is issue #23993, which speculated on extending the "default to surrogateescape" idea to all open() calls when glibc claims the locale encoding is ASCII.

[1] http://www.freedesktop.org/software/systemd/man/locale.conf.html

----------
resolution: not a bug -> out of date

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24968>
_______________________________________

