[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

Nick Coghlan ncoghlan at gmail.com
Mon Mar 13 07:01:33 EDT 2017


On 13 March 2017 at 18:37, INADA Naoki <songofacandy at gmail.com> wrote:

> But locale coercing works nice on platforms like android.
> So how about simplified version of PEP 538?  Just adding configure
> option for locale coercing which is disabled by default.  No envvar
> options and no warnings.
>

That doesn't solve my original Linux distro problem, where locale
misconfiguration problems show up as "Python 2 works, Python 3 doesn't
work" behaviour and bug reports.

The problem is that Python 2 was largely locale-independent by default: it
just passed raw bytes through, so you'd only get immediate encoding or
decoding errors if you had a Unicode literal or a decode() call somewhere
in your code, and would otherwise pass data corruption problems further
down the chain. Python 3, by contrast, is locale-*aware* by default, and
eagerly decodes:

- command line parameters
- environment variables
- responses from operating system API calls
- standard stream input
- file contents
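
As a rough illustration of the list above, the following sketch reports
which encodings a running Python 3 interpreter has picked for those
boundaries. The helper name `eager_decoding_settings` is made up for this
example; the `sys` and `locale` calls are standard library functions
(`sys.getfilesystemencodeerrors()` needs Python 3.6 or later).

```python
import locale
import sys


def eager_decoding_settings():
    """Report the encodings Python 3 applies at its text boundaries."""
    return {
        # Used for command line parameters, environment variables and
        # file names returned by operating system calls.
        "filesystem": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # Default for open() in text mode when no encoding= is given.
        "preferred": locale.getpreferredencoding(False),
        # Standard stream input; stdin may be replaced or missing.
        "stdin": getattr(sys.stdin, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in eager_decoding_settings().items():
        print(f"{name}: {value}")
```

Running this under a misconfigured C locale and again under a UTF-8 locale
should make the difference in defaults visible; the exact values depend on
the platform and Python version.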

You *can* still write locale-independent Python 3 applications, but doing
so involves sprinkling liberal doses of "b" prefixes and suffixes, binary
mode settings, and "surrogateescape" error handler declarations in various
places. You can't just run python-modernize over a pre-existing Python 2
application and expect it to behave the same way in the C locale as it did
before.
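
A minimal sketch of what that sprinkling looks like in practice, assuming
the goal is to pass arbitrary bytes through unchanged. The helper names
(`decode_lossless`, `encode_lossless`, `copy_raw`) are hypothetical; the
"surrogateescape" error handler and the "rb"/"wb" modes are standard.

```python
def decode_lossless(raw, encoding="utf-8"):
    """Decode without raising: undecodable bytes become lone surrogates."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_lossless(text, encoding="utf-8"):
    """Reverse decode_lossless(), restoring the original bytes exactly."""
    return text.encode(encoding, errors="surrogateescape")


def copy_raw(src_path, dst_path):
    """Copy file contents in binary mode, so no locale encoding applies."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read())


# Latin-1 encoded bytes that are not valid UTF-8.
RAW = b"caf\xe9"
text = decode_lossless(RAW)
# The round trip is byte-for-byte identical, whatever the locale says.
assert encode_lossless(text) == RAW
```

None of this happens automatically: each boundary has to opt in, which is
why a mechanical Python 2 to 3 conversion doesn't preserve the old
behaviour.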

Once implemented, PEP 540 will partially solve the problem by introducing a
locale-independent UTF-8 mode, but that still leaves the inconsistency with
other locale-aware components that need to deal with Python 3 API calls
that accept or return Unicode objects where Python 2 allowed the use of
8-bit strings.

Folks that really want the old behaviour back will be able to set
PYTHONCOERCECLOCALE=0 (as that no longer emits any warnings), or else build
their own CPython from source using `--without-c-locale-coercion` and
`--without-c-locale-warning`. However, they'll also get the explicit
support notification from PEP 11 that any Unicode handling bugs they run
into in those configurations are entirely their own problem - we won't fix
them, because we consider those configurations unsupportable in the general
case.
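
For anyone wanting to compare the two behaviours, here is a hedged sketch
that launches a child interpreter in the C locale with and without
PYTHONCOERCECLOCALE=0. The helper names `c_locale_env` and `probe` are
made up for this example, and it assumes an interpreter that actually
implements the proposal; it also assumes that LC_ALL has to be left unset
for coercion to be attempted, so the helper removes it. What the child
prints depends on the platform and Python version.

```python
import os
import subprocess
import sys

# Child code: report the filesystem encoding and the LC_CTYPE variable
# that the child interpreter sees in its own environment.
PROBE_CODE = (
    "import os, sys; "
    "print(sys.getfilesystemencoding(), os.environ.get('LC_CTYPE'))"
)


def c_locale_env(coerce):
    """Build an environment that selects the legacy C locale."""
    env = dict(os.environ)
    # Drop overrides so that LANG=C determines the LC_CTYPE category.
    env.pop("LC_ALL", None)
    env.pop("LC_CTYPE", None)
    env["LANG"] = "C"
    if coerce:
        env.pop("PYTHONCOERCECLOCALE", None)
    else:
        env["PYTHONCOERCECLOCALE"] = "0"  # opt out of locale coercion
    return env


def probe(env):
    """Run the probe in a child interpreter and return its output line."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE_CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Calling `probe(c_locale_env(True))` and `probe(c_locale_env(False))` side
by side shows whether coercion took effect on a given system.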

That puts the additional self-support burden on folks doing something
unusual (i.e. insisting on running an ASCII-only environment in 2017),
rather than on those with a more conventional use case (i.e. running an
up-to-date *nix OS using UTF-8 or another universal encoding for both local
and remote interfaces).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia