[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale
Nick Coghlan
ncoghlan at gmail.com
Mon Mar 13 07:01:33 EDT 2017
On 13 March 2017 at 18:37, INADA Naoki <songofacandy at gmail.com> wrote:
> But locale coercing works nice on platforms like android.
> So how about simplified version of PEP 538? Just adding configure
> option for locale coercing
> which is disabled by default. No envvar options and no warnings.
>
That doesn't solve my original Linux distro problem, where locale
misconfiguration problems show up as "Python 2 works, Python 3 doesn't
work" behaviour and bug reports.
The problem is that Python 2 was largely locale-independent by default: it
just passed raw bytes through, so you'd only get immediate encoding or
decoding errors if you had a Unicode literal or a decode() call somewhere in
your code, and would otherwise pass data corruption problems further down
the chain. Python 3, by contrast, is locale-*aware* by default, and eagerly
decodes:
- command line parameters
- environment variables
- responses from operating system API calls
- standard stream input
- file contents
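As a rough illustration of the list above, the following sketch reports
which encodings a given Python 3 process has picked up from its environment.
The helper name `text_defaults` is made up for this example; the calls it
uses are standard library functions, and the values they return depend on
the locale and Python version in use.

```python
import locale
import os
import sys


def text_defaults():
    """Collect the text-handling defaults this interpreter started with."""
    return {
        # Encoding derived from the current locale settings
        "locale_encoding": locale.getpreferredencoding(False),
        # Encoding used for file names and other OS-level strings
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding of the standard input stream (None if unavailable)
        "stdin_encoding": getattr(sys.stdin, "encoding", None),
        # Command line parameters arrive already decoded to str
        "argv_are_str": all(isinstance(arg, str) for arg in sys.argv),
        # Environment variables arrive already decoded to str
        "environ_are_str": all(
            isinstance(value, str) for value in os.environ.values()
        ),
    }


if __name__ == "__main__":
    for name, value in text_defaults().items():
        print(name, "=", value)
```

Running it once normally and once with `LC_ALL=C` set is one way to see how
much of this depends on the locale.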
You *can* still write locale-independent Python 3 applications, but doing so
involves sprinkling liberal doses of "b" prefixes, bytes-oriented API
variants, binary mode settings, and "surrogateescape" error handler
declarations in various places. You can't just run python-modernize over a
pre-existing Python 2 application and expect it to behave the same way in
the C locale as it did before.
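A minimal sketch of the "surrogateescape" approach mentioned above, assuming
an ASCII-only locale encoding: a strict decode rejects non-ASCII bytes
outright, while "surrogateescape" carries them through a str and restores
them unchanged on encoding. The function names are hypothetical and exist
only for this example.

```python
def strict_decode_fails(raw):
    """Return True if a strict ASCII decode rejects the given bytes."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


def roundtrip(raw):
    """Pass bytes through a str and back without losing any of them."""
    # Undecodable bytes become lone surrogates (U+DC80..U+DCFF) in the str
    text = raw.decode("ascii", errors="surrogateescape")
    # Encoding with the same handler turns those surrogates back into bytes
    return text.encode("ascii", errors="surrogateescape")


if __name__ == "__main__":
    data = b"caf\xc3\xa9"  # "café" encoded as UTF-8
    print(strict_decode_fails(data))
    print(roundtrip(data) == data)
```

The intermediate str is not printable text as such, which is part of why
this style has to be applied deliberately at each boundary rather than
falling out of an automated conversion.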
Once implemented, PEP 540 will partially solve the problem by introducing a
locale-independent UTF-8 mode, but that still leaves the inconsistency with
other locale-aware components that need to deal with Python 3 API calls
that accept or return Unicode objects where Python 2 allowed the use of
8-bit strings.
Folks that really want the old behaviour back will be able to set
PYTHONCOERCECLOCALE=0 (as that no longer emits any warnings), or else build
their own CPython from source using `--without-c-locale-coercion` and
`--without-c-locale-warning`. However, they'll also get the explicit
support notification from PEP 11 that any Unicode handling bugs they run
into in those configurations are entirely their own problem - we won't fix
them, because we consider those configurations unsupportable in the general
case.
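For anyone wanting to check what the opt-out does on their own system, here
is a small sketch that launches a child interpreter in the C locale, with
and without PYTHONCOERCECLOCALE=0, and reports the encoding the child chose
for stdout. The helper name `child_stdout_encoding` is hypothetical, and the
result depends on the Python version and platform, so no expected output is
shown.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Run a child interpreter and return the stdout encoding it reports."""
    env = dict(os.environ)
    env.update(extra_env)  # overrides applied on top of the current env
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    # Codec names are plain ASCII, so this decode is safe
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    # C locale with the interpreter's default behaviour
    print(child_stdout_encoding({"LC_ALL": "C"}))
    # C locale with coercion explicitly switched off
    print(child_stdout_encoding({"LC_ALL": "C", "PYTHONCOERCECLOCALE": "0"}))
```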
That puts the additional self-support burden on folks doing something
unusual (i.e. insisting on running an ASCII-only environment in 2017),
rather than on those with a more conventional use case (i.e. running an
up-to-date *nix OS using UTF-8 or another universal encoding for both local
and remote interfaces).
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia