[Python-Dev] PEP 538: Coercing the legacy C locale to a UTF-8 based locale

INADA Naoki songofacandy at gmail.com
Mon Mar 13 08:22:32 EDT 2017


On Mon, Mar 13, 2017 at 8:01 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 13 March 2017 at 18:37, INADA Naoki <songofacandy at gmail.com> wrote:
>>
>> But locale coercion works nicely on platforms like Android.
>> So how about a simplified version of PEP 538?  Just add a configure
>> option for locale coercion, disabled by default.  No envvar options
>> and no warnings.
>
>
> That doesn't solve my original Linux distro problem, where locale
> misconfiguration problems show up as "Python 2 works, Python 3 doesn't work"
> behaviour and bug reports.

Sorry, I meant "PEP 540 + simplified PEP 538 (coercion enabled by a
configure option)".  Distros can enable the configure option, of course.


>
> The problem is that where Python 2 was largely locale-independent by default
> (just passing raw bytes through) such that you'd only get immediate encoding
> or decoding errors if you had a Unicode literal or a decode() call somewhere
> in your code and would otherwise pass data corruption problems further down
> the chain, Python 3 is locale-*aware* by default, and eagerly decodes:
>
> - command line parameters
> - environment variables
> - responses from operating system API calls
> - standard stream input
> - file contents
>
> You *can* still write locale-independent Python 3 applications, but they
> involve sprinkling liberal doses of "b" prefixes and suffixes and mode
> settings and "surrogateescape" error handler declarations in various places
> - you can't just run python-modernize over a pre-existing Python 2
> application and expect it to behave the same way in the C locale as it did
> before.
>
> Once implemented, PEP 540 will partially solve the problem by introducing a
> locale-independent UTF-8 mode, but that still leaves the inconsistency with
> other locale-aware components that need to deal with Python 3 API
> calls that accept or return Unicode objects where Python 2 allowed the use
> of 8-bit strings.
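
To make the "surrogateescape" point in the quoted text concrete, here is a
minimal sketch of passing undecodable bytes through `str` without losing
them.  The file name is a made-up example, and UTF-8 is assumed as the
nominal encoding; nothing here is taken from either PEP's implementation.

```python
# Hypothetical raw file name: Latin-1 encoded, so not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Strict decoding would raise UnicodeDecodeError here.  With the
# surrogateescape handler, each undecodable byte 0xNN is mapped to the
# lone surrogate U+DCNN instead.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler maps the surrogates back to the
# original bytes, so the data survives the round trip unchanged.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")
```

This is the Python 3 counterpart of Python 2's "just pass raw bytes
through", but it has to be requested explicitly wherever data is decoded.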

I feel that the problems PEP 538 solves but PEP 540 doesn't are relatively
small compared with the complexity PEP 538 introduces.  As I understand it,
PEP 538 solves problems only when:

* The python executable is used.  (GUI applications that link Python for
  plugins are not affected.)
* One of C.UTF-8, C.utf8 or UTF-8 is accepted for LC_CTYPE.
* The "locale-aware components" use something other than ASCII or
  UTF-8 in the C locale, but use UTF-8 in a UTF-8 locale.
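
The second condition can be sketched roughly as follows.  This is only an
illustration in Python, not the actual C implementation: the helper names
`locale_is_available` and `pick_coercion_target` are hypothetical, and the
check for "C"/"POSIX" is a simplification of how the legacy locale would be
detected.

```python
import locale

# Candidate target locales, tried in this order.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def locale_is_available(name):
    """Return True if LC_CTYPE can be set to `name` on this system."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    else:
        return True
    finally:
        # Always restore the original setting so the probe has no side effect.
        locale.setlocale(locale.LC_CTYPE, saved)


def pick_coercion_target(current, is_available=locale_is_available):
    """Return the first usable target locale, or None if no coercion applies.

    `current` is the name of the active LC_CTYPE locale.  Coercion is only
    considered for the legacy C locale (assumed here to be reported as
    "C" or "POSIX").
    """
    if current not in ("C", "POSIX"):
        return None
    for candidate in COERCION_TARGETS:
        if is_available(candidate):
            return candidate
    return None
```

If none of the candidates is accepted, the function returns None and
nothing changes, which is why coercion does not help on platforms that
lack all three locales.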

Can't we reduce the number of options from 3 (2 configure options, 1 envvar)
if PEP 540 is accepted too?


>
> Folks that really want the old behaviour back will be able to set
> PYTHONCOERCECLOCALE=0 (as that no longer emits any warnings), or else build
> their own CPython from source using `--without-c-locale-coercion` and
> `--without-c-locale-warning`. However, they'll also get the explicit
> support notification from PEP 11 that any Unicode handling bugs they run
> into in those configurations are entirely their own problem - we won't fix
> them, because we consider those configurations unsupportable in the general
> case.
>
> That puts the additional self-support burden on folks doing something
> unusual (i.e. insisting on running an ASCII-only environment in 2017),
> rather than on those with a more conventional use case (i.e. running an up
> to date \*nix OS using UTF-8 or another universal encoding for both local
> and remote interfaces).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
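
For anyone who wants to try the PYTHONCOERCECLOCALE=0 opt-out quoted above,
a rough sketch of a test harness follows.  The helper names
`legacy_c_locale_env` and `child_stdout_encoding` are hypothetical, and
what the child interpreter reports depends on its version and on whether
it implements this setting at all.

```python
import os
import subprocess
import sys


def legacy_c_locale_env(base_env=None):
    """Return a copy of the environment that forces the legacy C locale
    and asks the interpreter not to coerce it."""
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = "C"                # overrides LANG and other LC_* settings
    env["PYTHONCOERCECLOCALE"] = "0"   # opt-out named in the quoted text
    return env


def child_stdout_encoding(env):
    """Run a child interpreter under `env` and return the encoding it
    selected for sys.stdout."""
    code = "import sys; print(sys.stdout.encoding)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()
```

Calling `child_stdout_encoding(legacy_c_locale_env())` and comparing the
result with a run that leaves PYTHONCOERCECLOCALE unset shows whether
coercion makes a difference on a given system.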

