[Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8

Nick Coghlan ncoghlan at gmail.com
Tue Jan 3 04:43:29 EST 2017


On 3 January 2017 at 19:09, Felix Yan <felixonmars at archlinux.org> wrote:
>
> On 01/03/2017 04:43 PM, Nick Coghlan wrote:
> > On 3 January 2017 at 17:51, Felix Yan <felixonmars at archlinux.org> wrote:
> >
> >     IMHO it would be nice to have an option to disable the usage of C.UTF-8.
> >
> >
> > Yep, that's part of the PEP - if you set PYTHONALLOWCLOCALE, CPython 3.7
> > would still complain about it, but it wouldn't try to coerce the locale
> > to something else.
>
> Since we know that our glibc is not providing C.UTF-8, it would be
> better to auto-detect the availability of that locale, or simply make it
> a configure switch, instead of having to set PYTHONALLOWCLOCALE for
> every python process.

Fedora's glibc still doesn't provide it natively either. Instead, it's
provided as an on-disk locale, and hence can be deleted if you really
want to do so: https://bugzilla.redhat.com/show_bug.cgi?id=902094#c18

If C.UTF-8 isn't available, CPython 3.7 will still fall back to the C
locale, same as it does for any other missing locale, and will emit the
following pair of warnings on stderr:

===========
Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 (set
PYTHONALLOWCLOCALE to disable this locale coercion behaviour).

Py_Initialize detected LC_CTYPE=C, which limits Unicode compatibility. Some
libraries and operating system interfaces may not work correctly. Set
`PYTHONALLOWCLOCALE=1 LC_CTYPE=C` to configure a similar environment
when running Python directly.
===========

(We could potentially give a custom warning in that case by checking
the value of the environment variables, but it would be a bit tricky
to write a test that could be exercised on platforms that *do* provide
C.UTF-8)
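
For anyone who wants to check whether a given system actually provides
the locale, here is a minimal sketch in Python. It is not part of the
PEP or of CPython; the helper name locale_is_available is made up for
illustration, and it only uses the standard locale module to try the
locale and then restore the previous setting.

```python
import locale


def locale_is_available(name: str) -> bool:
    """Return True if setlocale(LC_CTYPE, name) succeeds in this process.

    Hypothetical helper for illustration only; not a CPython API.
    """
    # Calling setlocale() with no locale argument queries the current setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        # The C library rejected the name, so the locale is not usable here.
        return False
    finally:
        # Always restore the LC_CTYPE setting we started with.
        locale.setlocale(locale.LC_CTYPE, saved)
    return True


if __name__ == "__main__":
    for candidate in ("C", "C.UTF-8"):
        print(candidate, locale_is_available(candidate))
```

The result for "C.UTF-8" depends on the platform: it should be true on
distros that ship the locale on disk and false where it has been left
out or deleted.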

Distros that want to ship Python 3.7 but don't want their users to
see that warning would then need to do one of three things:

- ship a C.UTF-8 locale on disk, as Debian initially did and as
Fedora-derived distros now do too
- contribute a fix to glibc that implements a C.UTF-8 locale *without*
the 1.5 MiB of support data (see
https://bugzilla.redhat.com/show_bug.cgi?id=902094#c14 )
- patch their system Python to remove the warning

Option 2 (fixing glibc) is the ideal long-term result, but 99.999% of
Linux users aren't going to be able to tell the difference between that
and distros collectively opting for Option 1 (shipping the locale on disk).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

