[Linux-SIG] PEP 538: Coercing the legacy C locale to C.UTF-8

Barry Warsaw barry at python.org
Tue Jan 3 10:56:16 EST 2017


Hi Nick, thanks for writing up this PEP.  I'm generally in favor of it,
although as you point out, it will probably have less direct effect on Debian
and Ubuntu, since we already have the C.UTF-8 locale.  I don't have the time
right now to test whether that still holds for various container and remote
environments, but I'll try to do some research there (and would actually be
surprised if that doesn't hold through the entire food chain).

A question and a suggestion.

On Jan 03, 2017, at 04:00 PM, Nick Coghlan wrote:

>* in Py_Initialize, emit a warning on stderr regarding limited Unicode
>compatibility if we detect that LC_CTYPE is set to the "C" locale

So just to be clear, you propose only to check for exactly the "C" locale?
For example, my default locale is en_US.UTF-8 which would not trigger the
warning.  I wouldn't want it to warn on any .UTF-8 locale since those should
be fine too.  (I.e. it's just C locale's implicit ASCII that's the problem.)
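
To make the question concrete, here is a minimal sketch of the check I have in
mind.  The helper name `is_legacy_c_locale` is hypothetical, and treating
"POSIX" as an alias of "C" is my assumption; the proposal text only mentions
"C".

```python
def is_legacy_c_locale(lc_ctype: str) -> bool:
    """Return True only for the plain C locale, never for a .UTF-8 locale."""
    # Assumption: "POSIX" is handled as an alias of "C".
    return lc_ctype in ("C", "POSIX")


# Show which locale names would trigger the warning under this rule.
for name in ("C", "POSIX", "C.UTF-8", "en_US.UTF-8"):
    print(f"{name!r}: warn={is_legacy_c_locale(name)}")
```

In a running interpreter, the current value could be queried with
`locale.setlocale(locale.LC_CTYPE)` (called with no second argument) and
passed to a check like this one.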

>* in Programs/python.c (i.e. the C level main() implementation), set LANG
>and LC_ALL in the environment to "C.UTF-8" if we detect that the locale is
>otherwise set to "C"
>* skip the coercion if PYTHONALLOWCLOCALE is set so developers running in
>recent system Python versions with this implemented can still debug
>problems that only show up in older Python 3.x releases, or in embedding
>applications that still use the C locale

I have nits to pick about the envar name and warning text.

I understand the desire to have a positive setting control this, but it feels
like PYTHONCOERCECLOCALE=0 would be a more descriptive name and setting.
That could be problematic because it doesn't allow just any value;
i.e. PYTHONCOERCECLOCALE=1 wouldn't make sense as a way to disable locale
coercion.  I think my unease about the name stems from potential
misunderstandings about C vs. C.UTF-8, but maybe I'm just worried about a
non-problem.  Consider this a challenge for a better envar name... or a
bikeshed to ignore. :)
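
To illustrate the concern about values, here is a minimal sketch of how a
PYTHONCOERCECLOCALE-style opt-out might be read.  The semantics are purely
hypothetical (nothing here is specified by the PEP draft): only an explicit
"0" turns coercion off, and any other value, including "1", leaves the default
behavior in place.

```python
import os


def coercion_enabled(environ=os.environ) -> bool:
    """Return False only when the variable is set to exactly "0"."""
    # Hypothetical semantics: unset, "1", or anything else keeps coercion on,
    # so "=1" carries no extra meaning beyond the default.
    return environ.get("PYTHONCOERCECLOCALE") != "0"


print(coercion_enabled({}))
print(coercion_enabled({"PYTHONCOERCECLOCALE": "0"}))
print(coercion_enabled({"PYTHONCOERCECLOCALE": "1"}))
```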

On to the warnings:

    When Py_Initialize is called and CPython detects that the configured
    locale is the default C locale, the following warning will be issued:

    Py_Initialize detected LC_CTYPE=C, which limits Unicode
    compatibility. Some libraries and operating system interfaces may not work
    correctly. Set `PYTHONALLOWCLOCALE=1 LC_CTYPE=C` to configure a similar
    environment when running Python directly.

I find this confusing on several fronts.  I think it might be better to say
"Embedded Python" rather than "Py_Initialize", since end users of an
application with Python embedded will probably have no idea what
"Py_Initialize" is, and they are the ones who will see this warning first.  It
also feels odd for the embedded warning to give instructions on how to
reproduce this in the `python` CLI.  It also doesn't say that the locale is
being coerced.  What about:

    Embedded Python detected LC_CTYPE=C (a locale with default ASCII
    encoding), which may cause Unicode compatibility problems.  Coercing the
    locale to C.UTF-8.  Set the environment variable PYTHONALLOWCLOCALE=1 to
    prevent this coercion.

If C.UTF-8 isn't available, then the warning would read:

    Embedded Python detected LC_CTYPE=C (a locale with default ASCII
    encoding), which may cause some Unicode compatibility problems.  Coercion
    to C.UTF-8 locale is not possible.  Set the environment variable
    PYTHONALLOWCLOCALE=1 to suppress this warning.

I'd use the same text for the `python` CLI except s/Embedded Python/Python/
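
Put together, the selection between the two messages could be sketched like
this.  The function name `locale_warning` and its two boolean parameters are
hypothetical; the wording is copied from the suggested texts above, and the
real check would live in C rather than Python.

```python
def locale_warning(embedded: bool, utf8_available: bool) -> str:
    """Build the suggested warning text for the embedded or CLI case."""
    # The CLI variant only swaps "Embedded Python" for "Python".
    who = "Embedded Python" if embedded else "Python"
    detected = f"{who} detected LC_CTYPE=C (a locale with default ASCII encoding), "
    if utf8_available:
        # C.UTF-8 exists, so the message announces the coercion.
        return (
            detected
            + "which may cause Unicode compatibility problems.  "
            "Coercing the locale to C.UTF-8.  "
            "Set the environment variable PYTHONALLOWCLOCALE=1 "
            "to prevent this coercion."
        )
    # No C.UTF-8 available: warn only, nothing is coerced.
    return (
        detected
        + "which may cause some Unicode compatibility problems.  "
        "Coercion to C.UTF-8 locale is not possible.  "
        "Set the environment variable PYTHONALLOWCLOCALE=1 "
        "to suppress this warning."
    )


print(locale_warning(embedded=True, utf8_available=True))
print(locale_warning(embedded=False, utf8_available=False))
```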

I also think there should be a compile-time or run-time flag that embedders
could set so that they could explicitly disable the warning or coercion.
Something like ASCIILOCALEISFINEANDYESIKNOWWHATIAMDOINGSOSTFU=1

>* grant a priori permission to redistributors to backport this to older
>versions (as we'd like to include the change in the Fedora system Python
>for F26, which will be based on Python 3.6.0)

I think that's fine, but I doubt we'll need it for Debian and derivatives.

Cheers,
-Barry