[Python-Dev] Late Python 3.7.1 changes to fix the C locale coercion (PEP 538) implementation

Victor Stinner vstinner at redhat.com
Mon Sep 17 21:20:27 EDT 2018


Hi Unicode and locales lovers,

tl;dr: Nick, Ned, INADA-san: I modified 3.7.1 to add a new "-X
coerce_c_locale=value" option and to make sure that the C locale
coercion cannot be enabled when Python is embedded. Are you ok with
these changes?


Before the 3.7.0 release, during the implementation of the UTF-8 Mode
(PEP 540), I changed two things in Nick Coghlan's implementation of the
C locale coercion (PEP 538):

(1) The PYTHONCOERCECLOCALE environment variable is now ignored when
the -E or -I command line option is used.

(2) When Python is embedded, the C locale coercion is now enabled if
the LC_CTYPE locale is "C".
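To make the two 3.7.0 rules concrete, here is a minimal Python model
of when the C locale gets coerced. It is a sketch, not CPython's
implementation: the function name and parameters are hypothetical, and
only the value "0" of PYTHONCOERCECLOCALE is modelled (PEP 538 also
defines other values such as "warn", which this sketch treats like any
non-"0" value).

```python
def coerces_c_locale_370(lc_ctype, environ, ignore_environment, embedded):
    """Hypothetical model of the Python 3.7.0 rules (not CPython code).

    lc_ctype: name of the LC_CTYPE locale, e.g. "C".
    environ: mapping of environment variables.
    ignore_environment: True when -E or -I was given.
    embedded: True when Python is embedded in an application.
    """
    # Coercion is only considered when the LC_CTYPE locale is "C".
    if lc_ctype != "C":
        return False
    # Rule (1): with -E or -I, PYTHONCOERCECLOCALE is not read at all.
    if not ignore_environment and environ.get("PYTHONCOERCECLOCALE") == "0":
        return False
    # Rule (2): embedded Python gets the same default as the python
    # program, so `embedded` does not change the result here.
    return True
```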

Nick asked me to change the behavior:
https://bugs.python.org/issue34589

I just pushed a change to the 3.7 branch that adds a new "-X
coerce_c_locale=value" option:
https://github.com/python/cpython/commit/144f1e2c6f4a24bd288c045986842c65cc289684

Examples using Python 3.7 (future 3.7.1) with the UTF-8 Mode disabled,
to test only the C locale coercion:
---
$ cat test.py
import codecs, locale
enc = locale.getpreferredencoding()
enc = codecs.lookup(enc).name
print(enc)

$ export LC_ALL= LC_CTYPE=C LANG=

# Disable C locale coercion: get ASCII as expected
$ PYTHONCOERCECLOCALE=0 ./python -X utf8=0 test.py
ascii

# -E ignores PYTHONCOERCECLOCALE=0:
# C locale is coerced, we get UTF-8
$ PYTHONCOERCECLOCALE=0 ./python -E -X utf8=0 test.py
utf-8

# -X coerce_c_locale=0 is not affected by -E:
# C locale coercion is disabled: get ASCII as expected
$ ./python -E -X utf8=0 -X coerce_c_locale=0 test.py
ascii
---
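The same three runs can be driven from Python, which is handy for
checking other interpreters. This is a sketch assuming a POSIX system;
`preferred_encoding` is a hypothetical helper, the probe is test.py
passed inline with -c, and the printed encodings depend on the
interpreter version and platform. Interpreters that do not know
"-X coerce_c_locale" ignore it, as 3.7.0 does.

```python
import os
import subprocess
import sys

# Same probe as test.py, passed inline with -c.
PROBE = (
    "import codecs, locale; "
    "print(codecs.lookup(locale.getpreferredencoding()).name)"
)


def preferred_encoding(extra_args, extra_env=None):
    """Run a child interpreter under the C locale and return its encoding."""
    env = dict(os.environ)
    # Mirror `export LC_ALL= LC_CTYPE=C LANG=` from the shell session.
    env.update({"LC_ALL": "", "LC_CTYPE": "C", "LANG": ""})
    # Start from a clean slate so only extra_env controls these settings.
    env.pop("PYTHONCOERCECLOCALE", None)
    env.pop("PYTHONUTF8", None)
    if extra_env:
        env.update(extra_env)
    # -X utf8=0 disables the UTF-8 Mode, as in the examples above.
    cmd = [sys.executable, *extra_args, "-X", "utf8=0", "-c", PROBE]
    result = subprocess.run(cmd, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(preferred_encoding([], {"PYTHONCOERCECLOCALE": "0"}))
    print(preferred_encoding(["-E"], {"PYTHONCOERCECLOCALE": "0"}))
    print(preferred_encoding(["-E", "-X", "coerce_c_locale=0"]))
```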


For (1), Nick's use case is to get the Python 3.6 behavior (C locale
not coerced) on Python 3.7 using PYTHONCOERCECLOCALE. Nick proposed
honouring PYTHONCOERCECLOCALE even with -E or -I, but I dislike
introducing a special case for the -E option.

I chose to add a new "-X coerce_c_locale=0" option to Python 3.7.1 as
the solution for this use case. (Python 3.7.0 and older ignore this
option.)
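The resulting precedence for the python program can be modelled in a
few lines. This is a hypothetical sketch, not CPython's option parser:
`argv` holds only the interpreter options (everything before the script
name), and only the separated "-X name=value" spelling is recognized.

```python
def effective_coerce_setting(argv, environ):
    """Hypothetical model of the 3.7.1 precedence (not CPython code).

    Returns the explicit value to apply ("0" disables coercion), or
    None when nothing was set and the default behavior applies.
    """
    ignore_environment = False
    xoption = None
    args = iter(argv)
    for arg in args:
        if arg in ("-E", "-I"):
            ignore_environment = True
        elif arg == "-X":
            value = next(args, "")
            if value.startswith("coerce_c_locale="):
                xoption = value.partition("=")[2]
    # -X coerce_c_locale wins and is not affected by -E or -I.
    if xoption is not None:
        return xoption
    # The environment variable is only read without -E / -I.
    if not ignore_environment:
        return environ.get("PYTHONCOERCECLOCALE")
    return None
```

Fed the option lists from the three shell examples, it returns "0",
None (so the default coercion applies) and "0", matching the ascii,
utf-8 and ascii results.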

Note: Python 3.7.0 works fine with PYTHONCOERCECLOCALE=0; we are only
talking about the special case of the -E and -I options.


For (2), I modified Python 3.7.1 to make sure the C locale is never
coerced when the C API is used to embed Python inside an application:
Py_Initialize() and Py_Main(). The C locale can only be coerced by the
official Python program ("python3.7").
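Put together, the 3.7.1 decision described here reduces to a small
function. Again a hypothetical model rather than CPython code: `setting`
stands for whatever explicit value was chosen through -X coerce_c_locale
or PYTHONCOERCECLOCALE (None if neither applies), and `embedded` is True
when Python is started through Py_Initialize() or Py_Main() instead of
the python3.7 program.

```python
def coerces_c_locale_371(lc_ctype, setting, embedded):
    """Hypothetical model of the final 3.7.1 decision (not CPython code)."""
    if embedded:
        # Applications embedding Python never get their C locale coerced.
        return False
    # The python program only coerces the "C" LC_CTYPE locale ...
    if lc_ctype != "C":
        return False
    # ... unless coercion was explicitly disabled with "0".
    return setting != "0"
```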

I don't know if it should be possible to enable C locale coercion when
Python is embedded. So I just made the change requested by Nick :-)


I dislike doing such late changes in 3.7.1, especially since PEP 538
was designed by Nick Coghlan and we disagree on the fix. But Ned
Deily, our Python 3.7 release manager, wants the last 3.7 fixes merged
before Tuesday, so here we are.


Nick, Ned, INADA-san: are you ok with these changes?


The other choices for 3.7.1 are:

* Revert my change: C locale coercion can still be enabled when Python
is embedded, -E option ignores PYTHONCOERCECLOCALE env var.

* Revert my change and apply Nick's PR 9257: C locale coercion cannot
be enabled when Python is embedded and -E option doesn't ignore
PYTHONCOERCECLOCALE env var.
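To compare the three candidates side by side, each one can be reduced
to its answers to the two disputed questions. This is a hypothetical
summary written for illustration: the policy labels and the `coerced`
helper are not part of CPython, and the -X coerce_c_locale option is
left out since it only exists in the first candidate.

```python
# Answers to the two disputed questions for each 3.7.1 candidate.
POLICIES = {
    "Victor's change (pushed)": {"embedded_can_coerce": False,
                                 "env_var_read_under_E": False},
    "Revert": {"embedded_can_coerce": True,
               "env_var_read_under_E": False},
    "Revert + PR 9257": {"embedded_can_coerce": False,
                         "env_var_read_under_E": True},
}


def coerced(policy, lc_ctype, environ, ignore_environment, embedded):
    """Return True if the C locale would be coerced under `policy`."""
    if lc_ctype != "C":
        return False
    if embedded and not policy["embedded_can_coerce"]:
        return False
    # Is PYTHONCOERCECLOCALE read in this configuration?
    reads_env = not ignore_environment or policy["env_var_read_under_E"]
    if reads_env and environ.get("PYTHONCOERCECLOCALE") == "0":
        return False
    return True


if __name__ == "__main__":
    env = {"PYTHONCOERCECLOCALE": "0"}
    for name, policy in POLICIES.items():
        with_e = coerced(policy, "C", env, ignore_environment=True,
                         embedded=False)
        emb = coerced(policy, "C", {}, ignore_environment=False,
                      embedded=True)
        print(f"{name:26} -E + PYTHONCOERCECLOCALE=0: {with_e!s:5}"
              f"  embedded: {emb}")
```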


I spent months fixing the master branch to support all possible
locales and encodings, and to get a consistent CLI:
https://vstinner.github.io/python3-locales-encodings.html

So I'm not excited by Nick's PR, which IMHO moves Python backward,
especially since it breaks the -E option contract: it doesn't ignore
the PYTHONCOERCECLOCALE env var.

Victor

