[issue36204] Deprecate calling Py_Main() after Py_Initialize()? Add Py_InitializeFromArgv()?

STINNER Victor report at bugs.python.org
Wed Mar 6 11:27:44 EST 2019


STINNER Victor <vstinner at redhat.com> added the comment:

> RE making UnixMain public, I'd rather the core runtime require a known encoding, rather than trying to detect it. We should move the call into the detection logic into Programs/python.c so that embedders have to opt-in to detection (many embedding scenarios will prefer to do their own encoding).

Unix is a very complex beast, and Python makes it worse by adding more options (PEP 538, C locale coercion, and PEP 540, UTF-8 Mode). Py_UnixMain() works "as expected": it uses the LC_CTYPE locale encoding.
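In concrete terms, "the LC_CTYPE locale encoding" is whatever codeset the C library reports once the process has adopted its environment locale. A minimal sketch, assuming a POSIX system: switch LC_CTYPE with setlocale() and query the codeset name with nl_langinfo(CODESET). The helper name locale_codeset() is hypothetical and this is not Python's own detection code.

```c
#define _XOPEN_SOURCE 700   /* expose nl_langinfo() and CODESET */
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: set LC_CTYPE to `name` ("" means "take it from the
   environment", i.e. LC_ALL, LC_CTYPE or LANG) and return the name of the
   resulting character encoding, or NULL if that locale is not available. */
static const char *locale_codeset(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL) {
        return NULL;
    }
    return nl_langinfo(CODESET);
}
```

On glibc, with the locale installed and LANG=fr_FR.iso885915@euro, locale_codeset("") would typically report "ISO-8859-15", which corresponds to the iso8859-15 codec name shown in the shell example.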

If you want to force the use of UTF-8, you can opt in to the UTF-8 Mode: for example, call putenv("PYTHONUTF8=1") before Py_UnixMain().

You cannot pass an encoding to Py_UnixMain() because the Python implementation relies heavily on the LC_CTYPE locale: see the Py_DecodeLocale() and Py_EncodeLocale() functions. In any case, Python must use the locale encoding to avoid mojibake. Before it can load its own codecs, Python has to use the codec of the C library: mbstowcs() and wcstombs(). Python has a few codecs implemented in C, such as ASCII, UTF-8 and Latin-1, but locales are far more diverse than that. For example, ISO-8859-15 is used by the "euro" locale variants:
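The C-library path can be illustrated with the standard mbstowcs() and wcstombs() functions, which convert between the multibyte encoding selected by LC_CTYPE and wchar_t strings. This is only a sketch of the primitives, not CPython's Py_DecodeLocale(), which also handles undecodable bytes (surrogateescape) and other corner cases; the helper names decode_locale() and encode_locale() are hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: decode a NUL-terminated byte string using the
   current LC_CTYPE encoding. Returns a malloc'ed wide string, or NULL if
   the bytes are invalid in that encoding (there is no error handler). */
static wchar_t *decode_locale(const char *bytes)
{
    /* POSIX: a NULL destination returns the required length. */
    size_t len = mbstowcs(NULL, bytes, 0);
    if (len == (size_t)-1) {
        return NULL;
    }
    wchar_t *out = malloc((len + 1) * sizeof(wchar_t));
    if (out == NULL) {
        return NULL;
    }
    mbstowcs(out, bytes, len + 1);   /* also writes the terminating L'\0' */
    return out;
}

/* Hypothetical helper: encode a wide string back to bytes using the
   current LC_CTYPE encoding. Returns a malloc'ed string, or NULL. */
static char *encode_locale(const wchar_t *text)
{
    size_t len = wcstombs(NULL, text, 0);   /* required byte count */
    if (len == (size_t)-1) {
        return NULL;
    }
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    wcstombs(out, text, len + 1);    /* also writes the terminating '\0' */
    return out;
}
```

Under a UTF-8 locale, for instance, decode_locale("\xc3\xa9") would return the single wide character U+00E9, while the same bytes mean two characters under ISO-8859-15.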

$ LANG=fr_FR.iso885915@euro python3 -c 'import sys; print(sys.getfilesystemencoding())'
iso8859-15

Python has an ISO-8859-15 codec, but it is implemented in pure Python. Python uses importlib to load the codec, but how does Python decode and encode filenames to import Lib/encodings/iso8859_15.py? That's where mbstowcs()/wcstombs() and Py_DecodeLocale()/Py_EncodeLocale() come into play :-) Enjoy:

PyObject*
PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
{
    PyInterpreterState *interp = _PyInterpreterState_GET_UNSAFE();
    const _PyCoreConfig *config = &interp->core_config;
#if defined(__APPLE__)
    return PyUnicode_DecodeUTF8Stateful(s, size, config->filesystem_errors, NULL);
#else
    /* Bootstrap check: if the filesystem codec is implemented in Python, we
       cannot use it to encode and decode filenames before it is loaded. Loading
       the Python codec requires encoding at least its own filename. Use the C
       implementation of the locale codec until the codec registry is
       initialized and the Python codec is loaded. See initfsencoding(). */
    if (interp->fscodec_initialized) {
        return PyUnicode_Decode(s, size,
                                config->filesystem_encoding,
                                config->filesystem_errors);
    }
    else {
        return unicode_decode_locale(s, size,
                                     config->filesystem_errors, 0);
    }
#endif
}
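The bootstrap rule in that comment boils down to a flag check: use the C-library locale codec until the full codec machinery is ready, then switch. A stripped-down sketch of that dispatch in plain C follows; struct fs_state, codec_initialized and the registry_decode callback are hypothetical stand-ins for interp->fscodec_initialized and PyUnicode_Decode(), not CPython APIs.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for the interpreter state: a flag that flips once
   the Python-level filesystem codec is loaded, plus that codec's decoder. */
typedef wchar_t *(*decoder_fn)(const char *bytes);

struct fs_state {
    bool codec_initialized;
    decoder_fn registry_decode;
};

/* Early-boot fallback: the C library's locale codec. */
static wchar_t *locale_decode(const char *bytes)
{
    size_t len = mbstowcs(NULL, bytes, 0);  /* POSIX: NULL dst gives length */
    if (len == (size_t)-1) {
        return NULL;
    }
    wchar_t *out = malloc((len + 1) * sizeof(wchar_t));
    if (out != NULL) {
        mbstowcs(out, bytes, len + 1);
    }
    return out;
}

/* Same shape as the non-Apple branch above: prefer the registered codec,
   but fall back to the locale codec during bootstrap, because loading the
   registered codec itself requires decoding filenames. */
static wchar_t *fs_decode(const struct fs_state *st, const char *bytes)
{
    if (st->codec_initialized && st->registry_decode != NULL) {
        return st->registry_decode(bytes);
    }
    return locale_decode(bytes);
}
```

Keeping the fallback behind a single flag means that the code which imports the codec module can call the same decoding entry point as everything else, without a special case.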

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36204>
_______________________________________

