[issue43552] Add locale.get_locale_encoding() and locale.get_current_locale_encoding()

Eryk Sun report at bugs.python.org
Fri Mar 19 07:35:25 EDT 2021


Eryk Sun <eryksun at gmail.com> added the comment:

> Read the ANSI code page on Windows,

I don't see why the Windows implementation is inconsistent with POSIX here. If it were changed to be consistent, the default encoding at startup would remain the same, since setlocale(LC_CTYPE, "") uses the process code page from GetACP(). In many, if not most, cases no one would be the wiser. But it seems to me that if a script calls setlocale(LC_CTYPE, "el_GR"), then it clearly wants to encode Greek text (code page 1253), and open() with encoding passed as None or "locale" should respect this. Similarly, if it calls setlocale(LC_CTYPE, ".UTF-8"), then it wants the default locale (language/region), but with UTF-8 encoding.
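
For comparison, here is a minimal sketch of the POSIX behavior being referred to, where the encoding follows whatever LC_CTYPE locale is currently set. The helper name codeset_after_setlocale is hypothetical and only for illustration. The code relies on nl_langinfo(CODESET), which is POSIX-only, so it is not a drop-in for Windows:

```c
#include <langinfo.h>   /* nl_langinfo(), CODESET (POSIX only) */
#include <locale.h>     /* setlocale(), LC_CTYPE */
#include <stddef.h>     /* NULL */

/* Hypothetical helper: switch LC_CTYPE to `name` and return the codeset
   name of the resulting locale, or NULL if the locale cannot be set.
   The returned string is owned by the C library and may be overwritten
   by later calls to setlocale() or nl_langinfo(). */
static const char *
codeset_after_setlocale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL) {
        return NULL;
    }
    return nl_langinfo(CODESET);
}
```

Calling codeset_after_setlocale("") reports the encoding of the user's default locale, and calling it with an explicit locale name reports that locale's encoding, provided the locale is installed on the system.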

The following is a snippet to get the current locale encoding with ucrt in Windows:

    #include <locale.h>
    #include <windows.h>  /* GetACP() */

    int cp = 0;
    __crt_locale_data_public *locale_data;

    /* _get_current_locale() returns a copy that must be freed. */
    _locale_t locale = _get_current_locale();
    if (locale) {
        locale_data = (__crt_locale_data_public *)locale->locinfo;
        cp = locale_data->_locale_lc_codepage;
        _free_locale(locale);
    }

    if (cp == 0) {
        /* "C" locale. The CRT in effect uses Latin-1 (cp28591), but
           Windows Python prefers the process code page. */
        cp = GetACP();
    }

With ucrt, the C runtime was changed to hide most of the locale definition that was previously public, but it intentionally defines __crt_locale_data_public, so I'm assuming it's there for programs to use. That said, the fact that we have to cast locinfo seems suspect to me. Steve Dower could maybe check with the ucrt devs to ensure that this is supported. 

There's also ___lc_codepage_func() to get the same value more simply, and also more efficiently since the current locale data doesn't have to be copied and freed. However, it's documented as internal and could be removed (unlikely as that is).
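
Whichever way the code page is obtained, it still has to be turned into an encoding name. A rough sketch of that step, assuming the usual "cpNNNN" form of Python's codec names for Windows code pages (for example cp1253), is shown here. The helper name codepage_to_encoding and its acp parameter are hypothetical. The caller would pass the result of GetACP() as acp, which keeps the sketch free of Windows headers:

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf() */

/* Hypothetical helper: write the encoding name for code page `cp` into
   `buf` as "cpNNNN". A value of 0 means the "C" locale, in which case the
   process code page `acp` is used instead, matching the fallback above.
   Returns 0 on success, or -1 if `buf` is too small or formatting fails. */
static int
codepage_to_encoding(unsigned int cp, unsigned int acp, char *buf, size_t size)
{
    if (cp == 0) {
        cp = acp;
    }
    int n = snprintf(buf, size, "cp%u", cp);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With a Greek locale set, passing 1253 as cp yields "cp1253". In the "C" locale, passing 0 as cp and 1252 as acp yields "cp1252".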

----------
nosy: +eryksun

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43552>
_______________________________________

