[Tutor] Most efficient way to replace ", " with "." in a array and/or dataframe

Eryk Sun eryksun at gmail.com
Sun Sep 22 05:40:57 EDT 2019


On 9/22/19, Albert-Jan Roskam <sjeik_appie at hotmail.com> wrote:
>
> Do you think it's a deliberate design choice that decimal and thousands
> were used here as params, and not a 'locale' param? It seems nice to be
> able to specify e.g. locale='dutch' and then all the right lc_numeric,
> lc_monetary, lc_time were used. Or even locale='nl_NL.1252' and you also
> wouldn't need 'encoding' as a separate param. Or might that be bad on
> Windows where there's no locale-gen? Just wondering...

FYI, while Windows is distributed with many locales and also supports
custom locales (not something I've had to work with), at least based
on standard locale data, "nl_NL.1252" is not a valid locale for use
with C setlocale in Windows.

Classically, the C runtime in Windows supports "C" (but not "POSIX")
and locales based on a language name or its non-standard three-letter
abbreviation. A locale can also include a country/region name (full or
three-letter abbreviation), plus an optional codepage. If the latter
is omitted, it defaults to the ANSI codepage of the language, or of
the system locale if the language has no ANSI codepage.

Examples:

    >>> locale.setlocale(0, 'dutch')
    'Dutch_Netherlands.1252'
    >>> locale.setlocale(0, 'nld')
    'Dutch_Netherlands.1252'
    >>> locale.setlocale(0, 'nld_NLD.850')
    'Dutch_Netherlands.850'

There are also a few compatibility locales such as "american" and "canadian":

    >>> locale.setlocale(0, 'american')
    'English_United States.1252'
    >>> locale.setlocale(0, 'canadian')
    'English_Canada.1252'

Classically, the Windows API represents locales not as language/region
strings but as numeric locale identifiers (LCIDs). However, back in
2006, Windows Vista introduced locale names, plus a new set of
functions that use locale names instead of LCIDs (e.g.
GetLocaleInfoEx). Locale names are based on BCP-47 language tags,
which include at least an ISO 639 language code. They can also include
an optional ISO 15924 script code (e.g. "Latn" or "Cyrl") and an
optional ISO 3166-1 region code. Strictly speaking, the codes in a
BCP-47 language tag are delimited only by hyphens, but newer versions
of Windows in most cases also allow underscore.
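
To make the shape of these names concrete, here is a minimal sketch that
splits a name into language, script and region, accepting either hyphen
or underscore. The helper split_locale_name is hypothetical (not part of
the locale module), and the pattern is a deliberate simplification: it
assumes a 2-3 letter language, a 4-letter script, and a 2-letter or
3-digit region, and it ignores the other subtags BCP-47 allows.

```python
import re

# Simplified pattern: language, optional script, optional region.
# Variants, extensions and private-use subtags are deliberately ignored.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_locale_name(name):
    """Split a locale name into (language, script, region).

    Hypothetical helper for illustration; missing parts are None.
    """
    match = _TAG.match(name)
    if match is None:
        raise ValueError(f"not a simple language[-script][-region] name: {name!r}")
    language = match["language"].lower()
    script = match["script"].title() if match["script"] else None
    region = match["region"].upper() if match["region"] else None
    return language, script, region


print(split_locale_name("nl_NL"))
print(split_locale_name("sr-Cyrl-RS"))
```

A name with a codepage suffix, such as "nl_NL.1252", is rejected by this
sketch with ValueError, since an encoding is not part of the language tag
itself.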

The Universal CRT (used by Python 3.5+) supports Vista locale names.
Recent versions also accept an underscore instead of a hyphen, plus an
optional ".utf8" or ".utf-8" encoding. Only UTF-8 can be specified for
a BCP-47 locale. If the encoding is not specified, BCP-47 locales use
the language's ANSI codepage, or UTF-8 if the language has no ANSI
codepage (e.g. "hi_IN").

Examples:

    >>> locale.setlocale(0, 'nl_NL')
    'nl_NL'
    >>> locale.setlocale(0, 'nl_NL.utf8')
    'nl_NL.utf8'
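
Coming back to the decimal comma question, here is a rough sketch of how
an accepted locale name could be used to parse numbers. The helpers
numeric_locale and parse_number are hypothetical names of my own; only
locale.setlocale, locale.atof and locale.Error come from the standard
library. Whether a given name such as 'nl_NL' is accepted depends on the
platform and CRT version, and setlocale changes process-wide state, so
this assumes nothing else (e.g. another thread) relies on the locale
while it is switched.

```python
import locale
from contextlib import contextmanager


@contextmanager
def numeric_locale(name):
    """Temporarily switch LC_NUMERIC, restoring the previous setting on exit.

    Hypothetical helper; setlocale affects the whole process.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query only, no change
    try:
        locale.setlocale(locale.LC_NUMERIC, name)  # raises locale.Error if unknown
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)


def parse_number(text, name):
    """Parse text using the decimal and grouping rules of the named locale."""
    with numeric_locale(name):
        return locale.atof(text)


# The "C" locale is always available and uses "." as the decimal point.
print(parse_number("1234.5", "C"))
```

With a Dutch locale that the system accepts, the same call would be
expected to handle a decimal comma, but I have not shown output for that
case since it depends on the locale data installed.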

Older versions of the Universal CRT do not support UTF-8 or underscore
in BCP-47 locale names, so make sure a system is updated if you need
these features.
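
Since support varies by CRT version, one option is to probe a list of
candidate names and use the first one that setlocale accepts. This is
only a sketch: first_supported_locale is a hypothetical helper, and the
candidate names in the usage line are taken from the examples above.

```python
import locale


def first_supported_locale(candidates, category=locale.LC_ALL):
    """Return the first name setlocale() accepts, or None if none work.

    Hypothetical helper; the locale in effect before the call is restored.
    """
    previous = locale.setlocale(category)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
            except locale.Error:
                continue  # this system does not accept the name
            return name
        return None
    finally:
        locale.setlocale(category, previous)  # always restore


# Prefer the newer UTF-8 form, then fall back to older spellings.
print(first_supported_locale(["nl_NL.utf8", "nl_NL", "dutch"]))
```

The result differs between systems, which is the point: the caller learns
which spelling works without leaving the process in a changed locale.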


