[issue19534] normalize() in locale.py fails for sr_RS.UTF-8@latin

Sat Nov 9 09:15:40 CET 2013

Mike FABIAN added the comment:

In locale.py, the comment above “locale_alias = {” says:

# Note that the normalize() function which uses this tables
# removes '_' and '-' characters from the encoding part of the
# locale name before doing the lookup. This saves a lot of
# space in the table.

But in normalize(), this is actually not done:

    # First lookup: fullname (possibly with encoding)
    norm_encoding = encoding.replace('-', '')
    norm_encoding = norm_encoding.replace('_', '')
    lookup_name = langname + '.' + encoding
    code = locale_alias.get(lookup_name, None)

“norm_encoding” holds the encoding with '-' and '_' removed,
but it is then never used: lookup_name is built from the unmodified
“encoding”.
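
The effect can be seen in a minimal sketch. The table toy_alias and the
two helper functions below are hypothetical and only mirror the quoted
lines; the real locale_alias table is far larger, and the actual patch
may differ in detail.

```python
# Hypothetical one-entry table standing in for locale_alias.
toy_alias = {
    'ja_jp.utf8': 'ja_JP.UTF-8',
}

def lookup_as_quoted(langname, encoding):
    """Mirror the quoted code: norm_encoding is computed but not used."""
    norm_encoding = encoding.replace('-', '')
    norm_encoding = norm_encoding.replace('_', '')
    lookup_name = langname + '.' + encoding  # unmodified encoding
    return toy_alias.get(lookup_name, None)

def lookup_with_norm(langname, encoding):
    """Same lookup, but the key is built from norm_encoding."""
    norm_encoding = encoding.replace('-', '')
    norm_encoding = norm_encoding.replace('_', '')
    lookup_name = langname + '.' + norm_encoding
    return toy_alias.get(lookup_name, None)

print(lookup_as_quoted('ja_jp', 'utf-8'))  # None
print(lookup_with_norm('ja_jp', 'utf-8'))  # ja_JP.UTF-8
```

With the unmodified encoding the key is 'ja_jp.utf-8', which the toy
table does not contain; with norm_encoding the key is 'ja_jp.utf8'.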

The patch in http://bugs.python.org/msg202469 fixes that by using
norm_encoding in the lookup. Together with the added alias

+    'sr_rs.utf8@latin':                      'sr_RS.UTF-8@latin',

this makes it work for sr_RS.UTF-8@latin. My test program then outputs:

mfabian@ari:~
$ python2 ~/tmp/mike-test.py
ja_JP.UTF-8 -> ja_JP.UTF-8
de_DE.SJIS -> de_DE.SJIS
de_DE.foobar -> de_DE.foobar
sr_RS.UTF-8@latin -> sr_RS.UTF-8@latin
sr_rs@latin -> sr_RS.UTF-8@latin
sr@latin -> sr_RS.UTF-8@latin
sr_yu -> sr_RS.UTF-8@latin
sr_yu.SJIS@devanagari -> sr_RS.sjis_devanagari
sr@foobar -> sr@foobar
sR@foObar -> sR@foObar
sR -> sr_RS.UTF-8
mfabian@ari:~
$ 
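
The test program itself is not included in this message. A script along
the following lines would print a listing of that shape; it is only a
guess at its structure, and the names list and the report() helper are
assumptions. The locale names are copied from the listing, written with
the '@' modifier separator that locale names use. The results depend on
the Python version and on whether the patch is applied, so no expected
output is given.

```python
import locale

# Locale names taken from the listing above.
names = [
    'ja_JP.UTF-8',
    'de_DE.SJIS',
    'de_DE.foobar',
    'sr_RS.UTF-8@latin',
    'sr_rs@latin',
    'sr@latin',
    'sr_yu',
    'sr_yu.SJIS@devanagari',
    'sr@foobar',
    'sR@foObar',
    'sR',
]

def report(locale_names):
    """Return one 'input -> normalized' line per locale name."""
    return ['%s -> %s' % (name, locale.normalize(name))
            for name in locale_names]

for line in report(names):
    print(line)
```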

But note that the normalization of the “sr_yu.SJIS@devanagari”
locale is still weird. Of course “sr_yu.SJIS@devanagari” is quite
silly and does not exist anyway, but the code in normalize()
does not seem to work as intended.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19534>
_______________________________________