[issue20087] Mismatch between glibc and X11 locale.alias

Tue Mar 7 13:29:01 EST 2017

Marc-Andre Lemburg added the comment:

On 07.03.2017 18:23, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
>> 'cy_GB.ISO8859-1' to 'cy_GB.ISO8859-14'
> 
> Looks as just fixing an error. The default West-European ISO8859-1 is changed to Celtic cy_GB.ISO8859-14. This looks better option for Welsh.
> 
>> 'tg_TJ.KOI8-C' to 'tg_TJ.KOI8-T'
> 
> KOI8-C is not supported by Python, but KOI8-T is supported. I don't know what KOI8-C means, there are several rarely used incompatible encodings with this name.

While all this may make sense, I'm still missing the reasoning
behind the differences between X.org and glibc.

This change also looks strange:

-    'ka_ge':                                'ka_GE.GEORGIAN-ACADEMY',
+    'ka_ge':                                'ka_GE.GEORGIAN_PS',
     'ka_ge.georgianacademy':                'ka_GE.GEORGIAN-ACADEMY',
     'ka_ge.georgianps':                     'ka_GE.GEORGIAN-PS',
     'ka_ge.georgianrs':                     'ka_GE.GEORGIAN-ACADEMY',

Why is GEORGIAN_PS written with an underscore whereas the other
mappings use dashes?
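
A spelling inconsistency like this could be caught mechanically. The
following is only a minimal sketch, assuming the alias table is a plain
dict of lookup key to locale string. The names `sample_aliases` and
`find_underscore_encodings` are hypothetical and used for illustration;
the entries are copied from the hunk above.

```python
def find_underscore_encodings(aliases):
    """Return (key, value) pairs whose encoding part contains an underscore."""
    suspicious = []
    for key, value in sorted(aliases.items()):
        # Split e.g. 'ka_GE.GEORGIAN_PS' into language part and encoding part.
        _, sep, encoding = value.partition('.')
        # Ignore a trailing modifier such as '@latin'.
        encoding = encoding.split('@', 1)[0]
        if sep and '_' in encoding:
            suspicious.append((key, value))
    return suspicious


# Hypothetical sample table, entries taken from the diff hunk above.
sample_aliases = {
    'ka_ge': 'ka_GE.GEORGIAN_PS',
    'ka_ge.georgianacademy': 'ka_GE.GEORGIAN-ACADEMY',
    'ka_ge.georgianps': 'ka_GE.GEORGIAN-PS',
    'ka_ge.georgianrs': 'ka_GE.GEORGIAN-ACADEMY',
}

for key, value in find_underscore_encodings(sample_aliases):
    print(key, '->', value)
```

On this sample, only the 'ka_ge' entry is flagged.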

Or this one:

-    'fi_fi':                                'fi_FI.ISO8859-15',
+    'fi_fi':                                'fi_FI.ISO8859-1',

Why would a locale switch away from an encoding that has
the Euro sign to one that does not?
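
The difference is easy to check with Python's own codecs. A small
sketch; `can_encode` is a hypothetical helper name, not part of any
library:

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# U+20AC is the Euro sign.
for encoding in ('iso8859-15', 'iso8859-1'):
    print(encoding, can_encode('\u20ac', encoding))
```

ISO8859-15 can encode the Euro sign, ISO8859-1 cannot.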

Or why is this Latin variant removed:

-    'nan_tw@latin':                         'nan_TW.UTF-8@latin',

Why should Russian locales switch back from UTF-8 to ISO8859-5?

-    'ru_ru':                                'ru_RU.UTF-8',
+    'ru_ru':                                'ru_RU.ISO8859-5',

Or from ISO8859-5 to KOI8-R?

-    'russian':                              'ru_RU.ISO8859-5',
+    'russian':                              'ru_RU.KOI8-R',

The more I look at these changes, the more I believe we
should not simply take everything we find in the glibc and
X.org alias files for granted. Both obviously have bugs.
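
One way to review such an update entry by entry, rather than accepting
it wholesale, is to list what changed between the old and the new table.
This is only a sketch under the assumption that both tables are plain
dicts; `diff_alias_tables`, `old_aliases` and `new_aliases` are
hypothetical names, and the sample entries are taken from the hunks in
this message.

```python
def diff_alias_tables(old, new):
    """Return (changed, removed, added) between two alias dicts."""
    changed = {key: (old[key], new[key])
               for key in old.keys() & new.keys()
               if old[key] != new[key]}
    removed = {key: old[key] for key in old.keys() - new.keys()}
    added = {key: new[key] for key in new.keys() - old.keys()}
    return changed, removed, added


# Hypothetical sample tables built from the diff hunks in this message.
old_aliases = {
    'fi_fi': 'fi_FI.ISO8859-15',
    'ru_ru': 'ru_RU.UTF-8',
    'russian': 'ru_RU.ISO8859-5',
    'nan_tw@latin': 'nan_TW.UTF-8@latin',
}
new_aliases = {
    'fi_fi': 'fi_FI.ISO8859-1',
    'ru_ru': 'ru_RU.ISO8859-5',
    'russian': 'ru_RU.KOI8-R',
}

changed, removed, added = diff_alias_tables(old_aliases, new_aliases)
for key in sorted(changed):
    print('changed:', key, changed[key][0], '->', changed[key][1])
for key in sorted(removed):
    print('removed:', key, removed[key])
for key in sorted(added):
    print('added:  ', key, added[key])
```

Each reported line can then be checked by hand before it goes in.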

>> I also don't understand why some "xx.utf-8" locale mappings were removed - I don't think we should remove those, unless they are no longer needed due to some other logic implying these mappings.
> 
> The aliases table is a table of exceptions. Removed entries no longer are exceptional.

It's not a table of exceptions, it's a table mapping commonly
used locale settings to ones which the C library understands :-)

But regardless, I checked the code and it is already
smart enough to convert spellings the C library does not
accept, such as "utf8", to "UTF-8", so these entries can indeed
be removed, but only if the locale is otherwise listed.
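
This behaviour can be observed through `locale.normalize()`. A minimal
sketch; the exact results depend on the alias table shipped with the
Python version in use, and 'en_US' is chosen here only as an example of
a locale that is listed without an explicit ".utf8" entry being needed:

```python
import locale

# The "utf8" spelling is rewritten to the "UTF-8" spelling, because the
# language part ('en_us') is found in the alias table.
for name in ('en_US.utf8', 'en_US.UTF-8'):
    print(name, '->', locale.normalize(name))
```

A locale name that is not listed at all is returned unchanged, which is
why the bare entry still has to be present.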

In some cases, it's probably better to drop the ".utf8"
to have more generic mappings, e.g.

+    'bhb_in.utf8':                          'bhb_IN.UTF-8',

or

     'de_li.utf8':                           'de_LI.UTF-8',

though I'd expect that mapping to be:

     'de_li':                                'de_LI.ISO8859-1',

as for all other "de" entries.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20087>
_______________________________________