sre is broken in SuSE 9.2

"Martin v. Löwis" martin at v.loewis.de
Sun Feb 13 15:09:25 EST 2005


Denis S. Otkidach wrote:
> You are right.  But isalpha behavior looks strange to me anyway: why
> is the Cyrillic character '\u0430' recognized as alphabetic for the
> de_DE locale, but not for C?

In glibc, all "real" locales are based on
/usr/share/locale/i18n/locales/i18n, e.g. for de_DE through

LC_CTYPE
copy "i18n"

i18n includes U+0430 as a character, through

lower /
...
% TABLE 11 CYRILLIC/
    <U0430>..<U045F>;<U0461>..(2)..<U047F>;/

This makes U+0430 a letter in i18n itself and in every locale that
copies i18n (unless locally overridden). The entire approach appears
to be based on ISO 14652, which introduces the "i18n" LC_CTYPE
category in section 4.3.3.
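
To compare the two locales from the question on a given system, a
minimal sketch follows. It assumes Python 3 with ctypes, a POSIX C
library that exports iswalpha, and that wint_t is an unsigned int;
is_alpha_in is a hypothetical helper name, not part of any library.
If de_DE is not installed, the helper returns None for it. Results
depend on the C library and its version, so none are shown here.

```python
import ctypes
import ctypes.util
import locale

# Load the C library. If find_library() returns None, CDLL(None) uses
# the symbols already loaded into the running process (POSIX only).
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.iswalpha.argtypes = [ctypes.c_uint]  # wint_t, assumed unsigned int
_libc.iswalpha.restype = ctypes.c_int


def is_alpha_in(locale_name, ch):
    """Hypothetical helper: result of iswalpha(ch) with LC_CTYPE set
    to locale_name, or None if that locale cannot be set."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
    except locale.Error:
        return None
    try:
        return bool(_libc.iswalpha(ord(ch)))
    finally:
        # Restore the previous LC_CTYPE so the caller is unaffected.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for name in ("C", "de_DE"):
        print(name, is_alpha_in(name, "\u0430"))
```

The check goes through the C library directly because Python's own
str.isalpha uses the Unicode database and ignores the locale.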

Why the C locale does not use i18n, I don't know. Most likely,
the intention is that the "C" locale works without any
additional data files - you should ask the glibc developers.
OTOH, there is a definition file POSIX for what appears
to be the POSIX locale.

I'd like to point out that this implementation is potentially
in violation of ISO 14652; annex A.2.2 says that the notion
of a POSIX locale is replaced with the i18n FDCC-set.
Accordingly, I would expect the POSIX locale to use i18n as
well; you can see for yourself that it does not in glibc 2.3.2.

Again, I suggest asking the glibc developers why
this is so.

Regards,
Martin


