real-life example LC_CTYPE effects?

random832 at fastmail.us random832 at fastmail.us
Tue Oct 21 08:40:52 EDT 2014


On Mon, Oct 20, 2014, at 16:33, Albert-Jan Roskam wrote:
> Hi,
> 
> The locale category LC_CTYPE may affect character classification and case
> conversion.
> 
> That's the theory. Can you give a practical example where this locale
> setting matters? Eg.:
> locale.setlocale(locale.LC_CTYPE, loc)
> m = re.match("\d+", s, re.I | re.L)
> 
> So with two different values for loc, and s is identical, there will or
> won't be a match.

Using unicode strings generally isolates you from this: only a few
unicode characters have case mappings that differ between languages.
LC_CTYPE was designed in an era of 8-bit character sets, where the
meaning of each byte depends on the locale's character set. For
example, in a Russian locale with the KOI8-R character set, C0-DF are
all lowercase letters and E0-FF are the uppercase equivalents. In an
English or other Western European locale with ISO-8859-1, C0-DE
[except D7] are uppercase letters with their lowercase versions in
E0-FE [except F7]; DF and FF are lowercase letters with no uppercase
counterpart in that character set. In a Hebrew ISO-8859-8 locale, only
E0-FA are letters, and they have no case at all.
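
To see the byte ranges above without installing any locales, here is a
minimal sketch (Python 3). It decodes a single byte with each character
set and asks Python's unicode database about the result. That is only a
stand-in for what the C library would report under LC_CTYPE, and the
helper name classify is made up for illustration.

```python
def classify(byte_value, encoding):
    """Describe how one byte reads under a given 8-bit character set."""
    # Decode the single byte, then use unicode properties as a proxy
    # for the locale's isupper()/islower() classification.
    ch = bytes([byte_value]).decode(encoding)
    if ch.isupper():
        return "upper"
    if ch.islower():
        return "lower"
    if ch.isalpha():
        return "letter, no case"
    return "not a letter"


if __name__ == "__main__":
    for encoding in ("koi8_r", "latin-1", "iso8859_8"):
        # 0xE0 is defined in all three character sets.
        print(encoding, hex(0xE0), classify(0xE0, encoding))
```

The same byte value E0 comes out as an uppercase letter, a lowercase
letter, or a caseless letter depending on the character set, which is
the kind of difference LC_CTYPE exists to describe.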

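Try setting the locale to tr_TR and matching "i" against "I"
case-insensitively, for a demonstration of one of the few remaining
effects this can have: in Turkish, the lowercase of "I" is the dotless
"ı" and the uppercase of "i" is the dotted "İ".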
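
A rough sketch of that experiment (Python 3, where re.L is only accepted
with bytes patterns). The helper name i_matches_I and the locale name
"tr_TR.ISO8859-9" are assumptions; the name your system uses may differ,
the locale may not be installed at all, and the outcome depends on the
C library, so no result is claimed here for the Turkish case.

```python
import locale
import re


def i_matches_I(locale_name):
    """Return whether b"i" matches b"I" with re.I | re.L under the
    given LC_CTYPE locale, or None if that locale is unavailable."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
    except locale.Error:
        return None  # locale not installed on this system
    try:
        return re.match(b"i", b"I", re.I | re.L) is not None
    finally:
        # setlocale changes process-wide state, so put it back.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for name in ("C", "tr_TR.ISO8859-9"):  # second name is an assumption
        print(name, i_matches_I(name))
```

In the "C" locale the match succeeds; if the Turkish locale is available
and the result differs, that difference is the LC_CTYPE effect.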


