[Python-Dev] str.ascii_lower

Jeff Epler jepler at unpythonic.net
Mon Dec 29 13:09:27 EST 2003


I stand corrected about the behavior of Unicode in the presence of
locales.

On Mon, Dec 29, 2003 at 09:47:39AM -0800, Guido van Rossum wrote:
> > >>> locale.setlocale(locale.LC_CTYPE, "tr_TR.UTF-8")
> > 'tr_TR.UTF-8'
> > >>> "I".lower()   # C library bug? (should be "\xc4\xb1")*
> > 'I'
> > >>> locale.setlocale(locale.LC_CTYPE, "en_US.UTF-8")
> > 'en_US.UTF-8'
> > >>> "I".lower()   # (UTF-8 locale works properly in english)
> > 'i'
> 
> I have no idea what adding UTF8 to the locale means.  Is this something
> that Python's locale-awareness does or is it simply recognized by the
> C library?

"A locale name is typically of the form language[_territory]
[.code-set][@modifier]" -- man setlocale() on my system
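
To make that naming form concrete, here is a minimal sketch that splits a
locale name into its four parts.  The helper split_locale_name and its
regular expression are hypothetical and written only for illustration;
they are not part of the locale module, and real C libraries may accept
names this pattern rejects.

```python
import re

# Hypothetical pattern for language[_territory][.code-set][@modifier].
# This is an illustration only, not how any C library parses the name.
_LOCALE_NAME = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def split_locale_name(name):
    """Return a dict of the four parts; parts that are absent map to None."""
    match = _LOCALE_NAME.match(name)
    if match is None:
        raise ValueError("unrecognized locale name: %r" % (name,))
    return match.groupdict()


turkish = split_locale_name("tr_TR.UTF-8")
print(turkish["language"], turkish["territory"], turkish["codeset"])
```

Under this reading, the ".UTF-8" suffix is just the code-set part of the
name that is handed to the C library.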

RedHat 9 made a halfhearted attempt to use UTF-8 as the encoding for all
locales.  So it sets LANG=en_US.UTF-8 by default.  In theory,
tr_TR.UTF-8 should be the Turkish locale with UTF-8 encoding, but it
behaves incorrectly by having "I".lower() == "I".
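
For reference, a small sketch of the expected Turkish result.  It assumes
Python 3, where str is Unicode and str.lower() does not consult LC_CTYPE,
so it does not reproduce the byte-string session quoted above.  The helper
turkish_lower is hypothetical and only covers the two Turkish-specific
capital letters.

```python
# Assumption: Python 3, where str.lower() is independent of the C locale.
DOTLESS_I = "\u0131"       # LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# The UTF-8 encoding of the dotless i is the "\xc4\xb1" from the transcript.
utf8_bytes = DOTLESS_I.encode("utf-8")

# Without any Turkish-specific handling, "I" lowercases to plain "i".
default_lower = "I".lower()


def turkish_lower(text):
    """Hypothetical helper: lowercase using the Turkish I/i pairing."""
    # Handle the two Turkish-specific capitals first, then defer to
    # str.lower() for every other character.
    text = text.replace("I", DOTLESS_I).replace(DOTTED_CAPITAL_I, "i")
    return text.lower()


print(utf8_bytes, default_lower, turkish_lower("ISPARTA"))
```

This is only meant to show which character a Turkish-aware lower() would
be expected to produce for "I"; it says nothing about what the C library
does for tr_TR.UTF-8.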

Well, since my earlier post combined a misunderstanding of how Python
works with a possible C library bug, I guess I raised two non-issues.
Sorry for wasting everyone's time.

Jeff


