Locale case change not working
Peter Otten
__peter__ at web.de
Thu May 24 05:40:28 EDT 2007
Clodoaldo wrote:
> When using unicode the case change works:
>
>>>> print u'É'.lower()
> é
>
> But when using the pt_BR.utf-8 locale it doesn't:
>
>>>> locale.setlocale(locale.LC_ALL, 'pt_BR.utf-8')
> 'pt_BR.utf-8'
>>>> locale.getlocale()
> ('pt_BR', 'utf')
>>>> print 'É'.lower()
> É
>
> What am I missing? I'm in Fedora Core 5 and Python 2.4.3.
>
> # cat /etc/sysconfig/i18n
> LANG="en_US.UTF-8"
> SYSFONT="latarcyrheb-sun16"
>
> Regards, Clodoaldo Pinto Neto
str.lower() works on one byte at a time, so it cannot properly handle
encodings in which a character spans several bytes (such as utf-8):
>>> u"É".encode("utf8")
'\xc3\x89'
>>> u"É".encode("latin1")
'\xc9'
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "de_DE.utf8")
'de_DE.utf8'
>>> print unicode("\xc3\x89".lower(), "utf8")
É
>>> locale.setlocale(locale.LC_ALL, "de_DE.latin1")
'de_DE.latin1'
>>> print unicode("\xc9".lower(), "latin1")
é
I recommend that you forget about byte strings and use unicode throughout.
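A minimal sketch of that approach, assuming the data arrives as utf-8
encoded bytes: decode once on input, do the case change on the unicode
object, and encode again only on output. The helper name lower_utf8 is
made up for illustration and does not exist in any library.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not from the thread): lower-case utf-8 encoded bytes
# by going through unicode instead of calling lower() on the bytes.

def lower_utf8(raw):
    text = raw.decode("utf-8")      # bytes -> unicode, at the input boundary
    lowered = text.lower()          # case mapping on characters, not bytes
    return lowered.encode("utf-8")  # unicode -> bytes, at the output boundary

raw = u"\u00c9".encode("utf-8")     # the two bytes 0xC3 0x89 ("É" in utf-8)
result = lower_utf8(raw)
assert result == u"\u00e9".encode("utf-8")   # "é" in utf-8
```

Done this way, the result should not depend on the locale setting, since
the unicode lower() method does not consult it.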
Peter