UTF-8 / German, Scandinavian letters - is it really this difficult?? Linux & Windows XP

Serge Orlov Serge.Orlov at gmail.com
Tue Feb 22 06:00:21 EST 2005


Mike Dee wrote:
> [snip wrestling with byte strings]

In addition to Martin's reply, I just want to add two notes:
1. The interactive console in Python 2.3 has a bug that was fixed
in 2.4, so in 2.3 you can't enter unicode literals at the prompt:

C:\Python24>python.exe
>>> a=u'абв'
>>> a
u'\u0430\u0431\u0432'

C:\Python23>python.exe
>>> a=u'абв'
>>> a
u'\xa0\xa1\xa2'

In 2.3 you need to use the decode method to get unicode strings:
>>> import sys
>>> a2='абв'.decode(sys.stdin.encoding)
>>> a2
u'\u0430\u0431\u0432'
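
A minimal sketch of what probably goes wrong in 2.3, assuming the console code page is cp866 (my assumption, it is not stated above; the names raw, wrong and right are only illustrative). In cp866 the letters 'абв' are the bytes a0 a1 a2, which matches the 2.3 output if each byte is taken as a code point instead of being decoded:

```python
# Bytes the console would deliver for 'абв', assuming code page cp866.
raw = u'\u0430\u0431\u0432'.encode('cp866')

# Treating each byte as a code point (equivalent to a Latin-1 decode)
# reproduces the wrong result shown for 2.3.
wrong = raw.decode('latin-1')

# Decoding with the actual console encoding gives the intended string.
right = raw.decode('cp866')

print(repr(wrong))
print(repr(right))
```

In a real session sys.stdin.encoding would take the place of the hard-coded 'cp866'.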

2. SUSE ships a buggy build of Python, so title() doesn't work
properly on byte strings; see the discussion at http://tinyurl.com/4k3au

>>> print aoumlautxyz.title()
ÄÖXyz

You will need to call setlocale first to get the right result:

>>> import locale
>>> locale.setlocale(locale.LC_ALL,'')
'en_US.utf-8'
>>> print aoumlautxyz.title()
Äöxyz
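
As an alternative that should not depend on the locale at all, here is a sketch that decodes the bytes to unicode before calling title(). It assumes the data is UTF-8 encoded; the names raw, titled and encoded are hypothetical, and the b'' prefix needs a newer Python than 2.3:

```python
# UTF-8 bytes for 'äöxyz' (assumed input encoding).
raw = b'\xc3\xa4\xc3\xb6xyz'

# Decode first, so title() works on characters rather than bytes.
titled = raw.decode('utf-8').title()

# Encode again if a byte string is needed for output.
encoded = titled.encode('utf-8')

print(repr(titled))
```

Here title() uses Unicode case rules, so only the first letter is upper-cased, the same as the result with setlocale above.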

  Serge.



