Python and UTF-8

Matthias Huening matthias.huening at t-online.de
Thu Jan 3 14:41:24 EST 2002


Martin von Loewis <loewis at informatik.hu-berlin.de> wrote in 
news:j4itajb9jx.fsf at informatik.hu-berlin.de:

>> How to use regular expressions with Unicode?
> 
> Just use the re module: it fully supports Unicode.
> 

Not really...
At least the combination of re.I and re.U fails on German text.
But that again could be a result of combining 'locale' and
Unicode, right?
I tried this (Win 98, Python 2.1, IDLE):

----------------------
>>> import locale
>>> locale.setlocale(locale.LC_ALL,"")
'German_Germany.1252'
>>> t = 'Mühsam ernährt sich das Eichhörnchen.'
>>> print t.upper()
MÜHSAM ERNÄHRT SICH DAS EICHHÖRNCHEN.
>>> tu = unicode(t, 'latin-1').encode('utf-8')
>>> print tu.upper()
MüHSAM ERNäHRT SICH DAS EICHHöRNCHEN.
>>> 
----------------------

This should work, I think. But it doesn't.
Did I miss something? 
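Here is a minimal sketch of what is probably happening. It is written for Python 3 rather than the Python 2.1 session above, so it illustrates the mechanism and does not reproduce that session. `tu` is a UTF-8 encoded byte string, and uppercasing bytes cannot tell that a pair such as 0xC3 0xBC is the single character ü. In Python 2, str.upper() works byte by byte under the locale, and the second UTF-8 byte of ü, ä and ö (0xBC, 0xA4, 0xB6) is not a letter in cp1252, so nothing changes. In Python 3, bytes.upper() only touches ASCII letters, which gives the same visible result. Uppercasing or matching the *decoded* text uses Unicode case mappings instead:

```python
import re

t = 'Mühsam ernährt sich das Eichhörnchen.'

# Encoding to UTF-8 gives bytes; bytes.upper() converts only ASCII letters,
# so the two-byte sequences for ü, ä and ö pass through unchanged.
raw = t.encode('utf-8')
raw_upper = raw.upper()

# Uppercasing the text itself uses the Unicode case mappings.
text_upper = t.upper()

# For str patterns in Python 3, Unicode matching (re.UNICODE) is the default,
# so re.IGNORECASE also folds Ö/ö.
match = re.search('EICHHÖRNCHEN', t, re.IGNORECASE)

print(raw_upper.decode('utf-8'))  # MüHSAM ERNäHRT SICH DAS EICHHöRNCHEN.
print(text_upper)                 # MÜHSAM ERNÄHRT SICH DAS EICHHÖRNCHEN.
print(match.group(0))             # Eichhörnchen
```

In Python 2 terms, the equivalent would presumably be to keep working on `unicode(t, 'latin-1')` for `upper()` and for `re` matching with `re.I | re.U`, and to call `.encode('utf-8')` only when writing the result out.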

Matthias




More information about the Python-list mailing list