Opening and reading randomly text file with umlauts and accents truncates output - unicode how?

Piet van Oostrum piet at cs.uu.nl
Fri Feb 1 06:33:17 EST 2002


>>>>> synthespian <synthespian at uol.com.br> (S) writes:

S> das Appartement, Appartements
S> das Auge, Augen
S> das Bad, Bäder
S> das Bein, Beine
S> das Beispiel, Beispiele
S> das Buch, Bücher
S> das Büro, Büros
S> das Café, Cafés

S> p = re.compile('^(der|die|das(\s\w+))')

If \w doesn't match the accented letters, then your locale probably isn't
set properly. Note that \w only follows the locale when the pattern is
compiled with the re.LOCALE flag.
A simple locale-independent solution would be to enumerate the characters
yourself, like [A-Za-zäöüé...]
Or just use the fact that the words are terminated by a comma (or maybe
a space):

p = re.compile(r'^(der|die|das(\s[^ ,]+))')
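
To make the two locale-independent options concrete, here is a minimal
sketch run against a few of the sample lines from the original post. Some
parts of it are my assumptions, not something taken from the thread: the
article is put in its own group, (der|die|das), so that the noun is
captured for all three articles (in the pattern above the noun group is
attached only to "das"); the letter set in the explicit class is only a
guess at what the data contains; and the helper first_noun is a
hypothetical name used for illustration.

```python
import re

# Sample lines from the original post (article + singular, plural).
lines = [
    "das Bad, Bäder",
    "das Büro, Büros",
    "das Café, Cafés",
]

# Option 1: enumerate the letters explicitly.
# The letter set is an assumption; extend it to cover your data.
explicit = re.compile(r'^(der|die|das)\s([A-Za-zÄÖÜäöüßé]+)')

# Option 2: take everything up to the next space or comma.
delimited = re.compile(r'^(der|die|das)\s([^ ,]+)')

def first_noun(pattern, line):
    """Return the singular noun after the article, or None if no match."""
    m = pattern.match(line)
    return m.group(2) if m else None

nouns_explicit = [first_noun(explicit, s) for s in lines]
nouns_delimited = [first_noun(delimited, s) for s in lines]

print(nouns_explicit)
print(nouns_delimited)
```

The second pattern is the more forgiving of the two: it keeps working when
a word contains a letter that was left out of the explicit class, at the
price of also accepting digits and punctuation other than the comma.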

-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van.Oostrum at hccnet.nl


