No explanation for weird behavior in re module!
synthespian
synthespian at uol.com.br
Sun Feb 10 19:42:32 EST 2002
Hi-
I'm really intrigued by this behavior:
>>> import re
>>> p = re.compile(r'^(der|die|das(\s\w+))')
>>> m = p.match('die Tür, Türen')
>>> n = p.match('das Auto, Autos')
>>> m.group(0)
'die'
>>> m.group(1)
'die'
>>> m.group(2)
[nothing!!!!]
>>> n.group(0)
'das Auto'
>>> n.group(1)
'das Auto'
>>> n.group(2)
' Auto'
I'm using Python 2.0 on a Debian potato system.
Why didn't m.group(2) produce 'Tür' as the output???
Python 2.0 is supposed to have Unicode support built into the re module, right?
Other than the fact that 'Tür' has the 'ü' Unicode character, I fail to see any difference!
I've even tried "import sre", but that didn't do it either... It's too bad this isn't working, because it's a better way to work with regexes than Perl...
What's going on here? Am I the problem, not knowing how to make Python understand the umlaut
character (the 'ü')? Or is it a *bug*?!
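For comparison, here is a minimal sketch of one possible explanation, assuming a current Python 3 interpreter rather than the Python 2.0 setup described above. In the pattern as written, the (\s\w+) group sits inside the 'das' alternative only, so it takes no part in a match on 'der' or 'die', umlaut or not. The name `regrouped` and its pattern are hypothetical, added only to show the contrast.

```python
import re

# Pattern from the post: the alternatives are 'der', 'die' and 'das(\s\w+)',
# so group 2 can only ever be filled when the 'das' branch matches.
original = re.compile(r'^(der|die|das(\s\w+))')

# Hypothetical variant: a non-capturing group holds the three articles,
# and the following word is captured after any of them.
regrouped = re.compile(r'^((?:der|die|das)(\s\w+))')

m = original.match('die Tür, Türen')
m_group1 = m.group(1)   # the article alone
m_group2 = m.group(2)   # None: group 2 did not participate in the match

r = regrouped.match('die Tür, Türen')
r_group1 = r.group(1)   # article plus the following word
r_group2 = r.group(2)   # the following word, including the leading space

print(repr(m_group1), repr(m_group2))
print(repr(r_group1), repr(r_group2))
```

Because \s is inside the second group, the captured word keeps its leading space. Moving \s outside the parentheses would capture the bare word instead.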
Please help!
TIA,
best regards,
H