[New-bugs-announce] [issue26784] regular expression problem at umlaut handling

Marcus report at bugs.python.org
Sat Apr 16 12:48:11 EDT 2016


New submission from Marcus:

Matching the example string "E-112233-555-11 | Bläh - Bläh" with the following code leads to an exception under Python 2.7.10 (OS X), whereas the same code works under Python 3.5.1 (OS X).

import re

s = "E-112233-555-11 | Bläh - Bläh"

expr = re.compile(r"(?P<p>[A-Z]{1}-[0-9]{0,}(-[0-9]{0,}(-[0-9]{0,})?)?)?(( [|] )?(?P<a>[\s\w]*)?)? - (?P<j>[\s\w]*)?", re.UNICODE)
res = expr.match(s)
a = (res.group('p'), res.group('a'), res.group('j'))
print(a)


When I change the first umlaut in "Bläh" from ä to ü, it works as expected on Python 2 and 3. Changing ä to ö, however, leads to a crash again.

Ideas?
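
One possible explanation, offered as an assumption rather than a confirmed diagnosis: under Python 2 the literal is a byte string, so each umlaut arrives as two UTF-8 bytes, and with re.UNICODE each byte may be classified as if it were a code point in the range 0-255. The second byte of ä (0xA4) and of ö (0xB6) would then not count as \w, while the second byte of ü (0xBC) would. The sketch below runs on Python 3 and only mimics that byte-wise view; the helper names groups_or_none and as_py2_bytes_view are hypothetical and do not appear in the report.

```python
import re

# Pattern from the report, split over two lines for readability only.
EXPR = re.compile(
    r"(?P<p>[A-Z]{1}-[0-9]{0,}(-[0-9]{0,}(-[0-9]{0,})?)?)?"
    r"(( [|] )?(?P<a>[\s\w]*)?)? - (?P<j>[\s\w]*)?",
    re.UNICODE,
)


def groups_or_none(text):
    """Return the (p, a, j) groups, or None when the pattern does not match."""
    res = EXPR.match(text)
    if res is None:
        return None
    return (res.group("p"), res.group("a"), res.group("j"))


def as_py2_bytes_view(text):
    """Hypothetical helper: turn each UTF-8 byte into one code point (0-255).

    This is an assumption about how a Python 2 byte string looks to the
    regex engine, not a statement about its actual implementation.
    """
    return text.encode("utf-8").decode("latin-1")


# Real text, as Python 3 sees it: the pattern matches.
native = groups_or_none("E-112233-555-11 | Bläh - Bläh")

# Byte-wise view, varying only the first umlaut as described in the report.
results = {
    vowel: groups_or_none(
        as_py2_bytes_view("E-112233-555-11 | Bl%sh - Bläh" % vowel)
    )
    for vowel in "äüö"
}
```

In this sketch the entries for ä and ö are None, and calling .group() on None raises AttributeError, which would be consistent with the reported exception. If the assumption holds, passing a unicode literal (u"...") on Python 2 should avoid the problem.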

----------
messages: 263567
nosy: arbyter
priority: normal
severity: normal
status: open
title: regular expression problem at umlaut handling
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26784>
_______________________________________

