[New-bugs-announce] [issue24863] Incoherent behavior with umlaut in regular expressions
Christian Klein
report at bugs.python.org
Fri Aug 14 09:07:42 CEST 2015
New submission from Christian Klein:
The Python 2.7 re module does not seem to agree with itself on what counts as a word character:
import re
s = u'f\xfc'
print re.sub('\W', '*', s, re.UNICODE)
print re.findall('\w', s, re.UNICODE)
The re.sub call replaces the character u'ü' with '*', which implies that it is treated as a non-word character (\W).
But re.findall then returns it as a word character (\w).
Python 3.4 and Python 3.5 behave correctly, or at least coherently.
(Unfortunately, they are not an option on Google App Engine.)
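
A possible explanation, offered as a sketch and not as a confirmed diagnosis: the documented signature is re.sub(pattern, repl, string, count=0, flags=0), so a fourth positional argument is taken as count, while re.findall(pattern, string, flags=0) takes flags in that position. If so, the re.sub call above runs with count=32 and no flags at all. The snippet below compares the two ways of calling re.sub. The variable names are illustrative, and the result of the first call depends on the interpreter: on Python 2.7 it should match the 'f*' seen in the report, while on Python 3 str patterns are Unicode-aware by default.

```python
import re

# re.UNICODE is an integer-valued flag; passed positionally as the
# fourth argument of re.sub it lands in the count parameter.
unicode_flag_value = int(re.UNICODE)

s = u'f\xfc'

# Equivalent to the call in the report: a count, no flags.
as_count = re.sub(r'\W', '*', s, count=unicode_flag_value)

# The flag passed by keyword, so it is really used as a flag.
as_flag = re.sub(r'\W', '*', s, flags=re.UNICODE)

# count only caps the number of replacements.
capped = re.sub(r'a', '*', 'a' * 40, count=unicode_flag_value)

print(repr(as_count))
print(repr(as_flag))
print(capped.count('*'))
```

With flags=re.UNICODE passed by keyword, re.sub and re.findall should agree that u'ü' is a word character.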
----------
components: Regular Expressions
messages: 248560
nosy: cklein, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: Incoherent behavior with umlaut in regular expressions
type: behavior
versions: Python 2.7
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24863>
_______________________________________