[New-bugs-announce] [issue24863] Incoherent behavior with umlaut in regular expressions

Fri Aug 14 09:07:42 CEST 2015

New submission from Christian Klein:

The Python 2.7 re module does not seem to agree with itself on what counts as a word character:

import re
s = u'f\xfc'  # 'f' followed by u-umlaut (U+00FC)
print re.sub(r'\W', '*', s, re.UNICODE)   # prints: f*
print re.findall(r'\w', s, re.UNICODE)    # prints: [u'f', u'\xfc']

Applying re.sub replaces the character u'ü' with '*', which implies that it is treated as a non-word character (\W).
But re.findall then returns the same character as a word character (\w).
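
One possible cause worth ruling out, offered as a sketch rather than a confirmed diagnosis: the documented signatures are re.sub(pattern, repl, string, count=0, flags=0) and re.findall(pattern, string, flags=0). A flag passed as the fourth positional argument to re.sub would therefore be read as count, not as flags. The snippet below passes the flag by keyword in both calls; the variable names are illustrative only, and the code is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function
import re

s = u'f\xfc'  # 'f' followed by u-umlaut (U+00FC)

# The fourth positional parameter of re.sub is count. re.UNICODE has the
# integer value 32, so passing it there would only cap the number of
# replacements at 32 and would not enable Unicode matching.
count_value = int(re.UNICODE)

# Passing the flag by keyword makes both calls use Unicode-aware \w and \W.
sub_result = re.sub(r'\W', u'*', s, flags=re.UNICODE)
findall_result = re.findall(r'\w', s, flags=re.UNICODE)

print(count_value)           # 32
print(repr(sub_result))      # the string is returned unchanged
print(findall_result)        # both characters are reported as word characters
```

With the keyword form, the two calls agree that u'ü' is a word character.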

Python 3.4 and Python 3.5 behave correctly, or at least consistently.
(Unfortunately, neither is an option on Google App Engine.)

----------
components: Regular Expressions
messages: 248560
nosy: cklein, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: Incoherent behavior with umlaut in regular expressions
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24863>
_______________________________________