[Python-bugs-list] [ python-Bugs-610299 ] unicode alphanumeric regexp bug

noreply@sourceforge.net noreply@sourceforge.net
Mon, 16 Sep 2002 18:18:04 -0700


Bugs item #610299, was opened at 2002-09-17 03:18
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=610299&group_id=5470

Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Florent Guillaume (efge)
Assigned to: Fredrik Lundh (effbot)
Summary: unicode alphanumeric regexp bug

Initial Comment:
I see the following problem in Python 2.1, 2.2 and
2.3a0 (Debian):

>>> import re
>>> re.compile(r'\w+', re.U).sub('X', u'hello caf\xe9')
u'X X'
>>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXXX'
>>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXX\xe9'

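The first two results are correct, but the third is not: the
trailing \xe9 is left unreplaced, although \w with re.U should
match it just as \w{1} does.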
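
For anyone wanting to re-check this on another interpreter, here is a
minimal sketch of the three substitutions above wrapped in a small
consistency check. It is not part of the original report: the helper
name word_substitutions is hypothetical, and the code assumes Python 3,
where str patterns are already Unicode and re.UNICODE is redundant but
harmless. The only claim it tests is that \w{1} and \w, which describe
the same single-character match, should give identical results.

```python
import re


def word_substitutions(text, repl="X"):
    """Apply the three patterns from the report; return results in order.

    Hypothetical helper for illustration, assuming Python 3 str input.
    """
    patterns = [r"\w+", r"\w{1}", r"\w"]
    return [re.compile(p, re.UNICODE).sub(repl, text) for p in patterns]


results = word_substitutions("hello caf\xe9")

# \w{1} and \w should replace exactly the same characters.
consistent = results[1] == results[2]
print(results, consistent)
```

On an interpreter showing the behaviour reported above, the second and
third entries would differ and the check would come out False.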

