[Python-bugs-list] [ python-Bugs-610299 ] unicode alphanumeric regexp bug

noreply@sourceforge.net
Mon, 04 Nov 2002 08:51:35 -0800


Bugs item #610299, was opened at 2002-09-16 17:18
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=610299&group_id=5470

Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Florent Guillaume (efge)
Assigned to: Fredrik Lundh (effbot)
Summary: unicode alphanumeric regexp bug

Initial Comment:
I've got the following problem in Python 2.1, 2.2 and
2.3a0 (Debian):

>>> import re
>>> re.compile(r'\w+', re.U).sub('X', u'hello caf\xe9')
u'X X'
>>> re.compile(r'\w{1}', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXXX'
>>> re.compile(r'\w', re.U).sub('X', u'hello caf\xe9')
u'XXXXX XXX\xe9'

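The first two results are correct, but the third is not: the
final character \xe9 is left unreplaced, although with re.U it
should match \w just as it does in the first two cases.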
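
For comparison, here is a minimal sketch of the behaviour the report
expects, written for Python 3 rather than the Python 2 versions named
above (that is an assumption of this example, not part of the original
report). In Python 3, str patterns are Unicode-aware by default, so
re.U is implied; the variable names are illustrative only. The
re.ASCII case is included because it yields the same string as the
buggy third result, where \xe9 is not treated as a word character.

```python
import re

text = "hello caf\xe9"

# All three patterns should replace every word character, including \xe9.
patterns = [r"\w+", r"\w{1}", r"\w"]
results = {p: re.sub(p, "X", text) for p in patterns}

# With re.ASCII, \w only matches [a-zA-Z0-9_], so \xe9 is left untouched.
ascii_only = re.sub(r"\w", "X", text, flags=re.ASCII)

for pattern, replaced in results.items():
    print(pattern, repr(replaced))
print("ASCII", repr(ascii_only))
```

If the single-character pattern and the {1} form give the same string,
the two are being compiled consistently, which is what the report asks
for.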


----------------------------------------------------------------------

Comment By: Greg Chapman (glchapman)
Date: 2002-11-04 07:51

Message:

I just posted a small patch to sre_compile.py which should fix this:

http://sourceforge.net/tracker/?func=detail&aid=633359&group_id=5470&atid=305470
