[Python-bugs-list] [ python-Bugs-690974 ] re.LOCALE, umlaut and \w

SourceForge.net noreply@sourceforge.net
Sat, 19 Apr 2003 01:14:31 -0700


Bugs item #690974, was opened at 2003-02-22 01:06
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=690974&group_id=5470

Category: Regular Expressions
Group: Python 2.3
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: peter nordlund (peterno)
Assigned to: Fredrik Lundh (effbot)
Summary: re.LOCALE, umlaut and \w

Initial Comment:
I am submitting this problem although I am not sure it is a real
bug. It could be that I don't know how this locale stuff works.

Anyway, I have spent quite some time browsing the net for good
examples of code demonstrating how to use regexps in Python to
match åäö with \w, but I have not found any complete examples.

If the code below behaves correctly, I suggest that the regexp
documentation be improved by adding a complete example that shows
how to use re.LOCALE.
(The code behaves in the same way with Python 2.2.2.)

#----------------------------------------
import locale
locale.setlocale(locale.LC_ALL,'swedish')
import re
# I expect reguml and regw to give the same result.
reguml=re.compile(r"[a-zä]", re.LOCALE)
regw=re.compile(r"\w", re.LOCALE)
# I expect reguml2 and regw2 to give the same result.
reguml2=re.compile(r"[a-zä]+", re.LOCALE)
regw2=re.compile(r"[\w]+", re.LOCALE)
str="abcä d\344e ä f "

print reguml.findall(str) # Behaves as I expect.
# Here I expect same result as above, but I don't get it.
print regw.findall(str)
print reguml2.findall(str) # Behaves as I expect.
print regw2.findall(str) # Behaves as I expect.
#----------------------------------------



>>> import locale
>>> locale.setlocale(locale.LC_ALL,'swedish')
'swedish'
>>> import re
>>> reguml=re.compile(r"[a-zä]", re.LOCALE) # I expect reguml and regw to give the same result.
>>> regw=re.compile(r"\w", re.LOCALE)
>>> reguml2=re.compile(r"[a-zä]+", re.LOCALE) # I expect reguml2 and regw2 to give the same result.
>>> regw2=re.compile(r"[\w]+", re.LOCALE)
>>> str="abcä d\344e ä f ";
>>>
>>> print reguml.findall(str) # Behaves as I expect.
['a', 'b', 'c', '\xe4', 'd', '\xe4', 'e', '\xe4', 'f']
>>> print regw.findall(str) # Here I expect same result as above, but I don't get it.
['a', 'b', 'c', 'd', 'e', 'f']
>>> print reguml2.findall(str) # Behaves as I expect.
['abc\xe4', 'd\xe4e', '\xe4', 'f']
>>> print regw2.findall(str) # Behaves as I expect.
['abc\xe4', 'd\xe4e', '\xe4', 'f']
---------------------------------------------------------
peternl:Python-2.3a2>> /work1/pkg/dev-tools/python/2.3a2/bin/python -V
Python 2.3a2
peternl:Python-2.3a2>> uname -a
Linux peternl.computervision.se 2.4.18-6mdk-petern #2 Thu May 23 06:40:30 CEST 2002 i686 unknown
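
A complete re.LOCALE example of the kind requested above might look
like the following on a current Python 3, where the flag is only
accepted for bytes patterns. This is a sketch under assumptions: the
report itself concerns Python 2.2/2.3, the helper name
locale_word_chars is hypothetical, and the candidate locale names may
not exist on a given system, in which case the helper returns None.

```python
import locale
import re

# Candidate names for an 8-bit Swedish locale. Which of these exist (if any)
# depends on the operating system, so they are assumptions.
CANDIDATES = ("sv_SE.ISO8859-1", "sv_SE.iso88591", "swedish")


def locale_word_chars(data):
    """Find single word characters in bytes under a Swedish locale.

    Returns None when none of the candidate locales can be set.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in CANDIDATES:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                break
            except locale.Error:
                continue
        else:
            return None  # no candidate locale is installed
        # With re.LOCALE, \w on bytes follows the C library's idea of
        # alphanumeric characters for the locale in effect when matching.
        return re.findall(rb"\w", data, re.LOCALE)
    finally:
        # Restore whatever locale was active before the call.
        locale.setlocale(locale.LC_CTYPE, saved)


sample = b"abc\xe4 d\xe4e \xe4 f "  # same bytes as the test string above
print(locale_word_chars(sample))
```

Whether the byte \xe4 shows up in the result depends on the locale
that was actually selected, so no output is claimed here.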


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2003-04-19 10:14

Message:
Logged In: YES 
user_id=21627

This has been fixed with Greg's patch.

----------------------------------------------------------------------

Comment By: Greg Chapman (glchapman)
Date: 2003-02-22 18:15

Message:
Logged In: YES 
user_id=86307

I believe this is fixed by this patch:

   http://www.python.org/sf/633359

At any rate, using a patched 2.2.2, regw behaves identically to reguml.
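
The expected behaviour, \w matching the same characters as [a-zä],
can also be checked without any locale on a current Python 3, where
\w is Unicode-aware for str patterns by default. This is only a
comparison sketch and an assumption on my part, since the thread is
about Python 2.2/2.3; the variable names are illustrative. The
re.ASCII case mirrors the output originally reported for regw.

```python
import re

# Same test string as in the report, written with escapes.
text = "abc\xe4 d\xe4e \xe4 f "

# Explicit character class, as in reguml.
explicit = re.findall(r"[a-z\xe4]", text)

# \w on a str pattern is Unicode-aware by default in Python 3.
word = re.findall(r"\w", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the umlaut is skipped.
ascii_only = re.findall(r"\w", text, re.ASCII)

# Runs of word characters, as in regw2.
runs = re.findall(r"\w+", text)

print(explicit == word)
print(ascii_only)
print(runs)
```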


----------------------------------------------------------------------
