[Python-bugs-list] [ python-Bugs-690974 ] re.LOCALE, umlaut and \w

SourceForge.net noreply@sourceforge.net
Fri, 21 Feb 2003 16:06:40 -0800


Bugs item #690974, was opened at 2003-02-22 00:06
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=690974&group_id=5470

Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: peter nordlund (peterno)
Assigned to: Fredrik Lundh (effbot)
Summary: re.LOCALE, umlaut and \w

Initial Comment:
I submit this problem although I am not sure it is
a real bug. It could be that I don't know how this
locale stuff works.

Anyway, I have spent quite some time browsing the net for good
examples of code demonstrating how to use regexps in Python to match
åäö with \w, but I have not found any complete examples.

If the code below behaves correctly, I suggest that the regexp
documentation be improved by adding a complete example that shows how
to use re.LOCALE.
(The code behaves the same way with Python 2.2.2.)

#----------------------------------------
import locale
locale.setlocale(locale.LC_ALL,'swedish')
import re
# I expect reguml and regw to give the same result.
reguml=re.compile(r"[a-zä]", re.LOCALE)
regw=re.compile(r"\w", re.LOCALE)
# I expect reguml2 and regw2 to give the same result.
reguml2=re.compile(r"[a-zä]+", re.LOCALE)
regw2=re.compile(r"[\w]+", re.LOCALE)
str="abcä d\344e ä f ";

print reguml.findall(str) # Behaves as I expect.
# Here I expect the same result as above, but I don't get it.
print regw.findall(str)
print reguml2.findall(str) # Behaves as I expect.
print regw2.findall(str) # Behaves as I expect.
#----------------------------------------



>>> import locale
>>> locale.setlocale(locale.LC_ALL,'swedish')
'swedish'
>>> import re
>>> reguml=re.compile(r"[a-zä]", re.LOCALE) # I expect reguml and regw to give the same result.
>>> regw=re.compile(r"\w", re.LOCALE)
>>> reguml2=re.compile(r"[a-zä]+", re.LOCALE) # I expect reguml2 and regw2 to give the same result.
>>> regw2=re.compile(r"[\w]+", re.LOCALE)
>>> str="abcä d\344e ä f ";
>>>
>>> print reguml.findall(str) # Behaves as I expect.
['a', 'b', 'c', '\xe4', 'd', '\xe4', 'e', '\xe4', 'f']
>>> print regw.findall(str) # Here I expect same result as above, but I don't get it.
['a', 'b', 'c', 'd', 'e', 'f']
>>> print reguml2.findall(str) # Behaves as I expect.
['abc\xe4', 'd\xe4e', '\xe4', 'f']
>>> print regw2.findall(str) # Behaves as I expect.
['abc\xe4', 'd\xe4e', '\xe4', 'f']
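
For comparison, here is a minimal sketch of the same check on a Python 3
interpreter. That is an assumption, since the report itself concerns
Python 2.2.2 and 2.3a2. In Python 3, str patterns are Unicode-aware by
default, so no locale call is involved. The variable names are only
illustrative.

```python
import re

# Same sample text as in the report; "\xe4" is the letter ä.
text = "abc\xe4 d\xe4e \xe4 f "

# Python 3 str patterns treat \w as Unicode-aware by default.
single = re.findall(r"\w", text)
runs = re.findall(r"[\w]+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], which drops the umlaut.
ascii_only = re.findall(r"\w", text, re.ASCII)

print(single)
print(runs)
print(ascii_only)
```

In this sketch \w on its own and [\w]+ agree on which characters count
as word characters, while the re.ASCII result is the same list that
regw.findall(str) returns in the session above.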
---------------------------------------------------------
peternl:Python-2.3a2>>/work1/pkg/dev-tools/python/2.3a2/bin/python -V
Python 2.3a2
peternl:Python-2.3a2>>uname -a
Linux peternl.computervision.se 2.4.18-6mdk-petern #2 Thu May 23 06:40:30 CEST 2002 i686 unknown
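
A sketch of what a complete re.LOCALE example might look like on
Python 3, where the flag is only accepted with bytes patterns. The
helper name find_word_bytes and the candidate locale names are
assumptions made for illustration. Which names exist depends on the
platform, and whether \w matches byte 0xE4 depends on the encoding of
the locale that gets selected.

```python
import locale
import re

# Hypothetical candidate names; availability differs between systems.
CANDIDATES = ("sv_SE.ISO8859-1", "sv_SE.ISO-8859-1", "swedish")


def find_word_bytes(data, candidates=CANDIDATES):
    """Return (locale_name, matches) for rb"\\w" under the first usable locale.

    Returns (None, None) when none of the candidate locales is installed.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this locale is not available here
            # re.LOCALE requires a bytes pattern in Python 3.
            return name, re.findall(rb"\w", data, re.LOCALE)
        return None, None
    finally:
        # Always restore the locale that was active before the call.
        locale.setlocale(locale.LC_CTYPE, saved)


name, found = find_word_bytes(b"abc\xe4 d\xe4e \xe4 f ")
if name is None:
    print("none of the candidate locales is installed")
else:
    print(name, found)
```

The helper restores the previous LC_CTYPE setting in a finally clause
because setlocale changes state for the whole process.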


----------------------------------------------------------------------
