Regex with ASCII and non-ASCII chars

Peter Otten __peter__ at web.de
Wed Jan 31 08:47:23 EST 2007


TOXiC wrote:

> How can I do a regex match on a string with ASCII and non-ASCII chars,
> for example:
> 
>     regex = re.compile(r"(ÿÿ?ð?öÂty)", re.IGNORECASE)
>     match = regex.search("ÿÿ?ð?öÂty")
>     if match:
>         result = match.group()
>         print result
>     else:
>         result = "No match found"
>         print result
> 
> It returns "No match found" even though the two strings are equal.

For equal strings you should get a match:

>>> re.compile("Zäöü", re.IGNORECASE).search("yadda zäöü yadda")
<_sre.SRE_Match object at 0x401e0a68>
>>> print _.group()
zäöü
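
One possible reason the quoted pattern fails, assuming its "?" characters
are meant literally: in a regex, "?" makes the preceding character
optional, so the question marks in the subject string are never matched.
A minimal sketch using re.escape(), written for Python 3 (an assumption;
the sessions in this post are Python 2), with illustrative variable names:

```python
import re

text = "ÿÿ?ð?öÂty"

# Unescaped: each "?" is a quantifier that makes the preceding character
# optional, so the literal question marks in the text are never matched.
unescaped = re.compile(text, re.IGNORECASE).search(text)

# re.escape() quotes the metacharacters so the pattern is taken literally.
escaped = re.compile(re.escape(text), re.IGNORECASE).search(text)
```

Here the unescaped search finds nothing, while the escaped pattern matches
the whole string.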

For case-insensitive matching of non-ASCII characters your best bet is unicode:

>>> re.compile(u"äöü", re.IGNORECASE|re.UNICODE).search(u"ÄÖÜ")
<_sre.SRE_Match object at 0x401e09f8>
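
The same idea can be sketched for Python 3 (an assumption; the session
above is Python 2), where str is already unicode and needs neither the u""
prefix nor re.UNICODE. A bytes pattern, by contrast, only folds case for
ASCII letters. The variable names are illustrative:

```python
import re

# Python 3: str patterns use Unicode case folding by default.
pattern = re.compile("äöü", re.IGNORECASE)
match = pattern.search("yadda ÄÖÜ yadda")
matched_text = match.group() if match else None

# A bytes pattern ignores case only for ASCII letters, so the same
# search on UTF-8 encoded data does not find the upper-case umlauts.
bytes_pattern = re.compile("äöü".encode("utf-8"), re.IGNORECASE)
bytes_match = bytes_pattern.search("yadda ÄÖÜ yadda".encode("utf-8"))
```

With text strings the upper-case umlauts are found; with encoded bytes
they are not.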

Peter



