Regexes: How to handle escaped characters
James Stroud
jstroud at mbi.ucla.edu
Thu May 17 16:50:47 EDT 2007
Torsten Bronger wrote:
> Hello!
>
> James Stroud writes:
>
>
>>Torsten Bronger wrote:
>>
>>
>>>I need some help with finding matches in a string that has some
>>>characters which are marked as escaped (in a separate list of
>>>indices). Escaped means that they must not be part of any match.
>>>
>>>[...]
>>
>>You should probably provide examples of what you are trying to do
>>or you will likely get a lot of irrelevant answers.
>
>
> Example string: u"Hollo", escaped positions: [4]. Thus, the second
> "o" is escaped and must not be found by the regexp searches.
>
> Instead of re.search, I call the function guarded_search(pattern,
> text, offset) which takes care of escaped characters. Thus, while
>
> re.search("o$", string)
>
> will find the second "o",
>
> guarded_search("o$", string, 0)
>
> won't find anything. But how to program "guarded_search"?
> Actually, it is about changing the semantics of the regexp syntax:
> "." no longer means "any character except newline" but "any
> character except newline and characters marked as escaped". And so
> on, for all syntax elements of regular expressions. Escaped
> characters must spoil any match; however, the regexp machine should
> continue to search for other matches.
>
> Bye,
> Torsten.
>
You will probably need to implement your own findall, etc., but this
seems to do it for search:
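import re

def guarded_search(rgx, astring, escaped):
    m = re.search(rgx, astring)
    if m:
        s = m.start()
        e = m.end()  # exclusive: index of the first character after the match
        for i in escaped:
            # Reject the match if it covers an escaped position.
            if s <= i < e:
                m = None
                break
    return m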
Here it is in use:
py> import re
py> def guarded_search(rgx, astring, escaped):
...     m = re.search(rgx, astring)
...     if m:
...         s = m.start()
...         e = m.end()
...         for i in escaped:
...             if s <= i < e:
...                 m = None
...                 break
...     return m
...
py> escaped = [1, 5, 15]
py> print guarded_search('abc', 'xyzabcxyz', escaped)
None
py> print guarded_search('abc', 'xyzxyzabcxyz', escaped)
<_sre.SRE_Match object at 0x40379720>
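Note that guarded_search gives up as soon as the first match is spoiled, whereas the original question asks for the search to continue. Below is a rough sketch of how that might look, assuming it is good enough to retry one character past the start of each spoiled match. The name guarded_finditer is made up for illustration and is not part of the re module. It also does not change what "." or "$" mean inside the pattern, so a greedy match that runs into an escaped character can hide a shorter, valid match starting at the same position.

```python
import re

def guarded_finditer(rgx, astring, escaped):
    """Yield matches of rgx in astring that cover no escaped index.

    Hypothetical helper: it filters ordinary matches after the fact
    rather than changing the semantics of the pattern itself.
    """
    pattern = re.compile(rgx)
    blocked = set(escaped)
    pos = 0
    while pos <= len(astring):
        # search() with a start position keeps "^" anchored to the real
        # beginning of the string, unlike searching a slice.
        m = pattern.search(astring, pos)
        if m is None:
            return
        s, e = m.span()
        if any(i in blocked for i in range(s, e)):
            # Spoiled match: retry one character past its start.
            pos = s + 1
        else:
            yield m
            # Resume after the match; step one further on an empty match.
            pos = e if e > s else e + 1
```

With the example from the question, list(guarded_finditer('o$', u'Hollo', [4])) is empty, while guarded_finditer('o', u'Hollo', [4]) yields only the match at index 1.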
James