Regexes: How to handle escaped characters

James Stroud jstroud at mbi.ucla.edu
Thu May 17 16:50:47 EDT 2007


Torsten Bronger wrote:
> Hello!
> 
> James Stroud writes:
> 
> 
>>Torsten Bronger wrote:
>>
>>
>>>I need some help with finding matches in a string that has some
>>>characters which are marked as escaped (in a separate list of
>>>indices).  Escaped means that they must not be part of any match.
>>>
>>>[...]
>>
>>You should probably provide examples of what you are trying to do
>>or you will likely get a lot of irrelevant answers.
> 
> 
> Example string: u"Hollo", escaped positions: [4].  Thus, the second
> "o" is escaped and must not be found by the regexp searches.
> 
> Instead of re.search, I call the function guarded_search(pattern,
> text, offset) which takes care of escaped characters.  Thus, while
> 
>     re.search("o$", string)
> 
> will find the second "o",
> 
>     guarded_search("o$", string, 0)
> 
> won't find anything.  But how can "guarded_search" be implemented?
> Actually, it is about changing the semantics of the regexp syntax:
> "." no longer means "any character except newline" but "any
> character except newline and characters marked as escaped".  And so
> on, for all syntax elements of regular expressions.  Escaped
> characters must spoil any match; however, the regexp engine should
> continue to search for other matches.
> 
> Bye,
> Torsten.
> 

You will probably need to implement your own findall, etc., but this 
seems to do it for search:

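import re

def guarded_search(rgx, astring, escaped):
    # Find the first match, then reject it if it covers an escaped index.
    m = re.search(rgx, astring)
    if m:
        s = m.start()
        e = m.end()  # exclusive: one past the last matched character
        for i in escaped:
            if s <= i < e:
                m = None
                break
    return m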
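
Note that the function above only inspects the first match and returns None if that match is spoiled, whereas Torsten asked for the search to continue. A minimal sketch of one way to do that follows. The name guarded_search_retry and the optional pos argument are my own assumptions, not anything from the thread. It retries from the next start position, so it only approximates the "changed semantics" Torsten describes: a shorter match that begins at the same position as a spoiled greedy match will not be found.

```python
import re

def guarded_search_retry(pattern, text, escaped, pos=0):
    """Return the first match at or after pos that covers no escaped index.

    Hypothetical helper: retries the search after each spoiled match.
    Returns None if no clean match exists.
    """
    rx = re.compile(pattern)
    blocked = set(escaped)
    while pos <= len(text):
        # Passing pos keeps "^" anchored to the real start of the string.
        m = rx.search(text, pos)
        if m is None:
            return None
        # m.end() is exclusive, so range() covers exactly the matched indices.
        if not any(i in blocked for i in range(m.start(), m.end())):
            return m
        # Spoiled match: try again from the next possible start position.
        pos = m.start() + 1
    return None
```

With Torsten's example, guarded_search_retry("o$", u"Hollo", [4]) returns None, while guarded_search_retry("o", u"Hollo", [4]) returns the match for the first "o" at index 1.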


Here it is in use:

py> def guarded_search(rgx, astring, escaped):
...   m = re.search(rgx, astring)
...   if m:
...     s = m.start()
...     e = m.end()
...     for i in escaped:
...       if s <= i < e:
...         m = None
...         break
...   return m
...
py> import re
py> escaped = [1, 5, 15]
py> print guarded_search('abc', 'xyzabcxyz', escaped)
None
py> print guarded_search('abc', 'xyzxyzabcxyz', escaped)
<_sre.SRE_Match object at 0x40379720>
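
For findall and friends, the same filtering idea could be layered on top of re.finditer. The following is only a sketch: guarded_finditer and guarded_findall are hypothetical names, and unlike re.findall the list version always returns the whole matched text rather than groups. Because re.finditer yields non-overlapping matches, a clean match that overlaps a spoiled one is skipped.

```python
import re

def guarded_finditer(pattern, text, escaped):
    """Yield the matches from re.finditer that cover no escaped index."""
    blocked = set(escaped)
    for m in re.finditer(pattern, text):
        # Keep the match only if none of its indices is marked as escaped.
        if not any(i in blocked for i in range(m.start(), m.end())):
            yield m

def guarded_findall(pattern, text, escaped):
    """Return the full text of every unspoiled match (groups are ignored)."""
    return [m.group(0) for m in guarded_finditer(pattern, text, escaped)]
```

For example, guarded_findall("o", u"Hollo", [4]) yields a single "o", the one at index 1.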

James


