Regexes: How to handle escaped characters

Charles Sanders C.delete_this.Sanders at BoM.GOv.AU
Fri May 18 08:06:27 EDT 2007


Torsten Bronger wrote:
> Hello!
[...]
>>>
>>> Example string: u"Hollo", escaped positions: [4].  Thus, the
>>> second "o" is escaped and must not be found by the regexp
>>> searches.
>>>
>>> Instead of re.search, I call the function guarded_search(pattern,
>>> text, offset) which takes care of escaped characters.  Thus, while
>>>
> 
> Bye,
> Torsten.

	I'm still pretty much a beginner, and I am not sure
of the exact requirements, but the following seems to work
for at least simple cases when overlapping matches are not
considered.

import re

def guarded_search( pattern, text, exclude ):
   # Keep only the matches whose span [start, end) contains no excluded index.
   return [ m for m in re.finditer(pattern,text)
     if not any( m.start() <= e < m.end() for e in exclude ) ]

txt = "axbycz"
exc = [ 3 ]  # "y"
pat = "[xyz]"
mtch = guarded_search(pat,txt,exc)
print("Guarded search text='%s' excluding %s %s matches" % ( txt,exc,len(mtch) ))
for m in mtch:
   print("%s at %s" % ( m.group(), m.start() ))

txt = "Hollo"
exc = [ 4 ]  # Final "o"
pat = "o$"
mtch = guarded_search(pat,txt,exc)
print("Guarded search text='%s' excluding %s %s matches" % ( txt,exc,len(mtch) ))
for m in mtch:
   print("%s at %s" % ( m.group(), m.start() ))

Guarded search text='axbycz' excluding [3] 2 matches
x at 1
z at 5
Guarded search text='Hollo' excluding [4] 0 matches


This simply finds all the (non-overlapping) matches and rejects any
that include one of the excluded positions (the "y" in the first
case and the final "o" in the second).
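
If a single result in the style of re.search is wanted, with a start
offset as in the original question, something like the following might
do. It is only a sketch: the name guarded_search_first and the default
offset=0 are made up for illustration, and it assumes that a rejected
match should be retried one character further on. That lets it find a
match that overlaps a rejected one, but it does not try a shorter match
beginning at the same position as the rejected one.

```python
import re

def guarded_search_first(pattern, text, exclude, offset=0):
    """Return the first match at or after offset that covers no
    excluded index, or None. (Hypothetical helper, not from the thread.)"""
    regex = re.compile(pattern)
    excluded = set(exclude)
    pos = offset
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            return None
        # Accept the match only if none of its indices is excluded.
        if not any(i in excluded for i in range(m.start(), m.end())):
            return m
        # Rejected: search again starting one character past the
        # start of the rejected match.
        pos = m.start() + 1
    return None

# "aa" at 0-2 covers excluded index 0; the overlapping "aa" at 1-3 does not.
m = guarded_search_first("aa", "aaa", [0])
print("%s at %s" % (m.group(), m.start()))

# The only candidate is the escaped final "o", so nothing is found.
print(guarded_search_first("o$", "Hollo", [4]))
```

With re.finditer alone, the "aaa" case gives no result, because the
second "aa" overlaps the first and is never produced.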

Charles


