Regexes: How to handle escaped characters

Torsten Bronger bronger at physik.rwth-aachen.de
Thu May 17 16:00:12 EDT 2007


Hello!

James Stroud writes:

> Torsten Bronger wrote:
>
>> I need some help with finding matches in a string that has some
>> characters which are marked as escaped (in a separate list of
>> indices).  Escaped means that they must not be part of any match.
>>
>> [...]
>
> You should probably provide examples of what you are trying to do
> or you will likely get a lot of irrelevant answers.

Example string: u"Hollo", escaped positions: [4].  Thus, the second
"o" is escaped and must not be found by the regexp searches.

Instead of re.search, I call the function guarded_search(pattern,
text, offset), which takes care of escaped characters.  Thus, while

    re.search("o$", string)

will find the second "o",

    guarded_search("o$", string, 0)

won't find anything.  But how do I implement "guarded_search"?
In effect, it changes the semantics of the regexp syntax: "." no
longer means "any character except newline" but "any character
except newline and characters marked as escaped", and likewise for
all other syntax elements of regular expressions.  An escaped
character must spoil any match that would include it, but the regexp
machine should continue to search for other matches.
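
To make the intended behaviour concrete, here is a minimal sketch of
one possible approach, not a complete solution.  It searches the
unmodified text, rejects any match whose span covers an escaped index,
and retries one character past the start of the rejected match.  The
extra "escaped" parameter is an assumption made for illustration (the
call shown earlier passes only pattern, text, and offset).  A known
limitation: the regexp engine does not backtrack into a shorter match
that would avoid the escaped character, so "l+o?" finds nothing in the
example string although "ll" would be acceptable.

```python
import re


def guarded_search(pattern, text, offset, escaped=()):
    """Return the first match at or after `offset` whose span covers
    no escaped index, or None if there is no such match.

    `escaped` (hypothetical parameter) holds zero-based indices of
    characters that must not be part of any match.
    """
    regex = re.compile(pattern)
    escaped = set(escaped)
    pos = offset
    while pos <= len(text):
        match = regex.search(text, pos)
        if match is None:
            return None
        # Accept the match only if none of its characters is escaped.
        if not any(i in escaped for i in range(match.start(), match.end())):
            return match
        # Spoiled match: retry one character past where it started.
        pos = match.start() + 1
    return None


text = u"Hollo"
plain = re.search("o$", text)                    # matches the second "o"
guarded = guarded_search("o$", text, 0, [4])     # second "o" is escaped
first_o = guarded_search("o", text, 0, [4])      # first "o" is still found
```

Because the search always runs on the full string, anchors such as "$"
keep their usual meaning; only the spans of the reported matches are
filtered.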

Bye,
Torsten.

-- 
Torsten Bronger, aquisgrana, europa vetus
                                      Jabber ID: bronger at jabber.org
                      (See http://ime.webhop.org for ICQ, MSN, etc.)


