Help: non-capturing group RE fails

Robin Thomas robin.thomas at starmedia.net
Mon Feb 12 14:00:08 EST 2001


At 09:34 AM 2/12/01 -0800, Roy Mathew wrote:
>Robin,
>
>Thanks for your clear explanation; I had assumed that "non-capturing" meant
>"non-consuming" as well. I am still puzzled however, over the
>following behavior (slightly modified from before). ie: why does
>the "?=" which is a +ve lookahead assertion not produce the result
>we are looking for:

Damn, I'm sorry. I failed to notice that you want *nothing* touched in @<<b>>.

Anything not in a group will not be put in your replacement string, because 
there's no way to reference it. Things aren't "consumed" in the sense of 
being thrown away; they're just part of the match, and the entire matched 
substring is replaced with your replacement string.
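
To make that concrete, here is a minimal sketch. The sample text and the 
variable names are made up for illustration and are not from Roy's code; 
it only shows that the whole match is replaced and that a group survives 
only if the replacement refers to it.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "x=<<a>> y=<<b>>"

# The whole match, brackets included, is replaced. The group contents
# survive only because the replacement string refers to them as \1.
kept = re.sub(r"<<(.*?)>>", r"\1", text)

# With no group reference in the replacement, the entire match is dropped.
dropped = re.sub(r"<<(.*?)>>", "", text)

print(kept)     # x=a y=b
print(dropped)  # x= y=
```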

"Consumed" applies to whatever is matched by anything other than a lookahead 
or other zero-width assertion: the match position advances past it.

(?=...) does not consume any of the string. So

(?=[^@])

says, "check the next character. As long as it's not '@', this re could 
still match. Continue matching at the same character I just checked!" So 
the next part of the re will start matching at the same character. That's 
why you get the weird behavior.
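
Here is a small sketch of that effect. I'm guessing at the shape of your 
pattern, so treat the regex below as an assumption rather than your exact 
code; the sample string is the one from this thread.

```python
import re

sample = r' - <<ba>> @<<b>> <<c>> - '

# Assumed pattern: the lookahead tests the character at the current
# position but does not move past it, so "<<" is matched starting at
# that very same character. A "<" always passes the "not @" test, so
# the lookahead never filters anything out here.
lookahead = re.compile(r'(?=[^@])<<(.*?)>>')
result = lookahead.sub(r'\1', sample)

print(result)  # ' - ba @b c - ', so @<<b>> was still touched
```

The '@' sits before the position where "<<" starts matching, so a 
lookahead placed in front of "<<" never sees it.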

A negative lookbehind behaves differently. It tests the text just before the 
current position without making it part of the match, so it does what you 
want:

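# new code, works in 1.6 and up
import re
# renamed from "str" so the built-in type is not shadowed
text = r' - <<ba>> @<<b>> <<c>> - '
# match <<...>> only when it is not immediately preceded by '@'
r = re.compile(r'(?<!@)<<(.*?)>>')
# prints ' - ba @<<b>> c - '
print(r.sub(r"\1", text))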


--
Robin Thomas
Director, Platform Engineering
StarMedia Network, Inc.
robin.thomas at starmedia.net




