Help: non-capturing group RE fails
Robin Thomas
robin.thomas at starmedia.net
Mon Feb 12 14:00:08 EST 2001
At 09:34 AM 2/12/01 -0800, Roy Mathew wrote:
>Robin,
>
>Thanks for your clear explanation; I had assumed that "non-capturing" meant
>"non-consuming" as well. I am still puzzled however, over the
>following behavior (slightly modified from before). ie: why does
>the "?=" which is a +ve lookahead assertion not produce the result
>we are looking for:
Damn, I'm sorry. I failed to notice that you want *nothing* touched in @<<b>>.
Anything not in a group will not be put in your replacement string, because
there's no way to reference it. Things aren't "consumed", they're just
considered part of the match, and the entire matched substring is replaced
with your replacement string.
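A rough sketch of that point, assuming a pattern along the lines of [^@]<<(.*?)>> (a guess for illustration, not necessarily the pattern from the original question). The character matched by [^@] is part of the match, so it is replaced along with everything else unless it sits in a group the replacement can reference:

```python
import re

text = ' - <<ba>> @<<b>> <<c>> - '

# Assumed pattern: [^@] matches the space before '<<', so that space is
# part of the match and is replaced too. Only group 1 is put back.
dropped = re.sub(r'[^@]<<(.*?)>>', r'\1', text)

# Capturing the leading character as well lets the replacement restore it.
kept = re.sub(r'([^@])<<(.*?)>>', r'\1\2', text)

print(repr(dropped))
print(repr(kept))
```

The first call loses the space in front of each replaced item; the second keeps it. Note that the second pattern still needs some character before '<<', so it would miss an item at the very start of the string.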
"Consumed" applies to whatever is matched by anything other than a lookahead
or other zero-width assertion. (?=...) does not consume any of the string. So
(?=[^@])
says, "check the next character. As long as it's not '@', this re could
still match. Continue matching at the same character I just checked!" So
the next part of the re will start matching at the same character. That's
why you get the weird behavior.
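A small sketch of that behavior, assuming the lookahead was placed directly in front of the '<<' (the exact pattern from the question is not shown here, so this one is a guess):

```python
import re

text = ' - <<ba>> @<<b>> <<c>> - '

# The lookahead inspects the character at the current position but does not
# move past it, so '<<' is matched starting at that same character. Since
# '<' is never '@', the check always passes wherever '<<' begins.
lookahead = re.compile(r'(?=[^@])<<(.*?)>>')
result = lookahead.sub(r'\1', text)
print(repr(result))

# A lookahead on its own matches an empty string: start and end coincide.
m = re.match(r'(?=a)', 'abc')
print(m.span())
```

Here the '@<<b>>' item is still rewritten, because the lookahead never looks at the '@' that comes before the '<<'.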
A negative lookbehind behaves differently. (?<!@) checks that the character
just before the current position is not '@', without making that character
part of the match, and does what you want:
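# new code, works in 1.6 and up
import re
# named "text" rather than "str" to avoid shadowing the built-in
text = r' - <<ba>> @<<b>> <<c>> - '
# match <<...>> only when it is not immediately preceded by '@'
r = re.compile(r'(?<!@)<<(.*?)>>')
print(r.sub(r"\1", text))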
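To see that the lookbehind leaves the preceding character out of the match, one can inspect the match objects directly. This is only an illustrative check using the same pattern and sample string; the variable names are made up for the example:

```python
import re

text = ' - <<ba>> @<<b>> <<c>> - '
pattern = re.compile(r'(?<!@)<<(.*?)>>')

# The first match begins at the '<<' itself; the space before it is only
# inspected by the lookbehind, never included in the match.
first = pattern.search(text)
matched = first.group(0)
start = first.start()

# With one capturing group, findall returns the group contents of every match.
names = pattern.findall(text)

print(matched, start, names)
```

The item after '@' does not show up in names, and the matched text starts at the '<<' rather than at the character in front of it.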
--
Robin Thomas
Director, Platform Engineering
StarMedia Network, Inc.
robin.thomas at starmedia.net