[Tutor] stopping greedy matches
Mike Hall
michael.hall at critterpixstudios.com
Wed Mar 16 21:12:32 CET 2005
I'm having trouble getting re to stop matching after it's consumed what
I want it to. Using this string as an example, the goal is to match
"CAPS":
>>> import re
>>> s = "only the word in CAPS should be matched"
So let's say I want to specify when to begin my pattern by using a
lookbehind:
>>> # this simply matches the position immediately after "in"
>>> x = re.compile(r"(?<=\bin)")
So that's straightforward, but let's say I don't want to use a
lookahead to specify the end of my pattern; I simply want it to stop
after it has combed over the word following "in". I would expect this
to work, but it doesn't:
>>> # this will consume everything past "in" all the way to the end of the string
>>> x = re.compile(r"(?<=\bin).+\b")
In the above example I would think that the word boundary assertion "\b"
would indicate a stopping point. Is ".+\b" not saying, "keep matching
characters until a word boundary has been reached"?
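A minimal sketch of what that pattern actually matches, assuming the standard `re` module and the example string from above. As far as I understand it, `.+` is greedy: it first runs to the end of the string and only gives characters back until `\b` can succeed, and the end of the string already counts as a word boundary. The variable names here are only for illustration.

```python
import re

s = "only the word in CAPS should be matched"

# Greedy: .+ takes as much as it can; \b is then satisfied at the very
# end of the string, so nothing has to be given back.
greedy_match = re.search(r"(?<=\bin).+\b", s).group()
print(repr(greedy_match))  # ' CAPS should be matched'

# Non-greedy: .+? takes as little as it can. The first boundary it
# reaches is between the space and "C", so only the space is matched.
lazy_match = re.search(r"(?<=\bin).+?\b", s).group()
print(repr(lazy_match))  # ' '
```

So switching to `.+?` alone does not give "one word" either, because the space right after "in" is part of what gets matched.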
Even stranger are the results I get from:
>>> # keep matching characters until a whitespace has been reached(?)
>>> x = re.compile(r"(?<=\bin).+\s")
>>> r = x.sub("!@!", s)
>>> print r
only the word in!@!matched
For some reason it has decided to consume three words instead of
one.
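To see exactly which text is being replaced, here is a small sketch that prints the matched span for the greedy pattern and for a non-greedy variant. It assumes the same example string; the non-greedy form `.+?` is my own addition for comparison, not something from the session above.

```python
import re

s = "only the word in CAPS should be matched"

# Greedy: .+ runs to the end of the string, then backs off only as far
# as the LAST whitespace character, which sits just before "matched".
greedy_text = re.search(r"(?<=\bin).+\s", s).group()
print(repr(greedy_text))  # ' CAPS should be '

# Non-greedy: .+? stops at the FIRST whitespace that lets \s succeed.
lazy_text = re.search(r"(?<=\bin).+?\s", s).group()
print(repr(lazy_text))  # ' CAPS '

lazy_result = re.sub(r"(?<=\bin).+?\s", "!@!", s)
print(lazy_result)  # only the word in!@!should be matched
```

Note that even the non-greedy match still includes the spaces on both sides of "CAPS".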
My question is simply this: after specifying a start point, how do I
make a match stop after it has found one word, and one word only? As
always, all help is appreciated.
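One possible way to get a single word, sketched here under the assumption that "word" means a run of `\w` characters separated from "in" by one whitespace character. Both patterns below are illustrations of mine rather than a definitive answer.

```python
import re

s = "only the word in CAPS should be matched"

# Put the whitespace inside the lookbehind and match word characters
# only; \w+ cannot run past the end of the word.
one_word = re.compile(r"(?<=\bin\s)\w+")
word = one_word.search(s).group()
print(word)  # CAPS

replaced = one_word.sub("!@!", s)
print(replaced)  # only the word in !@! should be matched

# Alternative without a lookbehind: match "in" plus any amount of
# whitespace, and capture the following word in a group.
captured = re.search(r"\bin\s+(\w+)", s).group(1)
print(captured)  # CAPS
```

The lookbehind version only handles exactly one whitespace character after "in", since Python's `re` requires lookbehinds to be fixed-width; the capturing-group version tolerates any amount of whitespace.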