[Tutor] stopping greedy matches

Wed Mar 16 21:12:32 CET 2005

I'm having trouble getting re to stop matching after it's consumed what 
I want it to.  Using this string as an example, the goal is to match 
"CAPS":

 >>> import re
 >>> s = "only the word in CAPS should be matched"

So let's say I want to specify when to begin my pattern by using a 
lookbehind:

 >>> x = re.compile(r"(?<=\bin)")  # matches the empty spot right after "in"

So that's straightforward, but let's say I don't want to use a 
lookahead to specify the end of my pattern, I simply want it to stop 
after it has combed over the word following "in". I would expect this 
to work, but it doesn't:

 >>> x = re.compile(r"(?<=\bin).+\b")  # consumes everything past "in", to the end of the string

In the above example I would think that the word boundary assertion "\b" 
would indicate a stopping point. Is ".+\b" not saying, "keep matching 
characters until a word boundary has been reached"?
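
For what it's worth, a small check suggests that ".+\b" behaves more like
"match as much as possible, then back off only as far as needed to end on
a word boundary". Since the end of the string is itself a word boundary,
no backing off happens at all. The non-greedy ".+?" is shown for contrast;
the variable names are only for illustration.

```python
import re

s = "only the word in CAPS should be matched"

# Greedy: .+ runs to the end of the string, which already is a word boundary.
greedy = re.search(r"(?<=\bin).+\b", s).group()

# Non-greedy: .+? takes one character (the space), and the spot between
# " " and "C" is the first word boundary it reaches.
lazy = re.search(r"(?<=\bin).+?\b", s).group()

print(repr(greedy))  # ' CAPS should be matched'
print(repr(lazy))    # ' '
```

So making the quantifier non-greedy does stop early, but on its own it
stops too early to pick up the word.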

Even stranger are the results I get from:

 >>> x = re.compile(r"(?<=\bin).+\s")  # keep matching characters until whitespace is reached(?)
 >>> r = x.sub("!@!", s)
 >>> print(r)
only the word in!@!matched

For some reason it has decided to consume three words instead of 
one.
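
The three words can probably be accounted for by ordinary greedy
behaviour: ".+" first swallows the rest of the string, then gives
characters back until "\s" can match, which happens at the last
whitespace in the string rather than the first. A sketch comparing the
two forms (names are illustrative):

```python
import re

s = "only the word in CAPS should be matched"

# Greedy: backtracks from the end, so \s lands on the LAST space.
greedy_hit = re.search(r"(?<=\bin).+\s", s).group()

# Non-greedy: grows one character at a time, so \s lands on the first
# space that follows at least one other character.
lazy_result = re.sub(r"(?<=\bin).+?\s", "!@!", s)

print(repr(greedy_hit))  # ' CAPS should be '
print(lazy_result)       # only the word in!@!should be matched
```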

My question is simply this: after specifying a start point, how do I 
make a match stop after it has found one word, and one word only? As 
always, all help is appreciated.
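
One possible way to stop after a single word, offered as a sketch rather
than the definitive answer: describe the word itself with "\w+" instead
of ".+", so the match cannot run past it. Pulling the space into the
lookbehind is an assumption made for this example; it is allowed because
"\bin\s" still has a fixed width, which Python's re requires of
lookbehinds.

```python
import re

s = "only the word in CAPS should be matched"

# \w+ can only match word characters, so it stops at the end of "CAPS".
one_word = re.compile(r"(?<=\bin\s)\w+")

print(one_word.search(s).group())  # CAPS
print(one_word.sub("!@!", s))      # only the word in !@! should be matched
```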