Something confusing about non-greedy reg exp match

MRAB python at mrabarnett.plus.com
Mon Sep 7 07:47:46 EDT 2009


George Burdell wrote:
> On Sep 6, 10:06 pm, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
>> <gburde... at gmail.com> wrote in message
>>
>> news:f98a6057-c35f-4843-9efb-7f36b05b677c at g19g2000yqo.googlegroups.com...
>>
>>> If I do this:
>>> import re
>>> a=re.search(r'hello.*?money',  'hello how are you hello funny money')
>>> I would expect a.group(0) to be "hello funny money", since .*? is a
>>> non-greedy match. But instead, I get the whole sentence, "hello how
>>> are you hello funny money".
>>> Is this expected behavior? How can I specify the correct regexp so
>>> that I get "hello funny money" ?
>> A non-greedy match matches the fewest characters before matching the text
>> *after* the non-greedy match.  For example:
>>
>>>>> import re
>>>>> a=re.search(r'hello.*?money','hello how are you hello funny money and more money')
>>>>> a.group(0)  # non-greedy stops at the first money
>> 'hello how are you hello funny money'
>>>>> a=re.search(r'hello.*money','hello how are you hello funny money and more money')
>>>>> a.group(0)  # greedy keeps going to the last money
>> 'hello how are you hello funny money and more money'
>>
>> This is why it is difficult to use regular expressions to match nested
>> objects like parentheses or XML tags.  In your case you'll need something
>> extra to not match the first hello.
>>
>>>>> a=re.search(r'(?<!^)hello.*?money','hello how are you hello funny money')
>>>>> a.group(0)
>> 'hello funny money'
>>
>> -Mark
> 
> I see now. I also understand r's response. But what if there are many
> "hello"s before "money", and I don't know how many there are? In
> other words, I want to find every occurrence of "money", and for each
> occurrence, I want to scan in the reverse (left) direction to the
> closest occurrence of "hello". How can this be done?

 >>> a = re.search(r'hello(?!.*?hello).*?money', 'hello how are you hello funny money')
 >>> a.group(0)
'hello funny money'
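
One caveat with the pattern above: the lookahead (?!.*?hello) scans to the end of the string, so it accepts only the last "hello" overall. If the string contains several "money"s, that may not give one match per "money". A possible alternative, offered as a sketch rather than the definitive answer, is to apply the lookahead at every character between the two words, so that no "hello" can appear inside the match. The test string and variable names below are made up for illustration.

```python
import re

# Hypothetical test string: four "hello"s and two "money"s.
text = 'hello how are you hello funny money and hello again hello more money'

# Pattern from the reply above: the lookahead rejects any "hello" that has
# another "hello" anywhere later in the string, so only the last one matches.
last_only = re.findall(r'hello(?!.*?hello).*?money', text)

# Alternative sketch: (?:(?!hello).)*? consumes one character at a time and
# refuses to step onto the start of another "hello", so each match begins
# at the "hello" closest to the "money" that ends it.
nearest = re.findall(r'hello(?:(?!hello).)*?money', text)

print(last_only)
print(nearest)
```

Note that . does not match a newline unless re.DOTALL is passed, so multi-line input would need that flag.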


