Something confusing about non-greedy reg exp match

George Burdell gburdell1 at gmail.com
Sun Sep 6 23:23:44 EDT 2009


On Sep 6, 10:22 pm, George Burdell <gburde... at gmail.com> wrote:
> On Sep 6, 10:06 pm, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
>
> > <gburde... at gmail.com> wrote in message
>
> >news:f98a6057-c35f-4843-9efb-7f36b05b677c at g19g2000yqo.googlegroups.com...
>
> > > If I do this:
>
> > > import re
> > > a=re.search(r'hello.*?money',  'hello how are you hello funny money')
>
> > > I would expect a.group(0) to be "hello funny money", since .*? is a
> > > non-greedy match. But instead, I get the whole sentence, "hello how
> > > are you hello funny money".
>
> > > Is this expected behavior? How can I specify the correct regexp so
> > > that I get "hello funny money" ?
>
> > A non-greedy match matches the fewest characters before matching the text
> > *after* the non-greedy match.  For example:
>
> > >>> import re
> > >>> a=re.search(r'hello.*?money','hello how are you hello funny money and more money')
> > >>> a.group(0)  # non-greedy stops at the first money
> > 'hello how are you hello funny money'
> > >>> a=re.search(r'hello.*money','hello how are you hello funny money and more money')
> > >>> a.group(0)  # greedy keeps going to the last money
> > 'hello how are you hello funny money and more money'
>
> > This is why it is difficult to use regular expressions to match nested
> > objects like parentheses or XML tags.  In your case you'll need something
> > extra to not match the first hello.
>
> > >>> a=re.search(r'(?<!^)hello.*?money','hello how are you hello funny money')
> > >>> a.group(0)
> > 'hello funny money'
>
> > -Mark
>
> I see now. I also understand r's response. But what if there are many
> "hello"'s before "money," and I don't know how many there are? In
> other words, I want to find every occurrence of "money," and for each
> occurrence, I want to scan back to the first occurrence of "hello."
> How can this be done?

I should say the "closest" occurrence of "hello," to be clearer.
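
One possible approach, offered only as a sketch and not something proposed in the thread: after matching "hello", let the pattern consume characters one at a time, but only at positions where another "hello" does not begin. That forces each match to start at the last "hello" before a "money". The helper name closest_hello_to_money is hypothetical, and the sketch assumes the text contains no newlines between "hello" and "money" (otherwise re.DOTALL would be needed).

```python
import re

# Hypothetical helper, not from the thread. The group (?:(?!hello).) matches
# one character only if "hello" does not start at that position, so a match
# can never contain a second "hello" after the first one.
CLOSEST = re.compile(r'hello(?:(?!hello).)*?money')

def closest_hello_to_money(text):
    """Return non-overlapping 'hello ... money' spans with no inner 'hello'."""
    return CLOSEST.findall(text)

print(closest_hello_to_money('hello how are you hello funny money'))
print(closest_hello_to_money('hello a hello b money and hello c hello d money'))
```

Because findall returns non-overlapping matches, a second "money" that follows the same "hello" with no new "hello" in between would not produce its own match.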


