regexp non-greedy matching bug?

John Hazen john at hazen.net
Sun Dec 4 02:31:43 EST 2005


> [John Hazen]
> > I want to match one or two instances of a pattern in a string.
> >
> > >>> s = 'foobarbazfoobar'
> > >>> foofoo = re.compile(r'^(foo)(.*?)(foo)?(.*?)$')
> > >>> foofoo.match(s).group(1)
> > 'foo'
> > >>> foofoo.match(s).group(3)
> > >>> 

[Tim Peters]
> Your problem isn't that
> 
>     (foo)?
> 
> is not greedy (it is greedy), it's that your first
> 
>     (.*?)
> 
> is not greedy.  Remember that regexps also work left to right.

Well, I had the same symptoms when that .* was greedy (it ate up the
optional foo), which is why I went to non-greedy there.

I guess my error was thinking that greedy trumped non-greedy, rather
than left trumping right.  (i.e., in order for the (foo)? to be maximally
greedy, the (.*?) has to be non-maximally non-greedy :)
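
For anyone following along, here is a small sketch of the "left trumps
right" behaviour, using the same string as in my original example. The
names lazy_first and greedy_first are just labels I made up for this
illustration. Either way the optional (foo)? ends up empty: with the
non-greedy wildcard it is tried at position 3 (where 'bar' starts) and is
allowed to match nothing, and with the greedy wildcard there is nothing
left for it to match.

```python
import re

s = 'foobarbazfoobar'

# Original pattern: the first (.*?) is non-greedy, so it tries the empty
# string first. (foo)? is then attempted right after the leading 'foo',
# fails to find 'foo' there, and is satisfied by matching nothing.
lazy_first = re.compile(r'^(foo)(.*?)(foo)?(.*?)$')

# Variant with a greedy first wildcard (hypothetical name, for comparison):
# (.*) swallows the rest of the string, including the second 'foo'.
greedy_first = re.compile(r'^(foo)(.*)(foo)?(.*?)$')

lazy_groups = lazy_first.match(s).groups()
greedy_groups = greedy_first.match(s).groups()

print(lazy_groups)    # ('foo', '', None, 'barbazfoobar')
print(greedy_groups)  # ('foo', 'barbazfoobar', None, '')
```

In both cases the engine accepts the first overall match it finds while
working left to right, and never goes back to look for a "better" one in
which group 3 is filled.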

> Maybe what you're looking for is
> 
>     ^P(.*P)?.*$

Yes.  That works the way I wanted.  ( ^(foo)(.*(foo))?.*$ )
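
A quick check of that pattern, as a sketch: the test strings other than
my original one, and the names fixed and lazy_inner, are made up here for
illustration. Putting the wildcard inside the optional group means the
group can only succeed if a second foo is actually found, so the engine
has to backtrack the wildcard until it is.

```python
import re

# Suggested form: the wildcard and the second 'foo' are optional together.
fixed = re.compile(r'^(foo)(.*(foo))?.*$')

two = fixed.match('foobarbazfoobar')   # second 'foo' present
one = fixed.match('foobar')            # only one 'foo'

print(two.groups())  # ('foo', 'barbazfoo', 'foo')
print(one.groups())  # ('foo', None, None)

# With more than two occurrences, the greedy inner .* backtracks from the
# end of the string, so group 3 lands on the LAST 'foo'. A non-greedy
# inner wildcard (hypothetical variant) picks the next 'foo' instead.
lazy_inner = re.compile(r'^(foo)(.*?(foo))?.*$')
three = 'foobarfoobazfoo'

print(fixed.match(three).start(3))       # 12
print(lazy_inner.match(three).start(3))  # 6
```

Which of the two variants is appropriate depends on whether the last or
the next occurrence is wanted; for strings with at most two occurrences
they behave the same.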

Thank you, both for the specific answer, and the general education.

-John


