regex confusion

Peter Hansen peter at engcorp.com
Tue Dec 9 11:43:03 EST 2003


"Diez B. Roggisch" wrote:
> 
> John Hunter wrote:
> 
> >
> > In trying to debug why a certain regex wasn't working like I expected
> > it to, I came across this strange (to me) behavior.  The file I am
> > trying to match definitely contains many instances of the letter 'a',
> > so I would expect the regex
> >
> >   rgxPrev = re.compile('.*?a.*?')
> 
> This is a bogus regex - a '*' means "zero or more occurrences" for the
> expression to the left. '?' means "zero or one occurrence" for the exp to
> the left. 

Not true.  See http://www.python.org/doc/current/lib/re-syntax.html :

*?, +?, ?? 
The "*", "+", and "?" qualifiers are all greedy; they match as much text 
as possible.  .... Adding "?" after the qualifier makes it perform the match 
in non-greedy or minimal fashion; as few characters as possible will be 
matched. ....

-Peter



