Must be a bug in the re module [was: Why this result with the re module]

Yingjie Lan lanyjie at yahoo.com
Tue Nov 2 23:28:09 EDT 2010


> Your regex says "Zero or more consecutive occurrences of
> something, always returning the most possible".  That's
> what it does, at every position - only matching emptiness
> where it couldn't match anything (findall then skips a
> character to avoid overlapping/infinite empty
> matches),  and at all other times matching the most
> possible (eg. "has a lam" not "has", " a ", "lam").

You have nearly convinced me.
You are correct for the regex '(.a.)*'.
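
For reference, a small sketch of how the whole matches for '(.a.)*' can
be inspected. It uses re.finditer rather than re.findall (my choice for
illustration), because findall returns only the contents of the group
when the pattern contains one, which hides the full "has a lam" match:

```python
import re

text = 'Mary has a lamb'

# group(0) is the entire match at each position, regardless of groups.
whole = [m.group(0) for m in re.finditer(r'(.a.)*', text)]
print(whole)

# findall reports the last capture of group 1 for each match instead.
last_capture = re.findall(r'(.a.)*', text)
print(last_capture)
```

The non-empty whole matches are 'Mar' and 'has a lam'; the remaining
entries are the empty matches at positions where '.a.' cannot match.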

What I had in mind was this regex: '((.a.)*)*'.
I confused myself when I added the enclosing ().

Could you please reconsider how you would
work through this new one and see whether my steps
are correct? If you agree with my 7-step
execution for the new regex, then:

We have finally found a real bug in re.findall:

>>> re.findall('((.a.)*)*', 'Mary has a lamb')
[('', 'Mar'), ('', ''), ('', ''), ('', 'lam'), ('', ''), ('', '')]
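
To see where each of those tuples comes from, here is a minimal sketch
that walks the same matches with re.finditer and prints the span of the
whole match next to the captured groups. The captured group values are
not annotated here, since they may differ from the output above on
other Python versions; only the spans of the whole matches are checked:

```python
import re

text = 'Mary has a lamb'
pattern = r'((.a.)*)*'

for m in re.finditer(pattern, text):
    # span() is the extent of the whole match; groups() holds the last
    # capture of each group, which is what findall puts in its tuples.
    print(m.span(), repr(m.group(0)), m.groups())

# Collect the spans separately so they can be compared across versions.
spans = [m.span() for m in re.finditer(pattern, text)]
```

Each tuple in the findall result corresponds to one of these matches,
so the six tuples line up with six spans, four of them empty.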


Cheers,

Yingjie


      


