Help with regular expression using findall and .*?

czrpb nanotech at europa.com
Fri Sep 13 12:29:10 EDT 2002


Harvey:

Great, thanks!! And thanks for sticking to my question's requirements. <wink!>

OK, this is what we thought around here. But what I do not understand is why any backtracking data is being kept at all. The '?' in '.*?' means it is non-greedy, right? When would backtracking ever occur with '.*?'? What am I missing?

<<q


On Fri, 13 Sep 2002, Harvey Thomas wrote:

> czrpb wrote
> > 
> > Could anyone help out with rewriting (still using regular expressions)
> > the following so that it does not cause an exception:
> > 
> > import re
> > 
> > s1=('macro\n'+'a'*200+'\norcam\n')*10
> > s2=('macro\n'+'a'*20000+'\norcam\n')*10
> > 
> > p=re.compile(r'macro.*?orcam',re.DOTALL)
> > 
> > for x in re.findall(p,s1):
> >     print x
> > 
> > for x in re.findall(p,s2):
> >     print x
> > 
> > thanks!! Quentin Crain
> > 
> 
> You need to be very careful about using .*? as the engine "only" allows 10,000 backtracks.
> 
> Try this
> 
> p = re.compile('macro(?:[^o]+|o(?!rcam))*orcam')
> for x in p.findall(s2):
>     print x
> 
> HTH
> 
> Harvey
> 
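
For anyone trying this today, here is a minimal sketch that runs the two patterns from this thread side by side. It assumes Python 3 (the code quoted above is Python 2, hence "print x"), and the names LAZY, UNROLLED and make_text are made up for this illustration. On recent CPython versions the lazy pattern should no longer raise on s2, so the sketch only checks that both patterns find the same matches.

```python
import re

# Hypothetical names for this illustration; the patterns are the ones quoted above.
LAZY = re.compile(r'macro.*?orcam', re.DOTALL)
UNROLLED = re.compile(r'macro(?:[^o]+|o(?!rcam))*orcam')


def make_text(filler_len, repeats=10):
    """Build the test string from the original post with a filler of the given length."""
    return ('macro\n' + 'a' * filler_len + '\norcam\n') * repeats


s1 = make_text(200)
s2 = make_text(20000)

for text in (s1, s2):
    lazy_hits = LAZY.findall(text)
    unrolled_hits = UNROLLED.findall(text)
    # Both patterns stop at the first 'orcam' that follows each 'macro'.
    print(len(lazy_hits), len(unrolled_hits), lazy_hits == unrolled_hits)
```

The unrolled pattern does not need re.DOTALL because [^o] already matches newlines. It consumes each run of non-'o' characters in one step, while '.*?' advances one character at a time and tries 'orcam' at every position.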




