Help with regular expression using findall and .*?
czrpb
nanotech at europa.com
Fri Sep 13 12:29:10 EDT 2002
Harvey:
Great thanks!! And thanks for sticking to my question's requirements. <wink!>
Ok, this is what we thought around here. But what I do not understand is why any backtracking data is being kept at all. The '?' in '.*?' means it is non-greedy, right? When would backtracking ever occur with '.*?'? What am I missing?
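One way to picture the question above: a non-greedy '.*?' still has to guess how many characters to consume. It tries zero first, tests the rest of the pattern, and on failure comes back and consumes one more. Each "come back and extend" is a backtrack, so the count grows with the distance to the first 'orcam'. The sketch below is a toy model of that loop, not the real engine; the function name lazy_dotall_then_literal and the attempt counter are made up for illustration.

```python
import re

def lazy_dotall_then_literal(text, start, literal):
    """Toy model of '.*?' followed by a literal (DOTALL semantics).

    Returns (end_position, attempts); end_position is None if nothing matches.
    Hypothetical helper for illustration only, not how the re module is written.
    """
    attempts = 0
    # Try consuming 0 characters, then 1, then 2, ... before the literal.
    for consumed in range(len(text) - start + 1):
        attempts += 1
        pos = start + consumed
        if text.startswith(literal, pos):
            return pos + len(literal), attempts
        # The literal did not match here: go back and let '.*?' take one more.
    return None, attempts

text = 'macro\n' + 'a' * 200 + '\norcam\n'
end, attempts = lazy_dotall_then_literal(text, len('macro'), 'orcam')

# Cross-check the end position against the real engine.
real_end = re.match(r'macro.*?orcam', text, re.DOTALL).end()
print(end, real_end, attempts)
```

With a 200-character body the toy loop makes a couple of hundred attempts; with a 20,000-character body it would make roughly a hundred times as many, which is the regime where a fixed backtrack limit would be hit.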
<<q
On Fri, 13 Sep 2002, Harvey Thomas wrote:
> czrpb wrote
> >
> > Could anyone help out with rewriting (still using regular expressions)
> > the following so that it does not cause an exception:
> >
> > import re
> >
> > s1=('macro\n'+'a'*200+'\norcam\n')*10
> > s2=('macro\n'+'a'*20000+'\norcam\n')*10
> >
> > p=re.compile(r'macro.*?orcam',re.DOTALL)
> >
> > for x in re.findall(p,s1):
> >     print x
> >
> > for x in re.findall(p,s2):
> >     print x
> >
> > thanks!! Quentin Crain
> >
>
> You need to be very careful about using .*? as the engine "only" allows 10,000 backtracks
>
> Try this
>
> p = re.compile('macro(?:[^o]+|o(?!rcam))*orcam')
> for x in p.findall(s2):
>     print x
>
> HTH
>
> Harvey
>
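A small check of the two patterns side by side, written for Python 3 rather than the Python 2 of the original post. Harvey's pattern lets [^o]+ swallow a whole run of non-'o' characters in one step and only stops to look ahead at each 'o', so it needs far fewer decision points than '.*?'. As far as I know, later CPython releases reworked the matcher so the lazy form no longer raises on input of this size, but treat that as an assumption to verify on your own version. The variable names here are illustrative.

```python
import re

# Same shape as s2 in the original post: ten blocks with a 20,000-character body.
s2 = ('macro\n' + 'a' * 20000 + '\norcam\n') * 10

# Original non-greedy pattern; DOTALL lets '.' cross newlines.
lazy = re.compile(r'macro.*?orcam', re.DOTALL)

# Harvey's pattern: runs of non-'o' characters, or an 'o' that does not
# start 'orcam'. A negated class matches newlines, so DOTALL is not needed.
explicit = re.compile(r'macro(?:[^o]+|o(?!rcam))*orcam')

lazy_matches = lazy.findall(s2)
explicit_matches = explicit.findall(s2)

print(len(lazy_matches), len(explicit_matches))
print(lazy_matches == explicit_matches)
```

Both patterns stop at the first 'orcam' after each 'macro', so they should agree on any input, not just this one.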