Help with regular expression using findall and .*?

Harvey Thomas hst at empolis.co.uk
Fri Sep 13 03:41:59 EDT 2002


czrpb wrote
> 
> Could anyone help out with rewriting (still using regular expressions)
> the following so that it does not cause an exception:
> 
> import re
> 
> s1=('macro\n'+'a'*200+'\norcam\n')*10
> s2=('macro\n'+'a'*20000+'\norcam\n')*10
> 
> p=re.compile(r'macro.*?orcam',re.DOTALL)
> 
> for x in re.findall(p,s1):
>     print x
> 
> for x in re.findall(p,s2):
>     print x
> 
> thanks!! Quentin Crain
> 

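You need to be very careful about using .*? because the engine "only" allows 10,000 backtracks. With 200 characters between macro and orcam (s1) you stay well under that limit; with 20,000 (s2) you go past it, which is presumably where your exception comes from.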

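Try this instead. It avoids .*? by consuming runs of non-'o' characters in one go and allowing an 'o' only when it does not begin 'orcam'. Since [^o] also matches newlines, re.DOTALL is not needed: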

p = re.compile('macro(?:[^o]+|o(?!rcam))*orcam')
for x in p.findall(s2):
    print x
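
If you want to convince yourself that the rewritten pattern returns the same matches as the original, a quick check along these lines should do. This is only a sketch: the names lazy, explicit, short_matches and long_matches are mine, not from your script, and I have used the print(...) form with a single argument so it runs on both older and newer interpreters.

```python
import re

# Test strings from the original question.
s1 = ('macro\n' + 'a' * 200 + '\norcam\n') * 10
s2 = ('macro\n' + 'a' * 20000 + '\norcam\n') * 10

# Original lazy pattern, and the rewrite that avoids .*?
# (variable names are illustrative assumptions).
lazy = re.compile(r'macro.*?orcam', re.DOTALL)
explicit = re.compile(r'macro(?:[^o]+|o(?!rcam))*orcam')

short_matches = explicit.findall(s1)
long_matches = explicit.findall(s2)

# On the short input both patterns should agree exactly.
assert short_matches == lazy.findall(s1)

print(len(short_matches))    # number of macro...orcam blocks found in s1
print(len(long_matches))     # number of macro...orcam blocks found in s2
print(len(long_matches[0]))  # length of the first block found in s2
```

Each block in s2 is 'macro', a newline, 20,000 'a's, a newline and 'orcam', so the last figure should come out as 20012.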

HTH

Harvey
