[Python-Dev] (no subject)

MRAB python at mrabarnett.plus.com
Tue Dec 26 15:15:18 EST 2017


On 2017-12-26 07:01, Yogev Hendel wrote:
> 
> I don't know if this is the right place to put this,
> but I've found the following lines of code results in an incredibly long 
> processing time.
> Perhaps it might be of use to someone.
> 
> /import re/
> /pat = re.compile('^/?(?:\\w+)/(?:[%\\w-]+/?)+/?$')/
> /pat.match('/t/a-aa-aa-aaaaa-aa-aa-aa-aa-aa-aa./')/
> 
The pattern has a repeated repeat, which results in catastrophic 
backtracking.

As an example, think about how the pattern (?:a+)+b would try to match 
the string 'aaac'.

     Match 'aaa', but not 'c'.

     Match 'aa' and 'a', but not 'c'.

     Match 'a' and 'aa', but not 'c'.

     Match 'a' and 'a' and 'a', but not 'c'.

That's 4 failed attempts.

Now try match the string 'aaaac'.

     Match 'aaaa', but not 'c'.

     Match 'aaa' and 'a', but not 'c'.

     Match 'aa' and 'aa', but not 'c'.

     Match 'aa' and 'a a', but not 'c'.

     Match 'a' and 'aaa', but not 'c'.

     Match 'a' and 'aa' and 'a', but not 'c'.

     Match 'a' and 'a aa', but not 'c'.

     Match 'a' and 'a a' and 'a', but not 'c'.

That's 8 failed attempts.

Each additional 'a' in the string to match will double the number of 
attempts.

Your pattern has (?:[%\w-]+/?)+, and the '/' is optional. The string has 
a '.', which the pattern can't match, but it'll keep trying until it 
finally fails.

If you add just 1 more 'a' or '-' to the string, it'll take twice as 
long as it does now.

You need to think more carefully about how the pattern matches and what 
it'll do when it doesn't match.


More information about the Python-Dev mailing list