re bug

Gustavo Niemeyer niemeyer at conectiva.com
Tue Oct 5 13:22:20 EDT 2004


> '''
> (?:^|\ )
> (?P<ALLES>
>    (?:
>       [^/ ]*/[^/ ]*/
>       (?:                 # why a group?
>          cn
>          (?: :[^/ #]+ )*
[...]

That's not the same regular expression. You must escape whitespace
when using re.VERBOSE; unescaped whitespace outside a character
class is ignored.
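
A minimal sketch of the whitespace point, using a made-up pattern
("foo bar") rather than the OP's expression. It assumes only the
standard re module; the variable names are illustrative.

```python
import re

text = "foo bar"

# Without re.VERBOSE the space in the pattern is a literal space.
plain = re.compile(r"foo bar")

# With re.VERBOSE the unescaped space is dropped, so this is really "foobar".
verbose_unescaped = re.compile(r"foo bar", re.VERBOSE)

# Two ways to keep the space under re.VERBOSE: escape it, or put it
# inside a character class (whitespace there is preserved).
verbose_escaped = re.compile(r"foo\ bar", re.VERBOSE)
verbose_class = re.compile(r"foo[ ]bar", re.VERBOSE)

for name, rx in [("plain", plain),
                 ("verbose, unescaped", verbose_unescaped),
                 ("verbose, escaped", verbose_escaped),
                 ("verbose, char class", verbose_class)]:
    print(name, "->", bool(rx.search(text)))
```

Only the "verbose, unescaped" pattern fails to find "foo bar"; it
matches "foobar" instead.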

> Judging from the number of '*' and '+' quantifiers, the long search
> time may be due to excessive backtracking as the regexp engine tries
> to find a match.

The problem is not just the number of repeating quantifiers, but
the nesting of them. Nesting repeating quantifiers deeply is a good
way to kill backtracking regular expression engines.

Try this example:

re.search("a(((.)*c)*d)*e", "abcdf"*20)

Also, if you're curious enough, try replacing 20 with 10, and then
increasing it one at a time.
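
A rough way to watch the growth without hanging the interpreter is
to time the search for small repeat counts only. This is a sketch:
time_search is a helper name I made up, the counts stop at 8 on
purpose, and the actual timings depend on the machine and the
Python version.

```python
import re
import time

# Same pattern as above; the text never contains "e", so every
# search fails only after the engine has exhausted its alternatives.
PATTERN = re.compile(r"a(((.)*c)*d)*e")

def time_search(n):
    """Search "abcdf" repeated n times; return (match, seconds)."""
    text = "abcdf" * n
    start = time.perf_counter()
    match = PATTERN.search(text)
    return match, time.perf_counter() - start

# Deliberately small counts; raise the upper bound one step at a time.
for n in range(2, 9, 2):
    match, elapsed = time_search(n)
    print(f"n={n:2d}  matched={match is not None}  {elapsed:.6f}s")
```

Each extra repetition multiplies the work by a roughly constant
factor, which is why 20 is effectively out of reach.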

Btw, the only reason that the OP's expression didn't get stuck
in the first two test cases is that the expression matched.
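
To illustrate with the toy pattern (not the OP's expression): if the
text ends in "e", the engine's first greedy attempt succeeds and the
search returns at once, even with 20 repetitions. The strings below
are my own test data, and the failing case is kept short on purpose.

```python
import re

pattern = re.compile(r"a(((.)*c)*d)*e")

# Matching case: 19 copies of "abcdf" followed by "abcde".
# The greedy first attempt reaches the final "e", so no deep
# backtracking is needed.
matching_text = "abcdf" * 19 + "abcde"
hit = pattern.search(matching_text)
print("match found:", hit is not None)

# Failing case, kept to 8 repetitions so it finishes quickly.
# With no "e" anywhere, every alternative must be tried and rejected.
failing_text = "abcdf" * 8
miss = pattern.search(failing_text)
print("match found:", miss is not None)
```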

-- 
Gustavo Niemeyer
http://niemeyer.net


