Wanted: slow regexes

Tim Chase python.list at tim.thechases.com
Mon Dec 6 07:53:46 EST 2010


On 12/05/2010 10:08 PM, MRAB wrote:
> I'm looking for examples of regexes which are slow (especially those
> which seem never to finish) but whose results are known. I already have
> those reported in the bug tracker, but further ones will be welcome.
>
> This is for testing additional modifications to the new regex
> implementation (available on PyPI).

There was a DoS security issue in Django about a year back (fixed
the day it came to light in changeset 11603), triggered by a
regexp with a lot of backtracking:

http://code.djangoproject.com/changeset/11603

which tried to match

email_re = re.compile(
    r"(^[-!#$%&'*+/=?^_`{}|~0-9A-Z]+(\.[-!#$%&'*+/=?^_`{}|~0-9A-Z]+)*"  # dot-atom
    r'|^"([\001-\010\013\014\016-\037!#-\[\]-\177]|\\[\001-011\013\014\016-\177])*"'  # quoted-string
    r')@(?:[A-Z0-9]+(?:-*[A-Z0-9]+)*\.)+[A-Z]{2,6}$', re.IGNORECASE)  # domain

against

'viewx3dtextx26qx3d@yahoo.comx26latlngx3d15854521645943074058'

(should return None rather than a MatchObject).

Folks were reporting that it was taking >20min to run.
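
If you want to watch the blow-up without waiting 20 minutes, here is a rough sketch. It assumes the nested quantifiers in the domain part of the pattern are what backtrack (my reading, not something the changeset spells out), so it times only that part against short dot-free strings of growing length. The names domain_re and time_failing_match are made up for illustration.

```python
import re
import time

# Domain portion of the pattern quoted above, tried from position 0 only.
# Assumption: the nested quantifiers here are what cause the backtracking.
domain_re = re.compile(
    r'(?:[A-Z0-9]+(?:-*[A-Z0-9]+)*\.)+[A-Z]{2,6}$', re.IGNORECASE)


def time_failing_match(lengths):
    """Return (length, seconds) pairs for inputs that cannot match."""
    results = []
    for n in lengths:
        subject = "a" * n  # no dot anywhere, so the match must fail
        start = time.perf_counter()
        matched = domain_re.match(subject)
        elapsed = time.perf_counter() - start
        assert matched is None
        results.append((n, elapsed))
    return results


if __name__ == "__main__":
    # Keep the lengths small: the cost is expected to grow very quickly.
    for n, seconds in time_failing_match(range(10, 19, 2)):
        print(f"{n:2d} chars: {seconds:.4f}s")
```

The timings will vary by machine and Python version, so treat them as relative. The thing to look for is how quickly the time climbs as the input gets a couple of characters longer.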

-tkc





