[Baypiggies] More dramatic material?

Andrew Dalke dalke at dalkescientific.com
Thu Feb 25 00:03:00 CET 2010


On Feb 24, 2010, at 12:41 PM, David Creemer wrote:
> There is an old feature request bug to get this into the Python library:
> 
> http://bugs.python.org/issue1662581

While it's a request, the pain is mostly theoretical and arises in three main cases:

  - regular expressions written by someone who have no clue about
     what's going on (image writing regular expressions to pattern
     match entire tag groups in a multi-MB text field). These are
     often best solved through some other means, like rewriting
     the regexp so it doesn't need any backtracking.

  - people trying to push regular expressions to do bizarre things
    (like match only against input strings of length which is
    not a prime). These don't occur in real life.

  - people worried about denial-of-security attack through user-defined
    regexps. They want to remove regexp support entirely.

The solution (using the Thompson algorithm) has some costs. It does not allow back-references and it does not allow grouping. Consider the pattern for the algorithm mentioned in the second of these (match a string of 1:s of non-prime length):

  /^1?$|^(11+?)\1+$/

The Thompson algorithm cannot implement it, because the Thompson algorithm only supports "true" (in the computer science sense) regular expressions, and those cannot do primality testing.

It's possible to detect input patterns which don't need grouping and back-references, but that's extra implementation overhead, and the result would not change these three cases.


				Andrew
				dalke at dalkescientific.com




More information about the Baypiggies mailing list