[Python-Dev] regexp in Python

Mike Klaas mike.klaas at gmail.com
Fri Mar 23 18:59:34 CET 2007


On 3/23/07, Fredrik Lundh <fredrik at pythonware.com> wrote:
> Bartlomiej Wolowiec wrote:
>
> > For some time I'm interested in regular expressions and Finite State Machine.
> > Recently, I saw that Python uses "Secret Labs' Regular Expression Engine",
> > which very often works too slow. Its pesymistic time complexity is O(2^n),
> > although other solutions, with time complexity O(n*m) ( O(n*m^2), m is the
> > length of the regular expression and n is the length of the text,
> > introduction to problem: http://swtch.com/~rsc/regexp/regexp1.html )
>
> that article almost completely ignores all the subtle capturing and left-
> to-right semantics that a perl-style engine requires, though.  trust me,
> this is a much larger can of worms than you might expect.  but if you're
> ready to open it, feel free to hack away.
>
> > major part of regular expressions
>
> the contrived example you used has nothing whatsoever to do with
> "major part of regular expressions" as seen in the wild, though.  I'd
> be much more interested in optimizations that focuses on patterns
> you've found in real code.

A fruitful direction that is not as ambitious as re-writing the entire
engine would be to add "independent group" assertions to python's RE
syntax [ (?> ... ) in perl].  They are rather handy for optimizing the
malperforming cases alluded to here (which rarely occur as the OP
posted, but tend to crop up in less malignant forms).

-Mike


More information about the Python-Dev mailing list