[Python-Dev] regexp in Python
Mike Klaas
mike.klaas at gmail.com
Fri Mar 23 18:59:34 CET 2007
On 3/23/07, Fredrik Lundh <fredrik at pythonware.com> wrote:
> Bartlomiej Wolowiec wrote:
>
> > For some time I'm interested in regular expressions and Finite State Machine.
> > Recently, I saw that Python uses "Secret Labs' Regular Expression Engine",
> > which very often works too slow. Its pesymistic time complexity is O(2^n),
> > although other solutions, with time complexity O(n*m) ( O(n*m^2), m is the
> > length of the regular expression and n is the length of the text,
> > introduction to problem: http://swtch.com/~rsc/regexp/regexp1.html )
>
> that article almost completely ignores all the subtle capturing and left-
> to-right semantics that a perl-style engine requires, though. trust me,
> this is a much larger can of worms than you might expect. but if you're
> ready to open it, feel free to hack away.
>
> > major part of regular expressions
>
> the contrived example you used has nothing whatsoever to do with
> "major part of regular expressions" as seen in the wild, though. I'd
> be much more interested in optimizations that focuses on patterns
> you've found in real code.
A fruitful direction that is not as ambitious as re-writing the entire
engine would be to add "independent group" assertions to python's RE
syntax [ (?> ... ) in perl]. They are rather handy for optimizing the
malperforming cases alluded to here (which rarely occur as the OP
posted, but tend to crop up in less malignant forms).
-Mike
More information about the Python-Dev
mailing list