[Python-Dev] Re: regexp performance
Andrew M. Kuchling
akuchlin@mems-exchange.org
Wed, 10 Nov 1999 09:56:16 -0500 (EST)
[Cc'ed to the String-SIG; sheesh, what's the point of having SIGs
otherwise?]
Fredrik Lundh writes:
>any special pattern constructs that are in need of per-
>formance improvements? (compared to Perl, that is).
In the 1.5 source tree, I think one major slowdown is coming from the
malloc'ed failure stack. This was introduced in order to prevent an
expression like (x)* from filling the stack when applied to a string
contained 50,000 'x' characters (hence 50,000 recursive function
calls). I'd like to get rid of this stack because it's slow and
requires much tedious patching of the upstream PCRE.
>or maybe anyone has an extensive performance test
>suite for perlish regular expressions? (preferrably based
>on how real people use regular expressions, not only on
>things that are known to be slow if not optimized)
Friedl's book describes several optimizations which aren't implemented
in PCRE. The problem is that PCRE never builds a parse tree, and
parse trees are easy to analyse recursively. Instead, PCRE's
functions actually look at the compiled byte codes (for example, look
at find_firstchar or is_anchored in pypcre.c), but this makes analysis
functions hard to write, and rearranging the code near-impossible.
--
A.M. Kuchling http://starship.python.net/crew/amk/
I didn't say it was my fault. I said it was my responsibility. I know the
difference.
-- Rose Walker, in SANDMAN #60: "The Kindly Ones:4"