[Python-Dev] Re: regexp performance

Andrew M. Kuchling akuchlin@mems-exchange.org
Wed, 10 Nov 1999 09:56:16 -0500 (EST)


[Cc'ed to the String-SIG; sheesh, what's the point of having SIGs
otherwise?]

Fredrik Lundh writes:
>any special pattern constructs that are in need of per-
>formance improvements?  (compared to Perl, that is).

In the 1.5 source tree, I think one major slowdown is coming from the
malloc'ed failure stack.  This was introduced in order to prevent an
expression like (x)* from filling the stack when applied to a string
contained 50,000 'x' characters (hence 50,000 recursive function
calls).  I'd like to get rid of this stack because it's slow and
requires much tedious patching of the upstream PCRE.

>or maybe anyone has an extensive performance test
>suite for perlish regular expressions?  (preferrably based
>on how real people use regular expressions, not only on
>things that are known to be slow if not optimized)

Friedl's book describes several optimizations which aren't implemented
in PCRE.  The problem is that PCRE never builds a parse tree, and
parse trees are easy to analyse recursively.  Instead, PCRE's
functions actually look at the compiled byte codes (for example, look
at find_firstchar or is_anchored in pypcre.c), but this makes analysis
functions hard to write, and rearranging the code near-impossible.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
I didn't say it was my fault. I said it was my responsibility. I know the
difference.
    -- Rose Walker, in SANDMAN #60: "The Kindly Ones:4"