Regex speed

Fri Oct 29 14:35:28 EDT 2004

A.M. Kuchling wrote:
> On Fri, 29 Oct 2004 20:06:05 +0200, 
> 	Reinhold Birkenfeld <reinhold-birkenfeld-nospam at wolke7.net> wrote:
>> re1 = re.compile(r"\s*<.*>\s*")
>> re2 = re.compile(r".*\((.*)\).*")
>> re3 = re.compile(r'^"(.*)"$')
> 
> You should post the actual code, because these substitutions could be made
> more efficient.  For example, why are the bracketing \s* in re1 and the
> bracketing .* in re2 there?  re3 isn't using re.M, so it's equivalent 
> to 'if s.startswith('"') and s.endswith('"')'.

You're right. The sub calls are these:

    fro = re1.sub("", fro)
    fro = re2.sub(r"\1", fro)
    fro = re3.sub(r"\1", fro)

So at least the \s* are justified.
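
For reference, here is a minimal, self-contained sketch of the three substitutions, next to the plain string-method check suggested above for re3. The helper names (clean_with_regexes, strip_quotes) and the sample header values are made up for illustration. The string-method version only matches re3 for single-line strings without a trailing newline, since "$" can also match just before a final newline.

```python
import re

# The three patterns exactly as posted.
re1 = re.compile(r"\s*<.*>\s*")
re2 = re.compile(r".*\((.*)\).*")
re3 = re.compile(r'^"(.*)"$')


def clean_with_regexes(fro):
    """Apply the three sub calls in the order shown in the post."""
    fro = re1.sub("", fro)      # drop an <address> part and surrounding whitespace
    fro = re2.sub(r"\1", fro)   # keep only the text inside parentheses, if any
    fro = re3.sub(r"\1", fro)   # strip one pair of enclosing double quotes
    return fro


def strip_quotes(s):
    """Hypothetical string-method replacement for the re3 step.

    Assumes s is a single line with no trailing newline. The length
    check keeps a lone '"' unchanged, as re3 would.
    """
    if len(s) >= 2 and s.startswith('"') and s.endswith('"'):
        return s[1:-1]
    return s


if __name__ == "__main__":
    # Sample inputs are invented, not taken from the original data.
    for sample in ('"John Doe" <jdoe@example.com>',
                   'jdoe@example.com (John Doe)'):
        print(repr(sample), "->", repr(clean_with_regexes(sample)))
```

The re1 and re2 steps are left as regexes here, because dropping the bracketing parts changes what sub() replaces.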

But my actual question is why Perl can run the same regexes in what
seems like no time at all.
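
To put a number on the Python side, something like the following rough timing sketch could be used. The sample line, the function names and the repeat count are assumptions rather than the real data or code, so the result only hints at the per-call cost.

```python
import re
import timeit

# Same patterns as posted, compiled once up front.
re1 = re.compile(r"\s*<.*>\s*")
re2 = re.compile(r".*\((.*)\).*")
re3 = re.compile(r'^"(.*)"$')

# Invented sample input; real header lines may behave differently.
SAMPLE = 'jdoe@example.com (John Doe)'


def run_subs(fro):
    """Run the three substitutions on one string."""
    fro = re1.sub("", fro)
    fro = re2.sub(r"\1", fro)
    fro = re3.sub(r"\1", fro)
    return fro


def time_subs(number=100000):
    """Return the total seconds taken by `number` calls of run_subs."""
    return timeit.timeit(lambda: run_subs(SAMPLE), number=number)


if __name__ == "__main__":
    n = 100000
    total = time_subs(n)
    print("%d calls: %.3f s (%.2f us per call)" % (n, total, total / n * 1e6))
```

Timing the Perl version over the same input and the same number of iterations would make the comparison fair, since interpreter startup and file I/O are then kept out of the measurement.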

>> So my question is: Why is the re module implemented in pure Python?
>> Isn't it possible to integrate it into the core or rewrite it in C?
> 
> The regex engine *is* implemented in C; look at Modules/_sre.c.

But /usr/lib/python2.3/sre*.py are relatively large for that; what's in
there?

>> Is there a Python interface for the PCRE library out there?
> 
> PCRE was used from versions 1.5 up to 2.3; it'll be gone in Python 2.4.  You
> could try 'import pre' to use it, but I don't expect it to be significantly
> faster.

You're right again. Is the pre module using the PCRE C library?

Reinhold

-- 
[Windows is like] the railway: you don't have to take care of anything,
you pay a surcharge for every little thing, the service is lousy,
strangers can board at any time, and it is inflexible and incompatible
with all other means of transport.  -- Florian Diesch in dcoulm