Regex speed
Reinhold Birkenfeld
reinhold-birkenfeld-nospam at wolke7.net
Fri Oct 29 14:35:28 EDT 2004
A.M. Kuchling wrote:
> On Fri, 29 Oct 2004 20:06:05 +0200,
> Reinhold Birkenfeld <reinhold-birkenfeld-nospam at wolke7.net> wrote:
>> re1 = re.compile(r"\s*<.*>\s*")
>> re2 = re.compile(r".*\((.*)\).*")
>> re3 = re.compile(r'^"(.*)"$')
>
> You should post the actual code, because these substitutions could be made
> more efficient. For example, why are the bracketing \s* in re1 and the
> bracketing .* in re2 there? re3 isn't using re.M, so it's equivalent
> to 'if s.startswith('"') and s.endswith('"')'.
You're right. The sub calls are these:
fro = re1.sub("", fro)
fro = re2.sub(r"\1", fro)
fro = re3.sub(r"\1", fro)
So at least the \s* are justified.
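To make the effect of the three calls concrete, here is a minimal sketch that applies them in order. The sample header values and the helper names clean_from and strip_quotes are made up for illustration and are not from my actual code. strip_quotes is the string-method form of the re3 step suggested above, assuming the value contains no newline.

```python
import re

# The three patterns from the original post.
re1 = re.compile(r"\s*<.*>\s*")      # drops an "<addr>" part plus surrounding whitespace
re2 = re.compile(r".*\((.*)\).*")    # keeps only the text inside "( ... )"
re3 = re.compile(r'^"(.*)"$')        # strips one pair of enclosing double quotes


def clean_from(fro):
    """Apply the three substitutions in the order used in the post."""
    fro = re1.sub("", fro)
    fro = re2.sub(r"\1", fro)
    fro = re3.sub(r"\1", fro)
    return fro


def strip_quotes(s):
    """String-method alternative to the re3 step (hypothetical helper).

    Assumes s has no embedded or trailing newline; the length check
    keeps a lone '"' untouched, as the regex does.
    """
    if len(s) >= 2 and s.startswith('"') and s.endswith('"'):
        return s[1:-1]
    return s


# Hypothetical sample values, not taken from real mail.
print(clean_from('"John Doe" <john@example.com>'))
print(clean_from('john@example.com (John Doe)'))
```

Both sample values reduce to the bare display name. The \s* around the bracketed part is what removes the space left between the name and the address.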
But my actual question is why Perl can run the same regexes in what
seems like no time at all.
>> So my question is: Why is the re module implemented in pure Python?
>> Isn't it possible to integrate it into the core or rewrite it in C?
>
> The regex engine *is* implemented in C; look at Modules/_sre.c.
But /usr/lib/python2.3/sre*.py are relatively large for that; what's in
there?
>> Is there a Python interface for the PCRE library out there?
>
> PCRE was used from versions 1.5 up to 2.3; it'll be gone in Python 2.4. You
> could try 'import pre' to use it, but I don't expect it to be significantly
> faster.
You're right again. Is the pre module using the PCRE C library?
Reinhold
--
[Windows is like] the railway: you don't have to worry about anything,
you pay a surcharge for every little thing, the service is lousy,
strangers can board at any time, and it is inflexible and incompatible
with all other means of transport. -- Florian Diesch in dcoulm