Regex speed

A.M. Kuchling amk at amk.ca
Fri Oct 29 14:18:39 EDT 2004


On Fri, 29 Oct 2004 20:06:05 +0200, 
	Reinhold Birkenfeld <reinhold-birkenfeld-nospam at wolke7.net> wrote:
> re1 = re.compile(r"\s*<.*>\s*")
> re2 = re.compile(r".*\((.*)\).*")
> re3 = re.compile(r'^"(.*)"$')

You should post the actual code, because these substitutions could be made
more efficient.  For example, why are the bracketing \s* in re1 and the
bracketing .* in re2 there?  re3 isn't using re.M, so it's essentially
equivalent to the test: s.startswith('"') and s.endswith('"').
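
As a rough sketch of that last point (the helper names unquote_re and
unquote_str are made up for illustration, not taken from the original code),
the two approaches can be compared side by side.  They agree on ordinary
single-line input, but not everywhere: a lone '"' passes both string tests, so
the sketch adds a length check, and strings containing a newline are treated
differently because '.' does not match a newline and '$' also matches just
before a trailing one.

```python
import re

re3 = re.compile(r'^"(.*)"$')

def unquote_re(s):
    """Return the text between the outer quotes using re3, else None."""
    m = re3.match(s)
    return m.group(1) if m else None

def unquote_str(s):
    """Same idea with plain string methods (hypothetical replacement)."""
    # The length check rejects a lone '"', which would otherwise satisfy
    # both startswith() and endswith().
    if len(s) >= 2 and s.startswith('"') and s.endswith('"'):
        return s[1:-1]
    return None

if __name__ == "__main__":
    for sample in ['"abc"', 'abc', '""', '"', '"a"b"']:
        print(repr(sample), unquote_re(sample), unquote_str(sample))
```

Whether the string version is actually faster for your data is something to
measure (the timeit module is convenient for that) rather than assume.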

> So my question is: Why is the re module implemented in pure Python?
> Isn't it possible to integrate it into the core or rewrite it in C?

The regex engine *is* implemented in C; look at Modules/_sre.c.

> Is there a Python interface for the PCRE library out there?

PCRE was the engine behind re in Python 1.5, and it has remained available
as the 'pre' module through 2.3; it'll be gone in Python 2.4.  You could try
'import pre' to use it, but I don't expect it to be significantly faster.

--amk


