Regex speed
Andrew Dalke
adalke at mindspring.com
Fri Oct 29 23:57:23 EDT 2004
Reinhold Birkenfeld wrote:
> re1 = re.compile(r"\s*<.*>\s*")
> re2 = re.compile(r".*\((.*)\).*")
> re3 = re.compile(r'^"(.*)"$')
BTW, do you want those, or these?
re1 = re.compile(r"\s*<[^>]*>\s*")
re2 = re.compile(r".*\(([^)]*)\).*")
(For the last one, re3, it doesn't make much difference: there will
only be a single backtrack.)
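Here is a small sketch that makes the difference concrete. The sample string is made up. On a line with two tags, the greedy `.*` runs to the end of the string and backtracks to the last `>`. The negated class `[^>]*` cannot cross a `>`, so it stops at the first one.

```python
import re

greedy = re.compile(r"\s*<.*>\s*")     # Reinhold's re1
negated = re.compile(r"\s*<[^>]*>\s*")  # negated-class variant

text = " <a> some text <b> "  # hypothetical input with two tags

# ".*" consumes everything, then backtracks to the LAST ">".
print(repr(greedy.match(text).group()))   # -> ' <a> some text <b> '
# "[^>]*" stops at the FIRST ">".
print(repr(negated.match(text).group()))  # -> ' <a> '
```

Besides giving a different answer, the negated class avoids scanning to the end of a long line and then backtracking.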
For that matter, what about
re2 = re.compile(r"\(([^)]*)\)")
then using re2.search instead of re2.match?
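The sketch below compares the two forms. The helper names via_match and via_search are hypothetical. The two agree when the text holds a single parenthesized pair, but not when it holds several. The leading greedy `.*` backtracks only as far as the last `(`, so match() captures the last pair, while search() captures the first.

```python
import re

anchored = re.compile(r".*\(([^)]*)\).*")  # used with match()
bare = re.compile(r"\(([^)]*)\)")          # used with search()

def via_match(text):
    m = anchored.match(text)
    return m.group(1) if m else None

def via_search(text):
    m = bare.search(text)
    return m.group(1) if m else None

print(via_match("f(x) + 1"), via_search("f(x) + 1"))    # -> x x
# Several pairs: the greedy leading ".*" picks the last one.
print(via_match("a(b)c(d)e"), via_search("a(b)c(d)e"))  # -> d b
```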
> So my question is: Why is the re module implemented in pure Python?
> Isn't it possible to integrate it into the core or rewrite it in C?
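It isn't. The matching engine (the _sre extension module) is written
in C. Only the pattern parser and compiler (sre_parse.py and
sre_compile.py) are Python, and they run once per compiled pattern.
I've not done timing tests between Perl's and Python's engines for a
long time, so I can't provide feedback on that aspect.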
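If you want numbers, here is a minimal Python-side sketch using the standard timeit module. It times re1 against the negated-class variant on a made-up sample line and prints sys.version, so results from different versions can be compared. The sample string and the bench helper are hypothetical. Absolute timings depend on the machine and the Python version, and the Perl side would need its own harness.

```python
import re
import sys
import timeit

re1 = re.compile(r"\s*<.*>\s*")
re1_negated = re.compile(r"\s*<[^>]*>\s*")

# Hypothetical sample line; substitute lines from the real input.
sample = "   <" + "x" * 200 + "> trailing text  "

def bench(pattern, text, number=20_000):
    """Best of three runs, in seconds per match() call."""
    timer = timeit.Timer(lambda: pattern.match(text))
    return min(timer.repeat(repeat=3, number=number)) / number

print(sys.version)
for name, pat in [("re1", re1), ("re1_negated", re1_negated)]:
    print(f"{name}: {bench(pat, sample) * 1e6:.3f} microseconds per match")
```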
One thing about Python is that we tend to use regexps less often
than Perl programmers do, since string methods such as find() often
do the job. For example, you may be able to use
def find_text_in_matching_pairs(text, start_c="<", end_c=">"):
    # Text between the first start_c and the next end_c after it
    i = text.find(start_c)
    if i == -1:
        return None
    j = text.find(end_c, i + 1)
    if j == -1:
        return None
    return text[i+1:j]
If you instead want what your original regexp says (everything up
to the last end character), use
def find_text_in_matching_pairs(text, start_c="<", end_c=">"):
    i = text.find(start_c)
    if i == -1:
        return None
    j = text.rfind(end_c)
    if j < i:  # includes 'j == -1' on find failure
        return None
    return text[i+1:j]
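The sketch below checks that the two str.find versions agree with their regex counterparts. The first version corresponds to `<([^>]*)>` and the second to the greedy `<(.*)>`, both run through re.search. The names first_pair, outermost_pair and regex_group are hypothetical, because the two originals share one name. The samples are single-line on purpose: `.` does not match a newline unless re.DOTALL is given, so the greedy regex and the rfind version can differ on multi-line text.

```python
import re

def first_pair(text, start_c="<", end_c=">"):
    """Text between the first start_c and the next end_c after it."""
    i = text.find(start_c)
    if i == -1:
        return None
    j = text.find(end_c, i + 1)
    if j == -1:
        return None
    return text[i + 1:j]

def outermost_pair(text, start_c="<", end_c=">"):
    """Text between the first start_c and the last end_c (greedy, like .*)."""
    i = text.find(start_c)
    if i == -1:
        return None
    j = text.rfind(end_c)
    if j < i:  # also covers j == -1
        return None
    return text[i + 1:j]

def regex_group(pattern, text):
    m = re.search(pattern, text)
    return m.group(1) if m else None

samples = ["<a>", "<>", "x <a> y <b> z", "no tags", "> <", "<unclosed"]
for s in samples:
    assert first_pair(s) == regex_group(r"<([^>]*)>", s), s
    assert outermost_pair(s) == regex_group(r"<(.*)>", s), s
print("str.find versions agree with the regex versions")
```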

def find1(text):
    return find_text_in_matching_pairs(text, "<", ">")

def find2(text):
    return find_text_in_matching_pairs(text, "(", ")")

def find3(text):
    # A lone '"' starts and ends with '"', but re3 would not match it
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    return None
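A similar sketch compares find3 against re3. The sample strings are made up. The agreement holds for single-line input only. First, `$` also matches just before a trailing newline, so a line read from a file with its "\n" still attached matches re3 but not find3. Second, `.` does not cross embedded newlines. Stripping the line first, or anchoring with `\Z`, avoids the first difference.

```python
import re

re3 = re.compile(r'^"(.*)"$')

def find3(text):
    # len() check: a lone '"' starts and ends with '"' but re3 rejects it.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    return None

def find3_regex(text):
    m = re3.match(text)
    return m.group(1) if m else None

for s in ['"quoted"', '""', '"', 'plain', '"open', 'say "hi"']:
    print(repr(s), find3(s), find3_regex(s))
    assert find3(s) == find3_regex(s)
```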
> Is there a Python interface for the PCRE library out there?
Python used to use PCRE instead of its current sre, back
in the 1.5 days. Python 1.6/2.x switched to sre in part
because of the need for Unicode support.
The old benchmarks compared pcre and sre and found that
sre was faster. See
http://groups.google.com/groups?oi=djq&selm=an_588925502
Which versions of Python and Perl are you using for
the tests? I know there has been some non-trivial work
for the 2.3 version of Python.
Andrew
dalke at dalkescientific.com