Regex speed

Reinhold Birkenfeld reinhold-birkenfeld-nospam at wolke7.net
Sat Oct 30 05:16:52 EDT 2004


Andrew Dalke wrote:
> Reinhold Birkenfeld wrote:
>> re1 = re.compile(r"\s*<.*>\s*")
>> re2 = re.compile(r".*\((.*)\).*")
>> re3 = re.compile(r'^"(.*)"$')
> 
> BTW, do you want those or
>   re1 = re.compile(r"\s*<[^>]*>\s*")
>   re2 = re.compile(r".*\(([^)]*)\).*")
> 
> (For the last it doesn't make much difference.  There will only be
> a single backtrack.)
> 
> For that matter, what about
>   re2 = re.compile(r"\(([^)]*)\)")
> then using re2.search instead of re2.match?
> 
>> So my question is: Why is the re module implemented in pure Python?
>> Isn't it possible to integrate it into the core or rewrite it in C?
> 
> It isn't.  It's written in C.  I've not done timing tests
> between Perl and Python's engines for a long time, so I can't
> provide feedback on that aspect.
> 
> One thing about Python is that we tend to use regexps less
> often than Perl.  For example, you may be able to use
> 
> def find_text_in_matching_pairs(text, start_c = "<", end_c = ">"):
>    i = text.find(start_c)
>    if i == -1:
>      return None
>    j = text.find(end_c, i + 1)
>    if j == -1:
>      return None
>    return text[i+1:j]

OK, thank you. I have now gotten rid of all the regexes, and - surprise,
surprise - the speeds are almost equal. The bitter part is that the
Python version is now twelve lines longer, and the extra lines don't
make the code any easier to understand.
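
For illustration, here is a small sketch of how the variants quoted above
behave. The sample strings are made up for this example, and the helper is
the quoted function with the slice written as i + 1; none of this is the
actual code from my program.

```python
import re

# Greedy ".*" versus the negated character class suggested above.
greedy = re.compile(r"\s*<.*>\s*")
negated = re.compile(r"\s*<[^>]*>\s*")

sample = "<a> and <b>"  # made-up input
greedy_hit = greedy.match(sample).group()    # runs to the LAST ">"
negated_hit = negated.match(sample).group()  # stops at the FIRST ">"

# match() with leading/trailing ".*" versus search() with a bare pattern.
re2_match = re.compile(r".*\(([^)]*)\).*")
re2_search = re.compile(r"\(([^)]*)\)")

parens = "foo (bar) baz (qux)"  # made-up input
last_pair = re2_match.match(parens).group(1)     # leading ".*" is greedy
first_pair = re2_search.search(parens).group(1)  # leftmost match wins


def find_text_in_matching_pairs(text, start_c="<", end_c=">"):
    """Regex-free variant: text between the first start_c and the next end_c."""
    i = text.find(start_c)
    if i == -1:
        return None
    j = text.find(end_c, i + 1)
    if j == -1:
        return None
    return text[i + 1:j]


print(repr(greedy_hit), repr(negated_hit))
print(last_pair, first_pair)
print(find_text_in_matching_pairs("say <hello> there"))
```

Note that the variants are not drop-in replacements for each other once
an input contains more than one pair of delimiters: the greedy forms pick
up the last pair, the negated-class and search forms the first.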

So the Perl regex engine seems to be highly optimized, at least for
simple expressions.
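
A rough way to compare the two approaches inside Python itself is the
timeit module. The following is only a sketch: the sample line, the
function names, and the repeat count are assumptions made for this
example, it is written for a current Python 3 rather than 2.3, and the
numbers it prints will vary by machine and Python version.

```python
import re
import timeit

PAREN_RE = re.compile(r"\(([^)]*)\)")


def with_regex(text):
    """Extract the first parenthesised text using the compiled regex."""
    m = PAREN_RE.search(text)
    return m.group(1) if m else None


def with_find(text):
    """Extract the same text using only str.find and slicing."""
    i = text.find("(")
    if i == -1:
        return None
    j = text.find(")", i + 1)
    if j == -1:
        return None
    return text[i + 1:j]


sample = "John Doe (jdoe) <jdoe@example.invalid>"  # hypothetical input line

# Both approaches should agree before their timings are compared.
assert with_regex(sample) == with_find(sample)

regex_time = timeit.timeit(lambda: with_regex(sample), number=100000)
find_time = timeit.timeit(lambda: with_find(sample), number=100000)
print("regex: %.3f s   str.find: %.3f s" % (regex_time, find_time))
```

This says nothing about Perl's engine; it only shows how the regex and
string-method versions compare within one Python process.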

>> Is there a Python interface for the PCRE library out there?
> 
> Python used to use PCRE instead of its current sre, back
> in the 1.5 days.  Python 1.6/2.x switched to sre in part
> because of the need for Unicode support.
> 
> The old benchmarks compared pcre and sre and found that
> sre was faster.  See
>    http://groups.google.com/groups?oi=djq&selm=an_588925502
> 
> Which versions of Python and Perl are you using for
> the tests?  I know there has been some non-trivial work
> for the 2.3 version of Python.

I used 2.3.4.

Reinhold

-- 
[Windows is like] the railway: you don't have to take care of anything,
you pay a surcharge for every little thing, the service is lousy,
strangers can board at any time, and it is inflexible and incompatible
with all other means of transport.  -- Florian Diesch in dcoulm


