New implementation of re module

Mark Lawrence breamoreboy at yahoo.co.uk
Tue Jul 28 18:40:02 EDT 2009


MRAB wrote:
> Hi all,
> 
> I've been working on a new implementation of the re module. The details
> are at http://bugs.python.org/issue2636, specifically from
> http://bugs.python.org/issue2636#msg90954. I've included a .pyd file for
> Python 2.6 on Windows if you want to try it out.
> 
> I'm interested in how fast it is generally, compared with the current re
> module, but especially when faced with those 'pathological' regular
> expressions which seem to take a long time to finish, for example:
> 
>     re.search(r"^(.+|D)*A$", "x" * 25 + "B")
> 
> which on my PC (1.8GHz) takes 18.98secs with the re module but <0.01secs 
> with this new implementation.
I tried this on my 3GHz PC timings pretty much the same.

 From here http://bugs.python.org/issue1721518 I knocked up this.

import time
import re
import regex

s = "Add.1, 2020 and Add.1, 2021-2023, 2025, 2028 and 2029 and Add.1) R"
r = "(?:\s|,|and|Add\S*?|Parts?|\([^\)]*\)|[IV\-\d]+)*$"
t0 = time.clock()
print regex.search(r, s)
t1 = time.clock()
print "time", t1 - t0

print "It's going to crash"
t0 = time.clock()
print re.search(r, s)
t1 = time.clock()
print "It hasn't crashed time", t1 - t0

Output shows a slight change in timing:).

<_regex.RE_Match object at 0x0243A1A0>
time 0.00279001940191
It's going to crash
<_sre.SRE_Match object at 0x024396B0>
It hasn't crashed time 98.4238155967

> 
> TIA

I also got the files bm_regex_effbot.py and bm_regex_v8.py from
http://code.google.com/p/unladen-swallow/source/browse/#svn/tests/performance 
and ran them, then reran them having substituted regex for re.  Output 
timings were roughly effbot re 0.14secs, effbot regex 1.16secs, v8 re 
0.17secs and v8 regex 0.67secs.

HTH.

-- 
Kindest regards.

Mark Lawrence.




More information about the Python-list mailing list