New implementation of re module
Mark Lawrence
breamoreboy at yahoo.co.uk
Tue Jul 28 18:40:02 EDT 2009
MRAB wrote:
> Hi all,
>
> I've been working on a new implementation of the re module. The details
> are at http://bugs.python.org/issue2636, specifically from
> http://bugs.python.org/issue2636#msg90954. I've included a .pyd file for
> Python 2.6 on Windows if you want to try it out.
>
> I'm interested in how fast it is generally, compared with the current re
> module, but especially when faced with those 'pathological' regular
> expressions which seem to take a long time to finish, for example:
>
> re.search(r"^(.+|D)*A$", "x" * 25 + "B")
>
> which on my PC (1.8GHz) takes 18.98secs with the re module but <0.01secs
> with this new implementation.
I tried this on my 3GHz PC timings pretty much the same.
From here http://bugs.python.org/issue1721518 I knocked up this.
import time
import re
import regex
s = "Add.1, 2020 and Add.1, 2021-2023, 2025, 2028 and 2029 and Add.1) R"
r = "(?:\s|,|and|Add\S*?|Parts?|\([^\)]*\)|[IV\-\d]+)*$"
t0 = time.clock()
print regex.search(r, s)
t1 = time.clock()
print "time", t1 - t0
print "It's going to crash"
t0 = time.clock()
print re.search(r, s)
t1 = time.clock()
print "It hasn't crashed time", t1 - t0
Output shows a slight change in timing:).
<_regex.RE_Match object at 0x0243A1A0>
time 0.00279001940191
It's going to crash
<_sre.SRE_Match object at 0x024396B0>
It hasn't crashed time 98.4238155967
>
> TIA
I also got the files bm_regex_effbot.py and bm_regex_v8.py from
http://code.google.com/p/unladen-swallow/source/browse/#svn/tests/performance
and ran them, then reran them having substituted regex for re. Output
timings were roughly effbot re 0.14secs, effbot regex 1.16secs, v8 re
0.17secs and v8 regex 0.67secs.
HTH.
--
Kindest regards.
Mark Lawrence.
More information about the Python-list
mailing list