re.match() performance
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Thu Dec 18 20:48:59 EST 2008
On Thu, 18 Dec 2008 05:51:33 -0800, Emanuele D'Arrigo wrote:
> I've written the code below to test the differences in performance
...
> ## TIMED FUNCTIONS
> startTime = time.clock()
> for i in range(0, numberOfRuns):
>     re.match(pattern, longMessage)
> patternMatchingTime = time.clock() - startTime
...
You probably don't need to reinvent the wheel: see the timeit module. In
my opinion, the best idiom for timing small code snippets is:
from timeit import Timer
t = Timer("func(arg)", "from __main__ import func, arg")
time_taken = min(t.repeat(number=N))/N
where N will depend on how patient you are, but probably shouldn't be
less than 100. For small enough code snippets, the default of 1000000 is
recommended.
For testing re.match, I didn't have enough patience for one million
iterations, so I used ten thousand.
My results were:
>>> t1 = Timer("re.match(pattern, longMessage)",
... "from __main__ import pattern, re, compiledPattern, longMessage")
>>> t2 = Timer("compiledPattern.match(longMessage)",
... "from __main__ import pattern, re, compiledPattern, longMessage")
>>> t1.repeat(number=10000)
[3.8806509971618652, 3.4309241771697998, 4.2391560077667236]
>>> t2.repeat(number=10000)
[3.5613579750061035, 2.725193977355957, 2.936690092086792]
which were typical over a few runs. That suggests that even with no
effort made to defeat caching, using pre-compiled patterns is
approximately 20% faster than re.match(pattern).
However, over 100,000 iterations that advantage falls to about 10%. Given
that each run took about 30 seconds, I suspect that the results are being
contaminated with some other factor, e.g. networking events or other
processes running in the background. But whatever is going on, 10% or
20%, pre-compiled patterns are slightly faster even with caching --
assuming of course that you don't count the time taken to compile it in
the first place.
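The caching in question can be sketched as follows: re.match(pattern, s)
compiles pattern on first use and keeps the result in the module's
internal cache (which re.purge() empties), so later calls mostly pay a
cache lookup, while a pre-compiled pattern object skips even that. The
pattern and text below are invented for illustration:

```python
import re

pattern = r"(\w+) (\w+)"
text = "hello world"

compiled = re.compile(pattern)

# Both forms produce the same match; they differ only in where the
# compiled pattern object is stored and looked up.
m1 = re.match(pattern, text)   # goes through re's internal cache
m2 = compiled.match(text)      # uses the pattern object directly

print(m1.group(1), m2.group(2))  # -> hello world

# re.purge() clears the cache, so the next re.match() call must
# recompile the pattern; the pre-compiled object is unaffected.
re.purge()
```

This is why the gap between the two forms stays small in the timings
above: after the first iteration, re.match() is paying only the cache
lookup, not a full recompile.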
--
Steven