Performance on local constants?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sat Dec 22 07:18:09 EST 2007


On Sat, 22 Dec 2007 10:53:39 +0000, William McBrine wrote:

> Hi all,
> 
> I'm pretty new to Python (a little over a month). I was wondering -- is
> something like this:
> 
> s = re.compile('whatever')
> 
> def t(whatnot):
>     return s.search(whatnot)
> 
> for i in xrange(1000):
>     print t(something[i])
> 
> significantly faster than something like this:
> 
> def t(whatnot):
>     s = re.compile('whatever')
>     return s.search(whatnot)
> 
> for i in xrange(1000):
>     result = t(something[i])
> 
> ? Or is Python clever enough to see that the value of s will be the same
> on every call, and thus only compile it once?


Let's find out:


>>> import re
>>> import dis
>>>
>>> def spam(x):
...     s = re.compile('nobody expects the Spanish Inquisition!')
...     return s.search(x)
...
>>> dis.dis(spam)
  2           0 LOAD_GLOBAL              0 (re)
              3 LOAD_ATTR                1 (compile)
              6 LOAD_CONST               1 ('nobody expects the Spanish Inquisition!')
              9 CALL_FUNCTION            1
             12 STORE_FAST               1 (s)

  3          15 LOAD_FAST                1 (s)
             18 LOAD_ATTR                2 (search)
             21 LOAD_FAST                0 (x)
             24 CALL_FUNCTION            1
             27 RETURN_VALUE



No. The Python compiler knows nothing about regular expression objects, 
so it emits an ordinary call to re.compile, and that call is executed 
every time the function is called.

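However, the re module keeps its own cache of compiled patterns, so the 
regular expression itself is usually compiled only once regardless. Each 
later call still pays for the attribute lookup, the function call and the 
cache lookup.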
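
A rough way to see the cache at work, assuming CPython and a pattern that 
has not been pushed out of the cache. The names PATTERN, first and second 
are only for illustration, and the identity check relies on an 
implementation detail rather than a documented guarantee. It is written 
with print() as a function so that it also runs on Python 3:

```python
import re

PATTERN = 'nobody expects the Spanish Inquisition!'

# Two separate calls with the same pattern string and flags.
first = re.compile(PATTERN)
second = re.compile(PATTERN)

# On CPython the second call is normally answered from the module's
# internal cache, so both names usually refer to the same object.
# This is an implementation detail, not something to rely on.
print(first is second)

# Either way, the two objects were built from the same pattern.
print(first.pattern == second.pattern)
```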

Here's another approach that avoids the use of a global variable for the 
regular expression:

>>> def spam2(x, s=re.compile('nobody expects the Spanish Inquisition!')):
...     return s.search(x)
...
>>> dis.dis(spam2)
  2           0 LOAD_FAST                1 (s)
              3 LOAD_ATTR                0 (search)
              6 LOAD_FAST                0 (x)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE

What happens now is that the regex is compiled by the RE engine once, 
when the def statement is executed (function definition time, not on 
each call), then stored as the default value for the argument s. If you 
don't supply another value for s when you call the function, the default 
regex is used. If you do, the value you supply is used instead:

>>> spam2("nothing")
>>> spam2("nothing", re.compile('thing'))
<_sre.SRE_Match object at 0xb7c29c28>
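
To check that the default really is evaluated only once, here is a small 
sketch. The helper make_pattern and the list calls are hypothetical, added 
purely to count how often the default expression runs, and the code uses 
print() as a function so it also runs on Python 3:

```python
import re

calls = []  # one entry is appended each time make_pattern() runs

def make_pattern():
    # Hypothetical helper: records the call, then compiles the regex.
    calls.append(1)
    return re.compile('nobody expects the Spanish Inquisition!')

# The default expression make_pattern() is evaluated here, once,
# when the def statement itself is executed.
def spam2(x, s=make_pattern()):
    return s.search(x)

spam2("nothing")
spam2("something")
spam2("nothing", re.compile('thing'))  # caller-supplied regex

# Number of times the default expression was evaluated.
print(len(calls))
```

However many times spam2 is called, make_pattern runs only at the point 
where the function is defined.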


I suspect this will be not only the fastest solution but also the most 
flexible.
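
Rather than take my suspicion on trust, you can measure it. The following 
is a minimal timing sketch, not a careful benchmark: the function names, 
the sample text and the number of repetitions are all my own choices for 
illustration, and the results will vary with the Python version and the 
machine.

```python
import re
import timeit

PATTERN = 'nobody expects the Spanish Inquisition!'
TEXT = 'In fact nobody expects the Spanish Inquisition! Ever.'

_compiled = re.compile(PATTERN)  # module-level (global) regex

def search_global(x):
    # Looks up the precompiled regex as a global on every call.
    return _compiled.search(x)

def search_local(x):
    # Calls re.compile on every call; the re module's cache
    # normally saves the actual recompilation.
    s = re.compile(PATTERN)
    return s.search(x)

def search_default(x, s=re.compile(PATTERN)):
    # The regex is compiled once, when the def statement runs.
    return s.search(x)

def time_all(number=100000):
    # Returns a dict mapping function name to total seconds.
    results = {}
    for func in (search_global, search_local, search_default):
        results[func.__name__] = timeit.timeit(
            lambda: func(TEXT), number=number)
    return results

if __name__ == '__main__':
    for name, seconds in sorted(time_all().items()):
        print('%-15s %.4f s' % (name, seconds))
```

All three functions return the same match. Only the cost of getting hold 
of the compiled regex differs.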



-- 
Steven


