Curious to see alternate approach on a search/replace via regex

Ian Kelly ian.g.kelly at gmail.com
Thu Feb 7 20:08:00 EST 2013


On Thu, Feb 7, 2013 at 5:55 PM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
> Whatever caching is being done by re.compile, that's still a 24%
> savings by moving the compile calls into the setup.
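
For reference, here is a rough reconstruction of the two timers that
comparison presumably used (a sketch only: t1 is referenced below, but
t2 is my name for the precompiled variant, and the exact statements are
my assumptions, modeled on the t3 example that follows):

from timeit import Timer

setup = """
import re
u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
"""

# t1: compile inside the timed statement, so every pass at least pays
# for a cache lookup inside re.compile
t1 = Timer("""
nx = re.compile(r'https?://(.+)$')
v = nx.search(u).group(1)
ux = re.compile(r'([-:./?&=]+)')
ux.sub('_', v)""", setup)

# t2: compile once in the setup, so the timed statement only pays for
# the actual search and substitution
t2 = Timer("""
v = nx.search(u).group(1)
ux.sub('_', v)""", setup + """
nx = re.compile(r'https?://(.+)$')
ux = re.compile(r'([-:./?&=]+)')
""")

print(min(t1.repeat(number=10000)))  # compile in the statement
print(min(t2.repeat(number=10000)))  # compile in the setup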

On the other hand, if you add an re.purge() call at the start of t1's
statement to clear the compiled-pattern cache (the modified timer is
t3 below):

>>> t3 = Timer("""
... re.purge()
... nx = re.compile(r'https?://(.+)$')
... v = nx.search(u).group(1)
... ux = re.compile(r'([-:./?&=]+)')
... ux.sub('_', v)""", """
... import re
... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
>>> min(t3.repeat(number=10000))
3.5532990924824617

That's approximately 30 times slower, so clearly the regular
expressions *are* being cached.  I think what we're seeing here is
that the time needed to look up a compiled regular expression in the
cache is a significant fraction of the time needed to actually execute
it.
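
One way to see that lookup overhead directly (my own sketch, not from
the thread) is to time the module-level re functions, which consult
the cache on every call, against the methods of pattern objects
compiled once up front:

from timeit import Timer

setup = """
import re
u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
nx = re.compile(r'https?://(.+)$')
ux = re.compile(r'([-:./?&=]+)')
"""

# Module-level calls: re looks each pattern up in its internal cache
# on every call before running it
cached = Timer("""
v = re.search(r'https?://(.+)$', u).group(1)
re.sub(r'([-:./?&=]+)', '_', v)""", setup)

# Method calls on the precompiled pattern objects: no per-call lookup
direct = Timer("""
v = nx.search(u).group(1)
ux.sub('_', v)""", setup)

print(min(cached.repeat(number=10000)))
print(min(direct.repeat(number=10000)))

If the cache lookup really is a significant fraction of the total, the
first number should come out noticeably larger than the second, even
though both statements execute identical compiled patterns.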


