any chance regular expressions are cached?

Sun Mar 9 23:39:22 EDT 2008

On Mon, 10 Mar 2008 00:42:47 +0000, mh wrote:

> I've got a bit of code in a function like this:
> 
>     s=re.sub(r'\n','\n'+spaces,s)
>     s=re.sub(r'^',spaces,s)
>     s=re.sub(r' *\n','\n',s)
>     s=re.sub(r' *$','',s)
>     s=re.sub(r'\n*$','',s)
> 
> Is there any chance that these will be cached somewhere, and save me the
> trouble of having to declare some global re's if I don't want to have
> them recompiled on each function invocation?

At the interactive interpreter, type "help(re)" [enter]. A page or two 
down, you will see:

    purge()
        Clear the regular expression cache

and looking at the source code, I see many calls to _compile(), which 
starts off with:

    def _compile(*key):
        # internal: compile pattern
        cachekey = (type(key[0]),) + key
        p = _cache.get(cachekey)
        if p is not None:
            return p

So yes, the re module caches its regular expressions.
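On a current CPython the same caching can be observed directly. This is a small sketch. The object-identity check relies on a CPython implementation detail rather than a documented guarantee. The cache is also bounded in size, so a program that cycles through many distinct patterns can still end up recompiling some of them.

```python
import re

# CPython implementation detail (not a documented guarantee): re.compile()
# and module-level helpers such as re.sub() share one internal cache, so
# compiling the same pattern string twice returns the very same object.
first = re.compile(r' *\n')
second = re.compile(r' *\n')
print(first is second)   # True on CPython

# re.purge() empties that cache, so the next compile builds a fresh object.
re.purge()
third = re.compile(r' *\n')
print(third is first)    # False: a new pattern object was compiled
```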

Having said that, at least four out of the five examples you give are 
good examples of when you SHOULDN'T use regexes.

re.sub(r'\n','\n'+spaces,s)

is better written as s.replace('\n', '\n'+spaces). Don't believe me? 
Check this out:

>>> s = 'hello\nworld'
>>> spaces = "   "
>>> from timeit import Timer
>>> Timer("re.sub('\\n', '\\n'+spaces, s)", 
... "import re;from __main__ import s, spaces").timeit()
7.4031901359558105
>>> Timer("s.replace('\\n', '\\n'+spaces)", 
... "import re;from __main__ import s, spaces").timeit()
1.6208670139312744

The regex is about four and a half times slower than the simple string replacement.
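The session above uses the setup-string form of Timer. A rough Python 3 equivalent can pass callables to timeit() instead, which avoids the doubled backslashes. This is a sketch. The helper names are made up, and the loop count is arbitrary. Absolute timings depend on the machine and Python version, so only the ratio is worth comparing, and it will not necessarily match the 2008 figures.

```python
import re
from timeit import timeit

s = 'hello\nworld'
spaces = '   '

def with_regex():
    return re.sub(r'\n', '\n' + spaces, s)

def with_replace():
    return s.replace('\n', '\n' + spaces)

# Sanity check: both must produce the same string before timing means anything.
assert with_regex() == with_replace() == 'hello\n   world'

n = 200_000  # arbitrary loop count
regex_time = timeit(with_regex, number=n)
replace_time = timeit(with_replace, number=n)
print(f"re.sub:      {regex_time:.3f}s")
print(f"str.replace: {replace_time:.3f}s")
print(f"ratio:       {regex_time / replace_time:.1f}x")
```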

Similarly:

re.sub(r'^',spaces,s)

is better written as spaces+s, which is nearly eleven times faster.
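Here is a quick check of that equivalence, written as a Python 3 sketch with arbitrary sample strings. The equivalence depends on the default flags. With re.MULTILINE the same pattern does something different.

```python
import re

spaces = '   '

# Without re.MULTILINE, '^' matches only at the start of the string, so
# substituting at '^' is the same as prepending.
for s in ['hello', 'hello\nworld', '', '\nstarts with a newline']:
    assert re.sub(r'^', spaces, s) == spaces + s

# With re.MULTILINE, '^' also matches right after every newline, so the
# regex indents every line instead, which is a different operation.
indented = re.sub(r'^', spaces, 'a\nb', flags=re.MULTILINE)
print(repr(indented))  # '   a\n   b'
```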

Also:

re.sub(r' *$','',s)
re.sub(r'\n*$','',s)

are just slow ways of writing s.rstrip(' ') and s.rstrip('\n').
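Below is a sketch of the whole function rewritten along these lines. The one remaining regex is compiled once at module level, which is the "global re" approach the question mentions. The names indent_regex, indent_str and _SPACES_BEFORE_NEWLINE are made up for illustration. One caveat applies: without re.MULTILINE, $ matches at the end of the string and also just before a trailing newline. On its own, r' *$' is therefore not identical to s.rstrip(' '). In this particular sequence, though, the earlier r' *\n' step has already removed every space in front of a newline, so the two versions agree.

```python
import re

# Hypothetical module-level pattern: the one substitution kept as a regex,
# compiled once at import time rather than looked up on every call.
_SPACES_BEFORE_NEWLINE = re.compile(r' *\n')


def indent_regex(s, spaces):
    """The five re.sub() calls from the original question, unchanged."""
    s = re.sub(r'\n', '\n' + spaces, s)
    s = re.sub(r'^', spaces, s)
    s = re.sub(r' *\n', '\n', s)
    s = re.sub(r' *$', '', s)
    s = re.sub(r'\n*$', '', s)
    return s


def indent_str(s, spaces):
    """The same steps, using string methods where a regex isn't needed."""
    s = s.replace('\n', '\n' + spaces)       # was re.sub(r'\n', ...)
    s = spaces + s                           # was re.sub(r'^', ...)
    s = _SPACES_BEFORE_NEWLINE.sub('\n', s)  # still a regex
    s = s.rstrip(' ')                        # was re.sub(r' *$', '', s)
    s = s.rstrip('\n')                       # was re.sub(r'\n*$', '', s)
    return s


# Taken in isolation, the two spellings differ: '$' also matches just before
# a trailing newline, while rstrip(' ') only strips spaces at the very end.
print(repr(re.sub(r' *$', '', 'abc  \n')))   # 'abc\n'
print(repr('abc  \n'.rstrip(' ')))           # 'abc  \n'

# In the full sequence that difference cannot show up, because the
# r' *\n' step has already removed every space that precedes a newline.
for text in ['hello\nworld', 'a  \nb  \n\n', '', '\n\n', 'x\n  \ny  ']:
    assert indent_regex(text, '    ') == indent_str(text, '    ')
```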

-- 
Steven