OK to memoize re objects?

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Mon Sep 21 22:29:49 EDT 2009


On Mon, 21 Sep 2009 13:33:05 +0000, kj wrote:

> I find the docs are pretty confusing on this point.  They first make the
> point of noting that pre-compiling regular expressions is more
> efficient, and then *immediately* shoot down this point by saying that
> one need not worry about pre-compiling in most cases. From the docs:
> 
>     ...using compile() and saving the resulting regular expression
>     object for reuse is more efficient when the expression will be used
>     several times in a single program.
> 
>     Note: The compiled versions of the most recent patterns passed to
>     re.match(), re.search() or re.compile() are cached, so programs that
>     use only a few regular expressions at a time needn't worry about
>     compiling regular expressions.
> 
> Honestly I don't know what to make of this...  I would love to see an
> example in which re.compile was unequivocally preferable, to really
> understand what the docs are saying here...

I find it entirely understandable. If you have only a few regexes, then
there's no need to pre-compile them yourself, because the re module
caches them. Otherwise, don't rely on the cache -- it may help, or it may
not; no promises are made.
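
As a rough sketch of the two styles (the pattern and the function names
are made up for illustration; they are not from the docs):

```python
import re

# Compiled once at import time and kept in a name we own, so reuse does
# not depend on whatever the re module happens to be caching.
DIGITS = re.compile(r"\d+")

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = DIGITS.search(text)
    return match.group(0) if match else None

def first_number_via_cache(text):
    """Same result, but leaves it to re's internal cache to skip recompiling."""
    match = re.search(r"\d+", text)
    return match.group(0) if match else None
```

Both functions return the same answers. The only difference is who holds
on to the compiled object: your own code, or the re module's cache.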

The nature of the cache isn't explained because it is an implementation
detail. As it turns out, the current implementation is a single cache in
the re module, so every module that does "import re" shares the one cache.
The cache is also completely emptied if it exceeds a certain number of
objects, so the cache may be flushed at arbitrary times out of your
control. Or it might not.
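
If you want a cache whose lifetime you control, one option is a memo of
your own. A minimal sketch, assuming a hypothetical helper get_regex and
a module-level dict _compiled (neither is part of the re module);
re.purge() is the documented call that clears re's internal cache:

```python
import re

# Hypothetical private memo, keyed by (pattern, flags). Nothing outside
# this module can flush it.
_compiled = {}

def get_regex(pattern, flags=0):
    """Return a compiled regex, compiling each (pattern, flags) pair once."""
    key = (pattern, flags)
    try:
        return _compiled[key]
    except KeyError:
        regex = _compiled[key] = re.compile(pattern, flags)
        return regex

word = get_regex(r"\w+")
re.purge()                          # empties re's own cache, not our memo
same_word = get_regex(r"\w+")       # still the object we stored earlier
```

Note that this memo never evicts anything, so it only suits programs
with a bounded set of patterns.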



-- 
Steven


