[issue31580] Defer compiling regular expressions

Barry A. Warsaw report at bugs.python.org
Tue Sep 26 10:34:30 EDT 2017


Barry A. Warsaw added the comment:

Let's separate the use of lru_cache from the deferred compilation.  I think I'll just revert the change to use lru_cache, although I'll note that the impetus for this was the observation that once MAXCACHE is reached, the entire regexp cache is purged.  That seems suboptimal, and my guess is that it was done because the current cache is just a dictionary, so there's no good way to partially purge it.  My thought there was maybe to use an OrderedDict, but I'm not sure the complexity is worth it.  We can evaluate that separately.
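
To make the partial-purge idea concrete, here is a minimal sketch of an ordered-dictionary cache that evicts only the least recently used pattern once it is full, instead of clearing everything.  The PatternCache class, its maxsize parameter, and the default of 512 are hypothetical names and values chosen for illustration; this is not the code in the re module or in the patch.

```python
import re
from collections import OrderedDict


class PatternCache:
    """Hypothetical cache that drops one old entry at a time when full."""

    def __init__(self, maxsize=512):
        # maxsize plays the role of MAXCACHE; 512 is an illustrative value.
        self.maxsize = maxsize
        self._entries = OrderedDict()

    def compile(self, pattern, flags=0):
        key = (type(pattern), pattern, flags)
        entries = self._entries
        if key in entries:
            # Mark the entry as most recently used.
            entries.move_to_end(key)
            return entries[key]
        compiled = re.compile(pattern, flags)
        entries[key] = compiled
        if len(entries) > self.maxsize:
            # Evict only the least recently used entry, not the whole cache.
            entries.popitem(last=False)
        return compiled

    def __len__(self):
        return len(self._entries)
```

The extra bookkeeping on every hit (move_to_end) is the complexity cost mentioned above; whether it pays off depends on how often real programs actually hit the limit.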

I'm not sure RDB and Raymond noticed the addition of the re.IMMEDIATE flag.  That's exactly the way you would say "compile this regexp right now", so that option is absolutely not taken away!  My claim is that most regexps at module scope do *not* need to be compiled at import time, and doing so is a waste of resources.  For cases where you really need it, you have it.
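
As a rough illustration of what deferred compilation with an opt-out could look like, the sketch below wraps a pattern and compiles it on first use.  LazyPattern, its immediate argument, and the is_compiled property are hypothetical names invented for this example; immediate=True only mimics the intent of the proposed re.IMMEDIATE flag, which is not part of the released re module, and this is not how the patch itself is implemented.

```python
import re


class LazyPattern:
    """Hypothetical stand-in for a deferred re.compile() result."""

    def __init__(self, pattern, flags=0, immediate=False):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None
        if immediate:
            # Mimics "compile this regexp right now".
            self._compile()

    def _compile(self):
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    @property
    def is_compiled(self):
        return self._compiled is not None

    def __getattr__(self, name):
        # Only reached for attributes not found normally (match, search, ...).
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._compile(), name)


# Usage: nothing is compiled at "import time", only on first use.
NUMBER = LazyPattern(r"[0-9]+")
assert not NUMBER.is_compiled
assert NUMBER.match("123abc").group() == "123"
assert NUMBER.is_compiled
```

One consequence visible in this sketch is that an invalid pattern raises re.error at first use rather than at definition time, so anything normally reported while compiling is reported later too.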

I did notice the warnings problem and mostly glossed over it, but for this patch to become "real" we'd have to try to restore that.

The other thought I had was this: if we can observe that most module scope re.compile() calls are essentially constants, then maybe the compiler/peephole/PEP 511 can help here.  It could recognize constant arguments to re.compile() and precompile them.
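
To show what "recognizing constant arguments" might involve, here is a small sketch that uses the ast module to find module-level assignments of the form NAME = re.compile(<literals>).  The function name constant_re_compiles is hypothetical, the check assumes the module is referred to by the plain name re, and it only detects candidates; it does not precompile anything and is not a description of how the compiler or a PEP 511 transformer would do it.

```python
import ast


def constant_re_compiles(source):
    """Return (lineno, pattern) for module-level re.compile(<literals>) calls."""
    found = []
    for node in ast.parse(source).body:
        # Only look at top-level assignments, i.e. module scope.
        if not isinstance(node, ast.Assign):
            continue
        call = node.value
        if (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Attribute)
            and call.func.attr == "compile"
            and isinstance(call.func.value, ast.Name)
            and call.func.value.id == "re"
            and call.args
            and not call.keywords
            and all(isinstance(arg, ast.Constant) for arg in call.args)
        ):
            found.append((node.lineno, call.args[0].value))
    return found


example = (
    "import re\n"
    "NUMBER = re.compile('[0-9]+')\n"
    "DYNAMIC = re.compile(prefix + '[0-9]+')\n"
)
print(constant_re_compiles(example))
```

Only the NUMBER assignment on line 2 qualifies here, because the DYNAMIC pattern is built from a non-constant expression.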

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue31580>
_______________________________________

