[New-bugs-announce] [issue31580] Defer compiling regular expressions

Barry A. Warsaw report at bugs.python.org
Mon Sep 25 22:10:54 EDT 2017


New submission from Barry A. Warsaw:

It's a very common pattern to see the following at module scope:

cre_a = re.compile('some pattern')
cre_b = re.compile('other pattern')

and so on.  This can cost you at startup because all of those regular expressions are compiled at import time, even if they're never used in practice (e.g. because the condition that would exercise a given regex never occurs).

It occurred to me that if re.compile() deferred compilation of the regexp until first use, you could speed up start up time.  But by how much?  And at what cost?

So I ran a small experiment (pull request to be submitted) using the `perf` module on `pip --help`.  I was able to cut the number of compiles from 28 to 9, and the mean startup time from 245ms to 213ms.

% python -m perf compare_to ../base.json ../defer.json 
Mean +- std dev: [base] 245 ms +- 19 ms -> [defer] 213 ms +- 21 ms: 1.15x faster (-13%)

`pip install tox` reduces the compiles from 231 to 75:

(cpython 3.7) 231 0.06945133209228516
(3.7 w/defer)  75 0.03140091896057129
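
For readers who want to gather counts of this shape themselves, here is a minimal sketch. It is not the instrumentation used for the numbers above; `count_compiles` is a hypothetical helper that temporarily wraps `re.compile` and records how many times it is called and how long those calls take. It only sees explicit `re.compile()` calls made while the wrapper is installed, and it counts calls that hit the `re` module's internal cache as well.

```python
import re
import time
from contextlib import contextmanager


@contextmanager
def count_compiles():
    """Hypothetical helper: count and time calls to re.compile()."""
    stats = {"count": 0, "seconds": 0.0}
    original = re.compile

    def counting_compile(pattern, flags=0):
        start = time.perf_counter()
        try:
            return original(pattern, flags)
        finally:
            # Record the call even if compilation raised.
            stats["count"] += 1
            stats["seconds"] += time.perf_counter() - start

    re.compile = counting_compile
    try:
        yield stats
    finally:
        # Always restore the real function.
        re.compile = original


with count_compiles() as stats:
    cre_a = re.compile('some pattern')
    cre_b = re.compile('other pattern')

print(stats["count"], stats["seconds"])
```

Modules that did `from re import compile` before the wrapper was installed are not counted, so this is a rough estimate at best.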

So what's the cost?  Backward compatibility.  With this change, `re.compile()` no longer returns a compiled regular expression object, but instead a "deferred" proxy.  The proxy performs the actual compilation the first time it is used.  This can break compatibility by deferring any exceptions that compile() might raise until that first use.  This happens a fair bit in the test suite, but I'm not sure it's all that common in practice.  In any case, I've also added a re.IMMEDIATE (re.N -- for "now") flag to force immediate compilation.
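
To make the trade-off concrete, here is a minimal sketch of the proxy idea. It is not the code in the branch: `DeferredPattern` and `deferred_compile` are hypothetical names, and the `immediate` keyword stands in for the proposed re.IMMEDIATE flag, which does not exist in the released `re` module.

```python
import re


class DeferredPattern:
    """Hypothetical proxy that compiles its pattern on first use."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None

    def _force(self):
        # Compile once; later calls reuse the stored pattern object.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # Only reached when normal lookup fails, e.g. for match/search/sub.
        if name.startswith('_'):
            raise AttributeError(name)
        return getattr(self._force(), name)


def deferred_compile(pattern, flags=0, immediate=False):
    """Hypothetical stand-in for a deferring re.compile()."""
    proxy = DeferredPattern(pattern, flags)
    if immediate:
        proxy._force()  # surface re.error now rather than at first use
    return proxy


bad = deferred_compile('(unclosed')      # no exception here
try:
    bad.match('x')                       # the error shows up at first use
except re.error as exc:
    print('deferred error:', exc)
```

A proxy like this is not an instance of `re.Pattern`, so `isinstance` checks and anything that bypasses attribute lookup would behave differently, which is another face of the compatibility cost.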

I also modified the compilation to use an actual functools.lru_cache.  This way, if maxcache gets triggered, the entire cache won't get blown away.
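
The eviction behaviour is easy to see in a small sketch. This is not the patch itself: `make_cached_compile` is a hypothetical helper, and the tiny `maxsize` is chosen only to force an eviction. With `functools.lru_cache`, filling the cache drops just the least recently used entry rather than everything.

```python
import functools
import re


def make_cached_compile(maxsize):
    """Hypothetical helper: build an LRU-cached wrapper around re.compile()."""
    @functools.lru_cache(maxsize=maxsize)
    def cached_compile(pattern, flags=0):
        return re.compile(pattern, flags)
    return cached_compile


compile2 = make_cached_compile(2)
compile2('a')   # miss
compile2('b')   # miss
compile2('a')   # hit; 'b' is now the least recently used entry
compile2('c')   # miss; evicts only 'b'
compile2('a')   # still a hit, because 'a' survived the eviction

print(compile2.cache_info())
```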

So, whether this is a good idea or not, I open this and push the branch for further discussion.

----------
assignee: barry
components: Library (Lib)
messages: 302995
nosy: barry
priority: normal
severity: normal
status: open
title: Defer compiling regular expressions
type: performance
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue31580>
_______________________________________

