[Python-ideas] re.compile_lazy - on first use compiled regexes

Masklinn masklinn at masklinn.net
Sat Mar 23 14:26:30 CET 2013


On 2013-03-23, at 03:00 , Nick Coghlan wrote:

> On Fri, Mar 22, 2013 at 3:42 PM, Gregory P. Smith <greg at krypto.org> wrote:
>> 
>> On Fri, Mar 22, 2013 at 3:31 PM, Ronny Pfannschmidt
>> <Ronny.Pfannschmidt at gmx.de> wrote:
>>> 
>>> Hi,
>>> 
>>> while reviewing urllib.parse I noticed a pretty ugly pattern:
>>> 
>>> many functions have an attached global, and in their own code they
>>> compile a regex on first use and assign it to that global.
>>> 
>>> It's clear that compiling a regex is expensive, so having them be
>>> compiled lazily on first use would be of some benefit.
>> 
>> 
>> It isn't expensive to do; it is expensive to do repeatedly for no
>> reason.  Thus the use of compiled regexes.  Code like this would be
>> better off refactored to reference a precompiled global rather than
>> conditionally check whether it needs compiling on every call.
> 
> Alternatively, if there are a lot of different regexes, it may be
> better to rely on the implicit cache inside the re module.
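For context, the pattern described above, and the refactoring suggested in reply, look roughly like this. This is a sketch modelled on the old `splitquery` helper in urllib; the function names here are hypothetical:

```python
import re

# The "ugly pattern": a module-level global, compiled on first use.
_queryprog = None

def splitquery_lazy(url):
    """splitquery('/path?query') --> '/path', 'query'."""
    global _queryprog
    if _queryprog is None:  # this check runs on every single call
        _queryprog = re.compile(r"^(.*)\?([^?]*)$")
    match = _queryprog.match(url)
    if match:
        return match.group(1, 2)
    return url, None

# The suggested refactoring: compile once at import time and
# reference the precompiled global directly.
_queryprog_eager = re.compile(r"^(.*)\?([^?]*)$")

def splitquery_eager(url):
    """Same behaviour, without the per-call None check."""
    match = _queryprog_eager.match(url)
    if match:
        return match.group(1, 2)
    return url, None
```

Both variants behave identically; the difference is only when the one-time compilation cost is paid and whether every call carries the extra conditional.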

Wouldn't it be better if there are *few* different regexes? The module
itself caches 512 expressions (100 in Python 2) and does not use an LRU
or other "smart" eviction policy: as far as I can see, it simply clears
the whole cache dict once the limit is breached. Furthermore, any
explicit call to re.compile *still* goes through the internal cache,
meaning even precompiled patterns count against the _MAXCACHE limit.
So all regex uses throughout the application (including the standard
library &al.) count against the built-in cache and increase the chance
that the regex we want cached gets thrown out, no?
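The interaction between re.compile and the internal cache can be seen directly. Note that _MAXCACHE and the exact eviction behaviour are private implementation details (the whole-cache clearing described above is the behaviour at the time of writing, Python 3.3) and may change between versions:

```python
import re

# re.compile goes through the same internal cache as re.search/re.match,
# so repeated calls with the same pattern and flags return the very same
# compiled-pattern object as long as it is still in the cache.
p1 = re.compile(r"\d+")
p2 = re.compile(r"\d+")
print(p1 is p2)  # True: the second call was served from the cache

# The cache bound is the private re._MAXCACHE (512 in Python 3.3,
# 100 in Python 2); once exceeded, entries are evicted wholesale
# rather than least-recently-used first.
print(re._MAXCACHE > 0)
```

This is why compiling many distinct patterns elsewhere in the process (including in the standard library) can evict a pattern you were relying on the cache to keep warm.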

