[Python-Dev] New regex module for 3.2?

Reid Kleckner reid.kleckner at gmail.com
Thu Jul 22 17:13:53 CEST 2010


On Thu, Jul 22, 2010 at 7:42 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> On 22.07.2010 14:12, Nick Coghlan wrote:
>> On Thu, Jul 22, 2010 at 9:34 PM, Georg Brandl <g.brandl at gmx.net> wrote:
>>> So, I thought there wasn't a difference in performance for this use case
>>> (which is compiling a lot of regexes and matching most of them only a
>>> few times in comparison).  However, I found that looking at the regex
>>> caching is very important in this case: re._MAXCACHE is by default set to
>>> 100, and regex._MAXCACHE to 1024.  When I set re._MAXCACHE to 1024 before
>>> running the test suite, I get times around 18 (!) seconds for re.
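
A small simulation shows why cache size matters so much for a workload that cycles through many patterns. This is a sketch, not the `re` module's own code. `ClearingRegexCache` and `count_misses` are hypothetical names, and the 150-pattern cyclic workload is an assumption. The cache empties itself completely once it reaches `maxsize`, which mirrors the behaviour discussed further down in this thread. The counter records how often `re.compile` actually has to run.

```python
import re


class ClearingRegexCache:
    """Hypothetical compile cache that empties itself completely when full."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._cache = {}
        self.misses = 0  # number of real re.compile() calls

    def compile(self, pattern, flags=0):
        key = (pattern, flags)
        if key in self._cache:
            return self._cache[key]
        self.misses += 1
        compiled = re.compile(pattern, flags)
        if len(self._cache) >= self.maxsize:
            # Full: drop every entry, not just one.
            self._cache.clear()
        self._cache[key] = compiled
        return compiled


def count_misses(maxsize, n_patterns=150, rounds=5):
    """Cycle through n_patterns distinct regexes `rounds` times (assumed workload)."""
    cache = ClearingRegexCache(maxsize)
    patterns = [r"token%d\w*" % i for i in range(n_patterns)]
    for _ in range(rounds):
        for p in patterns:
            cache.compile(p)
    return cache.misses


print(count_misses(100))   # → 750: every lookup misses
print(count_misses(1024))  # → 150: each pattern is compiled once
```

When 150 patterns cycle through a 100-entry clear-on-full cache, every lookup misses. A 1024-entry cache compiles each pattern exactly once.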

It might be fun to do a Pygments-based macro benchmark: just have Pygments
syntax-highlight its own source and time it.

> Sure -- I don't think this is a showstopper for regex.  However if we don't
> include regex in a future version, we might think about increasing MAXCACHE
> a bit, and maybe not clear the cache when it reaches its max length, but
> rather remove another element.

+50 for the last idea.  Collin encountered a problem two summers ago
in Mondrian where we were relying on the regex cache and were
surprised to find that it cleared itself after filling up, rather than
using LRU or random eviction.

Reid

