[Python-Dev] New regex module for 3.2?

Georg Brandl g.brandl at gmx.net
Thu Jul 22 16:42:47 CEST 2010


On 22.07.2010 14:12, Nick Coghlan wrote:
> On Thu, Jul 22, 2010 at 9:34 PM, Georg Brandl <g.brandl at gmx.net> wrote:
>> So, I thought there wasn't a difference in performance for this use case
>> (which is compiling a lot of regexes and matching most of them only a
>> few times in comparison).  However, I found that looking at the regex
>> caching is very important in this case: re._MAXCACHE is by default set to
>> 100, and regex._MAXCACHE to 1024.  When I set re._MAXCACHE to 1024 before
>> running the test suite, I get times around 18 (!) seconds for re.
> 
> That still fits with the compile/match performance trade-off changes
> between re and regex though. It does make it clear this isn't going to
> be a win across the board though - things like test suites are going
> to have more one-off regex operations than a long-running web server
> or a filesystem or database scanning operation.

Sure -- I don't think this is a showstopper for regex.  However, if we don't
include regex in a future version, we might think about increasing re's
_MAXCACHE a bit, and maybe not clearing the whole cache when it reaches its
maximum size, but rather evicting a single element instead.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


