[Speed] Performance comparison of regular expression engines

Antoine Pitrou solipsis at pitrou.net
Mon Mar 14 10:27:08 EDT 2016


On Sun, 13 Mar 2016 17:44:10 +0000
Brett Cannon <brett at python.org> wrote:
> >
> > 2. One iteration of all searches on the full text takes 29 seconds on my
> > computer. Isn't this too long? In any case, I want to first optimize some
> > bottlenecks in the re module.
> >
> 
> I don't think we have established a "too long" time. We do have some
> benchmarks, like spectral_norm, that don't run unless you use rigorous
> mode, and this could be one of them.
> 
> > 3. Do we need one benchmark that gives an accumulated time of all
> > searches, or separate microbenchmarks for every pattern?
> 
> I don't care either way. Obviously it depends on whether you want to
> measure overall re perf and have people aim to improve that, or let
> people target specific workload types.
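
For 3, a rough sketch of the two options might look like the following;
this is not the actual benchmark code, and the pattern list and corpus
file are only placeholders:

    import re
    import timeit

    # Placeholder patterns and corpus; a real benchmark would use the
    # full pattern set and text from the regex comparison.
    PATTERNS = [r"Twain", r"[a-z]shing", r"Tom|Sawyer|Huckleberry|Finn"]
    with open("corpus.txt", encoding="utf-8") as f:
        TEXT = f.read()

    total = 0.0
    for pattern in PATTERNS:
        compiled = re.compile(pattern)
        # One microbenchmark per pattern...
        elapsed = timeit.timeit(lambda: compiled.findall(TEXT), number=10)
        print("%-40s %.3fs" % (pattern, elapsed))
        total += elapsed

    # ...or a single accumulated number for the whole workload.
    print("total: %.3fs" % total)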

This is a more general latent issue with our current benchmarking
philosophy.  We have built something which aims to be a general-purpose
benchmark suite, but in some domains a more comprehensive set of
benchmarks may be desirable.  Obviously we don't want to have 10 JSON
benchmarks, 10 re benchmarks, 10 I/O benchmarks, etc. in the default
benchmark run, so what do we do for such cases?  Do we tell people that
domain-specific benchmarks should be developed independently?  Do we
include some facilities to create such subsuites without them being
part of the default bunch?

(note that a couple of domain-specific benchmarks -- iobench, stringbench,
etc. -- are currently maintained separately)
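
To make the second option more concrete, here is a purely hypothetical
sketch of how subsuites could be declared so that they ship with the
suite but are not part of the default run (the group and benchmark
names below are made up):

    # Hypothetical only; not the suite's real configuration format.
    # Domain-specific groups must be requested explicitly instead of
    # running by default.
    BENCHMARK_GROUPS = {
        "default": ["nbody", "json_load", "regex_v8"],
        "re": ["re_compile", "re_findall", "re_sub"],
        "io": ["iobench_read", "iobench_write"],
    }

    def select_benchmarks(requested=("default",)):
        """Expand requested group names into a flat list of benchmarks."""
        names = []
        for group in requested:
            names.extend(BENCHMARK_GROUPS[group])
        return names

    # e.g. select_benchmarks(("default", "re")) opts into the re subsuite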

Regards

Antoine.
