[Speed] Performance comparison of regular expression engines

Sat Mar 12 13:16:24 EST 2016

On 07.03.16 19:19, Brett Cannon wrote:
> Are you thinking about turning all of this into a benchmark for the
> benchmark suite?

That was my intent. I first wrote a benchmark for the benchmark suite, 
then became interested in more detailed results and a comparison with 
alternative engines.

There are several open questions about adding such a benchmark to the 
benchmark suite.

1. The input data is a public 20 MB text (8 MB as a ZIP file). Should we 
download it every time (maybe with caching) or add it to the repository?

2. One iteration of all searches on the full text takes 29 seconds on my 
computer. Isn't this too long? In any case, I first want to optimize 
some bottlenecks in the re module.

3. Do we need one benchmark that gives an accumulated time of all 
searches, or separate microbenchmarks for every pattern?

4. It would be nice to use the same benchmark for comparing different 
regular expression engines. This requires changing perf.py. Maybe we 
could use the same interface to compare ElementTree with lxml and json 
with simplejson.

5. The patterns are ASCII-only and the text is mostly ASCII. It would be 
nice to add non-ASCII patterns and non-ASCII text, but this would 
increase the run time.
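
To make questions 3 and 4 more concrete, here is a rough sketch (not the 
actual benchmark and not a perf.py change). It times every pattern 
separately and derives the accumulated time from the same measurements, 
so both views are available. The helper name time_patterns, the sample 
text and the sample patterns are placeholders invented for illustration. 
The engine argument is assumed to be any module that exposes a 
re-compatible compile() function.

```python
import re
import time


def time_patterns(engine, patterns, text, repeat=3):
    """Return {pattern: best wall-clock seconds} for findall() over text.

    `engine` is assumed to be a module with a re-compatible compile().
    """
    results = {}
    for pattern in patterns:
        # Compile outside the timed region so only searching is measured.
        compiled = engine.compile(pattern)
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            compiled.findall(text)
            best = min(best, time.perf_counter() - start)
        results[pattern] = best
    return results


# Placeholder data (assumption): the real benchmark would use the 20 MB text
# and its own pattern list.
SAMPLE_TEXT = "Tom Sawyer and Huckleberry Finn went down the river. " * 1000
SAMPLE_PATTERNS = [r"Sawyer", r"[a-z]+ing", r"Tom|Huckleberry"]

per_pattern = time_patterns(re, SAMPLE_PATTERNS, SAMPLE_TEXT)
total = sum(per_pattern.values())  # accumulated time over all patterns

for pattern, seconds in per_pattern.items():
    print(f"{pattern!r:20} {seconds * 1e3:8.3f} ms")
print(f"{'total':20} {total * 1e3:8.3f} ms")
```

Passing a different module as engine would give the same report for an 
alternative engine, provided it really is compatible with re for the 
patterns used.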