[Speed] Performance comparison of regular expression engines
Serhiy Storchaka
storchaka at gmail.com
Sat Mar 12 13:16:24 EST 2016
On 07.03.16 19:19, Brett Cannon wrote:
> Are you thinking about turning all of this into a benchmark for the
> benchmark suite?
This was my purpose. I had first written a benchmark for the benchmark
suite, then I became interested in more detailed results and a
comparison with alternative engines.
There are several questions about a benchmark for the benchmark suite.
1. Input data is a public 20MB text (8MB in a ZIP file). Should we download
it every time (maybe with caching) or add it to the repository?
2. One iteration of all searches on the full text takes 29 seconds on my
computer. Isn't this too long? In any case, I first want to optimize some
bottlenecks in the re module.
3. Do we need one benchmark that gives an accumulated time of all
searches, or separate microbenchmarks for every pattern?
4. It would be nice to use the same benchmark for comparing different
regular expression engines. This requires changing perf.py. Maybe we could
use the same interface to compare ElementTree with lxml and json with
simplejson.
5. Patterns are ASCII-only and the text is mostly ASCII. It would be nice
to add non-ASCII patterns and non-ASCII text, but this will increase run
time.
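
To make questions 3 and 4 more concrete, here is a minimal sketch (not
the actual benchmark, and not tied to perf.py) of how the same timing
loop could produce both per-pattern times and an accumulated total, with
the engine passed in as a parameter. The names time_patterns and
total_time are hypothetical, and the only assumption about the engine is
that it provides a compile() function compatible with the re module.

```python
import re
import time


def time_patterns(engine, patterns, text, repeat=3):
    """Time a full scan of `text` for each pattern.

    `engine` is assumed to be any object with an re-compatible
    compile() function (the standard `re` module is one such object).
    Returns {pattern: (best_seconds, match_count)}.
    """
    results = {}
    for pattern in patterns:
        compiled = engine.compile(pattern)
        best = float("inf")
        count = 0
        for _run in range(repeat):
            start = time.perf_counter()
            # Exhaust the iterator so the whole text is really scanned.
            count = sum(1 for _match in compiled.finditer(text))
            elapsed = time.perf_counter() - start
            best = min(best, elapsed)
        results[pattern] = (best, count)
    return results


def total_time(results):
    """Accumulated time over all patterns (the single-number variant)."""
    return sum(seconds for seconds, _count in results.values())


if __name__ == "__main__":
    sample = "foo bar singing foo " * 10
    per_pattern = time_patterns(re, [r"\bfoo\b", r"[a-z]+ing"], sample)
    for pattern, (seconds, count) in per_pattern.items():
        print(pattern, count, seconds)
    print("total", total_time(per_pattern))
```

Keeping the per-pattern results and deriving the total from them means
one run can serve both reporting styles. An alternative engine could be
compared by passing a different module as the first argument, provided
it really exposes the same compile()/finditer() interface.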