[Speed] Latest enhancements of perf 0.8.1 and performance 0.3.1

Victor Stinner victor.stinner at gmail.com
Wed Oct 19 12:55:14 EDT 2016


Hi,

Latest enhancements:

1) perf supports multiple benchmarks per script
2) perf calibrates the benchmark in a dedicated process
3) new --duplicate option for perf timeit

(1)

I solved an old limitation of my perf module: since perf 0.8, it's now
possible to run multiple benchmarks in a single script. Example of
script:
---
import perf

# dict_get, try_except, dico and keys are defined earlier in the script
runner = perf.Runner()
runner.bench_func('dict.get', dict_get, dico, keys)
runner.bench_func('try/except', try_except, dico, keys)
---

The runner spawns N processes for the first benchmark + N processes for
the second benchmark. The trick is to pass a --worker-task option to
the worker process. In the worker, bench_func() does nothing (it
returns None) if the "worker task" counter doesn't match.
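
To make the counter trick concrete, here is a toy sketch. It is not
perf's actual code: ToyRunner, its worker_task argument and the
returned tuple are made up for the example, and the real runner also
spawns the worker processes itself.

```python
import time


class ToyRunner:
    """Toy model of a worker process (hypothetical, not perf's code)."""

    def __init__(self, worker_task):
        # Index of the only benchmark that this worker must run.
        self.worker_task = worker_task
        # Incremented on each bench_func() call.
        self._task = 0

    def bench_func(self, name, func, *args):
        task = self._task
        self._task += 1
        if task != self.worker_task:
            # Counter doesn't match: this benchmark belongs to another worker.
            return None
        t0 = time.perf_counter()
        func(*args)
        return (name, time.perf_counter() - t0)


# The same script runs in every worker; only the task index differs.
calls = []
runner = ToyRunner(worker_task=1)
first = runner.bench_func('first', calls.append, 'first')
second = runner.bench_func('second', calls.append, 'second')
```

Here the worker skips 'first' (first is None) and only runs 'second',
even though both bench_func() calls are executed by the script.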

I rewrote the API in perf 0.8 to simplify it and fix design issues. It
should be the last large API change before perf 1.0, which is
expected before the end of the year.

(2)

A simple but important change to further improve the reliability of
benchmarks, especially on Python implementations having a JIT:
benchmark calibration is now done in a dedicated process.

All worker processes (computing samples) should now run exactly the
same workload. Before, the first worker was different: it ran a few
more iterations because of the calibration.
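
For readers who wonder what "calibration" means here, a minimal sketch
follows. It assumes a simple doubling strategy; the function name
calibrate_loops, the min_time default and the timer parameter are my
own choices for the example and are not taken from perf.

```python
import time


def calibrate_loops(func, min_time=0.1, timer=time.perf_counter):
    """Double the number of loops until one timing lasts at least min_time."""
    loops = 1
    while True:
        t0 = timer()
        for _ in range(loops):
            func()
        elapsed = timer() - t0
        if elapsed >= min_time:
            return loops
        # Too short to be measured reliably: try twice as many loops.
        loops *= 2
```

Running this search once, in its own process, means that the extra
iterations it costs are never mixed into a worker that computes
samples.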

(3)

Another simple but useful change for the shortest microbenchmarks: I
added a --duplicate option to timeit. Example:
------
$ python3 -m perf timeit -s 'x=1;y=2' 'x+y'
.....................
Median +- std dev: 26.7 ns +- 2.0 ns

$ python3 -m perf timeit -s 'x=1;y=2' 'x+y' --duplicate=1000
.....................
Median +- std dev: 19.2 ns +- 0.4 ns
------

Duplicating the statement 1000x amortizes the cost of the outer loop:
the measured time per statement drops by 28% (26.7 ns to 19.2 ns).
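
The idea can be reproduced with the standard timeit module. This is
only a sketch of the principle, not how perf implements --duplicate:
duplicate_stmt() and time_per_stmt() are hypothetical helpers written
for this example. The last line recomputes the reduction from the two
medians quoted above.

```python
import timeit


def duplicate_stmt(stmt, duplicate):
    """Repeat stmt on `duplicate` consecutive lines."""
    return "\n".join([stmt] * duplicate)


def time_per_stmt(stmt, setup="pass", duplicate=1, loops=1000):
    """Return the time in seconds of a single execution of stmt."""
    timer = timeit.Timer(duplicate_stmt(stmt, duplicate), setup=setup)
    # Each loop iteration runs the statement `duplicate` times, so the
    # loop overhead is shared between all the copies.
    return timer.timeit(number=loops) / (loops * duplicate)


# Relative reduction between the two medians: (26.7 - 19.2) / 26.7
reduction = (26.7 - 19.2) / 26.7
```

For example, time_per_stmt('x+y', setup='x=1;y=2', duplicate=1000)
times the same statement as the second command above, without the
calibration and multiple processes that perf adds.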

FYI I wrote this feature to help me make a decision on the old and
dummy "1+1" optimization for Python 3:
https://bugs.python.org/issue21955 (issue open since July 2014).
(Spoiler: I plan to collect benchmark results to explain that the
micro optimization is useless, but I'm not 100% sure that it's useless
right now :-))

Victor

