[Speed] New CPython benchmark suite based on perf

Mon Jul 4 11:08:06 EDT 2016

2016-07-04 16:17 GMT+02:00 Victor Stinner <victor.stinner at gmail.com>:
> I modified the CPython benchmark suite to use my perf module:
> https://hg.python.org/sandbox/benchmarks_perf

Hmm, you need the development version of perf to test it:

   git clone https://github.com/haypo/perf.git

> Changes:
>
> * replace explicit warmups with perf automatic warmup
> (...)
> * avoid nested loops, prefer a single level of loop: perf is
> responsible for calling the sample function enough times to collect
> enough samples

Concrete example with performance/bm_go.py.

Before:
-------------------------
def main(n, timer):
    times = []
    for i in range(5):
        versus_cpu() # warmup
    for i in range(n):
        t1 = timer()
        versus_cpu()
        t2 = timer()
        times.append(t2 - t1)
    return times
-------------------------

After:
-------------------------
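import perf

def main(loops):
    # perf passes the number of iterations; time the whole loop once
    t0 = perf.perf_counter()

    # range() works on both Python 2 and Python 3
    for _ in range(loops):
        versus_cpu()

    # total elapsed time for `loops` iterations
    return perf.perf_counter() - t0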
-------------------------
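
To make the division of labour clearer, here is a rough sketch of how a runner could drive such a sample function. This is not perf's actual code: the names calibrate(), collect_samples() and bench_sum(), the min_time threshold and the doubling strategy are my assumptions for illustration (the doubling matches the "1 loop, 2 loops, 4 loops" calibration lines shown in the output further down), and it uses time.perf_counter() from the standard library instead of perf.perf_counter().

```python
import time


def calibrate(sample_func, min_time=0.1):
    """Double the loop count until one call lasts at least min_time seconds.

    min_time is an assumed threshold, not a value taken from perf.
    """
    loops = 1
    while True:
        elapsed = sample_func(loops)
        if elapsed >= min_time:
            return loops
        loops *= 2


def collect_samples(sample_func, loops, warmups=1, nsample=3):
    """Run warmups (discarded), then return nsample timings per iteration."""
    for _ in range(warmups):
        sample_func(loops)  # timed by the function, but the result is dropped
    # Each raw sample covers `loops` iterations: divide to normalize.
    return [sample_func(loops) / loops for _ in range(nsample)]


def bench_sum(loops):
    """Hypothetical workload with the same shape as main(loops) above."""
    t0 = time.perf_counter()
    for _ in range(loops):
        sum(range(1000))
    return time.perf_counter() - t0


loops = calibrate(bench_sum, min_time=0.01)
samples = collect_samples(bench_sum, loops)
print("loops=%s samples=%s" % (loops, samples))
```

The benchmark itself only contains the single timed loop; warmups, calibration and repetition all live in the runner.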

Example of go benchmark output:
---
$ python3 benchmarks_perf/performance/bm_go.py -v
calibration: 1 loop: 599 ms
calibration: use 1 loop
Run 1/25: warmup (1): 601 ms; raw samples (3): 593 ms, 593 ms, 593 ms
Run 2/25: warmup (1): 609 ms; raw samples (3): 609 ms, 610 ms, 608 ms
Run 3/25: warmup (1): 599 ms; raw samples (3): 598 ms, 606 ms, 598 ms
(...)
Run 25/25: warmup (1): 606 ms; raw samples (3): 591 ms, 590 ms, 591 ms

Median +- std dev: 598 ms +- 8 ms
---

The warmup samples ("warmup (1): ... ms") are not used to compute the
median or the standard deviation.
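
A small sketch of that rule, using only the three runs visible in the output above (the real result is computed over all 25 runs, so the numbers differ). The dict layout is made up for the example, and statistics.stdev() is the sample standard deviation, which may or may not be the exact formula perf uses.

```python
import statistics

# Timings in milliseconds, copied from runs 1-3 of the go output above.
runs = [
    {"warmup": [601], "samples": [593, 593, 593]},
    {"warmup": [609], "samples": [609, 610, 608]},
    {"warmup": [599], "samples": [598, 606, 598]},
]

# Only the "samples" lists are aggregated; "warmup" values are ignored.
samples = [value for run in runs for value in run["samples"]]

median = statistics.median(samples)
stdev = statistics.stdev(samples)
print("Median +- std dev: %.0f ms +- %.0f ms" % (median, stdev))
```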

Another example to show fancy features of perf:
---
$ python3 benchmarks_perf/performance/bm_telco.py -v --hist --stats \
    --metadata -n5 -p50
calibration: 1 loop: 34.6 ms
calibration: 2 loops: 57.8 ms
calibration: 4 loops: 105 ms
calibration: use 4 loops
Run 1/50: warmup (1): 116 ms; raw samples (5): 106 ms, 106 ms, 105 ms, 106 ms, 106 ms
Run 2/50: warmup (1): 107 ms; raw samples (5): 107 ms, 107 ms, 106 ms, 106 ms, 106 ms
Run 3/50: warmup (1): 107 ms; raw samples (5): 106 ms, 106 ms, 106 ms, 106 ms, 106 ms
(...)
Run 50/50: warmup (1): 106 ms; raw samples (5): 104 ms, 105 ms, 105 ms, 106 ms, 105 ms

Metadata:
- aslr: enabled
- cpu_count: 4
- cpu_model_name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
- date: 2016-07-04T17:00:33
- description: Test the performance of the Telco decimal benchmark
- duration: 35.6 sec
- hostname: smithers
- name: telco
- perf_version: 0.6
- platform: Linux-4.5.7-300.fc24.x86_64-x86_64-with-fedora-24-Twenty_Four
- python_executable: /usr/bin/python3
- python_implementation: cpython
- python_version: 3.5.1 (64bit)
- timer: clock_gettime(CLOCK_MONOTONIC), resolution: 1.00 ns

25.8 ms:  1 ##
25.9 ms:  2 #####
26.0 ms:  4 ##########
26.0 ms: 13 ###############################
26.1 ms: 27 #################################################################
26.2 ms: 28 ###################################################################
26.3 ms: 21 ##################################################
26.3 ms: 25 ############################################################
26.4 ms: 32 #############################################################################
26.5 ms: 33 ###############################################################################
26.6 ms: 18 ###########################################
26.6 ms: 13 ###############################
26.7 ms:  8 ###################
26.8 ms:  8 ###################
26.8 ms:  7 #################
26.9 ms:  4 ##########
27.0 ms:  4 ##########
27.1 ms:  1 ##
27.1 ms:  0 |
27.2 ms:  0 |
27.3 ms:  1 ##

Number of samples: 250 (50 runs x 5 samples; 1 warmup)
Standard deviation / median: 1%
Shortest raw sample: 103 ms (4 loops)

Minimum: 25.9 ms (-2.1%)
Median +- std dev: 26.4 ms +- 0.2 ms
Maximum: 27.3 ms (+3.4%)

Median +- std dev: 26.4 ms +- 0.2 ms
---

I used "-n5 -p50" to compute 5 samples per process and to run 50
processes. It helps to get a nicer histogram :-) (more samples give a
smoother distribution). For histograms, I like using telco because it
produces a regular Gaussian curve :-)
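
For reference, the arithmetic behind those numbers can be checked in a few lines. The variable names are mine, and the division of a raw sample by the number of loops is what the output suggests (raw samples around 106 ms with 4 loops, histogram around 26.5 ms), not something I am quoting from perf's source.

```python
runs = 50             # -p50: number of worker processes
samples_per_run = 5   # -n5: samples collected in each process
loops = 4             # from "calibration: use 4 loops"

total_samples = runs * samples_per_run

# A raw sample times `loops` iterations; dividing gives one iteration.
typical_ms = 106.0 / loops
shortest_ms = 103.0 / loops

# Counts copied from the histogram above, top to bottom.
histogram_counts = [1, 2, 4, 13, 27, 28, 21, 25, 32, 33, 18,
                    13, 8, 8, 7, 4, 4, 1, 0, 0, 1]

print("total samples:", total_samples)
print("typical sample: %.2f ms" % typical_ms)
print("shortest sample: %.2f ms" % shortest_ms)
print("histogram total:", sum(histogram_counts))
```

The histogram counts add up to the same 250 samples reported in the "Number of samples" line.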

Victor