[pypy-dev] performance benchmark suite

Victor Stinner victor.stinner at gmail.com
Thu Apr 6 05:43:43 EDT 2017


Ok, let's be more concrete: I ran benchmarks with PyPy2 v5.7.1 on the
speed-python server. See attached pypy.json.gz file.

"perf check pypy.json.gz" flagged 10 benchmarks as "unstable".

Let's use pathlib as an example (Mean +- std dev: 28.2 ms +- 5.4 ms).
Looking closer, I can confirm that its performance is unstable. The
distribution looks multi-modal, with 3 clusters around 21.7 ms,
27.4 ms and 33.1 ms. It's very hard to summarize such a distribution
with a single mean or even a median value.
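
To illustrate the three clusters, here is a minimal sketch that splits
the 10 values of run 4 (copied from the "perf dump" output in this
message) wherever two consecutive sorted values are far apart. The
GAP_MS threshold and the split_clusters() helper are assumptions made
for this example; they are not part of perf.

```python
# Values of run 4, in milliseconds, copied from the "perf dump" output.
run4_ms = [33.1, 21.8, 33.5, 22.2, 33.5, 27.8, 22.9, 32.4, 21.4, 33.1]

# Hypothetical threshold: a jump larger than this starts a new cluster.
GAP_MS = 2.0


def split_clusters(values, gap):
    """Group sorted values; open a new group when the jump exceeds gap."""
    clusters = []
    for value in sorted(values):
        if clusters and value - clusters[-1][-1] <= gap:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return clusters


clusters = split_clusters(run4_ms, GAP_MS)
for cluster in clusters:
    center = sum(cluster) / len(cluster)
    print("%d value(s) around %.1f ms" % (len(cluster), center))
```

With this threshold the run splits into three groups, which is why a
single mean describes none of them well.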

I should now check whether pathlib becomes more stable when it runs longer.

perf 1.0 now displays results using the mean and the standard
deviation. See the perf documentation for the rationale:
http://perf.readthedocs.io/en/latest/analyze.html#statistics


logging_silent
--------------

WARNING: the benchmark result may be unstable
* the shortest raw value is only 9.78 us

pathlib
-------

WARNING: the benchmark result may be unstable
* the standard deviation (5.42 ms) is 19% of the mean (28.2 ms)

regex_compile
-------------

WARNING: the benchmark result may be unstable
* the standard deviation (14.0 ms) is 12% of the mean (120 ms)

scimark_sparse_mat_mult
-----------------------

WARNING: the benchmark result may be unstable
* the standard deviation (19.9 us) is 11% of the mean (188 us)

spambayes
---------

WARNING: the benchmark result may be unstable
* the standard deviation (16.4 ms) is 19% of the mean (85.2 ms)
* the maximum (133 ms) is 56% greater than the mean (85.2 ms)

sqlalchemy_imperative
---------------------

WARNING: the benchmark result may be unstable
* the standard deviation (51.5 ms) is 39% of the mean (134 ms)
* the minimum (42.7 ms) is 68% smaller than the mean (134 ms)
* the maximum (267 ms) is 100% greater than the mean (134 ms)

sympy_integrate
---------------

WARNING: the benchmark result may be unstable
* the standard deviation (21.6 ms) is 14% of the mean (150 ms)

sympy_sum
---------

WARNING: the benchmark result may be unstable
* the standard deviation (19.5 ms) is 13% of the mean (151 ms)

sympy_str
---------

WARNING: the benchmark result may be unstable
* the standard deviation (23.4 ms) is 13% of the mean (174 ms)

xml_etree_process
-----------------

WARNING: the benchmark result may be unstable
* the standard deviation (7.64 ms) is 12% of the mean (62.9 ms)
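
The percentages in these warnings can be recomputed from the rounded
figures shown above. The sketch below is only arithmetic on the
reported numbers: the helper names are made up for this example, and
it does not reproduce the thresholds that perf uses to decide when to
warn. Because the inputs are already rounded, a recomputed percentage
can differ by one point from the one perf prints.

```python
def percent_of_mean(value, mean):
    """Express value as a percentage of the mean."""
    return 100.0 * value / mean


def percent_above_mean(value, mean):
    """Express how far value lies above the mean, in percent of the mean."""
    return 100.0 * (value - mean) / mean


# pathlib: std dev 5.42 ms, mean 28.2 ms
pathlib_ratio = percent_of_mean(5.42, 28.2)

# spambayes: std dev 16.4 ms, mean 85.2 ms, maximum 133 ms
spambayes_ratio = percent_of_mean(16.4, 85.2)
spambayes_max = percent_above_mean(133, 85.2)

print("pathlib std dev: %.0f%% of the mean" % pathlib_ratio)
print("spambayes std dev: %.0f%% of the mean" % spambayes_ratio)
print("spambayes maximum: %.0f%% greater than the mean" % spambayes_max)
```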


haypo@selma$ python3 -m perf stats pypy.json.gz -b pathlib -q
Total duration: 32.8 sec
Start date: 2017-04-05 21:16:25
End date: 2017-04-05 21:17:08
Raw value minimum: 169 ms
Raw value maximum: 284 ms

Number of runs: 9
Total number of values: 60
Number of values per run: 10
Number of warmups per run: 10
Loop iterations per value: 8

Minimum:         21.2 ms
Median +- MAD:   29.9 ms +- 4.0 ms
Mean +- std dev: 28.2 ms +- 5.4 ms
Maximum:         35.4 ms

  0th percentile: 21.2 ms (-25% of the mean) -- minimum
  5th percentile: 21.4 ms (-24% of the mean)
 25th percentile: 22.2 ms (-21% of the mean)
 50th percentile: 29.9 ms (+6% of the mean) -- median
 75th percentile: 33.3 ms (+18% of the mean)
 95th percentile: 34.1 ms (+21% of the mean)
100th percentile: 35.4 ms (+26% of the mean) -- maximum
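
As a cross-check, the summary statistics can be recomputed with the
Python statistics module from the 60 values of runs 4 to 9, copied
from the "perf dump" output in this message. This is a sketch and not
the perf implementation: the values are rounded to 0.1 ms, so the
results only approximate the figures above, and the use of the sample
standard deviation is an assumption.

```python
import statistics

# Values of runs 4 to 9, in milliseconds, copied from "perf dump".
values_ms = [
    33.1, 21.8, 33.5, 22.2, 33.5, 27.8, 22.9, 32.4, 21.4, 33.1,  # run 4
    33.1, 21.7, 33.4, 22.2, 33.5, 27.8, 22.7, 32.2, 21.3, 33.3,  # run 5
    33.0, 21.8, 33.5, 22.2, 33.9, 27.4, 32.0, 23.0, 21.3, 34.1,  # run 6
    32.9, 21.8, 33.4, 22.2, 34.0, 27.5, 32.0, 22.9, 21.5, 34.0,  # run 7
    33.0, 21.9, 34.2, 22.2, 35.4, 26.7, 33.1, 22.2, 21.5, 34.6,  # run 8
    33.1, 21.7, 33.3, 22.0, 33.5, 27.8, 22.6, 32.3, 21.2, 33.2,  # run 9
]

mean = statistics.mean(values_ms)
stdev = statistics.stdev(values_ms)  # sample standard deviation (assumed)
median = statistics.median(values_ms)
# Median absolute deviation: median distance of the values to the median.
mad = statistics.median(abs(v - median) for v in values_ms)

print("Minimum:         %.1f ms" % min(values_ms))
print("Median +- MAD:   %.1f ms +- %.1f ms" % (median, mad))
print("Mean +- std dev: %.1f ms +- %.1f ms" % (mean, stdev))
print("Maximum:         %.1f ms" % max(values_ms))
```

Half of the values are at or below 27.8 ms and the other half are at
or above 32.0 ms, so the median falls in a gap where no value was
measured.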


haypo@selma$ python3 -m perf hist pypy.json.gz -b pathlib -q
21.1 ms:  7 ###########################
21.7 ms: 12 ###############################################
22.3 ms:  2 ########
22.8 ms:  3 ############
23.4 ms:  0 |
24.0 ms:  0 |
24.5 ms:  0 |
25.1 ms:  0 |
25.7 ms:  0 |
26.3 ms:  1 ####
26.8 ms:  1 ####
27.4 ms:  4 ################
28.0 ms:  0 |
28.5 ms:  0 |
29.1 ms:  0 |
29.7 ms:  0 |
30.2 ms:  0 |
30.8 ms:  0 |
31.4 ms:  1 ####
32.0 ms:  4 ################
32.5 ms:  5 ####################
33.1 ms: 13 ###################################################
33.7 ms:  5 ####################
34.2 ms:  1 ####
34.8 ms:  0 |
35.4 ms:  1 ####


$ python3 -m perf dump pypy.json.gz -b pathlib -q
Run 4: values (10): 33.1 ms (+17%), 21.8 ms (-23%), 33.5 ms (+19%),
22.2 ms (-21%), 33.5 ms (+19%), 27.8 ms, 22.9 ms (-19%), 32.4 ms
(+15%), 21.4 ms (-24%), 33.1 ms (+17%)
Run 5: values (10): 33.1 ms (+17%), 21.7 ms (-23%), 33.4 ms (+18%),
22.2 ms (-21%), 33.5 ms (+19%), 27.8 ms, 22.7 ms (-20%), 32.2 ms
(+14%), 21.3 ms (-24%), 33.3 ms (+18%)
Run 6: values (10): 33.0 ms (+17%), 21.8 ms (-23%), 33.5 ms (+19%),
22.2 ms (-21%), 33.9 ms (+20%), 27.4 ms, 32.0 ms (+13%), 23.0 ms
(-19%), 21.3 ms (-25%), 34.1 ms (+21%)
Run 7: values (10): 32.9 ms (+17%), 21.8 ms (-23%), 33.4 ms (+18%),
22.2 ms (-21%), 34.0 ms (+20%), 27.5 ms, 32.0 ms (+13%), 22.9 ms
(-19%), 21.5 ms (-24%), 34.0 ms (+21%)
Run 8: values (10): 33.0 ms (+17%), 21.9 ms (-22%), 34.2 ms (+21%),
22.2 ms (-21%), 35.4 ms (+26%), 26.7 ms (-5%), 33.1 ms (+17%), 22.2 ms
(-21%), 21.5 ms (-24%), 34.6 ms (+23%)
Run 9: values (10): 33.1 ms (+17%), 21.7 ms (-23%), 33.3 ms (+18%),
22.0 ms (-22%), 33.5 ms (+19%), 27.8 ms, 22.6 ms (-20%), 32.3 ms
(+15%), 21.2 ms (-25%), 33.2 ms (+18%)


haypo@selma$ python3 -m perf dump pypy.json.gz -b pathlib
Run 1: calibrate
- 1 loop: 135 ms (raw: 135 ms)
Run 2: calibrate
- 1 loop: 136 ms (raw: 136 ms)
- 1 loop: 32.2 ms (raw: 32.2 ms)
- 2 loops: 48.2 ms (raw: 96.5 ms)
- 4 loops: 22.6 ms (raw: 90.4 ms)
- 8 loops: 26.8 ms (raw: 214 ms)
- 8 loops: 22.7 ms (raw: 181 ms)
- 8 loops: 29.4 ms (raw: 235 ms)
- 8 loops: 22.7 ms (raw: 182 ms)
- 8 loops: 34.0 ms (raw: 272 ms)
- 8 loops: 22.4 ms (raw: 179 ms)
- 8 loops: 21.7 ms (raw: 174 ms)
- 8 loops: 33.1 ms (raw: 265 ms)
- 8 loops: 21.9 ms (raw: 175 ms)
Run 3: calibrate
- 8 loops: 42.9 ms (raw: 343 ms)
- 8 loops: 26.9 ms (raw: 215 ms)
- 8 loops: 22.3 ms (raw: 179 ms)
- 8 loops: 29.3 ms (raw: 235 ms)
- 8 loops: 22.9 ms (raw: 183 ms)
- 8 loops: 31.2 ms (raw: 250 ms)
- 8 loops: 23.9 ms (raw: 191 ms)
- 8 loops: 21.8 ms (raw: 174 ms)
- 8 loops: 32.5 ms (raw: 260 ms)
- 8 loops: 21.8 ms (raw: 174 ms)
Run 4: warmups (10): 42.3 ms (+50%), 26.8 ms, 22.2 ms (-21%), 29.4 ms,
22.5 ms (-20%), 30.8 ms (+9%), 23.8 ms (-16%), 21.8 ms (-23%), 32.4 ms
(+15%), 21.7 ms (-23%); values (10): 33.1 ms (+17%), 21.8 ms (-23%),
33.5 ms (+19%), 22.2 ms (-21%), 33.5 ms (+19%), 27.8 ms, 22.9 ms
(-19%), 32.4 ms (+15%), 21.4 ms (-24%), 33.1 ms (+17%)
Run 5: warmups (10): 42.2 ms (+50%), 26.9 ms, 22.6 ms (-20%), 29.6 ms,
22.7 ms (-19%), 31.3 ms (+11%), 23.7 ms (-16%), 22.2 ms (-21%), 32.6
ms (+16%), 21.7 ms (-23%); values (10): 33.1 ms (+17%), 21.7 ms
(-23%), 33.4 ms (+18%), 22.2 ms (-21%), 33.5 ms (+19%), 27.8 ms, 22.7
ms (-20%), 32.2 ms (+14%), 21.3 ms (-24%), 33.3 ms (+18%)
Run 6: warmups (10): 42.3 ms (+50%), 26.8 ms (-5%), 22.2 ms (-21%),
29.2 ms, 22.3 ms (-21%), 30.6 ms (+8%), 23.9 ms (-15%), 21.6 ms
(-24%), 32.5 ms (+15%), 21.5 ms (-24%); values (10): 33.0 ms (+17%),
21.8 ms (-23%), 33.5 ms (+19%), 22.2 ms (-21%), 33.9 ms (+20%), 27.4
ms, 32.0 ms (+13%), 23.0 ms (-19%), 21.3 ms (-25%), 34.1 ms (+21%)
Run 7: warmups (10): 42.5 ms (+51%), 26.8 ms, 22.2 ms (-21%), 29.3 ms,
22.7 ms (-19%), 31.1 ms (+10%), 23.8 ms (-16%), 21.7 ms (-23%), 32.3
ms (+15%), 21.6 ms (-23%); values (10): 32.9 ms (+17%), 21.8 ms
(-23%), 33.4 ms (+18%), 22.2 ms (-21%), 34.0 ms (+20%), 27.5 ms, 32.0
ms (+13%), 22.9 ms (-19%), 21.5 ms (-24%), 34.0 ms (+21%)
Run 8: warmups (10): 43.4 ms (+54%), 26.9 ms, 22.5 ms (-20%), 29.8 ms
(+5%), 22.9 ms (-19%), 33.4 ms (+19%), 23.2 ms (-18%), 22.0 ms (-22%),
32.2 ms (+14%), 21.8 ms (-23%); values (10): 33.0 ms (+17%), 21.9 ms
(-22%), 34.2 ms (+21%), 22.2 ms (-21%), 35.4 ms (+26%), 26.7 ms (-5%),
33.1 ms (+17%), 22.2 ms (-21%), 21.5 ms (-24%), 34.6 ms (+23%)
Run 9: warmups (10): 42.2 ms (+50%), 27.3 ms, 22.6 ms (-20%), 29.4 ms,
22.3 ms (-21%), 30.8 ms (+9%), 23.7 ms (-16%), 21.4 ms (-24%), 32.9 ms
(+16%), 21.8 ms (-23%); values (10): 33.1 ms (+17%), 21.7 ms (-23%),
33.3 ms (+18%), 22.0 ms (-22%), 33.5 ms (+19%), 27.8 ms, 22.6 ms
(-20%), 32.3 ms (+15%), 21.2 ms (-25%), 33.2 ms (+18%)

WARNING: the benchmark result may be unstable
* the standard deviation (5.42 ms) is 19% of the mean (28.2 ms)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m perf system tune' command to reduce the system jitter.
Use perf stats, perf dump and perf hist to analyze results.
Use --quiet option to hide these warnings.

Victor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pypy.json.gz
Type: application/x-gzip
Size: 110568 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/pypy-dev/attachments/20170406/b54872bb/attachment-0001.bin>

