[Speed] Median +- MAD or Mean +- std dev?

Wang, Peter Xihong peter.xihong.wang at intel.com
Wed Mar 15 20:38:14 EDT 2017


Hi All,

I am attaching an image comparing runs of CALL_METHOD from the old Grand Unified Python Benchmark (GUPB) suite (https://hg.python.org/benchmarks) with ASLR enabled and with ASLR disabled.
You can see that run-to-run variation was reduced significantly, from data scattered all over the place to a single outlier out of 30 repeated runs.
Disabling ASLR effectively eliminated most of the variation for this micro-benchmark.

On a Linux system, you can do this as root:
echo 0 > /proc/sys/kernel/randomize_va_space   # disable ASLR
echo 2 > /proc/sys/kernel/randomize_va_space   # re-enable ASLR (full randomization, the kernel default)
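
If you prefer to script this (for example, to restore the previous setting after a benchmark run), here is a minimal Python sketch of the same idea, assuming root privileges on a Linux kernel that exposes /proc/sys/kernel/randomize_va_space:
---
# Minimal sketch: read and set the ASLR level on Linux (run as root).
# Kernel values: 0 = disabled, 1 = conservative, 2 = full (the default).
from pathlib import Path

ASLR_KNOB = Path("/proc/sys/kernel/randomize_va_space")

def get_aslr():
    """Return the current ASLR level as an integer."""
    return int(ASLR_KNOB.read_text())

def set_aslr(level):
    """Write a new ASLR level (0 disables, 2 restores the default)."""
    if level not in (0, 1, 2):
        raise ValueError("ASLR level must be 0, 1 or 2")
    ASLR_KNOB.write_text("%d\n" % level)

if __name__ == "__main__":
    print("ASLR level is:", get_aslr())
---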

If anyone still experiences run-to-run variation, I'd suggest reading on.
Based on my observations in our labs, many factors can affect performance, including the environment (yes, even room temperature) and hardware or related components: the platform, chipset, memory DIMMs, CPU generation and stepping, BIOS version, kernel version; the list goes on and on.

That being said, would it be helpful if we worked together to identify the root cause, be it software or anything else?  We could start with a specific micro-benchmark and a specific goal for what to measure.
After that, or in parallel once some baseline work is done, we could focus on the measurement process/methodology.

Is this helpful?

Thanks,

Peter


 
-----Original Message-----
From: Speed [mailto:speed-bounces+peter.xihong.wang=intel.com at python.org] On Behalf Of Victor Stinner
Sent: Wednesday, March 15, 2017 11:11 AM
To: Antoine Pitrou <solipsis at pitrou.net>
Cc: speed at python.org
Subject: Re: [Speed] Median +- MAD or Mean +- std dev?

2017-03-15 18:11 GMT+01:00 Antoine Pitrou <solipsis at pitrou.net>:
> I would say keep it simple.  mean/stddev is informative enough, no 
> need to add or maintain options of dubious utility.

Ok. I added a message suggesting the use of perf stats to analyze results.

Example of warnings for a benchmark result considered unstable, the Python startup time measured by the new bench_command() function:
---
$ python3 -m perf show startup1.json
WARNING: the benchmark result may be unstable
* the standard deviation (6.08 ms) is 16% of the mean (39.1 ms)
* the minimum (23.6 ms) is 40% smaller than the mean (39.1 ms)

Try to rerun the benchmark with more runs, values and/or loops.
Run 'python3 -m perf system tune' command to reduce the system jitter.
Use perf stats to analyze results, or --quiet to hide warnings.

Median +- MAD: 40.7 ms +- 3.9 ms
---
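
For the curious, those warnings boil down to simple ratio checks. Here is a rough reconstruction in Python; it is my own sketch with illustrative thresholds, not perf's actual code:
---
import statistics

def unstable_warnings(values, stdev_ratio=0.10, min_ratio=0.25):
    """Return warning strings for a list of timings in seconds.

    The 10%/25% thresholds are illustrative guesses.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    minimum = min(values)
    warnings = []
    if stdev > stdev_ratio * mean:
        warnings.append("the standard deviation (%.2f ms) is %.0f%% of "
                        "the mean (%.1f ms)"
                        % (stdev * 1e3, 100 * stdev / mean, mean * 1e3))
    if (mean - minimum) > min_ratio * mean:
        warnings.append("the minimum (%.1f ms) is %.0f%% smaller than "
                        "the mean (%.1f ms)"
                        % (minimum * 1e3,
                           100 * (mean - minimum) / mean, mean * 1e3))
    return warnings
---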

Statistics of this result:
---
$ python3 -m perf stats startup1.json -q
Total duration: 37.2 sec
Start date: 2017-03-15 18:02:46
End date: 2017-03-15 18:03:27
Raw value minimum: 189 ms
Raw value maximum: 390 ms

Number of runs: 25
Total number of values: 75
Number of values per run: 3
Number of warmups per run: 1
Loop iterations per value: 8

Minimum: 23.6 ms (-42% of the median)
Median +- MAD: 40.7 ms +- 3.9 ms
Mean +- std dev: 39.1 ms +- 6.1 ms
Maximum: 48.7 ms (+20% of the median)
---
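
Side note on the subject line: the statistics module gives mean and stdev directly, while MAD (the median absolute deviation) takes one extra line. A minimal sketch of both summaries:
---
import statistics

def median_mad(values):
    """Return (median, median absolute deviation)."""
    med = statistics.median(values)
    mad = statistics.median(abs(value - med) for value in values)
    return med, mad

def mean_stdev(values):
    """Return (mean, standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)
---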

Victor
_______________________________________________
Speed mailing list
Speed at python.org
https://mail.python.org/mailman/listinfo/speed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ASLR_disabled_enabled_comparison.jpg
Type: image/jpeg
Size: 79494 bytes
URL: <http://mail.python.org/pipermail/speed/attachments/20170316/b1a565a4/attachment-0001.jpg>

