[Speed] Disable hash randomization to get reliable benchmarks

Maciej Fijalkowski fijall at gmail.com
Mon Apr 25 02:25:20 EDT 2016


Hi Victor

The problem with disabled ASLR is that you change the measurement from
a statistical distribution to one draw from that distribution, repeated.
There is no way around doing multiple runs and averaging them. It is
essentially the same reason why using the min is much worse than using
the average: with ASLR, say the true timing is 2.0+-0.3, and 5 runs
give you 1.8, 1.9, 2.2, 2.1, 2.1. If you disable ASLR, you get one
draw repeated 5 times, which might be 2.0, but might also be 1.8 five
times. That just hides the problem, it does not actually fix it
(because if you touch something, stuff might be allocated in a
different order and then you get a different draw).
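The point can be sketched with a toy simulation (made-up numbers, not real timings): with ASLR each run is a fresh draw from the timing distribution; with ASLR disabled, one draw repeats.

```python
# Toy simulation of the argument above (made-up numbers, not real timings).
import random
import statistics

random.seed(7)  # reproducible sketch
TRUE_MEAN, TRUE_STDEV = 2.0, 0.3

# With ASLR: every process start gets a new memory layout, i.e. a new draw.
aslr_runs = [random.gauss(TRUE_MEAN, TRUE_STDEV) for _ in range(5)]

# Without ASLR: the layout is frozen, so all 5 runs repeat the same draw.
one_draw = random.gauss(TRUE_MEAN, TRUE_STDEV)
no_aslr_runs = [one_draw] * 5

print(statistics.mean(aslr_runs))      # averages toward the true mean
print(statistics.stdev(no_aslr_runs))  # 0.0: looks stable, but can be biased
```

The zero spread of the second sample is exactly the "hidden" problem: it looks perfectly stable while the repeated draw may sit anywhere in the real distribution.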

On Mon, Apr 25, 2016 at 12:49 AM, Victor Stinner
<victor.stinner at gmail.com> wrote:
> Hi,
>
> These past months, I spent a lot of time on microbenchmarks. Probably
> too much time :-) I found a great Linux config that gives a much more
> stable system for reliable microbenchmarks:
> https://haypo-notes.readthedocs.org/microbenchmark.html
>
> * isolate some CPU cores
> * force CPU to performance
> * disable ASLR
> * block IRQ on isolated CPU cores
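The four tweaks roughly correspond to commands like these (a sketch only; the sysfs paths, the core number 7, and the IRQ mask are illustrative and vary by kernel and distro):

```shell
# Sketch of the four tweaks above; paths and core numbers are illustrative.

# 1. Isolate a CPU core: add "isolcpus=7" to the kernel boot command line.

# 2. Force the CPU frequency governor to "performance" on that core:
echo performance | sudo tee /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor

# 3. Disable ASLR system-wide:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

# 4. Exclude the isolated core from the default IRQ affinity mask
#    (0x7f = CPUs 0-6 on an 8-core machine):
echo 7f | sudo tee /proc/irq/default_smp_affinity
```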
>
> With such Linux config, the system load doesn't impact benchmark results at all.
>
> These last days, I almost lost my mind trying to figure out why a very
> tiny change in the C code made the benchmark up to 8% slower.
>
> My main issue was getting reliable benchmark results, since running the
> same microbenchmark using perf.py gave me "random" results.
>
> I ended up running the underlying script bm_call_simple.py directly:
>
> taskset -c 7 ./python ../benchmarks/performance/bm_call_simple.py -n 5
> --timer perf_counter
>
> In a single run, the timing of each loop iteration is very stable. Example:
>
> 0.22682707803323865
> 0.22741253697313368
> 0.227521265973337
> 0.22750743699725717
> 0.22752994997426867
> 0.22753606992773712
> 0.22742654103785753
> 0.22750875598285347
> 0.22752253606449813
> 0.22718404198531061
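The spread of those ten timings can be checked directly (same numbers as quoted above):

```python
# Mean and spread of the ten loop timings quoted above.
import statistics

timings = [
    0.22682707803323865, 0.22741253697313368, 0.227521265973337,
    0.22750743699725717, 0.22752994997426867, 0.22753606992773712,
    0.22742654103785753, 0.22750875598285347, 0.22752253606449813,
    0.22718404198531061,
]
print(f"mean  = {statistics.mean(timings):.6f}")
print(f"stdev = {statistics.stdev(timings):.6f}")  # tiny: the run is very stable
```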
>
> Problem: each new run gives a different result. Example:
>
> * run 1: 0.226...
> * run 2: 0.255...
> * run 3: 0.248...
> * run 4: 0.258...
> * etc.
>
> I saw 3 groups of values: ~0.226, ~0.248, ~0.255.
>
> I didn't understand how running the same program could give such
> different results. The answer is the randomization of the Python hash
> function. Aaaaaaah! The last source of entropy in my microbenchmark!
>
> The performance difference can be seen by forcing a specific hash function:
>
> PYTHONHASHSEED=2 => 0.254...
> PYTHONHASHSEED=1 => 0.246...
> PYTHONHASHSEED=5 => 0.228...
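The mechanism can be checked in isolation: with a fixed PYTHONHASHSEED, fresh interpreters agree on string hashes, while different seeds give different hash layouts. A small sketch (the helper name is made up for illustration):

```python
# Check that PYTHONHASHSEED controls string hashing across interpreter runs.
import os
import subprocess
import sys

def hash_abc(seed):
    """Start a fresh interpreter with the given hash seed, return hash('abc')."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('abc'))"], env=env, text=True
    )
    return int(out)

assert hash_abc("0") == hash_abc("0")  # fixed seed: reproducible across runs
print(hash_abc("0"), hash_abc("1"))    # different seeds: different layouts
```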
>
> Sadly, neither perf.py nor timeit disables hash randomization for me. I
> hacked perf.py to set PYTHONHASHSEED=0 and magically the results became
> super stable!
>
> Multiple runs of the command:
>
> $ taskset_isolated.py python3 perf.py ../default/python-ref
> ../default/python -b call_simple --fast
>
> Outputs:
>
> ### call_simple ###
> Min: 0.232621 -> 0.247904: 1.07x slower
> Avg: 0.232628 -> 0.247941: 1.07x slower
> Significant (t=-591.78)
> Stddev: 0.00001 -> 0.00010: 13.7450x larger
>
> ### call_simple ###
> Min: 0.232619 -> 0.247904: 1.07x slower
> Avg: 0.232703 -> 0.247955: 1.07x slower
> Significant (t=-190.58)
> Stddev: 0.00029 -> 0.00011: 2.6336x smaller
>
> ### call_simple ###
> Min: 0.232621 -> 0.247903: 1.07x slower
> Avg: 0.232629 -> 0.247918: 1.07x slower
> Significant (t=-5896.14)
> Stddev: 0.00001 -> 0.00001: 1.3350x larger
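Each result block above reports a Student's t statistic. A sketch of how such a value can be computed from two timing samples (Welch's t with hypothetical timings; perf.py's actual formula may differ):

```python
# Sketch: a "Significant (t=...)" value from two timing samples (Welch's t).
import math
import statistics

def t_stat(a, b):
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

ref = [0.2326, 0.2327, 0.2326]  # hypothetical reference timings
new = [0.2479, 0.2480, 0.2479]  # hypothetical changed-build timings
print(round(t_stat(ref, new), 2))  # large negative t: the change is slower
```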
>
> Even with --fast, the result is *very* stable: see the very low
> standard deviation. In 3 runs, I got exactly the same "1.07x", and the
> average timings agree to within 1 unit in the 4th digit!
>
> No need to use the ultra slow --rigorous option. This option is
> probably designed to hide the noise of a very unstable system. But
> with my Linux config, it doesn't seem to be needed anymore, at least
> on this very specific microbenchmark.
>
> Ok, now I can investigate why my change on the C code introduced a
> performance regression :-D
>
> Victor
> _______________________________________________
> Speed mailing list
> Speed at python.org
> https://mail.python.org/mailman/listinfo/speed

