[Python-Dev] Possible performance regression

Victor Stinner vstinner at redhat.com
Mon Feb 25 04:42:22 EST 2019


Hi,

Le lun. 25 févr. 2019 à 05:57, Raymond Hettinger
<raymond.hettinger at gmail.com> a écrit :
> I've been running benchmarks that have been stable for a while.  But between today and yesterday, there has been an almost across-the-board performance regression.

How do you run your benchmarks? If you use Linux, are you using CPU isolation?

> It's possible that this is a measurement error or something unique to my system (my Mac installed the 10.14.3 release today), so I'm hoping other folks can run checks as well.

Getting reproducible benchmark results for timings smaller than 1 ms is
really hard. I wrote some advice on how to get more stable results:
https://perf.readthedocs.io/en/latest/run_benchmark.html#how-to-get-reproductible-benchmark-results
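
For example, here is a minimal sketch of such a microbenchmark written
with the Runner API of the perf module (the benchmark name and the
timed statements below are made up for illustration):

    # bench_read_local.py: the Runner spawns multiple worker processes,
    # calibrates the number of loops, and reports mean +/- std dev.
    import perf

    runner = perf.Runner()
    runner.timeit("read_local",
                  stmt="x",        # the statement being timed
                  setup="x = 1")   # run once per worker, not timed

Running it with "python3 bench_read_local.py -o read_local.json" stores
the results as JSON for a later comparison.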

> Variable and attribute read access:
>    4.0 ns       read_local

In my experience, for timings below 100 ns, *everything* impacts
the benchmark, and the result is useless without the standard
deviation.
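
To illustrate (the numbers are whatever your machine produces), even
with the plain timeit module you can at least repeat the measurement
in the same process and report the spread:

    import statistics
    import timeit

    # 20 samples of 1,000,000 loops each, converted to ns per loop
    samples = timeit.repeat("x", setup="x = 1", repeat=20, number=10**6)
    per_loop_ns = [total / 10**6 * 1e9 for total in samples]
    print("%.1f ns +/- %.1f ns"
          % (statistics.mean(per_loop_ns), statistics.stdev(per_loop_ns)))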

On such microbenchmarks, the hash function has a significant impact
on performance, so you should run your benchmark in multiple different
*processes* to get multiple different hash functions. Some people
prefer to use PYTHONHASHSEED=0 (or another fixed value), but I dislike
doing that since it's less representative of performance in production
(with a randomized hash function). For example, using 20 processes to
test 20 randomized hash functions is enough to compute the average
cost of the hash function. My remark is more general, though: I didn't
look at the specific case of var_access_benchmark.py. Maybe benchmarks
of C code depend on the hash function as well.
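
Here is a rough sketch of that idea done by hand (bench_one.py is a
hypothetical script which prints a single timing in nanoseconds on
stdout):

    import statistics
    import subprocess
    import sys

    # Each fresh process gets its own randomized hash seed
    # (PYTHONHASHSEED is deliberately left unset), so averaging over
    # processes also averages over hash functions.
    timings = []
    for _ in range(20):
        proc = subprocess.run([sys.executable, "bench_one.py"],
                              capture_output=True, text=True, check=True)
        timings.append(float(proc.stdout))

    print("mean over 20 hash seeds: %.1f ns +/- %.1f ns"
          % (statistics.mean(timings), statistics.stdev(timings)))

The Runner of the perf module spawns worker processes for you, so in
practice you don't have to write this loop yourself.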

For example, 4.0 ns +/- 10 ns and 4.0 ns +/- 0.1 ns are completely
different results when deciding whether "5.0 ns" is slower or faster.

The "perf compare" command of my perf module "determines whether two
samples differ significantly using a Student’s two-sample, two-tailed
t-test with alpha equals to 0.95.":
https://en.wikipedia.org/wiki/Student's_t-test

I don't understand how these things work, I just copied the code from
the old Python benchmark suite :-)
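
For illustration only (this is not the code of the perf module, and
the timings below are made up), the same kind of comparison can be
written with scipy:

    from scipy import stats

    old = [4.0, 4.1, 3.9, 4.2, 4.0]   # timings in ns, old build
    new = [5.0, 5.1, 4.9, 5.2, 5.0]   # timings in ns, new build

    # Two-sample, two-tailed Student's t-test (equal variances assumed)
    t_stat, p_value = stats.ttest_ind(old, new)
    if p_value < 0.05:   # significant at the 95% confidence level
        print("significant: t=%.2f, p=%.4f" % (t_stat, p_value))
    else:
        print("not significant: probably just noise")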

See also my articles in my journey to stable benchmarks:

* https://vstinner.github.io/journey-to-stable-benchmark-system.html
  # nosy applications / CPU isolation
* https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
  # PGO
* https://vstinner.github.io/journey-to-stable-benchmark-average.html
  # randomized hash function

There are likely other parameters which impact benchmarks; that's why
the standard deviation and how the benchmark is run matter so much.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.

