[Speed] intel_pstate C0 bug on isolated CPUs with the performance governor

Victor Stinner victor.stinner at gmail.com
Tue Sep 27 09:40:42 EDT 2016


Hi,

I ran further tests and now understand the issue better.

In short, the intel_pstate driver doesn't support NOHZ_FULL, so the
frequency of CPUs using NOHZ_FULL depends on the workload of the other
CPUs. This is especially true with the powersave (default) CPU
frequency governor. At least, that is what I see on my CPU, which
doesn't have HWP.

intel_pstate updates the P-state of each CPU by writing into MSR
199H. The purpose of NOHZ_FULL is to avoid any interruption, whereas
intel_pstate relies on interrupts to sample performance, pick
the right P-state and write it into the MSR. To write into the MSR of
CPU 7, the kernel must run on CPU 7. If the benchmark is CPU
bound and never calls the kernel, there is no opportunity to run the
intel_pstate driver.
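
To make the per-CPU nature of MSR 199H concrete, here is a minimal
Python sketch that reads it through the Linux msr driver
(/dev/cpu/N/msr, which needs root and the msr kernel module). The
helper names are hypothetical. The decoding assumes that the target
ratio is stored in bits 15:8 and that the bus clock is 100 MHz; both
are assumptions that may not hold on every CPU model.

```python
import os
import struct

IA32_PERF_CTL = 0x199  # the "MSR 199H" mentioned above


def decode_perf_ctl(raw, bus_mhz=100):
    """Return (ratio, approx_mhz) from a raw 64-bit MSR value.

    Assumption: the target ratio sits in bits 15:8 and the bus
    clock is 100 MHz. This is a sketch, not a reference decoder.
    """
    ratio = (raw >> 8) & 0xFF
    return ratio, ratio * bus_mhz


def read_msr(cpu, reg=IA32_PERF_CTL):
    """Read one MSR of one CPU via /dev/cpu/<cpu>/msr (root only)."""
    fd = os.open("/dev/cpu/%d/msr" % cpu, os.O_RDONLY)
    try:
        # The file offset selects the register; each MSR is 8 bytes.
        data = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]
```

For example, decode_perf_ctl(read_msr(7)) would report the ratio last
requested for CPU 7. Note that such a read is itself executed by the
kernel on the target CPU, so it disturbs an isolated CPU.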


Antoine:
> Ah, well, I don't have HyperThreading on my CPU, sorry.

The bug can be reproduced without HyperThreading.

A new, much simpler scenario to reproduce the bug (with my analysis of the bug):
https://bugzilla.redhat.com/show_bug.cgi?id=1378529#c6


2016-09-24 8:11 GMT+02:00 Armin Rigo <arigo at tunes.org>:
> IMHO this is not a very good solution.  With the CPU running at, say,
> a fifth of its nominal performance, you can't expect that it will
> behave in a remotely similar way.

The nominal speed is 3.4 GHz. The minimum speed is 1.6 GHz. Timings
are only about twice as long at minimum speed as at nominal speed.
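
The factor can be checked with a few lines of Python. This is only a
back-of-the-envelope sketch: it assumes purely CPU-bound code whose
run time scales with 1/frequency, which ignores memory latency and
other effects.

```python
nominal_ghz = 3.4
minimum_ghz = 1.6

# For purely CPU-bound code, run time scales roughly with 1/frequency,
# so the slowdown is the ratio of the two frequencies.
slowdown = nominal_ghz / minimum_ghz
print("slowdown at minimum speed: %.3fx" % slowdown)
```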


> As a result, it is easy to introduce changes to the CPython
> core that appear beneficial, but are actually detrimental, or
> vice-versa.  For example, replacing some computation by lookups in a
> table may look like a good idea, when it is not.

Yeah, maybe, I don't know.

Anyway, there are two solutions to run stable benchmarks at nominal speed:

* (Use NOHZ_FULL but) Force frequency to the maximum
* Don't use NOHZ_FULL
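
One possible way to implement the first option is sketched below in
Python: raise the min_perf_pct setting of intel_pstate to 100 before
starting the benchmark. The function name is hypothetical, the
directory is a parameter only so that the code can be tried on a
scratch directory, and writing to the real sysfs file needs root. It
is an illustration under these assumptions, not a tested fix for the
bug.

```python
import os

PSTATE_DIR = "/sys/devices/system/cpu/intel_pstate"


def force_max_frequency(pstate_dir=PSTATE_DIR):
    """Raise the minimum allowed performance to 100%.

    Writes 100 to min_perf_pct and returns the previous value so
    that the caller can restore it after the benchmark.
    """
    path = os.path.join(pstate_dir, "min_perf_pct")
    with open(path) as f:
        previous = int(f.read().strip())
    with open(path, "w") as f:
        f.write("100\n")
    return previous
```

The returned value can be written back to the same file once the
benchmark is done.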

Victor

