From victor.stinner at gmail.com  Thu Sep 1 06:58:00 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 1 Sep 2016 12:58:00 +0200
Subject: [Speed] New instance of CodeSpeed at speed.python.org running performance on CPython and PyPy?
Message-ID:

Hi,

Would it be possible to run a new instance of CodeSpeed (the website behind speed.python.org) which would run the "performance" benchmark suite rather than the "benchmarks" benchmark suite? And would it be possible to run it on CPython (2.7 and 3.5 branches) and PyPy (master branch, maybe also the py3k branch)?

I found https://github.com/tobami/codespeed/ but I haven't looked at it closely yet. I guess that some code should be written to convert perf JSON files to the format expected by CodeSpeed?

FYI I released performance 0.2 yesterday. JSON files now contain the version of the benchmark suite ("performance_version: 0.2"). I plan to use semantic versioning: increase the major version (ex: upgrade to 0.3, but later it will be 1.x, 2.x, etc.) when benchmark results are considered incompatible.

For example, I upgraded Django (from 1.9 to 1.10) and Chameleon (from 2.22 to 2.24) in performance 0.2.

The question is how to upgrade performance to a new major version: should we drop previous benchmark results?

Maybe we should put the performance version in the URL, and use "/latest/" by default. Only /latest/ would get new results, and /latest/ would restart from an empty set of results when performance is upgraded?

Another option, less exciting, is to never upgrade benchmarks. The benchmarks project *added* new benchmarks when a dependency was "upgraded": the old dependency was kept, and a new dependency (in fact, a full copy of the code ;-)) was added. So it has django, django_v2, django_v3, etc. The problem is that it still uses Mercurial 1.2, which was released 7 years ago (2009)... Since upgrading is painful, most dependencies were outdated.

Do you care about old benchmark results? It's quite easy to regenerate them (on demand?) if needed, no? Using Mercurial and Git, it's easy to check out any old revision to run a benchmark again on an old version of CPython / PyPy / etc.

Victor

From brett at python.org  Thu Sep 01 11:48:14 2016
From: brett at python.org (Brett Cannon)
Date: Thu, 01 Sep 2016 15:48:14 +0000
Subject: [Speed] New instance of CodeSpeed at speed.python.org running performance on CPython and PyPy?
In-Reply-To:
References:
Message-ID:

On Thu, 1 Sep 2016 at 03:58 Victor Stinner wrote:

> Hi,
>
> Would it be possible to run a new instance of CodeSpeed (the website behind speed.python.org) which would run the "performance" benchmark suite rather than the "benchmarks" benchmark suite? And would it be possible to run it on CPython (2.7 and 3.5 branches) and PyPy (master branch, maybe also the py3k branch)?

I believe Zach has the repo containing the code. He also said it's all rather hacked up at the moment. Maybe something to discuss next week at the sprint, as I think you're both going to be there.

> I found https://github.com/tobami/codespeed/ but I haven't looked at it closely yet. I guess that some code should be written to convert perf JSON files to the format expected by CodeSpeed?
>
> FYI I released performance 0.2 yesterday. JSON files now contain the version of the benchmark suite ("performance_version: 0.2"). I plan to use semantic versioning: increase the major version (ex: upgrade to 0.3, but later it will be 1.x, 2.x, etc.) when benchmark results are considered incompatible.

SGTM.
> For example, I upgraded Django (from 1.9 to 1.10) and Chameleon (from 2.22 to 2.24) in performance 0.2.
>
> The question is how to upgrade performance to a new major version: should we drop previous benchmark results?

They don't really compare anymore, so at the very least they should not be compared with benchmark results from a newer version of the benchmark suite.

> Maybe we should put the performance version in the URL, and use "/latest/" by default. Only /latest/ would get new results, and /latest/ would restart from an empty set of results when performance is upgraded?

SGTM

> Another option, less exciting, is to never upgrade benchmarks. The benchmarks project *added* new benchmarks when a dependency was "upgraded": the old dependency was kept, and a new dependency (in fact, a full copy of the code ;-)) was added. So it has django, django_v2, django_v3, etc. The problem is that it still uses Mercurial 1.2, which was released 7 years ago (2009)... Since upgrading is painful, most dependencies were outdated.

Based on my experience with the benchmark suite I don't like this option either; it just gathers cruft. As Maciej and the PyPy folks have pointed out, benchmarks should try to represent modern code, and old benchmarks won't necessarily do that.

> Do you care about old benchmark results? It's quite easy to regenerate them (on demand?) if needed, no? Using Mercurial and Git, it's easy to check out any old revision to run a benchmark again on an old version of CPython / PyPy / etc.

I personally don't, but that's because I care about either current performance in comparison to others, or very short timescales to see when a regression occurred (hence a switchover has a very small chance of impacting that investigation), not long-timescale results kept for historical purposes.

From zachary.ware+pydev at gmail.com  Thu Sep 1 12:33:34 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Thu, 1 Sep 2016 11:33:34 -0500
Subject: [Speed] New instance of CodeSpeed at speed.python.org running performance on CPython and PyPy?
In-Reply-To:
References:
Message-ID:

On Thu, Sep 1, 2016 at 5:58 AM, Victor Stinner wrote:
> Hi,
>
> Would it be possible to run a new instance of CodeSpeed (the website behind speed.python.org) which would run the "performance" benchmark suite rather than the "benchmarks" benchmark suite? And would it be possible to run it on CPython (2.7 and 3.5 branches) and PyPy (master branch, maybe also the py3k branch)?

Short answer is yes, please :). Slightly longer answer is that that's the plan, but I don't know when I'll have the opportunity to work on it. Possibly next week at the sprint, we'll see.

> I found https://github.com/tobami/codespeed/ but I haven't looked at it closely yet. I guess that some code should be written to convert perf JSON files to the format expected by CodeSpeed?

The code that's actually running speed.python.org is at https://github.com/zware/codespeed, speed.python.org branch. I've been meaning to get that moved to https://github.com/python/codespeed, but it hasn't happened yet. Other relevant code is hidden in the buildbot master and on the runner box itself, which is not publicly version controlled (which is bad).

We will need either a translation layer between performance and CodeSpeed, or, if we can, to just change the format that performance outputs to match what CodeSpeed expects.
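To make the discussion concrete, here is a rough sketch of what such a translation layer could look like. Everything in it is an assumption to be checked: the perf JSON layout is guessed from this thread, and the field names come from my reading of the CodeSpeed README (its /result/add/ endpoint), so treat it as pseudo-code rather than working glue:

---
import json
import statistics
import urllib.parse
import urllib.request

def post_perf_results(json_path, codespeed_url, commitid, environment):
    # Assumed perf JSON layout: a "benchmarks" list whose entries carry
    # a name in their metadata and samples in their runs.
    with open(json_path) as fp:
        suite = json.load(fp)
    for bench in suite["benchmarks"]:
        samples = [s for run in bench["runs"] for s in run.get("samples", ())]
        data = {
            # Field names as documented in the CodeSpeed README.
            "commitid": commitid,
            "branch": "default",
            "project": "CPython",
            "executable": "cpython",
            "benchmark": bench["metadata"]["name"],
            "environment": environment,
            "result_value": statistics.median(samples),
            "min": min(samples),
            "max": max(samples),
        }
        if len(samples) > 1:
            data["std_dev"] = statistics.stdev(samples)
        body = urllib.parse.urlencode(data).encode()
        urllib.request.urlopen(codespeed_url + "/result/add/", body)
---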
> FYI I released performance 0.2 yesterday. JSON files now contain the version of the benchmark suite ("performance_version: 0.2"). I plan to use semantic versioning: increase the major version (ex: upgrade to 0.3, but later it will be 1.x, 2.x, etc.) when benchmark results are considered incompatible.
>
> For example, I upgraded Django (from 1.9 to 1.10) and Chameleon (from 2.22 to 2.24) in performance 0.2.
>
> The question is how to upgrade performance to a new major version: should we drop previous benchmark results?
>
> Maybe we should put the performance version in the URL, and use "/latest/" by default. Only /latest/ would get new results, and /latest/ would restart from an empty set of results when performance is upgraded?

I have only enough experience with Django and CodeSpeed to have gotten speed.python.org to the state it's currently in, so I really don't know how (un)limited the possibilities are. One simple method would be to combine the benchmark name with the performance version, and periodically clear out old benchmark results.

> Another option, less exciting, is to never upgrade benchmarks. The benchmarks project *added* new benchmarks when a dependency was "upgraded": the old dependency was kept, and a new dependency (in fact, a full copy of the code ;-)) was added. So it has django, django_v2, django_v3, etc. The problem is that it still uses Mercurial 1.2, which was released 7 years ago (2009)... Since upgrading is painful, most dependencies were outdated.

I agree that we should have the ability to easily update benchmarks, and actually do so sometimes.

> Do you care about old benchmark results? It's quite easy to regenerate them (on demand?) if needed, no? Using Mercurial and Git, it's easy to check out any old revision to run a benchmark again on an old version of CPython / PyPy / etc.

I suggest that upon updates to the benchmark suite/runner/etc., we should clear out old results and rerun the benchmarks on a selection of released versions of each interpreter. We should also have some way to trigger a run of the benchmarks on a particular revision of an interpreter.

--
Zach

From kmod at dropbox.com  Thu Sep 1 13:53:35 2016
From: kmod at dropbox.com (Kevin Modzelewski)
Date: Thu, 1 Sep 2016 10:53:35 -0700
Subject: [Speed] New instance of CodeSpeed at speed.python.org running performance on CPython and PyPy?
In-Reply-To:
References:
Message-ID:

Just my two cents -- having a benchmark change underneath the benchmark runner is quite confusing to debug, because it looks indistinguishable from a non-reproducible regression in performance itself. My vote would be to wipe the benchmark results when this happens (and if that is too expensive, not to upgrade that often).

Another thing to consider is that there will be other people using this benchmark set than just the CodeSpeed setup: there will be long-lived benchmark results in the form of blogs and academic papers. I think it's important to have some good wording about including the version of the benchmarks when publishing results, and then it would be good to follow that advice internally as well.

kmod

On Thu, Sep 1, 2016 at 3:58 AM, Victor Stinner wrote:

> Hi,
>
> Would it be possible to run a new instance of CodeSpeed (the website behind speed.python.org) which would run the "performance" benchmark suite rather than the "benchmarks" benchmark suite? And would it be possible to run it on CPython (2.7 and 3.5 branches) and PyPy (master branch, maybe also the py3k branch)?
> > I found https://github.com/tobami/codespeed/ but I didn't look at it > right now. I guess that some code should be written to convert perf > JSON file to the format expected by CodeSpeed? > > FYI I released performance 0.2 yesterday. JSON files now contain the > version of the benchmark suite ("performance_version: 0.2"). I plan to > use semantic version: increase the major version (ex: upgrade to 0.3, > but later it will be 1.x, 2.x, etc.) when benchmark results are > considered to not be compatible. > > For example, I upgraded Django (from 1.9 to 1.10) and Chameleon (from > 2.22 to 2.24) in performance 0.2. > > The question is how to upgrade the performance to a new major version: > should we drop previous benchmark results? > > Maybe we should put the performance version in the URL, and use > "/latest/" by default. Only /latest/ would get new results, and > /latest/ would restart from an empty set of results when performance > is upgraded? > > Another option, less exciting, is to never upgrade benchmarks. The > benchmarks project *added* new benchmarks when a dependency was > "upgraded". In fact, the old dependency was kept and a new dependency > (full copy of the code in fact ;-)) was added. So it has django, > django_v2, django_v3, etc. The problem is that it still uses Mercurial > 1.2 which was released 7 years ago (2009)... Since it's painful to > upgrade, most dependencies were outdated. > > Do you care of old benchmark results? It's quite easy to regenerate > them (on demand?) if needed, no? Using Mercurial and Git, it's easy to > update to any old revisions to run again a benchmark on an old version > of CPython / PyPy / etc. > > Victor > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Sep 1 16:36:22 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 1 Sep 2016 22:36:22 +0200 Subject: [Speed] New instance of CodeSpeed at speed.python.org running performance on CPython and PyPy? In-Reply-To: References: Message-ID: 2016-09-01 19:53 GMT+02:00 Kevin Modzelewski : > Just my two cents -- having a benchmark change underneath the benchmark > runner is quite confusing to debug, because it looks indistinguishable from > a non-reproducible regression that happens in the performance itself. I agree. That's why I proposed to use semantic versionning. I'm not sure that old results must be removed. We should just be explicit about versions. The main issue is when you *compare* two results produced by two different performance versions. I have an item in my TODO list to emit a warning if the exact version (minor version) is different, and display an error if the major version is different. About reproductability: I made another change in the development version, indirect dependencies are now pinned as well: https://github.com/python/performance/blob/master/performance/requirements.txt#L15 It should help to have a more reproductible benchmark ;-) The last known issue about reproductability is that I dropped the code to remove environment variables. I should fix this in the perf module directly. Interesting link: https://reproducible-builds.org/ > Another thing to consider is that there will be other people using this > benchmark set than just the codespeed setup: there will be long-lived > benchmark results in the form of blogs and academic papers. 
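Something like this minimal sketch (a hypothetical helper; it only assumes the "performance_version" metadata string described earlier and the pre-1.0 convention that 0.2 -> 0.3 counts as a major bump):

---
def check_performance_versions(meta_a, meta_b):
    # meta_a and meta_b: metadata dicts loaded from two perf JSON files.
    va = meta_a.get("performance_version", "")
    vb = meta_b.get("performance_version", "")

    def major(version):
        parts = version.split(".")
        # While the version is 0.x, the second digit acts as the major.
        return parts[:2] if version.startswith("0.") else parts[:1]

    if major(va) != major(vb):
        raise SystemExit("error: performance versions %r and %r are "
                         "not compatible" % (va, vb))
    if va != vb:
        print("warning: performance versions differ (%r vs %r)" % (va, vb))
---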
About reproducibility: I made another change in the development version: indirect dependencies are now pinned as well:
https://github.com/python/performance/blob/master/performance/requirements.txt#L15
It should help to make benchmarks more reproducible ;-)

The last known reproducibility issue is that I dropped the code to remove environment variables. I should fix this in the perf module directly.

Interesting link: https://reproducible-builds.org/

> Another thing to consider is that there will be other people using this benchmark set than just the CodeSpeed setup: there will be long-lived benchmark results in the form of blogs and academic papers. I think it's important to have some good wording about including the version of the benchmarks when publishing results, and then it would be good to follow that advice internally as well.

The perf module has a feature for that: it supports storing metadata in JSON files. I modified many benchmarks to store dependency versions (like the Django version), and since performance 0.2 the performance version is stored as well. The perf module stores its own version by default ;-)

I suggest storing the JSON file rather than only the compact text output (which contains much less information).
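Reading the metadata back from a JSON file could then look like this (a sketch only: BenchmarkSuite.get_metadata() is mentioned in the changelogs below, but the load() call and the exact API should be checked against the perf documentation):

---
import perf

# Load a benchmark suite produced by a perf/performance run.
suite = perf.BenchmarkSuite.load("results.json")

# get_metadata() (added in perf 0.7.9) returns the metadata; look for
# keys such as "performance_version" or the dependency versions.
for name, value in sorted(suite.get_metadata().items()):
    print("%s: %s" % (name, value))
---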
Victor

From victor.stinner at gmail.com  Mon Sep 19 05:42:25 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 19 Sep 2016 11:42:25 +0200
Subject: [Speed] performance 0.2.2 released
Message-ID:

Hi,

I released performance 0.2.2. Compared to performance 0.1:

* it fixes the --track-memory option and adds a new "show" command,
* it enhances "compare" output (display Python & performance versions, use sample units: seconds or bytes),
* it isolates environment variables again (fix the --inherit-environ command line option),
* bugfixes as usual.

Version 0.2.2 (2016-09-19)
--------------------------

* Add a new ``show`` command to display a benchmark file
* Issue #11: Display the Python version in compare. Display also the performance version.
* CPython issue #26383; csv output: don't truncate digits for timings shorter than 1 us
* compare: use the sample unit of benchmarks, format values in the table output using the unit
* compare: fix the table output if benchmarks only contain a single sample
* Remove unused -C/--control_label and -E/--experiment_label options
* Update perf dependency to 0.7.11 to get Benchmark.get_unit() and BenchmarkSuite.get_metadata()

Version 0.2.1 (2016-09-10)
--------------------------

* Add ``--csv`` option to the ``compare`` command
* Fix ``compare -O table`` output format
* Freeze indirect dependencies in requirements.txt
* ``run``: add ``--track-memory`` option to track the memory peak usage
* Update perf dependency to 0.7.8 to support memory tracking and the new ``--inherit-environ`` command line option
* If the ``virtualenv`` command fails, try another command to create the virtual environment: catch ``virtualenv`` errors
* The first command to upgrade pip to version ``>= 6.0`` now uses the ``pip`` binary rather than ``python -m pip``, to support pip 1.0 which doesn't support the ``python -m pip`` CLI
* Update Django (1.10.1), Mercurial (3.9.1) and psutil (4.3.1)
* Rename the ``--inherit_env`` command line option to ``--inherit-environ`` and fix it

Version 0.2 (2016-09-01)
------------------------

* Update Django dependency to 1.10
* Update Chameleon dependency to 2.24
* Add the ``--venv`` command line option
* Convert the Python startup, Mercurial startup and 2to3 benchmarks to perf scripts (bm_startup.py, bm_hg_startup.py and bm_2to3.py)
* Pass the ``--affinity`` option to perf scripts rather than using the ``taskset`` command
* Put more installer and optional requirements into ``performance/requirements.txt``
* Cached ``.pyc`` files are no longer removed before running a benchmark. Use the ``venv recreate`` command to update a virtual environment if required.
* The broken ``--track_memory`` option has been removed. It will be added back when it is fixed.
* Add the performance version to metadata
* Upgrade perf dependency to 0.7.5 to get ``Benchmark.update_metadata()``

Victor

From victor.stinner at gmail.com  Mon Sep 19 05:51:06 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 19 Sep 2016 11:51:06 +0200
Subject: [Speed] perf 0.7.11 released
Message-ID:

Hi,

I released perf 0.7.11. News since perf 0.7.3:

* Support PyPy.
* Add units to samples: second, byte, integer. Benchmarks of memory usage (track memory) are now displayed correctly.
* Remove environment variables: add the --inherit-environ command line option.
* Add more metadata: mem_max_rss, python_hash_seed (PYTHONHASHSEED env var). Enhance cpu_config: add nohz_full & isolated. Enhance python_version: add the Mercurial revision.
* Better and more reliable code to calibrate benchmarks; calibration samples are now stored (as warmup samples).
* Bugfixes as usual.

perf changelog:

Version 0.7.11 (2016-09-19)
---------------------------

* Fix metadata when NOHZ is not used: when /sys/devices/system/cpu/nohz_full contains ' (null)\n'

Version 0.7.10 (2016-09-17)
---------------------------

* Fix metadata when there is no isolated CPU
* Fix collecting metadata when /sys/devices/system/cpu/nohz_full doesn't exist

Version 0.7.9 (2016-09-17)
--------------------------

* Add :meth:`Benchmark.get_unit` method
* Add :meth:`BenchmarkSuite.get_metadata` method
* metadata: add ``nohz_full`` and ``isolated`` to ``cpu_config``
* Add the ``--affinity`` option to the ``metadata`` command
* ``convert``: fix ``--remove-all-metadata``, keep the unit
* metadata: fix the regex to get the Mercurial revision for ``python_version``, supporting also locally modified source code (revision ending with "+")

Version 0.7.8 (2016-09-10)
--------------------------

* Worker child processes are now run in a fresh environment: environment variables are removed, to enhance reproducibility.
* Add the ``--inherit-environ`` command line argument.
* metadata: add ``python_cflags``, fix ``python_version`` for PyPy and also add the Mercurial version into ``python_version`` (if available)

Version 0.7.7 (2016-09-07)
--------------------------

* Reintroduce TextRunner._spawn_worker_suite() as a temporary workaround to fix the pybench benchmark of the performance module.

Version 0.7.6 (2016-09-02)
--------------------------

Tracking memory usage now works correctly on Linux and Windows. The calibration is now done in the first worker process.

* ``--tracemalloc`` and ``--track-memory`` now use the memory peak as the unique sample for the run.
* Rewrite the code to track memory usage on Windows. Add ``mem_peak_pagefile_usage`` metadata. The ``win32api`` module is no longer needed; the code now uses the ``ctypes`` module.
* ``convert``: add ``--remove-all-metadata`` and ``--update-metadata`` commands
* Add ``unit`` metadata: ``byte``, ``integer`` or ``second``.
* Run samples can now be integers (not only floats).
* Don't round samples to 1 nanosecond anymore: with a large number of loops (ex: 2^24), rounding reduces the accuracy.
* The benchmark calibration is now done by the first worker process

Version 0.7.5 (2016-09-01)
--------------------------

* Add the ``Benchmark.update_metadata()`` method
* Warmup samples can now be zero. TextRunner now raises an error if a sample function returns zero for a sample, except for calibration and warmup samples.
Version 0.7.4 (2016-08-18)
--------------------------

* Support PyPy
* metadata: add ``mem_max_rss`` and ``python_hash_seed``
* Add :func:`perf.python_implementation` and :func:`perf.python_has_jit` functions
* In workers, calibration samples are now stored as warmup samples.
* With a JIT (PyPy), the calibration is now done in each worker. The warmup step can compute more warmup samples if a raw sample is shorter than the minimum time.
* Warmups of Run objects are now lists of (loops, raw_sample) rather than lists of samples. This change requires a change in the JSON format.

Victor

From victor.stinner at gmail.com  Thu Sep 22 19:19:54 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 23 Sep 2016 01:19:54 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
Message-ID:

Hi,

While analyzing a performance regression ( http://bugs.python.org/issue28243 ), I had a major issue with my benchmark. Suddenly, for no reason, after 30 minutes of benchmarking, the benchmark became 2x FASTER... A similar issue occurred to me last week when testing whether PGO compilation makes Python performance unstable.

It might be an issue related to the intel_pstate driver on Linux and CPU isolation. I reported the bug in the Fedora bug tracker in the kernel category:
https://bugzilla.redhat.com/show_bug.cgi?id=1378529

I don't know much about this issue yet; I contacted Intel engineers who know these things better than me :-)

If you have an Intel CPU, use Linux, have a CPU with multiple physical cores and have 15 minutes to run a test, I would appreciate it if you could try to reproduce the bug!

Victor

From solipsis at pitrou.net  Fri Sep 23 05:23:59 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 23 Sep 2016 11:23:59 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
References:
Message-ID: <20160923112359.5ecd8c96@fsol>

On Fri, 23 Sep 2016 01:19:54 +0200 Victor Stinner wrote:
>
> While analyzing a performance regression ( http://bugs.python.org/issue28243 ), I had a major issue with my benchmark. Suddenly, for no reason, after 30 minutes of benchmarking, the benchmark became 2x FASTER...

Did the benchmark really become 2x faster, or did the clock become 2x slower?

If you found a magic knob in your CPU that suddenly makes it 2x faster, many people would probably like to hear about it ;-)

> If you have an Intel CPU, use Linux, have a CPU with multiple physical cores and have 15 minutes to run a test, I would appreciate it if you could try to reproduce the bug!

Can you tell us how to "reproduce"?

Regards

Antoine.

From victor.stinner at gmail.com  Fri Sep 23 05:44:12 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 23 Sep 2016 11:44:12 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To: <20160923112359.5ecd8c96@fsol>
References: <20160923112359.5ecd8c96@fsol>
Message-ID:

2016-09-23 11:23 GMT+02:00 Antoine Pitrou:
> Did the benchmark really become 2x faster, or did the clock become 2x slower?
>
> If you found a magic knob in your CPU that suddenly makes it 2x faster, many people would probably like to hear about it ;-)

He he, it's a matter of point of view :-) When I got the issue for the first time last Friday, it was as if my CPU had become 2x faster:
https://bugzilla.redhat.com/show_bug.cgi?id=1378529#c1

I guess that for some reason, the CPU frequency was 1.6 GHz (the minimum frequency) even though I had configured the CPU frequency governor to performance.
But for an unknown reason, suddenly, the governor noticed that my CPU should run at 3.4 GHz, and so the benchmark "became faster". In fact, the benchmark started at half speed (1.6 GHz) and suddenly ran at the "normal" speed (3.4 GHz).

> Can you tell us how to "reproduce"?

https://bugzilla.redhat.com/show_bug.cgi?id=1378529

* Disable Turbo Boost
* Enable HyperThreading
* Isolate at least one physical CPU core (so two logical cores using HyperThreading) -- you can use "lscpu -a -e" to find the pair of logical CPUs of a physical core
* Enable NOHZ_FULL on the isolated CPUs
* Use the performance governor, at least for the isolated CPUs, or better for all CPUs
* Run "cpupower monitor" in one terminal (cpupower comes from the kernel-tools package)
* Run a benchmark in a different terminal, but pin it to one isolated CPU using "taskset -c "
* Wait a few seconds
* See the C0 state of the isolated CPUs increase up to 100%, whereas no process is running on these CPUs (the system is idle and the CPU usage is 0% on these CPUs)
* Then run the benchmark again on an isolated CPU

For example, I'm using CPUs 3 and 7. I interrupted the boot process (GRUB) to edit the Linux command ("linuxefi ... vmlinuz ...") to add these parameters: "... isolcpus=3,7 nohz_full=3,7" (then boot with CTRL-x). When Linux has booted, I run the isolcpus.py script attached to the bug report to set the governor to performance (and also mask interrupts on these CPUs). I run the benchmark on CPU 7 to trigger the "C0 bug" and then I run the benchmark on CPU 3. Sometimes, I have to run the benchmark on CPU 3 to trigger the bug. Sometimes, the benchmark becomes slower on CPU 3, sometimes on both CPUs, sometimes only on CPU 7... The exact behaviour is not really deterministic.

For a longer explanation of how to reproduce the bug, with "snapshots" of programs and an example benchmark (perf timeit), see:
https://bugzilla.redhat.com/show_bug.cgi?id=1378529#c0

I don't think that the specific benchmark matters: you only have to find a way to increase the CPU usage to 100% on one logical CPU and then stop the program to decrease the CPU usage to 0%.
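If you don't have cpupower, here is a small sketch that polls the current frequency of the isolated CPUs through sysfs while the benchmark runs (it only assumes the standard cpufreq sysfs layout; the CPU numbers 3 and 7 are the ones from my example above):

---
import time

def watch_freq(cpus, interval=1.0):
    # Poll scaling_cur_freq (value in kHz) for each CPU; stop with Ctrl-C.
    template = "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq"
    while True:
        report = []
        for cpu in cpus:
            with open(template % cpu) as fp:
                report.append("cpu%d=%d MHz" % (cpu, int(fp.read()) // 1000))
        print(", ".join(report))
        time.sleep(interval)

watch_freq([3, 7])
---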
Victor

From solipsis at pitrou.net  Fri Sep 23 06:19:38 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 23 Sep 2016 12:19:38 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
References: <20160923112359.5ecd8c96@fsol>
Message-ID: <20160923121938.5925b781@fsol>

On Fri, 23 Sep 2016 11:44:12 +0200 Victor Stinner wrote:
> I guess that for some reason, the CPU frequency was 1.6 GHz (the minimum frequency) even though I had configured the CPU frequency governor to performance.

Does that mean that all "perf"-reported benchmark results that you reported from your machine are actually invalid?

> > Can you tell us how to "reproduce"?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1378529
>
> * Disable Turbo Boost
> * Enable HyperThreading
[...]

Ah, well, I don't have HyperThreading on my CPU, sorry.

Regards

Antoine.

From victor.stinner at gmail.com  Fri Sep 23 06:35:52 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 23 Sep 2016 12:35:52 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To: <20160923121938.5925b781@fsol>
References: <20160923112359.5ecd8c96@fsol> <20160923121938.5925b781@fsol>
Message-ID:

2016-09-23 12:19 GMT+02:00 Antoine Pitrou:
> On Fri, 23 Sep 2016 11:44:12 +0200 Victor Stinner wrote:
>> I guess that for some reason, the CPU frequency was 1.6 GHz (the minimum frequency) even though I had configured the CPU frequency governor to performance.
>
> Does that mean that all "perf"-reported benchmark results that you reported from your machine are actually invalid?

I don't know why, but the bug only started to occur a week ago. The performance difference is so huge (a 2.0x factor) that it's easy to spot.

In fact, raw performance numbers don't matter. IMHO only comparisons between timings matter: the "...x faster" or "...x slower".

By the way, I now hesitate to try some advice that I read somewhere: always use the minimum CPU frequency to run a benchmark. It avoids all the tricky speed changes of modern Intel CPUs. A CPU cannot run slower than its minimum frequency, and if the frequency is fixed, it cannot run faster either. So this should avoid all the known CPU frequency issues: Turbo Boost, speed changes when close to the heat limit (100°C), etc.

Victor

From victor.stinner at gmail.com  Fri Sep 23 19:49:28 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 24 Sep 2016 01:49:28 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To:
References:
Message-ID:

I wrote a long article explaining how I identified the bug by testing Turbo Boost, CPU temperature, CPU frequency, etc.:
https://haypo.github.io/intel-cpus-part2.html

Copy/paste of my conclusion:

To get stable benchmarks, the safest fix for all these issues is probably to set the CPU frequency of the CPUs used by benchmarks to the minimum. It seems like nothing can reduce the frequency of a CPU below its minimum. When running benchmarks, raw timings and CPU performance don't matter. Only comparisons between benchmark results and stable performances matter.

Victor

From arigo at tunes.org  Sat Sep 24 02:11:22 2016
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 24 Sep 2016 08:11:22 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To:
References:
Message-ID:

Hi Victor,

On 24 September 2016 at 01:49, Victor Stinner wrote:
> When running benchmarks, raw timings and CPU performance don't matter. Only comparisons between benchmark results and stable performances matter.

IMHO this is not a very good solution. With the CPU running at, say, a fifth of its nominal performance, you can't expect that it will behave in a remotely similar way. For example, it makes the RAM appear five times faster. I would guess (but I don't know) that even the on-core L2/L3 caches are not slowed down by nearly as much as five times. As a result, it is easy to introduce changes to the CPython core that appear beneficial but are actually detrimental, or vice versa. For example, replacing some computation by lookups in a table may look like a good idea when it is not.

A bientôt,

Armin.
From solipsis at pitrou.net  Sat Sep 24 03:42:42 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 24 Sep 2016 09:42:42 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
References:
Message-ID: <20160924094242.345b3357@fsol>

On Sat, 24 Sep 2016 08:11:22 +0200 Armin Rigo wrote:
> Hi Victor,
>
> On 24 September 2016 at 01:49, Victor Stinner wrote:
> > When running benchmarks, raw timings and CPU performance don't matter. Only comparisons between benchmark results and stable performances matter.
>
> IMHO this is not a very good solution. With the CPU running at, say, a fifth of its nominal performance, you can't expect that it will behave in a remotely similar way. For example, it makes the RAM appear five times faster. I would guess (but I don't know) that even the on-core L2/L3 caches are not slowed down by nearly as much as five times. As a result, it is easy to introduce changes to the CPython core that appear beneficial but are actually detrimental, or vice versa. For example, replacing some computation by lookups in a table may look like a good idea when it is not.

Agreed with Armin.

Regards

Antoine.

From victor.stinner at gmail.com  Tue Sep 27 09:40:42 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 27 Sep 2016 15:40:42 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To:
References:
Message-ID:

Hi,

I ran further tests and now understand the issue better. In short, the intel_pstate driver doesn't support NOHZ_FULL, and so the frequency of CPUs using NOHZ_FULL depends on the workload of the other CPUs. This is especially true when using the powersave (default) CPU frequency governor. At least, that is what I observed on my CPU, which has no HWP.

intel_pstate updates the P-state of each CPU by writing into the MSR 199H. The purpose of NOHZ_FULL is to avoid any interrupts, whereas intel_pstate relies on timer interrupts to sample performance, pick the right P-state and write it into the MSR. To write into the MSR of CPU 7, the kernel must run on CPU 7. If the benchmark is CPU-bound and never calls the kernel, there is no opportunity to run the intel_pstate driver.
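You can observe the P-state from user space by reading the MSRs directly. A sketch (it assumes root, the msr kernel module loaded so that /dev/cpu/N/msr exists, and the usual layout of IA32_PERF_STATUS at 198H -- the status counterpart of the 199H control register -- with the current ratio in bits 8-15):

---
import struct

def read_msr(cpu, reg):
    # Requires root and "modprobe msr" (/dev/cpu/N/msr device files).
    with open("/dev/cpu/%d/msr" % cpu, "rb") as fp:
        fp.seek(reg)
        return struct.unpack("<Q", fp.read(8))[0]

IA32_PERF_STATUS = 0x198
ratio = (read_msr(7, IA32_PERF_STATUS) >> 8) & 0xFF
# On recent Intel CPUs the ratio is multiplied by the 100 MHz bus clock.
print("CPU 7 ratio: %d => ~%d MHz" % (ratio, ratio * 100))
---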
Antoine:
> Ah, well, I don't have HyperThreading on my CPU, sorry.

The bug can be reproduced without HyperThreading. A new, much simpler scenario to reproduce the bug (and my analysis of the bug):
https://bugzilla.redhat.com/show_bug.cgi?id=1378529#c6

2016-09-24 8:11 GMT+02:00 Armin Rigo:
> IMHO this is not a very good solution. With the CPU running at, say, a fifth of its nominal performance, you can't expect that it will behave in a remotely similar way.

The nominal speed is 3.4 GHz and the minimum speed is 1.6 GHz: timings just double between nominal and minimum speed.

> As a result, it is easy to introduce changes to the CPython core that appear beneficial but are actually detrimental, or vice versa. For example, replacing some computation by lookups in a table may look like a good idea when it is not.

Yeah, maybe, I don't know.

Anyway, there are two solutions to run stable benchmarks at nominal speed (see the sketch below for the first one):

* (Use NOHZ_FULL but) Force the frequency to the maximum
* Don't use NOHZ_FULL
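A sketch of the first option, pinning the frequency through the cpufreq sysfs interface (it assumes root and a driver that honours scaling_min_freq/scaling_max_freq; with intel_pstate, the min_perf_pct/max_perf_pct knobs may be needed instead):

---
def pin_cpu_freq(cpu, khz):
    # Force a fixed frequency by setting min == max (values in kHz).
    base = "/sys/devices/system/cpu/cpu%d/cpufreq/" % cpu
    for name in ("scaling_max_freq", "scaling_min_freq"):
        with open(base + name, "w") as fp:
            fp.write(str(khz))

# Example: pin the isolated CPUs 3 and 7 to the 3.4 GHz nominal speed.
for cpu in (3, 7):
    pin_cpu_freq(cpu, 3400000)
---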
Victor

From arigo at tunes.org  Thu Sep 29 11:11:09 2016
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 29 Sep 2016 17:11:09 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To:
References:
Message-ID:

Hi Victor,

On 27 September 2016 at 15:40, Victor Stinner wrote:
> 2016-09-24 8:11 GMT+02:00 Armin Rigo:
>> IMHO this is not a very good solution. With the CPU running at, say, a fifth of its nominal performance, you can't expect that it will behave in a remotely similar way.
>
> The nominal speed is 3.4 GHz and the minimum speed is 1.6 GHz: timings just double between nominal and minimum speed.

On my laptop, the speed ranges between 500 MHz and 2300 MHz.

> * (Use NOHZ_FULL but) Force the frequency to the maximum
> * Don't use NOHZ_FULL

I think we should force the frequency to a single number which is the maximum that can be reasonably sustained (I mean, not the overclocked speed reached by some CPUs for short amounts of time).

A bientôt,

Armin.

From victor.stinner at gmail.com  Thu Sep 29 11:56:05 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 29 Sep 2016 17:56:05 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To:
References:
Message-ID:

2016-09-29 17:11 GMT+02:00 Armin Rigo:
> On my laptop, the speed ranges between 500 MHz and 2300 MHz.

Oh I see, on this computer the CPU can be up to ~5x slower!

>> * (Use NOHZ_FULL but) Force the frequency to the maximum
>> * Don't use NOHZ_FULL
>
> I think we should force the frequency to a single number which is the maximum that can be reasonably sustained

In my experience, the CPU is fine when running at the nominal speed. I'm talking about the value displayed in the CPU model in /proc/cpuinfo (2.9 GHz):

$ grep 'model name' /proc/cpuinfo
model name : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz

> (I mean, not the overclocked speed reached by some CPUs for short amounts of time).

This is Turbo Mode. It can be disabled in the BIOS or by writing 1 into /sys/devices/system/cpu/intel_pstate/no_turbo

By the way, the most reliable tool that I found to read the CPU frequency is turbostat. It uses the APERF and MPERF counters to compute the "Busy MHz" value. You may also try "cpupower monitor", which is similar.

Victor

From victor.stinner at gmail.com  Fri Sep 30 10:52:35 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 30 Sep 2016 16:52:35 +0200
Subject: [Speed] perf 0.7.12: --python and --compare-to options
Message-ID:

Hi,

I always wanted to be able to compare the performance of two Python versions using timeit *in a single command*. So I just implemented it! I added the --python and --compare-to options.

A real example to show the new "timeit --compare-to" feature:
---
$ export PYTHONPATH=~/prog/GIT/perf
$ ./python-resize -m perf timeit --inherit-environ=PYTHONPATH --compare-to=./python-ref -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del d[i];' --rigorous
python-ref: ........................................ 77.6 us +- 1.8 us
python-resize: ........................................ 74.8 us +- 1.9 us

Median +- std dev: [python-ref] 77.6 us +- 1.8 us -> [python-resize] 74.8 us +- 1.9 us: 1.04x faster
---
http://bugs.python.org/issue28199#msg277755

Changes between 0.7.11 and 0.7.12:

* Add the ``--python`` command line option
* ``timeit``: add ``--name``, ``--inner-loops`` and ``--compare-to`` options
* TextRunner no longer sets the CPU affinity of the main process, only of worker processes. It may help a little bit when using NOHZ_FULL.
* metadata: add ``boot_time`` and ``uptime`` on Linux
* metadata: add the idle driver to ``cpu_config``

Victor