From victor.stinner at gmail.com  Thu Sep 1 06:58:00 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 1 Sep 2016 12:58:00 +0200
Subject: [Speed] New instance of CodeSpeed at speed.python.org running performance on CPython and PyPy?
Message-ID:

Hi,

Would it be possible to run a new instance of CodeSpeed (the website behind speed.python.org) which would run the "performance" benchmark suite rather than the "benchmarks" benchmark suite? And would it be possible to run it on CPython (2.7 and 3.5 branches) and PyPy (master branch, maybe also the py3k branch)?

I found https://github.com/tobami/codespeed/ but I haven't looked at it closely yet. I guess that some code should be written to convert perf JSON files to the format expected by CodeSpeed?

FYI I released performance 0.2 yesterday. JSON files now contain the version of the benchmark suite ("performance_version: 0.2"). I plan to use semantic versioning: increase the major version (ex: upgrade to 0.3, but later it will be 1.x, 2.x, etc.) when benchmark results are considered incompatible.

For example, I upgraded Django (from 1.9 to 1.10) and Chameleon (from 2.22 to 2.24) in performance 0.2.

The question is how to upgrade performance to a new major version: should we drop previous benchmark results?

Maybe we should put the performance version in the URL, and use "/latest/" by default. Only /latest/ would get new results, and /latest/ would restart from an empty set of results when performance is upgraded?

Another option, less exciting, is to never upgrade benchmarks. The benchmarks project *added* new benchmarks when a dependency was "upgraded": the old dependency was kept, and a new dependency (in fact, a full copy of the code ;-)) was added. So it has django, django_v2, django_v3, etc. The problem is that it still uses Mercurial 1.2, which was released 7 years ago (2009)... Since upgrading is painful, most dependencies were outdated.

Do you care about old benchmark results? It's quite easy to regenerate them (on demand?) if needed, no? Using Mercurial and Git, it's easy to check out any old revision to run a benchmark again on an old version of CPython / PyPy / etc.

Victor

From brett at python.org  Thu Sep 01 11:48:14 2016
From: brett at python.org (Brett Cannon)
Date: Thu, 01 Sep 2016 15:48:14 +0000
Subject: [Speed] New instance of CodeSpeed at speed.python.org running performance on CPython and PyPy?
In-Reply-To:
References:
Message-ID:

On Thu, 1 Sep 2016 at 03:58 Victor Stinner wrote:

> Hi,
>
> Would it be possible to run a new instance of CodeSpeed (the website behind speed.python.org) which would run the "performance" benchmark suite rather than the "benchmarks" benchmark suite? And would it be possible to run it on CPython (2.7 and 3.5 branches) and PyPy (master branch, maybe also the py3k branch)?

I believe Zach has the repo containing the code. He also said it's all rather hacked up at the moment. Maybe something to discuss next week at the sprint, as I think you're both going to be there.

> I found https://github.com/tobami/codespeed/ but I haven't looked at it closely yet. I guess that some code should be written to convert perf JSON files to the format expected by CodeSpeed?
>
> FYI I released performance 0.2 yesterday. JSON files now contain the version of the benchmark suite ("performance_version: 0.2"). I plan to use semantic versioning: increase the major version (ex: upgrade to 0.3, but later it will be 1.x, 2.x, etc.) when benchmark results are considered incompatible.

SGTM.
> For example, I upgraded Django (from 1.9 to 1.10) and Chameleon (from 2.22 to 2.24) in performance 0.2.
>
> The question is how to upgrade performance to a new major version: should we drop previous benchmark results?

They don't really compare anymore, so at the very least they should not be compared with benchmark results from a newer version of the benchmark suite.

> Maybe we should put the performance version in the URL, and use "/latest/" by default. Only /latest/ would get new results, and /latest/ would restart from an empty set of results when performance is upgraded?

SGTM

> Another option, less exciting, is to never upgrade benchmarks. The benchmarks project *added* new benchmarks when a dependency was "upgraded": the old dependency was kept, and a new dependency (in fact, a full copy of the code ;-)) was added. So it has django, django_v2, django_v3, etc. The problem is that it still uses Mercurial 1.2, which was released 7 years ago (2009)... Since upgrading is painful, most dependencies were outdated.

Based on my experience with the benchmark suite I don't like this option either; it just gathers cruft. As Maciej and the PyPy folks have pointed out, benchmarks should try to represent modern code, and old benchmarks won't necessarily do that.

> Do you care about old benchmark results? It's quite easy to regenerate them (on demand?) if needed, no? Using Mercurial and Git, it's easy to check out any old revision to run a benchmark again on an old version of CPython / PyPy / etc.

I personally don't, but that's because I care about either current performance in comparison to others, or very short timescales to see when a regression occurred (hence a switchover has a very small chance of impacting that investigation), not long-timescale results kept for historical purposes.

From zachary.ware+pydev at gmail.com  Thu Sep 1 12:33:34 2016
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Thu, 1 Sep 2016 11:33:34 -0500
Subject: [Speed] New instance of CodeSpeed at speed.python.org running performance on CPython and PyPy?
In-Reply-To:
References:
Message-ID:

On Thu, Sep 1, 2016 at 5:58 AM, Victor Stinner wrote:
> Hi,
>
> Would it be possible to run a new instance of CodeSpeed (the website behind speed.python.org) which would run the "performance" benchmark suite rather than the "benchmarks" benchmark suite? And would it be possible to run it on CPython (2.7 and 3.5 branches) and PyPy (master branch, maybe also the py3k branch)?

Short answer is yes, please :). Slightly longer answer is that that's the plan, but I don't know when I'll have the opportunity to work on it. Possibly next week at the sprint, we'll see.

> I found https://github.com/tobami/codespeed/ but I haven't looked at it closely yet. I guess that some code should be written to convert perf JSON files to the format expected by CodeSpeed?

The code that's actually running speed.python.org is at https://github.com/zware/codespeed, speed.python.org branch. I've been meaning to get that moved to https://github.com/python/codespeed, but it hasn't happened yet. Other relevant code is hidden in the buildbot master and on the runner box itself, which is not publicly version controlled (which is bad).

We will need either a translation layer between performance and CodeSpeed, or, if we can, to just change the format that performance outputs to match what CodeSpeed expects.
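To make the discussion concrete, here is a rough sketch of what such a translation layer could look like. Everything in it is an assumption to be checked: the perf JSON layout is guessed from this thread, and the field names come from my reading of the CodeSpeed README (its /result/add/ endpoint), so treat it as pseudo-code rather than working glue:

---
import json
import statistics
import urllib.parse
import urllib.request

def post_perf_results(json_path, codespeed_url, commitid, environment):
    # Assumed perf JSON layout: a "benchmarks" list whose entries carry
    # a name in their metadata and samples in their runs.
    with open(json_path) as fp:
        suite = json.load(fp)
    for bench in suite["benchmarks"]:
        samples = [s for run in bench["runs"] for s in run.get("samples", ())]
        data = {
            # Field names as documented in the CodeSpeed README.
            "commitid": commitid,
            "branch": "default",
            "project": "CPython",
            "executable": "cpython",
            "benchmark": bench["metadata"]["name"],
            "environment": environment,
            "result_value": statistics.median(samples),
            "min": min(samples),
            "max": max(samples),
        }
        if len(samples) > 1:
            data["std_dev"] = statistics.stdev(samples)
        body = urllib.parse.urlencode(data).encode()
        urllib.request.urlopen(codespeed_url + "/result/add/", body)
---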
> FYI I released performance 0.2 yesterday. JSON files now contain the version of the benchmark suite ("performance_version: 0.2"). I plan to use semantic versioning: increase the major version (ex: upgrade to 0.3, but later it will be 1.x, 2.x, etc.) when benchmark results are considered incompatible.
>
> For example, I upgraded Django (from 1.9 to 1.10) and Chameleon (from 2.22 to 2.24) in performance 0.2.
>
> The question is how to upgrade performance to a new major version: should we drop previous benchmark results?
>
> Maybe we should put the performance version in the URL, and use "/latest/" by default. Only /latest/ would get new results, and /latest/ would restart from an empty set of results when performance is upgraded?

I have only enough experience with Django and CodeSpeed to have gotten speed.python.org to the state it's currently in, so I really don't know how (un)limited the possibilities are. One simple method would be to combine the benchmark name with the performance version, and periodically clear out old benchmark results.

> Another option, less exciting, is to never upgrade benchmarks. The benchmarks project *added* new benchmarks when a dependency was "upgraded": the old dependency was kept, and a new dependency (in fact, a full copy of the code ;-)) was added. So it has django, django_v2, django_v3, etc. The problem is that it still uses Mercurial 1.2, which was released 7 years ago (2009)... Since upgrading is painful, most dependencies were outdated.

I agree that we should have the ability to easily update benchmarks, and actually do so sometimes.

> Do you care about old benchmark results? It's quite easy to regenerate them (on demand?) if needed, no? Using Mercurial and Git, it's easy to check out any old revision to run a benchmark again on an old version of CPython / PyPy / etc.

I suggest that upon updates to the benchmark suite/runner/etc., we should clear out old results and rerun the benchmarks on a selection of released versions of each interpreter. We should also have some way to trigger a run of the benchmarks on a particular revision of an interpreter.

--
Zach

From kmod at dropbox.com  Thu Sep 1 13:53:35 2016
From: kmod at dropbox.com (Kevin Modzelewski)
Date: Thu, 1 Sep 2016 10:53:35 -0700
Subject: [Speed] New instance of CodeSpeed at speed.python.org running performance on CPython and PyPy?
In-Reply-To:
References:
Message-ID:

Just my two cents -- having a benchmark change underneath the benchmark runner is quite confusing to debug, because it looks indistinguishable from a non-reproducible regression in performance itself. My vote would be to wipe the benchmark results when this happens (and if that is too expensive, not to upgrade that often).

Another thing to consider is that there will be other people using this benchmark set than just the CodeSpeed setup: there will be long-lived benchmark results in the form of blogs and academic papers. I think it's important to have some good wording about including the version of the benchmarks when publishing results, and then it would be good to follow that advice internally as well.

kmod

On Thu, Sep 1, 2016 at 3:58 AM, Victor Stinner wrote:

> Hi,
>
> Would it be possible to run a new instance of CodeSpeed (the website behind speed.python.org) which would run the "performance" benchmark suite rather than the "benchmarks" benchmark suite? And would it be possible to run it on CPython (2.7 and 3.5 branches) and PyPy (master branch, maybe also the py3k branch)?
> > I found https://github.com/tobami/codespeed/ but I didn't look at it > right now. I guess that some code should be written to convert perf > JSON file to the format expected by CodeSpeed? > > FYI I released performance 0.2 yesterday. JSON files now contain the > version of the benchmark suite ("performance_version: 0.2"). I plan to > use semantic version: increase the major version (ex: upgrade to 0.3, > but later it will be 1.x, 2.x, etc.) when benchmark results are > considered to not be compatible. > > For example, I upgraded Django (from 1.9 to 1.10) and Chameleon (from > 2.22 to 2.24) in performance 0.2. > > The question is how to upgrade the performance to a new major version: > should we drop previous benchmark results? > > Maybe we should put the performance version in the URL, and use > "/latest/" by default. Only /latest/ would get new results, and > /latest/ would restart from an empty set of results when performance > is upgraded? > > Another option, less exciting, is to never upgrade benchmarks. The > benchmarks project *added* new benchmarks when a dependency was > "upgraded". In fact, the old dependency was kept and a new dependency > (full copy of the code in fact ;-)) was added. So it has django, > django_v2, django_v3, etc. The problem is that it still uses Mercurial > 1.2 which was released 7 years ago (2009)... Since it's painful to > upgrade, most dependencies were outdated. > > Do you care of old benchmark results? It's quite easy to regenerate > them (on demand?) if needed, no? Using Mercurial and Git, it's easy to > update to any old revisions to run again a benchmark on an old version > of CPython / PyPy / etc. > > Victor > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Sep 1 16:36:22 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 1 Sep 2016 22:36:22 +0200 Subject: [Speed] New instance of CodeSpeed at speed.python.org running performance on CPython and PyPy? In-Reply-To: References: Message-ID: 2016-09-01 19:53 GMT+02:00 Kevin Modzelewski : > Just my two cents -- having a benchmark change underneath the benchmark > runner is quite confusing to debug, because it looks indistinguishable from > a non-reproducible regression that happens in the performance itself. I agree. That's why I proposed to use semantic versionning. I'm not sure that old results must be removed. We should just be explicit about versions. The main issue is when you *compare* two results produced by two different performance versions. I have an item in my TODO list to emit a warning if the exact version (minor version) is different, and display an error if the major version is different. About reproductability: I made another change in the development version, indirect dependencies are now pinned as well: https://github.com/python/performance/blob/master/performance/requirements.txt#L15 It should help to have a more reproductible benchmark ;-) The last known issue about reproductability is that I dropped the code to remove environment variables. I should fix this in the perf module directly. Interesting link: https://reproducible-builds.org/ > Another thing to consider is that there will be other people using this > benchmark set than just the codespeed setup: there will be long-lived > benchmark results in the form of blogs and academic papers. 
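Something like this minimal sketch (a hypothetical helper; it only assumes the "performance_version" metadata string described earlier and the pre-1.0 convention that 0.2 -> 0.3 counts as a major bump):

---
def check_performance_versions(meta_a, meta_b):
    # meta_a and meta_b: metadata dicts loaded from two perf JSON files.
    va = meta_a.get("performance_version", "")
    vb = meta_b.get("performance_version", "")

    def major(version):
        parts = version.split(".")
        # While the version is 0.x, the second digit acts as the major.
        return parts[:2] if version.startswith("0.") else parts[:1]

    if major(va) != major(vb):
        raise SystemExit("error: performance versions %r and %r are "
                         "not compatible" % (va, vb))
    if va != vb:
        print("warning: performance versions differ (%r vs %r)" % (va, vb))
---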
About reproducibility: I made another change in the development version: indirect dependencies are now pinned as well:
https://github.com/python/performance/blob/master/performance/requirements.txt#L15
It should help to make benchmarks more reproducible ;-)

The last known reproducibility issue is that I dropped the code to remove environment variables. I should fix this in the perf module directly.

Interesting link: https://reproducible-builds.org/

> Another thing to consider is that there will be other people using this benchmark set than just the CodeSpeed setup: there will be long-lived benchmark results in the form of blogs and academic papers. I think it's important to have some good wording about including the version of the benchmarks when publishing results, and then it would be good to follow that advice internally as well.

The perf module has a feature for that: it supports storing metadata in JSON files. I modified many benchmarks to store dependency versions (like the Django version), and since performance 0.2 the performance version is stored as well. The perf module stores its own version by default ;-)

I suggest storing the JSON file rather than only the compact text output (which contains much less information).
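Reading the metadata back from a JSON file could then look like this (a sketch only: BenchmarkSuite.get_metadata() is mentioned in the changelogs below, but the load() call and the exact API should be checked against the perf documentation):

---
import perf

# Load a benchmark suite produced by a perf/performance run.
suite = perf.BenchmarkSuite.load("results.json")

# get_metadata() (added in perf 0.7.9) returns the metadata; look for
# keys such as "performance_version" or the dependency versions.
for name, value in sorted(suite.get_metadata().items()):
    print("%s: %s" % (name, value))
---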
Victor

From victor.stinner at gmail.com  Mon Sep 19 05:42:25 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 19 Sep 2016 11:42:25 +0200
Subject: [Speed] performance 0.2.2 released
Message-ID:

Hi,

I released performance 0.2.2. Compared to performance 0.1:

* it fixes the --track-memory option and adds a new "show" command,
* it enhances "compare" output (display Python & performance versions, use sample units: seconds or bytes),
* it isolates environment variables again (fix the --inherit-environ command line option),
* bugfixes as usual.

Version 0.2.2 (2016-09-19)
--------------------------

* Add a new ``show`` command to display a benchmark file
* Issue #11: Display the Python version in compare. Display also the performance version.
* CPython issue #26383; csv output: don't truncate digits for timings shorter than 1 us
* compare: use the sample unit of benchmarks, format values in the table output using the unit
* compare: fix the table output if benchmarks only contain a single sample
* Remove unused -C/--control_label and -E/--experiment_label options
* Update perf dependency to 0.7.11 to get Benchmark.get_unit() and BenchmarkSuite.get_metadata()

Version 0.2.1 (2016-09-10)
--------------------------

* Add ``--csv`` option to the ``compare`` command
* Fix ``compare -O table`` output format
* Freeze indirect dependencies in requirements.txt
* ``run``: add ``--track-memory`` option to track the memory peak usage
* Update perf dependency to 0.7.8 to support memory tracking and the new ``--inherit-environ`` command line option
* If the ``virtualenv`` command fails, try another command to create the virtual environment: catch ``virtualenv`` errors
* The first command to upgrade pip to version ``>= 6.0`` now uses the ``pip`` binary rather than ``python -m pip``, to support pip 1.0 which doesn't support the ``python -m pip`` CLI
* Update Django (1.10.1), Mercurial (3.9.1) and psutil (4.3.1)
* Rename the ``--inherit_env`` command line option to ``--inherit-environ`` and fix it

Version 0.2 (2016-09-01)
------------------------

* Update Django dependency to 1.10
* Update Chameleon dependency to 2.24
* Add the ``--venv`` command line option
* Convert the Python startup, Mercurial startup and 2to3 benchmarks to perf scripts (bm_startup.py, bm_hg_startup.py and bm_2to3.py)
* Pass the ``--affinity`` option to perf scripts rather than using the ``taskset`` command
* Put more installer and optional requirements into ``performance/requirements.txt``
* Cached ``.pyc`` files are no longer removed before running a benchmark. Use the ``venv recreate`` command to update a virtual environment if required.
* The broken ``--track_memory`` option has been removed. It will be added back when it is fixed.
* Add the performance version to metadata
* Upgrade perf dependency to 0.7.5 to get ``Benchmark.update_metadata()``

Victor

From victor.stinner at gmail.com  Mon Sep 19 05:51:06 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 19 Sep 2016 11:51:06 +0200
Subject: [Speed] perf 0.7.11 released
Message-ID:

Hi,

I released perf 0.7.11. News since perf 0.7.3:

* Support PyPy.
* Add units to samples: second, byte, integer. Benchmarks of memory usage (track memory) are now displayed correctly.
* Remove environment variables: add the --inherit-environ command line option.
* Add more metadata: mem_max_rss, python_hash_seed (PYTHONHASHSEED env var). Enhance cpu_config: add nohz_full & isolated. Enhance python_version: add the Mercurial revision.
* Better and more reliable code to calibrate benchmarks; calibration samples are now stored (as warmup samples).
* Bugfixes as usual.

perf changelog:

Version 0.7.11 (2016-09-19)
---------------------------

* Fix metadata when NOHZ is not used: when /sys/devices/system/cpu/nohz_full contains ' (null)\n'

Version 0.7.10 (2016-09-17)
---------------------------

* Fix metadata when there is no isolated CPU
* Fix collecting metadata when /sys/devices/system/cpu/nohz_full doesn't exist

Version 0.7.9 (2016-09-17)
--------------------------

* Add :meth:`Benchmark.get_unit` method
* Add :meth:`BenchmarkSuite.get_metadata` method
* metadata: add ``nohz_full`` and ``isolated`` to ``cpu_config``
* Add the ``--affinity`` option to the ``metadata`` command
* ``convert``: fix ``--remove-all-metadata``, keep the unit
* metadata: fix the regex to get the Mercurial revision for ``python_version``, supporting also locally modified source code (revision ending with "+")

Version 0.7.8 (2016-09-10)
--------------------------

* Worker child processes are now run in a fresh environment: environment variables are removed, to enhance reproducibility.
* Add the ``--inherit-environ`` command line argument.
* metadata: add ``python_cflags``, fix ``python_version`` for PyPy and also add the Mercurial version into ``python_version`` (if available)

Version 0.7.7 (2016-09-07)
--------------------------

* Reintroduce TextRunner._spawn_worker_suite() as a temporary workaround to fix the pybench benchmark of the performance module.

Version 0.7.6 (2016-09-02)
--------------------------

Tracking memory usage now works correctly on Linux and Windows. The calibration is now done in the first worker process.

* ``--tracemalloc`` and ``--track-memory`` now use the memory peak as the unique sample for the run.
* Rewrite the code to track memory usage on Windows. Add ``mem_peak_pagefile_usage`` metadata. The ``win32api`` module is no longer needed; the code now uses the ``ctypes`` module.
* ``convert``: add ``--remove-all-metadata`` and ``--update-metadata`` commands
* Add ``unit`` metadata: ``byte``, ``integer`` or ``second``.
* Run samples can now be integers (not only floats).
* Don't round samples to 1 nanosecond anymore: with a large number of loops (ex: 2^24), rounding reduces the accuracy.
* The benchmark calibration is now done by the first worker process

Version 0.7.5 (2016-09-01)
--------------------------

* Add the ``Benchmark.update_metadata()`` method
* Warmup samples can now be zero. TextRunner now raises an error if a sample function returns zero for a sample, except for calibration and warmup samples.
Version 0.7.4 (2016-08-18)
--------------------------

* Support PyPy
* metadata: add ``mem_max_rss`` and ``python_hash_seed``
* Add :func:`perf.python_implementation` and :func:`perf.python_has_jit` functions
* In workers, calibration samples are now stored as warmup samples.
* With a JIT (PyPy), the calibration is now done in each worker. The warmup step can compute more warmup samples if a raw sample is shorter than the minimum time.
* Warmups of Run objects are now lists of (loops, raw_sample) rather than lists of samples. This change requires a change in the JSON format.

Victor

From victor.stinner at gmail.com  Thu Sep 22 19:19:54 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 23 Sep 2016 01:19:54 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
Message-ID:

Hi,

While analyzing a performance regression ( http://bugs.python.org/issue28243 ), I had a major issue with my benchmark. Suddenly, for no reason, after 30 minutes of benchmarking, the benchmark became 2x FASTER... A similar issue occurred to me last week when testing whether PGO compilation makes Python performance unstable.

It might be an issue related to the intel_pstate driver on Linux and CPU isolation. I reported the bug in the Fedora bug tracker in the kernel category:
https://bugzilla.redhat.com/show_bug.cgi?id=1378529

I don't know much about this issue yet; I contacted Intel engineers who know these things better than me :-)

If you have an Intel CPU, use Linux, have a CPU with multiple physical cores and have 15 minutes to run a test, I would appreciate it if you could try to reproduce the bug!

Victor

From solipsis at pitrou.net  Fri Sep 23 05:23:59 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 23 Sep 2016 11:23:59 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
References:
Message-ID: <20160923112359.5ecd8c96@fsol>

On Fri, 23 Sep 2016 01:19:54 +0200 Victor Stinner wrote:
>
> While analyzing a performance regression ( http://bugs.python.org/issue28243 ), I had a major issue with my benchmark. Suddenly, for no reason, after 30 minutes of benchmarking, the benchmark became 2x FASTER...

Did the benchmark really become 2x faster, or did the clock become 2x slower?

If you found a magic knob in your CPU that suddenly makes it 2x faster, many people would probably like to hear about it ;-)

> If you have an Intel CPU, use Linux, have a CPU with multiple physical cores and have 15 minutes to run a test, I would appreciate it if you could try to reproduce the bug!

Can you tell us how to "reproduce"?

Regards

Antoine.

From victor.stinner at gmail.com  Fri Sep 23 05:44:12 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 23 Sep 2016 11:44:12 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To: <20160923112359.5ecd8c96@fsol>
References: <20160923112359.5ecd8c96@fsol>
Message-ID:

2016-09-23 11:23 GMT+02:00 Antoine Pitrou:
> Did the benchmark really become 2x faster, or did the clock become 2x slower?
>
> If you found a magic knob in your CPU that suddenly makes it 2x faster, many people would probably like to hear about it ;-)

He he, it's a matter of point of view :-) When I got the issue for the first time last Friday, it was as if my CPU had become 2x faster:
https://bugzilla.redhat.com/show_bug.cgi?id=1378529#c1

I guess that for some reason, the CPU frequency was 1.6 GHz (the minimum frequency) even though I had configured the CPU frequency governor to performance.
But for an unknown reason, suddenly, the governor noticed that my CPU should run at 3.4 GHz, and so the benchmark "became faster". In fact, the benchmark started at half speed (1.6 GHz) and suddenly ran at the "normal" speed (3.4 GHz).

> Can you tell us how to "reproduce"?

https://bugzilla.redhat.com/show_bug.cgi?id=1378529

* Disable Turbo Boost
* Enable HyperThreading
* Isolate at least one physical CPU core (so two logical cores using HyperThreading) -- you can use "lscpu -a -e" to find the pair of logical CPUs of a physical core
* Enable NOHZ_FULL on the isolated CPUs
* Use the performance governor, at least for the isolated CPUs, or better for all CPUs
* Run "cpupower monitor" in one terminal (cpupower comes from the kernel-tools package)
* Run a benchmark in a different terminal, but pin it to one isolated CPU using "taskset -c "
* Wait a few seconds
* See the C0 state of the isolated CPUs increase up to 100%, whereas no process is running on these CPUs (the system is idle and the CPU usage is 0% on these CPUs)
* Then run the benchmark again on an isolated CPU

For example, I'm using CPUs 3 and 7. I interrupted the boot process (GRUB) to edit the Linux command ("linuxefi ... vmlinuz ...") to add these parameters: "... isolcpus=3,7 nohz_full=3,7" (then boot with CTRL-x). When Linux has booted, I run the isolcpus.py script attached to the bug report to set the governor to performance (and also mask interrupts on these CPUs). I run the benchmark on CPU 7 to trigger the "C0 bug" and then I run the benchmark on CPU 3. Sometimes, I have to run the benchmark on CPU 3 to trigger the bug. Sometimes, the benchmark becomes slower on CPU 3, sometimes on both CPUs, sometimes only on CPU 7... The exact behaviour is not really deterministic.

For a longer explanation of how to reproduce the bug, with "snapshots" of programs and an example benchmark (perf timeit), see:
https://bugzilla.redhat.com/show_bug.cgi?id=1378529#c0

I don't think that the specific benchmark matters: you only have to find a way to increase the CPU usage to 100% on one logical CPU and then stop the program to decrease the CPU usage to 0%.
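If you don't have cpupower, here is a small sketch that polls the current frequency of the isolated CPUs through sysfs while the benchmark runs (it only assumes the standard cpufreq sysfs layout; the CPU numbers 3 and 7 are the ones from my example above):

---
import time

def watch_freq(cpus, interval=1.0):
    # Poll scaling_cur_freq (value in kHz) for each CPU; stop with Ctrl-C.
    template = "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq"
    while True:
        report = []
        for cpu in cpus:
            with open(template % cpu) as fp:
                report.append("cpu%d=%d MHz" % (cpu, int(fp.read()) // 1000))
        print(", ".join(report))
        time.sleep(interval)

watch_freq([3, 7])
---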
Victor

From solipsis at pitrou.net  Fri Sep 23 06:19:38 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 23 Sep 2016 12:19:38 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
References: <20160923112359.5ecd8c96@fsol>
Message-ID: <20160923121938.5925b781@fsol>

On Fri, 23 Sep 2016 11:44:12 +0200 Victor Stinner wrote:
> I guess that for some reason, the CPU frequency was 1.6 GHz (the minimum frequency) even though I had configured the CPU frequency governor to performance.

Does that mean that all "perf"-reported benchmark results that you reported from your machine are actually invalid?

> > Can you tell us how to "reproduce"?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1378529
>
> * Disable Turbo Boost
> * Enable HyperThreading
[...]

Ah, well, I don't have HyperThreading on my CPU, sorry.

Regards

Antoine.

From victor.stinner at gmail.com  Fri Sep 23 06:35:52 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 23 Sep 2016 12:35:52 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To: <20160923121938.5925b781@fsol>
References: <20160923112359.5ecd8c96@fsol> <20160923121938.5925b781@fsol>
Message-ID:

2016-09-23 12:19 GMT+02:00 Antoine Pitrou:
> On Fri, 23 Sep 2016 11:44:12 +0200 Victor Stinner wrote:
>> I guess that for some reason, the CPU frequency was 1.6 GHz (the minimum frequency) even though I had configured the CPU frequency governor to performance.
>
> Does that mean that all "perf"-reported benchmark results that you reported from your machine are actually invalid?

I don't know why, but the bug only started to occur a week ago. The performance difference is so huge (a 2.0x factor) that it's easy to spot.

In fact, raw performance numbers don't matter. IMHO only comparisons between timings matter: the "...x faster" or "...x slower".

By the way, I now hesitate to try some advice that I read somewhere: always use the minimum CPU frequency to run a benchmark. It avoids all the tricky speed changes of modern Intel CPUs. A CPU cannot run slower than its minimum frequency, and if the frequency is fixed, it cannot run faster either. So this should avoid all the known CPU frequency issues: Turbo Boost, speed changes when close to the heat limit (100°C), etc.

Victor

From victor.stinner at gmail.com  Fri Sep 23 19:49:28 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 24 Sep 2016 01:49:28 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To:
References:
Message-ID:

I wrote a long article explaining how I identified the bug by testing Turbo Boost, CPU temperature, CPU frequency, etc.:
https://haypo.github.io/intel-cpus-part2.html

Copy/paste of my conclusion:

To get stable benchmarks, the safest fix for all these issues is probably to set the CPU frequency of the CPUs used by benchmarks to the minimum. It seems like nothing can reduce the frequency of a CPU below its minimum. When running benchmarks, raw timings and CPU performance don't matter. Only comparisons between benchmark results and stable performances matter.

Victor

From arigo at tunes.org  Sat Sep 24 02:11:22 2016
From: arigo at tunes.org (Armin Rigo)
Date: Sat, 24 Sep 2016 08:11:22 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To:
References:
Message-ID:

Hi Victor,

On 24 September 2016 at 01:49, Victor Stinner wrote:
> When running benchmarks, raw timings and CPU performance don't matter. Only comparisons between benchmark results and stable performances matter.

IMHO this is not a very good solution. With the CPU running at, say, a fifth of its nominal performance, you can't expect that it will behave in a remotely similar way. For example, it makes the RAM appear five times faster. I would guess (but I don't know) that even the on-core L2/L3 caches are not slowed down by nearly as much as five times. As a result, it is easy to introduce changes to the CPython core that appear beneficial but are actually detrimental, or vice versa. For example, replacing some computation by lookups in a table may look like a good idea when it is not.

A bientôt,

Armin.
From solipsis at pitrou.net  Sat Sep 24 03:42:42 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 24 Sep 2016 09:42:42 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
References:
Message-ID: <20160924094242.345b3357@fsol>

On Sat, 24 Sep 2016 08:11:22 +0200 Armin Rigo wrote:
> Hi Victor,
>
> On 24 September 2016 at 01:49, Victor Stinner wrote:
> > When running benchmarks, raw timings and CPU performance don't matter. Only comparisons between benchmark results and stable performances matter.
>
> IMHO this is not a very good solution. With the CPU running at, say, a fifth of its nominal performance, you can't expect that it will behave in a remotely similar way. For example, it makes the RAM appear five times faster. I would guess (but I don't know) that even the on-core L2/L3 caches are not slowed down by nearly as much as five times. As a result, it is easy to introduce changes to the CPython core that appear beneficial but are actually detrimental, or vice versa. For example, replacing some computation by lookups in a table may look like a good idea when it is not.

Agreed with Armin.

Regards

Antoine.

From victor.stinner at gmail.com  Tue Sep 27 09:40:42 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 27 Sep 2016 15:40:42 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To:
References:
Message-ID:

Hi,

I ran further tests and now understand the issue better. In short, the intel_pstate driver doesn't support NOHZ_FULL, and so the frequency of CPUs using NOHZ_FULL depends on the workload of the other CPUs. This is especially true when using the powersave (default) CPU frequency governor. At least, that is what I observed on my CPU, which has no HWP.

intel_pstate updates the P-state of each CPU by writing into the MSR 199H. The purpose of NOHZ_FULL is to avoid any interrupts, whereas intel_pstate relies on timer interrupts to sample performance, pick the right P-state and write it into the MSR. To write into the MSR of CPU 7, the kernel must run on CPU 7. If the benchmark is CPU-bound and never calls the kernel, there is no opportunity to run the intel_pstate driver.
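You can observe the P-state from user space by reading the MSRs directly. A sketch (it assumes root, the msr kernel module loaded so that /dev/cpu/N/msr exists, and the usual layout of IA32_PERF_STATUS at 198H -- the status counterpart of the 199H control register -- with the current ratio in bits 8-15):

---
import struct

def read_msr(cpu, reg):
    # Requires root and "modprobe msr" (/dev/cpu/N/msr device files).
    with open("/dev/cpu/%d/msr" % cpu, "rb") as fp:
        fp.seek(reg)
        return struct.unpack("<Q", fp.read(8))[0]

IA32_PERF_STATUS = 0x198
ratio = (read_msr(7, IA32_PERF_STATUS) >> 8) & 0xFF
# On recent Intel CPUs the ratio is multiplied by the 100 MHz bus clock.
print("CPU 7 ratio: %d => ~%d MHz" % (ratio, ratio * 100))
---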
Antoine:
> Ah, well, I don't have HyperThreading on my CPU, sorry.

The bug can be reproduced without HyperThreading. A new, much simpler scenario to reproduce the bug (and my analysis of the bug):
https://bugzilla.redhat.com/show_bug.cgi?id=1378529#c6

2016-09-24 8:11 GMT+02:00 Armin Rigo:
> IMHO this is not a very good solution. With the CPU running at, say, a fifth of its nominal performance, you can't expect that it will behave in a remotely similar way.

The nominal speed is 3.4 GHz and the minimum speed is 1.6 GHz: timings just double between nominal and minimum speed.

> As a result, it is easy to introduce changes to the CPython core that appear beneficial but are actually detrimental, or vice versa. For example, replacing some computation by lookups in a table may look like a good idea when it is not.

Yeah, maybe, I don't know.

Anyway, there are two solutions to run stable benchmarks at nominal speed (see the sketch below for the first one):

* (Use NOHZ_FULL but) Force the frequency to the maximum
* Don't use NOHZ_FULL
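A sketch of the first option, pinning the frequency through the cpufreq sysfs interface (it assumes root and a driver that honours scaling_min_freq/scaling_max_freq; with intel_pstate, the min_perf_pct/max_perf_pct knobs may be needed instead):

---
def pin_cpu_freq(cpu, khz):
    # Force a fixed frequency by setting min == max (values in kHz).
    base = "/sys/devices/system/cpu/cpu%d/cpufreq/" % cpu
    for name in ("scaling_max_freq", "scaling_min_freq"):
        with open(base + name, "w") as fp:
            fp.write(str(khz))

# Example: pin the isolated CPUs 3 and 7 to the 3.4 GHz nominal speed.
for cpu in (3, 7):
    pin_cpu_freq(cpu, 3400000)
---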
Victor

From arigo at tunes.org  Thu Sep 29 11:11:09 2016
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 29 Sep 2016 17:11:09 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To:
References:
Message-ID:

Hi Victor,

On 27 September 2016 at 15:40, Victor Stinner wrote:
> 2016-09-24 8:11 GMT+02:00 Armin Rigo:
>> IMHO this is not a very good solution. With the CPU running at, say, a fifth of its nominal performance, you can't expect that it will behave in a remotely similar way.
>
> The nominal speed is 3.4 GHz and the minimum speed is 1.6 GHz: timings just double between nominal and minimum speed.

On my laptop, the speed ranges between 500 MHz and 2300 MHz.

> * (Use NOHZ_FULL but) Force the frequency to the maximum
> * Don't use NOHZ_FULL

I think we should force the frequency to a single number which is the maximum that can be reasonably sustained (I mean, not the overclocked speed reached by some CPUs for short amounts of time).

A bientôt,

Armin.

From victor.stinner at gmail.com  Thu Sep 29 11:56:05 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 29 Sep 2016 17:56:05 +0200
Subject: [Speed] intel_pstate C0 bug on isolated CPUs with the performance governor
In-Reply-To:
References:
Message-ID:

2016-09-29 17:11 GMT+02:00 Armin Rigo:
> On my laptop, the speed ranges between 500 MHz and 2300 MHz.

Oh I see, on this computer the CPU can be up to ~5x slower!

>> * (Use NOHZ_FULL but) Force the frequency to the maximum
>> * Don't use NOHZ_FULL
>
> I think we should force the frequency to a single number which is the maximum that can be reasonably sustained

In my experience, the CPU is fine when running at the nominal speed. I'm talking about the value displayed in the CPU model in /proc/cpuinfo (2.9 GHz):

$ grep 'model name' /proc/cpuinfo
model name : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz

> (I mean, not the overclocked speed reached by some CPUs for short amounts of time).

This is Turbo Mode. It can be disabled in the BIOS or by writing 1 into /sys/devices/system/cpu/intel_pstate/no_turbo

By the way, the most reliable tool that I found to read the CPU frequency is turbostat. It uses the APERF and MPERF counters to compute the "Busy MHz" value. You may also try "cpupower monitor", which is similar.

Victor

From victor.stinner at gmail.com  Fri Sep 30 10:52:35 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 30 Sep 2016 16:52:35 +0200
Subject: [Speed] perf 0.7.12: --python and --compare-to options
Message-ID:

Hi,

I always wanted to be able to compare the performance of two Python versions using timeit *in a single command*. So I just implemented it! I added the --python and --compare-to options.

A real example to show the new "timeit --compare-to" feature:
---
$ export PYTHONPATH=~/prog/GIT/perf
$ ./python-resize -m perf timeit --inherit-environ=PYTHONPATH --compare-to=./python-ref -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del d[i];' --rigorous
python-ref: ........................................ 77.6 us +- 1.8 us
python-resize: ........................................ 74.8 us +- 1.9 us

Median +- std dev: [python-ref] 77.6 us +- 1.8 us -> [python-resize] 74.8 us +- 1.9 us: 1.04x faster
---
http://bugs.python.org/issue28199#msg277755

Changes between 0.7.11 and 0.7.12:

* Add the ``--python`` command line option
* ``timeit``: add ``--name``, ``--inner-loops`` and ``--compare-to`` options
* TextRunner no longer sets the CPU affinity of the main process, only of worker processes. It may help a little bit when using NOHZ_FULL.
* metadata: add ``boot_time`` and ``uptime`` on Linux
* metadata: add the idle driver to ``cpu_config``

Victor