From victor.stinner at gmail.com Wed Jun 1 21:19:32 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 2 Jun 2016 03:19:32 +0200
Subject: [Speed] A new perf module: toolkit to write benchmarks
Message-ID:

Hi,

I started to write blog posts on stable benchmarks:

1) https://haypo.github.io/journey-to-stable-benchmark-system.html
2) https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
3) https://haypo.github.io/journey-to-stable-benchmark-average.html

One important point is that the minimum is commonly used in Python
benchmarks, whereas it is a bad practice for getting stable benchmark
results.

I started to work on a toolkit to write benchmarks, the new "perf" module:

   http://perf.readthedocs.io/en/latest/
   https://github.com/haypo/perf

I used timeit as a concrete use case, since timeit is popular and badly
implemented. timeit currently uses 1 process running the microbenchmark
3 times and takes the minimum. timeit is *known* to be unstable, and the
common advice is to run it at least 3 times and again take the minimum
of the minimums. Examples of links about timeit being unstable:

* https://mail.python.org/pipermail/python-dev/2012-August/121379.html
* https://bugs.python.org/issue23693
* https://bugs.python.org/issue6422 (not directly related)

Moreover, the timeit module disables the garbage collector, which is
also wrong: it's rare to disable the GC in applications.

My goal for the perf module is to provide basic features and then reuse
it in existing benchmarks:

* mean() and stdev() to display results
* clock chosen for benchmarking
* result classes to store numbers
* etc.

Work in progress:

* new implementation of timeit using multiple processes
* perf.metadata module: collect various information about Python, the
  system, etc.
* file format to store numbers and metadata

I'm interested in the very basic perf.py internal text format: one
timing per line, that's all. But it's incomplete, the "loops"
information is not stored. Maybe a binary format is better? I don't
know yet. It should be possible to combine the files of multiple
processes.

I'm also interested in implementing a generic "rerun" command to add
more samples if a benchmark doesn't look stable enough.

perf.timeit looks more stable than timeit, and the CLI is basically the
same: replace "-m timeit" with "-m perf.timeit".

5 timeit outputs ("1000000 loops, best of 3: ... per loop"):

* 0.247 usec
* 0.252 usec
* 0.247 usec
* 0.251 usec
* 0.251 usec

It's disturbing to get 3 different "minimums" :-/

5 perf.timeit outputs ("Average: 25 runs x 3 samples x 10^6 loops: ..."):

* 250 ns +- 3 ns
* 250 ns +- 3 ns
* 251 ns +- 3 ns
* 251 ns +- 4 ns
* 251 ns +- 3 ns

Note: I also got "258 ns +- 17 ns" when I opened a webpage in Firefox
while the benchmark was running.

Note: I ran these benchmarks on a regular Linux system without any
specific tuning. ASLR is enabled, but the system was idle.

Victor

From solipsis at pitrou.net Thu Jun 2 03:17:18 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 2 Jun 2016 09:17:18 +0200
Subject: [Speed] A new perf module: toolkit to write benchmarks
References:
Message-ID: <20160602091718.74562cb4@fsol>

On Thu, 2 Jun 2016 03:19:32 +0200
Victor Stinner wrote:
> I'm interested in the very basic perf.py internal text format: one
> timing per line, that's all. But it's incomplete, the "loops"
> information is not stored. Maybe a binary format is better? I don't
> know yet.

Just use a simple JSON format.

Regards

Antoine.
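As an illustration of what a simple JSON format could look like for this
use case, here is a minimal sketch using one JSON object per line so that
files written by multiple processes can simply be concatenated; the field
names below are hypothetical, not the format perf actually adopted:
---
import json

# Hypothetical schema: one JSON object per line ("JSON Lines"), so that
# result files from several worker processes can simply be concatenated.
run = {
    "name": "timeit 1+1",
    "loops": 10**7,
    "warmups": [18.3e-9],
    "samples": [18.3e-9, 18.3e-9, 18.2e-9],  # seconds per inner loop iteration
    "metadata": {"cpu_count": 4, "aslr": "enabled"},
}

with open("run.json", "a") as fp:
    fp.write(json.dumps(run) + "\n")

# Reading back: one run per line, possibly coming from several processes.
with open("run.json") as fp:
    runs = [json.loads(line) for line in fp]
print(len(runs), "run(s) loaded")
---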
From arigo at tunes.org Thu Jun 2 04:38:02 2016
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 2 Jun 2016 10:38:02 +0200
Subject: [Speed] A new perf module: toolkit to write benchmarks
In-Reply-To:
References:
Message-ID:

Hi Victor,

On 2 June 2016 at 03:19, Victor Stinner wrote:
> 5 timeit outputs ("1000000 loops, best of 3: ... per loop"):
>
> * 0.247 usec
> * 0.252 usec
> * 0.247 usec
> * 0.251 usec
> * 0.251 usec
>
> 5 perf.timeit outputs ("Average: 25 runs x 3 samples x 10^6 loops: ..."):
>
> * 250 ns +- 3 ns
> * 250 ns +- 3 ns
> * 251 ns +- 3 ns
> * 251 ns +- 4 ns
> * 251 ns +- 3 ns

Looks good. IMHO the important bit is that `timeit` is simple to use,
readily available, and gives just a number, which makes it very
attractive to people. Your output would achieve the same result (with
the `+-` added, which is fine), assuming that it eventually replaces
`timeit` in the standard library.

I know there are many good reasons why getting just a single number is
not enough, but I'd say that we still need to achieve the best practical
results given that constraint. The results you posted above seem to show
that `perf.timeit` is better than `timeit` at doing that, and I believe
that it's a great step forward.

A bientôt,

Armin.

From victor.stinner at gmail.com Thu Jun 2 04:58:35 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 2 Jun 2016 10:58:35 +0200
Subject: [Speed] A new perf module: toolkit to write benchmarks
In-Reply-To:
References:
Message-ID:

2016-06-02 10:38 GMT+02:00 Armin Rigo :
> Looks good. IMHO the important bit is that `timeit` is simple to use,
> readily available, and gives just a number, which makes it very
> attractive to people.

By default, min & max are hidden. You can show them using the -v option.

To make the output even simpler, maybe the standard deviation can be
displayed "in English". Something like:

* "Average: 250 ns +- 3 ns" => "Average: 250 ns (stable)", or just
  "Average: 250 ns"
* "Average: 250 ns +- 120 ns" => "Average: 250 ns (not reliable, try
  again on an idle system)"

Usually, timeit is used to compare two versions of Python. Maybe we
should focus on this use case, and check if the difference is
significant, as perf.py does? By default, perf.py does *not* display any
number if the difference is not significant. I like this behaviour, even
if it can be surprising the first time.

For the CLI, we can extend the timeit CLI to accept the path/name of two
Python binaries. Or we can use something like pybench to store results
into files and then load & compare two files.

Victor

From solipsis at pitrou.net Thu Jun 2 07:29:14 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 2 Jun 2016 13:29:14 +0200
Subject: [Speed] A new perf module: toolkit to write benchmarks
References:
Message-ID: <20160602132914.6b57c9f5@fsol>

On Thu, 2 Jun 2016 10:58:35 +0200
Victor Stinner wrote:
>
> Usually, timeit is used to compare two versions of Python.

timeit is used for many different things, including comparing two
versions of Python, but not only.

> For the CLI, we can extend the timeit CLI to accept the path/name of
> two Python binaries.

That sounds reasonable.

Regards

Antoine.

From arigo at tunes.org Thu Jun 2 07:53:07 2016
From: arigo at tunes.org (Armin Rigo)
Date: Thu, 2 Jun 2016 13:53:07 +0200
Subject: [Speed] A new perf module: toolkit to write benchmarks
In-Reply-To:
References:
Message-ID:

Hi Victor,

On 2 June 2016 at 10:58, Victor Stinner wrote:
> Usually, timeit is used to compare two versions of Python.

That's not the use case I'm focusing on here: timeit is also used by
Mr. Random Programmer to tweak their Python code to improve its
performance. (Often, it's the performance of non-representative
microbenchmarks, but well, better to have a tool that at least gets
saner results than the current timeit.)

A bientôt,

Armin.
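As an illustration of Victor's earlier idea of spelling out the standard
deviation "in English", a rough sketch using an arbitrary 10% threshold
on the relative standard deviation (not perf's actual behaviour):
---
import statistics

def format_average(samples, unit="ns", threshold=0.10):
    # Sketch only: flag a run as unreliable when the standard deviation
    # exceeds an arbitrary fraction (here 10%) of the mean.
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    if stdev > mean * threshold:
        return ("Average: %.0f %s (not reliable, try again on an idle system)"
                % (mean, unit))
    return "Average: %.0f %s" % (mean, unit)

print(format_average([250, 250, 251, 251, 251]))   # stable run
print(format_average([250, 130, 370, 260, 240]))   # noisy run
---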
From victor.stinner at gmail.com Thu Jun 2 09:22:28 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 2 Jun 2016 15:22:28 +0200
Subject: [Speed] A new perf module: toolkit to write benchmarks
In-Reply-To: <20160602091718.74562cb4@fsol>
References: <20160602091718.74562cb4@fsol>
Message-ID:

2016-06-02 9:17 GMT+02:00 Antoine Pitrou :
> Just use a simple JSON format.

Yeah, Python 2.7 includes a JSON parser, and JSON is human readable
(though not really designed to be modified by a human).

I had a technical issue: I wanted to produce JSON output *and* keep the
nice human output at the same time. I found a nice trick: by default,
write the human output to stdout, but in JSON mode write the JSON to
stdout and the human output to stderr. At the end, you get a simple CLI:
---
$ python3 -m perf.timeit --json 1+1 > run.json
.........................
Average: 18.3 ns +- 0.3 ns (25 runs x 3 samples x 10^7 loops)

$ python3 -m perf < run.json
Average: 18.3 ns +- 0.3 ns (25 runs x 3 samples x 10^7 loops)
---

The JSON can contain metadata as well:
---
$ python3 -m perf.timeit --metadata --json 1+1 > run.json
Metadata:
- aslr: enabled
- cpu_count: 4
- (...)
.........................
Average: 18.2 ns +- 0.0 ns (25 runs x 3 samples x 10^7 loops)

$ python3 -m perf < run.json
Metadata:
- aslr: enabled
- cpu_count: 4
- (...)
Average: 18.2 ns +- 0.0 ns (25 runs x 3 samples x 10^7 loops)
---

There are two kinds of objects: a single run, or a result composed of
multiple runs. The format is one JSON object per line. Example of single
runs written to individual JSON files which are then combined:
---
$ python3 -m perf.timeit --raw --json 1+1 > run1.json
warmup 1: 18.3 ns
sample 1: 18.3 ns
sample 2: 18.3 ns
sample 3: 18.3 ns

$ python3 -m perf.timeit --raw --json 1+1 > run2.json
warmup 1: 18.2 ns
sample 1: 18.2 ns
sample 2: 18.2 ns
sample 3: 18.2 ns

$ python3 -m perf.timeit --raw --json 1+1 > run3.json
warmup 1: 18.2 ns
sample 1: 18.2 ns
sample 2: 18.2 ns
sample 3: 18.2 ns

$ python3 -m perf < run1.json  # single run
Average: 18.3 ns +- 0.0 ns (3 samples x 10^7 loops)

$ cat run1.json run2.json run3.json | python3 -m perf  # 3 runs
Average: 18.2 ns +- 0.0 ns (3 runs x 3 samples x 10^7 loops)
---

Victor
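A minimal sketch of the stdout/stderr trick described above; the report()
helper and the result fields are made up for the example, only the idea
of switching streams in JSON mode comes from the message:
---
import json
import sys

def report(result, json_mode=False):
    # The trick described above: human-readable output goes to stdout by
    # default, but moves to stderr when JSON is requested, so that
    # "python3 -m perf.timeit --json 1+1 > run.json" captures only the JSON.
    human = "Average: %.1f ns +- %.1f ns" % (result["avg"], result["stdev"])
    if json_mode:
        print(json.dumps(result))       # machine-readable, on stdout
        print(human, file=sys.stderr)   # still visible in the terminal
    else:
        print(human)

report({"avg": 18.3, "stdev": 0.3}, json_mode="--json" in sys.argv)
---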
From victor.stinner at gmail.com Tue Jun 7 09:03:46 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 7 Jun 2016 15:03:46 +0200
Subject: [Speed] perf 0.2 released, perf fork of CPython benchmark suite
Message-ID:

Hi,

I completed the API of my small perf module and released version 0.2:

   https://perf.readthedocs.io/

It is supposed to provide the basic tools to collect samples, compute
the average, display the result, etc.

I started to work on JSON serialization to "easily" run multiple
processes. The idea is also to split the code that produces numbers from
the code that displays results. I expect that we can do better at
displaying results: see for example speed.python.org and speed.pypy.org,
which are nicer than perf.py's text output ;-)

I also started to hack the CPython benchmark suite (the benchmarks
repository) to use my perf module:

   https://hg.python.org/sandbox/benchmarks_perf

I should now stop NIH and see how to merge my work with the PyPy fork of
benchmarks ;-)

FYI I started to write the perf module because I started to write an
article about the impact of CPU speed on Python microbenchmarks, and I
wanted to have a smart timeit running multiple processes. Since it was
cool to work on such a project, I started to hack benchmarks, but maybe
I went too far and I should look at PyPy's benchmarks instead ;-)

Victor

From victor.stinner at gmail.com Fri Jun 10 06:50:25 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 10 Jun 2016 12:50:25 +0200
Subject: [Speed] perf 0.3 released
Message-ID:

Hi,

I just released perf 0.3. Major changes:

- "python -m perf" CLI now has 3 commands: show, compare and compare_to.
  The compare commands say if the difference is significant (I copied
  the code from perf.py)
- TextRunner is now able to spawn child processes, parse command
  arguments, and more
- If TextRunner detects isolated CPUs, it automatically pins the worker
  processes to the isolated CPUs
- Add ``--json-file`` command line option
- Add TextRunner.bench_sample_func() method: the sample function is
  responsible for measuring the elapsed time, useful for microbenchmarks
- Enhance a lot of the documentation

Writing a benchmark now only takes one line:
"perf.text_runner.TextRunner().bench_func(func)"! Full example:
---
import time
import perf.text_runner

def func():
    time.sleep(0.001)

perf.text_runner.TextRunner().bench_func(func)
---

I looked at the PyPy benchmarks:

   https://bitbucket.org/pypy/benchmarks

Results can also be serialized to JSON there, but the serialization is
only done at the end: only the final result is serialized. It's not
possible to save each run in a JSON file. Running multiple processes is
not supported either.

With perf, the final JSON contains all data: all runs, all samples, even
warmup samples. perf now also collects metadata in each worker process,
so it is safer to compare runs since it's possible to check manually
when and how the worker executed the benchmark. For example, the CPU
affinity is now saved in the metadata, and "python -m perf.timeit" saves
the setup and statements in the metadata.

With perf 0.3, TextRunner also includes builtin calibration to compute
the number of outer loop iterations: repeat each sample so that it takes
between 100 ms and 1 sec (the min/max are configurable).

Victor

From contrebasse at gmail.com Sat Jun 11 18:20:18 2016
From: contrebasse at gmail.com (Joseph Martinot-Lagarde)
Date: Sat, 11 Jun 2016 22:20:18 +0000 (UTC)
Subject: [Speed] Performance comparison of regular expression engines
References: <56DB26EA.3070005@gmail.com>
Message-ID:

Serhiy Storchaka writes:

> The first column is the searched pattern. The second column is the
> number of found matches (for control, it should be the same with all
> engines and versions). The third column, under the "re" header, is the
> time in milliseconds. The column under the "str.find" header is the
> time of searching without using regular expressions.

It would be easier to read with a constant number of digits after the
comma, so that the numbers are better aligned.
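A minimal sketch of the fixed-precision formatting Joseph suggests, with
made-up patterns, match counts and timings:
---
# Made-up data; the point is the fixed "%.3f" precision and fixed column
# widths, so that the decimal points line up in the timing column.
results = [("Twain", 811, 0.905),
           ("[a-z]shing", 1540, 12.250),
           ("\\bwas\\b", 3141, 3.125)]
print("%-12s %8s %12s" % ("pattern", "matches", "re (ms)"))
for pattern, matches, ms in results:
    print("%-12s %8d %12.3f" % (pattern, matches, ms))
---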