From victor.stinner at gmail.com  Mon Oct 10 20:22:20 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 11 Oct 2016 02:22:20 +0200
Subject: [Speed] Performance 0.3 released with 10 new benchmarks
Message-ID:

Hi,

I just released performance 0.3, the Python benchmark suite, with 10 new
benchmarks from the PyPy benchmark suite:
https://github.com/python/performance

Version 0.3.0 changelog:

New benchmarks:

* Add ``crypto_pyaes``: Benchmark a pure-Python implementation of the AES block-cipher in CTR mode using the pyaes module (version 1.6.0). Add ``pyaes`` dependency.
* Add ``sympy``: Benchmark on SymPy. Add ``scipy`` dependency.
* Add ``scimark`` benchmark
* Add ``deltablue``: DeltaBlue benchmark
* Add ``dulwich_log``: Iterate on commits of the asyncio Git repository using the Dulwich module. Add ``dulwich`` (and ``mpmath``) dependencies.
* Add ``pyflate``: Pyflate benchmark, a tar/bzip2 decompressor in pure Python
* Add ``sqlite_synth`` benchmark: Benchmark a Python aggregate for SQLite (a minimal sketch follows this changelog)
* Add ``genshi`` benchmark: Render a template to XML or plain text using the Genshi module. Add ``Genshi`` dependency.
* Add ``sqlalchemy_declarative`` and ``sqlalchemy_imperative`` benchmarks: SQLAlchemy Declarative and Imperative benchmarks using SQLite. Add ``SQLAlchemy`` dependency.

Enhancements:

* The ``compare`` command now fails if the performance versions are different
* ``nbody``: add ``--reference`` and ``--iterations`` command line options
* ``chaos``: add ``--width``, ``--height``, ``--thickness``, ``--filename`` and ``--rng-seed`` command line options
* ``django_template``: add ``--table-size`` command line option
* ``json_dumps``: add ``--cases`` command line option
* ``pidigits``: add ``--digits`` command line option
* ``raytrace``: add ``--width``, ``--height`` and ``--filename`` command line options
* Port the ``html5lib`` benchmark to Python 3
* Enable ``pickle_pure_python`` and ``unpickle_pure_python`` on Python 3 (the code was already compatible with Python 3)
* Creating the virtual environment no longer inherits environment variables (especially ``PYTHONPATH``) by default: the ``--inherit-environ`` command line option must now be used explicitly.

Bugfixes:

* The ``chaos`` benchmark now also resets the ``random`` module at each sample to get more reproducible benchmark results
* Logging benchmarks now truncate the in-memory stream before each benchmark run

Rename benchmarks:

* Rename benchmarks to get a consistent name between the command line and the benchmark name in the JSON file.
* Rename pickle benchmarks:

  - ``slowpickle`` becomes ``pickle_pure_python``
  - ``slowunpickle`` becomes ``unpickle_pure_python``
  - ``fastpickle`` becomes ``pickle``
  - ``fastunpickle`` becomes ``unpickle``

* Rename ElementTree benchmarks: replace the ``etree_`` prefix with ``xml_etree_``.
* Rename ``hexiom2`` to ``hexiom_level25`` and explicitly pass the ``--level=25`` parameter
* Rename ``json_load`` to ``json_loads``
* Rename ``json_dump_v2`` to ``json_dumps`` (and remove the deprecated ``json_dump`` benchmark)
* Rename ``normal_startup`` to ``python_startup``, and ``startup_nosite`` to ``python_startup_no_site``
* Rename ``threaded_count`` to ``threading_threaded_count``, rename ``iterative_count`` to ``threading_iterative_count``
* Rename logging benchmarks:

  - ``silent_logging`` to ``logging_silent``
  - ``simple_logging`` to ``logging_simple``
  - ``formatted_logging`` to ``logging_format``

Minor changes:

* Update dependencies
* Remove the broken ``--args`` command line option.
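
As a rough illustration of what ``sqlite_synth`` exercises, here is a minimal
sketch of a Python aggregate registered on an in-memory SQLite database. It
uses only the standard library; the aggregate, table and data below are made
up for illustration and are not taken from the benchmark itself.

------
import sqlite3


class PySum:
    """Toy aggregate written in Python: SQLite calls back into Python for
    every row, which is the kind of overhead such a benchmark measures."""
    def __init__(self):
        self.total = 0

    def step(self, value):
        self.total += value

    def finalize(self):
        return self.total


conn = sqlite3.connect(":memory:")
conn.create_aggregate("pysum", 1, PySum)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
print(conn.execute("SELECT pysum(x) FROM t").fetchone()[0])
------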
Victor


From victor.stinner at gmail.com  Mon Oct 10 20:26:44 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 11 Oct 2016 02:26:44 +0200
Subject: [Speed] Performance 0.3 released with 10 new benchmarks
In-Reply-To:
References:
Message-ID:

I didn't copy/paste the code from the PyPy benchmarks directly. I updated the
third-party dependencies, I updated the code to use the perf API, and
sometimes I even fixed bugs in the benchmarks.

2016-10-11 2:22 GMT+02:00 Victor Stinner:
> * Add ``sqlalchemy_declarative`` and ``sqlalchemy_imperative`` benchmarks:
> SQLAlchemy Declarative and Imperative benchmarks using SQLite. Add
> ``SQLAlchemy`` dependency.

For these two new benchmarks, it's unclear to me whether the purpose is to
test INSERT, SELECT or INSERT+SELECT. Currently, the benchmark tests
INSERT+SELECT. Compared to the PyPy benchmark, the benchmark now drops all
rows of the tables before each run to get more reproducible timings.

Victor


From victor.stinner at gmail.com  Wed Oct 19 12:55:14 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 19 Oct 2016 18:55:14 +0200
Subject: [Speed] Latest enhancements of perf 0.8.1 and performance 0.3.1
Message-ID:

Hi,

Latest enhancements:

1) perf supports multiple benchmarks per script
2) perf calibrates the benchmark in a dedicated process
3) new --duplicate option for perf timeit

(1) I solved an old limitation of my perf module: since perf 0.8, it's now
possible to run multiple benchmarks in a single script. Example script:

------
runner = perf.Runner()
runner.bench_func('dict.get', dict_get, dico, keys)
runner.bench_func('try/except', try_except, dico, keys)
------

The runner spawns N processes for the first benchmark + N processes for the
second benchmark. The trick is to pass a --worker-task option to the worker
process. In the worker, bench_func() does nothing (returns None) if the
"worker task" counter doesn't match.

I rewrote the API in perf 0.8 to simplify it and fix design issues. It should
be the last large API change before perf 1.0, which is expected before the end
of the year.

(2) A simple change, but an important one to further improve the reliability
of benchmarks, especially on Python implementations that have a JIT: benchmark
calibration is now done in a dedicated process. All worker processes
(computing samples) should now run exactly the same workload. Before, the
first worker was different: it ran a few more iterations because of the
calibration.

(3) Another simple but useful change for the shortest microbenchmarks: I added
a --duplicate option to timeit. Example:

------
$ python3 -m perf timeit -s 'x=1;y=2' 'x+y'
.....................
Median +- std dev: 26.7 ns +- 2.0 ns

$ python3 -m perf timeit -s 'x=1;y=2' 'x+y' --duplicate=1000
.....................
Median +- std dev: 19.2 ns +- 0.4 ns
------

Duplicating the statement 1000x reduces the per-statement timing by 28% by
amortizing the cost of the outer loop.
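
The same effect can be reproduced with nothing but the standard library. The
sketch below is only an illustration of why duplication helps (it amortizes
the per-iteration overhead of the timing loop); it is not how perf implements
--duplicate, and the absolute numbers will vary from machine to machine.

------
import timeit

setup = "x = 1; y = 2"
stmt = "x + y"

# One statement per loop iteration: the timing includes the loop overhead.
plain = min(timeit.repeat(stmt, setup, number=10**6, repeat=5)) / 10**6

# The same statement duplicated 1000 times per iteration: the loop overhead
# is now shared by 1000 additions instead of one.
dup_stmt = "\n".join([stmt] * 1000)
dup = min(timeit.repeat(dup_stmt, setup, number=10**3, repeat=5)) / (10**3 * 1000)

print("per addition: %.1f ns (plain) vs %.1f ns (duplicated)"
      % (plain * 1e9, dup * 1e9))
------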
FYI, I wrote this feature to help me take a decision on the old (and rather
silly) "1+1" optimization for Python 3: https://bugs.python.org/issue21955
(an issue open since July 2014).

(Spoiler: I expect the benchmark results to show that the micro-optimization
is useless, but I'm not 100% sure of that right now :-))

Victor


From victor.stinner at gmail.com  Thu Oct 20 06:59:46 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 20 Oct 2016 12:59:46 +0200
Subject: [Speed] Fwd: Benchmarking Python and micro-optimizations
In-Reply-To:
References:
Message-ID:

If you are not subscribed to the Python-Dev mailing list, here is a copy of
the email I just sent there.

Victor

---------- Forwarded message ----------
From: Victor Stinner
Date: 2016-10-20 12:56 GMT+02:00
Subject: Benchmarking Python and micro-optimizations
To: Python Dev

Hi,

Over the last months I have worked a lot on benchmarks. I ran benchmarks,
analyzed results in depth (down to the hardware and kernel drivers!), wrote
new tools and enhanced existing ones.

* I wrote a new perf module which runs benchmarks in a reliable way and
contains a LOT of features: collect metadata, a JSON file format, commands to
compare results, render a histogram, etc.

* I rewrote the Python benchmark suite: the old benchmarks Mercurial
repository moved to a new performance GitHub project which uses my perf
module and contains more benchmarks.

* I also made minor enhancements to timeit in Python 3.7 -- some developers
don't want major changes there, so as not to "break backward compatibility".

For timeit, I suggest using my perf tool instead: it includes a reliable
timeit command and has many more features, like --duplicate (repeat the
statements to reduce the cost of the outer loop) and --compare-to (compare
two versions of Python), as well as all the builtin perf features (JSON
output, statistics, histograms, etc.).

I added benchmarks from the PyPy and Pyston benchmark suites to performance:
performance 0.3.1 contains 51 benchmark scripts which run a total of 121
benchmarks. Examples of tested Python modules:

* SQLAlchemy
* Dulwich (full Git implementation in Python)
* Mercurial (currently only the startup time)
* html5lib
* pyaes (AES crypto cipher in pure Python)
* sympy
* Tornado (HTTP client and server)
* Django (sadly, only the template engine right now; Pyston contains HTTP benchmarks)
* pathlib
* spambayes

More benchmarks will be added later. It would be nice to add benchmarks for
numpy, for example: numpy is important for a large part of our community.

All these (new or updated) tools can now be used to make smarter decisions on
optimizations. Please don't push any optimization anymore without providing
reliable benchmark results!

My first major action was to close the latest attempt to micro-optimize
int+int in Python/ceval.c, http://bugs.python.org/issue21955 : I closed the
issue as rejected, because there is no significant speedup on benchmarks
other than two (tiny) microbenchmarks. To make sure that no one wastes their
time trying to micro-optimize int+int, I even added a comment to
Python/ceval.c :-)
https://hg.python.org/cpython/rev/61fcb12a9873
"Please don't try to micro-optimize int+int"

The perf and performance projects are now well tested: Travis CI runs the
tests on new commits and pull requests, and the "tox" command can be used
locally to test different Python versions, pep8, the documentation, etc. in a
single command.
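
To make the new setup concrete, here is a minimal, self-contained sketch of
what a benchmark script looks like with the perf Runner API shown earlier in
this thread (Runner.bench_func). The benchmarked functions and data below are
made up for illustration; they are not one of the 51 scripts shipped in
performance. (The module was published as "perf" at the time; it is
distributed as "pyperf" today.)

------
import perf  # the Runner/bench_func API shown earlier in this thread


def dict_get(dico, keys):
    # Body of the benchmark: the runner calls this function many times
    # and measures how long each call takes.
    for key in keys:
        dico.get(key)


def try_except(dico, keys):
    for key in keys:
        try:
            dico[key]
        except KeyError:
            pass


dico = {str(i): i for i in range(1000)}
keys = [str(i) for i in range(0, 2000, 2)]   # half hits, half misses

runner = perf.Runner()
runner.bench_func('dict.get', dict_get, dico, keys)
runner.bench_func('try/except', try_except, dico, keys)
------

Running the script directly lets the runner spawn its worker processes for
each benchmark in turn, as described in the earlier message.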
Next steps:

* Run performance 0.3.1 on speed.python.org: the benchmark runner is currently
stopped (and still uses the old benchmarks project). The website part may be
updated to allow downloading the full JSON files, which include *all*
information (all timings, metadata and more).

* I plan to run performance on CPython 2.7, CPython 3.7, PyPy and PyPy 3.
Maybe also on CPython 3.5 and CPython 3.6 if they don't take too many
resources.

* Later, we can consider adding more implementations of Python: Jython,
IronPython, MicroPython, Pyston, Pyjion, etc. All benchmarks should be run on
the same hardware to be comparable.

* Later, we might also allow other projects to upload their own benchmark
results, but we should find a solution to group benchmark results per
benchmark runner (e.g. at least by hostname; the perf JSON contains the
hostname) so that results from two different machines are not compared to
each other.

* We should continue to add more benchmarks to the performance benchmark
suite, especially benchmarks that are more representative of real
applications (we have enough microbenchmarks!).

Links:

* perf: http://perf.readthedocs.io/
* performance: https://github.com/python/performance
* Python Speed mailing list: https://mail.python.org/mailman/listinfo/speed
* https://speed.python.org/ (currently outdated, and doesn't use performance yet)

See https://pypi.python.org/pypi/performance which contains even more links to
Python benchmarks (PyPy, Pyston, Numba, Pythran, etc.).

Victor


From victor.stinner at gmail.com  Sat Oct 22 02:47:24 2016
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 22 Oct 2016 08:47:24 +0200
Subject: [Speed] Benchmarking Python and micro-optimizations
In-Reply-To:
References:
Message-ID:

Hi,

I removed all the old benchmark results and started to run benchmarks
manually. The timeline view is interesting for investigating performance
regressions:
https://speed.python.org/timeline/#/?exe=3&ben=grid&env=1&revs=50&equid=off&quarts=on&extr=on

For example, it seems like call_method became slower between Oct 9 and Oct 20,
35.9 ms => 59.9 ms:
https://speed.python.org/timeline/#/?exe=3&ben=call_method&env=1&revs=50&equid=off&quarts=on&extr=on

I don't know the hardware of the benchmark runner well, so maybe it's an issue
with the server running the benchmarks?

Victor
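
One quick, tool-agnostic way to sanity-check a jump like the call_method one
above is to re-run the benchmark and compare the two sets of raw timings by
their medians. A minimal sketch; the timing lists below are made up to mirror
the 35.9 ms => 59.9 ms figures and are not real measurements.

------
import statistics


def compare_runs(before, after):
    """Compare two lists of timings (in seconds) by their medians."""
    med_before = statistics.median(before)
    med_after = statistics.median(after)
    change = (med_after - med_before) / med_before * 100.0
    print("median: %.1f ms -> %.1f ms (%+.1f%%)"
          % (med_before * 1e3, med_after * 1e3, change))


# Made-up samples roughly matching the call_method numbers above.
before = [0.0359, 0.0361, 0.0358, 0.0360, 0.0362]
after = [0.0599, 0.0601, 0.0597, 0.0600, 0.0598]
compare_runs(before, after)
------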