From zachary.ware+pydev at gmail.com Thu Feb 4 01:48:21 2016 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Thu, 4 Feb 2016 00:48:21 -0600 Subject: [Speed] speed.python.org Message-ID: I'm happy to announce that speed.python.org is finally functional! There's not much there yet, as each benchmark builder has only sent one result so far (and one of those involved a bit of cheating on my part), but it's there. There are likely to be rough edges that still need smoothing out. When you find them, please report them at https://github.com/zware/codespeed/issues or on the speed at python.org mailing list. Many thanks to Intel for funding the work to get it set up and to Brett Cannon and Benjamin Peterson for their reviews. Happy benchmarking, -- Zach From victor.stinner at gmail.com Thu Feb 4 03:19:42 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 4 Feb 2016 09:19:42 +0100 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: Great! 2016-02-04 7:48 GMT+01:00 Zachary Ware : > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. > > Happy benchmarking, > -- > Zach > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com From ncoghlan at gmail.com Thu Feb 4 08:41:59 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:41:59 +1000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: On 4 February 2016 at 16:48, Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. This is great to hear! Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Feb 4 08:46:04 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Feb 2016 23:46:04 +1000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: On 4 February 2016 at 16:48, Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. 
> > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. Heh, cdecimal utterly demolishing the old pure Python decimal module on the telco benchmark means normalising against CPython 3.5 rather than 2.7 really isn't very readable :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Fri Feb 5 13:07:03 2016 From: brett at python.org (Brett Cannon) Date: Fri, 05 Feb 2016 18:07:03 +0000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: On Thu, 4 Feb 2016 at 05:46 Nick Coghlan wrote: > On 4 February 2016 at 16:48, Zachary Ware > wrote: > > I'm happy to announce that speed.python.org is finally functional! > > There's not much there yet, as each benchmark builder has only sent > > one result so far (and one of those involved a bit of cheating on my > > part), but it's there. > > > > There are likely to be rough edges that still need smoothing out. > > When you find them, please report them at > > https://github.com/zware/codespeed/issues or on the speed at python.org > > mailing list. > > > > Many thanks to Intel for funding the work to get it set up and to > > Brett Cannon and Benjamin Peterson for their reviews. > > Heh, cdecimal utterly demolishing the old pure Python decimal module > on the telco benchmark means normalising against CPython 3.5 rather > than 2.7 really isn't very readable :) > I find viewing the graphs using the horizontal layout is much easier to read (the bars are a lot thicker and everything zooms in more). -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Feb 5 13:29:18 2016 From: brett at python.org (Brett Cannon) Date: Fri, 05 Feb 2016 18:29:18 +0000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: To piggyback on Zach's speed.python.org announcement, we will most likely be kicking off a discussion of redoing the benchmark suite, tweaking the test runner, etc. over on the speed@ ML. Those of us who have been doing perf work lately have found some shortcoming we would like to fix in our benchmarks suite, so if you want to participate in that discussion, please join speed@ by next week. On Wed, 3 Feb 2016 at 22:49 Zachary Ware wrote: > I'm happy to announce that speed.python.org is finally functional! > There's not much there yet, as each benchmark builder has only sent > one result so far (and one of those involved a bit of cheating on my > part), but it's there. > > There are likely to be rough edges that still need smoothing out. > When you find them, please report them at > https://github.com/zware/codespeed/issues or on the speed at python.org > mailing list. > > Many thanks to Intel for funding the work to get it set up and to > Brett Cannon and Benjamin Peterson for their reviews. > > Happy benchmarking, > -- > Zach > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Sat Feb 6 02:05:26 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 6 Feb 2016 17:05:26 +1000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: On 6 February 2016 at 04:07, Brett Cannon wrote: > On Thu, 4 Feb 2016 at 05:46 Nick Coghlan wrote: >> Heh, cdecimal utterly demolishing the old pure Python decimal module >> on the telco benchmark means normalising against CPython 3.5 rather >> than 2.7 really isn't very readable :) > > I find viewing the graphs using the horizontal layout is much easier to read > (the bars are a lot thicker and everything zooms in more). That comment was based on the horizontal layout - the telco benchmark runs ~53x faster in Python 3 than it does in Python 2 (without switching to cdecimal), so you end up with all the other benchmarks being squashed into the leftmost couple of grid cells. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg at krypto.org Sun Feb 7 02:54:27 2016 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 07 Feb 2016 07:54:27 +0000 Subject: [Speed] [Python-Dev] speed.python.org In-Reply-To: References: Message-ID: Displaying ratios linearly rather than on a log scale axis can be misleading depending on what you are looking for. (feature request: allow a log scale?) major kudos to everyone involved in getting this setup! On Fri, Feb 5, 2016 at 11:06 PM Nick Coghlan wrote: > On 6 February 2016 at 04:07, Brett Cannon wrote: > > On Thu, 4 Feb 2016 at 05:46 Nick Coghlan wrote: > >> Heh, cdecimal utterly demolishing the old pure Python decimal module > >> on the telco benchmark means normalising against CPython 3.5 rather > >> than 2.7 really isn't very readable :) > > > > I find viewing the graphs using the horizontal layout is much easier to > read > > (the bars are a lot thicker and everything zooms in more). > > That comment was based on the horizontal layout - the telco benchmark > runs ~53x faster in Python 3 than it does in Python 2 (without > switching to cdecimal), so you end up with all the other benchmarks > being squashed into the leftmost couple of grid cells. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/greg%40krypto.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Feb 11 13:31:02 2016 From: brett at python.org (Brett Cannon) Date: Thu, 11 Feb 2016 18:31:02 +0000 Subject: [Speed] Any changes we want to make to perf.py? Message-ID: Some people have brought up the idea of tweaking how perf.py drives the benchmarks. I personally wonder if we should go from a elapsed time measurement to # of executions in a set amount of time measurement to get a more stable number that's easier to measure and will make sense even as Python and computers get faster (I got this idea from Mozilla's Dromaeo benchmark suite: https://wiki.mozilla.org/Dromaeo). -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Feb 11 13:35:29 2016 From: brett at python.org (Brett Cannon) Date: Thu, 11 Feb 2016 18:35:29 +0000 Subject: [Speed] Do we want to stop vendoring source of third-party libraries with the benchmarks? 
Message-ID: Maybe we should just have a requirements.txt file for Python 2 and another for Python 3 that are pegged to specific versions? We could even install things into a venv for isolation. If we go this route then we could make the benchmark suite a package on PyPI and have people install the benchmark suite and then have instructions to run pip on the requirements files that we embed in the package. This also gets us around any potential licensing issues with embedding third-party libraries. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Feb 11 13:36:33 2016 From: brett at python.org (Brett Cannon) Date: Thu, 11 Feb 2016 18:36:33 +0000 Subject: [Speed] Should we change what benchmarks we have? Message-ID: Are we happy with the current benchmarks? Are there some we want to drop? How about add? Do we want to have explanations as to why each benchmark is included? A better balance of micro vs. macro benchmarks (and probably matching groups)? -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Feb 11 17:27:35 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 11 Feb 2016 23:27:35 +0100 Subject: [Speed] Any changes we want to make to perf.py? In-Reply-To: References: Message-ID: I don't think that using a fixed number of iterations is good to get stable benchmark results. I opened the following issue to discuss that: https://bugs.python.org/issue26275 I proposed to calibrate the number of runs and the number of loops using time. I'm not convinced myself yet that it's a good idea. For "runs" and "loops", I'm talking about something like this:

    times = []
    for run in range(runs):
        dt = time.perf_counter()
        for loop in range(loops):
            func()  # or python instructions
        times.append(time.perf_counter() - dt)

Victor 2016-02-11 19:31 GMT+01:00 Brett Cannon : > Some people have brought up the idea of tweaking how perf.py drives the > benchmarks. I personally wonder if we should go from a elapsed time > measurement to # of executions in a set amount of time measurement to get a > more stable number that's easier to measure and will make sense even as > Python and computers get faster (I got this idea from Mozilla's Dromaeo > benchmark suite: https://wiki.mozilla.org/Dromaeo). > > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed > From victor.stinner at gmail.com Thu Feb 11 17:37:34 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 11 Feb 2016 23:37:34 +0100 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: References: Message-ID: 2016-02-11 19:36 GMT+01:00 Brett Cannon : > Are we happy with the current benchmarks? bm_regex8 looks unstable, but I don't know if it's an issue of the benchmark itself or perf.py (see the other thread "[Speed] Any changes we want to make to perf.py?"). I spent a lot of time (probably too much!) last months trying to micro-optimize some parts of Python, specially operations on Python int. See for example this long issue: https://bugs.python.org/issue21955 At the end, the discussed patched only makes two benchmarks faster: nbody & spectral_norm. I'm disappointed because I don't know if it's worth to take these micro-optimizations only to run two *benchmarks* faster. Are they representative of "regular" Python code and "real-world applications"? Or are they typical maths benchmark?
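To give an idea of what those two benchmarks actually stress: their hot loops are tight pure-Python arithmetic on floats and small ints, something in the spirit of this made-up kernel (not the real benchmark code):

    def dot(u, v):
        # indexing, int arithmetic on the loop counter, float multiply/add:
        # exactly the operations the fast-path patches target
        s = 0.0
        for i in range(len(u)):
            s += u[i] * v[i]
        return s

Very little application code spends its time in loops like that.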
For maths, we all know that pure Python sucks and that maybe better options are available: PyPy, numba, Cython, etc. For example, PyPy is around 10x faster, whereas discussed micro-optimizations are 1.18x faster in the best case (in one very specific micro-benchmark). > Are there some we want to drop? > How about add? Do we want to have explanations as to why each benchmark is > included? A better balance of micro vs. macro benchmarks (and probably > matching groups)? For some kinds of optimizations, I consider that a micro-benchmark is enough. I don't have strict rules. Basically, it's when you know that the change cannot introduce slow-down in other cases, but will only benefit on one specific case. So the best is to write a tiny benchmark just for this case. Victor From victor.stinner at gmail.com Thu Feb 11 17:39:17 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 11 Feb 2016 23:39:17 +0100 Subject: [Speed] Tool to run Python microbenchmarks Message-ID: Hi, To run "micro"-benchmarks on "micro"-optimizations, I started to use timeit, but in my experience timeit it far from reliable. When I say micro: I'm talking about a test which takes less than 1000 ns, sometimes even a few nanoseconds! You always have to run the same micro-benchmark when timeit *at least* 5 times to find the "real" "minimum" runtime. That's why I wrote my own tool to run microbenchmarks: https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py Yury suggested me to add this tool to the Python benchmark project. I'm ok with that, but only if we rename it to "microbench.py" :-) I wrote this tool to compare micro-optimizations with a long list of very simple tests. The result is written into a file. Then you can compare two files and compare more files, and maybe even compare multiple files to a "reference". It "hides" difference smaller than 5% to ignore the noise. The main feature is benchmark.py is that it calibrates the benchmark using time to choose the number of runs and number of loops. I proposed a similar idea for perf.py: https://bugs.python.org/issue26275 What do you think? Would this tool be useful? Victor From victor.stinner at gmail.com Thu Feb 11 17:54:20 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 11 Feb 2016 23:54:20 +0100 Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark Message-ID: Hi, I'm sharing with you my notes (tricks) to get more reliable benchmarks on Linux if your CPU have multiple cores: https://haypo-notes.readthedocs.org/microbenchmark.html#reliable-micro-benchmarks FYI perf.py recently got a new --affinity= optional parameter. I plan to send a patch to automatically use /sys/devices/system/cpu/isolated if it's not empty. What are your "tricks" to get reliable benchmarks? Victor From kmod at dropbox.com Thu Feb 11 17:36:44 2016 From: kmod at dropbox.com (Kevin Modzelewski) Date: Thu, 11 Feb 2016 14:36:44 -0800 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: References: Message-ID: We on the Pyston team have created some new benchmarks which I can recommend using; I wouldn't call them "macrobenchmarks" since they don't test entire applications, but we've found them to be better than the existing benchmarks, which tend to be quite microbenchmarky. For example, our django-templating benchmark actually exercises the django templating system, as opposed to bm_django.py which just tests unicode concatenation. 
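To make the difference concrete, the templating benchmark drives the real engine, along these lines (a simplified sketch, not the actual benchmark; the template and sizes here are invented):

    import django
    from django.conf import settings
    settings.configure(TEMPLATES=[
        {"BACKEND": "django.template.backends.django.DjangoTemplates"}])
    django.setup()
    from django.template import Context, Template

    t = Template("<table>{% for row in rows %}"
                 "<tr><td>{{ row }}</td></tr>{% endfor %}</table>")
    html = t.render(Context({"rows": range(100)}))

so the time goes into template parsing, node rendering and variable resolution rather than plain unicode concatenation.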
You can find them here https://github.com/dropbox/pyston-perf/tree/master/benchmarking/benchmark_suite The current ones we look at are django_template3_10x, sqlalchemy_imperative2_10x, and pyxl_bench_10x. On Thu, Feb 11, 2016 at 10:36 AM, Brett Cannon wrote: > Are we happy with the current benchmarks? Are there some we want to drop? > How about add? Do we want to have explanations as to why each benchmark is > included? A better balance of micro vs. macro benchmarks (and probably > matching groups)? > > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Thu Feb 11 17:50:05 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 11 Feb 2016 17:50:05 -0500 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: References: Message-ID: <56BD101D.1090604@gmail.com> On 2016-02-11 5:37 PM, Victor Stinner wrote: > 2016-02-11 19:36 GMT+01:00 Brett Cannon : >> Are we happy with the current benchmarks? > bm_regex8 looks unstable, but I don't know if it's an issue of the > benchmark itself or perf.py (see the other thread "[Speed] Any changes > we want to make to perf.py?"). It's super unstable. As well as telco -- I don't trust those benchmarks. > > I spent a lot of time (probably too much!) last months trying to > micro-optimize some parts of Python, specially operations on Python > int. See for example this long issue: > https://bugs.python.org/issue21955 > > At the end, the discussed patched only makes two benchmarks faster: > nbody & spectral_norm. > > I'm disappointed because I don't know if it's worth to take these > micro-optimizations only to run two *benchmarks* faster. Are they > representative of "regular" Python code and "real-world applications"? > Or are they typical maths benchmark? > > For maths, we all know that pure Python sucks and that maybe better > options are available: PyPy, numba, Cython, etc. For example, PyPy is > around 10x faster, whereas discussed micro-optimizations are 1.18x > faster in the best case (in one very specific micro-benchmark). 18% is a pretty serious improvement. I consider issue 21955 as an attempt to fix a performance regression in Python 3. int+int operations in Py2 have a fast path in Python2, so they should have it in Python 3. Right now, spectral_norm is 50% faster on python 2 (when compared to 3.5). With patches from: - #26288 (fast PyLong_AsDouble, committed), - #26289 (faster floor division for longs, committed), - #24165 (free list for longs, will be committed) and - #21955 (fast path for longs in ceval, not committed) we can make 3.6 as fast as 2.7 for numeric code. Yes, spectral_norm is micro-benchmark, but still, there is a lot of python code out there that does some calculation in pure Python not involving numpy or pypy. I think it's important to fix py3 for that kind of code. That said, I'd like to find a better alternative to spectral-norm, something real, that stresses ints/floats and not using numpy. We also need a numpy benchmark, to make sure that we don't make numpy code slower by optimizing CPython. Yury From victor.stinner at gmail.com Thu Feb 11 18:00:41 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 12 Feb 2016 00:00:41 +0100 Subject: [Speed] Should we change what benchmarks we have? 
In-Reply-To: <56BD101D.1090604@gmail.com> References: <56BD101D.1090604@gmail.com> Message-ID: 2016-02-11 23:50 GMT+01:00 Yury Selivanov : > That said, I'd like to find a better alternative to spectral-norm, something > real, that stresses ints/floats and not using numpy. Case Van Horsen mentioned mpmath test suite: https://bugs.python.org/issue21955#msg259859 I extracted the slowest test and put it in a loop to the issue #21955 patches: on this patch, it's "only" around 2% faster with the patches. I understand that the test uses "large" integers (not fitting into a single PyLongObject digit). https://bugs.python.org/issue21955#msg259999 I don't know if it's a good benchmark for our "generic" benchmark :-p Victor From solipsis at pitrou.net Thu Feb 11 18:06:47 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 00:06:47 +0100 Subject: [Speed] Should we change what benchmarks we have? References: <56BD101D.1090604@gmail.com> Message-ID: <20160212000647.77745b24@fsol> On Thu, 11 Feb 2016 17:50:05 -0500 Yury Selivanov wrote: > > Right now, spectral_norm is 50% faster on python 2 (when compared to 3.5). spectral_norm is really a horrid benchmark. > Yes, spectral_norm is micro-benchmark, but still, there is a lot of > python code out there that does some calculation in pure Python not > involving numpy or pypy. Can you clarify "a lot"? Regards Antoine. From yselivanov.ml at gmail.com Thu Feb 11 18:16:23 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 11 Feb 2016 18:16:23 -0500 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: <20160212000647.77745b24@fsol> References: <56BD101D.1090604@gmail.com> <20160212000647.77745b24@fsol> Message-ID: <56BD1647.5000009@gmail.com> On 2016-02-11 6:06 PM, Antoine Pitrou wrote: > On Thu, 11 Feb 2016 17:50:05 -0500 > Yury Selivanov > wrote: >> Right now, spectral_norm is 50% faster on python 2 (when compared to 3.5). > spectral_norm is really a horrid benchmark. > >> Yes, spectral_norm is micro-benchmark, but still, there is a lot of >> python code out there that does some calculation in pure Python not >> involving numpy or pypy. > Can you clarify "a lot"? Any code that occasionally uses "int [op] int" code. That code becomes faster (especially if it's small ints). In tight loops significantly faster (that's what spectral_norm is doing). Look at the pillow package, for instance [1] -- just one of the first packages I thought of -- something non-scientific that happens to do some calculations here and there. Unless 21955 makes numpy code slower, I'm not sure why we're discussing this. Yury [1] https://github.com/python-pillow/Pillow From victor.stinner at gmail.com Thu Feb 11 18:24:19 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 12 Feb 2016 00:24:19 +0100 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: <56BD1647.5000009@gmail.com> References: <56BD101D.1090604@gmail.com> <20160212000647.77745b24@fsol> <56BD1647.5000009@gmail.com> Message-ID: 2016-02-12 0:16 GMT+01:00 Yury Selivanov : > Unless 21955 makes numpy code slower, I'm not sure why we're discussing > this. 
Stefan Krah wrote that it makes the decimal module 6% slower: https://bugs.python.org/issue21955#msg259571 Again in another message, "big slowdown for _decimal": https://bugs.python.org/issue21955#msg259793 Victor From solipsis at pitrou.net Thu Feb 11 18:26:58 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 00:26:58 +0100 Subject: [Speed] Should we change what benchmarks we have? References: <56BD101D.1090604@gmail.com> <20160212000647.77745b24@fsol> <56BD1647.5000009@gmail.com> Message-ID: <20160212002658.218a06ef@fsol> On Thu, 11 Feb 2016 18:16:23 -0500 Yury Selivanov wrote: > > > >> Yes, spectral_norm is micro-benchmark, but still, there is a lot of > >> python code out there that does some calculation in pure Python not > >> involving numpy or pypy. > > Can you clarify "a lot"? > > Any code that occasionally uses "int [op] int" code. That code becomes > faster (especially if it's small ints). In tight loops significantly > faster (that's what spectral_norm is doing). I agree for int addition, subtraction, perhaps multiplication. General math on small integers is not worth really improving, though, IMO. (and I don't think spectral_norm is representative of anything) > Look at the pillow package, for instance [1] -- just one of the first > packages I thought of -- something non-scientific that happens to do > some calculations here and there. Uh ? I would be extremely surprised if pillow processed images in pure Python. Regards Antoine. From yselivanov.ml at gmail.com Thu Feb 11 18:37:56 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 11 Feb 2016 18:37:56 -0500 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: <20160212002658.218a06ef@fsol> References: <56BD101D.1090604@gmail.com> <20160212000647.77745b24@fsol> <56BD1647.5000009@gmail.com> <20160212002658.218a06ef@fsol> Message-ID: <56BD1B54.7070709@gmail.com> On 2016-02-11 6:26 PM, Antoine Pitrou wrote: > On Thu, 11 Feb 2016 18:16:23 -0500 > Yury Selivanov > wrote: >>>> Yes, spectral_norm is micro-benchmark, but still, there is a lot of >>>> python code out there that does some calculation in pure Python not >>>> involving numpy or pypy. >>> Can you clarify "a lot"? >> Any code that occasionally uses "int [op] int" code. That code becomes >> faster (especially if it's small ints). In tight loops significantly >> faster (that's what spectral_norm is doing). > I agree for int addition, subtraction, perhaps multiplication. General > math on small integers is not worth really improving, though, IMO. Look, 21955 optimizes the following ops (fastint6.patch): 1. +, +=, -, -=, *, *= -- the ones that py2 has a fast path for 2. //, ,//=, %, %-, >>, >>=, <<, <<= -- these ones are usually used only on ints, so nothing should be affected negatively 3. /, /= -- these ones are used on floats, ints, decimals, etc If we decide to optimize group (1), I don't see why we can't apply the same macro to group (2). And then it's just group (3, true division) that we might or might not optimize. So to me, the real question is: should we optimize "long [op] long" at all? + and - are very common operations. If fastint6 manages to make numpy code (not microbenchmarks, but some real algorithms) even 3-5% slower - then let's just close 21955 as "won't fix". The problem is that we don't have any good decimal or numpy benchmark. telco is so unstable, that I take it less seriously than spectral_norm. 
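Even a boring, deterministic loop over Decimal arithmetic -- a rough sketch with made-up numbers, run enough times to settle -- would give a more stable signal on operator-overloading cost than telco does today:

    from decimal import Decimal

    def bench_decimal(n=100000):
        rate = Decimal("0.0013")
        total = Decimal(0)
        for i in range(n):
            # stresses __add__/__mul__ dispatch for a non-int type,
            # which is exactly where 21955 could add overhead
            total += Decimal(i) * rate
        return total

Same idea for numpy: a small fixed-size array workload that mostly measures dispatch overhead rather than BLAS.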
Thanks, Yury From yselivanov.ml at gmail.com Thu Feb 11 18:38:58 2016 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 11 Feb 2016 18:38:58 -0500 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: References: <56BD101D.1090604@gmail.com> <20160212000647.77745b24@fsol> <56BD1647.5000009@gmail.com> Message-ID: <56BD1B92.10801@gmail.com> On 2016-02-11 6:24 PM, Victor Stinner wrote: > 2016-02-12 0:16 GMT+01:00 Yury Selivanov : >> Unless 21955 makes numpy code slower, I'm not sure why we're discussing >> this. > Stefan Krah wrote that it makes the decimal module 6% slower: > https://bugs.python.org/issue21955#msg259571 > > Again in another message, "big slowdown for _decimal": > https://bugs.python.org/issue21955#msg259793 > > Victor Yes, we need a good benchmark for decimals or numpy. Both use operator overloading extensively. Then I guess we can talk if there is an actual slowdown. Yury From ncoghlan at gmail.com Fri Feb 12 00:03:36 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Feb 2016 15:03:36 +1000 Subject: [Speed] Do we want to stop vendoring source of third-party libraries with the benchmarks? In-Reply-To: References: Message-ID: On 12 February 2016 at 04:35, Brett Cannon wrote: > Maybe we should just have a requirements.txt file for Python 2 and another > for Python 3 that are pegged to specific versions? We could even install > things into a venv for isolation. If we go this route then we could make the > benchmark suite a package on PyPI and have people install the benchmark > suite and then have instructions to run pip on the requirements files that > we embed in the package. This also gets us around any potential licensing > issues with embedding third-party libraries. +1, especially if you use peep to update the requirements list with the sdist hashes Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Feb 12 00:17:08 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Feb 2016 15:17:08 +1000 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: <56BD101D.1090604@gmail.com> References: <56BD101D.1090604@gmail.com> Message-ID: On 12 February 2016 at 08:50, Yury Selivanov wrote: > On 2016-02-11 5:37 PM, Victor Stinner wrote: >> 2016-02-11 19:36 GMT+01:00 Brett Cannon : >>> Are we happy with the current benchmarks? >> >> bm_regex8 looks unstable, but I don't know if it's an issue of the >> benchmark itself or perf.py (see the other thread "[Speed] Any changes >> we want to make to perf.py?"). > > It's super unstable. As well as telco -- I don't trust those benchmarks. telco covers a fairly important use case in the form of "Do things that billing applications need to do". Spending a few months running and re-running that to help optimise the original Python implementation of decimal was one of my first contributions to CPython (including figuring out the "int("".join(map(str, digits)))" hack that proved to be the fastest way in CPython to convert a tuple of digits into a Python integer, much to the annoyance of the PyPy folks trying to accelerate that code later). It's probably best to consider telco as a microbenchmark of decimal module performance rather than as a general macrobenchmark, though - that's why the integration of cdecimal improved it so dramatically. Cheers, Nick. 
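P.S. For anyone who hasn't run into it, the hack in question looks like this (digits made up):

    digits = (3, 1, 4, 1, 5, 9)
    n = int("".join(map(str, digits)))   # -> 314159

i.e. a string round-trip, which in CPython beat building the value up with repeated multiply-and-add.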
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Feb 12 00:21:26 2016 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Feb 2016 15:21:26 +1000 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: References: <56BD101D.1090604@gmail.com> Message-ID: On 12 February 2016 at 15:17, Nick Coghlan wrote: > It's probably best to consider telco as a microbenchmark of decimal > module performance rather than as a general macrobenchmark, though - > that's why the integration of cdecimal improved it so dramatically. Ah, I had misread the rest of the thread - if telco in its current form isn't useful as a decimal microbenchmark, then yes, updating it to improve its stability is more important than preserving it as is. Its original use case was to optimise the decimal implementation itself by figuring out where the hotspots were and optimising those, rather than as a general benchmark for other changes to the interpreter implementation. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From alecsandru.patrascu at intel.com Fri Feb 12 02:42:36 2016 From: alecsandru.patrascu at intel.com (Patrascu, Alecsandru) Date: Fri, 12 Feb 2016 07:42:36 +0000 Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark In-Reply-To: References: Message-ID: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com> Hi, Some of the things we do here at Intel, in our Languages Performance Lab [1,2], is to disable ASLR as you get more reliable results. This can be achieved on Linux by running echo 0 > /proc/sys/kernel/randomize_va_space. Also, setting the CPU frequency at a fixed frequency, disabling Turbo Boost and Hyper Threading, also helps for benchmark stability. >From my experience, the isolcpus feature is useful when you have a lot of cores on your machine because the kernel will have other cores on which it can schedule its work; furthermore, it is a best effort situation and it is not an absolute guarantee that the kernel will not use the cores specified if you have a lot of processes running (for example, if you benchmark on a machine with 2 physical cores and you isolate one of the cores, there is a big chance that the kernel will schedule processes on this core also, even it is for a small amount of time). Nevertheless, for machines with more physical cores, it can be good to have dedicated core(s) on which we do benchmarking. [1] http://languagesperformance.intel.com/ [2] https://lists.01.org/pipermail/langperf/ Thank you, Alecsandru > -----Original Message----- > From: Speed [mailto:speed- > bounces+alecsandru.patrascu=intel.com at python.org] On Behalf Of Victor > Stinner > Sent: Friday, February 12, 2016 12:54 AM > To: speed at python.org > Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark > > Hi, > > I'm sharing with you my notes (tricks) to get more reliable benchmarks on > Linux if your CPU have multiple cores: > > https://haypo-notes.readthedocs.org/microbenchmark.html#reliable-micro- > benchmarks > > FYI perf.py recently got a new --affinity= optional parameter. I plan to > send a patch to automatically use /sys/devices/system/cpu/isolated if it's > not empty. > > What are your "tricks" to get reliable benchmarks? 
> > Victor > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed From donald at stufft.io Fri Feb 12 05:42:52 2016 From: donald at stufft.io (Donald Stufft) Date: Fri, 12 Feb 2016 05:42:52 -0500 Subject: [Speed] Do we want to stop vendoring source of third-party libraries with the benchmarks? In-Reply-To: References: Message-ID: <3B87245A-236A-41C6-86A9-A6FF0D8C7911@stufft.io> > On Feb 12, 2016, at 12:03 AM, Nick Coghlan wrote: > > On 12 February 2016 at 04:35, Brett Cannon wrote: >> Maybe we should just have a requirements.txt file for Python 2 and another >> for Python 3 that are pegged to specific versions? We could even install >> things into a venv for isolation. If we go this route then we could make the >> benchmark suite a package on PyPI and have people install the benchmark >> suite and then have instructions to run pip on the requirements files that >> we embed in the package. This also gets us around any potential licensing >> issues with embedding third-party libraries. > > +1, especially if you use peep to update the requirements list with > the sdist hashes > pip 8 has peep functionality built in. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From edd at theunixzoo.co.uk Fri Feb 12 06:18:53 2016 From: edd at theunixzoo.co.uk (Edd Barrett) Date: Fri, 12 Feb 2016 11:18:53 +0000 Subject: [Speed] Experiences with Microbenchmarking Message-ID: <20160212111853.GA4914@wilfred.dlink.com> Hi, A colleague has just pointed me to the discussions on this list regarding benchmarking methodology. Over the past few months we have been devising an "as rigorous as possible" micro-benchmarking experiment. It seems there's a lot of crossover in our work and your discussions. In short, our experiment is investigating the warmup behaviours of JITted VMs (currently PyPy, HotSpot, Graal, LuaJIT, HHVM, JRubyTruffle and V8) using microbenchmarks. For each microbenchmark/VM pairing we sequentially run a number of processes (currently 10), and within each process we run 2000 iterations of the microbenchmark. We then plot the results and make observations. The experiments were run under our own "paranoid" benchmark runner (Krun), which aims to control as many confounding variables as are practically possible. Amongst others, it checks that all benchmarks are run with the system at a similar starting temperature, disables ASLR, uses a monotonic system clock (in some cases we had to patch VMs) and it reboots the system before each benchmark. We did not isolate CPUs, since we found that this creates artificial contention on multi-threaded VMs, however, we did use (and Krun checks for) a tickless Linux kernel. We expected to see typical warmup behaviours (with distinct phases for profiling, compilation, and peak performance), but in reality we saw all kinds of crazy behaviours and even slowdowns. We've published a draft paper showing our preliminary findings here: http://arxiv.org/abs/1602.00602 The draft shows a subset of our results. 
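For those who just want the shape of the experiment without reading the paper, each benchmark/VM pairing boils down to the following (a stripped-down Python sketch with a dummy benchmark; Krun itself also handles the reboots, temperature checks, ASLR and clock details mentioned above):

    import time

    def bench():                        # stand-in for a real benchmark
        return sum(i * i for i in range(100000))

    # Krun launches 10 fresh VM processes; inside each one:
    times = []
    for i in range(2000):               # in-process iterations
        t0 = time.perf_counter()        # stand-in for the monotonic clock we require
        bench()
        times.append(time.perf_counter() - t0)

We then look at how the per-iteration times evolve within and across processes.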
Run-sequence plots for all process executions can be found here: https://archive.org/download/softdev_warmup_experiment_artefacts/v0.1/all_graphs.pdf For the final version of the paper we are trying to devise statistical methods to automatically classify the strange warmup behaviours we encountered. We will also run CPython in our final experiment, which may interest you guys :) If this interests anyone, I'd be happy to discuss further. Cheers -- Best Regards Edd Barrett http://www.theunixzoo.co.uk From solipsis at pitrou.net Fri Feb 12 07:26:06 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 13:26:06 +0100 Subject: [Speed] Do we want to stop vendoring source of third-party libraries with the benchmarks? References: Message-ID: <20160212132606.61ada70c@fsol> On Thu, 11 Feb 2016 18:35:29 +0000 Brett Cannon wrote: > Maybe we should just have a requirements.txt file for Python 2 and another > for Python 3 that are pegged to specific versions? We could even install > things into a venv for isolation. How does this impact interaction with the benchmarks suite? E.g. does it increase the time of running a couple of benchmarks? Does it make it easier or harder to benchmark a work-in-progress patch for whatever interpreter? > If we go this route then we could make > the benchmark suite a package on PyPI and have people install the benchmark > suite and then have instructions to run pip on the requirements files that > we embed in the package. I'm not fond of encouraging random users to run the benchmarks suite without understanding what they're doing, and starting throwing around pointless numbers and misconceptions about performance (which are then very hard to fight since people tend to be irrationally captivated by "performance numbers"). The benchmarks suite is mostly a tool for developers of Python implementations, not the greater public. Having the benchmarks suite only available through hg or git kind of discourages those tendencies. Regards Antoine. From solipsis at pitrou.net Fri Feb 12 07:31:07 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 13:31:07 +0100 Subject: [Speed] Should we change what benchmarks we have? References: Message-ID: <20160212133107.0eebd97b@fsol> On Thu, 11 Feb 2016 18:36:33 +0000 Brett Cannon wrote: > Are we happy with the current benchmarks? Are there some we want to drop? > How about add? Do we want to have explanations as to why each benchmark is > included? There are no real explanations except the provenance of said benchmarks: - the benchmarks suite was originally developed for Unladen Swallow - some benchmarks were taken and adapted from the "Great Computer Language Shootout" (which I think is a poor source of benchmarks) - some benchmarks have been added for specific concerns that may not be of enough interest in general (for example micro-benchmarks of methods calls, or benchmarks of json / pickle performance) > A better balance of micro vs. macro benchmarks (and probably > matching groups)? Easier said than done :-) Macro-benchmarks are harder to write, especially with the constraints that 1) runtimes should be short enough for convenient use 2) performance numbers should be stable enough accross runs. Regards Antoine. From solipsis at pitrou.net Fri Feb 12 07:57:00 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 13:57:00 +0100 Subject: [Speed] Should we change what benchmarks we have? 
References: <20160212133107.0eebd97b@fsol> Message-ID: <20160212135700.105dd73e@fsol> On Fri, 12 Feb 2016 13:31:07 +0100 Antoine Pitrou wrote: > On Thu, 11 Feb 2016 18:36:33 +0000 > Brett Cannon wrote: > > Are we happy with the current benchmarks? Are there some we want to drop? > > How about add? Do we want to have explanations as to why each benchmark is > > included? > > There are no real explanations except the provenance of said benchmarks: > - the benchmarks suite was originally developed for Unladen Swallow > - some benchmarks were taken and adapted from the "Great Computer > Language Shootout" (which I think is a poor source of benchmarks) > - some benchmarks have been added for specific concerns that may not be > of enough interest in general (for example micro-benchmarks of > methods calls, or benchmarks of json / pickle performance) That said, yes, ideally the presence or usefulness of each benchmark should be explained somewhere ("what is this trying to measure?"). Regards Antoine. From fijall at gmail.com Fri Feb 12 09:48:01 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 12 Feb 2016 15:48:01 +0100 Subject: [Speed] Should we change what benchmarks we have? In-Reply-To: <20160212133107.0eebd97b@fsol> References: <20160212133107.0eebd97b@fsol> Message-ID: I presume you looked at the pypy benchmark suite, which contains a large collection of library-based benchmarks. You can endlessly argue whether it's "macro enough", but it does cover some usages of various libraries as submitted/written with help from lib authors (sympy, twisted, various templating engines, sqlalchemy ORM, etc.) as well as interesting python programs that are CPU intensive found on the interwebs. On Fri, Feb 12, 2016 at 1:31 PM, Antoine Pitrou wrote: > On Thu, 11 Feb 2016 18:36:33 +0000 > Brett Cannon wrote: >> Are we happy with the current benchmarks? Are there some we want to drop? >> How about add? Do we want to have explanations as to why each benchmark is >> included? > > There are no real explanations except the provenance of said benchmarks: > - the benchmarks suite was originally developed for Unladen Swallow > - some benchmarks were taken and adapted from the "Great Computer > Language Shootout" (which I think is a poor source of benchmarks) > - some benchmarks have been added for specific concerns that may not be > of enough interest in general (for example micro-benchmarks of > methods calls, or benchmarks of json / pickle performance) > >> A better balance of micro vs. macro benchmarks (and probably >> matching groups)? > > Easier said than done :-) Macro-benchmarks are harder to write, > especially with the constraints that 1) runtimes should be short enough > for convenient use 2) performance numbers should be stable enough > accross runs. > > Regards > > Antoine. > > > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed From arigo at tunes.org Fri Feb 12 10:10:30 2016 From: arigo at tunes.org (Armin Rigo) Date: Fri, 12 Feb 2016 16:10:30 +0100 Subject: [Speed] Experiences with Microbenchmarking In-Reply-To: <20160212111853.GA4914@wilfred.dlink.com> References: <20160212111853.GA4914@wilfred.dlink.com> Message-ID: Hi Edd, On Fri, Feb 12, 2016 at 12:18 PM, Edd Barrett wrote: > JITted VMs (currently PyPy, HotSpot, Graal, LuaJIT, HHVM, JRubyTruffle > and V8) using microbenchmarks. 
For each microbenchmark/VM pairing we > sequentially run a number of processes (currently 10), and within each > process we run 2000 iterations of the microbenchmark. We then plot the > results and make observations. PyPy typically needs more than 2000 iterations to be warmed up. A bient?t, Armin. From fijall at gmail.com Fri Feb 12 10:42:08 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 12 Feb 2016 16:42:08 +0100 Subject: [Speed] Tool to run Python microbenchmarks In-Reply-To: References: Message-ID: Hi Victor timeit does two really terrible things - uses min(time) and disables the garbage collector, which makes it completely unreliable. On Thu, Feb 11, 2016 at 11:39 PM, Victor Stinner wrote: > Hi, > > To run "micro"-benchmarks on "micro"-optimizations, I started to use > timeit, but in my experience timeit it far from reliable. > > When I say micro: I'm talking about a test which takes less than 1000 > ns, sometimes even a few nanoseconds! > > You always have to run the same micro-benchmark when timeit *at least* > 5 times to find the "real" "minimum" runtime. > > That's why I wrote my own tool to run microbenchmarks: > https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py > > Yury suggested me to add this tool to the Python benchmark project. > I'm ok with that, but only if we rename it to "microbench.py" :-) I > wrote this tool to compare micro-optimizations with a long list of > very simple tests. The result is written into a file. Then you can > compare two files and compare more files, and maybe even compare > multiple files to a "reference". It "hides" difference smaller than 5% > to ignore the noise. > > The main feature is benchmark.py is that it calibrates the benchmark > using time to choose the number of runs and number of loops. I > proposed a similar idea for perf.py: > https://bugs.python.org/issue26275 > > What do you think? Would this tool be useful? > > Victor > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed From solipsis at pitrou.net Fri Feb 12 10:48:56 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Feb 2016 16:48:56 +0100 Subject: [Speed] Should we change what benchmarks we have? References: <20160212133107.0eebd97b@fsol> Message-ID: <20160212164856.0d493dd7@fsol> On Fri, 12 Feb 2016 15:48:01 +0100 Maciej Fijalkowski wrote: > I presume you looked at the pypy benchmark suite, which contains a > large collection of library-based benchmarks. Not in a long time, I admit... Regards Antoine. From victor.stinner at gmail.com Fri Feb 12 10:58:32 2016 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 12 Feb 2016 16:58:32 +0100 Subject: [Speed] Tool to run Python microbenchmarks In-Reply-To: References: Message-ID: Hi, 2016-02-12 16:42 GMT+01:00 Maciej Fijalkowski : > timeit does two really terrible things - uses min(time) and disables > the garbage collector, which makes it completely unreliable. Can you please elaborate why using min(times) is a bad idea? I'm also using min() in my tool, I expect that it helps to ignore the sporadic peeks when the system is unstable. 
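Concretely, with timeit I mean the difference between these two ways of summarising the same runs (just a sketch):

    import timeit

    runs = timeit.repeat("x = 1 + 1", repeat=5, number=10**7)
    print(min(runs))               # what my tool reports
    print(sum(runs) / len(runs))   # mean, which keeps the noisy runs in

My assumption is that the minimum is the run least disturbed by the rest of the system.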
Victor From fijall at gmail.com Fri Feb 12 11:02:39 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 12 Feb 2016 17:02:39 +0100 Subject: [Speed] Tool to run Python microbenchmarks In-Reply-To: References: Message-ID: On Fri, Feb 12, 2016 at 4:58 PM, Victor Stinner wrote: > Hi, > > 2016-02-12 16:42 GMT+01:00 Maciej Fijalkowski : >> timeit does two really terrible things - uses min(time) and disables >> the garbage collector, which makes it completely unreliable. > > Can you please elaborate why using min(times) is a bad idea? > > I'm also using min() in my tool, I expect that it helps to ignore the > sporadic peeks when the system is unstable. > > Victor Yes, it also helps to ignore systematic peaks that will happen randomly (due to cache alignment, memory ordering, dicts etc.). Some operations are really random that you should not ignore. E.g. if you have: l.append('a') in a loop, you gonna ignore all the places that resize loop. I'll look for a reference From paul at paulgraydon.co.uk Fri Feb 12 11:00:23 2016 From: paul at paulgraydon.co.uk (Paul) Date: Fri, 12 Feb 2016 08:00:23 -0800 Subject: [Speed] Experiences with Microbenchmarking Message-ID: On 12 Feb 2016 07:10, Armin Rigo wrote: > > Hi Edd, > > On Fri, Feb 12, 2016 at 12:18 PM, Edd Barrett wrote: > > JITted VMs (currently PyPy, HotSpot, Graal, LuaJIT, HHVM, JRubyTruffle > > and V8) using microbenchmarks. For each microbenchmark/VM pairing we > > sequentially run a number of processes (currently 10), and within each > > process we run 2000 iterations of the microbenchmark. We then plot the > > results and make observations. > > PyPy typically needs more than 2000 iterations to be warmed up. > Same goes for the JVM. Off the top of my head it doesn't even start marking a method as hot until around 10,000 iterations (at which point it'll start to do the first stage of optimisations). If you're below that threshold you're dealing with pure interpreter performance. Paul. From arigo at tunes.org Fri Feb 12 13:00:00 2016 From: arigo at tunes.org (Armin Rigo) Date: Fri, 12 Feb 2016 19:00:00 +0100 Subject: [Speed] Experiences with Microbenchmarking In-Reply-To: References: Message-ID: Hi Paul, On Fri, Feb 12, 2016 at 5:00 PM, Paul wrote: >> PyPy typically needs more than 2000 iterations to be warmed up. > > Same goes for the JVM. Off the top of my head it doesn't even start marking a method as hot until around 10,000 iterations (at which point it'll start to do the first stage of optimisations). If you're below that threshold you're dealing with pure interpreter performance. Ew, it's even longer than PyPy :-) In the PyPy case, the number 2000 is particularly bad, because the JIT starts after 1039 iterations. It also adds a few extra paths afterwards, starting maybe around ~400-500 extra iterations (as a mean value). Each time, the JIT produces more machine code and there is a relatively important pause. So 2000 is close to the worst case: even running 2000 purely-interpreted iterations would be faster. A bient?t, Armin. From fijall at gmail.com Fri Feb 12 13:38:13 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 12 Feb 2016 19:38:13 +0100 Subject: [Speed] Experiences with Microbenchmarking In-Reply-To: References: Message-ID: On Fri, Feb 12, 2016 at 7:00 PM, Armin Rigo wrote: > Hi Paul, > > On Fri, Feb 12, 2016 at 5:00 PM, Paul wrote: >>> PyPy typically needs more than 2000 iterations to be warmed up. >> >> Same goes for the JVM. 
Off the top of my head it doesn't even start marking a method as hot until around 10,000 iterations (at which point it'll start to do the first stage of optimisations). If you're below that threshold you're dealing with pure interpreter performance. > > Ew, it's even longer than PyPy :-) > > In the PyPy case, the number 2000 is particularly bad, because the JIT > starts after 1039 iterations. It also adds a few extra paths > afterwards, starting maybe around ~400-500 extra iterations (as a mean > value). Each time, the JIT produces more machine code > and there is a relatively important pause. So 2000 is close to the > worst case: even running 2000 purely-interpreted iterations would be > faster. > > > A bient?t, > > Armin. > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed Armin, those are "2000 iterations of a benchmark" and not "2000 iterations of a loop". A lot of those are pypy benchmarks, just run longer From edd at theunixzoo.co.uk Fri Feb 12 13:44:19 2016 From: edd at theunixzoo.co.uk (Edd Barrett) Date: Fri, 12 Feb 2016 18:44:19 +0000 Subject: [Speed] Experiences with Microbenchmarking In-Reply-To: References: Message-ID: <20160212183913.GA28808@wilfred.dlink.com> On Fri, Feb 12, 2016 at 08:00:23AM -0800, Paul wrote: > > PyPy typically needs more than 2000 iterations to be warmed up. > > > > Same goes for the JVM. Off the top of my head it doesn't even start marking a method as hot until around 10,000 iterations (at which point it'll start to do the first stage of optimisations). If you're below that threshold you're dealing with pure interpreter performance. To be clear, what I called an "iteration" is one in-process run of an entire benchmark. Each benchmark will invoke tons of methods and execute tons of user loops. 2000 in-process iterations should be plenty enough to warm up the VMs. Most benchmarking experiments take only around 30 post-warmup in-process iterations (enough to compute a confidence interval). The well-behaved benchmark/vm pairs in our experiment warmup in less than ten in-process iterations. Cheers -- Best Regards Edd Barrett http://www.theunixzoo.co.uk From arigo at tunes.org Fri Feb 12 14:06:07 2016 From: arigo at tunes.org (Armin Rigo) Date: Fri, 12 Feb 2016 20:06:07 +0100 Subject: [Speed] Experiences with Microbenchmarking In-Reply-To: <20160212183913.GA28808@wilfred.dlink.com> References: <20160212183913.GA28808@wilfred.dlink.com> Message-ID: Hi Edd, On Fri, Feb 12, 2016 at 7:44 PM, Edd Barrett wrote: > To be clear, what I called an "iteration" is one in-process run of an > entire benchmark. Oops, sorry. The subject of this thread is "Experiences with Microbenchmarking". I naturally assumed that a microbenchmark is doing one simple thing not in a loop, in which case "iterations" is simply repeating that simple thing. If you have in mind benchmarks that are not as micro as that, then I stand corrected. A bient?t, Armin. From brett at python.org Fri Feb 12 20:08:23 2016 From: brett at python.org (Brett Cannon) Date: Sat, 13 Feb 2016 01:08:23 +0000 Subject: [Speed] Do we want to stop vendoring source of third-party libraries with the benchmarks? 
In-Reply-To: <20160212132606.61ada70c@fsol> References: <20160212132606.61ada70c@fsol> Message-ID: On Fri, Feb 12, 2016, 04:26 Antoine Pitrou wrote: > On Thu, 11 Feb 2016 18:35:29 +0000 > Brett Cannon wrote: > > Maybe we should just have a requirements.txt file for Python 2 and > another > > for Python 3 that are pegged to specific versions? We could even install > > things into a venv for isolation. > > How does this impact interaction with the benchmarks suite? > Upon installation you would need to run `pip install -r requirements 3.txt` to get the dependencies. E.g. does it increase the time of running a couple of benchmarks? No Does > it make it easier or harder to benchmark a work-in-progress patch for > whatever interpreter? > I think only on Windows because of the lack of symlink support. > > If we go this route then we could make > > the benchmark suite a package on PyPI and have people install the > benchmark > > suite and then have instructions to run pip on the requirements files > that > > we embed in the package. > > I'm not fond of encouraging random users to run the benchmarks suite > without understanding what they're doing, and starting throwing around > pointless numbers and misconceptions about performance (which are then > very hard to fight since people tend to be irrationally captivated by > "performance numbers"). The benchmarks suite is mostly a tool for > developers of Python implementations, not the greater public. > > Having the benchmarks suite only available through hg or git kind of > discourages those tendencies. > That's fine, but then I would still want requirements files so we stop vendoring. Brett > Regards > > Antoine. > > > _______________________________________________ > Speed mailing list > Speed at python.org > https://mail.python.org/mailman/listinfo/speed > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fijall at gmail.com Sun Feb 14 07:20:14 2016 From: fijall at gmail.com (Maciej Fijalkowski) Date: Sun, 14 Feb 2016 13:20:14 +0100 Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark In-Reply-To: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com> References: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com> Message-ID: Hi. Disabling ASLR means you get more repeatable benchmarks, of course, but also means that on another identical machine (or a bit different circumstances), you can get different results, hence you moved statistical error to a more systematic one. I don't think that's a win On Fri, Feb 12, 2016 at 8:42 AM, Patrascu, Alecsandru wrote: > Hi, > > Some of the things we do here at Intel, in our Languages Performance Lab [1,2], is to disable ASLR as you get more reliable results. This can be achieved on Linux by running echo 0 > /proc/sys/kernel/randomize_va_space. Also, setting the CPU frequency at a fixed frequency, disabling Turbo Boost and Hyper Threading, also helps for benchmark stability. 
From fijall at gmail.com Sun Feb 14 07:20:14 2016
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sun, 14 Feb 2016 13:20:14 +0100
Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark
In-Reply-To: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com>
References: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com>
Message-ID:

Hi.

Disabling ASLR means you get more repeatable benchmarks, of course, but
also means that on another identical machine (or a bit different
circumstances), you can get different results, hence you moved statistical
error to a more systematic one. I don't think that's a win.

On Fri, Feb 12, 2016 at 8:42 AM, Patrascu, Alecsandru wrote:
> Hi,
>
> Some of the things we do here at Intel, in our Languages Performance Lab
> [1,2], is to disable ASLR as you get more reliable results. This can be
> achieved on Linux by running echo 0 > /proc/sys/kernel/randomize_va_space.
> Also, setting the CPU frequency at a fixed frequency, disabling Turbo
> Boost and Hyper Threading, also helps for benchmark stability.
>
> From my experience, the isolcpus feature is useful when you have a lot of
> cores on your machine because the kernel will have other cores on which
> it can schedule its work; furthermore, it is a best effort situation and
> it is not an absolute guarantee that the kernel will not use the cores
> specified if you have a lot of processes running (for example, if you
> benchmark on a machine with 2 physical cores and you isolate one of the
> cores, there is a big chance that the kernel will schedule processes on
> this core also, even it is for a small amount of time). Nevertheless, for
> machines with more physical cores, it can be good to have dedicated
> core(s) on which we do benchmarking.
>
> [1] http://languagesperformance.intel.com/
> [2] https://lists.01.org/pipermail/langperf/
>
> Thank you,
> Alecsandru
>
>> -----Original Message-----
>> From: Speed [mailto:speed-
>> bounces+alecsandru.patrascu=intel.com at python.org] On Behalf Of Victor
>> Stinner
>> Sent: Friday, February 12, 2016 12:54 AM
>> To: speed at python.org
>> Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark
>>
>> Hi,
>>
>> I'm sharing with you my notes (tricks) to get more reliable benchmarks on
>> Linux if your CPU have multiple cores:
>>
>> https://haypo-notes.readthedocs.org/microbenchmark.html#reliable-micro-benchmarks
>>
>> FYI perf.py recently got a new --affinity= optional parameter. I plan to
>> send a patch to automatically use /sys/devices/system/cpu/isolated if it's
>> not empty.
>>
>> What are your "tricks" to get reliable benchmarks?
>>
>> Victor
>> _______________________________________________
>> Speed mailing list
>> Speed at python.org
>> https://mail.python.org/mailman/listinfo/speed
> _______________________________________________
> Speed mailing list
> Speed at python.org
> https://mail.python.org/mailman/listinfo/speed

From alecsandru.patrascu at intel.com Sun Feb 14 11:37:07 2016
From: alecsandru.patrascu at intel.com (Patrascu, Alecsandru)
Date: Sun, 14 Feb 2016 16:37:07 +0000
Subject: [Speed] Linux tip: use isolcpus to have (more) reliable benchmark
In-Reply-To:
References: <3CF256F4F774BD48A1691D131AA043191424F26B@IRSMSX102.ger.corp.intel.com>
Message-ID: <3CF256F4F774BD48A1691D131AA043191424F99D@IRSMSX102.ger.corp.intel.com>

Hello,

The existence of variance across machines is true even with ASLR on. My
point was about repeated measurements on the same machine. Nevertheless,
even if small variations may appear in certain circumstances, you can
minimize them if you use identical machines, with identical software,
settings, etc. And most importantly, the deltas remain comparable. For the
dedicated Python CodeSpeed machine that does daily measurements, among
others, this can be a good setting for a bit more reliable results.

Thank you,
Alecsandru

> -----Original Message-----
> From: Maciej Fijalkowski [mailto:fijall at gmail.com]
> Sent: Sunday, February 14, 2016 2:20 PM
> To: Patrascu, Alecsandru
> Cc: Victor Stinner ; speed at python.org
> Subject: Re: [Speed] Linux tip: use isolcpus to have (more) reliable
> benchmark
>
> Hi.
>
> Disabling ASLR means you get more repeatable benchmarks, of course, but
> also means that on another identical machine (or a bit different
> circumstances), you can get different results, hence you moved
> statistical error to a more systematic one. I don't think that's a win.
>
> On Fri, Feb 12, 2016 at 8:42 AM, Patrascu, Alecsandru
> wrote:
> > Hi,
> >
> > Some of the things we do here at Intel, in our Languages Performance
> > Lab [1,2], is to disable ASLR as you get more reliable results. This
> > can be achieved on Linux by running echo 0 >
> > /proc/sys/kernel/randomize_va_space. Also, setting the CPU frequency
> > at a fixed frequency, disabling Turbo Boost and Hyper Threading, also
> > helps for benchmark stability.
> >
> > From my experience, the isolcpus feature is useful when you have a lot
> > of cores on your machine because the kernel will have other cores on
> > which it can schedule its work; furthermore, it is a best effort
> > situation and it is not an absolute guarantee that the kernel will not
> > use the cores specified if you have a lot of processes running (for
> > example, if you benchmark on a machine with 2 physical cores and you
> > isolate one of the cores, there is a big chance that the kernel will
> > schedule processes on this core also, even it is for a small amount of
> > time). Nevertheless, for machines with more physical cores, it can be
> > good to have dedicated core(s) on which we do benchmarking.
> >
> > [1] http://languagesperformance.intel.com/
> > [2] https://lists.01.org/pipermail/langperf/
> >
> > Thank you,
> > Alecsandru
> >
> >> -----Original Message-----
> >> From: Speed [mailto:speed-
> >> bounces+alecsandru.patrascu=intel.com at python.org] On Behalf Of Victor
> >> Stinner
> >> Sent: Friday, February 12, 2016 12:54 AM
> >> To: speed at python.org
> >> Subject: [Speed] Linux tip: use isolcpus to have (more) reliable
> >> benchmark
> >>
> >> Hi,
> >>
> >> I'm sharing with you my notes (tricks) to get more reliable
> >> benchmarks on Linux if your CPU have multiple cores:
> >>
> >> https://haypo-notes.readthedocs.org/microbenchmark.html#reliable-micro-benchmarks
> >>
> >> FYI perf.py recently got a new --affinity= optional parameter. I plan
> >> to send a patch to automatically use /sys/devices/system/cpu/isolated
> >> if it's not empty.
> >>
> >> What are your "tricks" to get reliable benchmarks?
> >>
> >> Victor
> >> _______________________________________________
> >> Speed mailing list
> >> Speed at python.org
> >> https://mail.python.org/mailman/listinfo/speed
> > _______________________________________________
> > Speed mailing list
> > Speed at python.org
> > https://mail.python.org/mailman/listinfo/speed
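(Illustration only, not the actual perf.py patch: the automatic use of
/sys/devices/system/cpu/isolated mentioned above could look roughly like
the sketch below. The cpulist parsing is simplified, error handling is
minimal, and the function names are made up.)

    import os

    def isolated_cpus():
        """Return the set of CPUs reserved via the isolcpus= kernel option.

        On Linux, /sys/devices/system/cpu/isolated contains a cpulist such
        as "2-3" and is empty when isolcpus is not used.
        """
        try:
            with open("/sys/devices/system/cpu/isolated") as f:
                text = f.read().strip()
        except OSError:
            return set()
        cpus = set()
        for part in text.split(","):
            if not part:
                continue
            if "-" in part:
                low, high = part.split("-")
                cpus.update(range(int(low), int(high) + 1))
            else:
                cpus.add(int(part))
        return cpus

    def pin_to_isolated_cpus():
        """Pin this process to the isolated CPUs, if any are configured."""
        cpus = isolated_cpus()
        if cpus:
            os.sched_setaffinity(0, cpus)  # Linux-only, Python 3.3+

Run before starting the timed workload, this keeps the benchmark on cores
the scheduler otherwise leaves alone; it does not by itself disable ASLR or
Turbo Boost, which still have to be configured as described above.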
From brett at python.org Sun Feb 14 11:57:33 2016
From: brett at python.org (Brett Cannon)
Date: Sun, 14 Feb 2016 16:57:33 +0000
Subject: [Speed] Should we change what benchmarks we have?
In-Reply-To: <20160212164856.0d493dd7@fsol>
References: <20160212133107.0eebd97b@fsol> <20160212164856.0d493dd7@fsol>
Message-ID:

On Fri, Feb 12, 2016, 07:49 Antoine Pitrou wrote:

> On Fri, 12 Feb 2016 15:48:01 +0100
> Maciej Fijalkowski wrote:
> > I presume you looked at the pypy benchmark suite, which contains a
> > large collection of library-based benchmarks.
>
> Not in a long time, I admit...

So it sounds like:

* we should drop regex_v8, telco, and spectral_norm
* Having an explanation as to what a benchmark is meant to exercise
  wouldn't go amiss
* Pyston and PyPy have potential benchmarks to steal (although they need
  to work with at least Python 3.5 to be considered)

Anyone want the satisfaction of deprecating those benchmarks? How about
writing a README file for what each of the benchmarks is for (which will
become the README for the future GitHub repo)? And do we want the Pyston
and PyPy folks to nominate benchmarks they think we really should add
(with a wild hope of finally having a single suite that everyone at least
starts from), or should some CPython devs look at what PyPy and Pyston
have and raid their benchmarks?

Brett

> Regards
>
> Antoine.
>
>
> _______________________________________________
> Speed mailing list
> Speed at python.org
> https://mail.python.org/mailman/listinfo/speed

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From brett at python.org Mon Feb 22 18:10:33 2016
From: brett at python.org (Brett Cannon)
Date: Mon, 22 Feb 2016 23:10:33 +0000
Subject: [Speed] Should we change what benchmarks we have?
In-Reply-To:
References: <20160212133107.0eebd97b@fsol> <20160212164856.0d493dd7@fsol>
Message-ID:

On Sun, 14 Feb 2016 at 08:57 Brett Cannon wrote:

> On Fri, Feb 12, 2016, 07:49 Antoine Pitrou wrote:
>
>> On Fri, 12 Feb 2016 15:48:01 +0100
>> Maciej Fijalkowski wrote:
>> > I presume you looked at the pypy benchmark suite, which contains a
>> > large collection of library-based benchmarks.
>>
>> Not in a long time, I admit...
>
> So it sounds like:
>
> * we should drop regex_v8, telco, and spectral_norm

Created http://bugs.python.org/issue26416 to track this.

> * Having an explanation as to what a benchmark is meant to exercise
>   wouldn't go amiss

This can wait until we migrate to GitHub.

> * Pyston and PyPy have potential benchmarks to steal (although they need
>   to work with at least Python 3.5 to be considered)

No one stepped forward for this on either the PyPy/Pyston or CPython side.

-Brett

> Anyone want the satisfaction of deprecating those benchmarks? How about
> writing a README file for what each of the benchmarks is for (which will
> become the README for the future GitHub repo)? And do we want the Pyston
> and PyPy folks to nominate benchmarks they think we really should add
> (with a wild hope of finally having a single suite that everyone at least
> starts from), or should some CPython devs look at what PyPy and Pyston
> have and raid their benchmarks?
>
> Brett
>
> > Regards
> >
> > Antoine.
> >
> >
> > _______________________________________________
> > Speed mailing list
> > Speed at python.org
> > https://mail.python.org/mailman/listinfo/speed

-------------- next part --------------
An HTML attachment was scrubbed...
URL: