[Speed] Are benchmarks and libraries mutable?

Sat Sep 1 20:57:16 CEST 2012

On Sat, 1 Sep 2012 13:21:36 -0400
Brett Cannon <brett at python.org> wrote:
> 
> One is moving benchmarks from PyPy over to the unladen repo on
> hg.python.org/benchmarks. But I wanted to first make sure people don't view
> the benchmarks as immutable (e.g. as Octane does:
> https://developers.google.com/octane/faq). Since the benchmarks are always
> relative between two interpreters their immutability isn't critical
> compared to if we were to report some overall score. But it also means that
> any changes made would throw off historical comparisons. For instance, if I
> take PyPy's Mako benchmark (which does a lot more work), should it be named
> mako_v2, or should we just replace mako wholesale?

mako_v2 sounds fine to me. Mutating benchmarks makes things confusing:
one person may report that interpreter A is faster than interpreter B
on a given benchmark, and another person may retort that no, interpreter B
is faster than interpreter A.

Besides, if you want to have useful timelines on speed.python.org, you
definitely need stable benchmarks.

> And the second is the same question for libraries. For instance, the
> unladen benchmarks have Django 1.1a0 as the version which is rather
> ancient. And with 1.5 coming out with provisional Python 3 support I
> obviously would like to update it. But the same questions as with
> benchmarks crop up in reference to immutability.

django_v2 sounds fine too :)

> (e.g. I will probably have to update the 2.7 code to use
> io.BytesIO instead of StringIO.StringIO to be on more equal footing).

I disagree. If io.BytesIO is faster than StringIO.StringIO then it's
normal for the benchmark results to reflect that (ditto if it's slower).
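
As a rough illustration of the point, the sketch below times the same
write workload against whichever buffer class a given interpreter
offers, so any speed difference between the two classes shows up in the
result instead of being hidden. The helper names (write_chunks,
time_buffer) and the workload sizes are assumptions made for this
example; they are not taken from the unladen benchmark suite.

```python
import io
import timeit

try:
    # Python 2 only; the StringIO module was removed in Python 3.
    from StringIO import StringIO as LegacyStringIO
except ImportError:
    LegacyStringIO = None


def write_chunks(buffer_factory, chunk=b"x" * 64, count=1000):
    """Write `count` chunks into a fresh buffer and return the total size."""
    buf = buffer_factory()
    for _ in range(count):
        buf.write(chunk)
    return len(buf.getvalue())


def time_buffer(buffer_factory, repeat=3, number=20):
    """Return the best time in seconds over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: write_chunks(buffer_factory))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    print("io.BytesIO: %.6f s" % time_buffer(io.BytesIO))
    if LegacyStringIO is not None:
        # Only reachable on Python 2, where the legacy class exists.
        print("StringIO.StringIO: %.6f s" % time_buffer(LegacyStringIO))
```

Run under 2.7, this reports both classes side by side; under 3.x only
io.BytesIO is available, so only that line is printed.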

> If we can't find a reasonable way to handle all of this then what I will do
> is branch the unladen benchmarks for 2.x/3.x benchmarking, and then create
> another branch of the benchmark suite to just be for Python 3.x so that we
> can start fresh with a new set of benchmarks that will never change
> themselves for benchmarking Python 3 itself.

Why not simply add Python 3-specific benchmarks to the mix?
You can then create a "py3" benchmark suite in perf.py (and perhaps
also a "py2" one).
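
A minimal sketch of what such named suites could look like, assuming a
plain dictionary that maps a group name to benchmark names or to other
groups. The table contents and the expand_groups helper are
hypothetical and are not copied from perf.py; the versioned names
(mako_v2, django_v2) follow the naming suggested above.

```python
# Hypothetical group table; group and benchmark names are illustrative only.
BENCH_GROUPS = {
    "py2": ["django", "mako"],
    "py3": ["django_v2", "mako_v2"],
    "all": ["py2", "py3"],
}


def expand_groups(names, groups=BENCH_GROUPS):
    """Expand group names recursively into a sorted list of benchmark names."""
    selected = set()
    for name in names:
        if name in groups:
            # A group may list benchmarks or other groups.
            selected.update(expand_groups(groups[name], groups))
        else:
            # Anything that is not a group is treated as a benchmark name.
            selected.add(name)
    return sorted(selected)
```

Because the old benchmarks keep their names and the changed ones get
new names, expand_groups(["py2"]) keeps selecting the same set over
time, which is what keeps historical comparisons meaningful.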

Regards

Antoine.

-- 
Software development and contracting: http://pro.pitrou.net