[Speed] Buildbot Status

Brett Cannon brett at python.org
Tue Jan 31 21:39:54 CET 2012


On Tue, Jan 31, 2012 at 15:04, Maciej Fijalkowski <fijall at gmail.com> wrote:

> On Tue, Jan 31, 2012 at 9:55 PM, Brett Cannon <brett at python.org> wrote:
> >
> >
> > On Tue, Jan 31, 2012 at 13:44, Maciej Fijalkowski <fijall at gmail.com>
> > wrote:
> >>
> >> On Tue, Jan 31, 2012 at 7:40 PM, Brett Cannon <brett at python.org> wrote:
> >> >
> >> >
> >> > On Tue, Jan 31, 2012 at 11:58, Paul Graydon <paul at paulgraydon.co.uk>
> >> > wrote:
> >> >>
> >> >>
> >> >>> And this is a fundamental issue with tying benchmarks to real
> >> >>> applications and libraries; if the code the benchmark relies on
> >> >>> never changes to Python 3, then the benchmark is dead in the
> >> >>> water. As Daniel pointed out, if spitfire simply never converts
> >> >>> then either we need to convert them ourselves *just* for the
> >> >>> benchmark (yuck), live w/o the benchmark (ok, but if this happens
> >> >>> to a bunch of benchmarks then we are not going to have a lot of
> >> >>> data), or we look at making new benchmarks based on apps/libraries
> >> >>> that _have_ made the switch to Python 3 (which means trying to
> >> >>> agree on some new set of benchmarks to add to the current set).
> >> >>>
> >> >>>
> >> >> What are the criteria by which the original benchmark sets were
> >> >> chosen? I'm assuming it was because they're generally popular
> >> >> libraries amongst developers across a variety of purposes, so
> >> >> speed.pypy would show the speed of regular tasks?
> >> >
> >> >
> >> > That's the reason unladen swallow chose them, yes. PyPy then adopted
> >> > them
> >> > and added in the Twisted benchmarks.
> >> >
> >> >>
> >> >> If so, presumably it shouldn't be too hard to find appropriate
> >> >> libraries
> >> >> for Python 3?
> >> >
> >> >
> >> > Perhaps, but someone has to put in the effort to find those
> >> > benchmarks, code them up, show how they are a reasonable workload,
> >> > and then get them accepted. Everyone likes the current set because
> >> > the unladen team put a lot of time and effort into selecting and
> >> > creating those benchmarks.
> >>
> >> I think we also spent a significant amount of time grabbing various
> >> benchmarks from various places (we = people who contributed to the
> >> speed.pypy.org benchmark suite; that's by no means a group consisting
> >> only of pypy devs).
> >
> >
> > Where does the PyPy benchmark code live, anyway?
>
> http://bitbucket.org/pypy/benchmarks
>
> >
> >>
> >>
> >> You might be surprised, but the criteria we used were mostly
> >> "contributed benchmarks showing some sort of real workload". I don't
> >> think we ever *rejected* a benchmark, barring one case that was very
> >> variable and not very interesting (depending on HD performance).
> >> Some benchmarks were developed from "we know pypy is slow on this"
> >> scenarios as well.
> >
> >
> > Yeah, you and Alex have told me that in person before.
> >
> >>
> >>
> >> The important part is that we also want "interesting" benchmarks to
> >> be included. This mostly means "run by someone somewhere", which
> >> includes a very broad category of things but *excludes* fibonacci,
> >> richards, pystone and stuff like this. I think it's fine if we have a
> >> benchmark that runs the Python 3 version of whatever is there, but
> >> this requires work. Is there someone willing to do that work?
> >
> >
> > Right, I'm not suggesting something as silly as fibonacci.
> >
> > I think we need to first decide which set of benchmarks we are using
> > since there is already divergence between what is on hg.python.org and
> > what is measured at speed.pypy.org (e.g. hg.python.org tests 2to3 while
> > pypy.org does not; the reverse goes for twisted). Once we know what set
> > of benchmarks we care about (it can be a cross-section), then we need
> > to take a hard look at where we are coming up short for Python 3. But
> > from a python-dev perspective, benchmarks running against Python 2 are
> > not interesting since we are simply no longer developing performance
> > improvements for Python 2.7.
>
> 2to3 is essentially an oversight on the pypy side; we'll integrate it
> back. Other than that I think the pypy benchmarks are mostly a superset
> (there is also pickle and a bunch of pointless microbenchmarks).
>

I think pickle was mostly for unladen's pickle performance patches (try
saying that three times fast =), so I don't really care about that one.
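
For reference, here is a rough sketch of the sort of pickle round-trip
loop that kind of benchmark presumably times; the payload, iteration
count, and function name below are made up for illustration and are not
taken from the actual suite:

    import pickle
    import time

    # Hypothetical payload; the real benchmark's data set is not shown here.
    DATA = [{"id": i, "name": "item-%d" % i, "values": list(range(50))}
            for i in range(1000)]

    def bench_pickle(iterations=100):
        """Time repeated dumps/loads round-trips, one entry per iteration."""
        times = []
        for _ in range(iterations):
            start = time.time()
            blob = pickle.dumps(DATA, pickle.HIGHEST_PROTOCOL)
            pickle.loads(blob)
            times.append(time.time() - start)
        return times

    if __name__ == "__main__":
        results = bench_pickle()
        print("best of %d: %.6f s" % (len(results), min(results)))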

Would it make sense to change the pypy repo to make the unladen_swallow
directory an external repo from hg.python.org/benchmarks? Because as it
stands right now there are two mako benchmarks that are not identical.
Otherwise we should talk at PyCon and figure this all out before we end up
with two divergent benchmark suites that are being independently maintained
(since we are all going to be running the same benchmarks on
speed.python.org).
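
If the external-repo idea sounds reasonable, a minimal sketch of how it
might look with Mercurial subrepos follows; the paths and workflow are
assumptions about the pypy repo layout, not a tested recipe, and the
existing vendored unladen_swallow directory would have to be cleared out
first:

    $ cd benchmarks        # a clone of pypy/benchmarks
    $ hg clone http://hg.python.org/benchmarks unladen_swallow
    $ echo "unladen_swallow = http://hg.python.org/benchmarks" > .hgsub
    $ hg add .hgsub
    $ hg commit -m "track unladen_swallow benchmarks as a subrepo"

The commit would record the pinned hg.python.org revision in .hgsubstate,
so both suites would be running literally the same mako (and other)
benchmark code instead of drifting copies.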