From fijall at gmail.com Wed Feb 1 10:43:48 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 1 Feb 2012 11:43:48 +0200 Subject: [Speed] Buildbot Status In-Reply-To: References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

Message-ID: > > I think pickle was mostly for unladen's pickle performance patches (trying > saying that three times fast =), so I don't really care about that one. This is up for discussion whether pickle's performance matters or not (we have it disabled for example, but we might reenable it one day) > > Would it make sense to change the pypy repo to make the unladen_swallow > directory an external repo from hg.python.org/benchmarks? Because as it > stands right now there are two mako benchmarks that are not identical. > Otherwise we should talk at PyCon and figure this all out before we end up > with two divergent benchmark suites that are being independently maintained > (since we are all going to be running the same benchmarks on > speed.python.org). No, I think it's a bad idea. First benchmarks should not change. It's fine to have a py3k benchmark next to py2 one, but we have 0 checkins to US benchmarks once we imported them. Second, I find some of US benchmarks (those with hand unrolled loops, like json from the newer one) complete nonsense. If it does make any sense, it makes it only for cpython so we have no interest in having those benchmarks at all. If someone unrolls loops by hand and have a performance problem on pypy it's his own problem :) I have no idea why those benchmarks diverged. Probably because we did not work on the python's hg so people hacked there. Cheers, fijal From jnoller at gmail.com Wed Feb 1 12:24:06 2012 From: jnoller at gmail.com (Jesse Noller) Date: Wed, 1 Feb 2012 06:24:06 -0500 Subject: [Speed] Buildbot Status In-Reply-To: References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

Message-ID: <2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com> On Feb 1, 2012, at 4:43 AM, Maciej Fijalkowski wrote: >> >> I think pickle was mostly for unladen's pickle performance patches (trying >> saying that three times fast =), so I don't really care about that one. > > This is up for discussion whether pickle's performance matters or not > (we have it disabled for example, but we might reenable it one day) > >> >> Would it make sense to change the pypy repo to make the unladen_swallow >> directory an external repo from hg.python.org/benchmarks? Because as it >> stands right now there are two mako benchmarks that are not identical. >> Otherwise we should talk at PyCon and figure this all out before we end up >> with two divergent benchmark suites that are being independently maintained >> (since we are all going to be running the same benchmarks on >> speed.python.org). > > No, I think it's a bad idea. First benchmarks should not change. It's > fine to have a py3k benchmark next to py2 one, but we have 0 checkins > to US benchmarks once we imported them. > > Second, I find some of US benchmarks (those with hand unrolled loops, > like json from the newer one) complete nonsense. If it does make any > sense, it makes it only for cpython so we have no interest in having > those benchmarks at all. If someone unrolls loops by hand and have a > performance problem on pypy it's his own problem :) Great attitude for a shared, common set of benchmarks. Remind me not to provide additional resources. > > I have no idea why those benchmarks diverged. Probably because we did > not work on the python's hg so people hacked there. > > Cheers, > fijal > _______________________________________________ > Speed mailing list > Speed at python.org > http://mail.python.org/mailman/listinfo/speed From fijall at gmail.com Wed Feb 1 12:33:44 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 1 Feb 2012 13:33:44 +0200 Subject: [Speed] Buildbot Status In-Reply-To: <2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com> References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com> Message-ID: On Wed, Feb 1, 2012 at 1:24 PM, Jesse Noller wrote: > > > On Feb 1, 2012, at 4:43 AM, Maciej Fijalkowski wrote: > >>> >>> I think pickle was mostly for unladen's pickle performance patches (trying >>> saying that three times fast =), so I don't really care about that one. >> >> This is up for discussion whether pickle's performance matters or not >> (we have it disabled for example, but we might reenable it one day) >> >>> >>> Would it make sense to change the pypy repo to make the unladen_swallow >>> directory an external repo from hg.python.org/benchmarks? Because as it >>> stands right now there are two mako benchmarks that are not identical. >>> Otherwise we should talk at PyCon and figure this all out before we end up >>> with two divergent benchmark suites that are being independently maintained >>> (since we are all going to be running the same benchmarks on >>> speed.python.org). >> >> No, I think it's a bad idea. First benchmarks should not change. It's >> fine to have a py3k benchmark next to py2 one, but we have 0 checkins >> to US benchmarks once we imported them. >> >> Second, I find some of US benchmarks (those with hand unrolled loops, >> like json from the newer one) complete nonsense. If it does make any >> sense, it makes it only for cpython so we have no interest in having >> those benchmarks at all. If someone unrolls loops by hand and have a >> performance problem on pypy it's his own problem :) > > Great attitude for a shared, common set of benchmarks. Remind me not to provide additional resources. Jesse, do you really write code like that: json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(DICT) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) json.dumps(TUPLE) sorry for the long paste, but this is how the benchmark looks like. I would like to have a common set of benchmarks, but a *reasonable* one. Don't complain if I call it nonsense. Cheers, fijal From mark at hotpy.org Wed Feb 1 12:52:45 2012 From: mark at hotpy.org (Mark Shannon) Date: Wed, 01 Feb 2012 11:52:45 +0000 Subject: [Speed] Buildbot Status In-Reply-To: References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com> Message-ID: <4F29278D.6090602@hotpy.org> Maciej Fijalkowski wrote: > On Wed, Feb 1, 2012 at 1:24 PM, Jesse Noller wrote: >> >> On Feb 1, 2012, at 4:43 AM, Maciej Fijalkowski wrote: >> >>>> I think pickle was mostly for unladen's pickle performance patches (trying >>>> saying that three times fast =), so I don't really care about that one. >>> This is up for discussion whether pickle's performance matters or not >>> (we have it disabled for example, but we might reenable it one day) >>> >>>> Would it make sense to change the pypy repo to make the unladen_swallow >>>> directory an external repo from hg.python.org/benchmarks? Because as it >>>> stands right now there are two mako benchmarks that are not identical. >>>> Otherwise we should talk at PyCon and figure this all out before we end up >>>> with two divergent benchmark suites that are being independently maintained >>>> (since we are all going to be running the same benchmarks on >>>> speed.python.org). >>> No, I think it's a bad idea. First benchmarks should not change. It's >>> fine to have a py3k benchmark next to py2 one, but we have 0 checkins >>> to US benchmarks once we imported them. >>> >>> Second, I find some of US benchmarks (those with hand unrolled loops, >>> like json from the newer one) complete nonsense. If it does make any >>> sense, it makes it only for cpython so we have no interest in having >>> those benchmarks at all. If someone unrolls loops by hand and have a >>> performance problem on pypy it's his own problem :) Doesn't PyPy have a loop re-rolling optimisation? :) >> Great attitude for a shared, common set of benchmarks. Remind me not to provide additional resources. > > Jesse, do you really write code like that: > > json.dumps(DICT) [snip - Manual loop unrolling (1970s style) :( ] > json.dumps(TUPLE) > I think the PyPy benchmarks are better than the US ones, some of which may have been aimed at a few particular cases where Google wanted to speed up CPython. Try running the json or pickle benchmarks with a profiler (oprofile or similar) and you will see that they spend most of their time in a single C module, not much of a benchmark for testing the VM. It might seem that the PyPy folks were cherry picking, but I don't think that is the case. Cheers, Mark. From fijall at gmail.com Wed Feb 1 13:00:00 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 1 Feb 2012 14:00:00 +0200 Subject: [Speed] Buildbot Status In-Reply-To: <4F29278D.6090602@hotpy.org> References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com> <4F29278D.6090602@hotpy.org> Message-ID: > Try running the json or pickle benchmarks with a profiler > (oprofile or similar) > and you will see that they spend most of their time in a single > C module, not much of a benchmark for testing the VM. For what is worth I do believe that it is an aspect of VM performance (for good or bad). FYI json module in pypy is pure python anyway. > > It might seem that the PyPy folks were cherry picking, > but I don't think that is the case. If so, then mostly for "what we're slow at". > > Cheers, > Mark. > > _______________________________________________ > Speed mailing list > Speed at python.org > http://mail.python.org/mailman/listinfo/speed From mark at hotpy.org Wed Feb 1 13:08:07 2012 From: mark at hotpy.org (Mark Shannon) Date: Wed, 01 Feb 2012 12:08:07 +0000 Subject: [Speed] Buildbot Status In-Reply-To: References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com> <4F29278D.6090602@hotpy.org> Message-ID: <4F292B27.8080501@hotpy.org> Maciej Fijalkowski wrote: >> Try running the json or pickle benchmarks with a profiler >> (oprofile or similar) >> and you will see that they spend most of their time in a single >> C module, not much of a benchmark for testing the VM. > > For what is worth I do believe that it is an aspect of VM performance > (for good or bad). FYI json module in pypy is pure python anyway. We should benchmark all the modules or none of them, why should json and pickle be special? Cheers, Mark. From fijall at gmail.com Wed Feb 1 13:11:11 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Wed, 1 Feb 2012 14:11:11 +0200 Subject: [Speed] Buildbot Status In-Reply-To: <4F292B27.8080501@hotpy.org> References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com> <4F29278D.6090602@hotpy.org> <4F292B27.8080501@hotpy.org> Message-ID: On Wed, Feb 1, 2012 at 2:08 PM, Mark Shannon wrote: > Maciej Fijalkowski wrote: >>> >>> Try running the json or pickle benchmarks with a profiler >>> (oprofile or similar) >>> and you will see that they spend most of their time in a single >>> C module, not much of a benchmark for testing the VM. >> >> >> For what is worth I do believe that it is an aspect of VM performance >> (for good or bad). FYI json module in pypy is pure python anyway. > > > We should benchmark all the modules or none of them, > why should json and pickle be special? > > Cheers, > Mark. Ideally all. I suppose "because people complain" is why json. A lot of "real world" apps would do something based on say itertools or functools. Is that considered common or not? Re is already benchmarked in few of those (html5lib and spambayes I believe). From brett at python.org Wed Feb 1 18:25:06 2012 From: brett at python.org (Brett Cannon) Date: Wed, 1 Feb 2012 12:25:06 -0500 Subject: [Speed] Buildbot Status In-Reply-To: References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com> Message-ID: On Wed, Feb 1, 2012 at 06:33, Maciej Fijalkowski wrote: > On Wed, Feb 1, 2012 at 1:24 PM, Jesse Noller wrote: > > > > > > On Feb 1, 2012, at 4:43 AM, Maciej Fijalkowski wrote: > > > >>> > >>> I think pickle was mostly for unladen's pickle performance patches > (trying > >>> saying that three times fast =), so I don't really care about that one. > >> > >> This is up for discussion whether pickle's performance matters or not > >> (we have it disabled for example, but we might reenable it one day) > >> > >>> > >>> Would it make sense to change the pypy repo to make the unladen_swallow > >>> directory an external repo from hg.python.org/benchmarks? Because as > it > >>> stands right now there are two mako benchmarks that are not identical. > >>> Otherwise we should talk at PyCon and figure this all out before we > end up > >>> with two divergent benchmark suites that are being independently > maintained > >>> (since we are all going to be running the same benchmarks on > >>> speed.python.org). > >> > >> No, I think it's a bad idea. First benchmarks should not change. It's > >> fine to have a py3k benchmark next to py2 one, but we have 0 checkins > >> to US benchmarks once we imported them. > >> > >> Second, I find some of US benchmarks (those with hand unrolled loops, > >> like json from the newer one) complete nonsense. If it does make any > >> sense, it makes it only for cpython so we have no interest in having > >> those benchmarks at all. If someone unrolls loops by hand and have a > >> performance problem on pypy it's his own problem :) > > > > Great attitude for a shared, common set of benchmarks. Remind me not to > provide additional resources. > > Jesse, do you really write code like that: > > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(DICT) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > json.dumps(TUPLE) > > sorry for the long paste, but this is how the benchmark looks like. I > would like to have a common set of benchmarks, but a *reasonable* one. > Don't complain if I call it nonsense. > I think Jesse's point has nothing to do with thinking the unladen benchmarks are perfect as-is and everything to do with divergence; instead of diverging from the unladen benchmarks it would have been nicer to simply fix them instead of having PyPy containing some changes that are legitimate but never added back to the unladen benchmark repo that the work started from. We have revision history so there is no need to keep a pristine version anywhere if that was the thinking. Otherwise I'm going to assume it's because of unladen being on svn back in the day or some historical reason. So, to prevent this from either ending up in a dead-end because of this, we need to first decide where the canonical set of Python VM benchmarks are going to live. I say hg.python.org/benchmarks for two reasons. One is that Antoine has already done work there to port some of the benchmarks so there is at least some there that are ready to be run under Python 3 (and the tooling is in place to create separate Python 2 and Python 3 benchmark suites). Two, this can be a test of having the various VM contributors work out of hg.python.org if we are ever going to break the stdlib out for shared development. At worst we can simply take the changes made at pypy/benchmarks that apply to just the unladen benchmarks that exists, and at best merge the two sets (manually) into one benchmark suite so PyPy doesn't lose anything for Python 2 measurements that they have written and CPython doesn't lose any of its Python 3 benchmarks that it has created. How does that sound? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at hotpy.org Wed Feb 1 21:33:28 2012 From: mark at hotpy.org (Mark Shannon) Date: Wed, 01 Feb 2012 20:33:28 +0000 Subject: [Speed] Buildbot Status In-Reply-To: References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com>

Message-ID: <4F29A198.1050201@hotpy.org> Brett Cannon wrote: > > [snip] > > So, to prevent this from either ending up in a dead-end because of this, > we need to first decide where the canonical set of Python VM benchmarks > are going to live. I say hg.python.org/benchmarks > for two reasons. One is that Antoine > has already done work there to port some of the benchmarks so there is > at least some there that are ready to be run under Python 3 (and the > tooling is in place to create separate Python 2 and Python 3 benchmark > suites). Two, this can be a test of having the various VM contributors > work out of hg.python.org if we are ever going to > break the stdlib out for shared development. At worst we can simply take > the changes made at pypy/benchmarks that apply to just the unladen > benchmarks that exists, and at best merge the two sets (manually) into > one benchmark suite so PyPy doesn't lose anything for Python 2 > measurements that they have written and CPython doesn't lose any of its > Python 3 benchmarks that it has created. > > How does that sound? > Very sensible. Cheers, Mark. From stefan_ml at behnel.de Thu Feb 2 09:21:11 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 02 Feb 2012 09:21:11 +0100 Subject: [Speed] Cython's view on a common benchmark suite (was: Re: Buildbot Status) In-Reply-To: References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com>

Message-ID: <4F2A4777.2080705@behnel.de> Brett Cannon, 01.02.2012 18:25: > to prevent this from either ending up in a dead-end because of this, we > need to first decide where the canonical set of Python VM benchmarks are > going to live. I say hg.python.org/benchmarks for two reasons. One is that > Antoine has already done work there to port some of the benchmarks so there > is at least some there that are ready to be run under Python 3 (and the > tooling is in place to create separate Python 2 and Python 3 benchmark > suites). Two, this can be a test of having the various VM contributors work > out of hg.python.org if we are ever going to break the stdlib out for > shared development. At worst we can simply take the changes made at > pypy/benchmarks that apply to just the unladen benchmarks that exists, and > at best merge the two sets (manually) into one benchmark suite so PyPy > doesn't lose anything for Python 2 measurements that they have written and > CPython doesn't lose any of its Python 3 benchmarks that it has created. > > How does that sound? +1 FWIW, Cython currently uses both benchmark suites, that of PyPy (in Py2.7) and that of hg.python.org (in Py2.7 and 3.3), but without codespeed integration and also without a dedicated server for benchmark runs. So the results are unfortunately not accurate enough to spot minor changes even over time. https://sage.math.washington.edu:8091/hudson/view/bench/ We would like to join in on speed.python.org, once it's clear how the benchmarks will be run and how the data uploads work and all that. It already proved a bit tricky to get Cython integrated with the benchmark runner on our side, and I'm planning to rewrite that integration at some point, but it should already be doable to get "something" to work now. I should also note that we don't currently support the whole benchmark suite, so there must be a way to record individual benchmark results even in the face of failures in other benchmarks. Basically, speed.python.org would be useless for us if a failure in a single benchmark left us without any performance data at all, because it will still take us some time to get to 100% compliance and we would like to know if anything on that road has a performance impact. Currently, we apply a short patch that adds a try-except to the benchmark runner's main loop before starting the measurements, because otherwise it would just bail out completely on a single failure. Oh, and we also patch the benchmarks to remove references to __file__ because of CPython issue 13429, although we may be able to work around that at some point, specifically when doing on-the-fly compilation during imports. http://bugs.python.org/issue13429 Also note that benchmarks that only test C implemented stdlib modules (re, pickle, json) are useless for Cython because they would only end up timing the exact same code as for plain CPython. Another test that is useless for us is the "mako" benchmark, because most of what it does is to run generated code. There is currently no way for Cython to hook into that, so we're out of the game here. We also don't care about program startup tests, obviously, because we know that Cython's compiler overhead plus an optimising gcc run will render them meaningless anyway. I like the fact that there's still an old hg_startup timing result lingering around from the time before I disabled that test, telling us that Cython runs it 99.68% slower than CPython. Got to beat that. 8-) Stefan From senger at rehfisch.de Thu Feb 2 09:56:06 2012 From: senger at rehfisch.de (Carsten Senger) Date: Thu, 02 Feb 2012 09:56:06 +0100 Subject: [Speed] Cython's view on a common benchmark suite In-Reply-To: <4F2A4777.2080705@behnel.de> References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com>

<4F2A4777.2080705@behnel.de> Message-ID: <4F2A4FA6.1050700@rehfisch.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Am 02.02.2012 09:21, schrieb Stefan Behnel: > Brett Cannon, 01.02.2012 18:25: >> to prevent this from either ending up in a dead-end because of >> this, we need to first decide where the canonical set of Python >> VM benchmarks are going to live. I say hg.python.org/benchmarks >> for two reasons. One is that Antoine has already done work there >> to port some of the benchmarks so there is at least some there >> that are ready to be run under Python 3 (and the tooling is in >> place to create separate Python 2 and Python 3 benchmark >> suites). Two, this can be a test of having the various VM >> contributors work out of hg.python.org if we are ever going to >> break the stdlib out for shared development. At worst we can >> simply take the changes made at pypy/benchmarks that apply to >> just the unladen benchmarks that exists, and at best merge the >> two sets (manually) into one benchmark suite so PyPy doesn't lose >> anything for Python 2 measurements that they have written and >> CPython doesn't lose any of its Python 3 benchmarks that it has >> created. >> >> How does that sound? > +1 > > FWIW, Cython currently uses both benchmark suites, that of PyPy > (in Py2.7) and that of hg.python.org (in Py2.7 and 3.3), but > without codespeed integration and also without a dedicated server > for benchmark runs. So the results are unfortunately not accurate > enough to spot minor changes even over time. > > https://sage.math.washington.edu:8091/hudson/view/bench/ > > We would like to join in on speed.python.org, once it's clear how > the benchmarks will be run and how the data uploads work and all > that. It already proved a bit tricky to get Cython integrated with > the benchmark runner on our side, and I'm planning to rewrite that > integration at some point, but it should already be doable to get > "something" to work now. I support Brett's plan to use the pypy python2 benchmarks and the glue code for codespeed integration, add the python3 compatible benchmarks from hg.python.org so that they do not change the python2 results and host it on hg.python.org. I'd work on merging the repositories. I'd also help to write a build factory for Cython to integrate it into the buildbot. You can look at the current CPython build factory how the build and upload works currently: https://bitbucket.org/pypy/buildbot/src/20f86228d582/bot2/pypybuildbot/builds.py#cl-427 ..Carsten - -- Carsten Senger - Schumannstr. 38 - 65193 Wiesbaden senger at rehfisch.de - (0611) 5324176 PGP: gpg --recv-keys --keyserver hkp://subkeys.pgp.net 0xE374C75A -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJPKk+mAAoJEAOSv+HjdMdaYrwIAIedLSd/XmRSwJTZQCCuDgZt Et+miW95H2qnlys6JymCrdY25l7memlZ4XtpVgoswtND/oJU/Bk3+9aPGZy+djam TG9dUSYdUuPU9qaW8pjRWWoFR3+ChSzmOXmS3oSsaF0ZlH2HKnmOeJGfizzhJyHq 18B1Zb/Jnv1+giVch91f55LID/6XO8+Rtsjo0bD3ZrWnGdSO6e0G2F0krGShXkRs fg1r/FM0Dyk6+8d0Zf+4EVcINfzUnF0b1KefyO5d/bl9DfI6eEal1fiNuSJi7812 ht1kTl6VyL6sh2nenxBICyyo1cCGqflj8EbSVdLhYmyXsoGeMF+4UQHZrwjvg8U= =I0TQ -----END PGP SIGNATURE----- From fijall at gmail.com Thu Feb 2 10:11:17 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 2 Feb 2012 11:11:17 +0200 Subject: [Speed] Buildbot Status In-Reply-To: <4F29A198.1050201@hotpy.org> References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com>

<4F29A198.1050201@hotpy.org> Message-ID: On Wed, Feb 1, 2012 at 10:33 PM, Mark Shannon wrote: > Brett Cannon wrote: >> >> >> > [snip] >> >> >> So, to prevent this from either ending up in a dead-end because of this, >> we need to first decide where the canonical set of Python VM benchmarks are >> going to live. I say hg.python.org/benchmarks >> for two reasons. One is that Antoine has >> already done work there to port some of the benchmarks so there is at least >> some there that are ready to be ?run under Python 3 (and the tooling is in >> place to create separate Python 2 and Python 3 benchmark suites). Two, this >> can be a test of having the various VM contributors work out of >> hg.python.org if we are ever going to break the >> stdlib out for shared development. At worst we can simply take the changes >> made at pypy/benchmarks that apply to just the unladen benchmarks that >> exists, and at best merge the two sets (manually) into one benchmark suite >> so PyPy doesn't lose anything for Python 2 measurements that they have >> written and CPython doesn't lose any of its Python 3 benchmarks that it has >> created. >> >> How does that sound? >> > Very sensible. +1 from me as well. Note that "we'll have a common set of benchmarks at python.org" sounds way more pleasant than "use a subrepo from python.org". From fijall at gmail.com Thu Feb 2 12:09:31 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 2 Feb 2012 13:09:31 +0200 Subject: [Speed] Cython's view on a common benchmark suite (was: Re: Buildbot Status) In-Reply-To: <4F2A4777.2080705@behnel.de> References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com>

<4F2A4777.2080705@behnel.de> Message-ID: On Thu, Feb 2, 2012 at 10:21 AM, Stefan Behnel wrote: > Brett Cannon, 01.02.2012 18:25: >> to prevent this from either ending up in a dead-end because of this, we >> need to first decide where the canonical set of Python VM benchmarks are >> going to live. I say hg.python.org/benchmarks for two reasons. One is that >> Antoine has already done work there to port some of the benchmarks so there >> is at least some there that are ready to be ?run under Python 3 (and the >> tooling is in place to create separate Python 2 and Python 3 benchmark >> suites). Two, this can be a test of having the various VM contributors work >> out of hg.python.org if we are ever going to break the stdlib out for >> shared development. At worst we can simply take the changes made at >> pypy/benchmarks that apply to just the unladen benchmarks that exists, and >> at best merge the two sets (manually) into one benchmark suite so PyPy >> doesn't lose anything for Python 2 measurements that they have written and >> CPython doesn't lose any of its Python 3 benchmarks that it has created. >> >> How does that sound? > > +1 > > FWIW, Cython currently uses both benchmark suites, that of PyPy (in Py2.7) > and that of hg.python.org (in Py2.7 and 3.3), but without codespeed > integration and also without a dedicated server for benchmark runs. So the > results are unfortunately not accurate enough to spot minor changes even > over time. > > https://sage.math.washington.edu:8091/hudson/view/bench/ > > We would like to join in on speed.python.org, once it's clear how the > benchmarks will be run and how the data uploads work and all that. It > already proved a bit tricky to get Cython integrated with the benchmark > runner on our side, and I'm planning to rewrite that integration at some > point, but it should already be doable to get "something" to work now. Can you come up with a script that does "cython "? that would simplify a lot > > I should also note that we don't currently support the whole benchmark > suite, so there must be a way to record individual benchmark results even > in the face of failures in other benchmarks. Basically, speed.python.org > would be useless for us if a failure in a single benchmark left us without > any performance data at all, because it will still take us some time to get > to 100% compliance and we would like to know if anything on that road has a > performance impact. Currently, we apply a short patch that adds a > try-except to the benchmark runner's main loop before starting the > measurements, because otherwise it would just bail out completely on a > single failure. Oh, and we also patch the benchmarks to remove references > to __file__ because of CPython issue 13429, although we may be able to work > around that at some point, specifically when doing on-the-fly compilation > during imports. I think it's fine to mark certain benchmarks not to be runnable under certain platforms. For example it's not like jython will run twisted stuff. > > http://bugs.python.org/issue13429 > > Also note that benchmarks that only test C implemented stdlib modules (re, > pickle, json) are useless for Cython because they would only end up timing > the exact same code as for plain CPython. > > Another test that is useless for us is the "mako" benchmark, because most > of what it does is to run generated code. There is currently no way for > Cython to hook into that, so we're out of the game here. Well, if you want cython to be considered python I think this is a pretty crucial feature no? > > We also don't care about program startup tests, obviously, because we know > that Cython's compiler overhead plus an optimising gcc run will render them > meaningless anyway. I like the fact that there's still an old hg_startup > timing result lingering around from the time before I disabled that test, > telling us that Cython runs it 99.68% slower than CPython. Got to beat > that. 8-) That's probably okish. From fijall at gmail.com Thu Feb 2 12:12:26 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 2 Feb 2012 13:12:26 +0200 Subject: [Speed] Cython's view on a common benchmark suite (was: Re: Buildbot Status) In-Reply-To: References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com>

<4F2A4777.2080705@behnel.de> Message-ID: On Thu, Feb 2, 2012 at 1:09 PM, Maciej Fijalkowski wrote: > On Thu, Feb 2, 2012 at 10:21 AM, Stefan Behnel wrote: >> Brett Cannon, 01.02.2012 18:25: >>> to prevent this from either ending up in a dead-end because of this, we >>> need to first decide where the canonical set of Python VM benchmarks are >>> going to live. I say hg.python.org/benchmarks for two reasons. One is that >>> Antoine has already done work there to port some of the benchmarks so there >>> is at least some there that are ready to be ?run under Python 3 (and the >>> tooling is in place to create separate Python 2 and Python 3 benchmark >>> suites). Two, this can be a test of having the various VM contributors work >>> out of hg.python.org if we are ever going to break the stdlib out for >>> shared development. At worst we can simply take the changes made at >>> pypy/benchmarks that apply to just the unladen benchmarks that exists, and >>> at best merge the two sets (manually) into one benchmark suite so PyPy >>> doesn't lose anything for Python 2 measurements that they have written and >>> CPython doesn't lose any of its Python 3 benchmarks that it has created. >>> >>> How does that sound? >> >> +1 >> >> FWIW, Cython currently uses both benchmark suites, that of PyPy (in Py2.7) >> and that of hg.python.org (in Py2.7 and 3.3), but without codespeed >> integration and also without a dedicated server for benchmark runs. So the >> results are unfortunately not accurate enough to spot minor changes even >> over time. >> >> https://sage.math.washington.edu:8091/hudson/view/bench/ >> >> We would like to join in on speed.python.org, once it's clear how the >> benchmarks will be run and how the data uploads work and all that. It >> already proved a bit tricky to get Cython integrated with the benchmark >> runner on our side, and I'm planning to rewrite that integration at some >> point, but it should already be doable to get "something" to work now. > > Can you come up with a script that does "cython

"? > that would simplify a lot > >> >> I should also note that we don't currently support the whole benchmark >> suite, so there must be a way to record individual benchmark results even >> in the face of failures in other benchmarks. Basically, speed.python.org >> would be useless for us if a failure in a single benchmark left us without >> any performance data at all, because it will still take us some time to get >> to 100% compliance and we would like to know if anything on that road has a >> performance impact. Currently, we apply a short patch that adds a >> try-except to the benchmark runner's main loop before starting the >> measurements, because otherwise it would just bail out completely on a >> single failure. Oh, and we also patch the benchmarks to remove references >> to __file__ because of CPython issue 13429, although we may be able to work >> around that at some point, specifically when doing on-the-fly compilation >> during imports. > > I think it's fine to mark certain benchmarks not to be runnable under > certain platforms. For example it's not like jython will run twisted > stuff. > >> >> http://bugs.python.org/issue13429 >> >> Also note that benchmarks that only test C implemented stdlib modules (re, >> pickle, json) are useless for Cython because they would only end up timing >> the exact same code as for plain CPython. >> >> Another test that is useless for us is the "mako" benchmark, because most >> of what it does is to run generated code. There is currently no way for >> Cython to hook into that, so we're out of the game here. > > Well, if you want cython to be considered python I think this is a > pretty crucial feature no? > >> >> We also don't care about program startup tests, obviously, because we know >> that Cython's compiler overhead plus an optimising gcc run will render them >> meaningless anyway. I like the fact that there's still an old hg_startup >> timing result lingering around from the time before I disabled that test, >> telling us that Cython runs it 99.68% slower than CPython. Got to beat >> that. 8-) > > That's probably okish. Stefan, can you please not cross-post between mailing lists? Not everyone is subscribed and people reading would get a confusing half-of-the-world view. Cheers, fijal From stefan_ml at behnel.de Thu Feb 2 14:31:25 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 02 Feb 2012 14:31:25 +0100 Subject: [Speed] Cython's view on a common benchmark suite In-Reply-To: References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>

<2403B6B2-6077-4061-B85E-C32B8268B6FC@gmail.com>

<4F2A4777.2080705@behnel.de> Message-ID: <4F2A902D.6060208@behnel.de> Maciej Fijalkowski, 02.02.2012 12:12: > On Thu, Feb 2, 2012 at 1:09 PM, Maciej Fijalkowski wrote: >> On Thu, Feb 2, 2012 at 10:21 AM, Stefan Behnel wrote: >>> We would like to join in on speed.python.org, once it's clear how the >>> benchmarks will be run and how the data uploads work and all that. It >>> already proved a bit tricky to get Cython integrated with the benchmark >>> runner on our side, and I'm planning to rewrite that integration at some >>> point, but it should already be doable to get "something" to work now. >> >> Can you come up with a script that does "cython

"? >> that would simplify a lot Yes, I have something like that, but it's a whole bunch of "do this, add that, then run something". It mostly works (as you can see from the link above), but it needs some serious reworking. Basically, it compiles and starts the main program, and then enables on-the-fly compilation of modules in sitecustomize.py by registering an import hook. I'll see if I can get the script wrapped up a tiny bit so that it becomes usable for speed.python.org. Any way I could get an account on the machine? Would make it easier to test it there. >>> I should also note that we don't currently support the whole benchmark >>> suite, so there must be a way to record individual benchmark results even >>> in the face of failures in other benchmarks. Basically, speed.python.org >>> would be useless for us if a failure in a single benchmark left us without >>> any performance data at all, because it will still take us some time to get >>> to 100% compliance and we would like to know if anything on that road has a >>> performance impact. Currently, we apply a short patch that adds a >>> try-except to the benchmark runner's main loop before starting the >>> measurements, because otherwise it would just bail out completely on a >>> single failure. Oh, and we also patch the benchmarks to remove references >>> to __file__ because of CPython issue 13429, although we may be able to work >>> around that at some point, specifically when doing on-the-fly compilation >>> during imports. >> >> I think it's fine to mark certain benchmarks not to be runnable under >> certain platforms. For example it's not like jython will run twisted >> stuff. ... oh, and we'd like to know when it suddenly starts working. ;) So, I think catching and ignoring (or logging) errors is the best way to go about it. >>> Another test that is useless for us is the "mako" benchmark, because most >>> of what it does is to run generated code. There is currently no way for >>> Cython to hook into that, so we're out of the game here. >> >> Well, if you want cython to be considered python I think this is a >> pretty crucial feature no? Oh, we have that feature, it's called CPython. The thing is that Cython doesn't get to see the generated sources, so it won't compile them and instead, CPython ends up executing the code at normal interpreted speed. So there's nothing gained by running the benchmark at all. And even if we found a way to hook into this machinery, I doubt that the static compiler overhead would make this any useful. The whole purpose of generating code is that it likely will not look the same the next time you do it (well, outside of benchmarks, that is), so even a cache is unlikely to help much for real code. It's like PyPy running code in interpreting mode before it gets compiled, except that Cython will never compile this code, even if it turns out to be worth it. Personally, I rather consider it a feature that users can employ exec() from their Cython code to run code in plain CPython (for whatever reason). Stefan From fijall at gmail.com Thu Feb 2 14:35:26 2012 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 2 Feb 2012 15:35:26 +0200 Subject: [Speed] Cython's view on a common benchmark suite In-Reply-To: <4F2A902D.6060208@behnel.de> References: <4F21B5AE.2080304@rehfisch.de>

<4F281DA4.7010300@paulgraydon.co.uk>