[pypy-dev] speed.pypy.org launched

Fri Feb 26 13:30:25 CET 2010

Carl Friedrich Bolz, 26.02.2010 11:25:
> http://buytaert.net/files/oopsla07-georges.pdf

It's sad that the paper doesn't try to understand *why* others use
different ways to benchmark. The authors even admit at the end that their
statistical approach is only really interesting when the differences are
small enough, without mentioning at that point that the system must also be
complex enough, as the Sun JVM is. However, if the differences are small and
the benchmarked system is complex, it's best to question the benchmark in
the first place, rather than the statistics that lead to its results.

Anyway, I agree that, given the complexity of at least some of the
benchmarks in the suite, and given the requirement to do continuous
benchmarking to find both small and large differences, taking a
statistically relevant number of runs makes sense.
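
To make that concrete, here is a minimal Python sketch of the kind of check
I have in mind. It is not taken from the paper or from speed.pypy.org: the
helper names are made up for illustration, and it assumes a plain normal
approximation for the confidence interval, which is only reasonable with
enough runs (with few runs a Student t quantile would fit better).

```python
import math
import statistics


# Hypothetical helper, not part of speed.pypy.org or the cited paper.
def confidence_interval(timings, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Assumes a normal approximation; z=1.96 is the usual 95% quantile.
    """
    n = len(timings)
    if n < 2:
        raise ValueError("need at least two timings")
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings)  # sample standard deviation
    return mean, z * stdev / math.sqrt(n)


# Hypothetical helper: a conservative "is there a real difference?" check.
def intervals_overlap(timings_a, timings_b, z=1.96):
    """True if the two confidence intervals overlap.

    Overlapping intervals mean the runs do not clearly show a difference.
    """
    mean_a, half_a = confidence_interval(timings_a, z)
    mean_b, half_b = confidence_interval(timings_b, z)
    return abs(mean_a - mean_b) <= half_a + half_b
```

With something like this, a continuous benchmarking run would only report a
change between two revisions when intervals_overlap() returns False for
their timings, and would otherwise ask for more runs or treat the two as
indistinguishable.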

Stefan