[Python-Dev] Proposal for a common benchmark suite

Michael Foord fuzzyman at voidspace.org.uk
Fri Apr 29 12:22:23 CEST 2011


On 29/04/2011 11:04, M.-A. Lemburg wrote:
> Mark Shannon wrote:
>> Maciej Fijalkowski wrote:
>>> On Thu, Apr 28, 2011 at 11:10 PM, Stefan Behnel <stefan_ml at behnel.de>
>>> wrote:
>>>> M.-A. Lemburg, 28.04.2011 22:23:
>>>>> Stefan Behnel wrote:
>>>>>> DasIch, 28.04.2011 20:55:
>>>>>>> the CPython
>>>>>>> benchmarks have an extensive set of microbenchmarks in the pybench
>>>>>>> package
>>>>>> Try not to care too much about pybench. There is some value in it, but
>>>>>> some of its microbenchmarks are also tied to CPython's interpreter
>>>>>> behaviour. For example, the benchmarks for literals can easily be
>>>>>> considered dead code by other Python implementations so that they may
>>>>>> end up optimising the benchmarked code away completely, or at least
>>>>>> partially. That makes a comparison of the results somewhat pointless.
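
For illustration, the literal tests Stefan mentions boil down to
something like this (a simplified sketch of the idea, not the actual
pybench source):

    def literal_arithmetic(iterations=100000):
        # Every statement combines only constants, so an implementation
        # that constant-folds or eliminates dead code can drop the loop
        # body entirely, at which point the timing no longer measures
        # arithmetic at all.
        for _ in range(iterations):
            2 + 3
            2 - 3
            2 * 3
            2.0 / 3.0
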
>>>>> The point of the micro benchmarks in pybench is to be able to compare
>>>>> them one-by-one, not by looking at the sum of the tests.
>>>>>
>>>>> If one implementation optimizes away some parts, then the comparison
>>>>> will show this fact very clearly - and that's the whole point.
>>>>>
>>>>> Taking the sum of the micro benchmarks only has some meaning
>>>>> as a very rough indicator of improvement. That's why I wrote pybench:
>>>>> to get a better, more detailed picture of what's happening,
>>>>> rather than trying to find some way of measuring "average"
>>>>> use.
>>>>>
>>>>> This "average" is very different depending on where you look:
>>>>> for some applications method calls may be very important,
>>>>> for others, arithmetic operations, and yet others may have more
>>>>> need for fast attribute lookup.
>>>> I wasn't talking about "averages" or "sums", and I also wasn't trying
>>>> to put
>>>> down pybench in general. As it stands, it makes sense as a benchmark for
>>>> CPython.
>>>>
>>>> However, I'm arguing that a substantial part of it does not make
>>>> sense as a
>>>> benchmark for PyPy and others. With Cython, I couldn't get some of the
>>>> literal arithmetic benchmarks to run at all. The runner script simply
>>>> bails
>>>> out with an error when the benchmarks accidentally run faster than the
>>>> initial empty loop. I imagine that PyPy would eventually even drop
>>>> the loop
>>>> itself, thus leaving nothing to compare. Does that tell us that PyPy is
>>>> faster than Cython for arithmetic? I don't think it does.
>>>>
>>>> When I see that a benchmark shows that one implementation runs in
>>>> 100% less
>>>> time than another, I simply go *shrug* and look for a better
>>>> benchmark to
>>>> compare the two.
>>> I second what Stefan says here. This sort of benchmark might be
>>> useful for CPython, but it is not particularly useful for PyPy or
>>> for comparisons (or any other implementation which tries harder to
>>> optimize stuff away). For example, a method call in PyPy would be
>>> inlined and completely removed if the method is empty, so the test
>>> does not measure method call overhead at all. That's why we settled
>>> on medium-to-large examples, where it's more of an average of
>>> possible scenarios than just one.
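
A minimal sketch of the kind of method-call test Maciej means (the
names here are made up for illustration, they are not taken from
pybench):

    class Dummy:
        def method(self):
            # The body is empty, so a tracing JIT like PyPy's can
            # inline the call and remove it completely; the loop then
            # measures nothing but the (possibly optimised-away) loop
            # itself, not call overhead.
            pass

    def call_overhead(iterations=100000):
        obj = Dummy()
        for _ in range(iterations):
            obj.method()
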
>> If CPython were to start incorporating any specialising optimisations,
>> pybench wouldn't be much use for CPython.
>> The Unladen Swallow folks didn't like pybench as a benchmark.
> This is all true, but I think there's a general misunderstanding
> of what pybench is.

pybench proved useful for IronPython. It certainly highlighted some 
performance problems with some of the basic operations it measures.

All the best,

Michael Foord

> I wrote pybench in 1997 when I was working on optimizing the
> Python 1.5 implementation for use in a web application server.
>
> At the time, we had pystone and that was a really poor benchmark
> for determining whether certain optimizations in the Python VM
> and compiler made sense or not.
>
> pybench was improved and extended over the course of
> several years and was then added to Python 2.5 in 2006.
>
> The benchmark is written as a framework for micro benchmarks
> based on the assumption of a non-optimizing (byte code)
> compiler.
>
> As such it may or may not work with an optimizing compiler.
> The calibration part would likely have to be disabled for
> an optimizing compiler (run with -C 0) and a new set of
> benchmark tests would have to be added: ones which test
> the Python implementation at a higher level than the
> existing tests.
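
The idea behind the calibration MAL describes is roughly this (a sketch
of the principle only, not pybench's actual code; the function names
are made up):

    import time

    def timed(func):
        # Wall-clock timing of a single call (pybench itself offers a
        # choice of timers; this only illustrates the idea).
        t0 = time.time()
        func()
        return time.time() - t0

    def measure(test, calibrate, calibration_runs=10):
        # Time the real test once...
        test_time = timed(test)
        # ...then time an "empty" version of it to estimate loop and
        # call overhead, and subtract that overhead. With an optimizing
        # compiler the real test can come out faster than the empty
        # loop, which is why the calibration step would have to be
        # disabled (run with -C 0).
        overhead = sum(timed(calibrate) for _ in range(calibration_runs))
        overhead /= calibration_runs
        return test_time - overhead
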
>
> That last part is something people tend to forget: pybench
> is not a monolithic application with a predefined and
> fixed set of tests. It's a framework that can be extended
> as needed.
>
> All you have to do is add a new module with test classes
> and import it in Setup.py.
>
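
For example, a new test module might look something like this (a sketch
modelled on the existing pybench test modules; the class name is made
up, and the base-class attributes reflect my reading of pybench's Test
class, so details may differ). You would save it as its own module and
import it in Setup.py:

    from pybench import Test

    class DictLookup(Test):

        version = 2.0       # version of the test
        operations = 5      # operations per round, used for the stats
        rounds = 100000     # number of rounds to run

        def test(self):
            d = {'a': 1, 'b': 2, 'c': 3}
            for i in range(self.rounds):
                d['a']; d['b']; d['c']; d['a']; d['b']

        def calibrate(self):
            # Same setup and loop, but without the operations being
            # measured; the runner subtracts this overhead.
            d = {'a': 1, 'b': 2, 'c': 3}
            for i in range(self.rounds):
                pass
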


-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html


