[Python-Dev] Proposal for a common benchmark suite

M.-A. Lemburg mal at egenix.com
Fri Apr 29 12:04:23 CEST 2011


Mark Shannon wrote:
> Maciej Fijalkowski wrote:
>> On Thu, Apr 28, 2011 at 11:10 PM, Stefan Behnel <stefan_ml at behnel.de>
>> wrote:
>>> M.-A. Lemburg, 28.04.2011 22:23:
>>>> Stefan Behnel wrote:
>>>>> DasIch, 28.04.2011 20:55:
>>>>>> the CPython
>>>>>> benchmarks have an extensive set of microbenchmarks in the pybench
>>>>>> package
>>>>> Try not to care too much about pybench. There is some value in it, but
>>>>> some of its microbenchmarks are also tied to CPython's interpreter
>>>>> behaviour. For example, the benchmarks for literals can easily be
>>>>> considered dead code by other Python implementations so that they may
>>>>> end up optimising the benchmarked code away completely, or at least
>>>>> partially. That makes a comparison of the results somewhat pointless.
>>>> The point of the micro benchmarks in pybench is to be able to compare
>>>> them one-by-one, not by looking at the sum of the tests.
>>>>
>>>> If one implementation optimizes away some parts, then the comparison
>>>> will show this fact very clearly - and that's the whole point.
>>>>
>>>> Taking the sum of the micro benchmarks only has some meaning
>>>> as a very rough indicator of improvement. That's why I wrote pybench:
>>>> to get a better, more detailed picture of what's happening,
>>>> rather than trying to find some way of measuring "average"
>>>> use.
>>>>
>>>> This "average" is very different depending on where you look:
>>>> for some applications method calls may be very important,
>>>> for others, arithmetic operations, and yet others may have more
>>>> need for fast attribute lookup.
>>> I wasn't talking about "averages" or "sums", and I also wasn't
>>> trying to put down pybench in general. As it stands, it makes
>>> sense as a benchmark for CPython.
>>>
>>> However, I'm arguing that a substantial part of it does not make
>>> sense as a benchmark for PyPy and others. With Cython, I couldn't
>>> get some of the literal arithmetic benchmarks to run at all. The
>>> runner script simply bails out with an error when the benchmarks
>>> accidentally run faster than the initial empty loop. I imagine
>>> that PyPy would eventually even drop the loop itself, thus
>>> leaving nothing to compare. Does that tell us that PyPy is faster
>>> than Cython for arithmetic? I don't think it does.
>>>
>>> When I see that a benchmark shows that one implementation runs in
>>> 100% less time than another, I simply go *shrug* and look for a
>>> better benchmark to compare the two.
>>
>> I second what Stefan says here. This sort of benchmark might be
>> useful for CPython, but it's not particularly useful for PyPy or
>> for comparisons (or for any other implementation that tries harder
>> to optimize stuff away). For example, a method call in PyPy would
>> be inlined and completely removed if the method is empty, so the
>> benchmark does not measure method call overhead at all. That's why
>> we settled on medium-to-large examples, where the result is more of
>> an average over possible scenarios than a single data point.
> 
> If CPython were to start incorporating any specialising optimisations,
> pybench wouldn't be much use for CPython.
> The Unladen Swallow folks didn't like pybench as a benchmark.

This is all true, but I think there's a general misunderstanding
of what pybench is.

I wrote pybench in 1997 when I was working on optimizing the
Python 1.5 implementation for use in a web application server.

At the time, we had pystone and that was a really poor benchmark
for determining whether certain optimizations in the Python VM
and compiler made sense or not.

pybench was then improved and extended over the course of
several years and eventually added to Python 2.5 in 2006.

The benchmark is written as a framework for micro benchmarks,
based on the assumption of a non-optimizing (byte code)
compiler.

As such, it may or may not work with an optimizing compiler.
The calibration part would likely have to be disabled for
an optimizing compiler (run with -C 0), and a new set of
benchmark tests would have to be added: one that tests
the Python implementation at a higher level than the
existing tests.
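
In concrete terms, disabling the calibration boils down to
something along these lines (pybench lives in Tools/pybench/ in the
CPython source tree; option details may differ between versions):

    python pybench.py -C 0

i.e. zero calibration runs, so the empty-loop overhead is simply
not measured and not subtracted.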

That last part is something people tend to forget: pybench
is not a monolithic application with a predefined and
fixed set of tests. It's a framework that can be extended
as needed.

All you have to do is add a new module with test classes
and import it in Setup.py.
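
To give a rough idea, such a test module would look something like
the sketch below (module and class names here are made up; the
pattern follows the modules shipped with pybench, e.g. Dict.py, and
uses the Python 2 idioms pybench itself used at the time):

    # MyTests.py - hypothetical pybench test module
    from pybench import Test

    class SimpleDictLookup(Test):

        # Version of the test; pybench only compares results
        # produced by the same test version
        version = 2.0

        # Number of abstract operations done per round; pybench
        # uses this figure to report the time per operation
        operations = 5 * 3

        # Number of rounds the test() loop runs
        rounds = 100000

        def test(self):
            d = {'a': 1, 'b': 2, 'c': 3}
            for i in xrange(self.rounds):
                # 5 blocks of 3 lookups = 15 operations per round
                d['a']; d['b']; d['c']
                d['a']; d['b']; d['c']
                d['a']; d['b']; d['c']
                d['a']; d['b']; d['c']
                d['a']; d['b']; d['c']

        def calibrate(self):
            # Same setup and loop as test(), minus the measured
            # operations; pybench uses this to determine the
            # per-round overhead to subtract
            d = {'a': 1, 'b': 2, 'c': 3}
            for i in xrange(self.rounds):
                pass

plus a one-line import in Setup.py:

    from MyTests import *

After that, the new tests show up in the normal pybench output and
can be compared one-by-one just like the bundled ones.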

-- 
Marc-Andre Lemburg
eGenix.com

