[Python-Dev] Re: new bytecode results

Tim Peters tim.one@comcast.net
Sun, 02 Mar 2003 23:06:30 -0500


[Dan Wolfe]
> In the last year of lurking on this list, I've seen requests for a good
> Python benchmark no less than 4 times - the most recent being Damien
> Morton's attempt to prove/disprove his optimizations.
>
> Having an "approved" good benchmark/realistic test program would make
> it easy to validate optimizations, and head off the consistent 'pystone
> is not a realistic benchmark' arguments that come up each time....

pystone is a very good benchmark for one thing:  testing the "general speed"
of the interpreter.  Perhaps because it *is* so atypical, it's hard to do
something that gives pystone a significant speed boost.  Rewriting the eval
loop several years ago managed to do that, and ruthlessly cutting slop out
of the dict implementation gave it an 8% boost more recently.  I can't
recall any other single thing that helped pystone as much as those.  Jim
Fulton claims that pystone is a good predictor of Zope speed on a new box,
and now that I know more about Zope than I used to, I believe that:  while
Zope may look like Python code, there are so many meta-tricks being played
under the covers that it's plausible that the only thing that really matters
is how fast you can get around the eval loop.

Anyway, several years ago I offered to collect and organize a set of
"typical" benchmarks.  Nobody responded, so that turned out to be a lot
easier than I thought it would be <wink>.

> Besides, it will take 6 months just to agree to a basic framework,
> and another 6 months to work around all the "competitive optimization
> tricks" timbot has up his sleeve...

You can't help it.  If you know the code in advance, the implementation
*will* get warped to favor it.  The best you can hope for is that warping
won't be done at the expense of other code.  For example, if you decide to
reorder the eval loop case statements, and use pystone as your measure of
goodness, you'll end up with a different order than if you use test_descr.py
as your measure.  Is that cheating?  I suppose it depends on who's doing it
<wink>.
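
To make the point about benchmark choice concrete, here is a minimal
sketch, not a real profile of the eval loop.  It compiles two made-up
workloads and counts which bytecodes each one contains.  The names
arithmetic_workload, attribute_workload, Point, static_opcode_counts and
ranking are all hypothetical, and a static count from the dis module is
only a crude stand-in for measuring which cases the C eval loop actually
hits at run time.

```python
import dis
from collections import Counter


def arithmetic_workload(n):
    # Hypothetical "benchmark A": integer arithmetic in a tight loop.
    total = 0
    for i in range(n):
        total = total + i * i
    return total


class Point:
    # Hypothetical helper class, used only to force attribute lookups.
    def __init__(self, x, y):
        self.x = x
        self.y = y


def attribute_workload(points):
    # Hypothetical "benchmark B": dominated by attribute lookups.
    total = 0
    for p in points:
        total = total + p.x + p.y
    return total


def static_opcode_counts(func):
    # Count how often each opcode name appears in the compiled function.
    # This is a static count of the bytecode, not a run-time profile.
    return Counter(ins.opname for ins in dis.get_instructions(func))


def ranking(func):
    # Opcode names ordered from most to least frequent in the bytecode.
    return [name for name, _ in static_opcode_counts(func).most_common()]


if __name__ == "__main__":
    # Exact opcode names and counts vary between Python versions.
    print("arithmetic:", ranking(arithmetic_workload)[:5])
    print("attribute: ", ranking(attribute_workload)[:5])
```

The attribute-heavy function contains LOAD_ATTR instructions and the
arithmetic one contains none, so someone tuning the case order against
only one of these two would reach a different order than someone tuning
against the other.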