[Python-Dev] Where the speed is lost! (was: 1.6 speed)

Jeremy Hylton jeremy@cnri.reston.va.us
Thu, 27 Apr 2000 18:12:15 -0400 (EDT)


>>>>> "CT" == Christian Tismer <tismer@tismer.com> writes:

  CT> Summary: We had two effects here.  Effect 1: Wasting time with
  CT> extra errors in instance creation.  Effect 2: Loss of locality
  CT> due to code size increase.

  CT> Solution to 1 is Jeremy's patch.  Solution to 2 could be a
  CT> little renaming of the one or the other module, in order to get
  CT> the default link order to support locality better.

  CT> Now everything is clear to me. My first attempts with reordering
  CT> could not reveal the loss with the instance stuff.

  CT> All together, Python 1.6 is a bit faster than 1.5.2 if we try to
  CT> get related code ordered better.

I reach a different conclusion.  The performance difference between 1.5.2 and
1.6, measured with pystone and pybench, is so small that effects like
the order in which the compiler assembles the code make a difference.
I don't think we should make any non-trivial effort to improve
performance based on this kind of voodoo.

I also question the claim that the two effects here explain the
performance difference between 1.5.2 and 1.6.  Rather, they explain
the performance difference of pystone and pybench running on different
versions of the interpreter.  Saying that pystone is the same speed is
a far cry from saying that Python is the same speed!  Remember that
performance on a benchmark is just that.  (It's like the old joke
about a person's IQ: It is a very good indicator of how well they did
on the IQ test.)

I think we could use better benchmarks of two sorts.  The pybench
microbenchmarks are quite helpful individually, though the overall
number isn't particularly meaningful.  However, these benchmarks are
sometimes too coarse to isolate a single effect.  For example, the instance
creation effect was tracked down by running this code:

big_num = 1000000   # any suitably large iteration count

class Foo:
    pass

for i in range(big_num):
    Foo()
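
For concreteness, here is one minimal way to time that snippet.  This
is just a sketch of mine, not code from pybench, and the iteration
count is arbitrary:

import sys, time

class Foo:
    pass

def bench_create(n):
    # Time n instance creations.  The for-loop overhead is included,
    # which is fine as long as we only compare interpreters.
    t0 = time.time()
    for i in range(n):
        Foo()
    return time.time() - t0

if __name__ == '__main__':
    sys.stdout.write("%.3f seconds\n" % bench_create(1000000))

Run that under each interpreter and compare the totals; that is all
the instance-creation measurement needs to do.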

The pybench test "CreateInstance" does all sorts of other stuff.  It
tests creation with and without an __init__ method.  It tests instance
deallocation (because all the created objects need to be deallocated,
too).  It also tests attribute assignment, since many of the __init__
methods make assignments.  

What would be better (and I'm not sure what priority should be placed
on doing it) is a set of nano-benchmarks that try to limit themselves
to a single feature or small set of features.  Guido suggested having
a hierarchy so that there are multiple nano-benchmarks for instance
creation, each identifying a particular effect, and a micro-benchmark
that is the aggregate of all these nano-benchmarks.
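
To make that concrete, such a hierarchy might be sketched along these
lines.  The names and the timing loop are invented for illustration
and are not the actual pybench structure:

import sys, time

def time_calls(func, n):
    # Time n calls of func; crude, but enough for a sketch.
    t0 = time.time()
    for i in range(n):
        func()
    return time.time() - t0

class Plain:
    pass

class WithInit:
    def __init__(self):
        self.x = 1

def create_no_init():
    Plain()

def create_with_init():
    WithInit()

# Each nano-benchmark isolates one effect; the instance-creation
# micro-benchmark is just their aggregate.
NANO = [('create_no_init', create_no_init),
        ('create_with_init', create_with_init)]

def run(n):
    total = 0.0
    for name, func in NANO:
        t = time_calls(func, n)
        total = total + t
        sys.stdout.write("%-20s %.3f\n" % (name, t))
    sys.stdout.write("%-20s %.3f\n" % ('instance_creation', total))

if __name__ == '__main__':
    run(100000)

A regression in the aggregate number then points directly at whichever
nano-benchmark moved.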

We could also use some better large benchmarks.  Using pystone is
pretty crude, because it doesn't necessarily measure the performance
of things we care about.  It would be better to have a collection of
5-10 apps that each do something we care about -- munging text files
or XML data, creating lots of objects, etc.
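
As a strawman for one such app, take word counting over a large text
file.  Everything here -- the task, the names, the 1.5-era string
module spelling -- is made up for illustration:

import string, sys, time

def munge(filename):
    # Toy "application": count word frequencies in a text file.
    counts = {}
    f = open(filename)
    for line in f.readlines():
        # string.split(line) is the 1.5-era spelling of line.split()
        for word in string.split(line):
            counts[word] = counts.get(word, 0) + 1
    f.close()
    return counts

if __name__ == '__main__':
    t0 = time.time()
    munge(sys.argv[1])
    sys.stdout.write("%.3f seconds\n" % (time.time() - t0))

The point is just that the workload should be something we actually
care about, not an artificial inner loop.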

For example, I used the compiler package (in nondist/src/Compiler) to
compile itself.  Based on that benchmark, an interpreter built from
the current CVS tree is still 9-11% slower than 1.5.

Jeremy