[Python-Dev] explanations for more pybench slowdowns

Guido van Rossum guido@digicool.com
Fri, 18 May 2001 17:58:25 -0400


> The scary thing about BuiltinFunctionCalls is that the profiler shows
> it spending almost 30% of its time in PyArg_ParseTuple().  It
> certainly is a shame that we have this complicated, slow run-time
> parsing mechanism to deal with a static property of the code, namely
> how many arguments it takes and what their types are.

I would love to see a mechanism whereby the signature of a C function
could be stored as part of the static info about it, in an extension
of the PyMethodDef structure: this would serve as documentation, allow
for introspection, etc.  I'm sure Ping would love this for pydoc and
his inspect module.
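
To make the idea concrete, here is a toy Python model of storing a
builtin's signature as static metadata next to the function, so that
introspection and documentation come for free.  The table layout and
field names below are hypothetical illustrations, not the real
PyMethodDef structure:

```python
def repeat(s, n):
    """Return s repeated n times."""
    return s * n

# Hypothetical extended method-def entry: (function, signature, doc).
# In C this would be extra fields in the PyMethodDef table.
METHOD_TABLE = {
    "repeat": (repeat, "(s: str, n: int) -> str", "Return s repeated n times."),
}

def describe(name):
    """Introspection for free: build a help line from the static entry."""
    func, sig, doc = METHOD_TABLE[name]
    return f"{name}{sig} -- {doc}"
```

A tool like pydoc could then render describe("repeat") without ever
calling the function or parsing its implementation.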

But I'm not sure how much we can speed things up, unless we give up on
the tuple interface (an argc/argv API could be much faster since
usually the arguments are already on the frame's stack in this form).
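
A rough Python sketch of the cost being paid: the format string (say
"is", meaning int-then-string) is re-interpreted on every call even
though it never changes, where a signature compiled once up front
could check arguments directly.  The names here are illustrative, not
the actual CPython API:

```python
CHECKERS = {"i": int, "s": str}

def parse_tuple(args, fmt):
    """Per-call format parsing, roughly what PyArg_ParseTuple does."""
    if len(args) != len(fmt):
        raise TypeError("wrong number of arguments")
    for a, code in zip(args, fmt):
        if not isinstance(a, CHECKERS[code]):
            raise TypeError(f"expected {code!r}, got {type(a).__name__}")
    return args

def compile_sig(fmt):
    """Interpret the format once; calls then use the result directly."""
    types = tuple(CHECKERS[code] for code in fmt)
    def check(args):
        if len(args) != len(types) or not all(
                isinstance(a, t) for a, t in zip(args, types)):
            raise TypeError("bad arguments")
        return args
    return check

check_is = compile_sig("is")  # amortizes the parse across all calls
```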

> A few of the other tests, SimpleComplexArithmetic and
> CreateStringsWithConcat, are slower because of the new coercion
> logic.  I didn't spend much time on SimpleComplexArithmetic, but I did
> look at CreateStringsWithConcat in some detail.  The basic problem is
> that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls
> PyNumber_Add("ab", "cd").  This function tries all sorts of different
> ways to coerce the strings into addable numbers before giving up and
> trying sequence concat.
> 
> It looks like the new coercion rules have optimized number ops at the
> expense of string ops.  If you're writing programs with lots of
> numbers, you probably think that's peachy.  If you're parsing HTML,
> perhaps you don't :-).
> 
> I looked at the test suite to see how often it is called with
> non-number arguments.  The answer is 77% of the time, but almost all
> of those calls are from test_unicodedata.  If that one test is
> excluded, the majority of the calls (~90%) are with numbers.  But the
> majority of those calls just come from a few tests -- test_pow,
> test_long, test_mutants, test_strftime.
> 
> If I were to do something about the coercions, I would see if there
> was a way to quickly determine that PyNumber_Add() ain't gonna have
> any luck.  Then we could bail to things like string_concat more
> quickly.

There's already a special case for int+int in the BINARY_ADD opcode
(otherwise you would probably see more numbers).  Maybe another
special case for str+str would help here?
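
In Python pseudocode, the dispatch would look something like the
sketch below: cheap exact-type fast paths first, generic coercion only
when they miss.  This models the existing int+int special case plus
the suggested str+str one; it is a sketch of the idea, not the actual
ceval.c code:

```python
def slow_generic_add(a, b):
    """Stand-in for PyNumber_Add's coercion-heavy path."""
    return a + b

def binary_add(a, b):
    if type(a) is int and type(b) is int:    # existing fast path
        return a + b
    if type(a) is str and type(b) is str:    # proposed fast path
        return a + b
    return slow_generic_add(a, b)            # coercion fallback
```

The str+str test costs one extra pointer comparison on the number
path, but spares string concatenation the whole coercion dance.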

> I also looked at SmallLists.  It seems that the only significant
> change since 1.5.2 is the garbage collection.  This test spends a lot
> more time deallocating lists than it used to, and the only change I
> see in the code is the GC.  I assume, but haven't checked, that the
> story is similar for SmallTuples.
> 
> So the primary things that have slowed down since 1.5.2 seem to be:
> comparisons, coercion, and memory management for containers.  These
> also seem to be the things that have improved the most in terms of
> features, completeness, etc.  Looks like we need to revisit them and
> sort out the performance issues.
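
The container cost described above is easy to see from Python today:
the cyclic collector tracks containers like lists, so each allocation
and deallocation does extra bookkeeping that atomic objects skip.
(The snippet shows current CPython behavior; the 2.1-era details
differed slightly.)

```python
import gc

# Lists are tracked by the cyclic GC -- hence the extra work when
# SmallLists deallocates them.  Atomic objects like ints are not.
assert gc.is_tracked([1, 2, 3])
assert not gc.is_tracked(42)
```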

Thanks for doing all this work, Jeremy!

I just hope that these performance hacks won't have to be redone when
I'm done with healing the types/class split.  I'm expecting that
things can become a lot simpler if everything inherits from Object,
sequences inherit from Sequence, and so on.  But since I'm currently
going slow on this work, I won't complain too much if the existing
code gets optimized first.  The stuff you already checked in looks
good!

--Guido van Rossum (home page: http://www.python.org/~guido/)