[Python-3000] Function call speed (Was: Cleaning up argument listparsing)

Talin talin at acm.org
Tue Apr 18 08:08:28 CEST 2006


Paul Moore <p.f.moore <at> gmail.com> writes:

> 
> On 4/17/06, Terry Reedy <tjreedy <at> udel.edu> wrote:
> >
> > "Talin" <talin <at> acm.org> wrote in message
> > news:loom.20060417T072709-664 <at> post.gmane.org...
> > > The important thing is that the behavior be clear and unambiguous,
> > > which I think this is.
> >
> > Many would also prefer that function calls not become noticeably slower
> > than they already are.
> 
> How slow *are* function calls? There seems to be a common "function
> calls in Python are slow" meme. It's never been a significant issue
> for me, but my code is pretty much always IO-bound, so that doesn't
> surprise me. Are they slow enough to warrant serious effort to speed
> them up?
> 
> More specifically (and more on-topic) is there any room for
> speeding up function calls in Python 3000? Given that pure performance
> improvements are clearly OK for 2.x, I'm thinking of cases where it
> would be necessary to impose language-level restrictions to ease the
> optimisation process.
> 
> Paul.

Well, I spent a little time looking at the function calling code. Admittedly,
there are quite a few things that I don't understand yet; however, I did
notice a couple of things.

First, it isn't the Python language that's the problem, it's the Python C API.
For various good reasons, the Python-to-Python function calling code goes
through a number of C API functions, such as PyObject_Call and friends.
This makes it very easy to call C code, and for C code to call Python.
However, it also means that the interface between caller and callee is
fairly abstract and heavy - except in a few specialized cases, arguments
have to be pulled off the stack, converted into tuples and dicts (with the
associated memory allocation hits), and then pushed back onto the
stack so that the callee can read them. This is especially significant
considering that most calls with keyword arguments pass only one or
two of them, yet we pay the price of creating a whole
new dictionary object each time. (In fact, I would wager that if you
got rid of the dict and simply passed them in as an array of key/value
pairs, it would be a speed win - the cost of linearly searching the
names would be low compared to the cost of building the dict - for one
thing, the array would have a smaller cache-miss footprint. The few
cases where you have a lot of arguments would be rare enough that
they wouldn't dominate performance.)

It might be possible to write an algorithm that arranges the calling
arguments on the stack directly in the form that the callee needs.
However, for this to work, either the calling code would need to know
a lot more about the callee's signature, or, vice versa, the callee
would need to know more about the arrangement of the caller's
arguments on the stack.

Making radical performance improvements in this area would be a fairly
hefty challenge; my approach to the problem would be to start by
relaxing the design constraints, in particular stating from
the outset that we're going to break the C API. Then, what I'd do is
design an optimal performance pathway for internal calls, and then
create a new C API which sits on top of that. Code using the external
API would probably require a conversion step for arguments being
passed to the outside world, but I don't think such a conversion step
need be much more onerous or slow than what's being done already.

Again, I could be completely full of it here - so caveat lector :)

-- Talin
