[Python-Dev] Speed up function calls

Mon Jan 24 09:11:05 CET 2005

[Neal Norwitz]
> I would like feedback on whether the approach is desirable.
> 
> The patch adds a new method type (flags) METH_ARGS that is used in
> PyMethodDef. METH_ARGS means the min and max # of arguments are
> specified in the PyMethodDef by adding 2 new fields. This information
> can be used in ceval to call the method. No tuple packing/unpacking is
> required since the C stack is used.
> 
> The benefits are:
>  * faster function calls
>  * simplify function call machinery by removing METH_NOARGS, METH_O,
> and possibly METH_VARARGS
>  * more introspection info for C functions (ie, min/max arg count)
> (not implemented)

An additional benefit would be improving the C-API by allowing C calls
without creating temporary argument tuples.  Also, some small degree of
introspection becomes possible when a method knows its own arity.
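
To make the arity idea concrete, here is a toy model in plain Python.  It
is only a sketch, not the patch and not how ceval is written: MethodDef,
call_unpacked and arity are hypothetical names standing in for a
PyMethodDef entry with two extra fields and for the dispatch code.
Python's own *args still builds a tuple, so the model shows the arity
check and the introspection, not the tuple-free call itself.

```python
# Toy model only: plain Python standing in for C structures.
from collections import namedtuple

# Hypothetical analogue of a PyMethodDef entry carrying min/max arg counts.
MethodDef = namedtuple("MethodDef", "name func min_args max_args")


def call_unpacked(mdef, *args):
    """Check the arg count against the table entry, then call directly."""
    n = len(args)
    if not (mdef.min_args <= n <= mdef.max_args):
        raise TypeError(
            "%s() takes %d to %d arguments (%d given)"
            % (mdef.name, mdef.min_args, mdef.max_args, n)
        )
    return mdef.func(*args)


def arity(mdef):
    """Introspection: the table entry alone says how many args are accepted."""
    return (mdef.min_args, mdef.max_args)


# Example entry: the builtin pow() accepts two or three arguments.
POW = MethodDef("pow", pow, 2, 3)
```

With this entry, call_unpacked(POW, 3, 5) returns 243, arity(POW) returns
(2, 3), and call_unpacked(POW, 3) raises TypeError before pow() is ever
entered.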

Replacing METH_O and METH_NOARGS seems straightforward, but
METH_VARARGS has much broader capabilities.  How would you handle the
simple case of "O|OO"?  How could you determine useful default values
(NULL, 0, -1, -909, etc.)?
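
The difficulty can be sketched with another toy model in Python.  It
assumes (and this is only an assumption, not something the patch is known
to do) that the caller pads missing optional arguments with a sentinel,
the analogue of passing NULL.  MISSING, call_padded and slice_sum are
hypothetical names used only for illustration.

```python
# Toy model only: MISSING plays the role of a C NULL pointer.
MISSING = object()


def call_padded(func, min_args, max_args, *args):
    """Check arity, then pad absent optional arguments with MISSING."""
    n = len(args)
    if not (min_args <= n <= max_args):
        raise TypeError("expected %d to %d arguments, got %d"
                        % (min_args, max_args, n))
    padded = args + (MISSING,) * (max_args - n)
    return func(*padded)


def slice_sum(seq, start, stop):
    """An "O|OO"-style function: one required and two optional arguments."""
    # The generic caller cannot know these defaults: one is a constant,
    # the other depends on the first argument.
    if start is MISSING:
        start = 0
    if stop is MISSING:
        stop = len(seq)
    return sum(seq[start:stop])
```

Here call_padded(slice_sum, 1, 3, [1, 2, 3, 4]) gives 10, while adding a
start of 1 gives 9.  The point is that the defaults live inside the
function body, so a min/max count in the method table is not enough by
itself to supply them.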

If you solve the default value problem, then please also try to come up
with a better flag name than METH_ARGS which I find to be indistinct
from METH_VARARGS and also not very descriptive of its functionality.
Perhaps something like METH_UNPACKED would be an improvement.

> The drawbacks are:
>  * the defn of the MethodDef (# args) is separate from the function
> defn
>  * potentially more error prone to write C methods???

No worse than with METH_O or METH_NOARGS.

> I've measured between 13-22% speed improvement (debug build on
> Opteron) when doing simple tests like:
> 
>      ./python ./Lib/timeit.py -v 'pow(3, 5)'
> 
> I think the difference tends to be fairly constant at about .3 usec
> per loop.

If speed is the main advantage being sought, it would be worthwhile to
conduct more extensive timing tests on a variety of code, using a regular
(non-debug) build.  Running test.test_decimal would be a useful overall
benchmark.
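
As a starting point for broader timings, something like the following
could be run under both the patched and unpatched interpreters.  It is a
minimal sketch using the standard timeit module; the statement list and
the helper names best_time and run_benchmarks are my own choices for
illustration, picked only to cover builtins with different argument
counts.

```python
import timeit

# Illustrative statements only: builtin calls with one, two and more args.
STATEMENTS = ["pow(3, 5)", "len('abc')", "abs(-7)", "divmod(17, 5)"]


def best_time(stmt, setup="pass", number=100000, repeat=5):
    """Return the best per-loop time for stmt, in microseconds."""
    timer = timeit.Timer(stmt, setup)
    # Take the minimum over several runs to reduce noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


def run_benchmarks(statements=STATEMENTS, number=100000, repeat=5):
    """Time each statement and return a dict mapping statement -> usec/loop."""
    return {stmt: best_time(stmt, number=number, repeat=repeat)
            for stmt in statements}


if __name__ == "__main__":
    for stmt, usec in run_benchmarks().items():
        print("%-16s %.3f usec per loop" % (stmt, usec))
```

Comparing the per-statement numbers from the two builds would show whether
the roughly constant per-call difference holds across calling conventions.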

In theory, I don't see how you could improve on METH_O and METH_NOARGS.
The only saving is the time for the flag test (a predictable branch).
Offsetting that saving is the additional time for checking min/max args
and for constructing a C call with the appropriate number of args.  I
suspect there are no savings here and that the timings will get worse.

In all likelihood, the only real opportunity for savings is replacing
METH_VARARGS in cases that have already been sped up using
PyArg_UnpackTuple().  Those can be further improved by eliminating the
time to build and unpack the temporary argument tuple.

Even then, I don't see how to overcome the need to set useful default
values for optional object arguments.

Raymond Hettinger