Notice: While Javascript is not essential for this website, your interaction with the content will be limited. Please turn Javascript on for the full experience.

PEP 580 -- The C call protocol

PEP:580
Title:The C call protocol
Author:Jeroen Demeyer <J.Demeyer at UGent.be>
Status:Draft
Type:Standards Track
Created:14-Jun-2018
Python-Version:3.8
Post-History:20-Jun-2018, 22-Jun-2018, 16-Jul-2018

Abstract

A new "C call" protocol is proposed. It is meant for classes representing functions or methods which need to implement fast calling. The goal is to generalize existing optimizations for built-in functions to arbitrary extension types.

In the reference implementation, this new protocol is used for the existing classes builtin_function_or_method and method_descriptor. However, in the future, more classes may implement it.

NOTE: This PEP deals only with CPython implementation details, it does not affect the Python language or standard library.

Motivation

Currently, the Python bytecode interpreter has various optimizations for calling instances of builtin_function_or_method, method_descriptor, method and function. However, none of these classes is subclassable. Therefore, these optimizations are not available to user-defined extension types.

If this PEP is implemented, then the checks for builtin_function_or_method and method_descriptor could be replaced by simply checking for and using the C call protocol. This simplifies existing code.

We also design the C call protocol such that it can easily be extended with new features in the future.

For more background and motivation, see PEP 579.

Basic idea

Currently, CPython has multiple optimizations for fast calling for a few specific function classes. Calling instances of these classes using a plain tp_call is slower than using the optimizations. The basic idea of this PEP is to allow user-defined extension types (not Python classes) to use these optimizations also, both as caller and as callee.

The existing class builtin_function_or_method and a few others use a PyMethodDef structure for describing the underlying C function and its signature. The first concrete change is that this is replaced by a new structure PyCCallDef. This stores some of the same information as a PyMethodDef, but with one important addition: the "parent" of the function (the class or module where it is defined). Note that PyMethodDef arrays are still used to construct functions/methods but no longer for calling them.

Second, we want that every class can use such a PyCCallDef for optimizing calls, so the PyTypeObject structure gains a tp_ccalloffset field giving an offset to a PyCCallDef * in the object structure and a flag Py_TPFLAGS_HAVE_CCALL indicating that tp_ccalloffset is valid.

Third, since we want to deal efficiently with unbound and bound methods too (as opposed to only plain functions), we need to handle __self__ too: after the PyCCallDef * in the object structure, there is a PyObject *self field. These two fields together are referred to as a PyCCallRoot structure.

The new protocol for efficiently calling objects using these new structures is called the "C call protocol".

New data structures

The PyTypeObject structure gains a new field Py_ssize_t tp_ccalloffset and a new flag Py_TPFLAGS_HAVE_CCALL. If this flag is set, then tp_ccalloffset is assumed to be a valid offset inside the object structure (similar to tp_weaklistoffset). It must be a strictly positive integer. At that offset, a PyCCallRoot structure appears:

typedef struct {
    PyCCallDef *cr_ccall;
    PyObject   *cr_self;     /* __self__ argument for methods */
} PyCCallRoot;

The PyCCallDef structure contains everything needed to describe how the function can be called:

typedef struct {
    uint32_t  cc_flags;
    PyCFunc   cc_func;    /* C function to call */
    PyObject *cc_parent;  /* class or module */
} PyCCallDef;

The reason for putting __self__ outside of PyCCallDef is that PyCCallDef is not meant to be changed after creating the function. A single PyCCallDef can be shared by an unbound method and multiple bound methods. This wouldn't work if we would put __self__ inside that structure.

NOTE: unlike tp_dictoffset we do not allow negative numbers for tp_ccalloffset to mean counting from the end. There does not seem to be a use case for it and it would only complicate the implementation.

Parent

The cc_parent field (accessed for example by a __parent__ or __objclass__ descriptor from Python code) can be any Python object. For methods of extension types, this is set to the class. For functions of modules, this is set to the module.

The parent serves multiple purposes: for methods of extension types, it is used for type checks like the following:

>>> list.append({}, "x")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'append' requires a 'list' object but received a 'dict'

PEP 573 specifies that every function should have access to the module in which it is defined. For functions of a module, this is given by the parent. For methods, this works indirectly through the class, assuming that the class has a pointer to the module.

The parent would also typically be used to implement __qualname__. The new C API function PyCCall_GenericGetQualname() does exactly that.

Custom classes are free to set cc_parent to whatever they want. It is only used by the C call protocol if the CCALL_OBJCLASS flag is set.

Using tp_print

We propose to replace the existing unused field tp_print by tp_ccalloffset. Since Py_TPFLAGS_HAVE_CCALL would not be added to Py_TPFLAGS_DEFAULT, this ensures full backwards compatibility for existing extension modules setting tp_print. It also means that we can require that tp_ccalloffset is a valid offset when Py_TPFLAGS_HAVE_CCALL is specified: we do not need to check tp_ccalloffset != 0. In future Python versions, we may decide that tp_print becomes tp_ccalloffset unconditionally, drop the Py_TPFLAGS_HAVE_CCALL flag and instead check for tp_ccalloffset != 0.

The C call protocol

We say that a class implements the C call protocol if it has the Py_TPFLAGS_HAVE_CCALL flag set (as explained above, it must then set tp_ccalloffset > 0). Such a class must implement __call__ as described in this section (in practice, this just means setting tp_call to PyCCall_Call).

The cc_func field is a C function pointer. Its precise signature depends on flags. Below are the possible values for cc_flags & CCALL_SIGNATURE together with the arguments that the C function takes. The return value is always PyObject *. The following are completely analogous to the existing PyMethodDef signature flags:

  • CCALL_VARARGS: cc_func(PyObject *self, PyObject *args)
  • CCALL_VARARGS | CCALL_KEYWORDS: cc_func(PyObject *self, PyObject *args, PyObject *kwds)
  • CCALL_FASTCALL: cc_func(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
  • CCALL_FASTCALL | CCALL_KEYWORDS: cc_func(PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)
  • CCALL_NULLARG: cc_func(PyObject *self, PyObject *null) (the function takes no arguments but a NULL is passed to the C function)
  • CCALL_O: cc_func(PyObject *self, PyObject *arg)

The flag CCALL_FUNCARG may be combined with any of these. If so, the C function takes an additional argument as first argument which is the function object (the self in __call__). For example, we have the following signature:

  • CCALL_FUNCARG | CCALL_VARARGS: cc_func(PyObject *func, PyObject *self, PyObject *args)

NOTE: in the case of bound methods, it is currently unspecified whether the "function object" in the paragraph above refers to the bound method or the original function (which is wrapped by the bound method). In the reference implementation, the bound method is passed. In the future, this may change to the wrapped function. Despite this ambiguity, the implementation of bound methods guarantees that PyCCall_CCALLDEF(func) points to the PyCCallDef of the original function.

NOTE: unlike the existing METH_... flags, the CCALL_... constants do not necessarily represent single bits. So checking (cc_flags & CCALL_VARARGS) == 0 is not a valid way for checking the signature. There are also no guarantees of binary compatibility between Python versions for these flags.

Checking __objclass__

If the CCALL_OBJCLASS flag is set and if cr_self is NULL (this is the case for unbound methods of extension types), then a type check is done: the function must be called with at least one positional argument and the first (typically called self) must be an instance of cc_parent (which must be a class). If not, a TypeError is raised.

Self slicing

If cr_self is not NULL or if the flag CCALL_SLICE_SELF is not set in cc_flags, then the argument passed as self is simply cr_self.

If cr_self is NULL and the flag CCALL_SLICE_SELF is set, then the first positional argument is removed from args and instead passed as first argument to the C function. Effectively, the first positional argument is treated as __self__. If there are no positional arguments, TypeError is raised.

This process is called self slicing and a function is said to have self slicing if cr_self is NULL and CCALL_SLICE_SELF is set.

Note that a METH_NULLARG function with self slicing effectively has one argument, namely self. Analogously, a METH_O function with self slicing has two arguments.

Descriptor behavior

Classes supporting the C call protocol must implement the descriptor protocol in a specific way. This is required for an efficient implementation of bound methods: it allows sharing the PyCCallDef structure between bound and unbound methods. It is also needed for a correct implementation of _PyObject_GetMethod which is used by the LOAD_METHOD/CALL_METHOD optimization. First of all, if func supports the C call protocol, then func.__set__ must not be implemented.

Second, func.__get__ must behave as follows:

  • If cr_self is not NULL, then __get__ must be a no-op in the sense that func.__get__(obj, cls)(*args, **kwds) behaves exactly the same as func(*args, **kwds). It is also allowed for __get__ to be not implemented at all.
  • If cr_self is NULL, then func.__get__(obj, cls)(*args, **kwds) (with obj not None) must be equivalent to func(obj, *args, **kwds). In particular, __get__ must be implemented in this case. Note that this is unrelated to self slicing: obj may be passed as self argument to the C function or it may be the first positional argument.
  • If cr_self is NULL, then func.__get__(None, cls)(*args, **kwds) must be equivalent to func(*args, **kwds).

There are no restrictions on the object func.__get__(obj, cls). The latter is not required to implement the C call protocol for example. It only specifies what func.__get__(obj, cls).__call__ does.

For classes that do not care about __self__ and __get__ at all, the easiest solution is to assign cr_self = Py_None (or any other non-NULL value).

__name__ attribute

The C call protocol requires that the function has a __name__ attribute which is of type str (not a subclass).

Furthermore, this must be idempotent in the sense that getting the __name__ attribute twice in a row must return exactly the same Python object. This implies that it cannot be a temporary object, it must be stored somewhere. This is required because PyEval_GetFuncName and PyEval_GetFuncDesc use borrowed references to the __name__ attribute.

Generic API functions

This section lists the new public API functions dealing with the C call protocol.

  • int PyCCall_Check(PyObject *op): return true if op implements the C call protocol.

All the functions and macros below apply to any instance supporting the C call protocol. In other words, PyCCall_Check(func) must be true.

  • PyObject * PyCCall_Call(PyObject *func, PyObject *args, PyObject *kwds): call func with positional arguments args and keyword arguments kwds (kwds may be NULL). This function is meant to be put in the tp_call slot.
  • PyObject * PyCCall_FASTCALL(PyObject *func, PyObject *const *args, Py_ssize_t nargs, PyObject *kwds): call func with nargs positional arguments given by args[0], …, args[nargs-1]. The parameter kwds can be NULL (no keyword arguments), a dict with name:value items or a tuple with keyword names. In the latter case, the keyword values are stored in the args array, starting at args[nargs].

Macros to access the PyCCallRoot and PyCCallDef structures:

  • PyCCallRoot * PyCCall_CCALLROOT(PyObject *func): pointer to the PyCCallRoot structure inside func.
  • PyCCallDef * PyCCall_CCALLDEF(PyObject *func): shorthand for PyCCall_CCALLROOT(func)->cr_ccall.
  • PyCCallDef * PyCCall_FLAGS(PyObject *func): shorthand for PyCCall_CCALLROOT(func)->cr_ccall->cc_flags.
  • PyObject * PyCCall_SELF(PyOject *func): shorthand for PyCCall_CCALLROOT(func)->cr_self.

Generic getters, meant to be put into the tp_getset array:

  • PyObject * PyCCall_GenericGetParent(PyObject *func, void *closure): return cc_parent. Raise AttributeError if cc_parent is NULL.
  • PyObject * PyCCall_GenericGetQualname(PyObject *func, void *closure): return a string suitable for using as __qualname__. This uses the __qualname__ of cc_parent if possible. It also uses the __name__ attribute.
  • PyObject * PyCCall_GenericGetSelf(PyObject *func, void *closure): return cr_self. Raise AttributeError if cr_self is NULL.

Profiling

The profiling events c_call, c_return and c_exception are only generated when calling actual instances of builtin_function_or_method or method_descriptor. This is done for simplicity and also for backwards compatibility (such that the profile function does not receive objects that it does not recognize). In a future PEP, we may extend C-level profiling to arbitrary classes implementing the C call protocol.

Changes to built-in functions and methods

The reference implementation of this PEP changes the existing classes builtin_function_or_method and method_descriptor to use the C call protocol. In fact, those two classes are almost merged: the implementation becomes very similar, but they remain separate classes (mostly for backwards compatibility). The PyCCallDef structure is simply stored as part of the object structure. Both classes use PyCFunctionObject as object structure. This is the new layout:

typedef struct {
    PyObject_HEAD
    PyCCallDef  *m_ccall;
    PyObject    *m_self;         /* Passed as 'self' arg to the C function */
    PyCCallDef   _ccalldef;      /* Storage for m_ccall */
    PyObject    *m_name;         /* __name__; str object (not NULL) */
    PyObject    *m_module;       /* __module__; can be anything */
    const char  *m_doc;          /* __text_signature__ and __doc__ */
    PyObject    *m_weakreflist;  /* List of weak references */
} PyCFunctionObject;

For functions of a module and for unbound methods of extension types, m_ccall points to the _ccalldef field. For bound methods, m_ccall points to the PyCCallDef of the unbound method.

NOTE: the new layout of method_descriptor changes it such that it no longer starts with PyDescr_COMMON. This is purely an implementation detail and it should cause few (if any) compatibility problems.

C API functions

The following function is added (also to the stable ABI [1]):

  • PyObject * PyCFunction_ClsNew(PyTypeObject *cls, PyMethodDef *ml, PyObject *self, PyObject *module, PyObject *parent): create a new object with object structure PyCFunctionObject and class cls. This is called in turn by PyCFunction_NewEx and PyDescr_NewMethod.

The undocumented functions PyCFunction_GetFlags and PyCFunction_GET_FLAGS are removed because it would be non-trivial to support them in a backwards-compatible way.

Inheritance

Extension types inherit the type flag Py_TPFLAGS_HAVE_CCALL and the value tp_ccalloffset from the base class, provided that they implement tp_call and tp_descr_get the same way as the base class. Heap types never inherit the C call protocol because that would not be safe (heap types can be changed dynamically).

Performance

This PEP should not impact the performance of existing code (in the positive or negative sense). It is meant to allow efficient new code to be written, not to make existing code faster.

Stable ABI

None of the functions, structures or constants dealing with the C call protocol are added to the stable ABI [1].

There are two reasons for this: first of all, the most useful feature of the C call protocol is probably the METH_FASTCALL calling convention. Given that this is not even part of the public API (see also PEP 579, issue 6), it would be strange to add anything else from the C call protocol to the stable ABI.

Second, we want the C call protocol to be extensible in the future. By not adding anything to the stable ABI, we are free to do that without restrictions.

Backwards compatibility

There should be no difference at all for the Python interface, and neither for the documented C API (in the sense that all functions remain supported with the same functionality).

The removed function PyCFunction_GetFlags, is officially part of the stable ABI [1]. However, this is probably an oversight: first of all, it is not even documented. Second, the flag METH_FASTCALL is not part of the stable ABI but it is very common (because of Argument Clinic). So, if one cannot support METH_FASTCALL, it is hard to imagine a use case for PyCFunction_GetFlags. The fact that PyCFunction_GET_FLAGS and PyCFunction_GetFlags are not used at all by CPython outside of Objects/call.c further shows that these functions are not particularly useful.

Concluding: the only potential breakage is with C code which accesses the internals of PyCFunctionObject and PyMethodDescrObject. We expect very few problems because of this.

Rationale

Why is this better than PEP 575?

One of the major complaints of PEP 575 was that is was coupling functionality (the calling and introspection protocol) with the class hierarchy: a class could only benefit from the new features if it was a subclass of base_function. It may be difficult for existing classes to do that because they may have other constraints on the layout of the C object structure, coming from an existing base class or implementation details. For example, functools.lru_cache cannot implement PEP 575 as-is.

It also complicated the implementation precisely because changes were needed both in the implementation details and in the class hierarchy.

The current PEP does not have these problems.

Why store the function pointer in the instance?

The actual information needed for calling an object is stored in the instance (in the PyCCallDef structure) instead of the class. This is different from the tp_call slot or earlier attempts at implementing a tp_fastcall slot [2].

The main use case is built-in functions and methods. For those, the C function to be called does depend on the instance.

Note that the current protocol makes it easy to support the case where the same C function is called for all instances: just use a single static PyCCallDef structure for every instance.

Why CCALL_OBJCLASS?

The flag CCALL_OBJCLASS is meant to support various cases where the class of a self argument must be checked, such as:

>>> list.append({}, None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: append() requires a 'list' object but received a 'dict'

>>> list.__len__({})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__len__' requires a 'list' object but received a 'dict'

>>> float.__dict__["fromhex"](list, "0xff")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'fromhex' for type 'float' doesn't apply to type 'list'

In the reference implementation, only the first of these uses the new code. The other examples show that these kind of checks appear in multiple places, so it makes sense to add generic support for them.

Why CCALL_SLICE_SELF?

The flag CCALL_SLICE_SELF and the concept of self slicing are needed to support methods: the C function should not care whether it is called as unbound method or as bound method. In both cases, there should be a self argument and this is simply the first positional argument of an unbound method call.

For example, list.append is a METH_O method. Both the calls list.append([], 42) and [].append(42) should translate to the C call list_append([], 42).

Thanks to the proposed C call protocol, we can support this in such a way that both the unbound and the bound method share a PyCCallDef structure (with the CCALL_SLICE_SELF flag set).

Concluding, CCALL_SLICE_SELF has two advantages: there is no extra layer of indirection for calling and constructing bound methods does not require setting up a PyCCallDef structure.

Replacing tp_print

We repurpose tp_print as tp_ccalloffset because this makes it easier for external projects to backport the C call protocol to earlier Python versions. In particular, the Cython project has shown interest in doing that (see https://mail.python.org/pipermail/python-dev/2018-June/153927.html).

Alternative suggestions

PEP 576 is an alternative approach to solving the same problem as this PEP. See https://mail.python.org/pipermail/python-dev/2018-July/154238.html for comments on the difference between PEP 576 and PEP 580.

Reference implementation

The reference implementation can be found at https://github.com/jdemeyer/cpython/tree/pep580

References

[1](1, 2, 3) Löwis, PEP 384 – Defining a Stable ABI, https://www.python.org/dev/peps/pep-0384/
[2]Add tp_fastcall to PyTypeObject: support FASTCALL calling convention for all callable objects, https://bugs.python.org/issue29259
Source: https://github.com/python/peps/blob/master/pep-0580.rst