[Python-Dev] Identifier API

Stefan Behnel stefan_ml at behnel.de
Tue Oct 11 17:06:17 CEST 2011


"Martin v. Löwis", 08.10.2011 16:54:
> In benchmarking PEP 393, I noticed that many UTF-8 decode
> calls originate from C code with static strings, in particular
> PyObject_CallMethod. Many of such calls already have been optimized
> to cache a string object, however, PyObject_CallMethod remains
> unoptimized since it requires a char*.

Yes, I noticed that in Cython, too. We often use PyObject_CallMethod() as a 
fallback for optimistically optimised method calls when the expected fast 
path does not hit, and it always bugged me that this needs to generate a 
Python string on each call in order to look up the method.


> I propose to add an explicit API to deal with such identifiers.
> With this API,
>
> tmp = PyObject_CallMethod(result, "update", "O", other);
>
> would be replaced with
>
> PyObject *tmp;
> Py_identifier(update);
> ...
> tmp = PyObject_CallMethodId(result, &PyId_update, "O", other);
>
> Py_identifier expands to a struct
>
> typedef struct Py_Identifier {
> struct Py_Identifier *next;
> const char* string;
> PyObject *object;
> } Py_Identifier;
>
> string will be initialized by the compiler, next and object on
> first use.

As I understand it, the macro expands to both the ID variable declaration 
and the init-at-first-call code, right? This is problematic when more than 
one identifier is used, as some C compilers strictly require declarations 
to be written *before* any other code. I'm not sure how often users will 
need more than one identifier in a function, but if it's not too hard to 
come up with a way that avoids this problem altogether, it would be 
better to do so right from the start.
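
To illustrate one way around the ordering problem, here is a minimal sketch in which the macro expands to a declaration only and the init-at-first-use step lives in the function that receives the identifier. It assumes nothing about the actual patch: all Sketch_/SKETCH_ names are hypothetical, and the "object" is a plain pointer standing in for a PyObject* so that the example compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed struct. In CPython the last
   field would be a PyObject*; a plain pointer is used here so the
   sketch does not depend on Python.h. */
typedef struct Sketch_Identifier {
    struct Sketch_Identifier *next;
    const char *string;
    const void *object;
} Sketch_Identifier;

/* Declaration-only macro: it expands to a static variable with a
   constant initialiser and no statements, so several of them can be
   stacked at the top of a block even under strict C89 rules. */
#define SKETCH_IDENTIFIER(name) \
    static Sketch_Identifier SketchId_##name = { NULL, #name, NULL }

/* List of identifiers initialised so far (e.g. for cleanup at exit). */
static Sketch_Identifier *sketch_all_ids = NULL;
static int sketch_init_count = 0;

/* The lazy initialisation happens in the callee, not in the macro. */
static const void *sketch_get_object(Sketch_Identifier *id)
{
    assert(id != NULL);
    if (id->object == NULL) {
        /* Placeholder for creating and interning a Python string. */
        id->object = id->string;
        id->next = sketch_all_ids;
        sketch_all_ids = id;
        sketch_init_count++;
    }
    return id->object;
}

/* Hypothetical caller that needs two identifiers in one function. */
static int sketch_use_two(void)
{
    SKETCH_IDENTIFIER(update);
    SKETCH_IDENTIFIER(keys);
    const void *a;
    const void *b;

    a = sketch_get_object(&SketchId_update);
    b = sketch_get_object(&SketchId_keys);
    return a != NULL && b != NULL;
}
```

With this split, calling sketch_use_two() repeatedly initialises each identifier only once, and all declarations stay ahead of the first statement.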

Also note that existing code needs to be changed in order to take advantage 
of this. It might be possible to optimise PyObject_CallMethod() internally 
by making the lookup either reuse a number of cached Python strings, or by 
supporting a lookup of char* values in a dict *somehow*. However, this 
appears to be substantially more involved than just shifting a smaller burden 
onto the users.
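
As a rough idea of what the internal variant might involve, the sketch below keeps a small direct-mapped cache keyed by the address of the char* argument. This is an assumption for illustration, not something CPython does: the names are hypothetical, the cache size is arbitrary, and a heap-allocated copy of the name stands in for the cached Python string object.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CACHE_SIZE 64   /* arbitrary; must be a power of two */

typedef struct {
    const char *key;   /* address the caller passed in */
    char *object;      /* stand-in for a cached PyObject* string */
} SketchCacheEntry;

static SketchCacheEntry sketch_cache[SKETCH_CACHE_SIZE];
static int sketch_cache_misses = 0;

/* Return the cached "object" for name, creating it on a miss. */
static const char *sketch_cached_name(const char *name)
{
    size_t slot;
    SketchCacheEntry *entry;
    char *copy;

    assert(name != NULL);
    /* Hash the pointer value; static strings keep a stable address. */
    slot = (size_t)(((uintptr_t)name >> 3) & (SKETCH_CACHE_SIZE - 1));
    entry = &sketch_cache[slot];

    /* A hit requires the same address AND the same contents, because a
       non-static buffer may be reused for a different method name. */
    if (entry->key == name && entry->object != NULL
            && strcmp(entry->object, name) == 0) {
        return entry->object;
    }

    /* Miss: build the object (placeholder for decoding and interning)
       and evict whatever occupied the slot before. */
    sketch_cache_misses++;
    copy = malloc(strlen(name) + 1);
    if (copy == NULL) {
        return NULL;
    }
    strcpy(copy, name);
    free(entry->object);
    entry->key = name;
    entry->object = copy;
    return entry->object;
}
```

Even in this simplified form, every hit still pays for a strcmp(), and eviction and lifetime handling have to be dealt with, which hints at why the explicit identifier API is the simpler route.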

Stefan


