[Python-Dev] Patching builtin_id to allow for C proxy objects?

Mon Jun 27 12:48:35 CEST 2011

Hi all.

I'm writing a module to proxy C++ objects into Python for a large C++
application. There are hundreds of thousands of C++ objects, some of
which are temporary while others are very long-lived.

Currently, every time one of these objects is accessed from Python, a
new "myproxy" instance is created. So if I access the same field of an
object twice, I receive two Python objects proxying the same
underlying C++ object. This breaks "id" and "is", and it causes me
problems when, for example, I run into circular references while
encoding JSON, or otherwise try to determine whether two proxy objects
refer to the same C++ object.
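To make the symptom concrete, here is a minimal pure-Python sketch. `MyProxy`, `_cpp_heap` and the integer addresses are hypothetical stand-ins for the real extension type and the C++ pointers; the point is only that creating a fresh wrapper on every access yields two live objects with different `id()` values. The `json` encoder's circular-reference check is keyed on `id()` of the objects it is currently encoding, so it cannot recognise such proxies as repeats either.

```python
# Hypothetical stand-in for the C++ heap: integer "addresses" mapped to
# field dictionaries. A real extension would hold raw C++ pointers instead.
_cpp_heap = {
    0x1000: {"name": "root", "parent": 0x1000},  # refers to itself
}


class MyProxy:
    """Hypothetical Python-side proxy for one C++ object."""

    def __init__(self, address):
        self.address = address  # the C++ object's pointer value

    @property
    def name(self):
        return _cpp_heap[self.address]["name"]

    @property
    def parent(self):
        # Every access wraps the same C++ address in a brand-new proxy,
        # mirroring the current behaviour described above.
        return MyProxy(_cpp_heap[self.address]["parent"])


root = MyProxy(0x1000)
first = root.parent
second = root.parent

print(first is second)                  # False: two distinct proxies
print(id(first) == id(second))          # False: both are alive, so ids differ
print(first.address == second.address)  # True: same underlying C++ object
```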

I can't see how to cache the "myproxy" objects instead of returning
new instances: due to the architecture of the C++ application,
there's no weak reference support at all, and the number of objects is
very large.
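For comparison, the caching approach ruled out above might look roughly like this, again with a hypothetical `MyProxy` and integer addresses. It restores `is` for repeated accesses, but only because the dictionary holds a strong reference to every proxy. Assuming, as stated, that there is no weak reference support and therefore no notification when a C++ object dies, nothing would ever call the hypothetical `forget` method, so the cache grows with every object touched, and a reused address could hand back a stale proxy.

```python
class MyProxy:
    """Hypothetical proxy holding a C++ object's pointer value."""

    def __init__(self, address):
        self.address = address


class ProxyCache:
    """Hypothetical cache returning one shared proxy per C++ address."""

    def __init__(self):
        self._by_address = {}  # strong references: entries never expire

    def get(self, address):
        proxy = self._by_address.get(address)
        if proxy is None:
            proxy = MyProxy(address)
            self._by_address[address] = proxy
        return proxy

    def forget(self, address):
        # Would have to be called when the C++ object is destroyed; such a
        # notification is assumed unavailable in this application.
        self._by_address.pop(address, None)

    def __len__(self):
        return len(self._by_address)


cache = ProxyCache()
print(cache.get(0x1000) is cache.get(0x1000))  # True: identity restored
print(len(cache))                              # 1: nothing evicts it automatically
```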

My current plan is to override the id builtin so that, for the
"myproxy" type, it returns the underlying C++ object's instance pointer
instead of the PyObject instance pointer, probably via a new type slot,
tp_id or similar. The old behaviour would naturally be unchanged for
all other types. I'd also need to alter ceval.c to use builtin_id
instead of the raw pointer for comparison when handling PyCmp_IS and
PyCmp_IS_NOT. I can see that there may well be many other places
throughout the C source where pointers are compared directly, which
could cause interesting issues for me down the line. I'm just not sure
what else to try.
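The intended semantics of the proposed tp_id slot can be prototyped in pure Python before touching ceval.c. In this sketch `proxy_aware_id` and `same_object` are hypothetical helpers, not existing CPython APIs: the first plays the role of the patched id builtin, the second shows what `is` would mean if the PyCmp_IS comparison consulted it. Plain `is`, and any C code that compares pointers directly, still sees two different objects, which is exactly the concern raised above.

```python
class MyProxy:
    """Hypothetical proxy holding a C++ object's pointer value."""

    def __init__(self, address):
        self.address = address


def proxy_aware_id(obj):
    """Hypothetical stand-in for an id() that honours a tp_id-like slot."""
    if isinstance(obj, MyProxy):
        return obj.address  # identity of the underlying C++ object
    return id(obj)          # unchanged behaviour for every other type


def same_object(a, b):
    """What `a is b` would mean if the PyCmp_IS path used proxy_aware_id."""
    return proxy_aware_id(a) == proxy_aware_id(b)


a = MyProxy(0x1000)
b = MyProxy(0x1000)
print(a is b)             # False: plain `is` still compares PyObject pointers
print(same_object(a, b))  # True: both proxy the same C++ object
```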

I'd like to know whether I'm being laughably naive before I go ahead
with this plan, and whether it would be worthwhile contributing the
patch back, considering the number of potentially overridden-id-unaware
areas throughout the rest of the Python code base.

Thanks.
Tom.