inheritance, multiple inheritance and the weaklist and instance dictionaries

Wed Feb 9 23:02:42 EST 2011

On 02/09/2011 08:40 PM, Carl Banks wrote:
> I explained why in my last post; there's a bunch of reasons.
> Generally you can't assume someone's going to go through the type
> structure to find the object's dict, nor can you expect inherited
> methods to always use the derived class's type structure (some methods
> might use their own type's tp_dictoffset or tp_weakreflist, which
> would be wrong if called from a superclass that changes those
> values).

Whom do you mean by "someone"? The code is generated by a program, and no 
human is required to touch it. If it needs to be updated, the program is 
simply run again with the updated specification file. I can make those 
assumptions because I have total control over the code. The only thing 
I don't control is the Python code that imports the extension, and in 
Python the user doesn't get to choose how the weaklist and instance 
dictionary are accessed.

> Even if you are careful to avoid such usage, the Python
> interpreter can't be sure.  So it has to check for layout conflicts,
> and these checks would become very complex if it allowed dict and
> weakreflist to appear in different locations in the layout (it'd have
> to check a lot more).

What is so complex about this? It already uses "obj_instance + 
obj_instance->ob_type->tp_weaklistoffset". That's all the checking it 
needs. It only becomes a problem when trying to derive from two or more 
classes that already have these defined. In such a case the Python 
interpreter can't deduce what the values of tp_weaklistoffset and 
tp_dictoffset in the derived type should be, but it doesn't have to 
because my program tells it what they need to be.
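A minimal standalone C sketch of that lookup, assuming mock structures 
rather than the real CPython ones (it does not include Python.h): each 
slot is found by adding the offset stored in the object's own type to 
the object's address. MockType, MockObject, weaklist_slot, dict_slot, 
BaseA, BaseB and Derived are hypothetical names, and Derived's offsets 
stand in for the values a generator would pick when two bases disagree. 
Negative tp_dictoffset values, which CPython uses for variable-sized 
objects, are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mocks, NOT CPython's PyTypeObject/PyObject. */
typedef struct MockType {
    const char *name;
    ptrdiff_t dictoffset;     /* models tp_dictoffset (0 = no dict) */
    ptrdiff_t weaklistoffset; /* models tp_weaklistoffset (0 = none) */
} MockType;

typedef struct MockObject {
    const MockType *ob_type;  /* models ob_type from PyObject_HEAD */
} MockObject;

/* Object address + offset taken from the object's OWN type. */
static void **weaklist_slot(MockObject *obj)
{
    ptrdiff_t off = obj->ob_type->weaklistoffset;
    assert(off > 0);
    return (void **)((char *)obj + off);
}

static void **dict_slot(MockObject *obj)
{
    ptrdiff_t off = obj->ob_type->dictoffset;
    assert(off > 0);
    return (void **)((char *)obj + off);
}

typedef struct { MockObject head; void *weaklist; } BaseA;  /* weaklist only */
typedef struct { MockObject head; void *dict; } BaseB;      /* dict only */

/* Derived has both; its weaklist cannot sit where BaseA put it,
   because BaseB's dict already occupies that offset. */
typedef struct { MockObject head; void *dict; void *weaklist; } Derived;

static const MockType BaseA_Type = {
    .name = "BaseA", .dictoffset = 0,
    .weaklistoffset = (ptrdiff_t)offsetof(BaseA, weaklist) };
static const MockType BaseB_Type = {
    .name = "BaseB", .dictoffset = (ptrdiff_t)offsetof(BaseB, dict),
    .weaklistoffset = 0 };
/* Offsets chosen explicitly, as the generator would. */
static const MockType Derived_Type = {
    .name = "Derived", .dictoffset = (ptrdiff_t)offsetof(Derived, dict),
    .weaklistoffset = (ptrdiff_t)offsetof(Derived, weaklist) };
```

Derived's weaklist offset differs from BaseA's, which is the kind of 
combination the layout check objects to; the sketch shows only that the 
lookup itself still lands on the right slot.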

> I would say you do.  Python's type system specifies that a derived
> type's layout is a superset of its base types' layout.  You seem to
> have found a way to derive a type without a common layout, perhaps by
> exploiting a bug, and you claim to be able to keep data access
> straight.  But Python types are not intended to work that way, and you
> are asking for trouble if you try to do it.

I'm not really circumventing this system (except for the varying 
location of the dictionaries; see the explanation below). Python allows 
variable-sized objects: tuples and strings are variable sized, which 
lets them store their data directly in the object instead of holding a 
pointer to another location in memory. The objects I generate are 
basically this:

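struct MyObject {
     PyObject_HEAD            /* refcount and type pointer */
     storage_mode mode;       /* what opaque_data currently holds */
     char opaque_data[x];     /* private storage, x bytes */
};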

I use the real type instead of char[] when possible because it will have 
the proper alignment but I still treat it like a private hunk of memory 
that only my generated code will touch. What I store in opaque_data is up 
to me. I can store a copy of the wrapped type, or I can store a pointer 
to it. "mode" specifies what is in opaque_data. A derived type would 
look like this:

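struct MyDerivedObject {
     PyObject_HEAD            /* refcount and type pointer */
     storage_mode mode;       /* what opaque_data currently holds */
     char opaque_data[y];     /* private storage, y bytes */
};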

Here y >= x, so it's still the same layout. All that's left is some way 
for the original object to know what C++ type is stored in opaque_data. 
I could have used another variable like 'mode', but since there is a 
one-to-one correspondence between PyObject->ob_type and the type that is 
being wrapped, I can determine the type from ob_type instead.
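A minimal sketch of the mode-plus-type idea in plain C, under stated 
assumptions: WrapperType, WrapperObject, STORE_INLINE, STORE_POINTER, 
store_copy, store_pointer and as_point are hypothetical names, a C struct 
(Point) stands in for the wrapped C++ class, and subtypes are left out. 
The type pointer says what is in opaque_data; the mode says whether it 
is a copy or a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag: what opaque_data currently holds. */
typedef enum { STORE_INLINE, STORE_POINTER } storage_mode;

/* Hypothetical stand-in for a wrapped C++ class. */
typedef struct { int x, y; } Point;

/* Hypothetical type object: one per wrapped type, so the type pointer
   alone identifies what lives in opaque_data. */
typedef struct WrapperType {
    const char *wrapped_name;
    size_t wrapped_size;
} WrapperType;

static const WrapperType PointWrapper_Type = { "Point", sizeof(Point) };

typedef struct WrapperObject {
    const WrapperType *ob_type;   /* stands in for ob_type in PyObject_HEAD */
    storage_mode mode;
    union {                       /* real types give correct alignment */
        Point value;              /* STORE_INLINE: a copy of the object */
        void *ptr;                /* STORE_POINTER: object owned elsewhere */
    } opaque_data;
} WrapperObject;

static void store_copy(WrapperObject *obj, Point p)
{
    obj->ob_type = &PointWrapper_Type;
    obj->mode = STORE_INLINE;
    obj->opaque_data.value = p;
}

static void store_pointer(WrapperObject *obj, Point *p)
{
    assert(p != NULL);
    obj->ob_type = &PointWrapper_Type;
    obj->mode = STORE_POINTER;
    obj->opaque_data.ptr = p;
}

/* The type says WHAT is stored, the mode says WHERE it is. */
static Point *as_point(WrapperObject *obj)
{
    if (obj->ob_type != &PointWrapper_Type)
        return NULL;              /* not a Point wrapper (subtypes ignored) */
    if (obj->mode == STORE_INLINE)
        return &obj->opaque_data.value;
    return (Point *)obj->opaque_data.ptr;
}
```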

There is no bug being exploited. The actual implementation is a little 
different from this, but the principle is the same. I said before that 
the layout varies, but that's only true if you consider the contents of 
opaque_data, which are neither Python's nor the user's concern.
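The "same layout" claim can be checked at compile time. A minimal C11 
sketch, assuming a mock header (MockHead stands in for PyObject_HEAD) and 
hypothetical sizes of 16 and 32 bytes for x and y; base_opaque is a 
hypothetical helper. _Alignas(max_align_t) plays the role of "the real 
type for proper alignment" when the wrapped type isn't known.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STORE_INLINE, STORE_POINTER } storage_mode;  /* hypothetical */

/* Stand-in for PyObject_HEAD: just the type pointer. */
typedef struct { const void *ob_type; } MockHead;

#define BASE_OPAQUE_SIZE    16   /* x, hypothetical */
#define DERIVED_OPAQUE_SIZE 32   /* y, hypothetical, y >= x */

typedef struct {
    MockHead head;
    storage_mode mode;
    _Alignas(max_align_t) unsigned char opaque_data[BASE_OPAQUE_SIZE];
} MyObject;

typedef struct {
    MockHead head;
    storage_mode mode;
    _Alignas(max_align_t) unsigned char opaque_data[DERIVED_OPAQUE_SIZE];
} MyDerivedObject;

/* Everything before opaque_data must line up; only its size may grow. */
_Static_assert(offsetof(MyObject, mode) == offsetof(MyDerivedObject, mode),
               "mode must not move");
_Static_assert(offsetof(MyObject, opaque_data) ==
               offsetof(MyDerivedObject, opaque_data),
               "opaque_data must start at the same offset");
_Static_assert(DERIVED_OPAQUE_SIZE >= BASE_OPAQUE_SIZE, "need y >= x");

/* Base-class code touches at most the first x bytes, which every
   derived object also has. */
static unsigned char *base_opaque(MyObject *obj, size_t nbytes)
{
    assert(nbytes <= BASE_OPAQUE_SIZE);
    return obj->opaque_data;
}
```

The cast in the usage below mirrors how extension code treats a subtype 
instance through its base struct.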

> I guess there's also no point in arguing that tp_dictoffset and
> tp_weakreflist need to have the same value for base and derived types,
> since you're rejecting the premise that layouts need to be
> compatible.  Therefore, I'll only point out that the layout checking
> code is based on this premise, so that's why you're running afoul of
> it.

That's not what the Python documentation says. Under 
http://docs.python.org/c-api/typeobj.html#tp_weaklistoffset it says 
"This field is inherited by subtypes, but see the rules listed below. A 
subtype may override this offset; this means that the subtype uses a 
different weak reference list head than the base type. Since the list 
head is always found via tp_weaklistoffset, this should not be a 
problem." And under 
http://docs.python.org/c-api/typeobj.html#tp_dictoffset it says "This 
field is inherited by subtypes, but see the rules listed below. A 
subtype may override this offset; this means that the subtype instances 
store the dictionary at a difference offset than the base type. Since 
the dictionary is always found via tp_dictoffset, this should not be a 
problem."
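That guarantee holds as long as the slot is always reached through the 
instance's own type. A minimal C sketch with hypothetical mock types (not 
the real CPython structures) contrasts that with a base-class routine 
that hard-codes its own type's offset, the case raised in the first 
quoted paragraph. BaseObj, SubObj, Base_Type, Sub_Type, 
dict_via_instance_type and dict_via_base_type are hypothetical names; 
16 and 32 are hypothetical values for x and y, so the subtype's dict 
lands at a different offset, as in my scheme.

```c
#include <assert.h>
#include <stddef.h>

typedef struct MockType {
    ptrdiff_t dictoffset;            /* models tp_dictoffset */
} MockType;

typedef struct {
    const MockType *ob_type;         /* models ob_type */
} MockObject;

typedef struct {
    MockObject head;
    unsigned char opaque_data[16];   /* x = 16, hypothetical */
    void *dict;
} BaseObj;

typedef struct {
    MockObject head;
    unsigned char opaque_data[32];   /* y = 32, hypothetical */
    void *dict;                      /* overridden location */
} SubObj;

static const MockType Base_Type = { (ptrdiff_t)offsetof(BaseObj, dict) };
static const MockType Sub_Type  = { (ptrdiff_t)offsetof(SubObj, dict) };

/* Safe: follow the offset stored in the instance's own type. */
static void **dict_via_instance_type(MockObject *obj)
{
    assert(obj->ob_type->dictoffset > 0);
    return (void **)((char *)obj + obj->ob_type->dictoffset);
}

/* Fragile: a base-class routine that hard-codes its own type's offset.
   On a SubObj this points into opaque_data instead of at the dict. */
static void **dict_via_base_type(MockObject *obj)
{
    return (void **)((char *)obj + Base_Type.dictoffset);
}
```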

> You claimed in another post you weren't trying to mimic the C++ type
> hierarchy in Python, but this line suggests you are.

When did I make that claim? Perhaps you misunderstood me. I said: "I 
kind-of already did. The issue only comes up when multiply-inheriting 
from types that have a different combination of the weaklist and 
instance dictionaries. I don't have to support this particular feature."

I was saying I kind-of already did mimic the C++ hierarchy. And when I 
said "this particular feature", I was talking about the thing I 
described in the immediately preceding sentence, not the C++ type hierarchy.