tp_base, ob_type, and tp_bases

Carl Banks pavlovevidence at gmail.com
Sun Jan 18 05:40:29 EST 2009


On Jan 17, 8:12 am, Jeff McNeil <j... at jmcneil.net> wrote:
> On Jan 17, 11:09 am, Jeff McNeil <j... at jmcneil.net> wrote:
>
>
>
> > On Jan 17, 10:50 am, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
>
> > > > So, the documentation states that ob_type is a pointer to the type's
> > > > type, or metatype. Rather, this is a pointer to the new type's
> > > > metaclass?
>
> > > That's actually the same. *Every* ob_type field points to the object's
> > > type, e.g. for strings, integers, tuples, etc. That includes type
> > > objects, where ob_type points to the type's type, i.e. it's meta-type,
> > > also called metaclass (as "class" and "type" are really synonyms).
>
> > > > Next, we have tp_base.  That's defined as "an optional pointer to a
> > > > base type from which type properties are inherited."  The value of
> > > > tp_base is then added to the tp_bases tuple.  This is confusing me. On
> > > > the surface, it sound as though they're one in the same?
>
> > > (I don't understand the English "one in the same" - interpreting it
> > > as "as though they should be the same")
>
> > > No: tp_bases is a tuple of all base types (remember, there is multiple
> > > inheritance); tp_base (if set) provides the first base type.
>
> > > > I *think* (and dir() of a subclass of type seems to back it up), that
> > > > tp_base is only at play when the object in question inherits from
> > > > type?
>
> > > No - it is the normal case for single inheritance. You can leave it
> > > NULL, which means you inherit from object.
>
> > > Regards,
> > > Martin
>
> > Thank you! It was tp_base that was confusing me.  The tp_bases member
> > makes sense as Python supports multiple inheritance.  It wasn't
> > immediately clear that tp_base is there for single inheritance
> > reasons. It's all quite clear now.
>
> > Is that an optimization of sorts?
>
> Well, maybe not specifically for single inheritance reasons, I just
> didn't see an immediate reason to keep a separate pointer to the first
> base type.


The reason you need a separate tp_base is because it doesn't
necessarily point to the first base type; rather, it points to the
first base type that has added any fields or slots to its internal
layout (in other words, the first type with a tp_basicsize > 8, on 32-
bit versions).  I believe this is mainly for the benefit of Python
subclasses that define their own slots.  The type constructor will
begin adding slots at an offset of tp_base->tp_basicsize.

To see an example, int objects have a tp_basicsize of 12 (there are 4
extra bytes for the interger).  So if you multiply-inherit from int
and a Python class, int will always be tp_base.

class A(object): pass

class B(int,A): pass
print B.__base__ # will print <type 'int'>

class C(A,int): pass
print C.__base__ # will print <type 'int'>


A related issue is that you can't multiply inherit from two types that
have tp_basicsize > 8 unless one of them inherits from the other.
There can be only one tp_base.  For instance:

class D(int,tuple): pass # will raise TypeError

class E(object):
    __slots__ = ['a','b']

class F(object):
    __slots__ = ['c','d']

class G(E,G): pass # will raise TypeError

class H(E,int): pass  # will raise TypeError

....

Here's a bit more background (and by "a bit" I mean "a lot"):

In 32-bit Python, objects of types defined in Python are usually only
16 bytes long.  The layout looks like this.

instance dict
weak reference list
reference count
ob_type

The reference count, which is always the thing that the actual
PyObject* points at, isn't actually the first item in the object's
layout.  The dict and weakref list are stored at a lower address.
(There's a reason for it.)

If a Python class defines any __slots__, the type constructor will add
the slots to the object's layout.

instance dict (if there is one)
weak reference list (if there is one)
reference count
ob_type
slot
slot
slot

Note that, because you defined __slots__, the object might not have an
instance dict or weak reference list associated with it.  It might,
though, if one of the base classes had defined an instance dict or
weakref list.  Or, it might not, but then a subclass might have its
onw instance dict or weakref list.  That's why these guys are placed
at a lower address: so that they don't interfere with the layout of
subclasses.  A subclass can add either more slots or a dict.

Object of types defined in C can have arbitrary layout.  For instance,
it could have a layout that looks like this:

reference count
ob_type
PyObject* a
PyObject* b
long c
float d
instance dict

A problem with Python slots and C fields is that, if you want to
inherit from a type with a non-trivial object layout, aside from the
dict and weakrefs, all of the subtypes have to maintain the same
layout (see Liskov substitutability for rationale).

Subtypes can add their own fields or slots if they want, though.  So,
if a Python subtype wants to define its own slots on top of a type
with a non-trivial object layout, it has to know which base has the
largest layout so that it doesn't use the same memory for its own
slots.  Hence tp_base.


Carl Banks



More information about the Python-list mailing list