[Python-3000] A better way to initialize PyTypeObject

Wed Nov 29 08:23:38 CET 2006

Guido van Rossum wrote:
> On 11/28/06, Talin <talin at acm.org> wrote:
>> Guido van Rossum wrote:
>> > Some comments:
>> >
>> > - Fredrik's solution makes one call per registered method. (I don't
>> > know if the patch he refers to follows that model.) That seems a fair
>> > amount of code for an average type -- I'm wondering if it's too early
>> > to worry about code bloat (I don't think the speed is going to
>> > matter).
>>
>> One other thought: The special constants could themselves be nothing
>> more than the offset into the PyTypeObject struct, i.e.:
>>
>>     #define SPECMETHOD_NEW ((const char*)offsetof(PyTypeObject,tp_new))
> 
> I think this would cause too many issues with backwards compatibility.
> 
> I like the idea much better to use special names (e.g. starting with a 
> ".").
> 
>> In the PyType_Ready code, you would see if the method name had a value
>> of less than sizeof(PyTypeObject); If so, then it's a special method
>> name, and you fill in the struct at the specified offset.
>>
>> So the interpretation of the table could be very simple and fast. It has
>> a slight disadvantage from the approach of using actual string names for
>> special methods, in that it doesn't allow the VM to silently
>> promote/demote methods to 'special' status.
> 
> I think the interpretation will be fast enough (or else what you said
> about premature optimization earlier wouldn't be correct. :-)

OK, based on these comments and the other feedback from this thread, 
here's a more concrete proposal:

== Method Table ==

Method definitions are stored in a static table, identical in format to 
the existing PyMethodDef table.

For non-method initializers, the most commonly-used ones will be passed 
in as parameters to the type creation function. Those that are less 
commonly used can be written in as a secondary step after the type has 
been created, or in some cases represented in the tp_members table.

== Method Names ==

As suggested by Guido, we use a naming convention to determine how a 
method in the method table is handled. I propose that methods be divided 
into three categories, which are "Normal", "Special", and "Internal" 
methods, and which are interpreted slightly differently at type 
initialization time.

* Internal methods are those that have no equivalent Python name, such 
as tp_free/tp_alloc. Internal method names start with a dot ("."), so 
tp_alloc would be represented by the string ".tp_alloc".

Internal methods are always stored into a slot in the PyTypeObject. If 
there is no corresponding slot for a given name, that is a runtime error.

* Special methods have the double-underscore (__special__) naming 
convention. A special method may or may not have a slot definition in 
PyTypeObject. If there is such a slot, the method pointer will be stored 
into it; if there is no such slot, then the method pointer is stored 
into the class dict just like a normal method.

Because the decision whether to put the method into a slot is made by 
the VM, the set of available slots can be modified in future Python 
releases without breaking existing code.

* Normal methods are any methods that are neither special nor internal. 
They are not placed in a slot, but are simply stored in the class dict.
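
To make the three categories concrete, here is a rough sketch of how a
name could be classified from its spelling alone. The enum and the
function name classify_method_name are hypothetical, made up for this
illustration; nothing like them exists in the interpreter today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three categories described above. */
typedef enum { METHOD_NORMAL, METHOD_SPECIAL, METHOD_INTERNAL } MethodKind;

/* Hypothetical helper: classify a method-table name by its spelling. */
MethodKind classify_method_name(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);

    /* Internal names start with a dot, e.g. ".tp_alloc". */
    if (len > 1 && name[0] == '.')
        return METHOD_INTERNAL;

    /* Special names look like __x__: two underscores on each side of a
       non-empty middle, so the shortest possible name has 5 characters. */
    if (len >= 5 &&
        name[0] == '_' && name[1] == '_' &&
        name[len - 2] == '_' && name[len - 1] == '_')
        return METHOD_SPECIAL;

    /* Everything else is a normal method. */
    return METHOD_NORMAL;
}
```

With this rule, ".tp_alloc" is internal, "__getitem__" is special, and
"next" is normal.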

Brett Cannon brought up the point about __getitem__ being ambiguous, 
since there are two slots, one for lists and one for mappings. This is 
handled as follows:

The "mapping" version of __getitem__ is a special method, named 
"__getitem__". The "list" version, however, is considered an internal 
method (since it's more specialized), and has the name ".tp_getitem".

Greg Ewing's point about "next" is handled as follows: A function named 
"next" will never be treated as a special method name, since it does not 
follow the naming convention of either internal or special names. 
However, if you want to fill in the "tp_next" slot of the PyTypeObject, 
you can use the string ".tp_next" rather than "next".
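
The slot handling for these cases could be sketched roughly as below.
This is only an illustration under stated assumptions: DemoType is a
small stand-in for PyTypeObject, its field names are invented, and
demo_store_method, slot_table and the StoreResult codes are
hypothetical. The class dict is not modeled; the function just reports
where the method would end up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*GenericFn)(void);

/* Stand-in for PyTypeObject with three slots; field names are invented. */
typedef struct {
    GenericFn mapping_getitem;   /* filled by "__getitem__" */
    GenericFn sequence_getitem;  /* filled by ".tp_getitem" */
    GenericFn tp_next;           /* filled by ".tp_next"    */
} DemoType;

/* Maps a method-table name to the offset of its slot. */
typedef struct {
    const char *name;
    size_t offset;
} SlotEntry;

static const SlotEntry slot_table[] = {
    { "__getitem__", offsetof(DemoType, mapping_getitem) },
    { ".tp_getitem", offsetof(DemoType, sequence_getitem) },
    { ".tp_next",    offsetof(DemoType, tp_next) },
};

typedef enum {
    STORED_IN_SLOT,         /* name has a slot; pointer written there   */
    STORED_IN_DICT,         /* no slot; would go into the class dict    */
    ERROR_UNKNOWN_INTERNAL  /* dotted name with no slot: runtime error  */
} StoreResult;

/* Placeholder implementation used only to have a pointer to store. */
void demo_fn(void) {}

StoreResult demo_store_method(DemoType *type, const char *name, GenericFn fn)
{
    size_t i;

    assert(type != NULL && name != NULL);
    for (i = 0; i < sizeof(slot_table) / sizeof(slot_table[0]); i++) {
        if (strcmp(name, slot_table[i].name) == 0) {
            /* Write the pointer into the slot at the recorded offset. */
            GenericFn *slot =
                (GenericFn *)((char *)type + slot_table[i].offset);
            *slot = fn;
            return STORED_IN_SLOT;
        }
    }
    /* Internal names must have a slot; other names fall back to the dict. */
    if (name[0] == '.')
        return ERROR_UNKNOWN_INTERNAL;
    return STORED_IN_DICT;
}
```

Here "next" comes back as STORED_IN_DICT and leaves tp_next untouched,
while ".tp_next" fills the slot. A special name with no table entry
also falls through to the dict, which is what lets the set of slots
change between releases.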

== Type Creation ==

For backwards compatibility, the existing PyType_Ready function will 
continue to work on statically-declared PyTypeObject structures. A new 
function, 'PyType_Create', will be added that creates a new type from the 
input parameters and the method initialization tables as described 
previously. The actual type record may be allocated dynamically, as 
suggested by Greg Ewing.

Structures such as tp_as_sequence which extend the PyTypeObject will be 
created as needed, if there are any methods that require those extension 
structures.
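
The "created as needed" part might look something like the following
sketch. DemoTypeRecord and DemoSequenceMethods are stand-ins for
PyTypeObject and the structure behind tp_as_sequence, and
demo_set_sequence_item is a hypothetical helper; the real allocation
strategy is not decided by this proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*GenericFn)(void);

/* Stand-in for the sequence extension structure (one slot only). */
typedef struct {
    GenericFn sq_item;
} DemoSequenceMethods;

/* Stand-in for the type record; the extension starts out absent. */
typedef struct {
    DemoSequenceMethods *as_sequence;  /* NULL until a method needs it */
} DemoTypeRecord;

/* Placeholder implementation used only to have a pointer to store. */
void demo_item(void) {}

/* Stores fn in the sequence extension, allocating the extension on
   first use. Returns 0 on success, -1 if allocation fails. */
int demo_set_sequence_item(DemoTypeRecord *type, GenericFn fn)
{
    assert(type != NULL);
    if (type->as_sequence == NULL) {
        type->as_sequence = malloc(sizeof(DemoSequenceMethods));
        if (type->as_sequence == NULL)
            return -1;
    }
    type->as_sequence->sq_item = fn;
    return 0;
}
```

A type whose method table has no sequence methods never pays for the
extra structure, and a second sequence method reuses the one that was
already allocated.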

== Backwards compatibility ==

The existing PyType_Ready and C-style static initialization mechanism 
will continue to work - the new method for type creation will coexist 
alongside the old.

It is an open question as to whether PyType_Ready should attempt to 
interpret the special method names and fill in the PyTypeObject slots. 
If it does, then PyType_Create can call PyType_Ready as a subroutine 
during the type creation process.

Otherwise, the only modifications to the interpreter will be the 
creation of the new PyType_Create function and any required subroutines. 
Existing code should be unaffected.

-- Talin