[Python-checkins] CVS: python/nondist/peps pep-0253.txt,1.1,1.2
Guido van Rossum
gvanrossum@users.sourceforge.net
Mon, 14 May 2001 18:36:48 -0700
Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv20830
Modified Files:
pep-0253.txt
Log Message:
Add a lot of text. A looooooot of text. Way too much rambling. And
it isn't even finished. I'll do that later. But at least there's
some text here now...
Index: pep-0253.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0253.txt,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** pep-0253.txt 2001/05/14 13:43:23 1.1
--- pep-0253.txt 2001/05/15 01:36:46 1.2
***************
*** 12,20 ****
This PEP proposes ways for creating subtypes of existing built-in
! types, either in C or in Python.
! Introduction
! [XXX to be done.]
--- 12,375 ----
This PEP proposes ways for creating subtypes of existing built-in
! types, either in C or in Python. The text is currently long and
! rambling; I'll go over it again later to make it shorter.
! Traditionally, types in Python have been created statically, by
! declaring a global variable of type PyTypeObject and initializing
! it with a static initializer. The fields in the type object
! describe all aspects of a Python object that are relevant to the
! Python interpreter. A few fields contain dimensional information
! (e.g. the basic allocation size of instances), others contain
! various flags, but most fields are pointers to functions to
! implement various kinds of behaviors. A NULL pointer means that
! the type does not implement the specific behavior; in that case
! the system may provide a default behavior or raise an exception
! when the behavior is invoked. Some collections of function
! pointers that are usually defined together are obtained
! indirectly via a pointer to an additional structure containing
! more function pointers.
! While the details of initializing a PyTypeObject structure haven't
! been documented as such, they are easily gleaned from the examples
! in the source code, and I am assuming that the reader is
! sufficiently familiar with the traditional way of creating new
! Python types in C.
!
! This PEP will introduce the following optional features to types:
!
! - create an instance of a type by calling it
!
! - create a subtype in C by specifying a base type pointer
!
! - create a subtype in Python using a class statement
!
! - multiple inheritance
!
! This PEP builds on PEP 252, which adds standard introspection to
! types; in particular, types are assumed to have e.g. a __hash__
! method when the type object defines the tp_hash slot. PEP 252 also
! adds a dictionary to type objects which contains all methods. At
! the Python level, this dictionary is read-only; at the C level, it
! is accessible directly (but modifying it is not recommended except
! as part of initialization).
!
!
! Metatypes
!
! Inevitably the following discussion will come to mention metatypes
! (or metaclasses). Metatypes are nothing new in Python: Python has
! always been able to talk about the type of a type:
!
! >>> a = 0
! >>> type(a)
! <type 'int'>
! >>> type(type(a))
! <type 'type'>
! >>> type(type(type(a)))
! <type 'type'>
! >>>
!
! In this example, type(a) is a "regular" type, and type(type(a)) is
! a metatype. While as distributed all types have the same metatype
! (which is also its own metatype), this is not a requirement, and
! in fact a useful 3rd party extension (ExtensionClasses by Jim
! Fulton) creates an additional metatype. A related feature is the
! "Don Beaudry hook", which says that if a metatype is callable, its
! instances (which are regular types) can be subclassed (really
! subtyped) using a Python class statement. We will use this rule
! to support subtyping of built-in types, and in the process we will
! introduce some additional metatypes, and a "metametatype". (The
! metametatype is nothing unusual; Python's type system allows any
! number of metalevels.)
!
! Note that Python uses the concept of metatypes or metaclasses in a
! different way than Smalltalk. In Smalltalk-80, there is a
! hierarchy of metaclasses that mirrors the hierarchy of regular
! classes, metaclasses map 1-1 to classes (except for some funny
! business at the root of the hierarchy), and each class statement
! creates both a regular class and its metaclass, putting class
! methods in the metaclass and instance methods in the regular
! class.
!
! Nice though this may be in the context of Smalltalk, it's not
! compatible with the traditional use of metatypes in Python, and I
! prefer to continue in the Python way. This means that Python
! metatypes are typically written in C, and may be shared between
! many regular types. (It will be possible to subtype metatypes in
! Python, so it won't be absolutely necessary to write C in order to
! use metatypes; but the power of Python metatypes will be limited,
! e.g. Python code will never be allowed to allocate raw memory and
! initialize it at will.)
!
!
! Instantiation by calling the type object
!
! Traditionally, for each type there is at least one C function that
! creates instances of the type. This function has to take care of
! both allocating memory for the object and initializing that
! memory. As of Python 2.0, it also has to interface with the
! garbage collection subsystem, if the type chooses to participate
! in garbage collection (which is optional, but strongly recommended
! for so-called "container" types: types that may contain arbitrary
! references to other objects, and hence may participate in
! reference cycles).
!
! If we're going to implement subtyping, we must separate allocation
! and initialization: typically, the most derived subtype is in
! charge of allocation (and hence deallocation!), but in most cases
! each base type's initializer (constructor) must still be called,
! from the "most base" type to the most derived type.
!
! But let's first get the interface for instantiation right. If we
! call an object, the tp_call slot of its type gets invoked. Thus,
! if we call a type, this invokes the tp_call slot of the type's
! type: in other words, the tp_call slot of the metatype.
! Traditionally this has been a NULL pointer, meaning that types
! can't be called. Now we're adding a tp_call slot to the metatype,
! which makes all types "callable" in a trivial sense. But
! obviously the metatype's tp_call implementation doesn't know how
! to initialize individual types. So the type defines a new slot,
! tp_construct, which is invoked by the metatype's tp_call slot. If
! the tp_construct slot is NULL, the metatype's tp_call issues a
! nice error message: the type isn't callable.
!
! We already know that tp_construct is responsible for initializing
! the object (this will be important for subtyping too). Who should
! be responsible for allocation of the new object? Either the
! metatype's tp_call can allocate the object, or the type's
! tp_construct can allocate it. The solution is copied from typical
! C++ implementations: if the metatype's tp_call allocates storage
! for the object it passes the storage as a pointer to the type's
! tp_construct; if the metatype's tp_call does not allocate storage,
! it passes a NULL pointer to the type's tp_construct, in which case the
! type allocates the storage itself. This moves the policy decision
! to the metatype, and different metatypes may have different
! policies. The mechanisms are fixed though: either the metatype's
! tp_call allocates storage, or the type's tp_construct allocates.
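!
! The protocol above can be mimicked in pure Python. What follows is
! only a toy model, not the C API: ToyType, allocate, construct and
! meta_call are hypothetical names standing in for a type object,
! its allocator, its tp_construct slot and the metatype's tp_call.
!
```python
class ToyType:
    """Stand-in for a type object with a tp_construct-like hook."""

    def __init__(self, name):
        self.name = name

    def allocate(self):
        # Stand-in for raw storage: a dict tagged with who allocated it.
        return {"allocated_by": self.name}

    def construct(self, storage, value):
        # Mirrors tp_construct: allocate only when handed no storage.
        if storage is None:
            storage = self.allocate()
        storage["value"] = value
        return storage


def meta_call(typ, value, preallocate):
    """Stand-in for the metatype's tp_call; 'preallocate' is its policy."""
    storage = {"allocated_by": "metatype"} if preallocate else None
    return typ.construct(storage, value)


spam = ToyType("spam")
by_meta = meta_call(spam, 1, preallocate=True)   # metatype allocates
by_type = meta_call(spam, 2, preallocate=False)  # type allocates
```
!
! Either way the initialization code in construct() is the same; only
! the party that provides the storage differs.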
!
! The deallocation mechanism chosen should match the allocation
! mechanism: an allocation policy should prescribe both the
! allocation and deallocation mechanism. And again, planning ahead
! for subtyping would be nice. But the available mechanisms are
! different. The deallocation function has always been part of the
! type structure, as tp_dealloc, which combines the
! "uninitialization" with deallocation. This was good enough for
! the traditional situation, where it matched the combined
! allocation and initialization of the creation function. But now
! imagine a type whose creation function uses a special free list
! for allocation. Its deallocation function puts the object's
! memory back on the same free list. But when allocation and
! initialization are separate, the object may have been allocated
! from the regular heap, and it would be wrong (in some cases
! disastrous) if it were placed on the free list by the
! deallocation function.
!
! A solution would be for the tp_construct function to somehow mark
! whether the object was allocated from the special free list, so
! that the tp_dealloc function can choose the right deallocation
! method (assuming that the only two alternatives are a special free
! list or the regular heap). A variant that doesn't require space
! for an allocation flag bit would be to have two type objects,
! identical in the contents of all their slots except for their
! deallocation slot. But this requires that all type-checking code
! (e.g. PyDict_Check()) recognizes both types. We'll come back
! to this solution in the context of subtyping. Another alternative
! is to require the metatype's tp_call to leave the allocation to
! the tp_construct method, by passing in a NULL pointer. But this
! doesn't work once we allow subtyping.
!
! Eventually, when we add any form of subtyping, we'll have to
! separate deallocation from uninitialization. The way to do this
! is to add a separate slot to the type object that does the
! uninitialization without the deallocation. Fortunately, there is
! already such a slot: tp_clear, currently used by the garbage
! collection subsystem. A simple rule makes this slot reusable for
! uninitialization: for types that support separate allocation
! and initialization, tp_clear must be defined (even if the object
! doesn't support garbage collection) and it must DECREF all
! contained objects and FREE all other memory areas the object owns.
! It must also be reentrant: it must be possible to clear an already
! cleared object. The easiest way to do this is to replace all
! pointers DECREFed or FREEd with NULL pointers.
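!
! The reentrancy rule can be illustrated with a small Python
! analogue. This is a sketch under assumptions: ToyObject and its
! attributes are made up for the example, and None plays the role of
! a NULL pointer.
!
```python
class ToyObject:
    """Hypothetical object owning one reference and one buffer."""

    def __init__(self, item, buffer):
        self.item = item
        self.buffer = buffer
        self.released = []  # records what clear() let go of

    def clear(self):
        # Release each owned resource once, then replace it with None
        # (the Python analogue of storing a NULL pointer).
        if self.item is not None:
            self.released.append("item")
            self.item = None
        if self.buffer is not None:
            self.released.append("buffer")
            self.buffer = None


obj = ToyObject(item=object(), buffer=bytearray(8))
obj.clear()
obj.clear()  # second call finds only None and does nothing
```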
!
!
! Subtyping in C
!
! The simplest form of subtyping is subtyping in C. It is the
! simplest form because we can require the C code to be aware of the
! various problems, and it's acceptable for C code that doesn't
! follow the rules to dump core; while for Python subtyping we would
! need to catch all errors before they become core dumps.
!
! The idea behind subtyping is very similar to that of single
! inheritance in C++. A base type is described by a structure
! declaration plus a type object. A derived type can extend the
! structure (but must leave the names, order and type of the fields
! of the base structure unchanged) and can override certain slots in
! the type object, leaving others the same.
!
! Not every type can serve as a base type. The base type must
! support separation of allocation and initialization by having a
! tp_construct slot that can be called with a preallocated object,
! and it must support uninitialization without deallocation by
! having a tp_clear slot as described above. The base type must
! also export the structure declaration for its instances through a
! header file, as it is needed in order to derive a subtype. The
! type object for the base type must also be exported.
!
! If the base type has a type-checking macro (e.g. PyDict_Check()),
! this macro may be changed to recognize subtypes. This can be done
! by using the new PyObject_TypeCheck(object, type) macro, which
! calls a function that follows the base class links. There are
! arguments for and against changing the type-checking macro in this
! way. The argument for the change should be clear: it allows
! subtypes to be used in places where the base type is required,
! which is often the prime attraction of subtyping (as opposed to
! sharing implementation). An argument against changing the
! type-checking macro could be that the type check is used
! frequently and a function call would slow things down too much
! (hard to believe); or one could fear that a subtype might break an
! invariant assumed by the support functions of the base type.
! Sometimes it would be wise to change the base type to remove this
! reliance; other times, it would be better to require that derived
! types (implemented in C) maintain the invariants.
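!
! The idea of a type check that follows the base class links can be
! modelled in a few lines of Python. This is an illustrative sketch,
! not the actual macro: ToyType, its base attribute and type_check
! are hypothetical names.
!
```python
class ToyType:
    """Hypothetical type object with a single base link (like tp_base)."""

    def __init__(self, name, base=None):
        self.name = name
        self.base = base


def type_check(obj_type, wanted):
    """Return True if obj_type is 'wanted' or derives from it."""
    t = obj_type
    while t is not None:
        if t is wanted:
            return True
        t = t.base  # follow the base class link
    return False


list_type = ToyType("list")
spamlist_type = ToyType("spamlist", base=list_type)
dict_type = ToyType("dict")
```
!
! Here type_check(spamlist_type, list_type) succeeds, while the
! reverse check and type_check(dict_type, list_type) fail.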
!
! The derived type begins by declaring a type structure which
! contains the base type's structure. For example, here's the type
! structure for a subtype of the built-in list type:
!
! typedef struct {
!     PyListObject list;
!     int state;
! } spamlistobject;
!
! Note that the base type structure field (here PyListObject) must
! be the first field in the structure; any following fields are
! extension fields. Also note that the base type is not referenced
! via a pointer; the actual contents of its structure must be
! included! (The goal is for the memory layout of the beginning of
! the subtype instance to be the same as that of the base type
! instance.)
!
! Next, the derived type must declare a type object and initialize
! it. Most of the slots in the type object may be initialized to
! zero, which is a signal that the base type slot must be copied
! into it. Some fields that must be initialized properly:
!
! - the object header must be filled in as usual; the type should be
! PyType_Type
!
! - the tp_basicsize field must be set to the size of the subtype
! instances
!
! - the tp_base field must be set to the address of the base type's
! type object
!
! - the tp_dealloc slot function must be a deallocation function for
! the subtype
!
! - the tp_flags field must be set to the usual Py_TPFLAGS_DEFAULT
! value
!
! - the tp_name field must be set (otherwise it will be inherited,
! which is wrong)
!
! Exception: if the subtype defines no additional fields in its
! structure (i.e., it only defines new behavior, no new data), the
! tp_basicsize and the tp_dealloc fields may be set to zero. In
! order to complete the initialization of the type,
! PyType_InitDict() must be called. This replaces zero slots in the
! subtype with the value of the corresponding base type slots. It
! also fills in tp_dict, the type's dictionary; this is more a
! matter of PEP 252.
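!
! The slot-copying step can be pictured with a toy model in Python.
! The dictionaries, the sizes and the inherit_slots helper below are
! assumptions made for illustration; the real work is done in C on
! PyTypeObject fields.
!
```python
SLOTS = ("tp_name", "tp_basicsize", "tp_dealloc", "tp_repr", "tp_hash")


def inherit_slots(sub, base):
    """Fill every zero/None slot in 'sub' from 'base' (toy model)."""
    for slot in SLOTS:
        if not sub.get(slot):
            sub[slot] = base.get(slot)
    return sub


list_slots = {
    "tp_name": "list",
    "tp_basicsize": 40,            # made-up size, for illustration only
    "tp_dealloc": "list_dealloc",
    "tp_repr": "list_repr",
    "tp_hash": None,               # the base does not implement hashing
}
spamlist_slots = {
    "tp_name": "spamlist",         # must be set explicitly
    "tp_basicsize": 48,            # made-up size of the extended struct
    "tp_dealloc": "spamlist_dealloc",
}
inherit_slots(spamlist_slots, list_slots)
```
!
! After the call, spamlist_slots keeps its own name, size and
! deallocator, and picks up tp_repr from the base.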
!
! The subtype's tp_dealloc slot deserves special attention. It must
! uninitialize and deallocate the object in an orderly manner: first
! it must uninitialize the fields added by the extension type; then
! it must call the base type's tp_clear function; finally it must
! deallocate the memory of the object. Usually, the base type's
! tp_clear function has no global name; it is permissible to call it
! via the base type's tp_clear slot, e.g. PyList_Type.tp_clear(obj).
! Only if it is known that the base type uses the same allocation
! method as the subtype and the subtype requires no uninitialization
! (e.g. it adds no data fields or all its data fields are numbers)
! is it permissible to leave tp_dealloc set to zero in the subtype's
! type object; it will be copied from the base type.
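!
! The required order of operations in the subtype's deallocation
! function can be sketched as follows. This is a Python simulation
! with hypothetical names (list_clear, spamlist_dealloc, log); a
! dictionary stands in for the object's memory.
!
```python
log = []


def list_clear(obj):
    # Stand-in for the base type's tp_clear.
    log.append("base tp_clear")
    obj["items"] = None


def spamlist_dealloc(obj):
    # 1. Uninitialize the fields added by the subtype.
    log.append("clear subtype fields")
    obj["state"] = None
    # 2. Let the base type uninitialize its own fields.
    list_clear(obj)
    # 3. Finally release the memory itself.
    log.append("free memory")
    obj.clear()


spam = {"items": [1, 2, 3], "state": 42}
spamlist_dealloc(spam)
```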
!
! A subtype is not usable until PyType_InitDict() is called for it;
! this is best done during module initialization, assuming the
! subtype belongs to a module. An alternative for subtypes added to
! the Python core (which don't live in a particular module) would be
! to initialize the subtype in their constructor function. It is
! allowed to call PyType_InitDict() more than once; the second and
! further calls have no effect. In order to avoid unnecessary
! calls, a test for tp_dict==NULL can be made.
!
! If the subtype itself should be subtypable (usually desirable), it
! should follow the same rules as given above for base types: have
! a tp_construct that accepts a preallocated object and calls the
! base type's tp_construct, and have a tp_clear that calls the base
! type's tp_clear.
!
!
! Subtyping in Python
!
! The next step is to allow subtyping of selected built-in types
! through a class statement in Python. Limiting ourselves to single
! inheritance for now, here is what happens for a simple class
! statement:
!
! class C(B):
!     var1 = 1
!     def method1(self): pass
!     # etc.
!
! The body of the class statement is executed in a fresh environment
! (basically, a new dictionary used as local namespace), and then C
! is created. The following explains how C is created.
!
! Assume B is a type object. Since type objects are objects, and
! every object has a type, B has a type. B's type is accessible via
! type(B) or B.__class__ (the latter notation is new for types; it
! is introduced in PEP 252). Let's say B's type is M (for
! Metatype). The class statement will create a new type, C. Since
! C will be a type object just like B, we view the creation of C as
! an instantiation of the metatype, M. The information that needs
! to be provided for the creation of C is: its name (in this example
! the string "C"); the list of base classes (a singleton tuple
! containing B); and the results of executing the class body, in the
! form of a dictionary (e.g. {"var1": 1, "method1": <function...>,
! ...}).
!
! According to the Don Beaudry hook, the following call is made:
!
! C = M("C", (B,), dict)
!
! (where dict is the dictionary resulting from execution of the
! class body). In other words, the metatype (M) is called. Note
! that even though we currently require there to be exactly one base
! class, we still pass in a (singleton) sequence of base classes;
! this makes it possible to support multiple inheritance later (or
! for types with a different metaclass!) without changing this
! interface.
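!
! In current Python 3 releases the same three-argument call can be
! tried out directly. This illustrates the calling convention using
! today's interpreter, not the implementation described in this
! draft; the namespace contents and the lambda are chosen for the
! example.
!
```python
class B:
    pass


M = type(B)  # the metatype of B; for ordinary classes this is 'type'

# What executing the class body would leave behind in its namespace.
namespace = {"var1": 1, "method1": lambda self: "called"}

# Roughly what the class statement does behind the scenes.
C = M("C", (B,), namespace)
```
!
! The resulting C has B as its only base, and its instances see both
! var1 and method1.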
!
! Note that calling M requires that M itself has a type: the
! meta-metatype. In the current implementation, I have introduced a
! new type object for this purpose, named turtle because of my
! fondness for the phrase "turtles all the way down". However, I now
! believe that it would be better if M were its own metatype, just
! like before. This can be accomplished by making M's tp_call slot
! slightly more flexible.
!
! In any case, the work for creating C is done by M's tp_construct
! slot. It allocates space for an "extended" type structure, which
! contains space for: the type object; the auxiliary structures
! (as_sequence etc.); the string object containing the type name (to
! ensure that this object isn't deallocated while the type object is
! still referencing it); and some more auxiliary storage (to be
! described later). It initializes this storage to zeros except for
! a few crucial slots (e.g. tp_name is set to point to the type
! name) and then sets the tp_base slot to point to B. Then
! PyType_InitDict() is called to inherit B's slots. Finally, C's
! tp_dict slot is updated with the contents of the namespace
! dictionary (the third argument to the call to M).