[pypy-dev] Objects and types in the stdobjspace

Armin Rigo arigo at tunes.org
Sat Jun 7 15:00:50 CEST 2003


Hello Holger,

On Sat, Jun 07, 2003 at 08:51:51AM +0200, holger krekel wrote:
> Btw, could someone from the Sweden sprint crew post some
> notes about how the current pypy-source tree can be used? What is
> expected to work (or if that list is shorter: what does not work :-).

I would like to put this information in the Wiki, but I am getting confused by
the page names. There are quite a lot of pages with similar intent but
different names, or vice versa, and the name "sprint" generally refers to the
goals or results of one of the sprints, with no way to tell which one.
Just browse RecentChanges to see what I mean.

We should really name the sprints, e.g. HildesheimSprint and GothenburgSprint.
Can we rename or delete pages? (Maybe via the svn repository containing the
wiki instance?)

> - how the new test-machinery works so i can make sure that i 
>   don't break big stuff while trying to fix/play around 

pypy/testall.py runs all the tests, using the object space specified in the
environment variable OBJSPACE -- which must be exactly
pypy.objspace.std.StdObjSpace to use the standard object space; the trivial
object space is used otherwise.
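
As a rough sketch of the selection rule just described (this is not the actual
code of testall.py; the function name and the returned labels 'std' and
'trivial' are made up for illustration):

```python
import os

# The only value of OBJSPACE that selects the standard object space.
STD_OBJSPACE = "pypy.objspace.std.StdObjSpace"


def choose_objspace_name(environ=None):
    """Return 'std' only on an exact match of OBJSPACE, else 'trivial'.

    Hypothetical helper: it only mirrors the rule stated in the text.
    """
    if environ is None:
        environ = os.environ
    if environ.get("OBJSPACE") == STD_OBJSPACE:
        return "std"
    # Unset, empty or misspelled values all fall back to the trivial space.
    return "trivial"
```

On a Unix shell this would presumably be driven by something like
'OBJSPACE=pypy.objspace.std.StdObjSpace python pypy/testall.py'.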

pypy/testwice.py is a hack around the previous one to run all the tests in 
both object spaces.

The individual .../test/test_xxx.py files are still runnable. There is a
testsupport.py file in every test directory for gluing purposes; running it
directly should execute all the tests in that directory.

> - a recap of the current StdObjsSpace registration/multimethod
>   mechanisms.

The W_XxxObject classes of the standard object space are, precisely,
*implementations* of objects. This is not the same as the *type* of the
object. It is possible to provide several implementations for the same type
(the user should not see the difference, though). Typical uses of this would
be to hide the int/long distinction from the user altogether, or more
interestingly to provide more efficient versions of data structures like
string, list or dict when they become large.

For example, a complex string implementation can allow constant-time
concatenation. (This would allow algorithms that build a long string to use +=
instead of the less readable trick of storing small strings in a list and
calling ''.join(list) at the end.)  But the simple strings of CPython are much
better for the typical short strings, hence we really need both.
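
A toy sketch of the idea, with two implementations of the same type 'str'. The
class names W_SimpleString and W_ConcatString and the flatten() helper are
invented for this example and are not classes from the source tree:

```python
class W_SimpleString:
    """Flat implementation: cheap and compact for short strings."""
    typename = "str"

    def __init__(self, value):
        self.value = value

    def flatten(self):
        return self.value


class W_ConcatString:
    """Lazy concatenation node: joining two strings takes constant time."""
    typename = "str"

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def flatten(self):
        # Walk the tree iteratively, left to right, and join once at the end.
        parts = []
        stack = [self]
        while stack:
            node = stack.pop()
            if isinstance(node, W_ConcatString):
                stack.append(node.right)  # pushed first, popped last
                stack.append(node.left)
            else:
                parts.append(node.value)
        return "".join(parts)


def str_add(w_left, w_right):
    # O(1): no characters are copied until someone needs the flat value.
    return W_ConcatString(w_left, w_right)
```

Both classes report the same typename, so user code would only ever see 'str';
which implementation is in use stays an internal detail.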

The register() method of a multimethod registers a function that accepts
arguments with a particular *implementation*, and not a particular *type*.  
This is clear in your type_repr():

>     def type_repr(space, w_obj):
>         return space.wrap("<type '%s'>" % w_obj.typename)

This is the repr() for W_TypeObjects, i.e. for types that are implemented as
a W_TypeObject instance. The W_TypeObject class defines the 'typename'
attribute, so you can read it there. If we had a different implementation for
types, the above type_repr() would not apply to it.
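
To illustrate dispatch on the implementation class, here is a deliberately
simplified stand-in for a multimethod. The MultiMethod and ToySpace classes
below are assumptions written for this example (the real ones in objspace/std
are more elaborate), and W_OtherTypeImpl is a hypothetical second
implementation of types:

```python
class W_Object:
    pass


class W_TypeObject(W_Object):
    def __init__(self, typename):
        self.typename = typename


class W_OtherTypeImpl(W_Object):
    """Hypothetical other implementation of types, without 'typename'."""
    def __init__(self, name):
        self.name = name


class MultiMethod:
    """Toy multimethod keyed on the exact implementation classes."""
    def __init__(self, name):
        self.name = name
        self.table = {}

    def register(self, func, *impl_classes):
        self.table[impl_classes] = func

    def __call__(self, space, *w_args):
        key = tuple(type(w_arg) for w_arg in w_args)
        try:
            func = self.table[key]
        except KeyError:
            raise TypeError("%s: no implementation for %r" % (self.name, key))
        return func(space, *w_args)


class ToySpace:
    def wrap(self, value):
        return value  # toy: wrapped objects are the values themselves


def type_repr(space, w_obj):
    return space.wrap("<type '%s'>" % w_obj.typename)


repr_mm = MultiMethod("repr")
# Registered for the W_TypeObject implementation, not for "the type type".
repr_mm.register(type_repr, W_TypeObject)
```

Calling repr_mm on a W_TypeObject reaches type_repr(); calling it on a
W_OtherTypeImpl finds nothing registered and fails, even though both would
represent types.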

There is no way to dispatch a multimethod based on the type of an argument,
but so far it seems that in most cases you really want a specific
implementation anyway. What we may need in some places is type checking (not
dispatching), to enforce the type of an object independently of any particular
implementation. BTW the same is true for the built-ins that we implement in
Python; right now they do no type checking (they might fail with a TypeError
in the middle of the code, just like any Python function receiving ill-typed
arguments); this is unlike CPython's built-ins, which are very strict.

Let's come back to multimethods. All operators are multimethods, but there are
also type-specific multimethods (like list.append()) defined in the W_XxxType
class. This allows the type to provide several implementations of the method
(typically, if there are several list implementations, each one must define
its own implementation of append()).

As an exception to what I just said, the type of an argument of a multimethod
is actually used for the dispatch in the case where the user refers to this
multimethod as a (bound or unbound) Python method. For example, when we write
'int.__add__(x, y)', the first argument must be an int object (by Python's
unbound methods rule) and so the multimethod only dispatches to the
implementations that have as a first argument some W_XxxObject instance
implementing an int (or a subclass of int). Similarly, when we use
'list.append(lst, obj)' we create an unbound method which corresponds to the
'list_append' multimethod, with the additional condition that the first
(implicit self) argument is constrained to be of type list. The more common
idiom 'lst.append(obj)' does the same, with a bound method instead. (BTW, I
think there is no support for unbound methods in instmethobject.py right now.)
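
A minimal sketch of that type constraint, assuming each implementation exposes
a 'typename' attribute. The helpers make_unbound_method() and
make_bound_method() are hypothetical, a plain function stands in for the
multimethod, and subclasses are ignored for brevity:

```python
class W_ListObject:
    typename = "list"

    def __init__(self, items):
        self.items = list(items)


class W_IntObject:
    typename = "int"

    def __init__(self, intval):
        self.intval = intval


def list_append_impl(w_list, w_item):
    # Stand-in for the 'list_append' multimethod.
    w_list.items.append(w_item)


def make_unbound_method(typename, multimethod):
    """Wrap a multimethod so that 'self' must be of the given type."""
    def unbound(w_self, *w_args):
        # Type check (not dispatch): any implementation of the type passes.
        if w_self.typename != typename:
            raise TypeError("first argument must be of type %s, not %s"
                            % (typename, w_self.typename))
        return multimethod(w_self, *w_args)
    return unbound


def make_bound_method(w_self, unbound):
    """Fix the first argument, as 'lst.append' does."""
    def bound(*w_args):
        return unbound(w_self, *w_args)
    return bound
```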

In the arguments to register(), the name W_ANY (a synonym for W_Object) means
that anything at this position is fine. This can also be used to write default
implementations that will be called if the multimethod cannot find a more
specific implementation (or if the more specific implementation raises a
FailedToImplement exception). There are examples of that in the file
default.py. Another example would be to provide default type-specific method
implementations that would work for any implementation of the type, which
would be used unless a given implementation provides a more efficient version.
For example:

def any_list_extend(space, w_list, w_otherlist):
    return space.inplace_add(w_list, w_otherlist)

W_ListType.list_extend.register(any_list_extend, W_ANY, W_ANY)

Note that the first argument is not W_ListObject. It still does not mean that
all objects in the world will automatically get an extend() method that
defaults to +=, because the only way to access the list_extend multimethod is
via a bound or unbound method of the list type, restricting its usage to
objects of type list. (The four lines above should be put in listtype.py, not
in listobject.py.)
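
The fallback behaviour of W_ANY and FailedToImplement can be sketched with a
toy multimethod. Everything below is an assumption made for illustration: the
MultiMethod class, the crude "fewest W_ANY positions first" ordering, and the
overflow condition in int_add() are not taken from the source tree, and the
real dispatch order may well differ:

```python
class FailedToImplement(Exception):
    """Raised by an implementation to let a more general one try."""


class W_Object:
    pass


W_ANY = W_Object  # anything is acceptable at this position


class W_IntObject(W_Object):
    def __init__(self, value):
        self.value = value


class W_FloatObject(W_Object):
    def __init__(self, value):
        self.value = value


class MultiMethod:
    def __init__(self):
        self.entries = []

    def register(self, func, *classes):
        self.entries.append((classes, func))

    def __call__(self, *w_args):
        candidates = [
            (classes, func) for classes, func in self.entries
            if len(classes) == len(w_args)
            and all(isinstance(w, c) for w, c in zip(w_args, classes))
        ]
        # Crude specificity rule: fewer W_ANY positions means more specific.
        candidates.sort(key=lambda entry: sum(c is W_ANY for c in entry[0]))
        for classes, func in candidates:
            try:
                return func(*w_args)
            except FailedToImplement:
                continue  # fall through to the next, more general candidate
        raise TypeError("no implementation found")


MAXINT = 2 ** 31 - 1


def int_add(w_a, w_b):
    result = w_a.value + w_b.value
    if not -MAXINT - 1 <= result <= MAXINT:
        raise FailedToImplement  # too big for this implementation
    return ("int_add", result)


def any_add(w_a, w_b):
    return ("any_add", w_a.value + w_b.value)


add = MultiMethod()
add.register(any_add, W_ANY, W_ANY)
add.register(int_add, W_IntObject, W_IntObject)
```

With these registrations, adding two small ints uses int_add(), while an
overflowing sum or a float operand ends up in any_add().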

The getattr multimethod is a bad example that needs to be fixed at some point:
we currently dispatch on the second argument ('w_attr') being W_ANY, but what
we almost always want is a string, and one that we can immediately read and
compare with other strings, i.e. a W_StringObject and nothing else.

The whole thing still has rough edges. I feel that the inheritance hierarchy
among the classes in objspace/std is confusing (it does not correspond in any
way to the inheritance of Python types). I would also like to investigate the
possibility of dissociating the "container overhead" (the ob_type and
ob_refcnt fields in CPython) from the actual object implementation (say,
ob_ival for integers). I am not too sure, but I feel that it would allow us to
do some clean-up and could result in interesting features, like an
implementation of "lists of similar objects" that could simply pack the
objects' contents in a big array without the ob_type+ob_refcnt overhead
(e.g. packing a list of int objects results in a list of structures containing
a single ob_ival field, i.e. an efficient packed array of ints). Yes, I'm
dreaming about obsoleting the 'array' module as well :-)
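
Purely as an illustration of the "lists of similar objects" idea, here is a
sketch of a list implementation that stores only the raw integer values and
re-wraps them on access. The class W_PackedIntList and its method names are
hypothetical, and (somewhat ironically) the standard 'array' module is used as
a stand-in for the packed storage:

```python
from array import array


class W_IntObject:
    def __init__(self, intval):
        self.intval = intval


class W_PackedIntList:
    """Hypothetical list implementation holding only ints, unboxed."""
    typename = "list"

    def __init__(self):
        # One C long per element; no per-item object header is stored.
        self._ivals = array("l")

    def append(self, w_int):
        self._ivals.append(w_int.intval)

    def getitem(self, index):
        # Re-create the wrapper object only when an item is read.
        return W_IntObject(self._ivals[index])

    def length(self):
        return len(self._ivals)
```

A real version would also need a way to switch to a general list
implementation as soon as a non-int object is appended.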


> - any "entry points" other than interactive.py?

There is main.py in the same directory. Its purpose is that 'python main.py
script-and-options' should be the same as 'pypy script-and-options' if we had
a working 'pypy' program. I guess that main.py should invoke interactive.py
when started with no argument (it doesn't right now).

> - is loading of python-modules of the underlying python version
>   supported? 

Yes, although there are few modules from the standard library that we can
actually import. Also, site.py is not automatically loaded, which I guess is
the reason why (our application-level) sys.path does not contain the current
directory, but you can add it (importing sys and manipulating a list both work
fine). We also have a problem with exception handling: when application-level
code raises an exception, PyPy tries to import (at application-level again)
the types module, which fails and raises another exception, which only
confuses things more. I think Michael has done something about it; if not,
this particular point should be fixed by removing the dependencies on the
types module in opcode_app.py. (The necessary type objects could be obtained
differently; at least for now I'm fine with the idea of simply pushing them
into opcode_app.py after we load it.)

> Is the distinction especially between 'W_TypeObject' and 'W_TypeType' 
> really "right"? I thought that these two sort of fall together. 

No: the instances of W_TypeObject are types. There are a lot of them. But the 
single instance of W_TypeType is 'type'.
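
The same relationship can be observed at application level in plain CPython,
which may help as an analogy: there are many type objects (like the many
W_TypeObject instances), but only one 'type' (like the single W_TypeType
instance). The variable names below are just for this example:

```python
builtin_types = [int, str, list, dict]

# Every one of these is a type object, and they are all distinct objects.
all_are_types = all(isinstance(t, type) for t in builtin_types)
all_distinct = len(set(builtin_types)) == len(builtin_types)

# But they all share one and the same type: 'type' itself.
type_of_types = {type(t) for t in builtin_types}

# And 'type' is its own type.
type_is_its_own_type = type(type) is type
```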

> I also noticed that string representations aka 
> 
>     "with'mixed'quotes" 
> 
> always result in 
> 
>     'with'mixed'quotes'
> 
> but i wasn't sure how to fix this. Any hints? 

That's in stringobject.py. str_repr() does no quoting at all, really. Here too
Michael has already replaced it with another implementation, equally bad but
with the advantage of producing the correct result. The correct thing to do is
to port the quoting algorithm from CPython's stringobject.c.
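
For reference, the quoting rule in question can be sketched at Python level as
follows. This is an approximation written from memory of CPython's behaviour,
not a port of the C code; it works on plain Python strings instead of wrapped
objects and assumes byte-like characters (code points below 256):

```python
def str_repr(s):
    """Approximate CPython's repr() of a byte-like string."""
    # Prefer single quotes; switch to double quotes only if the string
    # contains a single quote and no double quote.
    quote = "'"
    if "'" in s and '"' not in s:
        quote = '"'
    out = [quote]
    for ch in s:
        code = ord(ch)
        if ch == quote or ch == "\\":
            out.append("\\" + ch)
        elif ch == "\t":
            out.append("\\t")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif code < 0x20 or code >= 0x7f:
            # Non-printable or non-ASCII: hexadecimal escape.
            out.append("\\x%02x" % code)
        else:
            out.append(ch)
    out.append(quote)
    return "".join(out)
```

With this rule the string with'mixed'quotes is wrapped in double quotes, so
its inner single quotes need no escaping.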


A bientôt,

Armin

