[Python-Dev] Re: opcode performance measurements

Sat, 2 Feb 2002 03:25:05 +0100

From: Jeremy Hylton <jeremy@zope.com>
> >>>>> "SP" == Samuele Pedroni <pedronis@bluewin.ch> writes:
...
>   SP> one can try to guess the slots of a class looking for the
>   SP> "self.attr" pattern at compile time in a more or less clever
>   SP> way.  The set of compile-time guessed attrs will be passed to
>   SP> MAKE_CLASS which will construct the runtime guess using the
>   SP> union of the super-classes guesses and the compile time guess
>   SP> for the class.  This information can be used to layout a dlict.
> 
> Right!  There's another step necessary to take advantage though.  When
> you execute a method you don't know the receiver type
> (self.__class__).  So you need to specialize the bytecode to a
> particular receiver the first time the method is called.  Since this
> could be relatively expensive and you don't know how often the method
> will be executed, you need to decide dynamically when to do it.  Just
> like HotSpot.

Right, because with multiple inheritance you cannot make the layout
of a subclass compatible with that of *all* superclasses, so simple
monomorphic inline caches will not work :(.
OTOH you can use polymorphic inline caches, that is, a bunch of
class->index lines for each bytecode. Or you do not specialize the
bytecode but (insane idea) choose on method entry a different
bunch of cache-lines based on the class of self.
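
To make the class->index lines concrete, here is a rough sketch of
one polymorphic inline cache for a single attribute-access site.
Everything in it is hypothetical: the PolymorphicInlineCache name,
the max_lines limit, and the layouts table (class -> ordered list
of attribute names) that stands in for whatever the dlict layout
would provide. It is not how the interpreter does it.

```python
class PolymorphicInlineCache:
    """Cache for one attribute-access site: a few (class, index) lines."""

    def __init__(self, max_lines=4):
        self.max_lines = max_lines  # hypothetical limit on cache lines
        self.lines = []             # list of (class, slot index) pairs
        self.hits = 0
        self.misses = 0

    def lookup(self, cls, attr, layouts):
        # Fast path: scan the few cached lines for the receiver class.
        for cached_cls, index in self.lines:
            if cached_cls is cls:
                self.hits += 1
                return index
        # Slow path: consult the (assumed) per-class layout table.
        self.misses += 1
        index = layouts[cls].index(attr)
        if len(self.lines) < self.max_lines:
            self.lines.append((cls, index))
        return index


class A: pass
class B: pass
class C(A, B): pass

# Assumed layouts: "y" sits at slot 0 in B but at slot 1 in C, so a
# single monomorphic line cannot serve both receivers.
layouts = {A: ["x"], B: ["y"], C: ["x", "y"]}

site = PolymorphicInlineCache()
first = [site.lookup(cls, "y", layouts) for cls in (B, C)]
second = [site.lookup(cls, "y", layouts) for cls in (B, C)]
```

The same site ends up with one line per receiver class it has seen,
which is the point of going polymorphic under multiple inheritance.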

> We probably have to worry about a class or instance being modified in
> a way that invalidates the dlict offsets computed.  (Not sure here,
> but I think that's the case.)  If so, we probably need a different
> object -- call it a template -- that represents the concrete layout
> and is tied to unmodified concrete class.  When objects or classes are
> modified in dangerous ways, we'd need to invalidate the template
> pointer for the affected instances.

This would be similar to the Self VM map concept (although Python
is type/class based, because of the very dynamic nature of its
instances it has problems similar to those of prototype-based
languages).

I don't know if we need that or if it can be implemented effectively;
I considered that too during my brainstorming.

AFAIK caching/memoization plays an important role in all
high-performance dynamic object language implementations.
Abstractly it seems
effective for Python too, but it is unclear if the complexity
of the internal models will render it ineffective.

With caching you can probably simply timestamp classes: when a
class is changed structurally you increment its timestamp and that
of all direct and indirect subclasses; you don't touch instances.
Then you compare the cached timestamp with that of the instance's
class to check if the entry is valid.
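
A minimal sketch of that timestamp check, using a toy class model
rather than real Python classes. VersionedClass, CacheEntry and
cached_lookup are made-up names for illustration, and find() is a
naive depth-first search, not the real lookup order.

```python
class VersionedClass:
    """Toy class object carrying a structural timestamp."""

    def __init__(self, name, bases=()):
        self.name = name
        self.bases = tuple(bases)
        self.subclasses = []
        self.attrs = {}
        self.timestamp = 0
        for base in self.bases:
            base.subclasses.append(self)

    def set_attr(self, name, value):
        self.attrs[name] = value
        self._bump()

    def _bump(self):
        # Increment this class and all direct and indirect subclasses,
        # each exactly once even with diamond inheritance.
        seen, stack = set(), [self]
        while stack:
            cls = stack.pop()
            if id(cls) in seen:
                continue
            seen.add(id(cls))
            cls.timestamp += 1
            stack.extend(cls.subclasses)

    def find(self, name):
        # Naive depth-first lookup (assumption, not the real MRO).
        if name in self.attrs:
            return self.attrs[name]
        for base in self.bases:
            try:
                return base.find(name)
            except AttributeError:
                pass
        raise AttributeError(name)


class CacheEntry:
    def __init__(self):
        self.cls = None
        self.stamp = -1
        self.value = None


def cached_lookup(entry, cls, name):
    """Return (value, was_hit); refill the entry on a miss."""
    if entry.cls is cls and entry.stamp == cls.timestamp:
        return entry.value, True
    value = cls.find(name)
    entry.cls, entry.stamp, entry.value = cls, cls.timestamp, value
    return value, False


base = VersionedClass("Base")
derived = VersionedClass("Derived", (base,))
base.set_attr("f", 1)

entry = CacheEntry()
r1 = cached_lookup(entry, derived, "f")   # miss, fills the entry
r2 = cached_lookup(entry, derived, "f")   # hit, timestamps match
base.set_attr("f", 2)                     # bumps Base and Derived
r3 = cached_lookup(entry, derived, "f")   # stale entry, miss again
```

Instances are never touched here; only the class-side counters move.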

The tricky part is that in Python an instance attribute that
shadows a class attribute can be added at any point. I don't know
if there are open issues, but an approach would be in that case to
increment the timestamp of the instance's class too.
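
One way the shadowing case could be sketched, again on a toy model.
Klass, Instance, get_attr and the per-class "shadowed" set are all
hypothetical. Refusing to re-cache a name once some instance
shadows it is an extra assumption added here so that the fast path
can skip the instance dict; the sketch also ignores class
attributes added after an instance already holds the same name.

```python
class Klass:
    """Toy class: attributes, a timestamp, and names known to be shadowed."""

    def __init__(self, **attrs):
        self.attrs = dict(attrs)
        self.timestamp = 0
        self.shadowed = set()  # assumption: names hidden by some instance


class Instance:
    def __init__(self, klass):
        self.klass = klass
        self.fields = {}

    def set_field(self, name, value):
        # First time any instance shadows a class attribute: bump the
        # class timestamp so cached entries for that class go stale.
        if name in self.klass.attrs and name not in self.klass.shadowed:
            self.klass.shadowed.add(name)
            self.klass.timestamp += 1
        self.fields[name] = value


def get_attr(cache, obj, name):
    """cache maps name -> (klass, timestamp, value) for one call site."""
    klass = obj.klass
    entry = cache.get(name)
    if entry is not None and entry[0] is klass and entry[1] == klass.timestamp:
        return entry[2]  # fast path: no instance dict probe
    if name in obj.fields:
        return obj.fields[name]
    value = klass.attrs[name]
    if name not in klass.shadowed:
        cache[name] = (klass, klass.timestamp, value)
    return value


k = Klass(color="red")
a, b = Instance(k), Instance(k)
cache = {}

before = get_attr(cache, a, "color")   # cached from the class
b.set_field("color", "blue")           # shadows it, bumps k.timestamp
after_b = get_attr(cache, b, "color")  # sees the instance value
after_a = get_attr(cache, a, "color")  # still the class value
```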

The problem is that there are so many cases and situations;
that's why the multi-staged cache-lines approach in theory makes
some sense but could anyway be totally ineffective in practice
<wink>.

These are all interesting topics, although between these
more or less informal discussions and results there are
a lot of details and code :(.

But already improving the global lookup thing
would be a good step.

Hope this makes some kind of sense. Samuele.