[pypy-dev] Brainstorming at Python UK

holger krekel hpk at trillke.net
Wed Apr 28 11:45:08 CEST 2004


Hi Armin,

[Armin Rigo Tue, Apr 27, 2004 at 10:15:50PM +0100]
> Here are, as promised, a few words about the "brainstorming" at Python UK about
> the annotation stuff.
> 
> "Annotations" were the means through which typing information about specific
> variables was stored.  We'd like to replace them with explicit "abstract
> objects", which would be instances of simple classes like:

Maybe it makes sense to talk about low-level information/representation objects 
rather than "abstract objects" which isn't a very expressive term IMO. 

> class SomeObject:
>   "Nothing is known about this value."
> 
> class SomeInteger:
>   def __init__(self, start=-sys.maxint-1, end=sys.maxint+1):
>     self.start = start
>     self.end   = end
> 
> class SomeList:
>   def __init__(self, anyitem):
>     self.anyitem = anyitem     # another SomeXxx instance
> 
> So the "type inference" phase of the translation would take a control flow
> graph as input, as it currently does; then its goal is to associate one
> SomeXxx instance to each variable.  This information can then be used by the
> code generator (Pyrex or Lisp).

Makes sense to me ... 
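To make this concrete for myself, here is a rough sketch of such immutable value
classes plus a "unionof" operation that the inference engine would presumably
apply at merge points.  All names and the exact semantics are my guesses, not
actual code, and i use sys.maxsize where your sketch has the Python 2 sys.maxint:

```python
import sys

class SomeObject:
    "Nothing is known about this value."

class SomeInteger(SomeObject):
    "An integer known to lie in the half-open range [start, end)."
    def __init__(self, start=-sys.maxsize - 1, end=sys.maxsize + 1):
        self.start = start
        self.end = end

def unionof(a, b):
    "Return a SomeXxx general enough to cover both inputs (guessed name)."
    if isinstance(a, SomeInteger) and isinstance(b, SomeInteger):
        # the union of two ranges is a new, wider range -- never a
        # mutation of an existing SomeInteger
        return SomeInteger(min(a.start, b.start), max(a.end, b.end))
    return SomeObject()

s = unionof(SomeInteger(0, 10), SomeInteger(-10, 5))
```

The point being that unionof always builds a fresh instance, so every SomeXxx
a variable ever got associated with stays valid as a historical fact.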

> A new idea inspired by Starkiller, which is an important difference with the
> old way annotations worked, is that all the SomeXxx instances are always
> immutable.  For example, SomeInteger(0,10) is and will always mean an integer
> in range(0,10).  If later a more general value is found that could arrive into
> the same variable, then the variable gets associated to a new, more general
> instance like SomeInteger(-10,10).  This change triggers a recomputation of
> the type inference, and the new SomeInteger(-10,10) will eventually propagate
> forward to wherever the variable is used.

Actually we have triggered re-computation in previous schemes, too. Keeping
the low level representation objects "virtually immutable" seems like a good
simplifying idea ... if it works ... 
 
> Here is an example:
> 
>   Block(v1,v2,v3):
>     v4 = v1+v2
>     v5 = v4+v3
>     jump to Block2 with arg (v5)
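If i read the example right, for the integer case this amounts to plain
interval arithmetic on the abstract values -- here is how i picture one pass
over that block (again just an illustrative sketch, not your code):

```python
class SomeInteger:
    "An integer known to lie in the half-open range [start, end)."
    def __init__(self, start, end):
        self.start, self.end = start, end

def add(a, b):
    # abstract '+': the largest result is (a.end-1) + (b.end-1),
    # so the new exclusive bound is a.end + b.end - 1
    return SomeInteger(a.start + b.start, a.end + b.end - 1)

# Block(v1, v2, v3):  v4 = v1+v2;  v5 = v4+v3;  jump to Block2 with arg (v5)
v1 = SomeInteger(0, 10)
v2 = SomeInteger(0, 10)
v3 = SomeInteger(0, 10)
v4 = add(v1, v2)    # an integer in range(0, 19)
v5 = add(v4, v3)    # an integer in range(0, 28), propagated to Block2
```

If a more general SomeInteger later arrives for v1, the whole block is simply
re-run with the wider input and fresh v4/v5 values.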

Ok, i can see this working for variables containing "naturally"
immutable objects like integers, floats and strings.  But how does the
example apply to building a list in a loop?  I am a bit doubtful about a
"virtually immutable" SomeList object unless you intend to use a low-level
representation like e.g.:

    class SomeGrowingList:
        def __init__(self, someitem):
            self.r_items = someitem     # a SomeXxx instance general enough to
                                        # hold all possible items of the list
            self.r_indexes = SomeInteger(0, sys.maxint-1)

Is something like this the underlying idea to allow "virtually
immutable" low level representation objects of
would-otherwise-be-mutable objects? 
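Spelled out, what i imagine is that appending an item the current
representation cannot hold would produce a *new*, more general
SomeGrowingList rather than mutating the old one -- a sketch (names are
mine; sys.maxsize stands in for sys.maxint):

```python
import sys

class SomeInteger:
    "An integer known to lie in the half-open range [start, end)."
    def __init__(self, start, end):
        self.start, self.end = start, end

class SomeGrowingList:
    """Immutable low-level representation of a growing list: it records
    only a SomeXxx general enough for every item ever appended."""
    def __init__(self, r_items):
        self.r_items = r_items
        self.r_indexes = SomeInteger(0, sys.maxsize)

def generalize(lst, newitem):
    "Return a NEW list representation wide enough for newitem as well."
    joined = SomeInteger(min(lst.r_items.start, newitem.start),
                         max(lst.r_items.end, newitem.end))
    return SomeGrowingList(joined)

l1 = SomeGrowingList(SomeInteger(0, 10))
l2 = generalize(l1, SomeInteger(-5, 3))   # would trigger re-computation
```

So the mutability of the list at run time is factored out, and only the
(immutable) summary of its contents lives in the inference.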

Note that i guess marking low level representations with 'r_' or
some such might make sense.  This is similar to what we do for
app-level representation objects with 'w_' to indicate they are
'wrapped' and only an object space knows how to operate on them.

> ...
> Having immutable SomeXxx instances even for mutable objects is very useful for
> the object-oriented part of the analysis.  Say that a SomeInstance would
> represent a Python instance, in the old-style way: a __class__ and some
> attributes.  The SomeInstance is created at the point in the flow graph that
> creates the instance:
> 
>    v2 = simple_call(<class X>)
> 
> This would register somewhere that the class X can be instantiated from this
> point in the flow graph.  It would then create and store into v2 a
> SomeInstance of class X, which initially has no known attribute. If an
> attribute is added later on v2, then the SomeInstance detects it is not
> general enough.  It kills the inference process; it records the existence of
> the new attribute in some data structure on the class X; 

It should probably store it in the SomeInstance instance associated with
class X, right?
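For my own understanding, here is how i picture the kill-and-record step you
describe -- all names here (BlockedInference, ClassDef) are placeholders i
made up, not anything from our tree:

```python
class BlockedInference(Exception):
    "Raised to kill the current inference pass when new info turns up."

class ClassDef:
    "Per-class record of every attribute ever seen on instances of X."
    def __init__(self):
        self.attrs = set()

class SomeInstance:
    "Immutable abstract value: an instance of the class behind classdef."
    def __init__(self, classdef):
        self.classdef = classdef

def setattr_abstract(some_inst, name):
    if name not in some_inst.classdef.attrs:
        some_inst.classdef.attrs.add(name)   # record the new attribute once
        raise BlockedInference(name)         # ...and kill this pass

xdef = ClassDef()
v2 = SomeInstance(xdef)          # v2 = simple_call(<class X>)
try:
    setattr_abstract(v2, "foo")  # first time: recorded, inference killed
except BlockedInference:
    pass
# on the restarted pass, 'foo' is already known and nothing blocks
```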

> ...
> It seems that we will end up cancelling and restarting large parts of the
> analysis over and over again this way, but it may not be a big problem in
> practice: we can expect that a lot of information about the attributes will be
> known already after the __init__ call completed.  We may stop and re-analyse
> __init__ itself 7 times consecutively if it defines 7 attributes on self, each
> time progressing a bit further until the next new attribute kills us, but that
> shouldn't be a big problem because it is fairly local (but we'll have to try
> to analyse bigger programs to really know).  This is much cleaner this way
> than with the annotation stuff.
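The "re-analyse __init__ 7 times" pattern, as i understand it, is just a
restart loop around the blocking analysis -- a toy sketch with invented
names, only to convince myself the restart count stays bounded:

```python
class BlockedInference(Exception):
    pass

known_attrs = set()

def analyse_init(attrs_defined):
    "Abstractly 'run' __init__; block at the first attribute not yet known."
    for name in attrs_defined:
        if name not in known_attrs:
            known_attrs.add(name)      # progress is recorded before blocking
            raise BlockedInference(name)

attrs = ["a%d" % i for i in range(7)]  # __init__ defines 7 attributes
restarts = 0
while True:
    try:
        analyse_init(attrs)
        break                          # a full pass got through: done
    except BlockedInference:
        restarts += 1                  # each new attribute kills one pass
```

Each restart makes strictly monotonic progress, so the loop terminates after
one restart per newly discovered attribute.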

Yes, i agree that it seems so.  But we have had schemes come and fail
before, so i am eager to see how this one works for the problem cases
(lists, instances, ...) we can identify.  Also, how to represent
exceptions at a lower level and then translate them to the target
language is not clear yet.

cheers,

    holger

