[pypy-svn] rev 2691 - pypy/trunk/doc/translation

Sat Dec 27 13:02:44 CET 2003

Author: arigo
Date: Sat Dec 27 13:02:42 2003
New Revision: 2691

Modified:
   pypy/trunk/doc/translation/annotation.txt
Log:
A few words about the current annotations implementation.


Modified: pypy/trunk/doc/translation/annotation.txt
==============================================================================

--- pypy/trunk/doc/translation/annotation.txt	(original)
+++ pypy/trunk/doc/translation/annotation.txt	Sat Dec 27 13:02:42 2003
@@ -1,13 +1,15 @@
 The annotation pass
 ===================
 
-Let's assume that the control flow graph building pass can be
-done entierely before the annotation pass.  (See notes at the
-end for why we'd like to mix them.)
+We describe below how a control flow graph can be "annotated"
+to discover the types of the objects.  This annotation pass is
+done after control flow graphs are built by the FlowObjSpace,
+but before these graphs are translated into low-level code
+(e.g. C/Lisp/Pyrex).
 
 
-Factorial
----------
+An example: the factorial
+-------------------------
 
 Say we want to make the control flow graph and type inference
 on the following program::
@@ -44,23 +46,23 @@
 We start type inference on the first block::
 
   Analyse(StartBlock):
-    v1 ----> X1   type(X1)=int
-    v2 ----> X2   type(X2)=bool
+    v1 ----> SV1   type(SV1)=int
+    v2 ----> SV2   type(SV2)=bool
 
 The notation is as follows.  Everything at the right of the arrows
 lives in a "big heap" of objects and annotations; an object is like a
 CPython ``PyObject`` object structure in the heap, although it can
-here be unknown (X1, X2, X3...).  Annotations give some information
+here be unknown (SV1, SV2, SV3...).  Annotations give some information
 about the unknown heap objects.  The arrows represent binding from
-variables to objects.
+variables to objects.  ``SV`` means ``SomeValue``.
 
 After StartBlock, we proceed to the type inference of its exits;
 first Block2::
 
   Analyse(Block2):
-    v3 ------------> X1   # copied from StartBlock
-    v4 ------------> X4   type(X4)=int
-    v7 ------------> failure
+    v3 ------------> SV1   # copied from StartBlock
+    v4 ------------> SV4   type(SV4)=int
+    v7 ------------> impossiblevalue
 
 It fails at the simple_call to f, because we don't know yet anything
 about the return value of f.  We suspend the analysis of Block2 and
@@ -68,46 +70,47 @@
 from the StackBlock, which is jumping to Block3::
 
   Analyse(Block3):
-    v9 --------> 1    # and we have type(1)=int automatically
+    v9 --------> SV31    # const(SV31)=1, type(SV31)=int
 
+The object SV31 is the constant object 1.
 Then we proceed to ReturnBlock::
 
   Analyse(ReturnBlock):
-    retval --------> 1
+    retval --------> SV31
 
 And we are done.  We can now try to resume the suspended analysis of
 Block2 -- in practice it is easier to just restart it::
 
   Analyse(Block2):
-    v3 ------------> X1   # copied from StartBlock
-    v4 ------------> X4   type(X4)=int
-    v7 ------------> 1    # because that's the retval of f
-    v8 ------------> X8   type(X8)=int  eq(X1,X8)=True  # because X8=X1*1
+    v3 ------------> SV1   # copied from StartBlock
+    v4 ------------> SV4   type(SV4)=int
+    v7 ------------> SV31  # because that's the retval of f
+    v8 ------------> SV8   type(SV8)=int
 
 And now is the second branch into ReturnBlock.  We must combine the
 annotations coming from the two branches::
 
   Intersection(ReturnBlock):
-    previous annotations for retval -------> 1    type(1)=int
-    new annotations for retval ------------> X8   type(X8)=int  eq(X8,X1)=True
-    intersection of both is retval --------> X10  type(X10)=int
+    previous annotations for retval ----> SV31 type(SV31)=int const(SV31)=1
+    new annotations for retval ---------> SV8  type(SV8)=int
+    intersection of both is retval -----> SV10 type(SV10)=int
 
 We invalidate the analysis of the blocks that depend on this new result,
 namely ReturnBlock, which in turn invalidates the analysis of Block2
 which depended on the return value.  Then we can restart it once more::
 
   Analyse(Block2):
-    v3 ------------> X1   # with type(X1)=int
-    v4 ------------> X4   type(X4)=int
-    v7 ------------> X10  # with type(X10)=int
-    v8 ------------> X11  type(X11)=int
+    v3 ------------> SV1   # with type(SV1)=int
+    v4 ------------> SV4   type(SV4)=int
+    v7 ------------> SV10  # with type(SV10)=int
+    v8 ------------> SV11  type(SV11)=int
 
 Again, we must redo the intersection of the two branches
 that enter ReturnBlock::
 
   Intersection(ReturnBlock):
-    previous annotations for retval -------> X10  type(X10)=int
-    new annotations for retval ------------> X11  type(X11)=int
+    previous annotations for retval -------> SV10  type(SV10)=int
+    new annotations for retval ------------> SV11  type(SV11)=int
     intersection doesn't change any more.
 
 Now the annotations are stable, and we are done.  In the final version
@@ -115,24 +118,24 @@
 in retval are properly annotated::
 
   Bindings:
-    v1 ------> X1
-    v2 ------> X2
-    v3 ------> X1
-    v4 ------> X4
-    v7 ------> X10
-    v8 ------> X11
-    v9 ------> 1
-    retval --> X10
+    v1 ------> SV1
+    v2 ------> SV2
+    v3 ------> SV1
+    v4 ------> SV4
+    v7 ------> SV10
+    v8 ------> SV11
+    v9 ------> SV31
+    retval --> SV10
 
   Annotations:
-    type(X1)=int
-    type(X2)=bool
-    type(X4)=int
-    type(X10)=int
-    type(X11)=int
+    type(SV1)=int
+    type(SV2)=bool
+    type(SV4)=int
+    type(SV10)=int
+    type(SV11)=int
 
 The bindings are implemented as a dictionary, and the annotations as
-an AnnSet instance.
+an AnnotationSet instance.  More about it below.
 
 
 Whole-program analysis
@@ -170,29 +173,55 @@
     v3 = simple_call(g, v1, 6)
 
   Analyse(F_StartBlock):
-    v1 -------> X1   type(X1)=list  len(X1)=0  getitem(X1,*)=?
-    v2 -------> crash
+    v1 -------> SV1  type(SV1)=list len(SV1)=0 listitems(SV1)=impossiblevalue
+    v2 -------> impossiblevalue
 
-The ``?`` is a special value meaning ``no analysis information``, and ``*`` is a special catch-all value.  The type analysis fails because of the calls to ``g``, but it triggers the analysis of ``g`` with the input arguments' annotations::
+The annotations about ``SV1`` mean that it is an empty list.  When trying
+to get any item out of it, the result is ``impossiblevalue``, because if we
+try to execute code like ``c=b[0]`` then obviously it is impossible for
+``c`` to contain any value.  It is important not to confuse the two extreme
+values: ``impossiblevalue`` means that no value can ever be found there
+during actual execution, and corresponds to a ``SVx`` about which all
+annotations are still possible.  During the annotation pass, annotations
+about a ``SVx`` can only *decrease*: we can later remove annotations that
+we find to be incorrect, but we don't add new annotations.  Thus an
+``impossiblevalue`` is a value with potentially all annotations at first.
+The other extreme is a ``SVx`` with no annotation at all; it represents
+an object about which we know nothing at all -- and about which nothing
+will be known later either: it means that we have inferred that many
+different objects could be found at that point during execution.
+Typically it shows a problem, e.g. that type inference failed to figure
+out what object type can appear at that point.
+
+Let's come back to the example.  The type analysis above fails at ``v2``
+because of the calls to ``g``, but it triggers the analysis of ``g`` with
+the input arguments' annotations::
 
   G_StartBlock(v4, v5):
     v6 = getattr(v4, 'append')
     v7 = simple_call(v6, v5)
 
   Analyse(G_StartBlock):
-    v4 -------> X1    # from the call above
-    v5 -------> 5     # from the call above
-    v6 -------> X6    im_self(X6)=X1  im_func(X6)=list_append
-    v7 -------> None  REMOVE{len(X1)=0}  getitem(X1,?)=5
-
-Note that the call to list_append corrects the annotations about ``X1``.
-This would invalidate any type inference that would depend on the modified
-annotations.  (Hopefully, we eventually reach a fixpoint; this could be
-enforced by requiring that we can only either remove annotations or give
-a value to a ``?``.)
+    v4 -------> SV1   # from the call above
+    v5 -------> SV35  const(SV35)=5  type(SV35)=int
+    v6 -------> SV6   im_self(SV6)=SV1  im_func(SV6)=list_append
+    v7 -------> SV30  const(SV30)=None
+
+And most importantly the call to list_append corrects the annotations about
+``SV1``.  The annotation ``len(SV1)=0`` is deleted, and ``listitems(SV1)``
+is generalized from ``impossiblevalue`` to ``SV35`` -- this is done by
+the same intersection process as above: we already know that
+``listitems(SV1)`` can be ``impossiblevalue``, and now we figure out that
+it could also be ``SV35``, so we take the intersection of the annotations
+that apply to both ``impossiblevalue`` and ``SV35``.  The result in this
+case is just ``SV35``.
+
+Note that killing annotations like ``len(SV1)=0`` invalidates the inference
+in any block that explicitly depends on it.  Such blocks are marked as
+"to be redone".  (There isn't any such block in the present example.)
 
 Only after this can the analysis of ``F_StartBlock`` proceed, and
-now we know that v1 points to the list ``X1`` with the correct annotations:
+now we know that v1 points to the list ``SV1`` with the correct annotations:
 unknown length, all items are ``5``.
 
 In the above example I also show a second call to ``g(b, 6)``, which
@@ -200,23 +229,23 @@
 previously thought to be used with ``5`` only::
 
   Intersection(G_StartBlock):
-    previous annotations for v5 -------> 5    type(5)=int
-    new annotations for v5 ------------> 6    type(6)=int
-    intersection of both is v5 --------> X5   type(X5)=int
+    previous annotations for v5 -----> SV35 const(SV35)=5 type(SV35)=int
+    new annotations for v5 ----------> SV36 const(SV36)=6 type(SV36)=int
+    intersection of both is v5 ------> SV5                type(SV5)=int
 
-And so this time the list ``X1`` is updated with::
+And so this time the list ``SV1`` is updated with::
 
-    getitem(X1,*)=X5
+    listitems(SV1)=SV5
 
 and now we know that we have a list of integers.
 
-Note that during this whole process the same list is represented by ``X1``.
+Note that during this whole process the same list is represented by ``SV1``.
 This is important, so that any code anywhere that could modify the list
 can kill invalid annotations about it.  Intersection must be clever about
 mutable objects: we have seen above an example where ``retval`` could map
-to ``X10`` or ``X11``, and the intersection said it was fine because they
-had the same annotations.  It would not be fine if ``X10`` and ``X11``
-could be of a mutable type.  In this case we must force ``X10==X11`` for
+to ``SV10`` or ``SV11``, and the intersection said it was fine because they
+had the same annotations.  It would not be fine if ``SV10`` and ``SV11``
+could be of a mutable type.  In this case we must force ``SV10==SV11`` for
 the whole program.  In other words the representation chosen for a list
 depends on all the places where this list could go, and these places
 themselves use a representation that depends on all the lists that could
@@ -227,6 +256,8 @@
 Polymorphism and mixed flowing/inference
 ----------------------------------------
 
+(This paragraph is just an idea, it is not implemented.)
+
 We might eventually mix type inference and control flow generation a bit
 more than described above.  The annotations could influence the generation
 of the graph.
@@ -252,3 +283,84 @@
 insufficiently many annotations left.  By contrast, in the factorial
 example above, all merges are fine because they conserve at least the
 ``type(X)=int`` annotation.
+
+
+SomeValue
+---------
+
+An abstract Python object in the heap is represented by an
+instance of ``pypy.annotation.model.SomeValue``.  All these SomeValue()
+instances print as SV0, SV1, SV2... for debugging, but we only
+use their identity internally.
+
+A SomeValue() alone represents an object about which nothing
+is known.  To collect information about such an object we use an
+instance of ``pypy.annotation.annset.AnnotationSet``.  An
+annotation is like an attribute of a SomeValue(); for example, to
+say that an object SV5 is known to be an integer, then we set the
+annotation ``SV5.type = int``.  However, for various good and bad
+reasons, the annotation is not actually stored as an attribute,
+but managed by the AnnotationSet().  The allowed "attributes",
+i.e. the various annotations that exist, are in
+``pypy.annotation.model.ANN``.  Thus for the above example we
+set the annotation ``ANN.type`` of SV5 to ``int``.  This is what
+we wrote in the above examples as ``type(SV5)=int``.
+
+Note that unlike previous attempts an annotation is now always
+a "name" (ANN.type) with just two arguments: the subject (SV5) and
+the associated value (int).  Just like with attributes, there is
+never more than one associated value per subject and attribute name.
+But unlike attributes, all annotations have a default value:
+``mostgeneralvalue``, which is a SomeValue() about which nothing
+is known.  The only differences between the ``mostgeneralvalue``
+and a normal SomeValue() with no annotations are that AnnotationSet
+will complain if you try to set annotations to ``mostgeneralvalue``;
+and for convenience reasons ``mostgeneralvalue`` is false in a
+boolean context.
+
+
+AnnotationSet and ANN
+---------------------
+
+AnnotationSet has two main methods: ``get(name, subject)`` to
+read the current annotation ``name`` about ``subject``, and
+``set(name, subject, value)`` to set it.
+
+The meaning of ``value`` depends on the annotation.  In some
+cases it is a usual Python object (int, 3, True...).  In other
+cases it is specifically a SomeValue() instance, on which we
+can recursively have partial information only.  Here are
+a few common annotations:
+
+* ``ANN.type``: the ``value`` is the type (int, list, ...).
+
+* ``ANN.len``: the ``value`` is the length of the object
+  (if known and constant, of course).
+
+* ``ANN.const``: we know that the ``subject`` is a constant
+  object; the ``value`` of ``ANN.const`` is precisely this
+  constant.
+
+* ``ANN.listitems``: the ``value`` is another SomeValue()
+  which stands for any item of the list.  Thus the
+  annotations about this sub-SomeValue() tell what is known
+  in general about all the items in the list.
+
+* ``ANN.tupleitem[index]``: this is a family of annotations.
+  This is one of the reasons why we don't just use attributes
+  to store annotations: the whole expression ``ANN.tupleitem[0]``
+  would be the attribute name.  The expression ``ANN.tupleitem[1]``
+  would be a different attribute name, and so on.  Annotation-wise,
+  ``ANN.tupleitem[i]`` has a ``value`` which is a SomeValue()
+  describing what is known about the item ``i`` of a tuple.
+
+* ``ANN.immutable``: the ``value`` is always ``True``, unless
+  the annotation is not set, in which case it automatically
+  defaults to ``mostgeneralvalue`` (which is considered as false
+  in a boolean context for convenient checking).  When
+  ``ANN.immutable`` is set, it means that the subject is known to
+  be of an immutable type (int, float, tuple...).  This influences
+  the intersection algorithm.
+
+The intersection algorithm is implemented in the ``merge()`` method
+of AnnotationSet.
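
The ``get``/``set``/``merge`` behaviour described in the new text can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in ``pypy.annotation``: the class ``ToyAnnotationSet``, the use of plain strings such as ``"type"`` and ``"const"`` in place of ``ANN.type`` and ``ANN.const``, and the three-argument shape of ``merge()`` are all hypothetical. The sketch also ignores mutability, which the text says the real intersection must take into account.

```python
import itertools

_ids = itertools.count(1)


class SomeValue:
    """Toy stand-in for an abstract heap object; only identity matters."""

    def __init__(self):
        self.index = next(_ids)

    def __repr__(self):
        return "SV%d" % self.index


class _MostGeneralValue(SomeValue):
    """Default value of every annotation: nothing is known."""

    def __init__(self):
        pass

    def __bool__(self):
        # False in a boolean context, for convenient checking.
        return False

    def __repr__(self):
        return "mostgeneralvalue"


mostgeneralvalue = _MostGeneralValue()


class ToyAnnotationSet:
    """Hypothetical, simplified model of an annotation store."""

    def __init__(self):
        self._facts = {}  # maps (name, subject) -> value

    def get(self, name, subject):
        # Unset annotations default to mostgeneralvalue.
        return self._facts.get((name, subject), mostgeneralvalue)

    def set(self, name, subject, value):
        if value is mostgeneralvalue:
            raise ValueError("cannot set an annotation to mostgeneralvalue")
        self._facts[name, subject] = value

    def merge(self, sv_a, sv_b):
        # Intersection: a fresh SomeValue that keeps only the
        # annotations on which sv_a and sv_b agree.
        result = SomeValue()
        for (name, subject), value in list(self._facts.items()):
            if subject is sv_a and self.get(name, sv_b) == value:
                self._facts[name, result] = value
        return result


# The ReturnBlock intersection from the factorial example:
annset = ToyAnnotationSet()
sv31, sv8 = SomeValue(), SomeValue()
annset.set("type", sv31, int)
annset.set("const", sv31, 1)
annset.set("type", sv8, int)

merged = annset.merge(sv31, sv8)
assert annset.get("type", merged) is int       # kept: both branches agree
assert not annset.get("const", merged)         # dropped: only one branch had it
```

As in the factorial walkthrough, the merged value keeps ``type=int`` but loses the ``const`` annotation, because only one of the two incoming values carried it.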