[pypy-svn] r29252 - pypy/dist/pypy/doc

mwh at codespeak.net mwh at codespeak.net
Fri Jun 23 16:49:36 CEST 2006


Author: mwh
Date: Fri Jun 23 16:49:35 2006
New Revision: 29252

Modified:
   pypy/dist/pypy/doc/translation.txt
Log:
more stuff for translation.txt, including stuff stolen from the WP07 report
draft.


Modified: pypy/dist/pypy/doc/translation.txt
==============================================================================
--- pypy/dist/pypy/doc/translation.txt	(original)
+++ pypy/dist/pypy/doc/translation.txt	Fri Jun 23 16:49:35 2006
@@ -139,8 +139,8 @@
 inference.  It operates on the control flow graphs built by the Flow
 Object Space.
 
-For a more comprehensive description of the annotation process, see
-sections XXX of `Compiling Dynamic Language Implementations`_.
+For a more comprehensive description of the annotation process, see section 4
+of `Compiling Dynamic Language Implementations`_.
 
 The major goal of the annotator is to "annotate" each variable that
 appears in a flow graph.  An "annotation" describes all the possible
@@ -311,14 +311,138 @@
 Backend Optimizations
 ---------------------
 
-Inlining, malloc removal, ...
+The point of the backend optimizations is to make the compiled program run
+faster.  Compared to many parts of the PyPy translator, which are very unlike
+a traditional compiler, most of these will be fairly familiar to people who
+know how compilers work.
+
+Function Inlining
++++++++++++++++++
+
+To reduce the overhead of the many function calls that occur when running the
+PyPy interpreter we implemented function inlining. This is an optimization
+which takes a flow graph and a callsite and inserts a copy of the flow graph
+into the graph of the calling function, renaming occurring variables as
+appropriate. This leads to problems if the call was surrounded by a
+``try: ... except: ...`` guard; in this case inlining is not always
+possible.  Inlining is still safe, though, if the called function does not
+directly raise an exception itself, even if an exception is potentially
+raised by functions it calls.
+
+In addition we implemented heuristics to decide which function to inline
+where. For this purpose we assign every function a "size". This size should
+roughly correspond to the increase in code size which is to be expected
+should the function be inlined somewhere. The estimate is the sum of two
+numbers, both computed from per-operation weights: every operation is
+assigned a specific weight, the default being a weight of one. Some
+operations are considered to be more effort than others, e.g. memory
+allocation and calls; others are considered to be no effort at all
+(casts, ...). The first part of the size estimate is the sum of the weights
+of all operations occurring in the graph. This is called the "static
+instruction count". The other part of the size estimate of a graph is the
+"median execution cost". This is again the sum of the weights of all
+operations in the graph, but this time weighted by a guess of how often
+each operation is executed. To arrive at this guess we assume that at every
+branch we take both paths equally often, except for branches at the end of
+loops, where the jump back to the start of the loop is considered more
+likely.  This leads to a system of equations which can be solved to get
+approximate weights for all operations.
+
+After the size estimates for all functions have been determined, functions
+are inlined into their callsites, starting with the smallest functions.
+Every time a function is inlined into another function, the size of the
+outer function is recalculated. This is done until the remaining functions
+all have a size greater than a predefined limit.
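The size-driven inlining loop described above can be sketched in plain Python (a much simplified illustration with invented operation names and weights, not PyPy's actual implementation):

```python
# Toy sketch of the inlining heuristic: compute a weight-based "static
# instruction count" per graph, then repeatedly inline the smallest called
# function into its call sites until everything left is above a limit.

OP_WEIGHTS = {"malloc": 2, "call": 2, "cast": 0}  # illustrative weights

def static_instruction_count(graph):
    """Sum of per-operation weights; the default weight is 1."""
    return sum(OP_WEIGHTS.get(op, 1) for op in graph)

def inline_small_functions(graphs, callsites, limit=10):
    """graphs: name -> list of operation names;
    callsites: caller name -> set of callee names still called."""
    while True:
        sizes = {name: static_instruction_count(g) for name, g in graphs.items()}
        # functions small enough to inline that are still called somewhere
        candidates = [name for name in sizes
                      if sizes[name] <= limit
                      and any(name in callees for callees in callsites.values())]
        if not candidates:
            return graphs
        smallest = min(candidates, key=sizes.get)
        for caller, callees in callsites.items():
            if smallest in callees:
                # "inline": splice the callee's operations into the caller
                # in place of one call operation (very schematically)
                ops = list(graphs[caller])
                ops.remove("call")
                graphs[caller] = ops + list(graphs[smallest])
                callees.discard(smallest)
```

The outer function's size is recomputed on each round because every inlining step grows it, which is what eventually terminates the loop.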
+
+Malloc Removal
+++++++++++++++
+
+Since RPython is a garbage collected language there is a lot of heap memory
+allocation going on all the time, which would either not occur at all in a
+more traditional explicitly managed language, or would result in an object
+which dies at a time known in advance and can thus be explicitly
+deallocated. For example a loop of the following form::
+
+    for i in range(n):
+        ...
+
+which simply iterates over all numbers from 0 to n - 1 is equivalent to the
+following in Python::
+
+    l = range(n)
+    iterator = iter(l)
+    try:
+        while 1:
+            i = iterator.next()
+            ...
+    except StopIteration:
+        pass
+
+This means that three memory allocations are performed: the range object,
+the iterator for the range object, and the StopIteration instance which
+ends the loop.
+
+After a small amount of inlining, none of these three objects is ever
+passed as an argument to another function or stored in a globally reachable
+position. In such a situation the object can be removed (since it would die
+anyway after the function returns) and be replaced by its contained values.
+
+This pattern (an allocated object never leaves the current function and thus
+dies after the function returns) occurs quite often, especially after some
+inlining has happened. Therefore we implemented an optimization which
+"explodes" objects and thus saves one allocation in this simple (but quite
+common) situation.
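The effect on the loop above can be illustrated in plain Python (a hand-written before/after sketch, not actual translator output; in the "exploded" version no heap object for the range, the iterator, or the StopIteration remains):

```python
# Before: iterating allocates a range object, an iterator, and finally a
# StopIteration instance.
def sum_range_with_objects(n):
    total = 0
    for i in range(n):
        total += i
    return total

# After malloc removal (conceptually): the objects are replaced by their
# contained values, i.e. plain local variables and comparisons.
def sum_range_exploded(n):
    total = 0
    i = 0                # the iterator's current position, now a local
    while i < n:         # the StopIteration check, now a plain comparison
        total += i
        i += 1
    return total
```

Both functions compute the same result; the second simply never touches the heap.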
+
+Escape Analysis and Stack Allocation
+++++++++++++++++++++++++++++++++++++
+
+Another technique to reduce the memory allocation penalty is to use stack
+allocation for objects that can be proved not to live longer than the stack
+frame they have been allocated in. In this case it is possible to allocate
+the object on the stack. This makes allocation faster, since stack
+allocation is just the increase of a pointer, and makes deallocation
+basically free, since it happens automatically when the function returns.
+Therefore we wrote an analysis which determines which malloc positions lead
+to objects that "escape" the current function, i.e. have references to them
+stored in a place where they can be accessed by something outside of the
+stack of frames starting with the frame where the malloc occurred.
+
+For this we chose a naive, pessimistic approach (XXX reference). The
+analysis assumes that an object escapes if one of the following situations
+occurs:
+
+  * the object is returned
+  
+  * the object is raised as an exception
+  
+  * the object is stored into a field of another object
+  
+The algorithm uses abstract interpretation together with a fixed point
+search to find a solution.
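A naive analysis in this spirit can be sketched in a few lines of Python (an invented toy over made-up operation tuples, not PyPy's code): iterate over the operations until the set of escaping variables stops growing, i.e. a fixed point is reached.

```python
def escaping_mallocs(ops):
    """ops: list of (opname, target, args) tuples.
    A variable escapes if it is returned, raised, or stored into a field
    of another object; escape-ness also propagates through copies."""
    escaped = set()
    changed = True
    while changed:                          # fixed point iteration
        changed = False
        for opname, target, args in ops:
            newly = set()
            if opname in ("return", "raise"):
                newly.update(args)          # returned/raised objects escape
            elif opname == "setfield":      # setfield(obj, value)
                newly.add(args[1])          # the stored value escapes
            elif opname == "copy" and args[0] in escaped:
                newly.add(target)           # copy of an escaping variable
            elif opname == "copy" and target in escaped:
                newly.add(args[0])          # source of an escaping copy
            if not newly <= escaped:
                escaped |= newly
                changed = True
    return escaped
```

Any malloc whose result never ends up in the returned set is a candidate for stack allocation.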
+
+After using the escape analysis to find malloc sites that don't escape, we
+replace the mallocs by stack allocations. This cannot be done in all cases,
+namely if the allocated object is variable-sized or if the allocation occurs in
+a loop. Both cases should be avoided because they make stack overflows more
+likely. Also objects that have a finalizer cannot be allocated on the stack,
+since the finalizer might resurrect the object.
+
+The performance improvements resulting from this optimization were quite
+poor. We think that this is because the Boehm garbage collector becomes
+slower when the stack is bigger, thus compensating for any speed
+improvement achieved by having faster allocation. We did not implement
+stack allocation with any of the other GCs that PyPy can use.
 
 The Stackless Transform
 -----------------------
 
-XXX write this bit
+(this section is very incomplete currently)
 
-.. or steal it from Carl...
+The stackless transform converts functions into a form that knows how
+to save the execution point and active variables into a heap structure
+and resume execution at that point.  This is used to implement
+coroutines as an RPython-level feature, which in turn are used to
+implement `coroutines, greenlets and tasklets`_ as an application
+level feature for the Standard Interpreter.
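The idea can be illustrated with a toy in plain Python (invented code, not what the transformer emits): the live variables and a resume point are saved into a heap "frame" object, so the function can be suspended and later resumed where it left off.

```python
class Frame:
    """Heap structure holding the resume point and the live variables."""
    def __init__(self, resume_state, saved_locals):
        self.resume_state = resume_state
        self.saved_locals = saved_locals

def countdown_sum(n, frame=None):
    """Sum n + (n-1) + ... + 1, suspending once part-way through."""
    if frame is None:                 # fresh call
        total, i = 0, n
    else:                             # resumed: restore the saved locals
        total, i = frame.saved_locals
    while i > 0:
        total += i
        i -= 1
        if i == 2 and frame is None:  # pretend we must suspend here once
            return Frame(resume_state=1, saved_locals=(total, i))
    return total

suspended = countdown_sum(5)             # runs until the suspend point
result = countdown_sum(5, suspended)     # resumes and finishes
```

The real transform does this systematically for every function that may need to suspend, which is what makes coroutines expressible in RPython.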
 
 .. _`preparing the graphs for source generation`:
 
@@ -342,16 +466,68 @@
 Making Exception Handling Explicit
 ----------------------------------
 
-XXX
+RPython code is free to use exceptions in much the same way as unrestricted
+Python, but the final result is a C program, and C has no concept of
+exceptions.  The exception transformer implements exception handling in a
+similar way to CPython: exceptions are indicated by special return values and
+the current exception is stored in a global data structure.
+
+In a sense the input to the exception transformer is a program in terms of the
+lltypesystem_ with exceptions and the output is a program in terms of the bare
+lltypesystem.
+
+.. _lltypesystem: glossary.html#lltypesystem
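In Python terms, the transform's output looks roughly like this (a hypothetical sketch with invented names; the real transformer works on lltype graphs, not Python source):

```python
# An exception-raising operation is rewritten to return a special error
# value and record the exception in a global slot; the caller checks the
# return value explicitly instead of using try/except.

ERROR = object()          # the special return value signalling an exception
current_exception = None  # the global "current exception" slot

def checked_div(a, b):
    """What 'a // b' might look like after the exception transform."""
    global current_exception
    if b == 0:
        current_exception = ZeroDivisionError("integer division by zero")
        return ERROR
    return a // b

def caller(a, b):
    """A caller with the explicit check that replaces try/except."""
    global current_exception
    result = checked_div(a, b)
    if result is ERROR:
        current_exception = None   # "catch": clear the exception slot
        return -1                  # the former except-branch
    return result
```

This mirrors CPython's own convention of signalling errors through return values plus a thread-state exception slot.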
 
 Memory Management Details
 -------------------------
 
-Three options:
+As well as featuring exceptions, RPython is a garbage collected language;
+again, C is not.  To square this circle, decisions about memory management
+must be made.  In keeping with PyPy's approach to flexibility, there is
+freedom to change how to do it.  There are three approaches implemented today:
 
  - reference counting
- - Boehm GC
- - our own frameworks
+ - using the `Boehm-Demers-Weiser conservative garbage collector`_
+ - using a mark and sweep collector implemented in RPython
+
+.. _`Boehm-Demers-Weiser conservative garbage collector`: http://www.hpl.hp.com/personal/Hans_Boehm/gc/
+
+Almost all application-level Python code allocates objects at a very fast
+rate; this means that the memory management implementation is critical to
+the performance of the PyPy interpreter.  That said, work so far has mainly
+focused on flexibility and robustness, not performance.
+
+Reference Counting
+++++++++++++++++++
+
+`Reference counting`_ is a well-known and conceptually simple approach to
+memory management.  An integer is stored at the front of each heap object that
+counts how many references exist to the object, and this count is updated as
+references are created and disposed of.
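A toy model of the mechanism (plain Python for illustration, not the translator's generated code):

```python
class RefCounted:
    """A heap object carrying its own reference count."""
    def __init__(self, payload):
        self.refcount = 1          # the creating reference
        self.payload = payload
        self.freed = False

def incref(obj):
    """Called whenever a new reference to obj is created."""
    obj.refcount += 1

def decref(obj):
    """Called whenever a reference to obj is disposed of."""
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True           # stands in for actually freeing memory

obj = RefCounted("data")
incref(obj)        # a second reference is created
decref(obj)        # ...and dropped again
decref(obj)        # the original reference goes away: object can be freed
```

The generated C code inserts the equivalent of these `incref`/`decref` calls around every operation that copies or drops a pointer, which is exactly where the update overhead comes from.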
+
+Reference counting has some well known problems: it can be slow if you make
+frequent updates to the reference count, and unless you take special steps,
+cycles of objects will leak.  We make a little effort to reduce unnecessary
+reference count updates, but not a great deal, and no effort to avoid the
+problem with cycles.  It is the worst performing of the three options.
+
+For these reasons and others, the reference counting option doesn't seem the
+most interesting at present.  It will be maintained, but probably not
+developed further.
+
+.. _`Reference counting`: http://en.wikipedia.org/wiki/Reference_counting
+
+Using the B-D-W collector
++++++++++++++++++++++++++
+
+C with the addition of the Boehm collector actually has a very similar memory
+management model to that of RPython, so the BoehmGCTransformer is quite
+simple.  The Boehm GC performs somewhat better than the other options currently.
+
+Using our own collector
++++++++++++++++++++++++
+
+XXX
 
 Building the Low-Level Database
 -------------------------------
