[pypy-svn] rev 2674 - pypy/trunk/doc

Mon Dec 22 16:26:21 CET 2003

Author: alex
Date: Mon Dec 22 16:26:20 2003
New Revision: 2674

Modified:
   pypy/trunk/doc/architecture.txt
Log:
Lots of editing, mostly "copy-editing level" (typo fixes, removal
of some passive forms, general enhancement of English, etc).



Modified: pypy/trunk/doc/architecture.txt
==============================================================================

--- pypy/trunk/doc/architecture.txt	(original)
+++ pypy/trunk/doc/architecture.txt	Mon Dec 22 16:26:20 2003
@@ -1,138 +1,146 @@
 Overview on PyPy's current architecture  (Dec. 2003)
 ====================================================
 
-The different parts of PyPy have always been under more or less
-heavy refacting during our five one-week sprints in 2003. 
-However, the basic architecture remains rather simple and unchanged: 
-a plain interpreter reads and dispatches bytecodes, shuffling objects 
-around on the stack and between namespaces of which it knows almost 
-nothing.  For any operation on an object it delegates to an so called 
-Object Space which performs modifications, creation and and destruction 
-of objects.  Such objects are often refered to as application level 
-objects because they are the objects you naturally work with from
-a python program. 
- 
+The various parts of PyPy have always been under more or less heavy
+refactoring during our five one-week sprints in 2003.  However, the
+basic architecture remains rather simple and unchanged: a plain
+*interpreter* reads and dispatches *bytecodes*, shuffling objects around
+on the stack and between namespaces of which it knows almost nothing.
+For any operation on an object, the interpreter delegates to an
+so-called "Object Space", which performs modifications, creation and
+destruction of objects.  Such objects are often referred to as
+*application-level objects*, because they are the objects you naturally
+work with from a python program. 
+
 The Interpreter
 ===============
 
-The interpreter accepts python code objects which it obtains by invoking
-Python's builtin compiler (we have a way of constructing those code
-objects from python code only but it's still not integrated).  Code
-objects are a nicely preprocessed structured representation of source
-code and their main content is *Bytecode*.  In addition code objects
-also know how to create a *Frame* object which has the responsibility to
-*interpret* a code object's bytecode.  Each bytecode is implemented via
-a python function which in turn will delegate operations on an
-application's objects to an object space. 
+The interpreter handles python code objects. The interpreter can build
+code objects from Python sources, when needed, by invoking Python's
+builtin compiler (we also have a way of constructing those code objects
+from python code only, but we have not integrated it yet).  Code objects
+are a nicely preprocessed, structured representation of source code, and
+their main content is *bytecode*.  In addition, code objects also know
+how to create a *frame* object which has the responsibility to
+*interpret* a code object's bytecode.  Each bytecode is implemented by a
+python function, which, in turn, delegates operations on
+application-level objects to an object space. 
 
 The Object Space
 ================
 
-The object space creates all objects and knows how to perform operations 
-on the objects. You may think of an object space as being a 
-library offering a fixed API, a set of *operations*, with 
-implementations that correspond to the known semantics of Python objects. 
-An example of an operation is *add*: add's implementations are e.g. responsible for 
-performing numeric addition if *add* works on numbers, concatenation when it
-works on built-in sequences.
-
-All object space operations take and return "application level" objects. 
-There is only one minimal operation that allows the interpreter to gain 
-knowledge about the value of an application level object: *is_true()* which 
-will return a boolean interpreter level value.  This is neccessary for 
-implementing e.g. if-statements (or rather their branching bytecodes). 
-
-We currently have 4 working object spaces which can be plugged into the
-interpreter. 
-
-- The Trivial Object Space, which is basically delegating almost all operations
-  to the underlying CPython interpreter. It was and still is used to test our 
-  interpreter. Though it is not essential it stays useful for testing and is
-  thus there to stay for some time. 
+The object space creates all objects and knows how to perform operations
+on the objects. You may think of an object space as being a library
+offering a fixed API, a set of *operations*, with implementations that
+correspond to the known semantics of Python objects.  An example of an
+operation is *add*: add's implementations are, for example, responsible
+for performing numeric addition when add works on numbers, concatenation
+when add works on built-in sequences.
+
+All object-space operations take and return "application level" objects.
+There is only one, very simple, object-space operation which allows the
+interpreter to gain some knowledge about the value of an
+application-level object: ``is_true()``, which returns a boolean
+interpreter-level value.  This is necessary to implement, for example,
+if-statements (or rather, to be pedantic, to implement the
+conditional-branching bytecodes into which if-statements get compiled). 
+
+We currently have four working object spaces which can be plugged into
+the interpreter:
+
+- The Trivial Object Space, which basically delegates almost all
+  operations to the underlying CPython interpreter. It was, and still
+  is, used to test our interpreter. Although it is not essential, it
+  remains useful for testing, and thus it is here to stay.
 
 - The Standard Object Space, which is an almost complete implementation 
-  of the various Python objects. This is the main focus of this document, 
-  since it is - together with the interpreter - the foundation of our 
-  Python implementation. 
-
-- the Flow Object Space which is used for transforming a python program
-  into a flow graph representation.  It does this by "abstract interpretation"
-  which will be explained later. 
+  of the various Python objects. This is the main focus of this
+  document, since the Standard Object Space, together with the
+  interpreter, is the foundation of our Python implementation. 
+
+- The Flow Object Space, which transforms a python program into a
+  flow-graph representation.  The Flow Object Space performs this
+  transformation task through "abstract interpretation", which we will
+  explain later in this document.
 
-- the Trace Object Space which wraps the trivial or standard object
-  space in order to trace the execution of bytecodes, frames and 
+- The Trace Object Space, which wraps either the trivial or the standard
+  object space in order to trace the execution of bytecodes, frames and
   object space operations. 
 
 The Standard Object Space
 =========================
 
-The Standard Object Space implements python's objects and types 
-and all operations between them.  It is thus an essential 
-component in order to reach CPython comptability. 
-
-The implementations of ints, floats, strings, dicts, lists etc. 
-all live in separate files and are bound together by a "multimethod"
-mechanism.  Multimethods allow a caller - most notably the interpreter - 
-to stay free from knowing anything about an object's implementation. 
-Thus multimethods implement a way of delegating to the right implementation
-based on the passed in objects (which it previously created, anyway). 
-We will examine how the multimethod mechanism works through an example.
-
-We examine the add-operation of ``int`` and ``float`` objects and disregard 
-all other objects for the moment.  There is one multimethod ``add`` that both
-implementations of add(intimpl, intimpl) and add(floatimpl, floatimpl) 
-register with.  
-
-If in our application program we have the expression ``2+3``, the
-interpreter will create an application-level object containing the 
-value ``2`` and one containing the value ``3``.  We will here talk about 
-them as ``W_Int(2)`` and ``W_Int(3)`` respectively. The interpreter then 
-calls the Standard Object Space with ``add(W_Int(2), W_Int(3))``.
-
-The Object Space then examines the objects passed in and delegates directly 
-to the add(intimpl, intimpl) function: since this is a "direct hit", the 
-multimethod can immediately dispatch the operation to the correct implementation
-i.e., the one registered as the implementation for this signature.
+The Standard Object Space implements python objects and types, and all
+operations on them.  It is thus an essential component in order to reach
+CPython compatibility. 
+
+The implementations of ints, floats, strings, dicts, lists, etc., all
+live in separate files, and are bound together by a "multimethod"
+mechanism.  Multimethods allow a caller - most notably the interpreter -
+to stay free from knowing anything about objects' implementations.  Thus
+multimethods implement a way of delegating to the right implementation
+based on the passed-in objects (objects previously created by the same
+subsystem).  We examine how the multimethod mechanism works through an
+example.
+
+We consider the add-operation of ``int`` and ``float`` objects, and
+disregard all other object types for the moment.  There is one
+multimethod ``add``, and both relevant implementations, ``add(intimpl,
+intimpl)`` and ``add(floatimpl, floatimpl)``, *register* with that one
+``add`` multimethod.
+
+When in our application program we have the expression ``2+3``, the
+interpreter creates an application-level object containing the value
+``2`` and one containing the value ``3``.  We talk about them as
+``W_Int(2)`` and ``W_Int(3)`` respectively. The interpreter then calls
+the Standard Object Space with ``add(W_Int(2), W_Int(3))``.
+
+The Object Space then examines the objects passed in, and delegates
+directly to the ``add(intimpl, intimpl)`` function: since this is a
+"direct hit", the multimethod immediately dispatches the operation to
+the correct implementation, i.e., the one registered as the
+implementation for this signature.
 
-If the multimethod doesn't have any registered functions for the 
-exact given signature, as would be the case for example for the expression
+If the multimethod doesn't have any registered functions for the exact
+given signature, as would be the case for example for the expression
 ``2+3.0``, the multimethod tests if it can use coercion to find a
 function with a signature that works. In this case we would coerce
 ``W_Int(2)`` to ``W_Float(2.0)`` in order to find a function in the
 multimethod that has a correct signature. Note that the multimethod
-mechanism is still considered a major refactoring target as it is
+mechanism is still considered a major refactoring target, since it is
 not easy to get it completly right, fast and accurate.  
 
 Application-level and interpreter-level execution and objects
 =============================================================
 
-Since Python is used for implementing all of our code base there
-is a crucial distinction to be aware of: interpreter level objects
-and application level objects.  The latter are the ones that you
-deal with when you write normal python programs.  Interpreter level
-code, however, cannot invoke operations or access attributes from
-our application-level objects.  You will immediately recognize any 
-interpreter level code in PyPy because all variable and object names 
-start with a `w_` which indicates that they are wrapped/application
-level values. 
-
-To show the difference with an example: to sum the contents of two 
-variables ``a`` and ``b``, typical application-level code is ``a+b`` 
--- in sharp contrast, typical interpreter-level code is ``space.add(w_a, w_b)``, 
-where ``space`` is an instance of an object space and ``w_a`` and ``w_b`` 
-are typical names for the *wrapped* versions of the two variables.  
+Since Python is used for implementing all of our code base, there is a
+crucial distinction to be aware of: *interpreter-level* objects versus
+*application-level* objects.  The latter are the ones that you deal with
+when you write normal python programs.  Interpreter-level code, however,
+can neither invoke operations on nor access attributes of application-level
+objects.  You will immediately recognize any interpreter-level code in
+PyPy, because all variable and object names start with a ``w_``, which
+indicates that they are "wrapped" application-level values. 
+
+Let's show the difference with a simple example.  To sum the contents of
+two variables ``a`` and ``b``, typical application-level code is ``a+b``
+-- in sharp contrast, typical interpreter-level code is ``space.add(w_a,
+w_b)``, where ``space`` is an instance of an object space, and ``w_a``
+and ``w_b`` are typical names for the *wrapped* versions of the two
+variables.  
 
-It also helps to remember how CPython deals with the same issue:
-interpreter level code is written in C and thus typical code for the
+It helps to remember how CPython deals with the same issue: interpreter
+level code, in CPython, is written in C, and thus typical code for the
 addition is ``PyNumber_Add(p_a, p_b)`` where ``p_a`` and ``p_b`` are C
 variables of type ``PyObject*``. This is very similar to how we write
-our interpreter-level python code. 
+our interpreter-level code in Python.
 
 Moreover, in PyPy we have to make a sharp distinction between
 interpreter and application level *exceptions*: application exceptions
-are always contained in an ``OperationError``.  This makes it easy 
-to distinguish failures in our interpreter-level code from those 
-appearing in a python application level program. 
+are always contained inside an instance of ``OperationError``.  This
+makes it easy to distinguish failures in our interpreter-level code from
+those appearing in a python application-level program that we are
+interpreting.
 
 
 Application level is often preferable 
@@ -142,7 +150,7 @@
 write and debug.  For example, suppose we want to implement the
 ``update`` method of dict objects.  Programming at the application
 level, we can write the obvious, simple implementation, one that looks
-like an **executable definition** of ``update``::
+like an **executable definition** of ``update``, for example::
 
     def update(self, other):
         for k in other.keys():
@@ -160,18 +168,20 @@
             w_value = space.getitem(w_other, w_key)
             space.setitem(w_self, w_key, w_value)
 
-This interpreter-level implementation looks much more similar to the C source
-code although it is probably still more readable.  In any case, it should be 
-obvious that the application-level implementation is definitely more readable,
-more elegant and maintainable than the interpreter-level one.
-
-In fact, in almost all parts of PyPy you will find application level code
-in the middle of interpreter-level code.  Apart from some bootstrapping 
-problems (application level functions need a certain initialization level
-of the object space to be executed) application level code is usually 
-preferable.  We have an abstraction (called 'Gateway') which allows the caller
-of a function to stay ignorant whether a particular function is implemented
-at application or interpreter level. 
+This interpreter-level implementation looks much more similar to the C
+source code, although it is probably still more readable.  In any case,
+it should be obvious that the application-level implementation is
+definitely more readable, more elegant and more maintainable than the
+interpreter-level one.
+
+In fact, in almost all parts of PyPy, you find application-level code in
+the middle of interpreter-level code.  Apart from some bootstrapping
+problems (application-level functions need a certain initialization
+level of the object space before they can be executed),
+application-level code is usually preferable.  We have an abstraction
+(called 'Gateway') which allows the caller of a function to remain
+ignorant of whether a particular function is implemented at application
+or interpreter level. 
 
 Wrapping
 ========
@@ -200,57 +210,63 @@
 RPython, the Flow Object Space and translation
 ==============================================
 
-At last we want to translate our interpreter and standard object
-space into a low level language.  In order for our translation 
-and type inference mechanisms to work effectively we need to restrict 
-the dynamism of our interpreter-level Python code at some point.  However,
-we are completly free to do all kind of nice python constructs up to 
-using metaclasses and executing dynamically constructed strings. 
-When the initialization phase finishes (mainly ``objspace.initialize()``) 
-all involved code objects need to adhere to a (non-formally defined) more
-static subset of Python: Restricted Python or 'RPython'. 
-
-A so called Flow Object Space will then - with the help of our plain
-interpreter - work through those initialized "RPython" code objects.
-The result of this *abstract interpretation* is a flow graph: yet another
-representation of a python program which is suitable for applying
-translation and type inference techniques.  The nodes of the graphs are
-basic blocks consisting of Object Space operations, flowing of values
-and an exitswitch to one, two or multiple links which connect it to
-other basic blocks. 
+One of PyPy's longer-term objectives is to enable translation of our
+interpreter and standard object space into a lower-level language.  In
+order for our translation and type inference mechanisms to work
+effectively, we need to restrict the dynamism of our interpreter-level
+Python code at some point.  However, in the start-up phase, we are
+completely free to use all kinds of nice python constructs, including
+metaclasses and execution of dynamically constructed strings.  When the
+initialization phase (mainly, the function ``objspace.initialize()``)
+finishes, however, all code objects involved need to adhere to a
+(non-formally defined) more static subset of Python: Restricted Python,
+also known as 'RPython'. 
+
+The Flow Object Space will then, with the help of our plain interpreter,
+work through those initialized "RPython" code objects.  The result of
+this *abstract interpretation* is a flow graph: yet another
+representation of a python program, but one which is suitable for
+applying translation and type inference techniques.  The nodes of the
+graphs are basic blocks consisting of Object Space operations, flowing
+of values and an exitswitch to one, two or multiple links which connect
+it to other basic blocks. 
 
 The flow graphs are fed as input to the Annotator. The Annotator, given
 entry point types, infers the types of values that flow through the
-program variables.  And here we have one of the informal definitions of
-RPython: it's restricted in a way that the translator can still compile
-low-level typed code.  How much dynamism we allow in RPython depends and
-is restricted by the Flow Object Space and the Annotator implementation.
-The more we can improve this translation phase the more we can allow
-dynamism.  But in some cases it will probably more feasible to just get
-rid of some dynamism we use in our interpreter level code.  It is mainly
-because of this trade-off situatio that we don't currently try to
-formally define 'RPython'. 
-
-The actual low-level code (or in fact also other high-level code) 
-is emitted by visiting the type-annotated flow graph. Currently 
-we have a Pyrex backend and a Lisp backend.  We use (a slightly 
-hacked version of) Pyrex to generate C libraries.  As Pyrex also
-accepts plain non-typed python code we can test translation even 
-though it is not complete.  
+program variables.  Here, one of the informal definitions of RPython
+comes into play: RPython code is restricted in a way that the translator
+can still compile low-level typed code.  How much dynamism we allow in
+RPython depends on, and is restricted by, the Flow Object Space and the
+Annotator implementation.  The more we can improve this translation
+phase, the more dynamism we can allow.  In some cases, however, it will
+probably be more feasible and practical to just get rid of some of the
+dynamism we use in our interpreter-level code.  It is mainly because of
+this trade-off situation that we don't currently try to formally define
+'RPython'. 
+
+The actual low-level code (and, in fact, also other high-level code) is
+emitted by "visiting" the type-annotated flow graph. Currently, we have
+a Pyrex-producing backend, and a Lisp-producing backend.  We use (a
+slightly hacked version of) Pyrex to generate C libraries.  Since Pyrex
+also accepts plain non-typed python code, we can test translation even
+though type annotation is not complete.  
 
 Trace Object Space 
 ==================
 
-A recent addition is the Trace Object space which allows to wrap
-a standard and trivial object space in order to trace all object
-space operations, frame creation, deletion and bytecode execution. 
-The ease with which the Trace Object Space could be implemented
-at the Amsterdam Sprint underlines the power of the Object Space
-abstraction.  (Of course the formerly implemented Flow Object Space 
-producing the flow graph already was proof enough). 
-
-There are certainly many more possibly useful Object Space ideas
-like a ProxySpace that connects to a remote machine where the
-actual operations are performed. At the other end, we wouldn't
-need to change object spaces at all if we want to extend or modify
-the interpreter by e.g. adding or removing some bytecodes. 
+A recent addition is the Trace Object Space, which wraps a standard or
+trivial object space in order to trace all object space operations,
+frame creation, deletion and bytecode execution.  The ease with which
+the Trace Object Space was implemented at the Amsterdam Sprint
+underlines the power of the Object Space abstraction.  (Of course, the
+previously-implemented Flow Object Space producing the flow graph
+was already proof enough.) 
+
+There are certainly many more possibly useful Object Space ideas, such
+as a ProxySpace that connects to a remote machine where the actual
+operations are performed. At the other end, we wouldn't need to change
+object spaces at all if we want to extend or modify the interpreter,
+e.g. by adding or removing some bytecodes.  Thus, the interpreter and
+object-space cooperation nicely splits the python runtime into two
+reasonably-independent halves, cooperating along a reasonably narrow
+interface, and suitable for multiple separate implementations.
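
As a reading aid for the multimethod description in the patched text, the following is a minimal sketch of the dispatch it describes: a direct hit on the exact signature first, then a fallback that tries coercion. It is not PyPy's actual implementation. ``W_Int``, ``W_Float`` and ``add`` are the names used in the document; the ``MultiMethod`` class and its ``register``, ``register_coercion`` and ``_alternatives`` methods are hypothetical names invented for this illustration.

```python
import itertools


class W_Int:
    """Wrapped integer, standing in for the document's W_Int."""
    def __init__(self, value):
        self.value = value


class W_Float:
    """Wrapped float, standing in for the document's W_Float."""
    def __init__(self, value):
        self.value = value


class MultiMethod:
    """Hypothetical multimethod: dispatches on the types of all arguments."""

    def __init__(self, name):
        self.name = name
        self.impls = {}      # signature (tuple of types) -> implementation
        self.coercions = {}  # source type -> list of converter functions

    def register(self, func, *types):
        # An implementation registers itself for one exact signature.
        self.impls[types] = func

    def register_coercion(self, src_type, convert):
        self.coercions.setdefault(src_type, []).append(convert)

    def _alternatives(self, w_obj):
        # The object itself first, then each coerced form of it.
        yield w_obj
        for convert in self.coercions.get(type(w_obj), []):
            yield convert(w_obj)

    def __call__(self, *w_args):
        # The first combination produced is the uncoerced one, so a
        # "direct hit" is always tried before any coercion.
        choices = [list(self._alternatives(w)) for w in w_args]
        for candidate in itertools.product(*choices):
            func = self.impls.get(tuple(type(w) for w in candidate))
            if func is not None:
                return func(*candidate)
        raise TypeError("no implementation of %s for these types" % self.name)


add = MultiMethod("add")
add.register(lambda w_a, w_b: W_Int(w_a.value + w_b.value), W_Int, W_Int)
add.register(lambda w_a, w_b: W_Float(w_a.value + w_b.value), W_Float, W_Float)
add.register_coercion(W_Int, lambda w_i: W_Float(float(w_i.value)))

w_sum = add(W_Int(2), W_Int(3))        # direct hit on (W_Int, W_Int)
w_mixed = add(W_Int(2), W_Float(3.0))  # W_Int(2) is coerced to W_Float(2.0)
```

In this sketch the caller never inspects the argument types itself, which is the property the text attributes to multimethods; the brute-force search over coerced combinations is chosen only for brevity and says nothing about how the real mechanism orders or caches its lookups.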