[pypy-svn] rev 2674 - pypy/trunk/doc
alex at codespeak.net
Mon Dec 22 16:26:21 CET 2003
Author: alex
Date: Mon Dec 22 16:26:20 2003
New Revision: 2674
Modified:
pypy/trunk/doc/architecture.txt
Log:
Lots of editing, mostly "copy-editing level" (typo fixes, removal
of some passive forms, general enhancement of English, etc).
Modified: pypy/trunk/doc/architecture.txt
==============================================================================
--- pypy/trunk/doc/architecture.txt (original)
+++ pypy/trunk/doc/architecture.txt Mon Dec 22 16:26:20 2003
@@ -1,138 +1,146 @@
Overview on PyPy's current architecture (Dec. 2003)
====================================================
-The different parts of PyPy have always been under more or less
-heavy refacting during our five one-week sprints in 2003.
-However, the basic architecture remains rather simple and unchanged:
-a plain interpreter reads and dispatches bytecodes, shuffling objects
-around on the stack and between namespaces of which it knows almost
-nothing. For any operation on an object it delegates to an so called
-Object Space which performs modifications, creation and and destruction
-of objects. Such objects are often refered to as application level
-objects because they are the objects you naturally work with from
-a python program.
-
+The various parts of PyPy have always been under more or less heavy
+refactoring during our five one-week sprints in 2003. However, the
+basic architecture remains rather simple and unchanged: a plain
+*interpreter* reads and dispatches *bytecodes*, shuffling objects around
+on the stack and between namespaces of which it knows almost nothing.
+For any operation on an object, the interpreter delegates to an
+so-called "Object Space", which performs modifications, creation and
+destruction of objects. Such objects are often referred to as
+*application-level objects*, because they are the objects you naturally
+work with from a python program.
+
The Interpreter
===============
-The interpreter accepts python code objects which it obtains by invoking
-Python's builtin compiler (we have a way of constructing those code
-objects from python code only but it's still not integrated). Code
-objects are a nicely preprocessed structured representation of source
-code and their main content is *Bytecode*. In addition code objects
-also know how to create a *Frame* object which has the responsibility to
-*interpret* a code object's bytecode. Each bytecode is implemented via
-a python function which in turn will delegate operations on an
-application's objects to an object space.
+The interpreter handles python code objects. The interpreter can build
+code objects from Python sources, when needed, by invoking Python's
+builtin compiler (we also have a way of constructing those code objects
+from python code only, but we have not integrated it yet). Code objects
+are a nicely preprocessed, structured representation of source code, and
+their main content is *bytecode*. In addition, code objects also know
+how to create a *frame* object which has the responsibility to
+*interpret* a code object's bytecode. Each bytecode is implemented by a
+python function, which, in turn, delegates operations on
+application-level objects to an object space.
The Object Space
================
-The object space creates all objects and knows how to perform operations
-on the objects. You may think of an object space as being a
-library offering a fixed API, a set of *operations*, with
-implementations that correspond to the known semantics of Python objects.
-An example of an operation is *add*: add's implementations are e.g. responsible for
-performing numeric addition if *add* works on numbers, concatenation when it
-works on built-in sequences.
-
-All object space operations take and return "application level" objects.
-There is only one minimal operation that allows the interpreter to gain
-knowledge about the value of an application level object: *is_true()* which
-will return a boolean interpreter level value. This is neccessary for
-implementing e.g. if-statements (or rather their branching bytecodes).
-
-We currently have 4 working object spaces which can be plugged into the
-interpreter.
-
-- The Trivial Object Space, which is basically delegating almost all operations
- to the underlying CPython interpreter. It was and still is used to test our
- interpreter. Though it is not essential it stays useful for testing and is
- thus there to stay for some time.
+The object space creates all objects and knows how to perform operations
+on the objects. You may think of an object space as being a library
+offering a fixed API, a set of *operations*, with implementations that
+correspond to the known semantics of Python objects. An example of an
+operation is *add*: add's implementations are, for example, responsible
+for performing numeric addition when add works on numbers, concatenation
+when add works on built-in sequences.
+
+All object-space operations take and return "application level" objects.
+There is only one, very simple, object-space operation which allows the
+interpreter to gain some knowledge about the value of an
+application-level object: ``is_true()``, which returns a boolean
+interpreter-level value. This is necessary to implement, for example,
+if-statements (or rather, to be pedantic, to implement the
+conditional-branching bytecodes into which if-statements get compiled).
+
+We currently have four working object spaces which can be plugged into
+the interpreter:
+
+- The Trivial Object Space, which basically delegates almost all
+ operations to the underlying CPython interpreter. It was, and still
+ is, used to test our interpreter. Although it is not essential, it
+ remains useful for testing, and thus it is here to stay.
- The Standard Object Space, which is an almost complete implementation
- of the various Python objects. This is the main focus of this document,
- since it is - together with the interpreter - the foundation of our
- Python implementation.
-
-- the Flow Object Space which is used for transforming a python program
- into a flow graph representation. It does this by "abstract interpretation"
- which will be explained later.
+ of the various Python objects. This is the main focus of this
+ document, since the Standard Object Space, together with the
+ interpreter, is the foundation of our Python implementation.
+
+- the Flow Object Space, which transforms a python program into a
+ flow-graph representation. The Flow Object Space performs this
+ transformation task through "abstract interpretation", which we will
+ explain later in this document.
-- the Trace Object Space which wraps the trivial or standard object
- space in order to trace the execution of bytecodes, frames and
+- the Trace Object Space, which wraps either the trivial or the standard
+ object space in order to trace the execution of bytecodes, frames and
object space operations.
The Standard Object Space
=========================
-The Standard Object Space implements python's objects and types
-and all operations between them. It is thus an essential
-component in order to reach CPython comptability.
-
-The implementations of ints, floats, strings, dicts, lists etc.
-all live in separate files and are bound together by a "multimethod"
-mechanism. Multimethods allow a caller - most notably the interpreter -
-to stay free from knowing anything about an object's implementation.
-Thus multimethods implement a way of delegating to the right implementation
-based on the passed in objects (which it previously created, anyway).
-We will examine how the multimethod mechanism works through an example.
-
-We examine the add-operation of ``int`` and ``float`` objects and disregard
-all other objects for the moment. There is one multimethod ``add`` that both
-implementations of add(intimpl, intimpl) and add(floatimpl, floatimpl)
-register with.
-
-If in our application program we have the expression ``2+3``, the
-interpreter will create an application-level object containing the
-value ``2`` and one containing the value ``3``. We will here talk about
-them as ``W_Int(2)`` and ``W_Int(3)`` respectively. The interpreter then
-calls the Standard Object Space with ``add(W_Int(2), W_Int(3))``.
-
-The Object Space then examines the objects passed in and delegates directly
-to the add(intimpl, intimpl) function: since this is a "direct hit", the
-multimethod can immediately dispatch the operation to the correct implementation
-i.e., the one registered as the implementation for this signature.
+The Standard Object Space implements python objects and types, and all
+operations on them. It is thus an essential component in order to reach
+CPython compatibility.
+
+The implementations of ints, floats, strings, dicts, lists, etc, all
+live in separate files, and are bound together by a "multimethod"
+mechanism. Multimethods allow a caller - most notably the interpreter -
+to stay free from knowing anything about objects' implementations. Thus
+multimethods implement a way of delegating to the right implementation
+based on the passed in objects (objects previously created by the same
+subsystem). We examine how the multimethod mechanism works through an
+example.
+
+We consider the add-operation of ``int`` and ``float`` objects, and
+disregard all other object types for the moment. There is one
+multimethod ``add``, and both relevant implementations, ``add(intimpl,
+intimpl)`` and ``add(floatimpl, floatimpl)``, *register* with that one
+``add`` multimethod.
+
+When in our application program we have the expression ``2+3``, the
+interpreter creates an application-level object containing the value
+``2`` and one containing the value ``3``. We talk about them as
+``W_Int(2)`` and ``W_Int(3)`` respectively. The interpreter then calls
+the Standard Object Space with ``add(W_Int(2), W_Int(3))``.
+
+The Object Space then examines the objects passed in, and delegates
+directly to the ``add(intimpl, intimpl)`` function: since this is a
+"direct hit", the multimethod immediately dispatches the operation to
+the correct implementation, i.e., the one registered as the
+implementation for this signature.
-If the multimethod doesn't have any registered functions for the
-exact given signature, as would be the case for example for the expression
+If the multimethod doesn't have any registered functions for the exact
+given signature, as would be the case for example for the expression
``2+3.0``, the multimethod tests if it can use coercion to find a
function with a signature that works. In this case we would coerce
``W_Int(2)`` to ``W_Float(2.0)`` in order to find a function in the
multimethod that has a correct signature. Note that the multimethod
-mechanism is still considered a major refactoring target as it is
+mechanism is still considered a major refactoring target, since it is
not easy to get it completely right, fast and accurate.
Application-level and interpreter-level execution and objects
=============================================================
-Since Python is used for implementing all of our code base there
-is a crucial distinction to be aware of: interpreter level objects
-and application level objects. The latter are the ones that you
-deal with when you write normal python programs. Interpreter level
-code, however, cannot invoke operations or access attributes from
-our application-level objects. You will immediately recognize any
-interpreter level code in PyPy because all variable and object names
-start with a `w_` which indicates that they are wrapped/application
-level values.
-
-To show the difference with an example: to sum the contents of two
-variables ``a`` and ``b``, typical application-level code is ``a+b``
--- in sharp contrast, typical interpreter-level code is ``space.add(w_a, w_b)``,
-where ``space`` is an instance of an object space and ``w_a`` and ``w_b``
-are typical names for the *wrapped* versions of the two variables.
+Since Python is used for implementing all of our code base, there is a
+crucial distinction to be aware of: *interpreter-level* objects versus
+*application level* objects. The latter are the ones that you deal with
+when you write normal python programs. Interpreter-level code, however,
+cannot invoke operations nor access attributes from application-level
+objects. You will immediately recognize any interpreter level code in
+PyPy, because all variable and object names start with a ``w_``, which
+indicates that they are "wrapped" application-level values.
+
+Let's show the difference with a simple example. To sum the contents of
+two variables ``a`` and ``b``, typical application-level code is ``a+b``
+-- in sharp contrast, typical interpreter-level code is ``space.add(w_a,
+w_b)``, where ``space`` is an instance of an object space, and ``w_a``
+and ``w_b`` are typical names for the *wrapped* versions of the two
+variables.
-It also helps to remember how CPython deals with the same issue:
-interpreter level code is written in C and thus typical code for the
+It helps to remember how CPython deals with the same issue: interpreter
+level code, in CPython, is written in C, and thus typical code for the
addition is ``PyNumber_Add(p_a, p_b)`` where ``p_a`` and ``p_b`` are C
variables of type ``PyObject*``. This is very similar to how we write
-our interpreter-level python code.
+our interpreter-level code in Python.
Moreover, in PyPy we have to make a sharp distinction between
interpreter and application level *exceptions*: application exceptions
-are always contained in an ``OperationError``. This makes it easy
-to distinguish failures in our interpreter-level code from those
-appearing in a python application level program.
+are always contained inside an instance of ``OperationError``. This
+makes it easy to distinguish failures in our interpreter-level code from
+those appearing in a python application level program that we are
+interpreting.
Application level is often preferable
@@ -142,7 +150,7 @@
write and debug. For example, suppose we want to implement the
``update`` method of dict objects. Programming at the application
level, we can write the obvious, simple implementation, one that looks
-like an **executable definition** of ``update``::
+like an **executable definition** of ``update``, for example::
def update(self, other):
for k in other.keys():
@@ -160,18 +168,20 @@
w_value = space.getitem(w_other, w_key)
space.setitem(w_self, w_key, w_value)
-This interpreter-level implementation looks much more similar to the C source
-code although it is probably still more readable. In any case, it should be
-obvious that the application-level implementation is definitely more readable,
-more elegant and maintainable than the interpreter-level one.
-
-In fact, in almost all parts of PyPy you will find application level code
-in the middle of interpreter-level code. Apart from some bootstrapping
-problems (application level functions need a certain initialization level
-of the object space to be executed) application level code is usually
-preferable. We have an abstraction (called 'Gateway') which allows the caller
-of a function to stay ignorant whether a particular function is implemented
-at application or interpreter level.
+This interpreter-level implementation looks much more similar to the C
+source code, although it is probably still more readable. In any case,
+it should be obvious that the application-level implementation is
+definitely more readable, more elegant and more maintainable than the
+interpreter-level one.
+
+In fact, in almost all parts of PyPy, you find application level code in
+the middle of interpreter-level code. Apart from some bootstrapping
+problems (application level functions need a certain initialization
+level of the object space before they can be executed), application
+level code is usually preferable. We have an abstraction (called
+'Gateway') which allows the caller of a function to remain ignorant of
+whether a particular function is implemented at application or
+interpreter level.
Wrapping
========
@@ -200,57 +210,63 @@
RPython, the Flow Object Space and translation
==============================================
-At last we want to translate our interpreter and standard object
-space into a low level language. In order for our translation
-and type inference mechanisms to work effectively we need to restrict
-the dynamism of our interpreter-level Python code at some point. However,
-we are completly free to do all kind of nice python constructs up to
-using metaclasses and executing dynamically constructed strings.
-When the initialization phase finishes (mainly ``objspace.initialize()``)
-all involved code objects need to adhere to a (non-formally defined) more
-static subset of Python: Restricted Python or 'RPython'.
-
-A so called Flow Object Space will then - with the help of our plain
-interpreter - work through those initialized "RPython" code objects.
-The result of this *abstract interpretation* is a flow graph: yet another
-representation of a python program which is suitable for applying
-translation and type inference techniques. The nodes of the graphs are
-basic blocks consisting of Object Space operations, flowing of values
-and an exitswitch to one, two or multiple links which connect it to
-other basic blocks.
+One of PyPy's longer-term objectives is to enable translation of our
+interpreter and standard object space into a lower-level language. In
+order for our translation and type inference mechanisms to work
+effectively, we need to restrict the dynamism of our interpreter-level
+Python code at some point. However, in the start-up phase, we are
+completely free to use all kinds of nice python constructs, including
+metaclasses and execution of dynamically constructed strings. When the
+initialization phase (mainly, the function ``objspace.initialize()``)
+finishes, however, all code objects involved need to adhere to a
+(non-formally defined) more static subset of Python: Restricted Python,
+also known as 'RPython'.
+
+The Flow Object Space will then, with the help of our plain interpreter,
+work through those initialized "RPython" code objects. The result of
+this *abstract interpretation* is a flow graph: yet another
+representation of a python program, but one which is suitable for
+applying translation and type inference techniques. The nodes of the
+graphs are basic blocks consisting of Object Space operations, flowing
+of values and an exitswitch to one, two or multiple links which connect
+it to other basic blocks.
The flow graphs are fed as input to the Annotator. The Annotator, given
entry point types, infers the types of values that flow through the
-program variables. And here we have one of the informal definitions of
-RPython: it's restricted in a way that the translator can still compile
-low-level typed code. How much dynamism we allow in RPython depends and
-is restricted by the Flow Object Space and the Annotator implementation.
-The more we can improve this translation phase the more we can allow
-dynamism. But in some cases it will probably more feasible to just get
-rid of some dynamism we use in our interpreter level code. It is mainly
-because of this trade-off situatio that we don't currently try to
-formally define 'RPython'.
-
-The actual low-level code (or in fact also other high-level code)
-is emitted by visiting the type-annotated flow graph. Currently
-we have a Pyrex backend and a Lisp backend. We use (a slightly
-hacked version of) Pyrex to generate C libraries. As Pyrex also
-accepts plain non-typed python code we can test translation even
-though it is not complete.
+program variables. Here, one of the informal definitions of RPython
+comes into play: RPython code is restricted in a way that the translator
+can still compile low-level typed code. How much dynamism we allow in
+RPython depends on, and is restricted by, the Flow Object Space and the
+Annotator implementation. The more we can improve this translation
+phase, the more dynamism we can allow. In some cases, however, it will
+probably be more feasible and practical to just get rid of some of the
+dynamism we use in our interpreter level code. It is mainly because of
+this trade-off situation that we don't currently try to formally define
+'RPython'.
+
+The actual low-level code (and, in fact, also other high-level code) is
+emitted by "visiting" the type-annotated flow graph. Currently, we have
+a Pyrex-producing backend, and a Lisp-producing backend. We use (a
+slightly hacked version of) Pyrex to generate C libraries. Since Pyrex
+also accepts plain non-typed python code, we can test translation even
+though type annotation is not complete.
Trace Object Space
==================
-A recent addition is the Trace Object space which allows to wrap
-a standard and trivial object space in order to trace all object
-space operations, frame creation, deletion and bytecode execution.
-The ease with which the Trace Object Space could be implemented
-at the Amsterdam Sprint underlines the power of the Object Space
-abstraction. (Of course the formerly implemented Flow Object Space
-producing the flow graph already was proof enough).
-
-There are certainly many more possibly useful Object Space ideas
-like a ProxySpace that connects to a remote machine where the
-actual operations are performed. At the other end, we wouldn't
-need to change object spaces at all if we want to extend or modify
-the interpreter by e.g. adding or removing some bytecodes.
+A recent addition is the Trace Object space, which wraps a standard or
+trivial object space in order to trace all object space operations,
+frame creation, deletion and bytecode execution. The ease with which
+the Trace Object Space was implemented at the Amsterdam Sprint
+underlines the power of the Object Space abstraction. (Of course, the
+previously-implemented Flow Object Space producing the flow graph
+already was proof enough).
+
+There are certainly many more possibly useful Object Space ideas, such
+as a ProxySpace that connects to a remote machine where the actual
+operations are performed. At the other end, we wouldn't need to change
+object spaces at all if we want to extend or modify the interpreter,
+e.g. by adding or removing some bytecodes. Thus, the interpreter and
+object-space cooperation nicely splits the python runtime into two
+reasonably-independent halves, cooperating along a reasonably narrow
+interface, and suitable for multiple separate implementations.
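
For readers of this diff, here is a minimal sketch of the multimethod
dispatch that the revised "Standard Object Space" section describes: a
direct hit for ``2+3`` and coercion for ``2+3.0``. It is not PyPy's
actual code. The names ``MultiMethod``, ``ToySpace``, ``register``,
``coercions``, ``add__Int_Int`` and ``add__Float_Float`` are
hypothetical, and the coercion strategy (coerce every argument that has
a known coercion, then look up again) is an assumption made only to
keep the example short.

```python
class W_Int:
    """Toy wrapped int, named after the text's W_Int(2) notation."""
    def __init__(self, value):
        self.value = value


class W_Float:
    """Toy wrapped float, named after the text's W_Float(2.0) notation."""
    def __init__(self, value):
        self.value = value


def _identity(w_obj):
    return w_obj


class MultiMethod:
    """Hypothetical multimethod: dispatches on the types of all arguments."""
    def __init__(self, name):
        self.name = name
        self.impls = {}      # signature (tuple of types) -> implementation
        self.coercions = {}  # source type -> function returning coerced object

    def register(self, impl, *signature):
        self.impls[signature] = impl

    def __call__(self, space, *w_args):
        signature = tuple(type(w_arg) for w_arg in w_args)
        impl = self.impls.get(signature)
        if impl is None:
            # No direct hit: coerce each argument that has a known coercion
            # (an assumed strategy), then retry the lookup once.
            w_args = tuple(self.coercions.get(type(w_arg), _identity)(w_arg)
                           for w_arg in w_args)
            signature = tuple(type(w_arg) for w_arg in w_args)
            impl = self.impls.get(signature)
        if impl is None:
            raise TypeError("no implementation of %s for %r"
                            % (self.name, signature))
        return impl(space, *w_args)


def add__Int_Int(space, w_a, w_b):
    return W_Int(w_a.value + w_b.value)


def add__Float_Float(space, w_a, w_b):
    return W_Float(w_a.value + w_b.value)


class ToySpace:
    """Hypothetical object space exposing only add() and is_true()."""
    def __init__(self):
        self._add = MultiMethod("add")
        self._add.register(add__Int_Int, W_Int, W_Int)
        self._add.register(add__Float_Float, W_Float, W_Float)
        self._add.coercions[W_Int] = lambda w_int: W_Float(float(w_int.value))

    def add(self, w_a, w_b):
        return self._add(self, w_a, w_b)

    def is_true(self, w_obj):
        # The one operation that returns an interpreter-level value.
        return bool(w_obj.value)


space = ToySpace()
w_sum = space.add(W_Int(2), W_Int(3))        # direct hit on (W_Int, W_Int)
w_mixed = space.add(W_Int(2), W_Float(3.0))  # W_Int(2) coerced to W_Float(2.0)
print(type(w_sum).__name__, w_sum.value)      # W_Int 5
print(type(w_mixed).__name__, w_mixed.value)  # W_Float 5.0
```

The caller only ever writes ``space.add(w_a, w_b)`` and never inspects
the wrapped objects, which is the point the text makes about keeping the
interpreter ignorant of object implementations.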