[pypy-svn] r41229 - pypy/dist/pypy/doc
hpk at codespeak.net
Sat Mar 24 12:27:26 CET 2007
Author: hpk
Date: Sat Mar 24 12:27:25 2007
New Revision: 41229
Added:
pypy/dist/pypy/doc/new-architecture.txt (contents, props changed)
Log:
add a new draft of the architecture document
(which actually contains and always contained mission/goals)
see pypy-dev mail for more details.
Added: pypy/dist/pypy/doc/new-architecture.txt
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/doc/new-architecture.txt Sat Mar 24 12:27:25 2007
@@ -0,0 +1,257 @@
+==================================================
+PyPy - Goals and Architecture Overview
+==================================================
+
+.. contents::
+.. sectnum::
+
+This document gives an overview of the goals and architecture of PyPy.
+See `getting started`_ for a practical introduction and starting points.
+
+Mission statement
+====================
+
+We aim to provide:
+
+* a common translation framework for generating interpreters
+ and implementations of dynamic languages, supporting a clean separation
+ between language specification and implementation aspects.
+
+* a compliant and fast Python Language interpreter
+ enabling new advanced features without the requirement
+ to encode low level details into it.
+
+The choice of target platform as well as advanced optimisation
+techniques are to become aspects of the translation process, up
+to the ultimate point of *generating Just-in-Time compilers*
+for dynamic language interpreters.
+
+
+High Level Goals
+=============================
+
+PyPy - The Translation Framework
+-----------------------------------------------
+
+Traditionally, language interpreters are written in a target platform language
+like C/Posix, Java or C#. Each such implementation fundamentally provides
+a mapping from application source code to the target environment. One of
+the goals of the "all-encompassing" environments, like the .NET framework
+and to some extent the Java virtual machine, is to provide standardized
+and higher level functionalities in order to support language implementors
+in writing language implementations.
+
+PyPy is experimenting with a more ambitious approach. We are using
+RPython, a subset of a very-high-level language (VHLL), to specify languages
+without many references to, or dependencies on, lower level details,
+leaving it to the translation framework to add these as translation
+aspects and produce custom implementations for particular feature
+and platform configurations.
+
+Particularly, we want to help avoid having to write ``n * m * o``
+interpreters for ``n`` dynamic languages and ``m`` platforms
+with ``o`` crucial design decisions. PyPy aims at having any
+one of these parameters changeable independently from each
+other:
+
+* ``n``: modify or replace the language we analyse and regenerate
+ a concrete interpreter for each target;
+
+* ``m``: write new translator back-ends to target new
+ physical and virtual platforms;
+
+* ``o``: tweak and optimize the translation process to produce
+ platform specific code based on different models and tradeoffs.
+
+By contrast, a standardized target environment - say .NET -
+enforces ``m=1`` as far as it's concerned. This helps make ``o`` a
+bit smaller by providing a higher-level base to build upon. Still,
+we believe that enforcing the use of one common environment
+is not necessary. PyPy's goal is to give weight to this claim - at least
+as far as language implementation is concerned - showing an approach
+to the ``n * m * o`` problem that does not rely on standardization.
+
+Particularly, we set ourselves the goal to *generate
+Just-In-Time Compilers* in addition to traditional
+Interpreter implementations - an area of language
+implementation that is commonly considered the ultimate
+in complexity.
+
+
+PyPy - the Python Interpreter
+--------------------------------------------
+
+Our goal is to provide a full featured, customizable and fast Python
+implementation, written in a subset of Python itself, working on and interacting
+with a large variety of platforms and allowing new advanced language features
+to be introduced quickly.
+
+The architecture and abstractions of PyPy's "Python language specification"
+aim to enable new implementation and optimization features that
+traditionally require pervasive changes in a language implementation's
+source code.
+
+An important aspect of implementing Python in RPython is the high level of
+abstraction and compactness of the language. This allows an implementation
+that is, in many respects, easier to understand and play with than the one
+written in C (referred to throughout the PyPy documentation and source as
+"CPython").
+
+Another goal is to specify the language implementation in the form
+of a number of independent modules and abstractions, with clearly defined and
+automatically tested APIs. This eases reuse and allows experimenting with
+variations and combinations of features.
+
+Our Python language implementation architecture, however, also serves as a
+key part of the translation framework: we re-use its bytecode evaluator
+to analyse programs written in RPython, PyPy's implementation language for
+specifying language semantics and interpretation.
+
+
+PyPy Architecture
+===========================
+
+As you would expect from a project implemented using ideas from the world
+of `Extreme Programming`_, the architecture of PyPy has evolved over time
+and continues to evolve. Nevertheless, the high level architecture is
+stable. There are two rather independent basic subsystems: the `Python
+Interpreter`_ and `the Translation Framework`_. We first talk about the
+Python Interpreter because the Translation Framework in fact re-uses
+parts of its architecture and code.
+
+.. _`standard interpreter`:
+
+The Python Interpreter
+-------------------------------------
+
+The *Python Interpreter* is the subsystem implementing the Python language
+with the following key components:
+
+- a bytecode compiler responsible for producing Python Code objects
+
+- a `bytecode evaluator`_ responsible for interpreting
+ Python code objects.
+
+- a `standard object space`_ responsible for creating, accessing and
+ modifying Python application level objects.
+
+The *bytecode evaluator* is the part that interprets the compact
+bytecode format produced from user Python sources by a preprocessing
+phase, the *bytecode compiler*. The bytecode compiler itself is
+implemented as a chain of flexible passes (tokenizer, lexer, parser,
+abstract syntax tree builder, bytecode generator). The bytecode
+evaluator does its work by delegating all actual manipulation of
+user objects to the *object space*. The latter can be thought of as the
+library of built-in types. It defines the implementation of the user
+objects, like integers and lists, as well as the operations between
+them, like addition or truth-value-testing.
+
+This division between bytecode evaluator and object space is very
+important, as it gives a lot of flexibility. It is possible to use
+different `object spaces`_ to get different behaviours of the Python
+objects. Using a special object space is also an important technique
+for our translation process.
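+
+The division described above can be sketched in a few lines of Python.
+This is a toy illustration only: ``ToyStdObjSpace``, ``ToyTracingObjSpace``
+and ``evaluate`` are hypothetical names invented for this example and are
+not PyPy's actual classes or API. The point is merely that the evaluator
+never manipulates objects itself, so swapping the object space changes
+the behaviour without touching the evaluator.
+

```python
# Toy illustration only: none of these names are PyPy's real API.

class ToyStdObjSpace:
    """Implements the 'objects' and the operations between them."""

    def wrap(self, value):
        # A real object space would build an interpreter-level object here.
        return value

    def add(self, w_a, w_b):
        return w_a + w_b


class ToyTracingObjSpace(ToyStdObjSpace):
    """Same operations, but records every operation the evaluator requests."""

    def __init__(self):
        self.log = []

    def add(self, w_a, w_b):
        self.log.append("add")
        return ToyStdObjSpace.add(self, w_a, w_b)


def evaluate(bytecode, space):
    """A tiny stack-based evaluator that delegates all object handling."""
    stack = []
    for opcode, arg in bytecode:
        if opcode == "LOAD_CONST":
            stack.append(space.wrap(arg))
        elif opcode == "BINARY_ADD":
            w_right = stack.pop()
            w_left = stack.pop()
            stack.append(space.add(w_left, w_right))
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))
    return stack.pop()


program = [("LOAD_CONST", 2), ("LOAD_CONST", 3), ("BINARY_ADD", None)]
result_std = evaluate(program, ToyStdObjSpace())
tracing_space = ToyTracingObjSpace()
result_traced = evaluate(program, tracing_space)
```

+
+Both runs compute the same result; only the second one additionally
+leaves a record of the requested operations in ``tracing_space.log``.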
+
+.. _`bytecode evaluator`: interpreter.html
+.. _`standard object space`: objspace.html#the-standard-object-space
+.. _`object spaces`: objspace.html
+
+The Translation Process
+-----------------------
+
+The *translation process* is implemented in four parts:
+
+- producing a *flow graph* representation of an RPython program source:
+ a combination of the `bytecode evaluator`_ and a *flow object space*
+ performs `abstract interpretation`_ to record the flow of objects
+ and execution throughout a Python program into such a *flow graph*;
+
+- the *annotator* which performs type inference on the flow graph;
+
+- the *typer* which, based on the type annotations, turns the flow graph
+ into another representation fitting the model of the target platform;
+
+- the *backend* which emits code for and integrates with the target platform.
+
+.. _`initialization time`:
+.. _`translation process in more details`:
+
+In order for our generic translation and type inference mechanisms to
+master complexity, we restrict the dynamism of our source
+RPython program, using a particularly dynamic definition of RPython_.
+During initialization the source program can make unrestricted
+use of Python (including metaclasses and execution of dynamically
+constructed strings). However, the Python code objects that we eventually
+see during the production and analysis of flow graphs must adhere
+to a more static subset of Python.
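+
+As a rough illustration of this two-phase idea, the following plain
+Python sketch builds a function from a dynamically constructed string
+at initialization time. The helper ``make_adder`` is a hypothetical
+name chosen for this example, and nothing here is checked against the
+actual RPython rules; it only shows that the function object left
+behind after initialization is an ordinary, fixed piece of code.
+

```python
# Hypothetical example: make_adder is not part of PyPy.

def make_adder(n):
    """Initialization time: unrestricted Python, including exec of strings."""
    source = "def add_%d(x):\n    return x + %d\n" % (n, n)
    namespace = {}
    exec(source, namespace)
    return namespace["add_%d" % n]


# After initialization, add_3 is a plain function with a fixed code object.
add_3 = make_adder(3)
result = add_3(4)
```

+
+An analysis that starts from ``add_3`` never needs to understand how
+the function was constructed, only the code object it ended up with.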
+
+The `bytecode evaluator`_ and the Flow Object Space work through
+those initialized RPython code objects. The result of this
+`abstract interpretation`_ is a flow graph: yet another
+representation of the source program, but one which is suitable for
+applying translation and type inference techniques. The nodes of the
+graph are basic blocks consisting of Object Space operations and the
+values flowing between them, plus an exitswitch selecting among one, two
+or multiple links which connect each basic block to other basic blocks.
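+
+A minimal sketch of the recording idea, under heavy simplification:
+the hypothetical ``ToyFlowSpace`` below records each requested operation
+instead of performing it, which yields the operation list of a single
+basic block. Exitswitches and links between blocks are left out, and the
+function ``program`` merely stands in for the evaluator walking real
+bytecode. None of these names come from PyPy.
+

```python
# Toy sketch with hypothetical names, not PyPy's flow object space.

class Variable:
    """A placeholder for a value that is unknown at analysis time."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class ToyFlowSpace:
    """Records each requested operation instead of performing it."""

    def __init__(self):
        self.operations = []  # the operations of one basic block
        self._counter = 0

    def _record(self, opname, *args):
        self._counter += 1
        w_result = Variable("v%d" % self._counter)
        self.operations.append((opname, args, w_result))
        return w_result

    def add(self, w_a, w_b):
        return self._record("add", w_a, w_b)

    def mul(self, w_a, w_b):
        return self._record("mul", w_a, w_b)


def program(space, w_x, w_y):
    # Stands in for the evaluator interpreting the bytecode of "x * y + x".
    return space.add(space.mul(w_x, w_y), w_x)


space = ToyFlowSpace()
w_return = program(space, Variable("x"), Variable("y"))
block = [(op, [repr(a) for a in args], repr(res))
         for op, args, res in space.operations]
```
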
+
+The flow graphs are fed as input into the Annotator. The Annotator,
+given entry point types, infers the types of values that flow through
+the program variables. RPython code is restricted in such a way that the
+Annotator is able to infer consistent types. How much dynamism we allow in
+RPython depends on, and is mostly restricted by, the Flow Object Space and
+the Annotator implementation. The more we can improve this translation
+phase, the more dynamism we can allow.
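+
+The following toy sketch shows the general shape of such an inference
+pass over a single basic block. The block layout, the ``RULES`` table
+and the ``annotate`` function are assumptions made up for this example;
+the real Annotator works on full flow graphs and a much richer type
+model.
+

```python
# Toy sketch with a hypothetical data layout, not PyPy's annotator.

# One basic block: (operation name, argument names, result name).
block = [
    ("mul", ["x", "y"], "v1"),
    ("add", ["v1", "x"], "v2"),
]

# Result type of each operation for the given argument types.
RULES = {
    ("add", int, int): int,
    ("mul", int, int): int,
    ("add", float, float): float,
    ("mul", float, float): float,
    ("add", str, str): str,
    ("mul", str, int): str,
}


def annotate(block, entry_types):
    """Propagate the entry point types through every operation in order."""
    bindings = dict(entry_types)
    for opname, args, result in block:
        key = (opname,) + tuple(bindings[name] for name in args)
        if key not in RULES:
            raise TypeError("cannot infer a consistent type for %r" % (key,))
        bindings[result] = RULES[key]
    return bindings


int_bindings = annotate(block, {"x": int, "y": int})
str_bindings = annotate(block, {"x": str, "y": int})
```

+
+Entry types for which no rule exists, such as ``x`` being an ``int`` and
+``y`` a ``str`` in this toy table, make ``annotate`` raise a ``TypeError``,
+loosely mirroring the idea that the source must be restricted enough for
+consistent types to be inferred.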
+
+The *Typer* is responsible for preparing and producing target platform specific
+representations of the annotated high level RPython flowgraphs. It visits
+the flowgraphs in order to transform and amend their contained operations
+into specialized representations, suitable for either high level or
+low level platforms. High level platforms usually have their own
+garbage collectors and high level builtin types, while low level platforms
+require dealing with machine level types and pointers.
+
+The actual target platform code is eventually emitted by
+the backend through "visiting" the type-annotated flow graph
+and adding platform specific integration code.
+
+Here is a graphical overview of the translation process (`PDF color version`_):
+
+ .. image:: image/translation-greyscale-small.png
+
+
+Further reading
+===============
+
+* `[VMC]`_ PyPy's approach to virtual machine construction
+ (Dynamic Languages Symposium 2006).
+
+* The `translation document`_ describes our translation process in detail.
+ You might also be interested in reading the more
+ theoretically-oriented paper `Compiling dynamic language
+ implementations`_.
+
+* All our `Technical reports`_. XXX reference specific reports
+ and provide a summary here?
+
+* `Getting started`_ with PyPy for a practical introduction.
+
+.. _`Extreme Programming`: http://www.extremeprogramming.com/
+.. _`statistics web page`: http://codespeak.net/~hpk/pypy-stat/
+.. _`very compliant`: http://www2.openend.se/~pedronis/pypy-c-test/allworkingmodules/summary.html
+.. _`Boehm-Demers-Weiser garbage collector`: http://www.hpl.hp.com/personal/Hans_Boehm/gc/
+.. _`RPython`: coding-guide.html#rpython
+.. _`abstract interpretation`: theory.html#abstract-interpretation
+.. _`Compiling dynamic language implementations`: dynamic-language-translation.html
+.. _`translation document`: translation.html
+.. _LLVM: http://llvm.org/
+.. _`PDF color version`: image/translation.pdf
+.. _`getting started`: getting-started.html
+.. _`[VMC]`: http://codespeak.net/svn/pypy/extradoc/talk/dls2006/pypy-vm-construction.pdf
+.. _`Technical reports`: index-report.html
+
+.. _Python: http://docs.python.org/ref
+.. _Psyco: http://psyco.sourceforge.net
+.. _Stackless: http://stackless.com
+
+.. include:: _ref.txt
+