[pypy-svn] rev 1853 - pypy/trunk/doc/funding

Mon Oct 13 17:31:22 CEST 2003

Author: alex
Date: Mon Oct 13 17:31:21 2003
New Revision: 1853

Modified:
   pypy/trunk/doc/funding/B6.0_detailed_implementation.txt
Log:
A first proofreading, copyediting, and polishing of about the first
half of the document (up to the middle of the point on JIT).



Modified: pypy/trunk/doc/funding/B6.0_detailed_implementation.txt
==============================================================================

--- pypy/trunk/doc/funding/B6.0_detailed_implementation.txt	(original)
+++ pypy/trunk/doc/funding/B6.0_detailed_implementation.txt	Mon Oct 13 17:31:21 2003
@@ -14,16 +14,25 @@
 ~~~~~~~~~~~~~~~~~~~~~
 
 
-The PyPy project can be roughly divided into three phases, the last two of
-which are somewhat independent from each other:
+The PyPy project can be divided into three phases:
 
 - Phase 1: The core of PyPy itself must first be developed.
 
-- Phase 2: This code base can be used as an research/integration platform of choice.
+- Phase 2: This code base can be used as a research/integration platform of
+  choice.
 
 - Phase 3: Specific applications can be implemented and disseminated.
 
-Moreover, several project-long infrastructure tasks are of paramount importance. In particular, coordination is assured by the project coordinator in workpackage WP01_, with the help of the management and technical boards, as described in section B5. It involves the collection and monitoring of monthly status reports, reporting to the EU, organising sprints and maintaining an internal web site in collaboration with the maintenance workpackage.
+Phase 1 is a prerequisite for Phases 2 and 3; Phases 2 and 3 are reasonably
+independent from each other.
+
+Beyond phase-specific tasks, several project-long infrastructure tasks are of
+paramount importance. In particular, coordination is assured by the project
+coordinator in workpackage WP01_, with the help of the management and
+technical boards, as described in section B5. This workpackage involves
+collecting and monitoring monthly status reports, reporting to the EU,
+organising sprints, and maintaining an internal web site in collaboration with
+the maintenance workpackage.
 
 
 
@@ -40,59 +49,169 @@
 Phase 1
 -------
 
-The first goal is to develop a reasonably complete Python interpreter written in Python. It must be entierely compatible with the language specification. It consists of the following major parts:
-
-- A *bytecode compiler,* which translates Python source code into an internal intermediate representation, the *bytecode.*
-
-- A *bytecode interpreter,* which interprets bytecodes and manages the supporting internal structures (frames, exception tracebacks...). It considers objects as black boxes and delegates all individual operations on them to a library of built-in types, the *Object Space.*
-
-- An *Object Space,* which captures the semantics of the various types of the language.
-
-This subdivision is common among interpreter implementations, although we place special emphasis on the library and its separation from the bytecode interpreter. The implementation will closely follow the current reference C implementation (CPython). It is thus expected to be relatively straightforward, changing only a few design decisions (mainly with respect to the Object Space separation of concerns), and changing strictly nothing to the Python language itself.
-
-This task is done in workpackage WP04_. After the initial development, WP04_ will switch its focus to implementing the number of extension modules written in C that are either standard or widely-used in CPython. (The Python Standard Library is written in a large part in Python already, so that the problem only concerns C extension modules.) For each of these, two strategies can be followed: either the module is translated into a (regular) Python module, or it is written as an extension module of the PyPy interpreter (i.e. in Python too, but at the level of the interpreter.)
-
-The result so far can only run on top of another Python implementation like CPython. It still has advantages over the existing implementation, in education (as a much more compact, modular and readable piece of code than CPython), and in flexibility (as a basis to plug in alternate Object Spaces, alternate interpreters, or alternate compilers) -- more about it below.
-
-At this point, to make PyPy stand-alone (and running at a reasonable speed), we must have restricted key areas of the source to be written in a subset of the Python language, a sublanguage (RPython) in which staticness restrictions are enforced. This sublanguage is suitable for analysis and translation into a lower-level language. Its precise definition is a balance between the amount of dynamic desired to write PyPy and the amount of effort we put in the translation tools.
-
-Note that translation is not a one-shot process; the only source code for PyPy will be in Python or RPython, and translation can be repeated freely as part of a compilation process.
-
-We are giving translation an innovative emphasis (and thus a whole workpackage, WP05_) in the project. It is not merely an RPython-to-C translator; it is an essential piece towards the flexibility goals. Numerous aspects that were design decisions influencing the whole source code of the current CPython are now merely customizable behaviour of the translator. Indeed, instead of hard-coding such design decisions, we will keep the PyPy source as simple as possible and plug the required knowledge into the translator. For example, the high-level source need not be concerned about memory management issues (garbage collection, reference counting...); this aspect can be "weaved" into the low-level code by the translator. This point is essential for the separation of concerns. It has deep advantages over the classical monolithic approach, ranging from education (the main source base is not encumbered by details) to raw performance (choice of appropriate low-level models based on real-world context-dependent measures and comparisons). Also note its extreme adaptability: instead of generating C code, it is straightforward to target other runtime environments like Java or .NET. By contrast, today's costs of maintaining several evolving implementations (CPython, Jython for Java...) are very high.
-
-The translation process itself requires some kind of analysis of the RPython code. Among the various ways to perform this analysis we will most probably choose the one based on *abstract interpretation,* as opposed to source-level or bytecode-level analysis:
+The first goal is to develop a reasonably complete Python interpreter written
+in Python. It must be entirely compatible with the language specification. It
+consists of the following major parts:
+
+- A *bytecode compiler,* which translates Python source code into an internal
+  intermediate representation, the *bytecode.*
+
+- A *bytecode interpreter,* which interprets bytecodes and manages the
+  supporting internal structures (frames, exception tracebacks...). It
+  considers objects as black boxes and delegates all individual operations on
+  them to a library of built-in types, the *Object Space.*
+
+- An *Object Space,* which captures the semantics of the various types of the
+  language.
+
+This subdivision is common among interpreter implementations, although we
+place special emphasis on the library of built-in types and its separation
+from the bytecode interpreter. The implementation will closely follow the
+current reference C implementation (CPython). It is thus expected to be
+relatively straightforward, varying from CPython only in a few design
+decisions (mainly with respect to the Object Space separation of concerns),
+and changing strictly nothing in the Python language itself.
+
+This task is done in workpackage WP04_. After the initial development, WP04_
+will switch its focus to reimplementing the many extension modules written in
+C that are either standard or widely-used in CPython. (The Python Standard
+Library is already mostly written in Python, so that the problem only concerns
+C-coded extension modules.) For each extension module, two strategies can be
+followed: either the module is translated into a (regular) Python module, or
+it is written as an extension module of the PyPy interpreter (i.e. in Python
+too, but at the level of the interpreter.)
+
+The result of WP04_ up to this point will still only be able to run on top of
+another Python implementation, such as CPython. Despite this limitation, it
+will still present some advantages over the existing CPython implementation,
+in terms of education (being a much more compact, modular and readable piece
+of code than CPython) and in terms of flexibility (as a basis into which
+experimenters can plug in alternate Object Spaces, alternate interpreters, or
+alternate compilers) -- more about this below.
+
+At this point, to make PyPy stand-alone (and running at a reasonable speed),
+we must ensure that key areas of the source have been restricted to be written
+in a subset of the Python language, a sublanguage (RPython) in which
+staticness restrictions are enforced. This sublanguage is suitable for
+analysis and translation into a lower-level language. The sublanguage's
+precise definition is a trade-off between the amount of dynamic power desired
+to write PyPy and the amount of effort we put into the translation tools.
+
+Note that translation is not a one-shot process; the only source code for PyPy
+will be in Python or RPython, and translation can be repeated freely as part
+of a compilation process.
+
+We are giving translation an innovative emphasis (and thus a whole
+workpackage, WP05_) in the project. It is not merely an RPython-to-C
+translator; it is an essential part of our flexibility goals. Numerous aspects
+that used to be design decisions influencing the whole source code of the
+current CPython have by now become merely customizable behaviour of the
+translator.  Indeed, instead of hard-coding such design decisions, we will
+keep the PyPy source as simple as possible, and plug the required
+knowledge into the translator. For example, the high-level source need not be
+concerned about memory management issues (garbage collection, reference
+counting...); this aspect can be "woven" into the low-level code by the
+translator. This point is essential for the separation of concerns. It has
+deep advantages over the classic monolithic approach, ranging from education
+(the main source base is not encumbered by details) to raw performance (choice
+of appropriate low-level models based on real-world context-dependent measures
+and comparisons). Also note this architecture's extreme adaptability: instead
+of generating C code, it is straightforward to target other runtime
+environments like Java or .NET. By contrast, today's costs of maintaining
+several evolving implementations (CPython, Jython for Java...) are very high.
+
+The translation process itself requires some kind of analysis of the RPython
+code. Among the various ways to perform this analysis we will most probably
+choose the one based on *abstract interpretation,* as opposed to source-level
+or bytecode-level analysis:
 
 .. image:: translation.png
 
-The basic idea is to write an alternative "abstract" Object Space which, instead of actually performing any operation between objects, records these operations and traces the control flow. The "abstract" Object Space will be plugged into the existing bytecode interpreter; these two components will then function as an abstract (or symbolic) interpreter in the common sense. The net result is that we can actually analyse RPython source code without writing any code specific to the language (!) given that we already have a bytecode interpreter which is flexible enough to accomodate a non-standard Object Space. In other words, the combination of PyPy and an "abstract" Object Space performs as the front-end of the translator and can be used to translate (for example) the regular PyPy interpreter and its standard Object Space. Note the two different roles of the bytecode interpreter in the diagram above.
-
-Another notable advantage of this approach is that instead of operating on static source code, it works on the result of loading and initializing the code into the existing CPython interpreter. (Python, unlike more static languages, allows arbitrary computations to be performed while loading modules, e.g. initializing caches or selecting components according to external parameters.) We are thus not restricted to RPython at initialization time, which is important to acheive the configurability goals.
+The basic idea is to write an alternative "abstract" Object Space which,
+instead of actually performing any operation between objects, records these
+operations and traces the control flow. The "abstract" Object Space will be
+plugged into the existing bytecode interpreter; these two components,
+together, will then function as an abstract (or symbolic) interpreter in the
+usual sense of the word. The net result is that we can actually analyse
+RPython source code without writing any code specific to the language (!),
+given that we already have a bytecode interpreter which is flexible enough to
+accommodate a non-standard Object Space. In other words, the combination of
+PyPy and an "abstract" Object Space performs as the front-end of the
+translator, and can be used to translate (for example) the regular PyPy
+interpreter and its standard Object Space. Note the two different roles played
+by the bytecode interpreter in the diagram above.
+
+Another important advantage of this approach is that, instead of operating on
+static source code, it works on the result of loading and initializing the
+code into the existing CPython interpreter. (Python, unlike more static
+languages, allows arbitrary computations to be performed while loading
+modules, e.g. initializing caches or selecting components according to
+external parameters.) We are thus not restricted to RPython at initialization
+time, which is important in order to achieve the configurability goals.
 
 
 Phase 2
 -------
 
-The completion of the first translated stand-alone PyPy interpreter is where the project could potentially branch into a large number of directions. Numerous exciting applications can be foreseen; we will see some of them in more details in Phase 3.
+The completion of the first translated stand-alone PyPy interpreter is where
+the project could potentially branch into a large number of directions.
+Numerous exciting applications can be foreseen; we will examine some of them
+in more detail in Phase 3.
 
-The Phase 2 is concerned about research and integration of research, based on the extreme flexibility provided by the PyPy platform.
+Phase 2 is concerned with research, and integration of research, based on the
+extreme flexibility afforded by the PyPy platform.
 
 
 Performance
 +++++++++++
 
-Part of Phase 2 focuses essentially on performance issues, which are important in helping to establish a language implementation, and to open the language to a wide range of applications for which it were previously thought to be unsuitable.
-
-The flexibility in PyPy allows a number of designs to be reconsidered; better yet, it allows different design decisions to coexist. Indeed, most "hard" issues in interpreters have no obvious best solution; they are all depend in complicated ways to the specific details of the runtime environment and on the particular application considered and its possibly evolving context. PyPy will provide a good platform to experiment with and compare empirically several different implementations for many of these issues. For example:
-
-- The language's core object types can have several implementations with different trade-offs. To experimented with this, we will write a collection of alternatives in the "standard" Object Space implementation and heuristics to select between them. This kind of research effort is common, but PyPy can provide a good platform for real-world comparisons, and to help isolate which particular choices have which effects in an otherwise unchanged environment. Such data is notoriously hard to obtain in monolithic interpreters. This is the focus of WP06_.
-
-- Similarily, as described above, pervasive design decisions can be experimented with by tailoring the translator. This is the focus of WP07_.
-
-We will in particular investigate in detail two specific ways to customize the translator:
-
-- Generating Continuation Passing Style (CPS) low-level code. This makes the advanced notion of continuation available for the programmer; but -- most importantly in our case -- it allows the development, with the help of an appropriate runtime system, to support massive parallelism. Indeed, in almost any OS, native threads are not appropriate for massive usage. Applications (e.g. web servers handling thousands of connections) have to somehow emulate parallelism explicitely. Soft-threads are an ideal target for language integration. This work (also part of WP07_) would consist of exploiting this idea, which has been first tried for Python in the Stackless project.
-
-- Generating a JIT compiler. Existing work in the Psyco project has shown that it would actually be possible to mostly generate a JIT compiler instead of having to write it from scratch. The basic idea is again abstract interpretation: instead of actually performing any operation between two objects, we *generate* machine code that can perform the required operation. Again, no change to the bytecode interpreter is needed; all we need is to translate individual operations to processor instructions, together with a supporting runtime systems. This is defined by WP08_.
+Part of Phase 2 focuses essentially on performance issues, which are important
+in helping to establish the real-world success of a language implementation,
+and may open the language to a wide range of applications for which it was
+previously thought to be unsuitable.
+
+The flexibility in PyPy allows a number of design decisions to be easily
+reconsidered; better yet, it allows different design decisions to coexist.
+Indeed, most "hard" issues in interpreters have no obvious best solution; they
+all depend in complicated ways on the specific details of the runtime
+environment and on the particular application considered and its possibly
+evolving context. PyPy will provide a good platform to experiment with, and
+compare empirically, several different possible approaches for many of these
+issues. For example:
+
+- The language's core object types can have several implementations with
+  different trade-offs. To experiment with this, we will write a collection
+  of alternatives in the "standard" Object Space implementation and heuristics
+  to select between them. This kind of research effort is common, but PyPy can
+  provide a good platform for real-world comparisons, and to help isolate
+  which particular choices have which effects in an otherwise unchanged
+  environment. Such data is notoriously hard to obtain in monolithic
+  interpreters. This is the focus of WP06_.
+
+- Similarly, as described above, pervasive design decisions can be
+  experimented with by tailoring the translator. This is the focus of WP07_.
+
+We will in particular investigate in detail two specific ways to customize the
+translator:
+
+- Generating Continuation Passing Style (CPS) low-level code. This makes the
+  advanced notion of continuation available to the programmer; but -- most
+  importantly in our case -- it makes it possible, with the help of an
+  appropriate runtime system, to support massive parallelism. Indeed, in
+  almost any OS, native threads are not appropriate for massive usage.
+  Applications (e.g. web servers handling thousands of connections) have to
+  somehow emulate parallelism explicitly. Soft-threads are an ideal target for
+  language integration. This work (also part of WP07_) consists of exploiting
+  this idea, which was first tried for Python in the Stackless project.
+
+- Generating a JIT compiler. Existing work in the Psyco project has shown that
+  it would actually be possible to mostly generate a JIT compiler instead of
+  having to write it from scratch. The basic idea is again abstract
+  interpretation: instead of actually performing any operation between two
+  objects, we *generate* machine code that can perform the required operation.
+  Again, no change to the bytecode interpreter is needed; all we need is to
+  translate individual operations to processor instructions, together with a
+  supporting runtime system. This is defined by WP08_.
 
 In dynamic languages, the truth behind JIT compiling is a bit more involved than the above paragraph suggests. All the "standard" operations in Python, including the intuitively simple ones, are actually relatively complex because they depend heavily on the runtime type of the involved objects. This complex code is already written in detail in the "standard" Object Space. Thus the JIT compiler will work by abstract interpretation of RPython code, i.e. abstract interpretation of the interpreter itself (as opposed to user application code). This is similar to the ideas behind the translator, which operates on the RPython source (i.e. the bytecode interpreter and the standard Object Space). We plan to write the dynamic part of the JIT as a plug-in to the translator: instead of generating C code that is the direct translation of PyPy, we will generate C code that itself generates machine code. This extra indirection has large benefits: the operations the JIT needs to be taught about are only the ones allowed in RPython. The resulting piece of C code would thus be the JIT-enabled version of PyPy.