[pypy-svn] r43654 - pypy/extradoc/talk/dyla2007

Fri May 25 22:33:37 CEST 2007

Author: arigo
Date: Fri May 25 22:33:31 2007
New Revision: 43654

Added:
   pypy/extradoc/talk/dyla2007/section3.txt   (contents, props changed)
Log:
Starting a new file with the "Metaprogramming is Good" section.
For now it seems to make sense to have subsections corresponding
to the top-level points of the outline.


Added: pypy/extradoc/talk/dyla2007/section3.txt
==============================================================================

--- (empty file)
+++ pypy/extradoc/talk/dyla2007/section3.txt	Fri May 25 22:33:31 2007
@@ -0,0 +1,156 @@
+
+Metaprogramming Is Good
+=======================
+
+The present paper proposes to approach the implementation of dynamic
+languages from a meta-level: virtual machines for such languages should
+not be written by hand, but generated automatically "around" a
+description of the language in the form of an interpreter for it.  We
+argue that this approach gives many of the benefits that an implementer
+usually expects when deciding to target an existing object-oriented
+virtual machine.  It also gives other benefits that we will describe,
+mostly in terms of flexibility.  But most importantly, it lets a
+community write a single source implementation of the language, avoiding
+the time-consuming task of keeping multiple implementations in sync.
+The single source can be used to generate either custom VMs for C-like
+environments or interpreters running on top of OO VMs.  This makes it
+practical to experiment with large changes to the language, and with
+entirely new languages such as domain-specific languages, while remaining
+able at any time to run the implemented language in a variety of
+environments, from C/Posix to the JVM to .NET.
+
+PyPy architecture
+-----------------
+
+We implemented this idea in the PyPy project.  The dynamic language for
+which we wrote an interpreter is Python.  Because of its size and rather
+intricate semantics, Python is a good target for our approach, in the
+following sense: each of its previous reimplementations (Jython for the
+JVM and IronPython for .NET) has proved very time-consuming to
+maintain.  Our implementation is by construction easier to maintain,
+and highly portable (including to C/Posix, to the JVM and to .NET).
+
+In metaprogramming terms, the PyPy architecture is as follows:
+
+* we use a very expressive *object language* (RPython - an analyzable
+  subset of Python) as the language in which the complete Python
+  interpreter is written, together with the implementation of its
+  built-in types.  The language is still close to Python, e.g. it is
+  object-oriented, provides rich built-in types and has automatic memory
+  management.  In other words, the source code of our complete Python
+  interpreter is mostly free of low-level details - no explicit memory
+  management, no pieces of C or C-level code.
+
+* we use a very expressive *metalanguage* (namely regular Python) to
+  perform the analysis of RPython code (control flow and data flow
+  construction, type inference, etc.) and its successive
+  transformations.
+
+* this metaprogramming component of PyPy is called the *translation
+  framework*, as it translates RPython source code (i.e. the full Python
+  interpreter) into lower-level code.  Its purpose is to add aspects to
+  the interpreter and to specialize it for a selectable virtual or
+  hardware runtime environment.  This either turns the interpreter into
+  a standalone virtual machine or integrates it into an existing OO VM.
+  The necessary support code, e.g. the garbage collector when targeting
+  C, is itself written in RPython, much as the Jikes RVM's GCs are
+  written in Java [JIKES]; as needed, it is translated together with the
+  interpreter to form the final custom VM.
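+
+To make the split between object language and metalanguage in the list
+above more concrete, here is a toy analyzer written in regular Python
+that inspects a function written in a tiny restricted subset and infers
+the types of its variables.  It is a hypothetical sketch, not PyPy's
+type inference: the subset (straight-line code, assignments to local
+names, int/float arithmetic with ``+``, ``-`` and ``*``) and the names
+``infer_expr`` and ``infer_function`` are our own assumptions, and it
+relies on the standard ``ast`` module for brevity.
+
```python
import ast

# Toy "object language": straight-line functions that only assign local
# names, return a value, and use int/float constants, names and + - *.
# Anything else is rejected, mimicking an analyzable subset.
ARITH_OPS = (ast.Add, ast.Sub, ast.Mult)


def infer_expr(node, env):
    """Return the type (int or float) of an expression, given name -> type."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return type(node.value)
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise TypeError(f"unknown variable {node.id!r}")
        return env[node.id]
    if isinstance(node, ast.BinOp) and isinstance(node.op, ARITH_OPS):
        left = infer_expr(node.left, env)
        right = infer_expr(node.right, env)
        # int op int stays int; mixing in a float gives a float.
        return float if float in (left, right) else int
    raise TypeError(f"not in the toy subset: {ast.dump(node)}")


def infer_function(source, arg_types):
    """Infer local variable types and the return type of one function."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise TypeError("expected a single function definition")
    env = {a.arg: arg_types[a.arg] for a in func.args.args}
    return_type = None
    for stmt in func.body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            name = stmt.targets[0].id
            new_type = infer_expr(stmt.value, env)
            # This toy rejects variables whose type changes.
            if env.get(name, new_type) is not new_type:
                raise TypeError(f"variable {name!r} changes type")
            env[name] = new_type
        elif isinstance(stmt, ast.Return) and stmt.value is not None:
            return_type = infer_expr(stmt.value, env)
        else:
            raise TypeError(f"not in the toy subset: {type(stmt).__name__}")
    return env, return_type


SOURCE = """
def scale(x, factor):
    y = x * factor
    z = y + 1
    return z
"""

env, ret = infer_function(SOURCE, {"x": int, "factor": float})
print(env["y"].__name__, ret.__name__)  # float float
```
+
+In this sketch, code outside the subset (for example a ``print`` call)
+is rejected with ``TypeError`` rather than analyzed.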
+
+A detailed description of this translation process is beyond the scope
+of the present paper; it can be found in [VMC].  The actual Python
+interpreter of PyPy and the results we achieved by translating it to C,
+LLVM [LLVM] and .NET are described in [XXX].  These results show that
+the approach is practical: the performance of the translated
+interpreter is within the same order of magnitude as that of CPython,
+the hand-written and well-tuned C reference implementation (within a
+factor of 2, and the gap is shrinking).
+
+A single source
+---------------
+
+Our approach, a single "meta-written" implementation, naturally leads
+to language implementations that have various advantages over
+"hand-written" implementations.  First of all, it is a single-source
+approach: we explicitly seek to solve the problem of the proliferation
+of implementations.  In the sequel we give more precise evidence that
+this can be done in a practical way with no major drawback.  By itself,
+this would already be a valid argument against the need for
+standardization on a single OO VM.  But generating language
+implementations has other advantages which are, in our opinion,
+significant enough to suggest that metaprogramming, though not widely
+used in general-purpose programming, is an essential tool in a
+language implementer's toolbox.
+
+Writing the interpreter is easier
+---------------------------------
+
+A first point is that this approach makes interpreters easy to write,
+update and generally experiment with.  Greater expressiveness helps at
+all levels: our Python interpreter is written in RPython as a relatively
+simple interpreter, in some respects easier to understand than CPython.
+We use its high level and flexibility to quickly experiment with
+features or implementation techniques in ways that would, in a
+traditional approach, require pervasive changes to the source code.
+For example, PyPy's Python interpreter can optionally provide lazily
+computed objects, a 150-line extension in PyPy that would require
+global changes in CPython.  Further examples can be found in our
+technical reports; we should notably mention an extension adding a
+state-of-the-art security model for Python based on data flow tracking
+[XXX], and general performance improvements found by extensive
+experimentation [XXX], some of which were backported to CPython.
+
+Hand-writing an implementation for a specific OO VM requires not only
+good knowledge of the OO VM in question and of its object model; it
+also requires the language implementer to fit the language into the
+imposed models.  For example, it is natural to map classes and
+instances of a dynamic object-oriented language to the OO VM's notion
+of classes and instances, but this might not be a simple task at all
+if the models are substantially different and/or if the OO VM is
+essentially less dynamic than the language to implement.  In our
+approach, this effort is split into two levels.  In a first step,
+while writing the interpreter, the implementer does not need to worry
+about integration with an OO VM's object model.  Of course, the
+integration effort does not simply vanish: a straightforward
+translation of such an interpreter to a given OO VM would give an
+interpreter for the dynamic language that is unable to communicate
+with the host VM (which might already be interesting in specific
+cases, but not in general).  Integration comes as a second step and
+occurs at a different level, by introducing mappings between the
+relevant classes of the interpreter and the corresponding classes of
+the OO VM.  So far we have no evidence that this makes the total
+integration effort much lower or higher; the point is that it has
+proven possible to take the single, OO VM-independent source code of
+PyPy's Python interpreter and produce from it a version that runs in,
+and integrates with, the rest of the .NET environment, by writing
+*orthogonal* code.
+
+At the level of the translation framework, the ability to change or
+insert new whole-program transformations makes some aspects of the
+interpreter easier to deal with.  By "aspect" we mean, in the original
+AOP sense, a feature that is added to an object program by a
+meta-program.  The most obvious example in our context is the garbage
+collector for target environments that lack one.  The hand-written C
+source of CPython is littered with macro calls that increment and
+decrement reference counters.  Our translation framework can insert
+such macro calls automatically; in fact, we have a choice of GCs, and
+reference counting is only one of them (and not a particularly
+efficient one).  Other GCs require different transformations of the
+code.  By contrast, supporting more than one GC in CPython is close to
+impossible without forking the whole source code base.
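+
+The following toy transformation illustrates what inserting reference
+counting as an aspect can look like.  It is a hypothetical sketch, not
+PyPy's GC transformation: the tiny low-level IR (a list of ``(opname,
+result, args)`` tuples), the operation names and the ownership
+conventions listed in the comments are all assumptions made for this
+example.  The object program never mentions reference counts; the
+meta-program adds the ``incref`` and ``decref`` operations.
+
```python
# Toy low-level IR: each operation is (opname, result_var_or_None, args).
# Hypothetical ownership conventions assumed by this sketch:
#  - "getfield" returns a borrowed reference, so its result is incref'd;
#  - every other operation returning a value returns a new reference;
#  - operation arguments are borrowed, never consumed;
#  - "return" (assumed to be the last operation) hands one reference
#    to the caller;
#  - function parameters are borrowed from the caller, never decref'd.
BORROWED_RESULT = {"getfield"}


def insert_refcounting(params, ops):
    """Return a new operation list with incref/decref operations added."""
    last_use = {}
    for i, (_, _, args) in enumerate(ops):
        for var in args:
            last_use[var] = i

    out = []
    for i, (opname, result, args) in enumerate(ops):
        if opname == "return":
            (var,) = args
            if var in params:
                # The caller needs its own reference to a borrowed value.
                out.append(("incref", None, [var]))
            out.append((opname, result, args))
            continue
        out.append((opname, result, args))
        if result is not None and opname in BORROWED_RESULT:
            # Take ownership before any decref emitted below.
            out.append(("incref", None, [result]))
        # Release owned locals whose last use is this operation.
        dying = [v for v in args if last_use[v] == i and v not in params]
        if result is not None and result not in last_use:
            dying.append(result)  # defined but never used
        for var in dict.fromkeys(dying):  # drop duplicates, keep order
            out.append(("decref", None, [var]))
    return out


ops = [
    ("getfield", "v1", ["p"]),  # v1 = a field of p (borrowed)
    ("call", "v2", ["v1"]),     # v2 = f(v1) (new reference)
    ("return", None, ["v2"]),
]
for op in insert_refcounting({"p"}, ops):
    print(op)
```
+
+In the output, ``v1`` gets an ``incref`` right after the borrowing
+``getfield`` and a ``decref`` right after its last use in ``call``,
+while ``v2`` is handed to the caller by ``return``.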
+
+In fact, there is a well-known example of a CPython fork: Stackless
+Python [XXX], which adds support for coroutines.  This is very hard to
+do in CPython because the interpreter is written in a highly recursive
+style.  Stackless Python required large-scale changes, and it has not
+been merged back into CPython because of the pervasive increase in
+complexity that it would entail.  In PyPy, by contrast, an optional
+"stackless" transformation can turn the Python interpreter, which is
+also written in a simple, highly recursive style, into an efficient
+variant of continuation-passing style (CPS), enabling the use of
+coroutines in the translated interpreter.  For more details and other
+examples of translation-level transformations, see [XXX].
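+
+As a small-scale illustration of why CPS helps, the sketch below
+evaluates toy expression trees once in a direct recursive style and
+once in continuation-passing style driven by a trampoline loop.  It is
+hand-written CPS in ordinary Python, not the output of PyPy's stackless
+transformation; the tree format and all function names are
+hypothetical.  Because each pending computation is an ordinary value,
+the driver can suspend it and resume it later, which is what coroutines
+need.
+
```python
# Toy expression trees: an int, or a tuple ("add", left, right).

def eval_direct(expr):
    """Direct recursive style: one host stack frame per tree level."""
    if isinstance(expr, int):
        return expr
    _, left, right = expr
    return eval_direct(left) + eval_direct(right)


class Done:
    """Final result of a CPS computation."""
    def __init__(self, value):
        self.value = value


def eval_cps(expr, k):
    """CPS: pass the result to continuation k instead of returning it.
    Every step is returned as a zero-argument thunk, so the host stack
    never grows; a driver loop calls the thunks one by one."""
    if isinstance(expr, int):
        return lambda: k(expr)
    _, left, right = expr
    return lambda: eval_cps(
        left, lambda lv: lambda: eval_cps(
            right, lambda rv: lambda: k(lv + rv)))


def run(expr):
    """Trampoline: repeatedly perform the next step until Done."""
    step = eval_cps(expr, Done)
    while not isinstance(step, Done):
        step = step()
    return step.value


def run_interleaved(exprs):
    """Round-robin scheduler: each computation is suspended after one
    step and resumed later, like cooperating coroutines."""
    steps = [eval_cps(e, Done) for e in exprs]
    while not all(isinstance(s, Done) for s in steps):
        steps = [s if isinstance(s, Done) else s() for s in steps]
    return [s.value for s in steps]


deep = 0
for _ in range(10000):
    deep = ("add", deep, 1)

print(run(deep))  # 10000
try:
    eval_direct(deep)
except RecursionError:
    print("direct style: RecursionError")
print(run_interleaved([("add", 1, 2), ("add", ("add", 3, 4), 5)]))  # [3, 12]
```
+
+On a stock CPython, whose default recursion limit is 1000,
+``eval_direct`` fails on the 10000-level tree while ``run`` succeeds,
+because the trampoline keeps the pending work in heap-allocated
+closures rather than on the host stack.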
+
+Getting good GCs and tools is possible
+--------------------------------------
+
+xxx