[pypy-svn] r45097 - pypy/extradoc/talk/dyla2007
arigo at codespeak.net
Sun Jul 15 12:01:38 CEST 2007
Author: arigo
Date: Sun Jul 15 12:01:37 2007
New Revision: 45097
Modified:
pypy/extradoc/talk/dyla2007/dyla.bib
pypy/extradoc/talk/dyla2007/dyla.tex
Log:
More clarifications based on reviewers' feedback.
Modified: pypy/extradoc/talk/dyla2007/dyla.bib
==============================================================================
--- pypy/extradoc/talk/dyla2007/dyla.bib (original)
+++ pypy/extradoc/talk/dyla2007/dyla.bib Sun Jul 15 12:01:37 2007
@@ -20,6 +20,12 @@
url = "http://psyco.sourceforge.net/"
}
+@misc{ invokedynamic,
+ title = "Java Specification Request 292: Supporting Dynamically Typed Languages on the Java Platform",
+ note = "http://web1.jcp.org/en/jsr/detail?id=292",
+ url = "http://web1.jcp.org/en/jsr/detail?id=292"
+}
+
% P\#
@inproceedings{ psharp,
Modified: pypy/extradoc/talk/dyla2007/dyla.tex
==============================================================================
--- pypy/extradoc/talk/dyla2007/dyla.tex (original)
+++ pypy/extradoc/talk/dyla2007/dyla.tex Sun Jul 15 12:01:37 2007
@@ -2,6 +2,7 @@
\usepackage{makeidx}
\usepackage{graphicx}
+\sloppy
\begin{document}
@@ -63,10 +64,18 @@
object model supporting the high level dynamic language's objects. It
typically provides features like automatic garbage collection. Recent
languages like Python, Ruby, Perl and JavaScript have complicated
-semantics which are most easily mapped to a simple interpreter operating
-on syntax trees or bytecode; simpler languages like Lisp and Self
-typically have more efficient implementations based on code
-generation.
+semantics which are most easily mapped to a naive interpreter operating
+on syntax trees or bytecode; simpler languages\footnote
+{
+In the sense of the primitive semantics. ``Simple'' here is
+as opposed to ``complicated'', not as opposed to ``complex'': Common
+Lisp for example is not a small language, but it can at least in theory
+be expressed from a smaller core of primitives. In Python, all
+primitive operations have complicated semantics. The argument developed
+in the present paper is more relevant to ``complicated'' dynamic languages.
+}
+like Lisp, Smalltalk and Self typically have more
+efficient implementations based on code generation.
The effort required to build a new virtual machine is relatively
large. This is particularly true for languages which are complex
@@ -122,10 +131,10 @@
instead that VMs should not be \emph{written} in the first place -- they
should be generated from simple interpreters written in any suitable
high-level\footnote{``High-level'' is taken by opposition to languages
-like PreScheme \cite{kelsey-prescheme} or the subset of Smalltalk that the
-Squeak VM is written in \cite{Squeak} which use the syntax and
+like Scheme48's PreScheme \cite{kelsey-prescheme} or Squeak's \cite{Squeak}
+SLang which use the syntax and
metaprogramming facilities of a high-level language but encode
-low-level details like memory management.} language.
+low-level details like object layout and memory management.} language.
In section \ref{sect:approaches} we will explore the way VMs are typically
implemented in C and on top of OO VMs and some of the problems of these
@@ -237,9 +246,8 @@
\item
\emph{Better GCs:} While this is obvious in theory, OO VMs tend to have a
higher memory overhead to start with. For example, an instance of Sun's
-Java VM which just loaded Jython consumes XXX MB of non-shared memory
-(XXX for the JVM and an additional XXX after loading Jython), while a
-CPython process fits in XXX MB.
+Java VM which just loaded Jython consumes 34-42 MB of memory, while a
+CPython process fits in 3-4 MB.
\item
\emph{Cross-platform portability:} While this is true to some extent, the
@@ -268,8 +276,8 @@
that are quite unnatural both for the OO VM and for Prolog.
Another important point that makes implementation of languages on top of OO VMs
-harder is that typically OO VMs don't support meta-programming very well, or do so only
-at the bytecode level.
+harder is that typically general-purpose OO VMs don't support meta-programming
+very well, or do so only at the bytecode level.
\end{itemize}
Nevertheless, some of the benefits are real and very useful, the most
@@ -500,14 +508,22 @@
enough and write the dynamic language implementation accordingly so
that most of the bookkeeping work involved in running the dynamic
language can be removed -- dispatching, boxing, unboxing... However
-this has not been demonstrated yet.
-
-By far the fastest Python implementation, Psyco \cite{psyco-software}, contains a
-hand-written language-specific dynamic compiler. PyPy's translation
-tool-chain is able to extend the generated VMs with an automatically
-generated dynamic compiler that uses techniques similar to those of Psyco
-\cite{Psyco-paper}, derived from the
-interpreter. This is achieved by a pragmatic application of partial
+this has not been demonstrated yet.\footnote
+{Still in the draft stage, a proposed
+extension to the Java bytecode \cite{invokedynamic} might help achieve
+better integration between the Java JITs and dynamic language
+implementations running on top of JVMs.}
+
+By far the fastest Python implementation, Psyco \cite{psyco-software}
+contains a hand-written language-specific dynamic compiler. It works by
+specializing (parts of) Python functions by feeding runtime information
+back into the compiler (typically, but not exclusively, object types).
+The reader is referred to \cite{Psyco-paper} for more details.
+
+PyPy abstracts on this approach: its translation tool-chain is able to
+extend the generated VMs with an \emph{automatically generated} dynamic
+compiler that uses techniques similar to those of Psyco, derived from
+the interpreter. This is achieved by a pragmatic application of partial
evaluation techniques guided by a few hints added to the source of the
interpreter. In other words, it is possible to produce a reasonably
good language-specific JIT compiler and insert it into a VM, alongside
@@ -531,16 +547,17 @@
%XXX doesn't look entirely nice
\begin{itemize}
-\item \emph{Do not write dynamic language implementations ``by hand''.}
-Writing them more abstractly, at a higher level, has primarily only
-advantages, among them the avoidance of a proliferation of diverging
-implementations. Writing interpreters both flexibly and efficiently
-is difficult and meta-programming is a good way to achieve it.
+\item \emph{High-level languages are suitable to implement dynamic languages.}
+They allow an interpreter to be written more abstractly, which has many
+advantages -- among them the avoidance of a proliferation of diverging
+implementations, and better ways to combine flexibility with efficiency.
Moreover, this is not incompatible with targeting and benefiting from
-existing high-quality object-oriented virtual machines like those of the Java and .NET.
-
+existing high-quality object-oriented virtual machines like those of the
+Java and .NET.
\item \emph{Do not write VMs ``by hand''.}
+In other words, write an \emph{interpreter} but not a
+\emph{virtual machine} for the language.
Writing language-specific virtual machines is a time-consuming task for
medium to large languages. Unless large amounts of resources can be
invested, the resulting VMs are bound to have limitations which lead to
@@ -558,8 +575,8 @@
Aside from the advantages described in section
\ref{sect:metaprogramming}, a translation toolchain need not be
standardized for inter-operability but can be tailored to the needs of
-each project. Diversity is good; there is no need to attempt to
-standardize on a single OO VM.
+each project. Diversity is good; translation toolchains offset the need
+to attempt to standardize on a single OO VM.
\end{itemize}
The approach we outlined is actually just one in a very large, mostly