[pypy-svn] r78198 - pypy/extradoc/talk/pepm2011
cfbolz at codespeak.net
Fri Oct 22 13:56:31 CEST 2010
Author: cfbolz
Date: Fri Oct 22 13:56:29 2010
New Revision: 78198
Modified:
pypy/extradoc/talk/pepm2011/escape-tracing.pdf
pypy/extradoc/talk/pepm2011/math.lyx
pypy/extradoc/talk/pepm2011/paper.tex
Log:
various fixes, as well as two bugs in the math
Modified: pypy/extradoc/talk/pepm2011/escape-tracing.pdf
==============================================================================
Binary files. No diff available.
Modified: pypy/extradoc/talk/pepm2011/math.lyx
==============================================================================
--- pypy/extradoc/talk/pepm2011/math.lyx (original)
+++ pypy/extradoc/talk/pepm2011/math.lyx Fri Oct 22 13:56:29 2010
@@ -552,7 +552,7 @@
\begin_inset Text
\begin_layout Plain Layout
-\begin_inset Formula ${\displaystyle \frac{v^{*}\in\mathrm{dom}(S),\,\left(v^{*},S\right)\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\left(\mathrm{ops},S^{\prime}\right)}{v^{*},S\overset{\mathrm{lift}}{\Longrightarrow}\left\langle v^{*}=\mathtt{new}\left(T\right)\right\rangle ::ops,S^{\prime}}}$
+\begin_inset Formula ${\displaystyle \frac{v^{*}\in\mathrm{dom}(S),\,\left(v^{*},S\right)\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\left(\mathrm{ops},S^{\prime}\right)}{v^{*},S\overset{\mathrm{lift}}{\Longrightarrow}\left\langle v^{*}=\mathtt{new}\left(\mathrm{type}\left(S\left(v^{*}\right)\right)\right)\right\rangle ::ops,S^{\prime}}}$
\end_inset
@@ -575,7 +575,7 @@
\begin_inset Text
\begin_layout Plain Layout
-\begin_inset Formula ${\displaystyle \frac{\left(S\left(v^{*}\right)_{L},S\setminus\left\{ v^{*}\mapsto S\left(v^{*}\right)\right\} \right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{L},S^{\prime}\right),\,\left(S\left(v^{*}\right)_{R},S^{\prime}\right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{R},S^{\prime\prime}\right)}{v^{*},S\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\mathrm{ops}_{L}::ops_{R}::\left\langle \mathtt{set}\left(v^{*},L,S\left(v^{*}\right)_{L}\right),\,\mathtt{set}\left(v^{*},R,S\left(v^{*}\right)_{R}\right)\right\rangle ,S^{\prime}}}$
+\begin_inset Formula ${\displaystyle \frac{\left(S\left(v^{*}\right)_{L},S\setminus\left\{ v^{*}\mapsto S\left(v^{*}\right)\right\} \right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{L},S^{\prime}\right),\,\left(S\left(v^{*}\right)_{R},S^{\prime}\right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{R},S^{\prime\prime}\right)}{v^{*},S\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\mathrm{ops}_{L}::ops_{R}::\left\langle \mathtt{set}\left(v^{*},L,S\left(v^{*}\right)_{L}\right),\,\mathtt{set}\left(v^{*},R,S\left(v^{*}\right)_{R}\right)\right\rangle ,S^{\prime\prime}}}$
\end_inset
Modified: pypy/extradoc/talk/pepm2011/paper.tex
==============================================================================
--- pypy/extradoc/talk/pepm2011/paper.tex (original)
+++ pypy/extradoc/talk/pepm2011/paper.tex Fri Oct 22 13:56:29 2010
@@ -235,7 +235,7 @@
the translation to C, PyPy's tools can generate a tracing just-in-time compiler for the
language that the interpreter is implementing. This process is mostly
automatic; it only needs to be guided by the language implementer using a small number of
-source-code hints. Mostly-automatically generating a JIT compiler has many advantages
+source-code hints in the interpreter. Mostly-automatically generating a JIT compiler has many advantages
over writing one manually, an error-prone and tedious process.
By construction, the generated JIT has the same semantics as the interpreter.
Optimizations can be shared between different languages implemented with PyPy.
@@ -289,7 +289,7 @@
during tracing, because if
it isn't, the rest of the trace would not be valid.
-When generating machine code, every guard is be turned into a quick check to
+When generating machine code, every guard is turned into a quick check to
see whether the assumption still holds. When such a guard is hit during the
execution of the machine code and the assumption does not hold, the execution of
the machine code is stopped, and interpreter continues to run from that point
@@ -297,9 +297,9 @@
loop end condition also takes the form of a guard.
If one specific guard fails often enough, the tracing JIT will generate a new
-trace that starts exactly at the position of the failing guard. The existing
-assembler is patched to jump to the new trace when the guard fails
-\cite{andreas_gal_incremental_2006}. This approach guarantees that all the
+trace that starts exactly at the position of the failing guard
+\cite{andreas_gal_incremental_2006}. The existing assembler is patched to jump
+to the new trace when the guard fails. This approach guarantees that all the
hot paths in the program will eventually be traced and compiled into efficient
code.
@@ -377,7 +377,7 @@
implement the numeric tower needs two method calls per arithmetic operation,
which is costly due to the method dispatch.
-Let us now consider a simple interpreter function \lstinline{f} that uses the
+Let us now consider a simple ``interpreter'' function \lstinline{f} that uses the
object model (see the bottom of Figure~\ref{fig:objmodel}).
The loop in \lstinline{f} iterates \lstinline{y} times, and computes something in the process.
Simply running this function is slow, because there are lots of virtual method
@@ -460,7 +460,8 @@
corresponding to the stack level of the function that contains the traced
operation. The trace is in single-assignment form, meaning that each variable is
assigned a value exactly once. The arguments $p_0$ and $p_1$ of the loop correspond
-to the live variables \lstinline{y} and \lstinline{res} in the original function.
+to the live variables \lstinline{y} and \lstinline{res} in the while-loop of
+the original function.
The operations in the trace correspond to the operations in the RPython program
in Figure~\ref{fig:objmodel}:
@@ -473,11 +474,13 @@
(inlined) method call and is followed by the trace of the called method.
\item \lstinline{int_add} and \lstinline{int_gt} are integer addition and
comparison (``greater than''), respectively.
+ \item \lstinline{guard_true} checks that a boolean is true.
\end{itemize}
The method calls in the trace are always preceded by a \lstinline{guard_class}
operation, to check that the class of the receiver is the same as the one that
-was observed during tracing.\footnote{\lstinline{guard_class} performs a precise
+was observed during tracing.\footnote{\lstinline{guard_class} XXX lstinline too large in footnotes
+performs a precise
class check, not checking for subclasses.} These guards make the trace specific
to the situation where \lstinline{y} is really a \lstinline{BoxedInteger}. When
the trace is turned into machine code and afterwards executed with
@@ -575,16 +578,16 @@
The final trace after optimization can be seen in Figure~\ref{fig:step1} (the
line numbers are the lines of the unoptimized trace where the operation originates).
-To optimize the trace, it is traversed from beginning to end while an output
+To optimize the trace, it is traversed from beginning to end and an output
trace is produced. Every operation in the input trace is either
-removed or put into the output trace. Sometimes new operations need to be
+removed or copied into the output trace. Sometimes new operations need to be
produced as well. The optimizer can only remove operations that manipulate
objects that have been allocated within the trace, while all other operations are copied to the
output trace unchanged.
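[Editor's note: the one-pass scheme described above can be sketched in a few lines of Python. This is a hypothetical toy, not PyPy's actual optimizer; the tuple encoding of trace operations and the names `static`/`env` are invented for illustration.]

```python
def optimize(trace):
    # Toy sketch of the one-pass optimizer described above.
    static = {}   # objects allocated within the trace: var -> {field: value}
    env = {}      # replacements for variables whose defining op was removed

    def resolve(v):
        return env.get(v, v)

    output = []
    for op in trace:
        kind = op[0]
        if kind == "new":                              # ("new", res, type)
            static[op[1]] = {"type": op[2]}            # removed from output
        elif kind == "set" and resolve(op[1]) in static:
            # ("set", obj, field, val): effect remembered, op removed
            static[resolve(op[1])][op[2]] = resolve(op[3])
        elif kind == "get" and resolve(op[2]) in static:
            # ("get", res, obj, field): result replaced by the stored value
            env[op[1]] = static[resolve(op[2])][op[3]]
        else:
            # everything else is copied into the output trace unchanged,
            # with removed variables replaced by what they stand for
            output.append((op[0],) + tuple(resolve(a) for a in op[1:]))
    return output
```

On a toy trace, a `new`/`set`/`get`/`int_add` sequence collapses to a single `int_add` on the stored value, mirroring the example discussed next.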
Looking at the example trace of Figure~\ref{fig:unopt-trace}, the operations
in lines 1--9 are manipulating objects which existed before the trace and that
-are passed in as arguments: therefore the optimizer just puts them into the
+are passed in as arguments: therefore the optimizer just copies them into the
output trace.
The following operations (lines 10--17) are more interesting:
@@ -608,11 +611,11 @@
stored in the fields of the allocated object come from.
In the snippet above, the two \lstinline{new} operations are removed and two
-static objects are constructed. The \lstinline{set} operations manipulate a
-static object, therefore they can be removed as well; their effect is
+static objects are constructed. The \lstinline{set} operations manipulate
+static objects, therefore they can be removed as well; their effect is
remembered in the static objects.
-The static object associated with $p_{5}$ would store the knowledge that it is a
+After the operations, the static object associated with $p_{5}$ would store the knowledge that it is a
\lstinline{BoxedInteger} whose \lstinline{intval} field contains $i_{4}$; the
one associated with $p_{6}$ would store that it is a \lstinline{BoxedInteger}
whose \lstinline{intval} field contains the constant -100.
@@ -628,8 +631,8 @@
$i_{9}$ = int_add($i_{7}$, $i_{8}$)
\end{lstlisting}
-The \lstinline{guard_class} operations can be removed, since their argument is a
-static object with the matching type \lstinline{BoxedInteger}. The
+The \lstinline{guard_class} operations can be removed, since their arguments are
+static objects with the matching type \lstinline{BoxedInteger}. The
\lstinline{get} operations can be removed as well, because each of them reads a
field out of a static object. The results of the get operation are replaced with
what the static object stores in these fields: all the occurences of $i_{7}$ and $i_{8}$ in the trace are just
@@ -642,11 +645,11 @@
The rest of the trace from Figure~\ref{fig:unopt-trace} is optimized in a
similar vein. The operations in lines 27--35 produce two more static objects and
-are removed. Those in line 36--39 are just put into the output trace because they
-are removed. Those in line 36--39 are just put into the output trace because they
+are removed. Those in lines 36--39 are just copied into the output trace because they
manipulate objects that are allocated before the trace. Lines 40--42 are removed
-because they operate on a static object. Line 43 is put into the output trace.
+because they operate on a static object. Line 43 is copied into the output trace.
Lines 44--46 produce a new static object and are removed, lines 48--51 manipulate
-that static object and are removed as well. Lines 52--54 are put into the output
+that static object and are removed as well. Lines 52--54 are copied into the output
trace.
The last operation (line 55) is an interesting case. It is the \lstinline{jump}
@@ -759,16 +762,16 @@
In this section we want to give a formal description of the semantics of the
traces and of the optimizer and liken the optimization to partial evaluation.
-We focus on the operations for manipulating dynamically allocated objects,
+We focus on the operations for manipulating heap allocated objects,
as those are the only ones that are actually optimized. We also consider only
-objects with two fields in this section, generalizing to arbitrary many fields
+objects with two fields $L$ and $R$ in this section; generalizing to arbitrarily many fields
is straightforward.
Traces are lists of operations. The operations considered here are
\lstinline{new}, \lstinline{get}, \lstinline{set} and \lstinline{guard_class}.
The values of all
variables are locations (\ie pointers). Locations are mapped to objects, which
-are represented by triples of a type $T$, and two locations that represent the
+are represented by triples $(T,l_1,l_2)$ of a type $T$, and two locations that represent the
fields of the object. When a new object is created, the fields are initialized
to null, but we require that they are initialized to a real
location before being read, otherwise the trace is malformed (this condition is
@@ -776,7 +779,7 @@
We use some abbreviations when dealing with object triples. To read the type of
an object, $\mathrm{type}((T,l_1,l_2))=T$ is used. Reading a field $F$ from an
-object is written $(T,l_1,l_2)_F$ which either returns $l_1$ if $F=L$ or $l_2$
+object is written $(T,l_1,l_2)_F$ which either is $l_1$ if $F=L$ or $l_2$
if $F=R$. To set field $F$ to a new location $l$, we use the notation
$(T,l_1,l_2)!_Fl$, which yields a new triple $(T,l,l_2)$ if $F=L$ or a new
triple $(T,l_1,l)$ if $F=R$.
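[Editor's note: the triple operations just defined can be transcribed directly into Python. A minimal sketch; the tuple encoding and the helper names are assumptions of this illustration, not part of the paper.]

```python
L, R = "L", "R"   # the two field names used throughout the formalism

def obj_type(obj):
    # type((T, l1, l2)) = T
    return obj[0]

def read_field(obj, F):
    # (T, l1, l2)_F is l1 if F = L, l2 if F = R
    _T, l1, l2 = obj
    return l1 if F == L else l2

def write_field(obj, F, l):
    # (T, l1, l2)!_F l yields a fresh triple; the original is unchanged,
    # matching the functional flavour of the notation
    T, l1, l2 = obj
    return (T, l, l2) if F == L else (T, l1, l)
```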
@@ -827,8 +830,8 @@
\emph{guard} & ${\displaystyle \frac{E(v)\in\mathrm{dom}(S),\,\mathrm{type}(S(E(v)))=T}{\mathtt{guard\_class}(v,T),E,S\overset{\mathrm{opt}}{\Longrightarrow}\left\langle \,\right\rangle ,E,S}}$\tabularnewline[3em]
& ${\displaystyle \frac{E(v)\notin\mathrm{dom}(S)\vee\mathrm{type}(S(E(v)))\neq T,\,\left(E(v),S\right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops},S^{\prime}\right)}{\mathtt{guard\_class}(v,T),E,S\overset{\mathrm{opt}}{\Longrightarrow}\mathrm{ops}::\left\langle \mathtt{guard\_class}(E\left(v\right),T)\right\rangle ,E,S^{\prime}}}$\tabularnewline[3em]
\emph{lifting} & ${\displaystyle \frac{v^{*}\notin\mathrm{dom}(S)}{v^{*},S\overset{\mathrm{lift}}{\Longrightarrow}\left\langle \,\right\rangle ,S}}$\tabularnewline[3em]
- & ${\displaystyle \frac{v^{*}\in\mathrm{dom}(S),\,\left(v^{*},S\right)\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\left(\mathrm{ops},S^{\prime}\right)}{v^{*},S\overset{\mathrm{lift}}{\Longrightarrow}\left\langle v^{*}=\mathtt{new}\left(T\right)\right\rangle ::ops,S^{\prime}}}$\tabularnewline[3em]
- & ${\displaystyle \frac{\left(S\left(v^{*}\right)_{L},S\setminus\left\{ v^{*}\mapsto S\left(v^{*}\right)\right\} \right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{L},S^{\prime}\right),\,\left(S\left(v^{*}\right)_{R},S^{\prime}\right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{R},S^{\prime\prime}\right)}{v^{*},S\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\mathrm{ops}_{L}::ops_{R}::\left\langle \mathtt{set}\left(v^{*},L,S\left(v^{*}\right)_{L}\right),\,\mathtt{set}\left(v^{*},R,S\left(v^{*}\right)_{R}\right)\right\rangle ,S^{\prime}}}$\tabularnewline[3em]
+ & ${\displaystyle \frac{v^{*}\in\mathrm{dom}(S),\,\left(v^{*},S\right)\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\left(\mathrm{ops},S^{\prime}\right)}{v^{*},S\overset{\mathrm{lift}}{\Longrightarrow}\left\langle v^{*}=\mathtt{new}\left(\mathrm{type}\left(S\left(v^{*}\right)\right)\right)\right\rangle ::ops,S^{\prime}}}$\tabularnewline[3em]
+ & ${\displaystyle \frac{\left(S\left(v^{*}\right)_{L},S\setminus\left\{ v^{*}\mapsto S\left(v^{*}\right)\right\} \right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{L},S^{\prime}\right),\,\left(S\left(v^{*}\right)_{R},S^{\prime}\right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{R},S^{\prime\prime}\right)}{v^{*},S\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\mathrm{ops}_{L}::ops_{R}::\left\langle \mathtt{set}\left(v^{*},L,S\left(v^{*}\right)_{L}\right),\,\mathtt{set}\left(v^{*},R,S\left(v^{*}\right)_{R}\right)\right\rangle ,S^{\prime\prime}}}$\tabularnewline[3em]
\end{tabular}
\end{center}
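[Editor's note: the corrected lifting rules can be read operationally roughly as follows. A hedged Python sketch; the dict encoding of the static heap $S$ and the op tuples are invented for illustration. Note how $v^{*}$ is removed from $S$ before its fields are lifted (which is what keeps cyclic structures from looping), and how the second premise threads $S^{\prime}$ into $S^{\prime\prime}$ — the two bugs this commit fixes.]

```python
def lift(v, S):
    # S maps static variables to {"type": T, "L": loc, "R": loc}
    if v not in S:
        return []                      # first rule: nothing to lift
    obj = S.pop(v)                     # S \ {v* -> S(v*)}: breaks cycles
    ops = [("new", v, obj["type"])]    # v* = new(type(S(v*)))
    for f in ("L", "R"):               # liftfields: lift both field values
        ops += lift(obj[f], S)         # first premise updates S in place
    for f in ("L", "R"):               # ... then re-emit the delayed sets
        ops.append(("set", v, f, obj[f]))
    return ops
```

Lifting a single static `BoxedInteger` whose fields are non-static locations thus re-emits one `new` followed by two `set` operations.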
@@ -958,7 +961,7 @@
\subsection{Analysis of the Algorithm}
\label{sub:analysis}
-While we do not offer a formal proof of it, it should be relatively clear
+While we do not offer a formal proof of it, one can argue informally
that the algorithm presented above is sound: it works by delaying (and
often completely removing) some operations. The algorithm runs in a
single pass over the list of operations. We can check that although
@@ -1014,7 +1017,7 @@
To evaluate our allocation removal algorithm, we look at the effectiveness when
used in the generated tracing JIT of PyPy's Python interpreter. This interpreter
is a full implementation of Python 2.5 language semantics and is about 30'000
-lines of code.
+lines of RPython code.
The
benchmarks we used are small-to-medium Python programs, some synthetic
@@ -1058,7 +1061,7 @@
CPython 2.6.6\footnote{\texttt{http://python.org}}, which uses a bytecode-based
interpreter. Furthermore we compared against
Psyco\cite{rigo_representation-based_2004} 1.6,
-an extension to CPython which is a
+a (hand-written) extension module to CPython which is a
just-in-time compiler that produces machine code at run-time. It is not based
on traces. Finally, we used two versions of PyPy's Python interpreter (revision
77823 of SVN trunk\footnote{\texttt{http://codespeak.net/svn/pypy/trunk}}): one
@@ -1178,17 +1181,17 @@
loop to only allocate it once, instead of every iteration. No details are given
for this optimization. The fact that the object is still allocated and needs to
be written to means that only the allocations are optimized away, but not the
-reads and writes out of/into the object.
+reads out of and writes into the object.
SPUR, a tracing JIT for C\# seems to be able to remove allocations in a similar
way to the approach described here, as hinted at in the technical report
-\cite{michael_bebenita_spur:_2010}. However, no details for the approach and its
+\cite{bebenita_spur:_2010}. However, no details for the approach and its
implementation are given.
Psyco \cite{rigo_representation-based_2004} is a (non-tracing) JIT for Python
that implements a more ad-hoc version of the allocation removal described here.
Our static objects could be related to what are called \emph{virtual} objects
-in Psyco. It is a hand-written extension module for CPython. Historically,
+in Psyco. Historically,
PyPy's JIT can be seen as some successor of Psyco for a general context (one of
the authors of this paper is the author of Psyco).