[pypy-svn] r78198 - pypy/extradoc/talk/pepm2011
cfbolz at codespeak.net
Fri Oct 22 13:56:31 CEST 2010
Author: cfbolz
Date: Fri Oct 22 13:56:29 2010
New Revision: 78198
Modified:
pypy/extradoc/talk/pepm2011/escape-tracing.pdf
pypy/extradoc/talk/pepm2011/math.lyx
pypy/extradoc/talk/pepm2011/paper.tex
Log:
various fixes, as well as two bugs in the math
Modified: pypy/extradoc/talk/pepm2011/escape-tracing.pdf
==============================================================================
Binary files. No diff available.
Modified: pypy/extradoc/talk/pepm2011/math.lyx
==============================================================================
--- pypy/extradoc/talk/pepm2011/math.lyx (original)
+++ pypy/extradoc/talk/pepm2011/math.lyx Fri Oct 22 13:56:29 2010
@@ -552,7 +552,7 @@
\begin_inset Text
\begin_layout Plain Layout
-\begin_inset Formula ${\displaystyle \frac{v^{*}\in\mathrm{dom}(S),\,\left(v^{*},S\right)\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\left(\mathrm{ops},S^{\prime}\right)}{v^{*},S\overset{\mathrm{lift}}{\Longrightarrow}\left\langle v^{*}=\mathtt{new}\left(T\right)\right\rangle ::ops,S^{\prime}}}$
+\begin_inset Formula ${\displaystyle \frac{v^{*}\in\mathrm{dom}(S),\,\left(v^{*},S\right)\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\left(\mathrm{ops},S^{\prime}\right)}{v^{*},S\overset{\mathrm{lift}}{\Longrightarrow}\left\langle v^{*}=\mathtt{new}\left(\mathrm{type}\left(S\left(v^{*}\right)\right)\right)\right\rangle ::ops,S^{\prime}}}$
\end_inset
@@ -575,7 +575,7 @@
\begin_inset Text
\begin_layout Plain Layout
-\begin_inset Formula ${\displaystyle \frac{\left(S\left(v^{*}\right)_{L},S\setminus\left\{ v^{*}\mapsto S\left(v^{*}\right)\right\} \right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{L},S^{\prime}\right),\,\left(S\left(v^{*}\right)_{R},S^{\prime}\right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{R},S^{\prime\prime}\right)}{v^{*},S\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\mathrm{ops}_{L}::ops_{R}::\left\langle \mathtt{set}\left(v^{*},L,S\left(v^{*}\right)_{L}\right),\,\mathtt{set}\left(v^{*},R,S\left(v^{*}\right)_{R}\right)\right\rangle ,S^{\prime}}}$
+\begin_inset Formula ${\displaystyle \frac{\left(S\left(v^{*}\right)_{L},S\setminus\left\{ v^{*}\mapsto S\left(v^{*}\right)\right\} \right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{L},S^{\prime}\right),\,\left(S\left(v^{*}\right)_{R},S^{\prime}\right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{R},S^{\prime\prime}\right)}{v^{*},S\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\mathrm{ops}_{L}::ops_{R}::\left\langle \mathtt{set}\left(v^{*},L,S\left(v^{*}\right)_{L}\right),\,\mathtt{set}\left(v^{*},R,S\left(v^{*}\right)_{R}\right)\right\rangle ,S^{\prime\prime}}}$
\end_inset
Modified: pypy/extradoc/talk/pepm2011/paper.tex
==============================================================================
--- pypy/extradoc/talk/pepm2011/paper.tex (original)
+++ pypy/extradoc/talk/pepm2011/paper.tex Fri Oct 22 13:56:29 2010
@@ -235,7 +235,7 @@
the translation to C, PyPy's tools can generate a tracing just-in-time compiler for the
language that the interpreter is implementing. This process is mostly
automatic; it only needs to be guided by the language implementer using a small number of
-source-code hints. Mostly-automatically generating a JIT compiler has many advantages
+source-code hints in the interpreter. Mostly-automatically generating a JIT compiler has many advantages
over writing one manually, an error-prone and tedious process.
By construction, the generated JIT has the same semantics as the interpreter.
Optimizations can be shared between different languages implemented with PyPy.
@@ -289,7 +289,7 @@
during tracing, because if
it isn't, the rest of the trace would not be valid.
-When generating machine code, every guard is be turned into a quick check to
+When generating machine code, every guard is turned into a quick check to
see whether the assumption still holds. When such a guard is hit during the
execution of the machine code and the assumption does not hold, the execution of
the machine code is stopped, and interpreter continues to run from that point
@@ -297,9 +297,9 @@
loop end condition also takes the form of a guard.
If one specific guard fails often enough, the tracing JIT will generate a new
-trace that starts exactly at the position of the failing guard. The existing
-assembler is patched to jump to the new trace when the guard fails
-\cite{andreas_gal_incremental_2006}. This approach guarantees that all the
+trace that starts exactly at the position of the failing guard
+\cite{andreas_gal_incremental_2006}. The existing assembler is patched to jump
+to the new trace when the guard fails. This approach guarantees that all the
hot paths in the program will eventually be traced and compiled into efficient
code.
@@ -377,7 +377,7 @@
implement the numeric tower needs two method calls per arithmetic operation,
which is costly due to the method dispatch.
-Let us now consider a simple interpreter function \lstinline{f} that uses the
+Let us now consider a simple ``interpreter'' function \lstinline{f} that uses the
object model (see the bottom of Figure~\ref{fig:objmodel}).
The loop in \lstinline{f} iterates \lstinline{y} times, and computes something in the process.
Simply running this function is slow, because there are lots of virtual method
@@ -460,7 +460,8 @@
corresponding to the stack level of the function that contains the traced
operation. The trace is in single-assignment form, meaning that each variable is
assigned a value exactly once. The arguments $p_0$ and $p_1$ of the loop correspond
-to the live variables \lstinline{y} and \lstinline{res} in the original function.
+to the live variables \lstinline{y} and \lstinline{res} in the while-loop of
+the original function.
The operations in the trace correspond to the operations in the RPython program
in Figure~\ref{fig:objmodel}:
@@ -473,11 +474,13 @@
(inlined) method call and is followed by the trace of the called method.
\item \lstinline{int_add} and \lstinline{int_gt} are integer addition and
comparison (``greater than''), respectively.
+ \item \lstinline{guard_true} checks that a boolean is true.
\end{itemize}
The method calls in the trace are always preceded by a \lstinline{guard_class}
operation, to check that the class of the receiver is the same as the one that
-was observed during tracing.\footnote{\lstinline{guard_class} performs a precise
+was observed during tracing.\footnote{\lstinline{guard_class} XXX lstinline too large in footnotes
+performs a precise
class check, not checking for subclasses.} These guards make the trace specific
to the situation where \lstinline{y} is really a \lstinline{BoxedInteger}. When
the trace is turned into machine code and afterwards executed with
@@ -575,16 +578,16 @@
The final trace after optimization can be seen in Figure~\ref{fig:step1} (the
line numbers are the lines of the unoptimized trace where the operation originates).
-To optimize the trace, it is traversed from beginning to end while an output
+To optimize the trace, it is traversed from beginning to end and an output
trace is produced. Every operation in the input trace is either
-removed or put into the output trace. Sometimes new operations need to be
+removed or copied into the output trace. Sometimes new operations need to be
produced as well. The optimizer can only remove operations that manipulate
objects that have been allocated within the trace, while all other operations are copied to the
output trace unchanged.
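[Editor's note: the one-pass scheme described above can be sketched in a few lines of Python. This is a hypothetical toy, not PyPy's actual optimizer; the tuple encoding of trace operations and the names `static`/`env` are invented for illustration.]

```python
def optimize(trace):
    # Toy sketch of the one-pass optimizer described above.
    static = {}   # objects allocated within the trace: var -> {field: value}
    env = {}      # replacements for variables whose defining op was removed

    def resolve(v):
        return env.get(v, v)

    output = []
    for op in trace:
        kind = op[0]
        if kind == "new":                              # ("new", res, type)
            static[op[1]] = {"type": op[2]}            # removed from output
        elif kind == "set" and resolve(op[1]) in static:
            # ("set", obj, field, val): effect remembered, op removed
            static[resolve(op[1])][op[2]] = resolve(op[3])
        elif kind == "get" and resolve(op[2]) in static:
            # ("get", res, obj, field): result replaced by the stored value
            env[op[1]] = static[resolve(op[2])][op[3]]
        else:
            # everything else is copied into the output trace unchanged,
            # with removed variables replaced by what they stand for
            output.append((op[0],) + tuple(resolve(a) for a in op[1:]))
    return output
```

On a toy trace, a `new`/`set`/`get`/`int_add` sequence collapses to a single `int_add` on the stored value, mirroring the example discussed next.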
Looking at the example trace of Figure~\ref{fig:unopt-trace}, the operations
in lines 1--9 are manipulating objects which existed before the trace and that
-are passed in as arguments: therefore the optimizer just puts them into the
+are passed in as arguments: therefore the optimizer just copies them into the
output trace.
The following operations (lines 10--17) are more interesting:
@@ -608,11 +611,11 @@
stored in the fields of the allocated object come from.
In the snippet above, the two \lstinline{new} operations are removed and two
-static objects are constructed. The \lstinline{set} operations manipulate a
-static object, therefore they can be removed as well; their effect is
+static objects are constructed. The \lstinline{set} operations manipulate
+static objects, therefore they can be removed as well; their effect is
remembered in the static objects.
-The static object associated with $p_{5}$ would store the knowledge that it is a
+After the operations, the static object associated with $p_{5}$ would store the knowledge that it is a
\lstinline{BoxedInteger} whose \lstinline{intval} field contains $i_{4}$; the
one associated with $p_{6}$ would store that it is a \lstinline{BoxedInteger}
whose \lstinline{intval} field contains the constant -100.
@@ -628,8 +631,8 @@
$i_{9}$ = int_add($i_{7}$, $i_{8}$)
\end{lstlisting}
-The \lstinline{guard_class} operations can be removed, since their argument is a
-static object with the matching type \lstinline{BoxedInteger}. The
+The \lstinline{guard_class} operations can be removed, since their arguments are
+static objects with the matching type \lstinline{BoxedInteger}. The
\lstinline{get} operations can be removed as well, because each of them reads a
field out of a static object. The results of the get operation are replaced with
what the static object stores in these fields: all the occurences of $i_{7}$ and $i_{8}$ in the trace are just
@@ -642,11 +645,11 @@
The rest of the trace from Figure~\ref{fig:unopt-trace} is optimized in a
similar vein. The operations in lines 27--35 produce two more static objects and
-are removed. Those in line 36--39 are just put into the output trace because they
-are removed. Those in line 36--39 are just put into the output trace because they
+are removed. Those in lines 36--39 are just copied into the output trace because they
manipulate objects that are allocated before the trace. Lines 40--42 are removed
-because they operate on a static object. Line 43 is put into the output trace.
+because they operate on a static object. Line 43 is copied into the output trace.
Lines 44--46 produce a new static object and are removed, lines 48--51 manipulate
-that static object and are removed as well. Lines 52--54 are put into the output
+that static object and are removed as well. Lines 52--54 are copied into the output
trace.
The last operation (line 55) is an interesting case. It is the \lstinline{jump}
@@ -759,16 +762,16 @@
In this section we want to give a formal description of the semantics of the
traces and of the optimizer and liken the optimization to partial evaluation.
-We focus on the operations for manipulating dynamically allocated objects,
+We focus on the operations for manipulating heap allocated objects,
as those are the only ones that are actually optimized. We also consider only
-objects with two fields in this section, generalizing to arbitrary many fields
+objects with two fields $L$ and $R$ in this section; generalizing to arbitrarily many fields
is straightforward.
Traces are lists of operations. The operations considered here are
\lstinline{new}, \lstinline{get}, \lstinline{set} and \lstinline{guard_class}.
The values of all
variables are locations (\ie pointers). Locations are mapped to objects, which
-are represented by triples of a type $T$, and two locations that represent the
+are represented by triples $(T,l_1,l_2)$ of a type $T$, and two locations that represent the
fields of the object. When a new object is created, the fields are initialized
to null, but we require that they are initialized to a real
location before being read, otherwise the trace is malformed (this condition is
@@ -776,7 +779,7 @@
We use some abbreviations when dealing with object triples. To read the type of
an object, $\mathrm{type}((T,l_1,l_2))=T$ is used. Reading a field $F$ from an
-object is written $(T,l_1,l_2)_F$ which either returns $l_1$ if $F=L$ or $l_2$
+object is written $(T,l_1,l_2)_F$ which either is $l_1$ if $F=L$ or $l_2$
if $F=R$. To set field $F$ to a new location $l$, we use the notation
$(T,l_1,l_2)!_Fl$, which yields a new triple $(T,l,l_2)$ if $F=L$ or a new
triple $(T,l_1,l)$ if $F=R$.
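[Editor's note: the triple operations just defined can be transcribed directly into Python. A minimal sketch; the tuple encoding and the helper names are assumptions of this illustration, not part of the paper.]

```python
L, R = "L", "R"   # the two field names used throughout the formalism

def obj_type(obj):
    # type((T, l1, l2)) = T
    return obj[0]

def read_field(obj, F):
    # (T, l1, l2)_F is l1 if F = L, l2 if F = R
    _T, l1, l2 = obj
    return l1 if F == L else l2

def write_field(obj, F, l):
    # (T, l1, l2)!_F l yields a fresh triple; the original is unchanged,
    # matching the functional flavour of the notation
    T, l1, l2 = obj
    return (T, l, l2) if F == L else (T, l1, l)
```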
@@ -827,8 +830,8 @@
\emph{guard} & ${\displaystyle \frac{E(v)\in\mathrm{dom}(S),\,\mathrm{type}(S(E(v)))=T}{\mathtt{guard\_class}(v,T),E,S\overset{\mathrm{opt}}{\Longrightarrow}\left\langle \,\right\rangle ,E,S}}$\tabularnewline[3em]
& ${\displaystyle \frac{E(v)\notin\mathrm{dom}(S)\vee\mathrm{type}(S(E(v)))\neq T,\,\left(E(v),S\right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops},S^{\prime}\right)}{\mathtt{guard\_class}(v,T),E,S\overset{\mathrm{opt}}{\Longrightarrow}\mathrm{ops}::\left\langle \mathtt{guard\_class}(E\left(v\right),T)\right\rangle ,E,S^{\prime}}}$\tabularnewline[3em]
\emph{lifting} & ${\displaystyle \frac{v^{*}\notin\mathrm{dom}(S)}{v^{*},S\overset{\mathrm{lift}}{\Longrightarrow}\left\langle \,\right\rangle ,S}}$\tabularnewline[3em]
- & ${\displaystyle \frac{v^{*}\in\mathrm{dom}(S),\,\left(v^{*},S\right)\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\left(\mathrm{ops},S^{\prime}\right)}{v^{*},S\overset{\mathrm{lift}}{\Longrightarrow}\left\langle v^{*}=\mathtt{new}\left(T\right)\right\rangle ::ops,S^{\prime}}}$\tabularnewline[3em]
- & ${\displaystyle \frac{\left(S\left(v^{*}\right)_{L},S\setminus\left\{ v^{*}\mapsto S\left(v^{*}\right)\right\} \right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{L},S^{\prime}\right),\,\left(S\left(v^{*}\right)_{R},S^{\prime}\right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{R},S^{\prime\prime}\right)}{v^{*},S\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\mathrm{ops}_{L}::ops_{R}::\left\langle \mathtt{set}\left(v^{*},L,S\left(v^{*}\right)_{L}\right),\,\mathtt{set}\left(v^{*},R,S\left(v^{*}\right)_{R}\right)\right\rangle ,S^{\prime}}}$\tabularnewline[3em]
+ & ${\displaystyle \frac{v^{*}\in\mathrm{dom}(S),\,\left(v^{*},S\right)\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\left(\mathrm{ops},S^{\prime}\right)}{v^{*},S\overset{\mathrm{lift}}{\Longrightarrow}\left\langle v^{*}=\mathtt{new}\left(\mathrm{type}\left(S\left(v^{*}\right)\right)\right)\right\rangle ::ops,S^{\prime}}}$\tabularnewline[3em]
+ & ${\displaystyle \frac{\left(S\left(v^{*}\right)_{L},S\setminus\left\{ v^{*}\mapsto S\left(v^{*}\right)\right\} \right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{L},S^{\prime}\right),\,\left(S\left(v^{*}\right)_{R},S^{\prime}\right)\overset{\mathrm{lift}}{\Longrightarrow}\left(\mathrm{ops}_{R},S^{\prime\prime}\right)}{v^{*},S\overset{\mathrm{liftfields}}{=\!=\!\Longrightarrow}\mathrm{ops}_{L}::ops_{R}::\left\langle \mathtt{set}\left(v^{*},L,S\left(v^{*}\right)_{L}\right),\,\mathtt{set}\left(v^{*},R,S\left(v^{*}\right)_{R}\right)\right\rangle ,S^{\prime\prime}}}$\tabularnewline[3em]
\end{tabular}
\end{center}
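[Editor's note: the corrected lifting rules can be read operationally roughly as follows. A hedged Python sketch; the dict encoding of the static heap $S$ and the op tuples are invented for illustration. Note how $v^{*}$ is removed from $S$ before its fields are lifted (which is what keeps cyclic structures from looping), and how the second premise threads $S^{\prime}$ into $S^{\prime\prime}$ — the two bugs this commit fixes.]

```python
def lift(v, S):
    # S maps static variables to {"type": T, "L": loc, "R": loc}
    if v not in S:
        return []                      # first rule: nothing to lift
    obj = S.pop(v)                     # S \ {v* -> S(v*)}: breaks cycles
    ops = [("new", v, obj["type"])]    # v* = new(type(S(v*)))
    for f in ("L", "R"):               # liftfields: lift both field values
        ops += lift(obj[f], S)         # first premise updates S in place
    for f in ("L", "R"):               # ... then re-emit the delayed sets
        ops.append(("set", v, f, obj[f]))
    return ops
```

Lifting a single static `BoxedInteger` whose fields are non-static locations thus re-emits one `new` followed by two `set` operations.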
@@ -958,7 +961,7 @@
\subsection{Analysis of the Algorithm}
\label{sub:analysis}
-While we do not offer a formal proof of it, it should be relatively clear
+While we do not offer a formal proof of it, one can argue informally
that the algorithm presented above is sound: it works by delaying (and
often completely removing) some operations. The algorithm runs in a
single pass over the list of operations. We can check that although
@@ -1014,7 +1017,7 @@
To evaluate our allocation removal algorithm, we look at the effectiveness when
used in the generated tracing JIT of PyPy's Python interpreter. This interpreter
is a full implementation of Python 2.5 language semantics and is about 30'000
-lines of code.
+lines of RPython code.
The
benchmarks we used are small-to-medium Python programs, some synthetic
@@ -1058,7 +1061,7 @@
CPython 2.6.6\footnote{\texttt{http://python.org}}, which uses a bytecode-based
interpreter. Furthermore we compared against
Psyco\cite{rigo_representation-based_2004} 1.6,
-an extension to CPython which is a
+a (hand-written) extension module to CPython which is a
just-in-time compiler that produces machine code at run-time. It is not based
on traces. Finally, we used two versions of PyPy's Python interpreter (revision
77823 of SVN trunk\footnote{\texttt{http://codespeak.net/svn/pypy/trunk}}): one
@@ -1178,17 +1181,17 @@
loop to only allocate it once, instead of every iteration. No details are given
for this optimization. The fact that the object is still allocated and needs to
be written to means that only the allocations are optimized away, but not the
-reads and writes out of/into the object.
+reads out of and writes into the object.
SPUR, a tracing JIT for C\# seems to be able to remove allocations in a similar
way to the approach described here, as hinted at in the technical report
-\cite{michael_bebenita_spur:_2010}. However, no details for the approach and its
+\cite{bebenita_spur:_2010}. However, no details for the approach and its
implementation are given.
Psyco \cite{rigo_representation-based_2004} is a (non-tracing) JIT for Python
that implements a more ad-hoc version of the allocation removal described here.
Our static objects could be related to what are called \emph{virtual} objects
-in Psyco. It is a hand-written extension module for CPython. Historically,
+in Psyco. Historically,
PyPy's JIT can be seen as some successor of Psyco for a general context (one of
the authors of this paper is the author of Psyco).