[pypy-commit] extradoc extradoc: more text in the performance section

Raemi noreply at buildbot.pypy.org
Thu May 29 11:00:34 CEST 2014


Author: Remi Meier <remi.meier at inf.ethz.ch>
Branch: extradoc
Changeset: r5280:b1b187b808d6
Date: 2014-05-29 11:00 +0200
http://bitbucket.org/pypy/extradoc/changeset/b1b187b808d6/

Log:	more text in the performance section

diff --git a/talk/dls2014/paper/paper.tex b/talk/dls2014/paper/paper.tex
--- a/talk/dls2014/paper/paper.tex
+++ b/talk/dls2014/paper/paper.tex
@@ -1091,6 +1091,30 @@
 
 \subsection{Performance Benchmarks\label{sec:performance-bench}}
 
+In this section we look at the general performance of our system. As
+explained above, the JIT does not simply speed up execution by a
+constant factor. To better understand the behaviour of our system, we
+therefore split this section into two parts: first we look at how it
+behaves without the JIT, then with the JIT enabled. We use six
+benchmarks:
+
+\begin{itemize}
+\item \emph{btree} and \emph{skiplist}, which both insert, remove,
+  and find elements in a shared data structure
+\item \emph{threadworms}, which simulates worms walking on a grid in
+  parallel and checking for collisions with each other
+\item \emph{mandelbrot}, \emph{raytrace}, and \emph{richards}, which
+  all perform simple, independent computations in parallel
+\end{itemize}
+
+We use coarse-grained locking for the first three benchmarks and no
+locking at all for the last three. This distinction matters because
+Jython, which uses fine-grained locking instead of a GIL, is only
+expected to scale with the number of threads for the latter group;
+it cannot scale under coarse-grained locking. STM, however, replaces
+the coarse-grained locks with atomic blocks, so it may still be able
+to scale, since atomic blocks are implemented as simple transactions
+(see the sketch below).
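+
+The following is a minimal sketch of the two synchronisation
+variants, not the actual benchmark code; the \texttt{atomic} context
+manager is assumed to be the one exposed by pypy-stm via
+\texttt{\_\_pypy\_\_.thread}, and a btree insertion stands in for the
+benchmark work:
+
+\begin{verbatim}
+import threading
+from __pypy__.thread import atomic   # pypy-stm only
+
+lock = threading.Lock()
+
+def worker_locked(tree, items):
+    # coarse-grained locking: a single lock
+    # serialises all accesses to the tree
+    for item in items:
+        with lock:
+            tree.insert(item)
+
+def worker_atomic(tree, items):
+    # atomic block: runs as one transaction per
+    # insertion and may still execute in parallel
+    for item in items:
+        with atomic:
+            tree.insert(item)
+\end{verbatim}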
+
 % To isolate factors we look at performance w/o JIT and perf w JIT.
 % w/o JIT:
 %  - it scales
@@ -1101,13 +1125,27 @@
 %  - changed working set size because generally ~50x faster
 %  - competing w our own JIT on non-stm a challenge
 %  - gil scales negatively
-\remi{For performance we first look at no-JIT behaviour of STM. Since
-we cannot compete well even with CPython, we later show JIT benchmarks
-where we see the unstable performance but also that we can still scale.
-(with more work we can use our STM system to parallelise jitted code
-too)} See figure \ref{fig:performance-nojit}
 
-% TODO: pypy-nostm, Jython?
+\paragraph{Non-JIT benchmarks:} First we run our benchmarks on four
+interpreters: Jython (fine-grained locking), CPython (GIL), and PyPy
+with STM as well as with the GIL (both without the JIT). The results
+are shown in Figure~\ref{fig:performance-nojit}.
+
+As expected, none of the interpreters with a GIL scales with the
+number of threads; they even become slower because of the overhead
+of thread switching and GIL handling. We also see that Jython scales
+where we expect it to (mandelbrot, raytrace, richards) and behaves
+similarly to the GIL interpreters in the other cases.
+
+PyPy using our STM system (pypy-stm-nojit) scales to some degree in
+all benchmarks. The average overhead of switching from the GIL to
+STM is \remi{$35.5\%$}; the maximum, reached in richards, is
+\remi{$63\%$} (measured as the ratio of single-threaded runtimes;
+for richards, $10.7 / 6.55 \approx 1.63$). pypy-stm-nojit already
+beats pypy-nojit with two threads; however, it never beats even
+CPython, the reference implementation of Python. This means that
+without the JIT, our performance is not competitive. We now look at
+how well our system performs with the JIT enabled.
+
 \begin{figure}[h]
   \centering
   \includegraphics[width=1\columnwidth]{plots/performance_nojit.pdf}
@@ -1115,11 +1153,20 @@
 \end{figure}
 
 
-% TODO: Jython, compare to cpython? or just jython as common baseline with no-jit?
-\remi{Some benchmarks (figure \ref{fig:performance-jit} with enabled
-JIT show that we can be competitive with the other solutions. It also
-shows that more work is needed in that area to make performance more
-stable.}
+\paragraph{JIT benchmarks:} Enabling the JIT speeds up these
+benchmarks by a factor of $10$--$50\times$. We therefore leave out
+Jython and CPython here, since they would be far off the top of the
+plots. To obtain reasonable execution times and more stable results,
+we also increased the input size of all benchmarks.
+
+The results are shown in Figure~\ref{fig:performance-jit}. We see
+that the performance is much less stable, and more work is certainly
+required in this area. In general, the group of benchmarks without
+locking scales best, while the other three scale barely or not at
+all with the number of threads. The slowdown factor from GIL to STM
+is \remi{$1$--$2.4\times$}, and we beat GIL performance in half of
+the benchmarks.
+
 
 \begin{figure}[h]
   \centering
@@ -1127,6 +1174,14 @@
   \caption{Comparing runtime between interpreters with JIT\label{fig:performance-jit}}
 \end{figure}
 
+
+Overall, PyPy needs the JIT for its performance to be competitive.
+It would be interesting to see how our STM system performs in
+CPython, but porting it would be a lot of work. On its own, our STM
+system scales well, so we hope to see the same scaling with the JIT
+in the future.
+
+
 \section{Related Work}
 
 
diff --git a/talk/dls2014/paper/plots/performance.pdf b/talk/dls2014/paper/plots/performance.pdf
index 77bf2dcfa24532e9526c848ceba949109221dd2a..91fb1a124c3de207c8b7a0781e2869c04d295d85
GIT binary patch

[cut]

diff --git a/talk/dls2014/paper/plots/performance_nojit.pdf b/talk/dls2014/paper/plots/performance_nojit.pdf
index 3f0d4de4954fcf4899ec81cc72cfac591c344941..d70024b9d2b9fa5ecc474a6fdf12f8aec8a29d36
GIT binary patch

[cut]

diff --git a/talk/dls2014/paper/plots/plot_performance.py b/talk/dls2014/paper/plots/plot_performance.py
--- a/talk/dls2014/paper/plots/plot_performance.py
+++ b/talk/dls2014/paper/plots/plot_performance.py
@@ -32,7 +32,7 @@
 
 
 interps_styles = {
-    "pypy-stm-jit": {'fmt':'r-'},
+    "pypy-stm-jit": {'fmt':'r-', 'linewidth':2},
     "pypy-jit": {'fmt':'b', 'dashes':(1,1)},
     "jython": {'fmt':'m', 'dashes':(2, 5)},
     "best": {'fmt':"k:"}        # only fmt allowed
@@ -161,8 +161,8 @@
             if interp not in legend:
                 legend[interp] = artist
 
-        legend["best"], = ax.plot(ts, [best_y] * len(ts),
-                                  interps_styles["best"]['fmt'])
+        # legend["best"], = ax.plot(ts, [best_y] * len(ts),
+        #                           interps_styles["best"]['fmt'])
 
         if i // w == h-1:
             ax.set_xlim(0, 5)
@@ -174,7 +174,7 @@
 
     return axs[w*(h-1)].legend(tuple(legend.values()), tuple(legend.keys()),
                                ncol=4,
-                               loc=(0,-0.4))
+                               loc=(-0.15,-0.5))
 
 
 def main():
diff --git a/talk/dls2014/paper/plots/plot_performance_nojit.py b/talk/dls2014/paper/plots/plot_performance_nojit.py
--- a/talk/dls2014/paper/plots/plot_performance_nojit.py
+++ b/talk/dls2014/paper/plots/plot_performance_nojit.py
@@ -30,8 +30,9 @@
 
 
 interps_styles = {
-    "pypy-stm-nojit": {'fmt':'r-'},
+    "pypy-stm-nojit": {'fmt':'r-', 'linewidth':2},
     "cpython": {'fmt':'b', 'dashes':(1,1)},
+    "pypy-nojit": {'fmt':'g', 'dashes':(5, 2)},
     "jython": {'fmt':'m', 'dashes':(2, 5)},
     "best": {'fmt':"k:"}        # only fmt allowed
 }
@@ -53,10 +54,16 @@
             [2.84]
         ],
         "jython":[
-            [2.74,2.75],
-            [2.9,3.1,3.0],
-            [2.89,3.01,2.95],
-            [3.0,2.99,2.97]
+            [2.95,2.95,2.96],
+            [1.65,1.68,1.54],
+            [1.2,1.15,1.3,1.3],
+            [1.09,0.9,0.97,0.99,1.03]
+        ],
+        "pypy-nojit":[
+            [5.5,5.7,5.8],
+            [7,6.97],
+            [6.68,6.77],
+            [6.4,6.4]
         ]},
 
     "btree":{
@@ -77,6 +84,12 @@
             [2.60,2.46,2.6],
             [2.56,2.6,2.51],
             [2.57,2.52,2.48]
+        ],
+        "pypy-nojit":[
+            [6.63,6.73],
+            [10.6,10.5],
+            [11.4,11.4],
+            [12.0,12.3]
         ]},
 
     "skiplist":{
@@ -97,6 +110,12 @@
             [1.8,1.77,1.81],
             [1.81,1.79,1.88],
             [1.99,1.92,1.74,1.84]
+        ],
+        "pypy-nojit":[
+            [4.9,4.8,4.6,4.7],
+            [6.87,7.53,6.64],
+            [7.74,7.3,7.35],
+            [7.38,7.28,7.31,7.54]
         ]},
 
     "threadworms":{
@@ -117,6 +136,12 @@
             [3.0,2.87,3.3,3.1],
             [3.35,3.22,3.19],
             [3.19,3.37,3.26,3.36]
+        ],
+        "pypy-nojit":[
+            [4.49,4.36],
+            [7.86,7.81],
+            [8.76,8.73],
+            [9.23,9.27]
         ]},
 
     "mandelbrot":{
@@ -137,11 +162,17 @@
             [2.84,3,2.8,2.96],
             [2.13,2.03,2.04,2.11],
             [1.8,1.74,1.8,1.88]
+        ],
+        "pypy-nojit":[
+            [3.67,3.54],
+            [4.53,4.82,4.75],
+            [4.14,4.23],
+            [4.38,4.23]
         ]},
 
     "richards":{
         "pypy-stm-nojit":[
-            [11.2],
+            [10.7],
             [6.1],
             [5.4,4.9],
             [4.8,4.9,5]
@@ -157,9 +188,23 @@
             [2.32,1.95,2.18],
             [1.86,1.66],
             [1.49,1.63,1.59]
+        ],
+        "pypy-nojit":[
+            [6.6,6.5],
+            [7.98,7.98],
+            [7.56,7.33],
+            [7.05,7.28]
         ]}
 }
 
+import numpy as np
+
+# Slowdown of pypy-stm-nojit relative to pypy-nojit, computed per
+# benchmark from the first entry of each series (the single-thread
+# runs); a ratio of e.g. 1.63 corresponds to a 63% overhead.
+sls = []
+for bench_name, interps in benchs.items():
+    slowdown = np.mean(interps["pypy-stm-nojit"][0]) / np.mean(interps["pypy-nojit"][0])
+    print "overhead", bench_name, ":", slowdown
+    sls.append(slowdown)
+print "avg,max slowdown of STM", np.mean(sls), np.max(sls)
+
 
 
 

