[pypy-commit] extradoc extradoc: calculate the percentage of guards responsible for 50% of the guard failures for the different benchmarks

Thu Aug 16 14:35:32 CEST 2012

Author: David Schneider <david.schneider at picle.org>
Branch: extradoc
Changeset: r4611:70ba43f0646e
Date: 2012-08-16 14:34 +0200
http://bitbucket.org/pypy/extradoc/changeset/70ba43f0646e/

Log:	calculate the percentage of guards responsible for 50% of the guard
	failures for the different benchmarks
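
The calculation named in the log can be sketched on its own as follows. This is
a minimal illustration, not the committed code: the function name
fraction_for_half_of_failures and its two parameters are hypothetical, and it
assumes the input is a list of per-guard failure counts plus the total number
of guards for one benchmark.

```python
def fraction_for_half_of_failures(failure_counts, total_guards):
    """Return the smallest fraction of all guards whose combined failures
    exceed half of the total failures, or None if nothing failed.

    Hypothetical helper: both parameter names are assumptions made for
    this sketch.
    """
    # Visit the most frequently failing guards first.
    counts = sorted(failure_counts, reverse=True)
    total_failures = sum(counts)
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        if running > total_failures * 0.5:
            # float() keeps the division exact under Python 2 as well.
            return rank / float(total_guards)
    return None
```

For example, with failure counts [1, 100, 3, 2] among 1000 guards, the single
guard with 100 failures already exceeds half of the 106 total failures, so the
result is 1/1000.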

diff --git a/talk/vmil2012/paper.tex b/talk/vmil2012/paper.tex
--- a/talk/vmil2012/paper.tex
+++ b/talk/vmil2012/paper.tex
@@ -145,8 +145,8 @@
 frequency and the overhead related to guards.
 
 It is important to handle guards well, because they are very common operations
-in the traces produced by tracing JITs. As can be seen in
-Figure~\ref{fig:guard_percent} guards account for about 14\% to 22\% of the
+in the traces produced by tracing JITs. As we will see later (Figure~\ref{fig:benchmarks})
+guards account for about 14\% to 22\% of the
 operations before and for about 15\% to 20\% of the operations after optimizing
 the traces generated for the different benchmarks used in this paper. An
 additional property is that guard failure rates are very uneven. The majority
@@ -180,12 +180,6 @@
   and low-level components of the JIT and describe the rationale behind the design
 \end{itemize}
 
-\begin{figure}
-    \include{figures/guard_table}
-    \caption{Percentage of guards before and after optimization for different benchmarks}
-    \label{fig:guard_percent}
-\end{figure}
-
 The set of central concepts upon which this work is based are described in
 Section~\ref{sec:Background}, such as the PyPy project, the RPython language
 and its meta-tracing JIT. Based on these concepts in Section~\ref{sec:Resume
@@ -764,7 +758,7 @@
     \include{figures/backend_table}
     \caption{Total size of generated machine code and resume data}
     \label{fig:backend_data}
-\end{figure}e.
+\end{figure}
 
 Why the efficient storing of the resume data is a central concern in the design
 of guards is illustrated by Figure~\ref{fig:resume_data_sizes}. This figure shows
@@ -800,7 +794,7 @@
 
 \begin{figure}
     \include{figures/failing_guards_table}
-    \caption{Failing guards relative to the total number of guards}
+    \caption{Failing guards, guards with more than 200 failures and guards responsible for 50\% of the failures relative to the total number of guards}
     \label{fig:failing_guards}
 \end{figure}
 
@@ -809,7 +803,10 @@
 2.4\% and 5.7\% of all guards. As can be expected, even fewer guards fail often
 enough that a bridge is compiled for them, only 1.2\% to 3.6\% of all guards
 fail often enough that a bridge is compiled. Also, of all failing guards a few fail extremely often
-and most fail rarely. The results emphasize that as most of the guards never
+and most fail rarely. Reinforcing this notion, the figure shows that, depending on the
+benchmark, between 0.008\% and 0.225\% of the guards are responsible for 50\%
+of the total guard failures.
+These results emphasize that as most of the guards never
 fail it is important to make sure that the successful execution of a guard does
 not have unnecessary overhead.
 
diff --git a/talk/vmil2012/tool/build_tables.py b/talk/vmil2012/tool/build_tables.py
--- a/talk/vmil2012/tool/build_tables.py
+++ b/talk/vmil2012/tool/build_tables.py
@@ -26,22 +26,40 @@
 
     table = []
     head = ['Benchmark',
-            'Failing guards',
-            'Over %d failures' % BRIDGE_THRESHOLD]
+            'Failing',
+            '> %d failures' % BRIDGE_THRESHOLD,
+            '50\% of failures']
 
     for bench, info in failures.iteritems():
-        total = failures[bench]['nguards']
+        total = info['nguards']
         total_failures = len(info['results'])
         bridges = len([k for k,v in info['results'].iteritems() \
                                             if v > BRIDGE_THRESHOLD])
         res = [bench.replace('_', '\\_'),
                 "%.1f\\%%" % (100 * total_failures/total),
                 "%.1f\\%%" % (100 * bridges/total),
+                "%.3f\\%%"  % (100 * we_are_50_percent(info)),
         ]
         table.append(res)
     output = render_table(template, head, sorted(table))
     write_table(output, texfile)
 
+def we_are_50_percent(info):
+    total_guards = info['nguards']
+    failure_counts = info['results'].values()
+    print failure_counts
+    failure_counts.sort()
+    print failure_counts
+    failure_counts.reverse()
+    print failure_counts
+
+    total_failures = sum(failure_counts)
+    current_sum = 0
+    for i, f in enumerate(failure_counts):
+        current_sum += f
+        if current_sum > total_failures * 0.50:
+            return (i + 1)/total_guards
+    return -1
 
 def build_resume_data_table(csvfiles, texfile, template):
     assert len(csvfiles) == 1