[pypy-commit] extradoc extradoc: draft

fijal noreply at buildbot.pypy.org
Wed Oct 29 15:08:37 CET 2014


Author: Maciej Fijalkowski <fijall at gmail.com>
Branch: extradoc
Changeset: r5449:e015701f8bee
Date: 2014-10-29 15:07 +0100
http://bitbucket.org/pypy/extradoc/changeset/e015701f8bee/

Log:	draft

diff --git a/blog/draft/io-improvements.rst b/blog/draft/io-improvements.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/io-improvements.rst
@@ -0,0 +1,45 @@
+
+Hello everyone!
+
+We're about to wrap up the Warsaw sprint, so I would like to describe some
+branches that we merged before or during the sprint. This blog post covers
+two branches: one with GC improvements and the other with IO improvements.
+
+The first one was a branch started by Wenzhu Man during the Summer of Code
+and finished by Maciej Fijalkowski and Armin Rigo, about not zeroing the
+nursery. The PyPy GC allocates new objects in the young object area (the
+nursery) using bump pointer allocation. To simplify things we used to zero
+the whole nursery beforehand, since GC references must never point to random
+memory. This both hurts the cache, because a large region of memory is zeroed
+at once, and does unnecessary work for objects that don't require zeroing,
+like large strings. We partially mitigated the first problem with incremental
+nursery zeroing, but this branch removes the zeroing completely, thus
+improving string handling and recursive code (since jitframes don't require
+zeroed memory either). I measured the effect on three examples: one
+`doing IO`_ in a loop, a second one running the famous `fibonacci`_
+recursively (which I would argue is a good fit this one time), and the last
+one running `gcbench`_. The results for fibonacci and gcbench are below
+(normalized to CPython 2.7). Benchmarks were run 50 times each:
+
+XXXX
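
The difference between up-front and per-allocation zeroing can be sketched with a toy bump-pointer allocator. This is only an illustration in plain Python (the ``Nursery`` class and its fields are made up for this post, not PyPy's actual GC code):

```python
# Toy model of bump-pointer allocation over a bytearray "nursery".
# Illustration only -- not PyPy's real GC, which works on raw memory.

class Nursery:
    def __init__(self, size, zero_upfront=True):
        # zero_upfront=True models the old behaviour: pay for clearing
        # the whole nursery at once. With False, each allocation clears
        # only the bytes it actually hands out (and e.g. a fresh string
        # buffer could skip even that).
        self.memory = bytearray(size)  # bytearray is zeroed anyway;
        self.top = 0                   # real memory would hold garbage
        self.size = size
        self.zero_upfront = zero_upfront

    def allocate(self, nbytes):
        if self.top + nbytes > self.size:
            raise MemoryError("nursery full: minor collection needed")
        start = self.top
        self.top += nbytes  # "bump" the pointer -- allocation is this cheap
        if not self.zero_upfront:
            # zero only what a GC-traced object really needs
            self.memory[start:start + nbytes] = b"\x00" * nbytes
        return start

n = Nursery(1024, zero_upfront=False)
a = n.allocate(16)
b = n.allocate(32)
```

Allocation is just a bounds check and a pointer bump; the cost of zeroing is what the branch removes or defers.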
+
+The second branch was done by Gregor Wegberg for his master thesis and finished
+by Maciej Fijalkowski and Armin Rigo. Since objects in PyPy can move in memory,
+we cannot pass a pointer into a string directly to a system call; PyPy 2.4
+solves the problem by copying the buffer before calling read or write. This is
+obviously inefficient. The branch "pins" the objects for a short period of
+time, by making sure they can't move. This introduces some complexity in the
+garbage collector, where the bump pointer allocator needs to "jump over"
+pinned objects, but it improves IO quite drastically. In this benchmark we
+either write a number of bytes from a freshly allocated string into /dev/null
+or read a number of bytes from /dev/full. I'm showing the results for PyPy
+2.4, PyPy with non-zero-nursery, and PyPy with non-zero-nursery plus object
+pinning. These are wall times for cases using ``os.read/os.write`` and
+``file.read/file.write``, normalized against CPython 2.7.
+
+Benchmarks were done using PyPy 2.4 and revisions ``85646d1d07fb`` for
+non-zero-nursery and ``3d8fe96dc4d9`` for non-zero-nursery and pinning.
+The benchmarks were run once, since the standard deviation was small.
+
+XXXX
+
+XXX summary
diff --git a/talk/pyconpl-2014/benchmarks/fib.py b/talk/pyconpl-2014/benchmarks/fib.py
--- a/talk/pyconpl-2014/benchmarks/fib.py
+++ b/talk/pyconpl-2014/benchmarks/fib.py
@@ -1,7 +1,11 @@
 
 import time
 import numpy
-from matplotlib import pylab
+try:
+    from matplotlib import pylab
+except ImportError:
+    from embed.emb import import_mod
+    pylab = import_mod('matplotlib.pylab')
 
 def fib(n):
     if n == 0 or n == 1:
@@ -21,7 +25,7 @@
 
 hist, bins = numpy.histogram(times, 20)
 #pylab.plot(bins[:-1], hist)
-pylab.ylim(ymin=0, ymax=max(times) * 1.2)
-pylab.plot(times)
+pylab.ylim(0, max(times) * 1.2)
+pylab.plot(numpy.array(times))
 #pylab.hist(hist, bins, histtype='bar')
 pylab.show()
diff --git a/talk/pyconpl-2014/benchmarks/talk.rst b/talk/pyconpl-2014/benchmarks/talk.rst
--- a/talk/pyconpl-2014/benchmarks/talk.rst
+++ b/talk/pyconpl-2014/benchmarks/talk.rst
@@ -1,3 +1,5 @@
+.. include:: ../beamerdefs.txt
+
 ---------------------
 How to benchmark code
 ---------------------
@@ -5,7 +7,11 @@
 Who are we?
 ------------
 
-xxx
+* Maciej Fijalkowski, Armin Rigo
+
+* working on PyPy
+
+* interested in performance
 
 What is this talk about?
 ---------------------------
@@ -76,3 +82,82 @@
 |pause|
 
 * not ideal at all
+
+Writing benchmarks - typical approach
+-------------------------------------
+
+* write a set of small programs that exercise one particular thing
+
+  * recursive fibonacci
+
+  * pybench
+
+PyBench
+-------
+
+* used to be a tool to compare Python implementations
+
+* only uses microbenchmarks
+
+* assumes operation times are additive
+
+Problems
+--------
+
+* a lot of effects are not additive
+
+* optimizations often collapse consecutive operations
+
+* large scale effects only show up on large programs
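
As a tiny illustration of consecutive operations collapsing, CPython's own peephole optimizer folds constant expressions at compile time, so a microbenchmark of such an expression never measures the operation at all:

```python
# CPython folds "2 + 3" at compile time: timing this function would
# measure a constant load, not an addition.
def folded():
    return 2 + 3

# the constant 5 is already baked into the compiled code object
has_folded_const = 5 in folded.__code__.co_consts
```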
+
+An example
+----------
+
+* Python 2.6 vs Python 2.7 had minimal performance changes
+
+* somewhere in the changelog, a GC change is mentioned
+
+* it made the PyPy translation toolchain go from 3h to 1h
+
+* it's "impossible" to write a microbenchmark for this
+
+More problems
+-------------
+
+* half of the blog posts comparing VM performance use recursive fibonacci
+
+* most of the others use the Computer Language Shootout
+
+PyPy benchmark suite
+--------------------
+
+* programs ranging from small through medium to large
+
+* 50 LOC to 100k LOC
+
+* try to exercise various parts of the language (but lack e.g. IO)
+
+Solutions
+---------
+
+* measure what you are really interested in
+
+* derive microbenchmarks from your bottlenecks
+
+* be skeptical
+
+* understand what you're measuring
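
A minimal version of such a measurement might look like this (a sketch; ``workload`` is a placeholder for whatever you are really interested in):

```python
# Run the workload repeatedly and report the minimum and maximum,
# rather than trusting a single measurement.
import time

def measure(fn, repeats=50):
    times = []
    for _ in range(repeats):
        t0 = time.time()
        fn()
        times.append(time.time() - t0)
    return min(times), max(times)

def workload():
    # placeholder: substitute the code whose performance matters to you
    sum(range(10000))

lo, hi = measure(workload)
```

If the spread between ``lo`` and ``hi`` is large, be skeptical of any single number you report.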
+
+Q&A
+---
+
+- http://pypy.org/
+
+- http://morepypy.blogspot.com/
+
+- http://baroquesoftware.com/
+
+- ``#pypy`` at freenode.net
+
+- Any questions?
+

