[pypy-commit] extradoc extradoc: merge

fijal noreply at buildbot.pypy.org
Mon Oct 14 17:57:34 CEST 2013

Author: Maciej Fijalkowski <fijall at gmail.com>
Branch: extradoc
Changeset: r5073:8ae7445f165e
Date: 2013-10-14 17:57 +0200

Log:	merge

diff --git a/blog/draft/stm-oct2013.rst b/blog/draft/stm-oct2013.rst
new file mode 100644
--- /dev/null
+++ b/blog/draft/stm-oct2013.rst
@@ -0,0 +1,78 @@
+Update on STM
+Hi all,
+the sprint in London was a lot of fun and very fruitful. In the last
+update on STM, Armin was working on improving and specializing the 
+automatic barrier placement.
+There is still a lot to do in that area, but that work was merged and
+lowered the overhead of STM over non-STM to around **XXX**. The same
+improvement still has to be made in the JIT.
+But that is not all. Right after the sprint, we were able to squeeze
+the last obvious bugs in the STM-JIT combination. However, the performance
+was nowhere near what we want. So far, we have fixed some of the most
+obvious issues. Many come from RPython erring on the side of caution
+and e.g. making a transaction inevitable even if that is not strictly
+necessary, thereby limiting parallelism.
+**XXX any interesting details? transaction breaks maybe? guard counters?**
+There are still many performance issues of various complexity left
+to tackle. So stay tuned or contribute :)
+Now, since the JIT is all about performance, we want to at least 
+show you some numbers that are indicative of things to come.
+Unfortunately, our set of STM benchmarks is very small
+(something you can help us out with), so these numbers are
+not representative of real-world performance. We tried to
+minimize the effect of JIT warm-up in the benchmark results.
+**Raytracer** from `stm-benchmarks <https://bitbucket.org/Raemi/stm-benchmarks/src>`_:
+Render times in seconds for a 1024x1024 image:
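++-------------+----------------------+-------------------+
+| Interpreter | Base time: 1 thread  | 8 threads         |
++=============+======================+===================+
+| PyPy-2.1    |    2.47              |     2.56          |
++-------------+----------------------+-------------------+
+| CPython     |    81.1              |     73.4          |
++-------------+----------------------+-------------------+
+| PyPy-STM    |    50.2              |     10.8          |
++-------------+----------------------+-------------------+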
+For comparison, disabling the JIT gives 148s on PyPy-2.1 and 87s on
+PyPy-STM (with 8 threads).
+**Richards** from `PyPy repository on the stmgc-c4
+branch <https://bitbucket.org/pypy/pypy/commits/branch/stmgc-c4>`_:
+Average time per iteration in milliseconds:
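++-------------+----------------------+-------------------+
+| Interpreter | Base time: 1 thread  | 8 threads         |
++=============+======================+===================+
+| PyPy-2.1    |   15.6               |  15.4             |
++-------------+----------------------+-------------------+
+| CPython     |   239                |  237              |
++-------------+----------------------+-------------------+
+| PyPy-STM    |   371                |  116              |
++-------------+----------------------+-------------------+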
+For comparison, disabling the JIT gives 492ms on PyPy-2.1 and 538ms on
+PyPy-STM.
+All this can be found in the `PyPy repository on the stmgc-c4
+branch <https://bitbucket.org/pypy/pypy/commits/branch/stmgc-c4>`_.
+Try it for yourself, but keep in mind that this is still experimental
+with a lot of things yet to come.
+You can also download a prebuilt binary from here: **XXX**
+In summary, what the numbers tell us is that PyPy-STM is, as expected,
+the only one of the three interpreters where multithreading gives a large
+improvement in speed.  What they also tell us is that, obviously, the
+result is not good enough *yet:* it still takes longer on an 8-threaded
+PyPy-STM than on a regular single-threaded PyPy-2.1.  As you should know
+by now, we are good at promising speed and delivering it years later.
+It has been two years already since PyPy-STM started, so we're in the
+fast-progressing step right now :-)
diff --git a/blog/draft/stm-sept2013.rst b/blog/draft/stm-sept2013.rst
deleted file mode 100644
--- a/blog/draft/stm-sept2013.rst
+++ /dev/null
@@ -1,52 +0,0 @@
-Update on STM
-Hi all,
-the sprint in London was a lot of fun and very fruitful. In the last
-update on STM, Armin was working on improving and specializing the 
-automatic barrier placement.
-There is still a lot to do in that area, but that work was merged and
-lowered the overhead of STM over non-STM to around **XXX**. The same
-improvement has still to be done in the JIT.
-But that is not all. Right after the sprint, we were able to squeeze
-the last obvious bugs in the STM-JIT combination. However, the performance
-was nowhere near what we want. So until now, we fixed some of the most
-obvious issues. Many come from RPython erring on the side of caution
-and e.g. making a transaction inevitable even if that is not strictly
-necessary, thereby limiting parallelism.
-**XXX any interesting details?**
-There are still many performance issues of various complexity left
-to tackle. So stay tuned or contribute :)
-Now, since the JIT is all about performance, we want to at least 
-show you some numbers that are indicative of things to come.
-Our set of STM benchmarks is very small unfortunately 
-(something you can help us out with), so this is 
-not representative of real-world performance.
-**Raytracer** from `stm-benchmarks <https://bitbucket.org/Raemi/stm-benchmarks/src>`_:
-Render times for a 1024x1024 image using 6 threads
-| Interpeter  | Time (no-JIT / JIT)  |
-| PyPy-2.1    | ... / ...            |
-| CPython     | ... / -              |
-| PyPy-STM    | ... / ...            |
-**XXX same for Richards**
-All this can be found in the `PyPy repository on the stmgc-c4
-branch <https://bitbucket.org/pypy/pypy/commits/branch/stmgc-c4>`_.
-Try it for yourself, but keep in mind that this is still experimental
-with a lot of things yet to come.
-You can also download a prebuilt binary frome here: **XXX**
diff --git a/planning/jit.txt b/planning/jit.txt
--- a/planning/jit.txt
+++ b/planning/jit.txt
@@ -45,9 +45,6 @@
   (SETINTERIORFIELD, GETINTERIORFIELD). This is needed for the previous item to
   fully work.
-- {}.update({}) is not fully unrolled and constant folded because HeapCache
-  loses track of values in virtual-to-virtual ARRAY_COPY calls.
 - ovfcheck(a << b) will do ``result >> b`` and check that the result is equal
   to ``a``, instead of looking at the x86 flags.
diff --git a/talk/pyconza2013/Makefile b/talk/pyconza2013/Makefile
--- a/talk/pyconza2013/Makefile
+++ b/talk/pyconza2013/Makefile
@@ -1,13 +1,13 @@
 view: talk.pdf
-	xpdf talk.pdf
+	evince talk.pdf
 talk.pdf: talk.tex
 	64bit pdflatex talk.tex
-talk.tex: talk1.tex fix.py
-	python fix.py < talk1.tex > talk.tex
+talk.tex: talk.rst
+	rst2beamer --stylesheet=stylesheet.latex --documentoptions=14pt --input-encoding=utf8 --output-encoding=utf8 --overlaybullets=false $< > talk.tex
-talk1.tex: talk.rst
-	rst2beamer $< > talk1.tex
+	rm -f talk.tex talk.pdf
diff --git a/talk/pyconza2013/stylesheet.latex b/talk/pyconza2013/stylesheet.latex
new file mode 100644
--- /dev/null
+++ b/talk/pyconza2013/stylesheet.latex
@@ -0,0 +1,10 @@
+\definecolor{darkgreen}{rgb}{0, 0.5, 0.0}
+\addtobeamertemplate{block begin}{}{\setlength{\parskip}{35pt plus 1pt minus 1pt}}
diff --git a/talk/pyconza2013/talk.pdf b/talk/pyconza2013/talk.pdf
index 6fed83a5c845e1d71cd4c32a98eb6a6b93d07bcf..fec69aacfbd0fc9af5c9c60eb65501eed188fc5a
GIT binary patch


diff --git a/talk/pyconza2013/talk.rst b/talk/pyconza2013/talk.rst
--- a/talk/pyconza2013/talk.rst
+++ b/talk/pyconza2013/talk.rst
@@ -1,25 +1,25 @@
 .. include:: beamerdefs.txt
-Software Transactional Memory with PyPy
+.. raw:: latex
+   \title{Software Transactional Memory with PyPy}
+   \author[arigo]{Armin Rigo}
-Software Transactional Memory with PyPy
+   \institute{PyCon ZA 2013}
+   \date{4th October 2013}
-* PyCon ZA 2013
-* talk by Armin Rigo
-* sponsored by crowdfunding (thanks!)
+   \maketitle
+* me: Armin Rigo
 * what is PyPy: an alternative implementation of Python
+* very compatible
 * main focus is on speed
@@ -27,13 +27,21 @@
 .. image:: speed.png
-   :scale: 65%
+   :scale: 67%
    :align: center
 SQL by example
+.. raw:: latex
+   %empty
+SQL by example
@@ -58,6 +66,27 @@
+    ...
+    obj.value += 1
+    ...
+Python by example
+    ...
+    x = obj.value
+    obj.value = x + 1
+    ...
+Python by example
     x = obj.value
     obj.value = x + 1
@@ -100,10 +129,10 @@
-    SELECT * FROM ...;    SELECT * FROM ...;    SELEC..
-    UPDATE ...;           UPDATE ...;           UPDAT..
-    COMMIT;               COMMIT;               COMMI..
+    SELECT * FROM ...;  SELECT * FROM ...;    SELEC..
+    UPDATE ...;         UPDATE ...;           UPDAT..
+    COMMIT;             COMMIT;               COMMI..
 Locks != Transactions
@@ -111,9 +140,9 @@
-    with the_lock:        with the_lock:        with ..
-      x = obj.val           x = obj.val           x =..
-      obj.val = x + 1       obj.val = x + 1       obj..
+    with the_lock:     with the_lock:        with ..
+      x = obj.val        x = obj.val           x =..
+      obj.val = x + 1    obj.val = x + 1       obj..
 Locks != Transactions
@@ -121,9 +150,9 @@
-    with atomic:          with atomic:          with ..
-      x = obj.val           x = obj.val           x =..
-      obj.val = x + 1       obj.val = x + 1       obj..
+    with atomic:       with atomic:          with ..
+      x = obj.val        x = obj.val           x =..
+      obj.val = x + 1    obj.val = x + 1       obj..
@@ -134,14 +163,46 @@
 * advanced but not magic (same as databases)
-STM versus HTM
+By the way
-* Software versus Hardware
+* STM replaces the GIL (Global Interpreter Lock)
-* CPU hardware specially to avoid the high overhead
+* any existing multithreaded program runs on multiple cores
-* too limited for now
+By the way
+* the GIL is necessary and very hard to avoid,
+  but if you look at it like a lock around every single
+  subexpression, then it can be replaced with `with atomic` too
+* yes, any existing multithreaded program runs on multiple cores
+* yes, we solved the GIL
+* great
+* no, it would be quite hard to implement it in standard CPython
+* too bad for now, only in PyPy
+* but it would not be completely impossible
+* but only half of the story in my opinion `:-)`
 Example 1
@@ -149,11 +210,13 @@
-  def apply_interest_rate(self):
+  def apply_interest(self):
      self.balance *= 1.05
   for account in all_accounts:
-     account.apply_interest_rate()
+     account.apply_interest()
+                                                 .
 Example 1
@@ -161,12 +224,27 @@
-  def apply_interest_rate(self):
+  def apply_interest(self):
      self.balance *= 1.05
   for account in all_accounts:
-     add_task(account.apply_interest_rate)
-  run_tasks()
+     account.apply_interest()
+     ^^^ run this loop multithreaded
+Example 1
+  def apply_interest(self):
+     #with atomic: --- automatic
+        self.balance *= 1.05
+  for account in all_accounts:
+     add_task(account.apply_interest)
+  run_all_tasks()
@@ -178,6 +256,8 @@
 * uses threads, but internally only
+* very simple, pure Python
 Example 2
@@ -187,7 +267,7 @@
   def next_iteration(all_trains):
      for train in all_trains:
         start_time = ...
-        for othertrain in train.dependencies:
+        for othertrain in train.deps:
            if ...:
               start_time = ...
         train.start_time = start_time
@@ -215,37 +295,29 @@
 * but with `objects` instead of `records`
-* the transaction aborts and automatically retries
+* the transaction aborts and retries automatically
-* means "unavoidable"
+* "inevitable" (means "unavoidable")
 * handles I/O in a `with atomic`
 * cannot abort the transaction any more
-By the way
-* STM replaces the GIL
-* any existing multithreaded program runs on multiple cores
 Current status
 * basics work, JIT compiler integration almost done
-* different executable called `pypy-stm`
+* different executable (`pypy-stm` instead of `pypy`)
 * slow-down: around 3x (in bad cases up to 10x)
-* speed-ups measured with 4 cores
+* real time speed-ups measured with 4 or 8 cores
 * Linux 64-bit only
@@ -258,9 +330,11 @@
     Detected conflict:
+      File "foo.py", line 58, in wtree
+        walk(root)
       File "foo.py", line 17, in walk
         if node.left not in seen:
-    Transaction aborted, 0.000047 seconds lost
+    Transaction aborted, 0.047 sec lost
 User feedback
@@ -273,11 +347,11 @@
     Forced inevitable:
       File "foo.py", line 19, in walk
         print >> log, logentry
-    Transaction blocked others for 0.xx seconds
+    Transaction blocked others for XX s
-Async libraries
+Asynchronous libraries
 * future work
@@ -287,11 +361,11 @@
 * existing Twisted apps still work, but we need to
   look at conflicts/inevitables
-* similar with Tornado, gevent, and so on
+* similar with Tornado, eventlib, and so on
-Async libraries
+Asynchronous libraries
@@ -318,6 +392,16 @@
 * reduce slow-down, port to other OS'es
+STM versus HTM
+* Software versus Hardware
+* CPU hardware specially to avoid the high overhead (Intel Haswell processor)
+* too limited for now
 Under the cover
@@ -329,8 +413,8 @@
 * the most recent version can belong to one thread
-* synchronization only when a thread "steals" another thread's most
-  recent version, to make it shared
+* synchronization only at the point where one thread "steals"
+  another thread's most recent version, to make it shared
 * integrated with a generational garbage collector, with one
   nursery per thread
@@ -345,4 +429,8 @@
 * a small change for Python users
+* (and the GIL is gone)
+* this work is sponsored by crowdfunding (thanks!)
 * `Q & A`
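
The summary paragraph in blog/draft/stm-oct2013.rst rests on two ratios per benchmark: how much faster 8 threads are than 1 thread, and how 8-threaded PyPy-STM compares with single-threaded PyPy-2.1. A minimal sketch of that arithmetic follows. The timings are copied from the two tables in the draft; the names `TIMINGS`, `speedup` and `slowdown_vs_pypy` are made up for this illustration and are not part of the commit.

```python
# Timings copied from the two tables in blog/draft/stm-oct2013.rst.
# Each pair is (1 thread, 8 threads).
# Raytracer values are in seconds, Richards values in milliseconds.
TIMINGS = {
    "raytracer": {
        "PyPy-2.1": (2.47, 2.56),
        "CPython": (81.1, 73.4),
        "PyPy-STM": (50.2, 10.8),
    },
    "richards": {
        "PyPy-2.1": (15.6, 15.4),
        "CPython": (239, 237),
        "PyPy-STM": (371, 116),
    },
}


def speedup(one_thread, eight_threads):
    """How many times faster the 8-thread run is than the 1-thread run."""
    return one_thread / float(eight_threads)


def slowdown_vs_pypy(benchmark):
    """8-thread PyPy-STM time divided by 1-thread PyPy-2.1 time."""
    rows = TIMINGS[benchmark]
    return rows["PyPy-STM"][1] / float(rows["PyPy-2.1"][0])


if __name__ == "__main__":
    for benchmark, rows in sorted(TIMINGS.items()):
        for interpreter, (t1, t8) in sorted(rows.items()):
            print("%-10s %-9s %.2fx" % (benchmark, interpreter, speedup(t1, t8)))
        print("%-10s STM(8)/PyPy-2.1(1): %.2fx"
              % (benchmark, slowdown_vs_pypy(benchmark)))
```

With the draft's numbers, PyPy-STM gains roughly 4.6x on Raytracer and 3.2x on Richards going from 1 to 8 threads, while the other two interpreters stay close to 1x. The second ratio stays above 1 for both benchmarks, which is the "not good enough yet" part of the summary.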

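The "Locks != Transactions" slides in talk/pyconza2013/talk.rst guard the two-step update `x = obj.val; obj.val = x + 1` first with `with the_lock:` and then with `with atomic:`. The sketch below runs only the lock variant, because `atomic` is specific to pypy-stm and the diff does not show where it is imported from. `threading.Lock` is an ordinary standard-library lock used here as a stand-in; `Obj`, `increment_many` and `run` are hypothetical names for this example.

```python
import threading


class Obj(object):
    """Tiny stand-in for the `obj` used on the slides."""

    def __init__(self):
        self.val = 0


def increment_many(obj, the_lock, count):
    # Same read-then-write update as on the slide. The lock makes the
    # two steps behave as one unit with respect to the other threads.
    for _ in range(count):
        with the_lock:
            x = obj.val
            obj.val = x + 1


def run(num_threads=4, count=10000):
    """Start num_threads workers and return the final counter value."""
    obj = Obj()
    the_lock = threading.Lock()
    threads = [
        threading.Thread(target=increment_many, args=(obj, the_lock, count))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.val


if __name__ == "__main__":
    # Every increment is protected, so none can be lost.
    print(run())
```

Because every update happens under the lock, the final value always equals `num_threads * count`. The slides' point is that a lock serializes the blocks, whereas `with atomic` on pypy-stm is presented as letting them run optimistically and retrying on conflict; that behaviour is not reproduced by this stand-in.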