[pypy-commit] pypy stmgc-c7: hg merge release-2.5.x (i.e. 2.5.1)

arigo noreply at buildbot.pypy.org
Thu Mar 26 16:57:11 CET 2015


Author: Armin Rigo <arigo at tunes.org>
Branch: stmgc-c7
Changeset: r76578:a20040a72e73
Date: 2015-03-26 16:55 +0100
http://bitbucket.org/pypy/pypy/changeset/a20040a72e73/

Log:	hg merge release-2.5.x (i.e. 2.5.1)

diff too long, truncating to 2000 out of 2301 lines

diff --git a/.hgtags b/.hgtags
--- a/.hgtags
+++ b/.hgtags
@@ -11,3 +11,8 @@
 32f35069a16d819b58c1b6efb17c44e3e53397b2 release-2.2=3.1
 32f35069a16d819b58c1b6efb17c44e3e53397b2 release-2.3.1
 10f1b29a2bd21f837090286174a9ca030b8680b2 release-2.5.0
+8e24dac0b8e2db30d46d59f2c4daa3d4aaab7861 release-2.5.1
+8e24dac0b8e2db30d46d59f2c4daa3d4aaab7861 release-2.5.1
+0000000000000000000000000000000000000000 release-2.5.1
+0000000000000000000000000000000000000000 release-2.5.1
+e3d046c43451403f5969580fc1c41d5df6c4082a release-2.5.1
diff --git a/pypy/doc/release-2.5.1.rst b/pypy/doc/release-2.5.1.rst
--- a/pypy/doc/release-2.5.1.rst
+++ b/pypy/doc/release-2.5.1.rst
@@ -67,7 +67,19 @@
  `PyPy documentation`_  and we now have separate `RPython documentation`_.
   Tell us what still isn't clear, or even better help us improve the documentation.
 
-* We merged version 2.7.9 of python's stdlib
+* We merged version 2.7.9 of python's stdlib. From the python release notice:
+
+  * The entirety of Python 3.4's `ssl module`_ has been backported. 
+    See `PEP 466`_ for justification.
+
+  * HTTPS certificate validation using the system's certificate store is now
+    enabled by default. See `PEP 476`_ for details.
+
+  * SSLv3 has been disabled by default in httplib and its reverse dependencies
+    due to the `POODLE attack`_.
+
+  * The `ensurepip module`_ has been backported, which provides the pip
+    package manager in every Python 2.7 installation. See `PEP 477`_.
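+    (For example, running ``pypy -m ensurepip`` bootstraps pip into
+    the installation.)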
 
 * The garbage collector now ignores parts of the stack which did not change
   since the last collection, another performance boost
@@ -84,6 +96,12 @@
 
 .. _`PyPy documentation`: http://doc.pypy.org
 .. _`RPython documentation`: http://rpython.readthedocs.org
+.. _`ssl module`: https://docs.python.org/3/library/ssl.html
+.. _`PEP 466`: https://www.python.org/dev/peps/pep-0466
+.. _`PEP 476`: https://www.python.org/dev/peps/pep-0476
+.. _`PEP 477`: https://www.python.org/dev/peps/pep-0477
+.. _`POODLE attack`: https://www.imperialviolet.org/2014/10/14/poodle.html
+.. _`ensurepip module`: https://docs.python.org/2/library/ensurepip.html
 .. _resolved: http://doc.pypy.org/en/latest/whatsnew-2.5.1.html
 
 Please try it out and let us know what you think. We welcome
diff --git a/pypy/doc/stm.rst b/pypy/doc/stm.rst
--- a/pypy/doc/stm.rst
+++ b/pypy/doc/stm.rst
@@ -46,13 +46,14 @@
   multiple cores.
 
 * ``pypy-stm`` provides (but does not impose) a special API to the
-  user in the pure Python module `transaction`_.  This module is based
-  on the lower-level module `pypystm`_, but also provides some
+  user in the pure Python module ``transaction``.  This module is based
+  on the lower-level module ``pypystm``, but also provides some
  compatibility with non-STM PyPy or CPython.
 
 * Building on top of the way the GIL is removed, we will talk
-  about `Atomic sections, Transactions, etc.: a better way to write
-  parallel programs`_.
+  about `How to write multithreaded programs: the 10'000-feet view`_
+  and `transaction.TransactionQueue`_.
+
 
 
 Getting Started
@@ -89,7 +90,7 @@
 Current status (stmgc-c7)
 -------------------------
 
-* It seems to work fine, without crashing any more.  Please `report
+* **NEW:** It seems to work fine, without crashing any more.  Please `report
   any crash`_ you find (or other bugs).
 
 * It runs with an overhead as low as 20% on examples like "richards".
@@ -97,33 +98,47 @@
   2x for "translate.py"-- which we are still trying to understand.
   One suspect is our partial GC implementation, see below.
 
+* **NEW:** the ``PYPYSTM`` environment variable and the
+  ``pypy/stm/print_stm_log.py`` script let you know exactly which
+  "conflicts" occurred.  This is described in the section
+  `transaction.TransactionQueue`_ below.
+
+* **NEW:** special transaction-friendly APIs (like ``stmdict``),
+  described in the section `transaction.TransactionQueue`_ below.  The
+  old API changed again, mostly moving to different modules.  Sorry
+  about that.  I feel it's a better idea to change the API early
+  instead of being stuck with a bad one later...
+
 * Currently limited to 1.5 GB of RAM (this is just a parameter in
   `core.h`__ -- theoretically.  In practice, increase it too much and
   clang crashes again).  Memory overflows are not correctly handled;
   they cause segfaults.
 
-* The JIT warm-up time improved recently but is still bad.  In order to
-  produce machine code, the JIT needs to enter a special single-threaded
-  mode for now.  This means that you will get bad performance results if
-  your program doesn't run for several seconds, where *several* can mean
-  *many.*  When trying benchmarks, be sure to check that you have
-  reached the warmed state, i.e. the performance is not improving any
-  more.  This should be clear from the fact that as long as it's
-  producing more machine code, ``pypy-stm`` will run on a single core.
+* **NEW:** The JIT warm-up time improved again, but is still
+  relatively large.  In order to produce machine code, the JIT needs
+  to enter "inevitable" mode.  This means that you will get bad
+  performance results if your program doesn't run for several seconds,
+  where *several* can mean *many.* When trying benchmarks, be sure to
+  check that you have reached the warmed state, i.e. the performance
+  is not improving any more.
 
 * The GC is new; although clearly inspired by PyPy's regular GC, it
   misses a number of optimizations for now.  Programs allocating large
   numbers of small objects that don't immediately die (surely a common
-  situation) suffer from these missing optimizations.
+  situation) suffer from these missing optimizations.  (The bleeding
+  edge ``stmgc-c8`` is better at that.)
 
 * Weakrefs might appear to work a bit strangely for now, sometimes
  staying alive through ``gc.collect()``, or even dying but then
-  un-dying for a short time before dying again.
+  un-dying for a short time before dying again.  A similar problem can
+  show up occasionally elsewhere with accesses to some external
+  resources, where the (apparent) serialized order doesn't match the
+  underlying (multithreading) order.  These are bugs (partially fixed
+  already in ``stmgc-c8``).
 
 * The STM system is based on very efficient read/write barriers, which
   are mostly done (their placement could be improved a bit in
-  JIT-generated machine code).  But the overall bookkeeping logic could
-  see more improvements (see `Low-level statistics`_ below).
+  JIT-generated machine code).
 
 * Forking the process is slow because the complete memory needs to be
   copied manually.  A warning is printed to this effect.
@@ -132,7 +147,8 @@
   crash on an assertion error because of a non-implemented overflow of
   an internal 28-bit counter.
 
-.. _`report bugs`: https://bugs.pypy.org/
+
+.. _`report any crash`: https://bitbucket.org/pypy/pypy/issues?status=new&status=open
 .. __: https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/translator/stm/src_stm/stm/core.h
 
 
@@ -155,10 +171,41 @@
 interpreter and other ones might have slightly different needs.
 
 
-
 User Guide
 ==========
 
+How to write multithreaded programs: the 10'000-feet view
+---------------------------------------------------------
+
+PyPy-STM offers two ways to write multithreaded programs:
+
+* the traditional way, using the ``thread`` or ``threading`` modules,
+  described first__.
+
+* using ``TransactionQueue``, described next__, as a way to hide the
+  low-level notion of threads.
+
+.. __: `Drop-in replacement`_
+.. __: `transaction.TransactionQueue`_
+
+The issues with low-level threads are well known (particularly in other
+languages that don't have GIL-based interpreters): memory corruption,
+deadlocks, livelocks, and so on.  There are alternatives to dealing
+directly with threads, like OpenMP_.  These approaches typically
+enforce some structure on your code.  ``TransactionQueue`` is in part
+similar: your program needs to have "some chance" of
+parallelization before you can apply it.  But I believe that the scope
+of applicability is much larger with ``TransactionQueue`` than with
+other approaches.  It usually works without forcing a complete
+reorganization of your existing code, and it works on any Python
+program which has got *latent* and *imperfect* parallelism.  Ideally,
+it only requires that the end programmer identifies where this
+parallelism is likely to be found, and communicates it to the system
+using a simple API.
+
+.. _OpenMP: http://en.wikipedia.org/wiki/OpenMP
+
+
 Drop-in replacement
 -------------------
 
@@ -181,8 +228,8 @@
 order.
 
 
-A better way to write parallel programs
----------------------------------------
+transaction.TransactionQueue
+----------------------------
 
 In CPU-hungry programs, we can often easily identify outermost loops
 over some data structure, or other repetitive algorithm, where each
@@ -216,41 +263,116 @@
 behavior did not change because we are using ``TransactionQueue``.
 All the calls still *appear* to execute in some serial order.
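+
+A minimal sketch of this pattern (assuming the ``add()`` and ``run()``
+methods of ``transaction.TransactionQueue`` as used elsewhere in this
+document; ``f`` and ``bigdict`` stand in for your own function and
+data)::
+
+    import transaction
+
+    tq = transaction.TransactionQueue()
+    for key, value in bigdict.items():
+        tq.add(f, key, value)   # each call becomes one transaction
+    tq.run()                    # execute them, potentially in parallel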
 
-Now the performance should ideally be improved: if the function calls
-turn out to be actually independent (most of the time), then it will
-be.  But if the function calls are not, then the total performance
-will crawl back to the previous case, with additionally some small
-penalty for the overhead.
+A typical usage of ``TransactionQueue`` goes like this: at first, the
+performance does not increase.  In fact, it is likely to be worse.
+Typically, this is indicated by the total CPU usage, which remains low
+(closer to 1 than to N cores).  Note first that the CPU usage is not
+expected to go much higher than 1 in the JIT warm-up phase: you must
+run a
+program for several seconds, or for larger programs at least one
+minute, to give the JIT a chance to warm up enough.  But if CPU usage
+remains low even afterwards, then the ``PYPYSTM`` environment variable
+can be used to track what is going on.
 
-This case occurs typically when you see the total CPU usage remaining
-low (closer to 1 than N cores).  Note first that it is expected that
-the CPU usage should not go much higher than 1 in the JIT warm-up
-phase.  You must run a program for several seconds, or for larger
-programs at least one minute, to give the JIT a chance to warm up
-correctly.  But if CPU usage remains low even though all code is
-executing in a ``TransactionQueue.run()``, then the ``PYPYSTM``
-environment variable can be used to track what is going on.
+Run your program with ``PYPYSTM=logfile`` to produce a log file called
+``logfile``.  Afterwards, use the ``pypy/stm/print_stm_log.py``
+utility to inspect the content of this log file.  It produces output
+like this (sorted by amount of time lost, largest first)::
 
-Run your program with ``PYPYSTM=stmlog`` to produce a log file called
-``stmlog``.  Afterwards, use the ``pypy/stm/print_stm_log.py`` utility
-to inspect the content of this log file.  It produces output like
-this::
+    10.5s lost in aborts, 1.25s paused (12412x STM_CONTENTION_WRITE_WRITE)
+    File "foo.py", line 10, in f
+      someobj.stuff = 5
+    File "bar.py", line 20, in g
+      someobj.other = 10
 
-    documentation in progress!
+This means that 10.5 seconds were lost running transactions that were
+aborted (which caused another 1.25 seconds of lost time by pausing),
+because of the reason shown in the two independent single-entry
+tracebacks: one thread ran the line ``someobj.stuff = 5``, whereas
+another thread concurrently ran the line ``someobj.other = 10`` on the
+same object.  Because both writes touch the same object, they cause a
+conflict, which aborts one of the two transactions.  In the example
+above this occurred 12412 times.
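+
+In practice, the workflow is (a sketch; the name of your ``pypy-stm``
+executable may differ)::
+
+    PYPYSTM=logfile ./pypy-stm myprogram.py
+    ./pypy-stm pypy/stm/print_stm_log.py logfile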
 
+The two other conflict sources are ``STM_CONTENTION_INEVITABLE``,
+which means that two transactions both tried to do an external
+operation, like printing or reading from a socket or accessing an
+external array of raw data; and ``STM_CONTENTION_WRITE_READ``, which
+means that one transaction wrote to an object while the other one
+merely read it (in that case only the writing transaction is
+reported; the location for the reads is not recorded because doing so
+is not possible without a very large performance impact).
+
+Common causes of conflicts:
+
+* First of all, any I/O or raw manipulation of memory turns the
+  transaction inevitable ("must not abort").  There can be only one
+  inevitable transaction running at any time.  A common case is when
+  each transaction starts by sending data to a log file.  You should
+  refactor this so that the I/O occurs either near the end of the
+  transaction (which can then mostly run in non-inevitable mode), or
+  is delegated to a separate transaction or even a separate thread.
+
+* Writing to a list or a dictionary conflicts with any read from the
+  same list or dictionary, even one done with a different key.  For
+  dictionaries and sets, you can try the types ``transaction.stmdict``
+  and ``transaction.stmset``, which behave mostly like ``dict`` and
+  ``set`` but allow concurrent access to different keys.  (What is
+  missing from them so far is lazy iteration: for example,
+  ``stmdict.iterkeys()`` is implemented as ``iter(stmdict.keys())``;
+  and, unlike PyPy's dictionaries and sets, the STM versions are not
+  ordered.)  There are also experimental ``stmiddict`` and
+  ``stmidset`` classes using the identity of the key.  (A short usage
+  sketch of ``stmdict`` follows below.)
+
+* ``time.time()`` and ``time.clock()`` turn the transaction inevitable
+  in order to guarantee that a call that appears to be later will
+  really return a higher number.  If getting slightly unordered
+  results is fine, use ``transaction.time()`` or
+  ``transaction.clock()``.
+
+* ``transaction.threadlocalproperty`` can be used at class-level::
+
+      class Foo(object):     # must be a new-style class!
+          x = transaction.threadlocalproperty()
+          y = transaction.threadlocalproperty(dict)
+
+  This declares that instances of ``Foo`` have two attributes ``x``
+  and ``y`` that are thread-local: reading or writing them from
+  concurrently-running transactions will return independent results.
+  (Any other attributes of ``Foo`` instances will be globally visible
+  from all threads, as usual.)  The optional argument to
+  ``threadlocalproperty()`` is the default value factory: in case no
+  value was assigned in the current thread yet, the factory is called
+  and its result becomes the value in that thread (like
+  ``collections.defaultdict``).  If no default value factory is
+  specified, uninitialized reads raise ``AttributeError``.  Note that
+  with ``TransactionQueue`` you get a pool of a fixed number of
+  threads, each running the transactions one after the other; such
+  thread-local properties will have the value last stored in them in
+  the same thread, which may come from a random previous transaction.
+  This means that ``threadlocalproperty`` is useful mainly to avoid
+  conflicts from cache-like data structures.
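+
+  A short usage sketch, assuming the behavior described above::
+
+      foo = Foo()
+      foo.x = 42          # each thread sees its own value of x
+      foo.y['a'] = 1      # y defaults to a fresh dict per thread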
+
+Note that Python is a complicated language; there are a number of less
+common cases that may cause conflicts (of any kind) where we might not
+expect them a priori.  Many of these cases could be fixed; please
+report any case that you don't understand.  (For example, so far,
+creating a weakref to an object requires attaching an auxiliary
+internal object to that object, and so it can cause write-write
+conflicts.)
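+
+As an illustration, here is a sketch using ``transaction.stmdict``
+(assuming it supports the usual ``dict.get()``, since it behaves
+mostly like ``dict``)::
+
+    import transaction
+
+    counts = transaction.stmdict()
+
+    def tally(word):
+        # transactions updating *different* keys should not conflict
+        counts[word] = counts.get(word, 0) + 1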
 
 
 Atomic sections
 ---------------
 
-PyPy supports *atomic sections,* which are blocks of code which you
-want to execute without "releasing the GIL".  In STM terms, this means
-blocks of code that are executed while guaranteeing that the
-transaction is not interrupted in the middle.  *This is experimental
-and may be removed in the future* if `lock elision`_ is ever
-implemented.
+The ``TransactionQueue`` class described above is based on *atomic
+sections,* which are blocks of code which you want to execute without
+"releasing the GIL".  In STM terms, this means blocks of code that are
+executed while guaranteeing that the transaction is not interrupted in
+the middle.  *This is experimental and may be removed in the future*
+if `Software lock elision`_ is ever implemented.
 
-Here is a usage example::
+Here is a direct usage example::
 
     with transaction.atomic:
         assert len(lst1) == 10
@@ -281,8 +403,8 @@
 it likely that such a piece of code will eventually block all other
 threads anyway.
 
-Note that if you want to experiment with ``atomic``, you may have to add
-manually a transaction break just before the atomic block.  This is
+Note that if you want to experiment with ``atomic``, you may have to
+manually add a transaction break just before the atomic block.  This is
 because the boundaries of the block are not guaranteed to be the
 boundaries of the transaction: the latter is at least as big as the
 block, but may be bigger.  Therefore, if you run a big atomic block, it
@@ -295,7 +417,8 @@
 including with a ``print`` to standard output.  If one thread tries to
 acquire a lock while running in an atomic block, and another thread
 has got the same lock at that point, then the former may fail with a
-``thread.error``.  The reason is that "waiting" for some condition to
+``thread.error``.  (Don't rely on it; it may also deadlock.)
+The reason is that "waiting" for some condition to
 become true --while running in an atomic block-- does not really make
 sense.  For now you can work around it by making sure that, say, all
 your prints are either in an ``atomic`` block or none of them are.
@@ -354,106 +477,38 @@
 .. _`software lock elision`: https://www.repository.cam.ac.uk/handle/1810/239410
 
 
-Atomic sections, Transactions, etc.: a better way to write parallel programs
-----------------------------------------------------------------------------
+Miscellaneous functions
+-----------------------
 
-(This section is based on locks as we plan to implement them, but also
-works with the existing atomic sections.)
-
-In the cases where elision works, the block of code can run in parallel
-with other blocks of code *even if they are protected by the same lock.*
-You still get the illusion that the blocks are run sequentially.  This
-works even for multiple threads that run each a series of such blocks
-and nothing else, protected by one single global lock.  This is
-basically the Python application-level equivalent of what was done with
-the interpreter in ``pypy-stm``: while you think you are writing
-thread-unfriendly code because of this global lock, actually the
-underlying system is able to make it run on multiple cores anyway.
-
-This capability can be hidden in a library or in the framework you use;
-the end user's code does not need to be explicitly aware of using
-threads.  For a simple example of this, there is `transaction.py`_ in
-``lib_pypy``.  The idea is that you write, or already have, some program
-where the function ``f(key, value)`` runs on every item of some big
-dictionary, say::
-
-    for key, value in bigdict.items():
-        f(key, value)
-
-Then you simply replace the loop with::
-
-    for key, value in bigdict.items():
-        transaction.add(f, key, value)
-    transaction.run()
-
-This code runs the various calls to ``f(key, value)`` using a thread
-pool, but every single call is executed under the protection of a unique
-lock.  The end result is that the behavior is exactly equivalent --- in
-fact it makes little sense to do it in this way on a non-STM PyPy or on
-CPython.  But on ``pypy-stm``, the various locked calls to ``f(key,
-value)`` can tentatively be executed in parallel, even if the observable
-result is as if they were executed in some serial order.
-
-This approach hides the notion of threads from the end programmer,
-including all the hard multithreading-related issues.  This is not the
-first alternative approach to explicit threads; for example, OpenMP_ is
-one.  However, it is one of the first ones which does not require the
-code to be organized in a particular fashion.  Instead, it works on any
-Python program which has got latent, imperfect parallelism.  Ideally, it
-only requires that the end programmer identifies where this parallelism
-is likely to be found, and communicates it to the system, using for
-example the ``transaction.add()`` scheme.
-
-.. _`transaction.py`: https://bitbucket.org/pypy/pypy/raw/stmgc-c7/lib_pypy/transaction.py
-.. _OpenMP: http://en.wikipedia.org/wiki/OpenMP
-
-
-.. _`transactional_memory`:
-
-API of transactional_memory
----------------------------
-
-The new pure Python module ``transactional_memory`` runs on both CPython
-and PyPy, both with and without STM.  It contains:
-
-* ``getsegmentlimit()``: return the number of "segments" in
+* ``transaction.getsegmentlimit()``: return the number of "segments" in
   this pypy-stm.  This is the limit above which more threads will not be
   able to execute on more cores.  (Right now it is limited to 4 due to
   inter-segment overhead, but should be increased in the future.  It
   should also be settable, and the default value should depend on the
   number of actual CPUs.)  If STM is not available, this returns 1.
 
-* ``print_abort_info(minimum_time=0.0)``: debugging help.  Each thread
-  remembers the longest abort or pause it did because of cross-thread
-  contention_.  This function prints it to ``stderr`` if the time lost
-  is greater than ``minimum_time`` seconds.  The record is then
-  cleared, to make it ready for new events.  This function returns
-  ``True`` if it printed a report, and ``False`` otherwise.
+* ``__pypy__.thread.signals_enabled``: a context manager that runs its
+  block of code with signals enabled.  By default, signals are only
+  enabled in the main thread; a non-main thread will not receive
+  signals (this is like CPython).  Enabling signals in non-main
+  threads is useful for libraries where threads are hidden and the end
+  user is not expecting his code to run elsewhere than in the main
+  thread.
 
+* ``pypystm.exclusive_atomic``: a context manager similar to
+  ``transaction.atomic`` but which complains if it is nested.
 
-API of __pypy__.thread
-----------------------
+* ``transaction.is_atomic()``: return True if called from an atomic
+  context.
 
-The ``__pypy__.thread`` submodule is a built-in module of PyPy that
-contains a few internal built-in functions used by the
-``transactional_memory`` module, plus the following:
+* ``pypystm.count()``: return a different positive integer every time
+  it is called.  This works without generating conflicts.  The
+  returned integers are only roughly in increasing order; this should
+  not be relied upon.
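+
+A combined sketch of these helpers, assuming they behave as described
+above::
+
+    import transaction, pypystm
+
+    print transaction.getsegmentlimit()   # e.g. 4 on pypy-stm, 1 otherwise
+    with pypystm.exclusive_atomic:
+        assert transaction.is_atomic()
+        n = pypystm.count()               # unique integer, conflict-free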
 
-* ``__pypy__.thread.atomic``: a context manager to run a block in
-  fully atomic mode, without "releasing the GIL".  (May be eventually
-  removed?)
 
-* ``__pypy__.thread.signals_enabled``: a context manager that runs its
-  block with signals enabled.  By default, signals are only enabled in
-  the main thread; a non-main thread will not receive signals (this is
-  like CPython).  Enabling signals in non-main threads is useful for
-  libraries where threads are hidden and the end user is not expecting
-  his code to run elsewhere than in the main thread.
-
-
-.. _contention:
-
-Conflicts
----------
+More details about conflicts
+----------------------------
 
 Based on Software Transactional Memory, the ``pypy-stm`` solution is
 prone to "conflicts".  To repeat the basic idea, threads execute their code
@@ -469,25 +524,26 @@
 the transaction).  If this occurs too often, parallelization fails.
 
 How much actual parallelization a multithreaded program can see is a bit
-subtle.  Basically, a program not using ``__pypy__.thread.atomic`` or
+subtle.  Basically, a program not using ``transaction.atomic`` or
 eliding locks, or doing so for very short amounts of time, will
 parallelize almost freely (as long as it's not some artificial example
 where, say, all threads try to increase the same global counter and do
 nothing else).
 
-However, using if the program requires longer transactions, it comes
+However, if the program requires longer transactions, it comes
 with less obvious rules.  The exact details may vary from version to
 version, too, until they are a bit more stabilized.  Here is an
 overview.
 
 Parallelization works as long as two principles are respected.  The
-first one is that the transactions must not *conflict* with each other.
-The most obvious sources of conflicts are threads that all increment a
-global shared counter, or that all store the result of their
-computations into the same list --- or, more subtly, that all ``pop()``
-the work to do from the same list, because that is also a mutation of
-the list.  (It is expected that some STM-aware library will eventually
-be designed to help with conflict problems, like a STM-aware queue.)
+first one is that the transactions must not *conflict* with each
+other.  The most obvious sources of conflicts are threads that all
+increment a global shared counter, or that all store the result of
+their computations into the same list --- or, more subtly, that all
+``pop()`` the work to do from the same list, because that is also a
+mutation of the list.  (You can work around it with
+``transaction.stmdict``, but for that specific example, some STM-aware
+queue should eventually be designed.)
 
 A conflict occurs as follows: when a transaction commits (i.e. finishes
 successfully) it may cause other transactions that are still in progress
@@ -503,22 +559,23 @@
 Another issue is that of avoiding long-running so-called "inevitable"
 transactions ("inevitable" is taken in the sense of "which cannot be
 avoided", i.e. transactions which cannot abort any more).  Transactions
-like that should only occur if you use ``__pypy__.thread.atomic``,
-generally become of I/O in atomic blocks.  They work, but the
+like that should only occur if you use ``atomic``,
+generally because of I/O in atomic blocks.  They work, but the
 transaction is turned inevitable before the I/O is performed.  For all
 the remaining execution time of the atomic block, they will impede
 parallel work.  The best is to organize the code so that such operations
-are done completely outside ``__pypy__.thread.atomic``.
+are done completely outside ``atomic``.
 
-(This is related to the fact that blocking I/O operations are
+(This is not unrelated to the fact that blocking I/O operations are
 discouraged with Twisted, and if you really need them, you should do
 them on their own separate thread.)
 
-In case of lock elision, we don't get long-running inevitable
-transactions, but a different problem can occur: doing I/O cancels lock
-elision, and the lock turns into a real lock, preventing other threads
-from committing if they also need this lock.  (More about it when lock
-elision is implemented and tested.)
+In case lock elision eventually replaces atomic sections, we wouldn't
+get long-running inevitable transactions, but the same problem occurs
+in a different way: doing I/O cancels lock elision, and the lock turns
+into a real lock.  This prevents other threads from committing if they
+also need this lock.  (More about it when lock elision is implemented
+and tested.)
 
 
 
@@ -528,56 +585,18 @@
 XXX this section mostly empty for now
 
 
-Low-level statistics
---------------------
-
-When a non-main thread finishes, you get low-level statistics printed to
-stderr, looking like that::
-
-      thread 0x7f73377fe600:
-          outside transaction          42182    0.506 s
-          run current                  85466    0.000 s
-          run committed                34262    3.178 s
-          run aborted write write       6982    0.083 s
-          run aborted write read         550    0.005 s
-          run aborted inevitable         388    0.010 s
-          run aborted other                0    0.000 s
-          wait free segment                0    0.000 s
-          wait write read                 78    0.027 s
-          wait inevitable                887    0.490 s
-          wait other                       0    0.000 s
-          sync commit soon                 1    0.000 s
-          bookkeeping                  51418    0.606 s
-          minor gc                    162970    1.135 s
-          major gc                         1    0.019 s
-          sync pause                   59173    1.738 s
-          longest recordered marker          0.000826 s
-          "File "x.py", line 5, in f"
-
-On each line, the first number is a counter, and the second number gives
-the associated time --- the amount of real time that the thread was in
-this state.  The sum of all the times should be equal to the total time
-between the thread's start and the thread's end.  The most important
-points are "run committed", which gives the amount of useful work, and
-"outside transaction", which should give the time spent e.g. in library
-calls (right now it seems to be larger than that; to investigate).  The
-various "run aborted" and "wait" entries are time lost due to
-conflicts_.  Everything else is overhead of various forms.  (Short-,
-medium- and long-term future work involves reducing this overhead :-)
-
-The last two lines are special; they are an internal marker read by
-``transactional_memory.print_abort_info()``.
-
-
 Reference to implementation details
 -----------------------------------
 
-The core of the implementation is in a separate C library called stmgc_,
-in the c7_ subdirectory.  Please see the `README.txt`_ for more
-information.  In particular, the notion of segment is discussed there.
+The core of the implementation is in a separate C library called
+stmgc_, in the c7_ subdirectory (current version of pypy-stm) and in
+the c8_ subdirectory (bleeding edge version).  Please see the
+`README.txt`_ for more information.  In particular, the notion of
+segment is discussed there.
 
 .. _stmgc: https://bitbucket.org/pypy/stmgc/src/default/
 .. _c7: https://bitbucket.org/pypy/stmgc/src/default/c7/
+.. _c8: https://bitbucket.org/pypy/stmgc/src/default/c8/
 .. _`README.txt`: https://bitbucket.org/pypy/stmgc/raw/default/c7/README.txt
 
 PyPy itself adds on top of it the automatic placement of read__ and write__
diff --git a/pypy/goal/getnightly.py b/pypy/goal/getnightly.py
--- a/pypy/goal/getnightly.py
+++ b/pypy/goal/getnightly.py
@@ -7,7 +7,7 @@
 if sys.platform.startswith('linux'):
     arch = 'linux'
     cmd = 'wget "%s"'
-    tar = "tar -x -v --wildcards --strip-components=2 -f %s '*/bin/pypy'"
+    tar = "tar -x -v --wildcards --strip-components=2 -f %s '*/bin/pypy' '*/bin/libpypy-c.so'"
     if os.uname()[-1].startswith('arm'):
         arch += '-armhf-raspbian'
 elif sys.platform.startswith('darwin'):
diff --git a/pypy/interpreter/pycode.py b/pypy/interpreter/pycode.py
--- a/pypy/interpreter/pycode.py
+++ b/pypy/interpreter/pycode.py
@@ -137,7 +137,9 @@
             filename = filename[:-1]
         basename = os.path.basename(filename)
         lastdirname = os.path.basename(os.path.dirname(filename))
-        self.co_filename = '<builtin>/%s/%s' % (lastdirname, basename)
+        if lastdirname:
+            basename = '%s/%s' % (lastdirname, basename)
+        self.co_filename = '<builtin>/%s' % (basename,)
 
     co_names = property(lambda self: [self.space.unwrap(w_name) for w_name in self.co_names_w]) # for trace
 
diff --git a/pypy/interpreter/pyopcode.py b/pypy/interpreter/pyopcode.py
--- a/pypy/interpreter/pyopcode.py
+++ b/pypy/interpreter/pyopcode.py
@@ -1626,6 +1626,13 @@
     def prepare_exec(f, prog, globals, locals, compile_flags, builtin, codetype):
         """Manipulate parameters to exec statement to (codeobject, dict, dict).
         """
+        if (globals is None and locals is None and
+            isinstance(prog, tuple) and
+            (len(prog) == 2 or len(prog) == 3)):
+            globals = prog[1]
+            if len(prog) == 3:
+                locals = prog[2]
+            prog = prog[0]
         if globals is None:
             globals = f.f_globals
             if locals is None:
diff --git a/pypy/interpreter/test/test_exec.py b/pypy/interpreter/test/test_exec.py
--- a/pypy/interpreter/test/test_exec.py
+++ b/pypy/interpreter/test/test_exec.py
@@ -262,3 +262,11 @@
         """]
         for c in code:
             compile(c, "<code>", "exec")
+
+    def test_exec_tuple(self):
+        # note: this is VERY different than testing exec("a = 42", d), because
+        # this specific case is handled specially by the AST compiler
+        d = {}
+        x = ("a = 42", d)
+        exec x
+        assert d['a'] == 42
diff --git a/pypy/module/_csv/test/__init__.py b/pypy/module/_csv/test/__init__.py
new file mode 100644
diff --git a/pypy/module/_io/test/__init__.py b/pypy/module/_io/test/__init__.py
new file mode 100644
diff --git a/pypy/module/_multiprocessing/test/__init__.py b/pypy/module/_multiprocessing/test/__init__.py
new file mode 100644
diff --git a/pypy/module/_socket/interp_socket.py b/pypy/module/_socket/interp_socket.py
--- a/pypy/module/_socket/interp_socket.py
+++ b/pypy/module/_socket/interp_socket.py
@@ -30,7 +30,7 @@
                                space.wrap(addr.get_protocol()),
                                space.wrap(addr.get_pkttype()),
                                space.wrap(addr.get_hatype()),
-                               space.wrap(addr.get_addr())])
+                               space.wrap(addr.get_haddr())])
     elif rsocket.HAS_AF_UNIX and isinstance(addr, rsocket.UNIXAddress):
         return space.wrap(addr.get_path())
     elif rsocket.HAS_AF_NETLINK and isinstance(addr, rsocket.NETLINKAddress):
@@ -79,7 +79,7 @@
         raise NotImplementedError
 
 # XXX Hack to separate rpython and pypy
-def addr_from_object(family, space, w_address):
+def addr_from_object(family, fd, space, w_address):
     if family == rsocket.AF_INET:
         w_host, w_port = space.unpackiterable(w_address, 2)
         host = space.str_w(w_host)
@@ -89,8 +89,9 @@
     if family == rsocket.AF_INET6:
         pieces_w = space.unpackiterable(w_address)
         if not (2 <= len(pieces_w) <= 4):
-            raise TypeError("AF_INET6 address must be a tuple of length 2 "
-                               "to 4, not %d" % len(pieces_w))
+            raise oefmt(space.w_TypeError,
+                        "AF_INET6 address must be a tuple of length 2 "
+                        "to 4, not %d", len(pieces_w))
         host = space.str_w(pieces_w[0])
         port = space.int_w(pieces_w[1])
         port = make_ushort_port(space, port)
@@ -105,6 +106,28 @@
     if rsocket.HAS_AF_NETLINK and family == rsocket.AF_NETLINK:
         w_pid, w_groups = space.unpackiterable(w_address, 2)
         return rsocket.NETLINKAddress(space.uint_w(w_pid), space.uint_w(w_groups))
+    if rsocket.HAS_AF_PACKET and family == rsocket.AF_PACKET:
+        pieces_w = space.unpackiterable(w_address)
+        if not (2 <= len(pieces_w) <= 5):
+            raise oefmt(space.w_TypeError,
+                        "AF_PACKET address must be a tuple of length 2 "
+                        "to 5, not %d", len(pieces_w))
+        ifname = space.str_w(pieces_w[0])
+        ifindex = rsocket.PacketAddress.get_ifindex_from_ifname(fd, ifname)
+        protocol = space.int_w(pieces_w[1])
+        if len(pieces_w) > 2: pkttype = space.int_w(pieces_w[2])
+        else:                 pkttype = 0
+        if len(pieces_w) > 3: hatype = space.int_w(pieces_w[3])
+        else:                 hatype = 0
+        if len(pieces_w) > 4: haddr = space.str_w(pieces_w[4])
+        else:                 haddr = ""
+        if len(haddr) > 8:
+            raise OperationError(space.w_ValueError, space.wrap(
+                "Hardware address must be 8 bytes or less"))
+        if protocol < 0 or protocol > 0xffff:
+            raise OperationError(space.w_OverflowError, space.wrap(
+                "protoNumber must be 0-65535."))
+        return rsocket.PacketAddress(ifindex, protocol, pkttype, hatype, haddr)
     raise RSocketError("unknown address family")
 
 # XXX Hack to separate rpython and pypy
@@ -172,7 +195,8 @@
     # convert an app-level object into an Address
     # based on the current socket's family
     def addr_from_object(self, space, w_address):
-        return addr_from_object(self.sock.family, space, w_address)
+        fd = intmask(self.sock.fd)
+        return addr_from_object(self.sock.family, fd, space, w_address)
 
     def bind_w(self, space, w_addr):
         """bind(address)
diff --git a/pypy/module/_socket/test/test_sock_app.py b/pypy/module/_socket/test/test_sock_app.py
--- a/pypy/module/_socket/test/test_sock_app.py
+++ b/pypy/module/_socket/test/test_sock_app.py
@@ -1,4 +1,4 @@
-import sys
+import sys, os
 import py
 from pypy.tool.pytest.objspace import gettestobjspace
 from rpython.tool.udir import udir
@@ -615,6 +615,28 @@
             os.chdir(oldcwd)
 
 
+class AppTestPacket:
+    def setup_class(cls):
+        if not hasattr(os, 'getuid') or os.getuid() != 0:
+            py.test.skip("AF_PACKET needs to be root for testing")
+        w_ok = space.appexec([], "(): import _socket; " +
+                                 "return hasattr(_socket, 'AF_PACKET')")
+        if not space.is_true(w_ok):
+            py.test.skip("no AF_PACKET on this platform")
+        cls.space = space
+
+    def test_convert_between_tuple_and_sockaddr_ll(self):
+        import _socket
+        s = _socket.socket(_socket.AF_PACKET, _socket.SOCK_RAW)
+        assert s.getsockname() == ('', 0, 0, 0, '')
+        s.bind(('lo', 123))
+        a, b, c, d, e = s.getsockname()
+        assert (a, b, c) == ('lo', 123, 0)
+        assert isinstance(d, int)
+        assert isinstance(e, str)
+        assert 0 <= len(e) <= 8
+
+
 class AppTestSocketTCP:
     HOST = 'localhost'
 
diff --git a/pypy/module/_ssl/test/__init__.py b/pypy/module/_ssl/test/__init__.py
new file mode 100644
diff --git a/pypy/module/itertools/test/__init__.py b/pypy/module/itertools/test/__init__.py
new file mode 100644
diff --git a/pypy/module/pwd/test/__init__.py b/pypy/module/pwd/test/__init__.py
new file mode 100644
diff --git a/pypy/module/pyexpat/__init__.py b/pypy/module/pyexpat/__init__.py
--- a/pypy/module/pyexpat/__init__.py
+++ b/pypy/module/pyexpat/__init__.py
@@ -39,8 +39,6 @@
         'error':         'space.fromcache(interp_pyexpat.Cache).w_error',
 
         '__version__':   'space.wrap("85819")',
-        'EXPAT_VERSION': 'interp_pyexpat.get_expat_version(space)',
-        'version_info':  'interp_pyexpat.get_expat_version_info(space)',
         }
 
     submodules = {
@@ -53,3 +51,9 @@
                  'XML_PARAM_ENTITY_PARSING_ALWAYS']:
         interpleveldefs[name] = 'space.wrap(interp_pyexpat.%s)' % (name,)
 
+    def startup(self, space):
+        from pypy.module.pyexpat import interp_pyexpat
+        w_ver = interp_pyexpat.get_expat_version(space)
+        space.setattr(self, space.wrap("EXPAT_VERSION"), w_ver)
+        w_ver = interp_pyexpat.get_expat_version_info(space)
+        space.setattr(self, space.wrap("version_info"), w_ver)
diff --git a/pypy/module/select/test/__init__.py b/pypy/module/select/test/__init__.py
new file mode 100644
diff --git a/pypy/module/struct/test/__init__.py b/pypy/module/struct/test/__init__.py
new file mode 100644
diff --git a/pypy/module/zipimport/test/test_zipimport_deflated.py b/pypy/module/zipimport/test/test_zipimport_deflated.py
--- a/pypy/module/zipimport/test/test_zipimport_deflated.py
+++ b/pypy/module/zipimport/test/test_zipimport_deflated.py
@@ -14,7 +14,7 @@
     def setup_class(cls):
         try:
             import rpython.rlib.rzlib
-        except ImportError:
+        except CompilationError:
             py.test.skip("zlib not available, cannot test compressed zipfiles")
         cls.make_class()
         cls.w_BAD_ZIP = cls.space.wrap(BAD_ZIP)
diff --git a/rpython/annotator/binaryop.py b/rpython/annotator/binaryop.py
--- a/rpython/annotator/binaryop.py
+++ b/rpython/annotator/binaryop.py
@@ -132,13 +132,11 @@
         impl = pair(s_c1, s_o2).getitem
         return read_can_only_throw(impl, s_c1, s_o2)
 
-    def getitem_idx_key((s_c1, s_o2)):
+    def getitem_idx((s_c1, s_o2)):
         impl = pair(s_c1, s_o2).getitem
         return impl()
-    getitem_idx_key.can_only_throw = _getitem_can_only_throw
+    getitem_idx.can_only_throw = _getitem_can_only_throw
 
-    getitem_idx = getitem_idx_key
-    getitem_key = getitem_idx_key
 
 
 class __extend__(pairtype(SomeType, SomeType),
@@ -565,14 +563,10 @@
         return lst1.listdef.read_item()
     getitem.can_only_throw = []
 
-    getitem_key = getitem
-
     def getitem_idx((lst1, int2)):
         return lst1.listdef.read_item()
     getitem_idx.can_only_throw = [IndexError]
 
-    getitem_idx_key = getitem_idx
-
     def setitem((lst1, int2), s_value):
         lst1.listdef.mutate()
         lst1.listdef.generalize(s_value)
@@ -588,14 +582,10 @@
         return SomeChar(no_nul=str1.no_nul)
     getitem.can_only_throw = []
 
-    getitem_key = getitem
-
     def getitem_idx((str1, int2)):
         return SomeChar(no_nul=str1.no_nul)
     getitem_idx.can_only_throw = [IndexError]
 
-    getitem_idx_key = getitem_idx
-
     def mul((str1, int2)): # xxx do we want to support this
         return SomeString(no_nul=str1.no_nul)
 
@@ -604,14 +594,10 @@
         return SomeUnicodeCodePoint()
     getitem.can_only_throw = []
 
-    getitem_key = getitem
-
     def getitem_idx((str1, int2)):
         return SomeUnicodeCodePoint()
     getitem_idx.can_only_throw = [IndexError]
 
-    getitem_idx_key = getitem_idx
-
     def mul((str1, int2)): # xxx do we want to support this
         return SomeUnicodeString()
 
diff --git a/rpython/doc/jit/index.rst b/rpython/doc/jit/index.rst
--- a/rpython/doc/jit/index.rst
+++ b/rpython/doc/jit/index.rst
@@ -23,11 +23,15 @@
 
    overview
    pyjitpl5
+   optimizer
    virtualizable
 
 - :doc:`Overview <overview>`: motivating our approach
 
 - :doc:`Notes <pyjitpl5>` about the current work in PyPy
 
+- :doc:`Optimizer <optimizer>`: the step between tracing and writing
+  machine code
+
 - :doc:`Virtualizable <virtualizable>`: how virtualizables work and what they are
   (in other words how to make frames more efficient).
diff --git a/rpython/doc/jit/optimizer.rst b/rpython/doc/jit/optimizer.rst
new file mode 100644
--- /dev/null
+++ b/rpython/doc/jit/optimizer.rst
@@ -0,0 +1,196 @@
+.. _trace_optimizer:
+
+Trace Optimizer
+===============
+
+Traces of user programs are not directly translated into machine code.
+The optimizer module implements several different semantics-preserving
+transformations that either allow operations to be swept from the trace
+or convert them to operations that need less time or space.
+
+The optimizer is in `rpython/jit/metainterp/optimizeopt/`.
+When you try to make sense of this module, this page might get you started.
+
+Before some optimizations are explained in more detail, it is essential to
+understand how traces look like.
+The optimizer comes with a test suit. It contains many trace
+examples and you might want to take a look at it
+(in `rpython/jit/metainterp/optimizeopt/test/*.py`).
+The allowed operations can be found in `rpython/jit/metainterp/resoperation.py`.
+Here is an example of a trace::
+
+    [p0,i0,i1]
+    label(p0, i0, i1)
+    i2 = getarrayitem_raw(p0, i0, descr=<Array Signed>)
+    i3 = int_add(i1, i2)
+    i4 = int_add(i0, 1)
+    i5 = int_le(i4, 100) # less-than-or-equal
+    guard_true(i5)
+    jump(p0, i4, i3)
+
+At the beginning it might be clumsy to read, but it makes sense once you
+start comparing it to the Python code that constructed the trace::
+
+    from array import array
+    a = array('i',range(101))
+    sum = 0; i = 0
+    while i <= 100: # can be seen as label
+        sum += a[i]
+        i += 1
+        # jumps back to the while header
+
+There are better ways to compute the sum of ``[0..100]``, but this loop
+gives a better intuition of how traces are constructed than
+``sum(range(101))`` would.
+Note that the trace syntax is the one used in the test suite. It is also very
+similar to traces printed at runtime by PYPYLOG_. The first line gives the input variables, the
+second line is a ``label`` operation, the last one is the backwards ``jump`` operation.
+
+.. _PYPYLOG: logging.html
+
+The instructions just mentioned are special:
+
+* the input line defines the types and names of the trace's input
+  arguments.
+* ``label`` is the instruction a ``jump`` can target. Label instructions
+  have a ``JitCellToken`` associated that uniquely identifies the label.
+  Every jump targets the token of some label.
+
+The token is saved in a so-called `descriptor` of the instruction.  It
+is not written explicitly above because the tests omit it as well: the
+test suite creates a dummy token for each trace and adds it as the
+descriptor of ``label`` and ``jump``.  Of course the optimizer does the
+same at runtime, but using real values.
+The sample trace includes a descriptor in ``getarrayitem_raw``.  Here
+it annotates the type of the array: a signed integer array.
+
+High level overview
+-------------------
+
+Before the JIT backend transforms any trace into machine code, it tries to
+transform the trace into an equivalent trace that executes faster. The method
+`optimize_trace` in `rpython/jit/metainterp/optimizeopt/__init__.py` is the
+main entry point.
+
+Optimizations are applied in a sequence one after another and the base
+sequence is as follows::
+
+    intbounds:rewrite:virtualize:string:earlyforce:pure:heap:unroll
+
+Each of the colon-separated names has a class attached, inheriting from
+the `Optimization` class.  The `Optimizer` class itself also
+derives from the `Optimization` class and implements the control logic for
+the optimization. Most of the optimizations only require a single forward pass.
+The trace is 'propagated' into each optimization using the method
+`propagate_forward`. Instruction by instruction, it flows from the
+first optimization to the last optimization. The method `emit_operation`
+is called for every operation that is passed to the next optimizer.
+
+A frequently encountered pattern
+--------------------------------
+
+To find potential optimization targets it is necessary to know the instruction
+type. A simple solution is to switch on the operation number (= type)::
+
+    for op in operations:
+        if op.getopnum() == rop.INT_ADD:
+            # handle this instruction
+            pass
+        elif op.getopnum() == rop.INT_FLOOR_DIV:
+            pass
+        # and many more
+
+Things get worse if you start to match the arguments
+(is argument one a constant and argument two a variable, or vice versa?). The pattern to tackle
+this code bloat is to move it to a separate method using
+`make_dispatcher_method`. It associates methods with instruction types::
+
+    class OptX(Optimization):
+        def prefix_INT_ADD(self, op):
+            pass # emit, transform, ...
+
+    dispatch_opt = make_dispatcher_method(OptX, 'prefix_',
+                                          default=OptX.emit_operation)
+    OptX.propagate_forward = dispatch_opt
+
+    optX = OptX()
+    for op in operations:
+        optX.propagate_forward(op)
+
+``propagate_forward`` searches for the method that is able to handle the instruction
+type. As an example `INT_ADD` will invoke `prefix_INT_ADD`. If there is no function
+for the instruction, it is routed to the default implementation (``emit_operation``
+in this example).
+
+Rewrite optimization
+--------------------
+
+The second optimization is called 'rewrite' and is also commonly known
+as strength reduction. A simple example would be that an integer
+multiplied by 2 is equivalent to its bits shifted left once
+(e.g. ``x * 2 == x << 1``). This optimization performs not only
+strength reduction but also boolean and arithmetic simplifications.
+Other examples would be: ``x & 0 == 0``, ``x - 0 == x``.
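+
+Concretely, the rewrite step can replace::
+
+    i2 = int_mul(i0, 2)
+
+with the cheaper::
+
+    i2 = int_lshift(i0, 1)
+
+(a sketch in the trace notation used above).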
+
+Whenever such an operation is encountered (e.g. ``y = x & 0``), no
+operation is emitted. Instead the variable ``y`` is made equal to 0
+(= ``make_equal_to(op.result, 0)``). The variables found in a trace are
+instances of Box classes that can be found in
+`rpython/jit/metainterp/history.py`. `OptValue` wraps those variables
+again and maps the boxes to the optimization values in the optimizer.
+When a value is made equal, the two variables' boxes are made to point
+to the same `OptValue` instance.
+
+**NOTE: this OptValue organization is currently being refactored in a branch.**
+
+Pure optimization
+-----------------
+
+This optimization is interwoven into the basic optimizer. It records
+operations, results and arguments that are known to have pure
+semantics.
+
+"Pure" here means the same as the ``jit.elidable`` decorator:
+free of "observable" side effects and referentially transparent
+(the operation can be replaced with its result without changing the program
+semantics). The operations marked as ALWAYS_PURE in `resoperation.py` are a
+subset of the NOSIDEEFFECT operations. Operations such as new, new array,
+getfield_(raw/gc) are marked as NOSIDEEFFECT but not as ALWAYS_PURE.
+
+Pure operations are optimized in two different ways.  If their arguments
+are constants, the operation is removed and the result is turned into a
+constant.  If not, we can still use a memoization technique: if, later,
+we see the same operation on the same arguments again, we don't need to
+recompute its result, but can simply reuse the previous operation's
+result.
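+
+For instance, in a trace fragment like::
+
+    i3 = int_add(i1, i2)
+    ...
+    i7 = int_add(i1, i2)    # same pure operation, same arguments
+
+the second ``int_add`` does not need to be recomputed: i7 is simply
+made equal to i3 (again a sketch in the trace notation used above).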
+
+Unroll optimization
+-------------------
+
+A detailed description can be found in the document
+`Loop-Aware Optimizations in PyPy's Tracing JIT`__
+
+.. __: http://www2.maths.lth.se/matematiklth/vision/publdb/reports/pdf/ardo-bolz-etal-dls-12.pdf
+
+This optimization does not fall into the traditional scheme of one forward
+pass only. In a nutshell it unrolls the trace *once*, connects the two
+traces (by inserting parameters into the jump and label of the peeled
+trace) and uses the gathered information to remove allocations,
+propagate constants and apply any other optimization currently present
+in the 'optimizeopt' module.
+
+It is prepended to all optimizations and thus extends the Optimizer class
+and unrolls the loop once before it proceeds.
+
+
+What is missing from this document
+----------------------------------
+
+* Guards are not explained
+* Several optimizations are not explained
+
+
+Further references
+------------------
+
+* `Allocation Removal by Partial Evaluation in a Tracing JIT`__
+* `Loop-Aware Optimizations in PyPy's Tracing JIT`__
+
+.. __: http://www.stups.uni-duesseldorf.de/mediawiki/images/b/b0/Pub-BoCuFiLePeRi2011.pdf
+.. __: http://www2.maths.lth.se/matematiklth/vision/publdb/reports/pdf/ardo-bolz-etal-dls-12.pdf
diff --git a/rpython/doc/rtyper.rst b/rpython/doc/rtyper.rst
--- a/rpython/doc/rtyper.rst
+++ b/rpython/doc/rtyper.rst
@@ -118,8 +118,7 @@
 given this representation.  The RTyper also computes a ``concretetype`` for
 Constants, to match the way they are used in the low-level operations (for
 example, ``int_add(x, 1)`` requires a ``Constant(1)`` with
-``concretetype=Signed``, but an untyped ``add(x, 1)`` works with a
-``Constant(1)`` that must actually be a PyObject at run-time).
+``concretetype=Signed``).
 
 In addition to ``lowleveltype``, each Repr subclass provides a set of methods
 called ``rtype_op_xxx()`` which define how each high-level operation ``op_xxx``
@@ -306,14 +305,14 @@
 ~~~~~~~~~~~~~
 
 As in C, pointers provide the indirection needed to make a reference modifiable
-or sharable.  Pointers can only point to a structure, an array, a function
-(see below) or a PyObject (see below).  Pointers to primitive types, if needed,
-must be done by pointing to a structure with a single field of the required
-type.  Pointer types are declared by::
+or sharable.  Pointers can only point to a structure, an array or a function
+(see below).  Pointers to primitive types, if needed, must be done by pointing
+to a structure with a single field of the required type.  Pointer types are
+declared by::
 
    Ptr(TYPE)
 
-At run-time, pointers to GC structures (GcStruct, GcArray and PyObject) hold a
+At run-time, pointers to GC structures (GcStruct, GcArray) hold a
 reference to what they are pointing to.  Pointers to non-GC structures that can
 go away when their container is deallocated (Struct, Array) must be handled
 with care: the bigger structure of which they are part of could be freed while
@@ -356,22 +355,6 @@
     :graph:      the flow graph of the function.
 
 
-The PyObject Type
-~~~~~~~~~~~~~~~~~
-
-This is a special type, for compatibility with CPython: it stands for a
-structure compatible with PyObject.  This is also a "container" type (thinking
-about C, this is ``PyObject``, not ``PyObject*``), so it is usually manipulated
-via a Ptr.  A typed graph can still contain generic space operations (add,
-getitem, etc.) provided they are applied on objects whose low-level type is
-``Ptr(PyObject)``.  In fact, code generators that support this should consider
-that the default type of a variable, if none is specified, is ``Ptr(PyObject)``.
-In this way, they can generate the correct code for fully-untyped flow graphs.
-
-The testing implementation allows you to "create" PyObjects by calling
-``pyobjectptr(obj)``.
-
-
 Opaque Types
 ~~~~~~~~~~~~
 
diff --git a/rpython/flowspace/model.py b/rpython/flowspace/model.py
--- a/rpython/flowspace/model.py
+++ b/rpython/flowspace/model.py
@@ -140,6 +140,12 @@
             newlink.llexitcase = self.llexitcase
         return newlink
 
+    def replace(self, mapping):
+        def rename(v):
+            if v is not None:
+                return v.replace(mapping)
+        return self.copy(rename)
+
     def settarget(self, targetblock):
         assert len(self.args) == len(targetblock.inputargs), (
             "output args mismatch")
@@ -215,13 +221,12 @@
         return uniqueitems([w for w in result if isinstance(w, Constant)])
 
     def renamevariables(self, mapping):
-        self.inputargs = [mapping.get(a, a) for a in self.inputargs]
-        for op in self.operations:
-            op.args = [mapping.get(a, a) for a in op.args]
-            op.result = mapping.get(op.result, op.result)
-        self.exitswitch = mapping.get(self.exitswitch, self.exitswitch)
+        self.inputargs = [a.replace(mapping) for a in self.inputargs]
+        self.operations = [op.replace(mapping) for op in self.operations]
+        if self.exitswitch is not None:
+            self.exitswitch = self.exitswitch.replace(mapping)
         for link in self.exits:
-            link.args = [mapping.get(a, a) for a in link.args]
+            link.args = [a.replace(mapping) for a in link.args]
 
     def closeblock(self, *exits):
         assert self.exits == [], "block already closed"
@@ -327,6 +332,8 @@
             newvar.concretetype = self.concretetype
         return newvar
 
+    def replace(self, mapping):
+        return mapping.get(self, self)
 
 
 class Constant(Hashable):
@@ -356,6 +363,9 @@
             # cannot count on it not mutating at runtime!
             return False
 
+    def replace(self, mapping):
+        return self
+
 
 class FSException(object):
     def __init__(self, w_type, w_value):
@@ -431,8 +441,8 @@
                                 ", ".join(map(repr, self.args)))
 
     def replace(self, mapping):
-        newargs = [mapping.get(arg, arg) for arg in self.args]
-        newresult = mapping.get(self.result, self.result)
+        newargs = [arg.replace(mapping) for arg in self.args]
+        newresult = self.result.replace(mapping)
         return type(self)(self.opname, newargs, newresult, self.offset)
 
 class Atom(object):
diff --git a/rpython/flowspace/operation.py b/rpython/flowspace/operation.py
--- a/rpython/flowspace/operation.py
+++ b/rpython/flowspace/operation.py
@@ -76,8 +76,8 @@
         self.offset = -1
 
     def replace(self, mapping):
-        newargs = [mapping.get(arg, arg) for arg in self.args]
-        newresult = mapping.get(self.result, self.result)
+        newargs = [arg.replace(mapping) for arg in self.args]
+        newresult = self.result.replace(mapping)
         newop = type(self)(*newargs)
         newop.result = newresult
         newop.offset = self.offset
@@ -422,8 +422,6 @@
 add_operator('delattr', 2, dispatch=1, pyfunc=delattr)
 add_operator('getitem', 2, dispatch=2, pure=True)
 add_operator('getitem_idx', 2, dispatch=2, pure=True)
-add_operator('getitem_key', 2, dispatch=2, pure=True)
-add_operator('getitem_idx_key', 2, dispatch=2, pure=True)
 add_operator('setitem', 3, dispatch=2)
 add_operator('delitem', 2, dispatch=2)
 add_operator('getslice', 3, dispatch=1, pyfunc=do_getslice, pure=True)
@@ -686,8 +684,6 @@
 # the annotator tests
 op.getitem.canraise = [IndexError, KeyError, Exception]
 op.getitem_idx.canraise = [IndexError, KeyError, Exception]
-op.getitem_key.canraise = [IndexError, KeyError, Exception]
-op.getitem_idx_key.canraise = [IndexError, KeyError, Exception]
 op.setitem.canraise = [IndexError, KeyError, Exception]
 op.delitem.canraise = [IndexError, KeyError, Exception]
 op.contains.canraise = [Exception]    # from an r_dict
diff --git a/rpython/flowspace/test/test_objspace.py b/rpython/flowspace/test/test_objspace.py
--- a/rpython/flowspace/test/test_objspace.py
+++ b/rpython/flowspace/test/test_objspace.py
@@ -867,7 +867,7 @@
                 raise
         graph = self.codetest(f)
         simplify_graph(graph)
-        assert self.all_operations(graph) == {'getitem_idx_key': 1}
+        assert self.all_operations(graph) == {'getitem_idx': 1}
 
         g = lambda: None
         def f(c, x):
@@ -877,7 +877,7 @@
                 g()
         graph = self.codetest(f)
         simplify_graph(graph)
-        assert self.all_operations(graph) == {'getitem_idx_key': 1,
+        assert self.all_operations(graph) == {'getitem_idx': 1,
                                               'simple_call': 2}
 
         def f(c, x):
@@ -896,7 +896,7 @@
                 raise
         graph = self.codetest(f)
         simplify_graph(graph)
-        assert self.all_operations(graph) == {'getitem_key': 1}
+        assert self.all_operations(graph) == {'getitem': 1}
 
         def f(c, x):
             try:
@@ -915,7 +915,7 @@
         graph = self.codetest(f)
         simplify_graph(graph)
         self.show(graph)
-        assert self.all_operations(graph) == {'getitem_idx_key': 1}
+        assert self.all_operations(graph) == {'getitem_idx': 1}
 
         def f(c, x):
             try:
@@ -933,7 +933,7 @@
                 return -1
         graph = self.codetest(f)
         simplify_graph(graph)
-        assert self.all_operations(graph) == {'getitem_key': 1}
+        assert self.all_operations(graph) == {'getitem': 1}
 
         def f(c, x):
             try:
diff --git a/rpython/memory/gc/incminimark.py b/rpython/memory/gc/incminimark.py
--- a/rpython/memory/gc/incminimark.py
+++ b/rpython/memory/gc/incminimark.py
@@ -52,6 +52,9 @@
 # XXX total addressable size.  Maybe by keeping some minimarkpage arenas
 # XXX pre-reserved, enough for a few nursery collections?  What about
 # XXX raw-malloced memory?
+
+# XXX try merging old_objects_pointing_to_pinned into
+# XXX old_objects_pointing_to_young (IRC 2014-10-22, fijal and gregor_w)
 import sys
 from rpython.rtyper.lltypesystem import lltype, llmemory, llarena, llgroup
 from rpython.rtyper.lltypesystem.lloperation import llop
@@ -63,6 +66,7 @@
 from rpython.rlib.rarithmetic import LONG_BIT_SHIFT
 from rpython.rlib.debug import ll_assert, debug_print, debug_start, debug_stop
 from rpython.rlib.objectmodel import specialize
+from rpython.memory.gc.minimarkpage import out_of_memory
 
 #
 # Handles the objects in 2 generations:
@@ -471,10 +475,10 @@
         # the start of the nursery: we actually allocate a bit more for
         # the nursery than really needed, to simplify pointer arithmetic
         # in malloc_fixedsize().  The few extra pages are never used
-        # anyway so it doesn't even counct.
+        # anyway so it doesn't even count.
         nursery = llarena.arena_malloc(self._nursery_memory_size(), 0)
         if not nursery:
-            raise MemoryError("cannot allocate nursery")
+            out_of_memory("cannot allocate nursery")
         return nursery
 
     def allocate_nursery(self):
@@ -685,23 +689,48 @@
 
     def collect_and_reserve(self, totalsize):
         """To call when nursery_free overflows nursery_top.
-        First check if the nursery_top is the real top, otherwise we
-        can just move the top of one cleanup and continue
-
-        Do a minor collection, and possibly also a major collection,
-        and finally reserve 'totalsize' bytes at the start of the
-        now-empty nursery.
+        First check if a pinned object is in front of nursery_top.  If so,
+        jump over it and try again to reserve totalsize.
+        Otherwise do a minor collection, and possibly a major collection, and
+        finally reserve totalsize bytes.
         """
 
         minor_collection_count = 0
         while True:
             self.nursery_free = llmemory.NULL      # debug: don't use me
+            # note: no "raise MemoryError" between here and the next time
+            # we initialize nursery_free!
 
             if self.nursery_barriers.non_empty():
+                # Pinned object in front of nursery_top. Try reserving totalsize
+                # by jumping into the next, yet unused, area inside the
+                # nursery. "Next area" in this case is the space between the
+                # pinned object in front of nursery_top and the pinned object
+                # after that. Graphically explained:
+                # 
+                #     |- allocating totalsize failed in this area
+                #     |     |- nursery_top
+                #     |     |    |- pinned object in front of nursery_top,
+                #     v     v    v  jump over this
+                # +---------+--------+--------+--------+-----------+ }
+                # | used    | pinned | empty  | pinned |  empty    | }- nursery
+                # +---------+--------+--------+--------+-----------+ }
+                #                       ^- try reserving totalsize in here next
+                #
+                # All pinned objects are represented by entries in
+                # nursery_barriers (see minor_collection). The last entry is
+                # always the end of the nursery. Therefore if nursery_barriers
+                # contains only one element, we jump over a pinned object and
+                # the "next area" (the space where we will try to allocate
+                # totalsize) starts at the end of the pinned object and ends at
+                # nursery's end.
+                #
+                # find the size of the pinned object after nursery_top
                 size_gc_header = self.gcheaderbuilder.size_gc_header
                 pinned_obj_size = size_gc_header + self.get_size(
                         self.nursery_top + size_gc_header)
-
+                #
+                # update used nursery space to allocate objects
                 self.nursery_free = self.nursery_top + pinned_obj_size
                 self.nursery_top = self.nursery_barriers.popleft()
             else:
@@ -729,6 +758,9 @@
                             "Seeing minor_collection() at least twice."
                             "Too many pinned objects?")
             #
+            # By now we either jumped over a pinned object or ran a minor
+            # (and possibly major) collection. Try to reserve totalsize now;
+            # if this succeeds, break out of the loop.
             result = self.nursery_free
             if self.nursery_free + totalsize <= self.nursery_top:
                 self.nursery_free = result + totalsize
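
[Editor's note: to make the control flow of the commented loop above
easier to follow, here is a toy model of ``collect_and_reserve`` over
plain integer addresses; ``ToyGC`` and its fields are illustrative
stand-ins, not the real GC API::

    from collections import deque

    class ToyGC(object):
        def __init__(self, nursery_size):
            self.size = nursery_size
            self.minor_collection()

        def pinned_size_at(self, addr):
            return 16                  # pretend every pinned object is 16 bytes

        def minor_collection(self):
            # after a (toy) collection the whole nursery is usable again
            self.free, self.top = 0, self.size
            self.barriers = deque()    # ends of areas between pinned objects

    def collect_and_reserve(gc, totalsize):
        while True:
            if gc.barriers:
                # a pinned object sits right at gc.top: jump over it and
                # retry in the next empty area between pinned objects
                gc.free = gc.top + gc.pinned_size_at(gc.top)
                gc.top = gc.barriers.popleft()
            else:
                gc.minor_collection()  # possibly a major collection too
            if gc.free + totalsize <= gc.top:
                result = gc.free
                gc.free = result + totalsize
                return result

    gc = ToyGC(4096)
    gc.top, gc.barriers = 100, deque([4096])   # one pinned object at 100
    assert collect_and_reserve(gc, 64) == 116  # allocated past the pinned obj

As in the real code, the last barrier entry always marks the end of the
nursery, so the loop eventually falls back to collecting.]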
@@ -1491,7 +1523,7 @@
         # being moved, not from being collected if it is not reachable anymore.
         self.surviving_pinned_objects = self.AddressStack()
         # The following counter keeps track of alive and pinned young objects
-        # inside the nursery. We reset it here and increace it in
+        # inside the nursery. We reset it here and increase it in
         # '_trace_drag_out()'.
         any_pinned_object_from_earlier = self.any_pinned_object_kept
         self.pinned_objects_in_nursery = 0
@@ -1625,7 +1657,9 @@
         else:
             llarena.arena_reset(prev, self.nursery + self.nursery_size - prev, 0)
         #
+        # always add the end of the nursery to the list
         nursery_barriers.append(self.nursery + self.nursery_size)
+        #
         self.nursery_barriers = nursery_barriers
         self.surviving_pinned_objects.delete()
         #
@@ -1950,7 +1984,7 @@
         #
         arena = llarena.arena_malloc(raw_malloc_usage(totalsize), False)
         if not arena:
-            raise MemoryError("cannot allocate object")
+            out_of_memory("out of memory: couldn't allocate a few KB more")
         llarena.arena_reserve(arena, totalsize)
         #
         size_gc_header = self.gcheaderbuilder.size_gc_header
@@ -2058,7 +2092,7 @@
 
             # XXX A simplifying assumption that should be checked,
             # finalizers/weak references are rare and short which means that
-            # they do not need a seperate state and do not need to be
+            # they do not need a separate state and do not need to be
             # made incremental.
             if (not self.objects_to_trace.non_empty() and
                 not self.more_objects_to_trace.non_empty()):
@@ -2148,9 +2182,9 @@
                     # even higher memory consumption.  To prevent it, if it's
                     # the second time we are here, then abort the program.
                     if self.max_heap_size_already_raised:
-                        llop.debug_fatalerror(lltype.Void,
-                                              "Using too much memory, aborting")
+                        out_of_memory("using too much memory, aborting")
                     self.max_heap_size_already_raised = True
+                    self.gc_state = STATE_SCANNING
                     raise MemoryError
 
                 self.gc_state = STATE_FINALIZING
diff --git a/rpython/memory/gc/minimarkpage.py b/rpython/memory/gc/minimarkpage.py
--- a/rpython/memory/gc/minimarkpage.py
+++ b/rpython/memory/gc/minimarkpage.py
@@ -2,7 +2,7 @@
 from rpython.rtyper.lltypesystem import lltype, llmemory, llarena, rffi
 from rpython.rlib.rarithmetic import LONG_BIT, r_uint
 from rpython.rlib.objectmodel import we_are_translated
-from rpython.rlib.debug import ll_assert
+from rpython.rlib.debug import ll_assert, fatalerror
 
 WORD = LONG_BIT // 8
 NULL = llmemory.NULL
@@ -294,7 +294,7 @@
         # be a page-aligned address
         arena_base = llarena.arena_malloc(self.arena_size, False)
         if not arena_base:
-            raise MemoryError("couldn't allocate the next arena")
+            out_of_memory("out of memory: couldn't allocate the next arena")
         arena_end = arena_base + self.arena_size
         #
         # 'firstpage' points to the first unused page
@@ -593,3 +593,10 @@
     if isinstance(size, int):
         size = llmemory.sizeof(lltype.Char) * size
     return size
+
+def out_of_memory(errmsg):
+    """Signal a fatal out-of-memory error and abort.  For situations where
+    it is hard to write and test code that would handle a MemoryError
+    exception gracefully.
+    """
+    fatalerror(errmsg)
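
[Editor's note: the distinction the docstring draws is between
allocations made on behalf of the program, where raising MemoryError
gives the application a chance to recover, and allocations of the GC's
own structures, where no safe recovery is possible.  Roughly, with toy
stand-ins rather than real APIs::

    def raw_malloc(size):              # toy stand-in for a C-level malloc
        return bytearray(size) if size < 2 ** 20 else None

    def out_of_memory(errmsg):         # as in minimarkpage.py: abort, don't raise
        import sys
        sys.stderr.write(errmsg + '\n')
        sys.exit(1)

    def allocate_user_object(size):
        ptr = raw_malloc(size)
        if not ptr:
            raise MemoryError          # the program may catch this and recover
        return ptr

    def allocate_gc_arena(size):
        ptr = raw_malloc(size)
        if not ptr:
            # no safe way to unwind if the GC itself cannot grow
            out_of_memory("out of memory: couldn't allocate the next arena")
        return ptr
]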
diff --git a/rpython/memory/gctransform/asmgcroot.py b/rpython/memory/gctransform/asmgcroot.py
--- a/rpython/memory/gctransform/asmgcroot.py
+++ b/rpython/memory/gctransform/asmgcroot.py
@@ -368,6 +368,13 @@
         if rpy_fastgil != 1:
             ll_assert(rpy_fastgil != 0, "walk_stack_from doesn't have the GIL")
             initialframedata = rffi.cast(llmemory.Address, rpy_fastgil)
+            #
+            # very rare issue: initialframedata.address[0] is uninitialized
+            # in this case, but "retaddr = callee.frame_address.address[0]"
+            # reads it.  If it happens to be exactly a valid return address
+            # inside the C code, the stack walk follows a bogus frame.
+            initialframedata.address[0] = llmemory.NULL
+            #
             self.walk_frames(curframe, otherframe, initialframedata)
             stackscount += 1
         #
@@ -519,17 +526,15 @@
             from rpython.jit.backend.llsupport.jitframe import STACK_DEPTH_OFS
 
             tid = self.gc.get_possibly_forwarded_type_id(ebp_in_caller)
-            ll_assert(rffi.cast(lltype.Signed, tid) ==
-                      rffi.cast(lltype.Signed, self.frame_tid),
-                      "found a stack frame that does not belong "
-                      "anywhere I know, bug in asmgcc")
-            # fish the depth
-            extra_stack_depth = (ebp_in_caller + STACK_DEPTH_OFS).signed[0]
-            ll_assert((extra_stack_depth & (rffi.sizeof(lltype.Signed) - 1))
-                       == 0, "asmgcc: misaligned extra_stack_depth")
-            extra_stack_depth //= rffi.sizeof(lltype.Signed)
-            self._shape_decompressor.setjitframe(extra_stack_depth)
-            return
+            if (rffi.cast(lltype.Signed, tid) ==
+                    rffi.cast(lltype.Signed, self.frame_tid)):
+                # fish the depth
+                extra_stack_depth = (ebp_in_caller + STACK_DEPTH_OFS).signed[0]
+                ll_assert((extra_stack_depth & (rffi.sizeof(lltype.Signed) - 1))
+                           == 0, "asmgcc: misaligned extra_stack_depth")
+                extra_stack_depth //= rffi.sizeof(lltype.Signed)
+                self._shape_decompressor.setjitframe(extra_stack_depth)
+                return
         llop.debug_fatalerror(lltype.Void, "cannot find gc roots!")
 
     def getlocation(self, callee, ebp_in_caller, location):
diff --git a/rpython/rlib/_rsocket_rffi.py b/rpython/rlib/_rsocket_rffi.py
--- a/rpython/rlib/_rsocket_rffi.py
+++ b/rpython/rlib/_rsocket_rffi.py
@@ -199,7 +199,7 @@
 WSA_INVALID_PARAMETER WSA_NOT_ENOUGH_MEMORY WSA_OPERATION_ABORTED
 SIO_RCVALL SIO_KEEPALIVE_VALS
 
-SIOCGIFNAME
+SIOCGIFNAME SIOCGIFINDEX
 '''.split()
 
 for name in constant_names:
@@ -328,7 +328,8 @@
 
     if _HAS_AF_PACKET:
         CConfig.sockaddr_ll = platform.Struct('struct sockaddr_ll',
-                              [('sll_ifindex', rffi.INT),
+                              [('sll_family', rffi.INT),
+                               ('sll_ifindex', rffi.INT),
                                ('sll_protocol', rffi.INT),
                                ('sll_pkttype', rffi.INT),
                                ('sll_hatype', rffi.INT),
diff --git a/rpython/rlib/rsocket.py b/rpython/rlib/rsocket.py
--- a/rpython/rlib/rsocket.py
+++ b/rpython/rlib/rsocket.py
@@ -5,15 +5,8 @@
 a drop-in replacement for the 'socket' module.
 """
 
-# Known missing features:
-#
-#   - address families other than AF_INET, AF_INET6, AF_UNIX, AF_PACKET
-#   - AF_PACKET is only supported on Linux
-#   - methods makefile(),
-#   - SSL
-#
-# It's unclear if makefile() and SSL support belong here or only as
-# app-level code for PyPy.
+# XXX this does not yet support the less common AF_xxx address families
+# supported by CPython.  See http://bugs.pypy.org/issue1942
 
 from rpython.rlib import _rsocket_rffi as _c, jit, rgc
 from rpython.rlib.objectmodel import instantiate, keepalive_until_here
@@ -200,23 +193,49 @@
         family = AF_PACKET
         struct = _c.sockaddr_ll
         maxlen = minlen = sizeof(struct)
+        ifr_name_size = _c.ifreq.c_ifr_name.length
+        sll_addr_size = _c.sockaddr_ll.c_sll_addr.length
+
+        def __init__(self, ifindex, protocol, pkttype=0, hatype=0, haddr=""):
+            addr = lltype.malloc(_c.sockaddr_ll, flavor='raw', zero=True,
+                                 track_allocation=False)
+            self.setdata(addr, PacketAddress.maxlen)
+            rffi.setintfield(addr, 'c_sll_family', AF_PACKET)
+            rffi.setintfield(addr, 'c_sll_protocol', htons(protocol))
+            rffi.setintfield(addr, 'c_sll_ifindex', ifindex)
+            rffi.setintfield(addr, 'c_sll_pkttype', pkttype)
+            rffi.setintfield(addr, 'c_sll_hatype', hatype)
+            halen = rffi.str2chararray(haddr,
+                                       rffi.cast(rffi.CCHARP, addr.c_sll_addr),
+                                       PacketAddress.sll_addr_size)
+            rffi.setintfield(addr, 'c_sll_halen', halen)
+
+        @staticmethod
+        def get_ifindex_from_ifname(fd, ifname):
+            p = lltype.malloc(_c.ifreq, flavor='raw')
+            iflen = rffi.str2chararray(ifname,
+                                       rffi.cast(rffi.CCHARP, p.c_ifr_name),
+                                       PacketAddress.ifr_name_size - 1)
+            p.c_ifr_name[iflen] = '\0'
+            err = _c.ioctl(fd, _c.SIOCGIFINDEX, p)
+            ifindex = p.c_ifr_ifindex
+            lltype.free(p, flavor='raw')
+            if err != 0:
+                raise RSocketError("invalid interface name")
+            return ifindex
 
         def get_ifname(self, fd):
+            ifname = ""
             a = self.lock(_c.sockaddr_ll)
-            p = lltype.malloc(_c.ifreq, flavor='raw')
-            rffi.setintfield(p, 'c_ifr_ifindex',
-                             rffi.getintfield(a, 'c_sll_ifindex'))
-            if (_c.ioctl(fd, _c.SIOCGIFNAME, p) == 0):
-                # eh, the iface name is a constant length array
-                i = 0
-                d = []
-                while p.c_ifr_name[i] != '\x00' and i < len(p.c_ifr_name):
-                    d.append(p.c_ifr_name[i])
-                    i += 1
-                ifname = ''.join(d)
-            else:
-                ifname = ""
-            lltype.free(p, flavor='raw')
+            ifindex = rffi.getintfield(a, 'c_sll_ifindex')
+            if ifindex:
+                p = lltype.malloc(_c.ifreq, flavor='raw')
+                rffi.setintfield(p, 'c_ifr_ifindex', ifindex)
+                if (_c.ioctl(fd, _c.SIOCGIFNAME, p) == 0):
+                    ifname = rffi.charp2strn(
+                        rffi.cast(rffi.CCHARP, p.c_ifr_name),
+                        PacketAddress.ifr_name_size)
+                lltype.free(p, flavor='raw')
             self.unlock()
             return ifname
 
@@ -235,11 +254,11 @@
 
         def get_hatype(self):
             a = self.lock(_c.sockaddr_ll)
-            res = bool(rffi.getintfield(a, 'c_sll_hatype'))
+            res = rffi.getintfield(a, 'c_sll_hatype')
             self.unlock()
             return res
 
-        def get_addr(self):
+        def get_haddr(self):
             a = self.lock(_c.sockaddr_ll)
             lgt = rffi.getintfield(a, 'c_sll_halen')
             d = []
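
[Editor's note: ``get_ifindex_from_ifname`` above wraps the Linux
``SIOCGIFINDEX`` ioctl, the reverse of the ``SIOCGIFNAME`` lookup used
by ``get_ifname``.  For comparison, the same name-to-index lookup at
application level in plain CPython; Linux-only, and the ioctl number
and struct layout are the usual Linux values, assumed here (modern
CPython also exposes this directly as ``socket.if_nametoindex()``)::

    import fcntl
    import socket
    import struct

    SIOCGIFINDEX = 0x8933      # assumed Linux value, see <linux/sockios.h>
    IFNAMSIZ = 16              # size of ifr_name in struct ifreq

    def if_nametoindex(ifname):
        # pack a 'struct ifreq': the name padded to IFNAMSIZ, followed
        # by an int that the kernel fills in with ifr_ifindex
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            ifr = struct.pack('%dsi' % IFNAMSIZ,
                              ifname.encode()[:IFNAMSIZ - 1], 0)
            res = fcntl.ioctl(s.fileno(), SIOCGIFINDEX, ifr)
            return struct.unpack('%dsi' % IFNAMSIZ, res)[1]
        finally:
            s.close()

    # e.g. if_nametoindex('lo') usually returns 1 on Linux
]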
diff --git a/rpython/rlib/rzipfile.py b/rpython/rlib/rzipfile.py
--- a/rpython/rlib/rzipfile.py
+++ b/rpython/rlib/rzipfile.py
@@ -8,7 +8,7 @@
 
 try:
     from rpython.rlib import rzlib
-except (ImportError, CompilationError):
+except CompilationError:
     rzlib = None
 
 crc_32_tab = [
diff --git a/rpython/rlib/rzlib.py b/rpython/rlib/rzlib.py
--- a/rpython/rlib/rzlib.py
+++ b/rpython/rlib/rzlib.py
@@ -22,13 +22,10 @@
         includes=['zlib.h'],
         testonly_libraries = testonly_libraries
     )
-try:
-    eci = rffi_platform.configure_external_library(
-        libname, eci,
-        [dict(prefix='zlib-'),
-         ])
-except CompilationError:
-    raise ImportError("Could not find a zlib library")
+eci = rffi_platform.configure_external_library(
+    libname, eci,
+    [dict(prefix='zlib-'),
+     ])
 
 
 constantnames = '''
diff --git a/rpython/rlib/test/test_rzipfile.py b/rpython/rlib/test/test_rzipfile.py
--- a/rpython/rlib/test/test_rzipfile.py
+++ b/rpython/rlib/test/test_rzipfile.py
@@ -9,7 +9,7 @@
 
 try:
     from rpython.rlib import rzlib
-except ImportError, e:
+except CompilationError as e:
     py.test.skip("zlib not installed: %s " % (e, ))
 
 class BaseTestRZipFile(BaseRtypingTest):
diff --git a/rpython/rtyper/lltypesystem/rffi.py b/rpython/rtyper/lltypesystem/rffi.py
--- a/rpython/rtyper/lltypesystem/rffi.py
+++ b/rpython/rtyper/lltypesystem/rffi.py
@@ -835,6 +835,14 @@
         else:
             lltype.free(cp, flavor='raw', track_allocation=False)
 
+    # str -> already-existing char[maxsize]
+    def str2chararray(s, array, maxsize):
+        length = min(len(s), maxsize)
+        ll_s = llstrtype(s)
+        copy_string_to_raw(ll_s, array, 0, length)
+        return length
+    str2chararray._annenforceargs_ = [strtype, None, int]
+
     # char* -> str
     # doesn't free char*
     def charp2str(cp):
@@ -985,19 +993,19 @@
     return (str2charp, free_charp, charp2str,
             get_nonmovingbuffer, free_nonmovingbuffer,
             alloc_buffer, str_from_buffer, keep_buffer_alive_until_here,
-            charp2strn, charpsize2str,
+            charp2strn, charpsize2str, str2chararray,
             )
 
 (str2charp, free_charp, charp2str,
  get_nonmovingbuffer, free_nonmovingbuffer,
  alloc_buffer, str_from_buffer, keep_buffer_alive_until_here,
- charp2strn, charpsize2str,
+ charp2strn, charpsize2str, str2chararray,
  ) = make_string_mappings(str)
 
 (unicode2wcharp, free_wcharp, wcharp2unicode,
  get_nonmoving_unicodebuffer, free_nonmoving_unicodebuffer,
  alloc_unicodebuffer, unicode_from_buffer, keep_unicodebuffer_alive_until_here,
- wcharp2unicoden, wcharpsize2unicode,
+ wcharp2unicoden, wcharpsize2unicode, unicode2wchararray,
  ) = make_string_mappings(unicode)
 
 # char**
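
[Editor's note: for reference, the semantics of the new
``str2chararray`` in pure Python; a model only, since the real helper
copies into raw memory, and it is exercised by ``test_str2chararray``
further down in this diff::

    def str2chararray_model(s, array, maxsize):
        # copy at most maxsize chars of s into the existing array,
        # leaving the tail of the array untouched; return the copied length
        length = min(len(s), maxsize)
        for i in range(length):
            array[i] = s[i]
        return length

    buf = list("XxxZy")
    n = str2chararray_model("abcdef", buf, 4)
    assert ''.join(buf) == "abcdy" and n == 4
]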
diff --git a/rpython/rtyper/lltypesystem/rlist.py b/rpython/rtyper/lltypesystem/rlist.py
--- a/rpython/rtyper/lltypesystem/rlist.py
+++ b/rpython/rtyper/lltypesystem/rlist.py
@@ -1,6 +1,7 @@
 from rpython.rlib import rgc, jit, types
 from rpython.rlib.debug import ll_assert
 from rpython.rlib.signature import signature
+from rpython.rtyper.error import TyperError
 from rpython.rtyper.lltypesystem import rstr
 from rpython.rtyper.lltypesystem.lltype import (GcForwardReference, Ptr, GcArray,
      GcStruct, Void, Signed, malloc, typeOf, nullptr, typeMethod)
@@ -57,7 +58,7 @@
         elif variant == ("reversed",):
             return ReversedListIteratorRepr(self)
         else:
-            raise NotImplementedError(variant)
+            raise TyperError("unsupported %r iterator over a list" % (variant,))
 
     def get_itemarray_lowleveltype(self):
         ITEM = self.item_repr.lowleveltype
diff --git a/rpython/rtyper/lltypesystem/rrange.py b/rpython/rtyper/lltypesystem/rrange.py
--- a/rpython/rtyper/lltypesystem/rrange.py
+++ b/rpython/rtyper/lltypesystem/rrange.py
@@ -1,5 +1,6 @@
 from rpython.rtyper.lltypesystem.lltype import Ptr, GcStruct, Signed, malloc, Void
 from rpython.rtyper.rrange import AbstractRangeRepr, AbstractRangeIteratorRepr
+from rpython.rtyper.error import TyperError
 
 # ____________________________________________________________
 #
@@ -59,7 +60,10 @@
         self.ll_newrange = ll_newrange
         self.ll_newrangest = ll_newrangest
 
-    def make_iterator_repr(self):
+    def make_iterator_repr(self, variant=None):
+        if variant is not None:
+            raise TyperError("unsupported %r iterator over a range list" %
+                             (variant,))
         return RangeIteratorRepr(self)
 
 
diff --git a/rpython/rtyper/lltypesystem/rstr.py b/rpython/rtyper/lltypesystem/rstr.py
--- a/rpython/rtyper/lltypesystem/rstr.py
+++ b/rpython/rtyper/lltypesystem/rstr.py
@@ -188,7 +188,10 @@
             self.CACHE[value] = p
             return p
 
-    def make_iterator_repr(self):
+    def make_iterator_repr(self, variant=None):
+        if variant is not None:
+            raise TyperError("unsupported %r iterator over a str/unicode" %
+                             (variant,))
         return self.repr.iterator_repr
 
     def can_ll_be_null(self, s_value):
diff --git a/rpython/rtyper/lltypesystem/test/test_rffi.py b/rpython/rtyper/lltypesystem/test/test_rffi.py
--- a/rpython/rtyper/lltypesystem/test/test_rffi.py
+++ b/rpython/rtyper/lltypesystem/test/test_rffi.py
@@ -671,6 +671,23 @@
 
         assert interpret(f, [], backendopt=True) == 43
 
+    def test_str2chararray(self):
+        eci = ExternalCompilationInfo(includes=['string.h'])
+        strlen = llexternal('strlen', [CCHARP], SIZE_T,
+                            compilation_info=eci)
+        def f():
+            raw = str2charp("XxxZy")
+            n = str2chararray("abcdef", raw, 4)
+            assert raw[0] == 'a'
+            assert raw[1] == 'b'
+            assert raw[2] == 'c'
+            assert raw[3] == 'd'
+            assert raw[4] == 'y'
+            lltype.free(raw, flavor='raw')
+            return n
+
+        assert interpret(f, []) == 4
+
     def test_around_extcall(self):
         if sys.platform == "win32":
             py.test.skip('No pipes on windows')
diff --git a/rpython/rtyper/rlist.py b/rpython/rtyper/rlist.py
--- a/rpython/rtyper/rlist.py
+++ b/rpython/rtyper/rlist.py
@@ -269,13 +269,9 @@
         v_res = hop.gendirectcall(llfn, c_func_marker, c_basegetitem, v_lst, v_index)
         return r_lst.recast(hop.llops, v_res)
 
-    rtype_getitem_key = rtype_getitem
-
     def rtype_getitem_idx((r_lst, r_int), hop):
         return pair(r_lst, r_int).rtype_getitem(hop, checkidx=True)
 
-    rtype_getitem_idx_key = rtype_getitem_idx
-
     def rtype_setitem((r_lst, r_int), hop):
         if hop.has_implicit_exception(IndexError):
             spec = dum_checkidx
diff --git a/rpython/rtyper/rmodel.py b/rpython/rtyper/rmodel.py
--- a/rpython/rtyper/rmodel.py
+++ b/rpython/rtyper/rmodel.py
@@ -287,11 +287,9 @@
 
     # default implementation for checked getitems
 
-    def rtype_getitem_idx_key((r_c1, r_o1), hop):
+    def rtype_getitem_idx((r_c1, r_o1), hop):
         return pair(r_c1, r_o1).rtype_getitem(hop)
 
-    rtype_getitem_idx = rtype_getitem_idx_key
-    rtype_getitem_key = rtype_getitem_idx_key
 
 # ____________________________________________________________
 
diff --git a/rpython/rtyper/rstr.py b/rpython/rtyper/rstr.py
--- a/rpython/rtyper/rstr.py
+++ b/rpython/rtyper/rstr.py
@@ -580,13 +580,9 @@
             hop.exception_cannot_occur()
         return hop.gendirectcall(llfn, v_str, v_index)
 
-    rtype_getitem_key = rtype_getitem
-
     def rtype_getitem_idx((r_str, r_int), hop):
         return pair(r_str, r_int).rtype_getitem(hop, checkidx=True)
 
-    rtype_getitem_idx_key = rtype_getitem_idx
-
     def rtype_mul((r_str, r_int), hop):
         str_repr = r_str.repr
         v_str, v_int = hop.inputargs(str_repr, Signed)
diff --git a/rpython/rtyper/rtuple.py b/rpython/rtyper/rtuple.py
--- a/rpython/rtyper/rtuple.py
+++ b/rpython/rtyper/rtuple.py
@@ -210,7 +210,10 @@
 
     ll_str = property(gen_str_function)
 
-    def make_iterator_repr(self):
+    def make_iterator_repr(self, variant=None):
+        if variant is not None:
+            raise TyperError("unsupported %r iterator over a tuple" %
+                             (variant,))
         if len(self.items_r) == 1:
             # subclasses are supposed to set the IteratorRepr attribute
             return self.IteratorRepr(self)
diff --git a/rpython/rtyper/test/test_rstr.py b/rpython/rtyper/test/test_rstr.py
--- a/rpython/rtyper/test/test_rstr.py
+++ b/rpython/rtyper/test/test_rstr.py
@@ -116,6 +116,16 @@
         res = self.interpret(fn, [1])
         assert res == 1 + ord('a') + 10000
 
+    def test_str_iterator_reversed_unsupported(self):
+        const = self.const
+        def fn():
+            total = 0
+            t = const('foo')
+            for x in reversed(t):
+                total += ord(x)
+            return total
+        py.test.raises(TyperError, self.interpret, fn, [])
+
     def test_char_constant(self):
         const = self.const
         def fn(s):
diff --git a/rpython/rtyper/test/test_rtuple.py b/rpython/rtyper/test/test_rtuple.py
--- a/rpython/rtyper/test/test_rtuple.py
+++ b/rpython/rtyper/test/test_rtuple.py
@@ -1,8 +1,10 @@
+import py
 from rpython.rtyper.rtuple import TUPLE_TYPE, TupleRepr
 from rpython.rtyper.lltypesystem.lltype import Signed, Bool
 from rpython.rtyper.rbool import bool_repr
 from rpython.rtyper.rint import signed_repr
 from rpython.rtyper.test.tool import BaseRtypingTest
+from rpython.rtyper.error import TyperError
 from rpython.rlib.objectmodel import compute_hash
 from rpython.translator.translator import TranslationContext
 
@@ -228,6 +230,15 @@
         res = self.interpret(f, [93813])
         assert res == 93813
 
+    def test_tuple_iterator_reversed_unsupported(self):
+        def f(i):
+            total = 0
+            t = (i,)
+            for x in reversed(t):
+                total += x
+            return total
+        py.test.raises(TyperError, self.interpret, f, [93813])
+
     def test_inst_tuple_iter(self):
         class A:
             pass
diff --git a/rpython/translator/c/test/test_newgc.py b/rpython/translator/c/test/test_newgc.py
--- a/rpython/translator/c/test/test_newgc.py
+++ b/rpython/translator/c/test/test_newgc.py
@@ -2,6 +2,7 @@
 import inspect
 import os
 import sys
+import subprocess
 
 import py
 
@@ -50,8 +51,8 @@
             t.viewcg()
         exename = t.compile()
 
-        def run(s, i):
-            data = py.process.cmdexec("%s %s %d" % (exename, s, i))
+        def run(s, i, runner=subprocess.check_output):
+            data = runner([str(exename), str(s), str(i)])
             data = data.strip()
             if data == 'MEMORY-ERROR':
                 raise MemoryError
@@ -115,11 +116,11 @@
             cls.c_allfuncs.close_isolate()
             cls.c_allfuncs = None
 
-    def run(self, name, *args):
+    def run(self, name, *args, **kwds):
         if not args:
             args = (-1, )
         print 'Running %r' % name
-        res = self.c_allfuncs(name, *args)
+        res = self.c_allfuncs(name, *args, **kwds)
         num = self.name_to_func[name]
         if self.funcsstr[num]:
             return res
@@ -1524,6 +1525,38 @@
         res = self.run("nongc_opaque_attached_to_gc")
         assert res == 0
 

