[pypy-commit] extradoc edits1: break up sentences, other small edits

Thu Sep 20 17:21:16 EDT 2018

Author: Matti Picus <matti.picus at gmail.com>
Branch: edits1
Changeset: r5906:f05b4f08b01a
Date: 2018-09-21 00:20 +0300
http://bitbucket.org/pypy/extradoc/changeset/f05b4f08b01a/

Log:	break up sentences, other small edits

diff --git a/blog/draft/2018-09-cpyext/cpyext.rst b/blog/draft/2018-09-cpyext/cpyext.rst
--- a/blog/draft/2018-09-cpyext/cpyext.rst
+++ b/blog/draft/2018-09-cpyext/cpyext.rst
@@ -1,28 +1,28 @@
-Inside cpyext: why emulating CPython C API is so hard
-======================================================
+Inside cpyext: Why emulating CPython C API is so Hard
+=====================================================
 
-cpyext is PyPy's subsistem which is responsible to provide a compatibility
-layer to compile and run CPython C extensions inside PyPy.  Often people asks
-why it this particular extension doesn't work or it is very slow on PyPy, but
-usually it is hard to answer without going into technical details: the goal of
+``cpyext`` is PyPy's subsystem which provides a compatibility
+layer to compile and run CPython C extensions inside PyPy.  Often people ask
+why a particular C extension doesn't work or is very slow on PyPy.
+Usually it is hard to answer without going into technical details. The goal of
 this blog post is to explain some of these technical details, so that we can
 simply link here instead of explaing again and again :).
 
-From a 10.000 foot view, cpyext is PyPy's version of ``"Python.h"``: every time
-you compile and extension which uses that header file, you are using cpyext:
-this includes extension explicitly written in C (such as ``numpy``) and
+From a 10.000 foot view, ``cpyext`` is PyPy's version of ``"Python.h"``. Every time
+you compile an extension which uses that header file, you are using ``cpyext``.
+This includes extension explicitly written in C (such as ``numpy``) and
 extensions which are generated from other compilers/preprocessors
 (e.g. ``Cython``).
 
 At the time of writing, the current status is that most C extensions "just
-work": generally speaking, you can simply ``pip install`` all of them,
+work". Generally speaking, you can simply ``pip install`` them,
 provided they use the public, `official C API`_ instead of poking at private
-implementation details.  However, the performance of cpyext are generally
-poor, meaning that a Python program which makes heavy use of cpyext extensions
+implementation details.  However, the performance of cpyext is generally
+poor. A Python program which makes heavy use of ``cpyext`` extensions
 is likely to be slower on PyPy than on CPython.
 
 Note: in this blog post we are talking about Python 2.7 because it is still
-the default version of PyPy: however most of the implementation of cpyext is
+the default version of PyPy: however most of the implementation of ``cpyext`` is
 shared with PyPy3, so everything applies to that as well.
 
 .. _`official C API`: https://docs.python.org/2/c-api/index.html
@@ -31,29 +31,29 @@
 C API Overview
 ---------------
 
-In CPython, at the C level, Python objects are represented as ``PyObject*``,
+In CPython, which is written in C, Python objects are represented as ``PyObject*``,
 i.e. (mostly) opaque pointers to some common "base struct".
 
 CPython uses a very simple memory management scheme: when you create an
-object, you allocate a block of memory of the appropriate size on the heap;
-depending on the details you might end up calling different allocators, but
+object, you allocate a block of memory of the appropriate size on the heap.
+Depending on the details, you might end up calling different allocators, but
 for the sake of simplicity, you can think that this ends up being a call to
 ``malloc()``. The resulting block of memory is initialized and casted to to
 ``PyObject*``: this address never changes during the object lifetime, and the
 C code can freely pass it around, store it inside containers, retrieve it
 later, etc.
 
-Memory is managed using reference counting: when you create a new reference to
+Memory is managed using reference counting. When you create a new reference to
 an object, or you discard a reference you own, you have to increment_ or
-decrement_ reference counter accordingly. When the reference counter goes to
-0, it means that the object is no longer used by anyone and can safely be
+decrement_ the reference counter accordingly. When the reference counter goes to
+0, it means that the object is no longer used and can safely be
 destroyed. Again, we can simplify and say that this results in a call to
 ``free()``, which finally releases the memory which was allocated by ``malloc()``.
 
 .. _increment: https://docs.python.org/2/c-api/refcounting.html#c.Py_INCREF
 .. _decrement: https://docs.python.org/2/c-api/refcounting.html#c.Py_DECREF
 
-Generally speaking, the only way to operate on ``PyObject*`` is to call the
+Generally speaking, the only way to operate on a ``PyObject*`` is to call the
 appropriate API functions. For example, to convert a given ``PyObject*`` to a C
 integer, you can use _`PyInt_AsLong()`; to add two objects together, you can
 call _`PyNumber_Add()`.
@@ -61,7 +61,7 @@
 .. _`PyInt_AsLong()`: https://docs.python.org/2/c-api/int.html?highlight=pyint_check#c.PyInt_AsLong
 .. _`PyNumber_Add()`: https://docs.python.org/2/c-api/number.html#c.PyNumber_Add
 
-Internally, PyPy uses a similar approach: all Python objects are subclasses of
+Internally, PyPy uses a similar approach. All Python objects are subclasses of
 the RPython ``W_Root`` class, and they are operated by calling methods on the
 ``space`` singleton, which represents the interpreter.
 
@@ -86,27 +86,27 @@
 
 
 The PyPy GC
--------------
+-----------
 
-To understand some of cpyext challenges, you need to have at least a rough
+To understand some of ``cpyext`` challenges, you need to have at least a rough
 idea of how the PyPy GC works.
 
 Contrarily to the popular belief, the "Garbage Collector" is not only about
-collecting garbage: instead, it is generally responsible of all memory
+collecting garbage: instead, it is generally responsible for all memory
 management, including allocation and deallocation.
 
 Whereas CPython uses a combination of malloc/free/refcounting to manage
 memory, the PyPy GC uses a completely different approach. It is designed
 assuming that a dynamic language like Python behaves the following way:
 
-  - you create, either directly or indirectly, lots of objects;
+  - You create, either directly or indirectly, lots of objects.
 
-  - most of these objects are temporary and very short-lived: think e.g. of
+  - Most of these objects are temporary and very short-lived. Think e.g. of
     doing ``a + b + c``: you need to allocate an object to hold the temporary
-    result of ``a + b``, but it dies very quickly because you no longer need it
-    when you do the final ``+ c`` part;
+    result of ``a + b``, then it dies very quickly because you no longer need it
+    when you do the final ``+ c`` part.
 
-  - only small fraction of the objects survives and stay around for a while.
+  - Only small fraction of the objects survive and stay around for a while.
 
 So, the strategy is: make allocation as fast as possible; make deallocation of
 short-lived objects as fast as possible; find a way to handle the remaining
@@ -114,23 +114,24 @@
 
 This is done using a **Generational GC**: the basic idea is the following:
 
-  1. we have a nursery, where we allocate "young objects" very fast;
+  1. We have a nursery, where we allocate "young objects" very quickly.
 
-  2. when the nursery is full, we start what we call a "minor collection": we
-     do quick scan to determine the small set of objects which survived so
-     far;
+  2. When the nursery is full, we start what we call a "minor collection". 
+     
+     - We do a quick scan to determine the small set of objects which survived so
+       far
 
-  3. we **move** these objects out of the nursery, and we place them in the
-     area of memory which contains the "old objects"; since the address of the
-     objects just changed, we fix all the references to them accordingly;
+     - We **move** these objects out of the nursery, and we place them in the
+       area of memory which contains the "old objects". Since the address of the
+       objects changes, we fix all the references to them accordingly.
 
-  4. now the nursery contains only objects which died young: we can simply
-     discard all of them very quickly, reset the nursery and use the same area
+  4. now the nursery contains only objects which "died young". We can
+     discard all of them very quickly, reset the nursery, and use the same area
      of memory to allocate new objects from now.
 
 In practice, this scheme works very well and it is one of the reasons why PyPy
 is much faster than CPython.  However, careful readers have surely noticed
-that this is a problem for ``cpyext``: on one hand, we have PyPy objects which
+that this is a problem for ``cpyext``. On one hand, we have PyPy objects which
 can potentially move and change their underlying memory address; on the other
 hand, we need a way to represent them as fixed-address ``PyObject*`` when we
 pass them to C extensions.  We surely need a way to handle that.
@@ -141,39 +142,40 @@
 
 Another challenge is that sometimes, ``PyObject*`` structs are not completely
 opaque: there are parts of the public API which expose to the user specific
-fields of some concrete C struct, for example the definition of PyTypeObject_:
-since the low-level layout of PyPy ``W_Root`` objects is completely different
+fields of some concrete C struct. For example the definition of PyTypeObject_
+which exposes many of the ``tp_*`` slots to Cython (OK - ???)
+Since the low-level layout of PyPy ``W_Root`` objects is completely different
 than the one used by CPython, we cannot simply pass RPython objects to C; we
 need a way to handle the difference.
 
 .. _PyTypeObject: https://docs.python.org/2/c-api/typeobj.html
 
-So, we have two issues so far: objects which can move, and incompatible
+So, we have two issues so far: objects can move, and incompatible
 low-level layouts. ``cpyext`` solves both by decoupling the RPython and the C
-representations: we have two "views" of the same entity, depending on whether
+representations. We have two "views" of the same entity, depending on whether
 we are in the PyPy world (the movable ``W_Root`` subclass) or in the C world
 (the non-movable ``PyObject*``).
 
-``PyObject*`` are created lazily, only when they are actually needed: the
+``PyObject*`` are created lazily, only when they are actually needed. The
 vast majority of PyPy objects are never passed to any C extension, so we don't
-pay any penalty in that case; however, the first time we pass a ``W_Root`` to
+pay any penalty in that case. However, the first time we pass a ``W_Root`` to
 C, we allocate and initialize its ``PyObject*`` counterpart.
 
 The same idea applies also to objects which are created in C, e.g. by calling
-_`PyObject_New`: at first, only the ``PyObject*`` exists and it is
-exclusively managed by reference counting: as soon as we pass it to the PyPy
+_`PyObject_New`. At first, only the ``PyObject*`` exists and it is
+exclusively managed by reference counting. As soon as we pass it to the PyPy
 world (e.g. as a return value of a function call), we create its ``W_Root``
 counterpart, which is managed by the GC as usual.
 
 .. _`PyObject_New`: https://docs.python.org/2/c-api/allocation.html#c.PyObject_New
 
 Here we start to see why calling cpyext modules is more costly in PyPy than in
-CPython: we need to pay some penalty for all the conversions between
+CPython. We need to pay some penalty for all the conversions between
 ``W_Root`` and ``PyObject*``.
 
 Moreover, the first time we pass a ``W_Root`` to C we also need to allocate
 the memory for the ``PyObject*`` using a slowish "CPython-style" memory
-allocator: in practice, for all the objects which are passed to C we pay more
+allocator. In practice, for all the objects which are passed to C we pay more
 or less the same costs as CPython, thus effectively "undoing" the speedup
 guaranteed by PyPy's Generational GC under normal circumstances.
 
@@ -181,7 +183,7 @@
 Maintaining the link between ``W_Root`` and ``PyObject*``
 -----------------------------------------------------------
 
-So, we need a way to convert between ``W_Root`` and ``PyObject*`` and
+We now need a way to convert between ``W_Root`` and ``PyObject*`` and
 vice-versa; also, we need to to ensure that the lifetime of the two entities
 are in sync. In particular:
 
@@ -192,7 +194,7 @@
      make sure that the GC does not collect the ``W_Root``.
 
 The ``PyObject*`` ==> ``W_Root`` link is maintained by the special field
-_`ob_pypy_link` which is added to all ``PyObject*``: on a 64 bit machine this
+_`ob_pypy_link` which is added to all ``PyObject*``. On a 64 bit machine this
 means that all ``PyObject*`` have 8 bytes of overhead, but then the
 conversion is very quick, just reading the field.
 
@@ -205,7 +207,7 @@
 However, for a _`few selected` ``W_Root`` subclasses we **do** maintain a
 direct link using the special ``_cpy_ref`` field to improve performance. In
 particular, we use it for ``W_TypeObject`` (which is big anyway, so a 8 bytes
-overhead is negligible) and ``W_NoneObject``: ``None`` is passed around very
+overhead is negligible) and ``W_NoneObject``. ``None`` is passed around very
 often, so we want to ensure that the conversion to ``PyObject*`` is very
 fast. Moreover it's a singleton, so the 8 bytes overhead is negligible as
 well.
@@ -213,7 +215,7 @@
 This means that in theory, passing an arbitrary Python object to C is
 potentially costly, because it involves doing a dictionary lookup.  We assume
 that this cost will eventually show up in the profiler: however, at the time
-of writing there are other parts of cpyext which are even more costly (as we
+of writing there are other parts of ``cpyext`` which are even more costly (as we
 will show later), so the cost of the dict lookup is never evident in the
 profiler.
 
@@ -224,25 +226,25 @@
 
 
 Crossing the border between RPython and C
-------------------------------------------
+-----------------------------------------
 
 There are two other things we need to care about whenever we cross the border
 between RPython and C, and vice-versa: exception handling and the GIL.
 
 In the C API, exceptions are raised by calling `PyErr_SetString()`_ (or one of
 `many other functions`_ which have a similar effect), which basically works by
-creating an exception value and storing it in some global variable; then, the
-function signals that an exception has occurred by returning an error value,
+creating an exception value and storing it in some global variable. The
+function then signals that an exception has occurred by returning an error value,
 usually ``NULL``.
 
-On the other hand, in the PyPy interpreter they are propagated by raising the
+On the other hand, in the PyPy interpreter, exceptions are propagated by raising the
 RPython-level OperationError_ exception, which wraps the actual app-level
-exception values: to harmonize the two worlds, whenever we return from C to
-RPython, we need to check whether a C API exception was raised and turn it
-into an ``OperationError`` if needed.
+exception values: to harmonize the two worlds. Whenever we return from C to
+RPython, we need to check whether a C API exception was raised and if so turn it
+into an ``OperationError``.
 
-About the GIL, we won't dig into details of `how it is handled in cpyext`_:
-for the purpose of this post, it is enough to know that whenever we enter the
+We won't dig into details of `how the GIL is handled in cpyext`_.
+For the purpose of this post, it is enough to know that whenever we enter
 C land, we store the current thread id into a global variable which is
 accessible also from C; conversely, whenever we go back from RPython to C, we
 restore this value to 0.
@@ -251,80 +253,80 @@
 border between C and RPython, e.g. by calling a Python callback from C code.
 
 All this complexity is automatically handled by the RPython function
-`generic_cpy_call`_: if you look at the code you see that it takes care of 4
+`generic_cpy_call`_. If you look at the code you see that it takes care of 4
 things:
 
-  1. handling the GIL as explained above
+  1. Handling the GIL as explained above.
 
-  2. handling exceptions, if they are raised
+  2. Handling exceptions, if they are raised.
 
-  3. converting arguments from ``W_Root`` to ``PyObject*``
+  3. Converting arguments from ``W_Root`` to ``PyObject*``.
 
-  4. converting the return value from ``PyObject*`` to ``W_Root``
+  4. Converting the return value from ``PyObject*`` to ``W_Root``.
 
 
-So, we can see that calling C from RPython introduce some overhead: how much
-is it?
+So, we can see that calling C from RPython introduce some overhead.
+Can we measure it?
 
 Assuming that the conversion between ``W_Root`` and ``PyObject*`` has a
 reasonable cost (as explained by the previous section), the overhead
 introduced by a single border-cross is still accettable, especially if the
 callee is doing some non-negligible amount of work.
 
-However this is not always the case; there are basically three problems that
-make (or used to make) cpyext super slow:
+However this is not always the case. There are basically three problems that
+make (or used to make) ``cpyext`` super slow:
 
-  1. paying the border-crossing cost for trivial operations which are called
-     very often, such as ``Py_INCREF``
+  1. Paying the border-crossing cost for trivial operations which are called
+     very often, such as ``Py_INCREF``.
 
-  2. crossing the border back and forth many times, even if it's not strictly
-     needed
+  2. Crossing the border back and forth many times, even if it's not strictly
+     needed.
 
-  3. paying an excessive cost for argument and return value conversions
+  3. Paying an excessive cost for argument and return value conversions.
 
 
-The next sections are going to explain in more detail each of these problems.
+The next sections explain in more detail each of these problems.
 
 .. _`PyErr_SetString()`: https://docs.python.org/2/c-api/exceptions.html#c.PyErr_SetString
 .. _`many other functions`: https://docs.python.org/2/c-api/exceptions.html#exception-handling
 .. _OperationError: https://bitbucket.org/pypy/pypy/src/b9bbd6c0933349cbdbfe2b884a68a16ad16c3a8a/pypy/interpreter/error.py#lines-20
-.. _`how it is handled in cpyext`: https://bitbucket.org/pypy/pypy/src/b9bbd6c0933349cbdbfe2b884a68a16ad16c3a8a/pypy/module/cpyext/api.py#lines-205
+.. _`how the GIL is handled in cpyext`: https://bitbucket.org/pypy/pypy/src/b9bbd6c0933349cbdbfe2b884a68a16ad16c3a8a/pypy/module/cpyext/api.py#lines-205
 .. _`generic_cpy_call`: https://bitbucket.org/pypy/pypy/src/b9bbd6c0933349cbdbfe2b884a68a16ad16c3a8a/pypy/module/cpyext/api.py#lines-1757
 
 
 Avoiding unnecessary roundtrips
 --------------------------------
 
-Prior to the `2017 Cape Town Sprint`_, cpyext was horribly slow, and we were
+Prior to the `2017 Cape Town Sprint`_, ``cpyext`` was horribly slow, and we were
 well aware of it: the main reason was that we never really paid too much
-attention to performances: as explained by this blog post, emulating all the
+attention to performance. As explained in the blog post, emulating all the
 CPython quirks is basically a nightmare, so better to concentrate on
 correctness first.
 
-However, we didn't really know **why** it was so slow: we had theories and
+However, we didn't really know **why** it was so slow. We had theories and
 assumptions, usually pointing at the cost of conversions between ``W_Root``
 and ``PyObject*``, but we never actually measured it.
 
 So, we decided to write a set of `cpyext microbenchmarks`_ to measure the
-performance of various operation.  The result was somewhat surprising: the
+performance of various operations.  The result was somewhat surprising: the
 theory suggests that when you do a cpyext C call, you should pay the
 border-crossing costs only once, but what the profiler told us was that we
-were paying the cost of ``generic_cpy_call`` several times what we expected.
+were paying the cost of ``generic_cpy_call`` several times more than what we expected.
 
 After a bit of investigation, we discovered this was ultimately caused by our
-"correctness-first" approach.  For simplicity of development and testing, when
-we started cpyext we wrote everything in RPython: thus, every single API call
+"correctness-first" approach. For simplicity of development and testing, when
+we started ``cpyext`` we wrote everything in RPython: thus, every single API call
 made from C (like the omnipresent `PyArg_ParseTuple`_, `PyInt_AsLong`_, etc.)
-had to cross back the C-to-RPython border: this was especially daunting for
+had to cross back the C-to-RPython border. This was especially daunting for
 very simple and frequent operations like ``Py_INCREF`` and ``Py_DECREF``,
 which CPython implements as a single assembly instruction!
 
-Another source of slowness was the implementation of ``PyTypeObject`` slots:
-at the C level, these are function pointers which the interpreter calls to do
+Another source of slow down was the implementation of ``PyTypeObject`` slots.
+At the C level, these are function pointers which the interpreter calls to do
 certain operations, e.g. `tp_new`_ to allocate a new instance of that type.
 
 As usual, we have some magic to implement slots in RPython; in particular,
-`_make_wrapper`_ does the opposite of ``generic_cpy_call``: it takes an
+`_make_wrapper`_ does the opposite of ``generic_cpy_call``: it takes a
 RPython function and wraps it into a C function which can be safely called
 from C, handling the GIL, exceptions and argument conversions automatically.
 
@@ -359,15 +361,15 @@
      ``return result``, during the **C-to-RPython** step we convert it from
      ``PyObject*`` to ``W_IntObject(1234)``.
 
-Phew! After we realized this, it was not so surprising that cpyext was very
-slow :). And this was a simplified example, since we are not passing and
-``PyObject*`` to the API call: if we did, we would need to convert it back and
+Phew! After we realized this, it was not so surprising that ``cpyext`` was very
+slow :). And this was a simplified example, since we are not passing a
+``PyObject*`` to the API call. When we do, we need to convert it back and
 forth at every step.  Actually, I am not even sure that what I described was
 the exact sequence of steps which used to happen, but you get the general
 idea.
 
-The solution is simple: rewrite as much as we can in C instead of RPython, so
-to avoid unnecessary roundtrips: this was the topic of most of the Cape Town
+The solution is simple: rewrite as much as we can in C instead of RPython,
+to avoid unnecessary roundtrips. This was the topic of most of the Cape Town
 sprint and resulted in the ``cpyext-avoid-roundtrip`` branch, which was
 eventually merged_.
 
@@ -377,13 +379,14 @@
 involves list strategies, so we cannot replicate it in C.  However, we
 discovered that a large subset of the C API can benefit from this.
 
-Moreover, the C API is **huge**: the biggest achievement of the branch was to
-discover and invent this new way of writing cpyext code, but we still need to
-convert many of the functions.  Also, sometimes the rewrite is not automatic
-or straighforward: cpyext is a delicate piece of software, so it happens often
-that you end up debugging a segfault in gdb.
+Moreover, the C API is **huge**. While we invented this new way of writing
+``cpyext`` code, we still need to
+convert many of the functions to the new paradigm.  Sometimes the rewrite is
+not automatic
+or straighforward. ``cpyext`` is a delicate piece of software, so it happens often
+that we make a mistake and end up staring at a segfault in gdb.
 
-However, the most important remark is that the performance improvement we got
+However, the most important takeaway is that the performance improvements we got
 from this optimization are impressive, as we will detail later.
 
 .. _`2017 Cape Town Sprint`: https://morepypy.blogspot.com/2017/10/cape-of-good-hope-for-pypy-hello-from.html
@@ -396,13 +399,13 @@
 
 
 Conversion costs
------------------
+----------------
 
 The other potential big source of slowdown is the conversion of arguments
 between ``W_Root`` and ``PyObject*``.
 
 As explained earlier, the first time you pass a ``W_Root`` to C, you need to
-allocate it's ``PyObject*`` counterpart. Suppose to have a ``foo`` function
+allocate its ``PyObject*`` counterpart. Suppose you have a ``foo`` function
 defined in C, which takes a single int argument:
 
 .. sourcecode:: python
@@ -416,12 +419,12 @@
 
 CPython has the very same problem, which is solved by using a `free list`_ to
 `allocate ints`_. So, what we did was to simply `steal the code`_ from CPython
-and do the exact same thing: this was also done in the
+and do the exact same thing. This was also done in the
 ``cpyext-avoid-roundtrip`` branch, and the benchmarks show that it worked
 perfectly.
 
 Every type which is converted often to ``PyObject*`` must have a very fast
-allocator: at the moment of writing, PyPy uses free lists only for ints and
+allocator. At the moment of writing, PyPy uses free lists only for ints and
 tuples_: one of the next steps on our TODO list is certainly to use this
 technique with more types, like ``float``.
 
@@ -439,24 +442,24 @@
 At every iteration, we get an item out of the array: the return type is a an
 instance of ``numpy.float64`` (a numpy scalar), i.e. a ``PyObject'*``: this is
 something which is implemented by numpy entirely in C, so completely
-transparent to cpyext: we don't have any control on how it is allocated,
+transparent to ``cpyext``. We don't have any control on how it is allocated,
 managed, etc., and we can assume that allocation costs are the same than on
 CPython.
 
 As soon as we return these ``PyObject*`` to Python, we need to allocate
-theirs ``W_Root`` equivalent: if you do it in a small loop like in the example
+their ``W_Root`` equivalent. If you do it in a small loop like in the example
 above, you end up allocating all these ``W_Root`` inside the nursery, which is
 a good thing since allocation is super fast (see the section above about the
 PyPy GC).
 
-However, we also need to keep track of the ``W_Root`` to ``PyObject*`` link:
-currently, we do this by putting all of them in a dictionary, but it is very
+However, we also need to keep track of the ``W_Root`` to ``PyObject*`` link.
+Currently, we do this by putting all of them in a dictionary, but it is very
 inefficient, especially because most of these objects dies young and thus it
 is wasted work to do that for them.  Currently, this is one of the biggest
-unresolved problem in cpyext, and it is what casuses the two microbenchmarks
+unresolved problem in ``cpyext``, and it is what casuses the two microbenchmarks
 ``allocate_int`` and ``allocate_tuple`` to be very slow.
 
-We are well aware of the problem, and we have a plan for how to fix it; the
+We are well aware of the problem, and we have a plan for how to fix it. The
 explanation is too technical for the scope of this blog post as it requires a
 deep knowledge of the GC internals to be understood, but the details are
 here_.
@@ -469,36 +472,36 @@
 
 
 C API quirks
---------------------
+------------
 
-Finally, there is another source of slowdown which is beyond our control: some
+Finally, there is another source of slowdown which is beyond our control. Some
 parts of the CPython C API are badly designed and expose some of the
 implementation details of CPython.
 
-The major example is reference counting: the ``Py_INCREF`` / ``Py_DECREF`` API
+The major example is reference counting. The ``Py_INCREF`` / ``Py_DECREF`` API
 is designed in such a way which forces other implementation to emulate
 refcounting even in presence of other GC management schemes, as explained
 above.
 
-Another example is borrowed references: there are API functions which **do
+Another example is borrowed references. There are API functions which **do
 not** incref an object before returning it, e.g. `PyList_GetItem`_.  This is
 done for performance reasons because we can avoid a whole incref/decref pair,
 if the caller needs to handle the returned item only temporarily: the item is
 kept alive because it is in the list anyway.
 
 For PyPy this is a challenge: thanks to `list strategies`_, often lists are
-represented in a compact way: e.g. a list containing only integers is stored
+represented in a compact way. A list containing only integers is stored
 as a C array of ``long``.  How to implement ``PyList_GetItem``? We cannot
 simply create a ``PyObject*`` on the fly, because the caller will never decref
 it and it will result in a memory leak.
 
-The current solution is very inefficient: basically, the first time we do a
+The current solution is very inefficient. The first time we do a
 ``PyList_GetItem``, we convert_ the **whole** list to a list of
 ``PyObject*``. This is bad in two ways: the first is that we potentially pay a
 lot of unneeded conversion cost in case we will never access the other items
-of the list; the second is that by doing that we lose all the performance
-benefit granted by the original list strategy, making it slower even for the
-rest of pure-python code which will manipulate the list later.
+of the list. The second is that by doing that we lose all the performance
+benefit granted by the original list strategy, making it slower for the
+rest of the pure-python code which will manipulate the list later.
 
 ``PyList_GetItem`` is an example of a bad API because it assumes that the list
 is implemented as an array of ``PyObject*``: after all, in order to return a
@@ -506,10 +509,10 @@
 
 Fortunately, (some) CPython developers are aware of these problems, and there
 is an ongoing project to `design a better C API`_ which aims to fix exactly
-this kind of problems.
+this kind of problem.
 
 Nonetheless, in the meantime we still need to implement the current
-half-broken APIs: there is no easy solutions for that, and it is likely that
+half-broken APIs. There is no easy solution for that, and it is likely that
 we will always need to pay some performance penalty in order to implement them
 correctly.
 
@@ -527,22 +530,22 @@
 
 
 Current performance
---------------------
+-------------------
 
-During the whole blog post we kept talking about the slowness of cpyext: how
-much it is, exactly?
+During the whole blog post we claimed ``cpyext`` is slow. How
+slow it is, exactly?
 
-We decided to concentrate on microbenchmarks_ for now: as it should be evident
-by now there are simply too many issues which can slow down a cpyext
+We decided to concentrate on microbenchmarks_ for now. It should be evident
+by now there are simply too many issues which can slow down a ``cpyext``
 benchmark, and microbenchmarks help us to concentrate on one (or few) at a
 time.
 
-The microbenchmarks measure very simple stuff, like calling function and
+The microbenchmarks measure very simple thins, like calling functions and
 methods with the various calling conventions (no arguments, one arguments,
-multiple arguments), passing various types as arguments (to measure conversion
-costs), allocating objects from C, and so on.
+multiple arguments); passing various types as arguments (to measure conversion
+costs); allocating objects from C, and so on.
 
-This was the performance of PyPy 5.8 relative and normalized to CPython 2.7,
+Here are the results from PyPy 5.8 relative and normalized to CPython 2.7,
 the lower the better:
 
 .. image:: pypy58.png
@@ -555,7 +558,7 @@
 of integers is huge.
 
 PyPy 5.8 was the last release before we famouse Cape Town sprint, when we
-started to look at cpyext performance seriously. These are the performance for
+started to look at cpyext performance seriously. Here are the performance for
 PyPy 6.0, the latest release at the time of writing:
 
 .. image:: pypy60.png
@@ -575,9 +578,9 @@
 
 
 Next steps
------------
+----------
 
-Despite the spectacular results we got so far, cpyext is still slow enough to
+Despite the spectacular results we got so far, ``cpyext`` is still slow enough to
 kill performance in most real-world code which uses C extensions extensively
 (e.g., the omnipresent numpy).
 
@@ -596,12 +599,13 @@
 On one hand, this is a daunting task because the C API is huge and we need to
 tackle functions one by one.  On the other hand, not all the functions are
 equally important, and is is enough to optimize a relatively small subset to
-improve lots of different use cases.
+improve many different use cases.
 
-The biggest result is that now we have a clear picture of what are the
-problems, and we developed some technical solutions to fix them. It is "only"
-a matter of tackling them, one by one.  Moreoever, keep in mind that most of
-the work was done during two sprints, for a total 2-3 man-months.
+Where a year ago we announced we have a working answer to run c-extension in PyPy,
+we now have a clear picture of what are the
+performance bottlenecks, and we have developed some technical solutions to fix them. It is "only"
+a matter of tackling them, one by one.  Most of
+the work was done during two sprints, for a total 2-3 person-months of work.
 
 XXX: find a conclusion