[Python-Dev] pyparallel and new memory API discussions...

Trent Nelson trent at snakebite.org
Wed Jun 19 15:10:02 CEST 2013


    The new memory API discussions (and PEP) warrant a quick pyparallel
    update: a couple of weeks after PyCon, I came up with a solution for
    the biggest show-stopper that has been plaguing pyparallel since its
    inception: being able to detect the modification of "main thread"
    Python objects from within a parallel context.

    For example, `data.append(4)` in the example below will generate an
    AssignmentError exception, because data is a main thread object, and
    `data.append(4)` gets executed from within a parallel context::

        data = [ 1, 2, 3 ]

        def work():
            data.append(4)

        async.submit_work(work)

    The solution turned out to be deceptively simple (there's a rough
    sketch of the mechanism after the list):

      1.  Prior to running parallel threads, lock all "main thread"
          memory pages as read-only (via VirtualProtect on Windows,
          mprotect on POSIX).

      2.  Detect attempts to write to main thread pages during parallel
          thread execution (via SEH on Windows or a SIGSEGV trap on POSIX),
          and raise an exception instead (detection is done in the ceval
          frame exec loop).

      3.  Prior to returning control back to the main thread (which will
          be paused whilst all the parallel threads are running), unlock
          all the "main thread" pages.

      4.  Pause all parallel threads while the main thread runs.

      5.  Go back to 1.
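
    To make steps 1-3 concrete, here's a rough, self-contained POSIX
    sketch of the mechanism.  To be clear, this is *not* pyparallel
    code -- the `base`/`nbytes` arena and the handler are invented
    purely for illustration; it just demonstrates the mprotect/SIGSEGV
    dance::

        /* Stand-alone illustration of steps 1-3, using a single
         * anonymous page as a stand-in "main thread" arena. */
        #include <setjmp.h>
        #include <signal.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        static char      *base;       /* start of the "main thread" arena   */
        static size_t     nbytes;     /* size of the arena                  */
        static sigjmp_buf  write_trap; /* where a trapped write bails out to */

        /* Step 1: lock every main-thread page as read-only. */
        static int lock_pages(void)
        {
            return mprotect(base, nbytes, PROT_READ);
        }

        /* Step 3: restore write access before the main thread resumes. */
        static int unlock_pages(void)
        {
            return mprotect(base, nbytes, PROT_READ | PROT_WRITE);
        }

        /* Step 2: a write to a locked page delivers SIGSEGV; if the
         * faulting address lies inside the arena, bail out to the trap
         * site (real code would raise AssignmentError there). */
        static void segv_handler(int sig, siginfo_t *info, void *ctx)
        {
            char *addr = (char *)info->si_addr;
            (void)sig; (void)ctx;
            if (addr >= base && addr < base + nbytes)
                siglongjmp(write_trap, 1);
            signal(SIGSEGV, SIG_DFL);   /* not ours: re-raise as usual */
            raise(SIGSEGV);
        }

        int main(void)
        {
            struct sigaction sa;

            nbytes = (size_t)sysconf(_SC_PAGESIZE);
            base = mmap(NULL, nbytes, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = segv_handler;
            sa.sa_flags = SA_SIGINFO;
            sigaction(SIGSEGV, &sa, NULL);

            lock_pages();
            if (sigsetjmp(write_trap, 1) == 0) {
                base[0] = 42;           /* a "parallel context" write... */
                puts("write went through (unexpected)");
            } else {
                puts("write to main-thread page trapped");
            }
            unlock_pages();
            return 0;
        }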

    I got a proof-of-concept working on Windows a while back (and also
    played around with large page support in the same commit).  The main
    changes were to obmalloc.c:

        https://bitbucket.org/tpn/pyparallel/commits/0e70a0caa1c07dc0c14bb5c99cbe808c1c11779f#chg-Objects/obmalloc.c

    The key was the introduction of two new API calls, intended to be
    called by the pyparallel.c infrastructure:

        _PyMem_LockMainThreadPages()
        _PyMem_UnlockMainThreadPages()

    The implementation is pretty simple:

+int
+_PyMem_LockMainThreadPages(void)
+{
+    DWORD old = 0;
+
+    if (!VirtualProtect(base_addr, nbytes_committed, PAGE_READONLY, &old)) {
+        PyErr_SetFromWindowsErr(0);
+        return -1;
+    }
+
+    return 0;
+}

    Note the `base_addr` and `nbytes_committed` arguments.  Basically, I
    re-organized obmalloc.c a little bit such that we never actually
    call malloc() directly.  Instead, we exploit the ability to reserve
    huge virtual address ranges without actually committing the memory,
    giving us a fixed `base_addr` void pointer that we can pass to calls
    like VirtualProtect or mprotect.

    We then incrementally commit more pages as demand increases, and
    simply adjust our `nbytes_committed` counter as we go along.  The
    net effect is that we can call VirtualProtect/mprotect once, with a
    single base void pointer and size_t range, and immediately affect the
    protection of all memory pages that fall within that range.
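
    Again purely as an illustration -- this is not the actual
    obmalloc.c change, and the 256MB reservation and 16-page growth
    step below are made-up numbers -- the reserve-then-commit dance
    looks roughly like this on Windows::

        /* Reserve a large address range up front, then commit pages on
         * demand, so `base_addr` stays fixed for the life of the
         * process and a single VirtualProtect call can cover every
         * committed page. */
        #include <windows.h>
        #include <stdio.h>

        static char  *base_addr;         /* never changes once reserved  */
        static size_t nbytes_committed;  /* grows as pages are committed */

        int main(void)
        {
            SYSTEM_INFO si;
            size_t grow;
            DWORD old;

            GetSystemInfo(&si);

            /* Reserve 256MB of address space; no physical memory or
               pagefile backing is committed yet. */
            base_addr = (char *)VirtualAlloc(NULL, 256 * 1024 * 1024,
                                             MEM_RESERVE, PAGE_NOACCESS);
            if (base_addr == NULL)
                return 1;

            /* Commit another chunk of pages as demand increases; only
               the nbytes_committed counter needs adjusting. */
            grow = (size_t)si.dwPageSize * 16;
            if (VirtualAlloc(base_addr + nbytes_committed, grow,
                             MEM_COMMIT, PAGE_READWRITE) == NULL)
                return 1;
            nbytes_committed += grow;

            /* One call now flips protection on every committed page. */
            VirtualProtect(base_addr, nbytes_committed,
                           PAGE_READONLY, &old);
            VirtualProtect(base_addr, nbytes_committed,
                           PAGE_READWRITE, &old);

            printf("arena at %p, %lu bytes committed\n",
                   (void *)base_addr, (unsigned long)nbytes_committed);

            VirtualFree(base_addr, 0, MEM_RELEASE);
            return 0;
        }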

    As an added bonus, we also get a very cheap and elegant way to test
    if a pointer (or any arbitrary memory address, actually) belongs to
    the main thread's memory range (at least in comparison to the
    existing _PyMem_InRange black magic).  (This is very useful for my
    pyparallel infrastructure, which makes extensive use of conditional
    logic based on address tests.)
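
    Concretely, with the single-arena layout, "does this address belong
    to the main thread?" reduces to a single range comparison against
    the fixed base pointer -- in sketch form (again using hypothetical
    globals rather than the actual obmalloc.c code)::

        /* Sketch only: mirrors the hypothetical base_addr and
         * nbytes_committed globals used in the reserve/commit sketch. */
        static char  *base_addr;
        static size_t nbytes_committed;

        static int
        ptr_in_main_thread_arena(const void *p)
        {
            const char *c = (const char *)p;
            return c >= base_addr && c < base_addr + nbytes_committed;
        }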

        (Side-bar: a side-effect of the approach I've used in the proof-
         of-concept (by only having a single base addr pointer) is that
         we effectively limit the maximum memory we could eventually
         commit.

         I actually quite like this -- in fact, I'd like to tweak it
         such that we can expose min/max memory values to the Python
         interpreter at startup (analogous to the JVM).

         Having known upper bounds on maximum memory usage will vastly
         simplify some other areas of my pyparallel work (like the async
         socket stuff).

         For example, consider network programs these days that take a
         "max clients" configuration parameter.  That seems a bit
         backwards to me.

         It would be better if we simply said, "here, Python, you have
         1GB to work with".  That allows us to calculate how many
         clients we could simultaneously serve based on socket memory
         requirements, which allows for much more graceful behavior
         under load than leaving it open-ended.

         Maximum memory constraints would also be useful for the
         parallel.map(callable, iterable) stuff I've got in the works,
         as it'll allow us to optimally chunk work and assign to threads
         based on available memory.)

    So, Victor, I'm interested to hear how the new API you're proposing
    will affect this solution I've come up with for pyparallel; I'm
    going to be absolutely dependent upon the ability to lock main
    thread pages as read-only in one fell swoop -- am I still going to
    be able to do that with your new API in place?

    Regards,

        Trent.

