[pypy-commit] extradoc extradoc: Remove the very old STM planning file here

arigo noreply at buildbot.pypy.org
Wed Apr 2 11:52:47 CEST 2014


Author: Armin Rigo <arigo at tunes.org>
Branch: extradoc
Changeset: r5167:b21fff1ac422
Date: 2014-04-02 11:24 +0200
http://bitbucket.org/pypy/extradoc/changeset/b21fff1ac422/

Log:	Remove the very old STM planning file here

diff --git a/planning/stm.txt b/planning/stm.txt
deleted file mode 100644
--- a/planning/stm.txt
+++ /dev/null
@@ -1,322 +0,0 @@
-============
-STM planning
-============
-
-|
-| Bars on the left describe the next thing to work on.
-| On the other hand, "TODO" means "to do later".
-|
-
-
-Python Interface
-----------------
-
-Planned interface refactorings:
-
-* inspired by "concurrent.futures" from Python 3.2: have
-  transaction.add() return a Future instance with a method result().
-  If called inside another transaction, it will suspend it until the
-  result is available, i.e. until the first transaction commits.  Can
-  be used even if the result is not used, just to ensure some ordering.
-  XXX but can that be emulated in transaction.py???
-
-* (later, maybe) allow nested transactions: either by calling
-  transaction.run() inside transactions too, or with actual objects
-  that store collections of transactions.
-
-
-Overview of the GC
-------------------
-
-A saner approach (and likely better results than now): integrate with
-the GC.  Here is the basic plan.
-
-Let T be the number of threads.  Use a custom GC, with T nurseries and
-one "global area."  Every object in the nursery t is only visible to
-thread t.  Every object in the global area is shared but read-only.
-Changes to global objects are only done by committing.
-
-Every thread t allocates new objects in the nursery t.  Accesses to
-nursery objects are the fastest, not monitored at all.  When we need
-read access to a global object, we can read it directly, but we need to
-record the version of the object that we read.  When we need write
-access to a global object, we need to make a whole copy of it into our
-nursery.
-
-| The "global area" should be implemented by reusing gc/minimarkpage.py.
-
-The RPython program can use this hint: 'x = hint(x, stm_write=True)',
-which is like writing to an object in the sense that it forces a local
-copy.
-
-In translator.stm.transform, we track which variables contain objects
-that are known to be local.  It lets us avoid the run-time check.
-That's useful for all freshly malloc'ed objects, which we know are
-always local; and that's useful for special cases like the PyFrames, on
-which we used the "stm_write=True" hint before running the interpreter.
-In both cases the result is: no STM code is needed any more.
-
-When a transaction commits, we do a "minor collection"-like process,
-called an "end-of-transaction collection": we move all surviving objects
-from the nursery to the global area, either as new objects (first step
-done by stmgc.py), or as overwrites of their previous version (second
-step done by et.c).  Unlike the minor collections in other GCs, this one
-occurs at a well-defined time, with no stack roots to scan.
-
-| We also need to consider what occurs if a nursery grows too big while
-| the transaction is still not finished.  In this case we need to run a
-| similar collection of the nursery, but with stack roots to scan.  We
-| call this a local collection.
-|
-| This can also occur before or after we call transaction.run(), when
-| there is only the main thread running.  In this mode, we run the main
-| thread with a nursery too.  It can fill up, needing a local collection.
-| When transaction.run() is called, we also do a local collection to
-| ensure that the nursery of the main thread is empty while the
-| transactions execute.
-|
-| Of course we also need to do a major collection from time to time.  We
-| will need at some point some concurrency here, to be able to run the
-| major collection in a random thread t but detecting changes done by the
-| other threads overwriting objects during their own end-of-transaction
-| collections.  See below.
-
-
-GC flags
---------
-
-Still open to consideration, but the basic GC flags could be:
-    
-  * GC_GLOBAL      if the object is in the global area
-
-  * GC_WAS_COPIED  on a global object: it has at least one local copy
-                   (then we need to look it up in some local dictionary)
-                   on a local object: it comes from a global object
-
-  * and one complete word (for now?) for the version number, see below
-
-(Optimization: objects declared immutable don't need a version number.)
-
-TODO: GC_WAS_COPIED should rather be some counter, counting how many threads
-have a local copy; something like 2 or 3 bits, where the maximum value
-means "overflowed" and is sticky (maybe until some global
-synchronization point, if we have one).  Or, we can be more advanced and
-use 4-5 bits, where in addition we use some "thread hash" value if there
-is only one copy.
-
-
-stm_read
---------
-
-The STM read operation is potentially a complex operation.  (That's why
-it's useful to remove it as much as possible.)
-
-stm_read(obj, offset) -> field value
-
-- If obj is not GC_GLOBAL, then read directly and be done.
-
-- Otherwise, if GC_WAS_COPIED, and if we find 'localobj' in this
-  thread's local dictionary, then read directly from 'localobj' and
-  be done.  (Ideally we should also use 'localobj' instead of 'obj'
-  in future references to this object, but unclear how.)
-
-- Otherwise, we need to do a global read.  This is a real STM read.
-  Done (on x86 [1]) by reading the version number, then the actual field,
-  then *again* the version number.  If the version number didn't change
-  and if it is not more recent than the transaction start, then the read
-  is accepted; otherwise not (we might retry or abort the transaction,
-  depending on cases).  And if the read is accepted then we need to
-  remember in a local list that we've read that object.
-
-For now the thread's local dictionary is in C, as a widely-branching
-search tree.
-
-
-stm_write
----------
-
-- If obj is GC_GLOBAL, we need to find or make a local copy
-
-- Then we just perform the write.
-
-This means that stm_write could be implemented with a write barrier that
-potentially returns a copy of the object, and which is followed by a
-regular write to that copy.
-
-Note that "making a local copy" implies the same rules as stm_read: read
-the version number, copy all fields, then read *again* the version
-number [1].  If it didn't change, then we know that we got at least a
-consistent copy (i.e. nobody changed the object in the middle of us
-reading it).  If it is too recent, then we might have to abort.
-
-TODO: how do we handle MemoryErrors when making a local copy??
-Maybe force the transaction to abort, and then re-raise MemoryError
---- for now it's just a fatal error.
-
-
-End-of-transaction collections
-------------------------------
-
-Start from the "roots" being all local copies of global objects.  (These
-are the only roots: if there are none, then it means we didn't write
-anything in any global object, so there is no new object that can
-survive.)  From the roots, scan and move all fresh new objects to the
-global area.  Add the GC_GLOBAL flag to them, of course.  Then we need,
-atomically (in the STM sense), to overwrite the old global objects with
-their local copies.  This is done by temporarily locking the global
-objects with a special value in their "version" field that will cause
-concurrent reads to spin-loop.
-
-This is also where we need the list of global objects that we've read.
-We need to check that the version of each of these global objects has
-not been modified in the meantime.
-
-
-Static analysis support
------------------------
-
-To get good performance, we should as much as possible use the
-'localobj' version of every object instead of the 'obj' one.  At least
-after a write barrier we should replace the local variable 'obj' with
-'localobj', and translator.stm.transform propagates the
-fact that it is now a localobj that doesn't need special stm support
-any longer.  Similarly, all mallocs return a localobj.
-
-The "stm_write=True" hint is used on PyFrame before the main
-interpreter loop, so that we can then be sure that all accesses to
-'frame' are to a local obj.
-
-TODO: Ideally, we could even track which fields
-of a localobj are themselves localobjs.  This would be useful for
-'PyFrame.fastlocals_w': it should also be known to always be a localobj.
-
-
-Local collections
------------------
-
-|
-| This needs to be done.
-|
-
-If a nursery fills up too much during a transaction, it needs to be
-locally collected.  This is supposed to be a generally rare occurrence,
-with the exception of long-running transactions --- including the main
-thread before transaction.run().
-
-Surviving local objects are moved to the global area.  However, the
-GC_GLOBAL flag is still not set on them, because they are still not
-visible from more than one thread.  For now we have to put all such
-objects in a list: the list of old-but-local objects.  (Some of these
-objects can still have the GC_WAS_COPIED flag and so be duplicates of
-other really global objects.  The dict maintained by et.c must be
-updated when we move these objects.)
-
-Unlike end-of-transaction collections, we need to have the stack roots
-of the current transaction.  For now we just use
-"gcrootfinder=shadowstack" with thread-local variables.  At the end of
-the local collection, we do a sweep: all objects that were previously
-listed as old-but-local but don't survive the present collection are
-marked as free.
-
-TODO: Try to have a generational behavior here.  Could probably be done
-by (carefully) promoting part of the surviving objects to GC_GLOBAL.
-
-If implemented like minimarkpage.py, the global area has for each size a
-chained list of pages that are (at least partially) free.  We make the
-heads of the chained lists thread-locals; so each thread reserves one
-complete page at a time, reducing cross-thread synchronizations.
-
-TODO: The local collection would also be a good time to compress the
-local list of all global reads done --- "compress" in the sense of
-removing duplicates.
-
-
-Global collections
-------------------
-
-|
-| This needs to be done.
-|
-
-We will sometimes need to do a "major" collection, called global
-collection here.  The issue with it is that there might be live
-references to global objects in the local objects of any thread.  The
-problem becomes even harder as some threads may be currently blocked in
-some system call.  As an intermediate solution that should work well
-enough, we could try to acquire a lock for every thread, a kind of LIL
-(local interpreter lock).  Every thread releases its LIL around
-potentially-blocking system calls.  At the end of a transaction and once
-per local collection, we also do the equivalent of a
-release-and-reacquire-the-LIL.  The point is that when a LIL is released,
-another thread can acquire it temporarily and read the shadowstack of
-that thread.
-
-The major collection is orchestrated by whichever thread noticed one
-should start; let's call this thread tg.  So tg first acquires all the
-LILs.  (A way to force another thread to "soon" release its LIL is to
-artificially mark its nursery as exhausted.)  For each thread t, tg
-performs a local collection for t.  This empties all the nurseries and
-gives tg an up-to-date point of view on the liveness of the objects: the
-various lists of old-but-local objects for all the t's.  tg can use
-these --- plus external roots like prebuilt objects --- as the roots of
-a second-level, global mark-and-sweep.
-
-For now we release the LILs only when the major collection is finished.
-
-TODO: either release the LILs earlier, say after we processed the lists
-of old-but-local objects but before we went on marking and sweeping ---
-but we need support for detecting concurrent writes done by concurrent
-commits; or, ask all threads currently waiting on the LIL to help with
-doing the global mark-and-sweep in parallel.
-
-Note: standard terminology:
-
-* Concurrency: there is one thread that does something GC-related,
-  like scan the heap, and at the same time another thread changes
-  some object from the heap.
-
-* Parallelism: there are multiple threads all doing something GC-related,
-  like all scanning the heap together.
-
-
-When not running transactively
-------------------------------
-
-The above describes the mode during which there is a main thread blocked
-in transaction.run().  The other mode is mostly that of "start-up",
-before we call transaction.run().  Of course no STM is needed in that
-mode, but it's still running the same STM-enabled interpreter.
-
-| In this mode, we just have one nursery and the global area.  When
-| transaction.run() is called, we do a local collection to empty it, then
-| make sure to flag all surviving objects as GC_GLOBAL in preparation for
-| starting actual transactions.  Then we can reuse the nursery itself for
-| one of the threads.
-
-
-Pointer equality
-----------------
-
-Another (traditionally messy) issue is that by having several copies of
-the same object, we need to take care of all pointer comparisons too.
-This is all llops of the form ``ptr_eq(x, y)`` or ``ptr_ne(x, y)``.
-
-If we know statically that both copies are local copies, then we can
-just compare the pointers.  Otherwise, we compare
-``stm_normalize_global(x)`` with ``stm_normalize_global(y)``, where
-``stm_normalize_global(obj)`` returns ``globalobj`` if ``obj`` is a
-local, GC_WAS_COPIED object.  Moreover the call to
-``stm_normalize_global()`` can be omitted for constants.
-
-
-JIT support
------------
-
-TODO
-
-
-notes
------
-
-[1] this relies on a property guaranteed so far by the x86, but not,
-    say, by PowerPCs.  (XXX find a reference again)

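For readers skimming the removed file: the stm_read and stm_write rules it
describes (read the version, read the data, read the version again, then
accept only if the version is unchanged and not newer than the transaction
start) can be sketched in plain Python.  This is only an illustration under
stated assumptions, not the code from et.c or stmgc.py.  The names Abort,
GlobalObj, Transaction, read_list and _consistent_snapshot are hypothetical,
the sketch is single-threaded, it always aborts where the plan says "retry or
abort", and it leaves out nursery objects, which need no barrier at all.

```python
class Abort(Exception):
    """Hypothetical: signals that the transaction must abort."""


class GlobalObj:
    """Stand-in for an object in the global area (GC_GLOBAL set)."""

    def __init__(self, version, **fields):
        self.version = version
        self.fields = dict(fields)


class Transaction:
    def __init__(self, start_time):
        self.start_time = start_time
        self.local_copies = {}  # id(global obj) -> local dict of fields
        self.read_list = []     # (global obj, version seen) pairs

    def _consistent_snapshot(self, obj, reader):
        # Read the version, then the data, then the version *again*.
        v1 = obj.version
        result = reader(obj.fields)
        v2 = obj.version
        # Reject if the object changed under us or is too recent.
        if v1 != v2 or v1 > self.start_time:
            raise Abort("inconsistent or too-recent version")
        return v1, result

    def stm_read(self, obj, name):
        # A local copy, if any, takes precedence over the global object.
        local = self.local_copies.get(id(obj))
        if local is not None:
            return local[name]
        version, value = self._consistent_snapshot(obj, lambda f: f[name])
        # Remember the read so it can be re-validated at commit time.
        self.read_list.append((obj, version))
        return value

    def stm_write(self, obj, name, value):
        # Find or make a local copy, then do a regular write to it.
        local = self.local_copies.get(id(obj))
        if local is None:
            _, local = self._consistent_snapshot(obj, dict)
            self.local_copies[id(obj)] = local
        local[name] = value
```

In this sketch a write never touches the global object; the overwrite of the
global version would only happen in the end-of-transaction collection, which
is not modelled here.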

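The "Pointer equality" rule in the removed file can be illustrated the same
way.  The sketch below assumes each local copy keeps a back-reference to the
global object it was copied from; the removed file does not say how that
mapping is stored, so the attribute global_original, the class Obj and the
helper stm_ptr_eq are hypothetical names chosen for the example.

```python
class Obj:
    """Hypothetical object header with the two flags from the plan."""

    def __init__(self, is_global=False, was_copied=False,
                 global_original=None):
        self.is_global = is_global              # GC_GLOBAL
        self.was_copied = was_copied            # GC_WAS_COPIED
        self.global_original = global_original  # assumed back-reference


def stm_normalize_global(obj):
    # A local object with GC_WAS_COPIED stands for its global original.
    if not obj.is_global and obj.was_copied:
        return obj.global_original
    return obj


def stm_ptr_eq(x, y):
    # Compare the normalized identities, as for ptr_eq(x, y).
    return stm_normalize_global(x) is stm_normalize_global(y)


def stm_ptr_ne(x, y):
    return not stm_ptr_eq(x, y)
```

With this, a global object and this thread's local copy of it compare equal,
while two unrelated fresh objects still compare by plain identity.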