[pypy-commit] stmgc default: hg merge c8-gil-like

arigo noreply at buildbot.pypy.org
Sun Jun 14 11:52:24 CEST 2015


Author: Armin Rigo <arigo at tunes.org>
Branch: 
Changeset: r1830:ab54aa35b24a
Date: 2015-06-14 11:53 +0200
http://bitbucket.org/pypy/stmgc/changeset/ab54aa35b24a/

Log:	hg merge c8-gil-like

	Fixes the bad timings of a program that does many tiny external
	calls.  Previously, each such call caused its own tiny transaction.
	Now a single larger inevitable transaction covers the whole series
	of calls.

diff too long, truncating to 2000 out of 2448 lines

diff --git a/c8/CALL_RELEASE_GIL b/c8/CALL_RELEASE_GIL
new file mode 100644
--- /dev/null
+++ b/c8/CALL_RELEASE_GIL
@@ -0,0 +1,159 @@
+
+c8-gil-like
+===========
+
+A branch to have "GIL-like" behavior for inevitable transactions: one
+not-too-short inevitable transaction that is passed around between
+multiple threads.
+
+The goal is to have good fast-case behavior with the PyPy JIT around
+CALL_RELEASE_GIL.  This is how it works in PyPy's default branch
+(with shadowstack):
+
+
+- "rpy_fastgil" is a global variable.  The value 0 means the GIL is
+  definitely unlocked; the value 1 means it is probably locked (it is
+  actually locked only if some mutex object is acquired too).
+
+- before CALL_RELEASE_GIL, we know that we have the GIL and we need to
+  release it.  So we know that "rpy_fastgil" is 1, and we just write 0
+  there.
+
+- then we do the external call.
+
+- after CALL_RELEASE_GIL, two cases:
+
+  - if "rpy_fastgil" has been changed to 1 by some other thread *or*
+    if the (non-thread-local) shadowstack pointer changed, then we
+    call reacqgil_addr();
+
+  - otherwise, we swap rpy_fastgil back to 1 and we're done.
+
+- if the external call is long enough, a different thread will notice
+  that rpy_fastgil == 0 by regular polling, and grab the GIL for
+  itself by swapping it back to 1.  (The changes from 0 to 1 are done
+  with atomic instructions.)
+
+- a different mechanism is used when we voluntarily release the GIL,
+  based on the mutex mentioned above.  The mutex is also used by the
+  reacqgil_addr() function if it actually needs to wait.
+
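+A minimal sketch of this fast path (not the exact PyPy source;
+"shadowstack_pointer_changed()" and "external_call()" are
+placeholders for the real checks):
+
+    /* before CALL_RELEASE_GIL: we hold the GIL, rpy_fastgil == 1 */
+    rpy_fastgil = 0;          /* plain store is enough to release */
+
+    result = external_call();
+
+    /* after CALL_RELEASE_GIL */
+    if (rpy_fastgil != 0 || shadowstack_pointer_changed() ||
+        !__sync_bool_compare_and_swap(&rpy_fastgil, 0, 1))
+        reacqgil_addr();      /* mutex-based slow path */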
+
+Plan for porting this idea to stmgc:
+
+- we add a few macros to stmgc.h which can be used by C code, around
+  external calls; and we also inline these macros manually around
+  CALL_RELEASE_GIL in PyPy's JIT.
+
+- we add a "detached" mode to inevitable transactions: it means that
+  no thread is actively running this inevitable transaction for now,
+  but it has not been committed yet.  It is meant to be reattached
+  later, by the same or a different thread.
+
+- we add a global variable, "stm_detached_inevitable_from_thread".  It
+  is equal to the stm_thread_local pointer of the thread that detached
+  the inevitable transaction (like rpy_fastgil == 0), or NULL if there
+  is no detached inevitable transaction (like rpy_fastgil == 1).
+
+- the macro stm_detach_inevitable_transaction() simply writes the
+  current thread's stm_thread_local pointer into the global variable
+  stm_detached_inevitable_from_thread.  It can only be used if the
+  current transaction is inevitable (and in particular the inevitable
+  transaction was not detached already, because we're running it).
+  After the macro is called, the current thread is assumed not to be
+  running in a transaction any more (no more object or shadowstack
+  access).
+
+- the macro stm_reattach_transaction() does an atomic swap on
+  stm_detached_inevitable_from_thread to change it to NULL.  If the
+  old value was equal to our own stm_thread_local pointer, we are
+  done.  If not, we call a helper, _stm_reattach_transaction().
+  (See the sketch at the end of this file.)
+
+- we also add the macro stm_detach_transaction().  If the current
+  transaction is inevitable, it calls stm_detach_inevitable_transaction().
+  Otherwise it calls a helper, _stm_detach_noninevitable_transaction().
+
+- _stm_reattach_transaction(old): called with the old value from
+  stm_detached_inevitable_from_thread (which was swapped to be NULL just
+  now).  If old != NULL, this swap had the effect that we took over
+  the inevitable transaction originally detached from a different
+  thread; we need to fix a few things like the stm_thread_local and %gs but
+  then we can continue running this reattached inevitable transaction.
+  If old == NULL, we need to fall back to the usual
+  stm_start_transaction().  (A priori, there is no need to wait at
+  this point.  The waiting point is later, in the optional
+  stm_become_inevitable().)
+
+- _stm_detach_noninevitable_transaction(): we try to make the
+  transaction inevitable.  If it works we can then use
+  stm_detach_inevitable_transaction().  On the other hand, if we can't
+  make it inevitable without waiting, then instead we just commit it
+  and continue.  In the latter case,
+  stm_detached_inevitable_from_thread is still NULL.
+
+- another place to fix: major collections.  Maybe simply look inside
+  stm_detached_inevitable_from_thread, and if not NULL, grab the
+  inevitable transaction and commit it now.  Or maybe not.  The point
+  is that we need to prevent a thread from asynchronously grabbing it
+  by an atomic swap of stm_detached_inevitable_from_thread; instead,
+  the parallel threads that finish their external calls should all
+  find NULL in this variable and call _stm_reattach_transaction()
+  which will wait for the major GC to end.
+
+- stm_become_inevitable(): if it finds a detached inevitable
+  transaction, it should attach and commit it as a way to get rid of
+  it.  This is why it might be better to call directly
+  stm_start_inevitable_transaction() when possible: that one is
+  allowed to attach to a detached inevitable transaction and simply
+  return, unlike stm_become_inevitable() which must continue running
+  the existing transaction.
+
+- commit logic of a non-inevitable transaction: we wait if there is
+  an inevitable transaction.  Here too, if the inevitable transaction
+  is found to be detached, we could just commit it now.  Or, a better
+  approach: if we find a detached inevitable transaction we grab it
+  temporarily, and commit only the *non-inevitable* transaction if it
+  doesn't conflict.  The inevitable transaction is then detached
+  again.  (Note that the conflict detection is: we don't commit any
+  write to any of the objects in the inevitable transaction's
+  read-set.  This relies on inevitable threads maintaining their
+  read-set correctly, which should be the case in PyPy, but needs to
+  be checked.)
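+
+A minimal sketch of the two user-facing macros described above,
+assuming the global variable is an intptr_t where 0 means "no
+detached transaction"; external_call() is a placeholder, and the
+helpers actually committed in c8/stm/detach.c differ in details:
+
+    /* intended usage around an external call */
+    stm_detach_transaction(&stm_thread_local);
+    result = external_call();
+    stm_reattach_transaction(&stm_thread_local);
+
+    /* detach: only valid on an inevitable transaction */
+    #define stm_detach_inevitable_transaction(tl)   do {           \
+        assert(stm_is_inevitable(tl));                             \
+        _stm_detached_inevitable_from_thread = (intptr_t)(tl);     \
+    } while (0)
+
+    /* reattach: atomically swap the global back to 0 (NULL) */
+    #define stm_reattach_transaction(tl)   do {                    \
+        intptr_t _old = __sync_lock_test_and_set(                  \
+            &_stm_detached_inevitable_from_thread, 0);             \
+        if (_old != (intptr_t)(tl))                                \
+            _stm_reattach_transaction(_old);   /* slow path */     \
+    } while (0)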
diff --git a/c8/demo/demo_random.c b/c8/demo/demo_random.c
--- a/c8/demo/demo_random.c
+++ b/c8/demo/demo_random.c
@@ -8,6 +8,8 @@
 #include <sys/wait.h>
 
 #include "stmgc.h"
+#include "stm/fprintcolor.h"
+#include "stm/fprintcolor.c"
 
 #define NUMTHREADS 2
 #define STEPS_PER_THREAD 500
@@ -48,8 +50,10 @@
     int num_roots;
     int num_roots_at_transaction_start;
     int steps_left;
+    long globally_unique;
 };
 __thread struct thread_data td;
+static long progress = 1;
 
 struct thread_data *_get_td(void)
 {
@@ -57,9 +61,16 @@
 }
 
 
+long check_size(long size)
+{
+    assert(size >= sizeof(struct node_s));
+    assert(size <= sizeof(struct node_s) + 4096*70);
+    return size;
+}
+
 ssize_t stmcb_size_rounded_up(struct object_s *ob)
 {
-    return ((struct node_s*)ob)->my_size;
+    return check_size(((struct node_s*)ob)->my_size);
 }
 
 void stmcb_trace(struct object_s *obj, void visit(object_t **))
@@ -69,7 +80,8 @@
 
     /* and the same value at the end: */
     /* note, ->next may be the same as last_next */
-    nodeptr_t *last_next = (nodeptr_t*)((char*)n + n->my_size - sizeof(void*));
+    nodeptr_t *last_next = (nodeptr_t*)((char*)n + check_size(n->my_size)
+                                        - sizeof(void*));
 
     assert(n->next == *last_next);
 
@@ -113,36 +125,36 @@
     }
 }
 
-void reload_roots()
-{
-    int i;
-    assert(td.num_roots == td.num_roots_at_transaction_start);
-    for (i = td.num_roots_at_transaction_start - 1; i >= 0; i--) {
-        if (td.roots[i])
-            STM_POP_ROOT(stm_thread_local, td.roots[i]);
-    }
-
-    for (i = 0; i < td.num_roots_at_transaction_start; i++) {
-        if (td.roots[i])
-            STM_PUSH_ROOT(stm_thread_local, td.roots[i]);
-    }
-}
-
 void push_roots()
 {
     int i;
+    assert(td.num_roots_at_transaction_start <= td.num_roots);
     for (i = td.num_roots_at_transaction_start; i < td.num_roots; i++) {
         if (td.roots[i])
             STM_PUSH_ROOT(stm_thread_local, td.roots[i]);
     }
+    STM_SEGMENT->no_safe_point_here = 0;
 }
 
 void pop_roots()
 {
     int i;
-    for (i = td.num_roots - 1; i >= td.num_roots_at_transaction_start; i--) {
-        if (td.roots[i])
+    STM_SEGMENT->no_safe_point_here = 1;
+
+    assert(td.num_roots_at_transaction_start <= td.num_roots);
+    for (i = td.num_roots - 1; i >= 0; i--) {
+        if (td.roots[i]) {
             STM_POP_ROOT(stm_thread_local, td.roots[i]);
+            assert(td.roots[i]);
+        }
+    }
+
+    dprintf(("stm_is_inevitable() = %d\n", (int)stm_is_inevitable()));
+    for (i = 0; i < td.num_roots_at_transaction_start; i++) {
+        if (td.roots[i]) {
+            dprintf(("root %d: %p\n", i, td.roots[i]));
+            STM_PUSH_ROOT(stm_thread_local, td.roots[i]);
+        }
     }
 }
 
@@ -150,6 +162,7 @@
 {
     int i;
     assert(idx >= td.num_roots_at_transaction_start);
+    assert(idx < td.num_roots);
 
     for (i = idx; i < td.num_roots - 1; i++)
         td.roots[i] = td.roots[i + 1];
@@ -158,6 +171,7 @@
 
 void add_root(objptr_t r)
 {
+    assert(td.num_roots_at_transaction_start <= td.num_roots);
     if (r && td.num_roots < MAXROOTS) {
         td.roots[td.num_roots++] = r;
     }
@@ -184,7 +198,8 @@
         nodeptr_t n = (nodeptr_t)p;
 
         /* and the same value at the end: */
-        nodeptr_t TLPREFIX *last_next = (nodeptr_t TLPREFIX *)((stm_char*)n + n->my_size - sizeof(void*));
+        nodeptr_t TLPREFIX *last_next = (nodeptr_t TLPREFIX *)((stm_char*)n +
+                                       check_size(n->my_size) - sizeof(void*));
         assert(n->next == *last_next);
         n->next = (nodeptr_t)v;
         *last_next = (nodeptr_t)v;
@@ -196,7 +211,8 @@
     nodeptr_t n = (nodeptr_t)p;
 
     /* and the same value at the end: */
-    nodeptr_t TLPREFIX *last_next = (nodeptr_t TLPREFIX *)((stm_char*)n + n->my_size - sizeof(void*));
+    nodeptr_t TLPREFIX *last_next = (nodeptr_t TLPREFIX *)((stm_char*)n +
+                                       check_size(n->my_size) - sizeof(void*));
     OPT_ASSERT(n->next == *last_next);
 
     return n->next;
@@ -229,7 +245,7 @@
                            sizeof(struct node_s) + (get_rand(100000) & ~15),
                            sizeof(struct node_s) + 4096,
                            sizeof(struct node_s) + 4096*70};
-        size_t size = sizes[get_rand(4)];
+        size_t size = check_size(sizes[get_rand(4)]);
         p = stm_allocate(size);
         nodeptr_t n = (nodeptr_t)p;
         n->sig = SIGNATURE;
@@ -240,7 +256,6 @@
         n->next = NULL;
         *last_next = NULL;
         pop_roots();
-        /* reload_roots not necessary, all are old after start_transaction */
         break;
     case 4:  // read and validate 'p'
         read_barrier(p);
@@ -288,6 +303,15 @@
     return p;
 }
 
+static void end_gut(void)
+{
+    if (td.globally_unique != 0) {
+        fprintf(stderr, "[GUT END]");
+        assert(progress == td.globally_unique);
+        td.globally_unique = 0;
+        stm_resume_all_other_threads();
+    }
+}
 
 objptr_t do_step(objptr_t p)
 {
@@ -308,8 +332,14 @@
         return NULL;
     } else if (get_rand(240) == 1) {
         push_roots();
-        stm_become_globally_unique_transaction(&stm_thread_local, "really");
-        fprintf(stderr, "[GUT/%d]", (int)STM_SEGMENT->segment_num);
+        if (td.globally_unique == 0) {
+            stm_stop_all_other_threads();
+            td.globally_unique = progress;
+            fprintf(stderr, "[GUT/%d]", (int)STM_SEGMENT->segment_num);
+        }
+        else {
+            end_gut();
+        }
         pop_roots();
         return NULL;
     }
@@ -347,37 +377,53 @@
 
     objptr_t p;
 
-    stm_start_transaction(&stm_thread_local);
+    stm_enter_transactional_zone(&stm_thread_local);
     assert(td.num_roots >= td.num_roots_at_transaction_start);
     td.num_roots = td.num_roots_at_transaction_start;
     p = NULL;
     pop_roots();                /* does nothing.. */
-    reload_roots();
 
     while (td.steps_left-->0) {
         if (td.steps_left % 8 == 0)
             fprintf(stdout, "#");
 
-        assert(p == NULL || ((nodeptr_t)p)->sig == SIGNATURE);
+        int local_seg = STM_SEGMENT->segment_num;
+        int p_sig = p == NULL ? 0 : ((nodeptr_t)p)->sig;
+
+        assert(p == NULL || p_sig == SIGNATURE);
+        (void)local_seg;
+        (void)p_sig;
+
+        if (!td.globally_unique)
+            ++progress;   /* racy, but good enough */
 
         p = do_step(p);
 
         if (p == (objptr_t)-1) {
             push_roots();
+            end_gut();
 
             long call_fork = (arg != NULL && *(long *)arg);
             if (call_fork == 0) {   /* common case */
-                stm_commit_transaction();
-                td.num_roots_at_transaction_start = td.num_roots;
-                if (get_rand(100) < 98) {
-                    stm_start_transaction(&stm_thread_local);
-                } else {
-                    stm_start_inevitable_transaction(&stm_thread_local);
+                if (get_rand(100) < 50) {
+                    stm_leave_transactional_zone(&stm_thread_local);
+                    /* Nothing here; it's unlikely that a different thread
+                       manages to steal the detached inev transaction.
+                       Give them a little chance with a usleep(). */
+                    dprintf(("sleep...\n"));
+                    usleep(1);
+                    dprintf(("sleep done\n"));
+                    td.num_roots_at_transaction_start = td.num_roots;
+                    stm_enter_transactional_zone(&stm_thread_local);
+                }
+                else {
+                    _stm_commit_transaction();
+                    td.num_roots_at_transaction_start = td.num_roots;
+                    _stm_start_transaction(&stm_thread_local);
                 }
                 td.num_roots = td.num_roots_at_transaction_start;
                 p = NULL;
                 pop_roots();
-                reload_roots();
             }
             else {
                 /* run a fork() inside the transaction */
@@ -401,16 +447,17 @@
         }
     }
     push_roots();
-    stm_commit_transaction();
+    end_gut();
+    stm_force_transaction_break(&stm_thread_local);
 
     /* even out the shadow stack before leaveframe: */
-    stm_start_inevitable_transaction(&stm_thread_local);
+    stm_become_inevitable(&stm_thread_local, "before leaveframe");
     while (td.num_roots > 0) {
         td.num_roots--;
         objptr_t t;
         STM_POP_ROOT(stm_thread_local, t);
     }
-    stm_commit_transaction();
+    stm_leave_transactional_zone(&stm_thread_local);
 
     stm_rewind_jmp_leaveframe(&stm_thread_local, &rjbuf);
     stm_unregister_thread_local(&stm_thread_local);
diff --git a/c8/demo/demo_random2.c b/c8/demo/demo_random2.c
--- a/c8/demo/demo_random2.c
+++ b/c8/demo/demo_random2.c
@@ -8,6 +8,8 @@
 #include <sys/wait.h>
 
 #include "stmgc.h"
+#include "stm/fprintcolor.h"
+#include "stm/fprintcolor.c"
 
 #define NUMTHREADS 3
 #define STEPS_PER_THREAD 50000
@@ -52,8 +54,10 @@
     int active_roots_num;
     long roots_on_ss;
     long roots_on_ss_at_tr_start;
+    long globally_unique;
 };
 __thread struct thread_data td;
+static long progress = 1;
 
 struct thread_data *_get_td(void)
 {
@@ -61,9 +65,16 @@
 }
 
 
+long check_size(long size)
+{
+    assert(size >= sizeof(struct node_s));
+    assert(size <= sizeof(struct node_s) + 4096*70);
+    return size;
+}
+
 ssize_t stmcb_size_rounded_up(struct object_s *ob)
 {
-    return ((struct node_s*)ob)->my_size;
+    return check_size(((struct node_s*)ob)->my_size);
 }
 
 void stmcb_trace(struct object_s *obj, void visit(object_t **))
@@ -73,7 +84,8 @@
 
     /* and the same value at the end: */
     /* note, ->next may be the same as last_next */
-    nodeptr_t *last_next = (nodeptr_t*)((char*)n + n->my_size - sizeof(void*));
+    nodeptr_t *last_next = (nodeptr_t*)((char*)n + check_size(n->my_size)
+                                        - sizeof(void*));
 
     assert(n->next == *last_next);
 
@@ -193,7 +205,8 @@
         nodeptr_t n = (nodeptr_t)p;
 
         /* and the same value at the end: */
-        nodeptr_t TLPREFIX *last_next = (nodeptr_t TLPREFIX *)((stm_char*)n + n->my_size - sizeof(void*));
+        nodeptr_t TLPREFIX *last_next = (nodeptr_t TLPREFIX *)((stm_char*)n +
+                                       check_size(n->my_size) - sizeof(void*));
         assert(n->next == *last_next);
         n->next = (nodeptr_t)v;
         *last_next = (nodeptr_t)v;
@@ -205,7 +218,8 @@
     nodeptr_t n = (nodeptr_t)p;
 
     /* and the same value at the end: */
-    nodeptr_t TLPREFIX *last_next = (nodeptr_t TLPREFIX *)((stm_char*)n + n->my_size - sizeof(void*));
+    nodeptr_t TLPREFIX *last_next = (nodeptr_t TLPREFIX *)((stm_char*)n +
+                                       check_size(n->my_size) - sizeof(void*));
     OPT_ASSERT(n->next == *last_next);
 
     return n->next;
@@ -239,6 +253,7 @@
             sizeof(struct node_s)+32, sizeof(struct node_s)+48,
             sizeof(struct node_s) + (get_rand(100000) & ~15)};
         size_t size = sizes[get_rand(sizeof(sizes) / sizeof(size_t))];
+        size = check_size(size);
         p = stm_allocate(size);
         nodeptr_t n = (nodeptr_t)p;
         n->sig = SIGNATURE;
@@ -296,6 +311,16 @@
     return p;
 }
 
+static void end_gut(void)
+{
+    if (td.globally_unique != 0) {
+        fprintf(stderr, "[GUT END]");
+        assert(progress == td.globally_unique);
+        td.globally_unique = 0;
+        stm_resume_all_other_threads();
+    }
+}
+
 void frame_loop();
 objptr_t do_step(objptr_t p)
 {
@@ -309,13 +334,22 @@
         p = simple_events(p, _r);
     } else if (get_rand(20) == 1) {
         long pushed = push_roots();
-        stm_commit_transaction();
-        td.roots_on_ss_at_tr_start = td.roots_on_ss;
-
-        if (get_rand(100) < 98) {
-            stm_start_transaction(&stm_thread_local);
-        } else {
-            stm_start_inevitable_transaction(&stm_thread_local);
+        end_gut();
+        if (get_rand(100) < 95) {
+            stm_leave_transactional_zone(&stm_thread_local);
+            /* Nothing here; it's unlikely that a different thread
+               manages to steal the detached inev transaction.
+               Give them a little chance with a usleep(). */
+            dprintf(("sleep...\n"));
+            usleep(1);
+            dprintf(("sleep done\n"));
+            td.roots_on_ss_at_tr_start = td.roots_on_ss;
+            stm_enter_transactional_zone(&stm_thread_local);
+        }
+        else {
+            _stm_commit_transaction();
+            td.roots_on_ss_at_tr_start = td.roots_on_ss;
+            _stm_start_transaction(&stm_thread_local);
         }
         td.roots_on_ss = td.roots_on_ss_at_tr_start;
         td.active_roots_num = 0;
@@ -331,15 +365,21 @@
     } else if (get_rand(20) == 1) {
         long pushed = push_roots();
         stm_become_inevitable(&stm_thread_local, "please");
-        assert(stm_is_inevitable());
+        assert(stm_is_inevitable(&stm_thread_local));
         pop_roots(pushed);
         p= NULL;
     } else if (get_rand(20) == 1) {
         p = (objptr_t)-1; // possibly fork
-    } else if (get_rand(20) == 1) {
+    } else if (get_rand(100) == 1) {
         long pushed = push_roots();
-        stm_become_globally_unique_transaction(&stm_thread_local, "really");
-        fprintf(stderr, "[GUT/%d]", (int)STM_SEGMENT->segment_num);
+        if (td.globally_unique == 0) {
+            stm_stop_all_other_threads();
+            td.globally_unique = progress;
+            fprintf(stderr, "[GUT/%d]", (int)STM_SEGMENT->segment_num);
+        }
+        else {
+            end_gut();
+        }
         pop_roots(pushed);
         p = NULL;
     }
@@ -364,6 +404,8 @@
 
         p = do_step(p);
 
+        if (!td.globally_unique)
+            ++progress;   /* racy, but good enough */
 
         if (p == (objptr_t)-1) {
             p = NULL;
@@ -371,6 +413,7 @@
             long call_fork = (thread_may_fork != NULL && *(long *)thread_may_fork);
             if (call_fork) {   /* common case */
                 long pushed = push_roots();
+                end_gut();
                 /* run a fork() inside the transaction */
                 printf("==========   FORK  =========\n");
                 *(long*)thread_may_fork = 0;
@@ -426,7 +469,7 @@
     setup_thread();
 
     td.roots_on_ss_at_tr_start = 0;
-    stm_start_transaction(&stm_thread_local);
+    stm_enter_transactional_zone(&stm_thread_local);
     td.roots_on_ss = td.roots_on_ss_at_tr_start;
     td.active_roots_num = 0;
 
@@ -435,7 +478,8 @@
         frame_loop();
     }
 
-    stm_commit_transaction();
+    end_gut();
+    stm_leave_transactional_zone(&stm_thread_local);
 
     stm_rewind_jmp_leaveframe(&stm_thread_local, &rjbuf);
     stm_unregister_thread_local(&stm_thread_local);
diff --git a/c8/demo/demo_simple.c b/c8/demo/demo_simple.c
--- a/c8/demo/demo_simple.c
+++ b/c8/demo/demo_simple.c
@@ -70,18 +70,20 @@
 
     object_t *tmp;
     int i = 0;
+
+    stm_enter_transactional_zone(&stm_thread_local);
     while (i < ITERS) {
-        stm_start_transaction(&stm_thread_local);
         tl_counter++;
         if (i % 500 < 250)
             STM_PUSH_ROOT(stm_thread_local, stm_allocate(16));//gl_counter++;
         else
             STM_POP_ROOT(stm_thread_local, tmp);
-        stm_commit_transaction();
+        stm_force_transaction_break(&stm_thread_local);
         i++;
     }
 
     OPT_ASSERT(org == (char *)stm_thread_local.shadowstack);
+    stm_leave_transactional_zone(&stm_thread_local);
 
     stm_rewind_jmp_leaveframe(&stm_thread_local, &rjbuf);
     stm_unregister_thread_local(&stm_thread_local);
diff --git a/c8/demo/test_shadowstack.c b/c8/demo/test_shadowstack.c
--- a/c8/demo/test_shadowstack.c
+++ b/c8/demo/test_shadowstack.c
@@ -43,17 +43,16 @@
     stm_register_thread_local(&stm_thread_local);
     stm_rewind_jmp_enterframe(&stm_thread_local, &rjbuf);
 
-    stm_start_transaction(&stm_thread_local);
+    stm_enter_transactional_zone(&stm_thread_local);
     node_t *node = (node_t *)stm_allocate(sizeof(struct node_s));
     node->value = 129821;
     STM_PUSH_ROOT(stm_thread_local, node);
     STM_PUSH_ROOT(stm_thread_local, 333);  /* odd value */
-    stm_commit_transaction();
 
     /* now in a new transaction, pop the node off the shadowstack, but
        then do a major collection.  It should still be found by the
        tracing logic. */
-    stm_start_transaction(&stm_thread_local);
+    stm_force_transaction_break(&stm_thread_local);
     STM_POP_ROOT_RET(stm_thread_local);
     STM_POP_ROOT(stm_thread_local, node);
     assert(node->value == 129821);
diff --git a/c8/stm/atomic.h b/c8/stm/atomic.h
--- a/c8/stm/atomic.h
+++ b/c8/stm/atomic.h
@@ -24,15 +24,21 @@
 
 #if defined(__i386__) || defined(__amd64__)
 
-# define HAVE_FULL_EXCHANGE_INSN
   static inline void spin_loop(void) { asm("pause" : : : "memory"); }
   static inline void write_fence(void) { asm("" : : : "memory"); }
+/*# define atomic_exchange(ptr, old, new)  do {         \
+          (old) = __sync_lock_test_and_set(ptr, new);   \
+      } while (0)*/
 
 #else
 
   static inline void spin_loop(void) { asm("" : : : "memory"); }
   static inline void write_fence(void) { __sync_synchronize(); }
 
+/*# define atomic_exchange(ptr, old, new)  do {           \
+          (old) = *(ptr);                                 \
+      } while (UNLIKELY(!__sync_bool_compare_and_swap(ptr, old, new))); */
+
 #endif
 
 
diff --git a/c8/stm/core.c b/c8/stm/core.c
--- a/c8/stm/core.c
+++ b/c8/stm/core.c
@@ -324,10 +324,7 @@
     /* Don't check this 'cl'. This entry is already checked */
 
     if (STM_PSEGMENT->transaction_state == TS_INEVITABLE) {
-        //assert(first_cl->next == INEV_RUNNING);
-        /* the above assert may fail when running a major collection
-           while the commit of the inevitable transaction is in progress
-           and the element is already attached */
+        assert(first_cl->next == INEV_RUNNING);
         return true;
     }
 
@@ -496,11 +493,23 @@
 
 static void wait_for_other_inevitable(struct stm_commit_log_entry_s *old)
 {
+    intptr_t detached = fetch_detached_transaction();
+    if (detached != 0) {
+        commit_fetched_detached_transaction(detached);
+        return;
+    }
+
     timing_event(STM_SEGMENT->running_thread, STM_WAIT_OTHER_INEVITABLE);
 
     while (old->next == INEV_RUNNING && !safe_point_requested()) {
         spin_loop();
         usleep(10);    /* XXXXXX */
+
+        detached = fetch_detached_transaction();
+        if (detached != 0) {
+            commit_fetched_detached_transaction(detached);
+            break;
+        }
     }
     timing_event(STM_SEGMENT->running_thread, STM_WAIT_DONE);
 }
@@ -509,7 +518,8 @@
 static void readd_wb_executed_flags(void);
 static void check_all_write_barrier_flags(char *segbase, struct list_s *list);
 
-static void _validate_and_attach(struct stm_commit_log_entry_s *new)
+static bool _validate_and_attach(struct stm_commit_log_entry_s *new,
+                                 bool can_sleep)
 {
     struct stm_commit_log_entry_s *old;
 
@@ -571,6 +581,8 @@
             /* XXXXXX for now just sleep.  We should really ask to inev
                transaction to do the commit for us, and then we can
                continue running. */
+            if (!can_sleep)
+                return false;
             dprintf(("_validate_and_attach(%p) failed, "
                      "waiting for inevitable\n", new));
             wait_for_other_inevitable(old);
@@ -591,18 +603,17 @@
 
     if (is_commit) {
         /* compare with _validate_and_add_to_commit_log */
-        STM_PSEGMENT->transaction_state = TS_NONE;
-        STM_PSEGMENT->safe_point = SP_NO_TRANSACTION;
-
         list_clear(STM_PSEGMENT->modified_old_objects);
         STM_PSEGMENT->last_commit_log_entry = new;
         release_modification_lock_wr(STM_SEGMENT->segment_num);
     }
+    return true;
 }
 
-static void _validate_and_turn_inevitable(void)
+static bool _validate_and_turn_inevitable(bool can_sleep)
 {
-    _validate_and_attach((struct stm_commit_log_entry_s *)INEV_RUNNING);
+    return _validate_and_attach((struct stm_commit_log_entry_s *)INEV_RUNNING,
+                                can_sleep);
 }
 
 static void _validate_and_add_to_commit_log(void)
@@ -611,6 +622,8 @@
 
     new = _create_commit_log_entry();
     if (STM_PSEGMENT->transaction_state == TS_INEVITABLE) {
+        assert(_stm_detached_inevitable_from_thread == 0);  /* running it */
+
         old = STM_PSEGMENT->last_commit_log_entry;
         new->rev_num = old->rev_num + 1;
         OPT_ASSERT(old->next == INEV_RUNNING);
@@ -621,17 +634,18 @@
                                       STM_PSEGMENT->modified_old_objects);
 
         /* compare with _validate_and_attach: */
-        STM_PSEGMENT->transaction_state = TS_NONE;
-        STM_PSEGMENT->safe_point = SP_NO_TRANSACTION;
+        acquire_modification_lock_wr(STM_SEGMENT->segment_num);
         list_clear(STM_PSEGMENT->modified_old_objects);
         STM_PSEGMENT->last_commit_log_entry = new;
 
         /* do it: */
         bool yes = __sync_bool_compare_and_swap(&old->next, INEV_RUNNING, new);
         OPT_ASSERT(yes);
+
+        release_modification_lock_wr(STM_SEGMENT->segment_num);
     }
     else {
-        _validate_and_attach(new);
+        _validate_and_attach(new, /*can_sleep=*/true);
     }
 }
 
@@ -1123,7 +1137,7 @@
 
 
 
-static void _stm_start_transaction(stm_thread_local_t *tl)
+static void _do_start_transaction(stm_thread_local_t *tl)
 {
     assert(!_stm_in_transaction(tl));
 
@@ -1140,7 +1154,7 @@
 #endif
     STM_PSEGMENT->shadowstack_at_start_of_transaction = tl->shadowstack;
     STM_PSEGMENT->threadlocal_at_start_of_transaction = tl->thread_local_obj;
-
+    STM_PSEGMENT->total_throw_away_nursery = 0;
 
     assert(list_is_empty(STM_PSEGMENT->modified_old_objects));
     assert(list_is_empty(STM_PSEGMENT->large_overflow_objects));
@@ -1181,35 +1195,34 @@
     stm_validate();
 }
 
-long stm_start_transaction(stm_thread_local_t *tl)
+#ifdef STM_NO_AUTOMATIC_SETJMP
+static int did_abort = 0;
+#endif
+
+long _stm_start_transaction(stm_thread_local_t *tl)
 {
     s_mutex_lock();
 #ifdef STM_NO_AUTOMATIC_SETJMP
-    long repeat_count = 0;    /* test/support.py */
+    long repeat_count = did_abort;    /* test/support.py */
+    did_abort = 0;
 #else
     long repeat_count = stm_rewind_jmp_setjmp(tl);
 #endif
-    _stm_start_transaction(tl);
+    _do_start_transaction(tl);
+
+    if (repeat_count == 0) {  /* else, 'nursery_mark' was already set
+                                 in abort_data_structures_from_segment_num() */
+        STM_SEGMENT->nursery_mark = ((stm_char *)_stm_nursery_start +
+                                     stm_fill_mark_nursery_bytes);
+    }
     return repeat_count;
 }
 
-void stm_start_inevitable_transaction(stm_thread_local_t *tl)
-{
-    /* used to be more efficient, starting directly an inevitable transaction,
-       but there is no real point any more, I believe */
-    rewind_jmp_buf rjbuf;
-    stm_rewind_jmp_enterframe(tl, &rjbuf);
-
-    stm_start_transaction(tl);
-    stm_become_inevitable(tl, "start_inevitable_transaction");
-
-    stm_rewind_jmp_leaveframe(tl, &rjbuf);
-}
-
 #ifdef STM_NO_AUTOMATIC_SETJMP
 void _test_run_abort(stm_thread_local_t *tl) __attribute__((noreturn));
-int stm_is_inevitable(void)
+int stm_is_inevitable(stm_thread_local_t *tl)
 {
+    assert(STM_SEGMENT->running_thread == tl);
     switch (STM_PSEGMENT->transaction_state) {
     case TS_REGULAR: return 0;
     case TS_INEVITABLE: return 1;
@@ -1224,6 +1237,7 @@
 {
     stm_thread_local_t *tl = STM_SEGMENT->running_thread;
 
+    assert(_has_mutex());
     STM_PSEGMENT->safe_point = SP_NO_TRANSACTION;
     STM_PSEGMENT->transaction_state = TS_NONE;
 
@@ -1231,7 +1245,15 @@
     list_clear(STM_PSEGMENT->objects_pointing_to_nursery);
     list_clear(STM_PSEGMENT->old_objects_with_cards_set);
     list_clear(STM_PSEGMENT->large_overflow_objects);
-    timing_event(tl, event);
+    if (tl != NULL)
+        timing_event(tl, event);
+
+    /* If somebody is waiting for us to reach a safe point, we simply
+       signal it now and leave this transaction.  This should be enough
+       for synchronize_all_threads() to retry and notice that we are
+       no longer SP_RUNNING. */
+    if (STM_SEGMENT->nursery_end != NURSERY_END)
+        cond_signal(C_AT_SAFE_POINT);
 
     release_thread_segment(tl);
     /* cannot access STM_SEGMENT or STM_PSEGMENT from here ! */
@@ -1280,24 +1302,55 @@
 }
 
 
-void stm_commit_transaction(void)
+void _stm_commit_transaction(void)
+{
+    assert(STM_PSEGMENT->running_pthread == pthread_self());
+    _core_commit_transaction(/*external=*/ false);
+}
+
+static void _core_commit_transaction(bool external)
 {
     exec_local_finalizers();
 
     assert(!_has_mutex());
     assert(STM_PSEGMENT->safe_point == SP_RUNNING);
-    assert(STM_PSEGMENT->running_pthread == pthread_self());
+    assert(STM_PSEGMENT->transaction_state != TS_NONE);
+    if (globally_unique_transaction) {
+        stm_fatalerror("cannot commit between stm_stop_all_other_threads "
+                       "and stm_resume_all_other_threads");
+    }
 
-    dprintf(("> stm_commit_transaction()\n"));
-    minor_collection(1);
+    dprintf(("> stm_commit_transaction(external=%d)\n", (int)external));
+    minor_collection(/*commit=*/ true, external);
+    if (!external && is_major_collection_requested()) {
+        s_mutex_lock();
+        if (is_major_collection_requested()) {   /* if still true */
+            major_collection_with_mutex();
+        }
+        s_mutex_unlock();
+    }
 
     push_large_overflow_objects_to_other_segments();
     /* push before validate. otherwise they are reachable too early */
 
+    if (external) {
+        /* from this point on, unlink the original 'stm_thread_local_t *'
+           from its segment.  Better do it as soon as possible, because
+           other threads might be spin-looping, waiting for the -1 to
+           disappear. */
+        STM_SEGMENT->running_thread = NULL;
+        write_fence();
+        assert(_stm_detached_inevitable_from_thread == -1);
+        _stm_detached_inevitable_from_thread = 0;
+    }
+
     bool was_inev = STM_PSEGMENT->transaction_state == TS_INEVITABLE;
     _validate_and_add_to_commit_log();
 
-    stm_rewind_jmp_forget(STM_SEGMENT->running_thread);
+    if (!was_inev) {
+        assert(!external);
+        stm_rewind_jmp_forget(STM_SEGMENT->running_thread);
+    }
 
     /* XXX do we still need a s_mutex_lock() section here? */
     s_mutex_lock();
@@ -1314,23 +1367,9 @@
 
     invoke_and_clear_user_callbacks(0);   /* for commit */
 
-    /* >>>>> there may be a FORK() happening in the safepoint below <<<<<*/
-    enter_safe_point_if_requested();
-    assert(STM_SEGMENT->nursery_end == NURSERY_END);
-
-    /* if a major collection is required, do it here */
-    if (is_major_collection_requested()) {
-        major_collection_with_mutex();
-    }
-
-    _verify_cards_cleared_in_all_lists(get_priv_segment(STM_SEGMENT->segment_num));
-
-    if (globally_unique_transaction && was_inev) {
-        committed_globally_unique_transaction();
-    }
-
     /* done */
     stm_thread_local_t *tl = STM_SEGMENT->running_thread;
+    assert(external == (tl == NULL));
     _finish_transaction(STM_TRANSACTION_COMMIT);
     /* cannot access STM_SEGMENT or STM_PSEGMENT from here ! */
 
@@ -1338,7 +1377,8 @@
 
     /* between transactions, call finalizers. this will execute
        a transaction itself */
-    invoke_general_finalizers(tl);
+    if (tl != NULL)
+        invoke_general_finalizers(tl);
 }
 
 static void reset_modified_from_backup_copies(int segment_num)
@@ -1399,7 +1439,7 @@
 
     abort_finalizers(pseg);
 
-    long bytes_in_nursery = throw_away_nursery(pseg);
+    throw_away_nursery(pseg);
 
     /* clear CARD_MARKED on objs (don't care about CARD_MARKED_OLD) */
     LIST_FOREACH_R(pseg->old_objects_with_cards_set, object_t * /*item*/,
@@ -1433,7 +1473,26 @@
     assert(tl->shadowstack == pseg->shadowstack_at_start_of_transaction);
 #endif
     tl->thread_local_obj = pseg->threadlocal_at_start_of_transaction;
-    tl->last_abort__bytes_in_nursery = bytes_in_nursery;
+
+
+    /* Set the next nursery_mark: first compute the value that
+       nursery_mark must have had at the start of the aborted transaction */
+    stm_char *old_mark = pseg->pub.nursery_mark + pseg->total_throw_away_nursery;
+
+    /* This means that the limit, in terms of bytes, was: */
+    uintptr_t old_limit = old_mark - (stm_char *)_stm_nursery_start;
+
+    /* If 'total_throw_away_nursery' is smaller than old_limit, use that */
+    if (pseg->total_throw_away_nursery < old_limit)
+        old_limit = pseg->total_throw_away_nursery;
+
+    /* Now set the new limit to 90% of the old limit */
+    pseg->pub.nursery_mark = ((stm_char *)_stm_nursery_start +
+                              (uintptr_t)(old_limit * 0.9));
+
+#ifdef STM_NO_AUTOMATIC_SETJMP
+    did_abort = 1;
+#endif
 
     list_clear(pseg->objects_pointing_to_nursery);
     list_clear(pseg->old_objects_with_cards_set);
@@ -1502,36 +1561,40 @@
 
 void _stm_become_inevitable(const char *msg)
 {
-    if (STM_PSEGMENT->transaction_state == TS_REGULAR) {
+    assert(STM_PSEGMENT->transaction_state == TS_REGULAR);
+    _stm_collectable_safe_point();
+
+    if (msg != MSG_INEV_DONT_SLEEP) {
         dprintf(("become_inevitable: %s\n", msg));
-        _stm_collectable_safe_point();
         timing_become_inevitable();
-
-        _validate_and_turn_inevitable();
-        STM_PSEGMENT->transaction_state = TS_INEVITABLE;
-
-        stm_rewind_jmp_forget(STM_SEGMENT->running_thread);
-        invoke_and_clear_user_callbacks(0);   /* for commit */
+        _validate_and_turn_inevitable(/*can_sleep=*/true);
     }
     else {
-        assert(STM_PSEGMENT->transaction_state == TS_INEVITABLE);
+        if (!_validate_and_turn_inevitable(/*can_sleep=*/false))
+            return;
+        timing_become_inevitable();
     }
+    STM_PSEGMENT->transaction_state = TS_INEVITABLE;
+
+    stm_rewind_jmp_forget(STM_SEGMENT->running_thread);
+    invoke_and_clear_user_callbacks(0);   /* for commit */
 }
 
+#if 0
 void stm_become_globally_unique_transaction(stm_thread_local_t *tl,
                                             const char *msg)
 {
-    stm_become_inevitable(tl, msg);   /* may still abort */
+    stm_become_inevitable(tl, msg);
 
     s_mutex_lock();
     synchronize_all_threads(STOP_OTHERS_AND_BECOME_GLOBALLY_UNIQUE);
     s_mutex_unlock();
 }
-
+#endif
 
 void stm_stop_all_other_threads(void)
 {
-    if (!stm_is_inevitable())         /* may still abort */
+    if (!stm_is_inevitable(STM_SEGMENT->running_thread))  /* may still abort */
         _stm_become_inevitable("stop_all_other_threads");
 
     s_mutex_lock();
diff --git a/c8/stm/core.h b/c8/stm/core.h
--- a/c8/stm/core.h
+++ b/c8/stm/core.h
@@ -152,6 +152,9 @@
     stm_char *sq_fragments[SYNC_QUEUE_SIZE];
     int sq_fragsizes[SYNC_QUEUE_SIZE];
     int sq_len;
+
+    /* For nursery_mark */
+    uintptr_t total_throw_away_nursery;
 };
 
 enum /* safe_point */ {
@@ -170,6 +173,8 @@
     TS_INEVITABLE,
 };
 
+#define MSG_INEV_DONT_SLEEP  ((const char *)1)
+
 #define in_transaction(tl)                                              \
     (get_segment((tl)->last_associated_segment_num)->running_thread == (tl))
 
@@ -297,6 +302,7 @@
 
 static void _signal_handler(int sig, siginfo_t *siginfo, void *context);
 static bool _stm_validate(void);
+static void _core_commit_transaction(bool external);
 
 static inline bool was_read_remote(char *base, object_t *obj)
 {
diff --git a/c8/stm/detach.c b/c8/stm/detach.c
new file mode 100644
--- /dev/null
+++ b/c8/stm/detach.c
@@ -0,0 +1,175 @@
+#ifndef _STM_CORE_H_
+# error "must be compiled via stmgc.c"
+#endif
+
+#include <errno.h>
+
+
+/* Idea: if stm_leave_transactional_zone() is quickly followed by
+   stm_enter_transactional_zone() in the same thread, then we should
+   simply try to have one inevitable transaction that does both sides.
+   This is useful if there are many such small interruptions.
+
+   stm_leave_transactional_zone() tries to make sure the transaction
+   is inevitable, and then sticks the current 'stm_thread_local_t *'
+   into _stm_detached_inevitable_from_thread.
+   stm_enter_transactional_zone() has a fast-path if the same
+   'stm_thread_local_t *' is still there.
+
+   If a different thread grabs it, it atomically replaces the value in
+   _stm_detached_inevitable_from_thread with -1, commits it (this part
+   involves reading for example the shadowstack of the thread that
+   originally detached), and at the point where we know the original
+   stm_thread_local_t is no longer relevant, we reset
+   _stm_detached_inevitable_from_thread to 0.
+*/
+
+volatile intptr_t _stm_detached_inevitable_from_thread;
+
+
+static void setup_detach(void)
+{
+    _stm_detached_inevitable_from_thread = 0;
+}
+
+
+void _stm_leave_noninevitable_transactional_zone(void)
+{
+    int saved_errno = errno;
+    dprintf(("leave_noninevitable_transactional_zone\n"));
+    _stm_become_inevitable(MSG_INEV_DONT_SLEEP);
+
+    /* did it work? */
+    if (STM_PSEGMENT->transaction_state == TS_INEVITABLE) {   /* yes */
+        dprintf(("leave_noninevitable_transactional_zone: now inevitable\n"));
+        stm_thread_local_t *tl = STM_SEGMENT->running_thread;
+        _stm_detach_inevitable_transaction(tl);
+    }
+    else {   /* no */
+        dprintf(("leave_noninevitable_transactional_zone: commit\n"));
+        _stm_commit_transaction();
+    }
+    errno = saved_errno;
+}
+
+static void commit_external_inevitable_transaction(void)
+{
+    assert(STM_PSEGMENT->transaction_state == TS_INEVITABLE); /* can't abort */
+    _core_commit_transaction(/*external=*/ true);
+}
+
+void _stm_reattach_transaction(stm_thread_local_t *tl)
+{
+    intptr_t old;
+    int saved_errno = errno;
+ restart:
+    old = _stm_detached_inevitable_from_thread;
+    if (old != 0) {
+        if (old == -1) {
+            /* busy-loop: wait until _stm_detached_inevitable_from_thread
+               is reset to a value different from -1 */
+            dprintf(("reattach_transaction: busy wait...\n"));
+            while (_stm_detached_inevitable_from_thread == -1)
+                spin_loop();
+
+            /* then retry */
+            goto restart;
+        }
+
+        if (!__sync_bool_compare_and_swap(&_stm_detached_inevitable_from_thread,
+                                          old, -1))
+            goto restart;
+
+        stm_thread_local_t *old_tl = (stm_thread_local_t *)old;
+        int remote_seg_num = old_tl->last_associated_segment_num;
+        dprintf(("reattach_transaction: commit detached from seg %d\n",
+                 remote_seg_num));
+
+        tl->last_associated_segment_num = remote_seg_num;
+        ensure_gs_register(remote_seg_num);
+        commit_external_inevitable_transaction();
+    }
+    dprintf(("reattach_transaction: start a new transaction\n"));
+    _stm_start_transaction(tl);
+    errno = saved_errno;
+}
+
+void stm_force_transaction_break(stm_thread_local_t *tl)
+{
+    dprintf(("> stm_force_transaction_break()\n"));
+    assert(STM_SEGMENT->running_thread == tl);
+    _stm_commit_transaction();
+    _stm_start_transaction(tl);
+}
+
+static intptr_t fetch_detached_transaction(void)
+{
+    intptr_t cur;
+ restart:
+    cur = _stm_detached_inevitable_from_thread;
+    if (cur == 0) {    /* fast-path */
+        return 0;   /* _stm_detached_inevitable_from_thread not changed */
+    }
+    if (cur == -1) {
+        /* busy-loop: wait until _stm_detached_inevitable_from_thread
+           is reset to a value different from -1 */
+        while (_stm_detached_inevitable_from_thread == -1)
+            spin_loop();
+        goto restart;
+    }
+    if (!__sync_bool_compare_and_swap(&_stm_detached_inevitable_from_thread,
+                                      cur, -1))
+        goto restart;
+
+    /* this is the only case where we grabbed a detached transaction.
+       _stm_detached_inevitable_from_thread is still -1, until
+       commit_fetched_detached_transaction() is called. */
+    assert(_stm_detached_inevitable_from_thread == -1);
+    return cur;
+}
+
+static void commit_fetched_detached_transaction(intptr_t old)
+{
+    /* Here, 'seg_num' is the segment that contains the detached
+       inevitable transaction from fetch_detached_transaction(),
+       probably belonging to an unrelated thread.  We fetched it,
+       which means that nobody else can concurrently fetch it now, but
+       everybody will see that there is still a concurrent inevitable
+       transaction.  This should guarantee there are no race
+       conditions.
+    */
+    int mysegnum = STM_SEGMENT->segment_num;
+    int segnum = ((stm_thread_local_t *)old)->last_associated_segment_num;
+    dprintf(("commit_fetched_detached_transaction from seg %d\n", segnum));
+    assert(segnum > 0);
+
+    if (segnum != mysegnum) {
+        set_gs_register(get_segment_base(segnum));
+    }
+    commit_external_inevitable_transaction();
+
+    if (segnum != mysegnum) {
+        set_gs_register(get_segment_base(mysegnum));
+    }
+}
+
+static void commit_detached_transaction_if_from(stm_thread_local_t *tl)
+{
+    intptr_t old;
+ restart:
+    old = _stm_detached_inevitable_from_thread;
+    if (old == (intptr_t)tl) {
+        if (!__sync_bool_compare_and_swap(&_stm_detached_inevitable_from_thread,
+                                          old, -1))
+            goto restart;
+        commit_fetched_detached_transaction(old);
+        return;
+    }
+    if (old == -1) {
+        /* busy-loop: wait until _stm_detached_inevitable_from_thread
+           is reset to a value different from -1 */
+        while (_stm_detached_inevitable_from_thread == -1)
+            spin_loop();
+        goto restart;
+    }
+}
diff --git a/c8/stm/detach.h b/c8/stm/detach.h
new file mode 100644
--- /dev/null
+++ b/c8/stm/detach.h
@@ -0,0 +1,5 @@
+
+static void setup_detach(void);
+static intptr_t fetch_detached_transaction(void);
+static void commit_fetched_detached_transaction(intptr_t old);
+static void commit_detached_transaction_if_from(stm_thread_local_t *tl);
diff --git a/c8/stm/finalizer.c b/c8/stm/finalizer.c
--- a/c8/stm/finalizer.c
+++ b/c8/stm/finalizer.c
@@ -494,11 +494,11 @@
 
     rewind_jmp_buf rjbuf;
     stm_rewind_jmp_enterframe(tl, &rjbuf);
-    stm_start_transaction(tl);
+    _stm_start_transaction(tl);
 
     _execute_finalizers(&g_finalizers);
 
-    stm_commit_transaction();
+    _stm_commit_transaction();
     stm_rewind_jmp_leaveframe(tl, &rjbuf);
 
     __sync_lock_release(&lock);
diff --git a/c8/stm/forksupport.c b/c8/stm/forksupport.c
--- a/c8/stm/forksupport.c
+++ b/c8/stm/forksupport.c
@@ -40,7 +40,7 @@
 
     bool was_in_transaction = _stm_in_transaction(this_tl);
     if (!was_in_transaction)
-        stm_start_transaction(this_tl);
+        _stm_start_transaction(this_tl);
     assert(in_transaction(this_tl));
 
     stm_become_inevitable(this_tl, "fork");
@@ -73,7 +73,7 @@
     s_mutex_unlock();
 
     if (!was_in_transaction) {
-        stm_commit_transaction();
+        _stm_commit_transaction();
     }
 
     dprintf(("forksupport_parent: continuing to run\n"));
@@ -159,7 +159,7 @@
     assert(STM_SEGMENT->segment_num == segnum);
 
     if (!fork_was_in_transaction) {
-        stm_commit_transaction();
+        _stm_commit_transaction();
     }
 
     /* Done */
diff --git a/c8/stm/fprintcolor.h b/c8/stm/fprintcolor.h
--- a/c8/stm/fprintcolor.h
+++ b/c8/stm/fprintcolor.h
@@ -37,5 +37,6 @@
 /* ------------------------------------------------------------ */
 
 
+__attribute__((unused))
 static void stm_fatalerror(const char *format, ...)
      __attribute__((format (printf, 1, 2), noreturn));
diff --git a/c8/stm/nursery.c b/c8/stm/nursery.c
--- a/c8/stm/nursery.c
+++ b/c8/stm/nursery.c
@@ -11,8 +11,13 @@
 static uintptr_t _stm_nursery_start;
 
 
+#define DEFAULT_FILL_MARK_NURSERY_BYTES   (NURSERY_SIZE / 4)
+
+uintptr_t stm_fill_mark_nursery_bytes = DEFAULT_FILL_MARK_NURSERY_BYTES;
+
 /************************************************************/
 
+
 static void setup_nursery(void)
 {
     assert(_STM_FAST_ALLOC <= NURSERY_SIZE);
@@ -309,6 +314,7 @@
     else
         assert(finalbase <= ssbase && ssbase <= current);
 
+    dprintf(("collect_roots_in_nursery:\n"));
     while (current > ssbase) {
         --current;
         uintptr_t x = (uintptr_t)current->ss;
@@ -320,6 +326,7 @@
         else {
             /* it is an odd-valued marker, ignore */
         }
+        dprintf(("    %p: %p -> %p\n", current, (void *)x, current->ss));
     }
 
     minor_trace_if_young(&tl->thread_local_obj);
@@ -447,7 +454,7 @@
 }
 
 
-static size_t throw_away_nursery(struct stm_priv_segment_info_s *pseg)
+static void throw_away_nursery(struct stm_priv_segment_info_s *pseg)
 {
 #pragma push_macro("STM_PSEGMENT")
 #pragma push_macro("STM_SEGMENT")
@@ -480,7 +487,9 @@
 # endif
 #endif
 
+    pseg->total_throw_away_nursery += nursery_used;
     pseg->pub.nursery_current = (stm_char *)_stm_nursery_start;
+    pseg->pub.nursery_mark -= nursery_used;
 
     /* free any object left from 'young_outside_nursery' */
     if (!tree_is_cleared(pseg->young_outside_nursery)) {
@@ -505,8 +514,6 @@
     }
 
     tree_clear(pseg->nursery_objects_shadows);
-
-    return nursery_used;
 #pragma pop_macro("STM_SEGMENT")
 #pragma pop_macro("STM_PSEGMENT")
 }
@@ -519,6 +526,7 @@
 static void _do_minor_collection(bool commit)
 {
     dprintf(("minor_collection commit=%d\n", (int)commit));
+    assert(!STM_SEGMENT->no_safe_point_here);
 
     STM_PSEGMENT->minor_collect_will_commit_now = commit;
 
@@ -561,11 +569,12 @@
     assert(MINOR_NOTHING_TO_DO(STM_PSEGMENT));
 }
 
-static void minor_collection(bool commit)
+static void minor_collection(bool commit, bool external)
 {
     assert(!_has_mutex());
 
-    stm_safe_point();
+    if (!external)
+        stm_safe_point();
 
     timing_event(STM_SEGMENT->running_thread, STM_GC_MINOR_START);
 
@@ -579,7 +588,7 @@
     if (level > 0)
         force_major_collection_request();
 
-    minor_collection(/*commit=*/ false);
+    minor_collection(/*commit=*/ false, /*external=*/ false);
 
 #ifdef STM_TESTS
     /* tests don't want aborts in stm_allocate, thus
diff --git a/c8/stm/nursery.h b/c8/stm/nursery.h
--- a/c8/stm/nursery.h
+++ b/c8/stm/nursery.h
@@ -10,9 +10,9 @@
                                 object_t *obj, uint8_t mark_value,
                                 bool mark_all, bool really_clear);
 
-static void minor_collection(bool commit);
+static void minor_collection(bool commit, bool external);
 static void check_nursery_at_transaction_start(void);
-static size_t throw_away_nursery(struct stm_priv_segment_info_s *pseg);
+static void throw_away_nursery(struct stm_priv_segment_info_s *pseg);
 static void major_do_validation_and_minor_collections(void);
 
 static void assert_memset_zero(void *s, size_t n);
diff --git a/c8/stm/setup.c b/c8/stm/setup.c
--- a/c8/stm/setup.c
+++ b/c8/stm/setup.c
@@ -134,8 +134,12 @@
     setup_pages();
     setup_forksupport();
     setup_finalizer();
+    setup_detach();
 
     set_gs_register(get_segment_base(0));
+
+    dprintf(("nursery: %p -> %p\n", (void *)NURSERY_START,
+                                    (void *)NURSERY_END));
 }
 
 void stm_teardown(void)
@@ -229,6 +233,8 @@
 {
     int num;
     s_mutex_lock();
+    tl->self = tl;    /* for faster access to &stm_thread_local (and easier
+                         from the PyPy JIT, too) */
     if (stm_all_thread_locals == NULL) {
         stm_all_thread_locals = tl->next = tl->prev = tl;
         num = 0;
@@ -263,6 +269,8 @@
 
 void stm_unregister_thread_local(stm_thread_local_t *tl)
 {
+    commit_detached_transaction_if_from(tl);
+
     s_mutex_lock();
     assert(tl->prev != NULL);
     assert(tl->next != NULL);
diff --git a/c8/stm/sync.c b/c8/stm/sync.c
--- a/c8/stm/sync.c
+++ b/c8/stm/sync.c
@@ -1,6 +1,7 @@
 #include <sys/syscall.h>
 #include <sys/prctl.h>
 #include <asm/prctl.h>
+#include <time.h>
 
 #ifndef _STM_CORE_H_
 # error "must be compiled via stmgc.c"
@@ -21,25 +22,29 @@
 
 static void setup_sync(void)
 {
-    if (pthread_mutex_init(&sync_ctl.global_mutex, NULL) != 0)
-        stm_fatalerror("mutex initialization: %m");
+    int err = pthread_mutex_init(&sync_ctl.global_mutex, NULL);
+    if (err != 0)
+        stm_fatalerror("mutex initialization: %d", err);
 
     long i;
     for (i = 0; i < _C_TOTAL; i++) {
-        if (pthread_cond_init(&sync_ctl.cond[i], NULL) != 0)
-            stm_fatalerror("cond initialization: %m");
+        err = pthread_cond_init(&sync_ctl.cond[i], NULL);
+        if (err != 0)
+            stm_fatalerror("cond initialization: %d", err);
     }
 }
 
 static void teardown_sync(void)
 {
-    if (pthread_mutex_destroy(&sync_ctl.global_mutex) != 0)
-        stm_fatalerror("mutex destroy: %m");
+    int err = pthread_mutex_destroy(&sync_ctl.global_mutex);
+    if (err != 0)
+        stm_fatalerror("mutex destroy: %d", err);
 
     long i;
     for (i = 0; i < _C_TOTAL; i++) {
-        if (pthread_cond_destroy(&sync_ctl.cond[i]) != 0)
-            stm_fatalerror("cond destroy: %m");
+        err = pthread_cond_destroy(&sync_ctl.cond[i]);
+        if (err != 0)
+            stm_fatalerror("cond destroy: %d", err);
     }
 
     memset(&sync_ctl, 0, sizeof(sync_ctl));
@@ -59,19 +64,30 @@
         stm_fatalerror("syscall(arch_prctl, ARCH_SET_GS): %m");
 }
 
+static void ensure_gs_register(long segnum)
+{
+    /* XXX use this instead of set_gs_register() in many places */
+    if (STM_SEGMENT->segment_num != segnum) {
+        set_gs_register(get_segment_base(segnum));
+        assert(STM_SEGMENT->segment_num == segnum);
+    }
+}
+
 static inline void s_mutex_lock(void)
 {
     assert(!_has_mutex_here);
-    if (UNLIKELY(pthread_mutex_lock(&sync_ctl.global_mutex) != 0))
-        stm_fatalerror("pthread_mutex_lock: %m");
+    int err = pthread_mutex_lock(&sync_ctl.global_mutex);
+    if (UNLIKELY(err != 0))
+        stm_fatalerror("pthread_mutex_lock: %d", err);
     assert((_has_mutex_here = true, 1));
 }
 
 static inline void s_mutex_unlock(void)
 {
     assert(_has_mutex_here);
-    if (UNLIKELY(pthread_mutex_unlock(&sync_ctl.global_mutex) != 0))
-        stm_fatalerror("pthread_mutex_unlock: %m");
+    int err = pthread_mutex_unlock(&sync_ctl.global_mutex);
+    if (UNLIKELY(err != 0))
+        stm_fatalerror("pthread_mutex_unlock: %d", err);
     assert((_has_mutex_here = false, 1));
 }
 
@@ -83,26 +99,70 @@
 #endif
 
     assert(_has_mutex_here);
-    if (UNLIKELY(pthread_cond_wait(&sync_ctl.cond[ctype],
-                                   &sync_ctl.global_mutex) != 0))
-        stm_fatalerror("pthread_cond_wait/%d: %m", (int)ctype);
+    int err = pthread_cond_wait(&sync_ctl.cond[ctype],
+                                &sync_ctl.global_mutex);
+    if (UNLIKELY(err != 0))
+        stm_fatalerror("pthread_cond_wait/%d: %d", (int)ctype, err);
+}
+
+static inline void timespec_delay(struct timespec *t, double incr)
+{
+#ifdef CLOCK_REALTIME
+    clock_gettime(CLOCK_REALTIME, t);
+#else
+    struct timeval tv;
+    RPY_GETTIMEOFDAY(&tv);
+    t->tv_sec = tv.tv_sec;
+    t->tv_nsec = tv.tv_usec * 1000 + 999;
+#endif
+    /* assumes that "incr" is not too large, less than 1 second */
+    long nsec = t->tv_nsec + (long)(incr * 1000000000.0);
+    if (nsec >= 1000000000) {
+        t->tv_sec += 1;
+        nsec -= 1000000000;
+        assert(nsec < 1000000000);
+    }
+    t->tv_nsec = nsec;
+}
+
+static inline bool cond_wait_timeout(enum cond_type_e ctype, double delay)
+{
+#ifdef STM_NO_COND_WAIT
+    stm_fatalerror("*** cond_wait/%d called!", (int)ctype);
+#endif
+
+    assert(_has_mutex_here);
+
+    struct timespec t;
+    timespec_delay(&t, delay);
+
+    int err = pthread_cond_timedwait(&sync_ctl.cond[ctype],
+                                     &sync_ctl.global_mutex, &t);
+    if (err == 0)
+        return true;     /* success */
+    if (LIKELY(err == ETIMEDOUT))
+        return false;    /* timeout */
+    stm_fatalerror("pthread_cond_timedwait/%d: %d", (int)ctype, err);
 }
 
 static inline void cond_signal(enum cond_type_e ctype)
 {
-    if (UNLIKELY(pthread_cond_signal(&sync_ctl.cond[ctype]) != 0))
-        stm_fatalerror("pthread_cond_signal/%d: %m", (int)ctype);
+    int err = pthread_cond_signal(&sync_ctl.cond[ctype]);
+    if (UNLIKELY(err != 0))
+        stm_fatalerror("pthread_cond_signal/%d: %d", (int)ctype, err);
 }
 
 static inline void cond_broadcast(enum cond_type_e ctype)
 {
-    if (UNLIKELY(pthread_cond_broadcast(&sync_ctl.cond[ctype]) != 0))
-        stm_fatalerror("pthread_cond_broadcast/%d: %m", (int)ctype);
+    int err = pthread_cond_broadcast(&sync_ctl.cond[ctype]);
+    if (UNLIKELY(err != 0))
+        stm_fatalerror("pthread_cond_broadcast/%d: %d", (int)ctype, err);
 }
 
 /************************************************************/
 
 
+#if 0
 void stm_wait_for_current_inevitable_transaction(void)
 {
  restart:
@@ -125,7 +185,7 @@
     }
     s_mutex_unlock();
 }
-
+#endif
 
 
 static bool acquire_thread_segment(stm_thread_local_t *tl)
@@ -155,10 +215,12 @@
         num = (num+1) % (NB_SEGMENTS-1);
         if (sync_ctl.in_use1[num+1] == 0) {
             /* we're getting 'num', a different number. */
-            dprintf(("acquired different segment: %d->%d\n",
-                     tl->last_associated_segment_num, num+1));
+            int old_num = tl->last_associated_segment_num;
+            dprintf(("acquired different segment: %d->%d\n", old_num, num+1));
             tl->last_associated_segment_num = num+1;
             set_gs_register(get_segment_base(num+1));
+            dprintf(("                            %d->%d\n", old_num, num+1));
+            (void)old_num;
             goto got_num;
         }
     }
@@ -185,18 +247,22 @@
 
 static void release_thread_segment(stm_thread_local_t *tl)
 {
+    int segnum;
     assert(_has_mutex());
 
     cond_signal(C_SEGMENT_FREE);
 
     assert(STM_SEGMENT->running_thread == tl);
-    assert(tl->last_associated_segment_num == STM_SEGMENT->segment_num);
-    assert(in_transaction(tl));
-    STM_SEGMENT->running_thread = NULL;
-    assert(!in_transaction(tl));
+    segnum = STM_SEGMENT->segment_num;
+    if (tl != NULL) {
+        assert(tl->last_associated_segment_num == segnum);
+        assert(in_transaction(tl));
+        STM_SEGMENT->running_thread = NULL;
+        assert(!in_transaction(tl));
+    }
 
-    assert(sync_ctl.in_use1[tl->last_associated_segment_num] == 1);
-    sync_ctl.in_use1[tl->last_associated_segment_num] = 0;
+    assert(sync_ctl.in_use1[segnum] == 1);
+    sync_ctl.in_use1[segnum] = 0;
 }
 
 __attribute__((unused))
@@ -263,16 +329,19 @@
     }
     assert(!pause_signalled);
     pause_signalled = true;
+    dprintf(("request to pause\n"));
 }
 
 static inline long count_other_threads_sp_running(void)
 {
     /* Return the number of other threads in SP_RUNNING.
-       Asserts that SP_RUNNING threads still have the NSE_SIGxxx. */
+       Asserts that SP_RUNNING threads still have the NSE_SIGxxx.
+       (A detached inevitable transaction is still SP_RUNNING.) */
     long i;
     long result = 0;
-    int my_num = STM_SEGMENT->segment_num;
+    int my_num;
 
+    my_num = STM_SEGMENT->segment_num;
     for (i = 1; i < NB_SEGMENTS; i++) {
         if (i != my_num && get_priv_segment(i)->safe_point == SP_RUNNING) {
             assert(get_segment(i)->nursery_end <= _STM_NSE_SIGNAL_MAX);
@@ -295,6 +364,7 @@
         if (get_segment(i)->nursery_end == NSE_SIGPAUSE)
             get_segment(i)->nursery_end = NURSERY_END;
     }
+    dprintf(("request removed\n"));
     cond_broadcast(C_REQUEST_REMOVED);
 }
 
@@ -312,6 +382,8 @@
         if (STM_SEGMENT->nursery_end == NURSERY_END)
             break;    /* no safe point requested */
 
+        dprintf(("enter safe point\n"));
+        assert(!STM_SEGMENT->no_safe_point_here);
         assert(STM_SEGMENT->nursery_end == NSE_SIGPAUSE);
         assert(pause_signalled);
 
@@ -326,11 +398,15 @@
         cond_wait(C_REQUEST_REMOVED);
         STM_PSEGMENT->safe_point = SP_RUNNING;
         timing_event(STM_SEGMENT->running_thread, STM_WAIT_DONE);
+        assert(!STM_SEGMENT->no_safe_point_here);
+        dprintf(("left safe point\n"));
     }
 }
 
 static void synchronize_all_threads(enum sync_type_e sync_type)
 {
+ restart:
+    assert(_has_mutex());
     enter_safe_point_if_requested();
 
     /* Only one thread should reach this point concurrently.  This is
@@ -349,8 +425,19 @@
     /* If some other threads are SP_RUNNING, we cannot proceed now.
        Wait until all other threads are suspended. */
     while (count_other_threads_sp_running() > 0) {
+
+        intptr_t detached = fetch_detached_transaction();
+        if (detached != 0) {
+            remove_requests_for_safe_point();    /* => C_REQUEST_REMOVED */
+            s_mutex_unlock();
+            commit_fetched_detached_transaction(detached);
+            s_mutex_lock();
+            goto restart;
+        }
+
         STM_PSEGMENT->safe_point = SP_WAIT_FOR_C_AT_SAFE_POINT;
-        cond_wait(C_AT_SAFE_POINT);
+        cond_wait_timeout(C_AT_SAFE_POINT, 0.00001);
+        /* every 10 microseconds, retry fetch_detached_transaction() */
         STM_PSEGMENT->safe_point = SP_RUNNING;
 
         if (must_abort()) {
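
fetch_detached_transaction() and commit_fetched_detached_transaction() are
defined in the new c8/stm/detach.c (truncated out of this mail).  The loop
above relies on one property only: a detached inevitable transaction is
claimed atomically, so exactly one thread ends up committing it.  A
hypothetical sketch of that contract, which the real code may implement
differently:

    /* sketch only; see c8/stm/detach.c for the real implementation */
    static intptr_t fetch_detached_transaction(void)
    {
        intptr_t d = _stm_detached_inevitable_from_thread;
        if (d == 0)
            return 0;       /* no detached transaction right now */
        if (!__sync_bool_compare_and_swap(
                  &_stm_detached_inevitable_from_thread, d, 0))
            return 0;       /* another thread claimed it first */
        return d;           /* claimed: the caller must commit it */
    }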
diff --git a/c8/stm/sync.h b/c8/stm/sync.h
--- a/c8/stm/sync.h
+++ b/c8/stm/sync.h
@@ -17,6 +17,7 @@
 static bool _has_mutex(void);
 #endif
 static void set_gs_register(char *value);
+static void ensure_gs_register(long segnum);
 
 
 /* acquire and release one of the segments for running the given thread
diff --git a/c8/stmgc.c b/c8/stmgc.c
--- a/c8/stmgc.c
+++ b/c8/stmgc.c
@@ -18,6 +18,7 @@
 #include "stm/rewind_setjmp.h"
 #include "stm/finalizer.h"
 #include "stm/locks.h"
+#include "stm/detach.h"
 
 #include "stm/misc.c"
 #include "stm/list.c"
@@ -41,3 +42,4 @@
 #include "stm/rewind_setjmp.c"
 #include "stm/finalizer.c"
 #include "stm/hashtable.c"
+#include "stm/detach.c"
diff --git a/c8/stmgc.h b/c8/stmgc.h
--- a/c8/stmgc.h
+++ b/c8/stmgc.h
@@ -13,6 +13,7 @@
 #include <limits.h>
 #include <unistd.h>
 
+#include "stm/atomic.h"
 #include "stm/rewind_setjmp.h"
 
 #if LONG_MAX == 2147483647
@@ -39,9 +40,11 @@
 
 struct stm_segment_info_s {
     uint8_t transaction_read_version;
+    uint8_t no_safe_point_here;    /* set from outside, triggers an assert */
     int segment_num;
     char *segment_base;
     stm_char *nursery_current;
+    stm_char *nursery_mark;
     uintptr_t nursery_end;
     struct stm_thread_local_s *running_thread;
 };
@@ -65,13 +68,10 @@
        the following raw region of memory is cleared. */
     char *mem_clear_on_abort;
     size_t mem_bytes_to_clear_on_abort;
-    /* after an abort, some details about the abort are stored there.
-       (this field is not modified on a successful commit) */
-    long last_abort__bytes_in_nursery;
     /* the next fields are handled internally by the library */
     int last_associated_segment_num;   /* always a valid seg num */
     int thread_local_counter;
-    struct stm_thread_local_s *prev, *next;
+    struct stm_thread_local_s *self, *prev, *next;
     void *creating_pthread[2];
 } stm_thread_local_t;
 
@@ -82,6 +82,17 @@
 void _stm_write_slowpath_card(object_t *, uintptr_t);
 object_t *_stm_allocate_slowpath(ssize_t);
 object_t *_stm_allocate_external(ssize_t);
+
+extern volatile intptr_t _stm_detached_inevitable_from_thread;
+long _stm_start_transaction(stm_thread_local_t *tl);
+void _stm_commit_transaction(void);
+void _stm_leave_noninevitable_transactional_zone(void);
+#define _stm_detach_inevitable_transaction(tl)  do {                    \
+    write_fence();                                                      \
+    assert(_stm_detached_inevitable_from_thread == 0);                  \
+    _stm_detached_inevitable_from_thread = (intptr_t)(tl->self);        \
+} while (0)
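
(Note on the macro above: the write_fence() orders the transaction's
pending writes before the store that publishes 'tl', so whichever thread
later claims the detached transaction is guaranteed to see those writes.)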
+void _stm_reattach_transaction(stm_thread_local_t *tl);
 void _stm_become_inevitable(const char*);
 void _stm_collectable_safe_point(void);
 
@@ -379,39 +390,92 @@
     rewind_jmp_enum_shadowstack(&(tl)->rjthread, callback)
 
 
-/* Starting and ending transactions.  stm_read(), stm_write() and
-   stm_allocate() should only be called from within a transaction.
-   The stm_start_transaction() call returns the number of times it
-   returned, starting at 0.  If it is > 0, then the transaction was
-   aborted and restarted this number of times. */
-long stm_start_transaction(stm_thread_local_t *tl);
-void stm_start_inevitable_transaction(stm_thread_local_t *tl);
-void stm_commit_transaction(void);
+#ifdef STM_NO_AUTOMATIC_SETJMP
+int stm_is_inevitable(stm_thread_local_t *tl);
+#else
+static inline int stm_is_inevitable(stm_thread_local_t *tl) {
+    return !rewind_jmp_armed(&tl->rjthread);
+}
+#endif
 
-/* Temporary fix?  Call this outside a transaction.  If there is an
-   inevitable transaction running somewhere else, wait until it finishes. */
-void stm_wait_for_current_inevitable_transaction(void);
+
+/* Entering and leaving a "transactional code zone": a (typically very
+   large) section in the code where we are running a transaction.
+   This is the STM equivalent to "acquire the GIL" and "release the
+   GIL", respectively.  stm_read(), stm_write(), stm_allocate(), and
+   other functions should only be called from within a transaction.
+
+   Note that transactions, in the STM sense, cover _at least_ one
+   transactional code zone.  They may be longer; for example, if one
+   thread does a lot of stm_enter_transactional_zone() +
+   stm_become_inevitable() + stm_leave_transactional_zone(), as is
+   typical in a thread that does a lot of C function calls, then we
+   get only a few bigger inevitable transactions that cover the many
+   short transactional zones.  This is done by having
+   stm_leave_transactional_zone() turn the current transaction
+   inevitable and detach it from the running thread (if there is no
+   other inevitable transaction running so far).  Then
+   stm_enter_transactional_zone() will try to reattach to it.  This is
+   far more efficient than constantly starting and committing
+   transactions.
+
+   stm_enter_transactional_zone() and stm_leave_transactional_zone()
+   preserve the value of errno.
+*/
+#ifdef STM_DEBUGPRINT
+#include <stdio.h>
+#endif
+static inline void stm_enter_transactional_zone(stm_thread_local_t *tl) {
+    if (__sync_bool_compare_and_swap(&_stm_detached_inevitable_from_thread,
+                                     (intptr_t)tl, 0)) {
+#ifdef STM_DEBUGPRINT
+        fprintf(stderr, "stm_enter_transactional_zone fast path\n");
+#endif
+    }
+    else {
+        _stm_reattach_transaction(tl);
+        /* _stm_detached_inevitable_from_thread should be 0 here, but
+           it may already have been changed again by a parallel thread
+           (assuming we're not inevitable ourselves) */
+    }
+}
+static inline void stm_leave_transactional_zone(stm_thread_local_t *tl) {
+    assert(STM_SEGMENT->running_thread == tl);
+    if (stm_is_inevitable(tl)) {
+#ifdef STM_DEBUGPRINT
+        fprintf(stderr, "stm_leave_transactional_zone fast path\n");
+#endif
+        _stm_detach_inevitable_transaction(tl);
+    }
+    else {
+        _stm_leave_noninevitable_transactional_zone();
+    }
+}
+
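To make the comment above concrete, a minimal hypothetical caller;
do_external_call() and the elided bodies are stand-ins for real code:

    /* usage sketch only, not part of the diff */
    void worker(stm_thread_local_t *tl)
    {
        stm_enter_transactional_zone(tl);  /* like "acquire the GIL" */
        /* ... stm_read()/stm_write()/stm_allocate() allowed here ... */
        stm_leave_transactional_zone(tl);  /* detaches if inevitable */

        do_external_call();                /* no STM access in here */

        stm_enter_transactional_zone(tl);  /* cheap reattach, usually */
        /* ... more transactional work ... */
        stm_leave_transactional_zone(tl);
    }
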
+/* stm_force_transaction_break() is in theory equivalent to
+   stm_leave_transactional_zone() immediately followed by
+   stm_enter_transactional_zone(); however, it is supposed to be
+   called in CPU-heavy threads that had a transaction run for a while,
+   and so it *always* forces a commit and starts the next transaction.
+   The new transaction is never inevitable.  See also
+   stm_should_break_transaction(). */
+void stm_force_transaction_break(stm_thread_local_t *tl);
 
 /* Abort the currently running transaction.  This function never
-   returns: it jumps back to the stm_start_transaction(). */
+   returns: it jumps back to the start of the transaction (which must
+   not be inevitable). */
 void stm_abort_transaction(void) __attribute__((noreturn));
 
-#ifdef STM_NO_AUTOMATIC_SETJMP
-int stm_is_inevitable(void);
-#else
-static inline int stm_is_inevitable(void) {
-    return !rewind_jmp_armed(&STM_SEGMENT->running_thread->rjthread);
-}
-#endif
-
 /* Turn the current transaction inevitable.
    stm_become_inevitable() itself may still abort the transaction instead
    of returning. */
 static inline void stm_become_inevitable(stm_thread_local_t *tl,
                                          const char* msg) {
     assert(STM_SEGMENT->running_thread == tl);
-    if (!stm_is_inevitable())
+    if (!stm_is_inevitable(tl))
         _stm_become_inevitable(msg);
+    /* now, we're running the inevitable transaction, so this var should be 0 */
+    assert(_stm_detached_inevitable_from_thread == 0);
 }
 
 /* Forces a safe-point if needed.  Normally not needed: this is
@@ -425,6 +489,23 @@
 void stm_collect(long level);
 
 
+/* A way to detect that we've run for a while and should call
+   stm_force_transaction_break() */
+static inline int stm_should_break_transaction(void)
+{
+    return ((intptr_t)STM_SEGMENT->nursery_current >=
+            (intptr_t)STM_SEGMENT->nursery_mark);
+}
+extern uintptr_t stm_fill_mark_nursery_bytes;
+/* ^^^ at the start of a transaction, 'nursery_mark' is initialized to
+   point 'stm_fill_mark_nursery_bytes' bytes into the nursery.  This value can
+   be larger than the nursery; every minor collection shifts the
+   current 'nursery_mark' down by one nursery-size.  After an abort
+   and restart, 'nursery_mark' is set to ~90% of the value it reached
+   in the last attempt.
+*/
+
+
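Putting stm_should_break_transaction() and stm_force_transaction_break()
together for a CPU-bound thread; compute_one_step() and 'done' are
hypothetical stand-ins:

    /* sketch only, given some stm_thread_local_t *tl */
    while (!done) {
        compute_one_step();
        if (stm_should_break_transaction())
            stm_force_transaction_break(tl);  /* commit, then start a
                                                 fresh, never-inevitable
                                                 transaction */
    }
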
 /* Prepare an immortal "prebuilt" object managed by the GC.  Takes a
    pointer to an 'object_t', which should not actually be a GC-managed
    structure but a real static structure.  Returns the equivalent
@@ -466,8 +547,8 @@
    other threads.  A very heavy-handed way to make sure that no other
    transaction is running concurrently.  Avoid as much as possible.
    Other transactions will continue running only after this transaction
-   commits.  (xxx deprecated and may be removed) */
-void stm_become_globally_unique_transaction(stm_thread_local_t *tl, const char *msg);
+   commits.  (deprecated, not working any more according to demo_random2) */
+//void stm_become_globally_unique_transaction(stm_thread_local_t *tl, const char *msg);
 
 /* Moves the transaction forward in time by validating the read and
    write set with all commits that happened since the last validation
diff --git a/c8/test/support.py b/c8/test/support.py