[Python-Dev] Re: threading and forking and 2.0 (patch #101226)

Guido van Rossum guido@beopen.com
Fri, 25 Aug 2000 00:58:15 -0500


Here's a patch that Tim & I believe should solve the thread+fork
problem properly.  I'll try to explain it briefly.

I'm not checking this in yet because I need more eyeballs, and because
I don't actually have a test to prove that I've fixed the problem.
However, we are very hopeful that the theory is sound.

(1) BACKGROUND: A Python lock may be released by a different thread
than the one that acquired it, and it may be acquired by the same thread
multiple times.  A pthread mutex must always be unlocked by the same
thread that locked it, and can't be locked more than once.  So, a
Python lock can't be built out of a simple pthread mutex; instead, a
Python lock is built out of a "locked" flag and a <condition variable,
mutex> pair.  The mutex is locked for at most a few cycles, to protect
the flag.  This design is Tim's (while still at KSR).
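
As an illustration only (this is not part of the patch), the design
described in (1) can be sketched in modern Python 3 with
threading.Condition standing in for the <condition variable, mutex>
pair.  The class name FlagLock and its method names are hypothetical,
and unlike the C code this sketch takes the mutex once per call rather
than once for a fast path and again for the slow path.

```python
import threading


class FlagLock:
    """Hypothetical lock built from a flag plus a <condition, mutex> pair."""

    def __init__(self):
        self._locked = False
        # The Condition wraps the mutex that protects the flag.
        self._released = threading.Condition(threading.Lock())

    def acquire(self, wait=True):
        # The mutex is held only briefly, to inspect and set the flag.
        with self._released:
            if self._locked and not wait:
                return False
            while self._locked:
                # wait() drops the mutex while blocked and retakes it after.
                self._released.wait()
            self._locked = True
            return True

    def release(self):
        # Any thread may clear the flag; no ownership is recorded.
        with self._released:
            self._locked = False
            self._released.notify()


lock = FlagLock()
lock.acquire()                              # acquired in the main thread
t = threading.Thread(target=lock.release)   # released by a different thread
t.start()
t.join()
reacquired = lock.acquire(wait=False)       # flag was cleared, so this succeeds
```

Because ownership lives in the flag rather than in the mutex, the
restriction that a pthread mutex must be unlocked by the thread that
locked it never comes into play.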

(2) PROBLEM: If you fork while another thread holds a mutex, that
mutex will never be released, because only the forking thread survives
in the child.  The LinuxThreads manual recommends using
pthread_atfork() to acquire all locks in locking order before the
fork, and release them afterwards.  A problem with Tim's design here
is that even if the forking thread has Python's global interpreter
lock, another thread trying to acquire the lock may still hold the
mutex at the time of the fork, causing it to be held forever in the
child.  Charles has posted an effective hack that allocates a new
global interpreter lock in the child, but this doesn't solve the
problem for other locks.
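
The failure mode in (2) can be shown at the Python level with a rough
sketch, assuming a POSIX system with os.fork() and modern Python 3.
Here a plain threading.Lock stands in for the pthread mutex, and the
function name child_sees_lock_held is hypothetical.  Recent Python
versions may emit a DeprecationWarning when forking a multi-threaded
process.

```python
import os
import threading


def child_sees_lock_held():
    """Fork while a helper thread holds a lock; report what the child sees."""
    lock = threading.Lock()
    held = threading.Event()
    done = threading.Event()

    def holder():
        with lock:
            held.set()      # tell the main thread the lock is now held
            done.wait()     # keep holding it until told to stop

    t = threading.Thread(target=holder)
    t.start()
    held.wait()

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread survives, so nobody is left
        # to release the copy of the lock that the holder had taken.
        got_it = lock.acquire(False)
        os._exit(0 if got_it else 1)

    _, status = os.waitpid(pid, 0)
    done.set()
    t.join()
    # Exit status 1 means the child found the lock still held.
    return os.WEXITSTATUS(status) == 1
```

Calling child_sees_lock_held() is expected to report that the child
could not take the lock, since the lock's state is copied into the
child along with the rest of the parent's memory.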

(3) BRAINWAVE: If we use a single mutex shared by all locks, instead
of a mutex per lock, we can lock this mutex around the fork and thus
prevent any other thread from locking it.  This is okay because, while
a condition variable always needs a mutex to go with it, there's no
rule that the same mutex can't be shared by many condition variables.
The code below implements this.
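
For readers who prefer Python to C, here is a minimal sketch of the
same idea, assuming Python 3.7 or later on a POSIX system.  It uses
os.register_at_fork() as the analogue of pthread_atfork() and a
module-level threading.Lock as the analogue of the shared mutex.  The
names shared_mutex, SharedMutexLock and fork_and_probe are
hypothetical and do not appear in the patch.

```python
import os
import threading

# One mutex shared by every lock.
shared_mutex = threading.Lock()


class SharedMutexLock:
    """Flag-based lock whose condition variable reuses the shared mutex."""

    def __init__(self):
        self.locked = False
        # Each lock has its own condition variable, all on the same mutex.
        self.lock_released = threading.Condition(shared_mutex)

    def acquire(self, wait=True):
        with self.lock_released:
            if self.locked and not wait:
                return False
            while self.locked:
                self.lock_released.wait()
            self.locked = True
            return True

    def release(self):
        with self.lock_released:
            self.locked = False
            self.lock_released.notify()


# Hold the shared mutex across fork() so no other thread can own it
# at the moment the child is created; release it on both sides afterwards.
os.register_at_fork(before=shared_mutex.acquire,
                    after_in_parent=shared_mutex.release,
                    after_in_child=shared_mutex.release)


def fork_and_probe():
    """Fork and report whether the child finds the shared mutex free."""
    pid = os.fork()
    if pid == 0:
        free = shared_mutex.acquire(False)
        os._exit(0 if free else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


a, b = SharedMutexLock(), SharedMutexLock()
a.acquire()
b_free_while_a_held = b.acquire(wait=False)  # separate flags, so b is free
b.release()

waiter = threading.Thread(target=a.acquire)  # blocks until a is released
waiter.start()
child_ok = fork_and_probe()
a.release()                                  # lets the waiter take a
waiter.join()
```

Note that this protects only the mutex.  A lock whose flag is set at
the time of the fork is still marked as locked in the child.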

(4) MORE WORK: (a) The PyThread API also defines semaphores, which may
have a similar problem.  But I'm not aware of any use of these (I'm
not quite sure why semaphore support was added), so I haven't patched
these.  (b) The thread_pth.h file defines locks in the same way; there
may be others too.  I haven't touched these.

(5) TESTING: Charles Waldman posted this code to reproduce the
problem.  Unfortunately I haven't had much success with it; it seems
to hang even when I apply Charles' patch.

    import thread
    import os, sys
    import time

    def doit(name):
        while 1:
            if os.fork()==0:
                print name, 'forked', os.getpid()
                os._exit(0)
            r = os.wait()

    for x in range(50):
        name = 't%s'%x
        print 'starting', name
        thread.start_new_thread(doit, (name,))

    time.sleep(300)

Here's the patch:

*** Python/thread_pthread.h	2000/08/23 21:33:05	2.29
--- Python/thread_pthread.h	2000/08/25 04:29:43
***************
*** 84,101 ****
   * and a <condition, mutex> pair.  In general, if the bit can be acquired
   * instantly, it is, else the pair is used to block the thread until the
   * bit is cleared.     9 May 1994 tim@ksr.com
   */
  
  typedef struct {
  	char             locked; /* 0=unlocked, 1=locked */
  	/* a <cond, mutex> pair to handle an acquire of a locked lock */
  	pthread_cond_t   lock_released;
- 	pthread_mutex_t  mut;
  } pthread_lock;
  
  #define CHECK_STATUS(name)  if (status != 0) { perror(name); error = 1; }
  
  /*
   * Initialization.
   */
  
--- 84,125 ----
   * and a <condition, mutex> pair.  In general, if the bit can be acquired
   * instantly, it is, else the pair is used to block the thread until the
   * bit is cleared.     9 May 1994 tim@ksr.com
+  *
+  * MODIFICATION: use a single mutex shared by all locks.
+  * This should make it easier to cope with fork() while threads exist.
+  * 24 Aug 2000 {guido,tpeters}@beopen.com
   */
  
  typedef struct {
  	char             locked; /* 0=unlocked, 1=locked */
  	/* a <cond, mutex> pair to handle an acquire of a locked lock */
  	pthread_cond_t   lock_released;
  } pthread_lock;
  
+ static pthread_mutex_t locking_mutex = PTHREAD_MUTEX_INITIALIZER;
+ 
  #define CHECK_STATUS(name)  if (status != 0) { perror(name); error = 1; }
  
  /*
+  * Callbacks for pthread_atfork().
+  */
+ 
+ static void prefork_callback()
+ {
+ 	pthread_mutex_lock(&locking_mutex);
+ }
+ 
+ static void parent_callback()
+ {
+ 	pthread_mutex_unlock(&locking_mutex);
+ }
+ 
+ static void child_callback()
+ {
+ 	pthread_mutex_unlock(&locking_mutex);
+ }
+ 
+ /*
   * Initialization.
   */
  
***************
*** 113,118 ****
--- 137,144 ----
  	pthread_t thread1;
  	pthread_create(&thread1, NULL, (void *) _noop, &dummy);
  	pthread_join(thread1, NULL);
+ 	/* XXX Is the following supported here? */
+ 	pthread_atfork(&prefork_callback, &parent_callback, &child_callback);
  }
  
  #else /* !_HAVE_BSDI */
***************
*** 123,128 ****
--- 149,156 ----
  #if defined(_AIX) && defined(__GNUC__)
  	pthread_init();
  #endif
+ 	/* XXX Is the following supported everywhere? */
+ 	pthread_atfork(&prefork_callback, &parent_callback, &child_callback);
  }
  
  #endif /* !_HAVE_BSDI */
***************
*** 260,269 ****
  	if (lock) {
  		lock->locked = 0;
  
- 		status = pthread_mutex_init(&lock->mut,
- 					    pthread_mutexattr_default);
- 		CHECK_STATUS("pthread_mutex_init");
- 
  		status = pthread_cond_init(&lock->lock_released,
  					   pthread_condattr_default);
  		CHECK_STATUS("pthread_cond_init");
--- 288,293 ----
***************
*** 286,294 ****
  
  	dprintf(("PyThread_free_lock(%p) called\n", lock));
  
- 	status = pthread_mutex_destroy( &thelock->mut );
- 	CHECK_STATUS("pthread_mutex_destroy");
- 
  	status = pthread_cond_destroy( &thelock->lock_released );
  	CHECK_STATUS("pthread_cond_destroy");
  
--- 310,315 ----
***************
*** 304,314 ****
  
  	dprintf(("PyThread_acquire_lock(%p, %d) called\n", lock, waitflag));
  
! 	status = pthread_mutex_lock( &thelock->mut );
  	CHECK_STATUS("pthread_mutex_lock[1]");
  	success = thelock->locked == 0;
  	if (success) thelock->locked = 1;
! 	status = pthread_mutex_unlock( &thelock->mut );
  	CHECK_STATUS("pthread_mutex_unlock[1]");
  
  	if ( !success && waitflag ) {
--- 325,335 ----
  
  	dprintf(("PyThread_acquire_lock(%p, %d) called\n", lock, waitflag));
  
! 	status = pthread_mutex_lock( &locking_mutex );
  	CHECK_STATUS("pthread_mutex_lock[1]");
  	success = thelock->locked == 0;
  	if (success) thelock->locked = 1;
! 	status = pthread_mutex_unlock( &locking_mutex );
  	CHECK_STATUS("pthread_mutex_unlock[1]");
  
  	if ( !success && waitflag ) {
***************
*** 316,330 ****
  
  		/* mut must be locked by me -- part of the condition
  		 * protocol */
! 		status = pthread_mutex_lock( &thelock->mut );
  		CHECK_STATUS("pthread_mutex_lock[2]");
  		while ( thelock->locked ) {
  			status = pthread_cond_wait(&thelock->lock_released,
! 						   &thelock->mut);
  			CHECK_STATUS("pthread_cond_wait");
  		}
  		thelock->locked = 1;
! 		status = pthread_mutex_unlock( &thelock->mut );
  		CHECK_STATUS("pthread_mutex_unlock[2]");
  		success = 1;
  	}
--- 337,351 ----
  
  		/* mut must be locked by me -- part of the condition
  		 * protocol */
! 		status = pthread_mutex_lock( &locking_mutex );
  		CHECK_STATUS("pthread_mutex_lock[2]");
  		while ( thelock->locked ) {
  			status = pthread_cond_wait(&thelock->lock_released,
! 						   &locking_mutex);
  			CHECK_STATUS("pthread_cond_wait");
  		}
  		thelock->locked = 1;
! 		status = pthread_mutex_unlock( &locking_mutex );
  		CHECK_STATUS("pthread_mutex_unlock[2]");
  		success = 1;
  	}
***************
*** 341,352 ****
  
  	dprintf(("PyThread_release_lock(%p) called\n", lock));
  
! 	status = pthread_mutex_lock( &thelock->mut );
  	CHECK_STATUS("pthread_mutex_lock[3]");
  
  	thelock->locked = 0;
  
! 	status = pthread_mutex_unlock( &thelock->mut );
  	CHECK_STATUS("pthread_mutex_unlock[3]");
  
  	/* wake up someone (anyone, if any) waiting on the lock */
--- 362,373 ----
  
  	dprintf(("PyThread_release_lock(%p) called\n", lock));
  
! 	status = pthread_mutex_lock( &locking_mutex );
  	CHECK_STATUS("pthread_mutex_lock[3]");
  
  	thelock->locked = 0;
  
! 	status = pthread_mutex_unlock( &locking_mutex );
  	CHECK_STATUS("pthread_mutex_unlock[3]");
  
  	/* wake up someone (anyone, if any) waiting on the lock */

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)