[Python-Dev] Re: threading and forking and 2.0 (patch #101226)

Guido van Rossum guido@beopen.com
Fri, 25 Aug 2000 00:58:15 -0500


Here's a patch that Tim & I believe should solve the thread+fork problem properly. I'll try to explain it briefly.

I'm not checking this in yet because I need more eyeballs, and because I don't actually have a test to prove that I've fixed the problem. However, we're very hopeful about our theory.

(1) BACKGROUND: A Python lock may be released by a different thread than the one that acquired it, and it may be acquired by the same thread multiple times. A pthread mutex must always be unlocked by the same thread that locked it, and can't be locked more than once. So, a Python lock can't be built out of a simple pthread mutex; instead, a Python lock is built out of a "locked" flag and a <condition variable, mutex> pair. The mutex is locked for at most a few cycles, to protect the flag. This design is Tim's (while still at KSR).
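To make the design in (1) concrete, here is a minimal sketch in modern Python 3 (an assumption; the real implementation is the C code in thread_pthread.h). The class name FlagLock and its attribute names are hypothetical. It shows a lock made of a flag plus a <condition variable, mutex> pair, released by a thread other than the one that acquired it.

```python
import threading


class FlagLock:
    """Hypothetical sketch: a lock built from a flag and a <cond, mutex> pair."""

    def __init__(self):
        self._locked = False                  # the "locked" flag
        self._mutex = threading.Lock()        # held only briefly, to guard the flag
        self._released = threading.Condition(self._mutex)

    def acquire(self, waitflag=True):
        with self._released:                  # takes self._mutex
            if self._locked and not waitflag:
                return False                  # non-blocking attempt failed
            while self._locked:
                self._released.wait()         # drops the mutex while blocked
            self._locked = True
            return True

    def release(self):
        with self._released:
            self._locked = False
            self._released.notify()           # wake one waiter, if any


lock = FlagLock()
lock.acquire()                                # acquired by the main thread

helper = threading.Thread(target=lock.release)
helper.start()                                # released by a different thread
helper.join()

reacquired = lock.acquire(waitflag=False)     # the flag was cleared, so this succeeds
```

The mutex is only ever held around reads and writes of the flag, which is why the thread that clears the flag need not be the one that set it.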

(2) PROBLEM: If you fork while another thread holds a mutex, that mutex will never be released, because only the forking thread survives in the child. The LinuxThreads manual recommends using pthread_atfork() to acquire all locks in locking order before the fork, and release them afterwards. A problem with Tim's design here is that even if the forking thread has Python's global interpreter lock, another thread trying to acquire the lock may still hold the mutex at the time of the fork, causing it to be held forever in the child. Charles has posted an effective hack that allocates a new global interpreter lock in the child, but this doesn't solve the problem for other locks.
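The effect described in (2) can be observed one level up, with an ordinary Python 3 threading.Lock standing in for the pthread mutex. This is an illustrative sketch for a POSIX system where os.fork() is available, not a reproduction of the original bug; the names holder and child_got_lock are hypothetical.

```python
import os
import threading

lock = threading.Lock()
holding = threading.Event()
done = threading.Event()


def holder():
    # Take the lock and keep it until the parent says we may let go.
    with lock:
        holding.set()
        done.wait()


t = threading.Thread(target=holder)
t.start()
holding.wait()                      # the lock is now held by the other thread

pid = os.fork()
if pid == 0:
    # Child: only the forking thread exists here, so nobody can release the lock.
    got_it = lock.acquire(timeout=0.5)
    os._exit(0 if got_it else 1)

_, status = os.waitpid(pid, 0)
child_got_lock = os.WEXITSTATUS(status) == 0

done.set()                          # let the holder thread finish in the parent
t.join()
```

In the child the lock is copied in its locked state, while the thread that would have released it is gone. Without the timeout the child would wait forever.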

(3) BRAINWAVE: If we use a single mutex shared by all locks, instead of a mutex per lock, we can lock this mutex around the fork and thus prevent any other thread from locking it. This is okay because, while a condition variable always needs a mutex to go with it, there's no rule that the same mutex can't be shared by many condition variables. The code below implements this.
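A rough Python 3 analogue of the idea in (3), assuming a POSIX system: every lock keeps its own condition variable but shares a single mutex, and os.register_at_fork() plays the role of pthread_atfork(). The names SharedMutexLock, churn and mutex_is_free_in_child are hypothetical; locking_mutex and lock_released are borrowed from the patch. This is a sketch of the technique, not the patch itself.

```python
import os
import threading

# One mutex guards the "locked" flag of every lock.
locking_mutex = threading.Lock()


class SharedMutexLock:
    """Flag plus a private condition variable; the mutex is the shared one."""

    def __init__(self):
        self.locked = False
        self.lock_released = threading.Condition(locking_mutex)

    def acquire(self):
        with self.lock_released:              # takes locking_mutex
            while self.locked:
                self.lock_released.wait()     # drops locking_mutex while blocked
            self.locked = True

    def release(self):
        with self.lock_released:
            self.locked = False
            self.lock_released.notify()


# Hold the shared mutex across fork() so no other thread can own it
# at the moment the child is created, then release it on both sides.
os.register_at_fork(
    before=locking_mutex.acquire,
    after_in_parent=locking_mutex.release,
    after_in_child=locking_mutex.release,
)


def mutex_is_free_in_child():
    """Fork once; report whether the child could take locking_mutex at once."""
    pid = os.fork()
    if pid == 0:
        free = locking_mutex.acquire(False)
        os._exit(0 if free else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


stop = threading.Event()
locks = [SharedMutexLock(), SharedMutexLock()]


def churn():
    # Keep the shared mutex busy from several threads while the parent forks.
    while not stop.is_set():
        for lk in locks:
            lk.acquire()
            lk.release()


workers = [threading.Thread(target=churn) for _ in range(4)]
for w in workers:
    w.start()
results = [mutex_is_free_in_child() for _ in range(5)]
stop.set()
for w in workers:
    w.join()
```

Note that this only guarantees the state of the shared mutex in the child. A lock whose flag was set by a thread that did not survive the fork still reads as locked there.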

(4) MORE WORK: (a) The PyThread API also defines semaphores, which may have a similar problem. But I'm not aware of any use of these (I'm not quite sure why semaphore support was added), so I haven't patched these. (b) The thread_pth.h file defines locks in the same way; there may be others too. I haven't touched these.

(5) TESTING: Charles Waldman posted this code to reproduce the problem. Unfortunately I haven't had much success with it; it seems to hang even when I apply Charles' patch.

import thread
import os, sys
import time

def doit(name):
    while 1:
        if os.fork()==0:
            print name, 'forked', os.getpid()
            os._exit(0)
        r = os.wait()

for x in range(50):
    name = 't%s'%x
    print 'starting', name
    thread.start_new_thread(doit, (name,))

time.sleep(300)

Here's the patch:

*** Python/thread_pthread.h 2000/08/23 21:33:05 2.29
--- Python/thread_pthread.h 2000/08/25 04:29:43
***************
*** 84,101 ****
   * and a <condition, mutex> pair.  In general, if the bit can be acquired
   * instantly, it is, else the pair is used to block the thread until the
   * bit is cleared.     9 May 1994 tim@ksr.com
   */

  typedef struct {
      char             locked; /* 0=unlocked, 1=locked */
      /* a <cond, mutex> pair to handle an acquire of a locked lock */
      pthread_cond_t   lock_released;
-     pthread_mutex_t  mut;
  } pthread_lock;

  #define CHECK_STATUS(name)  if (status != 0) { perror(name); error = 1; }

  /*
   * Initialization.
   */

--- 84,125 ----
   * and a <condition, mutex> pair.  In general, if the bit can be acquired
   * instantly, it is, else the pair is used to block the thread until the
   * bit is cleared.     9 May 1994 tim@ksr.com
+  *
+  * MODIFICATION: use a single mutex shared by all locks.
+  * This should make it easier to cope with fork() while threads exist.
+  * 24 Aug 2000 {guido,tpeters}@beopen.com
   */

  typedef struct {
      char             locked; /* 0=unlocked, 1=locked */
      /* a <cond, mutex> pair to handle an acquire of a locked lock */
      pthread_cond_t   lock_released;
  } pthread_lock;

+ static pthread_mutex_t locking_mutex = PTHREAD_MUTEX_INITIALIZER;
+
  #define CHECK_STATUS(name)  if (status != 0) { perror(name); error = 1; }

  /*
+  * Callbacks for pthread_atfork().
+  */
+
+ static void prefork_callback()
+ {
+     pthread_mutex_lock(&locking_mutex);
+ }
+
+ static void parent_callback()
+ {
+     pthread_mutex_unlock(&locking_mutex);
+ }
+
+ static void child_callback()
+ {
+     pthread_mutex_unlock(&locking_mutex);
+ }
+
+ /*
   * Initialization.
   */

***************
*** 113,118 ****
--- 137,144 ----
      pthread_t thread1;
      pthread_create(&thread1, NULL, (void *) _noop, &dummy);
      pthread_join(thread1, NULL);
+     /* XXX Is the following supported here? */
+     pthread_atfork(&prefork_callback, &parent_callback, &child_callback);
  }

  #else /* !_HAVE_BSDI */
***************
*** 123,128 ****
--- 149,156 ----
  #if defined(_AIX) && defined(__GNUC__)
      pthread_init();
  #endif
+     /* XXX Is the following supported everywhere? */
+     pthread_atfork(&prefork_callback, &parent_callback, &child_callback);
  }

  #endif /* !_HAVE_BSDI */
***************
*** 260,269 ****
      if (lock) {
          lock->locked = 0;

-         status = pthread_mutex_init(&lock->mut,
-                                     pthread_mutexattr_default);
-         CHECK_STATUS("pthread_mutex_init");
-
          status = pthread_cond_init(&lock->lock_released,
                                     pthread_condattr_default);
          CHECK_STATUS("pthread_cond_init");
--- 288,293 ----
***************
*** 286,294 ****

      dprintf(("PyThread_free_lock(%p) called\n", lock));

-     status = pthread_mutex_destroy( &thelock->mut );
-     CHECK_STATUS("pthread_mutex_destroy");
-
      status = pthread_cond_destroy( &thelock->lock_released );
      CHECK_STATUS("pthread_cond_destroy");

--- 310,315 ----
***************
*** 304,314 ****

      dprintf(("PyThread_acquire_lock(%p, %d) called\n", lock, waitflag));

!     status = pthread_mutex_lock( &thelock->mut );
      CHECK_STATUS("pthread_mutex_lock[1]");
      success = thelock->locked == 0;
      if (success) thelock->locked = 1;
!     status = pthread_mutex_unlock( &thelock->mut );
      CHECK_STATUS("pthread_mutex_unlock[1]");

      if ( !success && waitflag ) {
--- 325,335 ----

      dprintf(("PyThread_acquire_lock(%p, %d) called\n", lock, waitflag));

!     status = pthread_mutex_lock( &locking_mutex );
      CHECK_STATUS("pthread_mutex_lock[1]");
      success = thelock->locked == 0;
      if (success) thelock->locked = 1;
!     status = pthread_mutex_unlock( &locking_mutex );
      CHECK_STATUS("pthread_mutex_unlock[1]");

      if ( !success && waitflag ) {
***************
*** 316,330 ****

          /* mut must be locked by me -- part of the condition
           * protocol */
!         status = pthread_mutex_lock( &thelock->mut );
          CHECK_STATUS("pthread_mutex_lock[2]");
          while ( thelock->locked ) {
              status = pthread_cond_wait(&thelock->lock_released,
!                                        &thelock->mut);
              CHECK_STATUS("pthread_cond_wait");
          }
          thelock->locked = 1;
!         status = pthread_mutex_unlock( &thelock->mut );
          CHECK_STATUS("pthread_mutex_unlock[2]");
          success = 1;
      }
--- 337,351 ----

          /* mut must be locked by me -- part of the condition
           * protocol */
!         status = pthread_mutex_lock( &locking_mutex );
          CHECK_STATUS("pthread_mutex_lock[2]");
          while ( thelock->locked ) {
              status = pthread_cond_wait(&thelock->lock_released,
!                                        &locking_mutex);
              CHECK_STATUS("pthread_cond_wait");
          }
          thelock->locked = 1;
!         status = pthread_mutex_unlock( &locking_mutex );
          CHECK_STATUS("pthread_mutex_unlock[2]");
          success = 1;
      }
***************
*** 341,352 ****

      dprintf(("PyThread_release_lock(%p) called\n", lock));

!     status = pthread_mutex_lock( &thelock->mut );
      CHECK_STATUS("pthread_mutex_lock[3]");

      thelock->locked = 0;

!     status = pthread_mutex_unlock( &thelock->mut );
      CHECK_STATUS("pthread_mutex_unlock[3]");

      /* wake up someone (anyone, if any) waiting on the lock */
--- 362,373 ----

      dprintf(("PyThread_release_lock(%p) called\n", lock));

!     status = pthread_mutex_lock( &locking_mutex );
      CHECK_STATUS("pthread_mutex_lock[3]");

      thelock->locked = 0;

!     status = pthread_mutex_unlock( &locking_mutex );
      CHECK_STATUS("pthread_mutex_unlock[3]");

      /* wake up someone (anyone, if any) waiting on the lock */

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)