[Python-bugs-list] [Bug #125673] PyThreadState_Delete: invalid tstate (Unix only?)

noreply@sourceforge.net noreply@sourceforge.net
Mon, 22 Jan 2001 16:39:06 -0800


Bug #125673, was updated on 2000-Dec-13 08:25
Here is a current snapshot of the bug.

Project: Python
Category: Threads
Status: Open
Resolution: None
Bug Group: Platform-specific
Priority: 5
Submitted by: gvanrossum
Assigned to : gvanrossum
Summary: PyThreadState_Delete: invalid tstate (Unix only?)

Details: I am working on a simple couroutine/generator package using
threads,
to prototype the API.  It seems to be working fine, except it is
exposing a hard-to-find bug in the threadstate code.  The following
script[*] contains the API implementation and a simple example based on
Tim's "fringe()" code.  When I run the example, I *sometimes* get:

    Segmentation fault

but *sometimes* I get:

    Fatal Python error: PyThreadState_Delete: invalid tstate
    Aborted

and *sometimes* it succeeds.  If I uncomment the raw_input("Exit?")
line at the end I never get an error.  The error behavior seems very
fickle: making almost arbitrary changes to the code can trigger it or
make it go away.  When I run it under gdb, I cannot reproduce the
problen, ever.  (Haven't I heard this before?)

The only clue is the fatal error message: it seems to be a race
condition at thread termination.  But how to debug this?

_____
[*] I'm not including the script here.
I can mail it to interested parties though.  For my own reference:
Subject: [Pycabal] Mysterious thread bug
To: <cabal>
Date: Thu, 16 Nov 2000 16:21:12 -0500


Follow-Ups:

Date: 2001-Jan-22 16:39
By: gvanrossum

Comment:
A likely story indeed!  With the current CVS, the test script fails every
time.  If I comment out the call to PyThreadState_Delete() from
t_bootstrap, it runs fine every time.

Suggestion: change PyThreadState_Delete() so that it can be called with the
interpreter lock *held*, and in that case it should "atomically" delete the
tstate object and release the lock.

I'll see if I can come up with a patch.
-------------------------------------------------------

Date: 2001-Jan-21 01:26
By: tim_one

Comment:
Let's pretend:

threadmodule.c/t_bootstrap finishes running the user's spawned-thread code,
PyEval_ReleaseThread(tstate) releases the global lock, then
PyThreadState_Delete(tstate) is called to unlink tstate from the
tstate->interp->tstate_head chain.

But the thread swaps out at that point, and the main thread resumes
executing.  It's got nothing left to do, so it gets into
pythonrun.c/Py_Finalize() quickly, and soon enough calls
PyInterpreterState_Delete(interp) from there.  That calls zapthreads(). 
zapthreads calls PyThreadState_Delete(ts) on *every* threadstate ts in the
interp->tstate_head chain.

If you're with me so far, the other thread still hasn't called
PyThreadState_Delete on *its* threadstate, and the comment in zapthreads is
semi-prophetic <wink>:

/* No need to lock the mutex here because this should
   only happen when the threads are all really dead
   (XXX famous last words). */

But the problem is not that the mutex isn't locked, it's that the other
thread is still going to try deleting its tstate again *later* (the precise
cause of an "invalid tstate" error msg):  other threads aren't really dead,
and AFAICT there's actually no reason to believe they *should* be dead at
this point (other than luck).

Anyway, if this is right, we have two threads battling over who's going to
delete a single tstate, and if the main thread gets in first, the other
thread is certain to raise an error (except that, at least on Windows, if
the main thread manages to exit before the other thread gets that far, the
other thread will be killed off quietly in mid-stream by the OS; since
Linux threads seem to be indistinguishable from Linux processes, I bet they
run some pthreads emulation layer in user space that *may* take a fair
amount of time to kill off child threads when the parent goes away).

Waddya think?  Explains everything and solves nothing <wink>.

-------------------------------------------------------

Date: 2001-Jan-20 23:20
By: tim_one

Comment:
Guido, since we just fixed *a* thread termination problem (premature
clearing of "initialized" in pythonrun.c), and I've never seen this fail on
Windows, reassigning to you to see whether-- however unlikely it may seem
--this problem has gone away by magic now.

-------------------------------------------------------

Date: 2000-Dec-13 08:51
By: tim_one

Comment:
I was never able to provoke a problem on Windows using Guido's script, so
changed Group to Platform-specific and added "(Linux only?)" to Summary. 
Here's the script; assigned to Greg under the hope he can provoke a
problem:

import thread

class EarlyExit(Exception):
    pass

class main_coroutine:

    def __init__(self):
        self.id = 0
        self.caller = None
        self.value = None
        self.lock = thread.allocate_lock()
        self.lock.acquire()
        self.done = 0

    def __call__(self, value=None):
        cur = current()
        assert cur is not self
        self.caller = cur
        self.value = value
        self.lock.release()
        cur.lock.acquire()
        if self.done:
            raise EarlyExit
        return cur.value

all_coroutines = {thread.get_ident(): main_coroutine()}

def current():
    return all_coroutines[thread.get_ident()]

def suspend(value=None):
    cur = current()
    caller = cur.caller
    assert caller and caller is not cur
    caller.value = value
    caller.lock.release()
    cur.lock.acquire()
    return cur.value

nextid = 1

class coroutine(main_coroutine):

    def __init__(self, func, *args):
        global nextid
        self.id = nextid
        nextid = nextid + 1
        self.caller = current()
        boot = thread.allocate_lock()
        boot.acquire()
        thread.start_new_thread(self.run, (boot, func, args))
        boot.acquire()

    def run(self, boot, func, args):
        me = thread.get_ident()
        all_coroutines[me] = self
        self.lock = thread.allocate_lock()
        self.lock.acquire()
        self.done = 0
        boot.release()
        self.lock.acquire()
        if self.value:
            print "Warning: initial value %s ignored" % `value`
        try:
            apply(func, args)
        finally:
            del all_coroutines[me]
            self.done = 1
            self.caller.lock.release()

def fringe(list):
    tl = type(list)
    for item in list:
        if type(item) is tl:
            fringe(item)
        else:
            suspend(item)

def printinorder(list):
    c = coroutine(fringe, list)
    try:
        while 1:
            print c(),
    except EarlyExit:
        pass
    print

if __name__ == '__main__':
    printinorder([1,2,3])
    l = [1,2,[3,4,[5],6]]
    printinorder(l)
    #raw_input("Exit?")

-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=125673&group_id=5470