This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Classification
Title: PyThreadState_Delete: invalid tstate
Type:
Stage:
Components: Interpreter Core
Versions:

Process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: gvanrossum
Nosy List: gvanrossum, tim.peters
Priority: normal
Keywords:

Created on 2000-12-13 16:25 by gvanrossum, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Messages (6)
msg2651 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2000-12-13 16:25
I am working on a simple coroutine/generator package using threads,
to prototype the API.  It seems to be working fine, except it is
exposing a hard-to-find bug in the threadstate code.  The following
script[*] contains the API implementation and a simple example based on
Tim's "fringe()" code.  When I run the example, I *sometimes* get:

    Segmentation fault

but *sometimes* I get:

    Fatal Python error: PyThreadState_Delete: invalid tstate
    Aborted

and *sometimes* it succeeds.  If I uncomment the raw_input("Exit?")
line at the end I never get an error.  The error behavior seems very
fickle: making almost arbitrary changes to the code can trigger it or
make it go away.  When I run it under gdb, I cannot reproduce the
problem, ever.  (Haven't I heard this before?)

The only clue is the fatal error message: it seems to be a race
condition at thread termination.  But how to debug this?

_____
[*] I'm not including the script here.
I can mail it to interested parties though.  For my own reference:
Subject: [Pycabal] Mysterious thread bug
To: <cabal>
Date: Thu, 16 Nov 2000 16:21:12 -0500
msg2652 - Author: Tim Peters (tim.peters) (Python committer) Date: 2000-12-13 16:51
I was never able to provoke a problem on Windows using Guido's script, so I changed Group to Platform-specific and added "(Linux only?)" to Summary.  Here's the script; assigned to Greg in the hope that he can provoke a problem:

import thread

class EarlyExit(Exception):
    pass

class main_coroutine:

    def __init__(self):
        self.id = 0
        self.caller = None
        self.value = None
        self.lock = thread.allocate_lock()
        self.lock.acquire()
        self.done = 0

    def __call__(self, value=None):
        cur = current()
        assert cur is not self
        self.caller = cur
        self.value = value
        self.lock.release()
        cur.lock.acquire()
        if self.done:
            raise EarlyExit
        return cur.value

all_coroutines = {thread.get_ident(): main_coroutine()}

def current():
    return all_coroutines[thread.get_ident()]

def suspend(value=None):
    cur = current()
    caller = cur.caller
    assert caller and caller is not cur
    caller.value = value
    caller.lock.release()
    cur.lock.acquire()
    return cur.value

nextid = 1

class coroutine(main_coroutine):

    def __init__(self, func, *args):
        global nextid
        self.id = nextid
        nextid = nextid + 1
        self.caller = current()
        boot = thread.allocate_lock()
        boot.acquire()
        thread.start_new_thread(self.run, (boot, func, args))
        boot.acquire()

    def run(self, boot, func, args):
        me = thread.get_ident()
        all_coroutines[me] = self
        self.lock = thread.allocate_lock()
        self.lock.acquire()
        self.done = 0
        boot.release()
        self.lock.acquire()
        if self.value:
            print "Warning: initial value %s ignored" % repr(self.value)
        try:
            apply(func, args)
        finally:
            del all_coroutines[me]
            self.done = 1
            self.caller.lock.release()

def fringe(list):
    tl = type(list)
    for item in list:
        if type(item) is tl:
            fringe(item)
        else:
            suspend(item)

def printinorder(list):
    c = coroutine(fringe, list)
    try:
        while 1:
            print c(),
    except EarlyExit:
        pass
    print

if __name__ == '__main__':
    printinorder([1,2,3])
    l = [1,2,[3,4,[5],6]]
    printinorder(l)
    #raw_input("Exit?")
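For comparison only (not part of the original report): the script emulates generator-style control flow with one OS thread per coroutine, which is what exposes the thread-state bug. Python 2.2 later added native generators (PEP 255), and in Python 3 the same fringe() traversal needs no threads, locks or thread states at all. A minimal sketch, keeping the original's rule that an item counts as a sublist when it has the same type as its parent:

```python
def fringe(lst):
    """Yield the leaves of a nested list from left to right."""
    for item in lst:
        if type(item) is type(lst):   # same sublist test as the original script
            yield from fringe(item)
        else:
            yield item


def printinorder(lst):
    # Print all leaves on one line, separated by spaces.
    print(*fringe(lst))


if __name__ == "__main__":
    printinorder([1, 2, 3])                # prints: 1 2 3
    printinorder([1, 2, [3, 4, [5], 6]])   # prints: 1 2 3 4 5 6
```

Each suspend(item) in the thread-based version becomes a plain yield, and the EarlyExit signalling is replaced by ordinary generator exhaustion.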
msg2653 - Author: Tim Peters (tim.peters) (Python committer) Date: 2001-01-21 07:20
Guido, since we just fixed *a* thread-termination problem (premature clearing of "initialized" in pythonrun.c) and I've never seen this fail on Windows, I'm reassigning this to you to see whether, however unlikely it may seem, this problem has gone away by magic now.
msg2654 - Author: Tim Peters (tim.peters) (Python committer) Date: 2001-01-21 09:26
Let's pretend:

threadmodule.c/t_bootstrap finishes running the user's spawned-thread code, PyEval_ReleaseThread(tstate) releases the global lock, then PyThreadState_Delete(tstate) is called to unlink tstate from the tstate->interp->tstate_head chain.

But the thread swaps out at that point, and the main thread resumes executing.  It's got nothing left to do, so it gets into pythonrun.c/Py_Finalize() quickly, and soon enough calls PyInterpreterState_Delete(interp) from there.  That calls zapthreads().  zapthreads calls PyThreadState_Delete(ts) on *every* threadstate ts in the interp->tstate_head chain.

If you're with me so far, the other thread still hasn't called PyThreadState_Delete on *its* threadstate, and the comment in zapthreads is semi-prophetic <wink>:

/* No need to lock the mutex here because this should
   only happen when the threads are all really dead
   (XXX famous last words). */

But the problem is not that the mutex isn't locked; it's that the other thread is still going to try deleting its tstate again *later* (the precise cause of an "invalid tstate" error msg).  The other threads aren't really dead, and AFAICT there's actually no reason to believe they *should* be dead at this point (other than luck).

Anyway, if this is right, we have two threads battling over who's going to delete a single tstate, and if the main thread gets in first, the other thread is certain to raise an error.  The exception is when the main thread manages to exit before the other thread gets that far: at least on Windows, the OS then kills the other thread off quietly in mid-stream.  Since Linux threads seem to be indistinguishable from Linux processes, I bet they run some pthreads emulation layer in user space that *may* take a fair amount of time to kill off child threads when the parent goes away.

Waddya think?  Explains everything and solves nothing <wink>.
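Tim's scenario can be replayed as a toy, single-threaded Python 3 model that executes the two threads' steps in a fixed order instead of racing real threads. InterpreterState, ThreadState, thread_state_delete, zapthreads and replay are hypothetical stand-ins for the C structures and functions, not CPython's API; the model raises an exception where the C code calls Py_FatalError.

```python
class InterpreterState:
    """Toy stand-in for PyInterpreterState: only the tstate chain."""

    def __init__(self):
        self.tstates = []  # live thread states, like interp->tstate_head


class ThreadState:
    """Toy stand-in for PyThreadState; links itself into the chain."""

    def __init__(self, interp, name):
        self.interp = interp
        self.name = name
        interp.tstates.append(self)


def thread_state_delete(tstate):
    # A tstate must still be on its interpreter's chain when deleted;
    # otherwise this is the "invalid tstate" fatal error.
    chain = tstate.interp.tstates
    if tstate not in chain:
        raise RuntimeError("PyThreadState_Delete: invalid tstate")
    chain.remove(tstate)


def zapthreads(interp):
    # Finalization path: delete every thread state still on the chain.
    for ts in list(interp.tstates):
        thread_state_delete(ts)


def replay(worker_deletes_first):
    """Run one ordering of the worker's exit vs. the main thread's finalize."""
    interp = InterpreterState()
    ThreadState(interp, "main")
    worker = ThreadState(interp, "worker")
    try:
        if worker_deletes_first:
            thread_state_delete(worker)  # lucky: worker unlinks itself first
            zapthreads(interp)           # finalize then sees only "main"
        else:
            # Worker has released the global lock but is preempted here.
            zapthreads(interp)           # main finalizes, deleting worker too
            thread_state_delete(worker)  # worker resumes: second delete
    except RuntimeError as exc:
        return str(exc)
    return "ok"
```

replay(True) returns "ok", while replay(False) returns "PyThreadState_Delete: invalid tstate": the outcome depends only on which side deletes the worker's tstate first, which matches the fickle, timing-dependent behaviour in the original report.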
msg2655 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2001-01-23 00:39
A likely story indeed!  With the current CVS, the test script fails every time.  If I comment out the call to PyThreadState_Delete() from t_bootstrap, it runs fine every time.

Suggestion: change PyThreadState_Delete() so that it can be called with the interpreter lock *held*, and in that case it should "atomically" delete the tstate object and release the lock.

I'll see if I can come up with a patch.
msg2656 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2001-01-23 01:48
Fixed by introducing a new API: PyThreadState_DeleteCurrent().

AFAICT this is not Unix-only, but whether the behavior triggers depends on details of the thread implementation, so I cleared the platform-specific group and removed "(unix only?)" from subject.
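The fix hinges on ordering: as Guido's suggestion describes, the exiting thread deletes its tstate while it still holds the interpreter lock and only then releases the lock, so the main thread (which holds that lock while finalizing) can no longer run zapthreads in between. The Python 3 sketch below models both orderings with real threads. The Interp class and run() function are hypothetical; threading.Lock stands in for the interpreter lock, and the unlucky preemption in the old ordering is forced with an Event rather than left to chance.

```python
import threading


class Interp:
    """Toy interpreter: a global lock plus a chain of live thread states."""

    def __init__(self):
        self.gil = threading.Lock()         # stands in for the interpreter lock
        self.head_mutex = threading.Lock()  # guards the tstate chain
        self.tstates = []                   # stands in for interp->tstate_head

    def new_tstate(self, name):
        with self.head_mutex:
            self.tstates.append(name)
        return name

    def delete(self, ts):
        # Deleting a tstate that is no longer on the chain is the fatal case.
        with self.head_mutex:
            if ts not in self.tstates:
                raise RuntimeError("PyThreadState_Delete: invalid tstate")
            self.tstates.remove(ts)

    def zapthreads(self):
        # Finalization deletes every tstate still on the chain.
        for ts in list(self.tstates):
            self.delete(ts)


def run(atomic):
    """Replay one worker exit racing with finalization; return any errors."""
    interp = Interp()
    ts = interp.new_tstate("worker")
    errors = []
    running = threading.Event()    # worker holds the lock
    finalized = threading.Event()  # main thread has run zapthreads()

    def t_bootstrap():
        interp.gil.acquire()
        running.set()
        # ... the spawned thread's Python code would run here ...
        try:
            if atomic:
                interp.delete(ts)      # unlink while still holding the lock,
                interp.gil.release()   # then release it (DeleteCurrent order)
            else:
                interp.gil.release()   # old order: release the lock first,
                finalized.wait()       # get preempted (forced here),
                interp.delete(ts)      # then delete the already-zapped tstate
        except RuntimeError as exc:
            errors.append(str(exc))

    worker = threading.Thread(target=t_bootstrap)
    worker.start()
    running.wait()
    with interp.gil:                   # main finalizes holding the lock
        interp.zapthreads()
    finalized.set()
    worker.join()
    return errors
```

run(atomic=False) returns ["PyThreadState_Delete: invalid tstate"], and run(atomic=True) returns []: once deletion happens before the lock is released, the main thread cannot acquire the lock until the worker's tstate is already off the chain.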
History
Date                 User        Action  Args
2022-04-10 16:03:33  admin       set     github: 33586
2000-12-13 16:25:14  gvanrossum  create