[Patches] [ python-Patches-1531859 ] Tracing and profiling functions can cause hangs in threads

Tue Aug 1 17:53:04 CEST 2006

Patches item #1531859, was opened at 2006-07-31 16:48
Message generated for change (Comment added) made by splitscreen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1531859&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Matt Fleming (splitscreen)
Assigned to: Nobody/Anonymous (nobody)
Summary: Tracing and profiling functions can cause hangs in threads

Initial Comment:
Attached is a patch (with test case) that fixes a
problem when a tracing function or a profiling function
for a thread references a thread ID, causing it to hang.

Matt

----------------------------------------------------------------------

>Comment By: Matt Fleming (splitscreen)
Date: 2006-08-01 15:53

Message:
Logged In: YES 
user_id=1126061

Regarding your first solution, where the threading module in
the standard library would check for any sort of recursion
problem before trying to acquire _active_limbo_lock: if it
finds that _active_limbo_lock is already locked, what should
it do then? It's probably not a good idea for
threading.enumerate() to return nothing.

How about unsetting/resetting the _trace_hook/_profile_hook
around locked sections of code? This is pretty much the same
as the developer of a tracing function using your idea of
asking for _active_limbo_lock in their func except that it'd
be transparent to them. This might be a viable solution, as
long as it's clearly documented somewhere that this is what
happens. 

It might just be a better solution to paste something in the
threading docs such as "Don't call these functions from
within tracing functions in threaded code" and a list of
functions that are problematic, leaving the responsibility
of this solely on the developer. I have no idea.

But yes, thanks for pulling me up on the fact that this
patch is incomplete and doesn't fix all cases.

Matt

----------------------------------------------------------------------

Comment By: Rocky Bernstein (rockyb)
Date: 2006-08-01 14:26

Message:
Logged In: YES 
user_id=158581

One change to my comment below. I now don't think the "share
the responsibility" approach mentioned will work any
differently than the approach where the user of Threading
adds _active_limbo_lock.acquire(blocking=0) calls. 

The most reliable approach, then, is scanning the call stack,
but this requires knowledge of the internals of threading.py.
This knowledge could be eliminated, though. On entry to a
locking routine, a local variable could be set, and instead of
scanning the call stack for method names and file names
(threading.py), a scan could be done for that local variable.
Going further, Threading could provide a routine to do the
stack scan.
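
A minimal sketch of that marker-variable scan follows. The marker
name _threading_lock_section and both function names are made up for
illustration; nothing like them exists in threading.py. It relies on
sys._getframe(), which is CPython-specific.

```python
import sys

# Hypothetical marker name; a locking routine assigns it as a local.
_MARKER = "_threading_lock_section"


def locking_call_in_progress():
    """Return True if any frame up the call stack has the marker set."""
    frame = sys._getframe(1)  # start at our caller
    while frame is not None:
        local_vars = frame.f_locals
        if _MARKER in local_vars and local_vars[_MARKER]:
            return True
        frame = frame.f_back
    return False


def locking_routine(callback):
    # Stand-in for a threading.py routine that takes the lock. The marker
    # is set on entry; the callback plays the role of a trace function
    # that fires while this routine is still on the stack.
    _threading_lock_section = True
    return callback()
```

A trace function could call locking_call_in_progress() and skip its
own call to enumerate() when the answer is true. Because the scan
only looks at the current thread's stack, a lock held by some other
thread does not trigger it.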

----------------------------------------------------------------------

Comment By: Rocky Bernstein (rockyb)
Date: 2006-08-01 13:28

Message:
Logged In: YES 
user_id=158581

I would like to try to clarify the problem a little and
suggest some possible solution approaches. 

While this patch solves a particular threading.settrace()
problem (and possibly a potential threading.setprofile()
problem), the more I think about this, the less sure I am
that it will solve all of them or that it is necessary in
all cases.

To reiterate the problem: 

It was noticed that having tracing (threading.settrace) or
profiling turned on while inside threading.py can cause a
thread hang when _active_limbo_lock.acquire() is called
recursively: once when code uses a method in threading.py
like _delete(), and again when the tracing or profiling
routine is called by settrace from within a Threading method
and the tracing/profiling code calls one of the Threading
methods like enumerate() to get information for its own
purposes. (The patch addresses this for _delete, but I'm not
sure it would address it if the first call were, say,
enumerate().)

One possibility, and clearly the most reliable one because
it relies least on code using Threading, would be for
threading.py to check for this kind of recursive invocation
(at the module level, not the method level), which might be
done by scanning the call stack. More later. 

Another possibility might be to document this behavior and
put the burden on the profiler/debugger/tracer or anything
that might cause some set of threading routines to be called
recursively. To address the problem outside of Threading
code, one could call
_active_limbo_lock.acquire(blocking=0) before calling a
Threading routine like enumerate(), and use the Threading
routine only if the lock is acquired.

This will work, but it may get the "cannot acquire lock"
status too often, namely in situations where there isn't a
recursive call. Better than this would again be to somehow
indicate that "a call to a Threading routine which does
locking" is in progress. 

A simple and reliable way to do this would be to share the
responsibility: the Threading methods would set a boolean
variable to indicate this condition. Code using
Threading could test this before making calls which would
cause recursive invocation.
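
The shared-flag idea might look roughly like this. Everything here is
a hypothetical stand-in (locking_in_progress, delete_entry,
tracer_side_query), and the hook argument merely simulates a trace or
profile callback firing inside the locked section.

```python
import threading

# Stand-ins for threading._active_limbo_lock and the thread registry.
_registry_lock = threading.Lock()
_registry = {"worker-1": object(), "worker-2": object()}

# Set by the locking routines, read by tracing/profiling code.
locking_in_progress = False


def delete_entry(key, hook=None):
    """Locking routine: raises the flag for as long as it holds the lock."""
    global locking_in_progress
    with _registry_lock:
        locking_in_progress = True
        try:
            if hook is not None:
                hook()  # simulates a trace callback firing at this point
            _registry.pop(key, None)
        finally:
            locking_in_progress = False


def tracer_side_query():
    """What a tracer would do: consult the flag before touching the lock."""
    if locking_in_progress:
        return None  # a locking call is under way; do not re-enter
    with _registry_lock:
        return sorted(_registry)
```

Being a module-level boolean, the flag is also raised while a
different thread is inside a locked section, so the tracer may back
off in cases that are not actually recursive.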

----------------------------------------------------------------------

Comment By: Matt Fleming (splitscreen)
Date: 2006-08-01 00:30

Message:
Logged In: YES 
user_id=1126061

Actually the problem is a little different than I first
realised. I've updated the comment block above the code in
threading.py's __delete method to more clearly explain the
situation.

----------------------------------------------------------------------
