The reliability of python threads

Fri Jan 26 02:17:28 EST 2007

 "Carl J. Van Arsdall" <cvanarsdall at mvista.com> wrote:

> Right, I wasn't coming here to get someone to debug my app, I'm just 
> looking for ideas.  I constantly am trying to find new ways to improve 
> my software and new ways to reduce bugs, and when I get really stuck, 
> new ways to track bugs down.  The exception won't mean much, but I can 
> say that the error appears to me as bad data.  I do checks prior to 
> performing actions on any data, if the data doesn't look like what it 
> should look like, then the system flags an exception.
> 
> The problem I'm having is determining how the data went bad.  In 
> tracking down the problem a couple guys mentioned that problems like 
> that usually are a race condition.  From here I examined my code, 
> checked out all the locking stuff, made sure it was good, and wasn't 
> able to find anything.  Being that there's one lock and the critical 
> sections are well defined, I'm having difficulty.  One idea I have to 

Are you 100% rock bottom gold plated guaranteed sure that there is
not something else that is also critical, which you just haven't
realised is critical?

This stuff is never obvious before the fact - and always seems stupid
afterward, when you have found it.  Your best (some would say only)
weapon is your imagination, fueled by scepticism...

> try and get a better understanding might be to check data before it's 
> stored.  Again, I still don't know how it would get messed up nor can I 
> reproduce the error on my own. 
> 
> Do any of you think that would be a good practice for trying to track 
> this down? (Check the data after reading it, check the data before 
> saving it)

Nothing wrong with doing that to find a bug - not as a general 
practice, of course - that would be too pessimistic.
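
A minimal sketch of that kind of temporary check, for illustration only.
The record layout (a dict with an int "id" and a str "name"), the
BadData exception and the helper names are all hypothetical - substitute
whatever "looks like what it should look like" means in the real app.

```python
class BadData(Exception):
    """Raised when a record fails the sanity check (hypothetical name)."""


def looks_valid(record):
    # Assumed shape for the example: a dict with an int "id" and a str "name".
    return (isinstance(record, dict)
            and isinstance(record.get("id"), int)
            and isinstance(record.get("name"), str))


def checked_store(store, key, record):
    # Check just before saving, so a bad write is caught at its source.
    if not looks_valid(record):
        raise BadData("bad data before save: key=%r record=%r" % (key, record))
    store[key] = record


def checked_load(store, key):
    # Check just after reading, so corruption while stored shows up here.
    record = store[key]
    if not looks_valid(record):
        raise BadData("bad data after read: key=%r record=%r" % (key, record))
    return record
```

If the "before save" check never fires but the "after read" check does,
the data went bad while it was sitting in the store, which narrows the
search considerably.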

With hard-to-find bugs, doing anything to narrow down the time and
place of the error is fair game - the object is to get you to read
some code that you *know works* with new eyes...

I build in a global boolean variable that I call trace, and when it's on
I do all sorts of weird stuff, giving a running commentary (either by
print or in some log-like file) of what the programme is doing,
like read this, wrote that, received this, done that here, etc.
The bare useful minimum is a "we get here" indicator like the routine
name, but the data helps a lot too.
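
Roughly, such a switch might look like the sketch below. The names
(TRACE, set_trace, trace, read_record) and the line format are made up
for the example; the time stamp and thread name are extras that tend to
be handy when threads are involved.

```python
import sys
import threading
import time

TRACE = False  # the global switch; leave it off in normal running


def set_trace(on):
    """Turn the running commentary on or off."""
    global TRACE
    TRACE = bool(on)


def trace(where, *data, out=None):
    """Report "we get here" plus any data; does nothing when TRACE is off."""
    if not TRACE:
        return
    if out is None:
        out = sys.stderr
    details = " ".join(repr(d) for d in data)
    out.write("%.6f %s %s %s\n" % (
        time.time(), threading.current_thread().name, where, details))


def read_record(source):
    # Hypothetical routine showing where the trace calls would go.
    trace("read_record", "about to read")
    record = source.pop(0)
    trace("read_record", "read", record)
    return record
```

Calling set_trace(True) before a suspect run turns the commentary on;
with the switch off, each trace call costs little more than one test of
the flag.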

Unlike an assert, it does not stop the execution, and you
could get lucky by cross-correlating such "traces" from different
threads - or better, if you use a queue or a pipe for the "log",
you might see the timing relationships directly.
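
A sketch of the queue variant, assuming the standard queue and threading
modules. The worker function and event layout are invented for the
example. Every thread puts its events on one shared queue, so they come
out in the order they arrived, already interleaved.

```python
import queue
import threading
import time

trace_q = queue.Queue()  # one shared "log" for all threads


def trace(where, data=None):
    # Stamp the event, then queue it; queue.Queue is safe to share.
    trace_q.put((time.monotonic(), threading.current_thread().name,
                 where, data))


def worker(n):
    # Hypothetical worker: stands in for the real thread body.
    trace("worker start", n)
    trace("worker done", n * 2)


def drain():
    """Collect everything queued so far, in arrival order."""
    events = []
    while True:
        try:
            events.append(trace_q.get_nowait())
        except queue.Empty:
            return events


def run_demo(count=3):
    threads = [threading.Thread(target=worker, args=(i,), name="T%d" % i)
               for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return drain()


if __name__ == "__main__":
    for stamp, name, where, data in run_demo():
        print("%.6f %-3s %s %r" % (stamp, name, where, data))
```

Reading the drained list top to bottom shows which thread did what
around the time the data went bad, without having to merge separate
per-thread logs by hand.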

But this in itself is fraught with danger, as you can hit file size 
limits, or slow the whole thing down to unusability.
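
One possible guard against the file size problem, offered as a
suggestion rather than as part of the technique above: the standard
library's logging.handlers.RotatingFileHandler caps each file at
maxBytes and keeps only backupCount old files. The helper name, logger
name and default limits below are arbitrary choices for the sketch.

```python
import logging
import logging.handlers


def make_trace_logger(path, max_bytes=1000000, backups=2):
    """Return (logger, handler) writing to a size-capped, rotating file."""
    logger = logging.getLogger("trace")  # arbitrary logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the commentary out of the root logger
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(threadName)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler
```

The price is that the oldest commentary is thrown away once the limit is
reached, so the limits need to be generous enough to still cover the
moment the error occurs. It does nothing about the slowdown.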

On the other hand, it does not generate the volume that a genuine
trace does, it is easier to read, and you can limit it to the bits that
you are currently suspicious of.
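
Limiting it to the suspicious bits can be as simple as tagging each
trace call with an area and keeping a set of the areas currently under
suspicion. The SUSPECTS set, the area names and the sink parameter are
hypothetical, chosen only to keep the sketch short.

```python
SUSPECTS = set()  # areas currently under suspicion, e.g. {"queue"}


def trace(area, message, sink=print):
    # Only areas listed in SUSPECTS produce any commentary.
    if area in SUSPECTS:
        sink("[%s] %s" % (area, message))
```

Usage would be along the lines of SUSPECTS.add("queue") before a run,
then trace("queue", "put item 17") in the code; calls tagged with any
other area stay silent.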

Programming is such fun...

hth - Hendrik