[Python-Dev] Strategies for debugging buildbot failures?

Sun Jan 18 19:03:15 CET 2009

This is probably a stupid question, but here goes:

Can anyone suggest good strategies for debugging buildbot
test failures, for problems that aren't reproducible locally?

There have been various times in the past when I've wanted
to be able to do this.  Right now, I'm thinking particularly of
the 'Unknown signal 32' failure that's been occurring on the
gentoo x86 buildbots for 3.0 and 3.x since pre-3.0 alpha
days.  I recently noticed an apparent pattern to these
failures (the failure occurs at the first test that involves
threads, after test_os has been run), but am unsure how
to proceed from there.
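
For what it's worth, here is a rough sketch of how one might try to
reproduce that ordering in a single process.  It is only an illustration:
the helper names (build_command, describe_exit, run_in_order) are made up,
test_threading is just one example of a test that uses threads, and it
assumes a Python new enough to have subprocess.run and "python -m test".
It also assumes the failure would show up as the test process being killed
by a signal, which I haven't confirmed.

```python
import subprocess
import sys


def build_command(tests, python=sys.executable):
    """Build a command line that runs the given tests, in order, in one process."""
    return [python, "-m", "test", "-v"] + list(tests)


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        return "killed by signal %d" % -returncode
    if returncode == 0:
        return "passed"
    return "failed with exit status %d" % returncode


def run_in_order(tests):
    """Run the tests in one interpreter and describe how that interpreter exited."""
    proc = subprocess.run(build_command(tests))
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # test_os first, then a test that involves threads (assumed ordering).
    print(run_in_order(["test_os", "test_threading"]))
```

If the pattern is real, running the thread test alone should pass on the
affected machine while the two-test run should not; if both pass, the
ordering is probably not the whole story.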

Is it acceptable to commit a change (to the trunk or py3k, not to
the release branches) solely for the purpose of getting more
information about a failure?  I don't see a lot of this kind of
activity going on in the checkin messages, so I'm not sure
whether this is okay or not.  If I did this, the commit
message would clearly indicate that the checkin was
meant to be temporary, and give an expected time to reversion.

Alternatively, is it reasonable to create a new branch solely
for the purpose of tracking down one particular problem?
Again, I don't see this sort of thing happening, but it seems
like an attractive strategy, since it allows one to test one
particular buildbot (via the form for requesting a build)
without messing up anything else.

What do others do to debug these failures?

Mark

(P.S. After a bit of Googling, I suspect that the 'Unknown
signal 32' failure is related to the LinuxThreads library,
and is probably not Python's fault.  But it would
still be good to understand why it occurs with 3.x but
not 2.x, and whether there's an easy workaround.)
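
One cheap check that might support or undermine the LinuxThreads theory
is to see where signal 32 sits relative to SIGRTMIN on the affected
machine.  As I understand the Linux signal(7) man page, the kernel's
real-time signals start at 32, and glibc's thread implementation keeps
the first few for its own use and raises SIGRTMIN accordingly.  The
sketch below is only an illustration: classify is a made-up helper, and
the value 32 for the first kernel real-time signal is an assumption
taken from that man page rather than anything Python reports.

```python
import signal

# Assumption from the Linux signal(7) man page: kernel real-time
# signals start at 32. Python does not expose this value directly.
KERNEL_FIRST_RT_SIGNAL = 32


def classify(signum, sigrtmin):
    """Roughly classify a signal number relative to the platform's SIGRTMIN."""
    if sigrtmin is None:
        return "unknown (no SIGRTMIN on this platform)"
    if signum < KERNEL_FIRST_RT_SIGNAL:
        return "standard signal"
    if signum < sigrtmin:
        return "reserved by the C library's thread implementation"
    return "application real-time signal"


if __name__ == "__main__":
    # SIGRTMIN is missing on some platforms, so look it up defensively.
    sigrtmin = getattr(signal, "SIGRTMIN", None)
    print("SIGRTMIN =", None if sigrtmin is None else int(sigrtmin))
    print("signal 32:", classify(32, sigrtmin))
```

If signal 32 turns out to be one of the reserved ones, that would at
least be consistent with the thread library being involved, though it
wouldn't explain the 3.x versus 2.x difference.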