[Python-Dev] Community buildbots and Python release quality metrics

Thu Jun 26 17:18:55 CEST 2008

Today on planetpython.org, Doug Hellman announced the June issue of 
Python magazine.  The cover story this month is about Pybots, "the 
fantastic automation system that has been put in place to make sure new 
releases of Python software are as robust and stable as possible".

Last week, there was a "beta" release of Python which, according to the 
community buildbots, cannot run any existing python software.  Normally 
I'd be complaining here about Twisted, but in fact Twisted is doing 
relatively well right now; only 80 failing tests.  Django apparently 
cannot even be imported.

The community buildbots have been in a broken state for months now[1]. 
I've been restraining myself from whinging about this, but now that it's 
getting close to release, it's time to get these into shape, or it's 
time to get rid of them.

There are a few separate issues here.  One is, to what extent should 
Python actually be compatible between releases?  There's not much point 
in saying "this release is compatible with python 2.5" if all python 2.5 
programs need to be modified in order to run on python 2.6.  Of course, 
it's quite likely that there are some 3rd-party Python programs that 
continue to work, but python.org provides links to 2 projects that don't 
and none that do.

Another way to phrase this question is, "whose responsibility is it to 
make Python 2.5 programs run on Python 2.6"?  Or, "what happens when the 
core team finds out that a change they have made has broken some python 
software 'in the wild'"?

Here are a couple of ways to answer these questions:

  1) It's the python core team's responsibility.  There should be Python 
buildbots for lots of different projects and they should all be green 
before the code can be considered [one of: allowed in trunk, alpha, 
beta, RC].
  2) It's the python core team's responsibility as long as the major 
projects are using public APIs.  The code should not be considered 
[alpha, beta, RC] if there are any known breakages in public APIs.
  3) It's only the python core team's responsibility to notify projects 
of incompatible changes of which they are aware, which may occur in 
public or private APIs.  Major projects with breakages will be given a 
grace period before a [beta, RC, final] to release a forward-compatible 
version.
  4) The status quo: every new Python release can (and will, at least 
speaking in terms of 2.6) break lots of popular software, with no 
warning aside from the beta period.  There are no guarantees.

For obvious reasons, I'd prefer #1, but any of the above is better than 
#4.  I don't think #4 is an intentional choice, either; I think it's the 
result of there being no clear consequence for community buildbots 
failing.  Or, for that matter, of core buildbots failing.  Investigating 
the history is tricky (I see a bunch of emails from Barry here, but I'm 
not sure which is the final state) but I believe the beta went out with 
a bunch of buildbots offline or failing.

A related but independent issue is: should the community buildbots 
continue to exist?  An automated test is only as good as the 
consequences of its failure.  As it stands, the community buildbots seem 
to be nothing but an embarrassment.  I don't mean this to slam Grig, or 
the Python team - I mean, literally, if there is no consequence of their 
failure then the only use I can see for the data the buildbots currently 
provide (i.e. months and months of failures) is to embarrass the Python 
core developers.  python.org's purpose should not be to provide negative 
press for Python ;), so if this continues to be the case, they probably 
shouldn't be linked.  This isn't just an issue with changes to Python 
breaking stuff; the pybots' configuration is apparently not well 
maintained, since the buildbots which say "2.5" aren't passing either, 
and that's not a problem with the code, it's just that the slaves are 
actually running against trunk at HEAD.

Ultimately these tools only exist to ensure the quality of Python 
releases.  The really critical question here is, what does it mean to 
have a Python release that is high-quality?  There are some obvious 
things: it shouldn't crash, it should conform to its own documentation. 
Personally I think "it passes all of its own tests" and "it runs 
existing Python code" are important metrics too, but the most important 
thing I'm trying to get across here is that there should be a clear 
understanding of which goals the release / QA / buildbot / test process 
is trying to accomplish.  For me, personally, it really needs to be 
clear when I can say "you guys screwed up and broke stuff", and when I 
just have to suck it up and deal with the consequences of a new version 
of Python in Twisted.

It's definitely bad for all of us if the result is that new releases of 
Python just break everything.  Users don't care where the responsibility 
lies, they just know that stuff doesn't work, so we should decide on a 
process which allows Twisted (and other popular projects, like Django, 
Plone, pytz, PyGTK, pylons, ...) to (mostly) run when new versions of 
Python are released.

[1]: http://mail.python.org/pipermail/python-dev/2008-May/079212.html