[Python-Dev] Community buildbots and Python release quality metrics

glyph at divmod.com glyph at divmod.com
Thu Jun 26 21:32:10 CEST 2008


I do tend to ramble on, so here's an executive summary of my response:

I want python developers to pay attention to the community buildbots and 
to treat breakages of existing projects as a serious issue.  However, I 
don't think that maintaining those projects is the core team's job, so 
all I'm asking for is for core developers to:

  * treat breakages of 3rd party packages as a potentially serious issue,
  * if possible (i.e. if they find out about the breakage soon enough, 
which should be the case in any pybots failure) revert the change that 
caused the problem until the problem can be fixed, and
  * notify 3rd party maintainers when it's decided that the breakage will 
not be fixed.

This only applies to breakages that the core developers find out about, 
which for all practical purposes means the ones on the community 
builders page.

Those of you looking for point-by-point responses and some repetition of 
the above points, enjoy :).

On 05:03 pm, guido at python.org wrote:
>On Thu, Jun 26, 2008 at 9:21 AM,  <glyph at divmod.com> wrote:
>>On 03:33 pm, guido at python.org wrote:
>>>It needs to be decided case-by-case.
>>(misunderstanding)
>No, I just meant that we need to figure out for each 3rd party test
>that fails whether the failure is our fault (too incompatible) or
>theirs (relied on undefined behavior) and what the best fix is (change
>our code or theirs -- note that even if it's their fault there are
>cases where the best fix is to change our code).

This is basically fine, as far as I'm concerned.

I would like to suggest, however, that these issues be dealt with as 
soon as possible, rather than waiting for the release process to begin. 
A lot of decisions are made on this mailing list about the supposed 
properties of "average" python code, without any actual survey of said 
code.  Sometimes the results of that survey can be really surprising. 
The end goal of any particular compatibility policy, of a distinction 
between "public" and "private" APIs, and so on, is to keep code working.
>I'm sorry if your interpretation of the terminology is different, but
>this is mine and this is what we've always used, and it's not likely
>to change. (At least not for the 2.6/3.0 release.)

I have no problem with your definitions of these terms.  I think that 
they should probably be in PEP 101 though.  Would you accept a patch 
that added an edited / expanded version of this paragraph?
>>Still, I'm bringing this up now because it _is_ a beta,

>Absolutely correct. The RCs are hoped to be as good as the final
>release. *Now* is the time to bring up issues.

Well, that's good, at least :)
>But please bring up specific issues -- I don't want to have an
>extended discussion about process or quality or expectations. I just
>want the code to be fixed.

Well, one specific issue has been bumped in priority as a result of this 
thread, and others are under discussion.  The code is getting fixed.
>>(I just care that I stop having problems with incompatibility.)
>
>And here we seem to be parting our ways. We have a large amount of
>process already. I don't want more.

Looking at it from my perspective, I'm proposing a reduction in process. 
Under the current process, if a buildbot goes red, the developer makes a 
judgment call, the release manager makes a judgment call, a ticket gets 
filed, there's discussion on the ticket, it gets promoted, it gets 
demoted, the RM forgets to re-promote it...

My suggestion is that the process be, simply: if a buildbot (community 
or otherwise) goes red, the change that broke it gets reverted.  No 
questions asked!  It's still there in the revision history, ready to be 
re-applied once the issues get worked out.  Discussion can then take 
place and case-by-case judgments can be applied.
>If you're talking about community buildbots (which I presume are
>testing 3rd party packages against the core release) being red, that's
>out of scope for the core developers.

I don't necessarily think that keeping the community buildbots green is 
the core developers' responsibility, but I don't think it should be 
entirely out of scope, either.  The python test suite is, frankly, poor 
- and I hope I'm not surprising you by saying that.  It's full of race 
conditions, tends to fail intermittently, and is frequently ignored. 
Not only that, but it is quite often changed, so tests for issues that 
affect real code are quite often removed.  So, the community buildbots 
are not just making sure that 3rd-party code still works, they are an 
expanded, independently developed test suite to make sure that *python 
itself* still works.  Sometimes they will not fill that role 
particularly well, but they are worth paying attention to.
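To illustrate the sort of intermittent failure I mean, here is a 
hypothetical sketch (not taken from the actual test suite): a test that 
sleeps for a fixed interval and hopes a worker thread has finished will 
pass on a fast machine and fail sporadically on a loaded buildbot; 
synchronizing on an Event instead makes it deterministic.

```python
import threading

def run_worker_deterministically():
    """Wait for the worker via an Event rather than a fixed sleep."""
    result = []
    done = threading.Event()

    def worker():
        result.append(42)
        done.set()  # signal completion explicitly

    threading.Thread(target=worker).start()
    # A generous timeout instead of time.sleep() -- no race, and a
    # clear failure mode (False) if the worker really hangs.
    finished = done.wait(5.0)
    return finished, result
```

The racy variant differs only in replacing `done.wait(5.0)` with 
`time.sleep(0.01)`, which is exactly the kind of thing that goes red 
only under load.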

If python had a good, exhaustive regression test suite that was 
immutable between major versions, I'd probably feel differently.  But 
that's not the world we live in.

Right now, apparently, the *default* policy is that if the community 
buildbots go red, someone will maybe take a look at them later, before 
a release.  I'd suggest that the *default* policy ought to be that if a 
particular changeset breaks a community buildbot, it needs further 
examination before being merged to trunk.

However, this is just the way I prefer to do development; if you think 
that would slow things down too much, the only thing I'm _really_ asking 
for is a clear statement that says "there should be no test failures on 
community buildbots that have not been explicitly accepted before a 
final release".  I'm not even sure what "explicitly accepted" means - 
you have to sign off?  the release manager, maybe?  A discussion on this 
list?  I don't really care, as long as _somebody_ does.

Right now, my impression of the process is this:

  * The community buildbot goes red; no core developer looks at it.
    * If the project is Twisted, JP fixes the bug on Twisted's end.
    * If the project is Django, nobody notices.
  * Months later, a beta goes out.  A few people try it out and report 
some bugs, but don't really understand the output.  A good number go 
untriaged.
  * A little while later, a final release comes out.  Many projects are 
broken as a result.

This is not a hypothetical concern.  This is what happened with 2.5; 
Twisted was broken for months, and Zope *to this day* does not support 
Python 2.5.  2.6 looks like it's headed for the same trajectory.  To be 
clear: this is with all of Python's _own_ tests passing, so it is 
specific to paying attention to community buildbots.  (And the community 
buildbots only build django and twisted right now.  I'm not talking 
about a massive pan-galactic survey of all possible python projects. 
I'm only talking about those popular enough to make this select list. 
Which should still be a slightly longer list, but I digress...)
>Some of the
>core buildbots are red because, well, frankly, they run on a cranky
>old version of an OS that few people care about.

On Twisted, we have a distinction between "supported" and "unsupported" 
platforms, so that the suite can still run on platforms which aren't 
fully supported and don't pass the whole test suite, but which we are 
nevertheless interested in.  I don't believe the setup is too hard, and 
we'll definitely help out with it if you want to do something similar. 
(I believe Thomas Herve volunteered to do this at PyCon...)
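One way such a supported/unsupported distinction can be expressed in 
plain stdlib unittest is with a skip decorator; this is a hypothetical 
sketch (the platform list and helper name are made up for illustration, 
not Twisted's actual mechanism):

```python
import sys
import unittest

# Assumed, illustrative list of fully supported platforms.
SUPPORTED_PLATFORMS = ("linux", "darwin", "win32")

def supported_only(test):
    """Skip the test on unsupported platforms instead of letting it go red."""
    return unittest.skipUnless(
        sys.platform in SUPPORTED_PLATFORMS,
        "%s is unsupported; failures here are non-blocking" % sys.platform,
    )(test)

class ExampleTests(unittest.TestCase):
    @supported_only
    def test_platform_specific_behaviour(self):
        # Stand-in for a test that only makes sense on supported platforms.
        self.assertTrue(True)
```

A skipped test shows up in the run output but doesn't turn the builder 
red, which is the whole point of the distinction.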
>I hope the community buildbots can be used the same way: a red bot
>means someone needs to look into an issue. The issue could be with the
>core or with the 3rd party package being tested. I don't think a
>policy like "no community buildbots should be red" makes any sense.

These bots have been red for months.  The issues exist, but have not 
been looked into.  As a result, Barry made a specific commitment on a 
ticket (i.e. "this should block beta 1") which was not met.  I think 
_something_ has to be changed to encourage people to do this more 
immediately or more seriously.
>Whoever made what change? You can't seriously expect core developers
>to investigate issues with 3rd party packages, no matter what the
>cause. The correct process is that someone who cares about the 3rd
>party package (could be an end user, a developer of that package, or a
>core developer who happens to care) looks into the issue enough to
>diagnose it, and then either proposes a fix or files a bug with the
>likely culprit, which could be the core or the 3rd party package. If
>nobody cares, well, that's open source too.

If the breakage is calculated and expected, and the benefits clearly 
outweigh the costs... oh well, too bad for the 3rd party people.  It 
would be nice if the core developers would notify the third party if 
they find out about it so that it can be verified that the change in 
question wasn't obscuring some *other* problem, but from what I've seen, 
the breakages that I have been concerned about have not been 
intentional, calculated changes, but side-effects of other things.

I'm talking about the case where the breakage reveals either a bug in 
Python, or an unintentional / side-effect change in behavior, which is 
surprisingly frequent.
