[Python-Dev] Community buildbots and Python release quality metrics

Thu Jun 26 19:03:59 CEST 2008

On Thu, Jun 26, 2008 at 9:21 AM,  <glyph at divmod.com> wrote:
> On 03:33 pm, guido at python.org wrote:
>> It needs to be decided case-by-case.
>
> If certain tests are to be ignored on a case-by-case basis, why not record
> that decision by disabling the test in the code?  Otherwise, the decision
> inevitably gets boiled down to "it's okay to release the code with a bunch
> of tests failing, but I don't know which ones".  As it was on Twisted when
> we used to make case-by-case decisions about failures, and as it is here
> now.

No, I just meant that we need to figure out for each 3rd party test
that fails whether the failure is our fault (too incompatibile) or
theirs (relied on undefined behavior) and what the best fix is (change
our code or theirs -- note that even if it's there fault there are
cases where the best fix is to change our code.

>> The beta is called beta because, well, it may break stuff and we may want
>> to fix it.
>
> That's also why the alpha is called an alpha.  My informal understanding is
> that a beta should have no (or at least very few) known issues, with a
> medium level of confidence that no further ones will be found.  An RC would
> have absolutely no known issues with a fairly high level of confidence that
> no further ones will be found.  This would include the community buildbots
> basically working for the most part; I would not be surprised if there were
> a few tests that still had issues.

I guess we differ in interpretation. I see alphas as early outreach to
the community. I see betas as a line in the sand where we stop
adding/changing functionality -- but certainly not the moment where we
expect everything to be perfect. During the beta period we fix bugs
only -- those could be problems with new features, but those could
also be problems where we accidentally introduced a backwards
incompatibility.

I'm sorry if your interpretation of the terminology is different, but
this is mine and this is what we've always used, and it's not likely
to change. (At least not for the 2.6/3.0 release.)

> But clearly the reality does not meet my informal expectations, so it would
> be nice to have something written down to check against.  Still, I'm
> bringing this up now because it _is_ a beta, and I think it will be too late
> to talk about dealing with persistent test failures after the RCs start
> coming out.

Absolutely correct. The RCs are hoped to be as good as the final
release. *Now* is the time to bring up issue.

But please bring up specific issues -- I don't want to have an
extended discussion about process or quality or expectations. I just
want the code to be fixed.

> (Of course, I'm just being sneaky.  I don't actually care if it's clearly
> documented, I just care that I stop having problems with incompatibility.
>  But I believe having it clearly spelled out would actually prevent a lot of
> the problems that I've been having, since I don't think anyone would
> *intentionally* select a policy where every release breaks at least one
> major dependent project.)

And here we seem to be parting our ways. We have a large amount of
process already. I don't want more.

>> I'm not particularly impressed by statistics like "all tests are red" --
>> this
>> may all be caused by a single issue.
>
> The issue, for me, is not specifically that tests are red.  It's that
> there's no clear policy about what to do about that.  Can a release go out
> with some of the tests being red?  If so, what are the extenuating
> circumstances?  Does this have to be fixed?  If not, why not?  Why are we
> talking about this now?  Shouldn't the change which caused Django to become
> unimportable have been examined at the time, rather than months later?  (In
> other words, if it *is* a single issue, great, it's easy to fix: revert that
> single issue.)  If not, then shouldn't someone in Django-land have been
> notified so they could cope with the change?
>
> Sorry that there are so many questions here; if I had fewer, I'd use fewer
> words to ask them.

If you're talking about community buildbots (which I presume are
testing 3rd party packages against the core release) being red, that's
out of scope for the core developers. We have enough trouble keeping
our own (core) buildbots green, as you may have noticed. Some of the
core buildbots are red because, well, frankly, they run on a cranky
old version of an OS that few people care about. And then sometimes we
leave them red, or turn them off.

I see the core buildbots as a tool for developers to have their
changes tested on a wide variety of platforms. They are an internal
tool that gives us guidance about whether we're ready.

I hope the community buildbots can be used the same way: a red bot
means someone needs to look into an issue. The issue could be with the
core or with the 3rd party package being tested. I don't think a
policy like "no community buildbots should be red" makes any sense.

>> For example, I'd much rather read an explanation about *why* Django cannot
>> even be imported than a blanket complaint that this is a disgrace. So why is
>> it?
>
> I don't know.  JP is already addressing the issues affecting Twisted in
> another thread (incompatible changes in the private-but-necessary-to-
> get-any-testing-done API of the warnings module).  But I really think that
> whoever made the change which broke it should be the one investigating it,
> not me.

Whoever made what change? You can't seriously expect core developers
investigating issues with 3rd party packages, no matter what the
cause. The correct process is that someone who cares about the 3rd
party package (could be an end user, a developer of that package, or a
core developer who happens to care) looks into the issue enough to
diagnose it, and then either proposes a fix or files a bug with the
likely culprit, which could be the core or the 3rd party package. If
nobody cares, well, that's open source too.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)