[Python-Dev] Community buildbots and Python release quality metrics

Thu Jun 26 18:21:08 CEST 2008

On 03:33 pm, guido at python.org wrote:
>Too verbose, Glyph. :-)

Mea culpa.  "Glyph" might be a less appropriate moniker than "Altogether 
too many glyphs".

>It needs to be decided case-by-case.

If certain tests are to be ignored on a case-by-case basis, why not 
record that decision by disabling the test in the code?  Otherwise, the 
decision inevitably gets boiled down to "it's okay to release the code 
with a bunch of tests failing, but I don't know which ones".  As it was 
on Twisted when we used to make case-by-case decisions about failures, 
and as it is here now.
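
One way to record such a decision in the code itself, sketched with the standard library's unittest decorators. Note the assumptions: `unittest.skip` and `unittest.expectedFailure` only arrived in Python releases later than the beta under discussion, and the class name, test names and skip reason are hypothetical stand-ins, not taken from Twisted, Django or the Python test suite.

```python
import unittest


class WarningsCompatTests(unittest.TestCase):  # hypothetical test case
    # Decision recorded in code: this failure is known and accepted for now.
    @unittest.skip("known failure accepted for this release (hypothetical reason)")
    def test_private_warnings_hook(self):
        self.fail("stands in for a test that currently fails")

    # Alternative: keep running the test, but mark the failure as expected.
    # If it starts passing, the run reports an unexpected success.
    @unittest.expectedFailure
    def test_known_regression(self):
        self.assertEqual(1, 2)

    # Everything not explicitly marked is still enforced.
    def test_still_enforced(self):
        self.assertEqual(1 + 1, 2)


def run_suite():
    """Run the hypothetical tests and return the collected TestResult."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(WarningsCompatTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With the exceptions written down like this, the run is green, and the list of ignored tests is whatever `result.skipped` and `result.expectedFailures` contain, rather than "a bunch of tests, but I don't know which ones".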

>The beta is called beta because, well, it may break stuff and we may 
>want to fix it.

That's also why the alpha is called an alpha.  My informal understanding 
is that a beta should have no (or at least very few) known issues, with 
a medium level of confidence that no further ones will be found.  An RC 
would have absolutely no known issues with a fairly high level of 
confidence that no further ones will be found.  This would include the 
community buildbots basically working for the most part; I would not be 
surprised if there were a few tests that still had issues.

But clearly the reality does not meet my informal expectations, so it 
would be nice to have something written down to check against.  Still, 
I'm bringing this up now because it _is_ a beta, and I think it will be 
too late to talk about dealing with persistent test failures after the 
RCs start coming out.

(Of course, I'm just being sneaky.  I don't actually care if it's 
clearly documented, I just care that I stop having problems with 
incompatibility.  But I believe having it clearly spelled out would 
actually prevent a lot of the problems that I've been having, since I 
don't think anyone would *intentionally* select a policy where every 
release breaks at least one major dependent project.)

>I'm not particularly impressed by statistics like "all tests are red" 
>-- this may all be caused by a single issue.

The issue, for me, is not specifically that tests are red.  It's that 
there's no clear policy about what to do about that.  Can a release go 
out with some of the tests being red?  If so, what are the extenuating 
circumstances?  Does this have to be fixed?  If not, why not?  Why are 
we talking about this now?  Shouldn't the change which caused Django to 
become unimportable have been examined at the time, rather than months 
later?  (In other words, if it *is* a single issue, great, it's easy to 
fix: revert that single issue.)  If not, then shouldn't someone in 
Django-land have been notified so they could cope with the change?

Sorry that there are so many questions here; if I had fewer, I'd use 
fewer words to ask them.

>For example, I'd much rather read an explanation about *why* Django 
>cannot even be imported than a blanket complaint that this is a 
>disgrace. So why is it?

I don't know.  JP is already addressing the issues affecting Twisted in 
another thread (incompatible changes in the 
private-but-necessary-to-get-any-testing-done API of the warnings 
module).  But I really think that whoever made the change which broke it 
should be the one investigating it, not me.