[Python-Dev] sprints and pushes

Stephen J. Turnbull stephen at xemacs.org
Fri Mar 25 03:51:11 CET 2011


Tres Seaver writes:

 > >  > > Well, keep in mind hg is a *distributed* version control system. You
 > >  > > don't have to push your changes right now.

 > >  > That doesn't work so well at a sprint, where the point is to maximize
 > >  > the value of precious face-time to get stuff done *now*.
 > > 
 > > That's where the D in DVCS comes in.  It's a new world, friends.  All
 > > you need to do is bring a $50 wireless router to the sprint, and have
 > > some volunteer set up a shared repo for the sprinters.  Then some
 > > volunteer *later* runs the tests and pilots the patches into the
 > > public repo.  Where's the latency?
 > 
 > The current full test suite is punishingly expensive to run, sprint
 > or not.  Because of that fact, people will defer running it, and
 > sometimes forget.  Trying to require that people run it repeatedly
 > during a push race is just Canute lashing the waves.

"Defer" is precisely the point of the proposal you are apparently
responding to, and "people forget" is why I propose a volunteer (or if
possible an automatic process, see my other post and Jesus Cea's) to
run the deferred tests.  I don't understand your point, unless it is
this:

 > The rhythm is still broken if developers don't run the tests and see them
 > pass.  Async is an enemy to the process here, because it encourages poor
 > practices.

Yes, but you can't have it both ways.  Either "the" tests (in fact, I
think people are referring to several different sets of tests by the
phrase "the tests", sometimes even in the same sentence) are valuable
and it's desirable that they always be run (I'm deliberately ignoring
cost here), or they're not, in which case they should never be run.
Now costs come back in: if the tests are valuable but too costly to
run during sprints, better late than never, IMO.  What am I missing?

Also, there's nothing here that says that developers can't run the
tests they think are relevant themselves.  But shouldn't that be their
choice if we are going to relax the requirement from the full test
suite to some subset?  After all, they're the ones closest to the
problem, so they should be in the best position to decide which tests
are relevant.

 > >  > Maybe we need to chop the problem up as:
 > > 
 > > "Violence is the last refuge of the incompetent." ObRef Asimov.<wink>
 > 
 > Hmm, that hardly seems appropriate, even with the wink.  "Chopping"
 > isn't violence in any normal sense of the word when applied to
 > non-persons:  Chopping up a problem is no more violent than chopping
 > wood or onions (i.e., not at all).

Well, the referent of "violence" is the global nature of the proposal.
I don't think that one size fits all, here.  If you are going to argue
for running some tests but not others after making changes, shouldn't
there be a notion of relevance involved?  IMO "the" tests for modules
with dependents should include the tests for their dependents, for
example.  Modules that are leaves in the dependency tree can presumably
just be unit tested, and we can leave it at that.

E.g., much as I normally respect Barry's intuitions, his proposal (to
remove costly tests, without reference to the possibility of missing
something important) is IMHO absolutely the wrong criterion.  I don't
really know about Python's test suite, but in XEmacs the time-
expensive tests are mostly the ones that involve timeouts and lots of
system calls because they interact with the OS -- and those are
precisely the finicky areas where a small change can occasion an
unexpected bug.  For XEmacs, those bugs also are more likely than
average to be showstoppers for dependents, in the sense that until
they're fixed, the dependents can't be tested at all.  So the cost of
*not* running those tests is relatively high, too.

OTOH, there are a couple of expensive "sledgehammer" tests, e.g., the
M17N tests that iterate over *all* the characters XEmacs knows
about.  (Seriously, once you've seen one of the 11,000 precomposed
Hangul, you've seen them all, and it's almost that bad for the 21,000
kanji.)  These could easily be omitted without harming anything else.

Yes, it would be a fair amount of work to do this kind of analysis.
That's why I propose some sort of deferred testing as an alternative
to a cost-based smoke test.

Another alternative would be to require unit testing of changed
modules only, which should be a pretty accurate heuristic for
relevance, except for modules with lots of dependencies.

