[core-workflow] Longer term idea: consider Rust's homu for CPython merge gating

Sun Jan 3 01:06:23 EST 2016

On 3 January 2016 at 15:08, Guido van Rossum <guido at python.org> wrote:
> In general I like the idea of a commit queue; we built one at Dropbox which
> has been very useful in keeping our master green. Based on this experience,
> I note that testing each commit before it's merged limits the rate at which
> commits can be merged, unless you play some tricks that occasionally
> backfire. Taking homu's brief description literally, it would seem that if
> it takes e.g. 10 minutes to run the tests you can't accept more than 6
> commits per hour. I don't recall how long Python's tests take, but I
> wouldn't be surprised if it was a lot longer than 10 minutes.

It's less than 10 minutes on a modern laptop (although I admit I
haven't done a -uall run in a while), but longer than that on the
Buildbot fleet (since some of the buildbots aren't particularly fast).
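Guido's arithmetic generalises directly. Under strictly serial gating, each commit has to wait for the previous commit's full test run before it can merge, so the merge rate is capped at one commit per test-run duration. Here is a minimal Python sketch of that bound. The function name is illustrative, and the 30-minute figure is a hypothetical slow-buildbot run, not a measured one:

```python
def max_serial_merges_per_hour(test_minutes: float) -> float:
    """Upper bound on merges per hour when each commit is tested only
    after the previous one has merged (strictly serial gating)."""
    if test_minutes <= 0:
        raise ValueError("test_minutes must be positive")
    return 60.0 / test_minutes


# Guido's example: 10-minute test runs cap a serial queue at 6 merges/hour.
print(max_serial_merges_per_hour(10))  # 6.0
# A hypothetical 30-minute buildbot run would cap it at 2 merges/hour.
print(max_serial_merges_per_hour(30))  # 2.0
```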

> Now, you can reduce the running time by parallelizing test runs, and Python
> doesn't see that many commits typically, so perhaps this isn't a problem.
> But for larger projects it may be -- it definitely was a big concern at
> Dropbox, and I recall hearing about some astronomical stats from the
> Chromium project.
>
> At Dropbox we solved this by not literally testing each commit after the
> last one has been merged. Instead, for each commit to be tested, we create a
> throwaway branch where we merge the commit with the current HEAD on the
> master branch, test the result, and then (if the tests pass) re-merge the
> commit onto master. This way we can test many commits in parallel (and yes,
> we pay a lot of money to Amazon for the needed capacity :-).
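The throwaway-branch scheme described above can be sketched as a toy Python model. This is not Dropbox's implementation, and it rests on several assumptions. A tree is represented as a frozenset of change IDs. `gate_in_parallel` and `run_tests` are hypothetical stand-ins for the real gating service and test suite. Threads stand in for the rented test capacity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical model: a tree is just the set of change IDs applied to it.
Tree = FrozenSet[str]


def gate_in_parallel(
    head: Tree,
    candidates: List[str],
    run_tests: Callable[[Tree], bool],
    workers: int = 4,
) -> Tuple[Tree, List[str]]:
    """Test each candidate merged onto the current HEAD in its own
    throwaway tree, then re-merge the passing ones onto master."""

    def test_candidate(change: str) -> bool:
        throwaway = head | {change}  # throwaway branch: HEAD + this commit
        return run_tests(throwaway)

    # All candidates are tested concurrently against the same HEAD.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(test_candidate, candidates))

    accepted = [c for c, ok in zip(candidates, results) if ok]
    rejected = [c for c, ok in zip(candidates, results) if not ok]
    new_head = head.union(accepted)  # re-merge passing commits onto master
    return new_head, rejected


# Usage: a hypothetical test suite that fails whenever change "bad" is present.
def run_tests(tree: Tree) -> bool:
    return "bad" not in tree


head, rejected = gate_in_parallel(frozenset({"base"}), ["a", "bad", "b"], run_tests)
print(sorted(head))  # ['a', 'b', 'base']
print(rejected)      # ['bad']
```

Each candidate is tested only against the HEAD that existed when testing started. As a result, two commits that each pass on their own are never tested together before both land. This is one way a shortcut of this kind can occasionally backfire.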

As far as I'm aware, OpenStack's Zuul mergebot is currently best in class
at solving this problem:

* while a patch is in review, they run a "check" test against master
to see if the change works at all
* to be approved for merge, patches need both a clean check run and
approval from human reviewers
* once patches are approved for merge, they go into a merge queue,
where each patch is applied atop the previous one
* Zuul runs up to N test runs in parallel, merging them in queue order
as they finish successfully
* if any test run fails, any subsequent test runs (finished or not)
are discarded, the offending patch is removed from the merge queue,
and Zuul rebuilds the queue without it
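The queue behaviour in the list above can be modelled with a simplified, round-based Python sketch. It is a toy built on assumptions, not Zuul's actual scheduler. `run_merge_queue` is hypothetical. `window` stands in for Zuul's N parallel runs. A tree is represented as a tuple of patch names. Each round also waits for its whole batch, rather than merging runs as they individually finish.

```python
from typing import Callable, List, Tuple

Tree = Tuple[str, ...]  # hypothetical: a tree is the ordered patches applied


def run_merge_queue(
    base: Tree,
    queue: List[str],
    run_tests: Callable[[Tree], bool],
    window: int = 3,
) -> Tuple[Tree, List[str], int]:
    """Simplified, round-based model of a speculative merge queue.

    Each patch is tested atop the base plus every patch ahead of it in
    the queue. Returns (final master, rejected patches, total test runs).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    master = base
    pending = list(queue)
    rejected: List[str] = []
    test_runs = 0
    while pending:
        batch = pending[:window]
        # Speculative trees: patch i is applied atop patches 0..i-1 of the batch.
        trees = [master + tuple(batch[: i + 1]) for i in range(len(batch))]
        results = [run_tests(t) for t in trees]  # conceptually run in parallel
        test_runs += len(results)
        for patch, ok in zip(batch, results):
            if not ok:
                # Flush: drop the offending patch; results for everything
                # behind it are discarded and those patches are re-queued.
                rejected.append(patch)
                pending.remove(patch)
                break
            # Merge in queue order while runs keep passing.
            master = master + (patch,)
            pending.remove(patch)
    return master, rejected, test_runs


# Usage: a hypothetical test suite that fails whenever "broken" is applied.
def run_tests(tree: Tree) -> bool:
    return "broken" not in tree


master, rejected, runs = run_merge_queue(
    ("base",), ["p1", "broken", "p2", "p3"], run_tests, window=3
)
print(master)    # ('base', 'p1', 'p2', 'p3')
print(rejected)  # ['broken']
print(runs)      # 5
```

Of the five test runs, the one for `p2` stacked on `broken` is thrown away. `p2` is then re-tested in the next round, on top of the rebuilt queue.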

It's a lot like a CPU pipeline in that regard - when everything goes
well, patches are merged as fast as they're approved, just with a
fixed time delay corresponding to the length of time it takes to run
the test suite. If one of the merges fails, then it's like branch
prediction in a CPU failing - you have to flush the pipeline and start
over again from the point of the failure.
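The pipeline analogy can be worked through under simplifying assumptions: no failures, a fixed test duration T, N test runs in flight, and approvals arriving at rate λ. All of these symbols are introduced here for illustration. Latency also ignores any time spent waiting for a free test slot.

```latex
\begin{aligned}
\mu_{\text{serial}} &= \frac{1}{T}, \qquad \mu_{\text{pipelined}} = \frac{N}{T} \\
\lambda \le \frac{N}{T} &\implies \text{merge rate} = \lambda, \quad \text{merge latency} \approx T \\
T = 10\ \text{min},\ N = 6 &\implies \mu_{\text{serial}} = 6\ \text{h}^{-1}, \quad \mu_{\text{pipelined}} = 36\ \text{h}^{-1}
\end{aligned}
```

With a hypothetical N = 6 and the 10-minute example, that is 6 versus 36 merges per hour. Each failure discards at most N - 1 in-flight runs queued behind it.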

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia