[Python-Dev] My thinking about the development process

Brett Cannon bcannon at gmail.com
Fri Dec 5 21:04:53 CET 2014


This is a bit long as I wrote it as if it were a blog post, to try to give
background info on my thinking, etc. The TL;DR folks should start at the
"Ideal Scenario" section and read to the end.

P.S.: This is in Markdown and I have put it up at
https://gist.github.com/brettcannon/a9c9a5989dc383ed73b4 if you want a
nicer formatted version for reading.

# History lesson
Since I signed up for the python-dev mailing list way back in June 2002,
there seems to be a cycle where we as a group come to a realization that
our current software development process has not kept up with modern
practices and could stand to be updated. For me this was first shown when
we moved from SourceForge to our own infrastructure, then again when we
moved from Subversion to Mercurial (I led both of these initiatives, so
it's somewhat a tradition/curse that I find myself in this position yet again).
And so we again find ourselves at the point of realizing that we are not
keeping up with current practices and thus need to evaluate how we can
improve our situation.

# Where we are now
Now it should be realized that we have two sets of users of our development
process: contributors and core developers (the latter of whom can play both
roles). If you take a rough outline of our current, recommended process, it
goes something like this:

1. Contributor clones a repository from hg.python.org
2. Contributor makes desired changes
3. Contributor generates a patch
4. Contributor creates an account on bugs.python.org and signs the
   [contributor agreement](https://www.python.org/psf/contrib/contrib-form/)
5. Contributor creates an issue on bugs.python.org (if one does not already
exist) and uploads the patch
6. Core developer evaluates the patch, possibly leaving comments through our
[custom version of Rietveld](http://bugs.python.org/review/)
7. Contributor revises the patch based on feedback and uploads a new patch
8. Core developer downloads the patch and applies it to a clean clone
9. Core developer runs the tests
10. Core developer does one last `hg pull -u` and then commits the changes
to various branches
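For concreteness, the contributor-facing half of that process looks roughly like the following command transcript (paths, issue numbers, and file names are purely illustrative):

```shell
# Steps 1-3: clone, edit, and produce a patch to upload to bugs.python.org
hg clone https://hg.python.org/cpython
cd cpython
# ... edit files ...
hg diff > fix-issue-NNNNN.diff   # attach this file to the tracker issue
```

Everything after that (applying the patch to a clean clone, running the tests, the final `hg pull -u` and commit) happens manually on the core developer's machine, which is exactly the mechanical work that could be automated.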

I think we can all agree it works to some extent, but isn't exactly smooth.
There are multiple steps in there -- in full or in part -- that can be
automated. There is room to improve everyone's lives.

And we can't forget the people who help keep all of this running as well.
There are those who manage the SSH keys, the issue tracker, the review
tool, hg.python.org, and the email system that lets us know when stuff
happens on any of these other systems. The impact on them also needs to be
considered.

## Contributors
I see two scenarios for contributors to optimize for. There are the simple
spelling-mistake patches, and then there are the code change patches. The
former is the kind of thing that you can do in a browser without much
effort and should be a no-brainer commit/reject decision for a core
developer. This is the problem the GitHub/Bitbucket camps have been
promoting their solutions for, while leaving the cpython repo alone.
Unfortunately the bulk of our documentation is in the Doc/ directory of
cpython. While it's nice to think about moving the devguide, peps, and even
breaking out the tutorial to repos hosted on Bitbucket/GitHub, everything
else is in Doc/ (language reference, howtos, stdlib, C API, etc.). So
unless we want to completely break all of Doc/ out of the cpython repo and
have core developers willing to edit two separate repos when making changes
that impact code **and** docs, moving only a subset of docs feels like a
band-aid solution that ignores the elephant in the room: the cpython repo,
which the bulk of patches target.

For the code change patches, contributors need an easy way to get hold of
the code and get their changes to the core developers. After that it's
things like letting contributors know that their patch doesn't apply
cleanly, doesn't pass tests, etc. As of right now, getting the patch into
the issue tracker is a bit manual but nothing crazy. The real issue in this
scenario is core developer response time.

## Core developers
There is a finite amount of time that core developers get to contribute to
Python, and it fluctuates greatly. This means that if a process can be found
which allows core developers to spend less time doing mechanical work and
more time doing things that can't be automated -- namely code reviews --
then the throughput of patches being accepted/rejected will increase. This
also affects any increase in patch submissions that comes from improving
the situation for contributors: if the throughput doesn't change, there
will simply be more patches sitting in the issue tracker, and that doesn't
benefit anyone.

# My ideal scenario
If I had an infinite amount of resources (money, volunteers, time, etc.),
this would be my ideal scenario:

1. Contributor gets code from wherever; easiest to just say "fork on GitHub
or Bitbucket" as they would be official mirrors of hg.python.org and are
updated after every commit, but could clone hg.python.org/cpython if they
wanted
2. Contributor makes edits; if they cloned on Bitbucket or GitHub then they
have browser edit access already
3. Contributor creates an account at bugs.python.org and signs the CLA
3. The contributor creates an issue at bugs.python.org (probably the one
piece of infrastructure we all agree is better than the other options,
although its workflow could use an update)
4. If the contributor used Bitbucket or GitHub, they send a pull request
with the issue # in the PR message
5. bugs.python.org notices the PR, grabs a patch for it, and puts it on
bugs.python.org for code review
6. CI runs on the patch based on what Python versions are specified in the
issue tracker, letting everyone know if it applied cleanly, passed tests on
the OSs that would be affected, and also got a test coverage report
7. Core developer does a code review
8. Contributor updates their code based on the code review and the updated
patch gets pulled by bugs.python.org automatically and CI runs again
9. Once the patch is acceptable and assuming the patch applies cleanly to
all versions to commit to, the core developer clicks a "Commit" button,
fills in a commit message and NEWS entry, and everything gets committed (if
the patch can't apply cleanly then the core developer does it the
old-fashion way, or maybe auto-generate a new PR which can be manually
touched up so it does apply cleanly?)
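The step where bugs.python.org notices a PR and links it to an issue could be a small piece of glue code. Here is a minimal sketch of the two pure pieces of that glue: pulling an issue number out of a PR message and building the URL of the PR's patch (GitHub serves a plain-text patch for any PR by appending `.patch` to its URL). The function names, the regex convention, and the repo name are my own assumptions, not an existing bugs.python.org API:

```python
import re

# Matches "#12345" or "issue 12345" (case-insensitive) in a PR message.
# The exact convention contributors would use is an assumption.
ISSUE_RE = re.compile(r"(?:#|issue\s*)(\d+)", re.IGNORECASE)

def issue_number_from_pr_message(message):
    """Return the first referenced issue number, or None if there is none."""
    match = ISSUE_RE.search(message)
    return int(match.group(1)) if match else None

def patch_url_for_pr(repo, pr_number):
    """Build the URL GitHub uses to serve a PR as a plain-text patch."""
    return "https://github.com/{}/pull/{}.patch".format(repo, pr_number)
```

With something like this, the tracker could fetch the patch text from `patch_url_for_pr(...)` and attach it to the issue found by `issue_number_from_pr_message(...)`, then kick off CI as in the steps above.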

Basically the ideal scenario lets contributors use whatever tools and
platforms that they want and provides as much automated support as possible
to make sure their code is tip-top before and during code review while core
developers can review and commit patches so easily that they can do their
job from a beach with a tablet and some WiFi.

## Where the current proposed solutions seem to fall short
### GitHub/Bitbucket
Basically GitHub/Bitbucket is a win for contributors but doesn't buy core
developers that much. GitHub/Bitbucket gives contributors the easy cloning,
drive-by patches, CI, and PRs. Core developers get a code review tool --
I'm counting Rietveld as deprecated after Guido's comments about the code's
maintenance issues -- and push-button commits **only for single branch
changes**. But for any patch that crosses branches we don't really gain
anything. At best core developers tell a contributor "please send your PR
against 3.4", push-button merge it, update a local clone, merge from 3.4 to
default, do the usual stuff, commit, and then push; that still keeps me off
the beach, though, so that doesn't get us the whole way. You could force
people to submit two PRs, but I don't see that flying. Maybe some tool
could be written that automatically handles the merge/commit across
branches once the initial PR is in? Or automatically create a PR that core
developers can touch up as necessary and then accept that as well?
Regardless, some solution is necessary to handle branch-crossing PRs.
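To make the branch-crossing pain concrete, here is roughly the manual dance a core developer does today after a change lands on a maintenance branch (branch names and the issue number are illustrative):

```shell
# A fix was merged on the 3.4 branch; forward-port it to default by hand.
hg pull -u          # sync the local clone
hg update default   # switch to the development branch
hg merge 3.4        # merge the maintenance branch forward
# ... resolve any conflicts, adjust Misc/NEWS, run the tests ...
hg commit -m "Merge 3.4: fix for issue #NNNNN."
hg push
```

None of that is hard, but it is all mechanical, which is why some automated handling of cross-branch merges would be such a win.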

As for GitHub vs. Bitbucket, I personally don't care. I like GitHub's
interface more, but that's personal taste. I like hg more than git, but
that's also personal taste (and I consider a transition from hg to git a
hassle, though neither a deal-breaker nor a win). It is unfortunate,
though, that under this scenario we would have to choose only one platform.

It's also unfortunate that both are closed-source, but that's not a
deal-breaker, just a knock against them if the decision is close.

### Our own infrastructure
The shortcoming here is the need for developers, developers, developers!
Everything outlined in the ideal scenario is totally doable on our own
infrastructure with enough code and time (donated/paid-for infrastructure
shouldn't be an issue). But historically that code and time have not
materialized. Our code review tool is a fork that probably should be
replaced, as only Martin von Löwis can maintain it. Basically Ezio Melotti
alone maintains the issue tracker's code. We don't exactly have a ton of
people constantly going "I'm so bored because everything for Python's
development infrastructure gets sorted so quickly!" A perfect example is
that R. David Murray came up with a nice update for our workflow after
PyCon but then ran out of time after mostly defining it, and nothing ever
came of it (maybe we can rectify that at PyCon?). Eric Snow has pointed out
that he has written similar code for pulling PRs from, I think, GitHub into
another code review tool, but that doesn't magically make it work in our
infrastructure or get someone to write it and help maintain it (no offense,
Eric).

IOW our infrastructure can do anything, but it can't run on hopes and
dreams. Commitments from many people to make this happen by a certain
deadline would be needed so that it does not drag on forever. People would
also have to commit to continued maintenance to make this viable long-term.

# Next steps
I'm thinking first draft PEPs by February 1 to know who's all-in (8 weeks
away), all details worked out in final PEPs and whatever is required to
prove to me it will work by the PyCon language summit (4 months away). I
make a decision by May 1, and then implementation aims to be done by the
time 3.5.0 is cut so we can switch over shortly thereafter (9 months away).
Sound like a reasonable timeline?