[Python-Dev] PEP 481 - Migrate Some Supporting Repositories to Git and Github

Donald Stufft donald at stufft.io
Sun Nov 30 00:27:05 CET 2014


As promised in the "Move selected documentation repos to PSF BitBucket
account?" thread, I've written up a PEP for moving selected repositories from
hg.python.org to Github.

You can see this PEP online at: https://www.python.org/dev/peps/pep-0481/

I've also reproduced the PEP below for inline discussion.

-----------------------

Abstract
========

This PEP proposes migrating certain supporting repositories (such as the
repository for Python Enhancement Proposals) to Git and Github in a way that is
more accessible to new contributors and easier to manage for core developers.
This is offered as an alternative to PEP 474, which aims to achieve the same
overall benefits while continuing to use the Mercurial DVCS and without relying
on a commercial entity.

In particular this PEP proposes changes to the following repositories:

* https://hg.python.org/devguide/
* https://hg.python.org/devinabox/
* https://hg.python.org/peps/


This PEP does not propose any changes to the core development workflow for
CPython itself.


Rationale
=========

As PEP 474 mentions, there are currently a number of repositories hosted on
hg.python.org which are not directly used for the development of CPython but
are instead supporting or ancillary repositories. These supporting repositories
typically do not have complex workflows, and often have no branches at all
other than the primary integration branch. This simplicity makes them very good
targets for the "Pull Request" workflow that is commonly found on sites like
Github.

However, where PEP 474 wants to continue using Mercurial and wishes to use an
OSS, self-hosted solution, and therefore restricts itself to only those
options, this PEP expands the scope to include migrating to Git and using
Github.

The existing method of contributing to these repositories generally involves
generating a patch and either uploading it to bugs.python.org or emailing it
to peps@python.org. This process is unfriendly towards non-committer
contributors, and it also makes it harder than it needs to be for committers
to accept the patches sent by users. In addition to the benefits of the pull
request workflow itself, this style of workflow also enables non-technical
contributors, especially those who do not know their way around the DVCS of
choice, to contribute using the web-based editor.

On the committer side, Pull Requests let committers tell, before merging,
whether or not a particular Pull Request will break anything. They also enable
a simple "push button" merge which does not require checking out the changes
locally. Another feature that is particularly useful for docs is the ability
to view a "prose" diff. This Github-specific feature lets a committer view a
diff of the rendered output, which hides changes such as reflowing a paragraph
and shows what the actual "meat" of the change is.


Why Git?
--------

Looking at the variety of DVCS which are available today, it becomes fairly
clear that git has gained the vast majority of mindshare among people who are
currently using a DVCS. The Open Hub (previously Ohloh) statistics
[#openhub-stats]_ show that 37% of the repositories Open Hub currently indexes
use git, which is second only to SVN (48%), while Mercurial accounts for just
2% of the indexed repositories (beating only Bazaar at 1%). In addition to the
Open Hub statistics, a look at the top 100 projects on PyPI (ordered by total
download counts) shows that within the Python space itself a majority of
projects use git:

=== ========= ========== ====== === ====
Git Mercurial Subversion Bazaar CVS None
=== ========= ========== ====== === ====
62  22        7          4      1   1
=== ========= ========== ====== === ====


Choosing a DVCS which has the larger mindshare will make it more likely that
any particular person who has experience with any DVCS at all will be able to
meaningfully use the DVCS that we have chosen without having to learn a new
tool.

In addition to simply making it more likely that any individual will already
know how to use git, the number of projects and people using it means that the
resources for learning the tool are likely to be more fully fleshed out, and
when you run into problems it is far more likely that someone else has had the
same problem, posted a question, and received an answer.

Third, by using a more popular tool you also increase your options for tooling
*around* the DVCS itself. Looking at the various options for hosting
repositories, it's extremely rare to find a hosting solution (whether OSS or
commercial) that supports Mercurial but does not support Git; on the flip side,
there are a number of tools which support Git but do not support Mercurial.
Therefore the popularity of git increases the flexibility of our options going
into the future for what toolchain these projects use.

Also, by moving to the more popular DVCS we increase the likelihood that the
knowledge a person gains while contributing to these supporting repositories
will transfer to projects outside of the immediate CPython project, such as
the larger Python community, which primarily uses Git hosted on Github.

In previous years there was concern about how well git was supported on Windows
in comparison to Mercurial. However, git has grown to support Windows as a
first class citizen. In addition, for Windows users who are not well acquainted
with the Windows command line, there are GUI options as well.

On a technical level git and Mercurial are fairly similar; however, the git
branching model is significantly better than Mercurial's "Named Branches" for
non-committer contributors. Mercurial does have a "Bookmarks" extension, but
it isn't quite as good as git's branching model. All bookmarks live in the
same namespace, so individual users must namespace their branch names
themselves lest they risk collisions. It is also an extension, which requires
new users to first discover that they need an extension at all and then figure
out what they need to do in order to enable it. Since it is an extension, its
support outside of Mercurial core is also likely to be less complete than that
of git, where the feature is built in and core to using git at all. Finally,
users who are not used to Mercurial are unlikely to discover bookmarks on their
own; instead they will likely attempt to use Mercurial's "Named Branches",
which, because they live "forever", are often not what a project wants its
contributors to use.


Why Github?
-----------

There are a number of software projects or web services which offer
functionality similar to that of Github. These range from commercial web
services such as Bitbucket to self-hosted OSS solutions such as Kallithea or
Gitlab. This PEP proposes that we move these repositories to Github.

There are two primary reasons for selecting Github: Popularity and
Quality/Polish.

Github is currently the most popular repository hosting service according to
Alexa, where it has a global rank of 121. Much as with Git itself, by choosing
the most popular tool we gain the same benefits: a greater likelihood that a
new contributor will already have experience with the toolchain, better
quality and availability of help, more and better tooling built around it, and
knowledge that transfers to other projects. A look again at the top 100
projects by download counts on PyPI shows the following hosting locations:

====== ========= =========== ========= =========== ==========
GitHub BitBucket Google Code Launchpad SourceForge Other/Self
====== ========= =========== ========= =========== ==========
62     18        6           4         3           7
====== ========= =========== ========= =========== ==========

In addition to all of those reasons, Github also has the benefit that, while
many of the options have similar features when you look at them in a feature
matrix, the Github version of each of those features tends to work better and
be far more polished. This is hard to quantify objectively; however, it is a
fairly common sentiment among people who use these services often.

Finally, one reason to choose a web service at all over something that is
self-hosted is to make more efficient use of volunteer time and donated
resources. Every additional service hosted on the PSF infrastructure by the
PSF infrastructure team further divides the limited time that the volunteers
on that team have to spend, and uses some chunk of resources that could
potentially be used for something where there is no free or affordable hosted
solution available.

One concern that people do have with using a hosted service is that there is a
lack of control, and that at some point in the future the service may no longer
be suitable. It is the opinion of this PEP that Github does not currently
engage, and has not in the past engaged, in any attempts to lock people into
its platform, and that if at some point in the future Github is no longer
suitable for one reason or another, then at that point we can look at migrating
away from Github onto a different solution. In other words, we'll cross that
bridge if and when we come to it.


Example: Scientific Python
--------------------------

One of the key ideas behind the move to both git and Github is that one feature
of a DVCS, of its repository hosting, and of the workflow used is the social
network and size of the community using those tools. We can see this is true by
looking at an example from a sub-community of the Python community: the
Scientific Python community. They have already migrated most of the key pieces
of the SciPy stack onto Github using the Pull Request based workflow, starting
with IPython; as more projects moved over, it became a natural default for new
projects.

They claim to have seen a great benefit from this move, in that it enables
casual contributors to easily move between different projects within their
sub-community without having to learn a special, bespoke workflow and a
different toolchain for each project. They've found that when people can spend
their limited time actually contributing instead of learning different tools
and workflows, not only do they contribute more to one project, but they also
expand out and contribute to other projects. The move is also credited with
making it commonplace for members of that community to go so far as to publish
their research and educational materials on Github as well.

This showcases the real power behind moving to a highly popular toolchain and
workflow: each variation introduces yet another hurdle for new and casual
contributors to get past, and makes the time spent learning that workflow less
reusable with other projects.


Migration
=========

Through the use of hg-git [#hg-git]_ we can easily convert a Mercurial
repository to a Git repository by simply pushing the Mercurial repository to
the Git repository. People who wish to continue to use Mercurial locally can
then keep using hg-git with the new Github URL; however, they will need to
re-clone their repositories, as using Git as the server seems to trigger a
one-time change of the changeset IDs.

As none of the selected repositories has any tags, branches, or bookmarks
other than the ``default`` branch, the migration will simply map the
``default`` branch in Mercurial to the ``master`` branch in git.
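As a rough, non-authoritative sketch of the conversion described above, the
Python snippet below only builds (and does not run) the command sequence for
one repository. It follows the bookmark-then-push pattern shown in the hg-git
documentation: a ``master`` bookmark is placed on ``default`` so that pushing
creates a Git ``master`` branch. The ``python`` organization name and the
Github URL layout are assumptions for illustration; this PEP does not specify
them.

```python
import shlex
from typing import List

# Hypothetical Github organization; this PEP does not name one.
DEFAULT_ORG = "python"


def migration_commands(repo: str, org: str = DEFAULT_ORG) -> List[List[str]]:
    """Return the hg/hg-git commands to copy one hg.python.org repo to Github.

    The commands are only built here, not executed.
    """
    hg_url = f"https://hg.python.org/{repo}"
    # Assumed target URL layout on Github (hypothetical).
    git_url = f"git+ssh://git@github.com/{org}/{repo}.git"
    # Enable the hg-git extension for a single command without editing hgrc.
    enable_hggit = ["--config", "extensions.hggit="]
    return [
        # 1. Clone the existing Mercurial repository.
        ["hg", "clone", hg_url, repo],
        # 2. Point a "master" bookmark at "default"; hg-git exports
        #    bookmarks as Git branches, so this becomes Git's master.
        ["hg", "-R", repo, "bookmark", "-r", "default", "master"],
        # 3. Push all changesets to the (empty) Git repository on Github.
        ["hg", "-R", repo, *enable_hggit, "push", git_url],
    ]


if __name__ == "__main__":
    for name in ("devguide", "devinabox", "peps"):
        for cmd in migration_commands(name):
            print(shlex.join(cmd))
```

Running it prints three commands per repository (for example
``hg -R peps bookmark -r default master``), so an operator could review the
plan before executing anything.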

In addition, since none of the selected projects has any great need of a
complex bug tracker, they will also migrate their issue handling to GitHub
Issues.

In addition to the migration of the repository hosting itself, there are a
number of locations for each particular repository which will require updating.
The bulk of these will simply involve changing commands from the hg equivalent
to the git equivalent.

In particular this will include:

* Updating www.python.org to generate PEPs using a git clone and link to
  Github.
* Updating docs.python.org to pull from Github instead of hg.python.org for the
  devguide.
* Enabling an email to python-checkins@python.org for each push.
* Enabling an IRC message to #python-dev on Freenode for each push.
* Migrating any issues for these projects to their respective bug trackers on
  Github.

This will give these repositories functionality similar to what they currently
have. In addition, the migration will also include enabling testing for each
pull request using Travis CI [#travisci]_ where possible, to ensure that a new
PR does not break the ability to render the documentation or PEPs.
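The script changes mentioned above are, for the most part, one-for-one command
substitutions. The following minimal Python sketch shows the idea for a
handful of common commands; the ``HG_TO_GIT`` table and ``translate`` helper
are hypothetical, the table is deliberately small, and the equivalents are
approximate. ``git pull --ff-only`` stands in for ``hg pull -u`` because, like
the Mercurial command, it never creates a merge commit.

```python
from typing import Dict, List

# Hypothetical, approximate hg -> git equivalents for commands a read-only
# checkout script might run (illustrative, not exhaustive).
HG_TO_GIT: Dict[str, List[str]] = {
    "clone": ["git", "clone"],
    "pull -u": ["git", "pull", "--ff-only"],  # never creates a merge commit
    "log": ["git", "log"],
    "status": ["git", "status"],
}


def translate(argv: List[str]) -> List[str]:
    """Translate an ``hg ...`` argv into the equivalent ``git ...`` argv.

    Remaining arguments are passed through unchanged, so flags that differ
    between the two tools still need manual attention.
    """
    if not argv or argv[0] != "hg":
        raise ValueError("expected an hg command")
    rest = argv[1:]
    # Try the longest known subcommand prefix first, e.g. "pull -u".
    for n in range(len(rest), 0, -1):
        key = " ".join(rest[:n])
        if key in HG_TO_GIT:
            return HG_TO_GIT[key] + rest[n:]
    raise ValueError(f"no known git equivalent for: {' '.join(argv)}")
```

For example, ``translate(["hg", "pull", "-u"])`` returns
``["git", "pull", "--ff-only"]``, while an unmapped command such as plain
``hg pull`` raises ``ValueError`` so that nothing is silently mistranslated.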


User Access
===========

Moving to Github would involve adding an additional user account that will need
to be managed; however, it also offers finer-grained control, making it
possible to grant someone access to only one particular repository instead of
the coarser-grained ACLs available on hg.python.org.


References
==========

.. [#openhub-stats] `Open Hub Statistics <https://www.openhub.net/repositories/compare>`_
.. [#hg-git] `hg-git <https://hg-git.github.io/>`_
.. [#travisci] `Travis CI <https://travis-ci.org/>`_

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


