Notice: While Javascript is not essential for this website, your interaction with the content will be limited. Please turn Javascript on for the full experience.

PEP 507 -- Migrate CPython to Git and GitLab

PEP:507
Title:Migrate CPython to Git and GitLab
Author:Barry Warsaw <barry at python.org>
Status:Rejected
Type:Process
Created:2015-09-30
Post-History:
Resolution:https://mail.python.org/pipermail/core-workflow/2016-January/000345.html

Abstract

This PEP proposes migrating the repository hosting of CPython and the supporting repositories to Git. Further, it proposes adopting a hosted GitLab instance as the primary way of handling merge requests, code reviews, and code hosting. It is similar in intent to PEP 481 but proposes an open source alternative to GitHub and omits the proposal to run Phabricator. As with PEP 481, this particular PEP is offered as an alternative to PEP 474 and PEP 462.

Rationale

CPython is an open source project which relies on a number of volunteers donating their time. As with any healthy, vibrant open source project, it relies on attracting new volunteers as well as retaining existing developers. Given that volunteer time is the most scarce resource, providing a process that maximizes the efficiency of contributors and reduces the friction for contributions, is of vital importance for the long-term health of the project.

The current tool chain of the CPython project is a custom and unique combination of tools. This has two critical implications:

  • The unique nature of the tool chain means that contributors must remember or relearn, the process, workflow, and tools whenever they contribute to CPython, without the advantage of leveraging long-term memory and familiarity they retain by working with other projects in the FLOSS ecosystem. The knowledge they gain in working with CPython is unlikely to be applicable to other projects.
  • The burden on the Python/PSF infrastructure team is much greater in order to continue to maintain custom tools, improve them over time, fix bugs, address security issues, and more generally adapt to new standards in online software development with global collaboration.

These limitations act as a barrier to contribution both for highly engaged contributors (e.g. core Python developers) and especially for more casual "drive-by" contributors, who care more about getting their bug fix than learning a new suite of tools and workflows.

By proposing the adoption of both a different version control system and a modern, well-maintained hosting solution, this PEP addresses these limitations. It aims to enable a modern, well-understood process that will carry CPython development for many years.

Version Control System

Currently the CPython and supporting repositories use Mercurial. As a modern distributed version control system, it has served us well since the migration from Subversion. However, when evaluating the VCS we must consider the capabilities of the VCS itself as well as the network effect and mindshare of the community around that VCS.

There are really only two real options for this, Mercurial and Git. The technical capabilities of the two systems are largely equivalent, therefore this PEP instead focuses on their social aspects.

It is not possible to get exact numbers for the number of projects or people which are using a particular VCS, however we can infer this by looking at several sources of information for what VCS projects are using.

The Open Hub (previously Ohloh) statistics [1] show that 37% of the repositories indexed by The Open Hub are using Git (second only to Subversion which has 48%) while Mercurial has just 2%, beating only Bazaar which has 1%. This has Git being just over 18 times as popular as Mercurial on The Open Hub.

Another source of information on VCS popularity is PyPI itself. This source is more targeted at the Python community itself since it represents projects developed for Python. Unfortunately PyPI does not have a standard location for representing this information, so this requires manual processing. If we limit our search to the top 100 projects on PyPI (ordered by download counts) we can see that 62% of them use Git, while 22% of them use Mercurial, and 13% use something else. This has Git being just under 3 times as popular as Mercurial for the top 100 projects on PyPI.

These numbers back up the anecdotal evidence for Git as the far more popular DVCS for open source projects. Choosing the more popular VCS has a number of positive benefits.

For new contributors it increases the likelihood that they will have already learned the basics of Git as part of working with another project or if they are just now learning Git, that they'll be able to take that knowledge and apply it to other projects. Additionally a larger community means more people writing how to guides, answering questions, and writing articles about Git which makes it easier for a new user to find answers and information about the tool they are trying to learn and use. Given its popularity, there may also be more auxiliary tooling written around Git. This increases options for everything from GUI clients, helper scripts, repository hosting, etc.

Further, the adoption of Git as the proposed back-end repository format doesn't prohibit the use of Mercurial by fans of that VCS! Mercurial users have the [2] plugin which allows them to push and pull from a Git server using the Mercurial front-end. It's a well-maintained and highly functional plugin that seems to be well-liked by Mercurial users.

Repository Hosting

Where and how the official repositories for CPython are hosted is in someways determined by the choice of VCS. With Git there are several options. In fact, once the repository is hosted in Git, branches can be mirrored in many locations, within many free, open, and proprietary code hosting sites.

It's still important for CPython to adopt a single, official repository, with a web front-end that allows for many convenient and common interactions entirely through the web, without always requiring local VCS manipulations. These interactions include as a minimum, code review with inline comments, branch diffing, CI integration, and auto-merging.

This PEP proposes to adopt a [3] instance, run within the python.org domain, accessible to and with ultimate control from the PSF and the Python infrastructure team, but donated, hosted, and primarily maintained by GitLab, Inc.

Why GitLab? Because it is a fully functional Git hosting system, that sports modern web interactions, software workflows, and CI integration. GitLab's Community Edition (CE) is open source software, and thus is closely aligned with the principles of the CPython community.

Code Review

Currently CPython uses a custom fork of Rietveld modified to not run on Google App Engine and which is currently only really maintained by one person. It is missing common features present in many modern code review tools.

This PEP proposes to utilize GitLab's built-in merge requests and online code review features to facilitate reviews of all proposed changes.

GitLab merge requests

The normal workflow for a GitLab hosted project is to submit a merge request asking that a feature or bug fix branch be merged into a target branch, usually one or more of the stable maintenance branches or the next-version master branch for new features. GitLab's merge requests are similar in form and function to GitHub's pull requests, so anybody who is already familiar with the latter should be able to immediately utilize the former.

Once submitted, a conversation about the change can be had between the submitter and reviewer. This includes both general comments, and inline comments attached to a particular line of the diff between the source and target branches. Projects can also be configured to automatically run continuous integration on the submitted branch, the results of which are readily visible from the merge request page. Thus both the reviewer and submitter can immediately see the results of the tests, making it much easier to only land branches with passing tests. Each new push to the source branch (e.g. to respond to a commenter's feedback or to fix a failing test) results in a new run of the CI, so that the state of the request always reflects the latest commit.

Merge requests have a fairly major advantage over the older "submit a patch to a bug tracker" model. They allow developers to work completely within the VCS using standard VCS tooling, without requiring the creation of a patch file or figuring out the right location to upload the patch to. This lowers the barrier for sending a change to be reviewed.

Merge requests are far easier to review. For example, they provide nice syntax highlighted diffs which can operate in either unified or side by side views. They allow commenting inline and on the merge request as a whole and they present that in a nice unified way which will also hide comments which no longer apply. Comments can be hidden and revealed.

Actually merging a merge request is quite simple, if the source branch applies cleanly to the target branch. A core reviewer simply needs to press the "Merge" button for GitLab to automatically perform the merge. The source branch can be optionally rebased, and once the merge is completed, the source branch can be automatically deleted.

GitLab also has a good workflow for submitting pull requests to a project completely through their web interface. This would enable the Python documentation to have "Edit on GitLab" buttons on every page and people who discover things like typos, inaccuracies, or just want to make improvements to the docs they are currently reading. They can simply hit that button and get an in browser editor that will let them make changes and submit a merge request all from the comfort of their browser.

Criticism

X is not written in Python

One feature that the current tooling (Mercurial, Rietveld) has is that the primary language for all of the pieces are written in Python. This PEP focuses more on the best tools for the job and not necessarily on the best tools that happen to be written in Python. Volunteer time is the most precious resource for any open source project and we can best respect and utilize that time by focusing on the benefits and downsides of the tools themselves rather than what language their authors happened to write them in.

One concern is the ability to modify tools to work for us, however one of the Goals here is to not modify software to work for us and instead adapt ourselves to a more standardized workflow. This standardization pays off in the ability to re-use tools out of the box freeing up developer time to actually work on Python itself as well as enabling knowledge sharing between projects.

However, if we do need to modify the tooling, Git itself is largely written in C the same as CPython itself. It can also have commands written for it using any language, including Python. GitLab itself is largely written in Ruby and since it is Open Source software, we would have the ability to submit merge requests to the upstream Community Edition, albeit in language potentially unfamiliar to most Python programmers.

Mercurial is better than Git

Whether Mercurial or Git is better on a technical level is a highly subjective opinion. This PEP does not state whether the mechanics of Git or Mercurial are better, and instead focuses on the network effect that is available for either option. While this PEP proposes switching to Git, Mercurial users are not left completely out of the loop. By using the hg-git extension for Mercurial, working with server-side Git repositories is fairly easy and straightforward.

CPython Workflow is too Complicated

One sentiment that came out of previous discussions was that the multi-branch model of CPython was too complicated for GitLab style merge requests. This PEP disagrees with that sentiment.

Currently any particular change requires manually creating a patch for 2.7 and 3.x which won't change at all in this regards.

If someone submits a fix for the current stable branch (e.g. 3.5) the merge request workflow can be used to create a request to merge the current stable branch into the master branch, assuming there is no merge conflicts. As always, merge conflicts must be manually and locally resolved. Because developers also have the option of performing the merge locally, this provides an improvement over the current situation where the merge must always happen locally.

For fixes in the current development branch that must also be applied to stable release branches, it is possible in many situations to locally cherry pick and apply the change to other branches, with merge requests submitted for each stable branch. It is also possible just cherry pick and complete the merge locally. These are all accomplished with standard Git commands and techniques, with the advantage that all such changes can go through the review and CI test workflows, even for merges to stable branches. Minor changes may be easily accomplished in the GitLab web editor.

No system can hide all the complexities involved in maintaining several long lived branches. The only thing that the tooling can do is make it as easy as possible to submit and commit changes.

Open issues

  • What level of hosted support will GitLab offer? The PEP author has been in contact with the GitLab CEO, with positive interest on their part. The details of the hosting offer would have to be discussed.
  • What happens to Roundup and do we switch to the GitLab issue tracker? Currently, this PEP is not suggesting we move from Roundup to GitLab issues. We have way too much invested in Roundup right now and migrating the data would be a huge effort. GitLab does support webhooks, so we will probably want to use webhooks to integrate merges and other events with updates to Roundup (e.g. to include pointers to commits, close issues, etc. similar to what is currently done).
  • What happens to wiki.python.org? Nothing! While GitLab does support wikis in repositories, there's no reason for us to migration our Moin wikis.
  • What happens to the existing GitHub mirrors? We'd probably want to regenerate them once the official upstream branches are natively hosted in Git. This may change commit ids, but after that, it should be easy to mirror the official Git branches and repositories far and wide.
  • Where would the GitLab instance live? Physically, in whatever hosting provider GitLab chooses. We would point gitlab.python.org (or git.python.org?) to this host.

References

[1]Open Hub Statistics <https://www.openhub.net/repositories/compare>
[2]Hg-Git mercurial plugin <https://hg-git.github.io/>
[3]https://about.gitlab.com/
Source: https://github.com/python/peps/blob/master/pep-0507.txt