|Author:||Nick Coghlan <ncoghlan at gmail.com>|
|Post-History:||19-Jul-2014, 08-Jan-2015, 01-Feb-2015|
- Intended Benefits
- Sustaining Engineering Considerations
- Personal Motivation
- Technical Concerns and Challenges
- Pilot Objectives and Timeline
- Future Implications for CPython Core Development
This PEP proposes setting up a new PSF provided resource, forge.python.org, as a location for maintaining various supporting repositories (such as the repository for Python Enhancement Proposals) in a way that is more accessible to new contributors, and easier to manage for core developers.
This PEP does not propose any changes to the core development workflow for CPython itself (see PEP 462 in relation to that).
This PEP proposes that an instance of the self-hosted Kallithea code repository management system be deployed as "forge.python.org".
Individual repositories (such as the developer guide or the PEPs repository) may then be migrated from the existing hg.python.org infrastructure to the new forge.python.org infrastructure on a case by case basis. Each migration will need to decide whether to retain a read-only mirror on hg.python.org, or whether to just migrate wholesale to the new location.
In addition to supporting read-only mirrors on hg.python.org, forge.python.org will also aim to support hosting mirrors on popular proprietary hosting sites like GitHub and BitBucket. The aim will be to allow users familiar with these sites to submit and discuss pull requests using their preferred workflow, with forge.python.org automatically bringing those contributions over to the master repository.
Given the availability and popularity of commercially backed "free for open source projects" repository hosting services, this would not be a general purpose hosting site for arbitrary Python projects. The initial focus will be specifically on CPython and other repositories currently hosted on hg.python.org. In the future, this could potentially be expanded to consolidating other PSF managed repositories that are currently externally hosted to gain access to a pull request based workflow, such as the repository for the python.org Django application. As with the initial migrations, any such future migrations would be considered on a case-by-case basis, taking into account the preferences of the primary users of each repository.
Currently, hg.python.org hosts more than just the core CPython repository, it also hosts other repositories such as those for the CPython developer guide and for Python Enhancement Proposals, along with various "sandbox" repositories for core developer experimentation.
While the simple "pull request" style workflow made popular by code hosting sites like GitHub and BitBucket isn't adequate for the complex branching model needed for parallel maintenance and development of the various CPython releases, it's a good fit for several of the ancillary projects that surround CPython that we don't wish to move to a proprietary hosting site.
The key requirements proposed for a PSF provided software forge are:
- MUST support simple "pull request" style workflows
- MUST support online editing for simple changes
- MUST be backed by an active development organisation (community or commercial)
Additional recommended requirements that are satisfied by this proposal, but may be negotiable if a sufficiently compelling alternative is presented:
- SHOULD support self-hosting on PSF infrastructure without ongoing fees
- SHOULD be a fully open source application written in Python
- SHOULD support Mercurial (for consistency with existing tooling)
- SHOULD support Git (to provide that option to users that prefer it)
- SHOULD allow users of git and Mercurial clients to transparently collaborate on the same repository
- SHOULD be open to customisation to meet the needs of CPython core development, including providing a potential path forward for the proposed migration to a core reviewer model in PEP 462
The preference for self-hosting without ongoing fees rules out the free-as-in-beer providers like GitHub and BitBucket, in addition to the various proprietary source code management offerings.
The preference for Mercurial support not only rules out GitHub, but also other Git-only solutions like GitLab and Gitorious.
The hard requirement for online editing support rules out the Apache Allura/HgForge combination.
The preference for a fully open source solution rules out RhodeCode.
Of the various options considered by the author of this proposal, that leaves Kallithea SCM as the proposed foundation for a forge.python.org service.
Kallithea is a full GPLv3 application (derived from the clearly and unambiguously GPLv3 licensed components of RhodeCode) that is being developed under the auspices of the Software Freedom Conservancy . The Conservancy has affirmed that the Kallithea codebase is completely and validly licensed under GPLv3. In addition to their role in building the initial Kallithea community, the Conservancy is also the legal home of both the Mercurial and Git projects. Other SFC member projects that may be familiar to Python users include Twisted, Gevent, BuildBot and PyPy.
The primary benefit of deploying Kallithea as forge.python.org is that supporting repositories such as the developer guide and the PEP repo could potentially be managed using pull requests and online editing. This would be much simpler than the current workflow which requires PEP editors and other core developers to act as intermediaries to apply updates suggested by other users.
The richer administrative functionality would also make it substantially easier to grant users access to particular repositories for collaboration purposes, without having to grant them general access to the entire installation. This helps lower barriers to entry, as trust can more readily be granted and earned incrementally, rather than being an all-or-nothing decision around granting core developer access.
Even with its current workflow, CPython itself remains one of the largest open source projects in the world (in the top 2% of projects tracked on OpenHub). Unfortunately, we have been significantly less effective at encouraging contributions to the projects that make up CPython's workflow infrastructure, including ensuring that our installations track upstream, and that wherever feasible, our own customisations are contributed back to the original project.
As such, a core component of this proposal is to actively engage with the upstream Kallithea community to lower the barriers to working with and on the Kallithea SCM, as well as with the PSF Infrastructure team to ensure the forge.python.org service integrates cleanly with the PSF's infrastructure automation.
This approach aims to provide a number of key benefits:
- allowing those of us contributing to maintenance of this service to be as productive as possible in the time we have available
- offering a compelling professional development opportunity to those volunteers that choose to participate in maintenance of this service
- making the Kallithea project itself more attractive to other potential users by making it as easy as possible to adopt, deploy and manage
- as a result of the above benefits, attracting sufficient contributors both in the upstream Kallithea community, and within the CPython infrastructure community, to allow the forge.python.org service to evolve effectively to meet changing developer expectations
Some initial steps have already been taken to address these sustaining engineering concerns:
- Tymoteusz Jankowski has been working with Donald Stufft to work out what would be involved in deploying Kallithea using the PSF's Salt based infrastructure automation.
- Graham Dumpleton and I have been working on making it easy to deploy demonstration Kallithea instances to the free tier of Red Hat's open source hosting service, OpenShift Online. (See the comments on that post, or the quickstart issue tracker for links to Graham's follow on work)
The next major step to be undertaken is to come up with a local development workflow that allows contributors on Windows, Mac OS X and Linux to run the Kallithea tests locally, without interfering with the operation of their own system. The currently planned approach for this is to focus on Vagrant, which is a popular automated virtual machine management system specifically aimed at developers running local VMs for testing purposes. The Vagrant based development guidelines for OpenShift Origin provide an extended example of the kind of workflow this approach enables. It's also worth noting that Vagrant is one of the options for working with a local build of the main python.org website .
If these workflow proposals end up working well for Kallithea, they may also be worth proposing for use by the upstream projects backing other PSF and CPython infrastructure services, including Roundup, BuildBot, and the main python.org web site.
As several aspects of this proposal and PEP 462 align with various workflow improvements under consideration for Red Hat's Beaker open source hardware integration testing system and other work-related projects, I have arranged to be able to devote ~1 day a week to working on CPython infrastructure projects.
Together with Rackspace's existing contributions to maintaining the pypi.python.org infrastructure, I personally believe this arrangement is indicative of a more general recognition amongst CPython redistributors and major users of the merit in helping to sustain upstream infrastructure through direct contributions of developer time, rather than expecting volunteer contributors to maintain that infrastructure entirely in their spare time or funding it indirectly through the PSF (with the additional management overhead that would entail). I consider this a positive trend, and one that I will continue to encourage as best I can.
As of March 2015, having moved from Boeing Defence Australia (where I had worked since September 1998) to Red Hat back in June 2011 , I now work for Red Hat as a software development workflow designer and process architect, focusing on the open source cross-platform Atomic Developer Bundle , which is part of the tooling ecosystem for the Project Atomic container hosting platform. Two of the key pieces of that bundle will be familiar to many readers: Docker for container management, and Vagrant for cross-platform local development VM management.
However, rather than being a developer for the downstream Red Hat Enterprise Linux Container Development Kit, I work with the development teams for a range of Red Hat's internal services, encouraging the standardisation of internal development tooling and processes on the Atomic Developer Bundle, contributing upstream as required to ensure it meets our needs and expectations. As with other Red Hat community web service development projects like PatternFly , this approach helps enable standardisation across internal services, community projects, and commercial products, while still leaving individual development teams with significant scope to appropriately prioritise their process improvement efforts by focusing on the limitations currently causing the most difficulties for them and their users.
In that role, I'll be focusing on effectively integrating the Developer Bundle with tools and technologies used across Red Hat's project and product portfolio. As Red Hat is an open source system integrator, that means touching on a wide range of services and technologies, including GitHub, GerritHub, standalone Gerrit, GitLab, Bugzilla, JIRA, Jenkins, Docker, Kubernetes, OpenShift, OpenStack, oVirt, Ansible, Puppet, and more.
However, as noted above in the section on sustaining engineering considerations, I've also secured agreement to spend a portion of my work time on similarly applying these cross platforms tools to improving the developer experience for the maintenance of Python Software Foundation infrastructure, starting with this proposal for a Kallithea-based forge.python.org service.
Between them, my day job and my personal open source engagement have given me visibility into a lot of what the popular source code management services do well and what they do poorly. While Kallithea certainly has plenty of flaws of its own, it's the one I consider most fixable from a personal perspective, as it allows me to get directly involved in tailoring it to meet the needs of the CPython core development community in a way that wouldn't be possible with a proprietary service like GitHub or BitBucket, or practical with a PHP-based service like Phabricator or a Ruby-based service like GitLab.
Introducing a new service into the CPython infrastructure presents a number of interesting technical concerns and challenges. This section covers several of the most significant ones.
The default position of this PEP is that the new forge.python.org service will be integrated into the existing PSF Salt infrastructure and hosted on the PSF's Rackspace cloud infrastructure.
However, other hosting options will also be considered, in particular, possible deployment as a Kubernetes hosted web service on either Google Container Engine or the next generation of Red Hat's OpenShift Online service, by using either GCEPersistentDisk or the open source GlusterFS distributed filesystem to hold the source code repositories.
Ongoing infrastructure maintenance is an area of concern within the PSF, as we currently lack a system administrator mentorship program equivalent to the Fedora Infrastructure Apprentice or GNOME Infrastructure Apprentice programs.
Instead, systems tend to be maintained largely by developers as a part time activity on top of their development related contributions, rather than seeking to recruit folks that are more interested in operations (i.e. keeping existing systems running well) than they are in development (i.e. making changes to the services to provide new features or a better user experience, or to address existing issues).
While I'd personally like to see the PSF operating such a program at some point in the future, I don't consider setting one up to be a feasible near term goal. However, I do consider it feasible to continue laying the groundwork for such a program by extending the PSF's existing usage of modern infrastructure technologies like OpenStack and Salt to cover more services, as well as starting to explore the potential benefits of containers and container platforms when it comes to maintaining and enhancing PSF provided services.
I also plan to look into the question of whether or not an open source cloud management platform like ManageIQ may help us bring our emerging "cloud sprawl" problem across Rackspace, Google, Amazon and other services more under control.
Ideally we'd like to be able to offer a single account that spans all python.org services, including Kallithea, Roundup/Rietveld, PyPI and the back end for the new python.org site, but actually implementing that would be a distinct infrastructure project, independent of this PEP. (It's also worth noting that the fine-grained control of ACLs offered by such a capability is a prerequisite for setting up an effective system administrator mentorship program )
For the initial rollout of forge.python.org, we will likely create yet another identity silo within the PSF infrastructure. A potentially superior alternative would be to add support for python-social-auth to Kallithea, but actually doing so would not be a requirement for the initial rollout of the service (the main technical concern there is that Kallithea is a Pylons application that has not yet been ported to Pyramid, so integration will require either adding a Pylons backend to python-social-auth, or else embarking on the Pyramid migration in Kallithea).
Kallithea provides configurable issue tracker integration. This will need to be set up appropriately to integrate with the Roundup issue tracker at bugs.python.org before the initial rollout of the forge.python.org service.
The initial rollout of forge.python.org would support publication of read-only mirrors, both on hg.python.org and other services, as that is a relatively straightforward operation that can be implemented in a commit hook.
While a highly desirable feature, accepting pull requests on external services, and mirroring them as submissions to the master repositories on forge.python.org is a more complex problem, and would likely not be included as part of the initial rollout of the forge.python.org service.
Kallithea's native support for both Git and Mercurial offers an opportunity to make it relatively straightforward for developers to use the client of their choice to interact with repositories hosted on forge.python.org.
This transparent interoperability does not exist yet, but running our own multi-VCS repository hosting service provides the opportunity to make this capability a reality, rather than passively waiting for a proprietary provider to deign to provide a feature that likely isn't in their commercial interest. There's a significant misalignment of incentives between open source communities and commercial providers in this particular area, as even though offering VCS client choice can significantly reduce community friction by eliminating the need for projects to make autocratic decisions that force particular tooling choices on potential contributors, top down enforcement of tool selection (regardless of developer preference) is currently still the norm in the corporate and other organisational environments that produce GitHub and Atlassian's paying customers.
Prior to acceptance, in the absence of transparent interoperability, this PEP should propose specific recommendations for inclusion in the CPython developer's guide section for git users for creating pull requests against forge.python.org hosted Mercurial repositories.
This proposal is part of Brett Cannon's current evaluation of improvement proposals for various aspects of the CPython development workflow. Key dates in that timeline are:
- Feb 1: Draft proposal published (for Kallithea, this PEP)
- Apr 8: Discussion of final proposals at Python Language Summit
- May 1: Brett's decision on which proposal to accept
- Sep 13: Python 3.5 released, adopting new workflows for Python 3.6
If this proposal is selected for further development, it is proposed to start with the rollout of the following pilot deployment:
- a reference implementation operational at kallithea-pilot.python.org, containing at least the developer guide and PEP repositories. This will be a "throwaway" instance, allowing core developers and other contributors to experiment freely without worrying about the long term consequences for the repository history.
- read-only live mirrors of the Kallithea hosted repositories on GitHub and BitBucket. As with the pilot service itself, these would be temporary repos, to be discarded after the pilot period ends.
- clear documentation on using those mirrors to create pull requests against Kallithea hosted Mercurial repositories (for the pilot, this will likely not include using the native pull request workflows of those hosted services)
- automatic linking of issue references in code review comments and commit messages to the corresponding issues on bugs.python.org
- draft updates to PEP 1 explaining the Kallithea based PEP editing and submission workflow
The following items would be needed for a production migration, but there doesn't appear to be an obvious way to trial an updated implementation as part of the pilot:
- adjusting the PEP publication process and the developer guide publication process to be based on the relocated Mercurial repos
The following items would be objectives of the overall workflow improvement process, but are considered "desirable, but not essential" for the initial adoption of the new service in September (if this proposal is the one selected and the proposed pilot deployment is successful):
- allowing the use of python-social-auth to authenticate against the PSF hosted Kallithea instance
- allowing the use of the GitHub and BitBucket pull request workflows to submit pull requests to the main Kallithea repo
- allowing easy triggering of forced BuildBot runs based on Kallithea hosted repos and pull requests (prior to the implementation of PEP 462 , this would be intended for use with sandbox repos rather than the main CPython repo)
The workflow requirements for the main CPython development repository are significantly more complex than those for the repositories being discussed in this PEP. These concerns are covered in more detail in PEP 462 .
Given Guido's recommendation to replace Rietveld with a more actively maintained code review system, my current plan is to rewrite that PEP to use Kallithea as the proposed glue layer, with enhanced Kallithea pull requests eventually replacing the current practice of uploading patche files directly to the issue tracker.
I've also started working with Pierre Yves-David on a custom Mercurial extension that automates some aspects of the CPython core development workflow.
This document has been placed in the public domain.