[Pandas-dev] Thoughts on adopting a 1-PR-1-commit policy?

Wes McKinney wesmckinn at gmail.com
Sat Jan 16 18:20:17 EST 2016


I've grown very fond of the PR cherry-picking style used in many
Apache projects.

Here's an example of a very large commit to Apache Spark that was
performed in this fashion:

https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3d53#diff-e1e1d3d40573127e9ee0480caf1283d6

If you compare pandas's commit history with that of a project like this,
you'll see that the latter is much easier to follow: there is exactly one
commit for each patch to the project, rather than a merge commit plus one
or more merged commits (how many depends on whether the person merging
the PR did an interactive rebase).

The script to do this is not very complex, and it would be even simpler
for pandas because we do not use JIRA:

https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
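To give a rough idea of the kind of logic involved, here is a hypothetical
helper that assembles one squashed commit message from a PR's title, body
and authors. The function name and the exact message layout (one
"Author:" line per contributor, then a "Closes #N" line) are assumptions
for illustration only; this is not code from the Spark or Ibis scripts.

```python
from typing import List, Optional


def build_squash_message(pr_num: int, title: str, body: Optional[str],
                         authors: List[str]) -> str:
    """Assemble a single commit message for a squashed PR (hypothetical format)."""
    parts = [title.strip()]
    if body and body.strip():
        parts.append(body.strip())
    # De-duplicate authors while keeping the order they first appeared in.
    distinct = []
    for author in authors:
        if author not in distinct:
            distinct.append(author)
    if distinct:
        parts.append("\n".join("Author: %s" % a for a in distinct))
    # Closing keyword, intended to let GitHub close the PR once this commit
    # lands on the default branch.
    parts.append("Closes #%d" % pr_num)
    return "\n\n".join(parts)
```

For example, build_squash_message(123, "ENH: add foo", "Adds foo.",
["A <a@x>", "B <b@x>", "A <a@x>"]) lists each author once, followed by
"Closes #123".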

I've been using a pared down version of the script in Ibis:

https://github.com/cloudera/ibis/blob/master/dev/merge-pr.py

Here is an example of what the resulting commit looks like when the
merged PR contained multiple commits:

https://github.com/cloudera/ibis/commit/eafabe060dcaaea0a6076342eaa374929b91cf47

It's pretty easy to use: run the script and enter the number of the PR
you are merging. It automatically squashes the PR's commits into one and
closes the merged PR.
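The core git steps behind such a squash-merge can be sketched as follows.
This is a minimal sketch, not the Spark or Ibis script: the local branch
name "pr-N", the "upstream" remote and "master" as the target branch are
all assumptions. It relies on GitHub publishing each PR under
refs/pull/<N>/head and on "git merge --squash" staging the combined
changes without creating a merge commit.

```python
import shlex
import subprocess
from typing import List


def squash_merge_commands(pr_num: int, message: str,
                          remote: str = "upstream",
                          target: str = "master") -> List[List[str]]:
    """Return the git invocations for squash-merging one GitHub PR (a sketch)."""
    local = "pr-%d" % pr_num  # hypothetical local branch name
    return [
        # GitHub exposes each PR's head commit under refs/pull/<N>/head.
        ["git", "fetch", remote, "pull/%d/head:%s" % (pr_num, local)],
        # Reset the local target branch to the latest upstream state.
        ["git", "fetch", remote, target],
        ["git", "checkout", "-B", target, "%s/%s" % (remote, target)],
        # Stage the PR's combined changes without making a merge commit.
        ["git", "merge", "--squash", local],
        # Record everything as exactly one commit.
        ["git", "commit", "-m", message],
        ["git", "push", remote, target],
        # The squashed branch is not an ancestor of target, so force-delete it.
        ["git", "branch", "-D", local],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run(squash_merge_commands(42, "ENH: ...\n\nCloses #42")) prints
the commands without touching the repository. If the message contains a
closing keyword such as "Closes #42", GitHub should close the PR once the
commit is pushed. A fuller tool would also pass --author to "git commit"
so the contributor, not the merger, is credited.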

Let me know if this is something that would interest the team. I know
there are varying opinions on the GitHub Green Button =)

- Wes

