[Python-Dev] Source control tools

Thomas Wouters thomas at python.org
Mon Jun 12 23:31:14 CEST 2006


On 6/12/06, Guido van Rossum <guido at python.org> wrote:

> Perhaps issues like these should motivate us to consider a different
> source control tool. There's a new crop of tools out that could solve
> this by having multiple repositories that can be sync'ed with each
> other. This sounds like an important move towards world peace!


It would be an important move towards world peace, if it didn't inspire
whole new SCM-holy-wars :-)  I have a fair bit of experience with different
SCMs (VC, source control tools, whatever you want to call them) so I'll take
this opportunity to toss out some observations. Not that switching to another
SCM will really solve the issues you are referring to, but I happen to think
switching to another SCM is a great idea :) Although I don't see an obvious
candidate at the moment... I'll explain why.

First of all, changing SCM means changing how everyone works. It's nothing
like the CVS->Subversion switch, which changed very little in workflow. All
the cool SCMs use 'real branches', and to get full benefit you have to
switch your development to a 'branch oriented model', if you'll pardon the
buzzwordyness. At XS4ALL we've used BitKeeper for a few years now, and while
it really took a while for some of the developers to catch on, the branch
thing makes parallel development *much* easier. If you haven't experienced
it yourself, it's hard to see why it matters so much (except maybe in cases
with extreme parallel development, like the Linux kernel), but it really
does make life a lot easier in the long run, for programmers and release
managers.

Secondly, the way most of the 'less-linear' SCMs work is that everyone has
their own repository. That is, in fact, what makes it so useful -- no need
for central repository access, no need for a network connection for full
operability, no need for write access to get your changes checked in
(locally), and easy interchanging between repositories. With a large
(history-rich) project like Python, though, those repositories can get a tad
large. Most of the SCMs have ways to work around it (not downloading the
full history, side-storing the full history in a separate branch, etc) but
it's still an extra inconvenience. Now, me, I don't mind downloading a
couple hundred megabytes to get a 25MB sourcetree with full history, but I
have a 1Gbit link to the python.org servers :) On the other hand, with most
of the SCMs, you only download that full history once, and then locally
branch off of that.
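
The "everyone has their own repository" model can be illustrated with a toy
Python sketch. This is an assumption-laden model, not the implementation of
any of the SCMs named here: the `Repo` class, its methods and the
hash-based changeset ids are all hypothetical. It only shows that a clone
copies the full history once, that commits need no network, and that a later
pull transfers just the changesets the local repository is missing.

```python
import hashlib


class Repo:
    """Toy repository: a bag of changesets plus a pointer to the local head."""

    def __init__(self, name):
        self.name = name
        self.changesets = {}  # changeset id -> (parent id, message)
        self.head = None

    def commit(self, message):
        # Purely local operation: no central server or write access involved.
        parent = self.head
        cs_id = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()[:12]
        self.changesets[cs_id] = (parent, message)
        self.head = cs_id
        return cs_id

    def clone(self, name):
        # The one-time "download the full history" step.
        copy = Repo(name)
        copy.changesets = dict(self.changesets)
        copy.head = self.head
        return copy

    def pull(self, other):
        # Copy only the changesets we do not have yet. The local head is left
        # alone; merging diverged heads is out of scope for this sketch.
        missing = {
            cs_id: cs
            for cs_id, cs in other.changesets.items()
            if cs_id not in self.changesets
        }
        self.changesets.update(missing)
        return len(missing)


central = Repo("central")
central.commit("initial import")
central.commit("fix typo")

laptop = central.clone("laptop")      # full history, fetched once
laptop.commit("local work")           # checked in locally, offline
central.commit("upstream change")

transferred = laptop.pull(central)    # only the new upstream changeset
print(transferred, len(laptop.changesets))
```

In this model the second synchronisation moves one changeset rather than the
whole history, which is the property described above.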

The real reason I think we should consider other SCMs is because I fear what
the history will look like when 3.0 is done. In Subversion, merges of
branches don't preserve history -- you have to do it yourself. Given the way
Subversion works, I don't think that'll really change; it's just not what
Subversion is meant to do (and that's fine.) It does mean, however, that
when we switch the trunk to 3.0, we have to decide between the history of
the trunk or the history of the p3yk branch. We either merge the p3yk branch
into the trunk, making a single huge checkin message explaining all the
changes (or not), or we swap the trunk and the p3yk branch. The former means
'svn blame', for instance, will show a single revision and a single
author for *all* p3yk changes. The latter means 'svn blame' will group
the trunk changes into the merge commits you can already see on the
python-3000-checkins list: a block of merges at a time, based on whenever I
(or someone else) have the time to merge the trunk in. So, in that case, 'svn
blame' will show *me* as author of all 2.5-to-2.7 changes, at a time the
original change didn't go in, with log messages that are largely irrelevant
;-) And the mess gets bigger if part of p3yk or trunk's development is done
in other branches -- svnmerge log messages hidden in svnmerge log messages.
ugh.
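
The attribution problem can be made concrete with a small Python toy model.
It is only a sketch under stated assumptions: the blame table, the function
names, the revision numbers and the author names are all made up for
illustration, and nothing here reflects how Subversion stores history. It
compares replaying branch commits one by one with folding them into a single
merge checkin.

```python
def apply_commit(blame, rev, author, touched_lines):
    """Return a new blame table in which the touched lines belong to `rev`."""
    updated = dict(blame)
    for line_no in touched_lines:
        updated[line_no] = (rev, author)
    return updated


def merge_preserving_history(trunk_blame, commits):
    """Replay every branch commit, keeping its own revision and author."""
    blame = trunk_blame
    for rev, author, touched_lines in commits:
        blame = apply_commit(blame, rev, author, touched_lines)
    return blame


def merge_as_single_checkin(trunk_blame, commits, merge_rev, merger):
    """Fold all branch commits into one checkin made by whoever merges."""
    touched = [line for _, _, lines in commits for line in lines]
    return apply_commit(trunk_blame, merge_rev, merger, touched)


# Hypothetical starting point: four lines, all from revision 1.
trunk = {1: (1, "guido"), 2: (1, "guido"), 3: (1, "guido"), 4: (1, "guido")}

# Hypothetical branch history: three authors, one line each.
branch_commits = [
    (101, "alice", [1]),
    (102, "bob", [2]),
    (103, "carol", [3]),
]

preserved = merge_preserving_history(trunk, branch_commits)
squashed = merge_as_single_checkin(trunk, branch_commits, 200, "thomas")

print(preserved)
print(squashed)
```

In the first table each line still points at the revision and author that
changed it; in the second, every changed line points at the one merge
revision and the person who performed the merge.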

Before XS4ALL switched to BitKeeper, I spent quite a while examining
different SCMs, but at the time, they just weren't at the stage of
development you'd trust your company development to (not if you can afford
BitKeeper anyway ;)  After (re-)experiencing the pain that is
Subversion/CVS-style branching with the p3yk branch and the manual trunk
merges, I went ahead and checked out the current state of the alternatives.
There are quite a few, now (Monotone, Darcs, Git/Cogito, Mercurial,
Arch/tla/Bazaar, Bazaar-NG, Arx, CodeVille, SVK) and I haven't had time to
give them all the in-depth examination they are worthy of, but so far it
looks like only a few of them currently scale well enough to handle a large
(history-rich) project like Python. Not that it's fair to expect them to
scale well, mind you, given that most of them are only a few years old and
most don't claim to be "1.0".

Using 'tailor' ( http://www.darcs.net/DarcsWiki/Tailor ) I imported the
Python sourcetree with full history into Darcs and Git. I also did a partial
import into Monotone (history going back a few months) -- the Monotone
import was a lot slower than the others, and I didn't feel like waiting a
week. I then made a branch of each and imported the p3yk branch into them
(using some hand-crafting of tailor's internal data, as it doesn't seem to
support branch-imports at the moment.) Darcs was being troublesome at that
point, but I haven't spent the time to figure out if it was something I or
tailor did wrong in the import. As I said, Monotone was rather slow, which
is not surprising considering it does a lot of signing of digital
certificates. I personally like Monotone a lot, because its central
branch-database is the 'next step up' from what most SCMs do and because I
really like the cryptographic aspect, but it's probably too complex for
Python. Git, the 'low level SCM' developed for the Linux kernel, is
incredibly fast in normal operation and does branches quite well. I suspect
much of its speed is lost on non-Linux platforms, though, and I don't know
how well it works on Windows ;)

I did partial imports into Mercurial and Bazaar-NG, but I got interrupted
and couldn't draw any conclusions -- although from looking at the
implementation, I don't think they'd scale very well at the moment (but that
could probably be fixed.) I should also look at SVK (a BitKeeper-style SCM
on top of Subversion, really), but tailor doesn't support it (yet) and my
last look was just too depressing to cope with manually importing Python.

In short[*], I don't see an immediate candidate for an alternate SCM for
Python (although Git is sexy), but there are lots of long-term possibilities.
I intend to keep my Git and Darcs repositories up to date (it's little
effort to make tailor update them), tailorize Mercurial, Bazaar-NG, (full)
Monotone and probably others, tailorize some branches as well, and publish
them somewhere, hopefully with instructions, observations and honest
opinions (I just need to find the right place to host it, as e.g. Monotone
really wants a separate daemon process to run.)  I also intend to do my own
p3yk development in one of those SCMs; I can just export patches and apply
them to SVN when they're ready ;P I would like to hear if others have any
interest at all in this, though, if only to keep me motivated during the
tediously long tailorizing runs :)

Oh, and in case it matters to people: tailor, Mercurial and Bazaar-NG are
written in Python.

Blurt-blurt'ly y'rs,

[*] Short? This whole mail was short! I can talk for hours about the benefit
of proper branches and what kind of stuff is easier, better and more
efficient with them. I can draw huge ASCII diagrams explaining the
fundamental difference between CVS/SVN,
BitKeeper/Arch/Darcs/Bazaar-NG/Mercurial and Monotone (yes, that's three
groups), and I have powerpoint(!) sheets I used to give a presentation about
how and why BitKeeper works at work. It's probably a bit off-topic here,
though, so don't tempt me ;P
-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!