[Python-Dev] Looking for VCS usage scenarios

Stephen J. Turnbull stephen at xemacs.org
Tue Nov 4 06:21:08 CET 2008


Brett Cannon writes:

 > I have yet to have met anyone who thinks git is great while having
 > used another DVCS as extensively (and I mean I have never found
 > someone who has used two DVCSs extensively).

When XEmacs was considering changing from CVS, I used Darcs as my
primary VCS for about 4 months, including a mammoth (>1MB patch)
merge.  Since Dec 2007, Mercurial has been the official XEmacs VCS.
Nowadays I'm more management than developer but I love git, and will
not use either Darcs or Mercurial for any project where git is an
option.  (Somebody else did the work of moving the CVS history, so
they got to choose Mercurial -- in hindsight, it would have been worth
doing the work....)  I don't know if that counts as "extensive".

I like git because
(1) I like the model of exposing the commit DAG directly as a graph of
    objects.
(2) It's very fast.
(3) It does not promote a particular style of development.  Both
    merging parallel branch tips and rebasing to serialize branches on
    the trunk are well-supported.  (Mercurial and especially Bazaar do
    support the merging style better than git does.)
(4) Branching is cheap and fast.  I typically have a subbranch for
    almost every typo/minor fix I do in a working branch, which I then
    cherrypick into the mainline.  (This workflow avoids merge
    conflicts due to cherrypicking typo fixes directly from the
    subbranch.  Mercurial makes such cherrypicking relatively
    inconvenient, and I often make mistakes and commit too much in the
    wrong branch.  In Darcs this can be very painful because of
    dependencies the cherrypick drags in.)  Switching branches is a
    sub-second operation until the diff gets to be about 15-20 files.
(5) All branches are explicit.  You commit to the current branch.
(6) Files to commit must be named in the commit command, marked with
    an add command, or included via the --all option.
(7) It has a fairly natural, if ugly, syntax for specifying revisions,
    ranges, and various operations on ranges in log and diff
    commands.  There are no "revision numbers" that vary randomly
    according to workspace.
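
To make points (1) and (7) concrete, here is a minimal sketch in Python
of a commit DAG as a graph of objects, plus a range query in the spirit
of git's "A..B" (commits reachable from B but not from A).  The class
and function names (Commit, ancestors, rev_range) are made up for
illustration, and the hashing is a toy stand-in, not git's actual
object format.

```python
import hashlib


class Commit:
    """Toy commit object: its id is a hash of its message and parent ids."""

    def __init__(self, message, parents=()):
        self.message = message
        self.parents = tuple(parents)
        payload = message + "\n" + "\n".join(p.id for p in self.parents)
        self.id = hashlib.sha1(payload.encode("utf-8")).hexdigest()


def ancestors(commit):
    """Return {id: commit} for everything reachable from commit, itself included."""
    seen = {}
    stack = [commit]
    while stack:
        c = stack.pop()
        if c.id not in seen:
            seen[c.id] = c
            stack.extend(c.parents)
    return seen


def rev_range(exclude, include):
    """Commits reachable from include but not from exclude ("exclude..include")."""
    hidden = ancestors(exclude)
    return {cid: c for cid, c in ancestors(include).items() if cid not in hidden}


# A small history: trunk work and a two-commit fix branch, then a merge.
root = Commit("initial import")
trunk = Commit("trunk work", [root])
fix1 = Commit("fix typo", [root])
fix2 = Commit("more fixes", [fix1])
merge = Commit("merge fixes", [trunk, fix2])

messages = sorted(c.message for c in rev_range(trunk, merge).values())
print(messages)
```

Because each id is derived from content and parents, the same history
gets the same names in every workspace, which is what makes ranges
expressed this way portable.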

I dislike Darcs because
(1) The DAG is implicit.
(2) It's slow.
(3) I never know what I'll get when I ask to pull a single patch;
    Darcs's criteria for dependency are opaque, at least to me.
(4) It's hard to script and really likes to be used interactively.

I dislike Mercurial because
(1) It strongly encourages a commit-then-merge style which results in
    a large number of "merge turds" in the history.  Since most
    "merges" succeed because the changes are in different files, these
    are very annoying to me.
(2) The default revision numbering typically results in rather bizarre
    diffs near merges, but there is no easy way to specify a
    particular parent (except the first) without looking up the log.
(3) It commits everything in the workspace by default.
(4) Commit is silent by default, so you don't realize how much you
    have committed until you push ... and by then the push has
    succeeded, so you can no longer roll it back safely.
(5) It creates new branches without being asked, which then need to be
    merged, thus strongly encouraging the commit-then-merge style.
(6) I don't trust its compute-ancestors-separately-per-file merge
    algorithm.  If this really works, there's nothing wrong in
    principle with CVS!
(7) A lot of features require plugins, and the result is command
    proliferation, though unlike git only "porcelain" is exposed.

I haven't used Bazaar beyond "bzr pull" of Mailman once a week or so,
so I don't dislike it.  Things I have observed or have seen discussed
on the Bazaar mailing list that you might want to consider:
(1) The UI is as baroque as git's, once you consider all the plugins
    and GUIs that are available.  Lots of different workspace styles
    (ordinary branching, stacked branching, looms -- similar to
    quilts?, lightweight checkouts, ...) are supported with a
    corresponding increase in subcommand count and/or options.
(2) New repo formats are added frequently, and taking advantage of new
    features often requires upgrading your repo format.  So-called
    lightweight checkouts can be especially annoying as they involve
    leaving the history on the server, making distributed work
    problematic.
(3) Bazaar is very good at supporting the kind of refactoring that
    involves lots of file/directory renames and/or splitting/combining.
(4) Bazaar is claimed to have especially good merging support.
(5) Bazaar has an idiosyncratic log format that displays branches and
    merges "nicely" by choosing a principal branch, and indenting
    subsidiary branches.  This view changes depending on the repo,
    AIUI.  Some people prefer to leave that to a separate command
    (a graphical DAG viewer or something like "git show-branch").
(6) In some common use patterns (eg, "bzr log | less"), Bazaar
    currently does not scale.

 > >.  It is guaranteed to scale (unless Python gets to be
 > > significantly bigger and more active than Linux, at any rate) and it has
 > > a large, very technically capable, and supported user community already.
 > 
 > I think any of the DVCSs will scale. But I will be taking some
 > performance numbers so scalability will be taken into consideration.

On the contrary.  Bazaar is currently known *not* to scale, and the
Bazaar developers have a number of hypotheses about why, and are
working hard on fixing the acknowledged problem.  Emacs made the
decision to use bzr "because it's a fellow GNU project" early this
year, but they're still using CVS because of ongoing pushback against
the performance problems of bzr.  Let's put it this way: on my iBook
G4, for the same Emacs repository (ie, containing the same subset of
versions), "gitk" puts up the whole DAG in living color in about 10
seconds, while "bzr log" takes almost 5 minutes to return the *first*
revision.  There are workarounds, of course, but the default form of
that command (and several others) is very slow in that repo.
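
For anyone collecting performance numbers, "time until the first
line of output" is easy to measure.  The following is a minimal
Python sketch, not a proper benchmark harness; the function name
time_to_first_line is made up, and the command shown is a stand-in
so the example runs without any VCS installed.

```python
import subprocess
import sys
import time


def time_to_first_line(cmd):
    """Run cmd and return (seconds until its first output line, that line)."""
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    try:
        first = proc.stdout.readline()
        elapsed = time.perf_counter() - start
    finally:
        # We only care about the first line, so stop the command early.
        proc.kill()
        proc.stdout.close()
        proc.wait()
    return elapsed, first.decode("utf-8", "replace").rstrip("\r\n")


# Stand-in command; replace with e.g. ["bzr", "log"] inside a real branch.
elapsed, line = time_to_first_line([sys.executable, "-c", "print('first revision')"])
print(f"{elapsed:.3f}s: {line}")
```

Measuring the first line rather than total runtime matches the usage
pattern where the output is piped to a pager and only the top is read.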

My understanding is that to deal fully with these problems, the Bazaar
developers plan to change the repo file format.  Some progress has
been made, with (small) quantitative improvements, but AFAIK
bzr still has bad worst-case performance for some common operations on
moderately large repos (way smaller than the Linux kernel).

 > Well, we will see, but as of right now my use of git has left a nasty
 > taste in my mouth that will take a lot of proverbial mouthwash to get
 > rid of and allow it to be considered in this PEP.

It's your PEP, but if you don't take git seriously, I expect a lot of
people will refuse to take your PEP seriously.

N.B.  It is *not* obvious that you or the PSF should cater to those
people.  It is relatively simple, though of course somewhat annoying
and inconvenient, to set up a local bidirectional gateway between the
"official" DVCS and one's preferred one.  I think you probably do want
a compromise that everybody will use, but you should keep the "keep
your own repo in any format you want" alternative in mind as a way to
gauge how much claimed pain you should acknowledge.<wink>


