Version Control Software

Jason Swails jason.swails at gmail.com
Sun Jun 16 12:39:30 EDT 2013


On Sun, Jun 16, 2013 at 9:30 AM, Chris “Kwpolska” Warrick <
kwpolska at gmail.com> wrote:

> On Sun, Jun 16, 2013 at 1:14 AM, Chris Angelico <rosuav at gmail.com> wrote:
> > Hmm. ~/cpython/.hg is 200MB+, but ~/pike/.git is only 86MB. Does
> > Mercurial compress its content? A tar.gz of each comes down, but only
> > to ~170MB and ~75MB respectively, so I'm guessing the bulk of it is
> > already compressed. But 200MB for cpython seems like a lot.
>
> Next time, do a more fair comparison.
>
> I created an empty git and hg repository, and created a file promptly
> named “file” with DIGIT ONE (0x31; UTF-8/ASCII–encoded) and committed
> it with “c1” as the message, then I turned it into “12” and committed
> as “c2” and did this one more time, making the file “123” at commit
> named “c3”.
>
> [kwpolska at kwpolska-lin .hg at default]% cat * */* */*/* 2>/dev/null | wc -c
> 1481
> [kwpolska at kwpolska-lin .git at master]% cat * */* */*/* */*/*/* 2>/dev/null
> | wc -c
> 16860 ← WRONG!
>
> There is just one problem with this: an empty git repository starts at
> 15216 bytes, due to some sample hooks.  Let’s remove them and try
> again:
>
> [kwpolska at kwpolska-lin .git at master]% rm hooks/*
> [kwpolska at kwpolska-lin .git at master]% cat * */* */*/* */*/*/* */*/*/*
> 2>/dev/null | wc -c
> 2499
>
> which is a much more sane number.  This includes a config file (in the
> ini/configparser format) and such.  According to my maths skills (or
> rather zsh’s skills), new commits are responsible for 1644 bytes in
> the git repo and 1391 bytes in the hg repo.
>

This is not a fair comparison, either.  If we want to do a fair comparison
pertinent to this discussion, let's convert the cpython mercurial
repository into a git repository and allow the git repo to repack the diffs
the way it deems fit.

I'm using the git-remote-hg.py script [
https://github.com/felipec/git/blob/fc/master/contrib/remote-helpers/git-remote-hg.py]
to clone a mercurial repo into a native git repo.  Then, in one of the rare
cases where it is warranted, I ran git gc --aggressive. [1]

The result:

Git:
cpython_git/.git $ du -h --max-depth=1
40K ./hooks
145M ./objects
20K ./logs
24K ./refs
24K ./info
146M .

Mercurial:
cpython/.hg $ du -h --max-depth=1
209M ./store
20K ./cache
209M .
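
For anyone who wants to reproduce this kind of measurement without du, here
is a minimal Python sketch of the same per-subdirectory accounting.  The
function names (dir_size, summarize) are my own, not from any tool.  Note
that it sums apparent file sizes, whereas du reports disk blocks used, so
the numbers will not match du exactly.

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files under path (recursive)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so a link target is not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def summarize(path):
    """Roughly mimic `du --max-depth=1`: one entry per immediate
    subdirectory, plus "." for the grand total (which also includes
    files sitting directly in path)."""
    sizes = {}
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            sizes["./" + entry.name] = dir_size(entry.path)
    sizes["."] = dir_size(path)
    return sizes
```

Calling summarize("cpython_git/.git") and summarize("cpython/.hg") would
give figures comparable to the listings above, in bytes rather than
human-readable units.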


And to help illustrate the equivalence of the two repositories:

Git:

cpython_git $ git log | head; git log | tail

commit 78f82bde04f8b3832f3cb6725c4bd9c8d705d13b
Author: Brett Cannon <brett at python.org>
Date:   Sat Jun 15 23:24:11 2013 -0400

    Make test_builtin work when executed directly

commit a7b16f8188a16905bbc1d49fe6fd940078dd1f3d
Merge: 346494a af14b7c
Author: Gregory P. Smith <greg at krypto.org>
Date:   Sat Jun 15 18:14:56 2013 -0700
Author: Guido van Rossum <guido at python.org>
Date:   Mon Sep 10 11:15:23 1990 +0000

    Warning about incompleteness.

commit b5e5004ae8f54d7d5ddfa0688fc8385cafde0e63
Author: Guido van Rossum <guido at python.org>
Date:   Thu Aug 9 14:25:15 1990 +0000

    Initial revision

Mercurial:

cpython $ hg log | head; hg log | tail

changeset:   84163:5b90da280515
bookmark:    master
tag:         tip
user:        Brett Cannon <brett at python.org>
date:        Sat Jun 15 23:24:11 2013 -0400
summary:     Make test_builtin work when executed directly

changeset:   84162:7dee56b6ff34
parent:      84159:5e8b377942f7
parent:      84161:7e06a99bb821
user:        Guido van Rossum <guido at python.org>
date:        Mon Sep 10 11:15:23 1990 +0000
summary:     Warning about incompleteness.

changeset:   0:3cd033e6b530
branch:      legacy-trunk
user:        Guido van Rossum <guido at python.org>
date:        Thu Aug 09 14:25:15 1990 +0000
summary:     Initial revision

They both appear to have the same history.  In this particular case, it
seems that git does a better job in terms of space management, probably due
to the fact that it doesn't store duplicate copies of identical source code
that appears in different files (it tracks content, not files).
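
To illustrate what "tracks content, not files" means, here is a small
Python sketch of how git names a blob: the ID is the SHA-1 of a short
header plus the file's bytes, so the file name plays no part.  The helper
names (git_blob_id, unique_blobs) are hypothetical.  This only shows the
content-addressing idea; it does not model packfiles or delta compression,
which also affect the on-disk size.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a blob with these bytes.

    Git hashes the header b"blob <length>\\0" followed by the content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def unique_blobs(files):
    """Map blob ID -> content for a {path: bytes} mapping.

    Paths with identical bytes collapse into a single entry, which is
    why identical content is stored once no matter how many files
    carry it.
    """
    store = {}
    for _path, data in files.items():
        store[git_blob_id(data)] = data
    return store


if __name__ == "__main__":
    tree = {
        "a/LICENSE": b"hello world\n",
        "b/LICENSE": b"hello world\n",  # same bytes, different path
        "README": b"something else\n",
    }
    print(len(tree), "files ->", len(unique_blobs(tree)), "blobs")
```

The IDs can be checked against a real repository with
`git hash-object <file>`.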

That being said, from what I've read, both git and mercurial have their
advantages, both in the performance arena and the features/usability arena
(I only know how to really use git).  I'd certainly take a DVCS over a
centralized model any day.

All the best,
Jason

[1] I know I just posted in this thread about --aggressive being bad, but
the packing from the translation was horrible: the translated git repo
was ~2 GB in size.  An aggressive repacking was necessary to allow git to
decide how to pack the diffs.