[Python-Dev] Fwd: PEP: Migrating the Python CVS to Subversion

Daniel Berlin dberlin at dberlin.org
Mon Aug 15 00:25:02 CEST 2005


On Sun, 2005-08-14 at 23:58 +0200, "Martin v. Löwis" wrote:
> Guido van Rossum wrote:
> > Here's another POV.
> 
> I think I agree with Daniel's view, in particular wrt. to performance.
> Whatever the replacement tool, it should perform as well or better
> than CVS currently does; it also shouldn't perform much worse than
> subversion.

Then, in fairness, I should note that annotate is slower on subversion
(and monotone, and anything else using binary deltas) than on CVS.

This is because you can't generate the line diffs that annotate wants
from binary copy + add deltas.  You have to reconstruct the actual
revision texts and then line-diff them.  CVS stores plain line-oriented
diffs that annotate can walk directly, so it is O(N) here, while SVN
and other binary-delta users are O(N^2).
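
To make that concrete, here's a toy sketch in Python of what annotate
has to do over binary deltas (a made-up copy/add format and difflib,
not svn's actual code or wire encoding):

import difflib

def apply_delta(source, ops):
    # Rebuild a revision from ('copy', offset, length) and
    # ('add', literal_bytes) opcodes against its predecessor.
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

def annotate(deltas):
    # Blame every line of the newest revision, walking oldest-first.
    # N revisions, each fully reconstructed and line-diffed: O(N^2).
    text, lines, owners = b"", [], []
    for rev, ops in enumerate(deltas):
        text = apply_delta(text, ops)
        new_lines = text.splitlines(keepends=True)
        new_owners = [rev] * len(new_lines)   # default: blame this rev
        sm = difflib.SequenceMatcher(None, lines, new_lines)
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag == "equal":                # surviving lines keep blame
                new_owners[j1:j2] = owners[i1:i2]
        lines, owners = new_lines, new_owners
    return list(zip(owners, lines))

deltas = [
    [("add", b"line one\nline two\n")],               # rev 0
    [("copy", 0, 9), ("add", b"line 2, edited\n")],   # rev 1
]
print(annotate(deltas))   # line one -> rev 0, the edited line -> rev 1

Every pass through that loop rebuilds and re-diffs the whole text,
which is exactly where the quadratic behavior comes from.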

You wouldn't really notice the speed difference when annotating a file
with 100 revisions.  You would if you annotated the 800k ChangeLog,
which has 30k trunk revisions.  CVS takes 4 seconds; svn takes ~5
minutes, with the whole time spent doing diffs of those revisions.
I recently rewrote the blame algorithm so that it only takes about 2
minutes on the ChangeLog, but it cheats: it knows it can stop early
once it has blamed every line (since our ChangeLog rotates, all the
surviving lines come from recent revisions).
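
The early-exit trick looks roughly like this (a sketch of the idea,
not the code I actually committed): walk backwards from HEAD, and bail
out as soon as every line has an owner:

import difflib

def blame_backwards(revisions):
    # `revisions` holds the full (already reconstructed) text of each
    # revision, oldest first.  Walk from HEAD toward rev 0 and stop as
    # soon as every surviving line has been attributed.
    head_lines = revisions[-1].splitlines(keepends=True)
    owner = [None] * len(head_lines)
    remaining = len(head_lines)
    # map_to_head[i]: which HEAD line the i-th line of the revision we
    # are currently looking at becomes (None if it doesn't survive)
    map_to_head = list(range(len(head_lines)))
    cur = head_lines
    for rev in range(len(revisions) - 1, 0, -1):
        older = revisions[rev - 1].splitlines(keepends=True)
        sm = difflib.SequenceMatcher(None, older, cur)
        new_map = [None] * len(older)
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag == "equal":             # lines inherited from `older`
                for k in range(i2 - i1):
                    new_map[i1 + k] = map_to_head[j1 + k]
            else:                          # lines first introduced in `rev`
                for j in range(j1, j2):
                    h = map_to_head[j]
                    if h is not None and owner[h] is None:
                        owner[h] = rev
                        remaining -= 1
        if remaining == 0:                 # everything blamed -- stop early,
            return owner                   # skipping the old history entirely
        map_to_head, cur = new_map, older
    for h in range(len(owner)):            # whatever is left dates to rev 0
        if owner[h] is None:
            owner[h] = 0
    return owner

On a rotating ChangeLog every surviving line is recent, so this never
has to look at the bulk of the 30k revisions.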

For those curious, you also can't directly generate "always-correct"
byte-level differences from the deltas, since their goal is to find the
most space-efficient way to transform rev old into rev new, *not* to
record the actual byte-level changes that occurred between old and new.
It may turn out that doing an add of 2 bytes is cheaper than specifying
the opcode for copy(start, len).  Actual diffs are produced by
reconstructing the texts and line-diffing them.  Such is the cost of
efficient storage :).
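
You can see the tradeoff with some back-of-the-envelope byte counting
(assumed opcode and varint sizes, not svndiff's real encoding):

def varint_len(n):
    # bytes for a 7-bits-per-byte variable-length integer (assumed)
    return max(1, (n.bit_length() + 6) // 7)

def copy_cost(offset, length):
    # copy(start, len): one opcode byte plus the two encoded integers
    return 1 + varint_len(offset) + varint_len(length)

def add_cost(length):
    # add: one opcode byte, the encoded length, then the literal bytes
    return 1 + varint_len(length) + length

# Copying 2 "unchanged" bytes from deep inside the source costs more
# than just restating them, so a size-optimizing encoder emits an add
# -- and the delta no longer records the real byte-level change.
print(copy_cost(70000, 2))   # -> 5
print(add_cost(2))           # -> 4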

> 
> I've been using git (or, rather, cogito) to keep up-to-date with the
> Linux kernel. While performance of git is really good, storage
> requirements are *quite* high, and initial "checkout" takes a long
> time - even though the Linux kernel repository stores virtually no
> history (there was a strict cut when converting the bitkeeper HEAD).
> So these distributed tools would cause quite some disk consumption
> on client machines. bazaar-ng apparently supports only-remote
> repositories as well, so that might be no concern.

The argument "network and disk is cheap" doesn't work for us when you
are talking about 5-10 gigabytes of initial transfer :).  However, I
doubt it's more than a hundred meg or so for python, if that.

You may run into these problems in 10 years :)
