[core-workflow] Help needed: best way to convert hg repos to git?

Martin Panter vadmium+py at gmail.com
Fri Feb 12 01:04:19 EST 2016


On 12 February 2016 at 03:07, Brett Cannon <brett at python.org> wrote:
> On Thu, Feb 11, 2016, 16:43 Nicolás Alvarez <nicolas.alvarez at gmail.com>
> wrote:
>> I tried fast-export, and I don't really see anything wrong with the
>> repository. The size is 221MB.

One thing I’m slightly curious about is how much the result differs
from <https://github.com/python/cpython> or other results, and if so,
what the differences are. The differences could be serious (mangled
history), or they could be trivial things like stripping trailing
newlines from commit messages, or skipping commits that don’t change
any files.
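
As a rough illustration of how such a comparison might be triaged, here
is a minimal sketch. It assumes the commit messages of each conversion
have already been extracted into a dict keyed by some stable identifier
(for instance the original Mercurial changeset ID, if the converter
records it). The function names and category labels are hypothetical,
not part of any existing tool.

```python
from collections import Counter


def classify_message_diff(a, b):
    """Classify how two messages for the same changeset differ."""
    if a == b:
        return "identical"
    # Trivial difference: only trailing newlines were stripped or added.
    if a.rstrip("\n") == b.rstrip("\n"):
        return "trailing-newlines"
    return "substantive"


def summarize(left, right):
    """Tally difference kinds between two {key: message} mappings.

    Keys present on only one side are counted separately; they may be
    commits that one converter skipped (e.g. commits changing no files).
    """
    counts = Counter()
    for key in left.keys() & right.keys():
        counts[classify_message_diff(left[key], right[key])] += 1
    counts["only-in-left"] = len(left.keys() - right.keys())
    counts["only-in-right"] = len(right.keys() - left.keys())
    return counts
```

A large "substantive" or "only-in-..." count would be the cue to look
more closely; the trivial category can probably be ignored.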

>> It depends on how crazy you want to go. For example, SVN-era merges
>> don't appear as merges, but looks like some SVN-era branches don't
>> exist in Hg to begin with (Would I need to get cpython-fullhistory?
>> Cloning it gives me a 400 Bad Request). Do we care about that?
>
> Good question. If you can't even clone it then that shows how much
> people care. Honestly I wouldn't worry since we have the history in the
> hg repo (converting from svn was necessary to have it available without the
> server).

I care a bit. If I get the time, I would like to figure out a robust
way to convert the Subversion history to Git so that the svnmerge
information is included as proper merges.

Another concern for me is that some of the useful history is not even
in Mercurial. For example <https://hg.python.org/lookup/r70152> is an
svnmerge from ^/python/branches/io-c into ^/python/branches/py3k, but
the Mercurial repository doesn’t have the branch history, so all the
merged-in Subversion revisions such as r68683 are missing.
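
To sketch what recovering merge parents might involve: the idea is to
compare the svnmerge tracking property on a branch before and after a
revision, and treat newly listed source revisions as merged in. The code
below assumes the property value is a whitespace-separated list of
`path:revisions` entries, where revisions are comma-separated numbers or
`start-end` ranges. That is my understanding of the svnmerge.py format
and should be checked against real data. The function names and the
sample values in the usage note are made up.

```python
def parse_integrated(value):
    """Parse e.g. '/a:1-3,7 /b:5' into {path: set of revision numbers}."""
    result = {}
    for entry in value.split():
        # Split on the last colon so the path itself may contain colons.
        path, _, revlist = entry.rpartition(":")
        revs = set()
        for part in revlist.split(","):
            if not part:
                continue
            start, _, end = part.partition("-")
            revs.update(range(int(start), int(end or start) + 1))
        result[path] = revs
    return result


def newly_merged(before, after):
    """Return {source path: sorted revisions} added between two values."""
    old = parse_integrated(before)
    new = parse_integrated(after)
    added = {}
    for path, revs in new.items():
        extra = revs - old.get(path, set())
        if extra:
            added[path] = sorted(extra)
    return added
```

For example, going from `/python/trunk:1-10` to
`/python/trunk:1-10,12-13` would report revisions 12 and 13 as newly
merged from `/python/trunk`. Note that a range may cover revisions that
never touched the source branch, so a real converter would still have to
map each number onto an actual commit on that branch.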

Some other highlights on my quest to investigate the holy Subversion
repository (I can post my full notes somewhere if people are
interested):

* It is nice to have a local mirror of the Subversion repository so
that experimenting with different options and programs isn’t horribly
slow. But I don’t want to mirror everything or overload the server
because there are other projects stored in the repository that seem to
take up a lot of space (and download time).

* What is the story with the cpython-fullhistory Mercurial repository?
On the surface it almost looks like an out-of-date copy of the main
repository, but I notice some subtle differences, e.g. revision ids
for early tags are different, v1.0.0 tag is added.

* Some Subversion revisions actually merge stuff from outside the
Python tree (e.g. <https://hg.python.org/lookup/r88662> from
^/sandbox/trunk/2to3/lib2to3 into
^/branches/release27-maint/Lib/lib2to3). Not sure if it is worth trying
to salvage these merges; I never noticed them when working on Python.

>> Or, changes that come from non-committers could have their Author
>> field modified, maybe based on the ACKS file modification. It's
>> feasible but will take time and manual work. Do we care about that?
>
> That would be great but too much effort.

I think it would not be worth it, and could even be detrimental. You
would be trying to guess based on incomplete and unreliable
information. Maybe one person wrote a test, another wrote the
implementation, and a third wrote the documentation, but it was all
committed at once. Maybe the author was already in ACKS and the
committer did not mention who the author was in the message. I think
it is safer to not pretend the author field is always accurate.

