[core-workflow] Some questions

Senthil Kumaran senthil at uthcode.com
Sun May 8 20:43:12 EDT 2016


Hi Émanuel,

On Sun, May 8, 2016 at 4:40 PM, Émanuel Barry <vgr255 at live.ca> wrote:

> Take each X commit (say, every 100th or 1000th commit, or even every
> commit if we decide to be insane^Wprecise), store hashes of all files at
> that revision with possibly the file tree, in a .py file as a list or dict,
> or json or anything you prefer. Then I upload it for you to look at and you
> can compare with the mercurial repo. Or we run the same script on the
> mercurial repo and compare the resulting files.
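
For illustration, here is a minimal sketch of the sampling approach quoted above. It assumes each sampled revision has already been checked out into its own directory, and it skips the .hg and .git metadata directories. The names snapshot_manifest, write_manifest and diff_manifests are hypothetical, not part of any existing tool. Each file's path is mapped to a SHA-256 digest, and the result is stored as JSON so that runs against the hg checkout and the git checkout can be compared.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def snapshot_manifest(checkout_dir):
    """Map each file's path (relative to checkout_dir) to its SHA-256 digest.

    Hypothetical helper: assumes checkout_dir is a working copy already
    updated to the sampled revision. Files under .git/.hg are ignored.
    """
    root = Path(checkout_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip VCS metadata and directories; only file contents matter here.
        if rel.parts[0] in (".git", ".hg") or not path.is_file():
            continue
        # as_posix() keeps the keys identical across platforms.
        manifest[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def write_manifest(manifest, out_path):
    """Store a manifest as JSON (sorted keys, so files diff cleanly)."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")


def diff_manifests(a, b):
    """Return the sorted paths that are missing from one side or differ."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


if __name__ == "__main__":
    # Tiny self-contained demo: two trees that differ in one file.
    with tempfile.TemporaryDirectory() as hg_dir, tempfile.TemporaryDirectory() as git_dir:
        for root, text in ((hg_dir, "print('hg')\n"), (git_dir, "print('git')\n")):
            Path(root, "Lib").mkdir()
            Path(root, "Lib", "os.py").write_text("import sys\n")
            Path(root, "setup.py").write_text(text)
        print(diff_manifests(snapshot_manifest(hg_dir), snapshot_manifest(git_dir)))
```

Running the same script against both checkouts of a sampled revision and comparing the two JSON files (or calling diff_manifests directly) would list exactly the paths whose contents diverge.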


If we store anything externally, that could start limiting us.

I looked at the problem from this angle: the final cpython git repo has ~10000
commits in the master branch. That's not a large number to deal with. The
original hg repo should have exactly the same number of commits. We have to do
a diff for each of these commits, including merge commits, and check whether
the contents of those commits are the same. If we encounter anything where the
git repo differs in content or history from the hg repo, we alert and fail.
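
A rough sketch of that check, assuming both histories have already been exported into the same commit order (oldest first), with each commit reduced to its parents (as positions in that list, so a merge has two) and a path -> content-hash manifest. How to produce those lists from hg and git, and how to map hg changesets to their converted git commits, is left out and is an assumption here. Commit, HistoryMismatch and compare_histories are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Commit:
    """One commit, reduced to what both hg and git can express.

    parents: positions of the parent commits in the same history list
             (empty for the root, two entries for a merge).
    files:   path -> content hash (e.g. SHA-256 of the file bytes).
    """
    parents: Tuple[int, ...]
    files: Dict[str, str]


class HistoryMismatch(Exception):
    """Raised on the first difference in content or history."""


def compare_histories(hg: List[Commit], git: List[Commit]) -> int:
    """Walk both histories in lockstep and fail on the first divergence.

    Assumes position i in `hg` corresponds to position i in `git` and that
    parent order is preserved by the conversion. Returns commits checked.
    """
    if len(hg) != len(git):
        raise HistoryMismatch(f"commit count differs: hg={len(hg)} git={len(git)}")
    for i, (h, g) in enumerate(zip(hg, git)):
        # History check: same parents, including both sides of a merge.
        if h.parents != g.parents:
            raise HistoryMismatch(f"commit {i}: parents differ {h.parents} != {g.parents}")
        # Content check: every path present on either side must match.
        changed = sorted(p for p in h.files.keys() | g.files.keys()
                         if h.files.get(p) != g.files.get(p))
        if changed:
            raise HistoryMismatch(f"commit {i}: content differs in {changed}")
    return len(hg)
```

Comparing each commit's full manifest, rather than its diff against a parent, is a simplification: if the parents and the trees match at every commit, the per-commit diffs match as well.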

Since this is a history-checking operation, we could complete it in
O(minutes), or roughly an hour, to validate the repos. This will give us
confidence in the migration, and will help us evaluate multiple hg -> git
repos that have been migrated at different points in time.

This feature will go in this tool:
https://github.com/orsenthil/cpython-hg-to-git , which we will use to
migrate, sync, and validate hg->git repos.
If you're interested, you could research an efficient way to do the above
operation and submit a pull request against that tool.

HTH,
Senthil