[core-workflow] Some questions

Émanuel Barry vgr255 at live.ca
Sun May 8 21:10:29 EDT 2016


(I apologize for top-posting, I still haven’t figured out how to fix my email client)

 

There are nearly 94k commits in the git repo, and I expect the hg repo has the same number. That’s a tad more than 10,000.

 

I’ll definitely take a look at that tool; my main weakness is that I don’t know hg commands or similar, but comparing separate commits is most definitely better.

 

@Ethan: I meant that I would write all the output to a file for comparison, but apparently that’s not a very good idea, so I’ll drop it.

 

I’ll look at the tool and see what I can do. I’ll try to document my findings if I can’t come up with a good solution, and probably even if I do.

 

Cheers,

-Emanuel

 

From: Senthil Kumaran [mailto:senthil at uthcode.com] 
Sent: Sunday, May 08, 2016 8:43 PM
To: Émanuel Barry
Cc: core-workflow
Subject: Re: [core-workflow] Some questions

 

Hi Émanuel,

 

On Sun, May 8, 2016 at 4:40 PM, Émanuel Barry <vgr255 at live.ca> wrote:

Take every Nth commit (say, every 100th or 1000th commit, or even every commit if we decide to be insane^Wprecise), store the hashes of all files at that revision, possibly with the file tree, in a .py file as a list or dict, or as JSON or anything you prefer. Then I upload it for you to look at and you can compare it with the Mercurial repo. Or we run the same script on the Mercurial repo and compare the resulting files.
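As a rough illustration of the per-revision hashing idea quoted above, here is a minimal sketch. It assumes the revision has already been checked out into a working directory; the names hash_tree and dump_manifest, the choice of SHA-256, and the JSON layout are all made up for this example and are not part of any existing tool. Files that exist on only one side (for instance .hgtags) would probably need special handling that this sketch does not attempt.

```python
import hashlib
import json
import os


def hash_tree(root, skip_dirs=(".git", ".hg")):
    """Map each file path (relative to root, '/'-separated) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # symlinks are skipped in this sketch
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                manifest[rel] = hashlib.sha256(fh.read()).hexdigest()
    return manifest


def dump_manifest(manifest):
    """Serialize with sorted keys so identical trees give byte-identical JSON."""
    return json.dumps(manifest, sort_keys=True, indent=1)
```

Running the same function on a checkout from each repo at matching revisions and comparing the two dicts (or the two JSON strings) would show whether the file contents agree.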

 

If we store anything externally, that could start limiting us.

 

I looked at the problem from this angle: the final cpython git repo has ~10000 commits in the master branch. That's not a large number to deal with. The original hg repo should have exactly the same number of commits. We have to do a diff between each of these commits, including merge commits, and check whether the contents of those commits are the same. If we encounter anything where the git repo differs in content or history from the hg repo, we alert and fail.
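The "alert and fail" check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: each history is taken to be an ordered list of (revision_id, manifest) pairs, where manifest maps file path to content hash, and the two repos are assumed to enumerate their commits in the same order. Since hg changeset IDs and git commit IDs differ, commits are paired by position rather than by ID. The function name first_divergence and the shape of the returned report are hypothetical.

```python
from itertools import zip_longest


def first_divergence(hg_history, git_history):
    """Return None if the histories match, else a dict describing the first mismatch.

    Each history is an ordered iterable of (revision_id, manifest) pairs,
    where manifest maps file path -> content hash.
    """
    pairs = zip_longest(hg_history, git_history)
    for index, (hg_entry, git_entry) in enumerate(pairs):
        # zip_longest pads the shorter history with None.
        if hg_entry is None or git_entry is None:
            return {"index": index, "reason": "commit count differs"}
        hg_rev, hg_files = hg_entry
        git_rev, git_files = git_entry
        if hg_files != git_files:
            # Paths that are missing on one side or whose hashes differ.
            changed = sorted(
                path
                for path in hg_files.keys() | git_files.keys()
                if hg_files.get(path) != git_files.get(path)
            )
            return {
                "index": index,
                "reason": "content differs",
                "hg": hg_rev,
                "git": git_rev,
                "paths": changed,
            }
    return None
```

A caller would treat any non-None result as a failed validation and print the report. Stopping at the first mismatch keeps the run short, at the cost of not listing later differences.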

 

Since this is a history-checking operation, we could complete it in O(minutes), or ~1 hour, to validate the repos. This will give us confidence in the migration, and will help us evaluate multiple hg -> git repos that have been migrated at different points in time.

 

This feature will go in this tool: https://github.com/orsenthil/cpython-hg-to-git , which we will use to migrate, sync, and validate hg->git repos.

If interested, you could research an efficient way to do the above operation and submit a pull request against that tool.

 

HTH,

Senthil

 

 

 

 
