PyPy has moved to Git, GitHub (original) (raw)

PyPy has moved its canonical repo and issue tracker fromhttps://foss.heptapod.net/pypy/pypy to https://github.com/pypy/pypy. Obviously, this means development will now be tracked in Git rather than Mercurial.

Motivation

We still feel Mercurial is a better version control system. The named branch model and user interface are superior. But

Method

The migration required two parts: migrating the code and then migrating the issues and merge requests.

Code migration 1: code and notes

I used a fork of git-remote-hg to create a local Git repo with all the changesets. Then I wanted to add a Git note to each commit with the branch it came from. So I prepared a file with two columns: the Git commit hash, and the corresponding branch from Mercurial. Mercurial can describe each commit in two ways: either the commit hash or by a number index. I used hg log to convert an index i to the Mercurial hash, and then git-hg-helper from git-remote-hg to convert the Mercurial hash to a Git hash: (cdpypy−git;git−hg−helpergit−rev(cd pypy-git; git-hg-helper git-rev (cdpypygit;githghelpergitrev(cd ../pypy-hg; hg log -r $i -T"{node}\n"))

Then I used hg log again to print the Mercurial branch for the index i: (cdpypy−hg;hglog−r(cd pypy-hg; hg log -r (cdpypyhg;hglogri -T'{branch}\n')

Putting these two together, I could loop over all the commits by their numerical index to prepare the file. Then I iterated over each line in the file, and added the Git note. Since the git note add command works on the current HEAD, I needed to checkout each commit in turn and then add the note:

git checkout -q && git notes --ref refs/notes/branch add -m branch:

I could then use git push --all to push to GitHub.

Code migration 2: prepare the branches

PyPy has almost 500 open branches. The code migration created all the branch HEADs, but git push --all did not push them. I needed to check them out and push each one. So I created a file with all the branch names

cd pypy-hg; hg branches | cut -f1 -d" " > branches.txt

and then push each one to the GitHub repo

while read branch; do git checkout branches/$branch && git push origin branches/$branch; done < branches.txt

Note that the branches were named branches/XXX by the migration, not branch/XXX. This confuses the merge request migration, more about that later.

Issue and merge request migration

I used the solution fromnode-gitlab-2-github which worked almost perfectly. It is important to do the conversion on a private repo otherwise every mention of a successfully mapped user name notifies the user about the transfer. This can be quite annoying for a repo the size of PyPy with 600 merge requests and over 4000 issues. Issues transferred without a problem: the script properly retained the issue numbers. However the script does not convert the Mercurial hashes to Git hashes, so the bare hashes in comments show up without a link to the commit. Merge requests are more of a problem:

Layered conversions

PyPy already migrated once from Bitbucket to Heptapod. Many of the issues reflect the multiple transitions: they have lines like "Created originally on Bitbucket by XXX" from the first transition, and an additional line "In Heptapod" from this transition.

Credits

We would like to express our gratitude to the Octobusteam who support Heptapod. The transition from Bitbucket was quite an effort, and they have generously hosted our development since then. We wish them all the best, and still believe that Mercurial should have "won".

Next steps

While the repo at GitHub is live, there are still a few more things we need to do:

Additionally, now is the time for all of you to prove the move is worthwhile:

How will development change?

Heptapod did not allow personal forks, so we were generous with a commit bit to the main repo. Additionally, we (well, me) have been using a commit-directly-to-main workflow. We will now be adopting a more structured workflow. Please fork the repo and submit a pull request for any changes. We can now add some pre-merge CI to check that the PR at least passes the first stage of translation. The live and active branches will be:

Working between the repos

Finding commits

If you want to figure out how a Mercurial commit relates to a Git commit, you can use git-hg-helper. You run it in the Git repo. It takes the full long hash from one repo and gives you the corresponding hash of the other repo:

$ git-hg-helper git-rev d64027c4c2b903403ceeef2c301f5132454491df 4527e62ad94b0e940a5b0f9f20d29428672f93f7 $ git-hg-helper hg-rev 4527e62ad94b0e940a5b0f9f20d29428672f93f7 d64027c4c2b903403ceeef2c301f5132454491df

Finding branches

Branches migrated from Mercurial will have a branches prefix, not branch. While GitLab uses branch for its prefix, the git-remote-hg script usesbranches. New work should be in a PR targeting main, py3.9 or py3.10.

Thanks for helping to make PyPy better.

Matti

Update

In the meantime we found out that unfortunately something went wrong in the migration of the issues. The old issue 3655 got lost in the migration. This means that after number 3655 the numbers are different between github and heptapod, with heptapod being one larger. E.g. issue 3700 on heptapod is issue 3699 on github. We are investigating options.