[Python-Dev] Looking for VCS usage scenarios

Stephen J. Turnbull stephen at xemacs.org
Tue Nov 4 06:21:08 CET 2008


Brett Cannon writes:

> I have yet to meet anyone who thinks git is great while having used another DVCS as extensively (and I mean I have never found someone who has used two DVCSs extensively).

When XEmacs was considering changing from CVS, I used Darcs as my primary VCS for about 4 months, including a mammoth (>1MB patch) merge. Since Dec 2007, Mercurial has been the official XEmacs VCS. Nowadays I'm more management than developer, but I love git, and will not use either Darcs or Mercurial for any project where git is an option. (Somebody else did the work of moving the CVS history, so they got to choose Mercurial -- in hindsight, it would have been worth doing the work....) I don't know if that counts as "extensive".

I like git because (1) I like the model of exposing the commit DAG directly as a graph of objects. (2) It's very fast. (3) It does not promote a particular style of development. Both merging parallel branch tips and rebasing to serialize branches on the trunk are well supported. (Mercurial and especially Bazaar do support the merging style better than git does.) (4) Branching is cheap and fast. I typically have a subbranch for almost every typo/minor fix I do in a working branch, which I then cherrypick into the mainline. (Because the typo fixes are cherrypicked directly from their own subbranches, this workflow avoids merge conflicts. Mercurial makes such cherrypicking relatively inconvenient, and I often make mistakes and commit too much in the wrong branch. In Darcs this can be very painful because of the dependencies the cherrypick drags in.) Switching branches is a sub-second operation until the diff gets to be about 15-20 files. (5) All branches are explicit. You commit to the current branch. (6) Files to commit must be named in the commit command, marked with an add command, or included via the --all option. (7) It has a fairly natural, if ugly, syntax for specifying revisions, ranges, and various operations on ranges in log and diff commands. There are no "revision numbers" that vary arbitrarily from one workspace to another.
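Points (1) and (7) can be made concrete with a toy model. This is a simplified sketch, not git's actual object format: it assumes a commit is identified by a SHA-1 over its parent ids and message only (real git also hashes the tree, author and committer), and the names `Commit`, `store`, `add`, `ancestors` and `commit_range` are hypothetical. It reads a range `A..B` the way git's revision syntax does: everything reachable from B that is not reachable from A.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple  # ids of the parent commits (two or more for a merge)

    @property
    def id(self) -> str:
        # Simplification: real git hashes the whole serialized commit
        # (tree, parents, author, committer, message); here only the
        # parent ids and message go into the SHA-1.
        payload = "\n".join(self.parents) + "\n\n" + self.message
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


store = {}  # commit id -> Commit: the object database of this toy repo


def add(message: str, *parents: str) -> str:
    """Create a commit on top of the given parents and return its id."""
    commit = Commit(message, tuple(parents))
    store[commit.id] = commit
    return commit.id


def ancestors(start: str) -> set:
    """Every commit id reachable from `start` by following parents, inclusive."""
    seen, stack = set(), [start]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(store[cid].parents)
    return seen


def commit_range(a: str, b: str) -> set:
    """Toy version of git's `a..b`: reachable from b but not from a."""
    return ancestors(b) - ancestors(a)


root = add("initial import")
main1 = add("mainline work", root)
topic = add("fix typo", root)  # a fix on its own short-lived branch
main2 = add("more mainline work", main1)
merged = add("merge the fix", main2, topic)

print(len(commit_range(main1, merged)))  # prints: 3 (main2, topic, merged)
```

Because an id is derived from content and ancestry rather than from the order in which commits arrived, every clone holding the same commits names them identically, which is the property point (7) relies on.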

I dislike Darcs because (1) The DAG is implicit. (2) It's slow. (3) I never know what I'll get when I ask to pull a single patch; Darcs's criteria for dependency are opaque, at least to me. (4) It's hard to script and really likes to be used interactively.

I dislike Mercurial because (1) It strongly encourages a commit-then-merge style which results in a large number of "merge turds" in the history. Since most "merges" succeed because the changes are in different files, these are very annoying to me. (2) The default revision numbering typically results in rather bizarre diffs near merges, but there is no easy way to specify a particular parent (except the first) without looking up the log. (3) It commits everything in the workspace by default. (4) Commit is silent by default, so you don't realize how much you have committed until you push ... and by then the push has succeeded, so you can no longer roll it back safely. (5) It creates new branches without being asked, which then need to be merged, thus strongly encouraging the commit-then-merge style. (6) I don't trust its compute-ancestors-separately-per-file merge algorithm. If this really works, there's nothing wrong in principle with CVS! (7) A lot of features require plugins, and the result is command proliferation, though unlike git only "porcelain" is exposed.
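The revision-number complaint in (2) can be illustrated with another toy model. It assumes only that each clone numbers changesets 0, 1, 2, ... in the order they were added to that clone, as Mercurial's local revision numbers do, while a changeset's hash depends only on its content and parents; `Clone` and `changeset_id` are hypothetical names for this sketch, not Mercurial APIs.

```python
import hashlib


def changeset_id(message: str, parents: tuple) -> str:
    """Hypothetical global id: depends only on the content and parents."""
    payload = "\n".join(parents) + "\n\n" + message
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class Clone:
    """Toy repository that numbers changesets 0, 1, 2, ... in arrival order."""

    def __init__(self):
        self.by_number = []  # local revision number -> changeset id
        self.parents = {}    # changeset id -> tuple of parent ids

    def receive(self, cid: str, parents: tuple) -> None:
        # A number is handed out only when a changeset first arrives here.
        if cid not in self.parents:
            self.parents[cid] = parents
            self.by_number.append(cid)

    def number_of(self, cid: str) -> int:
        return self.by_number.index(cid)


# Alice and Bob each commit on top of the same root; Alice then merges.
root = changeset_id("initial import", ())
a = changeset_id("Alice's change", (root,))
b = changeset_id("Bob's change", (root,))
m = changeset_id("merge", (a, b))  # first parent is Alice's change

alice, bob = Clone(), Clone()
for cid, ps in [(root, ()), (a, (root,)), (b, (root,)), (m, (a, b))]:
    alice.receive(cid, ps)  # Alice committed a before pulling b
for cid, ps in [(root, ()), (b, (root,)), (a, (root,)), (m, (a, b))]:
    bob.receive(cid, ps)    # Bob committed b before pulling a and m

# Same changeset, different local numbers in each clone.
print(alice.number_of(a), bob.number_of(a))  # prints: 1 2
# In Alice's clone, revision 2 (the one just before the merge) is b,
# which is not the merge's first parent.
print(alice.by_number[2] == alice.parents[m][0])  # prints: False
```

So "diff revision N-1 against N" across a merge compares against whichever changeset happened to arrive last, not necessarily against the parent you meant, and the same number names different changesets in different clones.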

I haven't used Bazaar beyond "bzr pull" of Mailman once a week or so, so I don't dislike it. Here are some things I have observed, or have seen discussed on the Bazaar mailing list, that you might want to consider: (1) The UI is as baroque as git's, once you consider all the plugins and GUIs that are available. Lots of different workspace styles (ordinary branching, stacked branching, looms -- similar to quilts?, lightweight checkouts, ...) are supported, with a corresponding increase in subcommand count and/or options. (2) New repo formats are added frequently, and taking advantage of new features often requires upgrading your repo format. So-called lightweight checkouts can be especially annoying, as they involve leaving the history on the server, making distributed work problematic. (3) Bazaar is very good at supporting the kind of refactoring that involves lots of file/directory renames and/or splitting/combining. (4) Bazaar is claimed to have especially good merging support. (5) Bazaar has an idiosyncratic log format that displays branches and merges "nicely" by choosing a principal branch and indenting subsidiary branches. This view changes depending on the repo, AIUI. Some people prefer to leave that to a separate command (a graphical DAG viewer or something like "git show-branch"). (6) In some common use patterns (e.g., "bzr log | less"), Bazaar currently does not scale.

>> ... It is guaranteed to scale (unless Python gets to be significantly bigger and more active than Linux, at any rate) and it has a large, very technically capable, and supported user community already.

> I think any of the DVCSs will scale. But I will be taking some performance numbers so scalability will be taken into consideration.

On the contrary. Bazaar is currently known not to scale; the Bazaar developers have a number of hypotheses about why, and are working hard on fixing the acknowledged problem. Emacs made the decision to use bzr "because it's a fellow GNU project" early this year, but they're still using CVS because of ongoing pushback against the performance problems of bzr. Let's put it this way: on my iBook G4, for equivalent Emacs repositories (i.e., containing the same subset of versions), "gitk" puts up the whole DAG in living color in about 10 seconds, while "bzr log" takes almost 5 minutes to return the first revision. There are workarounds, of course, but the default form of that command (and several others) is very slow in that repo.

My understanding is that to deal fully with these problems, the Bazaar developers plan to change the repo file format. Some progress has been made, and there have been (small) quantitative improvements, but AFAIK bzr still has bad worst-case performance for some common operations on moderately large repos (way smaller than the Linux kernel).

> Well, we will see, but as of right now my use of git has left a nasty taste in my mouth that will take a lot of proverbial mouthwash to get rid of and allow it to be considered in this PEP.

It's your PEP, but if you don't take git seriously, I expect a lot of people will refuse to take your PEP seriously.

N.B. It is not obvious that you or the PSF should cater to those people. It is relatively simple, though of course somewhat annoying and inconvenient, to set up a local bidirectional gateway between the "official" DVCS and one's preferred one. I think you probably do want a compromise that everybody will use, but you should keep the "keep your own repo in any format you want" alternative in mind as a way to gauge how much claimed pain you should acknowledge.


