[Python-Dev] Looking for VCS usage scenarios

Stephen J. Turnbull stephen at xemacs.org
Thu Nov 6 02:36:28 CET 2008


In what follows, caveat IANB (I am not Brett, and neither is Cosmin), but I do have some experience with these systems, and my recommendations are based on that.

Cosmin Stejerean writes:

On Nov 5, 2008, at 12:16 PM, skip at pobox.com wrote:

What DVCS fits my poor brain best? I feel I'm like a dinosaur not being able to figure out how I'm supposed to contribute changes to the system.

You need not feel that way. It's not you---the flexibility of dVCS means that until the Powers That Be promulgate a Workflow, this will be ambiguous.

This is part of the purpose of the PEP. We[1] will be presenting the 5-finger exercises required to accomplish typical (and perhaps some not-so-typical) tasks, as well as benchmarks for the various systems.

Do I:

  • commit my changes to some central branch?

Call this the "record && commit to authoritative" workflow.

Not exactly. If you had commit access to the central repository you
could commit then push, which would be the DVCS equivalent of
committing to a central branch.

The workflow where general contributors commit directly to the trunk surely won't be used in Python, because of the instability it would cause. It would be possible to have a staging branch for this purpose, but IMO that's not a very effective use of a dVCS.[2]

It is useful to avoid the term "commit" here because its semantics vary across systems. As Cosmin points out, in a dVCS, what is accomplished by "vc commit" in CVS is done as "vc commit; vc push". I use the terminology "record" for the action of adding a workspace-based patch or snapshot to a repository. "push" (and "pull") move content between repositories. Unfortunately "commit" is the name of the record command in most dVCSes, so this terminology probably won't catch on.
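
As a rough illustration of the record/push split, here is a toy model in Python. It is only a sketch: the `Repo` class and its `record` and `push` methods are hypothetical names made up for this example, and a list of strings stands in for real history. It does not reflect how any actual VCS stores revisions.

```python
class Repo:
    """Toy repository: an ordered list of recorded revisions (an assumption)."""

    def __init__(self, name):
        self.name = name
        self.revisions = []

    def record(self, change):
        # "record": add a workspace change to this repository only.
        self.revisions.append(change)

    def push(self, other):
        # "push": copy over any revisions the other repository lacks.
        for rev in self.revisions:
            if rev not in other.revisions:
                other.revisions.append(rev)


central = Repo("central")
local = Repo("local")

local.record("fix typo in docs")
central_after_record = list(central.revisions)  # central has not changed yet

local.push(central)
central_after_push = list(central.revisions)    # now central has the change
```

In this model a CVS-style "commit" corresponds to calling `record` and then `push` against the central repository; calling `record` alone leaves the central repository untouched.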

Also, when talking about "where to commit" in terms of communication among developers, you should probably refer to storage locations as "repositories". "Branch" is another term that has varying semantics in different VCSes. In some systems (git) it is reasonable to think of repositories containing more than one branch, and branches as existing in more than one repository (but this isn't quite robust in git because branch names are just names, not first-class objects). In others (Darcs is the extreme) repository == branch == workspace.

(I'm trying to get permission to publish a 3rd party's draft document that goes into these issues in detail; here I just want to raise awareness that the intuitions that go with CVS/Subversion usage of various terms are not always going to carry over to dVCSes.)

  • commit my changes locally then create diffs I then submit to the tracker?

"Record && patch" workflow.

Possible.

But again not very effective. Under a dVCS I believe these patches will languish in the tracker as they do today, unless tools are written to automatically pull them into a repo somewhere.

  • commit locally then push them somewhere?

"Record && push to candidate" workflow.

If we go with Bazaar, this is very likely to occur, especially if Canonical's Launchpad is the host. This is what the Linux kernel does on git.kernel.org as well, if I understand their workflow correctly, and what github helps to support. I imagine Mercurial has an equivalent but I'm not familiar with it.

  • commit locally then ask someone to pull?

"Record && request pull" workflow.

Often preferred way to submit patches, as you can continue to maintain
the patch locally against newer versions of trunk so that the patch is
not obsolete by the time people finally get around to it.

I disagree. This doesn't scale to Python size. For distributed VC to work, somebody has to maintain a repo 24x7. Python has to do this for the trunk; the additional burden for contributed patches is not great. There is no real advantage to having contributors do so, too.[3] Integrators and interested third parties also must keep track of contributors' repo URLs. (Cf. Skip's question about discovering repos.) Not happy stuff.

The "record && push" workflow scales much better for numbers of contributors, as each contributor needs only to maintain one "push" URL, and integrators only one "pull" base URL.
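
The scaling argument can be put as back-of-the-envelope arithmetic. The function below is a simplified sketch under stated assumptions: in "request pull" every contributor hosts their own always-on repo, and in "push to candidate" all candidate branches live on the same host as the trunk under one base URL. The name `tracking_burden` and the dictionary keys are hypothetical, chosen only for this example.

```python
def tracking_burden(contributors, workflow):
    """Count hosts to keep online and URLs an integrator must know."""
    if workflow == "request pull":
        # Trunk host plus one self-hosted repo per contributor (assumption).
        return {
            "hosts_online": contributors + 1,
            "integrator_urls": contributors + 1,
        }
    if workflow == "push to candidate":
        # One shared host; candidates sit under a single base URL (assumption).
        return {"hosts_online": 1, "integrator_urls": 1}
    raise ValueError(f"unknown workflow: {workflow!r}")


pull_model = tracking_burden(200, "request pull")
push_model = tracking_burden(200, "push to candidate")
```

The first result grows linearly with the number of contributors, while the second stays constant, which is the sense in which "record && push" scales better here.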

  • Not commit anything anywhere but just submit patches to the
    tracker?

"Patch from workspace" workflow.

Likely possible, but it's good to have the patch committed locally so
you can modify it or continue to build upon it until it gets accepted.

The same considerations as "record && patch" also apply here.

In addition:

  • Will there be a central repository?

Generally there should be a central authoritative repository where
people can turn to for the official version.

I.e., "yes". There's no point in a PEP unless there's going to be a central repo and a defined workflow for getting contributions into it.

Note that you can always maintain your own local repo with dVCS.

  • How will I know which of possibly many repos is "authoritative"?

The authoritative repo should generally be linked to from the website
so that people can easily find it.

That depends. The notion of "authoritative" gets weakened in a distributed system, and probably more important is "which repo will be used to make the next official release".

However, although I can't say what the mechanism will be, be sure you will not have a problem learning which is authoritative for the trunk or where to find RCs and releases. (If you do, it's a doc problem and it will be fixed quickly.)

You may have more trouble with third-party patches gotten from third-party repos. GNU Arch has a system for handling this (patch names contain the originating repo). That was one of the first things the Bazaar people discarded from Arch, though. Darcs has something similar, but again Darcs is not a candidate here. I think for such "maverick" contributions there will never really be a substitute for watching the ML and tracker like a hawk.

  • How will I discover other repos? For example, if the safethread stuff is sitting somewhere in a repository can I find it on my own somehow?

I'm not aware of any decentralized system for discovering
repositories. Something like github or bitbucket could be used which
help you discover repositories, but a wiki page with a list of
alternative repositories and their purpose should suffice.

Most likely the repos you care about will be hosted in a central location, and will be browsable from a single base URL. See http://git.kernel.org/ for the git version of a browser. Mercurial and Bazaar support similar facilities either in the VCS itself or in an easily available add-on. Fancier support is available via systems like github and Launchpad.

If you care about minor, locally hosted patches, you'll have to follow the tracker and mailing list closely and grep out the URLs to them. However, systems like Launchpad and github make it feasible to have branches for a single patch.

GNU Arch also had some systems for third-party repo discovery and maintenance of a database of them, but they ended up creating a "Supermirror" which gave a similar workflow to Launchpad/github et al. So I think you can probably discount the existence or development of such a discovery system.

  • Will a DVCS allow simpler operation as if we are still using a centralized system like CVS or Subversion?

Yes and no. There is nothing to prevent a formal workflow like that in CVS/Subversion. However, the separation of "commit" into "record && push to authoritative" leaves open the possibility of annoying glitches until you get used to it, and even then it's easy to forget to push or to forget that you've committed not-for-pushing stuff, etc, etc. In practice it is probably simpler to use a dVCS-specialized workflow like "record && push to candidate".
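
Both glitches (forgetting to push, and forgetting that not-for-pushing work has been recorded) come down to the local repository holding revisions the authoritative one lacks. A minimal sketch of that check follows, assuming revisions can be compared by simple identifiers; the function name `unpushed` and the revision labels are made up for illustration.

```python
def unpushed(local_revisions, authoritative_revisions):
    """Return locally recorded revisions missing from the authoritative repo."""
    published = set(authoritative_revisions)
    # Preserve local recording order so the result reads like a to-do list.
    return [rev for rev in local_revisions if rev not in published]


local = ["r1", "r2", "wip-experiment", "r3"]
authoritative = ["r1", "r2"]

pending = unpushed(local, authoritative)
```

Here `pending` contains both the revision that was simply never pushed ("r3") and the one that was never meant to be pushed ("wip-experiment"); telling the two apart is left to the developer, which is where the annoyance comes from.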

Footnotes:
[1] I have time constraints which may not be acceptable to Brett, but he's offered to delegate the presentation of git to me, and those of hg and bzr to others.

[2] This is how XEmacs and Scheme48 do it.

[3] The insurance value is small. Integrators and other interested parties will surely be keeping local copies of almost all branches; if the central repo were to be wiped out, most branches would be recoverable from those widely distributed copies.


