Highlights from Git 2.28 (original) (raw)

The open source Git project just released Git 2.28 with features and bug fixes from over 58 contributors, 13 of them new. We last caught up with you on the latest in Git back when 2.26 was released. Here’s a look at some of the most interesting features and changes introduced since then.

Introducing init.defaultBranch

When you initialize a new Git repository from scratch with git init, Git has always created an initial first branch with the name master. In Git 2.28, a new configuration option, init.defaultBranch is being introduced to replace the hard-coded term. (For more background on this change, this statement from the Software Freedom Conservancy is an excellent place to look).

Starting in Git 2.28, git init will instead look to the value of init.defaultBranch when creating the first branch in a new repository. If that value is unset, init.defaultBranch defaults to master. Here, it’s important to note that:

  1. This configuration variable can be set by the user, and overriding the default value is as easy as:
$ git config --global init.defaultBranch main  
  1. This configuration variable only affects new repositories, and does not cause branches in existing projects to be renamed. git clone will also continue to respect the HEAD of the repository you’re cloning from, so you won’t see a change in branch names until a maintainer initiates one.

This change supports the many communities, both on GitHub and in the wider Git community, who are considering renaming the default branch name of their repository from master.

To learn more about the complementary changes GitHub is making, see github/renaming. GitLab and Bitbucket are also making similar changes.

[source]

Changed-path Bloom filters

In Git 2.27, the commit-graph file format was extended to store changed-path Bloom filters. What does all of that mean? In a sense, this new information helps Git find points in history that touched a given path much more quickly (for example, git log -- <path>, or git blame). Git 2.28 takes advantage of these optimizations to deliver a handful of sizeable performance improvements.

Before we get into all of that, it’s worth taking a refresher through commit graphs whether you’re new to the concept, or familiar with them. (If you are familiar, and want to take a deeper dive, check out this blog post explaining all of the juicy technical details).
In the very simplest terms, the commit-graph file stores information about commits. In essence, the commit-graph acts like a cache for commonly-accessed information about commits: who their parent(s) are, what their root tree is, and things like that. It also stores computed information, too, like a commit’s generation number, and changed-path Bloom filters (more on that in just a moment).

Why store all of this information? To understand the answer to this, it is helpful to have a cursory understanding of how Git stores objects. Git stores objects in one of two ways: either as a loose object (in which case the object’s contents are stored in a single file unique to that object on disk), or as a packed object (in which case the object is assembled from a compressed format in a *.pack file). No matter which way a commit is stored, we still have to parse and decompress it before its fields like “root tree” and “parents” can be accessed.

With a commit-graph file, all of that information is immediate: for a given commit C, Git knows exactly where to look in a commit-graph file for all of those fields that we store, and can read them off immediately, no decompression or piecing together required. This can shave some time off your usual Git operations by itself, but where the commit-graph really shines is in the computed data it stores.

Generation numbers are a sort of reachability index that can help Git answer questions about things like reachability and topological ordering very quickly. Since generation numbers aren’t new in this release (and trying to explain them quickly would lose a lot of the benefit of a more careful exposition), I’ll refer you instead to this blog post by freshly-minted Hubber Derrick Stolee on the matter.

What’s new in 2.28?

OK, if you’ve made it this far, you’ve got a pretty good handle on what commit graphs are, and what they’re useful for. Now, let’s get to the juicy details. In Git 2.27, the commit-graph file learned how to store changed-path Bloom filters. What are changed-path Bloom filters, you ask? A Bloom filter is a probabilistic set; that is it’s a set of items, but querying that set for the presence of some item x returns either “x is definitely not in this set” or “x might be in this set”, but never “x is definitely in this set”. The commit-graph stores one of these Bloom filters for commits that reside in the commit-graph, and it populates that Bloom filter with a list of paths changed between that commit and its first parent.

These Bloom filters are a huge boon for performance in lots of Git commands. The general pattern is something like: if you have a Git command that computes diffs (which can sometimes be proportionally expensive), then having Bloom filters allows Git to compute far fewer diffs by skipping the computation for certain commits when their Bloom filters return “definitely not” for paths of interest.

Take git log -- /path/to/file, for example. Prior to Git 2.27, git log would have to compute a diff over every revision in its walk before determining whether or not to show it (i.e., whether or not that diff has any entries for /path/to/file). In Git 2.27 and newer, Git can skip computing many of those diffs altogether by consulting each commit C‘s changed-path Bloom filter and querying it for /path/to/file. Again: if querying returns “definitely not”, then Git knows that computing that diff is strictly uninteresting.

Because computing diffs between commits can be expensive (at least, relative to the complexity of the algorithm for which they are being generated), reducing the number of diffs computed overall can greatly improve performance.

To try this for yourself, you can run the command:

$ git commit-graph write --reachable --changed-paths

This generates a commit-graph file with changed path Bloom filters enabled.[1] You should be able to see performance improvements in commands like git log -- <path>, git log -L, git blame, and anything else that computes first-parent diffs against a given pathspec.

[source, source, source]

Tidbits

Now that we’ve talked about a few of the headlining changes from the past couple of releases, let’s look at a few more new features 🔎

$ git log --oneline --graph --show-pulls -- <path>  

[source]

The rest of the iceberg

That’s just a sample of changes from the latest couple of releases. For more, check out the release notes for 2.27 and 2.28, or any previous version in the Git repository.

[1]: Note that since Bloom filters are not persisted automatically (that is, you have to pass --changed-paths explicitly on each subsequent write), it is a good idea to disable configuration that automatically generates commit-graphs, like fetch.writeCommitGraph and gc.writeCommitGraph.

Written by

Taylor Blau

Taylor Blau is a Staff Software Engineer at GitHub where he works on Git.

Explore more from GitHub

Docs

Docs

Everything you need to master GitHub, all in one place.

Go to Docs

GitHub

GitHub

Build what’s next on GitHub, the place for anyone from anywhere to build anything.

Start building

Customer stories

Customer stories

Meet the companies and engineering teams that build with GitHub.

Learn more

Enterprise content

Enterprise content

Executive insights, curated just for you

Get started