Highlights from Git 2.52 (original) (raw)

The open source Git project just released Git 2.52 with features and bug fixes from over 94 contributors, 33 of them new. We last caught up with you on the latest in Git back when 2.51 was released.

To celebrate this most recent release, here is GitHub’s look at some of the most interesting features and changes introduced since last time.

Tree-level blame information

If you’re a seasoned Git user, then you are no doubt familiar with git blame, Git’s tool for figuring out which commit most recently modified each line at a given filepath. Git’s blame functionality is great for figuring out when a bug was introduced, or why some code was written the way it was.

If you want to know which commit last modified any portion of a given filepath, that’s easy enough to do with git log -1 -- path/to/my/file, since -1 will give us only the first commit which modifies that path. But what if instead you want to know which commit most recently modified every file in some directory? Answering that question may seem contrived, but it’s not. If you’ve ever looked at a repository’s file listing on GitHub, the middle column of information has a link to the commit which most recently modified that path, along with (part of) its commit message.

Screenshot of GitHub's repository file.

GitHub’s repository file listing, showing tree-level blame information.

The question remains: how do we efficiently determine which commit most recently modified each file in a given directory? You could imagine that you might enumerate each tree entry, feeding it to git log -1 and collecting the output there, like so:

$ git ls-tree -z --name-only HEAD^{tree} | xargs -0 -I{} sh -c '
    git log -1 --format="$1 %h %s" -- $1
  ' -- {}  | column -t -l3 
.cirrus.yml     1e77de10810  ci: update FreeBSD image to 14.3
.clang-format   37215410730  clang-format: exclude control macros from SpaceBeforeParens
.editorconfig   c84209a0529  editorconfig: add .bash extension
.gitattributes  d3b58320923  merge-file doc: set conflict-marker-size attribute
.github         5db9d35a28f  Merge branch 'js/ci-github-actions-update'
[...]

That works, but not efficiently. To see why, consider a case with files A, B, and C introduced by commits C1, C2, and C3, respectively. To blame A, we walk from C3 back to C1 in order to determine that C1 was the most recent commit to modify A. That traversal passed through C2 and C3, but since we were only looking for modifications to A, we’ll end up revisiting those commits when trying to blame B and C. In this example, we visit those three commits six times in total, which is twice the necessary number of history traversals.

Git 2.52 introduces a new command which comes up with the same information in a fraction of the time: git last-modified. To get a sense for how much faster last-modified is than the example above, here are some hyperfine results:

Benchmark 1: git ls-tree + log
  Time (mean ± σ):      3.962 s ±  0.011 s    [User: 2.676 s, System: 1.330 s]
  Range (min … max):    3.940 s …  3.984 s    10 runs

Benchmark 2: git last-modified
  Time (mean ± σ):     722.7 ms ±   4.6 ms    [User: 682.4 ms, System: 40.1 ms]
  Range (min … max):   717.3 ms … 731.3 ms    10 runs

Summary
  git last-modified ran
    5.48 ± 0.04 times faster than git ls-tree + log

The core functionality behind git last-modified was written by GitHub over many years (originally called blame-tree in GitHub’s fork of Git), and is what has powered our tree-level blame since 2012. Earlier this year, we shared those patches with engineers at GitLab, who tidied up years of development into a reviewable series of patches which landed in this release.

There are still some features in GitHub’s version of this command that have yet to make their way into a Git release, including an on-disk format to cache the results of previous runs. In the meantime, check out git last-modified, available in Git 2.52.

[source, source, source]

Advanced repository maintenance strategies

Returning readers of this series may recall our coverage of the git maintenance command. If this is your first time reading along, or you could use a refresher, we’ve got you covered.

git maintenance is a Git command which can perform repository housekeeping tasks either on a scheduled or ad-hoc basis. The maintenance command can perform a variety of tasks, like repacking the contents of your repository, updating commit-graphs, expiring stale reflog entries, and much more. Put together, maintenance ensures that your repository continues to operate smoothly and efficiently.

By default (or when running the gc task), git maintenance relies on git gc internally to repack your repository, and remove any unreachable objects. This has a couple of drawbacks, namely that git gc performs “all-into-one” repacks to consolidate the contents of your repository, which can be sluggish for very large repositories. As an alternative, git maintenance has an incremental-repack strategy, but this never prunes out any unreachable objects.

Git 2.52 bridges this gap by introducing a new geometric task within git maintenance that avoids all-into-one repacks when possible, and prunes unreachable objects on a less frequent basis. This new task uses tools (like geometric repacking) that were designed at GitHub and have powered GitHub’s own repository maintenance for many years. Those tools have been in Git since 2.33, but were awkward to use or discover since their implementation was buried within git repack, not git gc.

The geometric task here works by inspecting the contents of your repository to determine if we can combine some number of packfiles to form a geometric progression by object count. If it can, it performs a geometric repack, condensing the contents of your repository without pruning any objects. Alternatively, if a geometric repack would pack the entirety of your repository into a single pack, then a full git gc is performed instead, which consolidates the contents of your repository and prunes out unreachable objects.

Git 2.52 makes it a breeze to keep even your largest repositories running smoothly. Check out the new geometric strategy, or any of the many other capabilities of git maintenance can do in 2.52.

[source]


The tip of the iceberg…

Now that we’ve covered some of the larger changes in more detail, let’s take a closer look at a selection of some other new features and updates in this release.

$ keys='layout.bare layout.shallow object.format references.format'  
$ git repo info $keys  
layout.bare=false  
layout.shallow=false  
object.format=sha1  
references.format=files  

The new git repo command can also print out some general statistics about your repository’s structure and contents via its git repo structure sub-command:

$ git repo structure  
Counting objects: 497533, done.  
| Repository structure | Value  |  
| -------------------- | ------ |  
| * References         |        |  
|   * Count            |   2871 |  
|     * Branches       |     58 |  
|     * Tags           |   1273 |  
|     * Remotes        |   1534 |  
|     * Others         |      6 |  
|                      |        |  
| * Reachable objects  |        |  
|   * Count            | 497533 |  
|     * Commits        |  91386 |  
|     * Trees          | 208050 |  
|     * Blobs          | 197103 |  
|     * Tags           |    994 |  

[source, source, source]

…the rest of the iceberg

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.52, or any previous version in the Git repository.

Written by

Taylor Blau

Taylor Blau is a Principal Software Engineer at GitHub where he works on Git.