How Git Works Under the Hood?

Last Updated : 16 Jan, 2026

Git is a distributed version control system that manages code changes and collaboration. Understanding how Git commands work internally helps developers use Git more confidently and effectively.

Git Concepts

Git Concepts provide the foundational understanding of how Git tracks changes, manages versions, and enables collaboration in software development.

Snapshots vs. Deltas

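Git takes a snapshot-based approach: each commit records the state of the entire project rather than a list of file differences. Files that have not changed are not stored again; the new snapshot simply points to the blob that already exists.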
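The reuse of unchanged content can be observed directly. The following is a minimal sketch, assuming a throwaway repository created with `mktemp`; the file names and the "Demo" identity are illustrative only.

```shell
# Scratch repository (throwaway path and identity, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "stable" > a.txt
echo "v1" > b.txt
git add . && git commit -q -m "first"
a_first=$(git rev-parse HEAD:a.txt)    # blob ID of a.txt in the first commit
b_first=$(git rev-parse HEAD:b.txt)

echo "v2" > b.txt                      # change only b.txt
git add . && git commit -q -m "second"
a_second=$(git rev-parse HEAD:a.txt)
b_second=$(git rev-parse HEAD:b.txt)

echo "a.txt: $a_first -> $a_second"    # same ID: the existing blob is reused
echo "b.txt: $b_first -> $b_second"    # different ID: a new blob was written
```

Both commits are full snapshots, yet only the changed file produces a new blob.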

Git's Three States

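Git operates with three main states that files can reside in: modified (changed in the working directory but not yet staged), staged (marked in the index to go into the next commit), and committed (stored in the repository).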

Git's Architecture

Git’s architecture is designed around a distributed model that tracks changes across three areas: the working directory, the staging area, and the repository.
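One way to watch a file move between the working directory, the staging area, and the repository is `git status --porcelain`, which prints a two-column status code per path. This is a sketch under the assumption of a scratch repository; `notes.txt` and the "Demo" identity are made up for the example.

```shell
# Scratch repository (throwaway path and identity, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "one" > notes.txt
git add notes.txt && git commit -q -m "add notes"
committed=$(git status --porcelain)    # empty: working directory matches the commit

echo "two" >> notes.txt
modified=$(git status --porcelain)     # " M notes.txt": changed but not staged

git add notes.txt
staged=$(git status --porcelain)       # "M  notes.txt": change is in the index

printf '[%s]\n[%s]\n[%s]\n' "$committed" "$modified" "$staged"
```

The first column reports the index and the second the working directory, so the position of the `M` shows which area holds the change.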

Git’s Internal Data Structures

Git’s internal data structures are designed to store project data efficiently and accurately track every change made to files over time.

Blob

A blob (binary large object) stores the content of a single file in Git. It holds no file name or permissions; those are recorded in the tree that references the blob.

**Example:**

When you add a file to Git, it creates a blob object that stores the file's content:

echo "Hello, World!" > example.txt
git add example.txt

The git add command creates a blob object, identified by its SHA-1 hash, and records it in the index.
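To confirm what was stored, the blob can be read back with `git cat-file`. A minimal sketch, assuming a scratch repository created with `mktemp`:

```shell
# Scratch repository (throwaway path, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "Hello, World!" > example.txt
git add example.txt

blob_id=$(git hash-object example.txt)       # ID of the blob that git add stored
blob_type=$(git cat-file -t "$blob_id")      # object type
blob_content=$(git cat-file -p "$blob_id")   # object content

echo "$blob_id is a $blob_type containing: $blob_content"
```

The object holds only the file's bytes; the name `example.txt` appears nowhere in it.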

Tree

Trees are Git’s way of representing directories. A tree object points to blobs (files) and other trees (subdirectories), thus forming the hierarchical structure of the project.

**Example:** A tree object captures a snapshot of the directory structure at a given commit, with pointers to blobs and other trees.
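A tree can be built from the staged files and listed with plumbing commands. The sketch below assumes a scratch repository; the `src/main.c` file is a hypothetical addition used only to produce a subdirectory.

```shell
# Scratch repository (throwaway path, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q

mkdir src
echo "Hello, World!" > example.txt
echo "int main(void) { return 0; }" > src/main.c
git add .

tree_id=$(git write-tree)                # build tree objects from the index
tree_type=$(git cat-file -t "$tree_id")
entries=$(git ls-tree "$tree_id")        # one line per entry: mode, type, ID, name

echo "$tree_type $tree_id"
echo "$entries"
```

The listing shows `example.txt` as a blob entry and `src` as a nested tree entry, which is how the directory hierarchy is represented.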

Commit

A commit object, the most critical data structure in Git, represents a project snapshot at a specific point in time.

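**Structure of a Commit Object:** A commit object records the ID of its top-level tree, the IDs of its parent commit(s), the author and committer (each with a name, email, and timestamp), and the commit message.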
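The raw contents of a commit object can be printed with `git cat-file -p`. This is a sketch in a scratch repository; the "Demo" identity and commit messages are illustrative assumptions.

```shell
# Scratch repository (throwaway path and identity, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "Hello, World!" > example.txt
git add example.txt && git commit -q -m "Add example"
first_body=$(git cat-file -p HEAD)     # root commit: tree, author, committer, message

echo "second line" >> example.txt
git commit -q -am "Update example"
second_body=$(git cat-file -p HEAD)    # also carries a parent line

echo "$first_body"
echo "----"
echo "$second_body"
```

The first commit has no parent line because nothing precedes it; every later commit names at least one parent.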

Git Object Model

Git’s object model defines how data is stored and managed using blobs, trees, commits, and tags, each uniquely identified by a SHA-1 hash.

Hashes and SHA-1

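Git uses SHA-1 hashing to identify every repository object by its content: identical content always yields the same ID, and any change to the content, including corruption, yields a different one.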

**Example:**

git hash-object example.txt

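This command outputs the SHA-1 hash that Git would assign to a blob with this content, computed over a short header plus the file content, which is how Git identifies the file internally. The object itself is written to the repository only if the -w option is given.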
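The same ID can be reproduced by hand, because a blob's hash covers the header `blob <size>` followed by a NUL byte and then the content. The sketch below assumes the `sha1sum` utility is available (common on Linux; other systems may provide `shasum` instead) and a repository using the default SHA-1 object format.

```shell
# Scratch repository (throwaway path, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "Hello, World!" > example.txt
size=$(wc -c < example.txt | tr -d ' ')   # content length in bytes

# Hash "blob <size>\0<content>" manually.
manual_id=$( { printf 'blob %s\0' "$size"; cat example.txt; } | sha1sum | cut -d' ' -f1 )
git_id=$(git hash-object example.txt)

echo "manual: $manual_id"
echo "git:    $git_id"
```

Both lines print the same 40-character ID, which shows that the hash depends only on the object type, size, and content.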

Object Storage

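Git stores data as zlib-compressed objects inside the .git/objects directory. A loose object lives in a subdirectory named after the first two hexadecimal characters of its SHA-1 hash, in a file named after the remaining 38 characters, which keeps lookups efficient.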
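The on-disk location of a loose object can be derived from its hash. A minimal sketch, assuming a scratch repository and an object that has not yet been packed:

```shell
# Scratch repository (throwaway path, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "Hello, World!" > example.txt
blob_id=$(git hash-object -w example.txt)     # -w writes the object to the store

dir=$(printf '%s' "$blob_id" | cut -c1-2)     # first two hex characters
file=$(printf '%s' "$blob_id" | cut -c3-)     # remaining characters
object_path=".git/objects/$dir/$file"

ls -l "$object_path"
```

The file is compressed, so it is read with `git cat-file -p "$blob_id"` rather than opened directly.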

Commit Management in Git

Commit Management in Git refers to how Git records, organizes, and tracks snapshots of project changes to maintain a clear and reliable version history.

When you make a commit in Git, it performs several steps under the hood:

  1. **Creating Blobs:** Git creates a blob for each file that has been added or modified. This happens when the file is staged with git add, before the commit itself.
  2. **Building Trees:** It then creates tree objects representing the directory structure recorded in the index.
  3. **Generating a Commit:** Finally, a commit object is created that links to the top-level tree object and the previous commit.

Commits form a chain, where each commit points to its parent(s), creating a linear or branched history of changes.
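The three steps above can be reproduced with Git's low-level plumbing commands instead of `git add` and `git commit`. This is a sketch of the idea, not a description of exactly what `git commit` runs internally; the scratch repository, file name, and "Demo" identity are assumptions.

```shell
# Scratch repository (throwaway path and identity, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

# 1. Create a blob and register it in the index under a path.
blob_id=$(echo "Hello, World!" | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob_id",example.txt

# 2. Build a tree object from the index.
tree_id=$(git write-tree)

# 3. Create a commit object that points at the tree (message read from stdin).
commit_id=$(echo "Add example" | git commit-tree "$tree_id")

echo "blob $blob_id"
echo "tree $tree_id"
echo "commit $commit_id"
```

A follow-up commit would pass `-p "$commit_id"` to `git commit-tree` to record the parent, which is what links commits into a chain.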

Branches in Git are pointers to commits. A branch essentially points to a commit and moves forward as new commits are added.

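**How Branches Work:** Creating a branch writes a new ref under .git/refs/heads that contains a commit hash. Switching branches repoints HEAD at that ref and updates the working directory to match.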
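A branch moving forward can be seen by reading its ref before and after a commit. The sketch assumes a scratch repository that uses the default file-based ref storage; the branch name is looked up rather than assumed, since the initial branch may be called `main` or `master`.

```shell
# Scratch repository (throwaway path and identity, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "one" > f.txt
git add f.txt && git commit -q -m "first"
branch=$(git symbolic-ref --short HEAD)      # name of the current branch
before=$(git rev-parse "refs/heads/$branch")
cat ".git/refs/heads/$branch"                # the ref file holds a commit ID

echo "two" >> f.txt
git commit -q -am "second"
after=$(git rev-parse "refs/heads/$branch")

echo "$branch: $before -> $after"            # the pointer advanced
```

Nothing was copied when the branch advanced; only the commit ID stored in the ref changed.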

**Tags:** Tags are another type of reference in Git, used to mark specific commits, often for releases or important milestones. Unlike branches, tags do not move as new commits are added; they keep pointing to the same commit.
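The difference between a tag and a branch shows up as soon as another commit is made. A minimal sketch in a scratch repository, using a lightweight tag with the illustrative name `v1.0`:

```shell
# Scratch repository (throwaway path and identity, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "one" > f.txt
git add f.txt && git commit -q -m "first"
git tag v1.0                          # lightweight tag on the current commit
tag_before=$(git rev-parse v1.0)

echo "two" >> f.txt
git commit -q -am "second"            # the branch advances to the new commit
tag_after=$(git rev-parse v1.0)       # the tag still names the first commit
head_now=$(git rev-parse HEAD)

echo "tag:  $tag_before -> $tag_after"
echo "HEAD: $head_now"
```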

The Role of the Index (Staging Area)

The index, or staging area, is an intermediate state that allows you to build up a commit piece by piece. It stores information about what will go into your next commit, letting you control the granularity of your changes.

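**How the Index Works:** Running git add writes the file's content as a blob and records its path, mode, and blob ID in the .git/index file. Running git commit then builds the tree objects for the new commit from those index entries.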
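The contents of the index can be listed with `git ls-files --stage`. The sketch below assumes a scratch repository with two made-up files, only one of which is staged.

```shell
# Scratch repository (throwaway path, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "Hello, World!" > example.txt
echo "draft" > notes.txt
git add example.txt                    # stage only one of the two files

staged=$(git ls-files --stage)         # index entries: mode, blob ID, stage number, path
untracked=$(git ls-files --others)     # files Git is not tracking

echo "in the index: $staged"
echo "untracked:    $untracked"
```

Only `example.txt` would be part of the next commit; `notes.txt` stays out until it is added.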

Understanding Git Refs

Refs, or references, are pointers that help Git keep track of commits, branches, and tags. They are stored as files under the .git/refs directory, or collected in the .git/packed-refs file.

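**Types of Refs:** Branches live under refs/heads, tags under refs/tags, and remote-tracking branches under refs/remotes. HEAD is a special ref that usually points to the current branch.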
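All refs in a repository, together with the commits they point to, can be listed with `git for-each-ref`. A sketch in a scratch repository; the `feature` branch and `v1.0` tag are hypothetical names.

```shell
# Scratch repository (throwaway path and identity, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "one" > f.txt
git add f.txt && git commit -q -m "first"
git branch feature
git tag v1.0

refs=$(git for-each-ref --format='%(refname) %(objectname)')
head_target=$(git symbolic-ref HEAD)   # the branch ref that HEAD points to

echo "$refs"
echo "HEAD -> $head_target"
```

Here every ref resolves to the same commit, since only one exists; HEAD differs in that it names another ref instead of a commit.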

Git’s Garbage Collection and Cleanup

Git repositories can accumulate unreferenced objects, such as commits from deleted branches. Git uses garbage collection to clean up these objects and optimize repository size.

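**Garbage Collection in Git:** The git gc command compresses loose objects into packfiles and removes unreachable objects once they are older than a grace period (two weeks by default). Git also triggers it automatically after some commands.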
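The packing half of garbage collection can be observed with `git count-objects -v`, which reports loose and packed object counts. This sketch assumes a scratch repository; it does not demonstrate pruning, because unreachable objects are kept until the grace period has passed.

```shell
# Scratch repository (throwaway path and identity, for illustration only).
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "Hello, World!" > example.txt
git add example.txt && git commit -q -m "Add example"

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q                              # pack loose objects
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose: $loose_before -> $loose_after, packed: $packed_after"
```

After the run, the blob, tree, and commit that were loose files under .git/objects are stored together in a packfile.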