Git Objects: Blobs, Trees, Commits, Tags

Understanding Git's four object types — blobs, trees, commits, and annotated tags — how they relate through content-addressable storage, and how to inspect them with plumbing commands.

13 min read · Updated: March 31, 2026

Introduction

Git is fundamentally a content-addressable filesystem with a VCS user interface. At its core lies a simple but powerful abstraction: four object types that together represent every snapshot of your project’s history. Understanding these objects — blobs, trees, commits, and tags — is the key to demystifying Git’s internals.

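Unlike version control systems whose model is a chain of deltas (differences between versions), Git's object model stores complete snapshots. Each snapshot is stored as trees and blobs, recorded by a commit and optionally labelled by a tag; the objects reference one another by SHA-1 hash and form a directed acyclic graph (DAG). Delta compression still happens, but only as a storage optimization inside pack files. This design gives Git its speed, integrity guarantees, and distributed nature.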

This article examines each object type in depth, shows how they interconnect, and teaches you to inspect them using Git’s plumbing commands. By the end, you should be able to read raw objects and follow the links between them by hand.

When to Use / When Not to Use

When to understand Git objects:

  • Debugging corruption or missing data in repositories
  • Building Git tooling or integrations
  • Understanding how Git achieves integrity verification
  • Optimizing repository size and performance
  • Recovering lost data from the object database

When not to manipulate objects directly:

  • Daily development — use porcelain commands (git add, git commit)
  • When unsure — direct object manipulation can corrupt repositories
  • For simple history inspection — git log and git show are sufficient

Core Concepts

Git stores everything as objects in .git/objects/, each identified by a SHA-1 hash of a short type-and-size header followed by its content. There are exactly four types:


graph TD
    TAG["Annotated Tag\n(type: tag)"] -->|points to| COMMIT["Commit\n(type: commit)"]
    COMMIT -->|tree: points to| TREE["Tree\n(type: tree)"]
    COMMIT -->|parent: points to| COMMIT2["Parent Commit"]
    TREE -->|contains| TREE2["Subdirectory Tree"]
    TREE -->|contains| BLOB1["Blob\n(file content)"]
    TREE -->|contains| BLOB2["Blob\n(file content)"]
    TREE2 -->|contains| BLOB3["Blob\n(file content)"]

The relationship is hierarchical: annotated tags usually point to commits (though a tag may reference any object type), commits point to one root tree and to their parent commits, and trees point to other trees and blobs. Blobs are the leaves: they contain the actual file content.
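The hierarchy can be modelled as a small in-memory graph. The sketch below is illustrative only: the object IDs (`tag1`, `c2`, `b_orphan`, and so on) and the `OBJECTS` dictionary are hypothetical stand-ins, not Git data structures. It walks from a starting object to everything reachable from it, which is roughly the idea behind deciding which objects are still in use.

```python
# Toy model of the object graph; IDs and layout are hypothetical.
OBJECTS = {
    "tag1": ("tag", ["c2"]),
    "c2": ("commit", ["t2", "c1"]),       # root tree + parent commit
    "c1": ("commit", ["t1"]),             # root commit: tree only
    "t2": ("tree", ["t_sub", "b1", "b2"]),
    "t1": ("tree", ["b1"]),               # b1 is shared by both snapshots
    "t_sub": ("tree", ["b3"]),
    "b1": ("blob", []),
    "b2": ("blob", []),
    "b3": ("blob", []),
    "b_orphan": ("blob", []),             # nothing points here
}


def reachable(start):
    """Return the set of object IDs reachable from `start` (inclusive)."""
    seen = set()
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        _kind, children = OBJECTS[oid]
        stack.extend(children)
    return seen


live = reachable("tag1")
unreachable = set(OBJECTS) - live
print(sorted(live))
print(sorted(unreachable))   # ['b_orphan']
```

Because `b1` appears in both trees, it is visited once and stored once; only `b_orphan`, which no other object references, falls outside the reachable set.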

Each object is stored as:


<object type> <content length>\0<content>

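The header and content together are hashed to produce the object ID, then zlib-compressed and stored in .git/objects/ under a path derived from that ID: the first two hex digits name a subdirectory and the remaining 38 name the file. Objects that have been packed live in .git/objects/pack/ instead.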

Architecture or Flow Diagram


flowchart LR
    FILE["File Content"] -->|git hash-object| BLOB["Blob Object\nSHA: abc123..."]
    BLOB -->|referenced by| TREE["Tree Object\nSHA: def456..."]
    TREE -->|referenced by| COMMIT["Commit Object\nSHA: 789ghi..."]
    COMMIT -->|referenced by| TAG["Tag Object\nSHA: jkl012..."]

    META["Author, Date, Message"] --> COMMIT
    PARENT["Parent SHA"] --> COMMIT

The flow shows how file content becomes a blob, which is referenced by a tree, which is referenced by a commit, which may be referenced by a tag. Metadata flows into commits separately from content.

Step-by-Step Guide / Deep Dive

Blob Objects

Blobs store file content. They do not store filenames, permissions, or directory structure — just raw bytes.


# Create a blob from stdin and keep its ID
BLOB_SHA=$(echo "Hello, Git!" | git hash-object -w --stdin)
echo "$BLOB_SHA"
# Output: a 40-character hexadecimal object ID

# The object is stored under a directory named after the first two hex digits
ls ".git/objects/$(echo "$BLOB_SHA" | cut -c1-2)/"

# Inspect the blob
git cat-file -p "$BLOB_SHA"
# Output: Hello, Git!

# Check the type
git cat-file -t "$BLOB_SHA"
# Output: blob

Key properties:

  • Content-addressable: identical files produce identical blobs (deduplication)
  • Immutable: once created, a blob never changes
  • No metadata: filename and permissions are stored in the tree, not the blob
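To make the deduplication property concrete, here is a toy content-addressable store in Python. `BlobStore` is a hypothetical class written for this example; the paths in the comments are likewise invented. It only mimics the keying scheme, not Git's on-disk layout.

```python
import hashlib


class BlobStore:
    """Toy content-addressable store: key = hash of content, value = content."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        header = f"blob {len(content)}".encode("ascii") + b"\x00"
        oid = hashlib.sha1(header + content).hexdigest()
        # Storing identical content again is a no-op: same key, same value.
        self._objects.setdefault(oid, content)
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid]

    def __len__(self):
        return len(self._objects)


store = BlobStore()
a = store.put(b"same bytes\n")    # e.g. docs/LICENSE
b = store.put(b"same bytes\n")    # e.g. vendor/LICENSE: other path, same content
c = store.put(b"other bytes\n")

print(a == b)       # True
print(len(store))   # 2
```

The store never sees a filename, which is why two files with the same bytes collapse into one entry regardless of where they live in the project.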

Tree Objects

Trees represent directories. They map filenames to blob SHAs (or other tree SHAs for subdirectories).


# Create a tree from the current index
TREE_SHA=$(git write-tree)
echo "$TREE_SHA"
# Output: the tree's object ID
# (an empty index always yields 4b825dc642cb6eb9a060e54bf899d69f824970a0)

# Inspect a tree
git cat-file -p "$TREE_SHA"
# Output format (mode, type, object ID, then a tab and the name):
# 100644 blob abc123...    file1.txt
# 100755 blob 789ghi...    script.sh
# 040000 tree def456...    subdir

Each tree entry contains:

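  • Mode: entry kind and permission bits (100644 for regular files, 100755 for executables, 120000 for symbolic links, 160000 for submodule commits, 040000 for directories)
  • Type: blob or tree (commit for submodule entries); the raw tree stores only the mode, and git cat-file -p derives the type from it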
  • SHA-1: hash of the referenced object
  • Filename: the name within this directory

Commit Objects

Commits are the backbone of Git history. Each commit records a snapshot (via a tree), authorship, and lineage.


# Create a commit object manually
export GIT_AUTHOR_NAME="Test User"
export GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User"
export GIT_COMMITTER_EMAIL="test@example.com"
export GIT_AUTHOR_DATE="2026-03-31T12:00:00+00:00"
export GIT_COMMITTER_DATE="2026-03-31T12:00:00+00:00"

TREE_SHA=$(git write-tree)
COMMIT_SHA=$(echo "Initial commit" | git commit-tree $TREE_SHA)

echo $COMMIT_SHA
# Output: a1b2c3d4e5f6...

# Inspect the commit
git cat-file -p $COMMIT_SHA
# Output:
# tree 4b825dc642cb6eb9a060e54bf899d69f824970a0
# author Test User <test@example.com> 1774958400 +0000
# committer Test User <test@example.com> 1774958400 +0000
#
# Initial commit

A commit object contains:

  • tree: SHA-1 of the root tree (the snapshot)
  • parent: SHA-1 of the parent commit(s) — zero for the initial commit, one for normal commits, two+ for merges
  • author: who wrote the code (name, email, timestamp, timezone)
  • committer: who committed the code (can differ from author, e.g., after rebase)
  • message: the commit message
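These fields are stored as plain text. The Python sketch below assembles a root commit payload and hashes it, assuming a SHA-1 repository; `commit_payload` and `commit_id` are hypothetical helper names, and the identity string is the example user from above.

```python
import hashlib
from datetime import datetime, timezone


def commit_payload(tree, parents, author, committer, message):
    """Build the text body of a commit object.

    `author` and `committer` are pre-formatted identity strings:
    'Name <email> <unix-seconds> <utc-offset>'.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]    # none for a root commit
    lines += [f"author {author}", f"committer {committer}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


def commit_id(payload: bytes) -> str:
    """Hash the payload with its 'commit <length>' header."""
    header = f"commit {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


when = datetime(2026, 3, 31, 12, 0, tzinfo=timezone.utc)
ident = f"Test User <test@example.com> {int(when.timestamp())} +0000"
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf899d69f824970a0"

payload = commit_payload(EMPTY_TREE, [], ident, ident, "Initial commit")
print(payload.decode())
print(commit_id(payload))
```

Every field feeds the hash, so changing the message, a timestamp, or a parent yields a different commit ID. That is why rewriting one commit changes the IDs of all its descendants.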

Tag Objects

There are two types of tags in Git:

Lightweight tags are simply refs pointing to a commit — no tag object is created:


git tag v1.0-lightweight
# Creates: .git/refs/tags/v1.0-lightweight → commit SHA

Annotated tags are full objects with metadata:


git tag -a v1.0 -m "Release version 1.0"
# Creates a tag object

# Inspect the tag object
git cat-file -p $(git rev-parse v1.0)
# Output:
# object a1b2c3d4e5f6...
# type commit
# tag v1.0
# tagger Test User <test@example.com> 1774958400 +0000
#
# Release version 1.0

Annotated tags contain:

  • object: the SHA-1 of what the tag points to (usually a commit)
  • type: the type of the object (commit, tree, blob, or tag)
  • tag: the tag name
  • tagger: who created the tag
  • message: the tag message
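Tag and commit payloads share the same shape: header lines, a blank line, then the message. The Python sketch below splits such a payload into its parts. `parse_object_text` is a name invented here, the object ID in the sample is made up, and the parser deliberately ignores multi-line headers such as embedded signatures.

```python
def parse_object_text(raw: str):
    """Split a commit- or tag-style payload into (headers, message).

    Headers are 'key value' lines; the first blank line starts the message.
    Repeated keys (e.g. several 'parent' lines) are collected into lists.
    """
    head, _, message = raw.partition("\n\n")
    headers = {}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)
    return headers, message


sample = (
    "object a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2\n"   # made-up ID
    "type commit\n"
    "tag v1.0\n"
    "tagger Test User <test@example.com> 1774958400 +0000\n"
    "\n"
    "Release version 1.0\n"
)

headers, message = parse_object_text(sample)
print(headers["type"][0], headers["tag"][0])
print(message.strip())
```

The `type` header is what lets a reader know how to interpret the `object` ID without loading the target first.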

Production Failure Scenarios + Mitigations

| Scenario | Symptoms | Mitigation |
| --- | --- | --- |
| Missing blob | Checkout fails; git fsck reports "missing blob <sha>" | Fetch from remote, or restore from backup; blobs are immutable so any copy works |
| Corrupted tree | Directory listing fails | Restore the object from a remote or backup; if the matching working tree still exists, move the corrupt object file aside and recreate the tree with git add and git write-tree; verify with git fsck |
| Broken commit chain | git log stops abruptly | Use git replace to graft history, or rebase onto valid ancestor |
| Tag pointing to wrong type | Unexpected behavior on tag checkout | Verify with git cat-file -t; recreate annotated tag if needed |
| Object store corruption | Multiple "bad object" errors | Run git fsck --full; clone fresh from remote; restore from backup |

Trade-offs

| Aspect | Advantage | Disadvantage |
| --- | --- | --- |
| Content-addressable storage | Automatic deduplication, integrity verification | SHA-1 collision risk (being mitigated with SHA-256) |
| Snapshot-based (not delta) | Fast checkouts, simple model | Higher storage for text files (mitigated by pack files) |
| Immutable objects | Safe concurrent access, easy replication | No in-place updates; every change creates new objects |
| No filenames in blobs | Blobs can be shared across trees | Must traverse tree to find which file a blob belongs to |

Implementation Snippets


# Create a blob and store it
echo "content" | git hash-object -w --stdin

# Read a blob's content
git cat-file -p <sha>

# Get object type
git cat-file -t <sha>

# Get object size
git cat-file -s <sha>

# Create a tree from index
git write-tree

# Read a tree into index
git read-tree <tree-sha>

# Create a commit object
git commit-tree <tree-sha> -p <parent-sha> -m "message"

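# Create an annotated tag object (mktag only writes the object; it does not create a ref)
git mktag << EOF
object <commit-sha>
type commit
tag v1.0
tagger Name <email> <unix-timestamp> <timezone-offset>

Release message
EOF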

# List all objects in the repository
git rev-list --objects --all

# Find all unreachable objects
git fsck --unreachable

Observability Checklist

  • Monitor: Object count growth with git count-objects -v
  • Verify: Run git fsck periodically to detect corruption
  • Track: Ratio of loose to packed objects (should favor packed)
  • Alert: Unexpected object count spikes (may indicate accidental large file commits)
  • Audit: Tag signatures for release integrity verification

Security/Compliance Notes

  • Object hashes provide integrity verification — tampering changes the hash
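  • Git is transitioning from SHA-1 to SHA-256 for stronger collision resistance (git init --object-format=sha256); SHA-1 repositories remain common, and interoperability between the two formats is still limited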
  • Signed tags (GPG/SSH) provide non-repudiation for releases
  • Objects are not encrypted — sensitive data in blobs is readable by anyone with repo access
  • See Git Secrets Management for preventing secret commits

Common Pitfalls / Anti-Patterns

  • Assuming on-disk object size equals file size: a stored object adds a header and is zlib-compressed (or delta-compressed in a pack); git cat-file -s reports the uncompressed content size
  • Confusing lightweight and annotated tags — lightweight tags are just refs, not objects
  • Modifying objects directly — objects are immutable; use Git commands to create new ones
  • Ignoring unreachable objects — they consume space until git gc prunes them
  • Storing large binary files as blobs — use Git LFS instead

Quick Recap Checklist

  • Blobs store file content only — no filenames or metadata
  • Trees map filenames to blob/tree SHAs — represent directories
  • Commits point to a tree, parent(s), and record authorship
  • Annotated tags are full objects; lightweight tags are just refs
  • All objects are content-addressable by SHA-1 hash
  • Objects are immutable and zlib-compressed
  • The object graph forms a directed acyclic graph (DAG)
  • Use git cat-file to inspect any object by type, size, or content

Interview Q&A

Why don't Git blobs store filenames?

Blobs store only file content to enable deduplication. If two files in different directories have identical content, they share the same blob. Filenames are stored in tree objects, which map names to blob SHAs. This separation means renaming a file doesn't create a new blob — only a new tree.

How does Git detect if an object has been corrupted?

Every object's SHA-1 hash is computed from its type, size, and content. To check an object, Git recomputes the hash over the stored bytes and compares it to the object's name (for a loose object, its path under .git/objects/). If they don't match, the object is corrupted. git fsck performs this check across the whole repository. This is why Git is called a "content-addressable filesystem": the address is the content's fingerprint.
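A minimal Python sketch of that check follows, assuming loose objects in a SHA-1 repository. `make_loose` and `verify_loose` are hypothetical helpers for illustration; Git's own checks are more thorough.

```python
import hashlib
import zlib


def make_loose(obj_type: str, content: bytes):
    """Return (object_id, compressed_bytes) as a loose object would be stored."""
    raw = f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def verify_loose(expected_oid: str, compressed: bytes) -> bool:
    """Recompute the hash of the stored bytes and compare it with the name."""
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        return False                      # damaged zlib stream
    header, sep, content = raw.partition(b"\x00")
    if not sep:
        return False                      # no header terminator
    try:
        _type, size = header.split(b" ")
        if int(size) != len(content):
            return False                  # declared size does not match
    except ValueError:
        return False                      # malformed header
    return hashlib.sha1(raw).hexdigest() == expected_oid


oid, stored = make_loose("blob", b"config = 1\n")
print(verify_loose(oid, stored))                       # True

tampered = zlib.compress(b"blob 11\x00config = 2\n")   # same size, one byte changed
print(verify_loose(oid, tampered))                     # False
```

A single changed byte is enough to make the recomputed hash disagree with the expected ID, even when the size and header are still valid.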

What's the difference between `git hash-object` and `git hash-object -w`?

Without -w, git hash-object only computes and prints the SHA-1 hash without storing the object. With -w (write), it also stores the object in .git/objects/. The hash-only form lets you compute an ID first, for example to test with git cat-file -e whether that object already exists.

Can a commit have zero parents? When?

Yes — the initial commit of any repository has zero parents. Additionally, commits created with git commit-tree without the -p flag, or orphan branches created with git checkout --orphan, produce commits with no parent. This creates a new root in the commit DAG.

How do annotated tags differ from lightweight tags internally?

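An annotated tag creates a full tag object in .git/objects/ with metadata (tagger, date, message, and optionally a GPG or SSH signature). A lightweight tag is just a ref under refs/tags/ (a file in .git/refs/tags/ or a line in packed-refs) containing a commit SHA; no object is created. Annotated tags are preferred for releases because the tag object is immutable and can be signed and verified, although the ref that names it can still be moved with git tag -f.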
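The difference shows up when a tag name is resolved. In the Python sketch below, `OBJECTS`, `REFS`, and `peel` are hypothetical and the IDs are short placeholders. A lightweight tag resolves straight to a commit, while an annotated tag resolves to a tag object that must be followed one more step.

```python
# Hypothetical in-memory repository: IDs are short placeholders.
OBJECTS = {
    "c0ffee": {"type": "commit"},
    "7a9001": {"type": "tag", "object": "c0ffee", "tag": "v1.0"},
}
REFS = {
    "refs/tags/v1.0-lightweight": "c0ffee",   # ref -> commit directly
    "refs/tags/v1.0": "7a9001",               # ref -> tag object -> commit
}


def peel(ref: str) -> str:
    """Follow a ref through any tag objects to the underlying object ID."""
    oid = REFS[ref]
    while OBJECTS[oid]["type"] == "tag":
        oid = OBJECTS[oid]["object"]
    return oid


for name in REFS:
    print(name, "->", REFS[name], "->", peel(name))
```

Both names end at the same commit, but only the annotated one passes through an object that can carry a tagger, a message, and a signature.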

Object Relationship Diagram (Clean)


graph TD
    REPO["Repository"] -->|contains| TAGS["Annotated Tags"]
    TAGS -->|points to| COMMITS["Commits"]
    COMMITS -->|tree ref| TREES["Trees"]
    COMMITS -->|parent ref| PARENTS["Parent Commits"]
    TREES -->|entries| SUBTREES["Subdirectory Trees"]
    TREES -->|entries| BLOBS["Blobs (file content)"]
    SUBTREES -->|entries| MORE_BLOBS["More Blobs"]

    BLOBS -->|content only| FILES["Raw File Bytes"]
    MORE_BLOBS -->|content only| FILES

Production Failure: Corrupted Object Database

Scenario: Missing blob causing checkout failure


# Symptoms (illustrative; exact wording varies by Git version)
$ git checkout main
error: unable to read sha1 file of src/config.py (abc123def456...)
fatal: unable to checkout working tree

$ git fsck --full
missing blob abc123def456...
error: 789ghi...: object corrupt or missing: .git/objects/78/9ghi...

# Root cause: Disk corruption, interrupted gc, or filesystem error
# destroyed blob objects in .git/objects/

# Recovery steps:

# 1. Identify missing objects
git fsck --full 2>&1 | grep "missing"

# 2. Try to fetch missing objects from remote (--refetch requires Git 2.36 or newer)
git fetch origin --refetch

# 3. If remote doesn't have them (local-only commits):
#    Check reflog for last known good state
git reflog
git checkout HEAD@{1}  # Try previous HEAD

# 4. As last resort, clone fresh and bring over local-only commits
cd ..
git clone https://github.com/user/repo.git repo-clean
cd repo-clean
git fetch ../repo <branch>   # succeeds only if the needed objects are intact
git cherry-pick <sha>

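# 5. Prevent future corruption:
#    - Use reliable storage and a filesystem you trust for .git/
#    - Run git fsck periodically
#    - Keep remote backups
# --mirror force-updates and deletes refs on the target, so push to a
# dedicated backup remote ("backup" is a placeholder name), not a shared origin
git push --mirror backup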

Trade-offs: Annotated vs Lightweight Tags

| Aspect | Annotated Tags | Lightweight Tags |
| --- | --- | --- |
| Object type | Full tag object in .git/objects/ | No object; a ref in .git/refs/tags/ (or packed-refs) |
| Metadata | Tagger name, email, date, message | None |
| GPG signing | Supported (git tag -s) | Not supported |
| Storage | ~200 bytes per tag | ~41 bytes (SHA only) |
| Immutability | Tag object is immutable; the ref can still be re-pointed with git tag -f | Ref can be re-pointed with git tag -f |
| Use case | Releases, public milestones | Private bookmarks, temporary markers |
| git describe | Considered by default | Ignored unless --tags is passed |
| Platform display | Shows message on GitHub/GitLab | Shows as simple pointer |

Recommendation: Use annotated tags for anything public or release-related. Lightweight tags are fine for personal, temporary markers.

Implementation: Creating and Inspecting Each Object Type Manually


# === 1. BLOB ===
# Create blob from string
BLOB_SHA=$(echo "file content" | git hash-object -w --stdin)
echo "Blob SHA: $BLOB_SHA"

# Create blob from file
git hash-object -w myfile.txt

# Inspect
git cat-file -t $BLOB_SHA   # blob
git cat-file -s $BLOB_SHA   # size in bytes
git cat-file -p $BLOB_SHA   # content

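# === 2. TREE ===
# Create a tree directly from an entry list (mktree expects a tab before the name)
TREE_SHA=$(printf '100644 blob %s\ttest.txt\n' "$BLOB_SHA" | git mktree)
echo "Tree SHA: $TREE_SHA"

# Alternatively, build it from the current index:
# TREE_SHA=$(git write-tree)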

# Inspect
git cat-file -t $TREE_SHA   # tree
git cat-file -p $TREE_SHA   # entries (mode, type, sha, name)

# === 3. COMMIT ===
# Create commit (these env vars override user.name/user.email; they are required only if no identity is configured)
export GIT_AUTHOR_NAME="Test"
export GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test"
export GIT_COMMITTER_EMAIL="test@example.com"

COMMIT_SHA=$(echo "Initial commit" | git commit-tree $TREE_SHA)

# Inspect
git cat-file -t $COMMIT_SHA   # commit
git cat-file -p $COMMIT_SHA   # tree, author, committer, message

# === 4. ANNOTATED TAG ===
# Create tag object
git tag -a v1.0 -m "Release 1.0" $COMMIT_SHA

# Get tag object SHA (not the commit it points to); quoted so the shell leaves the braces alone
TAG_SHA=$(git rev-parse "v1.0^{tag}")

# Inspect
git cat-file -t $TAG_SHA   # tag
git cat-file -p $TAG_SHA   # object, type, tag, tagger, message

# === Verify the chain ===
echo "Tag -> Commit -> Tree -> Blob"
git cat-file -p $TAG_SHA | grep "^object"
git cat-file -p $COMMIT_SHA | grep "^tree"
git cat-file -p $TREE_SHA | grep "blob"
