Git Objects: Blobs, Trees, Commits, Tags

Understanding Git's four object types — blobs, trees, commits, and annotated tags — how they relate through content-addressable storage, and how to inspect them with plumbing commands.

Author: Geek Workbench | Updated: March 31, 2026 | Reading time: 21 min

Introduction

Git is fundamentally a content-addressable filesystem with a VCS user interface. At its core lies a simple but powerful abstraction: four object types that together represent every snapshot of your project’s history. Understanding these objects — blobs, trees, commits, and tags — is the key to demystifying Git’s internals.

Unlike version control systems whose model is a series of per-file deltas, Git's model is snapshot-based: every commit references a complete tree of the project. Each snapshot is built from blobs and trees, wrapped by a commit and optionally a tag, and the objects are linked by SHA-1 hashes into a directed acyclic graph (DAG). This design underpins Git's speed, integrity guarantees, and distributed nature.

This article examines each object type in depth, shows how they interconnect, and teaches you to inspect them using Git’s plumbing commands. By the end, you’ll be able to manually reconstruct a Git repository from raw objects if needed.

When to Use / When Not to Use

When to understand Git objects:

  • Debugging corruption or missing data in repositories
  • Building Git tooling or integrations
  • Understanding how Git achieves integrity verification
  • Optimizing repository size and performance
  • Recovering lost data from the object database

When not to manipulate objects directly:

  • Daily development — use porcelain commands (git add, git commit)
  • When unsure — direct object manipulation can corrupt repositories
  • For simple history inspection — git log and git show are sufficient

Core Concepts

Git stores everything as objects in .git/objects/, each identified by the SHA-1 hash of a short header plus its content. There are exactly four types:


graph TD
    TAG["Annotated Tag\n(type: tag)"] -->|points to| COMMIT["Commit\n(type: commit)"]
    COMMIT -->|tree: points to| TREE["Tree\n(type: tree)"]
    COMMIT -->|parent: points to| COMMIT2["Parent Commit"]
    TREE -->|contains| TREE2["Subdirectory Tree"]
    TREE -->|contains| BLOB1["Blob\n(file content)"]
    TREE -->|contains| BLOB2["Blob\n(file content)"]
    TREE2 -->|contains| BLOB3["Blob\n(file content)"]

The relationship is hierarchical: annotated tags point to another object (almost always a commit), commits point to a tree and to their parent commits, and trees point to other trees and blobs. Blobs are the leaves: they contain the actual file content.

Each object is stored as:


<object type> <content length>\0<content>

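The header and content together are hashed to produce the object ID, then zlib-compressed and stored as a loose object under .git/objects/. The first two hex characters of the hash name the directory and the remaining 38 name the file.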
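The header format above can be checked by hand. The following is a minimal sketch, assuming a SHA-1 repository format (the default at the time of writing) and that the coreutils `sha1sum` tool is available; the sample content and variable names are illustrative only.

```shell
# Hypothetical sample content; any bytes work the same way.
content='Hello, Git!'

# Size in bytes of the content, including the trailing newline added below.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Hash "blob <size>\0<content>" by hand...
manual_id=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1 )

# ...and let Git compute the ID for the same bytes (nothing is stored without -w).
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual_id"
echo "git:    $git_id"
```

Both lines should print the same ID, which shows that the object name is nothing more than a hash over the header and the raw bytes.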

Architecture or Flow Diagram


flowchart LR
    FILE["File Content"] -->|git hash-object| BLOB["Blob Object\nSHA: abc123..."]
    BLOB -->|referenced by| TREE["Tree Object\nSHA: def456..."]
    TREE -->|referenced by| COMMIT["Commit Object\nSHA: 789ghi..."]
    COMMIT -->|referenced by| TAG["Tag Object\nSHA: jkl012..."]

    META["Author, Date, Message"] --> COMMIT
    PARENT["Parent SHA"] --> COMMIT

The flow shows how file content becomes a blob, which is referenced by a tree, which is referenced by a commit, which may be referenced by a tag. Metadata flows into commits separately from content.

Step-by-Step Guide / Deep Dive

Blob Objects

Blobs store file content. They do not store filenames, permissions, or directory structure — just raw bytes.


# Create a blob from stdin and store it (-w) in the object database
BLOB_SHA=$(echo "Hello, Git!" | git hash-object -w --stdin)
echo "$BLOB_SHA"
# Output: a 40-character SHA-1; identical content always yields the same ID

# The loose object lives in a directory named after the first two hex characters
ls ".git/objects/$(echo "$BLOB_SHA" | cut -c1-2)/"

# Inspect the blob
git cat-file -p "$BLOB_SHA"
# Output: Hello, Git!

# Check the type
git cat-file -t "$BLOB_SHA"
# Output: blob

Key properties:

  • Content-addressable: identical files produce identical blobs (deduplication)
  • Immutable: once created, a blob never changes
  • No metadata: filename and permissions are stored in the tree, not the blob
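The deduplication property can be observed directly. This is a small sketch in a throwaway repository; the file names are hypothetical, and it assumes no content filters are configured.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

# Two files with different names and identical bytes (hypothetical file names).
printf 'same bytes\n' > a.txt
mkdir -p docs
printf 'same bytes\n' > docs/b.txt

id_a=$(git hash-object -w a.txt)
id_b=$(git hash-object -w docs/b.txt)
echo "a.txt:      $id_a"
echo "docs/b.txt: $id_b"

# Count the files stored under .git/objects/ after writing both.
loose_count=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "loose objects: $loose_count"
```

Both files map to the same ID, and only one object file is written, because the name and location of a file play no part in the blob.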

Tree Objects

Trees represent directories. They map filenames to blob SHAs (or other tree SHAs for subdirectories).


# Create a tree from the current index
TREE_SHA=$(git write-tree)
echo "$TREE_SHA"
# Output: a 40-character SHA-1
# (4b825dc642cb6eb9a060e54bf899d69f824970a0 is the well-known empty tree,
#  which is what you get when nothing is staged)

# Inspect a tree
git cat-file -p "$TREE_SHA"
# Output format (mode, type, SHA-1, name), sorted by name:
# 100644 blob abc123...    file1.txt
# 100755 blob 789abc...    script.sh
# 040000 tree def456...    subdir

Each tree entry contains:

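  • Mode: entry kind plus the executable bit (100644 for regular files, 100755 for executables, 120000 for symlinks, 040000 for directories)
  • Type: blob or tree, as shown by git cat-file -p (derived from the mode rather than stored separately)
  • SHA-1: hash of the referenced object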
  • Filename: the name within this directory
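To see these fields on real entries, the sketch below builds a small tree in a throwaway repository and lists it. The file layout is hypothetical; `git update-index --chmod=+x` is used so the executable bit is recorded regardless of the filesystem.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

# Hypothetical layout: a regular file, an executable script, and a subdirectory.
printf 'readme\n' > README.md
printf '#!/bin/sh\necho hi\n' > run.sh
mkdir src
printf 'int main(void) { return 0; }\n' > src/main.c

git add README.md run.sh src/main.c
git update-index --chmod=+x run.sh     # record the executable bit in the index

root_tree=$(git write-tree)
git ls-tree "$root_tree"               # one line per entry: mode, type, SHA, name

# The subdirectory is its own tree object, referenced from the root tree.
src_tree=$(git rev-parse "$root_tree:src")
git ls-tree "$src_tree"
```

The root tree lists `src` as a single `040000 tree` entry; the files inside it only appear when that subtree is listed in turn.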

Commit Objects

Commits are the backbone of Git history. Each commit records a snapshot (via a tree), authorship, and lineage.


# Create a commit object manually
export GIT_AUTHOR_NAME="Test User"
export GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User"
export GIT_COMMITTER_EMAIL="test@example.com"
export GIT_AUTHOR_DATE="2026-03-31T12:00:00+00:00"
export GIT_COMMITTER_DATE="2026-03-31T12:00:00+00:00"

TREE_SHA=$(git write-tree)
COMMIT_SHA=$(echo "Initial commit" | git commit-tree $TREE_SHA)

echo $COMMIT_SHA
# Output: a1b2c3d4e5f6...

# Inspect the commit
git cat-file -p $COMMIT_SHA
# Output:
# tree 4b825dc642cb6eb9a060e54bf899d69f824970a0   (the empty tree, if nothing is staged)
# author Test User <test@example.com> 1774958400 +0000
# committer Test User <test@example.com> 1774958400 +0000
#
# Initial commit

A commit object contains:

  • tree: SHA-1 of the root tree (the snapshot)
  • parent: SHA-1 of the parent commit(s) — zero for the initial commit, one for normal commits, two+ for merges
  • author: who wrote the code (name, email, timestamp, timezone)
  • committer: who committed the code (can differ from author, e.g., after rebase)
  • message: the commit message
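The parent field is easiest to understand by chaining two commits by hand. A minimal sketch, assuming a throwaway repository and a hypothetical identity; the branch name `demo` is arbitrary.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

# Hypothetical identity so commit-tree does not depend on local configuration.
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"

printf 'v1\n' > file.txt && git add file.txt
first=$(git commit-tree -m "First commit" "$(git write-tree)")

printf 'v2\n' > file.txt && git add file.txt
second=$(git commit-tree -p "$first" -m "Second commit" "$(git write-tree)")

# A root commit has no parent line; the second commit names the first as its parent.
git cat-file -p "$first"
git cat-file -p "$second"

# commit-tree only writes objects; point a branch at the result to make it reachable.
git update-ref refs/heads/demo "$second"
git log --oneline demo
```

Until `git update-ref` runs, both commits exist in the object store but no ref leads to them, which is exactly the state that garbage collection eventually cleans up.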

Tag Objects

There are two types of tags in Git:

Lightweight tags are simply refs pointing to a commit — no tag object is created:


git tag v1.0-lightweight
# Creates: .git/refs/tags/v1.0-lightweight → commit SHA

Annotated tags are full objects with metadata:


git tag -a v1.0 -m "Release version 1.0"
# Creates a tag object

# Inspect the tag object
git cat-file -p $(git rev-parse v1.0)
# Output:
# object a1b2c3d4e5f6...
# type commit
# tag v1.0
# tagger Test User <test@example.com> 1774958400 +0000
#
# Release version 1.0

Annotated tags contain:

  • object: the SHA-1 of what the tag points to (usually a commit)
  • type: the type of the object (commit, tree, blob, or tag)
  • tag: the tag name
  • tagger: who created the tag
  • message: the tag message
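The difference between the two tag kinds shows up as soon as you ask what each tag name resolves to. The following sketch uses a throwaway repository with a hypothetical identity and assumes tag signing is not forced by configuration.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"

git commit -q --allow-empty -m "Initial commit"
commit=$(git rev-parse HEAD)

git tag v1.0-lightweight                    # ref only
git tag -a v1.0 -m "Release version 1.0"    # ref plus tag object

light_type=$(git cat-file -t "$(git rev-parse v1.0-lightweight)")
annot_type=$(git cat-file -t "$(git rev-parse v1.0)")
echo "lightweight resolves to a: $light_type"
echo "annotated resolves to a:   $annot_type"

# Peel the annotated tag to the commit it wraps.
peeled=$(git rev-parse 'v1.0^{commit}')
echo "peeled: $peeled"
```

The lightweight tag resolves straight to the commit, while the annotated tag resolves to a tag object that must be peeled with `^{commit}` to reach the same commit.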

Production Failure Scenarios

| Scenario | Symptoms | Mitigation |
|---|---|---|
| Missing blob | "fatal: unable to read tree" | Fetch from remote, or restore from backup; blobs are immutable so any copy works |
| Corrupted tree | Directory listing fails | Restore the object from a remote or backup; if the working tree is intact, re-stage it and regenerate the tree with git write-tree; verify with git fsck |
| Broken commit chain | git log stops abruptly | Use git replace to graft history, or rebase onto valid ancestor |
| Tag pointing to wrong type | Unexpected behavior on tag checkout | Verify with git cat-file -t; recreate annotated tag if needed |
| Object store corruption | Multiple "bad object" errors | Run git fsck --full; clone fresh from remote; restore from backup |

Trade-off Analysis

| Aspect | Advantage | Disadvantage |
|---|---|---|
| Content-addressable storage | Automatic deduplication, integrity verification | SHA-1 collision risk (being mitigated with SHA-256) |
| Snapshot-based (not delta) | Fast checkouts, simple model | Higher storage for text files (mitigated by pack files) |
| Immutable objects | Safe concurrent access, easy replication | No in-place updates; every change creates new objects |
| No filenames in blobs | Blobs can be shared across trees | Must traverse tree to find which file a blob belongs to |

Implementation Snippets


# Create a blob and store it
echo "content" | git hash-object -w --stdin

# Read a blob's content
git cat-file -p <sha>

# Get object type
git cat-file -t <sha>

# Get object size
git cat-file -s <sha>

# Create a tree from index
git write-tree

# Read a tree into index
git read-tree <tree-sha>

# Create a commit object
git commit-tree <tree-sha> -p <parent-sha> -m "message"

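# Create an annotated tag object from raw input
# (tagger takes a Unix timestamp and UTC offset; a blank line separates header and message)
git mktag << EOF
object <commit-sha>
type commit
tag v1.0
tagger Name <email> <unix-timestamp> <utc-offset>

Release version 1.0
EOF
# mktag only writes the object; create the ref with: git update-ref refs/tags/v1.0 <tag-sha>

# List all objects reachable from any ref
git rev-list --objects --all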

# Find all unreachable objects
git fsck --unreachable

Observability Checklist

  • Monitor: Object count growth with git count-objects -v
  • Verify: Run git fsck periodically to detect corruption
  • Track: Ratio of loose to packed objects (should favor packed)
  • Alert: Unexpected object count spikes (may indicate accidental large file commits)
  • Audit: Tag signatures for release integrity verification

Security & Compliance Considerations

  • Object hashes provide integrity verification — tampering changes the hash
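  • SHA-1 is no longer considered collision-resistant; Git uses a collision-detecting SHA-1 implementation and supports SHA-256 repositories (git init --object-format=sha256)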
  • Signed tags (GPG/SSH) provide non-repudiation for releases
  • Objects are not encrypted — sensitive data in blobs is readable by anyone with repo access
  • See Git Secrets Management for preventing secret commits

Common Pitfalls / Anti-Patterns

  • Assuming on-disk object size equals file size: objects are stored with a header and zlib-compressed, and may be delta-compressed in packs; git cat-file -s reports the uncompressed content size
  • Confusing lightweight and annotated tags — lightweight tags are just refs, not objects
  • Modifying objects directly — objects are immutable; use Git commands to create new ones
  • Ignoring unreachable objects — they consume space until git gc prunes them
  • Storing large binary files as blobs — use Git LFS instead

Quick Recap Checklist

  • Blobs store file content only — no filenames or metadata
  • Trees map filenames to blob/tree SHAs — represent directories
  • Commits point to a tree, parent(s), and record authorship
  • Annotated tags are full objects; lightweight tags are just refs
  • All objects are content-addressable by SHA-1 hash
  • Objects are immutable and zlib-compressed
  • The object graph forms a directed acyclic graph (DAG)
  • Use git cat-file to inspect any object by type, size, or content

Object Relationship Diagram (Clean)


graph TD
    REPO["Repository"] -->|contains| TAGS["Annotated Tags"]
    TAGS -->|points to| COMMITS["Commits"]
    COMMITS -->|tree ref| TREES["Trees"]
    COMMITS -->|parent ref| PARENTS["Parent Commits"]
    TREES -->|entries| SUBTREES["Subdirectory Trees"]
    TREES -->|entries| BLOBS["Blobs (file content)"]
    SUBTREES -->|entries| MORE_BLOBS["More Blobs"]

    BLOBS -->|content only| FILES["Raw File Bytes"]
    MORE_BLOBS -->|content only| FILES

Production Failure: Corrupted Object Database

Scenario: Missing blob causing checkout failure


# Symptoms
$ git checkout main
error: unable to read sha1 file (src/config.py)
fatal: unable to checkout working tree

$ git fsck --full
error: abc123def456...: object missing
error: 789ghi...: object corrupt

# Root cause: Disk corruption, interrupted gc, or filesystem error
# destroyed blob objects in .git/objects/

# Recovery steps:

# 1. Identify missing objects
git fsck --full 2>&1 | grep "missing"

# 2. Try to fetch missing objects from remote
git fetch origin --refetch

# 3. If remote doesn't have them (local-only commits):
#    Check reflog for last known good state
git reflog
git checkout HEAD@{1}  # Try previous HEAD

# 4. As last resort, clone fresh and carry over surviving local-only commits
cd ..
git clone https://github.com/user/repo.git repo-clean
cd repo-clean
git fetch ../repo <branch>   # works only if that branch's objects are intact
git cherry-pick <sha>

# 5. Prevent future corruption:
#    - Use reliable storage for .git/
#    - Run git fsck periodically
#    - Keep remote backups
git push --mirror <backup-remote>  # Full backup; --mirror overwrites and deletes refs on the target, so use a dedicated backup remote

Trade-offs: Annotated vs Lightweight Tags

| Aspect | Annotated Tags | Lightweight Tags |
|---|---|---|
| Object type | Full tag object in .git/objects/ | Simple ref file in .git/refs/tags/ |
| Metadata | Tagger name, email, date, message | None |
| GPG signing | Supported (git tag -s) | Not supported |
| Storage | ~200 bytes per tag | ~41 bytes (SHA only) |
| Immutability | Tag object is immutable; the ref can still be moved with git tag -f | Ref can be moved with git tag -f |
| Use case | Releases, public milestones | Private bookmarks, temporary markers |
| git describe | Considered by default | Ignored unless --tags is passed |
| Platform display | Shows message on GitHub/GitLab | Shows as simple pointer |

Recommendation: Use annotated tags for anything public or release-related. Lightweight tags are fine for personal, temporary markers.

Implementation: Creating and Inspecting Each Object Type Manually


# === 1. BLOB ===
# Create blob from string
BLOB_SHA=$(echo "file content" | git hash-object -w --stdin)
echo "Blob SHA: $BLOB_SHA"

# Create blob from file
git hash-object -w myfile.txt

# Inspect
git cat-file -t $BLOB_SHA   # blob
git cat-file -s $BLOB_SHA   # size in bytes
git cat-file -p $BLOB_SHA   # content

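# === 2. TREE ===
# Build a tree directly from ls-tree-style input (mode, type, SHA, TAB, name)
TREE_SHA=$(printf '100644 blob %s\ttest.txt\n' "$BLOB_SHA" | git mktree)
echo "Tree SHA: $TREE_SHA"

# Or build it from the current index instead:
# TREE_SHA=$(git write-tree)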

# Inspect
git cat-file -t $TREE_SHA   # tree
git cat-file -p $TREE_SHA   # entries (mode, type, sha, name)

# === 3. COMMIT ===
# Create commit (requires env vars for author)
export GIT_AUTHOR_NAME="Test"
export GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test"
export GIT_COMMITTER_EMAIL="test@example.com"

COMMIT_SHA=$(echo "Initial commit" | git commit-tree $TREE_SHA)

# Inspect
git cat-file -t $COMMIT_SHA   # commit
git cat-file -p $COMMIT_SHA   # tree, author, committer, message

# === 4. ANNOTATED TAG ===
# Create tag object
git tag -a v1.0 -m "Release 1.0" $COMMIT_SHA

# Get tag object SHA (not the commit it points to)
TAG_SHA=$(git rev-parse "v1.0^{tag}")

# Inspect
git cat-file -t $TAG_SHA   # tag
git cat-file -p $TAG_SHA   # object, type, tag, tagger, message

# === Verify the chain ===
echo "Tag -> Commit -> Tree -> Blob"
git cat-file -p $TAG_SHA | grep "^object"
git cat-file -p $COMMIT_SHA | grep "^tree"
git cat-file -p $TREE_SHA | grep "blob"

Pack Files and Object Compression

Loose objects (individual files in .git/objects/) are eventually packed into pack files for efficiency:


# Trigger manual packing
git gc

# List packed objects
git verify-pack -v .git/objects/pack/*.idx

# See pack file statistics
git count-objects -v

Pack file structure:


graph TD
    LOOSE["Loose Objects\n(individual files)"] -->|git gc triggers| PACKING["Packing Process"]
    PACKING -->|delta compression| PACK["Pack File\n(.pack)"]
    PACK -->|index for fast lookup| IDX["Pack Index\n(.idx)"]
    PACK -->|reduced storage| STORED["Smaller on disk\n(ratio depends on content)"]

How packing works:

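  1. Git sorts candidate objects by type, path name, and size so that similar content ends up adjacent
  2. Stores one “base” object plus deltas (differences) for others
  3. Creates .idx file for O(log n) SHA lookups
  4. Result: a large storage reduction for text files with many similar revisions (10-20x is a rough rule of thumb; actual savings depend on the repository)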

Why pack files matter:

  • Clone and fetch operations transfer pack files, not loose objects
  • Deduplication happens across all branches and history
  • git clone --reference borrows objects from an existing local repository (via alternates) instead of copying them
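The effect of packing can be watched with `git count-objects -v` before and after a manual `git gc`. This is a rough sketch in a throwaway repository with a hypothetical identity; the file name and number of revisions are arbitrary.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"

# Commit several revisions of one file so there is something to delta-compress.
for i in 1 2 3 4 5; do
  printf 'line %s\n' "$i" >> notes.txt
  git add notes.txt
  git commit -q -m "Revision $i"
done

echo "before gc:"
git count-objects -v | grep -E '^(count|in-pack|packs):'

git gc -q

echo "after gc:"
git count-objects -v | grep -E '^(count|in-pack|packs):'

loose_after=$(git count-objects -v | sed -n 's/^count: //p')
packed_after=$(git count-objects -v | sed -n 's/^in-pack: //p')
```

Here `count` is the number of loose objects and `in-pack` the number stored in pack files; after `git gc` the reachable objects move from the first figure to the second.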

The Index (Staging Area)

The index (.git/index) is a binary file that maps tracked files to blob SHAs — it’s the staging area between working tree and repository.


# View the index
git ls-files --stage

# Sample output (mode, blob SHA, stage, path):
# 100644 abc123... 0    file1.txt
# 100644 def456... 0    file2.sh
# 100644 789abc... 0    subdir/file3.txt

# The index holds no directory entries; write it out as a tree to see the directory structure
git cat-file -p "$(git write-tree)"

Index entries contain:

  • Mode: File permissions (100644, 100755, etc.)
  • SHA: Blob hash of the file in the repository
  • Stage: 0 for normal entries, 1-3 for merge conflicts
  • Name: Relative path from repository root
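The link between index entries and blobs can be checked with a few commands. A minimal sketch in a throwaway repository; the file names are hypothetical and no content filters are assumed.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

mkdir -p subdir
printf 'one\n' > file1.txt
printf 'two\n' > subdir/file2.txt
git add file1.txt subdir/file2.txt

# Each line: mode, blob SHA, stage number, then a TAB and the path.
git ls-files --stage

# The SHA recorded in the index is the blob ID of the staged content.
staged_id=$(git ls-files --stage file1.txt | cut -d' ' -f2)
blob_id=$(git hash-object file1.txt)
echo "index: $staged_id"
echo "blob:  $blob_id"

# Editing the file without re-adding it leaves the index entry untouched.
printf 'changed\n' > file1.txt
still_staged=$(git ls-files --stage file1.txt | cut -d' ' -f2)
echo "after edit, index still has: $still_staged"
```

The index lists full file paths only, so `subdir` never appears as an entry of its own; directory structure is materialized as tree objects when the index is written out.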

Index vs Trees:

| Aspect | Index | Tree |
|---|---|---|
| Location | .git/index | .git/objects/ (SHA-named) |
| Mutability | Updates on every git add | Immutable once created |
| Scope | Current checkout (one index per working tree) | Any commit in history |
| Purpose | Staging area, merge conflict resolution | Snapshot of directory at commit |

Index workflow:


flowchart TD
    WORKING["Working Tree\n(modified files)"] -->|git add| INDEX["Index\n(staged content)"]
    INDEX -->|git commit| TREE["Tree Object\n(snapshot)"]
    TREE -->|commit| REPO["Repository\n(permanent)"]
    WORKING -.->|git diff| INDEX

Interview Questions

1. Why don't Git blobs store filenames?

Blobs store only file content to enable deduplication. If two files in different directories have identical content, they share the same blob. Filenames are stored in tree objects, which map names to blob SHAs. This separation means renaming a file doesn't create a new blob — only a new tree.

2. How does Git detect if an object has been corrupted?

Every object's SHA-1 hash is computed from its type, size, and content, and the object is stored under that hash. To check integrity, Git recomputes the hash from the stored bytes and compares it to the object's name; a mismatch means the object is corrupted. git fsck performs this check for every object. This is why Git is called a "content-addressable filesystem": the address is the content's fingerprint.

3. What's the difference between git hash-object and git hash-object -w?

Without -w, git hash-object only computes and prints the SHA-1 hash without storing the object. With -w (write), it also stores the object in .git/objects/. This is useful for checking if content already exists before writing it.
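The difference is easy to observe with `git cat-file -e`, which only reports whether an object exists. A small sketch in a throwaway repository; the content is arbitrary.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

# Without -w: the ID is computed but nothing is stored.
id=$(printf 'draft\n' | git hash-object --stdin)
if git cat-file -e "$id" 2>/dev/null; then before=present; else before=absent; fi

# With -w: same ID, and the object now exists in .git/objects/.
id_written=$(printf 'draft\n' | git hash-object -w --stdin)
if git cat-file -e "$id_written" 2>/dev/null; then after=present; else after=absent; fi

echo "id:        $id"
echo "before -w: $before"
echo "after -w:  $after"
```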

4. Can a commit have zero parents? When?

Yes — the initial commit of any repository has zero parents. Additionally, commits created with git commit-tree without the -p flag, or orphan branches created with git checkout --orphan, produce commits with no parent. This creates a new root in the commit DAG.

5. How do annotated tags differ from lightweight tags internally?

An annotated tag creates a full tag object in .git/objects/ with metadata (tagger, date, message, GPG signature). A lightweight tag is just a file in .git/refs/tags/ containing a commit SHA — no object is created. Annotated tags are preferred for releases because they're immutable and verifiable.

6. What happens to blobs when a file is deleted in Git?

Nothing happens to the blob right away. When you delete a file and commit, Git writes a new tree that no longer lists the blob, but the blob stays in `.git/objects/` and remains reachable through every earlier commit that included the file, so you can restore it with `git checkout <commit> -- <path>`. A blob is removed only when it becomes unreachable from every ref and reflog entry (for example, after the history containing it is rewritten or deleted) and `git gc` prunes it once the expiration window has passed (two weeks by default).

7. How does `git fsck` detect object corruption?

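git fsck recomputes each object's SHA-1 from its content and compares it to the object's name. It also walks the graph from all refs (branches, tags) and reflogs and checks that: (1) every referenced object exists, (2) each object's hash matches its content, (3) each object is well-formed for its type. Pack files carry their own checksums, which git verify-pack validates. git fsck --full (the default in current Git) includes objects in pack files and alternate object stores; add --unreachable or --dangling to list objects that no ref points to.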

8. Why does Git use SHA-1 instead of faster hash algorithms?

When Git was designed (2005), SHA-1 was a widely deployed standard for integrity verification, and its 160-bit output made accidental collisions practically impossible. Git requires deterministic hashing (same content = same hash) and relies on it being infeasible to find different content with the same hash. Practical collision attacks on SHA-1 have since been demonstrated, so Git adopted a collision-detecting SHA-1 implementation and later added optional SHA-256 repositories. SHA-1 remains the default largely for compatibility across tools and hosting platforms.

9. Can you have a commit with multiple parent commits? When?

Yes — merge commits have two or more parent commits. The first parent is typically the branch you were on when merging; additional parents are the commits from merged branches. This forms a directed acyclic graph (DAG) rather than a simple chain. Octopus merges can have even more parents (e.g., `git merge branch1 branch2 branch3`).

10. What is the difference between git cat-file -p, -t, and -s?

-p pretty-prints the object's content in human-readable format. -t shows only the object type (blob, tree, commit, tag). -s shows only the size in bytes. -t and -s correspond to the two fields of the raw object header (type and size); -p shows the content that follows it.

11. How does Git store directory structures in tree objects?

Directories are represented as tree objects themselves. Each tree entry has a mode (040000 for directories), a SHA pointing to either a blob (file) or another tree (subdirectory), and the filename. When you `git add` a new file, Git creates the blob and records it in the index. When you commit, Git writes a new tree for each directory whose contents changed, up to the root, and reuses unchanged subtrees by SHA. All of these are separate immutable objects.

12. What happens when you rename a file in Git?

Git does not record renames. A rename produces a new tree entry with a different name pointing to the existing blob (deduplication means the blob stays the same), and the old entry disappears. Commands such as `git diff -M` and `git log --follow` infer renames afterwards by comparing the old and new trees, matching identical blob SHAs first and then similar content. No blob is created or destroyed by a pure rename; only the tree structure changes.
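A quick experiment makes the point concrete. The sketch below renames a file in a throwaway repository and compares blob and tree IDs before and after; the file names and identity are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"

printf 'settings\n' > old-name.conf
git add old-name.conf && git commit -q -m "Add config"
blob_before=$(git rev-parse HEAD:old-name.conf)
tree_before=$(git rev-parse 'HEAD^{tree}')

git mv old-name.conf new-name.conf
git commit -q -m "Rename config"
blob_after=$(git rev-parse HEAD:new-name.conf)
tree_after=$(git rev-parse 'HEAD^{tree}')

echo "blob before: $blob_before"
echo "blob after:  $blob_after"     # same ID: content did not change
echo "tree before: $tree_before"
echo "tree after:  $tree_after"     # different ID: the entry name changed

# The rename is inferred at diff time from the two trees.
git diff -M --name-status HEAD~1 HEAD
```

The final command reports the change with an `R` status even though nothing in the object store says "rename"; the status comes from comparing the two trees.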

13. How do you recover a dangling (unreachable) blob?

Dangling blobs are not referenced by any tree, commit, or ref but still exist in the object store. To find them: git fsck --dangling (add --no-reflogs to also treat reflog-only objects as unreachable). To view content: git cat-file -p <sha>. To recover: redirect that output to a file, for example `git cat-file -p <sha> > recovered.txt`, or run `git fsck --lost-found` to write dangling objects into `.git/lost-found/`. Dangling blobs typically come from content that was staged with `git add` and then replaced or unstaged before being committed.
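The recovery path can be rehearsed safely in a throwaway repository. This sketch stages a file, unstages and deletes it, then finds and restores the content from the object store; file names and identity are hypothetical, and it assumes the repository holds only this one dangling blob.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"
git commit -q --allow-empty -m "Initial commit"

# Stage a file, then unstage and delete it: the blob stays behind, unreferenced.
printf 'important draft\n' > draft.txt
git add draft.txt
git rm -q --cached draft.txt
rm draft.txt

# fsck reports it as a dangling blob; take the first one it lists.
lost=$(git fsck --dangling 2>/dev/null | sed -n 's/^dangling blob //p' | head -n 1)
echo "dangling blob: $lost"

# Recover by writing the blob's content back to a file.
git cat-file -p "$lost" > recovered.txt
cat recovered.txt
```

In a real repository `git fsck` may list many dangling blobs, so inspect each candidate with `git cat-file -p` before restoring it.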

14. What is the maximum size Git can handle for a single object?

Git's object format has no small fixed size limit, but large objects are handled poorly in practice: files above core.bigFileThreshold (512 MiB by default) are stored without delta compression, and some platforms and operations struggle with objects of several gigabytes. For large files, use Git LFS, which stores small pointer files in the repository and the actual content on LFS servers. Without LFS, every version of a large binary is kept in full in history and bloats every clone.

15. Why might two identical files have different blob SHAs?

They can't: the blob SHA depends only on the bytes, so files with truly identical content always produce identical blob SHAs. If the SHAs differ, the stored bytes differ. Check for: (1) line-ending differences (CRLF vs LF), including conversion applied by core.autocrlf or text attributes when the file is staged, (2) trailing whitespace, a byte-order mark, or other invisible characters, (3) clean/smudge filters that rewrite content on git add. File mode (executable vs non-executable) is not a cause: it changes the tree entry, not the blob.

16. What is the .git/objects/pack directory for?

Pack files store multiple objects compressed together using delta encoding. Git groups similar objects (e.g., successive versions of a file) and stores only the differences plus a base object. This can shrink text-heavy history dramatically (10-20x is a rough rule of thumb, not a guarantee). The `.idx` file provides O(log n) SHA lookups. Running git gc converts loose objects to pack files; git clone and git fetch transfer objects as packs.

17. Can you modify a commit's tree SHA? What happens?

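Commits are immutable: you cannot modify any field, including the tree SHA, because the commit's own SHA is computed from those fields. To change what's in a commit, you must create a new commit with a different SHA. This is what `git commit --amend` and `git rebase` do internally, and `git cherry-pick` likewise creates a new commit rather than moving the old one. The replaced commit stays in the object store, still reachable through the reflog, until the reflog entry expires and `git gc` prunes it.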
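Amending a commit shows this behavior on a small scale. A minimal sketch in a throwaway repository with a hypothetical identity:

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"

printf 'v1\n' > file.txt
git add file.txt && git commit -q -m "Original message"
old=$(git rev-parse HEAD)

git commit -q --amend -m "Amended message"
new=$(git rev-parse HEAD)

echo "old: $old"
echo "new: $new"

# The old commit object is untouched and still readable by its ID.
git log -1 --format=%s "$old"

# Both commits point at the same tree, because the content did not change.
old_tree=$(git rev-parse "$old^{tree}")
new_tree=$(git rev-parse "$new^{tree}")
echo "old tree: $old_tree"
echo "new tree: $new_tree"
```

Only the commit object is new; the tree and blob are shared between the original and the amended commit.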

18. How does Git handle file permission changes in trees?

Tree entries store the mode as an octal number: 100644 (regular file), 100755 (executable), 040000 (subdirectory), 120000 (symlink). Git tracks only the executable bit, not full Unix permissions. When that bit changes, Git writes a new tree entry with the updated mode but the same blob SHA (if content is unchanged). This is why `git diff` can show permission-only changes.
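The following sketch flips the executable bit on a committed file and compares the tree entries. It uses `git update-index --chmod=+x` so the result does not depend on whether the filesystem honors mode bits; the file name and identity are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"

printf '#!/bin/sh\necho ok\n' > tool.sh
git add tool.sh && git commit -q -m "Add tool"
before=$(git ls-tree HEAD tool.sh)

# Flip the executable bit in the index and commit the staged change.
git update-index --chmod=+x tool.sh
git commit -q -m "Make tool executable"
after=$(git ls-tree HEAD tool.sh)

echo "$before"
echo "$after"
```

The two printed entries differ only in the mode column; the blob SHA in the third column is identical.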

19. What is the relationship between refs, objects, and the HEAD pointer?

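Refs (branches, tags) are names that point to object SHAs, stored as files under `.git/refs/` or as lines in `.git/packed-refs`. HEAD is normally a symbolic ref naming the current branch (for example `ref: refs/heads/main`); in detached HEAD state it holds a commit SHA directly. Objects (blobs, trees, commits, tags) are stored in `.git/objects/`. When you commit, Git: (1) creates a tree from the index, (2) creates a commit pointing to that tree, with the commit HEAD currently resolves to as its parent, (3) updates the current branch ref to the new commit, so HEAD follows automatically.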
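These relationships can be inspected with `git symbolic-ref` and `git rev-parse`. A small sketch in a throwaway repository with a hypothetical identity; the branch name is whatever the local Git uses by default.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="test@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="test@example.com"

git commit -q --allow-empty -m "Initial commit"

# HEAD is normally a symbolic ref naming the current branch...
branch_ref=$(git symbolic-ref HEAD)
echo "HEAD -> $branch_ref"

# ...and the branch ref resolves to a commit ID.
echo "branch -> $(git rev-parse "$branch_ref")"

# A new commit moves the branch ref; HEAD still names the same branch.
before=$(git rev-parse HEAD)
git commit -q --allow-empty -m "Second commit"
after=$(git rev-parse HEAD)
echo "HEAD still -> $(git symbolic-ref HEAD)"

# Checking out a commit directly detaches HEAD: it then holds a raw commit ID.
git checkout -q --detach "$before"
git symbolic-ref -q HEAD || echo "HEAD is detached at $(git rev-parse --short HEAD)"
```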

20. Why does Git store content by hash and what problems does content-addressable storage solve?

Content-addressable storage (storing content by its SHA-1 hash) solves several problems: deduplication (identical content is stored once regardless of filename), integrity verification (the hash acts as a checksum — any corruption changes the hash), immutability (changing content produces a new hash, leaving the original intact), and efficient comparison (objects with the same hash are guaranteed identical). This design is why Git operations are so fast — most operations are local hash lookups.

Further Reading

  • Git LFS - Large file handling
  • Official Git documentation: git cat-file, git hash-object, git write-tree

Conclusion

Git’s object model — blobs for content, trees for structure, commits for snapshots, tags for anchors — is elegant in its simplicity. Every Git operation is a dance between these four object types. Master them and you master Git.
