Git Object Database and Pack Files
Understanding Git's object storage: loose objects, pack files, delta compression, and how Git optimizes storage for repositories with millions of objects and gigabytes of history.
Introduction
Every Git repository stores its entire history in the object database under .git/objects/. When you first start a project, objects are stored as individual compressed files — one file per blob, tree, commit, or tag. This “loose object” format is simple and fast for small repositories.
But as your project grows to thousands of commits and millions of files, storing each object as a separate file becomes inefficient. Filesystem overhead, disk space waste, and slow enumeration become real problems. Git solves this with pack files — a binary format that stores multiple objects together with delta compression between similar objects.
Understanding the object database and pack file system is essential for managing large repositories, optimizing CI/CD clone times, and debugging storage issues. This article explains how Git stores objects, how pack files work, and how to optimize your repository’s storage.
When to Use / When Not to Use
When to understand pack files:
- Managing large repositories with long histories
- Optimizing clone and fetch times in CI/CD pipelines
- Debugging repository size issues
- Understanding git gc and git repack behavior
- Building Git server infrastructure
When not to manipulate pack files directly:
- Daily development — Git manages packs automatically
- Small repositories — loose objects are fine
- When unsure, use git gc instead of manual repacking
Core Concepts
Git’s object database has two storage modes:
graph TD
OBJ[".git/objects/"] --> LOOSE["Loose Objects\none file per object"]
OBJ --> PACK["Pack Files\nmultiple objects per file"]
LOOSE --> L1["zlib-compressed\nindividual files"]
LOOSE --> L2["path: .git/objects/ab/cdef...\n(2-char prefix directory)"]
PACK --> P1[".git/objects/pack/\npack-<hash>.pack\npack-<hash>.idx"]
PACK --> P2["delta-compressed\nbetween similar objects"]
PACK --> P3[".idx index for\nfast lookup by object ID"]
Loose objects are simple: each object is zlib-compressed and stored in a file named by the SHA-1 hash of its uncompressed header and content. The first two hex characters of the hash form a subdirectory, and the remaining 38 form the file name.
Pack files are complex: they store multiple objects in a single binary file, with delta compression between similar objects (e.g., consecutive versions of the same file), and an index file for fast random access.
Architecture or Flow Diagram
flowchart LR
COMMIT["New Commit"] -->|creates| NEW_OBJ["New Objects\n(blobs, trees, commit)"]
NEW_OBJ -->|stored as| LOOSE["Loose Objects\n.git/objects/XX/"]
GC["git gc / git repack"] -->|collects| MANY["Many Loose Objects"]
MANY -->|delta compression| DELTA["Delta Chains\nbase → delta → delta"]
DELTA -->|writes| PACK["Pack File\n.pack + .idx"]
PACK -->|replaces| OLD["Old Loose Objects\n(deleted)"]
FETCH["git fetch"] -->|receives| THIN["Thin Pack\n(deltas against missing bases)"]
THIN -->|resolves| COMPLETE["Complete Pack\n(all bases present)"]
The flow shows how objects start as loose files, get packed during garbage collection, and how fetch operations use thin packs for efficient network transfer.
Step-by-Step Guide / Deep Dive
Loose Object Format
Each loose object is stored as:
zlib(<object type> <size>\0<object content>)
# Create a loose object
echo "hello" | git hash-object -w --stdin
# Output: ce013625030ba8dba906f756967f9e9ca394464a
# The file is at .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
ls -la .git/objects/ce/
# Inspect the raw compressed content
python3 -c "
import zlib
with open('.git/objects/ce/013625030ba8dba906f756967f9e9ca394464a', 'rb') as f:
    data = zlib.decompress(f.read())
print(repr(data))
"
# Output: b'blob 6\x00hello\n'
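The object ID is the hash of the same bytes that get compressed. As a minimal sketch in Python, assuming a SHA-1 repository (the helper names `encode_object`, `git_object_id`, and `loose_object_path` are illustrative and not part of Git), the ID and on-disk path can be reproduced without calling Git at all:

```python
import hashlib
import zlib


def encode_object(obj_type: str, content: bytes) -> bytes:
    """Build the uncompressed payload: '<type> <size>' + NUL + content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return header + content


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 of the uncompressed payload (assumes a SHA-1 repository)."""
    return hashlib.sha1(encode_object(obj_type, content)).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex characters name the directory, the rest name the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


payload = encode_object("blob", b"hello\n")
oid = git_object_id("blob", b"hello\n")
# Git writes the zlib-compressed payload; its compression level may differ,
# so the compressed bytes need not match, only the decompressed ones.
compressed = zlib.compress(payload)

print(oid)
print(loose_object_path(oid))
```

Because the ID depends only on type, size, and content, two files with identical content always share one blob, regardless of their names or locations.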
Pack File Format
A pack file (.pack) contains:
- Header: PACK signature, version number, object count
- Objects: Each object is either:
  - A full (base) object: type + size + zlib-compressed data
  - A delta object: type + size + base object reference + delta instructions
- Trailer: SHA-1 checksum of the entire pack
The index file (.idx) provides:
- Sorted list of object SHAs with their offsets in the pack
- Fan-out table for binary search
- Pack checksum
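To show why the fan-out table helps, here is a toy in-memory model in Python. It is a sketch of the lookup idea only, not a parser for the real .idx layout; the object IDs and offsets are made up, and `build_fanout` and `lookup` are hypothetical names.

```python
import bisect
import hashlib


def build_fanout(sorted_ids: list[bytes]) -> list[int]:
    """fanout[b] = number of object IDs whose first byte is <= b (256 entries)."""
    fanout = [0] * 256
    for oid in sorted_ids:
        fanout[oid[0]] += 1
    for b in range(1, 256):
        fanout[b] += fanout[b - 1]
    return fanout


def lookup(sorted_ids, offsets, fanout, oid):
    """Return the pack offset for oid, or None if it is not in this index."""
    first = oid[0]
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    # Binary search only inside the slice of IDs sharing the same first byte.
    pos = bisect.bisect_left(sorted_ids, oid, lo, hi)
    if pos < hi and sorted_ids[pos] == oid:
        return offsets[pos]
    return None


# Toy index: 1000 fake object IDs with made-up pack offsets.
ids = sorted(hashlib.sha1(str(i).encode()).digest() for i in range(1000))
offsets = [12 + 100 * i for i in range(len(ids))]
fanout = build_fanout(ids)

print(lookup(ids, offsets, fanout, ids[42]))
print(lookup(ids, offsets, fanout, b"\x00" * 20))
```

The fan-out table narrows the search to roughly 1/256 of the entries before the binary search starts, and its last entry doubles as the total object count.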
# List pack files
ls -lh .git/objects/pack/
# Output:
# pack-abc123.idx (index - fast lookup)
# pack-abc123.pack (data - compressed objects)
# pack-abc123.rev (reverse index - offset to SHA)
# Inspect pack contents
git verify-pack -v .git/objects/pack/pack-abc123.idx
# Output per object (depth and base appear only for deltified objects):
# <sha> <type> <size> <size-in-pack> <offset> [<depth> <base-sha>]
# abc123... commit 234 230 12
# def456... tree 120 115 242
# 789abc... blob 5432 128 357 2 fed321... (delta, depth 2)
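The header and the per-object type and size prefix can be sketched in Python as follows. This is a simplified illustration of the encoding under the assumption of a version 2 pack; the function names are hypothetical, and the sketch stops before the compressed data or delta base reference that follows each entry header.

```python
import struct

OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
             6: "ofs_delta", 7: "ref_delta"}


def parse_pack_header(data: bytes) -> tuple[int, int]:
    """Return (version, object_count) from the 12-byte pack header."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a pack file")
    return version, count


def encode_entry_header(obj_type: int, size: int) -> bytes:
    """Encode the variable-length type+size prefix of one pack entry."""
    byte = (obj_type << 4) | (size & 0x0F)  # 3 type bits, low 4 size bits
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # high bit set: more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def decode_entry_header(data: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Return (obj_type, size, next_pos) for the entry header at pos."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


header = struct.pack(">4sII", b"PACK", 2, 3)
print(parse_pack_header(header))

obj_type, size, _ = decode_entry_header(encode_entry_header(3, 5432))
print(OBJ_TYPES[obj_type], size)
```

The size stored here is the uncompressed size of the object (or of the delta data), which is why small objects need only a single header byte.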
Delta Compression
Delta compression is Git’s key storage optimization. Instead of storing full copies of similar objects, Git stores:
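- Base object: A full copy (in practice often the most recent version, because Git's packing heuristics favor fast access to recent history; the diagram below uses v1 as the base for simplicity)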
- Delta objects: Instructions to transform the base into the target
graph LR
BASE["Base Object\nv1 of file.py\n(full copy, 5KB)"] -->|delta| D1["Delta 1\nv1 → v2\n(changes only, 200B)"]
D1 -->|delta| D2["Delta 2\nv2 → v3\n(changes only, 150B)"]
D2 -->|delta| D3["Delta 3\nv3 → v4\n(changes only, 180B)"]
This can reduce storage by 90%+ for text files that change incrementally. The delta depth matters — deep chains require more decompression work to reconstruct an object.
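A delta is a small program of copy and insert instructions. The following Python sketch applies such a delta to a base object. It follows the instruction encoding used in pack files as I understand it (two size varints, then copy commands with the high bit set and insert commands without), but it is an illustration with a hand-built delta, not Git's implementation.

```python
def read_varint(delta: bytes, pos: int) -> tuple[int, int]:
    """Read a little-endian base-128 integer; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = delta[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild the target object from a base object and a delta."""
    base_size, pos = read_varint(delta, 0)
    target_size, pos = read_varint(delta, pos)
    if base_size != len(base):
        raise ValueError("delta was made against a different base")

    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:  # copy a range out of the base
            offset = size = 0
            for i in range(4):  # bits 0-3: which offset bytes are present
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):  # bits 4-6: which size bytes are present
                if cmd & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000
            out += base[offset:offset + size]
        elif cmd:  # insert the next `cmd` literal bytes
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("delta opcode 0 is reserved")

    if len(out) != target_size:
        raise ValueError("reconstructed size does not match delta header")
    return bytes(out)


base = b"hello world\n"
delta = (
    bytes([12, 18])            # base size, target size
    + bytes([0x90, 6])         # copy 6 bytes from base offset 0
    + bytes([6]) + b"brave "   # insert 6 literal bytes
    + bytes([0x91, 6, 6])      # copy 6 bytes from base offset 6
)
print(apply_delta(base, delta))
```

Reading an object at depth 3 means running this procedure three times, once per link in the chain, which is the access cost that the depth limit keeps bounded.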
Thin Packs
When fetching from a remote, Git sends “thin packs” that contain deltas against objects the receiver already has. This minimizes network transfer:
# The server sends deltas against objects it knows you have
git fetch origin
# Output: remote: Compressing objects: 100% (15/15), done.
# Your Git resolves the deltas against local objects
# and creates a complete pack file
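The completion step can be pictured with a toy model in Python. Objects are reduced to an ID plus an optional delta base, and `fix_thin` mimics, in spirit only, what `git index-pack --fix-thin` does when it appends the omitted bases to a received pack. The `Entry` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    oid: str
    base: Optional[str] = None  # None means the object is stored in full


def missing_bases(pack: list[Entry]) -> set[str]:
    """Bases referenced by deltas but not contained in the pack itself."""
    in_pack = {e.oid for e in pack}
    return {e.base for e in pack if e.base is not None and e.base not in in_pack}


def fix_thin(pack: list[Entry], local_objects: set[str]) -> list[Entry]:
    """Append missing bases from the local store so the pack is self-contained."""
    needed = missing_bases(pack)
    unavailable = needed - local_objects
    if unavailable:
        raise LookupError(f"cannot complete pack, missing bases: {sorted(unavailable)}")
    return pack + [Entry(oid) for oid in sorted(needed)]


# The sender knows the receiver has blob "v2", so it ships "v3" as a delta only.
thin_pack = [Entry("commit3"), Entry("tree3"), Entry("v3", base="v2")]
local = {"commit2", "tree2", "v1", "v2"}

complete = fix_thin(thin_pack, local)
print(sorted(missing_bases(thin_pack)))
print([e.oid for e in complete])
```

In this model the thin pack is invalid as an on-disk pack because it is not self-contained; only after the bases are appended can it be stored under .git/objects/pack/.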
Pack Bitmaps
For very large repositories, Git can generate bitmap indexes that accelerate git rev-list operations:
# Enable bitmaps in config
git config repack.writeBitmaps true
# Repack with bitmaps
git repack -adb
# The .bitmap file speeds up history queries
ls .git/objects/pack/*.bitmap
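The idea behind reachability bitmaps can be sketched with Python integers used as bit sets. This is a conceptual model with a made-up three-commit history, not the on-disk .bitmap format; `build_bitmaps` and `objects_to_send` are hypothetical names.

```python
def build_bitmaps(history, bit_of):
    """history: (commit, parents, objects) tuples, parents listed before children."""
    bitmaps = {}
    for commit, parents, objects in history:
        bits = 1 << bit_of[commit]
        for obj in objects:
            bits |= 1 << bit_of[obj]
        for parent in parents:
            bits |= bitmaps[parent]  # everything the parent reaches
        bitmaps[commit] = bits
    return bitmaps


def objects_to_send(bitmaps, bit_of, want, have):
    """Objects reachable from `want` but not from `have`, via one AND-NOT."""
    diff = bitmaps[want] & ~bitmaps[have]
    return sorted(name for name, i in bit_of.items() if (diff >> i) & 1)


names = ["c1", "c2", "c3", "tree1", "tree2", "tree3", "blobA", "blobB", "blobC"]
bit_of = {name: i for i, name in enumerate(names)}
history = [
    ("c1", [], ["tree1", "blobA"]),
    ("c2", ["c1"], ["tree2", "blobA", "blobB"]),
    ("c3", ["c2"], ["tree3", "blobA", "blobC"]),
]

bitmaps = build_bitmaps(history, bit_of)
print(objects_to_send(bitmaps, bit_of, want="c3", have="c1"))
```

Without bitmaps, answering "what does the client need?" requires walking every commit and tree between the two points; with them it becomes a few bitwise operations on precomputed sets.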
Production Failure Scenarios + Mitigations
| Scenario | Symptoms | Mitigation |
|---|---|---|
| Corrupted pack file | "error: bad packed object" | Move the damaged pack out of .git/objects/pack/, salvage readable objects with git unpack-objects -r < pack-file, then fetch the missing objects from the remote or a fresh clone |
| Deep delta chains | Slow git log or checkout | Repack with git repack -a -d -f --depth=50 to recompute deltas with a bounded chain length |
| Missing base object | "fatal: bad object" during unpack | Fetch from remote; the base may exist only in another pack |
| Pack file too large | Memory issues during repack | Use git repack --window-memory=1g to limit memory |
| Incomplete fetch | "error: packfile does not match index" | Move the affected .pack and .idx aside and re-fetch; do not delete every pack unless all local work also exists on the remote |
Trade-offs
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Loose objects | Simple, fast individual access | Wasteful for many similar objects |
| Pack files | Excellent compression, fast enumeration | Slower individual object access |
| Delta compression | 90%+ space savings for text | CPU cost for delta creation and resolution |
| Thin packs | Minimal network transfer | Requires receiver to have base objects |
| Pack bitmaps | Fast history queries | Additional disk space, repack time |
Implementation Snippets
# Check object database statistics
git count-objects -vH
# Output:
# count: 1234 (loose objects)
# size: 5.6M (loose object size)
# in-pack: 56789 (packed objects)
# packs: 3 (number of pack files)
# size-pack: 45.2M (pack file size)
# Verify pack integrity
git verify-pack -v .git/objects/pack/pack-*.idx
# List objects in a pack, sorted by size
git verify-pack -v .git/objects/pack/pack-*.idx | sort -k3 -n -r | head -20
# Repack with maximum compression
git repack -a -d -f --depth=250 --window=250
# Repack without delta compression (faster to write, much larger on disk)
git repack -a -d -F --window=0 --depth=0
# Create a pack with bitmaps
git repack -a -d -b
# Garbage collect aggressively
git gc --aggressive --prune=now
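# Find largest objects in the repository
# (the explicit format is required: %(rest) carries the path printed by rev-list)
git rev-list --objects --all | \
git cat-file --batch-check='%(objectname) %(objecttype) %(objectsize) %(rest)' | \
sort -k3 -n -r | head -20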
# Check delta chain depth (only deltified entries have 7 fields; field 6 is the depth)
git verify-pack -v .git/objects/pack/pack-*.idx | \
awk 'NF==7 {print $6}' | sort -n | uniq -c | sort -n
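For reporting beyond what a shell pipeline offers, the same output can be parsed in Python. The sketch below runs on a made-up sample shaped like `git verify-pack -v` output (five fields per full object, seven per deltified object, followed by summary lines); the function names and sample values are illustrative.

```python
from collections import Counter

# Made-up sample in the shape of `git verify-pack -v` output.
IDS = [c * 40 for c in "abcde"]
SAMPLE = "\n".join([
    f"{IDS[0]} commit 234 160 12",
    f"{IDS[1]} tree   120 115 172",
    f"{IDS[2]} blob   5432 2100 287",
    f"{IDS[3]} blob   40 52 2387 1 {IDS[2]}",
    f"{IDS[4]} blob   35 47 2439 2 {IDS[3]}",
    "non delta: 3 objects",
    "chain length = 1: 1 object",
    "chain length = 2: 1 object",
    ".git/objects/pack/pack-abc123.pack: ok",
])


def parse_verify_pack(text):
    """Return one dict per object line, skipping the summary lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Object lines have 5 fields, or 7 when the object is a delta.
        if len(fields) not in (5, 7) or len(fields[0]) not in (40, 64):
            continue
        entries.append({
            "oid": fields[0],
            "type": fields[1],
            "size": int(fields[2]),
            "size_in_pack": int(fields[3]),
            "offset": int(fields[4]),
            "depth": int(fields[5]) if len(fields) == 7 else 0,
        })
    return entries


def depth_histogram(entries):
    """Count objects at each delta depth (0 means stored in full)."""
    return Counter(e["depth"] for e in entries)


entries = parse_verify_pack(SAMPLE)
for depth, count in sorted(depth_histogram(entries).items()):
    print(f"depth {depth}: {count} objects")
largest = max(entries, key=lambda e: e["size"])
print(largest["type"], largest["size"])
```

To use it on a real repository, feed it the captured output of the verify-pack command instead of `SAMPLE`.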
Observability Checklist
- Monitor: Loose object count (git count-objects -v)
- Track: Pack file sizes and count over time
- Alert: Delta chain depth exceeding 50 (causes slow access)
- Verify: Pack integrity with git verify-pack after repacking
- Audit: Largest objects in repository (may need Git LFS migration)
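The first two checklist items can be automated with a small parser. The Python sketch below reads text in the format of `git count-objects -v` (without -H, so sizes are plain KiB integers) and flags counts above configurable thresholds. The defaults mirror Git's `gc.auto` (6700) and `gc.autoPackLimit` (50) settings; the sample numbers and function names are made up for illustration.

```python
SAMPLE = """\
count: 1234
size: 5734
in-pack: 56789
packs: 3
size-pack: 46285
prune-packable: 0
garbage: 0
size-garbage: 0
"""


def parse_count_objects(text: str) -> dict[str, int]:
    """Turn 'key: value' lines into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def check_thresholds(stats, max_loose=6700, max_packs=50):
    """Return human-readable warnings; an empty list means nothing to report."""
    warnings = []
    if stats.get("count", 0) > max_loose:
        warnings.append(f"{stats['count']} loose objects (limit {max_loose})")
    if stats.get("packs", 0) > max_packs:
        warnings.append(f"{stats['packs']} pack files (limit {max_packs})")
    if stats.get("garbage", 0) > 0:
        warnings.append(f"{stats['garbage']} garbage files in the object database")
    return warnings


stats = parse_count_objects(SAMPLE)
print(stats["count"], stats["packs"])
print(check_thresholds(stats))
```

In a monitoring job, the text would come from running the command in the repository (for example through `subprocess.run`) and the warnings would be forwarded to whatever alerting system is in place.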
Security/Compliance Notes
- Pack files contain all historical data — deleted files may still exist in packs
- Running git gc --prune=now removes unreachable objects but doesn’t overwrite disk blocks
- For true secret removal, see Removing Sensitive Data from History
- Pack files are not encrypted — anyone with filesystem access can extract objects
Common Pitfalls / Anti-Patterns
- Running git gc too frequently: wastes CPU; Git auto-gc is usually sufficient
- Ignoring large objects in packs: they bloat every clone; migrate to Git LFS
- Disabling auto-gc: leads to excessive loose objects and slow operations
- Not pruning after history rewrite: old objects remain in packs until pruned
- Assuming git rm deletes data: it only removes the file from the working tree and index; the objects persist in history
Quick Recap Checklist
- Loose objects are individual zlib-compressed files
- Pack files store multiple objects with delta compression
- Delta chains save space but add decompression cost
- Thin packs minimize network transfer during fetch
- Pack bitmaps accelerate history queries
- git gc converts loose objects to packs automatically
- git repack gives fine-grained control over packing
- Deleted files may still exist in pack files
Interview Q&A
How does delta compression work in Git?
Git finds similar objects (usually different versions of the same file) and stores one as a full base object. The other versions are stored as delta instructions: byte-level copy/insert commands that transform the base into the target. This is similar to xdelta or bsdiff. Delta chains can be deep (A → B → C → D), but Git limits depth (pack.depth, default 50) to balance compression ratio with access speed.
What is the difference between git repack and git gc?
git repack only creates new pack files from loose objects and/or existing packs. git gc is a higher-level command that runs repack, prunes unreachable objects, expires reflogs, and runs other maintenance tasks. Think of repack as a tool and gc as a maintenance workflow that uses repack.
What does the server send during git fetch?
The server creates a thin pack: a pack file containing deltas whose bases are omitted because the client reported, during negotiation, that it already has them. This minimizes network transfer. The client then "thickens" the pack by appending the missing bases from its local object store (git index-pack --fix-thin), so the pack it stores is self-contained. If a required base is not available locally, the fetch fails with an error.
How do you find the largest objects in a repository?
Use git rev-list --objects --all | git cat-file --batch-check='%(objectname) %(objecttype) %(objectsize) %(rest)' | sort -k3 -n -r | head -20. This lists all objects reachable from any ref, shows their sizes, and sorts by size descending. The explicit format is needed so that the path printed by rev-list is passed through as %(rest) instead of being treated as part of the object name. For objects inside pack files, use git verify-pack -v .git/objects/pack/*.idx | sort -k3 -n -r | head -20.
Object Storage Architecture (Clean)
graph TD
DB[".git/objects/"] --> LOOSE["Loose Objects"]
DB --> PACK_DIR["pack/"]
LOOSE --> L_FORMAT["zlib-compressed files"]
LOOSE --> L_PATH["Path: .git/objects/XX/YYYY..."]
LOOSE --> L_ACCESS["Fast individual access"]
PACK_DIR --> IDX[".idx (index file)"]
PACK_DIR --> PACK[".pack (object data)"]
PACK_DIR --> REV[".rev (reverse index)"]
PACK --> DELTA_CHAIN["Delta Chain"]
DELTA_CHAIN --> BASE["Base object (full copy)"]
BASE --> D1["Delta 1 (changes)"]
D1 --> D2["Delta 2 (changes)"]
Production Failure: Pack File Corruption
Scenario: Incomplete clone with missing deltas
# Symptoms
$ git log --oneline
error: Could not read abc123...
fatal: Failed to traverse parents of commit def456...
$ git fsck --full
error: packfile .git/objects/pack/pack-abc.pack does not match index
error: def456...: object missing
# Root cause: Network interruption during clone/fetch left pack file
# incomplete, or disk corruption damaged pack data
# Recovery steps:
# 1. Identify corrupted pack
ls -la .git/objects/pack/
# Note the pack file names
# 2. Move the corrupted pack aside (keep it until recovery is verified;
#    deleting every pack would also destroy objects that exist only locally)
mkdir -p ../corrupt-pack-backup
mv .git/objects/pack/pack-abc.* ../corrupt-pack-backup/
# 3. Re-fetch from remote
git fetch origin --refetch
# 4. Verify integrity
git fsck --full
# 5. If remote doesn't have the objects (local-only work):
# Restore from backup or another clone
rsync -avz backup-server:/path/to/repo/.git/objects/ .git/objects/
# Prevention:
# - Set transfer.fsckObjects=true so objects are checked as they are received
# - Verify after large fetches: git fsck --connectivity-only
# - Use shallow clones only when full history isn't needed
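A truncated or damaged pack can often be detected cheaply by checking its trailer. The Python sketch below verifies that the final 20 bytes equal the SHA-1 of everything before them, assuming a SHA-1 repository. It is far weaker than git fsck or git verify-pack, which also check every object, and `pack_trailer_ok` is a hypothetical helper name.

```python
import hashlib
import struct


def pack_trailer_ok(data: bytes) -> bool:
    """True if the last 20 bytes equal the SHA-1 of everything before them."""
    if len(data) < 12 + 20 or data[:4] != b"PACK":
        return False
    body, trailer = data[:-20], data[-20:]
    return hashlib.sha1(body).digest() == trailer


# Smallest valid pack: a version-2 header announcing zero objects, plus trailer.
header = struct.pack(">4sII", b"PACK", 2, 0)
empty_pack = header + hashlib.sha1(header).digest()

# Simulate damage: change the object count without updating the trailer.
corrupted = empty_pack[:8] + b"\x00\x00\x00\x01" + empty_pack[12:]
# Simulate an interrupted download: the end of the file is missing.
truncated = empty_pack[:-5]

print(pack_trailer_ok(empty_pack))
print(pack_trailer_ok(corrupted))
print(pack_trailer_ok(truncated))
```

For a real file, read it in binary mode and pass the bytes in; for multi-gigabyte packs a streaming hash would be the better choice than loading the whole file into memory.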
Trade-offs: Loose vs Packed Storage
| Aspect | Loose Objects | Packed Objects |
|---|---|---|
| Performance (read) | Fast for individual objects | Slower per-object (must unpack) |
| Performance (enumerate) | Slow (many filesystem calls) | Fast (single file scan) |
| Disk space | High (no delta compression) | Low (delta compression saves 90%+) |
| Transfer efficiency | Poor (many small files) | Excellent (single pack file) |
| Creation cost | None (created on each commit) | High (CPU-intensive repacking) |
| Network transfer | Inefficient for fetch/clone | Optimized with thin packs |
| Corruption impact | Single object lost | Entire pack may be unreadable |
| Best for | Active repos with few objects | Large repos, archives, servers |
Rule of thumb: Let Git manage this automatically. Manual repacking is only needed for very large repos or before archiving.
Implementation: Manual Pack Creation and Inspection
# === Create a pack from loose objects ===
# Repack all objects into a single pack
git repack -a -d
# -a: pack all reachable objects into a single pack (not only the loose ones)
# -d: remove packs and loose objects made redundant by the new pack
# === Create pack with maximum compression ===
git repack -a -d -f --depth=250 --window=250
# -f: ignore existing delta info, recompute
# --depth: max delta chain length
# --window: objects to consider for delta
# === Inspect pack contents ===
PACK_FILE=$(ls .git/objects/pack/pack-*.idx | head -1)
git verify-pack -v "$PACK_FILE" | head -20
# Output format (depth and base appear only for deltified objects):
# <sha> <type> <size> <size-in-pack> <offset> [<delta-depth> <base-sha>]
# === Find largest objects in pack ===
git verify-pack -v "$PACK_FILE" | \
grep -v "^$" | \
grep -v "chain" | \
sort -k3 -n -r | \
head -10
# === Check delta chain depths ===
# Deltified entries have 7 fields; field 6 is the chain depth
git verify-pack -v "$PACK_FILE" | \
awk 'NF==7 {print $6}' | \
sort -n | uniq -c | sort -rn
# Shows distribution: how many deltified objects at each delta depth
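# === Create a custom pack with specific objects ===
# Pack everything reachable from main (commits, trees, and blobs);
# plain "git rev-list main" would list commits only
git rev-list --objects main | git pack-objects my-pack
# Creates: my-pack-<sha>.pack and my-pack-<sha>.idx
# === Create thin pack for network transfer ===
# Pack objects reachable from main but not from origin/main.
# --revs reads revision arguments from stdin; --thin may omit delta bases
# that the excluded side already has. /path/to/repo.git is a placeholder.
git pack-objects --revs --thin --stdout <<EOF | ssh remote "git -C /path/to/repo.git unpack-objects"
main
^origin/main
EOF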