Git Object Database and Pack Files

Understanding Git's object storage: loose objects, pack files, delta compression, and how Git optimizes storage for repositories with millions of objects and gigabytes of history.

Reading time: 13 min · Updated: March 31, 2026

Introduction

Every Git repository stores its entire history in the object database under .git/objects/. When you first start a project, objects are stored as individual compressed files — one file per blob, tree, commit, or tag. This “loose object” format is simple and fast for small repositories.

But as your project grows to thousands of commits and millions of files, storing each object as a separate file becomes inefficient. Filesystem overhead, disk space waste, and slow enumeration become real problems. Git solves this with pack files — a binary format that stores multiple objects together with delta compression between similar objects.

Understanding the object database and pack file system is essential for managing large repositories, optimizing CI/CD clone times, and debugging storage issues. This article explains how Git stores objects, how pack files work, and how to optimize your repository’s storage.

When to Use / When Not to Use

When to understand pack files:

  • Managing large repositories with long histories
  • Optimizing clone and fetch times in CI/CD pipelines
  • Debugging repository size issues
  • Understanding git gc and git repack behavior
  • Building Git server infrastructure

When not to manipulate pack files directly:

  • Daily development — Git manages packs automatically
  • Small repositories — loose objects are fine
  • When unsure — use git gc instead of manual repacking

Core Concepts

Git’s object database has two storage modes:


graph TD
    OBJ[".git/objects/"] --> LOOSE["Loose Objects\none file per object"]
    OBJ --> PACK["Pack Files\nmultiple objects per file"]

    LOOSE --> L1["zlib-compressed\nindividual files"]
    LOOSE --> L2["path: .git/objects/ab/cdef...\n(2-char prefix directory)"]

    PACK --> P1[".git/objects/pack/\npack-<hash>.pack\npack-<hash>.idx"]
    PACK --> P2["delta-compressed\nbetween similar objects"]
    PACK --> P3["reverse-index for\nfast lookup"]

Loose objects are simple: each object is zlib-compressed and stored in a file named by its SHA-1 hash. The first two characters form a subdirectory.

Pack files are complex: they store multiple objects in a single binary file, with delta compression between similar objects (e.g., consecutive versions of the same file), and an index file for fast random access.

Architecture or Flow Diagram


flowchart LR
    COMMIT["New Commit"] -->|creates| NEW_OBJ["New Objects\n(blobs, trees, commit)"]
    NEW_OBJ -->|stored as| LOOSE["Loose Objects\n.git/objects/XX/"]

    GC["git gc / git repack"] -->|collects| MANY["Many Loose Objects"]
    MANY -->|delta compression| DELTA["Delta Chains\nbase → delta → delta"]
    DELTA -->|writes| PACK["Pack File\n.pack + .idx"]
    PACK -->|replaces| OLD["Old Loose Objects\n(deleted)"]

    FETCH["git fetch"] -->|receives| THIN["Thin Pack\n(deltas against missing bases)"]
    THIN -->|resolves| COMPLETE["Complete Pack\n(all bases present)"]

The flow shows how objects start as loose files, get packed during garbage collection, and how fetch operations use thin packs for efficient network transfer.

Step-by-Step Guide / Deep Dive

Loose Object Format

Each loose object is stored as:


zlib(<object type> <size>\0<object content>)

# Create a loose object
echo "hello" | git hash-object -w --stdin
# Output: ce013625030ba8dba906f756967f9e9ca394464a

# The file is at .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
ls -la .git/objects/ce/

# Inspect the raw compressed content
python3 -c "
import zlib, sys
with open('.git/objects/ce/013625030ba8dba906f756967f9e9ca394464a', 'rb') as f:
    data = zlib.decompress(f.read())
    print(repr(data))
"
# Output: b'blob 6\x00hello\n'
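
The same construction can be reproduced end-to-end in Python — a minimal sketch using only the standard library (no Git required) that builds the header, hashes it, and derives the loose-object path:

```python
import hashlib
import zlib

def loose_object(content: bytes, obj_type: str = "blob"):
    """Build the canonical loose-object representation of `content`.

    Returns (sha1_hex, relative_path, compressed_bytes), mirroring
    what `git hash-object -w` writes under .git/objects/.
    """
    # Header: "<type> <size>\0", prepended to the raw content
    store = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    sha = hashlib.sha1(store).hexdigest()
    # The first two hex characters become the subdirectory name
    path = f".git/objects/{sha[:2]}/{sha[2:]}"
    return sha, path, zlib.compress(store)

sha, path, _ = loose_object(b"hello\n")
print(sha)   # ce013625030ba8dba906f756967f9e9ca394464a
print(path)  # .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

Note that the hash covers the header as well as the content, which is why an empty blob and an empty tree have different IDs.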

Pack File Format

A pack file (.pack) contains:

  1. Header: PACK signature, version number, object count
  2. Objects: Each object is either:
    • A full (base) object: type + size + zlib-compressed data
    • A delta object: type + size + base object reference + delta instructions
  3. Trailer: SHA-1 checksum of the entire pack
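
The fixed 12-byte header is simple enough to parse directly; a short Python sketch (the signature, big-endian version, and object count are what the pack format specifies — the synthetic bytes below are illustrative):

```python
import struct

def parse_pack_header(data: bytes):
    """Parse the 12-byte header of a .pack file.

    Layout: 4-byte "PACK" signature, 4-byte big-endian version
    (2 or 3), 4-byte big-endian object count.
    """
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a pack file")
    return version, count

# Synthetic header: version 2, 1234 objects
header = b"PACK" + (2).to_bytes(4, "big") + (1234).to_bytes(4, "big")
print(parse_pack_header(header))  # (2, 1234)
```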

The index file (.idx) provides:

  • Sorted list of object SHAs with their offsets in the pack
  • Fan-out table for binary search
  • Pack checksum
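
The fan-out table is easy to model: entry b holds the number of object IDs whose first byte is ≤ b, so a lookup only needs to binary-search the slice between fanout[b-1] and fanout[b]. A toy sketch with synthetic short IDs (the real .idx stores full 20-byte hashes):

```python
import bisect

def build_fanout(sorted_shas):
    """fanout[b] = number of object IDs whose first byte is <= b."""
    fanout = [0] * 256
    for sha in sorted_shas:
        fanout[sha[0]] += 1
    for b in range(1, 256):
        fanout[b] += fanout[b - 1]  # cumulative counts
    return fanout

def find(sorted_shas, fanout, target):
    """Binary-search only the bucket selected by the first byte."""
    lo = fanout[target[0] - 1] if target[0] else 0
    hi = fanout[target[0]]
    i = bisect.bisect_left(sorted_shas, target, lo, hi)
    return i if i < hi and sorted_shas[i] == target else None

# Six fake 2-byte object IDs across three first-byte buckets
shas = sorted(bytes([b, x]) for b in (0x00, 0xCE, 0xFF) for x in (1, 2))
fan = build_fanout(shas)
print(find(shas, fan, bytes([0xCE, 2])))  # 3 (position in sorted list)
```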

# List pack files
ls -lh .git/objects/pack/
# Output:
# pack-abc123.idx  (index - fast lookup)
# pack-abc123.pack (data - compressed objects)
# pack-abc123.rev  (reverse index - offset to SHA)

# Inspect pack contents
git verify-pack -v .git/objects/pack/pack-abc123.idx
# Output per object:
# <sha> <type> <size> <packed-size> <offset> [<depth> <base-sha>]
# abc123... commit 234  230 12
# def456... tree   120  115 242 1 abc999...   (delta, depth 1)
# 789ghi... blob   5432 128 357 2 def456...   (delta, depth 2)
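
This output is easy to post-process; a hedged Python sketch that turns verify-pack per-object lines into records (non-delta lines have five fields, delta lines append depth and base SHA; the sample text is illustrative):

```python
def parse_verify_pack(text: str):
    """Parse `git verify-pack -v` per-object lines into dicts.

    Skips chain-length summary lines and the trailer by requiring
    a known object type in the second field.
    """
    objects = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[1] not in ("commit", "tree", "blob", "tag"):
            continue  # not a per-object line
        objects.append({
            "sha": fields[0],
            "type": fields[1],
            "size": int(fields[2]),
            "packed": int(fields[3]),
            "offset": int(fields[4]),
            "depth": int(fields[5]) if len(fields) > 5 else 0,
        })
    return objects

sample = """\
abc123 commit 234 230 12
def456 tree 120 115 242 1 abc999
chain length = 1: 1 object
"""
for o in parse_verify_pack(sample):
    print(o["type"], o["size"], o["depth"])
```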

Delta Compression

Delta compression is Git’s key storage optimization. Instead of storing full copies of similar objects, Git stores:

  1. Base object: A full copy (usually the oldest version)
  2. Delta objects: Instructions to transform the base into the target

graph LR
    BASE["Base Object\nv1 of file.py\n(full copy, 5KB)"] -->|delta| D1["Delta 1\nv1 → v2\n(changes only, 200B)"]
    D1 -->|delta| D2["Delta 2\nv2 → v3\n(changes only, 150B)"]
    D2 -->|delta| D3["Delta 3\nv3 → v4\n(changes only, 180B)"]

This can reduce storage by 90%+ for text files that change incrementally. The delta depth matters — deep chains require more decompression work to reconstruct an object.
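
The copy/insert instruction model can be sketched with a simplified delta format — real pack deltas use a compact binary opcode encoding, but representing each instruction as a tuple keeps the idea visible:

```python
def apply_delta(base: bytes, instructions) -> bytes:
    """Reconstruct a target from a base plus copy/insert instructions.

    ("copy", offset, length) copies a span from the base;
    ("insert", data) appends literal new bytes. These mirror the two
    opcode families in Git's binary delta encoding, minus the packing.
    """
    out = bytearray()
    for op in instructions:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:  # "insert"
            out += op[1]
    return bytes(out)

v1 = b"def greet():\n    print('hello')\n"
# v2 changes only the printed string; everything else is copied
delta = [
    ("copy", 0, 24),           # "def greet():\n    print('"
    ("insert", b"goodbye"),    # the only new bytes
    ("copy", 29, 3),           # "')\n"
]
v2 = apply_delta(v1, delta)
print(v2.decode())
```

Reconstructing a deep chain repeats this step once per link, which is why delta depth translates directly into access cost.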

Thin Packs

When fetching from a remote, Git sends “thin packs” that contain deltas against objects the receiver already has. This minimizes network transfer:


# The server sends deltas against objects it knows you have
git fetch origin
# Output: remote: Compressing objects: 100% (15/15), done.

# Your Git resolves the deltas against local objects
# and creates a complete pack file

Pack Bitmaps

For very large repositories, Git can generate bitmap indexes that accelerate git rev-list operations:


# Enable bitmaps in config
git config repack.writeBitmaps true

# Repack with bitmaps
git repack -adb

# The .bitmap file speeds up history queries
ls .git/objects/pack/*.bitmap
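
The idea behind bitmaps can be modeled with plain integers as bitsets: each selected commit stores a bitmap whose i-th bit marks whether object i (by pack position) is reachable from it, so reachability queries become bitwise OR instead of graph walks. A toy sketch — the real .bitmap format uses EWAH-compressed bitmaps, not raw integers:

```python
def reachable_count(bitmaps, commits):
    """Count objects reachable from any of `commits`.

    `bitmaps` maps commit name -> int used as a bitset over pack
    positions; set union is bitwise OR, cardinality is a popcount.
    """
    union = 0
    for c in commits:
        union |= bitmaps[c]
    return bin(union).count("1")

# Toy pack with 6 objects at positions 0..5
bitmaps = {
    "A": 0b000111,  # A reaches objects 0, 1, 2
    "B": 0b011110,  # B reaches objects 1, 2, 3, 4
}
print(reachable_count(bitmaps, ["A", "B"]))  # 5 (objects 0..4)
```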

Production Failure Scenarios + Mitigations

| Scenario | Symptoms | Mitigation |
|---|---|---|
| Corrupted pack file | "error: bad packed object" | Run git unpack-objects from a fresh clone; delete the corrupted pack |
| Deep delta chains | Slow git log or checkout | Repack with git repack -f to recompute delta chains |
| Missing base object | "fatal: bad object" during unpack | Fetch from the remote; the base may exist only in another pack |
| Pack file too large | Memory issues during repack | Use git repack --window-memory=1g to limit memory |
| Incomplete fetch | "error: packfile does not match index" | Delete pack files and re-fetch: rm .git/objects/pack/* && git fetch |

Trade-offs

| Aspect | Advantage | Disadvantage |
|---|---|---|
| Loose objects | Simple, fast individual access | Wasteful for many similar objects |
| Pack files | Excellent compression, fast enumeration | Slower individual object access |
| Delta compression | 90%+ space savings for text | CPU cost for delta creation and resolution |
| Thin packs | Minimal network transfer | Requires receiver to have base objects |
| Pack bitmaps | Fast history queries | Additional disk space and repack time |

Implementation Snippets


# Check object database statistics
git count-objects -vH
# Output:
# count: 1234        (loose objects)
# size: 5.6M         (loose object size)
# in-pack: 56789     (packed objects)
# packs: 3           (number of pack files)
# size-pack: 45.2M   (pack file size)

# Verify pack integrity
git verify-pack -v .git/objects/pack/pack-*.idx

# List objects in a pack, sorted by size
git verify-pack -v .git/objects/pack/pack-*.idx | sort -k3 -n -r | head -20

# Repack with maximum compression
git repack -a -d -f --depth=250 --window=250

# Repack without delta compression (faster, but larger packs)
git repack -a -d --window=0

# Create a pack with bitmaps
git repack -a -d -b

# Garbage collect aggressively
git gc --aggressive --prune=now

# Find largest objects in the repository
git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
  sort -k3 -n -r | head -20

# Check delta chain depth
git verify-pack -v .git/objects/pack/pack-*.idx | \
  awk 'NF==7 {print $6}' | sort -n | uniq -c | sort -n
# (delta entries have 7 fields, ending in depth and base SHA)

Observability Checklist

  • Monitor: Loose object count (git count-objects -v)
  • Track: Pack file sizes and count over time
  • Alert: Delta chain depth exceeding 50 (causes slow access)
  • Verify: Pack integrity with git verify-pack after repacking
  • Audit: Largest objects in repository (may need Git LFS migration)
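
For monitoring, the plain `git count-objects -v` output (without `-H`) parses cleanly into metrics; a small sketch — the alert threshold below is illustrative, and note that Git reports `size` and `size-pack` in KiB:

```python
def parse_count_objects(output: str) -> dict:
    """Turn `git count-objects -v` output into an int-valued dict."""
    stats = {}
    for line in output.splitlines():
        key, _, value = line.partition(":")
        stats[key.strip()] = int(value.strip())
    return stats

# Sample output of `git count-objects -v`
sample = """\
count: 1234
size: 5734
in-pack: 56789
packs: 3
size-pack: 46284
prune-packable: 0
garbage: 0
size-garbage: 0
"""
stats = parse_count_objects(sample)
# Illustrative threshold: many loose objects suggests running git gc
if stats["count"] > 1000:
    print(f"loose objects: {stats['count']} (consider running git gc)")
```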

Security/Compliance Notes

  • Pack files contain all historical data — deleted files may still exist in packs
  • Running git gc --prune=now removes unreachable objects but doesn’t overwrite disk blocks
  • For true secret removal, see Removing Sensitive Data from History
  • Pack files are not encrypted — anyone with filesystem access can extract objects

Common Pitfalls / Anti-Patterns

  • Running git gc too frequently — wastes CPU; Git auto-gc is usually sufficient
  • Ignoring large objects in packs — they bloat every clone; migrate to Git LFS
  • Disabling auto-gc — leads to excessive loose objects and slow operations
  • Not pruning after history rewrite — old objects remain in packs until pruned
  • Assuming git rm deletes data — it only removes from the working tree; objects persist

Quick Recap Checklist

  • Loose objects are individual zlib-compressed files
  • Pack files store multiple objects with delta compression
  • Delta chains save space but add decompression cost
  • Thin packs minimize network transfer during fetch
  • Pack bitmaps accelerate history queries
  • git gc converts loose objects to packs automatically
  • git repack gives fine-grained control over packing
  • Deleted files may still exist in pack files

Interview Q&A

How does delta compression work in Git pack files?

Git finds similar objects (usually different versions of the same file) and stores one as a full base object. Subsequent versions are stored as delta instructions — byte-level copy/insert commands that transform the base into the target. This is similar to xdelta or bsdiff. Delta chains can be deep (A → B → C → D), but Git limits depth to balance compression ratio with access speed.

What's the difference between `git gc` and `git repack`?

git repack only creates new pack files from loose objects and/or existing packs. git gc is a higher-level command that runs repack, prunes unreachable objects, expires reflogs, and runs other maintenance tasks. Think of repack as a tool and gc as a maintenance workflow that uses repack.

Why does `git fetch` sometimes say "Compressing objects" on the server side?

The server creates a thin pack — a pack file containing deltas against objects it believes the client already has. This minimizes network transfer. The client then "thickens" the pack by resolving deltas against its local objects. If the server's assumption is wrong, the fetch fails and retries with a complete pack.

How can you find the largest objects in a Git repository?

Use git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sort -k3 -n -r | head -20. The custom format is needed because rev-list prints "<sha> <path>" lines, and the default --batch-check would treat the whole line as an object name; %(rest) carries the path through. This lists all objects reachable from any ref with their sizes, sorted descending. For sizes as stored inside pack files, use git verify-pack -v .git/objects/pack/*.idx | sort -k3 -n -r | head -20.

Object Storage Architecture


graph TD
    DB[".git/objects/"] --> LOOSE["Loose Objects"]
    DB --> PACK_DIR["pack/"]

    LOOSE --> L_FORMAT["zlib-compressed files"]
    LOOSE --> L_PATH["Path: .git/objects/XX/YYYY..."]
    LOOSE --> L_ACCESS["Fast individual access"]

    PACK_DIR --> IDX[".idx (index file)"]
    PACK_DIR --> PACK[".pack (object data)"]
    PACK_DIR --> REV[".rev (reverse index)"]

    PACK --> DELTA_CHAIN["Delta Chain"]
    DELTA_CHAIN --> BASE["Base object (full copy)"]
    BASE --> D1["Delta 1 (changes)"]
    D1 --> D2["Delta 2 (changes)"]

Production Failure: Pack File Corruption

Scenario: Incomplete clone with missing deltas


# Symptoms
$ git log --oneline
error: Could not read abc123...
fatal: Failed to traverse parents of commit def456...

$ git fsck --full
error: packfile .git/objects/pack/pack-abc.pack does not match index
error: def456...: object missing

# Root cause: Network interruption during clone/fetch left pack file
# incomplete, or disk corruption damaged pack data

# Recovery steps:

# 1. Identify corrupted pack
ls -la .git/objects/pack/
# Note the pack file names

# 2. Remove corrupted pack files
rm -f .git/objects/pack/pack-*.pack
rm -f .git/objects/pack/pack-*.idx
rm -f .git/objects/pack/pack-*.rev

# 3. Re-fetch from remote
git fetch origin --refetch

# 4. Verify integrity
git fsck --full

# 5. If remote doesn't have the objects (local-only work):
#    Restore from backup or another clone
rsync -avz backup-server:/path/to/repo/.git/objects/ .git/objects/

# Prevention:
# - Verify incoming objects: git config transfer.fsckObjects true
# - Verify after large fetches: git fsck --connectivity-only
# - Use shallow clones only when full history isn't needed

Trade-offs: Loose vs Packed Storage

| Aspect | Loose Objects | Packed Objects |
|---|---|---|
| Performance (read) | Fast for individual objects | Slower per object (must resolve deltas) |
| Performance (enumerate) | Slow (many filesystem calls) | Fast (single file scan) |
| Disk space | High (no delta compression) | Low (delta compression saves 90%+) |
| Transfer efficiency | Poor (many small files) | Excellent (single pack file) |
| Creation cost | None (written on each commit) | High (CPU-intensive repacking) |
| Network transfer | Inefficient for fetch/clone | Optimized with thin packs |
| Corruption impact | Single object lost | Entire pack may be unreadable |
| Best for | Active repos with few objects | Large repos, archives, servers |

Rule of thumb: Let Git manage this automatically. Manual repacking is only needed for very large repos or before archiving.

Implementation: Manual Pack Creation and Inspection


# === Create a pack from loose objects ===
# Repack all objects into a single pack
git repack -a -d
# -a: pack everything (not just unreachable)
# -d: delete loose objects after packing

# === Create pack with maximum compression ===
git repack -a -d -f --depth=250 --window=250
# -f: ignore existing delta info, recompute
# --depth: max delta chain length
# --window: objects to consider for delta

# === Inspect pack contents ===
PACK_FILE=$(ls .git/objects/pack/pack-*.idx | head -1)
git verify-pack -v "$PACK_FILE" | head -20
# Output format:
# <sha> <type> <size> <packed-size> <offset> <delta-depth>

# === Find largest objects in pack ===
git verify-pack -v "$PACK_FILE" | \
  grep -v "^$" | \
  grep -v "chain" | \
  sort -k3 -n -r | \
  head -10

# === Check delta chain depths ===
git verify-pack -v "$PACK_FILE" | \
  awk 'NF==7 {print $6}' | \
  sort -n | uniq -c | sort -rn
# Shows distribution: how many objects at each delta depth

# === Create a custom pack with specific objects ===
# Pack all objects reachable from main; rev-list --objects emits the
# trees and blobs too, which pack-objects reads from stdin
git rev-list --objects main | git pack-objects my-pack
# Creates: my-pack-<sha>.pack and my-pack-<sha>.idx

# === Create thin pack for network transfer ===
# Pack the objects the remote lacks, as deltas against objects it has
git rev-list --objects origin/main..main | \
  git pack-objects --thin --stdout | \
  ssh remote "cd /path/to/repo.git && git unpack-objects"

Resources


Related Posts

Git Garbage Collection and Maintenance

Master git gc, git prune, git fsck, and automated repository maintenance. Learn how Git manages object storage, cleans unreachable data, and keeps repositories healthy.

#git #version-control #garbage-collection

Centralized vs Distributed VCS: Architecture, Trade-offs, and When to Use Each

Compare centralized (SVN, CVS) vs distributed (Git, Mercurial) version control systems — their architectures, trade-offs, and when to use each approach.

#git #version-control #svn

Automated Changelog Generation: From Commit History to Release Notes

Build automated changelog pipelines from git commit history using conventional commits, conventional-changelog, and semantic-release. Learn parsing, templating, and production patterns.

#git #version-control #changelog