Git Garbage Collection and Maintenance
Master git gc, git prune, git fsck, and automated repository maintenance. Learn how Git manages object storage, cleans unreachable data, and keeps repositories healthy.
Introduction
Git repositories grow over time. Every commit, branch deletion, and force push leaves behind objects that may become unreachable. Without cleanup, your .git directory accumulates loose objects, stale reflogs, and orphaned packs — slowing down operations and consuming disk space.
Git’s garbage collection system handles this automatically, but understanding how it works lets you optimize for large repositories, recover lost data before it’s pruned, and troubleshoot storage issues. The difference between a snappy 500MB repo and a sluggish 5GB repo often comes down to maintenance practices.
This article covers every aspect of Git’s garbage collection: the automatic triggers, manual commands, pruning policies, and maintenance routines that keep repositories healthy across their entire lifecycle.
When to Use / When Not to Use
When to manage Git garbage collection:
- Repository size is growing unexpectedly
- CI/CD clone times are increasing
- After history rewriting (rebase, filter-branch)
- Before archiving or backing up repositories
- Troubleshooting “bad object” errors
When not to intervene:
- Small, active repositories — auto-gc handles them
- During active development — let Git manage itself
- When unsure about pruning — unreachable objects may be needed
Core Concepts
Git’s garbage collection does three jobs: it packs loose objects into pack files, expires old reflog entries, and prunes unreachable objects:
graph TD
REPO["Repository"] -->|accumulates| UNREACH["Unreachable Objects\n(commits, blobs, trees)"]
REPO -->|accumulates| LOOSE["Loose Objects\nindividual files"]
REPO -->|accumulates| STALE["Stale Reflogs\nexpired entries"]
AUTO["Auto GC\n(triggered by operations)"] -->|checks| THRESHOLD["Threshold Check\nloose objects > gc.auto"]
THRESHOLD -->|exceeded| PACK["Repack Objects\ninto pack files"]
THRESHOLD -->|not exceeded| SKIP["Skip GC\ncontinue normally"]
MANUAL["Manual GC\ngit gc"] -->|runs| PRUNE["Prune Unreachable\nobjects older than gc.pruneExpire"]
MANUAL -->|runs| REFLOG["Expire Reflogs\nolder than gc.reflogExpire"]
MANUAL -->|runs| PACK
Auto GC runs silently during normal operations when loose object count exceeds gc.auto (default: 6700). Manual GC gives you full control over packing, pruning, and reflog expiration.
Architecture or Flow Diagram
flowchart LR
COMMITS["New Commits"] -->|create| OBJECTS["New Objects\nloose files"]
OBJECTS -->|count exceeds| TRIGGER["gc.auto threshold\n(default: 6700)"]
TRIGGER -->|triggers| AUTO_GC["Auto GC\n(git gc --auto)"]
AUTO_GC -->|packs| PACKS["Pack Files\ncompressed objects"]
AUTO_GC -->|removes| LOOSE_DEL["Deleted Loose Objects"]
REBASE["git rebase"] -->|creates| ORPHAN["Orphaned Objects\nunreachable commits"]
ORPHAN -->|after| EXPIRY["gc.pruneExpire\n(default: 14 days)"]
EXPIRY -->|pruned| GONE["Permanently Deleted"]
FSCK["git fsck"] -->|finds| DANGLING["Dangling Objects\nnot reachable from any ref or reflog"]
DANGLING -->|older than gc.pruneExpire| PRUNED["Pruned by gc"]
Step-by-Step Guide / Deep Dive
Automatic Garbage Collection
Git runs git gc --auto after certain operations (commit, merge, fetch) when loose object count exceeds the threshold:
# Check current gc settings (prints nothing when the key is unset)
git config --get gc.auto
# Built-in default when unset: 6700
git config --get gc.autopacklimit
# Built-in default when unset: 50
# Check if auto-gc will trigger
git count-objects -v
# Output:
# count: 1234 (loose objects)
# in-pack: 56789 (packed objects)
# packs: 3 (number of packs)
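Auto-gc packs loose objects and, like a manual git gc, expires old reflog entries and prunes unreachable objects older than gc.pruneExpire. It never removes anything inside that grace period; pruning younger objects requires an explicit git gc --prune=<date>.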
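The threshold check described above can be approximated from the output of git count-objects -v. This is a minimal sketch, assuming the default limits of 6700 loose objects and 50 packs; the function name would_auto_gc is hypothetical. Git itself estimates the loose count by sampling a single object directory rather than counting exactly, so results near the threshold may differ from what Git decides.

```shell
# would_auto_gc [LOOSE_LIMIT] [PACK_LIMIT]   (hypothetical helper)
# Reads `git count-objects -v` output on stdin and prints "yes" when the
# loose-object count or the pack count exceeds its limit, otherwise "no".
# Defaults mirror gc.auto (6700) and gc.autoPackLimit (50).
# A LOOSE_LIMIT of 0 disables the check entirely, as gc.auto=0 does.
would_auto_gc() {
  awk -v loose_limit="${1:-6700}" -v pack_limit="${2:-50}" '
    $1 == "count:" { loose = $2 }
    $1 == "packs:" { packs = $2 }
    END {
      trigger = "no"
      if (loose_limit > 0) {
        if (loose > loose_limit) trigger = "yes"
        if (pack_limit > 0 && packs > pack_limit) trigger = "yes"
      }
      print trigger
    }'
}

# Usage inside a repository:
#   git count-objects -v | would_auto_gc
```

Passing the limits as arguments makes it easy to try out a tuned gc.auto value before committing it to configuration.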
Manual Garbage Collection
# Standard garbage collection
git gc
# Output:
# Enumerating objects: 12345, done.
# Counting objects: 100% (12345/12345), done.
# Delta compression using up to 8 threads
# Compressing objects: 100% (5678/5678), done.
# Writing objects: 100% (12345/12345), done.
# Total 12345 (delta 8901), reused 12345 (delta 8901)
# Aggressive garbage collection (better compression, slower)
git gc --aggressive
# Prune immediately (dangerous — removes all unreachable objects)
git gc --prune=now
# Disable pruning entirely
git gc --prune=never
Pruning Unreachable Objects
Unreachable objects are commits, trees, or blobs not referenced by any branch, tag, or reflog:
# Find unreachable objects
git fsck --unreachable
# Output:
# unreachable commit abc123...
# unreachable blob def456...
# Find dangling objects (unreachable AND not referenced by any other object; this is fsck's default report)
git fsck --dangling
# Prune objects older than 7 days
git prune --expire=7.days.ago
# Expire reflog entries older than 30 days
git reflog expire --expire=30.days.ago --all
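Before pruning, it can help to see how much unreachable data there is and of what kind. The sketch below tallies the lines printed by git fsck --unreachable by object type; summarize_unreachable is a hypothetical helper name, and it only reads text on stdin, so it never modifies the repository.

```shell
# summarize_unreachable   (hypothetical helper)
# Reads `git fsck --unreachable` output on stdin and prints one
# "<type> <count>" line per object type, sorted by type name.
summarize_unreachable() {
  awk '$1 == "unreachable" { n[$2]++ }
       END { for (t in n) print t, n[t] }' | sort
}

# Usage inside a repository:
#   git fsck --unreachable | summarize_unreachable
```

A large commit count after a history rewrite is expected; a large blob count often points at big files that were added and then removed.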
Repository Health Checks
# Full integrity check
git fsck --full
# Output:
# Checking object directories: 100% (256/256)
# Checking objects: 100% (12345/12345)
# dangling commit abc123...
# Check for missing objects
git fsck --no-dangling --no-reflogs
# Verify pack files
git verify-pack -v .git/objects/pack/*.idx
# Check repository size
git count-objects -vH
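The verbose output of git verify-pack lists every packed object with its type and size, which makes it a convenient starting point for finding what is inflating a repository. A minimal sketch, assuming the usual column order of object ID, type, size; largest_packed_objects is a hypothetical name.

```shell
# largest_packed_objects [N]   (hypothetical helper)
# Reads `git verify-pack -v` output on stdin and prints the N largest
# objects (default 5) as "<size-in-bytes> <type> <object-id>", biggest first.
# Summary lines such as "non delta: ..." are skipped by the type filter.
largest_packed_objects() {
  awk '$2 == "commit" || $2 == "tree" || $2 == "blob" || $2 == "tag" {
         print $3, $2, $1
       }' | sort -k1,1nr | head -n "${1:-5}"
}

# Usage inside a repository:
#   git verify-pack -v .git/objects/pack/*.idx | largest_packed_objects 10
```

To map the reported object IDs back to file paths, compare them against the output of git rev-list --objects --all.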
Git Maintenance Command
Git 2.29 introduced git maintenance run; the scheduling subcommands (start, stop, register, unregister) arrived in Git 2.30:
# Register for automatic maintenance
git maintenance register
# Run enabled maintenance tasks, but only those whose thresholds are met
git maintenance run --auto
# Run specific tasks
git maintenance run --task=gc
git maintenance run --task=loose-objects
git maintenance run --task=incremental-repack
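# Start the background schedule (also registers this repository)
git maintenance start
# Stop the background schedule
git maintenance stop
# Remove this repository from the maintenance list
git maintenance unregister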
Production Failure Scenarios
| Scenario | Symptoms | Mitigation |
|---|---|---|
| Accidental prune | Lost commits after git gc --prune=now | Recover from remote clone or backup; check reflog first |
| GC during CI | Slow CI runs due to gc trigger | Disable auto-gc in CI: git config gc.auto 0 |
| Pack corruption | "bad packed object" errors | Move the corrupt pack aside, then re-fetch or re-clone from the remote |
| Disk space exhaustion | Repository fills disk | Free some space first (gc writes new packs before deleting old ones), then run git gc; migrate large files to LFS |
| Reflog bloat | .git/logs/ consumes GBs | git reflog expire --expire=7.days.ago --all |
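For the "GC during CI" row, auto-gc can also be switched off per command instead of through persistent configuration. The sketch below assumes the CI system exports a non-empty CI environment variable, which is a common convention but not universal; ci_gc_flags is a hypothetical helper name.

```shell
# ci_gc_flags   (hypothetical helper)
# Prints per-command Git options that turn off auto-gc when the CI
# environment variable is non-empty; prints nothing otherwise.
# Assumption: the CI system sets CI (for example CI=true).
ci_gc_flags() {
  if [ -n "${CI:-}" ]; then
    printf '%s\n' '-c gc.auto=0 -c gc.autoDetach=false'
  fi
}

# Usage (the unquoted expansion is intentional so the flags split into words):
#   git $(ci_gc_flags) fetch origin
```

Using git -c keeps the setting out of the repository's config file, so a cached workspace that is later reused outside CI keeps its normal auto-gc behavior.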
Trade-off Analysis
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Auto GC | Zero-config maintenance | May trigger at inconvenient times |
| Aggressive GC | Maximum compression | Very slow on large repos |
| Immediate prune | Frees disk space | Loses recovery safety net |
| 14-day default prune | Recovery window | Delays space reclamation |
| Git maintenance | Scheduled, incremental | Scheduling requires Git 2.30+ |
Implementation Snippets
# Safe cleanup workflow
git reflog expire --expire=30.days.ago --all
git gc --prune=30.days.ago
# Aggressive cleanup (after confirming no needed unreachable objects)
git reflog expire --expire=7.days.ago --all
git gc --aggressive --prune=7.days.ago
# CI/CD optimization: disable auto-gc
git config gc.auto 0
git config gc.autoDetach false
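# Check what would be pruned (dry run; nothing is deleted)
git prune --dry-run --expire=2.weeks.ago
# List everything that only the reflog keeps alive
git fsck --unreachable --no-reflogs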
# Monitor repository health
git count-objects -vH
git fsck --full 2>&1 | grep -E "error|warning"
# Set up scheduled maintenance (Git 2.30+)
git maintenance start
# The gc task honors gc.auto; there is no maintenance.gc.auto key
git config gc.auto 6700
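The grep in the monitoring snippet above shows the raw lines, but a scheduled job usually wants a single verdict. This sketch reduces git fsck output to one severity word; fsck_severity is a hypothetical name, and the line prefixes it matches (error, fatal, missing, broken link, warning) are the ones fsck commonly prints, not an exhaustive list.

```shell
# fsck_severity   (hypothetical helper)
# Reads `git fsck` output on stdin and prints the worst severity seen:
#   "error"   for error/fatal lines, missing objects, or broken links
#   "warning" when only warning lines are present
#   "ok"      otherwise (dangling/unreachable notices are informational)
fsck_severity() {
  awk '
    /^(error|fatal|missing|broken link)/ { level = 2 }
    /^warning/ { if (level < 2) level = 1 }
    END {
      if (level == 2) print "error"
      else if (level == 1) print "warning"
      else print "ok"
    }'
}

# Usage inside a repository:
#   git fsck --full 2>&1 | fsck_severity
```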
Observability Checklist
- Monitor: Loose object count (git count-objects -v)
- Track: Repository size growth over time
- Alert: Pack file count exceeding gc.autopacklimit
- Verify: Integrity with periodic git fsck --full
- Audit: Reflog size per branch
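For the reflog audit item, the reflog files themselves can simply be measured. A minimal sketch, assuming the default files-based ref storage where reflogs live under .git/logs; reflog_sizes is a hypothetical helper name, and the directory should be passed without a trailing slash.

```shell
# reflog_sizes LOGS_DIR   (hypothetical helper)
# Prints "<bytes> <path-relative-to-LOGS_DIR>" for every reflog file under
# LOGS_DIR, largest first. In a normal repository LOGS_DIR is .git/logs.
reflog_sizes() {
  find "$1" -type f | while IFS= read -r file; do
    # Arithmetic expansion strips the padding some wc implementations add.
    bytes=$(($(wc -c < "$file")))
    printf '%d %s\n' "$bytes" "${file#"$1"/}"
  done | sort -k1,1nr
}

# Usage from the top of a working tree:
#   reflog_sizes .git/logs | head
```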
Security & Compliance Considerations
- Pruned objects may still be recoverable from disk forensics
- For true data removal, use secure deletion tools after pruning
- GC doesn’t remove objects from remote repositories
- See Removing Sensitive Data from History for secret removal
Common Pitfalls / Anti-Patterns
- Running git gc --prune=now without checking the reflog: loses recovery options
- Disabling auto-gc permanently: leads to excessive loose objects
- Not pruning after a history rewrite: old objects persist until their reflog entries and the prune window expire
- Assuming git gc removes secrets: it only removes unreachable objects, not specific content
Quick Recap Checklist
- Auto GC triggers when loose objects exceed gc.auto (default: 6700)
- Manual git gc packs objects and optionally prunes unreachable ones
- Pruning removes objects not referenced by any ref or reflog
- Default prune expiry is 14 days, which provides a recovery window
- git fsck verifies repository integrity
- git maintenance provides scheduled, incremental maintenance
- Always check reflog before pruning
GC Process Flow (Clean Architecture)
graph TD
REPO["Repository Operations"] -->|generate| LOOSE["Loose Objects"]
REPO -->|generate| REFLOG["Reflog Entries"]
REPO -->|generate| UNREACH["Unreachable Objects"]
AUTO_GC["Auto GC\n(triggered at threshold)"] -->|checks| COUNT["Loose object count\n> gc.auto (6700)?"]
COUNT -->|yes| PACK["Repack objects\ninto pack files"]
COUNT -->|no| SKIP["Skip — no action"]
MANUAL_GC["Manual GC\ngit gc"] --> PACK
MANUAL_GC --> EXPIRE_REFLOG["Expire old reflogs\ngc.reflogExpire"]
EXPIRE_REFLOG --> PRUNE["Prune unreachable objects\ngc.pruneExpire (14 days)"]
PACK --> CLEAN["Clean repository\noptimized storage"]
PRUNE --> CLEAN
Production Failure: Aggressive GC Data Loss
Scenario: Reflog data loss after aggressive pruning
# What happened:
$ git reflog expire --expire=now --all # Drops every reflog entry
$ git gc --prune=now # Removes ALL unreachable objects immediately
$ # Later: need to recover a commit from 3 days ago
$ git reflog
# Empty: the reflog entries were expired before the gc ran!
# Symptoms
$ git log --oneline
# Missing commits that were on deleted branches
$ git fsck --unreachable
# No unreachable objects — they're all gone
# Recovery (limited options):
# 1. Check if remote still has the commits
git fetch origin
git log origin/old-branch --oneline
# 2. Check other clones (colleagues' machines)
# Ask teammates if they have the commits locally
# 3. Check CI/CD logs for commit SHAs
# Some CI systems log commit hashes before building
# 4. If truly lost — the commits are unrecoverable
# This is why --prune=now is dangerous
# Prevention:
# NEVER use --prune=now unless you're certain
# Always use a time window:
git gc --prune=30.days.ago
# Safe cleanup workflow:
# 1. Check what would be pruned
git fsck --unreachable --no-reflogs
# 2. Review reflog for valuable commits
git reflog --all
# 3. Create branches for anything important
git branch save-important <sha>
# 4. Then gc with a safety window
git gc --prune=90.days.ago
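Step 3 of the workflow above gets tedious when many commits need saving. The sketch below turns fsck output into a reviewable list of git branch commands instead of running anything itself; rescue_commands and the rescue/ branch prefix are hypothetical names chosen for illustration.

```shell
# rescue_commands [PREFIX]   (hypothetical helper)
# Reads `git fsck --unreachable --no-reflogs` or `git fsck --dangling`
# output on stdin and prints one `git branch` command per commit found.
# Nothing is executed: review the list, then run the lines you want.
rescue_commands() {
  awk -v prefix="${1:-rescue}" '
    ($1 == "unreachable" || $1 == "dangling") && $2 == "commit" {
      printf "git branch %s/%s %s\n", prefix, substr($3, 1, 12), $3
    }'
}

# Usage inside a repository:
#   git fsck --unreachable --no-reflogs | rescue_commands
```

Feeding it the --unreachable report lists every orphaned commit, including ancestors that a single branch at the tip would already protect, so trimming the list by hand before running it is usually worthwhile.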
Trade-offs: Auto GC vs Manual GC
| Aspect | Auto GC | Manual GC |
|---|---|---|
| Trigger | Automatic (loose objects > gc.auto) | Explicit command |
| Safety | Conservative: prunes only objects older than gc.pruneExpire | Can prune immediately (--prune=now) |
| Performance | Runs during operations (may slow commit/fetch) | Runs on your schedule |
| Disk recovery | Packs objects; keeps unreachable ones inside the grace period | Can free maximum space |
| Configuration | gc.auto, gc.autopacklimit | Full control over all parameters |
| CI/CD impact | Can trigger unexpectedly | Can be disabled or scheduled |
| Best for | Daily development | Maintenance windows, large repos |
| Risk level | Low | Medium to high (depends on flags) |
Recommendation: Keep auto-gc enabled for daily work. Run manual gc monthly or after major operations (rebase, filter-repo) with a safe prune window.
Implementation: GC Configuration Tuning
# === Threshold tuning ===
# When auto-gc triggers (default: 6700 loose objects)
git config gc.auto 10000
# Maximum number of pack files before consolidating (default: 50)
git config gc.autopacklimit 10
# === Pruning windows ===
# How long unreachable objects are kept (default: 14 days)
git config gc.pruneExpire 30.days.ago
# Reflog expiry for entries still reachable from the branch tip (default: 90 days)
git config gc.reflogExpire 180.days.ago
# Reflog expiry for entries no longer reachable from the tip (default: 30 days)
git config gc.reflogExpireUnreachable 90.days.ago
# === Pack optimization ===
# Delta search window size (default: 10)
git config pack.window 50
# Maximum delta chain depth (default: 50)
git config pack.depth 250
# Memory limit for delta search
git config pack.windowMemory 512m
# Compression level (0-9, default: -1 = zlib default)
git config pack.compression 6
# === Pack bitmaps (for large repos) ===
# Speeds up git rev-list operations
git config repack.writeBitmaps true
# === CI/CD optimization ===
# Disable auto-gc in CI environments
git config gc.auto 0
git config gc.autodetach false
# === Verify current configuration ===
git config --get-regexp 'gc\.'
git config --get-regexp 'pack\.'
git config --get-regexp 'repack\.'
# === Scheduled maintenance (Git 2.30+) ===
# Register for automatic background maintenance
git maintenance register
# Run maintenance tasks
git maintenance run --auto
# Available tasks: gc, commit-graph, loose-objects, incremental-repack
git maintenance run --task=gc
git maintenance run --task=commit-graph
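The pruning-window settings above overlap rather than add up. A simplified model, not a description of Git's internals: treat a commit as created and orphaned on day 0 (for example, immediately amended), held only by a reflog entry that is no longer reachable from the branch tip. The first gc that can delete it is then the later of the two windows. earliest_prune_day is a hypothetical helper name.

```shell
# earliest_prune_day REFLOG_DAYS PRUNE_DAYS   (hypothetical helper)
# Prints the first day on which a gc run could delete a commit that was
# created and orphaned on day 0, assuming the object is never touched again.
# REFLOG_DAYS models gc.reflogExpireUnreachable, PRUNE_DAYS gc.pruneExpire.
earliest_prune_day() {
  if [ "$1" -gt "$2" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

# Git defaults: 30-day unreachable reflog window, 14-day prune window.
earliest_prune_day 30 14
# Tuned values from the configuration above: 90 and 30 days.
earliest_prune_day 90 30
```

With the defaults the answer is day 30, not day 44: by the time the reflog entry expires, the object is already older than the prune window. Deletion still only happens when a gc actually runs after that day.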
Interview Questions
Standard git gc repacks objects with the default delta window (10) and depth (50), reusing existing deltas where it can. --aggressive widens the delta window to 250 (gc.aggressiveWindow) and discards existing deltas, recomputing them from scratch; this usually produces smaller packs but takes significantly longer. Use it occasionally, not regularly.
Unreachable objects are kept for 14 days by default (gc.pruneExpire = 2.weeks.ago). Objects still referenced from the reflog count as reachable and are kept until the reflog entry expires (90 days for entries reachable from the branch tip, 30 days for entries that are not). Only objects that are both unreachable AND past the prune expiry are deleted.
Auto-gc can trigger unpredictably during clone or fetch, adding minutes to CI runs. CI environments typically use shallow clones that are discarded after the run, so gc provides no benefit. Disable it with git config gc.auto 0 in CI scripts to ensure consistent build times.
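Commits orphaned by a force push are eventually garbage-collected, but only after the reflog lets go of them. When you force-push, the old commits become unreachable from any branch, but they remain in your local reflog. Once those reflog entries expire (30 days by default for entries no longer reachable from the branch tip) AND the commits are older than the prune expiry (default 14 days), git gc will permanently remove them.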
Loose objects are individual files stored under .git/objects/ as single files per object. Pack files are compressed archives that bundle multiple objects together, using delta compression to store differences between similar objects. Git creates pack files during gc to save space. Auto-gc triggers when loose object count exceeds gc.auto (default 6700) and consolidates them into packs.
git count-objects -vH prints a human-readable summary of object counts and storage usage: count (loose objects), size (disk space used by loose objects), in-pack (objects in pack files), packs (number of pack files), and size-pack (disk space used by pack files). A growing count with few packs suggests auto-gc is not keeping up. A packs value approaching gc.autopacklimit signals it's time to run manual gc.
When the number of pack files exceeds gc.autopacklimit (default 50), Git automatically repacks all objects into a single new pack file during gc. This consolidates fragmented packs and reduces file descriptor usage. If you have many small packs, running git gc manually triggers this consolidation.
git gc --auto only runs when the loose object count or pack count exceeds its threshold, and then performs a full gc pass in one go (repack, reflog expiry, prune). git maintenance (run since Git 2.29, scheduling since 2.30) runs scheduled background tasks incrementally: loose object packing, commit-graph updates, and incremental repack. It runs via cron or your system's scheduler, doing small amounts of work per run, which avoids most of the latency spikes that a full gc can produce on large repos.
A dangling commit is an unreachable commit that no branch, tag, reflog entry, or other object references. This commonly happens after rebasing, amending, or resetting, once the matching reflog entries have expired. git fsck --dangling finds them. They survive only until they are older than the prune expiry; the next gc after that removes them. They are recoverable until that point.
git gc --prune=now removes ALL unreachable objects immediately, bypassing the default 14-day recovery window. Objects still referenced from the reflog survive, so the real danger comes when the reflog is gone too: if your reflog entries have expired or you ran git reflog expire first, there is no safety net. Any commits from deleted branches, rebased history, or reset operations that are not on another branch or tag are permanently deleted. Use a time-based prune window like --prune=30.days.ago instead.
git fsck --full performs a full integrity check on all objects in the repository, verifying object hashes (SHA-1 by default), directory structure validity, and referential integrity. It detects corrupted or missing objects, broken refs, and dangling objects. Run it periodically (monthly or after any filesystem issue) and always check the output for errors or warnings.
gc.reflogExpire controls expiry for reflog entries that are still reachable from the branch's current tip (default 90 days). gc.reflogExpireUnreachable controls expiry for entries that are no longer reachable from the tip, such as commits replaced by a rebase, amend, or reset (default 30 days). Setting these appropriately balances the recovery window (longer for important branches) against storage cleanup (shorter for heavily rewritten feature branches).
pack.depth sets the maximum delta chain depth when repacking objects (default 50). A deeper chain can produce smaller packs for runs of similar objects (like successive versions of a file with small changes), but increases CPU usage during repack and object lookup time. --aggressive widens the delta window to 250 but keeps the depth at 50 by default (gc.aggressiveDepth). For most repos, the default is fine; only very large repos with long histories benefit from tuning it.
Use git verify-pack when you suspect pack file corruption specifically. It verifies the .idx index file matches the .pack file and lists all objects in the pack. git fsck checks object integrity across both loose objects and packs, resolving refs and checking object connectivity. Run git verify-pack -v .git/objects/pack/*.idx to see all objects in all packs and spot-check for issues.
The commit-graph is a binary index that accelerates commit lookups, especially git log and reachability queries. git maintenance updates it incrementally. git gc also rewrites it when gc.writeCommitGraph is enabled (the default in recent Git versions), and running git maintenance run --task=commit-graph between gc runs keeps it current. Without it, repos with many commits see noticeably slower log operations.
When you delete a branch, the branch's own reflog is deleted with it; its commits remain referenced only from HEAD's reflog, if you ever had the branch checked out. If you recreate the branch name, a new reflog starts fresh, so git reflog on the new branch only shows new entries. The old commits stay in the repo until the HEAD reflog entries expire and the prune expiry passes. To find them, search HEAD's reflog with git reflog, or run git fsck --unreachable --no-reflogs to list orphaned commits.
Git LFS stores large files externally and stores pointer files in the repo. The actual LFS objects are stored in .git/lfs/objects and managed by LFS itself, not Git gc. Git gc only affects the pointer files (small text files) and regular Git objects. This means repos using LFS can have large LFS storage even when Git gc has cleaned everything. Run git lfs prune separately to clean up old LFS objects.
git reflog expire removes reflog entries themselves (the history of where HEAD pointed). git gc --prune removes the actual repository objects (commits, trees, blobs) that are no longer referenced by any ref or remaining reflog. You need reflog entries to survive long enough to let you recover — but once all reflog entries referencing an object are gone, gc can prune that object. Running reflog expire before gc is a safe cleanup sequence.
A full gc is CPU and memory intensive, re-computing deltas and rewriting pack files. On repos over 10GB, it can take hours and consume tens of gigabytes of temporary disk space. It also holds a gc.pid lock for the duration, which blocks other gc runs while ordinary Git commands keep working. Mitigation: use git maintenance for incremental maintenance, disable auto-gc in CI, run gc during maintenance windows, and use git gc --aggressive sparingly (never as a scheduled task).
Schedule git maintenance run --auto as a background cron job during low-activity hours to keep the repository healthy without impacting developer workflows. Alternatively, use git maintenance start (Git 2.30+), which sets up hourly, daily, and weekly maintenance tasks automatically, including commit-graph updates, loose object packing, and incremental repack. For large monorepos, set git config gc.auto 0 and run git gc manually during scheduled maintenance windows instead of relying on auto-gc triggers. Monitor repository health monthly with git count-objects -vH and git fsck --full to catch issues early. In CI/CD pipelines, disable auto-gc the same way to prevent unpredictable slowdowns during builds.
Further Reading
- Git Internals - Git Objects — Official Pro Git chapter on object storage
- Git Internals - Maintenance and Data Recovery — Official guide on gc, fsck, and recovery
- Git Configuration - gc and cleanup — Full git-gc documentation
- Git Configuration - git maintenance — Automatic maintenance task reference
- Git Book - Branching and Merging — Understanding reflog context
- Atlassian Git Tutorials - Git Internals — Visual walkthrough of packfiles and object store
- GitHub Blog - Deep Git series — Engineering blog posts on Git internals
Conclusion
Git’s garbage collector is the housekeeping mechanism that keeps your repo healthy: it prunes unreachable objects, packs loose files, and optimizes storage. Regular maintenance prevents repository bloat and keeps Git operations fast over years of development.