Removing Sensitive Data from Git History
Using git filter-repo, BFG Repo-Cleaner, and git filter-branch to scrub secrets, passwords, and credentials from Git history. Step-by-step remediation guide.
Introduction
Accidentally committing a secret to Git is a rite of passage for developers. API keys, database passwords, private keys, and tokens end up in commit history, where they persist even after you delete them in a subsequent commit. Git’s object database is effectively append-only: “deleting” a file only creates a new commit without it, while the old commit that contains the secret stays reachable in the history and in every existing clone.
The only way to truly remove sensitive data from Git history is to rewrite the repository’s history, creating new commits that never contained the secret. This process changes every commit hash downstream, requiring all collaborators to re-clone the repository.
This guide covers the three main tools for history rewriting, when to use each, and the complete remediation workflow including secret rotation — because removing from history is only half the solution.
When to Use / When Not to Use
When to rewrite history:
- Secrets committed to any branch
- Large files accidentally committed (before migrating to LFS)
- Compliance requirements for data removal
- Repository cleanup before open-sourcing
When not to rewrite history:
- For shared branches without team coordination
- When secret rotation is sufficient (and faster)
- If you don’t control all clones of the repository
- As a substitute for proper secret management
Core Concepts
History rewriting creates entirely new commits:
graph TD
OLD["Original History\nC1 → C2(secret) → C3 → C4"] -->|rewrite| NEW["New History\nC1' → C2'(clean) → C3' → C4'"]
NEW -->|force push| REMOTE["Remote Repository\n(updated)"]
REMOTE -->|requires| RECLONE["All collaborators\nmust re-clone"]
SECRET["Secret in C2"] -->|rotate first| ROTATE["Rotate Secret\ninvalidate old value"]
ROTATE -->|then| REWRITE["Rewrite History"]
Every commit from the first rewritten one onward gets a new hash, because each commit object records its parent’s hash. Rewriting C2 therefore changes C3 and C4 too.
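The dependency between a commit's hash and its parent can be seen directly with plain Git. The following is a minimal sketch in a throwaway repository; the file names, the fake `API_KEY` value, and the fixed timestamp are assumptions made for the demonstration. It recreates C3 with an identical tree, message, author, and date, changing only the parent, and compares the two hashes. It does not clean anything; it only illustrates why hashes cascade.

```shell
#!/bin/sh
# Demo: a commit's hash depends on its parent's hash.
set -eu

repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"

# Fixed dates, so the parent is the only thing that differs further down.
export GIT_AUTHOR_DATE="1704067200 +0000" GIT_COMMITTER_DATE="1704067200 +0000"

echo "v1" > app.txt;           git add app.txt; git commit -q -m "C1"
echo "API_KEY=hunter2" > .env; git add .env;    git commit -q -m "C2 (secret)"
echo "v2" > app.txt;           git add app.txt; git commit -q -m "C3"

c1=$(git rev-parse HEAD~2)
old_c3=$(git rev-parse HEAD)

# The raw commit object stores the parent's hash as part of its content.
git cat-file -p "$old_c3" | grep '^parent '

# Same tree, message, author, and dates as C3, but parented on C1 instead of C2.
new_c3=$(git commit-tree "$old_c3^{tree}" -p "$c1" -m "C3")

echo "old C3: $old_c3"
echo "new C3: $new_c3"
```

The two printed hashes differ even though the snapshot and metadata are the same, which is the reason a rewrite of C2 cannot leave C3 and C4 untouched.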
Architecture or Flow Diagram
flowchart TD
DISCOVER["Discover Secret in History"] -->|immediately| ROTATE["1. Rotate the Secret\ninvalidate old value"]
ROTATE --> CHOOSE["2. Choose Tool"]
CHOOSE -->|recommended| FILTER_REPO["git filter-repo\nfast, modern, safe"]
CHOOSE -->|large repos| BFG["BFG Repo-Cleaner\nfastest for large repos"]
CHOOSE -->|last resort| FILTER_BRANCH["git filter-branch\nslow, deprecated"]
FILTER_REPO --> REWRITE["3. Rewrite History\nremove secret from all commits"]
BFG --> REWRITE
FILTER_BRANCH --> REWRITE
REWRITE --> VERIFY["4. Verify Removal\ngit log -p, secret scan"]
VERIFY -->|confirmed| FORCE["5. Force Push\ngit push --force-with-lease"]
VERIFY -->|still found| REWRITE
FORCE --> NOTIFY["6. Notify Team\nre-clone required"]
NOTIFY --> PREVENT["7. Prevent Recurrence\npre-commit hooks"]
Step-by-Step Guide / Deep Dive
Step 1: Rotate the Secret First
Before rewriting history, invalidate the compromised credential:
# The secret is already exposed — assume it's compromised
# Rotate it in the service immediately
# Then proceed with history cleanup
Step 2: Using git filter-repo (Recommended)
git filter-repo is the modern, recommended tool:
# Install
pip install git-filter-repo
# Or: brew install git-filter-repo
# Remove a specific file from all history
git filter-repo --path path/to/secret-file --invert-paths
# Remove files matching a pattern
git filter-repo --path-glob '*.env' --invert-paths
# Replace text in all commits
git filter-repo --replace-text <(echo 'OLD_SECRET==>REDACTED')
# Remove blobs larger than a size threshold
git filter-repo --strip-blobs-bigger-than 10M
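When several secrets need scrubbing, the `--replace-text` rules are easier to manage in a file than in an inline `echo`. The sketch below writes such a file, one rule per line. The secret values are made-up placeholders, and the script only prepares the file and prints the command; it does not run `git filter-repo` itself.

```shell
#!/bin/sh
# Sketch: build an expressions file for `git filter-repo --replace-text`.
set -eu

expr_file=$(mktemp)

# One rule per line, in the form <match>==><replacement>.
# A plain string is matched literally; "regex:" switches to a regular expression.
# All values below are fake examples.
cat > "$expr_file" <<'EOF'
hunter2==>REDACTED_PASSWORD
literal:EXAMPLE_TOKEN_0000==>REDACTED_API_KEY
regex:AKIA[0-9A-Z]{16}==>REDACTED_AWS_KEY_ID
EOF

# Count rules that lack an explicit replacement. filter-repo accepts such
# lines and substitutes ***REMOVED***, but being explicit is easier to review.
implicit_rules=$(grep -vc '==>' "$expr_file" || true)

echo "rules without explicit replacement: $implicit_rules"
echo "run inside a fresh clone: git filter-repo --replace-text $expr_file"
```

Because the file contains the live secret values, it belongs outside the repository and should be deleted once the rewrite has been verified.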
Step 3: Using BFG Repo-Cleaner
BFG is optimized for speed on large repositories:
# Install
brew install bfg
# Remove files by name
bfg --delete-files .env
# Remove files by size
bfg --strip-blobs-bigger-than 100M
# Replace text in all blobs
bfg --replace-text passwords.txt
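# Full run against a fresh mirror (bare) clone
# Note: by default BFG leaves the contents of the latest (HEAD) commit untouched,
# so delete the secret in a normal commit and push it before cloning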
git clone --mirror https://github.com/user/repo.git
cd repo.git
bfg --delete-files .env
git reflog expire --expire=now --all
git gc --prune=now --aggressive
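The `git reflog expire` and `git gc` pair matters because a commit that no branch points to is still stored on disk and still recoverable through the reflog. The sketch below illustrates this in a throwaway repository with a fake secret; the branch name `leak` and the helper `exists` are hypothetical names chosen for the demo.

```shell
#!/bin/sh
# Demo: an unreferenced commit survives until reflogs are expired and objects pruned.
set -eu

repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"

echo "v1" > app.txt; git add app.txt; git commit -q -m "C1"
main_branch=$(git symbolic-ref --short HEAD)

# Commit a fake secret on a side branch, then delete that branch.
git checkout -q -b leak
echo "API_KEY=hunter2" > .env; git add .env; git commit -q -m "add secret"
leaked=$(git rev-parse HEAD)
git checkout -q "$main_branch"
git branch -q -D leak

# exists HASH: exit 0 if the commit object is still in the object database.
exists() { git cat-file -e "$1^{commit}" 2>/dev/null; }

# No branch points at the commit any more, but HEAD's reflog still does.
if exists "$leaked"; then before=present; else before=gone; fi

git reflog expire --expire=now --all
git gc --quiet --prune=now

if exists "$leaked"; then after=present; else after=gone; fi
echo "before cleanup: $before, after cleanup: $after"
```

The same reasoning applies to the mirror clone after a BFG run: until both commands have run, the old objects are merely unreferenced, not deleted.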
Step 4: Verify Removal
# Search entire history for the secret
git log -p --all -S 'SECRET_VALUE'
# Search with gitleaks
gitleaks detect --log-opts="--all"
# Check that the file is truly gone
git log --all --full-history -- path/to/secret-file
# Verify no secrets remain
git rev-list --all | xargs git grep -l 'SECRET_VALUE'
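The checks above can be wrapped in a small function that a script or CI job could call. This is a sketch under assumptions: `history_contains` is a hypothetical helper name, the secret is a fake value, and the demonstration runs in a throwaway repository where the secret was "deleted" by an ordinary commit rather than by a rewrite.

```shell
#!/bin/sh
# Sketch: check whether a literal string still appears in reachable history.
set -eu

# history_contains SECRET: exit 0 if any commit reachable from any ref
# added or removed an occurrence of SECRET (pickaxe search, literal match).
history_contains() {
  [ -n "$(git log --all --format=%H -S"$1")" ]
}

# --- throwaway repository for the demonstration ---
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
secret="API_KEY=hunter2"   # fake value

echo "v1" > app.txt;   git add app.txt; git commit -q -m "C1"
echo "$secret" > .env; git add .env;    git commit -q -m "add config"
git rm -q .env;                         git commit -q -m "remove config"

# The current tree looks clean...
if git grep -q -F -e "$secret"; then in_head=yes; else in_head=no; fi
# ...but the history still carries the value.
if history_contains "$secret"; then in_history=yes; else in_history=no; fi

echo "in HEAD: $in_head, in history: $in_history"
```

After a successful rewrite, `history_contains` should fail for the same value. It only inspects reachable commits, so it complements rather than replaces a scanner such as gitleaks.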
Step 5: Force Push
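# git filter-repo removes the origin remote after a rewrite; re-add it before pushing
# Force push with lease (safer than --force); this pushes the current branch only,
# so repeat for each rewritten branch and tag (or use --mirror from a mirror clone)
git push origin --force-with-lease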
# If protected branch, temporarily disable protection
# Then re-enable after push
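What the lease actually protects against is easiest to see with two clones of a local bare repository. The following sketch uses hypothetical collaborators `alice` and `bob` and temporary directories; nothing in it touches a real remote.

```shell
#!/bin/sh
# Demo: --force-with-lease refuses to overwrite a branch that moved since the last fetch.
set -eu

work=$(mktemp -d)
git init -q --bare "$work/remote.git"

# setup_clone NAME: clone the bare remote and set a local identity.
setup_clone() {
  git clone -q "$work/remote.git" "$work/$1" 2>/dev/null
  git -C "$work/$1" config user.name "$1"
  git -C "$work/$1" config user.email "$1@example.com"
}

# Alice pushes the first commit.
setup_clone alice
cd "$work/alice"
echo "v1" > app.txt; git add app.txt; git commit -q -m "C1"
git push -q origin HEAD

# Bob clones and pushes a commit that Alice has not fetched.
setup_clone bob
cd "$work/bob"
echo "bob" > bob.txt; git add bob.txt; git commit -q -m "Bob's work"
git push -q origin HEAD

# Alice rewrites her commit and tries to push over the remote branch.
cd "$work/alice"
git commit -q --amend -m "C1 (rewritten)"
if git push -q --force-with-lease origin HEAD 2>/dev/null; then stale=accepted; else stale=rejected; fi

# After a fetch, her remote-tracking ref matches the remote and the push goes through.
git fetch -q origin
if git push -q --force-with-lease origin HEAD 2>/dev/null; then fresh=accepted; else fresh=rejected; fi

echo "before fetch: $stale, after fetch: $fresh"
```

The second push succeeding is the caveat worth remembering: any fetch, including a background fetch by an editor, refreshes the lease, so the option only guards against changes you have not yet fetched.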
Step 6: Notify Team
# All collaborators must:
# 1. Delete their local clone
# 2. Re-clone from the cleaned repository
# 3. NOT merge old history back
# Warning message to team:
echo "URGENT: Repository history has been rewritten due to secret exposure.
Please delete your local clone and re-clone:
rm -rf repo && git clone https://github.com/user/repo.git
Do NOT pull or merge from your old clone."
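The notification can include the hash of one pre-rewrite commit so that each collaborator can check whether a given clone still holds the old history. A minimal sketch, assuming a hypothetical helper `has_commit` and a throwaway repository standing in for a collaborator's clone:

```shell
#!/bin/sh
# Sketch: check whether a clone still contains a known pre-rewrite commit.
set -eu

# has_commit HASH: exit 0 if the repository in the current directory
# contains that commit object (reachable or not).
has_commit() {
  git cat-file -e "$1^{commit}" 2>/dev/null
}

# --- throwaway repository standing in for an old clone ---
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
echo "API_KEY=hunter2" > .env; git add .env; git commit -q -m "add secret"

old_hash=$(git rev-parse HEAD)   # in practice: a hash quoted in the announcement
unknown_hash=0123456789abcdef0123456789abcdef01234567   # not in this repository

if has_commit "$old_hash"; then old_status=present; else old_status=absent; fi
if has_commit "$unknown_hash"; then unknown_status=present; else unknown_status=absent; fi

echo "$old_hash: $old_status"
echo "$unknown_hash: $unknown_status"
```

A clone that reports the old hash as present should be deleted and re-cloned rather than pulled.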
Production Failure Scenarios
| Scenario | Symptoms | Mitigation |
|---|---|---|
| Secret in fork | Fork still contains secret | Contact fork owner; GitHub can remove if reported |
| Collaborator pushes old history | Secret reappears | All collaborators must delete old clones |
| CI/CD cached old commits | Pipeline uses old history | Clear CI caches; re-trigger pipelines |
| filter-repo refuses on non-fresh clone | "Refusing to destructively overwrite repo history since this does not look like a fresh clone" error | Clone fresh: git clone --mirror then filter (or pass --force if you have a backup) |
| Protected branch blocks force push | "protected branch" error | Temporarily disable protection; re-enable after |
Trade-off Analysis
| Tool | Speed | Safety | Ease of Use |
|---|---|---|---|
| git filter-repo | Fast | High (built-in safety checks) | Moderate |
| BFG Repo-Cleaner | Fastest | High (focused on common cases) | Easy |
| git filter-branch | Slowest | Low (easy to misuse) | Hard |
Implementation Snippets
# Complete remediation workflow
# 1. Clone fresh
git clone --mirror https://github.com/user/repo.git
cd repo.git
# 2. Remove secret file
git filter-repo --path .env --path config/secrets.yml --invert-paths
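# 3. Verify: this should print nothing
git log --all --oneline -- .env config/secrets.yml
# 4. Push cleaned history (filter-repo removes the origin remote, so push to the URL)
git push --force --mirror https://github.com/user/repo.git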
# 5. Clean up
cd ..
rm -rf repo.git
# Alternative: BFG for large repos
git clone --mirror https://github.com/user/repo.git
cd repo.git
bfg --delete-files '{.env,*.pem,*.key}'
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push --force --mirror
Observability Checklist
- Monitor: Secret scanning alerts after cleanup
- Verify: Full history scan with gitleaks
- Track: Team re-clone completion
- Audit: Protected branch settings after force push
- Alert: Any recurrence of the secret in new commits
Security & Compliance Considerations
- Rotating the secret is mandatory — history removal alone is insufficient
- Assume the secret was exposed from the moment it was committed
- Check if the secret was pushed to any public repositories or forks
- Document the incident for compliance records
- See Git Secrets Management for prevention
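For the incident record it helps to know when the secret first entered the history. The sketch below finds the oldest commit that introduced a literal string. `first_exposure` and `commit_at` are hypothetical helper names, and the repository, timestamps, and secret are fabricated for the demo. Commit dates can differ from push dates, so the result is an estimate of when exposure could have started, not proof.

```shell
#!/bin/sh
# Sketch: find the earliest commit that introduced a literal secret.
set -eu

# first_exposure SECRET: print "<hash> <committer date>" of the oldest commit
# (across all refs) that added or removed SECRET.
first_exposure() {
  git log --all --reverse --format='%H %cI' -S"$1" | sed -n '1p'
}

# --- throwaway repository for the demonstration ---
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"
secret="API_KEY=hunter2"   # fake value

# commit_at EPOCH MESSAGE: commit staged changes with a fixed timestamp.
commit_at() {
  GIT_AUTHOR_DATE="$1 +0000" GIT_COMMITTER_DATE="$1 +0000" git commit -q -m "$2"
}

echo "v1" > app.txt;   git add app.txt; commit_at 1704067200 "C1"
echo "$secret" > .env; git add .env;    commit_at 1704153600 "add config"
introduced=$(git rev-parse HEAD)
git rm -q .env;                         commit_at 1704240000 "remove config"

first_exposure "$secret"
```

Run this against a pre-rewrite backup or mirror, since the rewritten repository no longer contains the commit.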
Common Pitfalls / Anti-Patterns
- Not rotating the secret first — it may already be compromised
- Using git filter-branch, which is slow and which the Git project itself advises against
- Not forcing collaborators to re-clone, because old clones can reintroduce secrets
- Skipping verification — always confirm the secret is truly gone
- Not adding pre-commit hooks — prevent recurrence
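As an illustration of the last pitfall, a pre-commit check can be as small as a pattern match over the staged diff. This is a hedged sketch, not a replacement for gitleaks or git-secrets: the two patterns and the function name `staged_secrets` are illustrative assumptions, and a real hook would call the function and `exit 1` when it succeeds.

```shell
#!/bin/sh
# Sketch of a pre-commit check: flag staged additions that look like secrets.
set -eu

# Illustrative patterns only: private key headers and AWS-style access key IDs.
patterns='-----BEGIN [A-Z ]*PRIVATE KEY-----|AKIA[0-9A-Z]{16}'

# staged_secrets: print added lines in the index that match the patterns.
# Exit status is 0 if at least one line matched.
staged_secrets() {
  git diff --cached -U0 --no-color | grep '^+' | grep -v '^+++' | grep -E -e "$patterns"
}

# --- throwaway repository for the demonstration ---
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"

echo "v1" > app.txt; git add app.txt
if staged_secrets >/dev/null; then clean_result=blocked; else clean_result=allowed; fi

printf '%s\n' '-----BEGIN RSA PRIVATE KEY-----' > id_rsa; git add id_rsa
if staged_secrets >/dev/null; then key_result=blocked; else key_result=allowed; fi

echo "harmless change: $clean_result, staged private key: $key_result"
```

Saved as `.git/hooks/pre-commit` with the demo section replaced by the check, this runs before every local commit; client-side hooks can be bypassed, so server-side scanning is still worth enabling.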
Quick Recap Checklist
- Rotate the compromised secret immediately
- Use git filter-repo or BFG (not filter-branch)
- Verify removal with full history scan
- Force push with --force-with-lease
- Notify all collaborators to re-clone
- Install pre-commit hooks to prevent recurrence
- Document the incident
History Rewrite Process (Clean Architecture)
graph TD
OLD["Original History"] -->|filter-repo| CLEAN["Cleaned History"]
OLD -->|BFG| CLEAN
OLD -->|filter-branch| CLEAN_SLOW["Cleaned (slow)"]
CLEAN -->|force push| REMOTE["Remote Updated"]
CLEAN_SLOW -->|force push| REMOTE
REMOTE -->|notify| TEAM["Team Re-clones"]
OLD -.->|still exists in| CLONES["Cached Clones\nstill have secrets"]
CLONES -.->|must be| DELETED["Deleted or Re-cloned"]
Production Failure: Incomplete Cleanup
Scenario: Cached clones still containing secrets
# What happened:
# 1. History rewritten and force-pushed
# 2. Team member pulls instead of re-cloning
# 3. Old objects still in their local .git/objects/
# 4. They push — secret reappears in remote!
# Symptoms
$ git push
# Team member's old clone pushes old commits back
$ gitleaks detect --log-opts="--all"
# Secret found again!
# Root cause: Not all clones were cleaned; old objects persist
# Recovery steps:
# 1. Identify the source of re-contamination
git log --all --oneline --source -S 'SECRET_VALUE'
# The ref name printed after each hash shows which branch/tag reintroduced the secret
# 2. Force push again with clean history
git push origin --force-with-lease
# 3. Enforce re-clone on ALL machines:
# - Developer laptops
# - CI/CD runners (clear workspace cache)
# - Staging servers
# - Backup systems
# 4. Verify no cached clones exist:
# Check CI/CD workspace directories
# Check developer machines (ask team)
# Check any automated systems that clone the repo
# 5. Prevent re-contamination:
# Add branch protection rules
# Enable server-side secret scanning
# Set up pre-receive hook to reject secret-containing pushes
# === CI/CD Cache Cleanup ===
# GitHub Actions
# GitHub-hosted runners start every job on a fresh machine; on self-hosted
# runners, empty the workspace before the checkout step so it clones again:
# - name: Clean workspace
# run: find "$GITHUB_WORKSPACE" -mindepth 1 -delete
# Jenkins
# Use "Wipe out repository & force clone" in pipeline
# GitLab CI
# variables:
# GIT_STRATEGY: clone # Not "fetch"
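Recovery step 5 mentions a pre-receive hook. One simple variant rejects any push whose history still contains a known pre-rewrite commit, which directly blocks the "old clone pushes old history" scenario. The sketch below is an assumption-laden illustration: `check_push` is a hypothetical function, the commit hash would come from the incident record, and custom server-side hooks are only available on Git servers you administer.

```shell
#!/bin/sh
# Sketch of a pre-receive check: reject pushes that contain a known pre-rewrite commit.
set -eu

zero=0000000000000000000000000000000000000000

# check_push BAD_COMMIT: reads "<old> <new> <ref>" lines on stdin, the format
# a pre-receive hook receives. Returns non-zero if any pushed tip has
# BAD_COMMIT as an ancestor (or is BAD_COMMIT itself).
check_push() {
  banned=$1
  while read -r old_tip new_tip ref_name; do
    if [ "$new_tip" = "$zero" ]; then continue; fi   # ref deletion, nothing to scan
    if git merge-base --is-ancestor "$banned" "$new_tip" 2>/dev/null; then
      echo "rejected $ref_name: contains pre-rewrite commit $banned" >&2
      return 1
    fi
  done
  return 0
}

# --- throwaway repository for the demonstration ---
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Demo" && git config user.email "demo@example.com"

echo "v1" > app.txt;           git add app.txt; git commit -q -m "C1"
clean_tip=$(git rev-parse HEAD)
echo "API_KEY=hunter2" > .env; git add .env;    git commit -q -m "add secret"
bad_commit=$(git rev-parse HEAD)
echo "v2" > app.txt;           git add app.txt; git commit -q -m "C3"
old_tip=$(git rev-parse HEAD)

if printf '%s %s refs/heads/main\n' "$zero" "$old_tip" | check_push "$bad_commit" 2>/dev/null
then old_history=accepted; else old_history=rejected; fi
if printf '%s %s refs/heads/main\n' "$zero" "$clean_tip" | check_push "$bad_commit" 2>/dev/null
then clean_history=accepted; else clean_history=rejected; fi

echo "old history: $old_history, clean history: $clean_history"
```

If the banned commit is absent from the server and from the push, `git merge-base` fails and the function treats the push as clean, which is the intended behavior after the server has been garbage collected.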
Trade-offs: BFG vs Git Filter-repo
| Aspect | BFG Repo-Cleaner | Git Filter-repo |
|---|---|---|
| Speed | Fastest (Java, optimized) | Fast (Python, built on git fast-export/fast-import) |
| Safety | High (focused use cases) | Highest (built-in safety checks) |
| Ease of use | Easy (simple commands) | Moderate (more options) |
| Maintenance | Infrequent releases | Actively maintained |
| Installation | Java required | Python required |
| Text replacement | --replace-text (file-based) | --replace-text (flexible) |
| File removal | --delete-files | --path --invert-paths |
| Size filtering | --strip-blobs-bigger-than | --strip-blobs-bigger-than |
| Official recommendation | No | Yes (Git community) |
| Best for | Very large repos (>1GB) | General use, most scenarios |
Recommendation: Use git filter-repo for most cases. Use BFG only for very large repositories where filter-repo is too slow.
Quick Recap: Post-Cleanup Actions
# === 1. Force push (done) ===
git push origin --force-with-lease
# === 2. Notify team ===
# Send this message to all collaborators:
# "URGENT: Repository history rewritten. Delete your local clone and re-clone:
# rm -rf repo && git clone <url>
# Do NOT pull from your old clone."
# === 3. Rotate ALL secrets ===
# Even after cleanup, assume secrets were exposed:
# - API keys: regenerate in provider console
# - Database passwords: change and update connection strings
# - SSH keys: generate new key pairs
# - Certificates: reissue from CA
# === 4. Clear CI/CD caches ===
# GitHub Actions: Actions tab → Caches → delete entries (or: gh cache delete --all)
# GitLab CI: Build → Pipelines → Clear runner caches
# Jenkins: Delete workspace directories
# === 5. Check forks and mirrors ===
# - Contact fork owners to re-clone
# - Update any mirror repositories
# - Check if secrets were pushed to public repos
# === 6. Enable prevention ===
# Install pre-commit hooks:
pre-commit install
# Enable platform secret scanning:
# GitHub: Settings → Code Security → Secret Scanning
# GitLab: Settings → Security → Secret Detection
# === 7. Document the incident ===
# Record: what was leaked, when, how it was fixed
# Update: runbooks, team training, prevention measures
# Review: why the secret was committed in the first place
# === 8. Verify complete cleanup ===
gitleaks detect --log-opts="--all"
# Should return: "No leaks found"
git log --all -p | grep -i "secret_value"
# Should return: nothing
Interview Questions
Why should you avoid git filter-branch?
It's extremely slow (shell-based, processes commits one at a time), easy to misuse (common patterns produce incorrect results), and leaves backup refs under refs/original/ that keep old objects reachable. git filter-repo is 10-100x faster, has built-in safety checks, and is the replacement recommended in Git's own filter-branch documentation.
What is the difference between --force and --force-with-lease?
--force unconditionally overwrites the remote branch. --force-with-lease only overwrites if the remote branch hasn't changed since your last fetch, which protects against accidentally overwriting someone else's pushes. Always prefer --force-with-lease.
Can you remove a secret from one commit without changing the hashes of later commits?
No. Because each commit's hash includes its parent's hash, changing any commit changes all descendants. This is fundamental to Git's Merkle DAG structure. The entire history from the modified commit forward must be rewritten, producing new hashes for every one of those commits.
What do you do if the secret also exists in someone else's fork?
You can't force-push to someone else's fork. Rotate the secret immediately, because that's the only reliable fix. You can request GitHub to remove the secret from their caches, and contact the fork owner. But assume the secret is permanently exposed and treat it as compromised.
What is the first thing to do when you discover a secret in Git history?
Rotate the secret immediately. The moment you discover a secret in history, assume it is already compromised: scraped, used, or shared. Revoke or change the credential in its respective service (API key, password, token). Only after rotation should you proceed with history rewriting to remove it from Git.
How does git filter-repo compare with BFG Repo-Cleaner?
Compared with filter-branch, filter-repo is 10-100x faster; it also has built-in safety checks and is the tool Git's documentation recommends. BFG can be faster on very large repos (>1GB) but sees few releases and requires Java. filter-repo supports flexible text replacement with --replace-text, handles glob patterns, and can strip blobs by size. BFG is simpler for common cases, but filter-repo is better maintained.
Why must collaborators re-clone after a history rewrite?
Because every commit hash changes after the rewrite, and the old history still exists in their local clones. If they push from their old clone, they reintroduce the old objects (including the secret). They must delete their local clone entirely and re-clone from the cleaned repository. Pulling into or merging from the old clone reinfects the repository.
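How do you verify that a secret is fully removed?
Use multiple verification methods: (1) git log -p --all -S 'SECRET_VALUE' to search every commit reachable from any ref, (2) gitleaks detect --log-opts="--all" for secret scanning, (3) git rev-list --all | xargs git grep -l 'SECRET_VALUE' to search the file contents of every reachable commit. If any method returns results, the secret is not fully removed. These commands only see reachable objects, so also expire the reflogs and run git gc --prune=now before treating the local repository as clean.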
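What does --invert-paths do in git filter-repo?
--invert-paths inverts the path selection. On its own, --path keeps only the matching paths and drops everything else; with --invert-paths, the matching paths are removed and everything else is kept. For example, git filter-repo --path secrets/ --invert-paths removes the secrets directory from every commit, while git filter-repo --path secrets/ without the flag keeps only that directory. This is why the removal commands in this guide always pair --path with --invert-paths.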
How do you scrub a secret string from history while keeping the file?
Use git filter-repo --replace-text <(echo 'SECRET_VALUE==>REDACTED'). This replaces all occurrences of the string across all commits. For multiple patterns, pass --replace-text a file with one rule per line. After replacement, force push with --force-with-lease and notify collaborators to re-clone.
Why do CI/CD systems need attention after a rewrite?
CI/CD runners that cached the old Git objects may still have the secret in their workspace cache. Clear all CI/CD caches after rewriting: GitHub Actions (Actions tab → Caches), GitLab CI (Build → Pipelines → Clear runner caches), Jenkins (delete workspace directories). Any system that cloned the repository before the rewrite must be cleared or re-cloned.
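Can you run git filter-repo in your existing working clone?
Not by default. filter-repo aborts with "Refusing to destructively overwrite repo history since this does not look like a fresh clone" unless you pass --force. Clone fresh first: git clone --mirror https://github.com/user/repo.git && cd repo.git, then run git filter-repo with the options you need. Using an existing clone risks mixing in stale local objects and refs, and a fresh clone doubles as a safety net because the original stays untouched if the rewrite goes wrong.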
How do you remove large files from history?
Use git filter-repo --strip-blobs-bigger-than 10M to remove all blobs over a certain size. For BFG: bfg --strip-blobs-bigger-than 100M. After rewriting, run git reflog expire --expire=now --all and git gc --prune=now --aggressive to garbage collect old objects and reduce repository size.
What role does the reflog play after a rewrite?
The reflog records every previous position of HEAD and of each branch, including positions from before the rewrite. Old commits stay recoverable through it until the entries expire (by default 90 days, or 30 days for entries no longer reachable from the branch tip). Use git reflog expire --expire=now --all to clear the entries, then git gc --prune=now to delete the unreferenced objects. This is part of the complete cleanup after any history rewrite operation.
How do you prevent secrets from being committed again?
Layer several controls: (1) install pre-commit hooks (pre-commit install) that run gitleaks or git-secrets, (2) enable platform secret scanning (GitHub Settings → Code Security → Secret Scanning), (3) use .gitignore for common secret file patterns (.env, *.pem), (4) require signed commits to establish authorship accountability.
What is the difference between removing a file and removing content from history?
Removing a file with git filter-repo --path file --invert-paths deletes that path from every commit; commits that become empty as a result are pruned, and the remaining commits get new hashes. Removing content with --replace-text keeps the file but scrubs the sensitive string from every version of it. File removal is for files that should never have been committed; content replacement is for files that legitimately belong in the repository but contain bad values.
Why use --force-with-lease rather than --force when pushing rewritten history?
--force-with-lease checks that the remote ref matches what you last fetched before overwriting. If someone else pushed to the remote since your last fetch (perhaps from a collaborator's pre-cleanup clone), the push fails. This prevents accidentally overwriting work from others. --force would unconditionally overwrite, potentially losing other people's work.
How do you handle a secret in a submodule's history?
Submodules store references to commits in external repositories. To fully remove a secret from a submodule's history, you must independently clean the submodule repository using the same methods, then update the parent repository's submodule reference. The parent repository only stores the submodule's commit hash, not its internal history.
What should you document after a secret exposure?
Document: (1) what secret was exposed and when, (2) which repositories and branches were affected, (3) when rotation occurred, (4) cleanup method used (filter-repo/BFG), (5) which collaborators were notified, (6) when re-clone was confirmed complete, (7) what prevention measures were implemented. Keep this record for compliance audits, since many frameworks require documented incident response.
Why is history rewriting expensive on large repositories?
Git stores objects compressed in .git/objects/. A rewrite has to read every commit, rewrite the trees and blobs that changed, and write a new commit object for each rewritten commit because the parent hashes change. For repositories with many large blobs, this is I/O intensive. BFG gains speed by processing each unique file and folder only once and by running in parallel on the JVM. filter-repo, which builds on git fast-export and fast-import, may be slower on very large repositories but is safer and more flexible.
Further Reading
- Removing sensitive data from a repository — GitHub’s official guide covering filter-repo, BFG, and filter-branch workflows
- git filter-repo — The official repository and documentation for the recommended history-rewriting tool
- BFG Repo-Cleaner — Fast, Java-based tool optimized for large repository cleanup
- Removing sensitive data from Git repositories — Atlassian’s comprehensive tutorial on history rewriting techniques
- Git Tools — Rewriting History — Official Git book chapter covering rewriting fundamentals and best practices
Conclusion
Once sensitive data reaches a remote, removing it requires rotating the secret and rewriting history with tools like git filter-repo or BFG Repo-Cleaner. The process is surgical and destructive, because it changes every descendant commit’s hash, so prevention via pre-commit scanning is far better than cleanup after the fact.