The Three States: Working Directory, Staging Area, and Repository
Explain Git's three-state architecture with diagrams and practical examples — understand how files flow between working, staging, and committed states.
Introduction
Git’s three-state architecture is the conceptual foundation that makes Git both powerful and, for beginners, confusing. Git keeps your project’s content in three areas: the working directory (your editable files), the staging area (the snapshot prepared for the next commit), and the repository (permanently recorded history). A tracked file usually has a copy in all three at once. Its state (modified, staged, or committed) depends on whether those copies match. Understanding how changes move between these areas is the key to using Git effectively.
Many version control systems, such as Subversion, use a simpler two-state model: changes are either uncommitted or committed. Git’s staging area adds a deliberate intermediate step that gives you fine-grained control over what becomes part of each commit. This design makes it easy to build atomic commits, stage only part of a file, and craft clean, reviewable commit histories.
This guide explains the three-state model in depth, with diagrams, practical examples, and real-world scenarios. Once you internalize this model, Git’s commands stop feeling arbitrary and start making logical sense. For a broader introduction to version control, see “What Is Version Control?”
When to Use / When Not to Use
Understand the three states when:
- Learning Git for the first time — this model explains why Git works the way it does
- Debugging unexpected git status output
- Crafting clean commits from messy working changes
- Using interactive staging (git add -p) to split changes into logical commits
- Understanding why git reset and git checkout behave differently
- Teaching Git to others: the three-state model is the most important concept to convey
The staging area is less critical when:
- You commit all changes at once every time: git commit -a stages and commits every modified tracked file in one step, bypassing explicit staging
- Working on solo projects with simple, linear workflows
- Using GUI Git clients that abstract the staging area away
Core Concepts
The three states represent three snapshots of your project:
Working Directory: The files you see and edit on your filesystem. This is your active workspace where you write code, fix bugs, and make changes. Files here may be untracked (new files Git does not know about), modified (changed since the last commit), or clean (identical to the last commit).
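Staging Area (Index): A binary file (.git/index) that describes the snapshot Git will record in the next commit. It has one entry per tracked file, and each entry points at the content to be committed. Most entries usually still match the last commit, so in practice you can treat the index as a draft of the next commit. You selectively place changes here using git add, review them with git diff --staged, and only then make them permanent with git commit.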
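Repository (HEAD): The recorded history of your project, stored in the .git directory. HEAD identifies the commit you currently have checked out. Each commit captures a snapshot of everything in the staging area (all tracked files, not just the ones you changed) and links to its parent commit, forming a chain of history. Commits themselves are immutable. Commands that appear to edit history, such as git commit --amend, actually create new commits and move the branch to point at them.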
graph LR
A[Working Directory<br/>Your editable files] -->|git add| B[Staging Area<br/>Prepared for commit]
B -->|git commit| C[Repository<br/>Permanent history]
C -->|git checkout| A
C -->|git reset| B
B -->|git reset / git restore --staged<br/>unstage| A
B -->|git restore<br/>discard working changes| A
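To make the three areas concrete, here is a small illustrative script. It assumes Git is installed and runs in a throwaway temp directory; the file name notes.txt is hypothetical. The script leaves a different version of one file in each area and reads each copy back: git show HEAD:<path> reads the committed copy and git show :<path> reads the staged one.

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "v1" > notes.txt
git add notes.txt && git commit -q -m "v1"   # repository (HEAD) now holds v1

echo "v2" > notes.txt
git add notes.txt                            # staging area (index) now holds v2

echo "v3" > notes.txt                        # working directory now holds v3

committed=$(git show HEAD:notes.txt)   # copy in the last commit
staged=$(git show :notes.txt)          # copy in the index
working=$(cat notes.txt)               # copy on disk
echo "HEAD=$committed index=$staged working=$working"

git diff --staged   # compares HEAD with the index (v1 -> v2)
git diff            # compares the index with the working directory (v2 -> v3)
```

git diff --staged and git diff each compare one adjacent pair of areas. This is why neither one alone shows everything that changed since the last commit.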
Architecture or Flow Diagram
File State Transitions
stateDiagram-v2
[*] --> Untracked: New file created
Untracked --> Staged: git add
Staged --> Committed: git commit
Committed --> Modified: Edit file
Modified --> Staged: git add
Modified --> Committed: git restore
Staged --> Modified: git restore --staged
Committed --> Modified: git reset HEAD~1
Committed --> Staged: git reset --soft HEAD~1
Untracked --> [*]: git clean (file deleted)
note right of Untracked
Git does not track this file
It will not be committed
end note
note right of Staged
Changes are queued
for the next commit
end note
note right of Committed
Permanently recorded
in repository history
end note
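You can observe the transitions in the diagram directly with git status --porcelain. Its two-letter code compares the index with HEAD (first column) and the working directory with the index (second column). A minimal sketch, assuming Git 2.23+ for git restore and a hypothetical file.txt in a throwaway repository:

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "hello" > file.txt
s1=$(git status --porcelain)   # "?? file.txt"  untracked
git add file.txt
s2=$(git status --porcelain)   # "A  file.txt"  new file, staged
git commit -q -m "Add file"
s3=$(git status --porcelain)   # ""             committed, unmodified
echo "edit" >> file.txt
s4=$(git status --porcelain)   # " M file.txt"  modified, not staged
git add file.txt
s5=$(git status --porcelain)   # "M  file.txt"  modified, staged
git restore --staged file.txt
s6=$(git status --porcelain)   # " M file.txt"  unstaged again
git restore file.txt
s7=$(git status --porcelain)   # ""             edit discarded
printf '%s\n' "[$s1]" "[$s2]" "[$s3]" "[$s4]" "[$s5]" "[$s6]" "[$s7]"
```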
The Complete File Lifecycle
graph TD
A[File in working directory] --> B{Tracked?}
B -->|No| C[Untracked]
B -->|Yes| D{Changed?}
D -->|No| E[Unmodified<br/>Matches HEAD]
D -->|Yes| F[Modified<br/>Working directory changed]
C -->|git add| G[Staged<br/>New file]
F -->|git add| H[Staged<br/>Modified file]
G -->|git commit| I[Committed<br/>In repository]
H -->|git commit| I
I -->|Edit file| F
I -->|git checkout| E
F -->|git restore| E
G -->|git restore --staged| C
H -->|git restore --staged| F
Step-by-Step Guide / Deep Dive
Understanding Each State Through Examples
State 1: Working Directory
Create a file and observe its state:
# Initialize a repository (-b main needs Git 2.28+ and makes the branch name match the output below)
git init -b main three-states-demo
cd three-states-demo
# Create a file
echo "Hello, World!" > hello.txt
# Check the status
git status
Output:
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
hello.txt
nothing added to commit but untracked files present (use "git add" to track)
The file exists in your working directory but Git does not track it yet. It is untracked — Git knows the file exists but will not include it in any commit until you explicitly add it.
State 2: Staging Area
# Stage the file
git add hello.txt
# Check the status
git status
Output:
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: hello.txt
A copy of the file’s content is now in the staging area; the file itself stays in your working directory. It is now “to be committed”: it will be included in the next commit, but it is not yet part of the permanent history.
State 3: Repository
# Commit the staged file
git commit -m "Add hello.txt"
# Check the status
git status
Output:
On branch main
nothing to commit, working tree clean
The file is now in the repository. It is permanently recorded in commit history. The working directory matches the repository — there are no uncommitted changes.
Modifying a Committed File
Now let’s see what happens when you edit a committed file:
# Modify the file
echo "Hello, Git!" > hello.txt
# Check the status
git status
Output:
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: hello.txt
no changes added to commit (use "git add" and/or "git commit -a")
The file is now modified in the working directory but not yet staged. Git detects the difference between your working copy and the last committed version. This is the most common state during active development.
Partial Staging
One of Git’s most powerful features is the ability to stage only some changes in a file:
# Create a file with multiple changes
cat > app.py << 'EOF'
def greet():
    print("Hello")
def farewell():
    print("Goodbye")
def helper():
    print("Helper function")
EOF
git add app.py
git commit -m "Add app.py with three functions"
# Now modify all three functions
cat > app.py << 'EOF'
def greet():
    print("Hello, World!")
def farewell():
    print("See you later!")
def helper():
    print("Updated helper")
EOF
# Stage only the greet and farewell changes interactively
git add -p app.py
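Git presents the changes hunk by hunk and asks whether to stage each one. The three edits sit only one line apart, so Git first shows them as a single hunk. Press s to split it into three smaller hunks, then answer y for the greet and farewell hunks and n for the helper hunk. The helper change stays unstaged, so you can commit the other two first: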
# Commit only staged changes
git commit -m "Update greet and farewell messages"
# Check what remains unstaged
git status
# modified: app.py (the helper change is still in working directory)
# Stage and commit the remaining change
git add app.py
git commit -m "Update helper function"
This produces two clean, atomic commits from a single file with mixed changes.
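git add -p reads its answers from standard input, so you can script the same flow to check your understanding. This sketch assumes a throwaway repository and feeds the answers s (split), y, y, n. That sequence matches this exact diff; a different file would produce different prompts.

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cat > app.py << 'EOF'
def greet():
    print("Hello")
def farewell():
    print("Goodbye")
def helper():
    print("Helper function")
EOF
git add app.py && git commit -q -m "Add app.py with three functions"

cat > app.py << 'EOF'
def greet():
    print("Hello, World!")
def farewell():
    print("See you later!")
def helper():
    print("Updated helper")
EOF

# One answer per prompt: s = split the single hunk into three,
# y = stage greet, y = stage farewell, n = leave helper unstaged
printf 's\ny\ny\nn\n' | git add -p app.py > /dev/null

git commit -q -m "Update greet and farewell messages"
git add app.py && git commit -q -m "Update helper function"
git log --oneline
```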
Moving Files Between States
# Working Directory → Staging Area
git add <file> # Stage specific file
git add . # Stage all changes
git add -p <file> # Stage interactively (hunk by hunk)
# Staging Area → Working Directory (unstage)
git restore --staged <file> # Unstage specific file
git reset HEAD <file> # Older syntax, same effect
# Staging Area → Repository
git commit # Commit all staged changes
git commit -m "message" # Commit with a message
# Working Directory → Last Committed State (discard changes)
git restore <file> # Discard working changes
git checkout -- <file> # Older syntax, same effect
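# Repository → Staging Area + Working Directory (restore old version)
git checkout <commit> -- <file> # Restore file from a specific commit (also stages it)
git restore --source=<commit> <file> # Same, but updates only the working directory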
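Restoring an old version with git checkout <commit> -- <file> also writes that version into the staging area. git restore --source=<commit> <file> (Git 2.23+) touches only the working directory. A quick sketch with a hypothetical config.txt shows the difference in git status:

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "v1" > config.txt && git add config.txt && git commit -q -m "v1"
echo "v2" > config.txt && git commit -q -am "v2"

git checkout HEAD~1 -- config.txt            # v1 written to index AND working copy
after_checkout=$(git status --porcelain)     # "M  config.txt": the old version is staged

git restore --staged --worktree config.txt   # back to HEAD (v2) in both places

git restore --source=HEAD~1 config.txt       # v1 written to the working copy only
after_restore=$(git status --porcelain)      # " M config.txt": not staged
printf '%s\n' "[$after_checkout]" "[$after_restore]"
```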
Production Failure Scenarios
| Scenario | Impact | Mitigation |
|---|---|---|
| Accidentally committing debug code left in working directory | Broken production, exposed debug output | Always review git diff --staged before committing; use pre-commit hooks |
| Forgetting to stage a critical file | Incomplete commit, broken build on remote | Review git status before every commit; use git diff --staged to verify |
| Staging too many unrelated changes | Unreviewable commits, hard to revert specific changes | Stage logically grouped changes; use git add -p for selective staging |
| Losing uncommitted work in working directory | Lost hours of development | Commit frequently (even WIP commits); use git stash for temporary saves |
| git reset --hard on wrong branch | Permanent loss of uncommitted changes | Prefer --soft or --mixed, or run git stash first; git reflog can restore lost commits but not uncommitted changes |
| Merge conflict leaves files in partially staged state | Confused state, incomplete resolution | Use git status to identify conflicted files; resolve all before committing |
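For the git reset --hard row, the key distinction is that git reflog remembers where HEAD has been. Lost commits can therefore be recovered, but uncommitted edits cannot. A minimal sketch in a throwaway repository:

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "one" > a.txt && git add a.txt && git commit -q -m "first"
echo "two" >> a.txt && git commit -q -am "second"
lost=$(git rev-parse HEAD)          # remember the commit we are about to "lose"

git reset -q --hard HEAD~1          # branch now points at "first"
git reflog -n 3                     # HEAD's reflog still records "second"
git reset -q --hard "HEAD@{1}"      # jump back to where HEAD was before the reset
git log --oneline
```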
Trade-off Analysis
| Approach | Advantages | Disadvantages | When to Use |
|---|---|---|---|
| Explicit staging (git add + git commit) | Full control, atomic commits, reviewable history | More commands to type | Production code, team projects, code review workflows |
| Skip staging (git commit -a) | Faster, fewer steps | Commits all tracked changes together, no selectivity | Solo projects, quick fixes, when all changes belong in one commit |
| Interactive staging (git add -p) | Granular control, clean commits from messy work | Slower, requires understanding of hunks | Refactoring, multi-purpose changes, preparing PRs |
| GUI staging | Visual, intuitive | Abstracts away the model, harder to debug | Beginners, visual thinkers, complex merges |
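To check what git commit -a actually picks up, this sketch (throwaway repository, hypothetical file names) commits with -a while one tracked file is modified and one new file is untracked:

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "v1" > tracked.txt && git add tracked.txt && git commit -q -m "init"
echo "v2" > tracked.txt          # modified tracked file: picked up by -a
echo "new" > untracked.txt       # brand-new file: ignored by -a
git commit -q -a -m "Commit with -a"
git status --porcelain           # the new file is still untracked
```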
Implementation Snippets
Visualizing the Three States
# See the complete picture at once
echo "=== Working Directory Changes ==="
git diff # Unstaged changes
echo "=== Staging Area ==="
git diff --staged # Staged changes (what will be committed)
echo "=== Repository Status ==="
git log --oneline -3 # Recent commits
echo "=== Overall Status ==="
git status # Summary of all three states
Committing with Review
# The safe commit workflow
git status # 1. See what changed
git diff # 2. Review unstaged changes
git add <files> # 3. Stage intended changes
git diff --staged # 4. Review what will be committed
git commit -m "message" # 5. Commit
git log -1 # 6. Verify the commit
Recovering from Mistakes
# Committed but forgot to stage a file
git add forgotten-file.txt
git commit --amend --no-edit # Adds to the last commit without changing message
# Committed with wrong message
git commit --amend -m "Correct message"
# Committed to wrong branch
git reset --soft HEAD~1 # Undo commit, keep changes staged
git checkout correct-branch # Switch to correct branch
git commit -m "message" # Re-commit on correct branch
# Staged something you should not have
git restore --staged <file> # Unstage without losing changes
Stashing: A Fourth State
Git stash provides a temporary holding area for uncommitted changes:
# Save working directory and staging area changes
git stash push -m "WIP: feature in progress"
# Working directory is now clean
git status
# nothing to commit, working tree clean
# List all stashes
git stash list
# Restore the most recent stash
git stash pop
# Restore a specific stash without removing it
git stash apply stash@{2}
# Drop a stash you no longer need
git stash drop stash@{0}
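One subtlety: by default git stash pop restores your edits but not which of them were staged. The --index option reinstates the staging area as well. A sketch, assuming a throwaway repository and a hypothetical f.txt:

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "v1" > f.txt && git add f.txt && git commit -q -m "init"
echo "v2" > f.txt && git add f.txt       # a staged modification
git stash push -q -m "WIP"
after_push=$(git status --porcelain)     # empty: index and working copy reset to HEAD
git stash pop -q --index                 # re-apply, restoring the staged state too
after_pop=$(git status --porcelain)      # "M  f.txt": staged again
printf '%s\n' "[$after_push]" "[$after_pop]"
```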
Observability Checklist
- Logs: Use git status as your primary observability tool; it shows all three states at once
- Metrics: Track the ratio of staged to unstaged changes; large unstaged deltas indicate infrequent commits
- Traces: Use git diff and git diff --staged to trace exactly what will be committed
- Alerts: Set up pre-commit hooks that block commits exceeding size thresholds or containing patterns like TODO, FIXME, or console.log
- Audit: Run git log --stat to audit what files each commit touched
- Health: Periodically run git status to ensure no long-running uncommitted work accumulates
- Validation: Before committing, run git diff --staged to verify commit contents; before pushing, review the outgoing commits with git log -p @{u}..HEAD
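As an example of the Alerts item, here is a minimal pre-commit hook sketch. The policy itself (rejecting added lines containing TODO, FIXME, or console.log) is hypothetical, and the script assumes hooks live in .git/hooks with no core.hooksPath override. The hook inspects git diff --cached, so it judges only what is staged, not other edits in the working directory.

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Install a hypothetical policy hook (assumes core.hooksPath is not set)
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/sh
# Look only at lines being ADDED in the staged diff (HEAD vs index);
# '+++' file headers are filtered out before matching.
if git diff --cached -U0 --no-color |
     grep '^+' | grep -v '^+++' |
     grep -Eq 'TODO|FIXME|console\.log'; then
  echo "pre-commit: staged changes contain TODO, FIXME, or console.log" >&2
  exit 1
fi
exit 0
EOF
chmod +x .git/hooks/pre-commit

echo 'console.log("debug")' > app.js
git add app.js
if git commit -q -m "Add app.js"; then blocked=no; else blocked=yes; fi   # hook rejects it

echo 'export const ready = true;' > app.js
git add app.js
if git commit -q -m "Add app.js"; then committed=yes; else committed=no; fi
```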
Security & Compliance Considerations
- Staging area is not a security boundary: git add writes file content into .git/objects as zlib-compressed (not encrypted) blobs, and .git/index records their paths and object IDs. Neither is protected beyond filesystem permissions. Unstaging a file does not delete the blob already written; it stays until garbage collection prunes it
- Committed secrets are permanent: Once a file with secrets is committed, it stays in the repository history even if you delete it in a later commit; removing it requires rewriting history. Use pre-commit hooks to scan staged content for secrets before it is committed
- Stash is not encrypted: git stash stores changes in the repository’s object database. Anyone with repository access can view stash contents with git stash show -p
- Audit trails: The staging area enables clean, atomic commits that serve as better audit trails than monolithic commits. Each commit should represent a single logical change for compliance traceability
- Signed commits: Use git commit -S to cryptographically sign commits, showing that each commit was created by the holder of the signing key
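The point about the staging area is easy to demonstrate. Once git add runs, the content is in the object database, and unstaging it does not remove it. A sketch with a hypothetical secret.env file containing a fake key (Git 2.23+ for git restore):

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "readme" > README && git add README && git commit -q -m "init"

echo "API_KEY=example-not-a-real-key" > secret.env
blob=$(git hash-object secret.env)   # the object ID git add is about to use
git add secret.env                   # writes the blob into .git/objects
git restore --staged secret.env      # removes the index entry (file becomes untracked)
rm secret.env                        # and deletes the working copy
git status --porcelain               # nothing to report...
git cat-file -p "$blob"              # ...but the content is still readable until gc prunes it
```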
Common Pitfalls / Anti-Patterns
- Treating git add . as harmless: It stages everything in the current directory that .gitignore does not exclude, including accidental debug files, temporary edits, and generated artifacts. Always review with git status before bulk staging
- Confusing git reset modes: --soft keeps changes staged, --mixed (default) keeps changes in the working directory, and --hard discards all uncommitted changes to tracked files. Using the wrong mode causes data loss or confusion
- Not understanding that git commit -a skips staging: It automatically stages all modified tracked files and commits them. Untracked files are not included. This bypasses the review step
- Leaving the staging area in an inconsistent state: Staging some changes, getting distracted, and coming back days later leads to accidental commits of unrelated changes. Commit or unstage promptly
- Using git checkout to unstage: git checkout -- <file> does not unstage anything. It copies the staged version over your working copy, discarding unstaged edits and leaving the staged change in place (git checkout HEAD -- <file> resets both). Use git restore --staged to unstage while keeping working changes
- Ignoring the diff before commit: Skipping git diff --staged is a common cause of accidental commits with debug code, wrong files, or incomplete changes
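You can check the git checkout pitfall with a file that has both a staged and an unstaged edit. This sketch (throwaway repository, hypothetical f.txt, Git 2.23+ for git restore) shows that git checkout -- keeps the staged change and discards only the unstaged one. By contrast, git restore --staged actually unstages:

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "v1" > f.txt && git add f.txt && git commit -q -m "v1"
echo "v2" > f.txt && git add f.txt    # staged: v2
echo "v3" > f.txt                     # unstaged: v3

git checkout -- f.txt                 # copies the INDEX version over the working copy
worktree=$(cat f.txt)                 # v2: the unstaged v3 edit is gone
after_checkout=$(git status --porcelain)   # "M  f.txt": still staged

git restore --staged f.txt            # actually unstages; working copy keeps v2
after_restore=$(git status --porcelain)    # " M f.txt"
printf '%s\n' "$worktree" "[$after_checkout]" "[$after_restore]"
```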
Quick Recap Checklist
- Working Directory: your editable files on the filesystem
- Staging Area: the preparation zone for the next commit (.git/index)
- Repository: the permanent, immutable history of commits
- git add moves changes from working directory to staging area
- git commit moves staged changes from staging area to repository
- git status shows the state of all files across all three states
- git diff shows unstaged changes; git diff --staged shows staged changes
- git add -p enables interactive, hunk-by-hunk staging
- git stash provides a temporary fourth state for uncommitted work
- Always review staged changes with git diff --staged before committing
- git restore --staged unstages without discarding working changes
- Committed changes are permanent; the staging area is your last chance to review
Interview Questions
The staging area (also called the index) is an intermediate state between your working directory and the repository. It acts as a preparation zone where you selectively choose which changes will be included in the next commit. Git has it because it makes atomic commits easy. You can modify ten files but commit only the three that form a complete logical change, or even part of a single file, and review the exact result with git diff --staged before committing. Without a staging area, that selection has to happen in one step at commit time, which makes it harder to craft clean, reviewable history from messy working sessions.
These three modes control what happens to your changes when you undo a commit. --soft moves HEAD back but keeps all changes staged — perfect for amending a commit. --mixed (the default) moves HEAD back and keeps changes in the working directory but unstaged — useful for restaging selectively. --hard moves HEAD back and discards all changes entirely — dangerous, as working directory modifications are permanently lost. The mnemonic: soft keeps everything, mixed keeps working files, hard keeps nothing.
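The three modes can be compared in one throwaway repository. In this sketch each step applies a stronger reset to the result of the previous one, and git status --porcelain shows where the change ends up. The file names are hypothetical.

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "v1" > f.txt && git add f.txt && git commit -q -m "v1"
echo "v2" > f.txt && git commit -q -am "v2"

git reset -q --soft HEAD~1      # HEAD -> v1; index and working copy still v2
soft=$(git status --porcelain)  # "M  f.txt": change kept, staged

git reset -q --mixed            # index reset to HEAD (v1); working copy still v2
mixed=$(git status --porcelain) # " M f.txt": change kept, unstaged

git reset -q --hard             # index and working copy both reset to HEAD (v1)
hard=$(git status --porcelain)  # "": change discarded

printf 'soft=[%s] mixed=[%s] hard=[%s]\n' "$soft" "$mixed" "$hard"
```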
git add -p (patch mode) presents each changed hunk (contiguous block of changes) in a file and asks whether to stage it. You respond with y (yes), n (no), s (split the hunk smaller), or e (edit manually). Use it when a single file contains multiple unrelated changes that should be separate commits — for example, fixing a bug and adding a feature in the same file. It produces cleaner, more reviewable commit history.
A file can have different parts in different states. For example, lines 1-10 of a file might be staged while lines 11-20 remain modified but unstaged. This is the power of hunk-based staging with git add -p. In that case git status lists the file twice: once under "Changes to be committed" and once under "Changes not staged for commit" (git status --short shows MM). The index still stores whole files. git add -p builds a version of the file containing only the hunks you accepted, stores it as a blob, and points the index entry at it. That is how partial staging works even though the index does not track individual hunks.
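A quick way to see a file with both staged and unstaged changes, without going through interactive prompts, is to stage one edit and then make another. This sketch (throwaway repository, hypothetical notes.txt) shows the MM status and that the index holds a complete version of the file:

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf 'line 1\nline 2\n' > notes.txt
git add notes.txt && git commit -q -m "init"
echo "line 3" >> notes.txt && git add notes.txt   # staged edit
echo "line 4" >> notes.txt                        # unstaged edit on top
short=$(git status --short)                       # X=M (staged), Y=M (unstaged)
staged_files=$(git diff --cached --name-only)     # HEAD vs index
unstaged_files=$(git diff --name-only)            # index vs working directory
index_copy=$(git show :notes.txt)                 # whole file as staged: lines 1-3
echo "$short"
```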
The staging area is stored in .git/index, a binary file containing a sorted list of entries, one per tracked path. Each entry records the file mode, the object ID of a blob, and cached filesystem metadata (timestamps, size) that lets Git detect changes quickly. When you run git add, Git computes the object ID of the file's content (SHA-1 by default), writes the content into the object database as a blob, and then records the mapping in the index. The index does not store file content directly; it holds metadata pointing to objects in the object database. You can inspect it with git ls-files --stage.
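You can check the index entry and the object ID against each other. In this sketch (throwaway repository, hypothetical hello.txt), git hash-object computes the ID without writing anything. That ID should equal the one git add recorded in the index.

```shell
# Throwaway repository (no commit needed, so no identity setup)
demo=$(mktemp -d) && cd "$demo" && git init -q

echo 'Hello, World!' > hello.txt
git add hello.txt
git ls-files --stage                              # <mode> <object id> <stage><TAB><path>
staged_id=$(git ls-files --stage hello.txt | cut -d' ' -f2)
computed_id=$(git hash-object hello.txt)          # same hash, nothing written
echo "$staged_id $computed_id"
git cat-file -t "$staged_id"                      # prints the object type
```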
git commit -a automatically stages all modified and deleted tracked files and commits them in one step, bypassing explicit review of the staging area. In team environments this is risky because it can accidentally commit unrelated edits, debug code, or files that should not be part of a logical change. It removes the deliberate review step that git add + git diff --staged provides. For code review workflows, always prefer explicit staging.
git restore is the newer syntax introduced in Git 2.23, designed only for restoring files. git restore --staged <file> unstages changes without discarding working changes. git restore <file> discards unstaged working changes by copying the file from the index (add --source=<commit> to restore from a commit instead). In contrast, git checkout is older and overloaded: git checkout <branch> switches branches, while git checkout -- <file> restores the working copy from the index. Without the -- separator, git checkout <name> may be interpreted as a branch name rather than a file. No form of git checkout unstages a file while keeping your edits. Prefer git restore.
Git stash creates a fourth virtual state by saving both your working directory changes and staged changes into a separate stash stack. When you run git stash push, Git records the current index and working directory as commit objects (reachable from refs/stash), then resets both to match HEAD. git stash list shows the saved entries. git stash pop applies the most recent stash and drops it if it applied cleanly; git stash apply applies without dropping. By default, pop and apply do not restore which modifications were staged; add --index to bring back the staging area as well. Note that stash contents are stored as ordinary objects in .git/objects; they are not encrypted.
When you run git commit --amend, Git creates a new commit with the same parent as the original but replaces the tree with whatever is currently staged. If your staging area is clean, amending just changes the commit message. If you have new staged changes, they get folded into the previous commit. This rewrites history — the original commit is orphaned and will eventually be garbage collected. Never amend commits that have been pushed to a shared branch.
Yes, using git reset. git reset --soft HEAD~1 moves HEAD back one commit and keeps all changes staged, ready to recommit with a new message. git reset --mixed HEAD~1 (default) moves HEAD back and unstages all changes but keeps them in the working directory. git reset --hard HEAD~1 moves HEAD back and discards the commit's changes from the index and working directory, along with any uncommitted edits. The discarded commit itself remains recoverable through git reflog until its reflog entry expires (30 days by default for unreachable commits). Uncommitted edits lost to --hard cannot be recovered this way.
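git add reads the file, computes its object ID (SHA-1 by default), stores the content as a blob in .git/objects, and then updates the file's entry in .git/index to point at that blob. There is no separate "staged" flag: a change counts as staged when the index entry differs from HEAD. git commit reads the index, writes a tree object from it, and creates a commit object linking to that tree and the parent commit. It then moves the current branch (the one HEAD points to) to the new commit. The index is not cleared; it now matches the new commit, which is why git status reports nothing to commit. git checkout reads the tree object for a given commit, extracts files into the working directory, and updates the index to match that tree.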
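The git commit steps described above correspond to plumbing commands that you can run by hand. This sketch (throwaway repository, hypothetical file) builds a commit with git write-tree, git commit-tree, and git update-ref. It then shows that the index still lists the file afterwards.

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "Hello" > hello.txt
git add hello.txt                      # blob in .git/objects + entry in .git/index
tree=$(git write-tree)                 # index -> tree object
commit=$(echo "Initial commit" | git commit-tree "$tree")   # tree -> commit object (no parent yet)
branch=$(git symbolic-ref HEAD)        # the branch HEAD points to, e.g. refs/heads/main
git update-ref "$branch" "$commit"     # move that branch to the new commit
git log --oneline
git ls-files                           # hello.txt is still listed: the index was not cleared
```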
git diff --staged shows exactly what will be committed: the delta between HEAD and the staging area. This is the last line of defense against accidental commits of debug code, wrong files, or incomplete changes. git diff alone shows unstaged changes (working directory vs staging area), so a clean git diff says nothing about whether the staged changes are correct. This single habit catches many commit mistakes before they reach shared history.
git restore <file> is the modern command to discard working directory changes for a specific file. git reset HEAD <file> was the traditional way to unstage a file. In Git 2.23+, git restore is the recommended approach because it separates the discarding operation from the branch-manipulation semantics of git reset. Use git restore --staged <file> to unstage and git restore <file> to discard working directory changes.
git clean removes untracked files from the working directory only. It has no effect on tracked files in any state and does not touch the staging area or repository. git clean -n shows what would be deleted (dry run). git clean -f removes untracked files; add -d to also remove untracked directories and -x to also remove ignored files. Combined with git reset --hard, you can return a working directory to a pristine state. Note: git clean is irreversible; there is no built-in recovery mechanism for untracked files.
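A dry run before the real clean is cheap insurance. This sketch (throwaway repository, hypothetical file names) shows that git clean removes only the untracked file and leaves the modified tracked file alone:

```shell
# Throwaway repository; author/committer identity comes from environment variables
demo=$(mktemp -d) && cd "$demo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo "tracked" > keep.txt && git add keep.txt && git commit -q -m "init"
echo "scratch" > scratch.log      # untracked file
echo "edited" >> keep.txt         # modified tracked file
dry=$(git clean -n)               # dry run: lists what would be deleted
echo "$dry"
git clean -f -q                   # delete untracked files (directories would need -d)
git status --porcelain            # only the tracked modification remains
```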
A hunk is a contiguous block of changes (additions or deletions) within a file, shown together with a few lines of surrounding context (three by default). Hunks are grouped by proximity, not by meaning. Git computes the line-level diff (with the Myers algorithm by default) and merges changes whose context regions overlap into a single hunk. When you run git add -p, Git presents each hunk and asks whether to stage it. Using s (split) breaks a hunk into smaller pieces wherever unchanged lines separate the changes. Using e (edit) lets you manually edit the patch to stage specific lines. Understanding hunks is essential for clean, atomic commits.
Git stores the file mode (regular, executable, symlink) in the index along with the blob reference. When you git add a file, Git records its mode. A mode change shows up in git diff --staged as old mode/new mode lines rather than as a content change. If core.fileMode is false (common on Windows), Git ignores executable-bit changes in the working directory, so you record them explicitly with git update-index --chmod=+x. This matters for deployment scripts that rely on executable bits. You can verify the stored mode with git ls-files --stage: the mode field (100644 for regular files, 100755 for executables, 120000 for symlinks) is part of the index entry.
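Recording the executable bit directly in the index works even where the filesystem does not track it. A sketch with a hypothetical deploy.sh in a throwaway repository:

```shell
# Throwaway repository (no commit needed, so no identity setup)
demo=$(mktemp -d) && cd "$demo" && git init -q

printf '#!/bin/sh\necho deploy\n' > deploy.sh
git add deploy.sh
git update-index --chmod=+x deploy.sh   # set the executable bit in the index entry
git ls-files --stage deploy.sh          # <mode> <object id> <stage><TAB>deploy.sh
mode=$(git ls-files --stage deploy.sh | cut -d' ' -f1)
echo "$mode"
# On filesystems that do track modes, also chmod +x the file itself
# so the working copy agrees with the index.
```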
Staged content is stored as blob objects in .git/objects (compressed, not encrypted), and .git/index records their paths and object IDs. Nothing there is access-controlled or isolated. Staged data lives in the same object database as all other Git data, so anyone with read access to the repository directory can inspect it with git ls-files --stage or git show :<path>. Sensitive data should never reach the staging area. Pre-commit hooks with tools like git-secrets or trufflehog can block a commit that contains secrets. However, they run after git add, so the blob already exists in the local object database until garbage collection removes it.
During a merge, Git populates the staging area with three entries per conflicted file: the common ancestor version (stage 1), the "ours" version (stage 2), and the "theirs" version (stage 3). Resolving a conflict means editing the file and staging it with git add. This replaces the three entries with a single stage-0 entry and marks the file resolved. git status lists conflicted files separately as unmerged paths. git add does not check for leftover conflict markers, so staging a half-resolved file lets the markers reach the commit; git diff --cached --check can flag them. git merge --abort can undo an in-progress merge and restore the staging area to its pre-merge state.
git reset HEAD <file> and git restore --staged <file> both unstage a file, and in this form they do the same thing. Given a path, git reset copies the HEAD version of that path into the index and does not move any branch pointer. git restore --staged does the same. Both accept files or directories, and both leave working directory changes intact. The difference is in the command design. Without paths, git reset moves the current branch, so one command mixes two jobs, whereas git restore only restores files. That makes git restore the recommended modern syntax and harder to misuse when all you want is to unstage.
The staging area (index) is stored in a single binary file that Git must read on every git status, git add, and git commit. In repositories with hundreds of thousands of files, this can become a bottleneck. The index is a flat, sorted list of entries, so most updates require reading and rewriting the entire file, and git status must also check every tracked file for changes. Features like git add -p that diff hunks against the index add extra computation. Large repositories benefit from git sparse-checkout (fewer files in the working directory), .gitignore discipline, and index-related options such as core.splitIndex, core.untrackedCache, and core.fsmonitor.
Further Reading
- Pro Git — Recording Changes — Official guide to the three states
- Git Status Documentation — Understanding status output
- Git Add Documentation — Staging area operations
- Git Reset Documentation — Moving between states
- Visualizing Git Concepts — Diagrams and explanations
Conclusion
Internalize the three-state model (working directory, staging area, repository) and Git transforms from a confusing set of incantations into a transparent, predictable system. You will stop guessing what commands do and start reasoning about them in terms of how they move data between these three states.