The .git Directory Structure
Exploring the .git folder: HEAD, config, objects, refs, hooks, and index. Understand how Git stores everything internally for better debugging and recovery.
Introduction
Every Git repository you’ve ever cloned, initialized, or worked with hides a single directory that contains the entire universe of that project’s history: the .git directory. While most developers interact with Git through high-level commands like git commit, git push, and git merge, understanding what lives inside .git transforms you from a Git user into a Git operator.
The .git directory is not a black box. It’s a carefully organized database of files, references, and metadata that together implement a distributed version control system. When things go wrong (and they will), knowing this structure is the difference between panicking and confidently recovering your work.
This post examines every component of the .git directory, explains what each file and folder does, how they interconnect, and why Git’s design choices matter for real-world development workflows.
When to Use / When Not to Use
When to understand .git internals:
- Recovering from corrupted repositories or lost commits
- Debugging mysterious Git behavior (detached HEAD, missing branches)
- Writing custom Git hooks or tooling
- Optimizing large repositories
- Understanding how Git achieves its guarantees
When not to dig into .git:
- Daily development workflows — use normal Git commands
- When you’re unsure (modifying .git files directly can corrupt your repo)
- For simple tasks like committing or branching, where the CLI is sufficient
Core Concepts
The .git directory is Git’s entire knowledge base. Everything outside of it is your working tree — the files you edit. Everything inside is Git’s internal representation of your project’s history, configuration, and state.
graph TD
A[".git Directory"] --> B["HEAD"]
A --> C["config"]
A --> D["objects/"]
A --> E["refs/"]
A --> F["hooks/"]
A --> G["index"]
A --> H["logs/"]
A --> I["description"]
A --> J["packed-refs"]
A --> K["COMMIT_EDITMSG"]
D --> D1["info/"]
D --> D2["pack/"]
D --> D3["<first-2-hex-chars>/"]
E --> E1["heads/"]
E --> E2["tags/"]
E --> E3["remotes/"]
Git’s design follows a simple principle: everything is a file. This means you can inspect, back up, and even manually repair a repository using standard file operations. The directory layout has stayed largely stable across Git versions, making it a reliable foundation for tooling.
Architecture or Flow Diagram
flowchart LR
WT["Working Tree\n(your files)"] -->|git add| IDX["index\n(staging area)"]
IDX -->|git commit| OBJ["objects/\n(blobs, trees, commits)"]
OBJ -->|referenced by| REF["refs/\n(branches, tags)"]
REF -->|pointed to by| HD["HEAD\n(current ref)"]
HD -->|determines| WT
CFG["config"] -.->|settings| IDX
CFG -.->|settings| OBJ
HKS["hooks/"] -.->|triggers| IDX
HKS -.->|triggers| OBJ
The flow shows how data moves from your working tree through the staging area into the object database, with references and HEAD tracking the current state. Configuration and hooks influence every transition.
Step-by-Step Guide / Deep Dive
Core References
The HEAD File
HEAD is a single file containing a reference to the current branch or commit SHA:
$ cat .git/HEAD
ref: refs/heads/main
When you’re in detached HEAD state, it contains a raw SHA-1 hash instead of a symbolic reference. This file is Git’s answer to “where am I right now?”
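The two shapes HEAD can take are easy to tell apart programmatically. The sketch below is a minimal illustration, not Git's own parser; `parse_head` and `SHA_RE` are hypothetical names introduced here, and the function only looks at the text you pass in.

```python
import re

# Hypothetical helper (not part of Git): classify the text stored in .git/HEAD.
# Accepts 40 hex characters (SHA-1) or 64 hex characters (SHA-256).
SHA_RE = re.compile(r"^[0-9a-f]{40}([0-9a-f]{24})?$")


def parse_head(content: str) -> tuple[str, str]:
    """Return ("symbolic", ref_name) or ("detached", object_id)."""
    text = content.strip()
    if text.startswith("ref: "):
        # Symbolic reference: HEAD names a branch ref.
        return ("symbolic", text[len("ref: "):].strip())
    if SHA_RE.match(text):
        # Detached HEAD: the file holds a raw object ID.
        return ("detached", text)
    raise ValueError(f"unrecognized HEAD content: {text!r}")


print(parse_head("ref: refs/heads/main\n"))
print(parse_head("0123456789abcdef0123456789abcdef01234567\n"))
```

Against a real repository you would pass the result of reading `.git/HEAD`; anything that is neither shape raises `ValueError`, which is a reasonable signal that HEAD is damaged.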
The config File
Your repository’s local configuration, overriding global (~/.gitconfig) and system-level settings:
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = https://github.com/user/repo.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
remote = origin
merge = refs/heads/main
Object Storage
The objects/ Directory
Git’s content-addressable storage. Every blob, tree, commit, and tag lives here, named by their SHA-1 (or SHA-256) hash:
$ ls .git/objects/
02/ 0a/ 1b/ 2c/ 3d/ ... pack/ info/
The first two characters of the hash form a subdirectory; the remaining 38 characters (62 in SHA-256 repositories) form the filename. Objects are zlib-compressed and stored as loose files until git gc packs them.
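To make the naming scheme concrete, here is a small sketch of how an object ID is derived and where the loose file would land. It assumes a SHA-1 repository; `git_object_id` and `loose_object_path` are hypothetical helper names, and the code only computes names without touching any repository.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 over the header "<type> <size>" + NUL byte, followed by the content."""
    header = f"{obj_type} {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Loose object location: first two hex characters, then the remainder."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


oid = git_object_id("blob", b"test content\n")
print(oid)
print(loose_object_path(oid))
```

The ID should match what `git hash-object` prints for the same bytes. For an empty blob the function returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the ID Git assigns to every empty file.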
The refs/ Directory
References are named pointers to commits:
refs/
├── heads/ # Local branches
│ ├── main
│ └── feature/auth
├── tags/ # Tags
│ └── v1.0.0
└── remotes/ # Remote-tracking branches
└── origin/
├── main
└── develop
Each file contains a 40-character SHA-1 hash. Reading refs/heads/main tells you exactly which commit main points to.
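Resolving a name to a commit is then a matter of reading small text files. The following sketch follows symbolic refs through loose ref files only; it deliberately ignores packed-refs, and both `resolve_ref` and `make_demo_git_dir` are hypothetical helpers that operate on a throwaway directory rather than a real repository.

```python
import tempfile
from pathlib import Path


def resolve_ref(git_dir: Path, name: str = "HEAD", max_depth: int = 5) -> str:
    """Follow symbolic refs until a loose ref file yields an object ID."""
    for _ in range(max_depth):
        content = (git_dir / name).read_text().strip()
        if content.startswith("ref: "):
            name = content[len("ref: "):]  # symbolic ref: keep following
        else:
            return content  # direct ref: the file holds an object ID
    raise RuntimeError("symbolic ref chain too deep")


def make_demo_git_dir() -> Path:
    """Build a temporary directory that mimics the layout described above."""
    git_dir = Path(tempfile.mkdtemp()) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    # Placeholder object ID, used only for illustration.
    (git_dir / "refs" / "heads" / "main").write_text("1" * 40 + "\n")
    return git_dir


demo = make_demo_git_dir()
print(resolve_ref(demo))
print(resolve_ref(demo, "refs/heads/main"))
```

In real use, prefer `git rev-parse`, which also consults packed-refs and handles abbreviations; the sketch is only meant to show how little machinery a loose ref involves.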
Staging & State
The index File
The index (or staging area) is a binary file that tracks which files are staged for the next commit. You can’t read it directly, but you can inspect it:
$ git ls-files --stage
100644 abc123... 0 src/main.py
100644 def456... 0 src/utils.py
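Although the index is binary, its first 12 bytes are simple to decode: a 4-byte signature (`DIRC`), a 4-byte version number, and a 4-byte entry count, all big-endian. The sketch below reads only that header; `parse_index_header` is a hypothetical helper, and the sample bytes are synthetic.

```python
import struct


def parse_index_header(data: bytes) -> dict:
    """Decode the 12-byte header at the start of .git/index."""
    if len(data) < 12:
        raise ValueError("index file smaller than expected")
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return {"version": version, "entries": entry_count}


# Synthetic header for illustration: version 2, two entries.
sample = struct.pack(">4sII", b"DIRC", 2, 2)
print(parse_index_header(sample))
```

On a real repository you could pass the bytes read from `.git/index` in binary mode; the entry count should agree with the number of lines printed by `git ls-files --stage` when no merge conflicts are staged.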
Extensibility & Recovery
The hooks/ Directory
Executable scripts that run at specific points in Git’s workflow:
hooks/
├── pre-commit.sample
├── pre-push.sample
├── commit-msg.sample
└── ...
Remove .sample and make executable to activate. Hooks are not versioned — each developer manages their own.
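The activation rule above can be checked with a few lines of code. This is a sketch under POSIX assumptions (it inspects the executable permission bits); `active_hooks` and `make_demo_hooks_dir` are hypothetical helpers, and the function lists every executable non-sample file without checking that the name is one Git recognizes.

```python
import stat
import tempfile
from pathlib import Path


def active_hooks(hooks_dir: Path) -> list[str]:
    """List candidate hooks: no .sample suffix and an executable bit set."""
    names = []
    for path in hooks_dir.iterdir():
        if not path.is_file() or path.suffix == ".sample":
            continue
        if path.stat().st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
            names.append(path.name)
    return sorted(names)


def make_demo_hooks_dir() -> Path:
    """Temporary hooks directory: one sample, one active, one inactive hook."""
    hooks_dir = Path(tempfile.mkdtemp()) / "hooks"
    hooks_dir.mkdir()
    (hooks_dir / "pre-commit.sample").write_text("#!/bin/sh\nexit 0\n")
    active = hooks_dir / "pre-commit"
    active.write_text("#!/bin/sh\nexit 0\n")
    active.chmod(0o755)  # executable: Git would run this one
    inactive = hooks_dir / "pre-push"
    inactive.write_text("#!/bin/sh\nexit 0\n")
    inactive.chmod(0o644)  # not executable: Git would skip it
    return hooks_dir


print(active_hooks(make_demo_hooks_dir()))
```

Pointing `active_hooks` at a real `.git/hooks` directory is a quick way to see which hooks are live on a given machine.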
The logs/ Directory
Reflogs track reference movements, enabling recovery of “lost” commits:
$ cat .git/logs/HEAD
0000000... abc123... User <user@email.com> 1711900000 +0000 commit: Initial commit
abc123... def456... User <user@email.com> 1711900100 +0000 commit: Add feature
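Each reflog line has a fixed shape: old ID, new ID, identity, Unix timestamp, timezone offset, then the message. The sketch below parses one line; `parse_reflog_line` is a hypothetical helper, the sample IDs are placeholders, and it assumes the message is separated by a tab (as in the on-disk file) or a single space (as displayed above).

```python
import re
from datetime import datetime, timedelta, timezone

REFLOG_RE = re.compile(
    r"^(?P<old>[0-9a-f]+) (?P<new>[0-9a-f]+) "
    r"(?P<name>.*) <(?P<email>[^>]*)> "
    r"(?P<ts>\d+) (?P<tz>[+-]\d{4})[\t ](?P<msg>.*)$"
)


def parse_reflog_line(line: str) -> dict:
    """Split one reflog line into its fields."""
    m = REFLOG_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError("unrecognized reflog line")
    tz = m["tz"]
    sign = 1 if tz[0] == "+" else -1
    offset = timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5])) * sign
    return {
        "old": m["old"],
        "new": m["new"],
        "name": m["name"],
        "email": m["email"],
        "when": datetime.fromtimestamp(int(m["ts"]), tz=timezone(offset)),
        "message": m["msg"],
    }


# Placeholder IDs; a real reflog holds full object IDs.
sample_line = "0" * 40 + " " + "a" * 40 + " User <user@email.com> 1711900000 +0000\tcommit (initial): Initial commit\n"
entry = parse_reflog_line(sample_line)
print(entry["new"], entry["when"].isoformat(), entry["message"])
```

The `new` field of the entry just before an unwanted reset is usually the ID you want to recover.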
Other Files
- description: Repository description (used by GitWeb)
- packed-refs: Packed references for performance (see Git Object Database and Pack Files)
- COMMIT_EDITMSG: The file Git opens in your editor when composing a commit message; it retains the most recent message
- MERGE_HEAD, REBASE_HEAD: Temporary files during merge/rebase operations
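Of the files listed above, packed-refs is the one most worth being able to read by hand. As a sketch: it is a text file with an optional header comment, one `<object ID> <ref name>` pair per line, and `^<object ID>` lines that give the commit an annotated tag on the previous line points to. `parse_packed_refs` is a hypothetical helper, and the sample IDs are placeholders.

```python
def parse_packed_refs(text: str) -> dict[str, dict]:
    """Map ref names to their object ID and, for annotated tags, the peeled commit."""
    refs: dict[str, dict] = {}
    last_name = None
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # header comment such as "# pack-refs with: ..."
        if line.startswith("^"):
            # Peeled value: belongs to the ref on the previous line.
            if last_name is not None:
                refs[last_name]["peeled"] = line[1:]
            continue
        object_id, name = line.split(" ", 1)
        refs[name] = {"oid": object_id, "peeled": None}
        last_name = name
    return refs


# Synthetic content with placeholder IDs, for illustration only.
SAMPLE = "\n".join([
    "# pack-refs with: peeled fully-peeled sorted",
    "1" * 40 + " refs/heads/main",
    "2" * 40 + " refs/tags/v1.0.0",
    "^" + "3" * 40,
]) + "\n"

for ref_name, info in parse_packed_refs(SAMPLE).items():
    print(ref_name, info)
```

A loose file under refs/ with the same name takes precedence over the packed entry, so a complete resolver would check the loose file first.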
Production Failure Scenarios
| Scenario | Symptoms | Mitigation |
|---|---|---|
| Corrupted HEAD | “fatal: bad HEAD” | Restore from refs/heads/ or use git symbolic-ref HEAD refs/heads/main |
| Missing objects | “fatal: loose object corrupt” | Run git fsck --full, fetch from remote, or restore from backup |
| Broken refs | Branch points to non-existent commit | Check reflog, reset to valid commit |
| Lock file stuck | “fatal: Unable to create .git/index.lock” | Remove stale .git/index.lock after verifying no other Git process runs |
| Hook failure | Commit/push silently fails | Check hook exit codes, run hooks manually with bash .git/hooks/pre-commit |
Trade-off Analysis
| Aspect | Advantage | Disadvantage |
|---|---|---|
| File-based storage | Human-inspectable, easy to backup | Not optimized for large repos without packing |
| SHA-1 hashing | Fast content addressing | Collision risk (mitigated by SHA-256 transition) |
| Local hooks | Per-developer customization | Not shared across team, easy to forget |
| Reflog | Recovery safety net | Consumes disk space over time |
Implementation Snippets
# Initialize a new repository
git init
ls -la .git/
# Inspect HEAD
cat .git/HEAD
# List all objects with sizes
git count-objects -vH
# View packed refs
cat .git/packed-refs
# Inspect the index
git ls-files --stage
# List hooks
ls -la .git/hooks/
# View reflog
git reflog
cat .git/logs/HEAD
# Create a custom pre-commit hook
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
echo "Running pre-commit checks..."
npm run lint
EOF
chmod +x .git/hooks/pre-commit
Observability Checklist
- Monitor: Repository size growth with git count-objects -vH
- Log: Hook execution results (hooks should log to stderr)
- Alert: Reflog size exceeding thresholds (prune with git reflog expire)
- Verify: Run git fsck --full periodically on critical repositories
- Track: Number of loose objects vs packed objects
Security & Compliance Considerations
- The .git directory contains all history; never expose it on a web server
- Hooks run with the same permissions as the user; validate hook sources
- Configuration may contain credentials (for example, tokens embedded in remote URLs); keep secrets out of .git/config
- Consider using git config core.hooksPath for shared, versioned hooks
Common Pitfalls / Anti-Patterns
- Editing .git files directly: always use Git commands unless you know exactly what you’re doing
- Deleting .git to “reset”: you lose all history; use git reset --hard instead
- Ignoring hook exit codes: a failing hook should abort the operation
- Not backing up .git: it IS your repository; the working tree is disposable
- Sharing hooks via .git/hooks/: hooks aren’t versioned; use core.hooksPath or a tool like Husky
Quick Recap Checklist
- .git contains everything Git needs; the working tree is disposable
- HEAD points to the current branch or commit
- objects/ stores all content as SHA-addressed blobs, trees, commits, tags
- refs/ contains branch and tag pointers
- index is the staging area (binary format)
- hooks/ runs scripts at key Git lifecycle events
- logs/ enables recovery through reflogs
- Never expose .git on production web servers
.git Directory Tree
graph TD
ROOT[".git/"] --> HEAD["HEAD"]
ROOT --> CONFIG["config"]
ROOT --> OBJECTS["objects/"]
ROOT --> REFS["refs/"]
ROOT --> HOOKS["hooks/"]
ROOT --> INDEX["index"]
ROOT --> LOGS["logs/"]
ROOT --> PACKED["packed-refs"]
ROOT --> DESC["description"]
ROOT --> EDITMSG["COMMIT_EDITMSG"]
OBJECTS --> OBJ_INFO["info/"]
OBJECTS --> OBJ_PACK["pack/"]
OBJECTS --> OBJ_LOOSE["<first-2-chars>/<remaining-38>"]
REFS --> REFS_HEADS["heads/ (branches)"]
REFS --> REFS_TAGS["tags/"]
REFS --> REFS_REMOTES["remotes/ (tracking)"]
REFS_HEADS --> RH_MAIN["main"]
REFS_HEADS --> RH_FEAT["feature/auth"]
REFS_TAGS --> RT_V1["v1.0.0"]
REFS_REMOTES --> RR_ORIGIN["origin/"]
RR_ORIGIN --> RR_MAIN["main"]
RR_ORIGIN --> RR_DEV["develop"]
Production Failure: Corrupted Repository Recovery
Scenario: Corrupted index file preventing all operations
# Symptoms
$ git status
fatal: .git/index: index file smaller than expected
$ git add .
fatal: Unable to create '/path/to/repo/.git/index.lock': File exists.
# Root cause: Binary index file was truncated or corrupted
# (disk failure, interrupted operation, filesystem corruption)
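# Recovery steps:
# 1. Remove stale lock file (verify no git process is running first)
rm -f .git/index.lock
# 2. Move the corrupted index aside; Git cannot rebuild it while it is unreadable
mv .git/index .git/index.corrupt
# 3. Rebuild index from HEAD (working tree files are left untouched)
git reset HEAD
# 4. Alternative: recreate the index from the current tree with plumbing
git read-tree HEAD
# 5. Verify working tree matches
git status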
# If objects are missing:
$ git cat-file -t abc123
fatal: git cat-file: could not get object info
# Recovery:
git fsck --full
# Fetch missing objects from remote
git fetch origin
# Or restore .git/objects/ from backup
Manual Object Inspection with Plumbing Commands
# Create a blob and capture its object ID
oid=$(echo "Hello, Git internals!" | git hash-object -w --stdin)
echo "$oid"
# Output: the 40-character object ID of the new blob
# Verify the object exists
git cat-file -t "$oid"
# Output: blob
# Read the blob content
git cat-file -p "$oid"
# Output: Hello, Git internals!
# Check object size
git cat-file -s "$oid"
# Output: 22
# Hash a file without storing it
git hash-object src/main.py
# Output: the object ID only; nothing is written without the -w flag
# Inspect a tree object
git cat-file -p HEAD^{tree}
# Output:
# 100644 blob abc123... README.md
# 100644 blob 789abc... package.json
# 040000 tree def456... src
# Inspect a commit object
git cat-file -p HEAD
# Output:
# tree 4b825d...
# parent a1b2c3...
# author Name <email> timestamp timezone
# committer Name <email> timestamp timezone
#
# Commit message here
Security: .git Directory Exposure on Web Servers
Exposing .git/ on a production web server is one of the most common and dangerous misconfigurations. Attackers can download your entire source code history, including:
- All source code — every version of every file ever committed
- Committed secrets — API keys, passwords, tokens in old commits
- Developer information — names, emails from commit metadata
- Infrastructure details — deployment configs, internal URLs
# Check if your site exposes .git
curl -I https://yoursite.com/.git/HEAD
# If you get 200 OK instead of 403/404, you are vulnerable
# Common attack tools that exploit this:
# - git-dumper (downloads entire repo)
# - diggit.py (reconstructs repo from exposed objects)
Mitigation:
- Nginx: location ~ /\.git { deny all; return 404; }
- Apache: RedirectMatch 404 /\.git
- Never deploy .git/; ship build artifacts, not raw repos
- Scan with: gitleaks detect --log-opts="--all" before deployment
Git Object Types Deep Dive
Git’s object model has four core object types, each serving a distinct purpose in the content-addressable storage system.
Blob
A blob stores file content. It contains only the raw bytes of a file, with no metadata about filename or path. Two files with identical content at different paths share one blob object.
# Blob structure: just content, no metadata
echo "Hello World" | git hash-object -w --stdin
# Creates blob object with no reference to filename
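What ends up on disk for a loose blob is the zlib-compressed header plus content, and the same bytes are enough to check integrity on the way back. The sketch below shows that round trip in memory under SHA-1 assumptions; `encode_loose_object` and `decode_loose_object` are hypothetical helpers and no files are written.

```python
import hashlib
import zlib


def encode_loose_object(obj_type: str, data: bytes) -> tuple[str, bytes]:
    """Return (object ID, compressed bytes) as a loose object would store them."""
    raw = f"{obj_type} {len(data)}\0".encode("ascii") + data
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def decode_loose_object(object_id: str, compressed: bytes) -> tuple[str, bytes]:
    """Decompress, verify the ID against the content, and split off the header."""
    raw = zlib.decompress(compressed)
    if hashlib.sha1(raw).hexdigest() != object_id:
        raise ValueError("object content does not match its ID")
    header, _, data = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(data):
        raise ValueError("size in header does not match content length")
    return obj_type, data


blob_id, stored = encode_loose_object("blob", b"Hello World\n")
print(blob_id)
print(decode_loose_object(blob_id, stored))
```

Because the path never enters the hash, two files with the same bytes produce the same ID, which is why identical content at different paths shares one blob.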
Tree
A tree object references blobs (files) and other trees (directories) with associated metadata: file mode, filename, and SHA-1 pointer. Trees represent a directory state at a point in time.
# Inspect tree from commit
git cat-file -p HEAD^{tree}
# 100644 blob a1b2c3... README.md
# 100644 blob 789abc... package.json
# 040000 tree d4e5f6... src
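The pretty-printed listing hides the binary layout of a tree. As a sketch of that layout: each entry is `<mode> <name>`, a NUL byte, then the raw 20-byte object ID, with entries sorted by name (directories compare as if their name ended in `/`). `tree_object_id` is a hypothetical helper that assumes SHA-1; note that directories use mode `40000` inside the object even though `git cat-file -p` displays `040000`.

```python
import hashlib


def _sort_key(entry: tuple[str, str, str]) -> bytes:
    mode, name, _ = entry
    # Git orders directory entries as if the name had a trailing slash.
    return name.encode("utf-8") + (b"/" if mode == "40000" else b"")


def tree_object_id(entries: list[tuple[str, str, str]]) -> str:
    """entries: (mode, name, hex object ID). Returns the tree's object ID."""
    body = b""
    for mode, name, object_id in sorted(entries, key=_sort_key):
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(object_id)
    header = f"tree {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

print(tree_object_id([]))  # the well-known empty tree ID
print(tree_object_id([
    ("100644", "README.md", EMPTY_BLOB),
    ("40000", "src", EMPTY_TREE),
]))
```

Changing any entry's object ID changes the bytes being hashed, so the tree's own ID changes too; that is how a single edited file ripples up to the root tree of the next commit.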
Commit
A commit object points to a tree (root directory), zero or more parent commits, and contains author/committer metadata with timestamps. Commits form the history graph of your project.
# Commit structure breakdown
git cat-file -p HEAD
# tree 4b825d... <- root tree SHA
# parent a1b2c3... <- previous commit
# author Geek Workbench <email> timestamp timezone <- author info
# committer Geek Workbench <email> timestamp timezone <- committer info
# <- blank line
# Commit message here <- message
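The commit layout shown above is plain text, so it can be split with ordinary string operations. The sketch below is a simplified reader, assuming the output of `git cat-file -p` for a commit; `parse_commit` is a hypothetical helper, it ignores extra headers such as signatures, and the IDs in the sample are placeholders.

```python
def parse_commit(text: str) -> dict:
    """Split a commit object's text into headers and message."""
    header_block, _, message = text.partition("\n\n")
    commit = {
        "tree": None,
        "parents": [],  # zero for a root commit, two or more for a merge
        "author": None,
        "committer": None,
        "message": message,
    }
    for line in header_block.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


# Placeholder IDs and identity, for illustration only.
SAMPLE_COMMIT = (
    "tree " + "4" * 40 + "\n"
    "parent " + "a" * 40 + "\n"
    "author Name <email> 1711900000 +0000\n"
    "committer Name <email> 1711900000 +0000\n"
    "\n"
    "Commit message here\n"
)

parsed = parse_commit(SAMPLE_COMMIT)
print(parsed["tree"], parsed["parents"], parsed["message"].strip())
```

Walking history is then a loop: parse a commit, follow each entry in `parents`, and repeat until a commit has none.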
Tag
An annotated tag object points to another object (usually a commit, occasionally another tag) with additional metadata: tagger info, timestamp, and a message. Lightweight tags are just refs; annotated tags are full objects.
# Annotated tag structure
git cat-file -p refs/tags/v1.0.0
# object a1b2c3... <- tagged commit
# type commit
# tag v1.0.0
# tagger Geek Workbench <email> timestamp timezone
#
# Release version 1.0.0
Object Type Comparison
| Aspect | Blob | Tree | Commit | Tag (annotated) |
|---|---|---|---|---|
| Stores | File content | Directory entries | Project snapshot | Pointer + metadata |
| References | None | Blob SHAs, tree SHAs | Tree SHA, parent SHAs | Commit/tag SHA |
| Has author? | No | No | Yes | Yes |
| Has message? | No | No | Yes | Yes |
| Content address | File content | Directory structure | Tree + history | Labeled pointer |
| Typical size | Same as file | Proportional to number of entries | Small (typically a few hundred bytes) | Small (typically a few hundred bytes) |
Interview Questions
HEAD is a symbolic reference that points to the current branch (e.g., ref: refs/heads/main). A branch reference is a file in .git/refs/heads/ that points to a specific commit SHA. HEAD moves when you check out different branches; branch refs move when you make new commits.
Filesystem performance degrades with too many files in a single directory. By using the first two hex characters of the SHA-1 hash as a subdirectory name, Git spreads loose objects across at most 256 subdirectories (16²), so each one holds roughly 1/256 of the loose objects. Combined with periodic packing by git gc, this keeps directory listings short even in repositories with millions of objects.
Use git reflog to find the SHA of the lost commit. The reflog in .git/logs/HEAD records every HEAD movement, including resets. Once you have the SHA, run git checkout <sha> or git branch recovery <sha> to restore it. This works as long as git gc hasn't pruned the unreachable objects.
Git creates blob objects in .git/objects/ for each file's content (if not already present), then updates the .git/index binary file to record the blob SHA, file path, and metadata. The working tree files are hashed and compared to existing objects — unchanged files reuse existing blobs, enabling deduplication.
Loose objects are stored as individual zlib-compressed files under .git/objects/<first-2>/<remaining-38>. They are quick to write but inefficient for large repositories with many objects. Packed objects are stored in .git/objects/pack/ as packfiles (a `.pack` file with a matching `.idx` index) that delta-compress similar objects together, which often shrinks storage substantially; the exact savings depend on the content. The git gc command converts loose objects to packed format.
The index (.git/index) is a binary file that records, for every tracked path, the staged blob SHA, file mode, stage number, and cached stat data such as size and modification time. Because of this cache, Git can skip re-reading and re-hashing files whose stat data has not changed; it still stats each tracked path, but only files that look modified are compared by content. At commit time, Git builds the new tree objects from the index rather than from the working tree.
The reflog records every change to HEAD and branch references in .git/logs/, including timestamp, old SHA, new SHA, and committer identity. By default, reflogs are retained for 90 days for reachable commits and 30 days for unreachable commits (before garbage collection). They provide a safety net for recovering from destructive operations like git reset --hard or accidentally deleted branches.
git gc (garbage collection) performs several maintenance tasks: it repacks loose objects into pack files using delta compression, removes unreachable objects older than the retention period, rebuilds pack indexes, and prunes stale reflog entries. For active repositories, running git gc periodically keeps object storage efficient and prevents accumulation of orphaned objects.
Every object is stored under the SHA-1 hash of its header (type and size) plus its content, so the object name is derived entirely from what the object contains. Any change to the content yields a different name, which means corruption or tampering shows up as a mismatch when the hash is recomputed. Git recomputes hashes during git fsck and when receiving objects over the network, so integrity can be checked without external mechanisms.
A regular ref (e.g., refs/heads/main) contains a 40-character SHA-1 hex string pointing directly to a commit. A symbolic ref (like HEAD) contains the text ref: refs/heads/main, storing a reference to another ref rather than a raw SHA. Symbolic refs can be resolved recursively; regular refs cannot. Use git symbolic-ref to create or modify symbolic refs.
Git stores directories as tree objects that contain entries mapping names to blob or tree SHAs, along with file mode metadata (100644 for regular files, 040000 for subdirectories). Each commit's tree SHA represents the complete project state. When a file changes, its blob SHA changes, so the tree that contains it gets a new SHA, and so does every ancestor tree up to the root. Sibling subtrees with no changes keep their SHAs and are reused as-is.
Exposing .git/ publicly reveals the entire commit history, all source code (including removed files), all secrets ever committed (even if deleted in a later commit), and developer identity information from commit metadata. Tools like git-dumper can reconstruct entire repositories from exposed .git/ directories. Mitigation: configure the web server to deny access to /\.git paths, or deploy build artifacts instead of raw repositories.
git reset moves the current branch ref (the one HEAD points to) to a different commit and, depending on the mode, also updates the index and working tree: --soft moves only the ref, --mixed (the default) also resets the index, and --hard resets the index and working tree. git checkout leaves branch refs alone; it updates HEAD to point to a different branch (or to a commit, in detached HEAD state) and updates the index and working tree to match, carrying over uncommitted local changes where possible and refusing to switch when they would be overwritten.
Hooks are executable scripts in .git/hooks/ triggered by Git operations at specific points (pre-commit, post-commit, pre-push, etc.). They are not versioned and not shared via normal clone operations. Security concerns: hooks run with user permissions and can execute arbitrary code, so a hook from an untrusted source can be abused to run malicious commands. Always validate hook sources, use core.hooksPath to point to a centralized location, or use tools like Husky that manage hooks in the project repository.
git fsck (file system check) verifies repository integrity by checking object connectivity, dangling object references, and corruption. It reports issues but does not modify anything. git gc performs maintenance operations: repacking, pruning unreachable objects, and reflog cleanup. Use git fsck --full for comprehensive checks; use git gc to reclaim disk space and optimize storage after bulk operations or long periods of active development.
1) git add creates/updates blob in .git/objects/, updates index with blob SHA and file metadata. 2) git commit creates a tree object from current index state, creates a commit object pointing to that tree and parent commit(s), then updates the branch ref to point to the new commit. 3) The blob objects persist indefinitely (until gc prunes unreachable ones). The working tree file remains unchanged until checked out from a different commit.
Git does not explicitly track renames; it detects them after the fact based on content similarity. When a file is renamed with git mv (or moved manually and staged with git add -A), the new tree simply lists the new path and drops the old one. If the content is unchanged, the new entry points to the same blob, because blobs carry no filename. Commands such as git status, git diff -M, and git log --follow infer the rename by comparing blob SHAs and, for edited files, content similarity.
packed-refs is a file (not a directory) that stores many refs in one place as a performance optimization. In repositories with many refs, keeping each one as an individual file under refs/ becomes slow, so git pack-refs (also run by git gc) moves refs into packed-refs; a ref that is updated afterwards is written as a loose file again, and the loose file takes precedence. The format is one <SHA-1> <ref name> pair per line, preceded by a header comment line; a line beginning with ^ gives the commit that the annotated tag on the previous line points to.
The index tracks which version of each file is staged for commit via the blob SHA. You can git add some files, make more changes, then git add again — each git add updates the index with the current blob SHA for that path. When you git commit, only the files recorded in the index are included in the new tree. This enables committing only a subset of changes while keeping other changes unstaged for later commits.
REBASE_HEAD is a temporary file in .git/ created during an interactive or non-interactive rebase. It records the SHA of the commit currently being rebased (the "original" commit before replaying). If a rebase conflicts, REBASE_HEAD indicates which commit you were processing. Similar temporary files include MERGE_HEAD (during merges), CHERRY_PICK_HEAD (during cherry-picks), and BISECT_LOG (during bisect operations).
Conclusion
The .git directory is the engine room of version control. Understanding its layout — objects, refs, HEAD, config — demystifies how Git actually works. Peel back the abstraction and you’ll never be surprised by Git behavior again.