The .git Directory Structure

Exploring the .git folder: HEAD, config, objects, refs, hooks, and index. Understand how Git stores everything internally for better debugging and recovery.

Reading time: 20 min | Author: Geek Workbench | Updated: March 31, 2026

Introduction

Every Git repository you’ve ever cloned, initialized, or worked with hides a single directory that contains the entire universe of that project’s history: the .git directory. While most developers interact with Git through high-level commands like git commit, git push, and git merge, understanding what lives inside .git transforms you from a Git user into a Git operator.

The .git directory is not a black box. It’s a carefully organized database of files, references, and metadata that together implement a distributed version control system. When things go wrong (and they will), knowing this structure is the difference between panicking and confidently recovering your work.

This post walks through the main components of the .git directory: what each file and folder does, how they interconnect, and why Git’s design choices matter for real-world development workflows.

When to Use / When Not to Use

When to understand .git internals:

  • Recovering from corrupted repositories or lost commits
  • Debugging mysterious Git behavior (detached HEAD, missing branches)
  • Writing custom Git hooks or tooling
  • Optimizing large repositories
  • Understanding how Git achieves its guarantees

When not to dig into .git:

  • Daily development workflows — use normal Git commands
  • When you’re unsure — modifying .git files directly can corrupt your repo
  • For simple tasks like committing or branching — the CLI is sufficient

Core Concepts

The .git directory is Git’s entire knowledge base. Everything outside of it is your working tree — the files you edit. Everything inside is Git’s internal representation of your project’s history, configuration, and state.


graph TD
    A[".git Directory"] --> B["HEAD"]
    A --> C["config"]
    A --> D["objects/"]
    A --> E["refs/"]
    A --> F["hooks/"]
    A --> G["index"]
    A --> H["logs/"]
    A --> I["description"]
    A --> J["packed-refs"]
    A --> K["COMMIT_EDITMSG"]

    D --> D1["info/"]
    D --> D2["pack/"]
    D --> D3["<first-2-hex>/ (loose objects)"]

    E --> E1["heads/"]
    E --> E2["tags/"]
    E --> E3["remotes/"]

Git’s design follows a simple principle: everything is a file. This means you can inspect, back up, and even manually repair a repository using standard file operations. The layout has stayed largely stable across Git versions, which makes it a reliable foundation for tooling, although newer options such as the reftable ref backend store refs differently.

Architecture or Flow Diagram


flowchart LR
    WT["Working Tree\n(your files)"] -->|git add| IDX["index\n(staging area)"]
    IDX -->|git commit| OBJ["objects/\n(blobs, trees, commits)"]
    OBJ -->|referenced by| REF["refs/\n(branches, tags)"]
    REF -->|pointed to by| HD["HEAD\n(current ref)"]
    HD -->|determines| WT

    CFG["config"] -.->|settings| IDX
    CFG -.->|settings| OBJ
    HKS["hooks/"] -.->|triggers| IDX
    HKS -.->|triggers| OBJ

The flow shows how data moves from your working tree through the staging area into the object database, with references and HEAD tracking the current state. Configuration settings shape each of these steps, and hooks can run at points such as commit and push.

Step-by-Step Guide / Deep Dive

Core References

The HEAD File

HEAD is a single file containing a reference to the current branch or commit SHA:


$ cat .git/HEAD
ref: refs/heads/main

When you’re in detached HEAD state, it contains a raw SHA-1 hash instead of a symbolic reference. This file is Git’s answer to “where am I right now?”
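That check can be sketched in a few lines of shell. This is a minimal sketch, assuming the default file-based ref storage where HEAD is a plain text file; the function name head_state is hypothetical. In everyday use, git symbolic-ref --short -q HEAD does the same job: it prints the branch name and fails silently when HEAD is detached.

```shell
#!/bin/sh
# head_state (hypothetical helper): report whether HEAD names a branch or a raw commit.
# Assumes the "files" ref backend, where .git/HEAD is plain text.
head_state() {
  gitdir="$1"                      # path to a .git directory
  content=$(cat "$gitdir/HEAD")    # command substitution drops the trailing newline
  case "$content" in
    "ref: "*)
      ref=${content#ref: }                 # e.g. refs/heads/main
      echo "branch ${ref#refs/heads/}"     # strip the refs/heads/ prefix
      ;;
    *)
      echo "detached $content"             # a bare commit hash
      ;;
  esac
}

# Usage inside a repository:
#   head_state .git
```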

The config File

Your repository’s local configuration, overriding global (~/.gitconfig) and system-level settings:


[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = https://github.com/user/repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
    remote = origin
    merge = refs/heads/main
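Because config is an INI-style text file, git config can query it directly with --file. A minimal sketch, assuming Git is installed; the scratch file simply reuses the sample settings above rather than touching a real repository.

```shell
#!/bin/sh
# Write a copy of the sample config to a scratch file (not a real repository).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[core]
    bare = false
[remote "origin"]
    url = https://github.com/user/repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
    remote = origin
    merge = refs/heads/main
EOF

# Keys are addressed as section.subsection.name.
url=$(git config --file "$cfg" --get remote.origin.url)
upstream=$(git config --file "$cfg" --get branch.main.merge)
echo "origin -> $url"
echo "main tracks $upstream"

# In a real repository, this shows which file (system, global, local) set each value:
#   git config --list --show-origin
```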

Object Storage

The objects/ Directory

Git’s content-addressable storage. Every blob, tree, commit, and tag lives here, named by their SHA-1 (or SHA-256) hash:


$ ls .git/objects/
02/  0a/  1b/  2c/  3d/  ...  pack/  info/

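The first two hex characters of the hash form a subdirectory; the remaining 38 characters form the filename (62 in SHA-256 repositories). Newly created objects are zlib-compressed and stored as loose files until git gc packs them; objects received through clone or fetch often arrive already packed.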
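The fan-out rule is plain string slicing. A minimal sketch (object_path is a hypothetical helper) that maps a full hex object name to its loose-object path, assuming the object is stored loose rather than inside a pack:

```shell
#!/bin/sh
# object_path (hypothetical helper): map a full object name to its loose-object path.
# The first two hex characters select the subdirectory; the rest is the file name.
object_path() {
  sha="$1"
  rest=${sha#??}             # drop the first two characters
  prefix=${sha%"$rest"}      # what remains is the two-character prefix
  echo "objects/$prefix/$rest"
}

# Example:
#   object_path d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

If that file does not exist, the object may simply be packed; git cat-file -t with the same hash finds it in either place.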

The refs/ Directory

References are named pointers to commits:


refs/
├── heads/        # Local branches
│   ├── main
│   └── feature/auth
├── tags/         # Tags
│   └── v1.0.0
└── remotes/      # Remote-tracking branches
    └── origin/
        ├── main
        └── develop

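Each loose ref file contains a 40-character SHA-1 hash (64 characters in SHA-256 repositories). Reading refs/heads/main tells you which commit main points to, unless that ref has been moved into packed-refs, in which case no loose file exists; git rev-parse main checks both locations.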
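The lookup order can be sketched as: loose file first, then packed-refs. This is a minimal sketch assuming the default files backend (repositories using the reftable backend store refs differently); resolve_ref is hypothetical, and git rev-parse --verify is the supported way to resolve a ref.

```shell
#!/bin/sh
# resolve_ref (hypothetical helper): look up a ref the way the files backend stores it.
# A loose file under refs/ wins; otherwise fall back to the packed-refs file.
resolve_ref() {
  gitdir="$1"
  ref="$2"                                   # e.g. refs/heads/main
  if [ -f "$gitdir/$ref" ]; then
    cat "$gitdir/$ref"
  elif [ -f "$gitdir/packed-refs" ]; then
    # packed-refs lines look like "<sha> <refname>"; skip the "#" header and "^" peeled lines
    awk -v r="$ref" '$1 !~ /^[#^]/ && $2 == r { print $1; found = 1 } END { exit !found }' \
      "$gitdir/packed-refs"
  else
    return 1
  fi
}

# Usage inside a repository:
#   resolve_ref .git refs/heads/main
```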

Staging & State

The index File

The index (or staging area) is a binary file that records exactly which content, by blob hash, is staged for the next commit. It isn’t human-readable, but you can inspect it:


$ git ls-files --stage
100644 abc123... 0 src/main.py
100644 def456... 0 src/utils.py
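Although the index is binary, its 12-byte header is easy to decode: a DIRC signature, a 4-byte big-endian version number, and a 4-byte big-endian entry count, as described in Git’s index format documentation. A minimal sketch using od; index_header and be32 are hypothetical helpers.

```shell
#!/bin/sh
# be32 (hypothetical helper): read 4 bytes at offset $2 of file $1 as a big-endian integer.
be32() {
  od -An -tu1 -j "$2" -N 4 "$1" | awk '{ print $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 }'
}

# index_header (hypothetical helper): decode the fixed header of a Git index file.
# Layout: bytes 0-3 signature "DIRC", bytes 4-7 version, bytes 8-11 entry count.
index_header() {
  idx="$1"
  sig=$(head -c 4 "$idx")
  echo "$sig version=$(be32 "$idx" 4) entries=$(be32 "$idx" 8)"
}

# Usage inside a repository:
#   index_header .git/index
```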

Extensibility & Recovery

The hooks/ Directory

Executable scripts that run at specific points in Git’s workflow:


hooks/
├── pre-commit.sample
├── pre-push.sample
├── commit-msg.sample
└── ...

To activate a sample, remove the .sample suffix and make sure the file is executable. Hooks are not versioned or copied by git clone, so each developer manages their own.
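Git only runs a hook whose file name matches the hook name exactly and that is executable, so a quick audit is to list those files. A minimal sketch (active_hooks is hypothetical), assuming hooks live in the default .git/hooks directory rather than a core.hooksPath location:

```shell
#!/bin/sh
# active_hooks (hypothetical helper): print hooks Git would actually run.
# .sample files are never run, and non-executable files are skipped.
active_hooks() {
  for f in "$1"/hooks/*; do
    case "$f" in
      *.sample) continue ;;
    esac
    if [ -f "$f" ] && [ -x "$f" ]; then
      basename "$f"
    fi
  done
}

# Usage inside a repository:
#   active_hooks .git
```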

The logs/ Directory

Reflogs track reference movements, enabling recovery of “lost” commits:


$ cat .git/logs/HEAD
0000000... abc123... User <user@email.com> 1711900000 +0000 commit (initial): Initial commit
abc123... def456... User <user@email.com> 1711900100 +0000 commit: Add feature
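Each reflog line holds the old and new object names, the identity and timestamp of whoever moved the ref, then a tab and a message. A minimal awk sketch (reflog_summary is hypothetical) that prints the abbreviated new hash and message per entry, the raw material git reflog formats for you:

```shell
#!/bin/sh
# reflog_summary (hypothetical helper): summarize a reflog file such as .git/logs/HEAD.
# Line format: <old-sha> <new-sha> <name> <<email>> <unix-time> <tz><TAB><message>
reflog_summary() {
  awk -F'\t' '{
    split($1, f, " ")             # f[1] = old sha, f[2] = new sha
    print substr(f[2], 1, 7), $2  # abbreviated new sha, then the message
  }' "$1"
}

# Usage inside a repository:
#   reflog_summary .git/logs/HEAD
```

Unlike git reflog, which lists the newest entry first, this prints entries in file order, oldest first.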

Other Files

  • description: Repository description (used by GitWeb)
  • packed-refs: Packed references for performance (see Git Object Database and Pack Files)
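  • COMMIT_EDITMSG: Holds the message of the commit in progress (the file your editor opens and the commit-msg hook usually receives); afterwards it still contains the last message written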
  • MERGE_HEAD, REBASE_HEAD: Temporary files during merge/rebase operations

Production Failure Scenarios

| Scenario | Symptoms | Mitigation |
|---|---|---|
| Corrupted HEAD | "fatal: bad HEAD" | Restore from refs/heads/ or run git symbolic-ref HEAD refs/heads/main |
| Missing objects | "fatal: loose object corrupt" | Run git fsck --full, fetch from remote, or restore from backup |
| Broken refs | Branch points to non-existent commit | Check reflog, reset to a valid commit |
| Lock file stuck | "fatal: Unable to create .git/index.lock" | Remove the stale .git/index.lock after verifying no other Git process is running |
| Hook failure | Commit/push is aborted, sometimes with little explanation | Check the hook's exit code; run it manually (e.g. bash .git/hooks/pre-commit) and inspect $? |

Trade-off Analysis

| Aspect | Advantage | Disadvantage |
|---|---|---|
| File-based storage | Human-inspectable, easy to back up | Not optimized for large repos without packing |
| SHA-1 hashing | Fast content addressing | Collision risk (mitigated by the SHA-256 transition) |
| Local hooks | Per-developer customization | Not shared across the team, easy to forget |
| Reflog | Recovery safety net | Consumes disk space over time |

Implementation Snippets


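# Initialize a new repository
git init
ls -la .git/

# Inspect HEAD
cat .git/HEAD

# Count loose and packed objects and report disk usage
git count-objects -vH

# View packed refs (the file appears once refs are packed, e.g. by git gc or git pack-refs)
cat .git/packed-refs

# Inspect the index
git ls-files --stage

# List hooks
ls -la .git/hooks/

# View the reflog (needs at least one commit)
git reflog
cat .git/logs/HEAD

# Create a custom pre-commit hook; a non-zero exit status aborts the commit
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
echo "Running pre-commit checks..." >&2
# The hook exits with npm's status, so a lint failure blocks the commit
npm run lint
EOF
chmod +x .git/hooks/pre-commit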

Observability Checklist

  • Monitor: Repository size growth with git count-objects -vH
  • Log: Hook execution results (hooks should log to stderr)
  • Alert: Reflog size exceeding thresholds (prune with git reflog expire)
  • Verify: Run git fsck --full periodically on critical repositories
  • Track: Number of loose objects vs packed objects
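For the last item, git count-objects -v prints key: value lines, including count (loose objects) and in-pack (packed objects). A minimal sketch (loose_vs_packed is hypothetical) that reads that output and flags when loose objects exceed 6700, Git’s default gc.auto value, around which git gc --auto starts repacking; substitute whatever alert threshold suits your repositories.

```shell
#!/bin/sh
# loose_vs_packed (hypothetical helper): summarize `git count-objects -v` output from stdin.
# The threshold defaults to 6700 (Git's default gc.auto); pass another value as $1.
loose_vs_packed() {
  awk -F': ' -v limit="${1:-6700}" '
    $1 == "count"   { loose = $2 }
    $1 == "in-pack" { packed = $2 }
    END {
      status = (loose + 0 > limit + 0) ? "repack-suggested" : "ok"
      print "loose=" loose + 0, "packed=" packed + 0, "status=" status
    }'
}

# Usage inside a repository:
#   git count-objects -v | loose_vs_packed
```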

Security & Compliance Considerations

  • The .git directory contains all history — never expose it on a web server
  • Hooks run with the same permissions as the user — validate hook sources
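  • Configuration may contain credentials (for example, a token embedded in a remote URL); prefer a credential helper, and never publish or copy .git/config where others can read it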
  • Consider using git config core.hooksPath for shared, versioned hooks
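Pointing core.hooksPath at a tracked directory lets hooks be reviewed and versioned like any other file. A minimal sketch in a scratch repository, assuming Git is installed; the .githooks directory name is a common convention, not a requirement.

```shell
#!/bin/sh
# Create a scratch repository to demonstrate core.hooksPath.
repo=$(mktemp -d)
git init -q "$repo"

# Keep hooks in a tracked directory (".githooks" is an arbitrary choice).
mkdir -p "$repo/.githooks"
cat > "$repo/.githooks/pre-commit" <<'EOF'
#!/bin/sh
# A non-zero exit status aborts the commit.
echo "pre-commit: running checks" >&2
exit 0
EOF
chmod +x "$repo/.githooks/pre-commit"

# Tell Git to look there instead of .git/hooks (a relative path is resolved
# from the directory where hooks run, the work tree root in a normal clone).
git -C "$repo" config core.hooksPath .githooks
git -C "$repo" config --get core.hooksPath
```

Because config itself is not cloned, each teammate still runs the git config line once (or a setup script does it for them).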

Common Pitfalls / Anti-Patterns

  • Editing .git files directly — always use Git commands unless you know exactly what you’re doing
  • Deleting .git to “reset” — you lose all history; use git reset --hard instead
  • Ignoring hook exit codes — a failing hook should abort the operation
  • Not backing up .git: it IS your repository; the working tree can be recreated from it, apart from uncommitted changes
  • Sharing hooks via .git/hooks/ — hooks aren’t versioned; use core.hooksPath or a tool like Husky

Quick Recap Checklist

  • .git contains everything Git needs; the working tree can be rebuilt from it (uncommitted changes aside)
  • HEAD points to current branch or commit
  • objects/ stores all content as SHA-addressed blobs, trees, commits, tags
  • refs/ contains branch and tag pointers
  • index is the staging area (binary format)
  • hooks/ runs scripts at key Git lifecycle events
  • logs/ enables recovery through reflogs
  • Never expose .git on production web servers

.git Directory Tree


graph TD
    ROOT[".git/"] --> HEAD["HEAD"]
    ROOT --> CONFIG["config"]
    ROOT --> OBJECTS["objects/"]
    ROOT --> REFS["refs/"]
    ROOT --> HOOKS["hooks/"]
    ROOT --> INDEX["index"]
    ROOT --> LOGS["logs/"]
    ROOT --> PACKED["packed-refs"]
    ROOT --> DESC["description"]
    ROOT --> EDITMSG["COMMIT_EDITMSG"]

    OBJECTS --> OBJ_INFO["info/"]
    OBJECTS --> OBJ_PACK["pack/"]
    OBJECTS --> OBJ_LOOSE["<first-2-hex>/<remaining-38> (loose objects)"]

    REFS --> REFS_HEADS["heads/ (branches)"]
    REFS --> REFS_TAGS["tags/"]
    REFS --> REFS_REMOTES["remotes/ (tracking)"]

    REFS_HEADS --> RH_MAIN["main"]
    REFS_HEADS --> RH_FEAT["feature/auth"]

    REFS_TAGS --> RT_V1["v1.0.0"]

    REFS_REMOTES --> RR_ORIGIN["origin/"]
    RR_ORIGIN --> RR_MAIN["main"]
    RR_ORIGIN --> RR_DEV["develop"]

Production Failure: Corrupted Repository Recovery

Scenario: Corrupted index file preventing all operations


# Symptoms
$ git status
fatal: .git/index: index file smaller than expected

$ git add .
fatal: Unable to create '/path/to/repo/.git/index.lock': File exists.

# Root cause: Binary index file was truncated or corrupted
# (disk failure, interrupted operation, filesystem corruption)

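# Recovery steps:

# 1. Remove a stale lock file (verify no git process is running first)
rm -f .git/index.lock

# 2. Move the corrupted index aside; commands that read it keep failing otherwise
mv .git/index .git/index.corrupt

# 3. Rebuild the index from HEAD (a mixed reset leaves working tree files untouched)
git reset
# Lower-level alternative: git read-tree HEAD

# 4. Verify the working tree against the rebuilt index
git status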

# If objects are missing:
$ git cat-file -t abc123
fatal: git cat-file: could not get object info

# Recovery:
git fsck --full
# Fetching may bring back missing objects that the remote still has
git fetch origin
# Or restore .git/objects/ from backup

Manual Object Inspection with Plumbing Commands


# Create a blob and capture its object name
sha=$(echo "Hello, Git internals!" | git hash-object -w --stdin)
echo "$sha"
# Output: the blob's 40-character SHA-1

# Verify the object exists
git cat-file -t "$sha"
# Output: blob

# Read the blob content
git cat-file -p "$sha"
# Output: Hello, Git internals!

# Check object size (21 characters plus the trailing newline from echo)
git cat-file -s "$sha"
# Output: 22

# Hash a file without storing it
git hash-object src/main.py
# Output: the SHA-1 the blob would have; without -w nothing is written

# Inspect a tree object
git cat-file -p HEAD^{tree}
# Output:
# 100644 blob abc123... README.md
# 040000 tree def456... src
# 100644 blob 789abc... package.json

# Inspect a commit object
git cat-file -p HEAD
# Output:
# tree 4b825d...
# parent a1b2c3...
# author Name <email> timestamp timezone
# committer Name <email> timestamp timezone
#
# Commit message here

Security: .git Directory Exposure on Web Servers

Exposing .git/ on a production web server is one of the most common and dangerous misconfigurations. Attackers can download your entire source code history, including:

  • All source code — every version of every file ever committed
  • Committed secrets — API keys, passwords, tokens in old commits
  • Developer information — names, emails from commit metadata
  • Infrastructure details — deployment configs, internal URLs

# Check if your site exposes .git
curl -I https://yoursite.com/.git/HEAD
# If you get 200 OK instead of 403/404, you are vulnerable

# Common attack tools that exploit this:
# - git-dumper (downloads entire repo)
# - diggit.py (reconstructs repo from exposed objects)

Mitigation:

  • Nginx: location ~ /\.git { deny all; return 404; }
  • Apache: RedirectMatch 404 /\.git
  • Never deploy .git/ — use build artifacts, not raw repos
  • Scan with: gitleaks detect --log-opts="--all" before deployment

Git Object Types Deep Dive

Git’s object model has four core object types, each serving a distinct purpose in the content-addressable storage system.

Blob

A blob stores file content. It contains only the raw bytes of a file, with no metadata about filename or path. Two files with identical content at different paths share one blob object.


# Blob structure: just content, no metadata
echo "Hello World" | git hash-object -w --stdin
# Creates blob object with no reference to filename
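The object name is not a hash of the raw file: Git hashes a header of the form blob <size>, a NUL byte, and then the content. A minimal sketch (blob_sha1 is hypothetical) that reproduces git hash-object for SHA-1 repositories, assuming sha1sum from GNU coreutils is available (on macOS, shasum plays the same role):

```shell
#!/bin/sh
# blob_sha1 (hypothetical helper): compute a blob's object name without Git.
# Git hashes "blob <size-in-bytes>\0<content>", not the bare content.
blob_sha1() {
  file="$1"
  size=$(wc -c < "$file" | tr -d ' ')   # byte count; tr strips padding some wc versions add
  { printf 'blob %s\000' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

# Cross-check against Git:
#   printf 'test content\n' > t.txt
#   blob_sha1 t.txt
#   git hash-object t.txt    # should print the same hash
```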

Tree

A tree object references blobs (files) and other trees (directories) with associated metadata: file mode, filename, and SHA-1 pointer. Trees represent a directory state at a point in time.


# Inspect tree from commit
git cat-file -p HEAD^{tree}
# 100644 blob a1b2c3... README.md
# 040000 tree d4e5f6... src
# 100644 blob 789abc... package.json

Commit

A commit object points to a tree (root directory), zero or more parent commits, and contains author/committer metadata with timestamps. Commits form the history graph of your project.


# Commit structure breakdown
git cat-file -p HEAD
# tree 4b825d...                    <- root tree SHA
# parent a1b2c3...                  <- previous commit
# author Geek Workbench <email> timestamp timezone  <- author info
# committer Geek Workbench <email> timestamp timezone <- committer info
#                                         <- blank line
# Commit message here               <- message
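The blob, tree, and commit relationships can be replayed with plumbing commands alone. A minimal sketch in a scratch repository, assuming Git is installed; the identity and date are illustrative values set through environment variables only so the example does not depend on your user config.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Fixed identity and dates (illustrative values) so commit-tree needs no user config.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"
export GIT_AUTHOR_DATE="1711900000 +0000" GIT_COMMITTER_DATE="1711900000 +0000"

# 1. Blob: store file content in objects/.
blob=$(printf 'test content\n' | git hash-object -w --stdin)

# 2. Index: stage the blob under a path with mode 100644.
git update-index --add --cacheinfo 100644 "$blob" test.txt

# 3. Tree: snapshot the index as a tree object.
tree=$(git write-tree)

# 4. Commit: point at the tree (no -p, so this is a root commit).
commit=$(echo "Initial commit" | git commit-tree "$tree")

# 5. Ref: move the branch HEAD points to, so HEAD resolves to the new commit.
git update-ref "$(git symbolic-ref HEAD)" "$commit"

git cat-file -p "$commit"
```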

Tag

An annotated tag object points to a commit (or another tag) with additional metadata: tagger info, timestamp, and a message. Lightweight tags are just refs; annotated tags are full objects.


# Annotated tag structure
git cat-file -p refs/tags/v1.0.0
# object 4b825d...                  <- tagged object (a commit here)
# type commit
# tag v1.0.0
# tagger Geek Workbench <email> timestamp timezone
#
# Release version 1.0.0

Object Type Comparison

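| Aspect | Blob | Tree | Commit | Tag (annotated) |
|---|---|---|---|---|
| Stores | File content | Directory entries | Project snapshot | Pointer + metadata |
| References | None | Blob SHAs, tree SHAs | Tree SHA, parent SHAs | Tagged object SHA (usually a commit) |
| Has author/tagger? | No | No | Yes (author and committer) | Yes (tagger) |
| Has message? | No | No | Yes | Yes |
| Hash covers | File content | Directory structure | Tree, parents, and metadata | Target object and tag metadata |
| Typical size | Same as file (before compression) | Proportional to number of entries | Small (typically a few hundred bytes) | Small (typically a few hundred bytes) |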

Interview Questions

1. What is the difference between HEAD and a branch reference in Git?

HEAD is a symbolic reference that points to the current branch (e.g., ref: refs/heads/main). A branch reference is a file in .git/refs/heads/ that points to a specific commit SHA. HEAD moves when you check out different branches; branch refs move when you make new commits.

2. Why does Git use a two-level directory structure for objects (e.g., .git/objects/ab/cdef...)?

Filesystem performance can degrade when a single directory holds very many files. Using the first two hex characters of the hash as a subdirectory name spreads loose objects across at most 256 subdirectories (16² possible prefixes), so each directory holds roughly 1/256 of the loose objects. Packing then keeps the total number of loose files small.

3. How can you recover a commit that was lost after a hard reset?

Use git reflog to find the SHA of the lost commit. The reflog in .git/logs/HEAD records every HEAD movement, including resets. Once you have the SHA, run git checkout <sha> or git branch recovery <sha> to restore it. This works as long as git gc hasn't pruned the unreachable objects.

4. What happens when you run `git add` in terms of the .git directory?

Git creates blob objects in .git/objects/ for each file's content (if not already present), then updates the .git/index binary file to record the blob SHA, file path, and metadata. The working tree files are hashed and compared to existing objects — unchanged files reuse existing blobs, enabling deduplication.

5. Explain the difference between a loose object and a packed object in Git.

Loose objects are stored as individual zlib-compressed files under .git/objects/<first-2>/<remaining-38>. They are quick to write but inefficient when a repository accumulates many of them. Packed objects live in .git/objects/pack/ as pack files (`.pack` plus an `.idx` index) that delta-compress similar objects against each other, which often shrinks storage substantially. git gc (and git repack) moves loose objects into packs.

6. How does the index file enable fast commits?

The index (.git/index) is a binary structure listing every tracked path with its staged blob SHA, file mode, stage number, and cached stat data (timestamps, size). The stat cache lets commands like git status skip re-hashing files whose metadata hasn't changed, and git commit builds the new tree directly from the index rather than rereading the working tree. With the cache-tree extension, unchanged directories can reuse previously computed tree hashes, so the cost of a commit tracks what changed more than the size of the checkout.

7. What is the purpose of the reflog, and how long is data retained?

The reflog records every change to HEAD and branch references in .git/logs/, including timestamp, old SHA, new SHA, and committer identity. By default, reflogs are retained for 90 days for reachable commits and 30 days for unreachable commits (before garbage collection). They provide a safety net for recovering from destructive operations like git reset --hard or accidentally deleted branches.

8. What happens during `git gc` and why is it important?

git gc (garbage collection) performs several maintenance tasks: it repacks loose objects into pack files using delta compression, removes unreachable objects older than the retention period, rebuilds pack indexes, and prunes stale reflog entries. For active repositories, running git gc periodically keeps object storage efficient and prevents accumulation of orphaned objects.

9. Describe how Git's content-addressable storage ensures data integrity.

Every object is stored under the hash of its content (SHA-1, or SHA-256 in newer repositories), so the object name IS its content hash. Recomputing the hash of stored data, as git fsck does, exposes any corruption. Because a commit names its tree and parents by hash, and trees name their entries by hash, altering any historical object would change the hashes of everything that depends on it, which makes tampering detectable without external mechanisms.

10. What is the difference between a symbolic ref and a regular ref in Git?

A regular ref (e.g., refs/heads/main) contains a 40-character SHA-1 hex string pointing directly to a commit. A symbolic ref (like HEAD) contains the text ref: refs/heads/main, naming another ref rather than a raw SHA. Resolving a symbolic ref means following it to the target ref and then reading that ref's SHA; a regular ref resolves in one step. Use git symbolic-ref to read, create, or modify symbolic refs.

11. How does Git store directory trees in the object model?

Git stores directories as tree objects whose entries map names to blob or tree SHAs, along with a file mode (100644 for regular files, 100755 for executables, 040000 for subdirectories). Each commit's tree SHA represents the complete project state. When a file changes, it gets a new blob SHA, so its containing tree gets a new SHA, and so does every ancestor tree up to the root. Subtrees whose contents did not change keep their SHAs and are shared between commits.

12. Why should `.git` directories never be served by a web server?

Exposing .git/ publicly reveals the entire commit history, all source code (including removed files), all secrets ever committed (even if later removed from history), and developer identity information from commit metadata. Tools like git-dumper can reconstruct entire repositories from exposed .git/ directories. Mitigation: configure web server to deny access to /\.git paths or use build artifacts instead of raw repositories for deployment.

13. What is the difference between `git reset` and `git checkout` in terms of .git internals?

git reset moves the current branch ref (the one HEAD points to) to a different commit; HEAD itself keeps pointing at the branch. Depending on the mode it also resets the index (--mixed, the default) or both the index and working tree (--hard), while --soft touches neither. git checkout instead changes what HEAD points to (another branch, or a commit in detached HEAD state) and updates the index and working tree to match that commit's tree, carrying uncommitted changes along when they don't conflict and refusing to switch when they would.

14. How do custom Git hooks work, and what are the security considerations?

Hooks are executable scripts in .git/hooks/ that Git runs at specific points (pre-commit, commit-msg, pre-push, and so on); a non-zero exit from a "pre-" hook aborts the operation. They are not versioned and not transferred by clone. Security concerns: hooks run with the user's permissions and can execute arbitrary code, so a hook copied from an untrusted source should be treated as untrusted code. Review hook sources, and share hooks through core.hooksPath pointing at a reviewed, version-controlled directory, or through tools like Husky that manage hooks from the project repository.

15. What is the difference between `git fsck` and `git gc`?

git fsck (file system check) verifies repository integrity by checking object connectivity, dangling object references, and corruption. It reports issues but does not modify anything. git gc performs maintenance operations: repacking, pruning unreachable objects, and reflog cleanup. Use git fsck --full for comprehensive checks; use git gc to reclaim disk space and optimize storage after bulk operations or long periods of active development.

16. Describe the lifecycle of a file from working tree to committed object.

1) git add creates/updates blob in .git/objects/, updates index with blob SHA and file metadata. 2) git commit creates a tree object from current index state, creates a commit object pointing to that tree and parent commit(s), then updates the branch ref to point to the new commit. 3) The blob objects persist indefinitely (until gc prunes unreachable ones). The working tree file remains unchanged until checked out from a different commit.

17. How does Git handle file renaming in the object model?

Git does not record renames explicitly; it infers them. When you rename a file with git mv (or rename it yourself and stage the result with git add -A), the blob is unchanged if the content is unchanged; only the name in the parent tree's entry changes. Commands such as git status, git diff -M, and git log --follow then detect the rename by matching identical blob SHAs or, failing that, by content similarity (50% by default).

18. What is the `packed-refs` file and when does Git use it?

packed-refs is a single file (not a directory) that stores many refs together for performance. When a repository has many refs, keeping each as an individual file under refs/ becomes slow. git pack-refs, which git gc runs, moves refs into packed-refs; a ref that is later updated gets a fresh loose file under refs/, which takes precedence over its packed entry. The file starts with a "# pack-refs with:" header line, followed by one <SHA> <ref name> line per ref; a line beginning with ^ directly after an annotated tag records the object that tag peels to.

19. Explain how the staging area (index) enables partial commits.

The index records, via blob SHA, exactly which version of each file is staged. You can git add some files (or parts of a file with git add -p), keep editing, and git add again; each git add updates the index entry for that path. git commit builds the new tree from the index, so every path gets the version you staged, not whatever is currently in the working tree. This is what lets you commit a subset of your changes while leaving the rest unstaged for later commits.

20. What is `REBASE_HEAD` and when does Git create it?

REBASE_HEAD is a temporary file in .git/ created during an interactive or non-interactive rebase. It records the SHA of the commit currently being rebased (the "original" commit before replaying). If a rebase conflicts, REBASE_HEAD indicates which commit you were processing. Similar temporary files include MERGE_HEAD (during merges), CHERRY_PICK_HEAD (during cherry-picks), and BISECT_LOG (during bisect operations).


Conclusion

The .git directory is the engine room of version control. Understanding its layout (objects, refs, HEAD, config) demystifies how Git actually works. Peel back the abstraction and Git’s behavior becomes far less surprising.
