Git LFS for Large Files: Binary Asset Management at Scale

Master Git Large File Storage for managing binaries, media, and datasets in Git repositories. Learn pointer files, migration strategies, and production patterns for large file workflows.

Reading time: 12 min | Updated: March 31, 2026

Introduction

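Git was designed for source code: small text files that change incrementally. When you try to version large binary files (images, videos, datasets, compiled binaries) with plain Git, things fall apart. Every clone downloads every version of every file, and already-compressed formats such as PNG or MP4 barely delta-compress, so each revision adds close to its full size to the repository. Repository sizes balloon, clone times stretch to minutes, and every operation becomes sluggish. The problem isn’t Git’s fault; it’s using the wrong tool for the job.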

Git Large File Storage (Git LFS) solves this by replacing large files with lightweight pointer files in your repository while storing the actual file contents on a remote server. Your Git history stays lean, clones stay fast, and large files are downloaded on demand: by default, only the versions needed for the commit you check out. It’s the standard solution for game assets, machine learning datasets, design files, and any repository that needs to version binaries.

This post covers Git LFS architecture, setup workflows, migration strategies for existing repositories, and production patterns for managing large files at scale. If your repository contains files larger than a few megabytes, this is essential reading.

When to Use / When Not to Use

Use Git LFS when:

  • Your repository contains files larger than 10MB
  • You version binary assets (images, videos, audio, 3D models)
  • You manage datasets or compiled artifacts
  • Your clone/push times are dominated by large files
  • You need to track specific file types across the repository

Avoid Git LFS when:

  • All files are small text files (< 1MB)
  • You can use external storage (S3, CDN) instead
  • Your Git hosting doesn’t support LFS
  • You need to search or diff file contents (LFS files are opaque blobs)
  • You’re storing files that change frequently (LFS bandwidth costs add up)

Core Concepts

Git LFS works by replacing large files with pointer files:


flowchart TD
    A[Large File<br/>image.png 50MB] --> B[Git LFS clean filter]
    B --> C[Pointer File<br/>~130 bytes]
    C --> D[Git Repository<br/>Lightweight]
    B --> E[LFS Server<br/>Actual file stored on push]
    F[git clone] --> G[Download pointers]
    G --> H[Checkout runs<br/>smudge filter]
    H --> I[Download actual files<br/>from LFS server]
    I --> J[Working directory<br/>with real files]
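The pointer in the diagram is a tiny text file defined by the Git LFS pointer spec: a version line, the content's SHA-256 as the oid, and the size in bytes. To make the format concrete, here is a minimal sketch that builds a pointer by hand. lfs_pointer and sha256_hex are hypothetical helpers; in a real repository the clean filter (or git lfs pointer --file=<path>) produces this for you. The sketch assumes sha256sum (GNU coreutils) or shasum (macOS) is installed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: print the Git LFS pointer for a file (illustration only).

sha256_hex() {
  # Hash stdin so unusual file names cannot change the output format
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum < "$1" | cut -d' ' -f1
  else
    shasum -a 256 < "$1" | cut -d' ' -f1   # macOS fallback
  fi
}

lfs_pointer() {
  local path=$1 oid size
  oid=$(sha256_hex "$path")             # content hash doubles as the object ID
  size=$(wc -c < "$path" | tr -d ' ')   # byte count (BSD wc pads with spaces)
  printf 'version https://git-lfs.github.com/spec/v1\n'
  printf 'oid sha256:%s\n' "$oid"
  printf 'size %s\n' "$size"
}
```

Running lfs_pointer image.png prints three lines totaling roughly 130 bytes, whether image.png is 50KB or 50MB; only the size digits grow.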

Architecture and Flow Diagram


sequenceDiagram
    participant Dev as Developer
    participant Git as Git CLI
    participant LFS as Git LFS
    participant Cache as Local LFS Cache
    participant LFSRemote as LFS Server
    participant GitRemote as Git Remote

    Dev->>Git: git add large-file.bin
    Git->>LFS: Clean filter runs on tracked file
    LFS->>LFS: Compute SHA-256 OID
    LFS->>Cache: Store content in .git/lfs/objects
    LFS->>Git: Return pointer file (staged instead of content)
    Dev->>Git: git push
    Git->>LFS: pre-push hook
    LFS->>LFSRemote: Upload objects referenced by pushed commits
    Git->>GitRemote: Push commits containing pointers (small)
    Dev->>Git: git clone
    Git->>GitRemote: Download commits and pointers
    Git->>LFS: Checkout runs smudge filter
    LFS->>LFSRemote: Download actual files
    LFSRemote-->>LFS: Large file content
    LFS->>Dev: Working directory with real files
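When the clean filter runs, Git LFS keeps the real content in a local cache before anything is uploaded. Those cached objects live under .git/lfs/objects/, sharded by the first two pairs of hex digits of the OID. The sketch below maps a pointer read from stdin to that path; lfs_cache_path is a hypothetical helper, and it assumes the default .git directory layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a pointer (on stdin) to its local cache path,
# e.g. oid sha256:4d7a21... -> .git/lfs/objects/4d/7a/4d7a21...

lfs_cache_path() {
  local git_dir=${1:-.git} oid
  # Extract the 64-hex-digit digest from the "oid sha256:<hex>" line
  oid=$(sed -n 's/^oid sha256:\([0-9a-f]\{64\}\)$/\1/p')
  if [ -z "$oid" ]; then
    echo "not a Git LFS pointer" >&2
    return 1
  fi
  # Objects are sharded as <first 2 hex>/<next 2 hex>/<full oid>
  printf '%s/lfs/objects/%s/%s/%s\n' "$git_dir" "${oid:0:2}" "${oid:2:2}" "$oid"
}
```

For example, git show HEAD:path/to/file | lfs_cache_path prints where the content for that tracked file should be cached locally.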

Step-by-Step Guide

1. Install Git LFS


# macOS
brew install git-lfs

# Ubuntu/Debian
sudo apt install git-lfs

# Windows
# Git for Windows already bundles Git LFS; otherwise download from git-lfs.github.com

# Initialize (run once per user)
git lfs install

2. Track File Types


# Track specific file types
git lfs track "*.psd"
git lfs track "*.png"
git lfs track "*.mp4"
git lfs track "datasets/*.csv"

# Track specific large files
git lfs track "models/model-v1.bin"

# View tracked patterns
git lfs track

# This creates/updates .gitattributes
cat .gitattributes
# *.psd filter=lfs diff=lfs merge=lfs -text
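git lfs track only edits .gitattributes; Git then decides per path by matching those patterns. In a job where Git LFS isn't installed (a lint step, say), you can still list the patterns routed to LFS by reading the file. Here is a minimal sketch with a hypothetical lfs_patterns helper. It skips comments but ignores macro attributes and quoted patterns, so treat it as an approximation; git check-attr filter -- <path> is authoritative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print .gitattributes patterns that set filter=lfs.

lfs_patterns() {
  local attrs=${1:-.gitattributes}
  # Skip blank and comment lines; print the pattern (field 1) when any
  # attribute field on the line is exactly "filter=lfs"
  awk 'NF > 0 && $1 !~ /^#/ {
         for (i = 2; i <= NF; i++) if ($i == "filter=lfs") { print $1; break }
       }' "$attrs"
}
```

Run from the repository root, lfs_patterns lists one pattern per line, e.g. *.psd for the file shown above.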

3. Commit and Push


# Commit the .gitattributes file (important!)
git add .gitattributes
git commit -m "chore: configure Git LFS tracking"

# Add large files normally (this path matches a pattern tracked above)
git add models/model-v1.bin
git commit -m "feat: add large asset"

# Push (the pre-push hook uploads LFS objects before the commits are sent)
git push origin main
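After git add, the index should hold the pointer rather than the file's bytes. One way to confirm that the clean filter actually ran is to read the staged blob and check its first line. staged_as_lfs_pointer is a hypothetical helper; with Git LFS installed, git lfs ls-files gives a similar answer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the staged (index) version of a path is an
# LFS pointer, i.e. the clean filter ran when the file was added.

staged_as_lfs_pointer() {
  local path=$1 first_line
  # ":<path>" names the blob in the index; only its first line is needed
  first_line=$(git cat-file -p ":$path" 2>/dev/null | head -n 1)
  [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}
```

For example, staged_as_lfs_pointer models/model-v1.bin || echo "staged as a raw blob; was git lfs install run?" catches the mistake before the commit is pushed.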

4. Clone LFS Repository


# Standard clone (downloads LFS files)
git clone https://github.com/user/repo.git

# Clone without downloading LFS content (checkout leaves pointer files)
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/user/repo.git
cd repo

# Pull LFS files later
git lfs pull

# Pull specific paths
git lfs pull --include="images/*"

5. Migration: Convert Existing Repository


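# git lfs migrate ships with Git LFS itself; no separate install is needed
git lfs version

# Back up first: migrate rewrites history in place
git clone --mirror . ../repo-backup.git

# Preview which file types take up the most space
git lfs migrate info --everything

# Migrate specific file types across all refs (rewrites history)
git lfs migrate import --include="*.psd,*.png,*.mp4" --everything

# Print each migrated commit and file while rewriting
git lfs migrate import --include="*.bin" --everything --verbose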

# Push migrated history
git push --force --all
git push --force --tags

⚠️ Warning: History rewriting requires force push and coordination with all collaborators.
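Before rewriting history, it helps to know which file types actually take up space. git lfs migrate info reports this when Git LFS is installed; the sketch below does a similar survey with plain Git plumbing (git rev-list --objects and git cat-file --batch-check), totaling blob sizes per extension across all refs. The function name is hypothetical, and because it counts each unique blob once at its uncompressed size, its totals approximate rather than match what migrate info prints.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total uncompressed bytes per file extension for every
# blob reachable from any ref, largest first. Run inside the repository.

blob_bytes_by_extension() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" {
           path = $0
           sub(/^[^ ]+ [^ ]+ /, "", path)       # drop type and size, keep full path
           name = path; sub(/^.*\//, "", name)   # basename
           ext = "(no extension)"
           if (name ~ /\./) { ext = name; sub(/^.*\./, "*.", ext) }
           total[ext] += $2
         }
         END { for (e in total) printf "%.0f\t%s\n", total[e], e }' |
    sort -rn
}
```

The top few lines of the output are natural candidates for the --include list passed to git lfs migrate import.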

6. LFS Management Commands


# List LFS files in repository
git lfs ls-files

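# Show full 64-character object IDs
git lfs ls-files --long

# Check LFS status
git lfs status

# Delete old LFS objects from the local cache (server copies are untouched)
git lfs prune

# Download LFS objects for all refs, not just the current checkout
git lfs fetch --all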

# Verify LFS integrity
git lfs fsck
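Among other things, git lfs fsck checks that LFS objects still hash to the OIDs they are stored under. The sketch below reproduces only that one check across the whole local cache, which makes the "LFS file corruption" scenario easier to reason about. verify_lfs_cache is hypothetical and assumes the default .git/lfs/objects layout plus sha256sum or shasum.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-hash every object in the local LFS cache and report
# files whose SHA-256 no longer matches the OID used as their file name.

verify_lfs_cache() {
  local objects=${1:-.git/lfs/objects} bad=0 f actual
  while IFS= read -r f; do
    if command -v sha256sum >/dev/null 2>&1; then
      actual=$(sha256sum < "$f" | cut -d' ' -f1)
    else
      actual=$(shasum -a 256 < "$f" | cut -d' ' -f1)   # macOS fallback
    fi
    if [ "$actual" != "$(basename "$f")" ]; then
      echo "corrupt: $f"
      bad=1
    fi
  done < <(find "$objects" -type f)   # process substitution keeps $bad in this shell
  return "$bad"
}
```

A non-zero exit means at least one cached object should be deleted and re-downloaded with git lfs fetch.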

Production Failure Scenarios + Mitigations

| Scenario | Impact | Mitigation |
| --- | --- | --- |
| LFS server unavailable | Can’t check out files | Keep a local LFS cache; pre-fetch with git lfs fetch --all |
| Bandwidth limits exceeded | Push/clone fails | Monitor usage; compress files; use external storage for archives |
| Large file committed without LFS | Large blob in Git history | Catch it with a pre-commit hook; fix with git lfs migrate import |
| LFS file corruption | Checkout fails | Run git lfs fsck; re-fetch from server |
| Migration breaks collaboration | Team confusion | Communicate migration; provide re-clone instructions |
| Storage limits reached | Can’t push new LFS files | Remove unneeded objects on the server (git lfs prune only frees local disk); upgrade storage plan; archive unused files |

Trade-offs

| Aspect | Git LFS | External Storage |
| --- | --- | --- |
| Integration | Seamless with Git | Manual sync required |
| Versioning | Full Git history | Separate versioning |
| Cost | LFS storage fees | S3/CDN costs |
| Clone speed | Fast (on-demand download) | Manual download |
| Collaboration | Native Git workflow | Extra steps |
| Backup | Git + LFS server | Separate backup strategy |

Implementation Snippets

Pre-commit hook to catch missing LFS tracking:


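#!/bin/bash
# .husky/pre-commit
# Reject commits that stage large files as regular Git blobs instead of LFS pointers.
limit=1048576  # 1 MiB

large_files=$(git diff --cached --name-only -z --diff-filter=ACMR | while IFS= read -r -d '' file; do
  # Size of the *staged* blob: a file that went through the LFS clean filter is
  # staged as a ~130-byte pointer, so anything over the limit bypassed LFS
  # (pattern not tracked, or git lfs install never run on this machine)
  size=$(git cat-file -s ":$file" 2>/dev/null || echo 0)
  if [ "$size" -gt "$limit" ]; then
    echo "$file"
  fi
done)

if [ -n "$large_files" ]; then
  echo "Error: Large files staged without Git LFS:"
  echo "$large_files"
  echo "Run: git lfs track '<pattern>' (and git lfs install), then re-stage the files"
  exit 1
fi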

LFS configuration for specific paths:


# .gitattributes
# Images
*.png filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text

# Design files
*.psd filter=lfs diff=lfs merge=lfs -text
*.ai filter=lfs diff=lfs merge=lfs -text
*.sketch filter=lfs diff=lfs merge=lfs -text

# Media
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.mp3 filter=lfs diff=lfs merge=lfs -text

# Datasets
*.csv filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text

# Binaries
*.bin filter=lfs diff=lfs merge=lfs -text
*.exe filter=lfs diff=lfs merge=lfs -text
*.dll filter=lfs diff=lfs merge=lfs -text

CI configuration with LFS:

# GitHub Actions
- uses: actions/checkout@v4
  with:
    lfs: true # Automatically pulls LFS files

# Or manual LFS pull
- uses: actions/checkout@v4
- run: git lfs pull
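If lfs: true is forgotten or the pull fails quietly, the job builds against pointer text instead of real assets. A cheap guard is a step that fails when any LFS-tracked path in the working tree still starts with the pointer header. A sketch, assuming Git LFS is installed on the runner so paths can come from git lfs ls-files --name-only; fail_on_unsmudged_pointers is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical CI guard: read paths on stdin and fail if any working-tree file
# is still an un-smudged LFS pointer instead of the real content.
# Typical use:  git lfs ls-files --name-only | fail_on_unsmudged_pointers

fail_on_unsmudged_pointers() {
  local path status=0
  while IFS= read -r path; do
    [ -f "$path" ] || continue   # skip paths absent from this checkout
    # The pointer header line is exactly 42 bytes; compare only that prefix
    if head -c 42 "$path" | grep -Fqx 'version https://git-lfs.github.com/spec/v1'; then
      echo "LFS pointer not downloaded: $path" >&2
      status=1
    fi
  done
  return "$status"
}
```

Reading only the first 42 bytes keeps the check fast even when the real files are gigabytes.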

Observability Checklist

  • Logs: Log LFS upload/download operations and failures (running Git commands with GIT_TRACE=1 prints verbose Git LFS output)
  • Metrics: Track LFS storage usage, bandwidth, and file counts
  • Alerts: Alert on storage limits, bandwidth thresholds, and fetch failures
  • Dashboards: Monitor LFS adoption and repository size trends
  • Traces: Trace LFS file lifecycle from add to checkout

Security/Compliance Notes

  • LFS files are stored on remote servers; verify encryption at rest
  • Access controls apply to LFS objects; configure repository permissions
  • For regulated data, ensure LFS provider meets compliance requirements
  • LFS pointer files don’t contain file contents, so they are safe to commit publicly, though they do reveal each file’s SHA-256 hash and size
  • Audit LFS access logs for unauthorized downloads
  • Consider encrypting sensitive LFS files before committing

Common Pitfalls / Anti-Patterns

| Anti-Pattern | Why It’s Bad | Fix |
| --- | --- | --- |
| Forgetting to commit .gitattributes | LFS not configured for the team | Always commit .gitattributes with tracking changes |
| Tracking too many file types | Unnecessary LFS overhead | Track only patterns that match large binaries (typically > 1MB) |
| Not pruning old LFS objects | Wasted local disk space | Run git lfs prune regularly |
| LFS files in submodules | Complex LFS handling | Keep LFS files in the main repository |
| Migrating without team coordination | Broken clones for collaborators | Communicate migration; provide re-clone steps |
| Using LFS for frequently changing files | High bandwidth costs | Use external storage for volatile large files |

Quick Recap Checklist

  • Install Git LFS and run git lfs install
  • Configure file type tracking in .gitattributes
  • Commit .gitattributes before adding large files
  • Set up pre-commit hooks to catch missing LFS tracking
  • Configure CI to pull LFS files automatically
  • Monitor LFS storage usage and bandwidth
  • Set up regular pruning schedule
  • Document LFS workflow for team members

Interview Q&A

How does Git LFS replace large files with pointer files?

When you add a tracked file, Git runs the Git LFS clean filter, which hashes the content with SHA-256, stores it in the local LFS cache (.git/lfs/objects), and hands Git a small pointer file (spec version, OID, and size) to commit instead. When you push, the pre-push hook uploads the content to the LFS server. On checkout, the smudge filter replaces the pointer with the actual file, downloading it from the LFS server if it is not already cached.

What's the difference between Git LFS and Git submodules for large files?

Git LFS replaces large files with pointers while keeping them in the same repository. Submodules reference external repositories. LFS is better for large files within a project; submodules are better for separate projects with independent lifecycles. LFS provides seamless integration; submodules require explicit initialization and updating.

How do you migrate an existing repository to use Git LFS?

Use git lfs migrate import --include="*.ext" --everything to rewrite history and replace existing large files with LFS pointers. This requires a force push and coordination with all collaborators, who must re-clone the repository. Always back up the repository before migrating, and test on a copy first.

What happens when LFS storage limits are reached?

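You can't push new LFS files until you free up space or upgrade your plan. Existing LFS files remain accessible. Note that git lfs prune only deletes objects from your local cache; it does not reduce server-side usage. Reclaiming server storage means removing the objects from history and then following your provider's cleanup process (some hosts require deleting the repository), or moving archival files to external storage. Monitor usage proactively to avoid surprises.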

Can you use Git LFS with any Git hosting provider?

Most major providers support LFS: GitHub, GitLab, Bitbucket, Azure DevOps. However, storage limits, bandwidth policies, and pricing vary significantly. Some providers include LFS in their plans; others charge extra. Always verify LFS support and limits before committing to a hosting provider for large-file repositories.

Extended Production Failure Scenarios

LFS Quota Exceeded

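A team’s Git LFS storage quota (e.g., a hosting provider’s free-tier allowance) is exceeded during a large asset push, and the LFS upload fails with an error such as batch response: Storage quota exceeded. Normally the pre-push hook then aborts the whole push. But if the hook is bypassed (git push --no-verify, a missing hook, or tooling that pushes refs directly), the Git commits still land because pointer files are small, while the actual LFS objects are rejected. The repository now contains pointer files that reference objects that don’t exist on the LFS server. Anyone who clones gets broken checkouts: the smudge step fails with a missing-object error, or, if smudging is skipped, files are left as ~130-byte pointer text instead of actual content.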

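Mitigation: Monitor LFS storage usage proactively and set alerts at 80% of quota. Before pushing large batches, run git lfs push --dry-run origin main to list the objects that would be uploaded, and compare their size with the remaining quota shown in your provider’s storage settings (git lfs status only reports which files are staged as LFS objects; it knows nothing about quota). If quota is exceeded, either upgrade the plan or remove unnecessary LFS objects from the push and store them externally.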

Pointer Files Without Actual Content

A developer adds a new file type to .gitattributes but forgets to run git lfs install on their machine. Large files are committed as regular Git blobs instead of LFS pointers. The repository bloats, and other developers who have LFS installed see the files as regular Git objects — they can’t use git lfs pull to fetch them because they were never stored on the LFS server.

Mitigation: Use a pre-commit hook that checks for large files not tracked by LFS. Run git lfs migrate import to fix any files that slipped through. Add CI validation that rejects pushes containing large non-LFS blobs.
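The CI validation suggested above can be built from plain Git plumbing: list every blob introduced in the pushed range and fail on any above a size limit, since a correctly tracked file is committed as a pointer of roughly 130 bytes. A sketch with a hypothetical reject_large_blobs helper; the range (for example origin/main..HEAD) and the 1 MiB default are assumptions to adapt.

```bash
#!/usr/bin/env bash
# Hypothetical CI gate: fail if the commit range adds any blob larger than the
# limit. LFS-tracked files are committed as ~130-byte pointers, so a large
# blob means the file bypassed the LFS clean filter.

reject_large_blobs() {
  local range=$1 limit=${2:-1048576}   # default limit: 1 MiB
  local offenders
  # rev-list A..B lists only objects reachable from B but not from A
  offenders=$(git rev-list --objects "$range" |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v limit="$limit" '$1 == "blob" && $2 + 0 > limit + 0 {
      path = $0; sub(/^[^ ]+ [^ ]+ /, "", path); print $2 "\t" path
    }')
  if [ -n "$offenders" ]; then
    echo "Blobs over $limit bytes (should be in Git LFS):" >&2
    printf '%s\n' "$offenders" >&2
    return 1
  fi
}
```

In a pull-request job, reject_large_blobs origin/main..HEAD fails the build and lists each offending path with its size.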

Extended Trade-offs

| Aspect | Git LFS | Git Annex | External Storage (S3) |
| --- | --- | --- | --- |
| Cost | Provider-dependent (often paid after free tier) | Free, self-hosted | Pay-per-use, can be cheaper at scale |
| Accessibility | Native Git workflow | Complex: separate toolchain | Manual: separate download step |
| Versioning | Full Git history of pointers | Full history tracking | Separate versioning system needed |
| Setup | Simple: git lfs track | Complex: init, configure remotes | Moderate: SDK or CLI integration |
| Collaboration | Seamless: works with any LFS-enabled Git host | Requires all users to install git-annex | Requires separate access management |
| Best for | Game assets, design files, ML datasets | Academic archives, personal backups | Static assets, distribution files |

Security and Compliance: LFS Object Access Control

  • Access control: LFS objects inherit repository permissions. If a user can clone the repo, they can download all LFS objects. For sensitive binaries, use private repositories with strict access controls.
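  • Bandwidth costs: LFS bandwidth is often billed separately from storage. Large teams cloning frequently can incur significant costs. Use GIT_LFS_SKIP_SMUDGE=1 git clone to skip LFS downloads at clone time, then git lfs pull --include=<paths> to fetch only what you need. (git clone --filter=blob:none is Git’s partial clone; it does not stop LFS content from downloading at checkout.)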
  • Regional storage: Some LFS providers store objects in specific regions. For compliance (GDPR, data residency), verify where LFS objects are stored using your provider’s data-residency documentation. Self-managed GitLab, for example, can keep LFS objects in an object-storage bucket you configure, so you choose the region.
  • Encryption: LFS objects are encrypted in transit (HTTPS) but may not be encrypted at rest depending on the provider. For sensitive binaries, encrypt files before adding them to LFS and manage decryption keys separately.
  • Audit logging: Track LFS download events for compliance. Check whether your provider records LFS access separately from Git access, and review the logs periodically for unusual download patterns.
