File Allocation Methods

Understanding how file systems allocate disk space through contiguous, linked, and indexed allocation strategies, and how modern file systems like FAT, ext2/3/4, and NTFS implement these methods.

published: May 19, 2026 reading time: 40 min read author: GeekWorkBench

Quick Summary

Understanding how file systems allocate disk space through contiguous, linked, and indexed allocation strategies, and how modern file systems like FAT, ext2/3/4, and NTFS implement these methods.

File Allocation Methods

When you save a 2GB video file, something has to decide where all those bytes go on disk. The file allocation method is the strategy a file system uses to assign disk space to files. Choose the wrong allocation method and your hard drive becomes a fragmented mess. Use the right one, and your system hums along efficiently for years.

This isn’t just theory—different allocation methods directly explain why:

FAT USB drives get slow after copying many files
Linux file systems survive crashes without data loss
Database systems carefully manage their own files rather than relying on the OS

Understanding allocation methods gives you insight into every storage decision you’ll ever make.

Introduction

File allocation methods are the strategies file systems use to assign physical disk blocks to files. When you save any file—documents, videos, applications—your operating system must decide which disk sectors will store each byte. The allocation method determines how efficiently your disk space is used and how quickly files can be read and written.

Three fundamental approaches exist: contiguous allocation (files stored in consecutive blocks), linked allocation (blocks scattered with pointers connecting them), and indexed allocation (a dedicated index block tracks all file blocks). Every file system—from FAT to NTFS to ext4—implements variations of these approaches, each with distinct performance and reliability trade-offs.

This article explores all three allocation methods with practical diagrams, real-world failure scenarios, and production-ready code examples. You’ll understand why FAT drives slow down with heavy use, how ext4 handles crashes without data loss, and why databases often bypass the OS file system entirely. By the end, you’ll be equipped to choose the right file system for any workload and diagnose allocation-related performance problems.

When to Use / When Not to Use

Allocation method knowledge helps with system design and troubleshooting.

When allocation method understanding helps:

Formatting drives for specific workloads (large files vs many small files)
Understanding why fragmentation occurs and how to prevent it
Designing applications that interact heavily with storage
Troubleshooting performance issues in file-heavy workloads
Choosing the right file system for a given use case

When you can rely on defaults:

General desktop and server use (ext4/XFS/NTFS handle this well)
Cloud storage where abstracted block devices are provided
Containerized workloads with overlay file systems

Architecture or Flow Diagram

graph TD
    A[File Creation Request] --> B{Allocation Strategy}

    B --> C[Contiguous Allocation]
    B --> D[Linked Allocation]
    B --> E[Indexed Allocation]

    C --> F[Find large enough hole]
    D --> G[Track free blocks with linked list]
    E --> H[Build index block for file]

    F --> I[Allocate consecutive blocks]
    G --> J[Allocate from anywhere, link blocks]
    H --> K[Pointers in index block]

    style C stroke:#ff00ff,stroke-width:2px
    style D stroke:#00fff9,stroke-width:2px
    style E stroke:#ff00ff,stroke-width:2px

Core Concepts

Contiguous Allocation

The simplest approach: allocate consecutive blocks for each file.

File "video.mp4" - 5 blocks:
[Block 100][Block 101][Block 102][Block 103][Block 104]
                   ▲ Start block
                   └────── 5 consecutive blocks ──────┘

Directory Entry:
  Name: video.mp4
  Start: 100
  Length: 5

Advantages:

Excellent sequential read performance (no seeks between blocks)
Simple to implement
Small metadata overhead (only start block + length needed)

Disadvantages:

External fragmentation (must find large enough continuous hole)
Compaction required to reclaim space
File growth problematic (adjacent blocks may be taken)

// Simulated contiguous allocation
typedef struct {
    char filename[255];
    int start_block;
    int length_blocks;
} FileEntry;

int allocate_contiguous(int file_id, int blocks_needed) {
    int start = find_contiguous_hole(blocks_needed);
    if (start == -1) {
        // No hole large enough
        return -1;
    }
    allocate(start, blocks_needed);
    return start;
}

Linked Allocation

Each block contains a pointer to the next block:

File "document.txt" - 4 blocks:

[Block 50][data][Block 89] -> [Block 12][data][Block 200] ->
[Block 200][data][NULL]    -> [Block 75][data][NULL]

Directory Entry:
  Name: document.txt
  Start: 50
  End: 75 (cached or NULL if unknown)

Advantages:

No external fragmentation (use any free block)
File can grow easily (just add more blocks)
Simple directory entry (only start block needed)

Disadvantages:

Poor random access (must follow chain from start)
Pointer overhead (4-8 bytes per block)
Pointer integrity critical (corrupted pointer = data loss)
No concept of file size without scanning

// Linked allocation block structure
typedef struct {
    char data[BLOCK_SIZE - sizeof(int)];
    int next_block;  // Pointer to next block, -1 = end of file
} LinkedBlock;

Indexed Allocation

Store all block pointers in an index block:

File "database.db" - 7 blocks:

Index Block (Block 500):
┌─────────────────────────────────┐
│ Pointer: 120                    │
│ Pointer: 45                     │
│ Pointer: 389                    │
│ Pointer: 12                     │
│ Pointer: 201                    │
│ Pointer: 88                     │
│ Pointer: 156                    │
│ Pointer: -1 (end of index)      │
└─────────────────────────────────┘

         │
         ├──> [Block 120][data]
         ├──> [Block 45][data]
         ├──> [Block 389][data]
         ├──> [Block 12][data]
         ├──> [Block 201][data]
         ├──> [Block 88][data]
         └──> [Block 156][data]

Directory Entry:
  Name: database.db
  Index Block: 500

Advantages:

Direct random access to any block
Good sequential performance
Blocks can be anywhere on disk
File metadata self-contained in index block

Disadvantages:

Index block overhead (1 block per file regardless of size)
Limit on file size based on index block entries
Extra I/O to read index block first

Index Block Extension (ext2/3/4 approach): For large files, use multiple index blocks in a tree structure:

Single indirect block (pointers to data blocks)
Double indirect block (pointers to single indirect blocks)
Triple indirect block (pointers to double indirect blocks)

FAT: File Allocation Table

FAT uses a variation of linked allocation with a central table:

graph LR
    A[FAT Table] --> B[Cluster 0]
    A --> C[Cluster 1]
    A --> D[Cluster 2]
    B --> E[End of file]
    C --> D
    D --> E

    style A stroke:#ff00ff,stroke-width:3px

The FAT table stores one entry per cluster:

0x0000: Free cluster
0xFFFF: End of file
Other values: Next cluster in chain

FAT-16 Entry Example:
┌────────┬────────┬────────┬────────┐
│ Cluster│ Value  │ Meaning│ Next   │
├────────┼────────┼────────┼────────┤
│   0    │ 0xFFF8 │ Media  │ -      │
│   1    │ 0xFFFF │ EOF    │ -      │
│   2    │   5    │ Next   │ Cluster 5
│   3    │   6    │ Next   │ Cluster 6
│   4    │   0    │ Free   │ -      │
│   5    │ 0xFFFF │ EOF    │ -      │
│   6    │ 0xFFFF │ EOF    │ -      │
└────────┴────────┴────────┴────────┘

File "example.txt" starting at cluster 2:
[Cluster 2] -> [Cluster 5] -> EOF

FAT Fragmentation Physics: On FAT, fragmentation isn’t just about free space scatter—it’s about how the FAT chain grows. When a file grows, new clusters are allocated wherever free space exists. Over time, a file’s cluster chain becomes a scattered sequence across the drive.

The firmware’s read-ahead cache compounds this: it caches sequential clusters during reads, but random access patterns (like video seeking) trigger cache misses and constant head movement. A fragmented FAT USB drive performing 4MB sequential writes develops cluster chains where each chunk spans 20+ non-contiguous clusters. Reading back the video requires 20+ seeks per chunk, dropping throughput from 20MB/s to 2MB/s.

The FAT table itself stays in memory—small for 4GB drives, but grows with volume size. FAT12 (floppy era) had tiny tables; FAT16 consumed meaningful RAM on early systems; FAT32 balances table size against cluster granularity for modern removable media.

ext2/3/4: Block Bitmap and Extents

Linux’s ext file systems use bitmaps and extent trees:

Block Bitmap: One bit per block indicates free/allocated. Fast for finding free space.

Extent-Based Allocation: Instead of listing every block, ext4 uses extents—contiguous ranges:

Extent format in ext4:
Start block: 1024
Length: 50 blocks  (means blocks 1024-1073 are contiguous)

File stored in extents:
  Extent 1: start=1024, len=50   (blocks 1024-1073)
  Extent 2: start=2048, len=25    (blocks 2048-2072)
  Extent 3: start=4096, len=10    (blocks 4096-4105)

This reduces metadata overhead for large, contiguous files and minimizes fragmentation impact on performance.

Extent Tree Structure in ext4: The inode stores an extent tree rather than simple block pointers. The extent header contains:

eh_entries: Number of valid extent entries
eh_max: Maximum entries in this node
eh_depth: Tree depth (0 = all extents in inode, 1+ = external nodes)

Extent entries store contiguous ranges: {start block, length, start + length - 1}. A file with blocks 1024-1073 and 2048-2099 needs only two extent entries rather than 104 individual block pointers. This dramatically reduces inode size and speeds up metadata reads.

Delayed Allocation (delalloc): ext4 batches write allocations and delays physical block assignment until writeback time. Writes accumulate in the page cache, and the allocator batches them together—reducing fragmentation from scattered small allocations.

Caveat: If power fails before delayed allocations are written, data is lost. The data=ordered mode in ext4 ensures file data hits the journal before metadata, reducing this risk at a slight performance cost.

NTFS: MFT and Attribute Records

NTFS uses the Master File Table (MFT) where each file has one or more records:

MFT Record for large file:

$DATA attribute (non-resident):
  Runlist: cluster 1000-1023, then 2048-2063, then 4096-4105

$FILE_NAME attribute:
  Name: large_video.avi
  Size: 2GB

NTFS can store data directly in the MFT for small files (under 900 bytes), or use runlists pointing to clusters scattered across the volume.

MFT Structure Details: The MFT is itself a file, stored at the beginning of the volume in a reserved zone. Each record contains attribute/value pairs:

$STANDARD_INFORMATION: Timestamps, permissions, flags
$FILE_NAME: File name (multiple possible for hard links)
$DATA: File content (resident within MFT or non-resident with runlist)
$BITMAP: For sparse files, tracks which clusters are allocated

Small files store content entirely within the MFT record—fastest possible access. Larger files store runlists pointing to data clusters scattered across the volume. The MFT can grow dynamically and maintains reserved zones for critical metadata.

NTFS Fragmentation Handling: Unlike ext4’s extent tree, NTFS uses runlists that can become fragmented when the MFT grows or large files have clusters scattered across the volume. NTFS uses update sequence numbers (USNs) to detect torn writes and ensure consistency—similar to journaling but implemented at the record level.

Production Failure Scenarios

Scenario 1: FAT Fragmentation Spiral

What happened: A security camera system wrote continuous 4MB video chunks to a FAT32 USB drive. After a week, the drive showed massive fragmentation. Read speeds dropped from 20MB/s to 2MB/s. Playback became choppy.

Why it happened: FAT can’t handle many small allocations well. Each video chunk became a linked chain of clusters scattered across the drive. Reading one file meant dozens of seeks.

Detection:

# Check fragmentation on FAT/FAT32
defrag /c: /v  # Windows

# Or use CHKDSK
chkdsk e: /r  # Scan and fix

Mitigation:

Use larger cluster sizes when formatting (32KB instead of 4KB)
Periodically defragment FAT drives
For video recording, use dedicated file systems (NTFS, ext4)
Consider exFAT for large files on removable media

Scenario 2: ext4 Inode Exhaustion vs Block Exhaustion

What happened: A mail server stored millions of tiny email files. df showed plenty of disk space, but new files couldn’t be created. The issue: inode exhaustion. All inodes were allocated even though blocks remained.

Detection:

df -i  # Check inode usage
# Filesystem   Inodes    IUsed   IFree  IUse% Mounted on
# /dev/sda1  100M       99.9M   100K   99%   /var/mail

tune2fs -l /dev/sda1 | grep -i inode
# Inode count:              100000000
# Inodes per group:         8192
# Block size:               4096

Mitigation:

Use mkfs.ext4 -N <inode_count> to provision more inodes when formatting
Set inode ratio with mkfs.ext4 -i <bytes_per_inode>
Monitor inode usage with alerting

Scenario 3: Database File Fragmentation

What happened: A PostgreSQL database grew to 500GB over months. Query performance degraded despite hardware having plenty of CPU and memory. Analysis showed database file fragmentation at 78%.

Detection:

# Check fragmentation on ext4
sudo fsck.ext4 -nv /dev/sda1

# Check extent tree depth
sudo debugfs -R "extents -v /path/to/database" /dev/sda1

Mitigation:

Use e4defrag to defragment:
```
sudo e4defrag /var/lib/postgresql/data/
```
For databases, consider preallocate or use raw partitions
Choose noatime mount option to reduce fragmentation
Plan capacity to avoid >80% disk fullness

Database File Management: Why Raw I/O?

Databases like PostgreSQL and Oracle often bypass the OS file system entirely, using raw partitions or raw I/O. Understanding why:

Page cache bypass: When you write() through the OS, data goes to the page cache first. The database’s buffer pool is already a cache optimized for database access patterns. Double caching (OS page cache + database buffer pool) wastes RAM and adds CPU overhead.

fsync control: The database needs to control exactly when data is on disk. The OS may delay writes, combine them, or reorder them. For a database transaction to be durable, it needs fsync() — and the database knows exactly what needs syncing and when, without trusting the OS write-back scheduler.

Block size alignment: Database pages (typically 8KB) can be aligned to RAID stripe boundaries with raw I/O, preventing read-modify-write cycles on partial stripes.

Predictable latency: With raw I/O, I/O completion is predictable. With buffered I/O, a write might return “success” while data is still in the page cache awaiting writeback — causing durability surprises on crashes.

Implementation on Linux: posix_fallocate() for preallocation, O_DIRECT flag to bypass the page cache, and fsync() calls after critical transactions. PostgreSQL’s effective_io_concurrency setting tunes parallel I/O for raw device access.

Trade-off Table

Aspect	Contiguous	Linked	Indexed	ext4 Extents	NTFS MFT
Sequential Read	Excellent	Good	Good	Excellent	Excellent
Random Access	Good	Poor	Excellent	Good	Excellent
Fragmentation	High	Low	Low	Low	Moderate
Metadata Overhead	Low	Very Low	Medium	Low	Medium
Large File Support	Limited	Good	Limited	Excellent	Excellent
Crash Recovery	Poor	Poor	Good	Excellent	Excellent
Complexity	Low	Low	Medium	High	High

Implementation Snippet

Simulating Block Allocation with Bitmaps (Python)

#!/usr/bin/env python3
"""Simulate file allocation with block bitmap."""

class FileSystemSimulator:
    def __init__(self, total_blocks=1000, block_size=4096):
        self.total_blocks = total_blocks
        self.block_size = block_size
        self.block_bitmap = [0] * total_blocks  # 0 = free, 1 = allocated
        self.files = {}

    def allocate_contiguous(self, filename, size_blocks):
        """Allocate contiguous blocks (First Fit)."""
        start = self._find_contiguous_hole(size_blocks)
        if start == -1:
            return None

        # Mark blocks as allocated
        for i in range(start, start + size_blocks):
            self.block_bitmap[i] = 1

        self.files[filename] = {
            'start': start,
            'length': size_blocks,
            'type': 'contiguous'
        }
        return start

    def allocate_linked(self, filename, size_blocks):
        """Allocate using linked list approach."""
        free_blocks = self._find_free_blocks(size_blocks)
        if len(free_blocks) != size_blocks:
            return None

        # Allocate and link
        for i, block in enumerate(free_blocks):
            self.block_bitmap[block] = 1
            if i < len(free_blocks) - 1:
                # Store next pointer (simplified: store in separate structure)
                pass

        self.files[filename] = {
            'start': free_blocks[0],
            'blocks': free_blocks,
            'type': 'linked'
        }
        return free_blocks[0]

    def _find_contiguous_hole(self, size_needed):
        """Find first contiguous hole of sufficient size."""
        consecutive = 0
        start = -1

        for i, allocated in enumerate(self.block_bitmap):
            if allocated == 0:
                if consecutive == 0:
                    start = i
                consecutive += 1
                if consecutive >= size_needed:
                    return start
            else:
                consecutive = 0
                start = -1

        return -1

    def _find_free_blocks(self, count):
        """Find any free blocks."""
        free = []
        for i, allocated in enumerate(self.block_bitmap):
            if allocated == 0:
                free.append(i)
                if len(free) >= count:
                    return free
        return []

    def defragment(self, filename):
        """Defragment a linked file to contiguous allocation."""
        if filename not in self.files or self.files[filename]['type'] != 'linked':
            return False

        old_blocks = self.files[filename]['blocks']
        size = len(old_blocks)
        new_start = self._find_contiguous_hole(size)

        if new_start == -1:
            return False

        # Move data (in reality, this is a complex operation)
        for i in range(size):
            self.block_bitmap[old_blocks[i]] = 0
            self.block_bitmap[new_start + i] = 1

        self.files[filename] = {
            'start': new_start,
            'length': size,
            'type': 'contiguous'
        }
        return True

    def print_status(self):
        """Display allocation status."""
        allocated = sum(self.block_bitmap)
        print(f"Total blocks: {self.total_blocks}")
        print(f"Allocated: {allocated} ({100*allocated/self.total_blocks:.1f}%)")
        print(f"Free: {self.total_blocks - allocated}")
        print(f"\nFiles: {len(self.files)}")
        for name, info in self.files.items():
            print(f"  {name}: start={info.get('start', 'N/A')}, type={info['type']}")

if __name__ == "__main__":
    fs = FileSystemSimulator(100)

    # Test contiguous allocation
    fs.allocate_contiguous("video.mp4", 50)
    fs.allocate_contiguous("document.txt", 5)

    # Test linked allocation
    fs.allocate_linked("database.db", 20)

    fs.print_status()

Checking File Fragmentation with ext4

#!/bin/bash
# check_fragmentation.sh - Check file fragmentation on ext4

FILE=$1
if [ -z "$FILE" ]; then
    echo "Usage: $0 <filename>"
    exit 1
fi

DEVICE=$(df "$FILE" | tail -1 | awk '{print $1}')
MOUNT=$(mount | grep "$DEVICE" | awk '{print $3}')

# Use debugfs to check extent info
echo "=== Extent Information for $FILE ==="
debugfs -R "stat $FILE" "$DEVICE" 2>/dev/null | grep -E "Extent|Block|Size"

echo ""
echo "=== Fragmentation Analysis ==="
filefrag "$FILE"

echo ""
echo "=== Disk Layout ==="
ext2ls -l "$DEVICE" "$FILE" 2>/dev/null || echo "ext2ls not available"

Observability Checklist

Monitoring file allocation health:

df -h: Check available space
df -i: Check inode usage
filefrag -v file: Check file fragmentation level
e4defrag -c /mount/point: Check overall fragmentation percentage

# Comprehensive allocation monitoring script
#!/bin/bash
echo "=== Storage Allocation Report ==="
echo ""

echo "--- Space Usage ---"
df -h | grep -v tmpfs

echo ""
echo "--- Inode Usage ---"
df -i | grep -v tmpfs

echo ""
echo "--- Largest Files in /var (if exists) ---"
if [ -d /var ]; then
    du -sh /var/* 2>/dev/null | sort -rh | head -10
fi

echo ""
echo "--- Fragmentation Levels (ext4) ---"
for mount in $(mount | grep ext4 | awk '{print $3}'); do
    echo "Checking $mount..."
    sudo e4defrag -c "$mount" 2>/dev/null | grep -E "Fragmentation|total"
done

Common Pitfalls / Anti-Patterns

Secure File Deletion

When you rm a file, the file system just marks the file’s blocks as free in the allocation bitmap. The actual data stays on disk until those blocks get overwritten by something else. That gap between “deleted” and “really gone” is a genuine security risk — anyone with disk access or the right recovery tools can pull the data back.

Modern file systems make this worse in subtle ways. ext4’s delayed allocation means your file’s data might still be sitting in the page cache when you delete it, not written to disk yet. NTFS can retain MFT attribute records for deleted files. Even after blocks are truly freed, lab equipment can sometimes recover residual magnetic traces.

Three approaches actually work here:

Overwrite before delete: shred writes random data over file contents several times before unlinking. The -u flag removes the file after overwriting. This works on magnetic disks but SSDs are a different story — the wear-leveling controller maps logical blocks to physical cells however it wants, so the cell you think you’re overwriting might not be the one holding your data.
Full-disk secure deletion: sfill from the secure-delete package writes patterns across all free space on a mount point. Run it periodically on systems handling sensitive data.
Encryption at rest: This is the most reliable approach for SSDs. If the whole volume is encrypted with dm-crypt/LUKS, throwing away the key makes all the data unreadable instantly. No overwriting needed, and it works uniformly regardless of wear leveling. This is why compliance frameworks increasingly list encryption as the primary data-at-rest control instead of post-deletion overwriting.

# Overwrite file contents before deletion
shred -vu /path/to/file

# Or secure-delete entire directory
sfill -f /mount/point

# For ext4, encryption (dm-crypt) is more effective than overwriting on SSDs
# Create an encrypted container:
dd if=/dev/urandom of=/secure.img bs=1M count=1000
losetup /dev/loop0 /secure.img
cryptsetup luksFormat /dev/loop0
cryptsetup open /dev/loop0 secure_vol
mkfs.ext4 /dev/mapper/secure_vol

File System Integrity

File systems do more than just track which blocks belong to which files. They maintain several interlocking mechanisms to keep the structure consistent, and knowing what each does helps you diagnose when things go wrong.

Journaling (ext3/ext4, NTFS, XFS) writes metadata changes to a dedicated log area before applying them to the main file system. If a crash cuts off a write mid-operation, the journal can replay the incomplete transaction or roll it back. Without a journal, a crash leaves the allocation bitmap potentially corrupted — blocks marked free that are actually in use, or the reverse. ext2 has no journal. A crash means the next boot runs fsck across the entire partition, which on multi-terabyte drives can take an hour or more.

fsck consistency checks look for contradictions: inodes pointing to blocks marked free in the bitmap, directory entries referencing inodes that don’t exist, circular directory links. Running with -n does a read-only check — reported problems aren’t fixed. Run this on an unmounted filesystem to avoid the check itself changing state.

tune2fs intervals set when automatic checks fire. The filesystem increments a mount counter on each mount and a time counter daily. When either hits its limit, the next reboot triggers e2fsck. For servers that barely reboot, the mount-count threshold is more useful than time-based checks.

Block group invariants in ext2/3/4 are lower-level rules: bitmaps within each block group must be internally consistent, the inode table can’t overlap with data blocks, directory entries must point to valid inodes. Violations here usually point to hardware problems — bad RAM or a failing disk — rather than software bugs.

# Check ext4 metadata consistency
sudo e2fsck -n /dev/sda1  # -n = read-only, no changes

# Enable periodic file system checks
sudo tune2fs -l /dev/sda1 | grep -E "Check|state"
# Mount count: 50
# Check interval: 0 days

# Set automatic check interval (in days or mount count)
sudo tune2fs -c 30 /dev/sda1  # Check every 30 mounts
sudo tune2fs -i 90d /dev/sda1  # Check every 90 days

Compliance: Data Localization

Regulations like GDPR, HIPAA, and SOX require proof of where data lives, not just that it exists somewhere in a volume. Allocation metadata becomes your paper trail.

What triggers allocation tracking requirements:

Data sovereignty rules (GDPR Article 44-49) demand proof that files don’t cross jurisdictional boundaries. When a single file spans multiple physical disks across regions, auditors want to see exactly which blocks sit where.

Financial audits under SOX and PCI-DSS need documented proof that specific records were written to known locations and stayed there. Scattered allocation defeats this documentation.

Healthcare regulations under HIPAA require demonstrable retention controls. If defragmentation silently moves a patient record, can you prove it wasn’t altered?

Why allocation methods affect compliance:

Contiguous allocation makes residency proof straightforward. A file’s blocks occupy a known range, and the volume geometry determines the physical location. Linked and indexed allocation scatter blocks, complicating verification.

For regulated workloads, LVM snapshots at key timestamps create verifiable allocation snapshots. ext4’s dir_index tracks which files existed and where. Stricter requirements may need application-level logging that records file-to-block mappings independently of the file system.

# Create timestamped allocation snapshot for audit trail
sudo lvm snapshot /dev/vg0/data_vol -s audit_snap_$(date +%Y%m%d%H%M)
sudo debugfs -R "ls -l /audit_snap_20260519/data" /dev/vg0/audit_snap

# Track allocation patterns with audit tool
auditctl -w /data -p rw -k allocation_tracking
ausearch -k allocation_tracking

Filling Disks Above 80%

When disk usage climbs above 80%, three separate problems appear — and they make each other worse.

Fragmentation gets worse. With little free space scattered across the drive, the allocator picks from what’s available rather than what makes sense. A file that should take 100 contiguous blocks gets split across dozens of scattered regions instead. Sequential reads that should need one seek now need dozens. On ext4, files stop fitting in single extents and extent tree depth grows.

Allocation fails before the disk is actually full. The file system needs free blocks to carry out write operations themselves — not just to hold data. Creating a file means allocating directory entry blocks and inode entries. Extending a file means updating indirect pointers and extent trees. If a directory has thousands of entries, each new file needs a free inode somewhere. At 95% capacity, you may not be able to copy a 100MB file even though hundreds of megabytes of “free” space appear to exist — because the allocator can’t find the metadata blocks it needs for the operation itself.

ext4’s delayed allocation makes this sneakier. delalloc batches up write allocations and delays assigning physical blocks until writeback. When many delayed allocations pile up and the writeback thread finally runs against a nearly-full disk, it may not find contiguous runs for sequential writes. Processes block waiting for the allocator. The system looks idle but everything touching disk is stalled.

The actual threshold depends on the file system. ext4’s block group allocator performs best below 80%; beyond that, cross-group allocations increase. XFS with its allocation groups handles high utilization better but still degrades. Most admins treat 80% as a soft warning and 90-95% as an emergency cutoff.

# BAD: Let disk fill above 85%
# Causes: fragmentation, allocation failures, performance degradation

# GOOD: Monitor and alert
df -h | awk '{if ($5+0 > 80) print "WARNING: "$0}'

# Set up in /etc/rc.local or cron
# Also use iostat to check for allocation stalls:
iostat -x 1 | awk '$6 ~ /[0-9]+\.[0-9]/ && $6 > 50 {print "High await: "$0}'

Using Wrong Block Size

# Creating file system with wrong block size
# Small block (4KB): Good for many small files, wastes space on large files
# Large block (8KB): Good for large files, wastes inodes for small files

# Choose based on workload:
# Database: larger blocks
# Mail server: smaller blocks (many small files)

# Check current block size
tune2fs -l /dev/sda1 | grep "Block size"

# Recreate with different block size (requires backup)
sudo mkfs.ext4 -b 8192 /dev/sda1

Ignoring Extent Usage in ext4

ext4 replaced the old direct/indirect block pointer system from ext2/3 with extent-based allocation. Most files on ext4 use extents by default since kernel 2.6.23, but certain situations still produce non-extent files — and it matters more than you’d think.

Why extents are faster. A traditional Unix inode has 12 direct block pointers plus three levels of indirect pointers. A 10MB file with 4KB blocks needs roughly 2,560 block pointers distributed across the inode and several indirect blocks. Every read has to follow pointer chains through multiple metadata blocks before reaching the actual data. An extent entry stores a contiguous range as one pair: {start_block, length}. That same 10MB file fits in about 20 extent entries. Less metadata per file means fewer disk reads to pull up the same data.

When files fall back to block pointers. Retroactively converting a file requires rewriting its inode — e4defrag does this during defragmentation. Very old files migrated from ext2/3 may still use block pointers. Files patched heavily with dd or accessed with O_DIRECT can bypass extent allocation. Historically, the chattr +c flag (the compressed attribute, which ext4 doesn’t actually implement) caused the filesystem to fall back to block mapping.

How to check and fix. debugfs shows whether a file uses extents and how many entries its extent tree has. Files with Extent End depth greater than zero still use extents but have external extent tree nodes — this happens with very large files where the inode’s built-in extent entries overflow. chattr +e forces extent allocation for future writes but won’t convert an existing file retroactively. Run e4defrag to rewrite a file’s allocation in optimal extent format.

The performance difference shows up most on large files accessed sequentially. A 100GB database file using block pointers might need thousands of indirect block reads before reaching an offset. The same file in extent format resolves the same offset with a fraction of the metadata operations.

# Check if files are using extents
debugfs -R "ex list <path>" /dev/sda1

# Force extent usage (disable block mapping) for new allocations
chattr +e /path/to/file

# Check for non-extent files or files with deep extent trees
debugfs -R "stat <path>" /dev/sda1 | grep -i extent

# Defragment to convert to optimal extent layout
e4defrag /path/to/file

Quick Recap Checklist

Contiguous allocation: Fast but causes fragmentation
Linked allocation: Flexible but slow random access
Indexed allocation: Good random access, overhead per file
FAT: Linked list in a table; fragmentation kills performance
ext2/3/4: Bitmap allocation with extent-based files
NTFS: MFT with runlists for flexible allocation
Monitor inode usage (not just space usage)
Keep disk usage below 80% to prevent fragmentation

Interview Questions

1. Compare contiguous, linked, and indexed file allocation methods.

Contiguous allocation stores each file in consecutive blocks. Performance is excellent for sequential access, but external fragmentation becomes severe. Files can't grow if adjacent blocks are taken.

Linked allocation stores each block with a pointer to the next block. Any free block can be used, eliminating external fragmentation. However, random access requires following the entire chain, which is slow. If any pointer is corrupted, subsequent blocks are lost.

Indexed allocation stores all block pointers in an index block. Random access is fast, and blocks can be anywhere. The limitation is file size (limited by index block entries), and there's overhead (one index block per file).

Modern systems use variants: ext4 uses extents (contiguous ranges with pointers), NTFS uses the MFT with runlists, and FAT uses the central allocation table.

2. What is external fragmentation and why does it happen?

External fragmentation occurs when free disk space is split into small, non-contiguous chunks after files are created and deleted over time.

Example: You allocate a 10-block file, then delete it. Then you allocate 3 files of 3 blocks each in different areas. Delete two of them. Now you have three free blocks, but they're not adjacent. A new 5-block file can't be stored contiguously.

In contiguous allocation, this is catastrophic—you might have 100MB free but not be able to allocate a 50MB file. In linked allocation, it doesn't matter since blocks don't need to be adjacent. Indexed allocation is also immune since each block is referenced by index, not position.

Solutions: Compaction (moving files to consolidate free space), defragmentation tools, and using file systems that handle fragmentation gracefully.

3. How does ext4's extent-based allocation improve upon the traditional Unix inode approach?

Traditional Unix inodes use direct and indirect pointers (12 direct pointers, 1 single indirect, 1 double indirect, 1 triple indirect). For a large file, reading block N requires following pointer chains, potentially causing many disk I/Os.

ext4's extent-based allocation stores ranges of contiguous blocks as a single entry: {start_block, length}. A 100MB file might need just 2 extent entries instead of 25,600 block pointers.

Benefits:

Reduced metadata overhead (fewer entries per file)
Better sequential performance (fewer seek operations)
Less fragmentation impact (ranges still work even if scattered)
Fewer disk reads to access file metadata

4. Why might `df` show plenty of free space but `touch` fail with "No space left on device"?

This classic puzzle has two common causes:

1. Inode exhaustion: The file system has allocated all inodes. Inodes are created when the file system is formatted (you can have millions of tiny files with zero bytes each but use all inodes). Check with df -i. Solutions: recreate FS with more inodes (mkfs.ext4 -N <count>), or clean up small files.

2. Reserved blocks: ext2/3/4 reserves 5% of blocks for root. Even as root, you can't use these blocks. Check tune2fs -l /dev/sda1 | grep Reserved. Fix with tune2fs -m 0 /dev/sda1 (not recommended for root partition).

3. fragmentation: Not enough contiguous blocks for the allocation request (unlikely with linked/extent-based systems).

5. What is the difference between internal and external fragmentation?

Internal fragmentation: Wasted space within allocated units. If you allocate 4KB blocks and have a 1KB file, you waste 3KB. This is unavoidable with block-based allocation—you choose a block size and accept the waste.

External fragmentation: Free space split into non-contiguous chunks. After many allocations and deletions, you might have 100MB total free space but no single hole large enough for a 50MB file. This affects contiguous allocation methods but not linked or indexed.

Trade-offs:

Small block size: Less internal fragmentation, more external
Large block size: Less external fragmentation, more internal
Linked allocation: No external fragmentation, but pointer overhead
Extent-based: Reduces external fragmentation while keeping internal waste low

6. How does NTFS use the Master File Table (MFT)?

NTFS stores everything—including directory entries—as records in the Master File Table (MFT). Each file has at least one MFT record containing attribute/value pairs:

$STANDARD_INFORMATION: Timestamps, permissions
$FILE_NAME: File name (multiple possible for hard links)
$DATA: File content (resident within MFT or non-resident with runlist)

Small files (under ~900 bytes) store content entirely within the MFT record—fastest possible access. Larger files store runlists pointing to data clusters. The MFT itself can grow, and it maintains reserved zones for critical metadata.

NTFS uses update sequence numbers (USNs) to detect torn writes and ensure consistency—similar in concept to journaling but implemented differently.

7. What are the implications of the 1024-file-per-directory limit in FAT12/FAT16?

FAT12 and FAT16 limit directory entries to 16-bit root directory cluster counts. For FAT16 with typical cluster sizes, this limits root directories to 65,536 entries—but practical limits often enforced 512 entries per directory for compatibility.

FAT32 removed this root directory limitation (root can be anywhere, like any other directory), but FAT12/FAT16 root directories were fixed-size at format time.

Implications for modern use:

Old floppy disk FAT12 root directories held ~112 files maximum
Early USB drives with FAT12/16 would fail to create more than a few hundred files in the root directory
Modern SD cards and USB drives use FAT32 with essentially unlimited directory entries

8. How does extent-based allocation reduce metadata overhead compared to block pointers?

Traditional Unix inodes use block pointers:

12 direct pointers
1 single indirect (points to block of pointers)
1 double indirect (points to block of single indirect blocks)
1 triple indirect

A 1GB file with 4KB blocks needs 262,144 data blocks. With block pointers, that's one inode + 256 indirect block entries + etc.

Extent-based allocation stores contiguous ranges: {start: 4096, length: 65536} means "blocks 4096-69631, 65536 blocks (256MB)". A 1GB file might need just 4 extent entries instead of thousands of block pointers.

Benefits: Smaller metadata, fewer disk reads to access file, better sequential performance, less fragmentation impact.

9. Why do file systems reserve 5% of blocks, and how can you change this?

ext2/3/4 reserve 5% of blocks for the root user. This ensures:

Root can log in and repair the system even if users fill the disk
File system operations (like fsck) have working space
Critical daemons don't fail when disk is "full"

For large data partitions (e.g., 10TB file server), 5% is 500GB—significant waste. You can reduce or eliminate the reservation:

# View current reservation tune2fs -l /dev/sda1 | grep Reserved Change reservation to 1% tune2fs -m 1 /dev/sda1 Disable entirely (not recommended for root)

tune2fs -m 0 /dev/sda1

For non-root partitions where you manage recovery, reducing reservation saves substantial space.

9. What is the difference between contiguous allocation with consolidation and extent-based allocation?

Contiguous allocation with consolidation: When a file is deleted and an adjacent block becomes free, the file system merges the free blocks. This helps reduce external fragmentation but still requires allocation to find a large enough contiguous hole.

Extent-based allocation: Stores files as contiguous ranges called extents — a start block and a length. A file can span multiple extents if it grows beyond the initial allocation, but each extent is internally contiguous. This reduces the problem of finding huge contiguous holes while keeping most file access sequential.

Key difference: Contiguous allocation assigns one block range per file; extent allocation allows multiple block ranges per file but prefers to keep each range contiguous.

ext4 uses extent-based allocation. A 100MB file might need just 2 extents (256KB extent entries) instead of 25,600 individual block pointers. This dramatically reduces inode size and disk I/O for file metadata access.

10. What is block group allocation and why does it help with locality?

ext2/3/4 divide the partition into block groups, each containing:

Block bitmap (free block tracking)
Inode bitmap (free inode tracking)
Inode table (inode storage for the group)
Data blocks

The allocator tries to keep related data together:

New files get inodes in the same block group as their parent directory
Data blocks for a file stay within the same block group when possible
Linked files (hard links) prefer the same block group

This improves performance by ensuring related data is physically close on disk—reducing seek time during sequential access. The allocator uses heuristics and preallocation hints to maintain locality.

12. How does the free block bitmap work in ext2/3/4, and what are its advantages over linked allocation?

ext2/3/4 use a block bitmap—one bit per block indicating free (0) or allocated (1). To allocate:

Scan the bitmap for a free block
Mark it as allocated (set bit to 1)
Return the block number

Advantages over linked allocation:

O(1) free space check: No need to traverse chains to find free space
Fast allocation: Bitmap operations are hardware-optimized
No pointer corruption: Linked allocation's pointer chain isn't used; if a block is lost, the bitmap still shows it as allocated
Easy to find contiguous blocks: Bitmap scanning finds contiguous free ranges for extent allocation
Simple recovery: If a pointer is lost, the bitmap shows the actual allocation state

The inode bitmap works similarly for inode allocation. Both bitmaps are stored in each block group, enabling fast local allocation without scanning the entire partition.

13. What is the difference between file allocation table (FAT) and inode-based allocation?

FAT: A central table (stored at the beginning of the partition) where each entry corresponds to a cluster. File cluster chains are stored in this table:

FAT[cluster] = next cluster in file chain (or EOF)
Directory entries store the starting cluster only
The entire FAT must be in memory for efficient operation

Inode-based (ext2/3/4, XFS, NTFS): Each file has its own metadata structure (inode) containing block pointers:

Directory entries map names to inode numbers
Each inode stores pointers to the file's data blocks
No central table needed—inode is self-contained

Key differences:

FAT's chain is external to the file; inode's pointers are inside the file's metadata
FAT limits file size by table size; inode-based systems limit by pointer count
FAT must keep the entire table in memory; inode-based only needs to cache relevant inodes
Inode systems support hard links (multiple directory entries pointing to same inode); FAT does not

14. How does journaling improve file system reliability in ext3/4 compared to ext2?

ext2 has no journaling—after a crash, fsck must scan the entire partition to reconstruct the free block bitmap and directory structure. This takes minutes to hours on large partitions.

ext3 adds a write-ahead journal (a separate area on disk):

Before making a metadata change, write it to the journal first
Once the journal write is on disk (synchronized), apply the change to the actual file system
After the change is complete, mark the journal entry as committed

After a crash, recovery is fast:

Read the journal (small, linear scan)
Replay committed transactions (reapply changes)
Usually takes seconds, not minutes

ext4 goes further with journal checksums (detect journal corruption) and data=ordered mode (file data is written before its metadata in the journal), reducing both corruption risk and recovery time.

15. What is the trade-off between larger and smaller block sizes in file allocation?

Small block size (e.g., 4KB):

Pros: Less internal fragmentation, efficient for small files, more flexibility in allocation
Cons: More metadata overhead (more block pointers needed per file), worse sequential read performance (more seeks), larger bitmaps

Large block size (e.g., 8KB, 16KB):

Pros: Better sequential read performance (fewer blocks to seek between), smaller metadata (fewer pointers), better for large files
Cons: More internal fragmentation (wasted space in last block), inefficient for many small files, wasted i-nodes

Most file systems use 4KB as the default because it balances well for general workloads. Database workloads often use larger blocks to match their record sizes. The optimal choice depends on the typical file size in your workload.

16. How does Btrfs differ from ext4 in its allocation approach?

ext4 uses traditional block allocation with extents. Btrfs takes a fundamentally different copy-on-write (COW) approach:

extent-based vs block-pointer:

ext4: inode stores pointers to data blocks (traditional)
Btrfs: inode stores a root node of a B-tree; all block pointers are in the tree

COW semantics:

When you modify a block, Btrfs writes a new copy rather than overwriting in place
The old block remains until no references point to it
This enables snapshots (reference counting on blocks), checksums (verify old data wasn't corrupted), and easy RAID integration

Allocation differences:

Btrfs uses chunks (extents of the underlying devices) rather than fixed block groups
Dynamic allocation: chunks are allocated from free space and can be relocated
Integrated RAID: allocation metadata and data are checksummed and mirrored across devices

17. What is preallocation and when is it useful?

Preallocation reserves disk space for a file before data is written. Instead of allocating blocks as data arrives, the file system reserves a range of blocks upfront.

Use cases:

Databases: Preallocate data files to avoid fragmentation as the database grows. Use posix_fallocate() on Linux to reserve space without writing zeros.
Video recording: Reserve space for a fixed-length recording so the file doesn't fragment mid-write.
Preventing disk full errors: Allocate space for known-size files before writing begins.

Implementation in ext4:

extents are pre-allocated for the file
Blocks are marked as allocated in the bitmap
No data is written until the application writes—blocks remain sparse (read as zeros)

Caveat: Preallocating and then not using the space wastes capacity. Also, some older file systems (FAT) don't support true preallocation and may not reserve the blocks.

18. How does XFS handle allocation differently from ext4?

ext4 allocation approach:

Block groups of fixed size (~128MB each)
allocator tries to keep files and their metadata in the same block group
Uses bitmap-based allocation within block groups

XFS allocation approach:

Allocation groups (AGs): XFS partitions the filesystem into allocation groups (typically 1GB each), each with its own bitmap and inode allocator
B+tree bitmaps: XFS uses B+trees for free space tracking (not simple bitmaps), enabling faster allocation in large filesystems
Delayed allocation: Like ext4, XFS batches allocations and delays physical block assignment until writeback, reducing fragmentation
Stripe-aware allocation: XFS can align allocations to RAID stripe boundaries for better performance

XFS handles parallel allocation better because multiple threads can allocate from different AGs simultaneously without contention. This makes XFS excellent for large files and high-throughput workloads.

19. What is the difference between hard links and symbolic links in file allocation?

Hard link: Multiple directory entries (dentries) pointing to the same inode. The file's data blocks are referenced by one inode, but multiple names in different directories point to it.

Only works within the same file system (must have same inode number space)
Cannot link to directories (would create cycles in directory tree)
Deleting a hard link removes one name; the file persists until all hard links are removed
No extra disk space for the link itself (just a directory entry)

Symbolic link: A special file containing a path string. The symlink's inode points to data blocks containing the target path.

Can cross file systems and mount points
Can link to directories
Deleting the symlink does not affect the target
Deleting the target leaves a "dangling" symlink
Uses a small amount of space for the path string

From an allocation perspective: hard links share the same inode and data blocks; symbolic links allocate separate data blocks for the path content.

20. How does defragmentation work on extent-based file systems like ext4?

Fragmentation on ext4 occurs when a file's extents become scattered or when the last extent can't be extended due to adjacent allocated blocks.

Detecting fragmentation:

filefrag -v file: Shows extent list for a file
e4defrag -c /mount: Shows overall fragmentation percentage

Defragmentation with e4defrag:

Reads the file's current extent list
Calculates optimal layout (contiguous or minimal extents)
Allocates new blocks in the desired locations
Copies data to new blocks (leaving old blocks intact during copy)
Atomically updates inode to point to new blocks
Frees old blocks once new layout is active

The copy-on-write nature of the update ensures the file remains accessible during defragmentation. If the system crashes mid-defrag, the old extents are still valid.

Best practice: Keep disk usage below 80% to minimize fragmentation. ext4's delayed allocation helps—batching writes reduces scatter.

Conclusion

File allocation methods determine how disk space gets assigned to files, with direct consequences for performance and fragmentation. Contiguous allocation delivers excellent sequential throughput but suffers from external fragmentation that eventually requires compaction. Linked allocation eliminates fragmentation but destroys random access performance. Indexed allocation balances both, though with per-file overhead.

Modern file systems combine these approaches: FAT uses a central table for linked-style allocation, ext4 uses extent-based allocation that stores contiguous ranges efficiently, and NTFS uses the MFT with runlists for maximum flexibility. The key insight is that no single approach wins universally — the right choice depends on your workload characteristics.

For continued learning, study how databases implement their own file allocation strategies to bypass OS buffering, why ext4’s delayed allocation reduces fragmentation compared to ext3, and how the trade-offs between allocation methods manifest in real-world performance benchmarks. Understanding these fundamentals helps you diagnose storage performance issues and choose file systems appropriate for your use case.

File Allocation Methods

Introduction

When to Use / When Not to Use

Architecture or Flow Diagram

Core Concepts

Contiguous Allocation

Linked Allocation

Indexed Allocation

FAT: File Allocation Table

ext2/3/4: Block Bitmap and Extents

NTFS: MFT and Attribute Records

Production Failure Scenarios

Scenario 1: FAT Fragmentation Spiral

Scenario 2: ext4 Inode Exhaustion vs Block Exhaustion

Scenario 3: Database File Fragmentation

Database File Management: Why Raw I/O?

Trade-off Table

Implementation Snippet

Simulating Block Allocation with Bitmaps (Python)

Checking File Fragmentation with ext4

Observability Checklist

Common Pitfalls / Anti-Patterns

Secure File Deletion

File System Integrity

Compliance: Data Localization

Filling Disks Above 80%

Using Wrong Block Size

Ignoring Extent Usage in ext4

Quick Recap Checklist

Interview Questions

Change reservation to 1%

Disable entirely (not recommended for root)

Further Reading

Conclusion

Category

Tags

Related Posts

ASLR & Stack Protection

Assembly Language Basics: Writing Code the CPU Understands

Boolean Logic & Gates