Virtual File Systems (VFS)

Understanding how Linux abstracts multiple file systems through a common interface, enabling transparent access to ext4, NTFS, FAT, and network file systems.

published: reading time: 31 min read author: GeekWorkBench

Every time you access a file on Linux, whether it lives on an ext4 partition, a USB drive formatted with FAT32, an NFS share, or even /proc, the same interface handles it. That interface is the Virtual File System (VFS) layer, also called the Virtual Filesystem Switch. Without VFS, every application would need to understand how to talk to each specific file system type. With VFS, applications speak a universal language while the kernel translates to whatever file system actually stores the data.

VFS is one of the most elegant abstractions in operating systems. It enables the illusion of a unified file tree while simultaneously supporting dozens of radically different file system implementations. Understanding VFS helps you troubleshoot mounting issues, optimize file system performance, and understand how Linux achieves its legendary flexibility.

Introduction

When to Use / When Not to Use

Understanding VFS helps with system administration and troubleshooting.

When VFS knowledge is essential:

  • Mounting and configuring various file systems
  • Troubleshooting “mount succeeded but files not accessible” issues
  • Working with network file systems (NFS, CIFS, FUSE)
  • Understanding why some operations are slower on certain file systems
  • Container storage and volume mounting

When you can rely on defaults:

  • Standard server configuration with single file system type
  • Desktop usage with built-in file system support
  • Simple container workloads with default storage

Architecture or Flow Diagram

graph TD
    A[Application] --> B[POSIX System Calls]
    B --> C[VFS Layer]

    C --> D[ext4 Driver]
    C --> E[NTFS Driver]
    C --> F[FAT/VFAT Driver]
    C --> G[NFS Client]
    C --> H[CIFS/SMB Client]
    C --> I[procfs Driver]
    C --> J[tmpfs Driver]

    D --> K[Block Device Layer]
    E --> K
    F --> K
    G --> L[Network]
    H --> L
    K --> M[Storage Device]
    L --> N[NFS Server]
    L --> O[SMB Server]

    style A stroke:#ff00ff,stroke-width:2px
    style C stroke:#ff00ff,stroke-width:3px

The VFS layer sits between applications and the actual file system implementations. Each file system type implements the VFS interface, making them interchangeable from the application’s perspective.
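To make this interchangeability concrete from user space, here is a minimal sketch. The helper name `describe_path` is hypothetical; it relies only on the standard `os.statvfs()` and `os.stat()` calls, which behave the same way whether the path sits on ext4, tmpfs, or a network mount.

```python
import os
import tempfile


def describe_path(path):
    """Return basic file system facts for any path, whatever backs it."""
    vfs = os.statvfs(path)  # identical call for ext4, tmpfs, NFS, ...
    st = os.stat(path)
    return {
        "block_size": vfs.f_bsize,
        "max_name_length": vfs.f_namemax,
        "device_id": st.st_dev,  # identifies the mounted file system
        "inode": st.st_ino,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in (tmp, "/"):
            print(path, describe_path(path))
```

The caller never names a file system type; the kernel picks the driver from the mount that owns the path.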

Core Concepts

VFS Data Structures

The VFS layer is built on four key data structures that every file system must implement:

// Superblock - file system level metadata
struct super_block {
    unsigned long s_blocksize;         // Block size in bytes
    struct super_operations *s_op;     // Superblock operations
    struct dentry *s_root;             // Root directory entry
    struct list_head s_inodes;         // All inodes of this file system
    void *s_fs_info;                   // File system specific info
    // ... many more fields
};

// Inode - represents a file (similar to on-disk inode)
struct inode {
    unsigned long i_ino;               // Inode number
    umode_t i_mode;                    // File type and permissions
    struct inode_operations *i_op;     // Inode operations
    struct file_operations *i_fop;     // File operations
    struct super_block *i_sb;          // Superblock reference
    // ... many more fields
};

// Dentry - directory entry (name to inode mapping)
struct dentry {
    struct qstr d_name;                // Name component (string, length, hash)
    struct inode *d_inode;             // Associated inode
    struct dentry *d_parent;           // Parent directory
    struct list_head d_subdirs;        // Child entries
    // ... many more fields
};

// File - open file instance
struct file {
    struct path f_path;                // Path to file
    struct file_operations *f_op;      // File operations
    loff_t f_pos;                      // Current position
    unsigned int f_flags;              // Open flags
    // ... many more fields
};

The key insight: these are generic structures. Specific file systems fill them in with their own implementations of standard operations.
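The dentry/inode split is visible from user space. The sketch below is an illustration only, with the hypothetical helper names `same_inode` and `hard_link_demo`: it creates a hard link, so two names (two dentries) end up pointing at one inode.

```python
import os
import tempfile


def same_inode(path_a, path_b):
    """True when both names resolve to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def hard_link_demo(workdir):
    original = os.path.join(workdir, "data.txt")
    alias = os.path.join(workdir, "alias.txt")
    with open(original, "w") as f:
        f.write("hello")
    os.link(original, alias)  # second directory entry, same inode
    return same_inode(original, alias), os.stat(original).st_nlink


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(hard_link_demo(tmp))
```

The link count reported by `st_nlink` rises to 2 because the inode is now referenced by two directory entries.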

VFS Operations

Each data structure has an associated operations table:

// Superblock operations - file system level
struct super_operations {
    struct inode *(*alloc_inode)(struct super_block *);
    void (*destroy_inode)(struct inode *);
    void (*dirty_inode)(struct inode *, int);
    int (*write_inode)(struct inode *, struct writeback_control *);
    void (*evict_inode)(struct inode *);
    void (*put_super)(struct super_block *);
    // ... more
};

// Inode operations - file/directory specific
struct inode_operations {
    int (*create)(struct inode *, struct dentry *, umode_t, bool);
    struct dentry *(*lookup)(struct inode *, struct dentry *, unsigned int);
    int (*link)(struct dentry *, struct inode *, struct dentry *);
    int (*unlink)(struct inode *, struct dentry *);
    int (*mkdir)(struct inode *, struct dentry *, umode_t);
    int (*rmdir)(struct inode *, struct dentry *);
    // ... more
};

// File operations - file access
struct file_operations {
    loff_t (*llseek)(struct file *, loff_t, int);
    ssize_t (*read)(struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *);
    int (*open)(struct inode *, struct file *);
    int (*release)(struct inode *, struct file *);
    // ... more
};

Each file system (ext4, XFS, NTFS, etc.) implements these operations for its own data structures and semantics.
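The function-pointer pattern can be modeled in a few lines. This is a toy sketch, not kernel code: `FileOperations`, `OpenFile`, `vfs_read`, and `vfs_write` are hypothetical stand-ins for `struct file_operations`, `struct file`, and the generic VFS entry points.

```python
class FileOperations:
    """Toy stand-in for struct file_operations: one slot per operation."""

    def __init__(self, read, write):
        self.read = read
        self.write = write


def mem_read(state, size, offset):
    return bytes(state["data"][offset:offset + size])


def mem_write(state, data, offset):
    buf = state["data"]
    if offset > len(buf):  # pad holes with zero bytes
        buf.extend(b"\0" * (offset - len(buf)))
    buf[offset:offset + len(data)] = data
    return len(data)


def null_read(state, size, offset):
    return b""  # always at end of file


def null_write(state, data, offset):
    return len(data)  # accept and discard


MEMFS_FOPS = FileOperations(mem_read, mem_write)
NULLFS_FOPS = FileOperations(null_read, null_write)


class OpenFile:
    """Toy struct file: a position plus a pointer to an operations table."""

    def __init__(self, fops, state):
        self.f_op = fops
        self.state = state
        self.f_pos = 0


def vfs_write(file, data):
    written = file.f_op.write(file.state, data, file.f_pos)
    file.f_pos += written
    return written


def vfs_read(file, size):
    data = file.f_op.read(file.state, size, file.f_pos)
    file.f_pos += len(data)
    return data
```

`vfs_read` and `vfs_write` never check which "driver" they are talking to; they only follow `f_op`, which is the same indirection the kernel uses.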

File System Registration

File system drivers register with VFS when the kernel initializes its built-in drivers at boot, or later when a file system module is loaded:

// Register a file system type
register_filesystem(&ext4_fs_type);
register_filesystem(&xfs_fs_type);
register_filesystem(&vfat_fs_type);
register_filesystem(&nfs_fs_type);

// File system type structure
struct file_system_type {
    const char *name;           // "ext4", "xfs", "ntfs"
    int fs_flags;               // FS_REQUIRES_DEV, FS_BINARY_MOUNTDATA, etc.
    struct dentry *(*mount)(struct file_system_type *, int,
                            const char *, void *);
    void (*kill_sb)(struct super_block *);
    struct module *owner;
    // ...
};

This registration makes the file system available for mounting.
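The list of registered types is exported through /proc/filesystems, where a leading `nodev` marks types that do not need a block device. A small sketch for reading it (the function names are hypothetical):

```python
def parse_filesystems(text):
    """Parse /proc/filesystems content into (name, needs_block_device) pairs."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Lines look like "nodev<TAB>sysfs" or "<TAB>ext4"
        entries.append((parts[-1], parts[0] != "nodev"))
    return entries


def registered_filesystems(path="/proc/filesystems"):
    with open(path) as f:
        return parse_filesystems(f.read())


if __name__ == "__main__":
    for name, needs_dev in registered_filesystems():
        print(f"{name:12} {'block device' if needs_dev else 'no device'}")
```

If a type is missing from this list, mounting it fails until the corresponding module is loaded.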

Mount Chain

When you mount a device, the chain looks like:

graph TD
    A["mount -t ext4 /dev/sda1 /mnt"] --> B[VFS receives mount request]
    B --> C[Find ext4 in registered file systems]
    C --> D[Call ext4 mount function]
    D --> E[Read superblock from device]
    E --> F[Create super_block structure]
    F --> G[Create root dentry and inode]
    G --> H[Link /mnt to VFS mount tree]

    style A stroke:#ff00ff,stroke-width:2px
    style H stroke:#00fff9

The mount creates the VFS structures that represent the mounted file system in the unified namespace.
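The chain above can be sketched as a toy model. Everything here is hypothetical and heavily simplified: `register_filesystem` and `do_mount` borrow kernel-sounding names, but they only manipulate Python dictionaries.

```python
class FileSystemType:
    """Toy struct file_system_type: a name plus a mount callback."""

    def __init__(self, name, mount):
        self.name = name
        self.mount = mount


REGISTERED = {}   # name -> FileSystemType
MOUNT_TABLE = {}  # mount point -> (fs name, super_block)


def register_filesystem(fs_type):
    if fs_type.name in REGISTERED:
        raise ValueError(f"{fs_type.name} is already registered")
    REGISTERED[fs_type.name] = fs_type


def toy_mount(device):
    # A real driver would read and validate the on-disk superblock here.
    return {"device": device, "root": {"name": "/", "children": {}}}


def do_mount(fs_name, device, mount_point):
    fs_type = REGISTERED.get(fs_name)            # find the driver by name
    if fs_type is None:
        raise LookupError(f"unknown filesystem type '{fs_name}'")
    super_block = fs_type.mount(device)          # driver builds super_block and root
    MOUNT_TABLE[mount_point] = (fs_name, super_block)  # link into the mount tree
    return super_block
```

The `LookupError` branch mirrors the "unknown filesystem type" failure you get when a driver was never registered.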

Path Resolution in VFS

When an application accesses /home/user/file.txt:

sequenceDiagram
    participant App as Application
    participant VFS as VFS Layer
    participant Cache as Dentry Cache
    participant FS as ext4 Driver
    participant Disk as Disk

    App->>VFS: open("/home/user/file.txt")
    VFS->>Cache: lookup dentry for "/"
    Cache-->>VFS: root inode
    VFS->>Cache: lookup dentry for "home"
    Cache-->>VFS: cached or inode
    VFS->>FS: read dir, find "user"
    FS->>Disk: read directory blocks
    Disk-->>FS: directory entries
    FS-->>VFS: inode for "user"
    VFS->>Cache: cache dentry
    VFS->>FS: lookup "file.txt"
    FS-->>VFS: inode for file.txt
    VFS-->>App: file descriptor

The dentry cache dramatically speeds repeated path lookups.
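A toy model shows why. In this sketch (hypothetical names; the cache is keyed by full path for simplicity, whereas the kernel hashes on parent dentry plus name), the `lookups` counter stands in for directory reads that would hit the disk.

```python
class ToyFS:
    """Directory tree as nested dicts; counts how often it is consulted."""

    def __init__(self, tree):
        self.tree = tree
        self.lookups = 0

    def lookup(self, directory, name):
        self.lookups += 1       # stands in for reading a directory from disk
        return directory[name]  # KeyError plays the role of ENOENT


def resolve(fs, dcache, path):
    """Walk path one component at a time, consulting the cache first."""
    node = fs.tree
    walked = ""
    for name in (c for c in path.split("/") if c):
        walked += "/" + name
        if walked not in dcache:
            dcache[walked] = fs.lookup(node, name)
        node = dcache[walked]
    return node


if __name__ == "__main__":
    fs = ToyFS({"home": {"user": {"file.txt": "inode-42"}}})
    cache = {}
    resolve(fs, cache, "/home/user/file.txt")
    resolve(fs, cache, "/home/user/file.txt")
    print("file system lookups:", fs.lookups)
```

The second resolution is served entirely from the cache, so the lookup count does not grow.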

Core Concepts: File System Types

Disk-Based File Systems

These work with block devices:

  • ext2/ext3/ext4: The Linux standard, journaling, extent support
  • XFS: High-performance, scalable, used in enterprise
  • Btrfs: Copy-on-write, snapshots, checksums
  • NTFS: Windows file system (via the FUSE-based ntfs-3g driver or, since Linux 5.15, the in-kernel ntfs3 driver)
  • FAT32/exFAT: Universal compatibility, no journaling

Network File Systems

These access remote servers:

  • NFS (Network File System): Unix/Linux standard
  • CIFS/SMB: Windows interoperability
  • SSHFS: File system over SSH
  • FTPFS: FTP-backed file system

# Mount NFS
sudo mount -t nfs4 server:/share /mnt/nfs

# Mount CIFS
sudo mount -t cifs //server/share /mnt/cifs -o username=user

# Mount SSHFS
sshfs user@server:/path /mnt/sshfs

Virtual/Proc File Systems

These don’t store data on disk:

# proc - process information
ls /proc
# 1/  1234/  self/

# sys - system information
ls /sys
# block/  bus/  class/  devices/

# tmpfs - RAM-based file system
mount -t tmpfs tmpfs /tmp

# devpts - terminal devices
ls /dev/pts

Union/Mount Namespace File Systems

# overlay - union mount (container storage)
mount -t overlay overlay -o \
  lowerdir=/base,upperdir=/changes,workdir=/work /merged

# bind - bind mount (reuse subtree elsewhere)
mount --bind /old/location /new/location

Production Failure Scenarios

Scenario 1: File System Not Registered

What happened: An administrator tried to mount an ext4 partition but got “unknown file system type ‘ext4’.” The system had kernel support for ext4 as a module, but the module wasn’t loaded.

Detection:

# Check loaded file system modules
lsmod | grep -E "ext4|xfs|btrfs"

# Check available file systems
cat /proc/filesystems

# Try loading the module
sudo modprobe ext4

Mitigation:

  • Ensure file system modules are built into kernel or loaded
  • For embedded systems, include necessary FS support in kernel config
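  • Use modprobe for a one-off load, or list the module in /etc/modules (Debian-based systems) or a file under /etc/modules-load.d/ (systemd) for persistent loading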

Scenario 2: VFS Cache Pressure Causing Memory Issues

What happened: A system with 64GB RAM showed 58GB used by page cache, leaving little for applications. The system started swapping application memory even though most of that cache was reclaimable.

Detection:

# Check memory usage breakdown
free -h

# Check page cache and reclaimable slab (dentry/inode) statistics
grep -E "Cached|Dirty|Writeback|SReclaimable" /proc/meminfo

# Drop clean caches (run as root) and compare
sync
echo 3 > /proc/sys/vm/drop_caches
free -h

Mitigation:

  • Adjust vm.vfs_cache_pressure:

    # Default is 100; lower values retain dentry/inode caches longer, higher values reclaim them sooner
    sysctl -w vm.vfs_cache_pressure=50
    
    # Or make persistent in /etc/sysctl.conf
    vm.vfs_cache_pressure = 50
  • Use drop_caches for immediate relief during maintenance

  • Monitor and alert on cache vs application memory balance
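For the monitoring bullet, a minimal sketch of splitting memory into page cache and reclaimable slab (which includes the dentry and inode caches). The function names are hypothetical; the field names `MemTotal`, `Cached`, and `SReclaimable` come from /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])  # kB for most fields
    return values


def cache_summary(meminfo):
    """Express page cache and reclaimable slab as a percentage of RAM."""
    total = meminfo["MemTotal"]
    return {
        "page_cache_pct": round(100 * meminfo.get("Cached", 0) / total, 1),
        "reclaimable_slab_pct": round(100 * meminfo.get("SReclaimable", 0) / total, 1),
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(cache_summary(parse_meminfo(f.read())))
```

Any alert thresholds on these percentages are a local policy choice; high values are normal on a healthy system with spare memory.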

Scenario 3: Overlay Mount Inconsistency

What happened: A container runtime used overlay file system. Applications inside containers saw stale files, files that existed in the lower layer weren’t visible, and some files showed old content despite being updated in the base image.

Why it happened: Overlay file systems have specific requirements for showing/hiding files. Incorrect lowerdir/upperdir configuration or copying files instead of using the union semantics caused visibility issues.

Detection:

# Check overlay mount options
mount | grep overlay

# View overlay layers
cat /proc/mounts | grep overlay

# Check whether the upper layer hides a file with a whiteout
# (a whiteout is a character device with device number 0/0)
ls -la /upper/file

Mitigation:

  • Ensure proper overlay mount options:

    mount -t overlay overlay \
      -o lowerdir=/lower1:/lower2,upperdir=/upper,workdir=/work \
      /merged
  • Understand whiteout files (markers in the upper layer that hide files deleted from lower layers)

  • Use chattr -i for immutable flag handling in overlay
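The lookup rules behind these symptoms can be sketched as a toy model. This is not how overlayfs is implemented; `overlay_lookup` and the `WHITEOUT` marker are hypothetical, and layers are plain dictionaries listed in the same order as the `lowerdir=` option (topmost first).

```python
WHITEOUT = object()  # marker meaning "deleted in the upper layer"


def overlay_lookup(name, upper, lowers):
    """Resolve name the way a union mount would: upper first, then lowers in order."""
    if name in upper:
        entry = upper[name]
        return None if entry is WHITEOUT else entry
    for layer in lowers:  # leftmost lowerdir wins
        if name in layer:
            return layer[name]
    return None


if __name__ == "__main__":
    upper = {"app.conf": "edited", "old.log": WHITEOUT}
    lowers = [
        {"app.conf": "v2", "lib.so": "v2"},
        {"lib.so": "v1", "old.log": "data", "base.txt": "base"},
    ]
    for name in ("app.conf", "lib.so", "old.log", "base.txt"):
        print(name, "->", overlay_lookup(name, upper, lowers))
```

A file that "disappears" in the merged view is usually either hidden by a whiteout or shadowed by a same-named entry in a higher layer.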

Scenario 4: Lost Connection to Network File System

What happened: An NFS server became unreachable. Client systems with NFS mounts hung—any command accessing /mnt/nfs would block indefinitely. The mount point couldn’t be unmounted.

Detection:

# Check NFS mount status
mount | grep nfs
cat /proc/mounts | grep nfs

# Check NFS daemon status (run this on the server, not the client)
systemctl status nfs-server

# Monitor for network issues
netstat -an | grep 2049

Mitigation:

  • Use hard vs soft mount options:

    # Hard mount: retry indefinitely (can hang)
    mount -t nfs server:/share /mnt -o hard
    
    # Soft mount: return an error once retries time out (timeo is in tenths of a second)
    mount -t nfs server:/share /mnt -o soft,timeo=50
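  • Older guides add the intr option so that signals can interrupt a blocked operation. Kernels after 2.6.25 accept it but ignore it; there, only SIGKILL interrupts a pending NFS operation: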

    mount -t nfs server:/share /mnt -o hard,intr
  • Use autofs for on-demand mounting

  • Set up monitoring for NFS connectivity

  • Unmount with lazy option when hung:

    sudo umount -l /mnt/nfs  # lazy unmount
    sudo umount -f /mnt/nfs  # forced unmount
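For the connectivity monitoring mentioned above, one possible probe is sketched below. The helper `path_responds` is hypothetical. It runs `os.stat()` in a daemon thread so the caller gets an answer after a timeout even if the mount is hung; the stuck thread itself cannot be cancelled and simply stays blocked in the background.

```python
import os
import threading


def path_responds(path, timeout=2.0):
    """Return True/False if stat finished (ok/error), or None if it timed out."""
    result = {}

    def probe():
        try:
            os.stat(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        return None  # still blocked: likely a hung mount
    return result["ok"]


if __name__ == "__main__":
    print(path_responds("/mnt/nfs"))
```

A `None` result is the interesting one for alerting: the path neither succeeded nor failed within the timeout.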

Trade-off Table

File System | VFS Support | Performance     | Features         | Complexity
------------|-------------|-----------------|------------------|-----------
ext4        | Native      | Good            | Journal, extents | Low
XFS         | Native      | Excellent       | Journal, quota   | Medium
Btrfs       | Native      | Good            | COW, snapshots   | High
NTFS        | Via ntfs-3g | Moderate        | Windows compat   | Medium
NFSv4       | Native      | Network limited | Stateful         | Medium
CIFS/SMB    | Native      | Network limited | Windows compat   | Low
tmpfs       | Native      | Excellent (RAM) | Dynamic sizing   | Low
overlay     | Native      | Good            | Union mount      | Medium

Implementation Snippet

Implementing a Simple FUSE File System

#!/usr/bin/env python3
"""Simple in-memory FUSE file system using Python (fusepy)."""

import argparse
import errno
import os
import stat
import time

from fuse import FUSE, FuseOSError, Operations


class SimpleFS(Operations):
    """A simple in-memory file system demonstrating VFS concepts."""

    def __init__(self):
        # In-memory storage: path -> {'type', 'content', 'st'}
        self.files = {}
        self._add('/', stat.S_IFDIR | 0o755)

    def _add(self, path, mode):
        """Create an entry and record its stat information once."""
        is_dir = stat.S_ISDIR(mode)
        now = time.time()
        self.files[path] = {
            'type': 'directory' if is_dir else 'file',
            'content': b'',
            'st': {'st_mode': mode, 'st_nlink': 2 if is_dir else 1,
                   'st_ctime': now, 'st_mtime': now, 'st_atime': now},
        }

    def _entry(self, path):
        """Return the entry for path or raise ENOENT."""
        if path not in self.files:
            raise FuseOSError(errno.ENOENT)
        return self.files[path]

    def getattr(self, path, fh=None):
        entry = self._entry(path)
        # st_size is computed on demand so it always matches the content
        return dict(entry['st'], st_size=len(entry['content']))

    def readdir(self, path, fh):
        entries = ['.', '..']
        for name in self.files:
            # FUSE passes paths without a trailing slash (except '/')
            if name != '/' and os.path.dirname(name) == path:
                entries.append(os.path.basename(name))
        return entries

    def read(self, path, size, offset, fh):
        return self._entry(path)['content'][offset:offset + size]

    def write(self, path, data, offset, fh):
        entry = self._entry(path)
        current = entry['content'].ljust(offset, b'\0')  # pad sparse writes
        # Keep any bytes that follow the written range
        entry['content'] = current[:offset] + data + current[offset + len(data):]
        entry['st']['st_mtime'] = time.time()
        return len(data)

    def truncate(self, path, length, fh=None):
        entry = self._entry(path)
        entry['content'] = entry['content'][:length].ljust(length, b'\0')

    def create(self, path, mode, fi=None):
        self._add(path, stat.S_IFREG | mode)
        return 0

    def mkdir(self, path, mode):
        self._add(path, stat.S_IFDIR | mode)

    def unlink(self, path):
        self._entry(path)
        del self.files[path]

    def rmdir(self, path):
        if self._entry(path)['type'] != 'directory':
            raise FuseOSError(errno.ENOTDIR)
        if any(os.path.dirname(p) == path for p in self.files if p != '/'):
            raise FuseOSError(errno.ENOTEMPTY)
        del self.files[path]


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('mount_point', help='Where to mount')
    parser.add_argument('-f', '--foreground', action='store_true')
    args = parser.parse_args()

    FUSE(SimpleFS(), args.mount_point, foreground=args.foreground)

Checking VFS Statistics

#!/bin/bash
# vfs_stats.sh - Display VFS statistics

echo "=== VFS Statistics ==="
echo ""

echo "--- File Systems Registered ---"
cat /proc/filesystems

echo ""
echo "--- Mount Points ---"
mount | column -t

echo ""
echo "--- Dentry Cache Stats ---"
cat /proc/sys/fs/dentry-state

echo ""
echo "--- Inode Stats ---"
cat /proc/sys/fs/inode-state

echo ""
echo "--- File Handle Limits ---"
echo "System max: $(cat /proc/sys/fs/file-max)"
echo "Current used: $(awk '{print $1}' /proc/sys/fs/file-nr)"
echo "Per-process limit: $(ulimit -n)"

echo ""
echo "--- VFS Cache Pressure ---"
cat /proc/sys/vm/vfs_cache_pressure

echo ""
echo "--- Dentry and Inode Slab Caches ---"
# /proc/slabinfo is readable only by root on most systems
grep -E "^dentry|inode_cache" /proc/slabinfo 2>/dev/null || echo "Info not available"

Observability Checklist

Monitoring VFS and file system health:

  • mount: Show all mounted file systems with options
  • cat /proc/filesystems: List supported file system types
  • cat /proc/mounts: Detailed mount information including bind mounts
  • df -h: Show space usage for all mounted file systems
  • du -sh /path: Check space usage of specific directories

#!/bin/bash
# Comprehensive VFS monitoring script

echo "=== VFS Health Report ==="
echo "Generated: $(date)"
echo ""

echo "--- Active Mounts with FS Type ---"
mount | grep -v "tmpfs\|proc\|sys\|devpts\|cgroup" | awk '{print $3, $5}' | sort

echo ""
echo "--- Mount Options Security Check ---"
for mount_point in $(mount | awk '{print $3}'); do
    # Skip virtual mounts
    [[ "$mount_point" =~ ^(proc|sys|dev|/sys|/proc|/dev) ]] && continue

    opts=$(mount | grep " $mount_point " | awk '{print $6}' | tr -d '()')
    if [[ "$opts" == *"noexec"* ]]; then
        echo "$mount_point: noexec set (good for security)"
    fi
    if [[ "$opts" == *"nosuid"* ]]; then
        echo "$mount_point: nosuid set (good for security)"
    fi
done

echo ""
echo "--- Network Mounts Status ---"
mount | grep -E "nfs|cifs" | while read -r line; do
    echo "$line"
    # Check for hung mounts
    mount_point=$(echo "$line" | awk '{print $3}')
    if ! timeout -s KILL 1 ls "$mount_point" >/dev/null 2>&1; then
        echo "  WARNING: $mount_point not responding!"
    fi
done

echo ""
echo "--- File Descriptor Usage ---"
current=$(cat /proc/sys/fs/file-nr | awk '{print $1}')
max=$(cat /proc/sys/fs/file-max)
pct=$((current * 100 / max))
echo "Used: $current / $max ($pct%)"
if [ $pct -gt 80 ]; then
    echo "WARNING: File descriptor usage above 80%"
fi

Security Considerations

Secure Mount Options

# security mount options for various scenarios

# /var (data partition)
# - noexec: prevents binary execution from this partition
# - nosuid: ignores setuid bit
# - nodev: no device files
UUID=xxx /var ext4 defaults,noexec,nosuid,nodev 0 2

# /tmp (temporary files)
# Consider tmpfs with size limit
tmpfs /tmp tmpfs defaults,noexec,nosuid,nodev,size=2G 0 0

# Network mounts
# Prevent execution of remote binaries
mount -t nfs server:/share /mnt -o noexec,nosuid,hard,intr

File System Hardening

# Disable access-time updates (fewer writes; useful on SSDs)
mount -o noatime /dev/sda1 /mnt

# Read-only mounting
mount -o remount,ro /mnt

# Prevent setuid execution
mount -o remount,nosuid /mnt

# No device files
mount -o remount,nodev /mnt

# No binary execution
mount -o remount,noexec /mnt

Common Pitfalls / Anti-patterns

1. Ignoring Bind Mount Flags

# BAD: Bind mount without considering security
mount --bind /home /mnt/shared

# GOOD: Use appropriate flags
mount --bind /home /mnt/shared
mount -o remount,bind,nosuid,nodev,ro /mnt/shared

2. Network File System Without Timeout

# BAD: Hard mount with no interrupt capability
mount -t nfs server:/share /mnt -o hard

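# GOOD (for read-mostly data): Soft mount with timeout
# Soft mounts can return I/O errors mid-write, so avoid them for critical data.
# The intr option is ignored on kernels after 2.6.25 and is omitted here.
mount -t nfs server:/share /mnt -o soft,timeo=50,retrans=3

# BEST for critical systems: autofs with a direct map
# (/etc/auto.nfs is an example map file name)
echo "/- /etc/auto.nfs" >> /etc/auto.master
echo "/mnt/nfs -fstype=nfs4,ro server:/share" >> /etc/auto.nfs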

3. Union Mount Misconfiguration

# BAD: Lower layers listed in the wrong order
mount -t overlay overlay \
  -o lowerdir=/base:/lower,upperdir=/upper,workdir=/work /merged
# The leftmost lowerdir is the top lower layer, so /base now shadows /lower
# (upperdir always sits on top, whatever the option order)

# GOOD: Correct order, topmost lower layer first
mount -t overlay overlay \
  -o lowerdir=/lower:/base,upperdir=/upper,workdir=/work /merged

4. Assuming VFS Caches Are Always Safe

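# BAD: Unplugging a device without unmounting it
# (dirty data still in the page cache never reaches the disk)

# GOOD: Unmount first; umount flushes dirty data before it returns
sync          # optional: umount would flush anyway
umount /mnt

# Lazy unmount detaches a busy mount immediately,
# but data is only fully flushed once the last user closes its files
umount -l /mnt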

Quick Recap Checklist

  • VFS provides the common interface all Linux file systems implement
  • Key structures: super_block, inode, dentry, file
  • Each file system implements operations through function pointers
  • Dentry cache dramatically speeds repeated path lookups
  • Network file systems add latency but enable sharing
  • Virtual file systems (proc, sys, tmpfs) provide kernel interfaces
  • Mount options control security and performance
  • VFS is why cat /proc/cpuinfo and reading a file on an NFS share go through the same open()/read() API

Interview Questions

1. What is the VFS layer and why was it created?

The Virtual File System (VFS) is an abstraction layer in the Linux kernel that provides a unified interface to different file system implementations. Without it, applications would need to know how to communicate with each specific file system type.

VFS was created to solve the problem of file system heterogeneity. When you have ext4, XFS, Btrfs, NTFS, NFS, CIFS, and dozens of other file systems, applications shouldn't need separate code paths for each one.

The key insight: all file systems present the same API through VFS. Applications call open(), read(), write(), and close(). VFS translates these to whatever the underlying file system understands. The application has no idea—and doesn't care—what's underneath.

2. Explain the relationship between VFS, the dentry cache, and the inode cache.

The dentry cache (Directory Entry cache) and inode cache work together to speed file system operations:

Dentry cache stores the mapping between directory entry names and their inodes. When you access /home/user/file.txt, the dentry cache remembers:

  • "/" maps to the root inode
  • "home" maps to inode for /home
  • "user" maps to inode for /home/user

Inode cache stores the actual inode structures (metadata about files) including permissions, timestamps, and pointers to data blocks.

The relationship: dentries point to inodes. When you resolve a path, the dentry cache finds each component quickly, and each cached dentry holds a direct pointer (d_inode) to its in-memory inode, so no separate lookup by inode number is needed.

Without these caches, every file access would require disk I/O to read directory entries and inodes.

3. What happens when you mount a device? Walk through the VFS layer involved.

When you execute mount -t ext4 /dev/sda1 /mnt, the process involves:

  1. Parse mount options: VFS extracts file system type (ext4) and target (/mnt)
  2. Locate file system driver: Looks up "ext4" in registered file systems
  3. Call mount function: Invokes ext4's mount() function
  4. Read superblock: ext4 driver reads the file system's superblock from the device
  5. Create VFS structures: Allocates super_block, inode for root directory
  6. Link to mount tree: Adds the mount to the VFS mount namespace
  7. Return success: Now /mnt represents the root of ext4 filesystem

After mounting, any file operation in /mnt goes through the ext4 driver's VFS operations to the underlying blocks on /dev/sda1.

4. How does the kernel support multiple simultaneous file system types?

The kernel supports multiple file system types through registration and operation vectors:

Each file system driver registers with VFS using register_filesystem(), providing:

  • Its name (e.g., "ext4", "xfs", "nfs")
  • Its mount function
  • Its operation vectors (super_operations, inode_operations, file_operations)

When a mount is requested, VFS looks up the file system by name and calls the registered mount function. Each file system implements the same interface but with its own logic.

At runtime, you can have ext4 on /, XFS on /home, tmpfs on /tmp, and NFS on /mnt/nfs simultaneously. Applications see all as part of the unified namespace, but VFS routes each operation to the appropriate driver.

5. What is the difference between a bind mount and a symbolic link, from the VFS perspective?

Symbolic link is a file type (stored in directory entries, has its own inode, contains a path string). When you access a symlink, VFS performs path resolution on the target path, which may cross mount points.

Bind mount is a VFS concept where the same directory entry (same dentry/inode) appears in multiple places in the mount tree. The underlying data is identical—they share the same VFS structures.

Key differences:

  • A symlink is stored in the file system and survives reboots; a bind mount exists only in the kernel's mount table and must be re-created (for example from /etc/fstab)
  • Bind mounts show the actual data, not a path that could be modified
  • Deleting through a bind mount affects the original (they're the same inode)
  • Symlinks have their own inode; bind mounts share the same inode

In container contexts, bind mounts are used to expose host directories into containers. The container sees the same data as the host because it's the same VFS entry, just accessed from a different mount point.
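The symlink half of this comparison is easy to check from user space (bind mounts need root, so they are left out). A minimal sketch with the hypothetical helper `inspect_symlink`:

```python
import os
import tempfile


def inspect_symlink(workdir):
    """Create a file and a symlink to it, then compare what stat reports."""
    target = os.path.join(workdir, "target.txt")
    link = os.path.join(workdir, "link.txt")
    with open(target, "w") as f:
        f.write("payload")
    os.symlink(target, link)
    return {
        # lstat() describes the link itself; stat() follows it
        "link_has_own_inode": os.lstat(link).st_ino != os.stat(target).st_ino,
        "follows_to_target": os.stat(link).st_ino == os.stat(target).st_ino,
        "stored_path": os.readlink(link),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(inspect_symlink(tmp))
```

The link's own inode stores nothing but a path string, which VFS resolves again on every access.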

6. How does the kernel handle a mount operation at the VFS layer?

When you execute mount -t ext4 /dev/sda1 /mnt:

  1. sys_mount() system call: Triggers VFS mount logic
  2. Parse mount options: VFS extracts file system type and flags
  3. Locate file system driver: Looks up "ext4" in the registered file system list
  4. Call mount function: Invokes ext4's mount(), which reads the superblock
  5. Create super_block: Allocates kernel structure, reads superblock from disk
  6. Create root dentry and inode: Represents / of the new filesystem
  7. Link to mount tree: Adds to the calling process's mount namespace
  8. Return: Now /mnt paths route through ext4 driver

Mount namespaces provide container isolation: every process belongs to one mount namespace, and processes in different namespaces may see different mounts.

7. What is the purpose of the super_operations structure in VFS?

struct super_operations is a function pointer table that defines callbacks for file system-level operations. Each file system implements these to provide its specific behavior:

  • alloc_inode / destroy_inode: Create/free inode structures
  • dirty_inode: Called when inode is modified
  • write_inode: Persist inode to disk
  • put_super: Clean up during unmount
  • remount_fs: Handle mount option changes

This is the VFS polymorphism pattern: VFS calls these functions without knowing if it's ext4, XFS, or NTFS. Each driver fills in its own implementations, and VFS calls through the function pointers.

8. What is an example of when VFS abstraction "leaks" in practice?

VFS abstraction leaks when the unified interface doesn't fully mask differences:

  • Extended attributes: ext4 supports ACLs via xattrs; FAT32 doesn't. Copying files between them loses permissions.
  • Case sensitivity: ext4 is case-sensitive by default, while FAT and NTFS mounts are usually case-insensitive. Two files whose names differ only in case can coexist on ext4 but collide when copied to a FAT volume.
  • Symbolic links on FAT: FAT doesn't support symlinks. Creating one on a CIFS mount backed by FAT might create a shortcut (.lnk) file instead—or fail silently.
  • Special files: /proc and /sys are generated by the kernel on demand rather than stored on disk. Many files under /proc report a size of 0, so tools that rely on st_size behave differently on them.

Understanding these leaks helps diagnose cross-filesystem issues like "my permissions don't work on NAS."

9. How do containers use VFS mount namespaces for isolation?

Containers use mount namespaces (CLONE_NEWNS) to create isolated mount views:

  1. Clone with new namespace: clone(CLONE_NEWNS) creates process with copy of parent's mount namespace
  2. Private mount: The container's root is initially a copy of the host's
  3. Bind mounts: mount --bind /host/path /container/path exposes host directories at container paths
  4. Overlay mount: Upperdir/lowerdir layers implement copy-on-write for container changes
  5. Pivot_root or chroot: Changes the container's view of "/" to the container's rootfs

The container sees only its mounts—a process in the container cannot see or affect host mounts (unless explicitly shared). This isolation is entirely a VFS concept.

10. What is the difference between page cache and dentry cache in VFS?

Page cache: Stores actual file data content. When you read() a file, data goes into the page cache. When you write(), data is written to page cache first and flushed to disk later.

Dentry cache: Stores directory entry metadata, that is, filename-to-inode mappings. When you resolve a path, you traverse the dentry cache to find each component's inode. Dentries also link to their cached children for fast subtree traversal.

Key differences:

  • Page cache stores data; dentry cache stores structure
  • Page cache is page-granularity (4KB typically); dentries are variable-size
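  • Neither cache is swapped: clean page-cache pages are simply dropped and dirty ones are written back to their file, while unused dentries are freed under memory pressure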
  • Dentries implement directory tree structure; page cache is linear file content

Both are critical for performance—dentries speed path resolution, page cache speeds file content access.

11. What happens when you access a file on a network file system like NFS?

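For NFS, VFS operations that miss the client's caches become network requests. The calls below follow NFSv4; NFSv3 is stateless and has no OPEN or CLOSE: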

  1. open(): NFS client sends OPEN call to NFS server, receives file handle
  2. read(): Client sends READ request with file handle, offset, count; server responds with data
  3. write(): Client sends WRITE request with data; server acknowledges
  4. close(): Client sends CLOSE; server releases file state

NFS client caches aggressively:

  • Attribute cache: Stores inode metadata locally for a limited time
  • Data cache: Pages cached locally with weak consistency
  • Dentry cache: Path component lookups cached

The trade-off: network latency (milliseconds) vs local disk (microseconds). NFS performance depends on cache hit rates.

12. How does path resolution work in VFS when traversing a path like /home/user/file.txt?

Path resolution in VFS follows a systematic traversal:

  1. Starting at root: VFS starts with the root dentry (always cached)
  2. Component lookup: For each path component ("home", "user", "file.txt"), VFS calls lookup() on the parent directory's inode
  3. Dentry cache check: Before calling the file system's lookup(), VFS checks the dentry cache. If the dentry is already cached, return it immediately
  4. FS-specific lookup: If not cached, call the file system's inode->i_op->lookup() function which reads directory entries from disk
  5. Cache the result: The newly found dentry is cached for future lookups
  6. Repeat: Continue until the final component is resolved

Each cached dentry also caches the dentries of its children, so deep path traversal after the first access is mostly cache hits. The d_lookup() function handles the hash-table lookup in the dentry cache.

13. What is the difference between a file struct and an inode in VFS?

Inode (struct inode): Represents a file on disk. There is exactly one inode per file (identified by inode number). It contains metadata (permissions, timestamps, size, block pointers) and points to the file's data blocks. Inodes are persistent—they exist on disk and are loaded into memory when needed.

File struct (struct file): Represents an open file handle. It exists only in memory for as long as the file is open. It contains the current file position (f_pos), open flags (f_flags), and points to the inode. Multiple processes can have the same file open, each with their own struct file but sharing the same inode.

Key difference: One inode per file on disk; one file struct per open() call. If two processes open the same file, you have 2 file structs but 1 inode. If one process opens the same file twice, you again have 2 file structs but 1 inode. Descriptors created with dup() or inherited across fork() share a single file struct, including its position.
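This can be observed with plain file descriptors. In the sketch below (the helper `offsets_demo` is hypothetical), two `os.open()` calls get independent positions, while a descriptor made with `os.dup()` shares the position of its original.

```python
import os
import tempfile


def offsets_demo(path):
    """Show which descriptors share a file position after one read."""
    with open(path, "wb") as f:
        f.write(b"abcdefgh")
    fd1 = os.open(path, os.O_RDONLY)
    fd2 = os.open(path, os.O_RDONLY)  # second open(): separate open file
    fd3 = os.dup(fd1)                 # dup(): same open file as fd1
    try:
        os.read(fd1, 3)               # advances the position behind fd1 and fd3
        return {
            "fd1": os.lseek(fd1, 0, os.SEEK_CUR),
            "fd2": os.lseek(fd2, 0, os.SEEK_CUR),
            "fd3": os.lseek(fd3, 0, os.SEEK_CUR),
            "same_inode": os.fstat(fd1).st_ino == os.fstat(fd2).st_ino,
        }
    finally:
        for fd in (fd1, fd2, fd3):
            os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(offsets_demo(os.path.join(tmp, "demo.bin")))
```

All three descriptors refer to one inode, but only `fd1` and `fd3` move together.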

14. What happens when you unmount a file system in Linux?

Unmounting involves several steps:

  1. Sync: sync() flushes all dirty data and metadata to disk
  2. Reference count check: VFS checks that no files are open and no processes have chdir'd into the mount point
  3. Call put_super(): The file system's put_super() is called to release the super_block
  4. Free inodes: All inodes associated with the mount are freed (or marked for destruction)
  5. Remove from mount tree: The mount entry is removed from the VFS mount namespace
  6. Release resources: Filesystem-specific cleanup (close block device, free private data)

If files are still open or processes are using the mount, umount fails with "Device or resource busy" (unless umount -l lazy unmount is used, which detaches immediately and cleans up later).

15. How does VFS handle rename operations across different file systems?

Rename within the same file system is straightforward: VFS calls inode->i_op->rename(), which updates directory entries to point to the same inode under a new name.

Rename across file systems is not permitted at the VFS level. The check works as follows:

  1. Check source and target: VFS verifies both paths resolve within the same mount
  2. Fail if different mounts: Cross-mount renames (e.g., /mnt/drive1/file to /mnt/drive2/file) return EXDEV ("Cross-device link")

This is a fundamental VFS constraint. Applications must implement a cross-device move as copy + delete: read the source, write to the destination, then unlink the source (this is what mv does). Unlike rename(), this sequence is not atomic, and it loses metadata such as timestamps and permissions unless they are explicitly preserved.
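A minimal sketch of that fallback is shown here. The function name move_file and its rename parameter are assumptions made for this example (the parameter exists only so the EXDEV path can be exercised without two real devices). Python's standard shutil.move() already implements a more complete version of the same idea.

```python
import errno
import os
import shutil


def move_file(src, dst, rename=os.rename):
    """Rename src to dst, falling back to copy + unlink on EXDEV.

    The `rename` parameter is a hypothetical hook for testing only.
    """
    try:
        rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise                      # unrelated failure: do not mask it
    # Cross-device case: not atomic. copy2 preserves mode and timestamps
    # where it can; ownership and some extended attributes may be lost.
    shutil.copy2(src, dst)
    os.unlink(src)
    return "copied"
```

Because the fallback is two separate steps, a crash between the copy and the unlink can leave both files present, which is the price of giving up the atomic rename.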

16. What is the role of the inode_operations structure in VFS?

struct inode_operations defines callbacks for file and directory operations that act on inodes. Each file system implements these for its specific semantics:

  • create: Create a regular file in a directory (e.g., open(filename, O_CREAT))
  • lookup: Find a directory entry by name, returning its inode
  • link: Create a hard link (same inode, new directory entry)
  • unlink: Remove a directory entry pointing to an inode
  • mkdir: Create a subdirectory
  • rmdir: Remove an empty subdirectory
  • rename: Change a file's name, possibly moving it to a different directory on the same file system
  • setattr: Change inode attributes (permissions, timestamps)

The VFS layer calls these function pointers without knowing the underlying file system. ext4, XFS, and NTFS each have their own implementations with different algorithms and on-disk structures.
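From user space, each of these callbacks is reached through an ordinary system call. The sketch below, assuming a POSIX system and a temporary directory on a file system that supports hard links, runs several of them and records what happens to the inode; exercise_directory_ops is a name chosen for this example.

```python
import os
import tempfile


def exercise_directory_ops():
    """Run name-level operations and record what they do to the inode."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        sub = os.path.join(tmp, "sub")

        # open(..., O_CREAT) reaches the file system's create callback
        os.close(os.open(a, os.O_CREAT | os.O_WRONLY, 0o644))
        ino = os.stat(a).st_ino

        os.link(a, b)                       # link: second name, same inode
        linked = (os.stat(b).st_ino == ino, os.stat(a).st_nlink)

        os.unlink(a)                        # unlink: drop one name
        after_unlink = os.stat(b).st_nlink

        os.mkdir(sub)                       # mkdir
        moved = os.path.join(sub, "c.txt")
        os.rename(b, moved)                 # rename into another directory
        renamed_same_inode = os.stat(moved).st_ino == ino

        os.chmod(moved, 0o600)              # setattr (permission change)
        mode = os.stat(moved).st_mode & 0o777

        return linked, after_unlink, renamed_same_inode, mode
```

The inode number stays the same through link, unlink, and rename; only the directory entries and the link count change.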

17. What is the relationship between tmpfs and the VFS layer?

tmpfs is a file system that lives entirely in memory: it has no disk backing. It stores file contents in the page cache (RAM), and those pages can be written out to swap when RAM is low.

tmpfs registers with VFS just like disk-based file systems (register_filesystem(&shmem_fs_type); the implementation lives in mm/shmem.c). It implements the same VFS operations: inode_operations, file_operations, super_operations.

Key characteristics:

  • No on-disk structure—files vanish on reboot
  • Dynamic sizing: uses available RAM/swap up to a configured limit
  • Fast: no disk I/O for reads/writes
  • Commonly used for /dev/shm (shared memory), /tmp, and container mounts

From VFS perspective, tmpfs looks like any other file system. Applications access it via the same open(), read(), write() calls. The difference is purely in the implementation—the tmpfs driver never touches a block device.
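To see which mount points on a running system are backed by tmpfs, one option is to read /proc/mounts. The helper below is a small sketch that parses text in that format; the function name tmpfs_mount_points is an assumption for this example, and it takes the text as an argument so it can be tried on sample input.

```python
def tmpfs_mount_points(mounts_text):
    """Return mount points whose file system type is tmpfs.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "tmpfs":
            points.append(fields[1])
    return points


sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0\n"
    "tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0\n"
)
print(tmpfs_mount_points(sample))
```

On a Linux machine the same function can be fed the contents of /proc/mounts instead of the sample string.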

18. How does VFS interact with the page cache during read and write operations?

Reads flow through VFS to the page cache:

  1. Application calls read(fd, buf, size)
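  2. VFS dispatches to the file system's read method, which for most disk file systems is the generic page cache read path (generic_file_read_iter() in current kernels, generic_file_read() in old ones); it checks the page cache first
  3. If the page is cached, copy data from page cache to userspace buffer
  4. If not cached, allocate a page, call the file system's read_folio() (readpage() before kernel 5.19) to fill it, then copy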

Writes use write-back caching:

  1. Application calls write(fd, buf, size)
  2. VFS writes to the page cache (marking pages as dirty)
  3. Returns immediately to application (fast)
  4. Background kernel flusher threads (per-device writeback workers; pdflush in kernels before 2.6.32) periodically write dirty pages to disk

The page cache is unified—ext4, XFS, and all other file systems share it. When ext4 writes a block, it goes into the same page cache that XFS uses. This maximizes cache utilization across file systems.
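One practical consequence of write-back caching is that a successful write() does not mean the data is on disk. A program that needs durability has to ask for it explicitly. The sketch below shows one way to do that with fsync(); durable_write is a hypothetical helper name, and making a newly created file durable may also require an fsync() on its parent directory, which this sketch omits.

```python
import os


def durable_write(path, data):
    """Write data and wait until the kernel reports it is on stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # write() normally returns once the bytes are in the page cache
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # block until this file's dirty pages are written back
    finally:
        os.close(fd)
```

The loop handles short writes, and the fsync() call is where the program actually waits for the storage device.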

19. What is the purpose of the file_operations structure and how does it differ from inode_operations?

struct file_operations defines callbacks for file I/O operations—things you do on an open file handle:

  • llseek: Change file position
  • read / write: Data I/O
  • readdir: Iterate directory entries (for the getdents() syscall; called iterate_shared in current kernels)
  • mmap: Memory-map the file
  • fsync: Force dirty pages to disk
  • lock / flock: File locking (fcntl() and flock() locks respectively)

struct inode_operations defines operations on the inode itself—metadata and name-level operations:

  • create: Create a new file
  • lookup: Find a file in a directory
  • mkdir, rmdir: Directory operations
  • rename: Change file name
  • link, unlink: Hard link operations

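File operations are invoked on an open file (struct file); inode operations are invoked on the file itself (struct inode). Multiple opens of the same file share the inode but each get their own struct file, and those struct file objects normally point to the same file_operations table, taken from the inode at open time.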

20. How does the VFS layer handle file system mount propagation in mount namespaces?

Mount namespaces (CLONE_NEWNS) give each process or container group an independent view of the mount table. Mount propagation determines how mounts in one namespace affect others:

Propagation types:

  • Private (default): Mounts and unmounts do not propagate to/from other namespaces
  • Shared: Mounts propagate bidirectionally among the members of a peer group
  • Slave: Mounts from the master propagate to the slave, but not vice versa
  • Unbindable: Private, and additionally cannot be bind mounted

When a container is created with its own mount namespace:

  1. Initial mounts are copied from the parent namespace (each copy keeps its propagation type, so shared mounts remain peers of the originals until they are remarked private or slave)
  2. Bind mounts (like mounting host directories into containers) can be marked shared so changes are visible to the host, or private for isolation
  3. Unmounting inside the container (e.g., /proc) does not affect the host's mount table

The /proc/self/mountinfo file shows the propagation type and peer relationships for each mount. Container runtimes carefully configure propagation (e.g., Docker bind mounts default to rprivate and can be switched to shared or slave variants) to get the desired isolation.
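The propagation type appears in the optional fields of each mountinfo line, between the mount options and the "-" separator. The parser below is a minimal sketch that classifies a single line; propagation_of is a name chosen for this example, and the "shared+slave" label is this sketch's own shorthand for a mount that carries both tags.

```python
def propagation_of(mountinfo_line):
    """Return (mount_point, propagation) for one /proc/self/mountinfo line."""
    fields = mountinfo_line.split()
    sep = fields.index("-", 6)         # optional fields end at the "-" separator
    mount_point = fields[4]
    tags = [f.split(":")[0] for f in fields[6:sep]]
    if "unbindable" in tags:
        kind = "unbindable"
    elif "shared" in tags and "master" in tags:
        kind = "shared+slave"
    elif "shared" in tags:
        kind = "shared"
    elif "master" in tags:
        kind = "slave"
    else:
        kind = "private"               # no propagation tags at all
    return mount_point, kind


line = "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue"
print(propagation_of(line))
```

Applying the function to every line of /proc/self/mountinfo inside and outside a container gives a quick view of how the runtime configured propagation.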

Further Reading

Topic-Specific Deep Dives:

  • Page Cache Deep Dive: Explore how the page cache (formerly buffer cache) interacts with VFS. The page cache stores file data pages in memory, and mmap(), read(), and write() operations all flow through it. Study the address_space structure and how writeback works.

  • Container Storage Drivers: Overlay file systems are just one layer. Investigate how devicemapper (thinp), Btrfs, and VFS interact in container runtimes. Understand why overlay2 is preferred over overlay for Docker.

  • Linux Page Cache Eviction: The kernel uses an LRU (Least Recently Used) list with active/inactive pages. Study shrink_page_list() and how vm.vfs_cache_pressure affects dentry and inode cache eviction versus page cache.

  • Mount Namespaces: The mount namespace isolation that containers rely on is a VFS concept. Explore how clone() with CLONE_NEWNS creates isolated mount tables per process.

  • FUSE in Userspace: VFS supports user-space file systems through FUSE (Filesystem in Userspace). This enables creative implementations like sshfs, gocryptfs, and borgbackup, all without writing file-system-specific kernel code; only the generic FUSE kernel module is needed.

Conclusion

The Virtual File System layer is what makes Linux’s unified namespace possible. By defining a standard set of operations (super_operations, inode_operations, file_operations) that each file system implements, VFS allows applications to interact with ext4, XFS, NFS, CIFS, and even virtual file systems like procfs through the same POSIX API. The dentry cache and inode cache are the performance keys, dramatically reducing disk I/O for repeated path resolutions.

When working with file systems, understanding VFS helps you troubleshoot mount issues, choose appropriate mount options for security and performance, and design storage for containers and networked environments. The mount namespace isolation that containers rely on is fundamentally a VFS concept. Remember that network file systems (NFS, CIFS) add latency and failure modes that local file systems do not have—design for timeouts and retry logic in production.

For continued learning, explore the page cache and buffer cache interactions with VFS, study container storage drivers (overlay, devicemapper, btrfs) and how they build on VFS, and examine the Linux page cache eviction policies (LRU, active/inactive lists) that determine file system performance under memory pressure.
