Interrupts & Polling

Hardware interrupts, IRQ handling, and interrupt bottom halves (tasklets, workqueues) for robust I/O processing.

published: May 19, 2026 reading time: 33 min read author: GeekWorkBench

Introduction

At any moment, dozens of hardware devices may need the CPU’s attention—network packets arriving, keystrokes pressed, disk operations completing, timers firing. The CPU cannot continuously poll every device (wasteful) nor can devices wait indefinitely (data would be lost). The interrupt mechanism bridges this gap: devices assert a signal that causes the CPU to suspend current work and handle the event immediately.

Interrupts are the nervous system of computing—electrical impulses that demand immediate attention. Without them, operating systems would be blind to the physical world. Understanding interrupt handling is essential for driver development, system performance tuning, and debugging real-time or latency-sensitive applications.

Polling represents the alternative model—software repeatedly checking device status registers to see if work needs doing. This section explores both models, their trade-offs, and the hybrid approaches modern systems use.

When to Use / When Not to Use

When Hardware Interrupts Are Appropriate

Sparse, unpredictable events: Network packets arriving, user input, hardware faults
Latency-critical operations: Real-time systems where sub-millisecond response matters
Low-power designs: CPU can sleep until hardware signals need for service
High-bandwidth devices with flow control: Disks, high-speed network cards that can buffer temporarily

When Polling Is Appropriate

Dense, predictable events: High-frequency timers, status monitoring
Latency-jitter-tolerant designs: When consistent polling interval is acceptable
Interrupt storm prevention: Devices that naturally generate very high interrupt rates
Virtualized environments: Hypervisors may emulate interrupt delivery; polling vCPUs can be more efficient
Low-latency NVMe drives: Completion times can be short enough that interrupt overhead becomes a significant share of each I/O, which makes polled completion queues viable

Hybrid Approaches (The Common Case)

Pure interrupt-driven I/O breaks down at high event rates. A 10GbE NIC receiving small packets can generate 500,000+ interrupts per second if every packet raises one. Each interrupt is not a full process context switch, but it still costs an entry/exit sequence: register save, handler dispatch, and cache and pipeline disruption. Pure polling wastes cycles when devices are idle. Real systems blend both models deliberately, switching modes based on device activity level.

Interrupt coalescing trades latency for throughput. The device waits until it has accumulated N events (or T microseconds have passed) before asserting the IRQ line. A network card might fire one interrupt for every 16 received packets instead of 16 separate interrupts. The kernel sees fewer IRQ events, softirq CPU time drops, and the CPU handles other work between coalescing windows. The tradeoff is that per-event latency increases by up to the coalescing timer value. You can tune this with ethtool -C eth0 rx-usecs 128 rx-frames 16. NVMe controllers can apply similar coalescing (an aggregation threshold and an aggregation time) before signaling the host.
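The frames-or-timer rule described above can be sketched as a small model. This is a simplified illustration, not how any particular NIC implements it: the function name `coalesce` and the assumption that the timer starts at the first pending event are choices made for this example.

```python
from typing import List, Tuple


def coalesce(arrivals_us: List[float], max_frames: int,
             max_usecs: float) -> Tuple[List[float], List[float]]:
    """Return (interrupt_times, per_event_latencies) in microseconds.

    Assumed rule: an interrupt fires when max_frames events are pending,
    or when max_usecs have passed since the first pending event,
    whichever comes first. arrivals_us must be sorted ascending.
    """
    interrupts: List[float] = []
    latencies: List[float] = []
    pending: List[float] = []  # arrival times not yet signaled to the CPU

    def fire(at: float) -> None:
        interrupts.append(at)
        latencies.extend(at - t for t in pending)
        pending.clear()

    for t in arrivals_us:
        # If the timer for the current batch expired before this arrival,
        # the interrupt fired at the expiry time.
        if pending and t >= pending[0] + max_usecs:
            fire(pending[0] + max_usecs)
        pending.append(t)
        if len(pending) >= max_frames:
            fire(t)
    if pending:
        fire(pending[0] + max_usecs)  # the timer flushes the final batch
    return interrupts, latencies


# A dense burst (one packet every 4 us) is governed by the frame count;
# a sparse stream is governed by the timer.
burst_irqs, burst_lat = coalesce([i * 4.0 for i in range(64)], 16, 128.0)
sparse_irqs, sparse_lat = coalesce([0.0, 1000.0], 16, 128.0)
print(len(burst_irqs), max(burst_lat))
print(sparse_irqs, sparse_lat)
```

In the dense case 64 packets produce 4 interrupts instead of 64. In the sparse case every packet waits out the full timer, which is the latency cost the paragraph above describes.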

Deferred processing (top-half/bottom-half split) keeps the ISR short. The ISR acknowledges the interrupt, reads minimal hardware state, stores what it needs, and returns. A separate deferred handler does the heavy lifting: parsing, buffer management, waking waiters. A tasklet runs that handler in softirq context, which is still atomic and cannot sleep, while a workqueue or threaded IRQ runs it in process context, where sleeping is allowed. This separation means the ISR runs for microseconds with interrupts disabled, after which the CPU resumes normal work. Workqueue and threaded-IRQ handlers run when the scheduler allows, potentially on a different CPU.

Polled phases temporarily replace interrupts with software polling during intensive workloads. The motivation is cache efficiency: when handling a burst of received packets, the CPU touches the same ring buffer descriptors repeatedly. Switching to polling avoids IRQ overhead and keeps the receive ring hot in cache. Linux implements this in NAPI: on the first interrupt of a burst, the driver masks the device's interrupt and schedules its poll function, which the kernel then calls from softirq context, a bounded batch (the budget) at a time, until the ring drains. Once the ring is empty, the driver re-enables the interrupt. The block layer offers a comparable polled completion mode for fast storage devices. Virtualized environments also benefit: injecting a virtual interrupt into a guest is relatively expensive, so polling can be cheaper than interrupt delivery.
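The interrupt-then-poll cycle can be illustrated with a toy state machine. This is a sketch of the idea only, not the kernel's NAPI code: the class name `NapiLikeQueue`, its fields, and the budget of 4 are assumptions made for the example.

```python
from collections import deque


class NapiLikeQueue:
    """Toy model of interrupt-then-poll receive processing."""

    def __init__(self, budget: int = 4):
        self.ring = deque()       # simulated receive ring
        self.irq_enabled = True   # whether the device may raise an interrupt
        self.budget = budget      # maximum packets handled per poll call
        self.interrupts = 0       # interrupts actually delivered
        self.polls = 0            # poll calls made
        self.processed = []

    def packet_arrives(self, pkt) -> None:
        """Device side: enqueue, and interrupt only if unmasked."""
        self.ring.append(pkt)
        if self.irq_enabled:
            self.interrupts += 1
            # The "ISR" masks the interrupt and hands over to polling.
            self.irq_enabled = False

    def poll(self) -> bool:
        """Handle up to `budget` packets; return True if polling must continue."""
        self.polls += 1
        done = 0
        while self.ring and done < self.budget:
            self.processed.append(self.ring.popleft())
            done += 1
        if done < self.budget:
            # Ring drained before the budget ran out: back to interrupt mode.
            self.irq_enabled = True
            return False
        return True


q = NapiLikeQueue(budget=4)
for i in range(10):          # a burst of 10 packets
    q.packet_arrives(i)
while q.poll():              # the "softirq" keeps polling until drained
    pass
print(q.interrupts, q.polls)  # one interrupt covers the whole burst
```

The burst of ten packets costs a single interrupt and three poll calls. The next packet after the ring drains raises a fresh interrupt, so an idle device costs nothing.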

Architecture or Flow Diagram

The following diagram illustrates the complete interrupt handling flow from hardware assertion to driver completion:

sequenceDiagram
    participant HW as Hardware Device
    participant CPU
    participant IDT as Interrupt Descriptor Table
    participant ISR as Interrupt Service Routine<br/>(Top Half)
    participant BH as Bottom Half Handler<br/>(Tasklet/Workqueue)
    participant DRV as Device Driver

    HW->>CPU: Assert IRQ line
    CPU->>CPU: Finish current instruction
    CPU->>IDT: Lookup interrupt vector
    IDT->>ISR: Jump to handler
    ISR->>HW: Acknowledge interrupt
    ISR->>ISR: Read status registers
    ISR->>BH: Schedule bottom half
    ISR-->>CPU: Return from interrupt

    Note over CPU: Now running other work
    BH->>DRV: Process I/O completions
    DRV->>DRV: Handle data, wake waiters

The critical insight is the split between the top half (ISR, runs with interrupts disabled) and the bottom half (runs later, in softirq context for tasklets or in process context for workqueues and threaded IRQs). This split minimizes interrupt latency while allowing lengthy processing.
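As a user-space analogy for this split, the sketch below has a "top half" that only records a status word and a worker thread that does the slow decoding later. It is an illustration under stated assumptions: the class `SplitHandler` and the status bits 0x1 (rx) and 0x2 (tx) are hypothetical, and a Python thread only loosely resembles a kernel bottom half.

```python
import queue
import threading


class SplitHandler:
    """Top half enqueues minimal state; bottom half processes it later."""

    def __init__(self):
        self._work = queue.Queue()
        self.results = []
        self._worker = threading.Thread(target=self._bottom_half, daemon=True)
        self._worker.start()

    def top_half(self, status: int) -> None:
        """'ISR': store what is needed and return immediately."""
        self._work.put(status)

    def _bottom_half(self) -> None:
        """Deferred handler: runs later, free to take its time."""
        while True:
            status = self._work.get()
            if status is None:      # shutdown sentinel
                break
            self.results.append(self._decode(status))

    @staticmethod
    def _decode(status: int) -> str:
        names = []
        if status & 0x1:
            names.append("rx")
        if status & 0x2:
            names.append("tx")
        return "+".join(names) or "none"

    def shutdown(self) -> None:
        """Ask the worker to stop after draining queued work, then wait for it."""
        self._work.put(None)
        self._worker.join()


handler = SplitHandler()
for status in (0x1, 0x2, 0x3):
    handler.top_half(status)   # each call returns without decoding anything
handler.shutdown()
print(handler.results)
```

The queue plays the role of the state the ISR saves for its bottom half. Events are processed in arrival order, but never inside `top_half` itself.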

Core Concepts

Hardware Interrupts

Hardware interrupts are electrical signals from devices to the CPU requesting attention. The interrupt request (IRQ) line the device uses determines which handler runs.

Types of interrupts:

Maskable interrupts (IRQ): Can be held off by clearing the IF flag in EFLAGS/RFLAGS (the cli instruction on x86)
Non-maskable interrupts (NMI): Cannot be ignored—used for critical events like hardware failures
Message-signaled interrupts (MSI/MSI-X): Device writes to special memory address instead of using physical IRQ line; avoids interrupt pin limitations

Interrupt lifecycle:

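// 1. Driver registers handler during initialization
static irqreturn_t my_driver_isr(int irq, void *dev_id);
static irqreturn_t my_driver_thread(int irq, void *dev_id);

static int my_driver_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    // Per-device state, assumed to have been allocated and stored with
    // pci_set_drvdata() earlier in probe (not shown)
    struct my_device *my_dev = pci_get_drvdata(pdev);
    int ret;

    // Register interrupt handler
    ret = request_threaded_irq(
        pdev->irq,                    // IRQ line
        my_driver_isr,               // Top half (atomic)
        my_driver_thread,            // Bottom half (threaded, can sleep)
        IRQF_SHARED,                  // Share IRQ with other devices
        "my_driver",                 // Name for /proc/interrupts
        my_dev);                     // Passed to handler

    return ret;
}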

// 2. Interrupt fires, CPU vectors to ISR
static irqreturn_t my_driver_isr(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    u32 status = readl(dev->regs + STATUS_REG);

    if (!(status & IRQ_PENDING))
        return IRQ_NONE;  // Not our interrupt

    // Acknowledge the interrupt so the device de-asserts its line
    writel(status, dev->regs + STATUS_REG);

    // Store any needed state for bottom half
    dev->interrupt_status = status;

    return IRQ_WAKE_THREAD;  // Schedule threaded handler
}

// 3. Threaded handler runs in process context
static irqreturn_t my_driver_thread(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    u32 status = dev->interrupt_status;

    if (status & PACKET_AVAILABLE)
        process_packets(dev);

    if (status & BUFFER_DONE)
        complete_transfer(dev);

    return IRQ_HANDLED;
}

IRQ Handling in Linux

The Linux kernel maintains an IRQ descriptor table. When an interrupt fires:

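Kernel entry code runs with interrupts disabled on that CPU
Looks up the descriptor in irq_desc[]
Calls handler chain (multiple drivers can share one IRQ line)
Acknowledges the interrupt or signals end-of-interrupt to the interrupt controller (PIC/APIC); whether this happens before or after the handlers depends on the trigger type
Returns to the interrupted context, which re-enables interrupts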

The APIC (Advanced PIC) handles interrupt routing in modern multi-core systems, enabling per-core interrupt affinity—IRQs can be directed to specific CPUs for cache efficiency.

Interrupt Bottom Halves

The top half (ISR) must run quickly and cannot sleep. Bottom halves handle the heavy processing.

Tasklets: lightweight, run in softirq context on the CPU that scheduled them, and cannot sleep.

// Tasklet structure embedded in device struct
struct my_device {
    // ... other fields ...
    struct tasklet_struct tlet;
    atomic_t pending_work;  // Written by the ISR, drained by the tasklet
};

static void my_device_tasklet(unsigned long data);

// Initialize tasklet (legacy callback signature; newer kernels also
// provide tasklet_setup())
void my_device_init(struct my_device *dev)
{
    atomic_set(&dev->pending_work, 0);
    tasklet_init(&dev->tlet, my_device_tasklet, (unsigned long)dev);
}

// Tasklet function (runs in softirq context: must not sleep)
static void my_device_tasklet(unsigned long data)
{
    struct my_device *dev = (struct my_device *)data;
    u32 work = atomic_xchg(&dev->pending_work, 0);  // Atomically grab and clear

    if (work & PACKET_WORK)
        process_packets(dev);
    if (work & TIMER_WORK)
        handle_timer(dev);
}

// In ISR: schedule tasklet instead of direct processing
static irqreturn_t my_isr(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    u32 status = readl(dev->regs + STATUS_REG);

    if (!status)
        return IRQ_NONE;  // Not our interrupt (matters on a shared line)

    writel(status, dev->regs + STATUS_REG);  // Acknowledge in hardware
    atomic_or(status, &dev->pending_work);   // Atomic: the tasklet may be running on another CPU
    tasklet_schedule(&dev->tlet);  // Schedule bottom half

    return IRQ_HANDLED;
}

Workqueues — Run in kernel thread context, can sleep, can be delayed:

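// Work items for deferred processing, embedded in the device struct.
// struct work_struct and struct delayed_work come from
// <linux/workqueue.h>; a driver must not redefine them.
struct my_device {
    // ... other fields ...
    struct work_struct work;
    struct delayed_work delayed_work;
};

// Initialize work
INIT_WORK(&dev->work, process_workqueue);
INIT_DELAYED_WORK(&dev->delayed_work, process_delayed);

// Schedule immediate work
schedule_work(&dev->work);

// Schedule delayed work (5 jiffies; use msecs_to_jiffies() for wall-clock delays)
schedule_delayed_work(&dev->delayed_work, 5);

// In driver cleanup: cancel pending work and wait for running handlers
cancel_work_sync(&dev->work);
cancel_delayed_work_sync(&dev->delayed_work);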

Threaded IRQs — Simpler model: handler runs as a kernel thread:

// request_threaded_irq combines ISR + thread in one call
// ISR is minimal (runs atomic), thread handles rest
ret = request_threaded_irq(
    irq,
    isr_routine,     // Optional primary handler (can return IRQ_WAKE_THREAD)
    threadRoutine,    // Threaded handler (can sleep)
    flags,
    name,
    dev_id);

Polling Model

Polling uses explicit software checking instead of hardware signals:

// Simple polling loop (inefficient—don't do this!)
while (device_has_work()) {
    handle_device_event();
}

// Better: timer-driven polling with adjustable rate
static void poll_timer_callback(struct timer_list *t)
{
    struct my_device *dev = from_timer(dev, t, poll_timer);

    // Check and handle any pending work
    u32 status = readl(dev->regs + STATUS_REG);
    if (status)
        process_status(dev);

    // Reschedule if device still needs servicing
    if (device_needs_poll(dev))
        mod_timer(&dev->poll_timer, jiffies + POLL_INTERVAL);
}

// Initialize polling; mod_timer() sets the expiry time and arms the timer
timer_setup(&dev->poll_timer, poll_timer_callback, 0);
mod_timer(&dev->poll_timer, jiffies + POLL_INTERVAL);

Comparison: Interrupts vs Polling

Aspect	Interrupts	Polling
CPU overhead (idle device)	Zero—CPU sleeps	Constant—CPU checks repeatedly
Response latency	Immediate—hardware signals	Bounded by poll interval
Scalability	Pin-limited (MSI helps)	No hardware limit
Complexity	More complex (handling race conditions)	Simpler state machine
Event density	Degrades at high rates	Handles high rates naturally
Power consumption	Lower when idle	Higher when idle
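The CPU-overhead rows of the table can be turned into rough arithmetic. The sketch below compares the fixed cost of timer-driven polling with the per-event cost of interrupts. The per-interrupt and per-poll costs used here are made-up illustrative numbers, not measurements, and the model ignores the actual event processing, which is the same in both cases.

```python
def interrupt_cpu_fraction(events_per_sec: float, cost_per_irq_us: float) -> float:
    """Fraction of one CPU spent on interrupt entry/exit alone (capped at 1.0)."""
    return min(1.0, events_per_sec * cost_per_irq_us / 1_000_000)


def polling_cpu_fraction(polls_per_sec: float, cost_per_poll_us: float) -> float:
    """Fraction of one CPU spent checking the device, busy or idle (capped at 1.0)."""
    return min(1.0, polls_per_sec * cost_per_poll_us / 1_000_000)


def breakeven_event_rate(polls_per_sec: float, cost_per_poll_us: float,
                         cost_per_irq_us: float) -> float:
    """Event rate above which per-event interrupts cost more than polling."""
    return polls_per_sec * cost_per_poll_us / cost_per_irq_us


# Assumed costs: 2 us per interrupt, 0.5 us per poll, 10,000 polls per second.
print(polling_cpu_fraction(10_000, 0.5))       # constant overhead, even when idle
print(interrupt_cpu_fraction(500_000, 2.0))    # saturates one core
print(breakeven_event_rate(10_000, 0.5, 2.0))  # events/s where the two are equal
```

Under these assumptions polling costs a constant 0.5% of a core, interrupts cost nothing while idle, and the two cross at 2,500 events per second. The real crossover depends on hardware and must be measured.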

Production Failure Scenarios

Scenario 1: Interrupt Storm from Misbehaving Hardware

What happened: A faulty network card generated spurious interrupts at 50,000+ per second after receiving certain packet patterns. Each interrupt forced CPU context switches, consuming 40% of one core and causing massive latency spikes for legitimate operations.

Detection: cat /proc/interrupts showed interrupt count incrementing in thousands per second for that IRQ line. top showed softirq CPU time dominating.

Mitigation:

// Report unclaimed interrupts honestly: returning IRQ_NONE does not lower
// the interrupt rate, but it lets the kernel's spurious-IRQ detection see them
static irqreturn_t net_driver_isr(int irq, void *dev_id)
{
    struct my_nic *nic = dev_id;  // Driver-private state (hypothetical struct)
    u32 status = readl(nic->regs + IRQ_STATUS);

    // No source bit set: this interrupt was not raised by our device
    if (!(status & (TX_DONE | RX_PENDING | LINK_CHANGE)))
        return IRQ_NONE;  // Spurious: nothing to acknowledge or schedule

    // Handle real sources, acknowledge, schedule NAPI if needed
    // ...
    return IRQ_HANDLED;
}

Also configure interrupt coalescing in hardware to reduce interrupt frequency.

Scenario 2: Deadlock from Blocking in ISR

What happened: A driver’s ISR called copy_to_user(), which can take a page fault and then sleep while the page is brought in. An ISR runs in atomic context with no user process of its own, so the fault could not be serviced: on an ARM platform the handler tried to sleep with interrupts disabled and the system hung.

Detection: System hang with “scheduling while atomic” kernel panic.

Mitigation:

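// BAD: copy_to_user can fault and sleep, and an ISR has no user context
static irqreturn_t bad_isr(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    // DON'T do this: 'current' is whatever task happened to be interrupted
    copy_to_user(dev->user_buf, dev->kernel_buf, dev->pending_data_len);
    return IRQ_HANDLED;
}

// ALSO BAD: put_user() can fault and sleep exactly like copy_to_user(),
// and a tasklet runs in softirq context, so neither call is safe there.

// GOOD: the ISR only records state and wakes a waiter.
// data_ready and read_wq are assumed fields of struct my_device.
static irqreturn_t good_isr(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;

    dev->pending_data_len = readl(dev->regs + LEN_REG);
    WRITE_ONCE(dev->data_ready, true);
    wake_up_interruptible(&dev->read_wq);
    return IRQ_HANDLED;
}

// The copy happens in the read() path, in the calling process's context
static ssize_t my_read(struct file *file, char __user *buf,
                       size_t len, loff_t *ppos)
{
    struct my_device *dev = file->private_data;

    if (wait_event_interruptible(dev->read_wq, READ_ONCE(dev->data_ready)))
        return -ERESTARTSYS;
    WRITE_ONCE(dev->data_ready, false);

    len = min_t(size_t, len, dev->pending_data_len);
    if (copy_to_user(buf, dev->kernel_buf, len))  // Safe: may sleep here
        return -EFAULT;
    return len;
}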

Scenario 3: Race Between ISR and Driver Removal

What happened: Driver was being unloaded while interrupts were still in flight. The ISR dereferenced memory that had been freed during driver removal, causing a use-after-free panic.

Detection: Kernel panic in ISR context with corrupted data structures.

Mitigation:

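static void my_driver_remove(struct pci_dev *pdev)
{
    struct my_device *dev = pci_get_drvdata(pdev);

    // 1. Stop the device from raising new interrupts.
    //    IRQ_ENABLE_REG is a placeholder for the device's interrupt-enable
    //    register. Avoid disable_irq() on a shared line: it would silence
    //    the other devices on that line too.
    writel(0, dev->regs + IRQ_ENABLE_REG);

    // 2. Unregister the handler; free_irq() waits for in-flight handlers
    //    to complete (use synchronize_irq() if the handler stays registered)
    free_irq(pdev->irq, dev);

    // 3. Wait for deferred work the ISR may already have scheduled
    tasklet_kill(&dev->tlet);

    // 4. Now safe to release resources
    pci_set_drvdata(pdev, NULL);
}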

Also use reference counting (kref_get/kref_put) to ensure device data isn’t freed while any code (ISR or otherwise) might access it.

Trade-off Table

Mechanism	Latency	CPU Overhead (idle)	Throughput ceiling	Complexity
Bare interrupts	Lowest (immediate)	Zero	Limited by IRQ rate	Medium
Interrupt + tasklet	Low + deferred processing	Minimal when idle	High (batched)	Medium
Threaded IRQ	Low + full kernel features	Context switch cost	High	Low
NAPI (poll mode)	Higher (polling interval)	Zero when idle	Very high	High
Pure polling	Bounded by interval	Constant 100%	Limited by poll freq	Low

Implementation Snippets

Complete Interrupt-Driven Driver Framework

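#include <linux/module.h>
#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/workqueue.h>
#include <linux/timer.h>
#include <linux/delay.h>
#include <linux/io.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* INT_STATUS and RX_BUFFER (register offsets) and PKT_AVAILABLE (status bit)
 * are device-specific and must be defined from the hardware datasheet */
#define POLL_INTERVAL (HZ / 10)  // 100ms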

struct my_device {
    void __iomem *regs;
    unsigned int irq;
    struct tasklet_struct tlet;
    struct work_struct work;
    struct timer_list poll_timer;
    atomic_t pending_status;  // Status bits latched for the tasklet
    u32 interrupt_count;
    bool use_polling;  // Fallback when interrupts fail
};

static irqreturn_t my_driver_isr(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    u32 status;

    /* Read status - clears interrupt latches in hardware */
    status = readl(dev->regs + INT_STATUS);
    if (!status)
        return IRQ_NONE;

    /* Atomically store status for bottom half (the read above cleared it) */
    atomic_or(status, &dev->pending_status);
    dev->interrupt_count++;

    /* For this device, bottom half is tasklet */
    tasklet_schedule(&dev->tlet);

    return IRQ_HANDLED;
}

static void my_tlet_handler(unsigned long data)
{
    struct my_device *dev = (struct my_device *)data;
    /* Take the saved status; re-reading INT_STATUS here would return 0
     * because the earlier read already cleared the latches */
    u32 status = atomic_xchg(&dev->pending_status, 0);

    /* Process packet arrivals */
    if (status & PKT_AVAILABLE) {
        struct sk_buff *skb = dev_alloc_skb(2048);
        if (skb) {
            /* Transfer data by MMIO copy (a real NIC would normally use DMA
             * and set skb->dev and skb->protocol before netif_rx()) */
            memcpy_fromio(skb_put(skb, 2048),
                          dev->regs + RX_BUFFER, 2048);
            netif_rx(skb);
        }
    }
}

static void my_work_handler(struct work_struct *work)
{
    /* For longer operations that must sleep, which a tasklet cannot do */
    struct my_device *dev = container_of(work, struct my_device, work);

    /* Process configuration changes, etc. */
    msleep(10);  // Safe here
}

/* Polling fallback for environments with broken interrupts */
static void my_poll_timer(struct timer_list *t)
{
    struct my_device *dev = from_timer(dev, t, poll_timer);
    u32 status = readl(dev->regs + INT_STATUS);

    if (status) {
        /* Hand off through the same path the ISR uses */
        atomic_or(status, &dev->pending_status);
        tasklet_schedule(&dev->tlet);
    }

    if (dev->use_polling)
        mod_timer(&dev->poll_timer, jiffies + POLL_INTERVAL);
}

static int my_driver_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    struct my_device *dev;
    int ret;

    dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
    if (!dev)
        return -ENOMEM;

    pci_set_drvdata(pdev, dev);

    /* Enable the device before touching its BARs */
    ret = pcim_enable_device(pdev);
    if (ret)
        return ret;

    /* Map hardware registers */
    dev->regs = pcim_iomap(pdev, 0, 0);
    if (!dev->regs)
        return -ENOMEM;

    /* Initialize tasklet and work before the ISR can fire and schedule them */
    tasklet_init(&dev->tlet, my_tlet_handler, (unsigned long)dev);
    INIT_WORK(&dev->work, my_work_handler);

    /* Request IRQ - my_driver_isr is the top half and schedules the tasklet.
     * (A threaded handler with a NULL primary handler would also need
     * IRQF_ONESHOT.) */
    ret = request_irq(pdev->irq, my_driver_isr,
                      IRQF_SHARED, KBUILD_MODNAME, dev);
    if (ret) {
        dev_warn(&pdev->dev, "IRQ %u unavailable, using polling\n", pdev->irq);
        dev->use_polling = true;
        timer_setup(&dev->poll_timer, my_poll_timer, 0);
        mod_timer(&dev->poll_timer, jiffies + POLL_INTERVAL);
    } else {
        dev->irq = pdev->irq;
    }

    return 0;
}

static void my_driver_remove(struct pci_dev *pdev)
{
    struct my_device *dev = pci_get_drvdata(pdev);

    if (dev->use_polling) {
        del_timer_sync(&dev->poll_timer);
    } else {
        free_irq(pdev->irq, dev);
    }

    tasklet_kill(&dev->tlet);
    cancel_work_sync(&dev->work);

    pci_set_drvdata(pdev, NULL);
}

static const struct pci_device_id my_driver_id[] = {
    { PCI_DEVICE(0x1234, 0x5678) },
    { }
};
MODULE_DEVICE_TABLE(pci, my_driver_id);

static struct pci_driver my_driver = {
    .name = KBUILD_MODNAME,
    .id_table = my_driver_id,
    .probe = my_driver_probe,
    .remove = my_driver_remove,
};
module_pci_driver(my_driver);

MODULE_LICENSE("GPL");

Python Simulation of Interrupt vs Polling

#!/usr/bin/env python3
"""
Simulates interrupt vs polling paradigms for I/O handling.
"""
import time
import threading
from dataclasses import dataclass
from typing import Callable
import random


@dataclass
class Device:
    """Simulated hardware device with interrupt capability."""
    name: str
    interrupt_callback: Callable[[], None]
    event_interval_ms: float  # How often device generates events

    def __post_init__(self):
        self._running = False
        self._thread = None

    def start(self):
        """Simulate device generating interrupts."""
        self._running = True
        def event_loop():
            while self._running:
                time.sleep(random.uniform(0, self.event_interval_ms * 2) / 1000)
                if self._running and random.random() < 0.7:  # 70% event rate
                    self.interrupt_callback()
        self._thread = threading.Thread(target=event_loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._running = False
        if self._thread:
            self._thread.join(timeout=1.0)


class InterruptDrivenHandler:
    """Handle events immediately via callback."""
    def __init__(self):
        self.events_handled = 0
        self.last_event_time = None

    def handle_interrupt(self):
        self.events_handled += 1
        self.last_event_time = time.time()
        # Simulate minimal ISR work
        self._process_event()

    def _process_event(self):
        pass  # In real ISR, this would be minimal


class PollingHandler:
    """Handle events by periodically checking a simulated status register."""
    def __init__(self, poll_interval_ms: float = 100):
        self.poll_interval = poll_interval_ms / 1000
        self.events_handled = 0
        self.last_poll_time = None
        self._running = False
        self._thread = None
        self._pending = 0  # Simulated device register: events awaiting service
        self._lock = threading.Lock()

    def device_event(self):
        """Device side: latch an event in the register without notifying the CPU."""
        with self._lock:
            self._pending += 1

    def start(self):
        self._running = True
        def poll_loop():
            while self._running:
                self.last_poll_time = time.time()
                # Read device "registers" (check for pending events)
                self._check_device()
                time.sleep(self.poll_interval)
        self._thread = threading.Thread(target=poll_loop, daemon=True)
        self._thread.start()

    def _check_device(self):
        # Simulate reading and clearing the status register
        with self._lock:
            pending, self._pending = self._pending, 0
        self.events_handled += pending

    def stop(self):
        self._running = False
        if self._thread:
            self._thread.join(timeout=1.0)
        self._check_device()  # Drain events latched after the last poll


if __name__ == "__main__":
    print("=== Interrupt vs Polling Simulation ===\n")

    # Interrupt-driven: the device calls the handler for every event
    handler_int = InterruptDrivenHandler()
    device = Device("UART0", handler_int.handle_interrupt, event_interval_ms=10)
    device.start()
    time.sleep(1.0)
    device.stop()

    print(f"Interrupt-driven: {handler_int.events_handled} events handled")
    print("  Latency: each event is handled in the callback as it occurs")

    # Polling: the device only latches events; the poller finds them later
    handler_poll = PollingHandler(poll_interval_ms=50)
    device2 = Device("UART1", handler_poll.device_event, event_interval_ms=10)
    device2.start()
    handler_poll.start()
    time.sleep(1.0)
    device2.stop()
    handler_poll.stop()

    print(f"\nPolling (50ms interval): {handler_poll.events_handled} events handled")
    print(f"  Average latency: ~{handler_poll.poll_interval * 1000 / 2:.1f}ms (half poll interval)")

Observability Checklist

Linux Interrupt Metrics

# View all IRQ statistics
cat /proc/interrupts

# Per-CPU softirq counts
cat /proc/softirqs  # Software interrupt (softirq) handling

# Check interrupt affinity
cat /proc/irq/32/smp_affinity  # Which CPU handles IRQ 32

# Set interrupt affinity (move NIC IRQ to CPU 0)
echo 1 > /proc/irq/42/smp_affinity

# View interrupt distribution with perf
perf stat -e 'irq:irq_handler_entry' -a sleep 1
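The value written to smp_affinity above is a hexadecimal CPU bitmask, where bit N selects CPU N. A small helper, sketched here with hypothetical function names, converts between CPU lists and that mask format. On machines with more than 32 CPUs the mask is written as comma-separated 32-bit groups, most significant group first.

```python
from typing import Iterable, List


def cpus_to_affinity_mask(cpus: Iterable[int]) -> str:
    """Build the hex string for /proc/irq/N/smp_affinity from CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    # Split into 32-bit groups, least significant first.
    groups = []
    while True:
        groups.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if not mask:
            break
    head = format(groups[-1], "x")                       # top group, unpadded
    rest = [format(g, "08x") for g in reversed(groups[:-1])]
    return ",".join([head] + rest)


def affinity_mask_to_cpus(mask_hex: str) -> List[int]:
    """Decode a smp_affinity string back into a sorted list of CPU numbers."""
    mask = int(mask_hex.strip().replace(",", ""), 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(cpus_to_affinity_mask([0]))           # CPU 0 only
print(cpus_to_affinity_mask([0, 1, 2, 3]))  # first four CPUs
print(cpus_to_affinity_mask([32]))          # needs a second 32-bit group
print(affinity_mask_to_cpus("0a"))          # which CPUs a mask selects
```

Pinning an IRQ to CPU 0 therefore means writing the mask 1, as in the echo command above.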

Key Metrics to Monitor

Metric	Healthy Range	Alert Threshold	Indicates
`/proc/interrupts` (per-IRQ delta)	Baseline + small variance	Sudden 10x increase	Interrupt storm
`softirq` time in `top`	< 5% CPU	> 30% CPU	ISR deferred work overload
`irq:irq_handler_entry` perf events	< 10,000/s	> 100,000/s	Excessive interrupts
`nmi` count delta	Stable	Increasing	Hardware issues (memory errors)
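The per-IRQ delta in the first row can be computed from two snapshots of /proc/interrupts. The following is a minimal sketch: the function names, the 10x factor, and the 1,000/s floor are assumptions for illustration, and the parser only handles the common layout of one header row followed by per-CPU count columns.

```python
from typing import Dict, List


def parse_interrupts(text: str) -> Dict[str, int]:
    """Sum the per-CPU counters for each row of a /proc/interrupts snapshot."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals: Dict[str, int] = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():       # reached the description columns
                break
            counts.append(int(field))
        if counts:
            totals[label.strip()] = sum(counts)
    return totals


def irq_rates(before: Dict[str, int], after: Dict[str, int],
              interval_s: float) -> Dict[str, float]:
    """Interrupts per second for each IRQ between two snapshots."""
    return {irq: (after[irq] - before.get(irq, 0)) / interval_s for irq in after}


def flag_storms(baseline: Dict[str, float], current: Dict[str, float],
                factor: float = 10.0, min_rate: float = 1000.0) -> List[str]:
    """IRQs whose rate is at least `factor` times baseline and above a floor."""
    return sorted(irq for irq, rate in current.items()
                  if rate >= min_rate and rate >= factor * baseline.get(irq, 0.0))
```

In practice the two snapshots would come from reading /proc/interrupts a known interval apart, with the baseline rates recorded during normal operation.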

Common Pitfalls / Anti-Patterns

Interrupt Security Concerns

Interrupt Timing Attacks: Observing interrupt timing can leak information. Deterministic scheduling of bottom halves can expose sensitive computation patterns. Consider using hrtimer for high-resolution timing rather than relying on interrupt patterns.
IRQ Routing Attacks: In virtualized environments, a malicious VM could manipulate interrupt routing to cause denial of service against other VMs sharing the same physical CPU. Use IOMMU interrupt remapping and verify APIC configuration.
MSI-X Vulnerabilities: Message-signaled interrupts are memory writes to a special address range. If those writes are not restricted, a device could forge interrupts for vectors it does not own. Modern systems rely on IOMMU interrupt remapping to validate which interrupts each device may signal.
Interrupt Descriptor Table (IDT) Attacks: Malware running with kernel privileges can hook IDT entries to intercept interrupts. Booting a known-good kernel (for example with kexec) removes such hooks. Note that cat /proc/kallsyms | grep idt_table only shows where the table lives; verifying integrity means comparing its entries against the expected handler addresses.

Coding Anti-Patterns

Anti-pattern	Why It’s Bad	Correct Approach
Long-running ISR	Blocks all interrupts, causes latency spikes	Defer to tasklet/workqueue
Blocking in ISR	Scheduling while atomic = panic	Use non-blocking primitives
Not acknowledging interrupts	Spurious interrupts repeat forever	Always acknowledge in hardware
No IRQ sharing cleanup	Leak resources on driver removal	Use `free_irq` with proper synchronization
Assuming single CPU	Race conditions on SMP	Use proper locking/synchronization
Disabling interrupts globally	Excessive latency for other devices	Use per-IRQ masking instead
Not handling interrupt storms	Denial of service from device	Implement coalescing/irq throttling

Quick Recap Checklist

Hardware interrupts allow devices to demand CPU attention asynchronously
ISRs run with interrupts disabled—keep them fast and never block
Bottom halves (tasklets, workqueues, threaded IRQs) defer lengthy processing
Tasklets run in softirq context and cannot sleep; workqueues run in process context and can
Polling is the alternative—better for high-density events, worse for power/latency
Hybrid NAPI model combines interrupts (for idle) with polling (for busy)
Always acknowledge hardware interrupts to prevent spurious repeats
Use synchronize_irq() before freeing IRQ-related resources
Monitor /proc/interrupts and softirqs to detect interrupt storms
Interrupt storms usually indicate hardware or driver bugs, not a feature

Interview Questions

1. What happens when a hardware interrupt fires? Walk through the complete sequence.

When an interrupt fires: (1) The device asserts an electrical signal on an IRQ line (or sends an MSI message). (2) The CPU finishes its current instruction. (3) The CPU looks up the interrupt vector in the IDT (Interrupt Descriptor Table). (4) The CPU pushes the interrupted state (flags, CS, instruction pointer, and the stack pointer where applicable) onto the kernel stack; hardware interrupts push no error code. (5) Because the IDT entry is an interrupt gate, the CPU clears the IF flag, which disables further maskable interrupts, and jumps to the handler address. (6) The kernel's entry code runs, saving the remaining registers. (7) The appropriate IRQ handler (from irq_desc[]) is called. (8) The handler chain may invoke multiple registered handlers. (9) The interrupt is acknowledged at the interrupt controller (APIC/PIC). (10) The handler returns; the kernel restores the saved state, which re-enables interrupts. Total latency is typically 1-10 microseconds on modern hardware.

2. Why can't you call schedule() or sleep in an interrupt handler?

Interrupt handlers run in atomic context: local interrupts are disabled and the handler borrows the stack and state of whatever it interrupted (user code, kernel code, or the idle loop), so there is no task on whose behalf it could sleep. schedule() switches to another task and expects to resume the current one later, but an interrupt handler is not a schedulable entity. Putting the interrupted task to sleep in its place could block it while it holds spinlocks, or leave the interrupt unacknowledged, and stall or deadlock the system. The kernel detects the attempt and reports "BUG: scheduling while atomic". The correct pattern is to defer the work: to a tasklet if it is short and never sleeps, or to a workqueue or threaded IRQ handler, which run in process context where sleeping is safe.

3. What is the difference between tasklets and workqueues in Linux?

Tasklets and workqueues both defer work, but with key differences. Tasklets run in softirq context (queued with tasklet_schedule()), cannot sleep, run on the CPU that scheduled them, and execute soon after the hardware interrupt handler returns, at lower priority than hardware interrupts. A given tasklet never runs on two CPUs at once. Workqueues run in kernel thread context, can sleep, may migrate CPUs, and have more overhead but more capability. Use tasklets for quick, non-blocking deferrals (packet processing, bottom-half ISR work). Use workqueues for anything that needs to sleep, perform blocking memory allocation, or take a long time (flush caches, send signals, memory reclamation). The modern preference is threaded IRQs for most deferral needs, since they combine ISR and deferral in one clean interface.

4. What is NAPI and when is it used?

NAPI (New API) is Linux's hybrid interrupt-polling mechanism for high-speed network drivers. It works as follows: when traffic is low, the device's interrupt is enabled and each arrival raises one. On an interrupt, the driver masks further device interrupts and calls napi_schedule(); the kernel then invokes the driver's napi->poll() callback from NET_RX softirq context with a budget (a maximum number of packets per call). While the ring keeps filling, poll() keeps consuming its full budget and the kernel keeps calling it, with no interrupts in between. When poll() processes fewer packets than its budget, the ring is empty, so the driver calls napi_complete_done() and re-enables the interrupt. This prevents interrupt storms from fast devices (10GbE+), improves cache locality, and allows batching of packet processing. Most modern high-performance network drivers (ixgbe, mlx5, virtio-net) use NAPI.

5. How do you safely handle a device that might generate spurious interrupts?

Spurious interrupts waste CPU and can indicate hardware problems. Handling strategies: First, always read device status registers at the start of the ISR and return IRQ_NONE if there's no real interrupt source; this tells the kernel "this IRQ wasn't for me." Second, implement interrupt coalescing in hardware (waiting for multiple events before asserting interrupt) if the device supports it. Third, for persistent spurious interrupts, Linux counts unhandled interrupts per IRQ line: if nearly all of the last 100,000 interrupts on a line went unhandled (every handler returned IRQ_NONE), the kernel logs "nobody cared" and disables that line. Fourth, use irqdomain to properly map hardware IRQ numbers to Linux IRQ descriptors. Finally, acknowledge the interrupt at the device before the handler returns, so that a level-triggered line is de-asserted and does not fire again immediately.
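The kernel's unhandled-interrupt accounting can be approximated with a small counter model. This sketch assumes a window of 100,000 interrupts and a limit of 99,900 unhandled ones, and it leaves out details of the real implementation such as its timing heuristics. The class and method names are illustrative only.

```python
class SpuriousDetector:
    """Simplified model of per-IRQ spurious interrupt detection."""

    WINDOW = 100_000   # interrupts examined per evaluation window (assumed)
    LIMIT = 99_900     # unhandled count that triggers a shutdown (assumed)

    def __init__(self):
        self.irq_count = 0
        self.unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled: bool) -> None:
        """Record one interrupt; `handled` is False when all handlers said IRQ_NONE."""
        if self.disabled:
            return
        if not handled:
            self.unhandled += 1
        self.irq_count += 1
        if self.irq_count < self.WINDOW:
            return
        # End of window: disable the line if almost nothing was handled.
        if self.unhandled > self.LIMIT:
            self.disabled = True
        self.irq_count = 0
        self.unhandled = 0


stuck = SpuriousDetector()
for _ in range(100_000):
    stuck.note_interrupt(handled=False)   # a line nobody claims

healthy = SpuriousDetector()
for i in range(100_000):
    healthy.note_interrupt(handled=(i % 2 == 0))  # half are real

print(stuck.disabled, healthy.disabled)
```

The model shows why returning IRQ_NONE accurately matters: a line that is never claimed gets shut off, while a shared line with a reasonable share of handled interrupts stays enabled.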

6. What is interrupt coalescing and how does it improve performance for high-speed devices like network cards?

Interrupt coalescing reduces interrupt frequency by waiting for multiple events before asserting an interrupt. Instead of interrupting for every packet, a network card might wait until it has accumulated 16 packets or 128 microseconds have passed since the last interrupt, then fire one combined interrupt. This dramatically reduces CPU overhead from interrupt handling, for example from thousands of interrupts per second to hundreds. The tradeoff is increased latency: a packet must wait for the coalescing window before being processed. For bulk transfer (file downloads), this latency is unnoticeable. For latency-sensitive applications (high-frequency trading), you reduce or disable coalescing (low thresholds, short timers). NICs expose this through configurable time and frame-count thresholds, which ethtool presents as rx-usecs and rx-frames. On Linux, ethtool -C sets coalescing parameters for network drivers.

7. What is the APIC (Advanced Programmable Interrupt Controller) and why is it necessary in multi-core systems?

The APIC is the interrupt controller in modern multi-core x86 systems, replacing the older 8259A PIC. It consists of the local APIC (one per CPU core, handling CPU-local interrupts like timer and performance monitoring) and the I/O APIC (connecting devices to the system and routing interrupts to specific cores). The APIC enables per-core interrupt affinity—IRQs can be directed to specific CPUs for cache efficiency or latency optimization. It also handles interrupt priority and masking at the CPU level. In multi-core systems, the APIC can simultaneously handle many more interrupt sources than the old PIC (which was limited to 15 IRQs). On large NUMA systems, APIC routing affects cache locality—if an interrupt is handled by a core far from the device's NUMA node, latency increases. Tools like `turbostat` and `/proc/interrupts` help analyze interrupt routing across cores.

8. What is the difference between an edge-triggered and level-triggered interrupt?

Edge-triggered interrupts fire once when the interrupt signal transitions (e.g., low to high). The device signals an event by pulsing the interrupt line; the handler runs once per pulse, and a new event that occurs while the line is still asserted produces no edge and can be missed. Level-triggered interrupts stay pending for as long as the interrupt signal is active (e.g., high). The handler must clear the interrupt source to de-assert the line, otherwise the interrupt fires again immediately after returning. Neither type is universally preferred: legacy ISA devices were edge-triggered, legacy PCI INTx lines are level-triggered (which is what lets several devices share one line safely), and MSI/MSI-X messages behave like edges. GPIO controllers on platforms such as ARM usually support both. Using the wrong trigger type causes either lost interrupts or "interrupt storm" problems where the handler immediately re-fires after completing. In Linux the trigger type is configured with `irq_set_irq_type()` or by passing IRQF_TRIGGER_* flags to request_irq(), if the hardware supports it.
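The difference in behavior can be demonstrated with a toy interrupt line. The sketch below is an illustrative model rather than real controller logic: `InterruptLine`, `service`, and the cap of five deliveries (standing in for an unbounded storm) are assumptions of the example.

```python
class InterruptLine:
    """Toy interrupt input that is either level- or edge-sensitive."""

    def __init__(self, level_triggered: bool):
        self.level_triggered = level_triggered
        self.asserted = False
        self._edge_latched = False

    def set(self, asserted: bool) -> None:
        """Device drives the line; a low-to-high change latches an edge."""
        if asserted and not self.asserted:
            self._edge_latched = True
        self.asserted = asserted

    def pending(self) -> bool:
        """Level mode follows the line; edge mode follows the latch."""
        return self.asserted if self.level_triggered else self._edge_latched

    def deliver(self) -> None:
        """Controller hands the interrupt to the CPU, consuming the latch."""
        self._edge_latched = False


def service(line: InterruptLine, handler_clears_source: bool,
            max_deliveries: int = 5) -> int:
    """Raise the line once and count how often the handler gets invoked."""
    line.set(True)
    deliveries = 0
    while line.pending() and deliveries < max_deliveries:
        line.deliver()
        deliveries += 1
        if handler_clears_source:
            line.set(False)   # handler quiets the device
    return deliveries


print(service(InterruptLine(level_triggered=True), handler_clears_source=True))
print(service(InterruptLine(level_triggered=True), handler_clears_source=False))

edge = InterruptLine(level_triggered=False)
print(service(edge, handler_clears_source=False))
edge.set(True)            # second event while the line is still high
print(edge.pending())     # no new edge, so nothing is pending
```

A level-triggered line whose source is never cleared keeps re-firing until the cap is reached. An edge-triggered line whose source is never cleared fires once and then silently misses the next event.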

9. What is the softirq system in Linux and how does it relate to hardware interrupts?

Softirqs are software-raised deferred handlers that run in interrupt (atomic) context, at a lower priority than hardware interrupts. They are one form of the bottom half in the top-half/bottom-half split. After a hardware ISR runs (top half), it can call `raise_softirq()`, which marks the softirq as pending on the current CPU rather than running it immediately. The kernel's softirq system handles timers, network packet processing (NET_RX/NET_TX softirqs), scheduler load balancing (SCHED_SOFTIRQ), tasklets, and RCU callbacks. Softirqs run with hardware interrupts enabled (unlike a hardware ISR), on the same CPU that raised them (for cache efficiency), and cannot sleep. Pending softirqs are processed on return from a hardware interrupt, when bottom halves are re-enabled with `local_bh_enable()`, and in the per-CPU ksoftirqd kernel threads, which take over when softirqs keep being re-raised under heavy load. High-frequency softirq activity (many network packets, many timers) can dominate CPU time. You can see per-CPU softirq counts in `/proc/softirqs`.

10. How does MSI (Message Signaled Interrupt) improve interrupt routing compared to wire-based interrupts?

Wire-based interrupts use physical interrupt lines (IRQ pins) routed through the interrupt controller. Each device needs its own line or must share one, so the number of distinct interrupts is limited by the available pins. MSI (and its successor MSI-X) replaces physical wires with memory writes: the device writes a message to a special address, and that write is delivered to the CPU as an interrupt. This has major advantages: no interrupt pin limitation (a PCIe function can have up to 32 MSI vectors or 2048 MSI-X vectors), each vector can be directed to a specific CPU via the APIC's message routing, and no line sharing is needed. High-performance devices use this for per-queue interrupts, where each hardware queue gets its own vector with its own CPU affinity. Multi-queue NVMe drives and high-speed network cards typically rely on MSI-X for this, binding each queue's interrupt to a dedicated CPU to minimize interrupt latency.

11. What is interrupt affinity and how do you set it to optimize performance in a NUMA system?

Interrupt affinity allows directing an IRQ to a specific CPU or set of CPUs. The `smp_affinity` file in `/proc/irq/` controls this: writing a hexadecimal bitmask to `/proc/irq/42/smp_affinity` limits which CPUs can handle IRQ 42 (the companion file `smp_affinity_list` accepts a CPU list such as `0-3` instead). In NUMA systems, the optimal setting is usually to direct the IRQ to a CPU on the same NUMA node as the device, which minimizes memory access latency when the IRQ handler touches device buffers and driver data. For a network card on NUMA node 0, set its IRQ affinity to CPUs on node 0. For storage controllers, similarly align with their NUMA node. Misaligned affinity causes cache-line bounces across the NUMA interconnect, adding latency to each interrupt handler invocation. Tools like `irqbalance` (a daemon) attempt to dynamically optimize affinity, but for deterministic high-performance workloads, manual tuning (with `irqbalance` stopped or told to skip those IRQs) is often better. `cat /proc/irq/default_smp_affinity` shows the default affinity mask for newly allocated IRQs; writing a mask to that file changes it.
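Building the bitmask by hand is error-prone, so a small helper can be useful. The sketch below uses hypothetical helper names (`cpus_to_mask`, `mask_to_cpus`) and assumes the usual `smp_affinity` text format: hexadecimal, with comma-separated 32-bit groups once the mask covers more than 32 CPUs.

```python
def cpus_to_mask(cpus):
    """Convert CPU numbers to an smp_affinity-style hex mask string."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu          # bit N set means CPU N may handle the IRQ
    digits = format(mask, "x")
    # Split into 32-bit (8 hex digit) groups, starting from the right.
    groups = []
    while len(digits) > 8:
        groups.insert(0, digits[-8:])
        digits = digits[:-8]
    groups.insert(0, digits)
    return ",".join(groups)


def mask_to_cpus(mask_text):
    """Convert an smp_affinity-style hex mask string back to CPU numbers."""
    mask = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


print(cpus_to_mask([0, 1, 2, 3]))   # CPUs 0-3
print(cpus_to_mask([32]))           # first CPU of the second 32-bit group
print(mask_to_cpus("f0"))           # CPUs 4-7
```

The resulting string is what you would write to `/proc/irq/<N>/smp_affinity` as root. Which CPUs belong to the device's NUMA node is an input you have to supply, for example from the topology reported by `lscpu`.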

12. What is the difference between an interrupt handler and a kernel thread in terms of scheduling and preemption?

An interrupt handler (ISR) runs in atomic context with interrupts disabled on the current CPU: preemption is disabled and the scheduler cannot run on that CPU until the ISR completes. A kernel thread runs in process context (can be preempted, can sleep, can schedule). An ISR starts within microseconds of the hardware event but cannot do anything that might sleep (no `GFP_KERNEL` allocation, no mutex). A kernel thread can use full kernel services but has scheduling latency determined by its priority and the current scheduler load (milliseconds in the worst case). Threaded IRQ handlers and workqueue tasks both run in process context and can both sleep; the tradeoff between them is latency versus generality. A threaded IRQ handler runs in a dedicated kernel thread with real-time priority (SCHED_FIFO by default), so it is scheduled soon after the primary handler returns. A workqueue task runs in a shared worker thread at normal priority, so it may wait longer, but it is not tied to a single interrupt. For I/O-heavy work that needs to block for a long time (like waiting for a disk operation), use a workqueue. For quickly processing a received packet and waking a waiting task, use a threaded IRQ.

13. What is the role of the IRQ descriptor table (`irq_desc`) in Linux interrupt handling?

The IRQ descriptor table is the kernel's collection of `struct irq_desc` entries, one per IRQ line in the system, looked up by IRQ number with `irq_to_desc()` (historically a static `irq_desc[]` array; kernels built with sparse IRQ support allocate descriptors dynamically). Each descriptor contains: a pointer to the chain of registered interrupt handlers, IRQ state flags (enabled/disabled, in progress), per-IRQ statistics, and a reference to the IRQ chip (the low-level hardware operations like mask/unmask/ack for the platform's interrupt controller). It is also where the kernel stores per-IRQ configuration like trigger type and the affinity mask (which CPUs can handle this IRQ). When an interrupt fires, the kernel looks up the descriptor for the IRQ number and dispatches to the registered handlers. Multiple devices can share one IRQ line: the kernel calls every handler in the chain, and each handler reports whether its device raised the interrupt.

14. How do `request_threaded_irq()` and the `IRQF_ONESHOT` flag change interrupt handling behavior and what are their advantages?

`request_threaded_irq()` splits the handler into a fast primary handler, which runs in atomic context, acknowledges the hardware if needed, and returns `IRQ_WAKE_THREAD`, and a threaded handler, which runs in a kernel thread and can sleep while doing slow work. `IRQF_ONESHOT` tells the kernel to keep the interrupt line masked after the primary handler returns and to unmask it only when the threaded handler has finished. Without `IRQF_ONESHOT`, the line is re-enabled as soon as the primary handler returns. For a level-triggered interrupt whose source can only be cleared by the threaded handler (for example, a device behind a slow I2C or SPI bus), the still-asserted line would then fire again immediately and cause an interrupt storm. For this reason the kernel normally rejects a request that passes `NULL` as the primary handler without `IRQF_ONESHOT`. A line can still be shared with `IRQF_ONESHOT`, but only if every handler on the line sets the flag. The advantages of the threaded model are that slow device access no longer runs with interrupts disabled, so it does not delay other interrupts on that CPU, and that the threaded handler can use sleeping locks and blocking calls.

15. What is the interrupt storm problem and how do you diagnose its cause using `/proc/interrupts`?

An interrupt storm is when a device generates interrupts at a rate faster than the system can handle, causing CPU saturation with interrupt processing and preventing useful work. The first diagnostic is `/proc/interrupts`, which shows cumulative per-CPU counts for each IRQ. The file has no rate column, so sample it repeatedly (for example with `watch -d cat /proc/interrupts`) and compare the deltas. If one IRQ's count is climbing far faster than the device's normal workload explains (often hundreds of thousands per second), you have an interrupt storm. The causes: hardware malfunction (device defective, asserting the IRQ line permanently), interrupt coalescing misconfiguration (thresholds too low), a software bug causing the interrupt to be raised repeatedly for the same event (handler not acknowledging properly), or a race condition where clearing the interrupt doesn't actually clear the hardware condition. The per-IRQ file `/proc/irq/<N>/spurious` reports how many interrupts on that line went unhandled; if nearly all of them do, the kernel logs `irq N: nobody cared` and disables the line. Solving the problem requires examining the device's interrupt status register in the ISR to see what condition is causing the repeat, and adjusting hardware coalescing or driver logic accordingly.
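Comparing two samples by eye is tedious, so the delta calculation is worth scripting. The following is a minimal sketch with hypothetical function names; it assumes the common `/proc/interrupts` layout (a header row of CPU names, then one row per IRQ with per-CPU counts followed by descriptive text).

```python
def parse_interrupts(text):
    """Return {irq_label: total count across CPUs} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())        # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for token in rest.split()[:ncpus]:
            if not token.isdigit():      # reached the descriptive columns
                break
            counts.append(int(token))
        totals[label.strip()] = sum(counts)
    return totals


def irq_rates(before_text, after_text, interval_s):
    """Return {irq_label: interrupts per second} between two samples."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    return {
        irq: (after[irq] - before[irq]) / interval_s
        for irq in after
        if irq in before
    }


def busiest(rates, n=5):
    """Return the n (irq_label, rate) pairs with the highest rates."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]
```

On a Linux machine you could read the file twice, for example with `pathlib.Path("/proc/interrupts").read_text()` and a `time.sleep(1)` in between, then pass both samples to `irq_rates()` and print `busiest()`. An IRQ at the top of that list with a rate far above its usual level is the candidate for closer inspection.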

16. How do `enable_irq()` and `disable_irq()` work and why must they be called in symmetric pairs?

`disable_irq()` increments a per-IRQ disable depth counter and masks the IRQ line at the interrupt controller; each call must have a matching `enable_irq()` to decrement the counter. The line is unmasked only when the counter returns to zero, so calling `disable_irq()` twice requires two `enable_irq()` calls to restore delivery. `disable_irq()` also waits for any in-flight handler for that IRQ to complete before returning, ensuring no handler is running when it returns. Because of that wait, it must not be called from the handler of the same IRQ (it would wait for itself and deadlock), nor while holding a lock that the handler needs. `disable_irq_nosync()` is the non-waiting variant that returns immediately without waiting for in-flight handlers. The counted design means one piece of code cannot accidentally re-enable a line that another piece of code still needs disabled. Common bug: `disable_irq()` in init, `enable_irq()` in cleanup, but cleanup called multiple times causes a double-enable, which the kernel reports as an unbalanced enable. Using `request_threaded_irq()` with `IRQF_ONESHOT` avoids manual enable/disable for most cases.
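The counting behavior is easy to model outside the kernel. This is a toy sketch, not kernel code: the `IrqLine` class is hypothetical, and where the real kernel prints a warning for an unbalanced enable, the model raises an exception so the bug is visible.

```python
class IrqLine:
    """Toy model of the disable-depth counting behind disable/enable."""

    def __init__(self):
        self.depth = 0            # number of outstanding disable() calls

    @property
    def masked(self):
        # The line delivers interrupts only when nobody holds it disabled.
        return self.depth > 0

    def disable(self):
        self.depth += 1

    def enable(self):
        if self.depth == 0:
            # The real kernel warns here; the model raises instead.
            raise RuntimeError("unbalanced enable")
        self.depth -= 1


line = IrqLine()
line.disable()        # e.g. a suspend path
line.disable()        # e.g. a reconfiguration path, nested inside it
line.enable()
print("still masked after one enable:", line.masked)
line.enable()
print("masked after second enable:", line.masked)
```

The first `enable()` leaves the line masked because another caller still needs it disabled; only the second one restores delivery. A third `enable()` would hit the unbalanced case described in the common bug above.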

17. What is the `IRQF_SHARED` flag and what are the requirements for sharing an interrupt line correctly?

`IRQF_SHARED` allows multiple devices to share one IRQ line. When the line fires, the kernel calls every registered handler in turn, because it cannot know which device raised the interrupt. To share correctly: every handler on the line must be registered with `IRQF_SHARED` (and agree on the trigger type); each handler must check whether its own device actually raised the interrupt (by reading device status registers) and return `IRQ_NONE` if not; and a handler that does own the interrupt must acknowledge it before returning `IRQ_HANDLED` (so the hardware de-asserts the line). A handler that returns `IRQ_HANDLED` without checking hides problems such as a stuck line from the kernel's spurious-interrupt detection, and a handler that services its device on another device's interrupt can corrupt state or miss events. Shared interrupts are common when IRQ lines are limited (legacy systems). The request is `request_irq(irq, handler, IRQF_SHARED, name, dev_id)`, where `dev_id` must be a non-NULL pointer unique to each device (usually the driver's per-device structure). The kernel passes it back to the handler so the handler knows which device instance to check, and `free_irq()` uses it to find the handler to remove.
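The check-then-claim contract can be illustrated with a short simulation. This is a sketch under simplifying assumptions: the `Device` class and `dispatch_shared_irq` function are hypothetical, and a boolean `pending` flag stands in for the device's interrupt status register.

```python
IRQ_NONE = 0
IRQ_HANDLED = 1


class Device:
    """Toy device whose `pending` flag stands in for a status register."""

    def __init__(self, name):
        self.name = name
        self.pending = False
        self.handled = 0

    def handler(self):
        # Step 1: check whether this device raised the interrupt.
        if not self.pending:
            return IRQ_NONE
        # Step 2: acknowledge, so the device de-asserts the shared line.
        self.pending = False
        self.handled += 1
        return IRQ_HANDLED


def dispatch_shared_irq(devices):
    """Call every handler on the line; report whether anyone claimed it."""
    results = [device.handler() for device in devices]
    return IRQ_HANDLED in results


nic, disk = Device("nic"), Device("disk")
disk.pending = True                       # only the disk raised the line
print("claimed:", dispatch_shared_irq([nic, disk]))
print("nic handled:", nic.handled, "disk handled:", disk.handled)
print("claimed with nothing pending:", dispatch_shared_irq([nic, disk]))
```

In the first dispatch the NIC handler returns `IRQ_NONE` and the disk handler claims the interrupt. In the second dispatch nobody claims it, which corresponds to the unhandled case the kernel counts as spurious.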

18. How does the kernel ensure that interrupt handlers can safely access shared data structures with normal kernel code?

Interrupt handlers run in atomic context and can interrupt process-context code at almost any point, so data touched by both needs explicit synchronization. If data is accessed from interrupt context and also from process context, the options are: (1) taking a spinlock with `spin_lock_irqsave()` in the process-context code, which also disables interrupts on the local CPU for the duration of the critical section (the handler can usually take the same lock with plain `spin_lock()`), or (2) using lock-free algorithms with explicit memory ordering, or (3) using per-cpu data that interrupt handlers never touch. The `spin_lock_irqsave()` pattern is the most common: it saves the current interrupt state, disables interrupts on the local CPU, and acquires the spinlock. `spin_unlock_irqrestore()` releases the lock and restores the saved state, so interrupts are re-enabled only if they were enabled before. Disabling local interrupts prevents a self-deadlock: without it, the handler could interrupt the lock holder on the same CPU and spin forever on a lock that can never be released. The spinlock itself protects against handlers and threads running on other CPUs.

19. What is the role of `synchronize_irq()` and why might you call it during driver removal?

`synchronize_irq()` blocks until any interrupt handler currently executing on that IRQ line completes. It is used during driver removal and shutdown to ensure that no handler is running when resources are freed. If you free the device structure or unmap device registers while the ISR may still be running on another CPU, the ISR can access freed memory, causing a use-after-free crash. The usual removal pattern is: stop the device from generating interrupts, call `free_irq()` (which removes the handler and waits for any in-flight invocation, as `synchronize_irq()` does), and only then free the resources the handler uses. Call `synchronize_irq()` directly when the handler stays registered but you need a point at which it is known not to be running, for example after masking interrupts at the device during suspend; note that `disable_irq()` already includes this wait. `synchronize_irq()` can sleep while waiting, so call it only from process context in sequences where sleeping is acceptable, never from the handler itself or while holding a lock the handler needs. It is unrelated to `synchronize_rcu()`, which waits for readers of RCU-protected data rather than for ISR completion.

20. What is the difference between polling and interrupts in the context of high-frequency trading systems and why do they choose polling?

High-frequency trading (HFT) systems often use polling (busy-wait) instead of interrupts because the interrupt path adds latency and jitter. When a market data packet arrives, the path from interrupt to application includes interrupt delivery, the ISR, softirq processing, and a scheduler wakeup of the receiving thread, which together can add on the order of 1-10 microseconds and vary from packet to packet. In HFT, this is unacceptable, because being first or second in the order queue can decide whether a trade is profitable. Polling in a busy loop (for example, calling `epoll_wait()` with a timeout of zero) picks up data as soon as it arrives, with no wakeup. The tradeoff is CPU overhead (the core is always busy even when no data is present), but at high message rates (a 10 Gbps feed can carry millions of packets per second) the CPU would be handling interrupts almost constantly anyway. Dedicating a core to the busy-wait loop keeps latency low and predictable. Many HFT systems also use kernel bypass (DPDK, Solarflare OpenOnload) to avoid OS overhead entirely, moving NIC handling into userspace with busy-polling.

Conclusion

Interrupts and polling represent two fundamental models for device-CPU communication, with modern systems blending both into hybrid approaches. Hardware interrupts provide immediate response for sparse, unpredictable events; polling handles dense, predictable workloads without interrupt overhead. The top-half/bottom-half split in Linux interrupt handling keeps interrupt latency low: ISRs run atomic and fast, while deferred handlers do the lengthy work later, in softirq context or, for workqueues and threaded handlers, in process context.

Understanding interrupt handling prepares you for deeper OS topics: the interaction between interrupt context and scheduler, the security implications of IRQ routing in virtualized environments, and the evolution toward message-signaled interrupts that avoid physical pin limitations. These fundamentals also appear in ARM and RISC-V interrupt controllers, embedded real-time operating systems, and hypervisor virtual interrupt delivery.

Looking forward, the lines between interrupt and polling continue blurring as hardware supports interrupt coalescing, modern NVMe drives benefit from pure polling during intensive periods, and software-defined interrupt controllers enable flexible IRQ routing for power and performance optimization.
