Deadlock & Starvation
Deadlock conditions (Coffman), prevention strategies, detection & recovery
Deadlock & Starvation
A mutex that is never released is not a performance bug—it is a correctness bug. When two threads each hold a lock that the other needs, neither can proceed. The system does not crash, the process does not exit. Threads simply wait for each other, forever. This is deadlock, and it is one of the most feared failure modes in concurrent systems because it is so difficult to detect in testing, so subtle in production, and so catastrophic when it occurs.
Beyond deadlock lies starvation: a thread that is never given CPU time or resource access because other threads perpetually ahead of it claim the resources it needs. Deadlock freezes a set of threads; starvation leaves a thread permanently behind. Both violate the fundamental promise of a multi-tasking operating system: that every runnable thread will eventually make progress.
Overview
Deadlock occurs when a set of threads are all blocked, each waiting for a lock held by another thread in the set. The threads cannot unblock themselves because the holder of each needed lock is itself blocked waiting for another lock.
The condition for deadlock was formally described by Edward Coffman in 1971, and the four Coffman conditions remain the canonical characterization:
- Mutual exclusion: At least one resource must be non-shareable (only one thread can use it at a time)
- Hold and wait: A thread holds a resource while waiting for another resource held by another thread
- No preemption: Resources cannot be forcibly taken from a holding thread; they must be released voluntarily
- Circular wait: There exists a circular chain of threads, each waiting for a resource held by the next
All four conditions must hold simultaneously for deadlock to occur. Remove any one, and deadlock is impossible. This gives us four categories of prevention strategies, and two strategies for handling existing deadlock: detection and recovery.
When to Use / When Not to Use
Design for deadlock prevention from the start. Deadlock is not an edge case to handle later—it is a fundamental property of any system with multiple locks and multiple threads. Your architectural decisions about lock ordering, lock granularity, and which resources are exclusive determine whether your system can deadlock. Address this upfront.
When to use lock ordering: If your system must acquire multiple locks, define a global partial order and ensure every thread acquires locks in that order. This eliminates circular wait and prevents deadlock by construction. The cost: you may need to acquire locks in a non-intuitive order that complicates your call graph.
When to use trylock with backoff: pthread_mutex_trylock() attempts to acquire a lock without blocking. If it fails (returns EBUSY), you can release your held locks, do something else, and retry. This breaks the hold-and-wait condition but at the cost of more complex retry logic and potential livelock if two threads continuously retry in lock-step.
When to accept deadlock risk: In short-lived programs (scripts, one-shot tools), deadlock may never manifest because the program exits before the deadlock scenario arises. For long-running servers and embedded systems, deadlock prevention is mandatory.
Architecture or Flow Diagram
graph TD
A[Thread A] --> B[Acquires Lock 1]
B --> C[Thread B] --> D[Acquires Lock 2]
D --> E[Thread A tries Lock 2<br/>BLOCKED - held by B]
E --> F[Thread B tries Lock 1<br/>BLOCKED - held by A]
F --> G[DEADLOCK]
style G color:#fff
The circular wait: Thread A holds Lock 1 and needs Lock 2; Thread B holds Lock 2 and needs Lock 1. Neither can proceed. No external entity can break this cycle.
Core Concepts
The Four Coffman Conditions
Understanding deadlock requires understanding exactly why it happens. The four conditions are both necessary and sufficient:
-
Mutual exclusion: If any number of threads could simultaneously hold a resource, there would be no contention. But mutexes, file handles, and other exclusive resources are inherently non-shareable. This condition cannot be removed without changing the nature of the resource.
-
Hold and wait: If threads never held a resource while waiting for another, deadlock couldn’t happen because a thread would either have no resources (free to acquire) or have all resources it needs (never blocked waiting). Breaking this condition requires either acquiring all locks at once (atomically) or never holding a lock while waiting for another (acquire/release per lock).
-
No preemption: If the OS could forcibly take a lock from a thread, circular wait would be broken—the cycle would be cut by preemption. But mutexes are designed to be released voluntarily. Some systems support lock priority inheritance to address this indirectly (not true preemption but prevents low-priority threads from blocking high-priority ones indefinitely).
-
Circular wait: The final ingredient. Even with all three conditions above, if the wait graph is acyclic, some thread will eventually acquire what it needs. Circular wait is what happens when each thread in a set waits for a resource held by the next in a closed chain.
Prevention: Lock Ordering
The most effective and practical prevention strategy. Assign each lock a unique rank in a total ordering (e.g., by memory address, by ID). Every thread must acquire locks in increasing order and release them in decreasing order.
// Global ordering: LOCK_A < LOCK_B < LOCK_C
// All threads must acquire in this order
void operationAB(shared_state_t *s) {
pthread_mutex_t *first = &s->lock_a; // Lower address
pthread_mutex_t *second = &s->lock_b; // Higher address
pthread_mutex_lock(first); // Always acquire lower-ranked first
pthread_mutex_lock(second); // Always acquire higher-ranked second
// Critical section using both locks
pthread_mutex_unlock(second); // Release higher-ranked first
pthread_mutex_unlock(first); // Release lower-ranked second
}
The key insight: if every thread acquires locks in the same order, a cycle in the wait graph is impossible. Thread A acquires L1 before L2; Thread B also acquires L1 before L2. Thread B cannot be waiting for L1 while holding L2 if Thread A is ahead of it in the acquisition order—Thread A already holds L1. The chain is broken.
Prevention: Trylock with Backoff
#define MAX_RETRIES 5
int acquire_multiple_with_retry(pthread_mutex_t *a, pthread_mutex_t *b) {
for (int i = 0; i < MAX_RETRIES; ++i) {
if (pthread_mutex_trylock(a) == 0) {
if (pthread_mutex_trylock(b) == 0) {
return 0; // Both acquired
}
pthread_mutex_unlock(a); // Release a, will retry
}
// Back off — sleep, yield, or do other work
struct timespec t = {0, 1000000 << i}; // Exponential backoff
nanosleep(&t, NULL);
}
return EBUSY; // Failed after retries
}
This approach breaks hold-and-wait: instead of holding Lock A while waiting for Lock B, you either acquire both or release any acquired locks and try again. The cost is potential livelock (both threads retry forever) if the backoff is not sufficient, and wasted work on retries.
Detection: Wait-for Graph
The OS can model deadlock by constructing a wait-for graph—a directed graph where nodes are threads and an edge from T1 to T2 means “T1 is waiting for a lock held by T2.” A cycle in this graph indicates deadlock.
graph LR
TA[Thread A] -->|waiting for L2| TB[Thread B]
TB -->|waiting for L1| TA
style TA stroke:#00fff9
style TB stroke:#ff00ff
The kernel can run a cycle detection algorithm periodically (e.g., every 30 seconds). If a cycle is found, the system can log an alert, terminate one of the deadlocked threads (victim selection via age, priority, or resource usage), or release locks on behalf of a thread. Many production systems disable this because the detection cost is non-trivial and the recovery options are all destructive.
Starvation vs. Deadlock
Deadlock: A set of threads are all blocked forever, each waiting for a resource held by another in the set. All are frozen.
Starvation: A thread is permanently denied resources because other threads perpetually consume them or have higher priority. The thread is technically runnable (not blocked) but makes no progress.
Starvation can occur without deadlock: imagine a priority-scheduling system where a low-priority thread is perpetually preempted by medium-priority threads. The low-priority thread never gets CPU time—not because it’s blocked on a lock, but because it is never scheduled. In priority-based mutex implementations, priority inheritance can also starve intermediate priorities. Fair mutex implementations that use simple FIFO queuing can starve writers if readers continuously arrive.
Production Failure Scenarios + Mitigations
The /etc/passwd Chown Bug (Linux Security Flaw, 2003)
An ancient but instructive bug: under certain configurations, a process would open /etc/passwd for writing (acquiring a lock on the file), then fork. The child inherited the open file descriptor and the lock. If the parent tried to acquire an exclusive lock on the file and blocked, and the child was terminated by a signal, the kernel would release the lock—but the parent’s blocking lock acquisition had already consumed the file descriptor from the descriptor table, leading to a use-after-free when the parent eventually acquired the lock and wrote. The root cause involved lock ordering across process hierarchies and the difficulty of managing locks across fork/exec boundaries.
Mitigation: Never hold locks across fork(). Use pthread_atfork() to register handlers that release or re-acquire locks in child processes. Better: use file locks (flock(), fcntl()) that are automatically released on close(), not on thread termination.
The Database Double-Lock Bug
A database transaction held lock A, performed some I/O, then tried to upgrade to a more exclusive lock level on the same resource. The lock upgrade failed because the transaction already held a shared lock, and exclusive upgrade required releasing the shared lock first—but releasing and re-acquiring introduced a window where another transaction could acquire the exclusive lock first. The result was a deadlock-like scenario where two transactions each waited for the other to release.
Mitigation: When upgrading lock levels, acquire the highest level first, then downgrade if needed. Or use lock upgrade primitives that are atomic (not available in all systems—SQL Server’s UpdLock hint is an example of atomic upgrade).
Starvation in Python’s GIL-Protected Code
Python’s Global Interpreter Lock (GIL) causes threads to starve each other for CPU time when running CPU-bound Python code. The GIL only allows one thread to execute Python bytecode at a time. If a thread runs a tight CPU loop, other threads are starved of CPU—they are not blocked on a lock, but they never get scheduled. This is starvation, not deadlock.
Mitigation: Use multiprocessing for CPU-bound workloads, which bypasses the GIL entirely by using separate processes. Use asyncio for I/O-bound workloads, which avoids threading overhead.
Trade-off Table
| Strategy | How it breaks Coffman | Pros | Cons |
|---|---|---|---|
| Lock Ordering | Circular wait | Simple, zero runtime cost | Requires global convention across codebase |
| Trylock + backoff | Hold and wait | Graceful degradation under contention | Livelock risk, wasted work on retries |
| One-shot acquisition | Hold and wait | Atomic, no partial state | Complex if multiple lock levels needed |
| Deadlock detection | (Post-hoc handling) | Can recover, no prevention overhead | Cycle detection is O(n^2), recovery is destructive |
| Priority inheritance | (Partial preemption) | Real-time safe | Doesn’t prevent deadlock, only mitigates priority inversion |
| Resource pre-allocation | Mutual exclusion | Eliminates contention entirely | Underutilizes resources, poor concurrency |
Implementation Snippets
C: Global Lock Ordering with Ranked Mutexes
#include <pthread.h>
#include <stdatomic.h>
typedef struct {
pthread_mutex_t lock_user;
pthread_mutex_t lock_account;
pthread_mutex_t lock_session;
} ranked_locks_t;
// Initialize with consistent ordering
void init_locks(ranked_locks_t *l) {
// Acquire in address order (lowest address = first)
pthread_mutex_t *locks[3] = {&l->lock_user, &l->lock_account, &l->lock_session};
for (int i = 0; i < 3; ++i) {
pthread_mutex_init(&locks[i], NULL);
}
}
// Always acquire in order: user -> account -> session
int get_user_session(ranked_locks_t *l, int user_id, int *session_id) {
pthread_mutex_lock(&l->lock_user);
pthread_mutex_lock(&l->lock_account);
pthread_mutex_lock(&l->lock_session); // Safe: no cycle possible
int result = lookup_user_session(user_id, session_id);
pthread_mutex_unlock(&l->lock_session); // Reverse order
pthread_mutex_unlock(&l->lock_account);
pthread_mutex_unlock(&l->lock_user);
return result;
}
Python: Timeout-based Deadlock Detection
import threading
import time
class TimeoutLock:
def __init__(self, timeout=5.0):
self._lock = threading.Lock()
self._timeout = timeout
self._owner = None
def acquire(self, blocking=True):
if not blocking:
return self._lock.acquire(False)
deadline = time.monotonic() + self._timeout
while True:
acquired = self._lock.acquire(False)
if acquired:
self._owner = threading.current_thread().name
return True
if time.monotonic() >= deadline:
raise TimeoutError(
f"Lock acquisition timeout after {self._timeout}s. "
f"Owner: {self._owner}"
)
time.sleep(0.01) # Retry interval
def release(self):
self._owner = None
self._lock.release()
C: Hierarchical Lock Ordering with Compile-Time Ordering Check
#define LOCK_ORDER_INIT(locks) \
do { for (int _i = 0; _i < sizeof(locks)/sizeof(locks[0]) - 1; _i++) \
if (&locks[_i] >= &locks[_i + 1]) \
__builtin_trap(); /* Die if order violated */ \
} while(0)
// Usage: enforce ordering at startup
pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_c = PTHREAD_MUTEX_INITIALIZER;
void init() {
pthread_mutex_t order[3] = {lock_a, lock_b, lock_c};
LOCK_ORDER_INIT(order); // Crashes if lock_a >= lock_b >= lock_c violated
}
Observability Checklist
- Thread state monitoring: Track threads in
BLOCKEDstate waiting on mutexes. If blocked time exceeds threshold, alert - Lock acquisition time histogram: Long tail (p99 >> p50) indicates contention or potential undetected deadlock
- Wait-for graph cycle detection: Periodic check in the kernel (if your OS exposes this) or manual instrumentation
- CPU utilization by thread: A thread consuming 0% CPU over a sustained period while the system has idle cycles indicates starvation
- Deadlock timeout alerts: Use
pthread_mutex_timedlock()with a reasonable timeout (e.g., 10s), and alert if acquisition fails - Thread count in blocked state: Sudden spikes in blocked thread count can indicate lock contention or early-stage deadlock
Security/Compliance Notes
Denial of service via deadlock: An attacker who can trigger a deadlock can cause a service to hang, denying availability to all users. In networked services, this can be exploited remotely by crafting requests that cause lock acquisition patterns that lead to deadlock. Mitigation: use lock timeouts to ensure bounded wait, use lock ordering to prevent deadlock structurally.
Priority inversion as security issue: In systems with security domains (e.g., SELinux enforcement threads, capability checks), priority inversion can cause a high-integrity thread to be blocked by a low-integrity thread holding a lock. Priority inheritance mitigates this but must be explicitly enabled and tested.
For compliance (SOC2, ISO 27001): document deadlock risk assessment for any multi-threaded service handling sensitive operations. Implement lock ordering discipline and verify with static analysis. Deadlock that causes service unavailability is an availability failure under most compliance frameworks.
Common Pitfalls / Anti-patterns
1. Different lock ordering across functions. If funcA() acquires Lock1 then Lock2, and funcB() acquires Lock2 then Lock1, you have a deadlock hazard if both can be called concurrently. Use a documented global ordering and a linter rule to enforce it.
2. Lock acquisition in library code without ordering guarantee. A library function that acquires locks must document its lock ordering requirements. If a caller violates the ordering, deadlock occurs even though the library code is correct. Libraries should use lock ranking within their own code and never acquire caller-owned locks.
3. Forking while holding a lock. A thread holding a lock that then forks creates a child process with a lock held by a thread that no longer exists. The lock becomes unreleasable. Use pthread_atfork() handlers or avoid holding locks across fork().
4. Acquiring a lock, then calling a blocking operation. If you hold a lock, call read() on a file descriptor that might block, and the read never returns (network failure, disk hang), you hold the lock forever. Release the lock before any potentially blocking operation.
5. Starvation from reader-writer lock favoring readers. A read-preferring RW lock can starve writers indefinitely if reads keep arriving. Use write-preferring or fair RW lock variants if writers must make progress.
Quick Recap Checklist
- Deadlock requires all four Coffman conditions simultaneously: mutual exclusion, hold-and-wait, no preemption, circular wait
- Prevention by lock ordering (global consistent order) is the most practical and zero-cost strategy
- Trylock with backoff breaks hold-and-wait but risks livelock
- Deadlock detection constructs a wait-for graph and looks for cycles; recovery is typically to kill a thread
- Starvation is a thread permanently denied CPU or resources without being blocked on a lock
- Use
pthread_mutex_timedlock()to detect potential deadlock in production - Never hold locks across
fork()— usepthread_atfork()handlers - Starvation in reader-writer locks requires choosing a fairness policy (read-preferring vs. write-preferring vs. fair)
Interview Questions
The four Coffman conditions are: Mutual exclusion (at least one resource must be non-shareable, e.g., a mutex), Hold and wait (a thread holds a resource while waiting for another), No preemption (resources cannot be forcibly taken from a holding thread), and Circular wait (a closed chain of threads each waiting for a resource held by the next).
In practice, circular wait is the easiest to break. You can eliminate circular wait by establishing a global lock ordering and enforcing that all threads acquire locks in that order. This costs nothing at runtime but requires architectural discipline. Hold-and-wait can be broken using trylock-with-backoff or acquiring all locks atomically, but these are more complex and introduce their own issues (livelock, single-point-of-failure acquisition). Mutual exclusion cannot be removed if you need exclusive access. No preemption is fundamentally at odds with how mutexes are designed.
Deadlock is a set of threads all permanently blocked, each waiting for a resource held by another thread in the set. Classic example: Thread A holds Lock 1 and waits for Lock 2; Thread B holds Lock 2 and waits for Lock 1. Both freeze. Neither can make any progress.
Starvation is a thread that never gets scheduled or never gets the resources it needs because other threads perpetually get there first. Classic example: a low-priority thread in a priority-based scheduler never gets CPU time because higher-priority threads always preempt it. Another example: in a read-preferring reader-writer lock, writers can starve indefinitely if reads keep arriving, because each new reader resets the "no writer waiting" condition.
Lock ordering prevents deadlock by eliminating circular wait—the fourth Coffman condition. If every thread acquires locks in the same global order (e.g., Lock A before Lock B before Lock C), then when Thread A acquires Lock A and then tries to acquire Lock B, Thread B cannot be waiting for Lock A—Thread B hasn't acquired it yet. The wait-for graph becomes a DAG, not a cycle. Even if Thread B has acquired Lock B, it must eventually release Lock B before acquiring Lock A (because of the ordering). So Thread A either waits for Lock B to be released (Thread B will eventually release it and continue), or Lock B is free and Thread A acquires it immediately. Either way, progress is guaranteed and no cycle forms.
Priority inheritance is a scheduling protocol that addresses priority inversion—not classic deadlock. When a high-priority thread H is blocked waiting for a lock held by low-priority thread L, and medium-priority threads M are running and pre-empting L, H is indirectly blocked by M. Priority inheritance temporarily promotes L's priority to H's level so that L runs and releases the lock, unblocking H. This is not preemption (the lock is not forcibly taken) but priority elevation, and it breaks the inversion chain. Without it, H (the most important thread) is blocked by L (the least important) because of M (mid-importance). With it, L inherits H's urgency and completes quickly. Linux implements this via the PI (priority inheritance) futex mechanism.
Deadlock detection requires building a wait-for graph: nodes are threads, and a directed edge T1 -> T2 exists if T1 is blocked waiting for a lock held by T2. To build this, you traverse all blocked threads and for each blocked lock, identify the owner thread. Then run cycle detection (DFS from each node, track visited nodes and recursion stack). If a cycle is found, deadlock exists.
In practice, detection is expensive (O(n^2) per scan with n threads), so it's typically run periodically (every 30-60 seconds) rather than continuously. When a cycle is found, you must choose a victim: the thread with the least overall system impact—often the youngest thread (fewest resources acquired) to minimize cleanup work. Kill the victim, release its locks, log the event. Some systems use lock timeouts as a simpler proxy: if any lock acquisition takes longer than a threshold, abort with diagnostic information. This catches deadlock earlier but can false-positive under high load.
The Banker's Algorithm, coined by Dijkstra, prevents deadlock by ensuring the system stays in a "safe state" before granting resource requests. Before allocating resources, it simulates the request and checks if a safe sequence exists where all processes can complete. If not, the request is denied.
The problem is it requires a priori knowledge of maximum resource needs—information rarely available in practice. It also assumes a fixed number of processes, which servers don't have (processes fork and exit dynamically). The algorithm's overhead scales poorly with resources and processes. Most production systems use deadlock prevention (lock ordering) or detection (wait-for graph) instead, not avoidance algorithms that require predicting future resource consumption.
The convoy effect occurs when I/O-bound processes block CPU-bound processes temporarily, causing the CPU-bound processes to lose their cache state and performance to collapse. In FIFO scheduling, a long I/O-bound job arrives first and holds the CPU while waiting for disk—meanwhile, arriving CPU-bound jobs get preempted repeatedly as the I/O job blocks. The result is poor CPU utilization and throughput collapse.
It relates to starvation because CPU-bound processes are repeatedly denied efficient CPU time. The I/O-bound jobs are not malicious—they're just doing what the OS asked of them—but the scheduling policy (FIFO) creates pathological behavior. Modern schedulers (CFS, O(1) scheduler) use CPU burst modeling and multi-level feedback queues to prevent convoys by keeping CPU-bound jobs moving while still honoring I/O-bound wait states.
Five philosophers sit at a table, each needing both the left and right fork to eat. Classic deadlock scenario: all five simultaneously pick up their left fork (they're all symmetric), then each waits indefinitely for the right fork—which another philosopher holds. No philosopher can proceed, and no external entity breaks the cycle.
Starvation variation: if philosophers pick forks randomly or if there's a fairness problem, some philosopher may never get both forks simultaneously. Even without deadlock, asymmetry in access patterns can starve one or more philosophers indefinitely.
Solutions include: resource hierarchy (always pick lower-numbered fork first—eliminates circular wait), asymmetry (one philosopher starts with right fork), centralized arbiter (let one philosopher eat at a time), and concurrent resource acquisition with trylock (if both forks unavailable, release held forks and retry).
In deadlock, threads are blocked and make zero progress—they're all stuck waiting for each other. In livelock, threads are actively executing (not blocked) but accomplish nothing productive—they're stuck in a cycle of retrying without making headway.
Classic livelock example: two people meet in a hallway and both try to step aside in the same direction, causing them to shuffle back and forth indefinitely. Both are "active" (not sleeping, not blocked) but neither reaches the door. The trylock-with-backoff mutex pattern can livelock if both threads retry at the same rate without sufficient backoff—their retries synchronize and neither ever acquires the lock.
Key diagnostic difference: in deadlock, CPU usage may drop to near-zero (threads are all blocked); in livelock, CPU usage stays high (threads continuously work but accomplish nothing). Livelock is often harder to detect because the system appears busy.
A readers-writer lock allows multiple readers concurrently but exclusive writer access. In a read-preferring implementation, if readers keep arriving, a waiting writer is perpetually blocked—the lock stays available for readers and never transitions to exclusive mode while readers queue ahead.
Solutions: Write-preferring (writers jump ahead in the queue, blocking new readers when a writer waits). Fair queuing (FIFO discipline, no preference—first-writer-wins when a writer is waiting, but queued readers still get served in order). Read-copy-update (RCU) is a Linux-specific solution where readers access old data without blocking, and a writer coordinates a grace period before updating—readers never block writers and writers never block readers.
The choice between reader and writer preference depends on workload: read-heavy workloads prefer readers; write-latency-sensitive workloads prefer writers.
A deadlock-free lock guarantees that some thread will eventually acquire the lock if the system runs long enough—no guaranteed wait bound, but progress is guaranteed. Trylock with backoff is deadlock-free (you eventually either acquire or give up after retries). Starvation may still occur for unlucky threads.
A lock-free implementation guarantees that some thread makes progress in bounded time regardless of other threads' state. Lock-free typically uses atomic primitives (CAS) without locks. Wait-free extends this further: every thread completes any operation in bounded time. A mutex is neither lock-free (it blocks) nor wait-free (waiting threads can be indefinitely delayed). Lock-free/wait-free are stronger progress guarantees used in real-time and high-contention scenarios.
lockdep tracks lock acquisition ordering across all code paths. It builds a "lock dependency graph" as the kernel runs, recording the order each lock is acquired relative to others. When a new lock acquisition creates a cycle in the dependency graph, lockdep immediately warns (or panics if configured) about potential deadlock.
lockdep categorizes dependencies: non-recursive vs recursive locks, interrupt contexts, order reversals. It catches the classic scenario where two functions acquire locks in opposite orders across different code paths. The overhead is low (a few hundred bytes per lock) and it's compiled into debug kernels. Use cat /proc/lockdep_stats for statistics and /sys/kernel/debug/lockdep for detailed output.
The "no preemption" condition means resources cannot be forcibly taken from a holding thread—the thread must voluntarily release them. This is fundamental to how mutexes are designed; you can't preempt a thread and take its lock without corrupting state.
Priority inheritance partially addresses the effects of no preemption in priority-inverted scenarios. When a low-priority thread L holds a lock and a high-priority thread H blocks waiting for it, priority inheritance temporarily promotes L's priority to H's level, allowing L to run and release the lock. This isn't true preemption (the lock isn't taken), but it prevents medium-priority threads from starving L (and thus H). Linux implements this via PI (priority inheritance) futexes.
One-shot acquisition acquires all needed locks in a single atomic operation (or fails entirely). This breaks the "hold and wait" Coffman condition: either you acquire all locks together (and can proceed) or you acquire none (and can retry). There's no state where you hold some locks while waiting for others.
The challenge is implementation: OS-level lock acquisition APIs typically don't support multi-lock atomic acquisition. The pattern must be implemented by acquiring a "meta-lock" that guards the acquisition of all other locks, or using transactional memory (TSX, synchronized methods). The cost is reduced concurrency—a thread holding multiple locks serializes more work—and potential for single-thread bottlenecks if the meta-lock becomes a hotspot.
Prevention (lock ordering, one-shot acquisition) eliminates deadlock structurally at design time—no runtime overhead, no detection cost. But it constrains design: you must define global lock ordering, which becomes complex in large codebases, and hold-and-wait patterns may be necessary for some algorithms.
Detection + Recovery allows more natural lock usage (holds the "no preemption" and "hold-and-wait" conditions) but requires runtime cycle detection (expensive, O(n^2)) and recovery (typically kill a thread, which is destructive). The advantage is flexibility—locks can be acquired in any order, with detection catching violations at runtime.
Most production systems use prevention (lock ordering) as primary defense and detection only as a safety net. Lockdep in Linux is an example of detection-as-safety-net.
The futex (fast userspace mutex) allows a mutex to live entirely in userspace when uncontended—zero syscalls. A thread spins in userspace waiting for the lock. Only when the lock appears to be held (futex value indicates contention) does the thread enter the kernel via futex_wait() to sleep. On unlock, the locker sets the futex and wakes one waiter via futex_wake().
Deadlock risks in futex-based mutexes: if the lock holder exits without releasing (bug in mutex implementation), waiters sleep forever in kernel. If the futex word is in memory that gets reused or corrupted, undefined behavior results. The kernel-side futex wait/wake relies on userspace to maintain the futex value correctly—a buggy mutex can leak waiters into the kernel where they consume resources.
The ABA problem occurs in lock-free CAS-based algorithms: a thread reads location A (value X), then another thread modifies A to Y and back to X before the first thread performs its CAS. The CAS succeeds even though the location changed twice, because the value matches the original. This can cause data structure corruption if the "B" state contains structural modifications.
It relates to deadlock conceptually: both are concurrency failures where simple apparent safety guarantees are violated by unexpected thread interleaving. The ABA problem typically doesn't cause deadlock (threads don't block), but can cause data structure corruption that manifests as crashes or hangs—symptoms that can appear similar to deadlock in production debugging.
pthread_mutexattr_setrobust() exist and what happens without it?When a process terminates while holding a mutex (without pthread_mutexattr_setrobust()), the mutex is left in an inconsistent state. Any thread that subsequently tries to acquire the lock will wait forever—the lock holder is gone and no one can release it. This is a "permanent block" scenario distinct from classic deadlock (threads are alive but blocked) but functionally equivalent.
With PTHREAD_MUTEX_ROBUST set, if the lock holder dies while holding the mutex, EOWNERDEAD is returned to waiters, indicating the previous holder died. The waiter can then call pthread_mutex_consistent() to mark the mutex as valid and recover, or simply unlock and retry. This prevents permanent blocking but requires application-level recovery logic.
Lock coarsening (also called "locking consolidation") combines multiple fine-grained locks into one coarser lock. For example, instead of separate locks for hash table buckets, use a single lock for the entire table. This reduces deadlock risk (fewer locks = fewer cycles) and can improve performance by reducing contention overhead. The tradeoff is less parallelism—coarser locks mean more work serialized under one lock.
The deadlock relationship: lock coarsening reduces the number of lock acquisition sites, which reduces the possibility of inconsistent ordering. However, coarsening can also increase hold-and-wait violations if a coarse lock is held while calling functions that acquire other locks. If coarsening merges locks that are acquired in different orders across different operations, you may still have deadlock risk within the merged operation.
In PostgreSQL, tuple updates require acquiring row-level locks in a specific order. If Transaction A holds lock on row X and waits for row Y, while Transaction B holds lock on row Y and waits for row X, classic deadlock occurs. PostgreSQL's deadlock detector runs periodically, finds the cycle, and terminates one transaction (the younger one by default). This is detection-based handling, not prevention.
For prevention, PostgreSQL's advisory locks use a user-defined locking protocol—the application must acquire them in a consistent order. For regular row locks, lock ordering isn't enforced (rows are accessed in physical order, not logical), so detection with victim selection is used. This shows why prevention (lock ordering) is often impractical for database workloads where access order is determined by query execution, not by application design.
- Process Scheduling — How the OS decides which process runs next
- Mutex Implementation — How synchronization primitives work
- Semaphores — Alternative synchronization primitives
Conclusion
- Process Scheduling — How the OS decides which process runs next
- Mutex Implementation — How synchronization primitives work
- Semaphores — Alternative synchronization primitives
Category
Related Posts
ASLR & Stack Protection
Address Space Layout Randomization, stack canaries, and exploit mitigation techniques
Assembly Language Basics: Writing Code the CPU Understands
Learn to read and write simple programs in x86 and ARM assembly, understanding registers, instructions, and the art of thinking in low-level operations.
Boolean Logic & Gates
Understanding AND, OR, NOT gates and how they combine into arithmetic logic units — the building blocks of every processor.