Consistency Models in Distributed Systems: A Complete Guide
Learn about strong, weak, eventual, and causal consistency models. Understand read-your-writes, monotonic reads, and how to choose the right model for your system.
Strong consistency sounds simple: reads always return the most recent write. In distributed systems, though, strong consistency is expensive. The CAP theorem forces us to think about consistency trade-offs, and the PACELC theorem reminds us that even without partitions, consistency has a latency cost.
Knowing where different consistency models apply helps you make better architectural decisions. Most systems do not need strong consistency everywhere. Finding where you can relax consistency guarantees can dramatically improve performance.
This post builds on the CAP theorem and PACELC theorem discussions.
The Consistency Spectrum
Think of consistency as a dial you can turn, not a binary switch:
graph LR
A[Strong<br/>Consistency] --> B[Sequential<br/>Consistency]
B --> C[Causal<br/>Consistency]
C --> D[Eventual<br/>Consistency]
D --> E[Weak<br/>Consistency]
Strong consistency at one end, weak at the other. Most practical systems sit somewhere in between.
Strong Consistency
Strong consistency guarantees that any read sees all preceding writes. The system appears to have a single copy of the data. This is what most developers assume when they think about database transactions.
Bank account balances are the standard example:
// Strong consistency: read after write always returns the new value
async function transfer(from, to, amount) {
const account = await db.account.findOne({ id: from });
if (account.balance >= amount) {
await db.account.update(
{ id: from },
{ balance: account.balance - amount },
);
await db.account.update({ id: to }, { balance: account.balance + amount });
return { success: true };
}
return { success: false, reason: "Insufficient funds" };
}
// After this completes, any read of 'from' balance shows the new value
const balance = await db.account.findOne({ id: from });
// balance is definitely reduced, even if you read from a different replica
Strong consistency uses consensus protocols like Paxos or Raft to coordinate reads and writes across nodes. This coordination adds latency. In a geo-distributed system, a strongly consistent write might take 100-300ms. An eventually consistent write might take 5ms.
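The coordination cost comes from quorum intersection: with N replicas, choosing a read quorum R and write quorum W such that R + W > N guarantees every read quorum overlaps the most recent write quorum. A minimal in-memory sketch of this rule (the class and its names are illustrative, not a real database client):

```javascript
// Quorum read/write sketch: R + W > N guarantees that every read
// contacts at least one replica holding the latest write.
class QuorumStore {
  constructor(n, r, w) {
    if (r + w <= n) throw new Error("R + W must exceed N for strong reads");
    this.replicas = Array.from({ length: n }, () => ({ version: 0, value: null }));
    this.r = r;
    this.w = w;
  }
  write(value, version) {
    // Contact W replicas (here: the first W; a real client takes any W that ack)
    for (let i = 0; i < this.w; i++) {
      this.replicas[i] = { version, value };
    }
  }
  read() {
    // Contact R replicas and return the highest-versioned value seen.
    // Even reading from the *last* R replicas still overlaps the write quorum.
    const contacted = this.replicas.slice(-this.r);
    return contacted.reduce((a, b) => (a.version >= b.version ? a : b));
  }
}

const store = new QuorumStore(3, 2, 2); // N=3, R=2, W=2
store.write("v1", 1);
console.log(store.read().value); // v1: the read quorum overlaps the write quorum
```

Eventual consistency drops this overlap requirement (for example R = W = 1), which is exactly where the staleness window comes from.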
Sequential Consistency
Sequential consistency sits between strong and eventual. It guarantees that all nodes see operations in the same order, but that order does not have to match real time.
Imagine two processes writing to the same variable:
// Sequential consistency: both replicas see writes in the same order
// but that order might not match real-world time
// Process A writes x = 1
// Process B writes x = 2
// Process C reads x
// Valid under sequential consistency: C might read 1 or 2
// Invalid: C reading x = 0 (the initial value) after both writes completed
With sequential consistency, you cannot have time travel paradoxes where a later write appears to happen before an earlier write. But you also cannot assume that a read reflects all writes that finished before it in real time.
Causal Consistency
Causal consistency is weaker. It only guarantees that causally related operations appear in order across all nodes. Operations with no causal relationship can be observed in different orders by different nodes.
Causally related means: if operation A causes operation B to happen, then A must be visible before B everywhere. Example:
// Causal consistency example
// Post 1: "I got promoted" (write by user)
// Post 2: "Celebrating tonight" (comment on post 1)
// Every node sees post 2 only after seeing post 1
// Because post 2 is causally dependent on post 1
// But "I got promoted" and "Weather is nice today" from the same user
// Are not causally related - they can appear in any order
Causal consistency is attractive because it is the strongest consistency model an always-available system can provide: it needs no blocking coordination between replicas. The Dynamo paper popularized eventual consistency with causal metadata in the form of vector clocks.
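Causal ordering is typically tracked with vector clocks: each replica keeps a counter per replica, and event A happened-before event B iff A's vector is less than or equal to B's in every component and strictly less in at least one. A minimal sketch (illustrative only; real systems prune and encode these compactly):

```javascript
// Vector clock sketch for distinguishing causal order from concurrency.
class VectorClock {
  constructor() {
    this.counters = {}; // replicaId -> count
  }
  tick(replicaId) {
    this.counters[replicaId] = (this.counters[replicaId] || 0) + 1;
  }
  merge(other) {
    for (const [id, c] of Object.entries(other.counters)) {
      this.counters[id] = Math.max(this.counters[id] || 0, c);
    }
  }
  // True if this clock happened-before other: <= everywhere, < somewhere
  happenedBefore(other) {
    const ids = new Set([...Object.keys(this.counters), ...Object.keys(other.counters)]);
    let strictlyLess = false;
    for (const id of ids) {
      const a = this.counters[id] || 0;
      const b = other.counters[id] || 0;
      if (a > b) return false;
      if (a < b) strictlyLess = true;
    }
    return strictlyLess;
  }
}

const post = new VectorClock();
post.tick("A"); // "I got promoted"
const comment = new VectorClock();
comment.merge(post); // the commenter saw the post first
comment.tick("B"); // "Celebrating tonight"
console.log(post.happenedBefore(comment)); // true: must be ordered everywhere
console.log(comment.happenedBefore(post)); // false
```

If neither clock happened-before the other, the events are concurrent and replicas are free to order them differently.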
Eventual Consistency
Eventual consistency makes the weakest guarantee: if no new updates are made to an object, eventually all reads will return the last written value.
The word “eventually” is doing a lot of work here. It could mean milliseconds or hours. Different systems give different guarantees.
Bounded vs Unbounded Eventual Consistency
An important distinction often overlooked: bounded vs unbounded eventual consistency.
| Type | Definition | Real-World Examples |
|---|---|---|
| Bounded | Convergence happens within a known, finite time | DynamoDB (typically seconds), Cassandra (typically < 1 second) |
| Unbounded | Convergence will happen, but no time bound is guaranteed | DNS propagation (minutes to hours), CDN cache expiry (hours to days) |
Why this matters:
// Bounded eventual consistency - you can reason about staleness
// DynamoDB with DynamoDB Streams: staleness typically < 5 seconds
async function readWithStalenessBound(key) {
const item = await dynamodb.getItem({ Key: key }).promise();
// You know the item is at most 5 seconds stale (bounded)
return item;
}
// Unbounded eventual consistency - no promises about when
// Traditional DNS: TTL might be 24 hours, but convergence is not guaranteed
async function resolveWithDNS(hostname) {
const addresses = await dns.lookup(hostname);
// Could return old IP for hours; no bounded guarantee
return addresses;
}
Convergence time factors:
| Factor | Impact on Convergence Time |
|---|---|
| Network distance | Geo-distributed replicas add 50-200ms per hop |
| Load average | High load increases replication lag |
| Failure recovery | Node recovery triggers anti-entropy, can take minutes |
| Quorum availability | If quorum is impaired, writes may stall |
For bounded eventual consistency, typical convergence times:
- Same-region replicas: 10-100ms
- Cross-region replicas: 100ms - 2s
- Under failure conditions: can extend to 30s or more
For unbounded eventual consistency:
- DNS: minutes to 48+ hours (depending on TTL and propagation)
- CDN: hours to days
- Some gossip protocols: theoretically unbounded, in practice O(log N) rounds of gossip
Session Consistency and PRAM
Beyond the basic models, there are practical consistency models designed for specific use cases.
Session Consistency
Session consistency guarantees that within a single client session, certain properties hold. This is not a single model but a family of guarantees:
| Session Guarantee | Definition | Implementation |
|---|---|---|
| Read-your-writes | Session sees its own writes | Route reads to primary or last-write replica |
| Monotonic reads | Session never sees older values | Sticky sessions or version-tracked routing |
| Monotonic writes | Session’s writes are ordered | Per-client write sequencing |
| Write-follows-reads | Write is ordered after reads that preceded it | Causal ordering tracking |
DynamoDB read-your-writes example:
// DynamoDB: read-your-writes via strongly consistent reads.
// DynamoDB has no per-session token for consistency; instead,
// ConsistentRead: true serves the read from the leader, so it
// reflects all writes acknowledged before the read started.
const docClient = new AWS.DynamoDB.DocumentClient({ region: "us-east-1" });
const item = await docClient
.get({
TableName: "Posts",
Key: { userId, postId },
ConsistentRead: true, // read-your-writes (and stronger) holds
})
.promise();
PRAM (Pipelined Random Access Memory) Consistency
PRAM consistency is a weak model that guarantees:
Writes from a single process are observed by all processes in the order they were issued. Writes from different processes may be observed in different orders.
In simpler terms: your own writes appear in order to everyone; other people’s writes may appear in different orders to different observers.
// PRAM consistency example
// Process A writes x=1, then x=2
// Process B writes y=1, then y=2
// All processes see:
// - A's writes in order: x=1 then x=2
// - B's writes in order: y=1 then y=2
// But some might see B's writes interleaved with A's differently
// Valid PRAM ordering: x=1, x=2, y=1, y=2
// Also valid: y=1, y=2, x=1, x=2
// Also valid: x=1, y=1, x=2, y=2
// NOT valid: x=2, x=1 (A's writes out of order)
PRAM vs Causal consistency:
- PRAM: Only your own write ordering is guaranteed
- Causal: Write ordering that has causal dependency is guaranteed
PRAM is relatively easy to implement efficiently and provides useful guarantees for some workloads.
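One common way to provide PRAM is FIFO delivery per sender: tag each write with (senderId, sequence number) and have receivers buffer out-of-order writes until the gap is filled. A hypothetical sketch (names are illustrative, not a specific system's API):

```javascript
// PRAM via per-sender FIFO delivery: apply a sender's write n+1 only
// after its write n has been applied; writes from different senders
// may still interleave in any order.
class PramReceiver {
  constructor() {
    this.applied = []; // writes applied, in delivery order
    this.nextSeq = {}; // senderId -> next expected sequence number
    this.buffer = {}; // senderId -> { seq: write }
  }
  deliver(senderId, seq, write) {
    this.buffer[senderId] = this.buffer[senderId] || {};
    this.buffer[senderId][seq] = write;
    this.nextSeq[senderId] = this.nextSeq[senderId] || 1;
    // Apply as many in-order writes from this sender as possible
    while (this.buffer[senderId][this.nextSeq[senderId]] !== undefined) {
      this.applied.push(this.buffer[senderId][this.nextSeq[senderId]]);
      delete this.buffer[senderId][this.nextSeq[senderId]];
      this.nextSeq[senderId]++;
    }
  }
}

const node = new PramReceiver();
node.deliver("A", 2, "x=2"); // arrives early: buffered, not applied
node.deliver("B", 1, "y=1"); // different sender: applied immediately
node.deliver("A", 1, "x=1"); // fills the gap: x=1 then x=2 applied
console.log(node.applied); // [ 'y=1', 'x=1', 'x=2' ]
```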
// By contrast, under eventual consistency a write returns immediately
async function writeEventual(key, value) {
// Write to local replica, return immediately
await localCache.set(key, value);
return { success: true, version: ++localVersion };
}
// Later, background replication propagates the write
// During the propagation window, reads might return stale values
async function read(key) {
return await localCache.get(key); // Could be stale during propagation
}
Amazon DynamoDB uses eventual consistency by default. Google Spanner uses strong consistency. Most databases let you choose per-query.
Read Your Own Writes
Read-your-writes consistency (also called read-after-write consistency) guarantees that after a client writes a value, subsequent reads by that same client see that write.
This is intuitive for single-node systems but tricky in distributed systems:
// Without read-your-writes: user might not see their own post
await api.createPost({ title: "Hello World" });
const posts = await api.getMyPosts();
// posts might not include "Hello World" if the read hits a replica
// that has not yet received the write
// With read-your-writes: the read routes to the right place
await api.createPost({ title: "Hello World" });
const posts = await api.getMyPosts({ consistentRead: true });
// posts definitely includes "Hello World"
Most social media applications need read-your-writes for user-generated content. Users get confused when they post something and then do not see it.
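One practical implementation is write-pinning: after a session writes, route its reads to the primary for a short window longer than typical replication lag. A hedged sketch where `primary` and `replica` are hypothetical client objects and the 5-second window is an assumption to tune against observed lag:

```javascript
// Read-your-writes via write-pinning: after a write, this session's
// reads go to the primary until replication has very likely caught up.
const PIN_WINDOW_MS = 5000; // assumption: comfortably above replication lag

function makeSession(primary, replica) {
  let lastWriteAt = 0;
  return {
    async write(key, value) {
      lastWriteAt = Date.now();
      return primary.write(key, value);
    },
    async read(key) {
      // Recently wrote? Pin to the primary; otherwise the cheaper replica.
      const pinned = Date.now() - lastWriteAt < PIN_WINDOW_MS;
      return pinned ? primary.read(key) : replica.read(key);
    },
  };
}
```

A refinement is to record the primary's replication log position at write time and allow replica reads once the replica has replayed past that position, instead of relying on a fixed time window.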
Monotonic Reads
Monotonic reads guarantee that if a client reads a value V at time T, it will never read an older value at a later time. Reads never go backward in time.
This prevents a jarring experience where data appears to roll back:
// Without monotonic reads: timeline can go backward
await api.createComment({ text: "Great post!" });
const comments1 = await api.getComments(postId); // Includes "Great post!"
// ... some time passes, replication lag increases ...
const comments2 = await api.getComments(postId); // "Great post!" disappeared!
// With monotonic reads: once you see a value, you never unsee it
const comments = await api.getComments(postId, { monotonic: true });
// Once "Great post!" appears, it stays visible for this client session
Monotonic reads are important for any application where users see a chronological list of items. Without this guarantee, pagination can show duplicate items or skip items that were added while you were reading.
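Client-side, monotonic reads can be enforced by remembering the highest version seen so far and refusing to surface anything older. A sketch under the assumption that every read returns a monotonically increasing version number (`fetchFromReplica` is a hypothetical function, not a real driver API):

```javascript
// Monotonic reads enforced client-side: track the highest version this
// session has observed and never return anything older.
function makeMonotonicReader(fetchFromReplica) {
  let highestSeen = -1;
  let lastValue;
  return async function read(key) {
    const result = await fetchFromReplica(key); // -> { version, value }
    if (result.version >= highestSeen) {
      highestSeen = result.version;
      lastValue = result.value;
      return result.value;
    }
    // A stale replica answered: return the newest value already observed.
    // (A production client would instead retry against another replica.)
    return lastValue;
  };
}
```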
Monotonic Writes
Monotonic writes guarantee that writes from a client appear in the order they were issued. This sounds obvious, but in distributed systems, a client might issue two writes that arrive at different replicas in different orders.
// Without monotonic writes: writes can be applied out of order
await api.updateProfile({ name: "Alice" });
await api.updateProfile({ status: "Busy" });
// The replica receiving updates might apply status before name
// resulting in name being set but status showing previous value
// With monotonic writes: writes are serialized per client
await api.updateProfile({ name: "Alice" }, { serial: true });
await api.updateProfile({ status: "Busy" }, { serial: true });
// Updates are applied in order, regardless of network timing
This is less commonly discussed but matters for operations that build on each other.
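Client-side, monotonic writes are usually enforced by serializing the session's writes: each write starts only after the previous one is acknowledged (or carries a sequence number the server applies in order). A hedged sketch of the chaining approach, where `send` is a hypothetical function that ships one write to the store:

```javascript
// Monotonic writes via client-side promise chaining: each write queues
// behind the previous write from this session.
function makeSerialWriter(send) {
  let chain = Promise.resolve();
  return function write(update) {
    chain = chain.then(() => send(update)); // queue behind prior writes
    return chain;
  };
}

// Usage: even with variable per-write latency, updates apply in order
const applied = [];
const write = makeSerialWriter(async (u) => {
  // Simulate variable network latency per write
  await new Promise((res) => setTimeout(res, Math.random() * 10));
  applied.push(u);
});
write({ name: "Alice" });
write({ status: "Busy" }).then(() => {
  console.log(applied); // [ { name: 'Alice' }, { status: 'Busy' } ]
});
```

The trade-off is throughput: chained writes cannot be pipelined, which is why server-side per-client sequencing is preferred when write volume is high.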
Choosing a Consistency Model
Different parts of your application may need different consistency levels:
| Use Case | Recommended Model | Reason |
|---|---|---|
| User profile updates | Read-your-writes | Users expect to see their own changes |
| Social media feeds | Eventual + Monotonic reads | Speed matters, rollback is jarring |
| Financial transactions | Strong consistency | Cannot tolerate stale reads |
| Inventory counts | Strong consistency | Overselling costs money |
| IoT sensor data | Weak/Eventual | Historical accuracy vs real-time speed |
| Chat messages | Causal + Read-your-writes | Messages must appear in order |
The Availability Patterns post covers building systems that maintain availability while using strong consistency where needed.
Implementing Consistency Levels
Most modern databases let you choose consistency per query:
// MongoDB: read concern levels
db.collection.find({}, { readConcern: { level: "linearizable" } }); // Strong
db.collection.find({}, { readConcern: { level: "majority" } }); // Quorum
db.collection.find({}, { readConcern: { level: "local" } }); // Local/eventual
// Cassandra (Node.js driver): consistency levels are enum values
const { types } = require("cassandra-driver");
let result = await client.execute(query, params, { consistency: types.consistencies.quorum }); // Strong
result = await client.execute(query, params, { consistency: types.consistencies.one }); // Weaker
// DynamoDB: read modes
dynamodb.getItem({ Key: key, ConsistentRead: true }); // Strong
dynamodb.getItem({ Key: key, ConsistentRead: false }); // Eventual
Understanding these trade-offs lets you tune your database for your specific workload.
Conclusion
Consistency models exist on a spectrum. Strong consistency is simple to reason about but expensive. Weaker models are cheaper and more available but require more thought from application developers.
When to Use / When Not to Use
| Use Case | Recommended Model | Reason |
|---|---|---|
| User profile updates | Read-your-writes | Users expect to see their own changes |
| Social media feeds | Eventual + Monotonic reads | Speed matters, rollback is jarring |
| Financial transactions | Strong consistency | Cannot tolerate stale reads |
| Inventory counts | Strong consistency | Overselling costs money |
| IoT sensor data | Weak/Eventual | Historical accuracy vs real-time speed |
| Chat messages | Causal + Read-your-writes | Messages must appear in order |
| Gaming leaderboards | Eventual | Staleness acceptable, latency critical |
| Collaborative editing | Causal | Order matters, some staleness OK |
When TO Use Weaker Consistency
- User experience acceptable: When users will not notice or care about brief staleness
- High-volume reads: When throughput matters more than absolute freshness
- Geo-distributed reads: When users are far from primary and latency matters
- Non-critical data: Analytics, metrics, cached content where eventual is fine
When NOT to Use Weaker Consistency
- Financial operations: Money transfers, payment processing, balance inquiries
- Inventory operations: Any case where overselling has business impact
- Booking/reservations: Double-booking must be prevented
- Regulatory compliance: Operations requiring audit-trail ordering
Production Failure Scenarios
| Failure Scenario | Impact | Mitigation |
|---|---|---|
| Replica returns stale data after write | User sees old value after confirming write | Implement read-your-writes by routing reads to quorum or primary |
| Monotonic read violation | Data appears to roll back, breaking user experience | Use sticky sessions or read-your-writes guarantees |
| Causal ordering violation | Comment appears before post it references | Implement vector clocks for causal tracking |
| Conflict resolution produces wrong result | LWW picks wrong version in concurrent edits | Use CRDTs for automatically mergeable data |
| Replication lag spikes | Extended staleness window during high load | Monitor lag, alert on thresholds, scale replicas |
Conflict Resolution Strategies
When multiple replicas accept concurrent writes, conflicts must be resolved:
| Strategy | How It Works | Trade-offs |
|---|---|---|
| Last-Write-Wins (LWW) | Highest timestamp wins | Simple but can lose writes; depends on clock sync |
| First-Write-Wins | First write wins | Conservative but wasteful |
| CRDTs | Data structures that merge automatically | No conflicts by design; limited to certain types |
| Application Merge | App code decides how to combine | Flexible but requires custom logic |
| Manual Resolution | User decides | Best for critical conflicts; bad UX otherwise |
CRDT Example: G-Counter
// Grow-only counter - merges by taking max of each replica's value
class GCounter {
constructor(replicaId) {
this.replicaId = replicaId;
this.counts = {}; // { replicaId: value }
}
increment() {
this.counts[this.replicaId] = (this.counts[this.replicaId] || 0) + 1;
}
merge(other) {
for (const [id, value] of Object.entries(other.counts)) {
this.counts[id] = Math.max(this.counts[id] || 0, value);
}
}
value() {
return Object.values(this.counts).reduce((a, b) => a + b, 0);
}
}
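The PN-Counter listed in the comparison table further down builds directly on this: it pairs one grow-only counter for increments with another for decrements, and its value is the difference. A sketch (the G-Counter is repeated here so the block runs standalone):

```javascript
// G-Counter (same as above), repeated for self-containment
class GCounter {
  constructor(replicaId) {
    this.replicaId = replicaId;
    this.counts = {}; // { replicaId: value }
  }
  increment() {
    this.counts[this.replicaId] = (this.counts[this.replicaId] || 0) + 1;
  }
  merge(other) {
    for (const [id, value] of Object.entries(other.counts)) {
      this.counts[id] = Math.max(this.counts[id] || 0, value);
    }
  }
  value() {
    return Object.values(this.counts).reduce((a, b) => a + b, 0);
  }
}

// PN-Counter: increments and decrements as two grow-only counters
class PNCounter {
  constructor(replicaId) {
    this.p = new GCounter(replicaId); // increments
    this.n = new GCounter(replicaId); // decrements
  }
  increment() { this.p.increment(); }
  decrement() { this.n.increment(); }
  merge(other) {
    this.p.merge(other.p);
    this.n.merge(other.n);
  }
  value() { return this.p.value() - this.n.value(); }
}

const a = new PNCounter("r1");
const b = new PNCounter("r2");
a.increment(); a.increment(); // r1: +2
b.decrement(); // r2: -1
a.merge(b);
console.log(a.value()); // 1
```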
CRDT Example: LWW-Register (Last-Write-Wins)
LWW-Register stores a value and a timestamp. On merge, the write with the higher timestamp wins:
// Last-Write-Wins Register - simplest conflict-free register
class LWWRegister {
constructor(replicaId) {
this.replicaId = replicaId;
this.value = null;
this.timestamp = 0;
}
set(value) {
// Use hybrid logical clock or wall clock with care
this.timestamp = Date.now();
this.value = value;
}
merge(other) {
if (other.timestamp > this.timestamp) {
this.value = other.value;
this.timestamp = other.timestamp;
}
// If our timestamp is newer or equal, keep our value
}
get() {
return this.value;
}
}
// Usage: inventory items, user preferences, configuration
const inventory = new LWWRegister("replica-1");
inventory.set({ item: "widget", count: 10 });
// Concurrent write from another replica with a later timestamp wins
inventory.merge({ value: { item: "widget", count: 15 }, timestamp: inventory.timestamp + 1 });
console.log(inventory.get()); // { item: 'widget', count: 15 }
CRDT Example: OR-Set (Observed-Remove Set)
OR-Set allows adding and removing elements where add-wins-over-remove for concurrent operations:
// Simplified add-wins set (OR-Set-style): on concurrent add and remove of
// the same element, the add wins. A full OR-Set tracks a unique tag per
// add operation; this sketch keeps a single entry per element for brevity.
class ORSet {
constructor(replicaId) {
this.replicaId = replicaId;
this.items = new Map(); // element -> { addedBy, addTag, removed }
}
add(element) {
const tag = Date.now() + Math.random();
this.items.set(element, {
addedBy: this.replicaId,
addTag: tag,
removed: false,
});
}
remove(element) {
// Mark as removed but don't delete - we need to track for merging
const entry = this.items.get(element);
if (entry) {
entry.removed = true;
}
}
merge(other) {
for (const [element, entry] of other.items) {
const ourEntry = this.items.get(element);
if (!ourEntry) {
// Element doesn't exist locally, add it
this.items.set(element, { ...entry });
} else if (entry.addTag > ourEntry.addTag) {
// Other replica added this element more recently
this.items.set(element, { ...entry });
} else if (entry.removed && !ourEntry.removed) {
// Other replica removed it, but we haven't - add wins
// Keep the element
ourEntry.removed = false;
}
// If both removed, stays removed (no-op)
}
}
contains(element) {
const entry = this.items.get(element);
return entry && !entry.removed;
}
}
// Usage: shopping cart items, collaborative document elements
const cart = new ORSet("user-1");
cart.add("item-A");
cart.add("item-B");
// Another device concurrently removed item-B (older tag) and added item-C
cart.merge({
items: new Map([
["item-B", { addedBy: "user-2", addTag: 100, removed: true }],
["item-C", { addedBy: "user-2", addTag: 101, removed: false }],
]),
});
console.log(cart.contains("item-A")); // true
console.log(cart.contains("item-B")); // true (the concurrent add wins over the remove)
console.log(cart.contains("item-C")); // true
CRDT Comparison Table
| CRDT Type | Operations | Merge Strategy | Use Cases |
|---|---|---|---|
| G-Counter | Increment only | Take max per replica | Vote counts, page views |
| PN-Counter | Increment + Decrement | Add positives, subtract negatives | Account balances |
| LWW-Register | Set value | Higher timestamp wins | Configuration, preferences |
| OR-Set | Add + Remove | Add wins over concurrent remove | Shopping carts, collaborative editing |
| RGA | Add + Remove (ordered) | Ordered insertion with deterministic tie-breaking | Chat messages, collaborative text |
Read Repair vs Anti-Entropy Mechanisms
Distributed databases use two complementary mechanisms to maintain consistency: read repair and anti-entropy. Understanding the difference helps you diagnose consistency issues in production.
Read Repair (Reactive)
Read repair happens during read operations. When replicas disagree on a read, the system writes the freshest value back to the stale replicas it contacted.
// Conceptual read repair flow
async function readWithRepair(key) {
// Read from multiple replicas
const [replica1, replica2, replica3] = await Promise.all([
readReplica("replica1", key),
readReplica("replica2", key),
readReplica("replica3", key),
]);
// Check for divergence
const versions = [replica1, replica2, replica3];
const latest = findMostRecent(versions);
// If any replica is stale, repair it
for (const replica of versions) {
if (replica.version < latest.version) {
// Asynchronously push the correct value
repairReplica(replica, latest);
}
}
return latest.value;
}
Characteristics:
- Triggered by: Read operations
- Scope: Only repairs replicas that were contacted during the read
- Latency impact: Minimal (async repair)
- Coverage: Limited - replicas not read may remain stale
Anti-Entropy (Proactive)
Anti-entropy is a background process that continuously compares replicas and repairs any divergence, regardless of whether reads have occurred.
// Conceptual anti-entropy using Merkle trees
class AntiEntropyService {
constructor(replicas) {
this.replicas = replicas;
this.merkleTrees = new Map();
}
// Build Merkle tree for a replica's data range
buildMerkleTree(replica, range) {
const tree = new MerkleTree();
for (const [key, value] of this.getRange(replica, range)) {
tree.insert(hash(key + value.version));
}
return tree;
}
// Compare trees to find divergent keys
async compareReplicas(replicaA, replicaB) {
const treeA = this.buildMerkleTree(replicaA, "all");
const treeB = this.buildMerkleTree(replicaB, "all");
// Find nodes that differ
const diff = treeA.diff(treeB);
// For each divergent node, traverse down to find actual keys
const divergentKeys = [];
for (const node of diff) {
if (node.isLeaf()) {
divergentKeys.push(node.key);
} else {
// Fetch children for deeper comparison
const children = await this.fetchChildren(replicaA, node);
divergentKeys.push(...children);
}
}
// Sync divergent keys
await this.syncKeys(replicaA, replicaB, divergentKeys);
}
}
Characteristics:
- Triggered by: Background scheduled process (e.g., Cassandra nodetool repair)
- Scope: All replicas, regardless of recent reads
- Latency impact: Can be significant during repair operations
- Coverage: Comprehensive - finds all divergence
Comparison Table
| Aspect | Read Repair | Anti-Entropy |
|---|---|---|
| Trigger | On-read | Background/scheduled |
| Scope | Only read replicas | All replicas |
| Staleness window | Until next read of a stale replica | Bounded by the repair interval |
| Resource cost | Low (per-read) | High (full tree comparison) |
| Failure detection | Detects on read | Detects all divergence |
| Example | Cassandra, DynamoDB | Cassandra nodetool repair, Riak |
When Each Mechanism Matters
| Scenario | Read Repair Sufficient? | Need Anti-Entropy? |
|---|---|---|
| Low read frequency | No - replicas may go stale for long periods | Yes |
| High read frequency | Yes - repairs happen frequently | No |
| Write-heavy workloads | No - stale replicas accumulate | Yes |
| Read-heavy workloads | Yes - continuous reads repair frequently | Optional |
| Compliance/audit requirements | No - need guaranteed convergence | Yes |
Practical note: Most production systems use both. Cassandra uses read-repair for immediate repairs during reads and nodetool repair (anti-entropy) as a periodic background job to ensure all replicas converge.
Capacity Estimation
Consistency Level Throughput
The figures below are illustrative, assuming a replication factor of 3; measure your own workload before relying on them:
| Consistency Level | Relative Throughput | Latency Multiplier |
|---|---|---|
| ONE | 100% (baseline) | 1x |
| QUORUM | ~33% | 3x |
| ALL | ~10% | 10x |
| LOCAL_QUORUM | ~50% | 2x |
Staleness Window Calculation
Expected staleness = Replication lag during normal operation
Maximum staleness = Time to detect failure + Time to recover/re-route
For Cassandra with QUORUM:
- Normal lag: 10-50ms
- Failure detection: 5-30s (gossip protocol)
- Request routing: 1-5s (retry with higher consistency)
Worst case staleness: 30s+ during failure scenarios
Observability Checklist
Metrics to Capture
- read_staleness_seconds (histogram) - Time between a write and a read reflecting it
- consistency_level_used (counter) - Breakdown of requests by consistency level
- conflict_resolution_duration_ms (histogram) - Time spent resolving conflicts
- monotonic_violations_total (counter) - Reads that went backward in time
- replication_lag_seconds (gauge) - Per replica, per datacenter
Testing Eventual Consistency
// Jepsen-style convergence test for eventual consistency
async function testEventualConsistency(clients, writeKey, writeValue) {
const start = Date.now();
// Write through one client, then poll until every client sees the value
await clients[0].write(writeKey, writeValue);
const deadline = start + 30000; // 30 second timeout
let results = [];
while (Date.now() < deadline) {
results = await Promise.all(clients.map((c) => c.read(writeKey)));
if (results.every((r) => r === writeValue)) {
return { success: true, timeMs: Date.now() - start };
}
await sleep(100);
}
return { success: false, results };
}
Alerts to Configure
| Alert | Threshold | Severity |
|---|---|---|
| Staleness > 1s | 1000ms | Warning |
| Staleness > 10s | 10000ms | Critical |
| Monotonic violations | > 0 in 5 minutes | Critical |
| Conflict rate > 1% | 1% of writes | Warning |
Security Checklist
- Authentication required for all replica communication
- TLS encryption for all inter-node replication traffic
- Audit logging of consistency level changes
- Network policies restricting replica access
- Encryption at rest for all replica data
- Client credentials rotated periodically
Common Pitfalls / Anti-Patterns
Pitfall 1: Assuming “Eventually Consistent” Means “Quickly Consistent”
Problem: Teams assume eventual consistency means consistent within seconds when it can take minutes or longer during high load or failures.
Solution: Measure your actual staleness distribution under various conditions. Do not make UX assumptions without data. Consider bounded staleness models if users need timeliness guarantees.
Pitfall 2: Ignoring Monotonic Read Violations
Problem: When reads route to different replicas, users can see data “go backward” - seeing a value, then an older value. This is deeply confusing even if technically “eventually consistent.”
Solution: Use sticky sessions or read-your-writes guarantees for user-facing applications. Route reads to the same replica within a session.
Pitfall 3: Using Vector Clocks Without Understanding Them
Problem: Implementing vector clocks for causal consistency seems simple, but the memory and bandwidth overhead grows with replica count and history length.
Solution: Most databases handle this internally (DynamoDB, Cassandra). Only implement custom vector clocks if you understand the trade-offs and have measured the overhead.
Pitfall 4: Testing Consistency at Wrong Layer
Problem: Testing consistency only at the unit test level misses distributed race conditions that only appear under concurrent load.
Solution: Use chaos testing frameworks (Jepsen, Chaos Monkey) to inject failures and verify consistency guarantees hold. Test your application, not just the database.
Real-World Case Studies
Amazon DynamoDB: Eventual Consistency at Scale
DynamoDB launched with eventual consistency as the default reading model. It was a deliberate engineering trade-off — strong consistency required quorum reads that doubled latency and tripled cost. The team knew most workloads could tolerate brief staleness.
The Dynamo paper (2007) describes the design that made this possible: configurable N, R, and W values let each service tune the consistency/latency trade-off. A classic failure mode in such systems: a burst of traffic to a hot key causes replication lag to spike, and reads return stale data for minutes rather than milliseconds. The usual mitigation is straightforward: rate-limit the hot key and let anti-entropy catch up. Failure modes like this show that the bounded vs unbounded distinction matters even at Amazon's scale.
DynamoDB now offers both eventual and strong consistency options per read. The eventual model still handles the majority of requests — it’s faster and cheaper.
Google Spanner: Strong Consistency Across Geo-Distributed Nodes
Spanner achieves strong consistency across globally distributed nodes using TrueTime, a clock service backed by GPS and atomic clocks that bounds clock uncertainty to a few milliseconds. Writes go through Paxos consensus, and reads are served at TrueTime-assigned timestamps, either through the Paxos leader or as consistent snapshot reads from a sufficiently caught-up replica.
The operational cost is real: Spanner writes take 10-40ms in the same region, 100-200ms cross-region. This is why Spanner is expensive. The trade-off is deliberate — if you need globally consistent data with availability guarantees, you pay in latency and cost.
A known failure mode: Spanner's leader lease mechanism means that if a datacenter loses power and a Paxos leader goes down, no writes to its ranges are accepted until the lease expires (leases last roughly 10 seconds). During this window, those ranges are unavailable for writes.
Cassandra: Tunable Consistency Meets the 2018 Outage
Cassandra’s tunable consistency model lets you choose per-query — ONE, QUORUM, ALL. The problem: developers often chose ONE for speed, and did not understand what that meant for durability.
In October 2018, a large Cassandra deployment experienced data loss after a node failure during a configuration change, rooted in how failure detection and repair interacted with consistency level ONE for writes. With replication factor 3 and consistency level ONE, a write is acknowledged as soon as a single replica accepts it; if that node is lost before the other replicas catch up, writes only it had acknowledged are silently dropped.
The fix was not purely technical — it required updating operational runbooks to mandate QUORUM for all writes in production, and adding tooling to enforce this automatically.
Interview Questions
Q: A user posts a message to a chat application and immediately refreshes to see it, but it is not there. What consistency model is violated, and how would you fix it?
A: This is a read-your-writes consistency violation. The write went to one replica but the subsequent read hit a different replica that had not yet received the write. Fixes include routing reads to the primary or last-written replica for the session, using sticky sessions, or issuing reads with a higher consistency level (QUORUM) after a write.
Q: You are designing a shopping cart for an e-commerce site. Which consistency model do you choose for cart operations, and why?
A: The cart needs strong consistency or at minimum read-your-writes. If a user adds an item and it disappears, they lose a sale. However, price and inventory checks can use eventual consistency — it is acceptable if the displayed price is slightly stale, as long as the checkout process re-verifies. Session consistency with QUORUM reads works well here.
Q: Explain the difference between read-your-writes and monotonic reads.
A: Read-your-writes guarantees that after writing value V, subsequent reads by the same session see V or a more recent value. Monotonic reads guarantee that if a session sees value V at time T, it will never see an older value at time T’>T. Read-your-writes is about your own writes; monotonic reads is about not going backward in time for any reason.
Q: A financial service processes payments. What consistency level would you use for the debit and credit operations?
A: Strong consistency (linearizable) — using QUORUM or ALL consistency levels, or distributed transactions with two-phase commit. The cost is latency and throughput, but for financial operations where incorrect balances cause real harm, this is non-negotiable. Many systems use a separate strongly-consistent ledger for the actual money movement and use eventual consistency only for display/read purposes.
Q: PACELC says that in the presence of a partition, you choose between consistency and availability. But even without partitions, consistency has a cost. Explain using PACELC.
A: PACELC states: if there is a partition (P), you choose between availability (A) and consistency (C); else (E), even without partitions, latency (L) and consistency (C) are in trade-off. Strong consistency requires coordination — writes must go through a consensus protocol or quorum, which adds round trips. In a geo-distributed setting, a strongly consistent write might cross continents. An eventually consistent write can commit locally and replicate asynchronously, cutting latency significantly. The trade-off is that you may read stale data while replication is in flight.
Quick Recap
- Consistency models exist on a spectrum from strong to weak.
- Read-your-writes matters for user-facing applications - always verify writes are visible.
- Monotonic reads prevent jarring rollback experiences in pagination and feeds.
- Causal consistency covers most real-world scenarios efficiently.
- Conflict resolution (LWW, CRDTs, app-merge) must match your data’s semantics.
- Most databases let you configure consistency per query - use this flexibility.
Copy/Paste Checklist
- [ ] Audit application to identify consistency requirements per operation
- [ ] Use strong consistency (QUORUM, LINEARIZABLE) for financial/inventory ops
- [ ] Implement read-your-writes for user-generated content
- [ ] Use sticky sessions to ensure monotonic reads
- [ ] Choose conflict resolution strategy: LWW, CRDT, or application merge
- [ ] Monitor replication lag and staleness metrics
- [ ] Test consistency guarantees under failure injection
- [ ] Document consistency requirements in ADR (Architecture Decision Records)
For latency trade-offs, see the PACELC theorem post. For building highly available systems with consistency guarantees, see Availability Patterns.