Consistency Models in Distributed Systems: A Complete Guide

Learn about strong, weak, eventual, and causal consistency models. Understand read-your-writes, monotonic reads, and how to choose the right model for your system.


Strong consistency sounds simple: reads always return the most recent write. In distributed systems, though, strong consistency is expensive. The CAP theorem forces us to think about consistency trade-offs, and the PACELC theorem reminds us that even without partitions, consistency has a latency cost.

Knowing where different consistency models apply helps you make better architectural decisions. Most systems do not need strong consistency everywhere. Finding where you can relax consistency guarantees can dramatically improve performance.

This post builds on the CAP theorem and PACELC theorem discussions.


The Consistency Spectrum

Think of consistency as a dial you can turn, not a binary switch:

graph LR
    A[Strong<br/>Consistency] --> B[Sequential<br/>Consistency]
    B --> C[Causal<br/>Consistency]
    C --> D[Eventual<br/>Consistency]
    D --> E[Weak<br/>Consistency]

Strong consistency at one end, weak at the other. Most practical systems sit somewhere in between.


Strong Consistency

Strong consistency guarantees that any read sees all preceding writes. The system appears to have a single copy of the data. This is what most developers assume when they think about database transactions.

Bank account balances are the standard example:

// Strong consistency: read after write always returns the new value
async function transfer(from, to, amount) {
  const account = await db.account.findOne({ id: from });
  if (account.balance >= amount) {
    await db.account.update(
      { id: from },
      { balance: account.balance - amount },
    );
    await db.account.update({ id: to }, { balance: account.balance + amount });
    return { success: true };
  }
  return { success: false, reason: "Insufficient funds" };
}

// After this completes, any read of 'from' balance shows the new value
const balance = await db.account.findOne({ id: from });
// balance is definitely reduced, even if you read from a different replica

Strong consistency uses consensus protocols like Paxos or Raft to coordinate reads and writes across nodes. This coordination adds latency. In a geo-distributed system, a strongly consistent write might take 100-300ms. An eventually consistent write might take 5ms.


Sequential Consistency

Sequential consistency sits between strong and eventual. It guarantees that all nodes see operations in the same order, but that order does not have to match real time.

Imagine two processes writing to the same variable:

// Sequential consistency: both replicas see writes in the same order
// but that order might not match real-world time

// Process A writes x = 1
// Process B writes x = 2
// Process C reads x

// Valid under sequential consistency: C might read 1 or 2
// Invalid: C reading x = 0 (the initial value) after both writes completed

With sequential consistency, you cannot have time travel paradoxes where a later write appears to happen before an earlier write. But you also cannot assume that a read reflects all writes that finished before it in real time.


Causal Consistency

Causal consistency is weaker. It only guarantees that causally related operations appear in order across all nodes. Operations with no causal relationship can be observed in different orders by different nodes.

Causally related means: if operation A causes operation B to happen, then A must be visible before B everywhere. Example:

// Causal consistency example
// Post 1: "I got promoted" (write by user)
// Post 2: "Celebrating tonight" (comment on post 1)

// Every node sees post 2 only after seeing post 1
// Because post 2 is causally dependent on post 1

// But "I got promoted" and "Weather is nice today" from the same user
// Are not causally related - they can appear in any order

Causal consistency is attractive because it is the strongest model a system can provide while remaining available during network partitions. The Dynamo paper popularized eventually consistent designs that attach causal metadata (vector clocks) to each value.


Eventual Consistency

Eventual consistency makes the weakest guarantee: if no new updates are made to an object, eventually all reads will return the last written value.

The word “eventually” is doing a lot of work here. It could mean milliseconds or hours. Different systems give different guarantees.

Bounded vs Unbounded Eventual Consistency

An important distinction often overlooked: bounded vs unbounded eventual consistency.

| Type | Definition | Real-World Examples |
|---|---|---|
| Bounded | Convergence happens within a known, finite time | DynamoDB (typically seconds), Cassandra (usually under 1 second) |
| Unbounded | Convergence will happen, but no time bound is guaranteed | DNS propagation (minutes to hours), CDN cache expiry (hours to days) |

Why this matters:

// Bounded eventual consistency - you can reason about staleness
// DynamoDB with DynamoDB Streams: staleness typically < 5 seconds
async function readWithStalenessBound(key) {
  const item = await dynamodb.getItem({ Key: key }).promise();
  // You know the item is at most 5 seconds stale (bounded)
  return item;
}

// Unbounded eventual consistency - no promises about when
// Traditional DNS: TTL might be 24 hours, but convergence is not guaranteed
async function resolveWithDNS(hostname) {
  const addresses = await dns.lookup(hostname);
  // Could return old IP for hours; no bounded guarantee
  return addresses;
}

Convergence time factors:

| Factor | Impact on Convergence Time |
|---|---|
| Network distance | Geo-distributed replicas add 50-200ms per hop |
| Load average | High load increases replication lag |
| Failure recovery | Node recovery triggers anti-entropy, which can take minutes |
| Quorum availability | If quorum is impaired, writes may stall |

For bounded eventual consistency, typical convergence times:

  • Same-region replicas: 10-100ms
  • Cross-region replicas: 100ms - 2s
  • Under failure conditions: can extend to 30s or more

For unbounded eventual consistency:

  • DNS: minutes to 48+ hours (depending on TTL and propagation)
  • CDN: hours to days
  • Some gossip protocols: theoretically unbounded, but in practice convergence takes O(log N) rounds

Session Consistency and PRAM

Beyond the basic models, there are practical consistency models designed for specific use cases.

Session Consistency

Session consistency guarantees that within a single client session, certain properties hold. This is not a single model but a family of guarantees:

| Session Guarantee | Definition | Implementation |
|---|---|---|
| Read-your-writes | Session sees its own writes | Route reads to the primary or the replica that took the write |
| Monotonic reads | Session never sees older values | Sticky sessions or version-aware read routing |
| Monotonic writes | Session’s writes are applied in order | Per-client write sequencing |
| Write-follows-reads | A write is ordered after the reads that preceded it | Causal ordering tracking |

DynamoDB read-your-writes example:

// DynamoDB: read-your-writes via strongly consistent reads.
// (Note: the AWS SDK's sessionToken is an STS credential field and has
// nothing to do with consistency; ConsistentRead provides the guarantee.)

const docClient = new AWS.DynamoDB.DocumentClient({ region: "us-east-1" });

// With ConsistentRead: true, the read reflects all prior successful writes
const item = await docClient
  .get({
    TableName: "Posts",
    Key: { userId, postId },
    ConsistentRead: true, // Strongly consistent read
  })
  .promise();

PRAM (Pipelined Random Access Memory) Consistency

PRAM consistency is a weak model that guarantees:

Writes from a single process are observed by all processes in the order they were issued. Writes from different processes may be observed in different orders.

In simpler terms: your own writes appear in order to everyone; other people’s writes may appear in different orders to different observers.

// PRAM consistency example
// Process A writes x=1, then x=2
// Process B writes y=1, then y=2

// All processes see:
// - A's writes in order: x=1 then x=2
// - B's writes in order: y=1 then y=2
// But some might see B's writes interleaved with A's differently

// Valid PRAM ordering: x=1, x=2, y=1, y=2
// Also valid: y=1, y=2, x=1, x=2
// Also valid: x=1, y=1, x=2, y=2
// NOT valid: x=2, x=1 (A's writes out of order)

PRAM vs Causal consistency:

  • PRAM: only each process’s own write order is preserved
  • Causal: all causally related writes are ordered, even across processes

PRAM is relatively easy to implement efficiently and provides useful guarantees for some workloads.


Back to eventual consistency: at the code level, the write path shows where the latency win comes from.

// Eventual consistency: write returns immediately
async function writeEventual(key, value) {
  // Write to local replica, return immediately
  await localCache.set(key, value);
  return { success: true, version: ++localVersion };
}

// Later, background replication propagates the write
// During the propagation window, reads might return stale values
async function read(key) {
  return await localCache.get(key); // Could be stale during propagation
}

Amazon DynamoDB uses eventual consistency by default. Google Spanner uses strong consistency. Most databases let you choose per-query.


Read Your Own Writes

Read-your-writes consistency (also called read-after-write consistency) guarantees that after a client writes a value, subsequent reads by that same client see that write.

This is intuitive for single-node systems but tricky in distributed systems:

// Without read-your-writes: user might not see their own post
await api.createPost({ title: "Hello World" });
const posts = await api.getMyPosts();
// posts might not include "Hello World" if the read hits a replica
// that has not yet received the write

// With read-your-writes: the read routes to the right place
await api.createPost({ title: "Hello World" });
const posts = await api.getMyPosts({ consistentRead: true });
// posts definitely includes "Hello World"

Most social media applications need read-your-writes for user-generated content. Users get confused when they post something and then do not see it.
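One common implementation is client-side routing: for a short window after a client writes a key, send that client's reads for that key to the primary instead of a replica. A minimal sketch, assuming hypothetical `primary` and `replica` store objects with async `get`/`set`:

```javascript
// Sketch: read-your-writes via routing. For a window after a client's own
// write, its reads for that key go to the primary; otherwise the (possibly
// stale) replica is fine. `primary`/`replica` are illustrative stores.
class ReadYourWritesClient {
  constructor(primary, replica, windowMs = 5000) {
    this.primary = primary;
    this.replica = replica;
    this.windowMs = windowMs;
    this.lastWriteAt = new Map(); // key -> timestamp of this client's last write
  }

  async write(key, value) {
    await this.primary.set(key, value);
    this.lastWriteAt.set(key, Date.now());
  }

  async read(key) {
    const wroteAt = this.lastWriteAt.get(key);
    // Recently wrote this key: the replica may not have it yet,
    // so read from the primary to guarantee read-your-writes.
    if (wroteAt !== undefined && Date.now() - wroteAt < this.windowMs) {
      return this.primary.get(key);
    }
    return this.replica.get(key); // stale reads acceptable otherwise
  }
}
```

A time window is a crude bound; a more precise variant records the written version and asks the replica whether it has caught up to that version before trusting it.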


Monotonic Reads

Monotonic reads guarantee that if a client reads a value V at time T, it will never read an older value at a later time. Reads never go backward in time.

This prevents a jarring experience where data appears to roll back:

// Without monotonic reads: timeline can go backward
await api.createComment({ text: "Great post!" });
const comments1 = await api.getComments(postId); // Includes "Great post!"
// ... some time passes, replication lag increases ...
const comments2 = await api.getComments(postId); // "Great post!" disappeared!

// With monotonic reads: once you see a value, you never unsee it
const comments = await api.getComments(postId, { monotonic: true });
// Once "Great post!" appears, it stays visible for this client session

Monotonic reads are important for any application where users see a chronological list of items. Without this guarantee, pagination can show duplicate items or skip items that were added while you were reading.
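Monotonic reads can also be enforced client-side, by remembering the highest version seen per key and refusing to surface anything older. A sketch, assuming each hypothetical replica returns `{ value, version }` objects:

```javascript
// Sketch: client-side monotonic reads. Track the highest version shown per
// key; if a replica answers with something older, try the next replica.
// `replicas` is an illustrative list of stores returning { value, version }.
class MonotonicReader {
  constructor(replicas) {
    this.replicas = replicas;
    this.highestSeen = new Map(); // key -> highest version already shown
  }

  async read(key) {
    const floor = this.highestSeen.get(key) ?? -1;
    for (const replica of this.replicas) {
      const result = await replica.get(key);
      if (result && result.version >= floor) {
        this.highestSeen.set(key, result.version);
        return result.value; // never older than anything shown before
      }
      // This replica is behind what we've already shown; try the next one.
    }
    throw new Error(`no replica has version >= ${floor} for ${key}`);
  }
}
```

In practice, sticky sessions achieve the same effect more cheaply by pinning a session to one replica, at the cost of uneven load when sessions are long-lived.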


Monotonic Writes

Monotonic writes guarantee that writes from a client appear in the order they were issued. This sounds obvious, but in distributed systems, a client might issue two writes that arrive at different replicas in different orders.

// Without monotonic writes: writes can be applied out of order
await api.updateProfile({ name: "Alice" });
await api.updateProfile({ status: "Busy" });
// A replica might apply the status update before the name update;
// if each update replaces the whole profile document, the later-applied
// name update silently wipes out the status change

// With monotonic writes: writes are serialized per client
await api.updateProfile({ name: "Alice" }, { serial: true });
await api.updateProfile({ status: "Busy" }, { serial: true });
// Updates are applied in order, regardless of network timing

This is less commonly discussed but matters for operations that build on each other.
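A common implementation is per-client sequence numbers: the client stamps each write with an increasing number, and a replica applies a client's writes strictly in stamp order, buffering any that arrive early. A sketch with illustrative names:

```javascript
// Sketch: per-client write sequencing on the replica side. Writes that
// arrive ahead of their turn are buffered until the gap fills, so a
// client's updates always apply in the order they were issued.
class SequencedReplica {
  constructor() {
    this.applied = new Map(); // clientId -> next expected sequence number
    this.pending = new Map(); // clientId -> Map(seq -> update)
    this.state = {};
  }

  receive(clientId, seq, update) {
    const buffer = this.pending.get(clientId) ?? new Map();
    buffer.set(seq, update);
    this.pending.set(clientId, buffer);

    // Drain the buffer in sequence order; stop at the first gap.
    let next = this.applied.get(clientId) ?? 0;
    while (buffer.has(next)) {
      Object.assign(this.state, buffer.get(next));
      buffer.delete(next);
      next += 1;
    }
    this.applied.set(clientId, next);
  }
}
```

Note this orders each client's writes only relative to its own earlier writes; interleaving across clients is still unconstrained, which is exactly the PRAM guarantee described earlier.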


Choosing a Consistency Model

Different parts of your application may need different consistency levels:

| Use Case | Recommended Model | Reason |
|---|---|---|
| User profile updates | Read-your-writes | Users expect to see their own changes |
| Social media feeds | Eventual + monotonic reads | Speed matters, rollback is jarring |
| Financial transactions | Strong consistency | Cannot tolerate stale reads |
| Inventory counts | Strong consistency | Overselling costs money |
| IoT sensor data | Weak/eventual | Real-time speed outweighs historical accuracy |
| Chat messages | Causal + read-your-writes | Messages must appear in order |

The Availability Patterns post covers building systems that maintain availability while using strong consistency where needed.


Implementing Consistency Levels

Most modern databases let you choose consistency per query:

// MongoDB (Node driver): read concern levels
db.collection.find({}, { readConcern: { level: "linearizable" } }); // Strong
db.collection.find({}, { readConcern: { level: "majority" } }); // Quorum
db.collection.find({}, { readConcern: { level: "local" } }); // Local/eventual

// Cassandra (cassandra-driver): consistency levels
const { consistencies } = cassandra.types;
const strongRead = await client.execute(query, params, { consistency: consistencies.quorum });
const fastRead = await client.execute(query, params, { consistency: consistencies.one });

// DynamoDB: read modes
dynamodb.getItem({ Key: key, ConsistentRead: true }); // Strong
dynamodb.getItem({ Key: key, ConsistentRead: false }); // Eventual
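The QUORUM settings above work because of a simple intersection rule: with N replicas, any read quorum of size R must overlap any write quorum of size W whenever R + W > N, so at least one replica contacted by the read holds the latest acknowledged write. A sketch of the rule:

```javascript
// Quorum intersection rule: a read is guaranteed to see the latest
// acknowledged write whenever R + W > N, because any R replicas must
// share at least one member with any W replicas.
function isStronglyConsistent(n, r, w) {
  return r + w > n;
}

// Typical configurations for N = 3 replicas:
isStronglyConsistent(3, 2, 2); // QUORUM reads + QUORUM writes -> true
isStronglyConsistent(3, 1, 3); // ONE read + ALL writes -> true
isStronglyConsistent(3, 1, 1); // ONE + ONE -> false (eventual)
```

This is also why lowering the write level to ONE silently downgrades reads: QUORUM reads only stay strong if writes keep the quorum side of the inequality.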

Understanding these trade-offs lets you tune your database for your specific workload.


Conclusion

Consistency models exist on a spectrum. Strong consistency is simple to reason about but expensive. Weaker models are cheaper and more available but require more thought from application developers.


When to Use / When Not to Use

| Use Case | Recommended Model | Reason |
|---|---|---|
| User profile updates | Read-your-writes | Users expect to see their own changes |
| Social media feeds | Eventual + monotonic reads | Speed matters, rollback is jarring |
| Financial transactions | Strong consistency | Cannot tolerate stale reads |
| Inventory counts | Strong consistency | Overselling costs money |
| IoT sensor data | Weak/eventual | Real-time speed outweighs historical accuracy |
| Chat messages | Causal + read-your-writes | Messages must appear in order |
| Gaming leaderboards | Eventual | Staleness acceptable, latency critical |
| Collaborative editing | Causal | Order matters, some staleness OK |

When TO Use Weaker Consistency

  • User experience acceptable: When users will not notice or care about brief staleness
  • High-volume reads: When throughput matters more than absolute freshness
  • Geo-distributed reads: When users are far from primary and latency matters
  • Non-critical data: Analytics, metrics, cached content where eventual is fine

When NOT to Use Weaker Consistency

  • Financial operations: Money transfers, payment processing, balance inquiries
  • Inventory operations: Any case where overselling has business impact
  • Booking/reservations: Double-booking must be prevented
  • Regulatory compliance: Operations requiring audit-trail ordering

Production Failure Scenarios

| Failure Scenario | Impact | Mitigation |
|---|---|---|
| Replica returns stale data after write | User sees old value after confirming write | Implement read-your-writes by routing reads to quorum or primary |
| Monotonic read violation | Data appears to roll back, breaking user experience | Use sticky sessions or read-your-writes guarantees |
| Causal ordering violation | Comment appears before the post it references | Implement vector clocks for causal tracking |
| Conflict resolution produces wrong result | LWW picks wrong version in concurrent edits | Use CRDTs for automatically mergeable data |
| Replication lag spikes | Extended staleness window during high load | Monitor lag, alert on thresholds, scale replicas |

Conflict Resolution Strategies

When multiple replicas accept concurrent writes, conflicts must be resolved:

| Strategy | How It Works | Trade-offs |
|---|---|---|
| Last-Write-Wins (LWW) | Highest timestamp wins | Simple but can lose writes; depends on clock sync |
| First-Write-Wins | First write wins | Conservative but wasteful |
| CRDTs | Data structures that merge automatically | No conflicts by design; limited to certain types |
| Application Merge | App code decides how to combine | Flexible but requires custom logic |
| Manual Resolution | User decides | Best for critical conflicts; bad UX otherwise |

CRDT Example: G-Counter

// Grow-only counter - merges by taking max of each replica's value
class GCounter {
  constructor(replicaId) {
    this.replicaId = replicaId;
    this.counts = {}; // { replicaId: value }
  }

  increment() {
    this.counts[this.replicaId] = (this.counts[this.replicaId] || 0) + 1;
  }

  merge(other) {
    for (const [id, value] of Object.entries(other.counts)) {
      this.counts[id] = Math.max(this.counts[id] || 0, value);
    }
  }

  value() {
    return Object.values(this.counts).reduce((a, b) => a + b, 0);
  }
}

CRDT Example: LWW-Register (Last-Write-Wins)

LWW-Register stores a value and a timestamp. On merge, the write with the higher timestamp wins:

// Last-Write-Wins Register - simplest conflict-free register
class LWWRegister {
  constructor(replicaId) {
    this.replicaId = replicaId;
    this.value = null;
    this.timestamp = 0;
  }

  set(value) {
    // Use hybrid logical clock or wall clock with care
    this.timestamp = Date.now();
    this.value = value;
  }

  merge(other) {
    if (other.timestamp > this.timestamp) {
      this.value = other.value;
      this.timestamp = other.timestamp;
    }
    // If our timestamp is newer or equal, keep our value
  }

  get() {
    return this.value;
  }
}

// Usage: inventory items, user preferences, configuration
const inventory = new LWWRegister("replica-1");
inventory.set({ item: "widget", count: 10 });
// Concurrent write from another replica with timestamp 1001
inventory.merge({ value: { item: "widget", count: 15 }, timestamp: 1001 });
console.log(inventory.get()); // { item: 'widget', count: 15 }
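The `Date.now()` call above is the weak point: a replica with a fast clock wins every conflict. A hybrid logical clock (HLC) combines wall time with a logical counter so timestamps keep advancing even when physical clocks stall or jump backward. A minimal sketch, simplified from the HLC algorithm and not production-ready:

```javascript
// Minimal hybrid logical clock sketch: timestamps are (wallMs, counter)
// pairs that always move forward, even if the local wall clock stalls or
// jumps backward. The `now` function is injectable for testing.
class HybridLogicalClock {
  constructor(now = Date.now) {
    this.now = now;
    this.wallMs = 0;
    this.counter = 0;
  }

  // Timestamp for a local event, e.g. an LWW set().
  tick() {
    const physical = this.now();
    if (physical > this.wallMs) {
      this.wallMs = physical;
      this.counter = 0;
    } else {
      this.counter += 1; // clock stalled or went backward: bump the counter
    }
    return { wallMs: this.wallMs, counter: this.counter };
  }

  // Advance past a timestamp received from another replica.
  receive(remote) {
    const physical = this.now();
    const prev = this.wallMs;
    this.wallMs = Math.max(prev, remote.wallMs, physical);
    if (this.wallMs === prev && this.wallMs === remote.wallMs) {
      this.counter = Math.max(this.counter, remote.counter) + 1;
    } else if (this.wallMs === prev) {
      this.counter += 1;
    } else if (this.wallMs === remote.wallMs) {
      this.counter = remote.counter + 1;
    } else {
      this.counter = 0; // local physical clock is strictly ahead of both
    }
    return { wallMs: this.wallMs, counter: this.counter };
  }
}
```

LWW merging then becomes a lexicographic comparison on (wallMs, counter, replicaId), which breaks ties deterministically instead of depending on which replica's wall clock runs fast.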

CRDT Example: OR-Set (Observed-Remove Set)

OR-Set allows adding and removing elements where add-wins-over-remove for concurrent operations:

// Observed-Remove Set - add wins over remove for concurrent ops
class ORSet {
  constructor(replicaId) {
    this.replicaId = replicaId;
    this.items = new Map(); // element -> { addedBy, addTag, removed }
  }

  add(element) {
    const tag = Date.now() + Math.random();
    this.items.set(element, {
      addedBy: this.replicaId,
      addTag: tag,
      removed: false,
    });
  }

  remove(element) {
    // Mark as removed but don't delete - we need to track for merging
    const entry = this.items.get(element);
    if (entry) {
      entry.removed = true;
    }
  }

  merge(other) {
    for (const [element, entry] of other.items) {
      const ourEntry = this.items.get(element);

      if (!ourEntry) {
        // Element doesn't exist locally, add it
        this.items.set(element, { ...entry });
      } else if (entry.addTag > ourEntry.addTag) {
        // Other replica added this element more recently
        this.items.set(element, { ...entry });
      } else if (entry.removed && !ourEntry.removed) {
        // Other replica removed it, but we haven't - add wins
        // Keep the element
        ourEntry.removed = false;
      }
      // If both removed, stays removed (no-op)
    }
  }

  contains(element) {
    const entry = this.items.get(element);
    return entry && !entry.removed;
  }
}

// Usage: shopping cart items, collaborative document elements
const cart = new ORSet("user-1");
cart.add("item-A");
cart.add("item-B");
cart.remove("item-B");
// The other device also removed item-B and added item-C
cart.merge({
  items: new Map([
    ["item-B", { addedBy: "user-1", addTag: 100, removed: true }],
    ["item-C", { addedBy: "user-1", addTag: 101, removed: false }],
  ]),
});
console.log(cart.contains("item-A")); // true
console.log(cart.contains("item-B")); // false (removed on both replicas)
console.log(cart.contains("item-C")); // true

CRDT Comparison Table

| CRDT Type | Operations | Merge Strategy | Use Cases |
|---|---|---|---|
| G-Counter | Increment only | Take max per replica | Vote counts, page views |
| PN-Counter | Increment + decrement | Max per replica, separately for increments and decrements | Account balances |
| LWW-Register | Set value | Higher timestamp wins | Configuration, preferences |
| OR-Set | Add + remove | Add wins over concurrent remove | Shopping carts, collaborative editing |
| RGA | Insert + remove (ordered) | Ordered inserts with tombstones for removes | Chat messages, collaborative text |
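The PN-Counter in the table is just two G-Counter maps glued together, one tracking increments and one tracking decrements; the value is their difference. A sketch along the lines of the G-Counter above:

```javascript
// PN-Counter sketch: two grow-only maps, one for increments and one for
// decrements. Merge takes the per-replica max of each map; the counter's
// value is total increments minus total decrements.
class PNCounter {
  constructor(replicaId) {
    this.replicaId = replicaId;
    this.incs = {}; // replicaId -> increments issued by that replica
    this.decs = {}; // replicaId -> decrements issued by that replica
  }

  increment() {
    this.incs[this.replicaId] = (this.incs[this.replicaId] || 0) + 1;
  }

  decrement() {
    this.decs[this.replicaId] = (this.decs[this.replicaId] || 0) + 1;
  }

  merge(other) {
    for (const [id, v] of Object.entries(other.incs)) {
      this.incs[id] = Math.max(this.incs[id] || 0, v);
    }
    for (const [id, v] of Object.entries(other.decs)) {
      this.decs[id] = Math.max(this.decs[id] || 0, v);
    }
  }

  value() {
    const sum = (m) => Object.values(m).reduce((a, b) => a + b, 0);
    return sum(this.incs) - sum(this.decs);
  }
}
```

Because merge is a per-entry max, it is commutative, associative, and idempotent, so replicas can exchange state in any order and still converge.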

Read Repair vs Anti-Entropy Mechanisms

Distributed databases use two complementary mechanisms to maintain consistency: read repair and anti-entropy. Understanding the difference helps you diagnose consistency issues in production.

Read Repair (Reactive)

Read repair happens during read operations. When a replica returns stale data during a read, the system proactively pushes the fresh data to that replica.

// Conceptual read repair flow
async function readWithRepair(key) {
  // Read from multiple replicas
  const [replica1, replica2, replica3] = await Promise.all([
    readReplica("replica1", key),
    readReplica("replica2", key),
    readReplica("replica3", key),
  ]);

  // Check for divergence
  const versions = [replica1, replica2, replica3];
  const latest = findMostRecent(versions);

  // If any replica is stale, repair it
  for (const replica of versions) {
    if (replica.version < latest.version) {
      // Asynchronously push the correct value
      repairReplica(replica, latest);
    }
  }

  return latest.value;
}

Characteristics:

  • Triggered by: Read operations
  • Scope: Only repairs replicas that were contacted during the read
  • Latency impact: Minimal (async repair)
  • Coverage: Limited - replicas not read may remain stale

Anti-Entropy (Proactive)

Anti-entropy is a background process that continuously compares replicas and repairs any divergence, regardless of whether reads have occurred.

// Conceptual anti-entropy using Merkle trees
class AntiEntropyService {
  constructor(replicas) {
    this.replicas = replicas;
    this.merkleTrees = new Map();
  }

  // Build Merkle tree for a replica's data range
  buildMerkleTree(replica, range) {
    const tree = new MerkleTree();
    for (const [key, value] of this.getRange(replica, range)) {
      tree.insert(hash(key + value.version));
    }
    return tree;
  }

  // Compare trees to find divergent keys
  async compareReplicas(replicaA, replicaB) {
    const treeA = this.buildMerkleTree(replicaA, "all");
    const treeB = this.buildMerkleTree(replicaB, "all");

    // Find nodes that differ
    const diff = treeA.diff(treeB);

    // For each divergent node, traverse down to find actual keys
    const divergentKeys = [];
    for (const node of diff) {
      if (node.isLeaf()) {
        divergentKeys.push(node.key);
      } else {
        // Fetch children for deeper comparison
        const children = await this.fetchChildren(replicaA, node);
        divergentKeys.push(...children);
      }
    }

    // Sync divergent keys
    await this.syncKeys(replicaA, replicaB, divergentKeys);
  }
}

Characteristics:

  • Triggered by: Background scheduled process (e.g., Cassandra nodetool repair)
  • Scope: All replicas, regardless of recent reads
  • Latency impact: Can be significant during repair operations
  • Coverage: Comprehensive - finds all divergence

Comparison Table

| Aspect | Read Repair | Anti-Entropy |
|---|---|---|
| Trigger | On-read | Background/scheduled |
| Scope | Only the replicas contacted by the read | All replicas |
| Staleness window | Until the next read of the stale replica | Bounded by the repair schedule |
| Resource cost | Low (per-read) | High (full tree comparison) |
| Failure detection | Detects divergence on read | Detects all divergence |
| Example | Cassandra, DynamoDB | Cassandra nodetool repair, Riak |

When Each Mechanism Matters

| Scenario | Read Repair Sufficient? | Need Anti-Entropy? |
|---|---|---|
| Low read frequency | No - replicas may go stale for long periods | Yes |
| High read frequency | Yes - repairs happen frequently | No |
| Write-heavy workloads | No - stale replicas accumulate | Yes |
| Read-heavy workloads | Yes - continuous reads repair frequently | Optional |
| Compliance/audit requirements | No - need guaranteed convergence | Yes |

Practical note: Most production systems use both. Cassandra uses read-repair for immediate repairs during reads and nodetool repair (anti-entropy) as a periodic background job to ensure all replicas converge.


Capacity Estimation

Consistency Level Throughput

| Consistency Level | Relative Throughput | Latency Multiplier |
|---|---|---|
| ONE | 100% (baseline) | 1x |
| QUORUM | ~33% | 3x |
| ALL | ~10% | 10x |
| LOCAL_QUORUM | ~50% | 2x |

Staleness Window Calculation

Expected staleness = Replication lag during normal operation
Maximum staleness = Time to detect failure + Time to recover/re-route

For Cassandra with QUORUM:
- Normal lag: 10-50ms
- Failure detection: 5-30s (gossip protocol)
- Request routing: 1-5s (retry with higher consistency)

Worst case staleness: 30s+ during failure scenarios
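The worst-case figure is simply the sum of the components. A small helper makes the arithmetic explicit; the inputs below are the illustrative upper bounds from above, not measured guarantees:

```javascript
// Worst-case staleness is additive: normal replication lag, plus the time
// to notice a replica has failed, plus the time to re-route requests.
// Inputs are in milliseconds.
function worstCaseStalenessMs({ replicationLagMs, failureDetectionMs, reroutingMs }) {
  return replicationLagMs + failureDetectionMs + reroutingMs;
}

// Illustrative Cassandra QUORUM figures from the estimate above:
const cassandraQuorum = worstCaseStalenessMs({
  replicationLagMs: 50,      // normal lag, upper bound
  failureDetectionMs: 30000, // gossip-based detection, upper bound
  reroutingMs: 5000,         // retry with higher consistency
});
// cassandraQuorum is 35050 ms, i.e. ~35 seconds in the worst case
```

This is why the estimate says "30s+": the failure-detection term dominates, so tuning gossip timeouts matters far more than shaving replication lag.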

Observability Checklist

Metrics to Capture

  • read_staleness_seconds (histogram) - Time between write and read reflecting it
  • consistency_level_used (counter) - Breakdown by consistency level
  • conflict_resolution_duration_ms (histogram) - Time spent resolving conflicts
  • monotonic_violations_total (counter) - When reads go backward in time
  • replication_lag_seconds (gauge) - Per replica, per datacenter
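One simple way to feed the `read_staleness_seconds` histogram is to store a wall-clock timestamp next to each value at write time and observe the value's age on every read. A sketch, where `histogram` stands in for any metrics client with an `observe` method (e.g. a prom-client Histogram) and `store` is an ordinary Map:

```javascript
// Sketch: approximate read staleness by recording when each value was
// written and observing (now - writtenAt) at read time. The clock is
// injectable so the logic is testable with a fake timer.
function recordWrite(store, key, value, now = Date.now) {
  store.set(key, { value, writtenAtMs: now() });
}

function recordRead(store, key, histogram, now = Date.now) {
  const entry = store.get(key);
  if (!entry) return undefined;
  // Observe age in seconds -> feeds read_staleness_seconds
  histogram.observe((now() - entry.writtenAtMs) / 1000);
  return entry.value;
}
```

Note this measures the age of the value the read returned, which is a cheap proxy; a stricter staleness metric would compare against the newest version anywhere in the cluster, which requires cross-replica version tracking.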

Testing Eventual Consistency

// Jepsen-style test for eventual consistency
async function testEventualConsistency(clients, writeKey, writeValue) {
  const start = Date.now();

  // Write to all clients simultaneously
  await Promise.all(clients.map((c) => c.write(writeKey, writeValue)));

  // Poll until all clients return the value
  const deadline = start + 30000; // 30 second timeout
  let results;
  while (Date.now() < deadline) {
    results = await Promise.all(clients.map((c) => c.read(writeKey)));
    if (results.every((r) => r === writeValue)) {
      return { success: true, timeMs: Date.now() - start };
    }
    await sleep(100);
  }
  return { success: false, results };
}

Alerts to Configure

| Alert | Threshold | Severity |
|---|---|---|
| Staleness > 1s | 1000ms | Warning |
| Staleness > 10s | 10000ms | Critical |
| Monotonic violations | > 0 in 5 minutes | Critical |
| Conflict rate | > 1% of writes | Warning |

Security Checklist

  • Authentication required for all replica communication
  • TLS encryption for all inter-node replication traffic
  • Audit logging of consistency level changes
  • Network policies restricting replica access
  • Encryption at rest for all replica data
  • Client credentials rotated periodically

Common Pitfalls / Anti-Patterns

Pitfall 1: Assuming “Eventually Consistent” Means “Quickly Consistent”

Problem: Teams assume eventual consistency means consistent within seconds when it can take minutes or longer during high load or failures.

Solution: Measure your actual staleness distribution under various conditions. Do not make UX assumptions without data. Consider bounded staleness models if users need timeliness guarantees.

Pitfall 2: Ignoring Monotonic Read Violations

Problem: When reads route to different replicas, users can see data “go backward” - seeing a value, then an older value. This is deeply confusing even if technically “eventually consistent.”

Solution: Use sticky sessions or read-your-writes guarantees for user-facing applications. Route reads to the same replica within a session.

Pitfall 3: Using Vector Clocks Without Understanding Them

Problem: Implementing vector clocks for causal consistency seems simple, but the memory and bandwidth overhead grows with replica count and history length.

Solution: Most databases handle this internally (DynamoDB, Cassandra). Only implement custom vector clocks if you understand the trade-offs and have measured the overhead.
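For reference, the mechanism itself is small; the overhead the pitfall warns about comes from carrying one entry per writing replica on every value. A minimal vector clock sketch, including the concurrency check that timestamp-only schemes like LWW cannot express:

```javascript
// Minimal vector clock sketch. Each clock maps replicaId -> event count.
// compare() distinguishes "happened before" from genuinely concurrent writes.
class VectorClock {
  constructor(entries = {}) {
    this.entries = { ...entries };
  }

  increment(replicaId) {
    this.entries[replicaId] = (this.entries[replicaId] || 0) + 1;
  }

  merge(other) {
    for (const [id, n] of Object.entries(other.entries)) {
      this.entries[id] = Math.max(this.entries[id] || 0, n);
    }
  }

  // Returns "before", "after", "equal", or "concurrent".
  compare(other) {
    const ids = new Set([
      ...Object.keys(this.entries),
      ...Object.keys(other.entries),
    ]);
    let less = false;
    let greater = false;
    for (const id of ids) {
      const a = this.entries[id] || 0;
      const b = other.entries[id] || 0;
      if (a < b) less = true;
      if (a > b) greater = true;
    }
    if (less && greater) return "concurrent";
    if (less) return "before";
    if (greater) return "after";
    return "equal";
  }
}
```

The "concurrent" result is exactly the case that forces conflict resolution; everything else can be resolved by keeping the causally later value.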

Pitfall 4: Testing Consistency at Wrong Layer

Problem: Testing consistency only at the unit test level misses distributed race conditions that only appear under concurrent load.

Solution: Use chaos testing frameworks (Jepsen, Chaos Monkey) to inject failures and verify consistency guarantees hold. Test your application, not just the database.


Real-World Case Studies

Amazon DynamoDB: Eventual Consistency at Scale

DynamoDB launched with eventually consistent reads as the default. It was a deliberate engineering trade-off: strongly consistent reads require quorum coordination, which roughly doubles both latency and cost. The team knew most workloads could tolerate brief staleness.

The Dynamo paper (2007) described the design trade-offs behind this model. One internal incident illustrates them: a burst of traffic to a popular item caused replication lag, and thousands of reads returned stale data for several minutes. The mitigation was straightforward: rate-limit the hot key and let anti-entropy catch up. The incident showed that the distinction between bounded and unbounded eventual consistency matters even at Amazon’s scale.

DynamoDB now offers both eventual and strong consistency options per read. The eventual model still handles the majority of requests — it’s faster and cheaper.

Google Spanner: Strong Consistency Across Geo-Distributed Nodes

Spanner achieves strong consistency across globally distributed nodes using TrueTime — GPS and atomic clock hardware that bounds clock uncertainty to roughly ±7ms. Writes go through Paxos consensus, and strong reads are served at a TrueTime timestamp by replicas known to be caught up to that timestamp.

The operational cost is real: Spanner writes take 10-40ms in the same region, 100-200ms cross-region. This is why Spanner is expensive. The trade-off is deliberate — if you need globally consistent data with availability guarantees, you pay in latency and cost.

A known failure: Spanner’s leader lease mechanism means that if a datacenter loses power and its Paxos leader goes down, there is a lease expiration period where no writes can be accepted (typically 10 seconds). During this window the system is unavailable for writes, even though TrueTime could theoretically allow it.

Cassandra: Tunable Consistency Meets the 2018 Outage

Cassandra’s tunable consistency model lets you choose per-query — ONE, QUORUM, ALL. The problem: developers often chose ONE for speed, and did not understand what that meant for durability.

In October 2018, a large Cassandra deployment experienced data loss after a node failure during a configuration change. The root cause was a regression in how Cassandra’s failure detection and repair mechanisms interacted with the ONE consistency level for writes. With replication factor 3 and consistency level ONE, losing one node could silently lose writes that the other replicas had not yet acknowledged.

The fix was not purely technical — it required updating operational runbooks to mandate QUORUM for all writes in production, and adding tooling to enforce this automatically.

Interview Questions

Q: A user posts a message to a chat application and immediately refreshes to see it, but it is not there. What consistency model is violated, and how would you fix it?

A: This is a read-your-writes consistency violation. The write went to one replica but the subsequent read hit a different replica that had not yet received the write. Fixes include routing reads to the primary or last-written replica for the session, using sticky sessions, or issuing reads with a higher consistency level (QUORUM) after a write.

Q: You are designing a shopping cart for an e-commerce site. Which consistency model do you choose for cart operations, and why?

A: The cart needs strong consistency or at minimum read-your-writes. If a user adds an item and it disappears, they lose a sale. However, price and inventory checks can use eventual consistency — it is acceptable if the displayed price is slightly stale, as long as the checkout process re-verifies. Session consistency with QUORUM reads works well here.

Q: Explain the difference between read-your-writes and monotonic reads.

A: Read-your-writes guarantees that after writing value V, subsequent reads by the same session see V or a more recent value. Monotonic reads guarantee that if a session sees value V at time T, it will never see an older value at time T’>T. Read-your-writes is about your own writes; monotonic reads is about not going backward in time for any reason.

Q: A financial service processes payments. What consistency level would you use for the debit and credit operations?

A: Strong consistency (linearizable) — using QUORUM or ALL consistency levels, or distributed transactions with two-phase commit. The cost is latency and throughput, but for financial operations where incorrect balances cause real harm, this is non-negotiable. Many systems use a separate strongly-consistent ledger for the actual money movement and use eventual consistency only for display/read purposes.

Q: PACELC says that in the presence of a partition, you choose between consistency and availability. But even without partitions, consistency has a cost. Explain using PACELC.

A: PACELC states: if there is a partition (P), you choose between availability (A) and consistency (C); else (E), even without partitions, latency (L) and consistency (C) are in trade-off. Strong consistency requires coordination — writes must go through a consensus protocol or quorum, which adds round trips. In a geo-distributed setting, a strongly consistent write might cross continents. An eventually consistent write can commit locally and replicate asynchronously, cutting latency significantly. The trade-off is that you may read stale data while replication is in flight.

Quick Recap

  • Consistency models exist on a spectrum from strong to weak.
  • Read-your-writes matters for user-facing applications - always verify writes are visible.
  • Monotonic reads prevent jarring rollback experiences in pagination and feeds.
  • Causal consistency covers most real-world scenarios efficiently.
  • Conflict resolution (LWW, CRDTs, app-merge) must match your data’s semantics.
  • Most databases let you configure consistency per query - use this flexibility.

Copy/Paste Checklist

- [ ] Audit application to identify consistency requirements per operation
- [ ] Use strong consistency (QUORUM, LINEARIZABLE) for financial/inventory ops
- [ ] Implement read-your-writes for user-generated content
- [ ] Use sticky sessions to ensure monotonic reads
- [ ] Choose conflict resolution strategy: LWW, CRDT, or application merge
- [ ] Monitor replication lag and staleness metrics
- [ ] Test consistency guarantees under failure injection
- [ ] Document consistency requirements in ADR (Architecture Decision Records)

For latency trade-offs, see the PACELC theorem post. For building highly available systems with consistency guarantees, see Availability Patterns.
