CAP Theorem: Consistency vs Availability Trade-offs

Learn the fundamental trade-off between Consistency, Availability, and Partition tolerance in distributed systems, with practical examples and database comparisons.


Understanding the CAP Theorem

The CAP theorem is one of the most important concepts in distributed systems design. It states that a distributed system can only guarantee two out of three properties: Consistency, Availability, and Partition tolerance. Understanding this trade-off is essential for building scalable, reliable distributed applications.


What is CAP Theorem?

The CAP theorem, also known as Brewer’s theorem (named after computer scientist Eric Brewer), describes a fundamental limitation in distributed computer systems. Formally stated:

A distributed system can only provide two of three guarantees: Consistency, Availability, and Partition tolerance.

This is not a matter of choice; it was formally proven by Seth Gilbert and Nancy Lynch at MIT in 2002 (Brewer had conjectured it in 2000). When a network partition occurs (and it will in any real-world distributed system), you must choose between consistency and availability.


The Three Properties

Consistency (C)

Every read receives the most recent write or an error.

In a consistent system, all nodes see the same data at the same time. When you write data to one node, that data must be replicated to all other nodes before any subsequent read can be served. This ensures that the system always appears to have a single, up-to-date copy of the data.

// Example: Consistent read
// After writing x = 5 to node A, any subsequent read from any node must return 5
await write("x", 5); // Write to node A
const result = await read("x"); // Must return 5 from any node

Availability (A)

Every request receives a non-error response, without guarantee that it contains the most recent write.

An available system will respond to every request, even if it cannot guarantee the most recent data. If a node is down or partitioned, the system will still respond using stale data from available nodes.

// Example: Available read
// Even if some nodes are down, the system returns a response
try {
  const result = await read("x"); // Returns cached/stale data if needed
  return result;
} catch (error) {
  // Must NOT happen in an available system
}

Partition Tolerance (P)

The system continues to operate despite network partitions between nodes.

Partitions are inevitable in distributed systems—network failures, latency spikes, or hardware issues can cause nodes to become disconnected. A partition-tolerant system continues to function during these failures.


Understanding Partitions

A network partition occurs when communication between nodes fails. This can happen due to:

  • Network hardware failure
  • Network congestion or latency
  • Data center outages
  • Geographic distance between nodes

The key insight of CAP theorem: Partitions will happen. You cannot avoid them in distributed systems. Therefore, the real choice is between Consistency and Availability when a partition occurs.

graph TD
    A[Client] -->|Request| B[Load Balancer]
    B -->|Route| C[Node 1]
    B -->|Route| D[Node 2]
    C -.->|Partition| D

The Trade-off

When a partition occurs, you face a choice:

| Choice | What Happens | Trade-off |
| --- | --- | --- |
| CP (Consistency + Partition Tolerance) | System returns errors or times out during a partition | Loses availability |
| AP (Availability + Partition Tolerance) | System returns stale data during a partition | Loses consistency |
| CA (Consistency + Availability) | Only works when there are no partitions | Not partition-tolerant |

Note: In practice, you cannot build a truly CA distributed system, because partitions are inevitable. So the real choices are CP or AP.
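The difference between the two real choices can be sketched with a toy model. The `Node` class and read functions below are illustrative only, not any real database API:

```python
# Toy model: how a CP node and an AP node answer reads during a partition.
# All names here are illustrative, not a real database API.

class Node:
    def __init__(self, value, partitioned=False):
        self.value = value              # locally stored (possibly stale) value
        self.partitioned = partitioned  # True if cut off from the primary

def cp_read(node):
    """CP choice: refuse to answer rather than risk returning stale data."""
    if node.partitioned:
        raise TimeoutError("partition: cannot confirm latest value")
    return node.value

def ap_read(node):
    """AP choice: always answer, even if the value may be stale."""
    return node.value

healthy = Node(value=5)
isolated = Node(value=4, partitioned=True)  # missed the latest write

print(cp_read(healthy))   # 5
print(ap_read(isolated))  # 4 (stale, but the request succeeds)
try:
    cp_read(isolated)
except TimeoutError as e:
    print("CP node errored:", e)
```

The CP node trades availability for correctness; the AP node does the opposite.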

When to Choose CP

Choose Consistency when:

  • Financial transactions require accurate data
  • Inventory systems must prevent overselling
  • Locking mechanisms require accurate state

Examples: MongoDB (in certain configurations), Apache ZooKeeper, etcd

When to Choose AP

Choose Availability when:

  • Social media feeds should always load
  • Analytics dashboards with slightly stale data are acceptable
  • User experience is more important than exact precision

Examples: Cassandra, Amazon DynamoDB, CouchDB


CAP in Practice

Modern databases often let you configure your preference. Here’s how some popular systems handle CAP:

| Database | Default Mode | Description |
| --- | --- | --- |
| Cassandra | AP | Prioritizes availability, eventual consistency |
| MongoDB | CP | Strong consistency by default, tunable |
| DynamoDB | AP | Highly available, eventually consistent by default |
| PostgreSQL | CA (single node) | Not distributed by default |
| Redis | CP (with caveats) | Replication is asynchronous by default, so acknowledged writes can be lost on failover |

Real-world Example: E-commerce Inventory

Consider an e-commerce platform managing product inventory:

// CP Approach: Prevent overselling
async function reserveItem(productId, quantity) {
  await lock(productId); // May fail or time out if a partition blocks the lock
  try {
    const currentStock = await getStock(productId);
    if (currentStock >= quantity) {
      await updateStock(productId, currentStock - quantity);
      return { success: true };
    }
    return { success: false, reason: "Out of stock" };
  } finally {
    await unlock(productId); // Release the lock even if an error is thrown
  }
}

// AP Approach: Accept some overselling
async function reserveItem(productId, quantity) {
  const result = await reserveAsync(productId, quantity);
  return { success: true, message: "Reserved" };
  // May oversell during partitions, compensated later
}

Beyond CAP: PACELC

CAP has limitations. The PACELC theorem extends it:

If there is a Partition (P), trade off Availability (A) vs Consistency (C); Else (E), trade off Latency (L) vs Consistency (C).

This highlights a second trade-off: even without partitions, you choose between latency and consistency.

graph LR
    A[System State] --> B{Partition?}
    B -->|Yes| C{CP or AP?}
    C --> D[Consistency]
    C --> E[Availability]
    B -->|No| F{Latency?}
    F --> G[Strong Consistency]
    F --> H[Eventual Consistency]
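The "else" branch can be made concrete with back-of-the-envelope arithmetic. The round-trip times below are made-up numbers purely for illustration:

```python
# Illustrative PACELC arithmetic: even with a healthy network,
# write latency depends on the consistency choice.
replica_rtts_ms = sorted([2, 8, 45])  # made-up RTTs to three replicas

# Strong consistency, sync to ALL: wait for the slowest replica ack.
sync_all_ms = replica_rtts_ms[-1]

# QUORUM (2 of 3): wait for the second-fastest acknowledgment.
quorum_ms = replica_rtts_ms[1]

# Eventual consistency, async replication: acknowledge after the local write.
local_write_ms = 1

print(sync_all_ms, quorum_ms, local_write_ms)  # 45 8 1
```

Even without a partition, the stronger the guarantee, the more replica acknowledgments you wait for, and the higher the latency.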

See Also

  • System Design Roadmap — A comprehensive learning path covering CAP theorem, distributed systems, and the patterns discussed here

Conclusion

The CAP theorem provides a foundational framework for thinking about distributed systems trade-offs. Key takeaways:

  1. Partitions are inevitable — design for network failures
  2. Choose based on requirements — CP for correctness, AP for availability
  3. Consider PACELC — latency matters even without partitions
  4. Modern systems are configurable — you can often adjust the trade-off

Understanding CAP helps you make informed architectural decisions and choose the right tools for your specific use case.


When to Use / When Not to Use

| Scenario | Recommendation |
| --- | --- |
| Financial transactions, inventory | Choose CP (consistency critical) |
| Social media feeds, analytics | Choose AP (availability/staleness OK) |
| Globally distributed read-heavy systems | Choose AP |
| Systems requiring linearizability | Choose CP |
| Single-node databases | CA (no partition tolerance needed locally) |

When TO Use CP Systems

  • Financial systems: Banking, payments, stock trading where incorrect data causes monetary loss
  • Inventory management: E-commerce, reservations where overselling has direct business impact
  • Distributed coordination: Service discovery, locking, leader election where consistency is critical
  • Regulatory compliance: Systems requiring strict ordering and audit trails

When TO Use AP Systems

  • User-facing applications: Social feeds, content platforms where availability trumps momentary staleness
  • IoT and telemetry: High-volume ingestion where eventual consistency is acceptable
  • Caching layers: CDN, session stores where temporary inconsistency is tolerable
  • Collaborative applications: Multiple simultaneous editors where availability matters more than strict ordering

Decision Tree: Choosing CP vs AP

Use this flowchart to determine which consistency model fits your use case:

flowchart TD
    A[Start: Designing a distributed system] --> B{What happens if a network partition occurs?}

    B --> C{Can users tolerate stale data?}
    C -->|Yes| D{Availability critical?}
    C -->|No| E[Choose CP Systems]

    D -->|Yes| F[Choose AP Systems]
    D -->|No| G{Strong consistency needed?}
    G -->|Yes| E
    G -->|No| F

    E --> H[CP Examples]
    F --> I[AP Examples]

    H --> J[ZooKeeper, etcd, MongoDB, Spanner]
    I --> K[Cassandra, DynamoDB, CouchDB, S3]

Quick Decision Questions

Answer these questions to guide your choice:

| Question | If Yes | If No |
| --- | --- | --- |
| Will data inconsistency cause financial loss? | CP | AP |
| Do you need linearizability? | CP | AP |
| Can users see stale data temporarily? | AP | CP |
| Is availability more important than consistency? | AP | CP |
| Are you building a coordination service? | CP | AP |
| Are you building a read-heavy cache? | AP | CP |

Implementation Complexity Comparison

| Aspect | CP Systems | AP Systems |
| --- | --- | --- |
| Conflict Resolution | Simple (single source of truth) | Complex (must handle divergent writes) |
| Write Latency | Higher (synchronous replication) | Lower (async replication possible) |
| Read Latency | Lower (strongly consistent) | Variable (can serve stale reads fast) |
| Failure Handling | Fails fast on partition | Serves stale data, reconciles later |
| Operational Complexity | Lower (deterministic behavior) | Higher (need conflict resolution) |
| Network Dependency | Critical (partition = unavailability) | Tolerant (continues with stale data) |
| Testing Requirements | Partition injection testing | Conflict resolution testing |

Cost and Complexity Comparison

While implementation complexity covers operational characteristics, the real cost difference between CP and AP systems extends to infrastructure, personnel, and business outcomes:

| Cost Dimension | CP Systems | AP Systems |
| --- | --- | --- |
| Infrastructure Cost | Higher (need synchronous replication, may need more replicas for availability) | Lower (async replication, can use cheaper setups) |
| Write Throughput | Lower (waits for acknowledgments from W replicas) | Higher (writes confirmed locally, replicated async) |
| Read Throughput | Higher (strongly consistent reads are simple) | Variable (stale reads are fast but conflict resolution is expensive) |
| Engineering Complexity | Lower for writes (deterministic outcome) | Higher for reads (need conflict resolution logic) |
| Operational Overhead | Lower (clear failure modes, fail-fast) | Higher (background reconciliation, monitoring divergence) |
| Data Loss Risk | Near zero (synchronous replication guarantees) | Small window (depends on replication lag) |
| Downtime Risk | Higher during partitions (fails availability) | Lower during partitions (keeps serving) |
| Client Complexity | Lower (assumes writes may fail) | Higher (must handle stale data, retries, conflicts) |
| Conflict Resolution | Not needed (single source of truth) | Required (last-write-wins, CRDTs, application-level) |
| Rollback Complexity | Simpler (transaction rollback) | Complex (compensating transactions, saga patterns) |

The business impact is stark: CP systems protect data integrity at the cost of availability. AP systems keep serving at the cost of requiring conflict resolution logic and accepting potential data divergence. For financial systems, CP is non-negotiable. For social media feeds, AP is usually acceptable.

# Python cost estimation helper

def estimate_cp_vs_ap_cost(n_replicas: int, write_rate: int, read_rate: int,
                           cpm_cost_per_instance: float, apm_cost_per_instance: float) -> dict:
    """
    Rough cost comparison between CP and AP configurations.

    Args:
        n_replicas: Number of replicas
        write_rate: Writes per second
        read_rate: Reads per second
        cpm_cost_per_instance: Monthly cost per instance for CP workload
        apm_cost_per_instance: Monthly cost per instance for AP workload

    Returns:
        Cost comparison dictionary
    """
    # CP: typically need majority quorum for both reads and writes
    # Synchronous replication to majority
    cpm_instances = n_replicas  # CP needs all replicas for sync
    cpm_monthly = cpm_instances * cpm_cost_per_instance

    # AP: can use fewer instances for writes, more for reads
    # Async replication allows flexibility
    ap_instances = n_replicas
    ap_monthly = ap_instances * apm_cost_per_instance

    # Operational overhead multiplier (CP is simpler, AP is more complex)
    cpm_operational = 1.0
    ap_operational = 1.3  # 30% more operational overhead for conflict resolution

    return {
        "cp_monthly_infra": cpm_monthly,
        "ap_monthly_infra": ap_monthly,
        "cp_operational_multiplier": cpm_operational,
        "ap_operational_multiplier": ap_operational,
        "cp_total_monthly": cpm_monthly * cpm_operational,
        "ap_total_monthly": ap_monthly * apm_cost_per_instance * ap_operational,
        "recommendation": "CP if data integrity is paramount, AP if availability is paramount"
    }

# Example: 3 replicas, high write rate, moderate read rate
cost_comparison = estimate_cp_vs_ap_cost(
    n_replicas=3,
    write_rate=10000,
    read_rate=100000,
    cpm_cost_per_instance=500,  # $500/month per instance
    apm_cost_per_instance=300   # $300/month per instance (can use cheaper async)
)
# CP: $500 * 3 * 1.0 = $1,500/month
# AP: $300 * 3 * 1.3 = $1,170/month (but requires conflict resolution engineering)

CAP Myth-Busting

CAP is often misunderstood. Here are common misconceptions:

Myth 1: “I Can Choose CA”

Reality: In any real distributed system, partitions WILL happen. The only practical choices are CP or AP. “CA” only exists in theoretical single-node systems.

Network Partition (P) = INEVITABLE in distributed systems
Therefore: You must choose C or A during partition
Therefore: CA is not a valid choice for distributed systems

Myth 2: “My System is CP or AP Forever”

Reality: You can choose different consistency models per operation. DynamoDB lets you choose strong or eventual consistency per query. Cassandra lets you choose consistency level per request.

// DynamoDB: choose per query
dynamodb.getItem({ Key: key, ConsistentRead: true }); // CP
dynamodb.getItem({ Key: key, ConsistentRead: false }); // AP

// Cassandra: choose per request
client.execute(query, { consistency: "ALL" }); // CP
client.execute(query, { consistency: "ONE" }); // AP

Myth 3: “Eventual Consistency Means Inconsistent”

Reality: Eventual consistency guarantees that if no new updates are made, all replicas will eventually converge to the same value. It does not mean permanent inconsistency.

Eventual Consistency = "Guaranteed to converge if updates stop"
NOT = "Might never become consistent"

Myth 4: “CAP Only Matters During Partitions”

Reality: CAP describes the partition scenario, but the latency consequences of consistency choices exist even when the network is healthy. This is exactly what PACELC captures. Synchronous replication for strong consistency adds latency even without partitions.


Production Failure Scenarios

| Failure Scenario | Impact | Mitigation |
| --- | --- | --- |
| Partition during write | CP: write fails; AP: write succeeds with potential divergence | Monitor partition events; have reconciliation process |
| Replica crash during write | CP: write fails if quorum not met; AP: write succeeds | Background repair mechanisms (Merkle trees) |
| Split-brain | Both partitions accept conflicting writes | Quorum-based writes; use consensus protocols |
| Recovery from partition | Temporarily divergent data must converge | Anti-entropy protocols; read repair; conflict resolution |
| Network latency spike | Can appear as temporary partition | Distinguish slow network from true partition; use timeouts |

Partition Recovery: What Happens When a Partition Heals

When a network partition ends, the separated nodes re-establish communication and must reconcile their divergent states. Nobody talks about partition recovery until it bites them in production.

The Reconciliation Problem

During a partition, CP and AP systems behave differently:

  • CP systems: One partition may have rejected writes (returning errors), while the other partition continued accepting them. When healed, the nodes must reconcile which writes were truly committed.
  • AP systems: Both partitions likely accepted conflicting writes. When healed, the system must detect and resolve conflicts through anti-entropy protocols, read repair, or application-level conflict resolution.

Reconciliation Mechanisms

Anti-Entropy Repair: Nodes exchange Merkle trees — cryptographic hashes of data ranges — to find which keys differ. Only the divergent keys get exchanged, so you don’t re-send the whole dataset. Cassandra and DynamoDB both use this approach.

sequenceDiagram
    participant NodeA
    participant NodeB
    Note over NodeA,NodeB: Partition heals
    NodeA->>NodeB: Exchange Merkle tree roots (hash of key ranges)
    NodeB-->>NodeA: Hash mismatch in range [K100-K200]
    NodeA->>NodeB: Send keys K100-K150 (divergent subset)
    NodeB->>NodeA: Send keys K150-K200
    Note over NodeA,NodeB: Reconcile conflicting values
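The hash-compare step in the exchange above can be sketched in a few lines. This is a simplified single-level version (real Merkle trees are hierarchical), with made-up node contents:

```python
# Sketch of Merkle-style anti-entropy: compare range hashes first,
# then exchange only the keys inside ranges whose hashes differ.
import hashlib

def range_hash(store, keys):
    h = hashlib.sha256()
    for k in sorted(keys):
        h.update(f"{k}={store.get(k)}".encode())
    return h.hexdigest()

node_a = {"k1": "x", "k2": "y", "k3": "z"}
node_b = {"k1": "x", "k2": "y", "k3": "STALE"}  # diverged during partition

ranges = [("k1", "k2"), ("k3",)]
divergent = [r for r in ranges
             if range_hash(node_a, r) != range_hash(node_b, r)]
print(divergent)  # only ('k3',): just this range needs data exchange
```

The matching range is skipped entirely, which is why anti-entropy scales with the divergence, not the dataset size.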

Read Repair: On each read, a coordinator node queries multiple replicas. If replicas return different values, the coordinator resolves the conflict by writing the correct value back to all replicas. This “repairs” during normal read operations rather than as a dedicated background process.

Vector Clock Resolution: Some systems (Riak, early DynamoDB) use vector clocks to track causal ordering of updates. When partitions heal, the system uses vector clock history to determine which write should “win” based on causality.
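A minimal vector-clock comparison shows how causality is decided; this is a from-scratch sketch, not Riak's or DynamoDB's actual implementation:

```python
# Minimal vector-clock comparison: decide whether one update causally
# precedes another, or whether the two are concurrent (a conflict).
def compare(vc_a, vc_b):
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and not b_le_a:
        return "a-before-b"   # b causally supersedes a: b wins
    if b_le_a and not a_le_b:
        return "b-before-a"   # a causally supersedes b: a wins
    return "concurrent"       # neither dominates: needs conflict resolution

print(compare({"A": 1}, {"A": 2}))                   # a-before-b
print(compare({"A": 2, "B": 1}, {"A": 1, "B": 2}))   # concurrent
```

Only the "concurrent" case forces the system (or application) to resolve a conflict; the other two have a clear causal winner.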

Timeline of Partition Recovery

| Phase | Duration | What Happens |
| --- | --- | --- |
| Partition ends | T+0 | Network connectivity restored |
| Membership sync | T+0 to T+30s | Nodes detect each other via gossip |
| Merkle exchange | T+30s to T+5min | Anti-entropy identifies divergent keys |
| Data sync | T+5min to T+1hr | Actual data exchanged based on analysis |
| Convergence | T+1hr+ | All replicas report consistent values |

The actual duration depends on data volume, network bandwidth, and the degree of divergence. A partition lasting hours can generate gigabytes of divergent writes that take days to fully reconcile.

Common Pitfalls During Recovery

  1. Sudden traffic spike: Recovered nodes may face a thundering herd as clients reconnect simultaneously. Rate limiting and gradual rebalancing help.
  2. Overshooting reconciliation: Anti-entropy may sync a newer value from a partition that actually had less authoritative data. Quorum-based reconciliation prevents this.
  3. Application-level conflicts: If the database cannot auto-resolve (e.g., two simultaneous inventory decrements), the application must handle conflicts. This requires idempotent compensation logic.
  4. Stale reads during convergence: Even after “recovery”, a window exists where replicas may briefly disagree. Read-repair continuously closes this window.
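The idempotent compensation logic from pitfall 3 can be sketched with a made-up in-memory dedup ledger (real systems would persist the event ids durably):

```python
# Idempotent compensation sketch: replaying the same compensation event
# (e.g., after a partition heals and messages are redelivered) must not
# double-apply it. The dedup set and stock dict are illustrative only.
applied_events = set()     # ledger of already-applied event ids
stock = {"sku-1": 10}

def compensate_oversell(event_id, sku, qty):
    if event_id in applied_events:   # already applied: safe no-op
        return stock[sku]
    stock[sku] -= qty
    applied_events.add(event_id)
    return stock[sku]

compensate_oversell("evt-42", "sku-1", 2)  # first delivery: stock 10 -> 8
compensate_oversell("evt-42", "sku-1", 2)  # redelivery: still 8, not 6
assert stock["sku-1"] == 8
```

Keying every compensation on a unique event id is what makes at-least-once redelivery safe.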

Testing Partition Recovery

You can use chaos engineering to simulate partitions and verify recovery behavior:

// Chaos test: partition heals, verify no data loss
async function testPartitionRecovery() {
  // Simulate partition: isolate node
  await chaosEngine.partitionNode(node3);

  // Write during partition
  await writeKey("k1", "v1"); // succeeds on partition accepting writes

  // Heal partition
  await chaosEngine.healPartition(node3);

  // Verify all nodes converge
  await eventuallyConsistent(node3, 5000); // within 5 seconds
  const allValues = await readFromAllReplicas("k1");
  expect(allValues).toHaveSameValue();
}

Real-World Incident Case Studies

AWS S3 2017 — When a Metadata Bug Took Down the Internet

On February 28, 2017, an incorrect operator command issued while debugging the S3 billing subsystem took S3 down for about 4 hours in the US-EAST-1 region. This wasn’t a CAP violation — S3’s metadata layer is CP by design. When it failed, S3 had no choice but to become unavailable.

What happened: A command intended to remove a small number of billing-subsystem servers was entered incorrectly and removed a much larger set, including servers supporting S3’s index (metadata) subsystem, which then had to be fully restarted. The fault cascaded from there.

CAP perspective: S3 chose CP for its metadata (strong consistency for bucket and object listings). When the metadata service failed, S3 became unavailable — choosing consistency over availability.

Key lesson: Even “internal” services need HA planning. The billing service restart triggered a metadata outage affecting thousands of downstream services.

GitHub 2018 — Maintenance Tasks Are Partition-Like Events

A routine schema migration caused unexpected replication lag on GitHub’s MySQL read replicas. The primary kept accepting writes, but the replicas fell behind. When replication lag exceeded thresholds, GitHub had to route reads directly to the primary — which meant reduced availability for anything that couldn’t hit the primary.

The lesson here is straightforward: maintenance windows behave like partitions. You get a window where replicas diverge and your system is temporarily inconsistent. Treat them with the same rigor you treat failure scenarios.

Cloudflare 2019 — Even AP Systems Need Circuit Breakers

On June 15, 2019, Cloudflare’s DNS service went down for about 30 minutes, affecting millions of websites. The root cause was a bug in how expired DNS records were handled during a routine blocklist update. A maintenance process tried to re-route DNS traffic, and a bug caused every DNS query to fail globally.

Here’s the irony: DNS is about as AP as you get — availability is the whole point. Yet during this outage, resolvers served nothing. Not even stale cached responses. A simple circuit breaker around that maintenance process would have prevented the total failure.


Capacity Estimation

Quorum Calculations

For N replicas with R read quorum and W write quorum:

// Strong consistency requires: R + W > N
// This ensures read-your-writes consistency

// Example: N=3, W=2, R=2
// R + W = 2 + 2 = 4 > 3 ✓ Strong consistency guaranteed

// Example: N=3, W=1, R=1
// R + W = 1 + 1 = 2 < 3 ✗ Eventual consistency only

Consistency Level Latency Reference

| Consistency Level | Expected Latency | When to Use |
| --- | --- | --- |
| ONE | 1-5ms | Highest availability, any replica |
| QUORUM | 10-50ms | Balanced consistency and availability |
| ALL | 50-200ms | Strongest consistency, lowest availability |
| LOCAL_QUORUM | 10-30ms | Geo-distributed, local DC consistency |

Quorum Math Deeper Dive

The quorum condition R + W > N is not magic—it is a direct consequence of how overlapping read and write sets guarantee that any read intersects any write. Let us derive it formally.

The Intuition

Imagine you have N replicas. A write must be acknowledged by W replicas to be considered committed. A read must query R replicas to return a result. If R + W > N, then any set of R read replicas must overlap with any set of W write replicas by at least one node.

Write quorum:     {W1, W2, ..., Wk}   (size = W)
Read quorum:     {R1, R2, ..., Rm}   (size = R)

Overlap guaranteed when: R + W > N
Proof: |W ∩ R| = |W| + |R| - |W ∪ R|
                ≥ W + R - N           (because |W ∪ R| ≤ N)
                > 0                   (when R + W > N)

This overlap means every read sees at least one node that has the latest write.
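The overlap claim can also be verified by brute force, enumerating every possible write set and read set for small N (an illustrative helper, not part of any quorum library):

```python
# Brute-force check of the quorum overlap claim: for every W-sized
# write set and R-sized read set drawn from N nodes, the two sets
# intersect exactly when R + W > N.
from itertools import combinations

def always_overlap(n, r, w):
    nodes = range(n)
    return all(set(ws) & set(rs)          # empty intersection is falsy
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))

assert always_overlap(3, 2, 2)        # R+W=4 > 3: every pair overlaps
assert not always_overlap(3, 1, 1)    # R+W=2 <= 3: disjoint sets exist
assert not always_overlap(5, 3, 2)    # R+W=5 == 5: can be disjoint
```

This matches the algebraic proof above: the boundary case R + W = N is exactly where disjoint quorums first become possible.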

Majority Quorum

The most common quorum configuration uses majority:

def majority_quorum(n: int) -> int:
    """
    Calculate majority quorum for N replicas.
    A majority is > N/2, meaning any two majorities overlap.
    """
    return (n // 2) + 1

# Examples:
# N=3 -> majority = 2
# N=5 -> majority = 3
# N=7 -> majority = 4

For N=3 with W=2, R=2: R + W = 4 > 3, so you have strong consistency. The read set of 2 always intersects the write set of 2, guaranteeing you see the latest write.

What Happens When R + W <= N

When R + W <= N, there is no guarantee of strong consistency. A read quorum and write quorum may be completely disjoint:

# Example: N=5, W=2, R=3
# R + W = 5, which is NOT > N (5)
# Write quorum: nodes {A, B}
# Read quorum:  nodes {C, D, E}
# These sets are disjoint — read may return stale data

def check_strong_consistency(n: int, r: int, w: int) -> bool:
    """
    Check whether the R + W > N strong-consistency condition holds.
    """
    return r + w > n

def consistency_guarantee(n: int, r: int, w: int) -> str:
    """
    Describe the consistency guarantee for given quorum settings.
    """
    if r + w > n:
        return "Strong consistency: every read sees latest write"
    elif r + w == n:
        return "Read-your-writes NOT guaranteed: quorum sets may be disjoint"
    else:
        return "Weak consistency: read may return stale data"

# Case study: Cassandra configurations
# N=3, W=1, R=1 -> R+W=2 <= 3 -> Eventual consistency only
# N=3, W=2, R=1 -> R+W=3 > 3? No, =3 -> Not guaranteed
# N=3, W=2, R=2 -> R+W=4 > 3 -> Strong consistency

Quorum Calculator

Here is a practical calculator function for designing quorum systems:

def quorum_calculator(n: int, target_consistency: str = "strong") -> dict:
    """
    Calculate read and write quorum for a given replication factor.

    Args:
        n: Number of replicas
        target_consistency: "strong", "read-heavy", "write-heavy"

    Returns:
        Dictionary with recommended R, W and consistency guarantee
    """
    def majority_quorum(nn: int) -> int:
        return (nn // 2) + 1

    if target_consistency == "strong":
        # R + W > N with minimum latency
        # Best: W = majority, R = majority
        w = majority_quorum(n)
        r = majority_quorum(n)
        guarantee = "Strong consistency (linearizable)"
    elif target_consistency == "read-heavy":
        # Optimize for reads: R=1, choose W to ensure R+W>N
        # W must be > N-1, so W = majority
        r = 1
        w = majority_quorum(n)
        guarantee = "Read-your-writes not guaranteed, but durable writes"
    elif target_consistency == "write-heavy":
        # Optimize for writes: W=1, choose R to ensure R+W>N
        # R must be > N-1, so R = majority
        w = 1
        r = majority_quorum(n)
        guarantee = "Fast writes, reads may be stale until quorum read"
    else:
        raise ValueError(f"Unknown target: {target_consistency}")

    return {
        "n": n,
        "r": r,
        "w": w,
        "r_plus_w": r + w,
        "quorum_overlap": r + w > n,
        "guarantee": guarantee
    }

# Interactive examples
for n in [3, 5, 7]:
    print(f"N={n}: majority quorum = {majority_quorum(n)}")

# Design scenarios
print(quorum_calculator(3, "strong"))      # N=3, R=2, W=2
print(quorum_calculator(5, "read-heavy")) # N=5, R=1, W=3
print(quorum_calculator(5, "write-heavy")) # N=5, R=3, W=1

Fault Tolerance Analysis

Quorum settings directly determine failure tolerance:

def failure_tolerance(n: int, r: int, w: int) -> dict:
    """
    Calculate how many replicas can fail while maintaining read/write availability.
    """
    # For writes: W replicas must be available
    write_fail_tolerance = n - w

    # For reads: R replicas must be available
    read_fail_tolerance = n - r

    # For strong consistency, both quorums must still be reachable,
    # so tolerance is limited by the larger of the two
    consistent_fail_tolerance = n - max(r, w)

    return {
        "write_tolerance": write_fail_tolerance,
        "read_tolerance": read_fail_tolerance,
        "consistent_tolerance": consistent_fail_tolerance,
        # After n - w failures, w nodes remain: reads still possible iff r <= w
        "can_read_with_n_minus_w_failures": r <= w,
        # After n - r failures, r nodes remain: writes still possible iff w <= r
        "can_write_with_n_minus_r_failures": w <= r
    }

# N=3, W=2, R=2 -> can tolerate 1 failure and maintain consistency
# N=5, W=3, R=1 -> can tolerate 2 write failures, 4 read failures
#                  but NOT both reads and writes at same time if failures overlap

Why R+W>N Is Not Sufficient for All Consistency Models

The R + W > N condition guarantees that reads see the latest write in a single-key linearizable system. However, it does not guarantee:

  • Read-your-writes consistency: Requires reading from the same client after writing, not just any read
  • Causal consistency: Requires tracking causality across operations, not just latest write
  • Monotonic reads: Requires tracking which version a client has already seen

# R+W>N is necessary but not sufficient for all guarantees
# DynamoDB example: even with quorum, read-your-writes needs explicit design

def dynamodb_consistency_check(n: int, r: int, w: int, session_id: str) -> str:
    """
    Check what guarantees DynamoDB provides with given quorum.
    """
    if r + w <= n:
        return "Eventual consistency only"

    # With quorum, you get linearizability for individual operations
    # But read-your-writes requires tracking session state
    return "Linearizable for individual operations, but session consistency requires additional tracking"

Interview-Style Questions

Q1: How would you design a shopping cart service?

Consider: Should users always be able to add items even during partitions? If yes, AP. But checkout requires consistent inventory counts, so CP for that operation.

Model Answer: “I would use eventual consistency for cart operations (AP) so users can always add items, but use strong consistency for inventory checks during checkout (CP). This hybrid approach optimizes for both user experience and data integrity.”

Q2: Can you achieve both consistency and availability during a partition?

Model Answer: “No, not truly. During a partition, you must choose. However, you can get close by designing for ‘consistent enough’ - using techniques like read-repair, anti-entropy, and conflict resolution to minimize inconsistency windows. But by definition, during an asynchronous network failure, you cannot have both.”

Q3: Why does Cassandra claim to be “tunable consistency”?

Model Answer: “Cassandra’s tunable consistency lets you choose per-query. You can read from ONE node (fast, potentially stale) or QUORUM (slower, more consistent). Similarly for writes - ONE (fast, can be lost) or QUORUM (slower, durable). But you cannot escape the fundamental trade-off - you’re just choosing when to make the trade-off.”

Q4: What’s the difference between CAP and PACELC?

Model Answer: “CAP focuses only on partition scenarios. PACELC extends this by saying the consistency-latency trade-off exists even without partitions. Even when the network is healthy, strong consistency (synchronous replication) is slower than eventual consistency (async replication). PACELC captures the ‘always present’ trade-off; CAP captures the ‘partition scenario’ trade-off.”


Observability Checklist

Metrics to Capture

  • read_consistency_level (counter) - Breakdown of consistency levels used
  • write_consistency_level (counter) - Write acknowledgments by quorum
  • partition_events_total (counter) - Count and duration of partition events
  • replication_lag_seconds (gauge) - How far behind replicas are
  • quorum_failures_total (counter) - When quorum not achieved

Logs to Emit

{
  "timestamp": "2026-03-21T10:15:30.123Z",
  "operation": "write",
  "partition_detected": true,
  "quorum_achieved": true,
  "nodes_contacted": 3,
  "nodes_acknowledged": 2,
  "latency_ms": 45
}

Alerts to Configure

| Alert | Threshold | Severity |
| --- | --- | --- |
| Partition lasting > 30s | 30000ms | Warning |
| Partition lasting > 60s | 60000ms | Critical |
| Quorum failures > 1% | 1% of writes | Warning |
| Replication lag > 10s | 10000ms | Warning |
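A threshold check like the table describes can be sketched in a few lines; the metric names and structure below are illustrative, not tied to any monitoring product:

```python
# Sketch: evaluating partition/replication alert thresholds against
# sampled metrics. Names and values mirror the table above; the metric
# keys are made up for illustration.
ALERTS = [
    ("partition_duration_ms", 30_000, "warning"),
    ("partition_duration_ms", 60_000, "critical"),
    ("replication_lag_ms",    10_000, "warning"),
]

def fired(metrics):
    # Return (metric, severity) for every threshold the sample exceeds.
    return [(name, sev) for name, threshold, sev in ALERTS
            if metrics.get(name, 0) > threshold]

sample = {"partition_duration_ms": 45_000, "replication_lag_ms": 2_000}
print(fired(sample))  # the 30s partition warning fires; critical does not
```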

Security Checklist

  • All inter-node communication encrypted (TLS)
  • Authentication required for replica communication
  • Network policies restricting replica-to-replica traffic
  • Audit logging of consistency level changes
  • Secrets rotation for cluster credentials
  • Certificate management and rotation automation
  • Access control for cluster management operations

Common Pitfalls / Anti-Patterns

Pitfall 1: Choosing CP Everywhere “Because Consistency Matters”

Problem: Over-engineering by using strong consistency for operations that do not need it. This adds latency and reduces availability unnecessarily.

Solution: Audit each operation. Most operations can tolerate eventual consistency. Reserve strong consistency for operations where correctness truly matters.

Pitfall 2: Ignoring Partition Probability

Problem: Assuming partitions are rare so CAP choice does not matter much. In reality, partitions happen regularly in any distributed system.

Solution: Plan for partitions. Document what your system does during partitions. Test failure scenarios. Your users will encounter partition behavior whether you plan for it or not.

Pitfall 3: Not Testing Consistency Guarantees

Problem: Assuming the database provides the consistency guarantees you configured. Without testing, you cannot be sure.

Solution: Use chaos engineering to inject failures. Verify that your system behaves correctly under partition conditions. Use tools like Jepsen to formally verify consistency guarantees.

Pitfall 4: Confusing “Available” with “Responsive”

Problem: An AP system during a partition still responds, but with stale data. Users may not understand why their write “succeeded” but they cannot see it.

Solution: Be explicit about what guarantees your system provides. Consider showing users when they are operating with stale data. Make the cost of AP visible.


Quick Recap

  • CAP theorem: During partition, you must choose between consistency (CP) or availability (AP).
  • Partitions are inevitable in distributed systems - you cannot avoid the trade-off.
  • CA does not exist in practice for distributed systems.
  • Modern databases let you tune consistency per operation.
  • Myth-busting: You cannot have both; eventual does not mean permanent inconsistency.
  • PACELC extends CAP to cover latency-consistency trade-offs even without partitions.

Copy/Paste Checklist

- [ ] Audit operations to classify by consistency requirement
- [ ] Choose CP for financial/inventory/locking operations
- [ ] Choose AP for social feeds/caching/high-availability needs
- [ ] Use tunable consistency to optimize per operation
- [ ] Document system behavior during partitions
- [ ] Test consistency guarantees under failure injection
- [ ] Monitor partition events and replication lag
- [ ] Plan for partitions - do not assume they will not happen
- [ ] Consider PACELC for latency trade-offs during normal operation


Related Posts

Microservices vs Monolith: Choosing the Right Architecture

Understand the fundamental differences between monolithic and microservices architectures, their trade-offs, and how to decide which approach fits your project.

#microservices #monolith #architecture

Distributed Systems Primer: Key Concepts for Modern Architecture

A practical introduction to distributed systems fundamentals. Learn about failure modes, replication strategies, consensus algorithms, and the core challenges of building distributed software.

#distributed-systems #system-design #architecture

The Eight Fallacies of Distributed Computing

Explore the classic assumptions developers make about networked systems that lead to failures. Learn how to avoid these pitfalls in distributed architecture.

#distributed-systems #distributed-computing #system-design