Distributed Transactions: ACID vs BASE Trade-offs
Explore distributed transaction patterns: ACID vs BASE trade-offs, two-phase commit, saga pattern, eventual consistency, and choosing the right model.
In a single database, transactions are straightforward. The database handles atomicity, isolation, consistency, and durability (ACID). When you commit, it is committed. When you read data, it is the latest.
In distributed systems, this simplicity breaks. Data lives across multiple databases, services, and nodes. Keeping everything consistent requires coordination. Coordination has costs: latency, availability, and complexity.
Here I will explore trade-offs between consistency models, from strict ACID to relaxed eventual consistency, and how to choose the right model for your system.
ACID vs BASE
These are two philosophies for handling distributed data.
ACID (Atomicity, Consistency, Isolation, Durability) prioritizes consistency above all else. Every transaction sees a consistent state. The database will wait (or fail) to maintain consistency.
BASE (Basically Available, Soft state, Eventually consistent) prioritizes availability. The system may return stale data temporarily, but given enough time, all replicas converge to the same value.
graph TD
A[ACID] -->|Strong consistency| B[Reads always see latest writes]
A -->|Isolation| C[Concurrent tx look serial]
A -->|Durability| D[Committed data survives crashes]
E[BASE] -->|Availability| F[System stays available]
E -->|Eventual| G[Data converges over time]
E -->|Soft state| H[State may change without input]
ACID is what you want for financial transactions. BASE is what you get with many NoSQL databases and distributed caches.
The CAP Theorem
Eric Brewer’s CAP theorem says you can have at most two of three: Consistency, Availability, and Partition tolerance. Since network partitions happen in real systems, you must choose between consistency and availability.
graph LR
C[Consistency] -->|CP| DB[(Database)]
A[Availability] -->|AP| DB
P[Partition Tolerance] --> C
P --> A
A CP system (like ZooKeeper, etcd) will refuse to serve requests if it cannot confirm consistency. An AP system (like Cassandra, DynamoDB) will serve requests even if it cannot confirm all nodes have the latest data.
Most real systems choose AP and use compensating transactions (like saga) to handle inconsistency.
Two-Phase Commit
Two-phase commit (2PC) coordinates a distributed transaction across multiple participants. A coordinator asks all participants to prepare. If all say yes, it tells them to commit. If any says no, it tells them all to rollback.
For a detailed explanation including failure handling, see Two-Phase Commit.
The problem with 2PC is that it is blocking. If the coordinator crashes after participants have prepared but before sending commit, participants must wait indefinitely. For this reason, many systems avoid 2PC in favor of saga or other patterns.
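The two phases can be sketched in a few lines. This is a minimal illustration, not a real protocol implementation: the `Participant` class and its `prepare`/`commit`/`rollback` interface are assumptions for the sketch, and there is no network, persistence, or coordinator recovery.

```python
class Participant:
    """Illustrative participant; `will_prepare` simulates the phase-1 vote."""

    def __init__(self, name, will_prepare=True):
        self.name = name
        self.will_prepare = will_prepare
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes/no; a real participant would hold locks after a yes
        self.state = "prepared" if self.will_prepare else "aborted"
        return self.will_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare
    if all(p.prepare() for p in participants):
        # Phase 2: unanimous yes -> commit everywhere
        for p in participants:
            p.commit()
        return "committed"
    # Any "no" vote aborts the whole transaction
    for p in participants:
        p.rollback()
    return "rolled_back"
```

The blocking problem lives between the two phases: if the coordinator dies after `prepare` returns but before `commit` is sent, every prepared participant is stuck holding locks.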
The Saga Pattern
Saga breaks a distributed transaction into a sequence of local transactions, each with a compensating transaction. If step 3 fails, steps 1 and 2 are compensated (undone).
See Saga Pattern for full details.
Saga sacrifices isolation and atomicity for availability. Unlike 2PC, saga does not block. Unlike ACID, saga allows intermediate states to be visible.
sequenceDiagram
Saga->>Inventory: Reserve
Inventory-->>Saga: OK
Saga->>Payment: Charge
Payment-->>Saga: OK
Note over Saga: If payment fails, compensate inventory
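The orchestration above reduces to a simple loop: run each step, remember its compensation, and on failure undo completed steps in reverse. This is a minimal sketch; the step and compensation names are illustrative, and a production saga would persist its progress so it can resume after a crash.

```python
def run_saga(steps):
    """steps: list of (action, compensation) callables, run in order."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            # Undo the steps that already succeeded, newest first
            for undo in reversed(completed):
                undo()
            return "compensated"
    return "completed"


# Illustrative order flow: inventory reservation succeeds, payment fails
log = []

def reserve():
    log.append("reserve_inventory")

def release():
    log.append("release_inventory")

def charge():
    raise RuntimeError("payment declined")

def refund():
    log.append("refund")  # not reached: charge never succeeded

result = run_saga([(reserve, release), (charge, refund)])
# reserve succeeded, charge failed -> release_inventory runs as compensation
```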
Eventual Consistency
Eventual consistency guarantees that, if no new updates are made, all replicas will eventually return the same value. The system is available during the “eventually” window.
Amazon DynamoDB, Cassandra, and many distributed databases use eventual consistency by default or offer it as an option.
The “eventually” window can be milliseconds or seconds. Under high load or network partitions, it stretches.
Read Repair
One way to achieve eventual consistency: when you read data and detect inconsistency, you repair the stale replica on the fly.
def read(key):
    # Read from every replica and compare versions
    results = [replica.read(key) for replica in replicas]
    latest = max(results, key=lambda v: v.version)
    # Repair any replica holding a stale version
    for replica, value in zip(replicas, results):
        if value.version < latest.version:
            replica.write(key, latest)
    return latest
This makes reads slightly more expensive but ensures convergence over time.
Anti-Entropy
Anti-entropy is background repair. Nodes periodically compare their data and fix inconsistencies. This runs continuously in the background, separate from read operations.
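A minimal sketch of one anti-entropy round between two nodes: compare a cheap digest of each store, and only exchange data when the digests differ. Real systems (Cassandra, for example) compare Merkle trees over key ranges rather than hashing the whole store; the `(value, version)` record shape here is an assumption for illustration.

```python
import hashlib


def digest(store):
    # Whole-store fingerprint; differing digests trigger a sync.
    # Real systems use Merkle trees to localize the difference cheaply.
    blob = repr(sorted(store.items())).encode()
    return hashlib.sha256(blob).hexdigest()


def anti_entropy(store_a, store_b):
    """One repair round: after this, both stores hold the newest versions."""
    if digest(store_a) == digest(store_b):
        return  # already converged, nothing to exchange
    # Values are (data, version) pairs; the higher version wins
    for key in set(store_a) | set(store_b):
        va = store_a.get(key, (None, -1))
        vb = store_b.get(key, (None, -1))
        winner = va if va[1] >= vb[1] else vb
        store_a[key] = winner
        store_b[key] = winner
```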
Consistency Levels
Many databases let you tune consistency per operation.
Strong consistency: All reads see the result of the latest write. No staleness.
Sequential consistency: All nodes see operations in the same order, but that order may lag behind wall clock time.
Causal consistency: If operation A causally precedes operation B, all nodes see A before B.
Eventual consistency: Given no new updates, all replicas eventually converge.
Read-your-writes consistency: Your own writes are always visible to subsequent reads from the same client. A special case of causal consistency.
# Strong (block until confirmed by quorum)
result = db.read('key', consistency='strong')
# Eventual (return immediately from nearest replica)
result = db.read('key', consistency='eventual')
# Session consistency (read from same replica as write)
result = db.read('key', consistency='session', session_id=client_session)
# Quorum read (R out of N replicas must respond)
result = db.read('key', consistency='quorum', read_quorum=3)
# Bounded staleness (read from replica within time bound)
result = db.read('key', consistency='bounded', max_staleness='5s')
Latency and Throughput by Consistency Level
The consistency level you choose directly affects both latency and throughput. Here’s what to expect in practice:
| Consistency Level | Read Latency | Write Latency | Throughput | Notes |
|---|---|---|---|---|
| Strong (linearizable) | 20-100ms | 30-150ms | Lowest | Must confirm with quorum or leader |
| Sequential | 10-50ms | 20-80ms | Low | Total order broadcast required |
| Session | 5-15ms | 10-30ms | Medium | Reads from preferred replica |
| Bounded staleness | 2-10ms | 10-30ms | High | Return stale data within bound |
| Eventual | 1-5ms | 5-20ms | Highest | Nearest replica, no confirmation |
These numbers assume 3-5 replicas in a single region. Cross-region and global deployments push these higher — strong consistency across regions can hit 200-500ms round-trip.
Why the gap matters: A user-facing API with a 200ms timeout can barely afford one strong-consistency read cross-region. It can handle dozens of eventual reads in the same window. For high-frequency operations (analytics, caching, social feeds), eventual consistency is the only practical choice.
Throughput scaling: With eventual consistency, you can scale read throughput horizontally by adding replicas — every replica can serve reads. With strong consistency, every read must contact a quorum, so adding replicas doesn’t help read throughput (though it does help with read availability).
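The quorum arithmetic behind that trade-off is worth making explicit: with N replicas, a read quorum R and write quorum W are guaranteed to overlap in at least one replica whenever R + W > N, which is why a quorum read always observes the latest quorum-committed write.

```python
def quorums_overlap(n, r, w):
    # Pigeonhole argument: any R replicas and any W replicas must share
    # at least one member when R + W > N.
    return r + w > n


# Common configurations for N = 3 replicas:
assert quorums_overlap(3, r=2, w=2)      # classic strong quorum
assert not quorums_overlap(3, r=1, w=1)  # fastest, but reads can miss writes
assert quorums_overlap(3, r=1, w=3)      # read-optimized: write all, read any one
```

This is also why adding replicas does not speed up strongly consistent reads: larger N means larger quorums to contact.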
Database-Specific Consistency Implementations
Different databases handle consistency very differently under the hood. Understanding what your database actually does helps you make better trade-offs.
Google Spanner uses TrueTime (GPS + atomic clocks) to bound clock uncertainty to ±7ms. This lets Spanner provide strong consistency without a single leader — any replica can serve reads at any timestamp, as long as it waits out the uncertainty window. The cost is read latency: strong reads must wait for time to advance past the uncertainty bound.
CockroachDB uses Hybrid Logical Clocks (HLC) instead of physical clocks. HLC combines physical time with a logical counter, giving you causality tracking without TrueTime’s hardware requirements. CockroachDB runs serializable transactions by default, replicating writes through Raft; reads are served by a range’s leaseholder, which lets most reads avoid a full consensus round trip.
Amazon DynamoDB serves eventually consistent reads by default, with strongly consistent reads available as an option. Its conditional writes provide optimistic concurrency control, but multi-item atomicity requires DynamoDB Transactions (which coordinates commits with a 2PC-style protocol internally). DynamoDB Streams provides ordered change capture per partition.
Apache Cassandra uses eventual consistency with tunable consistency per query. You can request ONE, QUORUM, or ALL reads/writes. Cassandra’s lightweight transactions (LWT) use Paxos for linearizable consistency on a per-partition basis — but LWT is expensive (4 round-trips) and should be used sparingly.
| Database | Consistency Model | Transaction Support | Typical Latency |
|---|---|---|---|
| Spanner | Strong (TrueTime) | 2PC + Paxos | ~50ms (single region) |
| CockroachDB | Strong (HLC + Raft) | Raft-based distributed SQL | ~20ms (single region) |
| DynamoDB | Eventual by default, strong option | Per-item; multi-item via Transactions | ~10ms (provisioned) |
| Cassandra | Eventual + LWT | Paxos (LWT) per partition | ~5ms (LOCAL_* modes) |
| etcd | Strong (Raft) | Linearizable reads by default | ~5ms |
Handling Inconsistency at the Application Level
When you choose eventual consistency, your application must handle partial states. This is where things get tricky.
Scenario: User orders a product. Payment succeeds. Inventory reservation succeeds. Shipment creation fails. The saga compensates and refunds the payment.
During the failure window, the user sees “order placed” but shipment does not exist. The saga eventually undoes everything, but the user may have seen inconsistent data briefly.
Mitigation options:
- Show pending states clearly to the user
- Use idempotency keys to safely retry
- Implement compensating transactions that notify the user
- Design sagas to minimize the inconsistency window
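The idempotency-key option deserves a sketch, because it is the cheapest of the four to adopt. The idea: the first call with a given key executes and caches its result; a retry with the same key returns the cached result instead of repeating the side effect (charging a card twice, say). The decorator and in-memory dict here are illustrative assumptions — in production the result cache must be a durable store keyed per client request.

```python
_results = {}  # illustrative; production would use a durable store


def idempotent(fn):
    """Wrap fn so retries with the same idempotency key are no-ops."""
    def wrapper(idempotency_key, *args, **kwargs):
        if idempotency_key in _results:
            return _results[idempotency_key]  # safe retry: no second side effect
        result = fn(*args, **kwargs)
        _results[idempotency_key] = result
        return result
    return wrapper


charges = []

@idempotent
def charge(amount):
    charges.append(amount)  # the side effect we must not repeat
    return f"charged {amount}"


charge("order-42", 100)
charge("order-42", 100)  # client retried after a timeout: no double charge
```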
When to Use / When Not to Use Strong Consistency
| Criteria | Strong Consistency (ACID) | Eventual Consistency (BASE) |
|---|---|---|
| Latency | Higher (waits for confirmations) | Lower (returns immediately) |
| Availability | Lower (may refuse requests during partitions) | Higher (always serves requests) |
| Isolation | Full serializable isolation | No isolation (dirty reads possible) |
| Complexity | Simple transactions | Complex conflict resolution |
| Rollback | Automatic (handled by the database) | Requires compensation logic |
| Use Case | Financial, inventory, compliance | Analytics, social features, caching |
| Data Returned | Always latest | May be stale |
When to Use Strong Consistency
Use strong consistency when:
- Financial transactions where double-charging is unacceptable
- Inventory systems where overselling has direct business impact
- Operations with regulatory compliance requirements
- Any case where stale reads cause business damage
- Systems where users expect immediate visibility of their actions
When to Use Eventual Consistency
Avoid strong consistency when:
- High-volume, low-value operations where blocking is too expensive
- Systems that must stay available during network partitions
- User-generated content (likes, comments) where occasional duplication is acceptable
- Analytics and reporting where slightly stale data is fine
- Operations where eventual consistency is explicitly acceptable to the business
Decision Tree: Which Consistency Model Should I Choose?
flowchart TD
Start{What are you building?}
Start --> Financial{Financial transaction?}
Financial -->|Yes| Strong1[Use Strong Consistency<br/>ACID transactions or 2PC<br/>Accept higher latency]
Financial -->|No| Inventory{Inventory management?}
Inventory -->|Yes| Strong2[Use Strong Consistency<br/>or Saga with careful design<br/>Prevent overselling]
Inventory -->|No| UserContent{User-generated content?}
UserContent -->|Yes| Eventual1[Use Eventual Consistency<br/>Likes, comments, posts<br/>Accept duplicates]
UserContent -->|No| Analytics{Analytics or reporting?}
Analytics -->|Yes| Eventual2[Use Eventual Consistency<br/>Bounded staleness OK<br/>Focus on throughput]
Analytics -->|No| Partition{Need availability<br/>during partitions?}
Partition -->|Yes| Eventual3[Use Eventual Consistency<br/>BASE model<br/>Saga for transactions]
Partition -->|No| Latency{Critical latency<br/>requirements?}
Latency -->|Yes| Eventual4[Use Eventual Consistency<br/>with quorum reads<br/>Tune R/W values]
Latency -->|No| Regulatory{Regulatory compliance<br/>requirements?}
Regulatory -->|Yes| Strong3[Use Strong Consistency<br/>Audit trails, compliance<br/>Document decisions]
Regulatory -->|No| Default[Default to Eventual<br/>Add Strong only where needed<br/>Profile before optimizing]
Quick Reference by Use Case:
| Use Case | Recommended Model | Reasoning |
|---|---|---|
| Banking/Financial | Strong (ACID/2PC) | Double-charging unacceptable |
| E-commerce inventory | Strong or Saga | Overselling has direct cost |
| Social media likes | Eventual | Duplicates tolerable, speed matters |
| Product catalog | Eventual | Stale reads acceptable |
| Order processing | Saga | Cross-service, compensating transactions |
| Analytics/Dashboard | Eventual | Stale data fine, throughput critical |
| User profiles | Session/Read-your-writes | User expects to see own changes |
| Stock trading | Strong | Regulatory, exact values required |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Coordinator crashes in 2PC prepare phase | Participants hold locks indefinitely | Use Paxos-based consensus for coordinator; implement timeout-based lock release |
| Network partition during commit | Some participants commit, others rollback; inconsistent state | Design for partition tolerance; use saga for compensation |
| Replica lag in synchronous replication | Read latency increases; writes may timeout | Use semi-synchronous replication; monitor replica lag; adjust consistency level |
| Read repair produces wrong result | Stale data returned under high contention | Use version vectors; implement conflict resolution; use quorum reads |
| Eventual consistency window too long | Users see stale data for unacceptable period | Monitor convergence time; adjust replication factor; use read repair |
| Deadlock in distributed transaction | Transactions timeout; partial state possible | Implement deadlock detection; set appropriate timeout values |
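The timeout-based lock release mitigation from the first row can be sketched as follows. A prepared participant that hears nothing from the coordinator within a deadline queries its peers instead of blocking forever; if any peer already learned the outcome, that decision is final and can be followed. This peer-query step is a simplified illustration of cooperative termination, not a complete recovery protocol — note the residual "blocked" case, which is exactly why 2PC is called blocking.

```python
import time


class PreparedParticipant:
    """Participant stuck in the prepared state, waiting on the coordinator."""

    def __init__(self, deadline_s):
        self.prepared_at = time.monotonic()
        self.deadline_s = deadline_s

    def resolve(self, peer_outcomes):
        # Before the deadline: keep waiting for the coordinator's decision
        if time.monotonic() - self.prepared_at < self.deadline_s:
            return "waiting"
        # After the deadline: if any peer already committed or aborted,
        # that outcome is final, so follow it
        if "committed" in peer_outcomes:
            return "commit"
        if "aborted" in peer_outcomes:
            return "rollback"
        # Every peer is also merely prepared: the outcome is unknowable
        # until the coordinator recovers -- 2PC's blocking case
        return "blocked"
```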
Consistency Failure Flow
graph TD
Start[Read Request] --> Quorum{Quorum Reached?}
Quorum -->|Yes| Latest{All Replicas Current?}
Latest -->|Yes| ReturnLatest[Return Latest Value]
Latest -->|No| Repair[Trigger Read Repair]
Repair --> ReturnLatest
Quorum -->|No| StaleOK{Can Return Stale?}
StaleOK -->|Yes| ReturnStale[Return Stale Value<br/>Mark for Repair]
StaleOK -->|No| Error[Return Error]
2PC Failure Flow
graph TD
Start[Start 2PC] --> Prepare[Coordinator sends<br/>Prepare to all participants]
Prepare --> Wait{All Participants<br/>Prepare OK?}
Wait -->|Yes| Commit[Coordinator sends<br/>Commit to all]
Wait -->|No| Rollback[Coordinator sends<br/>Rollback to all]
Commit --> CoordCrash{Coordinator<br/>Crashes?}
CoordCrash -->|Yes| ParticipantsWait[Participants wait<br/>indefinitely]
ParticipantsWait --> Recovery[Recovery protocol:<br/>Timeout + Query participants]
Recovery --> Committed[Determine outcome:<br/>Commit or Rollback]
Observability Checklist
Metrics
- Transaction commit rate and latency
- Replication lag (for replica-based systems)
- 2PC prepare and commit phase duration
- Transaction abort rate by reason
- Distributed lock wait time
- Consistency level effectiveness (staleness detected)
- Saga compensation execution count
Logs
- Log transaction start, participants, and outcome
- Include consistency level used per operation
- Log replica state changes
- Log deadlock detection events
- Include transaction correlation IDs for distributed tracing
Alerts
- Alert when transaction latency exceeds threshold
- Alert when replica lag exceeds acceptable window
- Alert when transaction abort rate spikes
- Alert when deadlock detection occurs frequently
- Alert when consistency violations detected
OpenTelemetry Integration
Distributed transactions span multiple services, making traditional request-level tracing insufficient. OpenTelemetry provides the context propagation and span linking you need to trace a transaction across service boundaries.
Trace context for distributed transactions:
from opentelemetry import trace
from opentelemetry.propagate import extract
from opentelemetry.trace import SpanKind, Link

tracer = trace.get_tracer(__name__)

def start_distributed_transaction(txn_name, txn_id, carrier=None):
    """Start a traced distributed transaction with proper context."""
    # Extract incoming context (if initiated by an upstream service,
    # the carrier is the dict of propagation headers it sent)
    if carrier:
        ctx = extract(carrier)
        span = tracer.start_span(txn_name, context=ctx, kind=SpanKind.PRODUCER)
    else:
        span = tracer.start_span(txn_name, kind=SpanKind.PRODUCER)
    span.set_attribute("txn.id", txn_id)
    span.set_attribute("txn.type", "distributed")
    return span

def record_participant(parent_span, service_name, consistency_level, success):
    """Record a participant's involvement, linked to the transaction span."""
    participant_span = tracer.start_span(
        f"participant.{service_name}",
        kind=SpanKind.CLIENT,
        links=[Link(parent_span.get_span_context())],
    )
    participant_span.set_attribute("participant.service", service_name)
    participant_span.set_attribute("consistency.level", consistency_level)
    participant_span.set_attribute("success", success)
    participant_span.end()
Key OpenTelemetry patterns for distributed transactions:
- Link parent spans to child operations: When a coordinator starts a sub-operation on a participant, link the child span to the parent’s span context. This creates a visible trace tree.
- Record consistency level on spans: Tag every span with the consistency level used. This lets you filter traces by consistency level and correlate latency with consistency choices.
- Track saga compensation separately: Compensation operations should be their own spans, linked back to the original step span. Mark them as SpanKind.INTERNAL so they don’t look like new user-facing requests.
# Example: tracing a saga step and its compensation with linking
from opentelemetry.trace import Status, StatusCode

def traced_saga_step(saga_id, step_name, operation, rollback_operation=None):
    with tracer.start_as_current_span(f"saga.step.{step_name}") as step_span:
        step_span.set_attribute("saga.id", saga_id)
        step_span.set_attribute("step.name", step_name)
        step_span.set_attribute("saga.is_compensation", False)
        try:
            result = operation()
            step_span.set_status(Status(StatusCode.OK))
            return result
        except Exception as e:
            step_span.record_exception(e)
            step_span.set_status(Status(StatusCode.ERROR))
            # If compensation exists, trace it as its own internal span
            if rollback_operation:
                with tracer.start_as_current_span(
                    f"saga.compensate.{step_name}",
                    kind=SpanKind.INTERNAL,
                ) as comp_span:
                    comp_span.set_attribute("saga.is_compensation", True)
                    comp_span.set_attribute("saga.original_step", step_name)
                    comp_span.set_attribute("compensation.triggered_by", str(e))
                    # Record the connection back to the failed step
                    comp_span.add_event("linked_to_original_step")
                    try:
                        rollback_operation()
                        comp_span.set_status(Status(StatusCode.OK))
                    except Exception as comp_error:
                        comp_span.record_exception(comp_error)
                        comp_span.set_status(Status(StatusCode.ERROR))
                        raise
            raise
Metrics to export alongside traces:
- distributed_txn.duration (histogram): End-to-end transaction latency
- distributed_txn.participant_count (gauge): Number of participants per transaction
- distributed_txn.consistency_level (label): Which consistency level was used
- distributed_txn.abort_reason (label): Why aborted transactions failed
- saga.compensation.duration (histogram): How long compensations take
- replication.lag (gauge): Current replica lag in milliseconds
Security Checklist
- Authenticate all distributed transaction participants
- Authorize which participants can join which transactions
- Encrypt transaction data in transit between participants
- Audit log all distributed transaction completions and failures
- Validate transaction inputs to prevent injection attacks
- Do not expose internal transaction IDs in error messages
Common Pitfalls / Anti-Patterns
Defaulting to strong consistency everywhere: Strong consistency is expensive. Not every operation needs it. Profile your access patterns and choose consistency levels appropriately.
Ignoring the CAP tradeoff: You cannot have both strong consistency and availability during partitions. Decide what your system needs and design accordingly.
Using 2PC without understanding blocking: The coordinator blocking problem is real. If you use 2PC, understand the failure modes and have mitigation plans.
Treating eventual consistency as failure: Eventual consistency is not a bug; it is a feature for scalability. Design your application to handle eventual consistency gracefully.
Not monitoring replication lag: If you use replica reads for scalability but do not monitor lag, you may serve stale data without knowing it.
Ignoring conflict resolution: With eventual consistency and concurrent writes, conflicts happen. Have a clear conflict resolution strategy (last-write-wins, application-level merge, etc.).
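The two strategies in that last pitfall behave very differently, which a short sketch makes concrete. Last-write-wins picks the higher timestamp and silently drops the losing write; an application-level merge can keep both sides when the field is naturally mergeable (a set-valued shopping cart, say). The record shapes and field names here are illustrative assumptions.

```python
def lww(a, b):
    # Each write is a (value, timestamp) pair: higher timestamp wins,
    # the concurrent losing write is discarded without a trace
    return a if a[1] >= b[1] else b


def merge_carts(a, b):
    # Application-level merge: keep items from both concurrent writes
    return sorted(set(a) | set(b))


assert lww(("blue", 10), ("green", 12)) == ("green", 12)
assert merge_carts(["book", "pen"], ["pen", "mug"]) == ["book", "mug", "pen"]
```

LWW is simple but lossy; merging preserves intent but only works when your domain defines what "merge" means.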
Quick Recap
graph TD
A[Consistency Models] --> B[Strong ACID]
A --> C[Eventual BASE]
B --> D[2PC, Synchronous Replication]
C --> E[Saga, Async Replication]
D --> F[High Latency, Lower Availability]
E --> G[Low Latency, Higher Availability]
Key Points
- ACID prioritizes consistency; BASE prioritizes availability
- CAP theorem: you cannot have consistency, availability, and partition tolerance simultaneously
- Choose consistency level per operation based on business requirements
- Saga pattern handles distributed transactions without 2PC blocking
- Eventual consistency requires application-level handling for conflicts
Production Checklist
# Distributed Transactions Production Readiness
- [ ] Consistency level chosen per operation based on requirements
- [ ] Saga pattern implemented for cross-service business transactions
- [ ] Replication lag monitored and alerted
- [ ] Transaction latency SLAs defined and monitored
- [ ] Conflict resolution strategy documented and implemented
- [ ] Idempotency implemented for all distributed operations
- [ ] CAP tradeoff understood and documented for each service
- [ ] Circuit breakers implemented to prevent cascade failures
- [ ] Correlation IDs in all distributed transaction logs
Use Cases: Strong Consistency
Strong consistency matters for financial transactions where double-charging is unacceptable. Inventory systems where overselling is unacceptable. Operations with regulatory compliance requirements. Any case where stale reads cause business damage.
For these cases, use 2PC or synchronous replication. Accept the latency and availability trade-offs.
Use Cases: Eventual Consistency
Eventual consistency makes sense for user-generated content (likes, comments) where occasional duplication is acceptable. Analytics and reporting where slightly stale data is fine. High-volume, low-value operations where blocking is too expensive. Systems that must stay available during network partitions.
For more on read and write patterns, see Database Replication.
Consistency and Microservices
In microservices, each service owns its data. Cross-service consistency is the hardest problem. You cannot wrap a transaction around service A’s database and service B’s database.
Options:
- Accept eventual consistency: Use saga or choreography. Handle partial failures explicitly.
- Shared database: Both services use the same database. This creates coupling.
- 2PC: Atomic transaction across databases. Blocked during commit. Rarely used.
- Event sourcing: Store events, not state. Reconstruct state by replaying.
Most microservice architectures choose option 1: eventual consistency with saga or orchestration.
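Option 4 is worth a tiny sketch, since "reconstruct state by replaying" sounds more exotic than it is: current state is just a left fold of the event log over an initial state. The event names and account domain here are illustrative assumptions.

```python
def apply(balance, event):
    """Fold one immutable event into the running state."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    return balance  # unknown events are ignored


def replay(events):
    # State is never stored directly; it is derived from the log
    balance = 0
    for event in events:
        balance = apply(balance, event)
    return balance


events = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]
# replay(events) reconstructs the balance: 100 - 30 + 5 = 75
```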
For related patterns, see Saga Pattern, Event-Driven Architecture, and Message Queue Types.
Conclusion
Distributed transactions do not have a one-size-fits-all answer. ACID gives you consistency but sacrifices availability and latency. BASE gives you availability but introduces inconsistency windows.
In practice, most systems use a mix. Core business operations get strong consistency. Peripheral operations get eventual consistency. Saga patterns handle cross-service business transactions without 2PC’s blocking problems.
The right choice depends on your domain, your tolerance for inconsistency, and what your users expect. Do not default to strong consistency everywhere. It is expensive. Do not default to eventual consistency everywhere. Your users will notice bugs.