Distributed Transactions: ACID vs BASE Trade-offs
Explore distributed transaction patterns: ACID vs BASE trade-offs, two-phase commit, saga pattern, eventual consistency, and choosing the right model.
In a single database, transactions are straightforward. The database handles atomicity, isolation, consistency, and durability (ACID). When you commit, it is committed. When you read data, it is the latest.
In distributed systems, this simplicity breaks. Data lives across multiple databases, services, and nodes. Keeping everything consistent requires coordination. Coordination has costs: latency, availability, and complexity.
Here I will explore trade-offs between consistency models, from strict ACID to relaxed eventual consistency, and how to choose the right model for your system.
ACID vs BASE
These are two philosophies for handling distributed data.
ACID (Atomicity, Consistency, Isolation, Durability) prioritizes consistency above all else. Every transaction sees a consistent state. The database will wait (or fail) to maintain consistency.
BASE (Basically Available, Soft state, Eventually consistent) prioritizes availability. The system may return stale data temporarily, but given enough time, all replicas converge to the same value.
graph TD
A[ACID] -->|Strong consistency| B[Reads always see latest writes]
A -->|Isolation| C[Concurrent tx look serial]
A -->|Durability| D[Committed data survives crashes]
E[BASE] -->|Availability| F[System stays available]
E -->|Eventual| G[Data converges over time]
E -->|Soft state| H[State may change without input]
ACID is what you want for financial transactions. BASE is what you get with many NoSQL databases and distributed caches.
The CAP Theorem
Eric Brewer’s CAP theorem says you can have at most two of three: Consistency, Availability, and Partition tolerance. Since network partitions happen in real systems, you must choose between consistency and availability.
graph LR
C[Consistency] -->|CP| DB[(Database)]
A[Availability] -->|AP| DB
P[Partition Tolerance] --> C
P --> A
A CP system (like ZooKeeper, etcd) will refuse to serve requests if it cannot confirm consistency. An AP system (like Cassandra, DynamoDB) will serve requests even if it cannot confirm all nodes have the latest data.
Most real systems choose AP and use compensating transactions (like saga) to handle inconsistency.
Two-Phase Commit
Two-phase commit (2PC) coordinates a distributed transaction across multiple participants. A coordinator asks all participants to prepare. If all say yes, it tells them to commit. If any says no, it tells them all to rollback.
For a detailed explanation including failure handling, see Two-Phase Commit.
The problem with 2PC is that it is blocking. If the coordinator crashes after participants have prepared but before sending commit, participants must wait indefinitely. For this reason, many systems avoid 2PC in favor of saga or other patterns.
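The two phases can be sketched in a few lines. This is a minimal illustration, not a real protocol implementation: the `Participant` class and its `prepare`/`commit`/`rollback` interface are assumptions for the sketch, and there is no network, persistence, or coordinator recovery.

```python
class Participant:
    """Illustrative participant; `will_prepare` simulates the phase-1 vote."""

    def __init__(self, name, will_prepare=True):
        self.name = name
        self.will_prepare = will_prepare
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes/no; a real participant would hold locks after a yes
        self.state = "prepared" if self.will_prepare else "aborted"
        return self.will_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare
    if all(p.prepare() for p in participants):
        # Phase 2: unanimous yes -> commit everywhere
        for p in participants:
            p.commit()
        return "committed"
    # Any "no" vote aborts the whole transaction
    for p in participants:
        p.rollback()
    return "rolled_back"
```

The blocking problem lives between the two phases: if the coordinator dies after `prepare` returns but before `commit` is sent, every prepared participant is stuck holding locks.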
The Saga Pattern
Saga breaks a distributed transaction into a sequence of local transactions, each with a compensating transaction. If step 3 fails, steps 1 and 2 are compensated (undone).
See Saga Pattern for full details.
Saga sacrifices isolation and atomicity for availability. Unlike 2PC, saga does not block. Unlike ACID, saga allows intermediate states to be visible.
sequenceDiagram
Saga->>Inventory: Reserve
Inventory-->>Saga: OK
Saga->>Payment: Charge
Payment-->>Saga: OK
Note over Saga: If payment fails, compensate inventory
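The orchestration above reduces to a simple loop: run each step, remember its compensation, and on failure undo completed steps in reverse. This is a minimal sketch; the step and compensation names are illustrative, and a production saga would persist its progress so it can resume after a crash.

```python
def run_saga(steps):
    """steps: list of (action, compensation) callables, run in order."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            # Undo the steps that already succeeded, newest first
            for undo in reversed(completed):
                undo()
            return "compensated"
    return "completed"


# Illustrative order flow: inventory reservation succeeds, payment fails
log = []

def reserve():
    log.append("reserve_inventory")

def release():
    log.append("release_inventory")

def charge():
    raise RuntimeError("payment declined")

def refund():
    log.append("refund")  # not reached: charge never succeeded

result = run_saga([(reserve, release), (charge, refund)])
# reserve succeeded, charge failed -> release_inventory runs as compensation
```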
Eventual Consistency
Eventual consistency guarantees that, if no new updates are made, all replicas will eventually return the same value. The system is available during the “eventually” window.
Amazon DynamoDB, Cassandra, and many distributed databases use eventual consistency by default or offer it as an option.
The “eventually” window can be milliseconds or seconds. Under high load or network partitions, it stretches.
Read Repair
One way to achieve eventual consistency: when you read data and detect inconsistency, you repair the stale replica on the fly.
def read(key):
    # Read from every replica and compare versions
    results = [replica.read(key) for replica in replicas]
    latest = max(results, key=lambda v: v.version)
    # Repair any replica holding a stale version
    for replica, value in zip(replicas, results):
        if value.version < latest.version:
            replica.write(key, latest)
    return latest
This makes reads slightly more expensive but ensures convergence over time.
Anti-Entropy
Anti-entropy is background repair. Nodes periodically compare their data and fix inconsistencies. This runs continuously in the background, separate from read operations.
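A minimal sketch of one anti-entropy round between two nodes: compare a cheap digest of each store, and only exchange data when the digests differ. Real systems (Cassandra, for example) compare Merkle trees over key ranges rather than hashing the whole store; the `(value, version)` record shape here is an assumption for illustration.

```python
import hashlib


def digest(store):
    # Whole-store fingerprint; differing digests trigger a sync.
    # Real systems use Merkle trees to localize the difference cheaply.
    blob = repr(sorted(store.items())).encode()
    return hashlib.sha256(blob).hexdigest()


def anti_entropy(store_a, store_b):
    """One repair round: after this, both stores hold the newest versions."""
    if digest(store_a) == digest(store_b):
        return  # already converged, nothing to exchange
    # Values are (data, version) pairs; the higher version wins
    for key in set(store_a) | set(store_b):
        va = store_a.get(key, (None, -1))
        vb = store_b.get(key, (None, -1))
        winner = va if va[1] >= vb[1] else vb
        store_a[key] = winner
        store_b[key] = winner
```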
Consistency Levels
Many databases let you tune consistency per operation.
Strong consistency: All reads see the result of the latest write. No staleness.
Sequential consistency: All nodes see operations in the same order, but that order may lag behind wall clock time.
Causal consistency: If operation A causally precedes operation B, all nodes see A before B.
Eventual consistency: Given no new updates, all replicas eventually converge.
Read-your-writes consistency: Your own writes are always visible to subsequent reads from the same client. A special case of causal consistency.
# Strong (block until confirmed by quorum)
result = db.read('key', consistency='strong')
# Eventual (return immediately from nearest replica)
result = db.read('key', consistency='eventual')
# Session consistency (read from same replica as write)
result = db.read('key', consistency='session', session_id=client_session)
# Quorum read (R out of N replicas must respond)
result = db.read('key', consistency='quorum', read_quorum=3)
# Bounded staleness (read from replica within time bound)
result = db.read('key', consistency='bounded', max_staleness='5s')
Latency and Throughput by Consistency Level
The consistency level you choose directly affects both latency and throughput. Here’s what to expect in practice:
| Consistency Level | Read Latency | Write Latency | Throughput | Notes |
|---|---|---|---|---|
| Strong (linearizable) | 20-100ms | 30-150ms | Lowest | Must confirm with quorum or leader |
| Sequential | 10-50ms | 20-80ms | Low | Total order broadcast required |
| Session | 5-15ms | 10-30ms | Medium | Reads from preferred replica |
| Bounded staleness | 2-10ms | 10-30ms | High | Return stale data within bound |
| Eventual | 1-5ms | 5-20ms | Highest | Nearest replica, no confirmation |
These numbers assume 3-5 replicas in a single region. Cross-region and global deployments push these higher — strong consistency across regions can hit 200-500ms round-trip.
Why the gap matters: A user-facing API with a 200ms timeout can barely afford one strong-consistency read cross-region. It can handle dozens of eventual reads in the same window. For high-frequency operations (analytics, caching, social feeds), eventual consistency is the only practical choice.
Throughput scaling: With eventual consistency, you can scale read throughput horizontally by adding replicas — every replica can serve reads. With strong consistency, every read must contact a quorum, so adding replicas doesn’t help read throughput (though it does help with read availability).
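The quorum arithmetic behind that trade-off is worth making explicit: with N replicas, a read quorum R and write quorum W are guaranteed to overlap in at least one replica whenever R + W > N, which is why a quorum read always observes the latest quorum-committed write.

```python
def quorums_overlap(n, r, w):
    # Pigeonhole argument: any R replicas and any W replicas must share
    # at least one member when R + W > N.
    return r + w > n


# Common configurations for N = 3 replicas:
assert quorums_overlap(3, r=2, w=2)      # classic strong quorum
assert not quorums_overlap(3, r=1, w=1)  # fastest, but reads can miss writes
assert quorums_overlap(3, r=1, w=3)      # read-optimized: write all, read any one
```

This is also why adding replicas does not speed up strongly consistent reads: larger N means larger quorums to contact.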
Database-Specific Consistency Implementations
Different databases handle consistency very differently under the hood. Understanding what your database actually does helps you make better trade-offs.
Google Spanner uses TrueTime (GPS + atomic clocks) to bound clock uncertainty to ±7ms. This lets Spanner provide strong consistency without a single leader — any replica can serve reads at any timestamp, as long as it waits out the uncertainty window. The cost is read latency: strong reads must wait for time to advance past the uncertainty bound.
CockroachDB uses Hybrid Logical Clocks (HLC) instead of physical clocks. HLC combines physical time with a logical counter, giving you causality tracking without TrueTime’s hardware requirements. CockroachDB runs serializable transactions by default, replicating writes through Raft; reads are served by a range’s leaseholder, which lets most reads avoid a full consensus round trip.
Amazon DynamoDB serves eventually consistent reads by default, with strongly consistent reads available as an option. Its conditional writes provide optimistic concurrency control, but multi-item atomicity requires DynamoDB Transactions (which coordinates commits with a 2PC-style protocol internally). DynamoDB Streams provides ordered change capture per partition.
Apache Cassandra uses eventual consistency with tunable consistency per query. You can request ONE, QUORUM, or ALL reads/writes. Cassandra’s lightweight transactions (LWT) use Paxos for linearizable consistency on a per-partition basis — but LWT is expensive (4 round-trips) and should be used sparingly.
| Database | Consistency Model | Transaction Support | Typical Latency |
|---|---|---|---|
| Spanner | Strong (TrueTime) | 2PC + Paxos | ~50ms (single region) |
| CockroachDB | Strong (HLC + Raft) | Raft-based distributed SQL | ~20ms (single region) |
| DynamoDB | Eventual by default, strong option | Per-item; multi-item via Transactions | ~10ms (provisioned) |
| Cassandra | Eventual + LWT | Paxos (LWT) per partition | ~5ms (LOCAL_* modes) |
| etcd | Strong (Raft) | Linearizable reads by default | ~5ms |
Handling Inconsistency at the Application Level
When you choose eventual consistency, your application must handle partial states. This is where things get tricky.
Scenario: User orders a product. Payment succeeds. Inventory reservation succeeds. Shipment creation fails. The saga compensates and refunds the payment.
During the failure window, the user sees “order placed” but shipment does not exist. The saga eventually undoes everything, but the user may have seen inconsistent data briefly.
Mitigation options:
- Show pending states clearly to the user
- Use idempotency keys to safely retry
- Implement compensating transactions that notify the user
- Design sagas to minimize the inconsistency window
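The idempotency-key option deserves a sketch, because it is the cheapest of the four to adopt. The idea: the first call with a given key executes and caches its result; a retry with the same key returns the cached result instead of repeating the side effect (charging a card twice, say). The decorator and in-memory dict here are illustrative assumptions — in production the result cache must be a durable store keyed per client request.

```python
_results = {}  # illustrative; production would use a durable store


def idempotent(fn):
    """Wrap fn so retries with the same idempotency key are no-ops."""
    def wrapper(idempotency_key, *args, **kwargs):
        if idempotency_key in _results:
            return _results[idempotency_key]  # safe retry: no second side effect
        result = fn(*args, **kwargs)
        _results[idempotency_key] = result
        return result
    return wrapper


charges = []

@idempotent
def charge(amount):
    charges.append(amount)  # the side effect we must not repeat
    return f"charged {amount}"


charge("order-42", 100)
charge("order-42", 100)  # client retried after a timeout: no double charge
```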
When to Use / When Not to Use Strong Consistency
| Criteria | Strong Consistency (ACID) | Eventual Consistency (BASE) |
|---|---|---|
| Latency | Higher (waits for confirmations) | Lower (returns immediately) |
| Availability | Lower (may refuse requests during partitions) | Higher (always serves requests) |
| Isolation | Full serializable isolation | No isolation (dirty reads possible) |
| Complexity | Simple transactions | Complex conflict resolution |
| Rollback | Automatic (handled by the database) | Requires compensation logic |
| Use Case | Financial, inventory, compliance | Analytics, social features, caching |
| Data Returned | Always latest | May be stale |
When to Use Strong Consistency
Use strong consistency when:
- Financial transactions where double-charging is unacceptable
- Inventory systems where overselling has direct business impact
- Operations with regulatory compliance requirements
- Any case where stale reads cause business damage
- Systems where users expect immediate visibility of their actions
When to Use Eventual Consistency
Avoid strong consistency when:
- High-volume, low-value operations where blocking is too expensive
- Systems that must stay available during network partitions
- User-generated content (likes, comments) where occasional duplication is acceptable
- Analytics and reporting where slightly stale data is fine
- Operations where eventual consistency is explicitly acceptable to the business
Decision Tree: Which Consistency Model Should I Choose?
flowchart TD
Start{What are you building?}
Start --> Financial{Financial transaction?}
Financial -->|Yes| Strong1[Use Strong Consistency<br/>ACID transactions or 2PC<br/>Accept higher latency]
Financial -->|No| Inventory{Inventory management?}
Inventory -->|Yes| Strong2[Use Strong Consistency<br/>or Saga with careful design<br/>Prevent overselling]
Inventory -->|No| UserContent{User-generated content?}
UserContent -->|Yes| Eventual1[Use Eventual Consistency<br/>Likes, comments, posts<br/>Accept duplicates]
UserContent -->|No| Analytics{Analytics or reporting?}
Analytics -->|Yes| Eventual2[Use Eventual Consistency<br/>Bounded staleness OK<br/>Focus on throughput]
Analytics -->|No| Partition{Need availability<br/>during partitions?}
Partition -->|Yes| Eventual3[Use Eventual Consistency<br/>BASE model<br/>Saga for transactions]
Partition -->|No| Latency{Critical latency<br/>requirements?}
Latency -->|Yes| Eventual4[Use Eventual Consistency<br/>with quorum reads<br/>Tune R/W values]
Latency -->|No| Regulatory{Regulatory compliance<br/>requirements?}
Regulatory -->|Yes| Strong3[Use Strong Consistency<br/>Audit trails, compliance<br/>Document decisions]
Regulatory -->|No| Default[Default to Eventual<br/>Add Strong only where needed<br/>Profile before optimizing]
Quick Reference by Use Case:
| Use Case | Recommended Model | Reasoning |
|---|---|---|
| Banking/Financial | Strong (ACID/2PC) | Double-charging unacceptable |
| E-commerce inventory | Strong or Saga | Overselling has direct cost |
| Social media likes | Eventual | Duplicates tolerable, speed matters |
| Product catalog | Eventual | Stale reads acceptable |
| Order processing | Saga | Cross-service, compensating transactions |
| Analytics/Dashboard | Eventual | Stale data fine, throughput critical |
| User profiles | Session/Read-your-writes | User expects to see own changes |
| Stock trading | Strong | Regulatory, exact values required |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Coordinator crashes in 2PC prepare phase | Participants hold locks indefinitely | Use Paxos-based consensus for coordinator; implement timeout-based lock release |
| Network partition during commit | Some participants commit, others rollback; inconsistent state | Design for partition tolerance; use saga for compensation |
| Replica lag in synchronous replication | Read latency increases; writes may timeout | Use semi-synchronous replication; monitor replica lag; adjust consistency level |
| Read repair produces wrong result | Stale data returned under high contention | Use version vectors; implement conflict resolution; use quorum reads |
| Eventual consistency window too long | Users see stale data for unacceptable period | Monitor convergence time; adjust replication factor; use read repair |
| Deadlock in distributed transaction | Transactions timeout; partial state possible | Implement deadlock detection; set appropriate timeout values |
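The timeout-based lock release mitigation from the first row can be sketched as follows. A prepared participant that hears nothing from the coordinator within a deadline queries its peers instead of blocking forever; if any peer already learned the outcome, that decision is final and can be followed. This peer-query step is a simplified illustration of cooperative termination, not a complete recovery protocol — note the residual "blocked" case, which is exactly why 2PC is called blocking.

```python
import time


class PreparedParticipant:
    """Participant stuck in the prepared state, waiting on the coordinator."""

    def __init__(self, deadline_s):
        self.prepared_at = time.monotonic()
        self.deadline_s = deadline_s

    def resolve(self, peer_outcomes):
        # Before the deadline: keep waiting for the coordinator's decision
        if time.monotonic() - self.prepared_at < self.deadline_s:
            return "waiting"
        # After the deadline: if any peer already committed or aborted,
        # that outcome is final, so follow it
        if "committed" in peer_outcomes:
            return "commit"
        if "aborted" in peer_outcomes:
            return "rollback"
        # Every peer is also merely prepared: the outcome is unknowable
        # until the coordinator recovers -- 2PC's blocking case
        return "blocked"
```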
Consistency Failure Flow
graph TD
Start[Read Request] --> Quorum{Quorum Reached?}
Quorum -->|Yes| Latest{All Replicas Current?}
Latest -->|Yes| ReturnLatest[Return Latest Value]
Latest -->|No| Repair[Trigger Read Repair]
Repair --> ReturnLatest
Quorum -->|No| StaleOK{Can Return Stale?}
StaleOK -->|Yes| ReturnStale[Return Stale Value<br/>Mark for Repair]
StaleOK -->|No| Error[Return Error]
2PC Failure Flow
graph TD
Start[Start 2PC] --> Prepare[Coordinator sends<br/>Prepare to all participants]
Prepare --> Wait{All Participants<br/>Prepare OK?}
Wait -->|Yes| Commit[Coordinator sends<br/>Commit to all]
Wait -->|No| Rollback[Coordinator sends<br/>Rollback to all]
Commit --> CoordCrash{Coordinator<br/>Crashes?}
CoordCrash -->|Yes| ParticipantsWait[Participants wait<br/>indefinitely]
ParticipantsWait --> Recovery[Recovery protocol:<br/>Timeout + Query participants]
Recovery --> Committed[Determine outcome:<br/>Commit or Rollback]
Observability Checklist
Metrics
- Transaction commit rate and latency
- Replication lag (for replica-based systems)
- 2PC prepare and commit phase duration
- Transaction abort rate by reason
- Distributed lock wait time
- Consistency level effectiveness (staleness detected)
- Saga compensation execution count
Logs
- Log transaction start, participants, and outcome
- Include consistency level used per operation
- Log replica state changes
- Log deadlock detection events
- Include transaction correlation IDs for distributed tracing
Alerts
- Alert when transaction latency exceeds threshold
- Alert when replica lag exceeds acceptable window
- Alert when transaction abort rate spikes
- Alert when deadlock detection occurs frequently
- Alert when consistency violations detected
OpenTelemetry Integration
Distributed transactions span multiple services, making traditional request-level tracing insufficient. OpenTelemetry provides the context propagation and span linking you need to trace a transaction across service boundaries.
Trace context for distributed transactions:
from opentelemetry import trace
from opentelemetry.propagate import extract
from opentelemetry.trace import SpanKind, Link

tracer = trace.get_tracer(__name__)

def start_distributed_transaction(txn_name, txn_id, carrier=None):
    """Start a traced distributed transaction with proper context."""
    # Extract incoming context (if initiated by an upstream service,
    # the carrier is the dict of propagation headers it sent)
    if carrier:
        ctx = extract(carrier)
        span = tracer.start_span(txn_name, context=ctx, kind=SpanKind.PRODUCER)
    else:
        span = tracer.start_span(txn_name, kind=SpanKind.PRODUCER)
    span.set_attribute("txn.id", txn_id)
    span.set_attribute("txn.type", "distributed")
    return span

def record_participant(parent_span, service_name, consistency_level, success):
    """Record a participant's involvement, linked to the transaction span."""
    participant_span = tracer.start_span(
        f"participant.{service_name}",
        kind=SpanKind.CLIENT,
        links=[Link(parent_span.get_span_context())],
    )
    participant_span.set_attribute("participant.service", service_name)
    participant_span.set_attribute("consistency.level", consistency_level)
    participant_span.set_attribute("success", success)
    participant_span.end()
Key OpenTelemetry patterns for distributed transactions:
- Link parent spans to child operations: When a coordinator starts a sub-operation on a participant, link the child span to the parent’s span context. This creates a visible trace tree.
- Record consistency level on spans: Tag every span with the consistency level used. This lets you filter traces by consistency level and correlate latency with consistency choices.
- Track saga compensation separately: Compensation operations should be their own spans, linked back to the original step span. Mark them as SpanKind.INTERNAL so they don’t look like new user-facing requests.
# Example: tracing a saga step and its compensation with linking
from opentelemetry.trace import Status, StatusCode

def traced_saga_step(saga_id, step_name, operation, rollback_operation=None):
    with tracer.start_as_current_span(f"saga.step.{step_name}") as step_span:
        step_span.set_attribute("saga.id", saga_id)
        step_span.set_attribute("step.name", step_name)
        step_span.set_attribute("saga.is_compensation", False)
        try:
            result = operation()
            step_span.set_status(Status(StatusCode.OK))
            return result
        except Exception as e:
            step_span.record_exception(e)
            step_span.set_status(Status(StatusCode.ERROR))
            # If compensation exists, trace it as its own internal span
            if rollback_operation:
                with tracer.start_as_current_span(
                    f"saga.compensate.{step_name}",
                    kind=SpanKind.INTERNAL,
                ) as comp_span:
                    comp_span.set_attribute("saga.is_compensation", True)
                    comp_span.set_attribute("saga.original_step", step_name)
                    comp_span.set_attribute("compensation.triggered_by", str(e))
                    # Record the connection back to the failed step
                    comp_span.add_event("linked_to_original_step")
                    try:
                        rollback_operation()
                        comp_span.set_status(Status(StatusCode.OK))
                    except Exception as comp_error:
                        comp_span.record_exception(comp_error)
                        comp_span.set_status(Status(StatusCode.ERROR))
                        raise
            raise
Metrics to export alongside traces:
- distributed_txn.duration (histogram): End-to-end transaction latency
- distributed_txn.participant_count (gauge): Number of participants per transaction
- distributed_txn.consistency_level (label): Which consistency level was used
- distributed_txn.abort_reason (label): Why aborted transactions failed
- saga.compensation.duration (histogram): How long compensations take
- replication.lag (gauge): Current replica lag in milliseconds
Security Checklist
- Authenticate all distributed transaction participants
- Authorize which participants can join which transactions
- Encrypt transaction data in transit between participants
- Audit log all distributed transaction completions and failures
- Validate transaction inputs to prevent injection attacks
- Do not expose internal transaction IDs in error messages
Common Pitfalls / Anti-Patterns
Defaulting to strong consistency everywhere: Strong consistency is expensive. Not every operation needs it. Profile your access patterns and choose consistency levels appropriately.
Ignoring the CAP tradeoff: You cannot have both strong consistency and availability during partitions. Decide what your system needs and design accordingly.
Using 2PC without understanding blocking: The coordinator blocking problem is real. If you use 2PC, understand the failure modes and have mitigation plans.
Treating eventual consistency as failure: Eventual consistency is not a bug; it is a feature for scalability. Design your application to handle eventual consistency gracefully.
Not monitoring replication lag: If you use replica reads for scalability but do not monitor lag, you may serve stale data without knowing it.
Ignoring conflict resolution: With eventual consistency and concurrent writes, conflicts happen. Have a clear conflict resolution strategy (last-write-wins, application-level merge, etc.).
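The two strategies in that last pitfall behave very differently, which a short sketch makes concrete. Last-write-wins picks the higher timestamp and silently drops the losing write; an application-level merge can keep both sides when the field is naturally mergeable (a set-valued shopping cart, say). The record shapes and field names here are illustrative assumptions.

```python
def lww(a, b):
    # Each write is a (value, timestamp) pair: higher timestamp wins,
    # the concurrent losing write is discarded without a trace
    return a if a[1] >= b[1] else b


def merge_carts(a, b):
    # Application-level merge: keep items from both concurrent writes
    return sorted(set(a) | set(b))


assert lww(("blue", 10), ("green", 12)) == ("green", 12)
assert merge_carts(["book", "pen"], ["pen", "mug"]) == ["book", "mug", "pen"]
```

LWW is simple but lossy; merging preserves intent but only works when your domain defines what "merge" means.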
Quick Recap
graph TD
A[Consistency Models] --> B[Strong ACID]
A --> C[Eventual BASE]
B --> D[2PC, Synchronous Replication]
C --> E[Saga, Async Replication]
D --> F[High Latency, Lower Availability]
E --> G[Low Latency, Higher Availability]
Key Points
- ACID prioritizes consistency; BASE prioritizes availability
- CAP theorem: you cannot have consistency, availability, and partition tolerance simultaneously
- Choose consistency level per operation based on business requirements
- Saga pattern handles distributed transactions without 2PC blocking
- Eventual consistency requires application-level handling for conflicts
Production Checklist
# Distributed Transactions Production Readiness
- [ ] Consistency level chosen per operation based on requirements
- [ ] Saga pattern implemented for cross-service business transactions
- [ ] Replication lag monitored and alerted
- [ ] Transaction latency SLAs defined and monitored
- [ ] Conflict resolution strategy documented and implemented
- [ ] Idempotency implemented for all distributed operations
- [ ] CAP tradeoff understood and documented for each service
- [ ] Circuit breakers implemented to prevent cascade failures
- [ ] Correlation IDs in all distributed transaction logs
Use Cases: Strong Consistency
Strong consistency matters for financial transactions where double-charging is unacceptable. Inventory systems where overselling is unacceptable. Operations with regulatory compliance requirements. Any case where stale reads cause business damage.
For these cases, use 2PC or synchronous replication. Accept the latency and availability trade-offs.
Use Cases: Eventual Consistency
Eventual consistency makes sense for user-generated content (likes, comments) where occasional duplication is acceptable. Analytics and reporting where slightly stale data is fine. High-volume, low-value operations where blocking is too expensive. Systems that must stay available during network partitions.
For more on read and write patterns, see Database Replication.
Consistency and Microservices
In microservices, each service owns its data. Cross-service consistency is the hardest problem. You cannot wrap a transaction around service A’s database and service B’s database.
Options:
- Accept eventual consistency: Use saga or choreography. Handle partial failures explicitly.
- Shared database: Both services use the same database. This creates coupling.
- 2PC: Atomic transaction across databases. Blocked during commit. Rarely used.
- Event sourcing: Store events, not state. Reconstruct state by replaying.
Most microservice architectures choose option 1: eventual consistency with saga or orchestration.
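Option 4 is worth a tiny sketch, since "reconstruct state by replaying" sounds more exotic than it is: current state is just a left fold of the event log over an initial state. The event names and account domain here are illustrative assumptions.

```python
def apply(balance, event):
    """Fold one immutable event into the running state."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    return balance  # unknown events are ignored


def replay(events):
    # State is never stored directly; it is derived from the log
    balance = 0
    for event in events:
        balance = apply(balance, event)
    return balance


events = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]
# replay(events) reconstructs the balance: 100 - 30 + 5 = 75
```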
For related patterns, see Saga Pattern, Event-Driven Architecture, and Message Queue Types.
Conclusion
Distributed transactions do not have a one-size-fits-all answer. ACID gives you consistency but sacrifices availability and latency. BASE gives you availability but introduces inconsistency windows.
In practice, most systems use a mix. Core business operations get strong consistency. Peripheral operations get eventual consistency. Saga patterns handle cross-service business transactions without 2PC’s blocking problems.
The right choice depends on your domain, your tolerance for inconsistency, and what your users expect. Do not default to strong consistency everywhere. It is expensive. Do not default to eventual consistency everywhere. Your users will notice bugs.