TCC: Try-Confirm-Cancel Pattern for Distributed Transactions
Learn the Try-Confirm-Cancel pattern for distributed transactions. Explore how TCC differs from 2PC and saga, with implementation examples and real-world use cases.
Two-phase commit blocks. Basic saga compensation gets messy when steps have complex dependencies. TCC (Try-Confirm-Cancel) offers a middle ground: more structure than plain saga, less blocking than 2PC.
The pattern splits every operation into three phases. Try reserves. Confirm commits. Cancel releases. Each service implements these three operations, and a coordinator orchestrates the flow.
Let me walk through how TCC actually works, where it fits, and the trade-offs that matter.
How TCC Works
The coordinator runs the Try phase against every participant first. Only if every Try succeeds does it send Confirm to each participant, which makes the reservations permanent. The happy path looks like this:
sequenceDiagram
Coordinator->>ServiceA: Try(Reserve 5 units)
ServiceA->>Coordinator: TryConfirmed
Coordinator->>ServiceB: Try(Reserve $100)
ServiceB->>Coordinator: TryConfirmed
Coordinator->>ServiceA: Confirm
ServiceA->>Coordinator: Confirmed
Coordinator->>ServiceB: Confirm
ServiceB->>Coordinator: Confirmed
When Try fails for any participant:
sequenceDiagram
Coordinator->>ServiceA: Try(Reserve 5 units)
ServiceA->>Coordinator: TryConfirmed
Coordinator->>ServiceB: Try(Reserve $100)
ServiceB->>Coordinator: TryFailed
Coordinator->>ServiceA: Cancel
ServiceA->>Coordinator: Cancelled
Idempotency matters at every phase. Services must handle duplicate Try calls gracefully. Confirm and Cancel also need to be idempotent, since the coordinator may retry if calls fail or get lost.
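As a rough sketch of what handling duplicate Try calls gracefully can look like, the participant below keys each hold by transaction ID, so a retried Try reports the existing hold instead of reserving twice. The names `SeatInventory` and `try_reserve` and the in-memory dictionary are assumptions for illustration; a real participant would persist this state.

```python
class SeatInventory:
    """Hypothetical in-memory participant; a real one would use a database."""

    def __init__(self, seats_available):
        self.seats_available = seats_available
        self.holds = {}  # transaction_id -> number of seats held

    def try_reserve(self, transaction_id, seats):
        # Duplicate Try for the same transaction: report the existing hold.
        if transaction_id in self.holds:
            return "TryConfirmed"
        if seats > self.seats_available:
            return "TryFailed"
        self.seats_available -= seats
        self.holds[transaction_id] = seats
        return "TryConfirmed"


inventory = SeatInventory(seats_available=5)
first = inventory.try_reserve("txn-1", 2)
retry = inventory.try_reserve("txn-1", 2)  # coordinator retried after a lost reply
```

Both calls report success, but only two seats are deducted.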
TCC vs Two-Phase Commit
TCC and 2PC both separate a tentative phase from a final decision, but they work differently. In 2PC, participants lock resources during Prepare and hold those locks until Commit or Rollback. This blocking is the price of atomicity. In TCC, Try reserves but does not lock. Other operations can proceed against the same data, aware of pending reservations but not blocked by them.
The difference shows up under contention. Two competing transactions trying to reserve the same inventory: 2PC locks the rows and makes one wait or fail. TCC shows the second transaction a pending reservation and lets it decide what to do, whether that means queuing or picking an alternative.
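To make the contention behavior concrete, here is a minimal sketch of a stock record that tracks pending reservations separately from stock on hand. The names (`StockRecord`, `pending`, `try_reserve`) are illustrative assumptions, not part of any framework. The second transaction is not blocked: it gets an immediate answer that includes how much is tied up in pending reservations.

```python
from dataclasses import dataclass, field


@dataclass
class StockRecord:
    on_hand: int
    pending: dict = field(default_factory=dict)  # transaction_id -> quantity

    def available(self):
        # Pending reservations reduce availability without locking the record.
        return self.on_hand - sum(self.pending.values())

    def try_reserve(self, transaction_id, quantity):
        if quantity > self.available():
            # No waiting: the caller sees the pending amount and chooses
            # whether to queue, retry later, or pick an alternative.
            return {"ok": False, "pending": sum(self.pending.values())}
        self.pending[transaction_id] = quantity
        return {"ok": True, "pending": sum(self.pending.values())}


stock = StockRecord(on_hand=5)
first = stock.try_reserve("txn-A", 4)
second = stock.try_reserve("txn-B", 3)  # competes for the same units
```

What the second caller does with the rejection (queue, retry, or pick another item) is a business decision that this sketch leaves open.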
2PC also assumes participants share a transaction manager. TCC works across heterogeneous systems because each service implements its own Try/Confirm/Cancel logic. Payment service, inventory service, shipping service, all different stacks, all coordinatable.
For a deeper look at 2PC and its limitations, see Two-Phase Commit.
TCC vs Basic Saga
Saga and TCC both avoid blocking, but failure handling differs. In a basic saga, each step has a compensation that undoes what the step did. If Step 3 fails, run compensation for Step 2, then Step 1. The compensation logic must know how to reverse each step’s effects.
TCC takes a different angle. Confirm and Cancel are explicit and symmetrical. Confirm commits the tentative reservation. Cancel releases it. The complexity shifts from writing reverse logic to implementing a reservation system that tracks pending operations.
TCC fits naturally when you can model operations as reservations. Hotel booking, inventory allocation, credit holds, seat reservations. These have a clear notion of “tentatively take this” and “make it official or release it.”
Saga fits better when operations are transformations rather than reservations. Moving money from account A to account B, transforming an order into an invoice. These lack natural reservation semantics and saga works fine there.
For more on saga, see Saga Pattern.
TCC vs 2PC vs Saga Comparison
| Aspect | 2PC | Saga | TCC |
|---|---|---|---|
| Blocking | Yes - participants block during commit | No - no blocking | No - no blocking |
| Locking | Locks resources during prepare | No locks | Reservations, not locks |
| Atomicity | True atomic commit | Eventual atomicity | Eventual atomicity |
| Isolation | Depends on participants' local locking (often strong) | No isolation | No isolation |
| Coordination | Centralized coordinator | Centralized or choreographed | Centralized coordinator |
| Heterogeneous Systems | Requires shared TM | Yes | Yes |
| Compensation Model | Automatic rollback | Explicit compensations | Explicit Confirm/Cancel |
| Failure Handling | Blocking on coordinator crash | Compensations in reverse | Confirm/Cancel with retries |
| Latency | Two round trips minimum | Per-step latency | Two round trips minimum |
| Use Case Fit | Strong consistency required | Transformation operations | Reservation operations |
| Recovery Complexity | High (blocking states) | Medium (compensation chain) | Medium (tentative state cleanup) |
| Implementation Complexity | Medium (DB-supported) | Low-Medium | Medium-High (reservation design) |
Implementing TCC in Practice
TCC requires a coordinator and participant implementations. Many frameworks handle the coordinator role. You focus on implementing Try, Confirm, and Cancel methods on your services.
A Flight Booking Example
Consider a flight booking system that coordinates an airline reservation, a hotel booking, and a car rental. All three must succeed or all three must be cancelled.
class FlightBooking:
    def try_reserve(self, flight_id, passenger_id, seats):
        # Check availability and tentatively hold seats
        reservation = Reservation(
            flight_id=flight_id,
            passenger_id=passenger_id,
            seats=seats,
            status="TENTATIVE"
        )
        self.reservations.save(reservation)
        return "TryConfirmed"

    def confirm(self, flight_id, passenger_id):
        # Make the tentative reservation permanent
        reservation = self.reservations.find(flight_id, passenger_id)
        reservation.status = "CONFIRMED"
        self.reservations.save(reservation)
        return "Confirmed"

    def cancel(self, flight_id, passenger_id):
        # Release the tentative hold
        reservation = self.reservations.find(flight_id, passenger_id)
        reservation.status = "CANCELLED"
        self.reservations.save(reservation)
        return "Cancelled"
The coordinator orchestrates the three-phase flow:
class BookingCoordinator:
    def __init__(self, flight, hotel, car):
        self.flight = flight
        self.hotel = hotel
        self.car = car

    def book_trip(self, flight_id, hotel_id, car_id, passenger):
        # Try phase
        results = []
        results.append(self.flight.try_reserve(flight_id, passenger, 1))
        results.append(self.hotel.try_reserve(hotel_id, passenger, 1))
        results.append(self.car.try_reserve(car_id, passenger, 1))

        if all(r == "TryConfirmed" for r in results):
            # Confirm phase
            self.flight.confirm(flight_id, passenger)
            self.hotel.confirm(hotel_id, passenger)
            self.car.confirm(car_id, passenger)
        else:
            # Cancel phase
            self.flight.cancel(flight_id, passenger)
            self.hotel.cancel(hotel_id, passenger)
            self.car.cancel(car_id, passenger)
This example omits error handling, timeouts, and duplicate detection. A production implementation needs retry logic, idempotency keys, and timeout handlers for when participants fail to respond.
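One way to close some of those gaps is sketched below: the coordinator stops at the first failed Try, treats exceptions as failures, and cancels only the participants it actually attempted. `run_tcc` and `FakeParticipant` are hypothetical names for this illustration; retries, persistence, and timeouts are still left out.

```python
def run_tcc(transaction_id, participants):
    """Run Try on each participant, then Confirm all or Cancel those attempted."""
    attempted = []
    try_ok = True
    for participant in participants:
        attempted.append(participant)
        try:
            if not participant.try_reserve(transaction_id):
                try_ok = False
                break
        except Exception:
            # A crash or timeout during Try is treated like a failed Try.
            try_ok = False
            break

    if try_ok:
        for participant in participants:
            participant.confirm(transaction_id)
        return "CONFIRMED"

    # Cancel in reverse order. The participant whose Try failed is included,
    # because a timed-out Try may still have reserved something.
    for participant in reversed(attempted):
        participant.cancel(transaction_id)
    return "CANCELLED"


class FakeParticipant:
    """Test double that records the calls it receives."""

    def __init__(self, name, try_succeeds=True):
        self.name = name
        self.try_succeeds = try_succeeds
        self.calls = []

    def try_reserve(self, transaction_id):
        self.calls.append("try")
        return self.try_succeeds

    def confirm(self, transaction_id):
        self.calls.append("confirm")

    def cancel(self, transaction_id):
        self.calls.append("cancel")


flight = FakeParticipant("flight")
hotel = FakeParticipant("hotel", try_succeeds=False)
car = FakeParticipant("car")
outcome = run_tcc("txn-1", [flight, hotel, car])
```

Because the failing participant also receives Cancel, its Cancel handler has to tolerate the case where there is nothing to release.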
Handling Failures and Timeouts
TCC assumes participants will eventually respond to Try, Confirm, or Cancel calls. When a participant becomes unresponsive, the coordinator must decide what to do. This is where TCC implementations diverge.
Some frameworks use guaranteed delivery. They store the intended action in a log and retry until the participant acknowledges. Others use a maximum retry count and then flag the transaction as requiring manual intervention.
The tricky case is when Try succeeded but Confirm failed. The participant reserved the resource but never received the confirmation. From the participant’s perspective, it has a tentative reservation waiting to be confirmed or cancelled. The coordinator’s retry of Confirm should eventually clear this state. But if the coordinator crashed entirely, you need a recovery process that queries participants about their pending states.
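A recovery process of this kind can be sketched as follows, assuming the coordinator durably records its decision before it starts the Confirm or Cancel phase. The log format, `recover`, and `list_pending` are hypothetical. The rule shown (re-send the logged decision, cancel when no decision was logged) is one common choice, not the only one.

```python
def recover(decision_log, participants):
    """Resolve tentative reservations left behind by a coordinator crash.

    decision_log maps transaction_id -> "CONFIRM" or "CANCEL" for every
    transaction whose decision was durably recorded before the crash.
    """
    actions = []
    for participant in participants:
        for transaction_id in participant.list_pending():
            # No logged decision means no Confirm was ever sent (assumption:
            # decisions are logged first), so cancelling is safe.
            decision = decision_log.get(transaction_id, "CANCEL")
            if decision == "CONFIRM":
                participant.confirm(transaction_id)
            else:
                participant.cancel(transaction_id)
            actions.append((participant.name, transaction_id, decision))
    return actions


class PendingParticipant:
    """Test double holding a set of tentative transaction IDs."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = set(pending)
        self.confirmed = set()

    def list_pending(self):
        return sorted(self.pending)  # a copy, safe to iterate while mutating

    def confirm(self, transaction_id):
        self.pending.discard(transaction_id)
        self.confirmed.add(transaction_id)

    def cancel(self, transaction_id):
        self.pending.discard(transaction_id)


flight = PendingParticipant("flight", ["txn-1", "txn-2"])
actions = recover({"txn-1": "CONFIRM"}, [flight])
```

Here `txn-1` had a logged Confirm decision and is confirmed again, while `txn-2` had none and is cancelled.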
graph TD
A[Coordinator calls Confirm] --> B{Participant reachable?}
B -->|Yes| C[Confirm succeeds]
B -->|No| D[Store in retry queue]
D --> E[Retry with backoff]
E --> F{Participant responds?}
F -->|Yes| C
F -->|No| G[Max retries exceeded]
G --> H[Flag for manual review]
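The retry loop in the diagram can be sketched roughly like this. The function name, the `MANUAL_REVIEW` marker, the retry limit, and the delay values are assumptions chosen for illustration; `sleep` is injectable so the example runs without real waits.

```python
import time


def confirm_with_backoff(send_confirm, max_retries=4, base_delay=0.5, sleep=time.sleep):
    """Call send_confirm until it succeeds or retries are exhausted.

    Returns "CONFIRMED" on success, or "MANUAL_REVIEW" once the initial
    attempt and max_retries retries have all failed.
    """
    for attempt in range(max_retries + 1):
        try:
            send_confirm()
            return "CONFIRMED"
        except ConnectionError:
            if attempt == max_retries:
                break
            # Exponential backoff: 0.5s, 1s, 2s, 4s, ...
            sleep(base_delay * (2 ** attempt))
    return "MANUAL_REVIEW"


# Usage: a participant that is unreachable twice, then recovers.
failures_left = [2]
delays = []


def flaky_confirm():
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise ConnectionError("participant unreachable")


result = confirm_with_backoff(flaky_confirm, sleep=delays.append)
```

A production version would usually persist the pending Confirm before retrying, so that the retry survives a coordinator restart.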
Complete TCC Flow Diagram
flowchart TD
Start[TCC Transaction] --> TryPhase[Coordinator sends<br/>Try to all participants]
TryPhase --> TryResults{All Try succeed?}
TryResults -->|No| CancelPhase[Coordinator sends<br/>Cancel to all participants]
CancelPhase --> CancelDone[Resources released<br/>Transaction aborted]
TryResults -->|Yes| ConfirmPhase[Coordinator sends<br/>Confirm to all participants]
ConfirmPhase --> ConfirmResults{All Confirm succeed?}
ConfirmResults -->|No| ConfirmRetry[Retry with backoff]
ConfirmRetry --> ConfirmResults
ConfirmResults -->|Yes| Success[Transaction committed<br/>All reservations finalized]
TryPhase --> Timeout{Participant times out?}
Timeout -->|Yes| CancelPhase
Timeout -->|No| TryResults
Three Main Scenarios:
| Scenario | Trigger | Coordinator Action | Outcome |
|---|---|---|---|
| Try -> Confirm success | All participants respond TryConfirmed | Send Confirm to all | All reservations become permanent |
| Try -> Cancel | Any participant responds TryFailed | Send Cancel to all | All tentative reservations released |
| Try timeout -> Cancel | Participant times out on Try | Send Cancel to all | All tentative reservations released |
State Transitions for a Single Participant:
stateDiagram-v2
[*] --> Idle: Transaction starts
Idle --> Tentative: Try succeeds
Tentative --> Confirmed: Confirm received
Tentative --> Cancelled: Cancel received
Tentative --> Tentative: Try timeout, waiting for Cancel
Confirmed --> [*]
Cancelled --> [*]
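The participant state diagram maps naturally onto a small transition table. The sketch below is one possible encoding; the `State` enum, the event names, and `next_state` are assumptions for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    TENTATIVE = "TENTATIVE"
    CONFIRMED = "CONFIRMED"
    CANCELLED = "CANCELLED"


# Allowed transitions, mirroring the state diagram.
TRANSITIONS = {
    (State.IDLE, "try"): State.TENTATIVE,
    (State.TENTATIVE, "confirm"): State.CONFIRMED,
    (State.TENTATIVE, "cancel"): State.CANCELLED,
}


def next_state(state, event):
    """Return the next state, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} from {state.name}") from None


state = next_state(State.IDLE, "try")
state = next_state(state, "confirm")
```

This strict version rejects anything not in the diagram. A real participant would typically treat a repeated Confirm or Cancel as a no-op rather than an error.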
TCC Frameworks
Building TCC from scratch means managing coordinator state, retry logic, timeout handling, and recovery — all nontrivial. Several frameworks handle the heavy lifting.
Apache TCM (Transaction Coordinator Manager)
Apache TCM is the reference implementation for J2EE-style TCC. It integrates with application servers and provides declarative transaction boundaries. Best for Java/J2EE shops already invested in that ecosystem.
Narayana (JBossTS)
Narayana is an open-source transaction manager supporting LRC (Last Resource Commit) optimization, 2PC, and TCC. It provides both programmatic and declarative (annotation-based) approaches. Works well with Spring via Narayana’s Spring integration.
@Compensable(compensationMethod = "cancelReservation")
public void tryReserveSeats(ReservationRequest request) {
    // Tentatively reserve seats
    reservationService.createTentativeReservation(request);
}

public void cancelReservation(ReservationRequest request) {
    // Release the tentative hold
    reservationService.cancelReservation(request.getReservationId());
}

public void confirmReservation(ReservationRequest request) {
    // Finalize the reservation
    reservationService.confirmReservation(request.getReservationId());
}
ByteTCC
ByteTCC is a TCC implementation for Spring applications. It uses annotations to define Try/Confirm/Cancel methods and handles coordinator logic transparently. Lightweight and Spring-native, good for microservices running in Spring Boot.
@Compensable(confirmMethod = "confirm", cancelMethod = "cancel")
public boolean tryReserveInventory(InventoryRequest request) {
    // Try logic: check availability, create tentative hold
    return inventoryService.tentativeHold(request.getItemId(), request.getQuantity());
}

public void confirm(InventoryRequest request) {
    inventoryService.confirmHold(request.getItemId(), request.getQuantity());
}

public void cancel(InventoryRequest request) {
    inventoryService.releaseHold(request.getItemId(), request.getQuantity());
}
Spring TCC (Spring-Cloud-tencent)
Spring TCC is part of the Tencent Spring Cloud stack. It integrates with ServiceComb and provides distributed TCC transaction support for Spring Cloud microservices.
Framework Comparison
| Framework | Language | Coordinator | Spring Integration | Recovery Support | Best For |
|---|---|---|---|---|---|
| Apache TCM | Java | Embedded | Yes (J2EE) | Yes | Enterprise Java apps |
| Narayana | Java/C | Both | Yes | Yes | JBoss/Spring ecosystem |
| ByteTCC | Java | External | Yes (Spring Boot) | Yes | Lightweight Spring microservices |
| Spring TCC | Java | External | Yes | Yes | Tencent/Spring Cloud stack |
For most new projects, ByteTCC or Narayana are the practical choices. ByteTCC is simpler and more Spring Boot-friendly. Narayana has more enterprise features and a longer track record.
Confirm/Cancel Idempotency Implementation
Idempotency is not optional in TCC — it is load-bearing. The coordinator retries Confirm and Cancel calls until it gets a response. Your participant must handle duplicates gracefully.
The Idempotency Problem
Consider this scenario:
- Coordinator calls confirm(reservation_id="abc")
- Participant confirms successfully but the network drops before the response arrives
- Coordinator retries confirm(reservation_id="abc")
- If your confirm handler is not idempotent, you might re-confirm an already-confirmed reservation
Idempotency Key Design
Use a dedicated idempotency key for each Try/Confirm/Cancel call. The key should be deterministic — the same operation always gets the same key.
import hashlib
from datetime import datetime, timezone


def make_idempotency_key(transaction_id, participant_id, phase):
    """Generate a deterministic idempotency key.

    Same transaction + participant + phase always produces same key.
    """
    raw = f"{transaction_id}:{participant_id}:{phase}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


class TccParticipant:
    def confirm(self, transaction_id, participant_id, reservation_data):
        key = make_idempotency_key(transaction_id, participant_id, "confirm")

        # Idempotency check
        existing = self.confirm_log.find_by_idempotency_key(key)
        if existing:
            # Already confirmed: return success without re-confirming
            return ConfirmResult(
                success=True,
                already_confirmed=True,
                confirmed_at=existing.confirmed_at
            )

        # Actual confirmation logic
        reservation = self.reservations.find(reservation_data.id)
        reservation.status = "CONFIRMED"
        self.reservations.save(reservation)

        # Record this confirmation for future idempotency
        self.confirm_log.save(IdempotencyRecord(
            key=key,
            transaction_id=transaction_id,
            confirmed_at=datetime.now(timezone.utc)
        ))
        return ConfirmResult(success=True, already_confirmed=False)

    def cancel(self, transaction_id, participant_id, reservation_data):
        key = make_idempotency_key(transaction_id, participant_id, "cancel")

        existing = self.cancel_log.find_by_idempotency_key(key)
        if existing:
            # Already cancelled: return success without releasing again
            return CancelResult(
                success=True,
                already_cancelled=True,
                cancelled_at=existing.cancelled_at
            )

        reservation = self.reservations.find(reservation_data.id)
        reservation.status = "CANCELLED"
        self.reservations.save(reservation)

        self.cancel_log.save(IdempotencyRecord(
            key=key,
            transaction_id=transaction_id,
            cancelled_at=datetime.now(timezone.utc)
        ))
        return CancelResult(success=True, already_cancelled=False)
Confirm Before Cancel Problem
A subtler problem involves ordering: what if the first Confirm times out at the coordinator, a retry succeeds, and a stale Cancel for the same transaction then arrives? A naive Cancel handler would incorrectly release a reservation that has already been confirmed.
Track state transitions explicitly. Confirm transitions from TENTATIVE to CONFIRMED. Cancel transitions from TENTATIVE to CANCELLED. Once in CONFIRMED, Cancel should be a no-op, not a failure.
def cancel(self, transaction_id, participant_id, reservation_data):
    reservation = self.reservations.find(reservation_data.id)

    if reservation.status == "CONFIRMED":
        # Already confirmed: Cancel is correctly a no-op
        return CancelResult(success=True, reason="already_confirmed")

    if reservation.status == "CANCELLED":
        # Already cancelled: still a no-op
        return CancelResult(success=True, reason="already_cancelled")

    # Actual cancellation from TENTATIVE state
    reservation.status = "CANCELLED"
    self.reservations.save(reservation)
    return CancelResult(success=True)
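The read-then-write in the handler above leaves a window in which a concurrent Confirm could change the status between the check and the save. One possible way to close it, assuming reservations live in a SQL table, is a conditional UPDATE that only matches rows still in TENTATIVE. The table layout and function name below are assumptions for illustration, using Python's built-in `sqlite3` module.

```python
import sqlite3


def cancel_reservation(conn, reservation_id):
    """Cancel only if the reservation is still TENTATIVE; otherwise no-op."""
    with conn:  # commits on success, rolls back on exception
        cursor = conn.execute(
            "UPDATE reservations SET status = 'CANCELLED' "
            "WHERE id = ? AND status = 'TENTATIVE'",
            (reservation_id,),
        )
        if cursor.rowcount == 1:
            return "cancelled"
        row = conn.execute(
            "SELECT status FROM reservations WHERE id = ?", (reservation_id,)
        ).fetchone()
    if row is None:
        return "not_found"
    return "already_" + row[0].lower()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO reservations VALUES ('r1', 'TENTATIVE')")
conn.execute("INSERT INTO reservations VALUES ('r2', 'CONFIRMED')")
conn.commit()

results = [
    cancel_reservation(conn, "r1"),  # TENTATIVE -> CANCELLED
    cancel_reservation(conn, "r1"),  # duplicate Cancel
    cancel_reservation(conn, "r2"),  # Cancel after Confirm
]
```

The state check and the write happen in a single statement, so a confirmed reservation can never be flipped to CANCELLED by a late Cancel.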
Timeout vs Permanent Failure
TCC distinguishes between transient failures (retry might succeed) and permanent failures (never going to succeed). In your Try handler:
- Transient failure: Return a retryable error, coordinator retries
- Permanent failure: Return TryFailed with a reason that means “do not retry, cancel everything”
def try_reserve(self, inventory_id, quantity):
    try:
        # Try logic
        reserved = self.inventory.tentative_hold(inventory_id, quantity)
        return TryResult(success=True, reservation_id=reserved.id)
    except InsufficientInventory:
        # Permanent failure: retrying will not help
        return TryResult(success=False, reason="INSUFFICIENT_INVENTORY")
    except TemporaryDatabaseError:
        # Transient failure: worth retrying
        raise TryRetryableError("Database temporarily unavailable")
    except CapacityExceeded:
        # Permanent failure: no amount of retry will fix this
        return TryResult(success=False, reason="CAPACITY_EXCEEDED")
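On the coordinator side, the same distinction might be handled as in the sketch below: a retryable error triggers another attempt, while a returned failure is final. `try_with_retries`, the tuple-shaped result, and the `RETRIES_EXHAUSTED` reason are assumptions for illustration, and the delay between attempts is omitted for brevity.

```python
class TryRetryableError(Exception):
    """Raised by a participant when a Try failed for a transient reason."""


def try_with_retries(try_call, max_attempts=3):
    """Retry transient failures; return permanent outcomes immediately.

    try_call returns (success, reason). Returns (success, reason, attempts).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            success, reason = try_call()
            return success, reason, attempt
        except TryRetryableError:
            continue
    # Still failing after max_attempts: give up and let the caller cancel.
    return False, "RETRIES_EXHAUSTED", max_attempts


# Usage: one transient database error, then a permanent business failure.
responses = [TryRetryableError("db unavailable"), (False, "INSUFFICIENT_INVENTORY")]


def fake_try():
    response = responses.pop(0)
    if isinstance(response, Exception):
        raise response
    return response


outcome = try_with_retries(fake_try)
```

The transient error costs one extra attempt. The permanent failure is returned as soon as it appears, so the coordinator can move straight to Cancel.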
Advantages of TCC
The main advantage is that resources do not lock during the transaction. Other operations can read or modify the same data, aware of pending reservations but not blocked by them. This makes TCC more scalable than 2PC, particularly under high contention.
The three-phase structure is explicit. Every participant agrees to the contract: if you can reserve in Try, you guarantee you can confirm or cancel later.
TCC also works across heterogeneous systems. No shared transaction manager required. Each service implements its own semantics for Try, Confirm, and Cancel.
Limitations and Challenges
TCC is not a silver bullet.
The biggest challenge is designing Try/Confirm/Cancel for your specific domain. Not all operations map naturally to reservation semantics. Forcing a square peg into a round hole produces brittle implementations.
Idempotency trips people up. Confirm and Cancel must handle duplicate calls gracefully. If the coordinator retries a Confirm that actually succeeded, the participant needs to recognize this and return Confirmed, not try to confirm again.
The timeout case requires care. Try succeeds but the coordinator crashes before sending Confirm or Cancel. Resources sit in a tentative state. Without a resolution mechanism, you get resource leaks that pile up silently.
Latency also increases. Every transaction needs at least two round trips to each participant.
When to Use TCC
TCC fits well when your business logic naturally supports reservation semantics. Inventory allocation, booking systems, credit reservations, seat holds. If you can model the operation as “tentatively take X and later either commit or release,” TCC gives you a clean structure.
TCC gets awkward when operations are transformations rather than reservations. If Step 2 depends on the output of Step 1 in a way that does not fit reservation semantics, you end up stuffing intermediate state into Try and carrying it forward to Confirm. This works but loses the elegance.
For an overview of distributed transaction patterns including TCC, see Distributed Transactions. For reliable message delivery in distributed systems, see the Outbox Pattern.
Observability Checklist
TCC transactions span multiple services and involve multiple round trips. Without observability, you cannot tell whether a failed transaction left dangling tentative reservations.
Metrics
- TCC transaction completion rate (success vs try-fail vs confirm-fail vs cancel)
- Try phase duration and success rate
- Confirm phase duration and retry count
- Cancel phase duration and how often it runs
- Average number of participants per transaction
- Timeout rate per phase (try timeout, confirm timeout)
- Dangling tentative reservation count (reservations stuck in TENTATIVE state)
Logs
- Log Try phase start with transaction ID, participant ID, and reservation data
- Log Try phase outcome (confirmed, failed, timeout)
- Log Confirm/Cancel phase starts and outcomes
- Log retry attempts with attempt number and delay
- Include idempotency key in all phase logs for correlation
- Log participant state transitions (TENTATIVE → CONFIRMED, TENTATIVE → CANCELLED)
Alerts
- Alert when dangling TENTATIVE reservations accumulate (cleanup is failing)
- Alert when confirm retry count exceeds threshold
- Alert when cancel phase runs frequently (indicates try phase instability)
- Alert when participant times out repeatedly on try phase
- Alert when transaction takes longer than expected threshold
Tracing
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


class TccTransaction:
    def execute(self):
        with tracer.start_as_current_span("tcc.transaction") as span:
            span.set_attribute("tcc.transaction_id", self.txn_id)
            span.set_attribute("tcc.participant_count", len(self.participants))

            # Try phase
            try_results = []
            with tracer.start_as_current_span("tcc.try_phase"):
                for participant in self.participants:
                    with tracer.start_as_current_span(f"tcc.try.{participant.name}") as p_span:
                        p_span.set_attribute("participant.name", participant.name)
                        result = participant.try_(self.request)
                        try_results.append(result)
                        # Span attributes must be primitive values, so record the flag
                        p_span.set_attribute("tcc.try_result", result.success)

            if all(r.success for r in try_results):
                # Confirm phase
                with tracer.start_as_current_span("tcc.confirm_phase"):
                    for participant in self.participants:
                        with tracer.start_as_current_span(f"tcc.confirm.{participant.name}") as p_span:
                            result = participant.confirm()
                            p_span.set_attribute("tcc.confirm_result", str(result))
            else:
                # Cancel phase
                with tracer.start_as_current_span("tcc.cancel_phase"):
                    for participant in self.participants:
                        with tracer.start_as_current_span(f"tcc.cancel.{participant.name}") as p_span:
                            result = participant.cancel()
                            p_span.set_attribute("tcc.cancel_result", str(result))
Security Checklist
TCC coordination involves multiple services making state changes. Security misconfigurations can lead to unauthorized reservations or data leakage.
- Authenticate the coordinator-to-participant RPC calls (mutual TLS or JWT tokens)
- Authorize participants — coordinator should only call Confirm/Cancel on registered participants
- Validate reservation data in Try phase — do not trust coordinator-supplied quantities or IDs without validation
- Audit log all state transitions on tentative reservations (created, confirmed, cancelled)
- Encrypt coordinator-to-participant communication in transit
- Do not expose internal transaction IDs in error responses (use correlation IDs instead)
- Rate-limit Try requests per participant to prevent reservation exhaustion attacks
- Set TTL on tentative reservations so abandoned transactions auto-expire
Reservation Exhaustion Attack
A subtle TCC security concern: an attacker triggers many Try operations that succeed but never Confirm or Cancel. If tentative reservations hold inventory, the attacker can exhaust available inventory without paying.
Mitigations:
- TTL on tentative reservations: Auto-cancel after timeout
- Per-entity locking: Lock the reservation entity itself, not just the inventory
- Rate limiting Try: Limit how many Try requests a single client can make
- Verification on Confirm: Check the original request is still valid before confirming
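The TTL mitigation can be sketched as a periodic sweep over tentative reservations. The dictionary-based records, the function name, and the 15-minute default are assumptions for illustration; the TTL should be longer than the coordinator's longest expected Confirm window, otherwise a late Confirm would find its hold already expired.

```python
from datetime import datetime, timedelta, timezone


def expire_stale_holds(reservations, now, ttl=timedelta(minutes=15)):
    """Cancel TENTATIVE reservations older than ttl; return the expired IDs."""
    expired = []
    for reservation in reservations:
        is_stale = now - reservation["created_at"] > ttl
        if reservation["status"] == "TENTATIVE" and is_stale:
            reservation["status"] = "CANCELLED"
            expired.append(reservation["id"])
    return expired


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
reservations = [
    {"id": "r1", "status": "TENTATIVE", "created_at": now - timedelta(minutes=30)},
    {"id": "r2", "status": "TENTATIVE", "created_at": now - timedelta(minutes=5)},
    {"id": "r3", "status": "CONFIRMED", "created_at": now - timedelta(hours=2)},
]
expired = expire_stale_holds(reservations, now)
```

Only the stale tentative hold is released. Recent holds and confirmed reservations are left alone.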
Conclusion
TCC gives you a structured way to coordinate distributed transactions without blocking. The three-phase model makes the contract explicit: reserve, commit, release. When your domain fits the reservation pattern, you get clean separation of concerns and better scalability than 2PC.
The trade-offs are real: you need idempotent operations, timeout handling, and recovery logic for dangling reservations. For high-contention scenarios with natural reservation semantics, TCC is worth the implementation effort. For simpler saga flows or operations that do not fit reservation semantics, basic saga or choreography may be the better choice.
See also Event-Driven Architecture for patterns that complement TCC in microservices ecosystems.