TCC: Try-Confirm-Cancel Pattern for Distributed Transactions

Learn the Try-Confirm-Cancel pattern for distributed transactions. Explore how TCC differs from 2PC and saga, with implementation examples and real-world use cases.

published: March 24, 2026 reading time: 34 min read author: GeekWorkBench updated: June 17, 2026

Quick Summary

TCC (Try-Confirm-Cancel) coordinates distributed transactions across services that cannot share a transaction manager. Each operation splits into three phases: Try reserves resources tentatively, Confirm makes reservations permanent, and Cancel releases them. Unlike 2PC, TCC avoids locking resources during the transaction, so concurrent operations can proceed while remaining aware of pending reservations. The implementation cost is real: you need idempotent Confirm and Cancel operations, timeout handling, and recovery logic for dangling tentative reservations. But if your domain fits reservation semantics, as booking systems and inventory allocation do, TCC delivers a non-blocking alternative to 2PC that scales better under contention.

Introduction

TCC works by splitting every operation into three phases. Try reserves what you need. Confirm makes it permanent. Cancel releases what you reserved. Each service implements these three operations, and a coordinator orchestrates the flow.

sequenceDiagram
    Coordinator->>ServiceA: Try(Reserve 5 units)
    ServiceA->>Coordinator: TryConfirmed
    Coordinator->>ServiceB: Try(Reserve $100)
    ServiceB->>Coordinator: TryConfirmed
    Coordinator->>ServiceA: Confirm
    ServiceA->>Coordinator: Confirmed
    Coordinator->>ServiceB: Confirm
    ServiceB->>Coordinator: Confirmed

When Try fails for any participant:

sequenceDiagram
    Coordinator->>ServiceA: Try(Reserve 5 units)
    ServiceA->>Coordinator: TryConfirmed
    Coordinator->>ServiceB: Try(Reserve $100)
    ServiceB->>Coordinator: TryFailed(Insufficient funds)
    Coordinator->>ServiceA: Cancel
    ServiceA->>Coordinator: Cancelled

Idempotency matters at every phase. Services must handle duplicate Try calls gracefully. Confirm and Cancel also need to be idempotent, since the coordinator may retry if calls fail or get lost.

Topic-Specific Deep Dives

Conceptual Foundations

TCC vs Two-Phase Commit

TCC and 2PC have a similar shape: Try corresponds to Prepare, and Confirm or Cancel corresponds to Commit or Rollback. But they work differently. In 2PC, participants lock resources during Prepare and hold those locks until Commit or Rollback. This blocking is the price of atomicity. In TCC, Try reserves but does not lock. Other operations can proceed against the same data, aware of pending reservations but not blocked by them.

The difference shows up under contention. Two competing transactions trying to reserve the same inventory: 2PC locks the rows and makes one wait or fail. TCC shows the second transaction a pending reservation and lets it decide what to do, whether that means queuing or picking an alternative.
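One way to picture "reserved but not locked" is a stock record that tracks tentative holds separately from units on hand. The sketch below is a minimal in-memory illustration, not any framework's API; `InventoryItem`, `try_hold`, and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryItem:
    """In-memory stock record; all names here are illustrative."""
    on_hand: int
    holds: dict = field(default_factory=dict)  # txn_id -> units tentatively held

    @property
    def available(self) -> int:
        # Pending holds count against availability but block no one.
        return self.on_hand - sum(self.holds.values())

    def try_hold(self, txn_id: str, units: int) -> bool:
        # Try: answer immediately instead of waiting on a lock.
        if txn_id in self.holds:
            return True  # duplicate Try is a no-op
        if units > self.available:
            return False  # caller can queue, retry later, or pick an alternative
        self.holds[txn_id] = units
        return True

    def confirm(self, txn_id: str) -> None:
        # Confirm: the held units leave stock for good.
        self.on_hand -= self.holds.pop(txn_id, 0)

    def cancel(self, txn_id: str) -> None:
        # Cancel: drop the hold so the units become available again.
        self.holds.pop(txn_id, None)


item = InventoryItem(on_hand=5)
first = item.try_hold("txn-1", 4)   # succeeds, 1 unit still unheld
second = item.try_hold("txn-2", 3)  # fails fast: only 1 unit is unheld
item.cancel("txn-1")
third = item.try_hold("txn-2", 3)   # succeeds once the first hold is released
```

The second transaction gets an immediate answer rather than waiting on a lock, which is what lets it decide between queuing and choosing an alternative.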

2PC also assumes participants share a transaction manager. TCC works across heterogeneous systems because each service implements its own Try/Confirm/Cancel logic. Payment service, inventory service, shipping service, all different stacks, all coordinatable.

For a deeper look at 2PC and its limitations, see Two-Phase Commit.

TCC vs Basic Saga

Saga and TCC both avoid blocking, but failure handling differs. In a basic saga, each step has a compensation that undoes what the step did. If Step 3 fails, run compensation for Step 2, then Step 1. The compensation logic must know how to reverse each step’s effects.

TCC takes a different angle. Confirm and Cancel are explicit and symmetrical. Confirm commits the tentative reservation. Cancel releases it. The complexity shifts from writing reverse logic to implementing a reservation system that tracks pending operations.

TCC fits naturally when you can model operations as reservations. Hotel booking, inventory allocation, credit holds, seat reservations. These have a clear notion of “tentatively take this” and “make it official or release it.”

Saga fits better when operations are transformations rather than reservations. Moving money from account A to account B, transforming an order into an invoice. These lack natural reservation semantics and saga works fine there.

For more on saga, see Saga Pattern.

TCC compared to 2PC and saga

Aspect	2PC	Saga	TCC
Blocking	Yes - participants block during commit	No - no blocking	No - no blocking
Locking	Locks resources during prepare	No locks	Reservations, not locks
Atomicity	True atomic commit	Eventual atomicity	Eventual atomicity
Isolation	Full serializable isolation	No isolation	No isolation
Coordination	Centralized coordinator	Centralized or choreographed	Centralized coordinator
Heterogeneous Systems	Requires shared TM	Yes	Yes
Compensation Model	Automatic rollback	Explicit compensations	Explicit Confirm/Cancel
Failure Handling	Blocking on coordinator crash	Compensations in reverse	Confirm/Cancel with retries
Latency	Two round trips minimum	Per-step latency	Two round trips minimum
Use Case Fit	Strong consistency required	Transformation operations	Reservation operations
Recovery Complexity	High (blocking states)	Medium (compensation chain)	Medium (tentative state cleanup)
Implementation Complexity	Medium (DB-supported)	Low-Medium	Medium-High (reservation design)

Implementing TCC in Practice

TCC requires a coordinator and participant implementations. Many frameworks handle the coordinator role. You focus on implementing Try, Confirm, and Cancel methods on your services.

A Flight Booking Example

Consider a flight booking system that coordinates an airline reservation, a hotel booking, and a car rental. All three must succeed or all three must be cancelled.

class FlightBooking:
    def try_reserve(self, flight_id, passenger_id, seats):
        # Tentatively hold seats (the availability check is omitted for brevity)
        reservation = Reservation(
            flight_id=flight_id,
            passenger_id=passenger_id,
            seats=seats,
            status="TENTATIVE"
        )
        self.reservations.save(reservation)
        return "TryConfirmed"

    def confirm(self, flight_id, passenger_id):
        # Make the tentative reservation permanent
        reservation = self.reservations.find(flight_id, passenger_id)
        reservation.status = "CONFIRMED"
        self.reservations.save(reservation)
        return "Confirmed"

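    def cancel(self, flight_id, passenger_id):
        # Release the tentative hold. Cancel can arrive for a Try that failed
        # or never ran, so a missing reservation is treated as a no-op
        # (assumes find() returns None when nothing matches).
        reservation = self.reservations.find(flight_id, passenger_id)
        if reservation is None:
            return "Cancelled"
        reservation.status = "CANCELLED"
        self.reservations.save(reservation)
        return "Cancelled"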

The coordinator orchestrates the three-phase flow:

class BookingCoordinator:
    def __init__(self, flight, hotel, car):
        self.flight = flight
        self.hotel = hotel
        self.car = car

    def book_trip(self, flight_id, hotel_id, car_id, passenger):
        # Try phase
        results = []
        results.append(self.flight.try_reserve(flight_id, passenger, 1))
        results.append(self.hotel.try_reserve(hotel_id, passenger, 1))
        results.append(self.car.try_reserve(car_id, passenger, 1))

        if all(r == "TryConfirmed" for r in results):
            # Confirm phase
            self.flight.confirm(flight_id, passenger)
            self.hotel.confirm(hotel_id, passenger)
            self.car.confirm(car_id, passenger)
        else:
            # Cancel phase
            self.flight.cancel(flight_id, passenger)
            self.hotel.cancel(hotel_id, passenger)
            self.car.cancel(car_id, passenger)

This example omits error handling, timeouts, and duplicate detection. A production implementation needs retry logic, idempotency keys, and timeout handlers for when participants fail to respond.
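One of the gaps named above, failure handling in the Try phase, can be sketched as follows. This is an illustration under assumptions rather than a production coordinator: `TripCoordinator` and `StubService` are hypothetical names, an exception from Try is treated as a failed Try, and Cancel is assumed to tolerate a reservation that was never created.

```python
class StubService:
    """Hypothetical participant used only to exercise the coordinator."""

    def __init__(self, fail=False):
        self.fail = fail
        self.state = {}  # (resource_id, passenger) -> status

    def try_reserve(self, resource_id, passenger, qty):
        if self.fail:
            return "TryFailed"
        self.state[(resource_id, passenger)] = "TENTATIVE"
        return "TryConfirmed"

    def confirm(self, resource_id, passenger):
        self.state[(resource_id, passenger)] = "CONFIRMED"
        return "Confirmed"

    def cancel(self, resource_id, passenger):
        # Cancelling a reservation that was never created is a no-op.
        if (resource_id, passenger) in self.state:
            self.state[(resource_id, passenger)] = "CANCELLED"
        return "Cancelled"


class TripCoordinator:
    def __init__(self, participants):
        self.participants = participants  # list of (service, resource_id)

    def book(self, passenger):
        attempted = []
        ok = True
        for service, resource_id in self.participants:
            attempted.append((service, resource_id))
            try:
                ok = service.try_reserve(resource_id, passenger, 1) == "TryConfirmed"
            except Exception:
                ok = False  # outcome unknown: treat as failed and cancel below
            if not ok:
                break  # stop issuing Try calls after the first failure
        if ok:
            for service, resource_id in attempted:
                service.confirm(resource_id, passenger)
            return "Confirmed"
        # Cancel everyone we contacted, including the participant whose Try
        # failed, because its Try may have partially succeeded.
        for service, resource_id in attempted:
            service.cancel(resource_id, passenger)
        return "Cancelled"


flight, hotel, car = StubService(), StubService(fail=True), StubService()
outcome = TripCoordinator([(flight, "F1"), (hotel, "H1"), (car, "C1")]).book("alice")
```

Here the car service is never contacted, and the flight hold is released because the hotel Try failed.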

Handling Failures and Timeouts

TCC assumes participants will eventually respond to Try, Confirm, or Cancel calls. When a participant becomes unresponsive, the coordinator must decide what to do. This is where TCC implementations diverge.

Some frameworks use guaranteed delivery. They store the intended action in a log and retry until the participant acknowledges. Others use a maximum retry count and then flag the transaction as requiring manual intervention.

The tricky case is when Try succeeded but Confirm failed. The participant reserved the resource but never received the confirmation. From the participant’s perspective, it has a tentative reservation waiting to be confirmed or cancelled. The coordinator’s retry of Confirm should eventually clear this state. But if the coordinator crashed entirely, you need a recovery process that queries participants about their pending states.

flowchart TD
    A[Coordinator calls Confirm] --> B{Participant reachable?}
    B -->|Yes| C[Confirm succeeds]
    B -->|No| D[Store in retry queue]
    D --> E[Retry with backoff]
    E --> F{Participant responds?}
    F -->|Yes| C
    F -->|No| G[Max retries exceeded]
    G --> H[Flag for manual review]
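The retry path in the flowchart above could look roughly like this. It is an in-process sketch only: a real coordinator would persist the pending Confirm in a durable queue, and the function name, the `ConnectionError` signal, and the default limits are all assumptions made for illustration.

```python
import time


def confirm_with_retry(confirm, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Call confirm() until it succeeds or the retry budget is spent.

    Returns "CONFIRMED" on success, or "MANUAL_REVIEW" once max_retries
    retries have failed. `sleep` is injectable so tests need not wait.
    """
    for attempt in range(max_retries + 1):
        try:
            confirm()
            return "CONFIRMED"
        except ConnectionError:
            if attempt == max_retries:
                break
            sleep(base_delay * (2 ** attempt))  # exponential backoff
    return "MANUAL_REVIEW"


calls = {"n": 0}


def flaky_confirm():
    # Unreachable on the first two calls, reachable on the third.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("participant unreachable")


delays = []
outcome = confirm_with_retry(flaky_confirm, sleep=delays.append)
```

Because the same Confirm may be delivered several times, this loop is only safe when the participant's Confirm is idempotent.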

Complete TCC Flow Diagram

flowchart TD
    Start[TCC Transaction] --> TryPhase[Coordinator sends<br/>Try to all participants]
    TryPhase --> TryResults{All Try succeed?}
    TryResults -->|No| CancelPhase[Coordinator sends<br/>Cancel to all participants]
    CancelPhase --> CancelDone[Resources released<br/>Transaction aborted]
    TryResults -->|Yes| ConfirmPhase[Coordinator sends<br/>Confirm to all participants]
    ConfirmPhase --> ConfirmResults{All Confirm succeed?}
    ConfirmResults -->|No| ConfirmRetry[Retry with backoff]
    ConfirmRetry --> ConfirmResults
    ConfirmResults -->|Yes| Success[Transaction committed<br/>All reservations finalized]
    TryPhase --> Timeout{Participant times out?}
    Timeout -->|Yes| CancelPhase
    Timeout -->|No| TryResults

Three Main Scenarios:

Scenario	Trigger	Coordinator Action	Outcome
Try -> Confirm success	All participants respond TryConfirmed	Send Confirm to all	All reservations become permanent
Try -> Cancel	Any participant responds TryFailed	Send Cancel to all	All tentative reservations released
Try timeout -> Cancel	Participant times out on Try	Send Cancel to all	All tentative reservations released

State Transitions for a Single Participant:

stateDiagram-v2
    [*] --> Idle: Transaction starts
    Idle --> Tentative: Try succeeds
    Tentative --> Confirmed: Confirm received
    Tentative --> Cancelled: Cancel received
    Tentative --> Tentative: Try timeout, waiting for Cancel
    Confirmed --> [*]
    Cancelled --> [*]
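The participant state machine above can be encoded as a transition table, which makes illegal moves fail loudly. The sketch below is one possible encoding; the `State` enum, the event names, and the choice to treat repeated or late calls as no-ops are assumptions layered on the diagram.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    TENTATIVE = "TENTATIVE"
    CONFIRMED = "CONFIRMED"
    CANCELLED = "CANCELLED"


TRANSITIONS = {
    # Transitions drawn in the diagram
    (State.IDLE, "try"): State.TENTATIVE,
    (State.TENTATIVE, "confirm"): State.CONFIRMED,
    (State.TENTATIVE, "cancel"): State.CANCELLED,
    # Duplicate deliveries are accepted as no-ops (idempotency)
    (State.TENTATIVE, "try"): State.TENTATIVE,
    (State.CONFIRMED, "confirm"): State.CONFIRMED,
    (State.CANCELLED, "cancel"): State.CANCELLED,
    # A late Cancel must not undo a confirmed reservation
    (State.CONFIRMED, "cancel"): State.CONFIRMED,
}


def apply(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} in {state.name}") from None


state = apply(State.IDLE, "try")
state = apply(state, "confirm")
state = apply(state, "confirm")  # retry: still CONFIRMED
```

A Confirm that arrives for a CANCELLED reservation is deliberately rejected here, since it signals an inconsistency that needs reconciliation rather than a silent no-op.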

TCC Frameworks

Building TCC from scratch means managing coordinator state, retry logic, timeout handling, and recovery — all nontrivial. Several frameworks handle the heavy lifting.

Apache TCM (Transaction Coordinator Manager)

Apache TCM is the reference implementation for J2EE-style TCC. It integrates with application servers and provides declarative transaction boundaries. Best for Java/J2EE shops already invested in that ecosystem.

Narayana (JBossTS)

Narayana is an open-source transaction manager supporting LRC (Last Resource Commit) optimization, 2PC, and TCC. It provides both programmatic and declarative (annotation-based) approaches. Works well with Spring via Narayana’s Spring integration.

@Compensable(compensationMethod = "cancelReservation")
public void tryReserveSeats(ReservationRequest request) {
    // Tentatively reserve seats
    reservationService.createTentativeReservation(request);
}

public void cancelReservation(ReservationRequest request) {
    // Release the tentative hold
    reservationService.cancelReservation(request.getReservationId());
}

public void confirmReservation(ReservationRequest request) {
    // Finalize the reservation
    reservationService.confirmReservation(request.getReservationId());
}

ByteTCC

ByteTCC is a TCC implementation for Spring applications. It uses annotations to define Try/Confirm/Cancel methods and handles coordinator logic transparently. Lightweight and Spring-native, good for microservices running in Spring Boot.

@Compensable(confirmMethod = "confirm", cancelMethod = "cancel")
public boolean tryReserveInventory(InventoryRequest request) {
    // Try logic: check availability, create tentative hold
    return inventoryService.tentativeHold(request.getItemId(), request.getQuantity());
}

public void confirm(InventoryRequest request) {
    inventoryService.confirmHold(request.getItemId(), request.getQuantity());
}

public void cancel(InventoryRequest request) {
    inventoryService.releaseHold(request.getItemId(), request.getQuantity());
}

Spring TCC (Spring-Cloud-tencent)

Spring TCC is part of the Tencent Spring Cloud stack. Integrates with Service Comb and provides distributed TCC transaction support for Spring Cloud microservices.

Framework Comparison

Framework	Language	Coordinator	Spring Integration	Recovery Support	Best For
Apache TCM	Java	Embedded	Yes (J2EE)	Yes	Enterprise Java apps
Narayana	Java/C	Both	Yes	Yes	JBoss/Spring ecosystem
ByteTCC	Java	External	Yes (Spring Boot)	Yes	Lightweight Spring microservices
Spring TCC	Java	External	Yes	Yes	Tencent/Spring Cloud stack

For most new projects, ByteTCC or Narayana is the practical choice. ByteTCC is simpler and more Spring Boot-friendly. Narayana has more enterprise features and a longer track record.

Confirm/Cancel Idempotency Implementation

Idempotency is not optional in TCC — it is load-bearing. The coordinator retries Confirm and Cancel calls until it gets a response. Your participant must handle duplicates gracefully.

The Idempotency Problem

Consider this scenario:

Coordinator calls confirm(reservation_id="abc")
Participant confirms successfully but the network drops before the response arrives
Coordinator retries confirm(reservation_id="abc")
If your confirm handler is not idempotent, the retry applies the confirmation a second time and repeats its side effects, such as a second stock decrement

Idempotency Key Design

Use a dedicated idempotency key for each Try/Confirm/Cancel call. The key should be deterministic — the same operation always gets the same key.

import hashlib
from datetime import datetime

def make_idempotency_key(transaction_id, participant_id, phase):
    """Generate a deterministic idempotency key.

    Same transaction + participant + phase always produces same key.
    """
    raw = f"{transaction_id}:{participant_id}:{phase}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

class TccParticipant:
    def confirm(self, transaction_id, participant_id, reservation_data):
        key = make_idempotency_key(transaction_id, participant_id, "confirm")

        # Idempotency check
        existing = self.confirm_log.find_by_idempotency_key(key)
        if existing:
            # Already confirmed — return success without re-confirming
            return ConfirmResult(
                success=True,
                already_confirmed=True,
                confirmed_at=existing.confirmed_at
            )

        # Actual confirmation logic
        reservation = self.reservations.find(reservation_data.id)
        reservation.status = "CONFIRMED"
        self.reservations.save(reservation)

        # Record this confirmation for future idempotency
        self.confirm_log.save(IdempotencyRecord(
            key=key,
            transaction_id=transaction_id,
            confirmed_at=datetime.utcnow()
        ))

        return ConfirmResult(success=True, already_confirmed=False)

    def cancel(self, transaction_id, participant_id, reservation_data):
        key = make_idempotency_key(transaction_id, participant_id, "cancel")

        existing = self.cancel_log.find_by_idempotency_key(key)
        if existing:
            return CancelResult(
                success=True,
                already_cancelled=True,
                cancelled_at=existing.cancelled_at
            )

        reservation = self.reservations.find(reservation_data.id)
        reservation.status = "CANCELLED"
        self.reservations.save(reservation)

        self.cancel_log.save(IdempotencyRecord(
            key=key,
            transaction_id=transaction_id,
            cancelled_at=datetime.utcnow()
        ))

        return CancelResult(success=True, already_cancelled=False)

Confirm Before Cancel Problem

A subtler idempotency problem: what if Confirm runs twice (first times out, second succeeds), and then Cancel is retried? The cancellation would incorrectly release a confirmed reservation.

Track state transitions explicitly. Confirm transitions from TENTATIVE to CONFIRMED. Cancel transitions from TENTATIVE to CANCELLED. Once in CONFIRMED, Cancel should be a no-op, not a failure.

def cancel(self, transaction_id, participant_id, reservation_data):
    reservation = self.reservations.find(reservation_data.id)

    if reservation.status == "CONFIRMED":
        # Already confirmed — Cancel is correctly a no-op
        return CancelResult(success=True, reason="already_confirmed")

    if reservation.status == "CANCELLED":
        # Already cancelled — still a no-op
        return CancelResult(success=True, reason="already_cancelled")

    # Actual cancellation from TENTATIVE state
    reservation.status = "CANCELLED"
    self.reservations.save(reservation)
    return CancelResult(success=True)

Timeout vs Permanent Failure

TCC distinguishes between transient failures (retry might succeed) and permanent failures (never going to succeed). In your Try handler:

Transient failure: Return a retryable error, coordinator retries
Permanent failure: Return TryFailed with a reason that means “do not retry, cancel everything”

def try_reserve(self, inventory_id, quantity):
    try:
        # Try logic
        reserved = self.inventory.tentative_hold(inventory_id, quantity)
        return TryResult(success=True, reservation_id=reserved.id)
    except InsufficientInventory:
        # Permanent failure: retrying will not help
        return TryResult(success=False, reason="INSUFFICIENT_INVENTORY")
    except TemporaryDatabaseError:
        # Transient failure — worth retrying
        raise TryRetryableError("Database temporarily unavailable")
    except CapacityExceeded:
        # Permanent failure — no amount of retry will fix this
        return TryResult(success=False, reason="CAPACITY_EXCEEDED")
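On the coordinator side, the two failure kinds lead to different decisions. The following sketch assumes a Try callable that either returns an object with a boolean `success` attribute or raises `TryRetryableError`; `run_try` and its return values are hypothetical names chosen for illustration.

```python
from types import SimpleNamespace


class TryRetryableError(Exception):
    """Transient failure: the same Try may succeed if repeated."""


def run_try(try_fn, max_attempts=3):
    """Return "PROCEED" if Try succeeded, otherwise "CANCEL_ALL"."""
    for _ in range(max_attempts):
        try:
            result = try_fn()
        except TryRetryableError:
            continue  # transient: repeat the same Try, which must be idempotent
        # A returned result is final, whether success or permanent failure.
        return "PROCEED" if result.success else "CANCEL_ALL"
    return "CANCEL_ALL"  # still failing after max_attempts: give up and cancel


attempts = []


def flaky_try():
    # Fails transiently once, then succeeds.
    attempts.append(1)
    if len(attempts) == 1:
        raise TryRetryableError("database temporarily unavailable")
    return SimpleNamespace(success=True)


decision = run_try(flaky_try)
permanent = run_try(lambda: SimpleNamespace(success=False, reason="INSUFFICIENT_INVENTORY"))
```

A permanent failure ends the Try phase on the first response, while a transient one costs at most `max_attempts` calls before the coordinator gives up and cancels.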

Advantages of TCC

The main advantage is that resources do not lock during the transaction. Other operations can read or modify the same data, aware of pending reservations but not blocked by them. This makes TCC more scalable than 2PC, particularly under high contention.

The three-phase structure is explicit. Every participant agrees to the contract: if you can reserve in Try, you guarantee you can confirm or cancel later.

TCC also works across heterogeneous systems. No shared transaction manager required. Each service implements its own semantics for Try, Confirm, and Cancel.

Common Pitfalls / Anti-Patterns

TCC is not a silver bullet.

The biggest challenge is designing Try/Confirm/Cancel for your specific domain. Not all operations map naturally to reservation semantics. Forcing a square peg into a round hole produces brittle implementations.

Idempotency trips people up. Confirm and Cancel must handle duplicate calls gracefully. If the coordinator retries a Confirm that actually succeeded, the participant needs to recognize this and return Confirmed, not try to confirm again.

The timeout case requires care. Try succeeds but the coordinator crashes before sending Confirm or Cancel. Resources sit in a tentative state. Without a resolution mechanism, you get resource leaks that pile up silently.

Latency also increases. Every transaction needs at least two round trips to each participant.

Use Cases & Decision Criteria

This section maps specific scenarios to TCC and explains the trade-offs that determine whether TCC is the right fit compared to alternatives like 2PC or saga. The goal is to give you a decision framework, not just a list of use cases.

When TCC Is the Right Fit

TCC shines when your business operations map naturally to reservation semantics. Think of operations that have a clear notion of “tentatively allocate this now, confirm or release it later.” Classic examples:

Inventory allocation: Reserve stock when an order is placed, confirm when payment clears, release if checkout times out
Booking systems: Hold a hotel room, a flight seat, or a table for a set window before the customer completes payment
Credit holds: Place a hold on a credit limit when a card is presented, confirm if authorization succeeds, release otherwise
Seat reservations: Temporarily assign a seat in a theater or event, release if payment fails

The common thread is that the tentative state is meaningful to the business. Other operations can see the pending reservation and act accordingly (show “limited availability,” queue the request, suggest alternatives) without being blocked by it.

When TCC Gets Awkward

TCC fits poorly when operations are transformations rather than reservations. If Step 2 depends on the output of Step 1 in a way that does not fit reservation semantics, you end up stuffing intermediate state into Try and carrying it forward to Confirm. Consider a money transfer from account A to account B. There is no natural “tentatively move money” step — the debit and credit must happen together or not at all. Saga handles this better with explicit compensation.

Signs you are forcing TCC into the wrong shape:

Your Try phase is doing real work (not just reserving) and passing state forward to Confirm
Cancel logic needs to know the details of what Confirm did, rather than simply releasing a reservation
You are inventing artificial reservation concepts to model something that is fundamentally a transformation

In these cases, basic saga with compensation is simpler and more natural.

TCC vs 2PC vs Saga Decision Matrix

Scenario	Recommended Pattern
Operations are reservations (inventory, booking, holds)	TCC
Operations are transformations (money transfer, order-to-invoice)	Saga
True atomicity with full isolation required	2PC
Heterogeneous systems (different stacks, no shared TM)	TCC
High contention with concurrent access to same resources	TCC (with concurrency control) or Saga
Simple domain with low-stakes operations	Saga
Strict no-blocking requirement with strong eventual consistency	TCC
Enterprise Java ecosystem with existing transaction infrastructure	2PC or Narayana TCC

For an overview of distributed transaction patterns including TCC, see Distributed Transactions. For reliable message delivery in distributed systems, see the Outbox Pattern.

Production Failure Scenarios

Scenario 1: Network Partition During Confirm Phase

Trigger: Network partition separates coordinator from one or more participants after Try succeeds.

What happens:

All participants respond TryConfirmed
Coordinator starts Confirm phase
Coordinator can reach Participant A and B but not Participant C
A and B confirm successfully; C never receives Confirm
Coordinator retries C with backoff until max retries
Max retries exceeded, transaction flagged for manual review

Outcome: Participants A and B have CONFIRMED reservations; Participant C has TENTATIVE. Inventory is inconsistently allocated. Manual intervention required to either Confirm C (if resources still available) or Cancel all participants.

Mitigation: Use a recovery process that periodically scans for TENTATIVE reservations older than a threshold and either completes or cancels them. Implement saga-style compensation as a fallback.
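A recovery sweep of this kind might be sketched as below. It assumes the coordinator durably logs its decision before sending any Confirm, so that a missing decision can safely be read as "abort"; `sweep_stale`, `lookup_decision`, and the `Reservation` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Reservation:
    txn_id: str
    status: str  # "TENTATIVE", "CONFIRMED", or "CANCELLED"
    created_at: datetime


def sweep_stale(reservations, lookup_decision, max_age, now=None):
    """Resolve TENTATIVE reservations older than max_age.

    lookup_decision(txn_id) stands in for a query against the coordinator's
    log and returns "CONFIRM", "CANCEL", or None if nothing was recorded.
    Returns the transaction IDs that were resolved.
    """
    now = now or datetime.now(timezone.utc)
    resolved = []
    for r in reservations:
        if r.status != "TENTATIVE" or now - r.created_at < max_age:
            continue  # finished already, or still within its window
        if lookup_decision(r.txn_id) == "CONFIRM":
            r.status = "CONFIRMED"
        else:
            # Recorded Cancel, or no decision at all: release the hold.
            r.status = "CANCELLED"
        resolved.append(r.txn_id)
    return resolved


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    Reservation("t1", "TENTATIVE", now - timedelta(minutes=10)),
    Reservation("t2", "TENTATIVE", now - timedelta(minutes=10)),
    Reservation("t3", "TENTATIVE", now - timedelta(minutes=1)),
    Reservation("t4", "CONFIRMED", now - timedelta(hours=1)),
]
decisions = {"t1": "CONFIRM"}
resolved = sweep_stale(rows, decisions.get, timedelta(minutes=5), now=now)
```

Only the two reservations past the five-minute threshold are touched; the recent one is left for the coordinator to finish.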

Scenario 2: Participant Crash and Recovery

Trigger: A participant process crashes after receiving Try but before processing Confirm.

What happens:

Participant D receives Try, writes TENTATIVE reservation to durable storage, responds TryConfirmed
Participant D crashes before receiving Confirm
Participant D restarts and recovers its state
Coordinator retries Confirm for Participant D
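Because the TENTATIVE reservation was written durably, Participant D finds it after the restart and applies the Confirm; if a Confirm had already been applied before a crash, the idempotency check turns the retry into a no-op that still returns Confirmed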

Outcome: Transaction completes successfully if participant implemented idempotency correctly. If not implemented, the recovery might not recognize the pending reservation and create duplicate or conflicting state.

Mitigation: Durable write of TENTATIVE state before responding to Try. Idempotency check on Confirm that recognizes already-processed transactions.

Scenario 3: Clock Skew Across Participants

Trigger: Different participants have slightly different system clocks, causing TTL-based auto-expiry to fire at different times.

What happens:

Transaction with 5-minute TTL on tentative reservations starts
Participant E has a fast clock and expires the reservation at minute 4
Participant F has a slow clock and still has TENTATIVE at minute 5
Coordinator sends Confirm at minute 5 to both
Participant E rejects Confirm (no longer has reservation)
Participant F accepts Confirm

Outcome: Inconsistent state. Participant E has cancelled, Participant F has confirmed. Without reconciliation, the system has a phantom confirmed reservation.

Mitigation: Use logical time or a centralized time source for TTL. Build reconciliation logic that detects and resolves inconsistent states across participants.

Scenario 4: Cancel Storm Under High Contention

Trigger: System experiences high contention on a shared resource during peak load.

What happens:

High traffic causes many transactions to timeout on Try simultaneously
Coordinator sends Cancel to all affected participants
Cancel storm overwhelms participant capacity
Some Cancels time out, causing the coordinator to retry
Retries add more load, worsening the situation

Outcome: System becomes unstable. Participants may need to shed load or the coordinator may need to slow down transaction initiation.

Mitigation: Implement circuit breakers on participants. Use bulkhead isolation between TCC participants. Rate-limit Try requests per client. Consider queueing Try requests instead of immediate rejection.
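Of the mitigations listed, rate-limiting Try requests is the simplest to illustrate. The token bucket below is a generic sketch, not part of any TCC framework; the class name, parameters, and injectable clock are assumptions, and a real deployment would keep one bucket per client.

```python
import time


class TokenBucket:
    """Limiter for Try requests; rate and capacity are illustrative."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True and spend a token if a Try may proceed now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


fake_time = [0.0]
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: fake_time[0])
burst = [bucket.allow() for _ in range(3)]  # third request exceeds the burst
fake_time[0] = 1.0                          # one second later, one token is back
after_wait = bucket.allow()
```

Rejecting excess Try calls at the edge keeps them from turning into timeouts, and therefore into the Cancel traffic that feeds the storm.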

Scenario 5: Double-Spend via Race Condition

Trigger: Two different coordinators attempt to reserve the same inventory simultaneously.

What happens:

Coordinator X sends Try(reserve 5 units) to Inventory Service
Coordinator Y sends Try(reserve 5 units) to Inventory Service (same inventory)
Both see availability because neither reservation is final yet
Both receive TryConfirmed
Both send Confirm (or one confirms and one cancels then retries)
Without careful concurrency control, double-spend occurs

Outcome: More inventory is confirmed than actually available. Overselling occurs.

Mitigation: Use row-level locking or optimistic concurrency control in the Try phase, so that each Try checks availability and records its hold atomically. Pending holds must count against available inventory; deferring the check to Confirm would break the contract that a successful Try can always be confirmed. Some implementations use per-resource locking that short-circuits concurrent Try attempts on the same resource.
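Optimistic concurrency control in the Try phase can be sketched with a version counter. The example below models a database row in memory; `VersionedStock` and `compare_and_set` are hypothetical stand-ins for a conditional update such as `UPDATE ... WHERE version = ?`.

```python
import threading


class VersionedStock:
    """Stand-in for a stock row with a version column."""

    def __init__(self, available):
        self.available = available
        self.version = 0
        self._lock = threading.Lock()  # models the atomicity of one UPDATE

    def read(self):
        with self._lock:
            return self.available, self.version

    def compare_and_set(self, expected_version, new_available):
        # Write only if nobody else has written since our read.
        with self._lock:
            if self.version != expected_version:
                return False
            self.available = new_available
            self.version += 1
            return True


def try_reserve(stock, units, max_attempts=3):
    for _ in range(max_attempts):
        available, version = stock.read()
        if units > available:
            return "TryFailed"
        if stock.compare_and_set(version, available - units):
            return "TryConfirmed"
        # Lost the race to another Try: re-read and check again.
    return "TryFailed"


stock = VersionedStock(5)
seen_available, seen_version = stock.read()  # coordinator X reads first
y_result = try_reserve(stock, 5)             # coordinator Y takes all 5 units
x_stale_write = stock.compare_and_set(seen_version, seen_available - 5)
x_result = try_reserve(stock, 5)             # X retries against fresh state
```

Coordinator X's stale write is rejected, and its retry sees zero units available, so only one of the two reservations succeeds.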

Trade-off Analysis

TCC vs 2PC:

| Dimension | TCC | 2PC | Notes |
| --- | --- | --- | --- |
| Blocking | No, participants not blocked | Yes, participants block during commit | |
| Resource Locking | Reservations only, no locks | Locks held during prepare and commit | |
| Atomicity | Eventual (via retry) | True atomic commit | |
| Isolation | None (others see pending reservations) | Full serializable isolation | |
| Latency | Two round trips minimum | Two round trips minimum | |
| Heterogeneous Support | Yes, each service implements T/C/C | No, requires shared transaction manager | |
| Implementation Complexity | Medium-High (reservation design) | Medium (DB-supported) | Winner for heterogeneous microservices |
| Failure Handling | Retry with Confirm/Cancel | Blocking on coordinator crash | Winner for availability under partial failure |

TCC vs Saga

| Dimension | TCC | Saga | Notes |
| --- | --- | --- | --- |
| Failure Handling | Explicit Confirm/Cancel symmetry | Reverse compensations (undo each step) | |
| Best For | Reservation semantics | Transformation semantics | |
| Idempotency Requirement | Critical (all phases) | Lower (compensations must be idempotent) | |
| Complexity Location | Reservation system design | Compensation chain logic | |
| Natural Fit | Hotel booking, inventory holds | Money transfers, order-to-invoice | Depends on domain |
| Implementation Effort | Medium-High | Low-Medium | Winner for simpler domains |

Best-Effort vs Exactly-Once TCC

| Dimension | Best-Effort TCC | Exactly-Once TCC | Notes |
| --- | --- | --- | --- |
| Delivery Guarantee | Retry with limit, then flag for manual review | Retry indefinitely until acknowledged | |
| Dangling Reservations | Possible (manual cleanup needed) | Eliminated | |
| Implementation Complexity | Lower | Higher (persistent logging required) | |
| Operational Overhead | Higher (manual intervention) | Lower (self-healing) | Winner for reliability-critical systems |
| Storage Overhead | Minimal | Logs must be maintained durably | Winner for storage-constrained environments |

When to Choose TCC Over Alternatives

Scenario	Recommendation
Operations are reservations (inventory, booking, holds)	TCC
Operations are transformations (money transfer, order processing)	Saga
True atomicity with full isolation required	2PC
Heterogeneous systems (different stacks, no shared TM)	TCC
High contention with concurrent access to same resources	TCC (with careful concurrency control) or Saga
Simple domain with low-stakes operations	Saga
Strict no-blocking requirement with strong eventual consistency	TCC
Enterprise Java ecosystem with existing transaction infrastructure	2PC (if available) or Narayana TCC

Quick Recap

Before finalizing your TCC implementation, verify these critical items:

Observability Checklist

TCC transactions span multiple services and involve multiple round trips. Without observability, you cannot tell whether a failed transaction left dangling tentative reservations eating into your available inventory. This checklist covers the minimum observability surface you need before going to production.

Transaction-Level Metrics:

Overall TCC transaction completion rate (success vs try-fail vs confirm-fail vs cancel)
Transaction count per phase outcome — helps you spot systemic try-phase failures early
Average and p95/99 phase durations — long confirm phases indicate participant-side contention
Transaction timeout rate — too many timeouts means participants are slow or network is unhealthy

Participant-Level Metrics:

Per-participant try phase success rate — identify which service is the weak link
Per-participant confirm/cancel retry count — high retry counts on a specific participant signal trouble
Dangling TENTATIVE reservation count per participant — any upward trend here is a red flag
Participant timeout rate — separates slow participants from network issues

Coordinator Metrics:

Coordinator crash recovery count — how often does the coordinator restart mid-transaction
Retry queue depth — confirm retries waiting to be delivered
Manual intervention flag count — transactions that exceeded retry limits

Alerting Thresholds (tune for your traffic pattern):

Alert when dangling TENTATIVE reservations grow by more than 5% in 15 minutes
Alert when confirm retry count exceeds 3 for any single transaction
Alert when cancel phase runs for more than 10% of all transactions in a rolling window
Alert when any participant’s try-phase success rate drops below 95%

A reservation that sits in TENTATIVE past the configured TTL means the coordinator failed to complete the transaction. Left alone, these pile up silently and shrink your effective inventory.
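The first alerting threshold above reduces to a small piece of arithmetic. The helper below is a sketch under assumptions: it takes two samples of the dangling count, 15 minutes apart, and the function name and zero-baseline handling are hypothetical choices.

```python
def dangling_growth_alert(count_15m_ago, count_now, threshold=0.05):
    """Return True when dangling TENTATIVE reservations grew by more than threshold."""
    if count_15m_ago == 0:
        # No baseline to compare against: any dangling reservation is new.
        return count_now > 0
    growth = (count_now - count_15m_ago) / count_15m_ago
    return growth > threshold


fires = dangling_growth_alert(100, 106)      # 6% growth
quiet = dangling_growth_alert(100, 105)      # exactly 5%, not above the threshold
from_zero = dangling_growth_alert(0, 3)      # first dangling reservations appear
```

In practice the same comparison would usually live in the monitoring system's rule language rather than in application code.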

Observability & Monitoring

Metrics

TCC transaction completion rate (success vs try-fail vs confirm-fail vs cancel)
Try phase duration and success rate
Confirm phase duration and retry count
Cancel phase duration and how often it runs
Average number of participants per transaction
Timeout rate per phase (try timeout, confirm timeout)
Dangling tentative reservation count (reservations stuck in TENTATIVE state)

Logs

Log Try phase start with transaction ID, participant ID, and reservation data
Log Try phase outcome (confirmed, failed, timeout)
Log Confirm/Cancel phase starts and outcomes
Log retry attempts with attempt number and delay
Include idempotency key in all phase logs for correlation
Log participant state transitions (TENTATIVE → CONFIRMED, TENTATIVE → CANCELLED)
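
A minimal sketch of the log items above as one structured JSON line per event, using only the standard library. The helper names and field names are assumptions chosen for illustration. The point is that every record carries the transaction ID, participant ID, and idempotency key so that phases can be correlated.

```python
import json
import logging

logger = logging.getLogger("tcc")


def phase_log_record(event, txn_id, participant_id, idempotency_key, **fields):
    """Build one structured record for a TCC phase event."""
    record = {
        "event": event,
        "transaction_id": txn_id,
        "participant_id": participant_id,
        "idempotency_key": idempotency_key,
    }
    record.update(fields)  # e.g. attempt number, delay, from_state, to_state
    return record


def log_phase(event, txn_id, participant_id, idempotency_key, **fields):
    """Emit the record as a single JSON line and return it."""
    record = phase_log_record(event, txn_id, participant_id, idempotency_key, **fields)
    logger.info(json.dumps(record, sort_keys=True))
    return record


# A retry attempt and a state transition for the same participant call.
log_phase("confirm_retry", "txn-42", "inventory", "key-abc", attempt=2, delay_s=0.2)
log_phase("state_transition", "txn-42", "inventory", "key-abc",
          from_state="TENTATIVE", to_state="CONFIRMED")
```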

Alerts

Alert when dangling TENTATIVE reservations accumulate (cleanup is failing)
Alert when confirm retry count exceeds threshold
Alert when cancel phase runs frequently (indicates try phase instability)
Alert when participant times out repeatedly on try phase
Alert when transaction takes longer than expected threshold

Tracing

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

class TccTransaction:
    # Constructor shape is assumed here; participants expose name, try_, confirm, cancel.
    def __init__(self, txn_id, participants, request):
        self.txn_id = txn_id
        self.participants = participants
        self.request = request

    def execute(self):
        # Root span covers the whole transaction; phase and participant spans nest under it.
        with tracer.start_as_current_span("tcc.transaction") as span:
            span.set_attribute("tcc.transaction_id", self.txn_id)
            span.set_attribute("tcc.participant_count", len(self.participants))

            # Try phase
            try_results = []
            with tracer.start_as_current_span("tcc.try_phase"):
                for participant in self.participants:
                    with tracer.start_as_current_span(f"tcc.try.{participant.name}") as p_span:
                        p_span.set_attribute("participant.name", participant.name)
                        result = participant.try_(self.request)
                        try_results.append(result)
                        # Span attributes must be primitives, so record the boolean outcome.
                        p_span.set_attribute("tcc.try_result", result.success)

            if all(r.success for r in try_results):
                # Confirm phase
                with tracer.start_as_current_span("tcc.confirm_phase"):
                    for participant in self.participants:
                        with tracer.start_as_current_span(f"tcc.confirm.{participant.name}") as p_span:
                            p_span.set_attribute("participant.name", participant.name)
                            result = participant.confirm()
                            # The result type is not specified here; record its string form.
                            p_span.set_attribute("tcc.confirm_result", str(result))
            else:
                # Cancel phase: every participant is cancelled, so Cancel must tolerate
                # being called where Try failed or never took effect.
                with tracer.start_as_current_span("tcc.cancel_phase"):
                    for participant in self.participants:
                        with tracer.start_as_current_span(f"tcc.cancel.{participant.name}") as p_span:
                            p_span.set_attribute("participant.name", participant.name)
                            result = participant.cancel()
                            p_span.set_attribute("tcc.cancel_result", str(result))

Security & Resilience

Security Checklist

TCC coordination involves multiple services making state changes. Security misconfigurations can lead to unauthorized reservations or data leakage.

Authenticate the coordinator-to-participant RPC calls (mutual TLS or JWT tokens)
Authorize participants — coordinator should only call Confirm/Cancel on registered participants
Validate reservation data in Try phase — do not trust coordinator-supplied quantities or IDs without validation
Audit log all state transitions on tentative reservations (created, confirmed, cancelled)
Encrypt coordinator-to-participant communication in transit
Do not expose internal transaction IDs in error responses (use correlation IDs instead)
Rate-limit Try requests per participant to prevent reservation exhaustion attacks
Set TTL on tentative reservations so abandoned transactions auto-expire

Reservation Exhaustion Attack

A subtle TCC security concern: an attacker triggers many Try operations that succeed but never Confirm or Cancel. If tentative reservations hold inventory, the attacker can exhaust available inventory without paying.

Mitigations:

TTL on tentative reservations: Auto-cancel after timeout
Per-entity locking: Lock the reservation entity itself, not just the inventory
Rate limiting Try: Limit how many Try requests a single client can make
Verification on Confirm: Check the original request is still valid before confirming
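
As an illustration of the rate-limiting mitigation, here is a minimal in-memory sliding-window limiter for Try requests per client. The class name, the limits, and the choice to pass `now` in explicitly are assumptions for this sketch. A production deployment would usually enforce this at the gateway or in a shared store rather than in process memory.

```python
from collections import defaultdict, deque


class TryRateLimiter:
    """Sliding-window limit on Try requests per client (illustrative)."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # client_id -> timestamps of recent Try calls

    def allow(self, client_id: str, now: float) -> bool:
        """Return True and record the call if the client is under its limit."""
        calls = self._calls[client_id]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_requests:
            return False  # reject this Try; no reservation is created
        calls.append(now)
        return True


limiter = TryRateLimiter(max_requests=2, window_seconds=10.0)
decisions = [limiter.allow("client-a", t) for t in (0.0, 1.0, 2.0)]
```

Here the third Try from `client-a` inside the 10-second window is rejected, while other clients are unaffected.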

Interview Questions

1. What are the three phases of the TCC pattern and what does each phase do?

Expected answer points:

Try: Reserves resources tentatively without committing. Participant records the intent and reserves the required capacity.
Confirm: Commits the tentative reservation, making it permanent. Only called when Try succeeded for all participants.
Cancel: Releases the tentative reservation, undoing the effects of Try. Called when any Try fails or the transaction times out.

2. How does TCC differ from Two-Phase Commit (2PC) in terms of locking behavior?

Expected answer points:

2PC locks resources during the Prepare phase and holds those locks until Commit or Rollback, blocking other transactions.
TCC uses reservations, not locks. Try sets aside the requested capacity but does not block other operations from seeing the resource or using whatever capacity remains.
Under contention, 2PC makes competing transactions wait or fail, while TCC shows the second transaction a pending reservation and lets it decide how to respond.

3. Why is idempotency critical in TCC implementations?

Expected answer points:

The coordinator retries Confirm and Cancel calls until it receives a response, so duplicate calls are inevitable.
If a participant is not idempotent, a duplicate Confirm might apply the same reservation twice (for example, deducting inventory a second time), causing incorrect state.
Idempotency is typically implemented using deterministic idempotency keys based on transaction ID, participant ID, and phase.
Participants must also handle the Confirm-before-Cancel problem: once CONFIRMED, Cancel should be a no-op.

4. What happens when Try succeeds but Confirm fails or times out?

Expected answer points:

The participant has a tentative reservation waiting to be confirmed or cancelled.
The coordinator retries Confirm with exponential backoff until the participant responds or max retries are exceeded.
If max retries are exceeded, the transaction is flagged for manual intervention.
If the coordinator crashes entirely, a recovery process must query participants about their pending states to resolve dangling reservations.
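
The retry behaviour described above can be sketched as follows. The function name, the return values, and the injectable `sleep` parameter are assumptions made for illustration. The sketch treats any exception from `confirm()` as "no response", which real code would narrow to timeouts and connection errors.

```python
import time


def confirm_with_backoff(confirm, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Call `confirm()` until it succeeds or the retry limit is exceeded.

    Returns ("CONFIRMED", attempts) on success, or
    ("MANUAL_INTERVENTION", attempts) once retries are exhausted.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            confirm()
            return "CONFIRMED", attempts
        except Exception:
            if attempts > max_retries:
                # Give up and flag the transaction for an operator.
                return "MANUAL_INTERVENTION", attempts
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempts - 1))
```

Because the same Confirm may be delivered several times, this only works if the participant's Confirm is idempotent.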

5. How does TCC handle heterogeneous systems compared to 2PC?

Expected answer points:

2PC assumes participants share a common transaction manager (TM) and often a shared database.
TCC works across heterogeneous systems because each service implements its own Try/Confirm/Cancel logic independently.
Payment service, inventory service, shipping service can all run different stacks but still participate in the same TCC transaction.
No shared TM or distributed lock manager is required.

6. What is the "Confirm before Cancel" problem and how do you solve it?

Expected answer points:

Scenario: Confirm runs twice (first times out, second succeeds), then Cancel is retried. The cancellation would incorrectly release a confirmed reservation.
Solution: Track state transitions explicitly. Confirm transitions from TENTATIVE to CONFIRMED. Cancel transitions from TENTATIVE to CANCELLED.
Once in CONFIRMED state, Cancel should be a no-op that returns success, not an error or a re-cancellation.
Similarly, once in CANCELLED state, subsequent Cancel calls should be idempotent no-ops.
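
A minimal sketch of that state tracking, assuming a hypothetical in-memory `ReservationRecord`. A real participant would make each transition atomic in its datastore, for example with a conditional update on the state column.

```python
from enum import Enum


class State(Enum):
    TENTATIVE = "TENTATIVE"
    CONFIRMED = "CONFIRMED"
    CANCELLED = "CANCELLED"


class ReservationRecord:
    """Hypothetical participant-side record with guarded state transitions."""

    def __init__(self):
        self.state = State.TENTATIVE
        self.releases = 0  # how many times capacity was actually released

    def confirm(self) -> State:
        if self.state is State.TENTATIVE:
            self.state = State.CONFIRMED
        # CONFIRMED: duplicate Confirm, nothing to do.
        # CANCELLED: too late to confirm; the caller sees the terminal state.
        return self.state

    def cancel(self) -> State:
        if self.state is State.TENTATIVE:
            self.state = State.CANCELLED
            self.releases += 1  # release the reserved capacity exactly once
        # CONFIRMED or CANCELLED: no-op that returns the existing terminal state.
        return self.state


record = ReservationRecord()
record.confirm()            # first Confirm (its response is lost)
record.confirm()            # retried Confirm
late_cancel = record.cancel()  # stale Cancel arrives afterwards
```

Returning the current state instead of raising lets the coordinator treat the late Cancel as delivered while still seeing that the reservation stayed confirmed.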

7. When is TCC a better choice than Basic Saga?

Expected answer points:

TCC fits naturally when operations can be modeled as reservations: hotel booking, inventory allocation, credit holds, seat reservations.
These have a clear notion of "tentatively take this" and "make it official or release it."
Saga fits better when operations are transformations rather than reservations: moving money from account A to B, transforming an order into an invoice.
TCC provides more structure than plain saga with explicit Confirm/Cancel symmetry.

8. What is a reservation exhaustion attack in TCC and how do you mitigate it?

Expected answer points:

An attacker triggers many Try operations that succeed but never Confirm or Cancel.
If tentative reservations hold inventory, the attacker can exhaust available inventory without paying.
Mitigations: TTL on tentative reservations (auto-cancel after timeout), per-entity locking, rate limiting Try requests per client, and verification on Confirm that the original request is still valid.

9. What is the difference between transient and permanent failure in TCC Try handling?

Expected answer points:

Transient failure: Return a retryable error. The coordinator retries the Try call. Example: TemporaryDatabaseError.
Permanent failure: Return TryFailed with a reason indicating "do not retry, cancel everything." Example: InsufficientInventory, CapacityExceeded.
Mixed case handling: CapacityExceeded might be permanent (no amount of retry will fix it), while a temporary lock timeout might be transient.
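
One way to express this distinction in code is with two exception types that the coordinator handles differently. The class names and `run_try` are hypothetical, chosen only for this sketch.

```python
class TransientTryError(Exception):
    """Retryable: for example a temporary database error or lock timeout."""


class PermanentTryError(Exception):
    """Not retryable: for example insufficient inventory."""


def run_try(try_call, max_attempts=3):
    """Return "TRY_OK", or "CANCEL_ALL" when Try cannot succeed."""
    for _ in range(max_attempts):
        try:
            try_call()
            return "TRY_OK"
        except TransientTryError:
            continue  # retry the same Try call
        except PermanentTryError:
            return "CANCEL_ALL"  # retrying will not help
    return "CANCEL_ALL"  # transient failures used up the retry budget
```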

10. Name three TCC frameworks and explain their primary use cases.

Expected answer points:

Apache TCM: Reference implementation for J2EE-style TCC. Best for Java/J2EE shops already invested in that ecosystem with declarative transaction boundaries.
Narayana (JBossTS): Open-source transaction manager supporting last resource commit optimization (LRCO), 2PC, and TCC. Works well with Spring via Narayana's Spring integration. Good for JBoss/Spring ecosystem.
ByteTCC: TCC implementation for Spring applications using annotations. Lightweight and Spring-native. Good for lightweight Spring Boot microservices.

11. How do you generate a deterministic idempotency key in TCC?

Expected answer points:

Use a hash function (e.g., SHA-256) over a string combining transaction_id, participant_id, and phase.
Example: raw = f"{transaction_id}:{participant_id}:{phase}", then hash and take first N characters.
The same operation always produces the same key, allowing duplicate detection.
Store the idempotency key with the result to detect and skip duplicate Confirm/Cancel calls.
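
A minimal sketch of these points using `hashlib` from the standard library. The 32-character truncation, the in-memory `_results` store, and the helper names are assumptions for illustration. The sketch also assumes the IDs never contain the `:` separator.

```python
import hashlib


def idempotency_key(transaction_id: str, participant_id: str, phase: str,
                    length: int = 32) -> str:
    """Derive a deterministic key from the transaction, participant, and phase."""
    raw = f"{transaction_id}:{participant_id}:{phase}"
    # A shorter prefix saves space but raises the chance of collisions.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:length]


_results = {}  # key -> stored result; a real participant would persist this


def run_once(key, operation):
    """Run `operation` only the first time a key is seen; replay the result after."""
    if key not in _results:
        _results[key] = operation()
    return _results[key]


key = idempotency_key("txn-42", "inventory", "confirm")
first = run_once(key, lambda: "CONFIRMED")
duplicate = run_once(key, lambda: "SHOULD_NOT_RUN")  # skipped: key already stored
```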

12. What metrics should you monitor for TCC transaction health?

Expected answer points:

TCC transaction completion rate (success vs try-fail vs confirm-fail vs cancel).
Try/Confirm/Cancel phase duration and success rate per phase.
Confirm phase retry count and cancel phase frequency.
Average number of participants per transaction.
Timeout rate per phase and dangling TENTATIVE reservation count (reservations stuck in TENTATIVE state).

13. How does TCC provide eventual atomicity rather than true atomicity like 2PC?

Expected answer points:

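2PC provides atomic commit: every participant commits or every participant rolls back, and resources stay locked until the outcome is decided, so other transactions do not observe intermediate state.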
TCC provides eventual atomicity: if Try succeeds for all, Confirm will eventually make all permanent, but there is a window where participants may have inconsistent states.
TCC lacks the isolation property that 2PC provides. Other transactions can see pending reservations.
The coordinator may take time to retry Confirm or Cancel, during which the state is not fully resolved.

14. What is the role of a coordinator in TCC and what happens if it crashes?

Expected answer points:

The coordinator orchestrates the three-phase flow: sends Try to all participants, then Confirm or Cancel based on results.
It manages timeouts, retries, and recovery logic for dangling transactions.
If the coordinator crashes after Try succeeds but before Confirm/Cancel is sent, participants may have pending tentative reservations.
A recovery process must query participants about their pending states and either confirm or cancel based on what it finds.
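
The recovery step can be sketched as below. It assumes the coordinator durably logs its decision before sending any Confirm, so a transaction with no logged decision can safely be cancelled. The log layout and the `query_state`, `confirm`, and `cancel` callables are hypothetical stand-ins for real participant calls.

```python
def recover(pending_txns, query_state, confirm, cancel):
    """Resolve transactions the coordinator left unfinished.

    pending_txns maps txn_id -> {"decision": "CONFIRM" | "CANCEL" | None,
                                 "participants": [participant names]}.
    Returns the list of (txn_id, participant, action) calls that were made.
    """
    actions = []
    for txn_id, entry in sorted(pending_txns.items()):
        # Assumption: no logged decision means no Confirm was ever sent.
        decision = entry["decision"] or "CANCEL"
        for name in entry["participants"]:
            if query_state(txn_id, name) != "TENTATIVE":
                continue  # already resolved before the crash
            if decision == "CONFIRM":
                confirm(txn_id, name)
            else:
                cancel(txn_id, name)
            actions.append((txn_id, name, decision))
    return actions
```

Running recovery twice is harmless in this sketch, because participants that have already left TENTATIVE are skipped.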

15. Why does TCC increase latency compared to single-service operations?

Expected answer points:

Every TCC transaction needs at least two round trips to each participant: Try, then Confirm (or Cancel).
2PC also needs two phases, but TCC adds the overhead of managing tentative state and retry logic in application code.
Coordination overhead grows with the number of participants, all of whom must respond before the transaction completes.

16. What are the state transitions for a single TCC participant?

Expected answer points:

Idle (initial state) -> Tentative (when Try succeeds).
Tentative -> Confirmed (when Confirm is received and processed successfully).
Tentative -> Cancelled (when Cancel is received and processed successfully).
Tentative -> Tentative (the reservation stays pending while the participant waits for Confirm or Cancel; if a TTL is set, expiry moves it to Cancelled).
Confirmed and Cancelled are terminal states.

17. What logging information is essential for debugging TCC transactions?

Expected answer points:

Log Try phase start with transaction ID, participant ID, and reservation data.
Log Try phase outcome (confirmed, failed, timeout) and Confirm/Cancel phase starts and outcomes.
Log retry attempts with attempt number and delay, including the idempotency key for correlation.
Log participant state transitions (TENTATIVE -> CONFIRMED, TENTATIVE -> CANCELLED).

18. How does the TCC pattern relate to the Saga pattern?

Expected answer points:

Both TCC and Saga avoid blocking, unlike 2PC.
Saga uses compensations that undo what each step did. If Step 3 fails, run compensation for Step 2, then Step 1.
TCC uses explicit Confirm and Cancel operations that are symmetrical and domain-independent.
The complexity in Saga is writing reverse logic for each step; in TCC, complexity shifts to implementing the reservation system.

19. What is the "best-effort" vs "exactly-once" semantics distinction in TCC?

Expected answer points:

Best-effort TCC: The coordinator retries Confirm/Cancel with backoff until a limit, then gives up and flags for manual intervention.
Exactly-once TCC: Uses guaranteed delivery with persistent logging of intended actions. Retries indefinitely until participant acknowledges.
Best-effort is simpler but may leave dangling reservations; exactly-once is more complex but ensures eventual resolution.

20. What security considerations are specific to TCC implementations?

Expected answer points:

Authenticate coordinator-to-participant RPC calls (mutual TLS or JWT tokens).
Authorize participants so the coordinator only calls Confirm/Cancel on registered participants.
Validate reservation data in Try phase; do not trust coordinator-supplied quantities or IDs without validation.
Audit log all state transitions on tentative reservations, encrypt coordinator-to-participant communication, use correlation IDs instead of internal transaction IDs in errors, rate-limit Try requests, and set TTL on tentative reservations.

Conclusion

TCC gives you a structured way to coordinate distributed transactions without blocking. The three-phase model makes the contract explicit: reserve, commit, release. When your domain fits the reservation pattern, you get clean separation of concerns and better scalability than 2PC.

The trade-offs are real: you need idempotent operations, timeout handling, and recovery logic for dangling reservations. For high-contention scenarios with natural reservation semantics, TCC is worth the implementation effort. For simpler flows or operations that do not fit reservation semantics, basic saga or choreography may be the better choice.

See also Event-Driven Architecture for patterns that complement TCC in microservices ecosystems.
