TCC: Try-Confirm-Cancel Pattern for Distributed Transactions

Learn the Try-Confirm-Cancel pattern for distributed transactions. Explore how TCC differs from 2PC and saga, with implementation examples and real-world use cases.

Two-phase commit blocks. Basic saga compensation gets messy when steps have complex dependencies. TCC (Try-Confirm-Cancel) offers a middle ground: more structure than plain saga, less blocking than 2PC.

The pattern splits every operation into three phases. Try reserves. Confirm commits. Cancel releases. Each service implements these three operations, and a coordinator orchestrates the flow.

Let me walk through how TCC actually works, where it fits, and the trade-offs that matter.

How TCC Works

On the happy path, every participant acknowledges Try, and the coordinator then sends Confirm to each one:

sequenceDiagram
    Coordinator->>ServiceA: Try(Reserve 5 units)
    ServiceA->>Coordinator: TryConfirmed
    Coordinator->>ServiceB: Try(Reserve $100)
    ServiceB->>Coordinator: TryConfirmed
    Coordinator->>ServiceA: Confirm
    ServiceA->>Coordinator: Confirmed
    Coordinator->>ServiceB: Confirm
    ServiceB->>Coordinator: Confirmed

When Try fails for any participant:

sequenceDiagram
    Coordinator->>ServiceA: Try(Reserve 5 units)
    ServiceA->>Coordinator: TryConfirmed
    Coordinator->>ServiceB: Try(Reserve $100)
    ServiceB->>Coordinator: TryFailed
    Coordinator->>ServiceA: Cancel
    ServiceA->>Coordinator: Cancelled

Idempotency matters at every phase. Services must handle duplicate Try calls gracefully. Confirm and Cancel also need to be idempotent, since the coordinator may retry if calls fail or get lost.

TCC vs Two-Phase Commit

TCC and 2PC both pair a tentative first step with a final decision, but they work differently. In 2PC, participants lock resources during Prepare and hold those locks until Commit or Rollback. This blocking is the price of atomicity. In TCC, Try records a reservation at the application level and commits it locally, so no database lock is held while the transaction is in flight. Other operations can proceed against the same data, aware of pending reservations but not blocked by them.

The difference shows up under contention. Two competing transactions trying to reserve the same inventory: 2PC locks the rows and makes one wait or fail. TCC shows the second transaction a pending reservation and lets it decide what to do, whether that means queuing or picking an alternative.
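The contention case can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory `Inventory` class (the class and its method names are not from any framework): a second Try sees how much stock is already held and gets an immediate answer instead of waiting on a lock.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    """Hypothetical in-memory stock record; names are illustrative only."""
    on_hand: int
    holds: dict = field(default_factory=dict)  # txn_id -> quantity held by Try

    @property
    def available(self) -> int:
        # Stock not yet claimed by any pending reservation.
        return self.on_hand - sum(self.holds.values())

    def try_reserve(self, txn_id: str, quantity: int) -> bool:
        if txn_id in self.holds:
            return True  # duplicate Try: the hold already exists
        # No waiting: the caller learns immediately whether the hold fits.
        if quantity > self.available:
            return False
        self.holds[txn_id] = quantity
        return True

    def confirm(self, txn_id: str) -> None:
        # The held units leave stock for good.
        self.on_hand -= self.holds.pop(txn_id, 0)

    def cancel(self, txn_id: str) -> None:
        # The held units become available again.
        self.holds.pop(txn_id, None)


stock = Inventory(on_hand=5)
first = stock.try_reserve("txn-1", 4)   # holds 4 of 5
second = stock.try_reserve("txn-2", 3)  # only 1 unit is free, rejected at once
stock.cancel("txn-1")                   # txn-1 aborts and releases its hold
third = stock.try_reserve("txn-2", 3)   # now fits
```

What the second transaction does after a rejection (queue, retry later, or pick an alternative) is a business decision that this sketch leaves to the caller.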

2PC also assumes participants share a transaction manager. TCC works across heterogeneous systems because each service implements its own Try/Confirm/Cancel logic. Payment service, inventory service, shipping service, all different stacks, all coordinatable.

For a deeper look at 2PC and its limitations, see Two-Phase Commit.

TCC vs Basic Saga

Saga and TCC both avoid blocking, but failure handling differs. In a basic saga, each step has a compensation that undoes what the step did. If Step 3 fails, run compensation for Step 2, then Step 1. The compensation logic must know how to reverse each step’s effects.

TCC takes a different angle. Confirm and Cancel are explicit and symmetrical. Confirm commits the tentative reservation. Cancel releases it. The complexity shifts from writing reverse logic to implementing a reservation system that tracks pending operations.

TCC fits naturally when you can model operations as reservations. Hotel booking, inventory allocation, credit holds, seat reservations. These have a clear notion of “tentatively take this” and “make it official or release it.”

Saga fits better when operations are transformations rather than reservations. Moving money from account A to account B, transforming an order into an invoice. These lack natural reservation semantics and saga works fine there.

For more on saga, see Saga Pattern.

TCC vs 2PC vs Saga Comparison

| Aspect | 2PC | Saga | TCC |
| --- | --- | --- | --- |
| Blocking | Yes - participants block during commit | No - no blocking | No - no blocking |
| Locking | Locks resources during prepare | No locks | Reservations, not locks |
| Atomicity | True atomic commit | Eventual atomicity | Eventual atomicity |
| Isolation | Full serializable isolation | No isolation | No isolation |
| Coordination | Centralized coordinator | Centralized or choreographed | Centralized coordinator |
| Heterogeneous Systems | Requires shared TM | Yes | Yes |
| Compensation Model | Automatic rollback | Explicit compensations | Explicit Confirm/Cancel |
| Failure Handling | Blocking on coordinator crash | Compensations in reverse | Confirm/Cancel with retries |
| Latency | Two round trips minimum | Per-step latency | Two round trips minimum |
| Use Case Fit | Strong consistency required | Transformation operations | Reservation operations |
| Recovery Complexity | High (blocking states) | Medium (compensation chain) | Medium (tentative state cleanup) |
| Implementation Complexity | Medium (DB-supported) | Low-Medium | Medium-High (reservation design) |

Implementing TCC in Practice

TCC requires a coordinator and participant implementations. Many frameworks handle the coordinator role. You focus on implementing Try, Confirm, and Cancel methods on your services.

A Flight Booking Example

Consider a flight booking system that coordinates an airline reservation, a hotel booking, and a car rental. All three must succeed or all three must be cancelled.

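from dataclasses import dataclass

@dataclass
class Reservation:
    flight_id: str
    passenger_id: str
    seats: int
    status: str = "TENTATIVE"

class FlightBooking:
    def __init__(self, reservations, seats_left):
        # reservations: any store exposing save() and find(); this sketch
        # assumes find() returns None when nothing matches
        # seats_left: dict mapping flight_id to unreserved seats
        self.reservations = reservations
        self.seats_left = seats_left

    def try_reserve(self, flight_id, passenger_id, seats):
        # Check availability and tentatively hold seats
        if self.seats_left.get(flight_id, 0) < seats:
            return "TryFailed"
        self.seats_left[flight_id] -= seats
        reservation = Reservation(
            flight_id=flight_id,
            passenger_id=passenger_id,
            seats=seats,
            status="TENTATIVE"
        )
        self.reservations.save(reservation)
        return "TryConfirmed"

    def confirm(self, flight_id, passenger_id):
        # Make the tentative reservation permanent
        reservation = self.reservations.find(flight_id, passenger_id)
        reservation.status = "CONFIRMED"
        self.reservations.save(reservation)
        return "Confirmed"

    def cancel(self, flight_id, passenger_id):
        # Release the tentative hold; a no-op if Try never created one
        reservation = self.reservations.find(flight_id, passenger_id)
        if reservation is not None and reservation.status == "TENTATIVE":
            reservation.status = "CANCELLED"
            self.seats_left[flight_id] += reservation.seats
            self.reservations.save(reservation)
        return "Cancelled"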

The coordinator orchestrates the three-phase flow:

class BookingCoordinator:
    def __init__(self, flight, hotel, car):
        self.flight = flight
        self.hotel = hotel
        self.car = car

    def book_trip(self, flight_id, hotel_id, car_id, passenger):
        # Try phase
        results = []
        results.append(self.flight.try_reserve(flight_id, passenger, 1))
        results.append(self.hotel.try_reserve(hotel_id, passenger, 1))
        results.append(self.car.try_reserve(car_id, passenger, 1))

        if all(r == "TryConfirmed" for r in results):
            # Confirm phase
            self.flight.confirm(flight_id, passenger)
            self.hotel.confirm(hotel_id, passenger)
            self.car.confirm(car_id, passenger)
        else:
            # Cancel phase
            self.flight.cancel(flight_id, passenger)
            self.hotel.cancel(hotel_id, passenger)
            self.car.cancel(car_id, passenger)

This example omits error handling, timeouts, and duplicate detection. A production implementation needs retry logic, idempotency keys, and timeout handlers for when participants fail to respond.
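As a rough sketch of two of those missing pieces, the version below shares one transaction ID across every call and treats an exception during Try as a reason to abort. The `Participant` stub and the `run_tcc` function are hypothetical names introduced for illustration; real participants would sit behind remote calls, and retries are left out here.

```python
import uuid


class Participant:
    """Hypothetical participant stub; real ones would make remote calls."""

    def __init__(self, name, fail_try=False):
        self.name = name
        self.fail_try = fail_try
        self.state = {}  # txn_id -> "TENTATIVE" | "CONFIRMED" | "CANCELLED"

    def try_reserve(self, txn_id):
        if self.fail_try:
            raise TimeoutError(f"{self.name} did not respond")
        # A duplicate Try keeps the existing state; a Try that arrives
        # after Cancel is refused because the state is already CANCELLED.
        return self.state.setdefault(txn_id, "TENTATIVE") == "TENTATIVE"

    def confirm(self, txn_id):
        if self.state.get(txn_id) == "TENTATIVE":
            self.state[txn_id] = "CONFIRMED"

    def cancel(self, txn_id):
        # Safe even if Try never arrived; never undoes a Confirm.
        if self.state.get(txn_id) != "CONFIRMED":
            self.state[txn_id] = "CANCELLED"


def run_tcc(participants):
    txn_id = str(uuid.uuid4())  # one ID shared by every call in the transaction
    try:
        ok = all(p.try_reserve(txn_id) for p in participants)
    except Exception:
        ok = False  # a timeout or crash during Try means abort
    for p in participants:
        if ok:
            p.confirm(txn_id)
        else:
            p.cancel(txn_id)
    return txn_id, ok


flight = Participant("flight")
hotel = Participant("hotel")
car = Participant("car", fail_try=True)  # simulate an unresponsive service
txn_id, ok = run_tcc([flight, hotel, car])
```

Cancel is sent to every participant, including the one whose Try never completed, which is why `cancel` has to tolerate an unknown transaction ID.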

Handling Failures and Timeouts

TCC assumes participants will eventually respond to Try, Confirm, or Cancel calls. When a participant becomes unresponsive, the coordinator must decide what to do. This is where TCC implementations diverge.

Some frameworks use guaranteed delivery. They store the intended action in a log and retry until the participant acknowledges. Others use a maximum retry count and then flag the transaction as requiring manual intervention.

The tricky case is when Try succeeded but Confirm failed. The participant reserved the resource but never received the confirmation. From the participant’s perspective, it has a tentative reservation waiting to be confirmed or cancelled. The coordinator’s retry of Confirm should eventually clear this state. But if the coordinator crashed entirely, you need a recovery process that queries participants about their pending states.

graph TD
    A[Coordinator calls Confirm] --> B{Participant reachable?}
    B -->|Yes| C[Confirm succeeds]
    B -->|No| D[Store in retry queue]
    D --> E[Retry with backoff]
    E --> F{Participant responds?}
    F -->|Yes| C
    F -->|No| G[Max retries exceeded]
    G --> H[Flag for manual review]
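The retry loop in the diagram above could look roughly like this. It is a sketch under stated assumptions: `confirm_with_retry` is a hypothetical helper, the participant call is any zero-argument callable that raises `ConnectionError` when unreachable, and the `sleep` parameter exists only so the delay can be swapped out in tests.

```python
import time


def confirm_with_retry(confirm, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call confirm() until it succeeds or the attempts run out.

    Returns "CONFIRMED" on success, or "MANUAL_REVIEW" once retries
    are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            confirm()
            return "CONFIRMED"
        except ConnectionError:
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, then 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
    return "MANUAL_REVIEW"


calls = {"n": 0}


def flaky_confirm():
    # Fails twice, then succeeds, to exercise the retry path.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("participant unreachable")


delays = []
outcome = confirm_with_retry(flaky_confirm, sleep=delays.append)
```

Because the same Confirm may reach the participant more than once, this loop is only safe when the participant's Confirm is idempotent.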

Complete TCC Flow Diagram

flowchart TD
    Start[TCC Transaction] --> TryPhase[Coordinator sends<br/>Try to all participants]
    TryPhase --> TryResults{All Try succeed?}
    TryResults -->|No| CancelPhase[Coordinator sends<br/>Cancel to all participants]
    CancelPhase --> CancelDone[Resources released<br/>Transaction aborted]
    TryResults -->|Yes| ConfirmPhase[Coordinator sends<br/>Confirm to all participants]
    ConfirmPhase --> ConfirmResults{All Confirm succeed?}
    ConfirmResults -->|No| ConfirmRetry[Retry with backoff]
    ConfirmRetry --> ConfirmResults
    ConfirmResults -->|Yes| Success[Transaction committed<br/>All reservations finalized]
    TryPhase --> Timeout{Participant times out?}
    Timeout -->|Yes| CancelPhase
    Timeout -->|No| TryResults

Three Main Scenarios:

| Scenario | Trigger | Coordinator Action | Outcome |
| --- | --- | --- | --- |
| Try -> Confirm success | All participants respond TryConfirmed | Send Confirm to all | All reservations become permanent |
| Try -> Cancel | Any participant responds TryFailed | Send Cancel to all | All tentative reservations released |
| Try timeout -> Cancel | Participant times out on Try | Send Cancel to all | All tentative reservations released |

State Transitions for a Single Participant:

stateDiagram-v2
    [*] --> Idle: Transaction starts
    Idle --> Tentative: Try succeeds
    Tentative --> Confirmed: Confirm received
    Tentative --> Cancelled: Cancel received
    Tentative --> Tentative: Try timeout, waiting for Cancel
    Confirmed --> [*]
    Cancelled --> [*]
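One way to enforce these transitions in code is a lookup table. The sketch below is illustrative: the `State` enum, the `TRANSITIONS` table, and `next_state` are hypothetical names. It also treats duplicate calls and a late Cancel after Confirm as no-ops, which goes slightly beyond what the diagram shows.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    TENTATIVE = "TENTATIVE"
    CONFIRMED = "CONFIRMED"
    CANCELLED = "CANCELLED"


# (current state, event) -> next state; any pair not listed is rejected.
TRANSITIONS = {
    (State.IDLE, "try"): State.TENTATIVE,
    (State.TENTATIVE, "try"): State.TENTATIVE,      # duplicate Try
    (State.TENTATIVE, "confirm"): State.CONFIRMED,
    (State.TENTATIVE, "cancel"): State.CANCELLED,
    (State.CONFIRMED, "confirm"): State.CONFIRMED,  # duplicate Confirm
    (State.CONFIRMED, "cancel"): State.CONFIRMED,   # late Cancel is a no-op
    (State.CANCELLED, "cancel"): State.CANCELLED,   # duplicate Cancel
}


def next_state(current: State, event: str) -> State:
    """Return the state after `event`, or raise if the move is illegal."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} from {current.name}") from None


state = State.IDLE
for event in ("try", "confirm", "confirm"):
    state = next_state(state, event)
```

Keeping the rules in one table makes it easy to see which combinations are deliberately rejected, such as Confirm on a cancelled reservation.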

TCC Frameworks

Building TCC from scratch means managing coordinator state, retry logic, timeout handling, and recovery — all nontrivial. Several frameworks handle the heavy lifting.

Apache TCM (Transaction Coordinator Manager)

Apache TCM is the reference implementation for J2EE-style TCC. It integrates with application servers and provides declarative transaction boundaries. Best for Java/J2EE shops already invested in that ecosystem.

Narayana (JBossTS)

Narayana is an open-source transaction manager supporting LRC (Last Resource Commit) optimization, 2PC, and TCC. It provides both programmatic and declarative (annotation-based) approaches, and it works well with Spring via Narayana’s Spring integration.

@Compensable(compensationMethod = "cancelReservation")
public void tryReserveSeats(ReservationRequest request) {
    // Tentatively reserve seats
    reservationService.createTentativeReservation(request);
}

public void cancelReservation(ReservationRequest request) {
    // Release the tentative hold
    reservationService.cancelReservation(request.getReservationId());
}

public void confirmReservation(ReservationRequest request) {
    // Finalize the reservation
    reservationService.confirmReservation(request.getReservationId());
}

ByteTCC

ByteTCC is a TCC implementation for Spring applications. It uses annotations to define Try/Confirm/Cancel methods and handles coordinator logic transparently. Lightweight and Spring-native, good for microservices running in Spring Boot.

@Compensable(confirmMethod = "confirm", cancelMethod = "cancel")
public boolean tryReserveInventory(InventoryRequest request) {
    // Try logic: check availability, create tentative hold
    return inventoryService.tentativeHold(request.getItemId(), request.getQuantity());
}

public void confirm(InventoryRequest request) {
    inventoryService.confirmHold(request.getItemId(), request.getQuantity());
}

public void cancel(InventoryRequest request) {
    inventoryService.releaseHold(request.getItemId(), request.getQuantity());
}

Spring TCC (Spring-Cloud-tencent)

Spring TCC is part of the Tencent Spring Cloud stack. Integrates with Service Comb and provides distributed TCC transaction support for Spring Cloud microservices.

Framework Comparison

| Framework | Language | Coordinator | Spring Integration | Recovery Support | Best For |
| --- | --- | --- | --- | --- | --- |
| Apache TCM | Java | Embedded | Yes (J2EE) | Yes | Enterprise Java apps |
| Narayana | Java/C | Both | Yes | Yes | JBoss/Spring ecosystem |
| ByteTCC | Java | External | Yes (Spring Boot) | Yes | Lightweight Spring microservices |
| Spring TCC | Java | External | Yes | Yes | Tencent/Spring Cloud stack |

For most new projects, ByteTCC or Narayana is the practical choice. ByteTCC is simpler and more Spring Boot-friendly. Narayana has more enterprise features and a longer track record.

Confirm/Cancel Idempotency Implementation

Idempotency is not optional in TCC — it is load-bearing. The coordinator retries Confirm and Cancel calls until it gets a response. Your participant must handle duplicates gracefully.

The Idempotency Problem

Consider this scenario:

  1. Coordinator calls confirm(reservation_id="abc")
  2. Participant confirms successfully but the network drops before the response arrives
  3. Coordinator retries confirm(reservation_id="abc")
  4. If your confirm handler is not idempotent, it applies the confirmation a second time, which may repeat side effects such as deducting stock or sending a notification twice

Idempotency Key Design

Use a dedicated idempotency key for each Try/Confirm/Cancel call. The key should be deterministic — the same operation always gets the same key.

import hashlib
from datetime import datetime

def make_idempotency_key(transaction_id, participant_id, phase):
    """Generate a deterministic idempotency key.

    Same transaction + participant + phase always produces same key.
    """
    raw = f"{transaction_id}:{participant_id}:{phase}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

class TccParticipant:
    def confirm(self, transaction_id, participant_id, reservation_data):
        key = make_idempotency_key(transaction_id, participant_id, "confirm")

        # Idempotency check
        existing = self.confirm_log.find_by_idempotency_key(key)
        if existing:
            # Already confirmed — return success without re-confirming
            return ConfirmResult(
                success=True,
                already_confirmed=True,
                confirmed_at=existing.confirmed_at
            )

        # Actual confirmation logic
        reservation = self.reservations.find(reservation_data.id)
        reservation.status = "CONFIRMED"
        self.reservations.save(reservation)

        # Record this confirmation for future idempotency
        self.confirm_log.save(IdempotencyRecord(
            key=key,
            transaction_id=transaction_id,
            confirmed_at=datetime.utcnow()
        ))

        return ConfirmResult(success=True, already_confirmed=False)

    def cancel(self, transaction_id, participant_id, reservation_data):
        key = make_idempotency_key(transaction_id, participant_id, "cancel")

        existing = self.cancel_log.find_by_idempotency_key(key)
        if existing:
            return CancelResult(
                success=True,
                already_cancelled=True,
                cancelled_at=existing.cancelled_at
            )

        reservation = self.reservations.find(reservation_data.id)
        reservation.status = "CANCELLED"
        self.reservations.save(reservation)

        self.cancel_log.save(IdempotencyRecord(
            key=key,
            transaction_id=transaction_id,
            cancelled_at=datetime.utcnow()
        ))

        return CancelResult(success=True, already_cancelled=False)

Confirm Before Cancel Problem

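A subtler problem: what if a Cancel arrives after Confirm has already succeeded? This can happen when the first Confirm appears to time out, a retry succeeds, and a stale or retried Cancel for the same transaction shows up afterwards. A naive Cancel would incorrectly release a confirmed reservation.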

Track state transitions explicitly. Confirm transitions from TENTATIVE to CONFIRMED. Cancel transitions from TENTATIVE to CANCELLED. Once in CONFIRMED, Cancel should be a no-op, not a failure.

def cancel(self, transaction_id, participant_id, reservation_data):
    reservation = self.reservations.find(reservation_data.id)

    if reservation.status == "CONFIRMED":
        # Already confirmed — Cancel is correctly a no-op
        return CancelResult(success=True, reason="already_confirmed")

    if reservation.status == "CANCELLED":
        # Already cancelled — still a no-op
        return CancelResult(success=True, reason="already_cancelled")

    # Actual cancellation from TENTATIVE state
    reservation.status = "CANCELLED"
    self.reservations.save(reservation)
    return CancelResult(success=True)

Timeout vs Permanent Failure

TCC distinguishes between transient failures (retry might succeed) and permanent failures (never going to succeed). In your Try handler:

  • Transient failure: Raise a retryable error so the coordinator retries
  • Permanent failure: Return TryFailed with a reason that means “do not retry, cancel everything”

def try_reserve(self, inventory_id, quantity):
    try:
        # Try logic
        reserved = self.inventory.tentative_hold(inventory_id, quantity)
        return TryResult(success=True, reservation_id=reserved.id)
    except InsufficientInventory:
        # Permanent failure: retrying will not help
        return TryResult(success=False, reason="INSUFFICIENT_INVENTORY")
    except TemporaryDatabaseError:
        # Transient failure: worth retrying
        raise TryRetryableError("Database temporarily unavailable")
    except CapacityExceeded:
        # Permanent failure: no amount of retry will fix this
        return TryResult(success=False, reason="CAPACITY_EXCEEDED")
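On the coordinator side, the distinction might be handled along these lines. This is a sketch, not a framework API: `call_try` is a hypothetical helper, `TryRetryableError` and `TryResult` are redefined here only to keep the block self-contained, and the `RETRIES_EXHAUSTED` reason is an assumed convention.

```python
from dataclasses import dataclass


class TryRetryableError(Exception):
    """Raised by a participant when repeating the same Try might succeed."""


@dataclass
class TryResult:
    success: bool
    reason: str = ""


def call_try(try_fn, max_attempts=3):
    """Retry transient errors; pass permanent failures straight through."""
    for _ in range(max_attempts):
        try:
            return try_fn()
        except TryRetryableError:
            continue  # transient: try again
    # Still failing after every attempt: give up so the coordinator can cancel.
    return TryResult(success=False, reason="RETRIES_EXHAUSTED")


attempts = []


def flaky_try():
    # Fails once with a transient error, then succeeds.
    attempts.append(1)
    if len(attempts) < 2:
        raise TryRetryableError("Database temporarily unavailable")
    return TryResult(success=True)


def sold_out():
    # Permanent failure: returned as a result, never retried.
    return TryResult(success=False, reason="INSUFFICIENT_INVENTORY")


recovered = call_try(flaky_try)
rejected = call_try(sold_out)
```

A production version would likely add a delay between attempts; it is omitted here to keep the control flow visible.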

Advantages of TCC

The main advantage is that resources are not locked for the duration of the transaction. Other operations can read or modify the same data, aware of pending reservations but not blocked by them. This makes TCC more scalable than 2PC, particularly under high contention.

The three-phase structure is explicit. Every participant agrees to the contract: if you can reserve in Try, you guarantee you can confirm or cancel later.

TCC also works across heterogeneous systems. No shared transaction manager required. Each service implements its own semantics for Try, Confirm, and Cancel.

Limitations and Challenges

TCC is not a silver bullet.

The biggest challenge is designing Try/Confirm/Cancel for your specific domain. Not all operations map naturally to reservation semantics. Forcing a square peg into a round hole produces brittle implementations.

Idempotency trips people up. Confirm and Cancel must handle duplicate calls gracefully. If the coordinator retries a Confirm that actually succeeded, the participant needs to recognize this and return Confirmed, not try to confirm again.

The timeout case requires care. Try succeeds but the coordinator crashes before sending Confirm or Cancel. Resources sit in a tentative state. Without a resolution mechanism, you get resource leaks that pile up silently.

Latency also increases. Every transaction needs at least two round trips to each participant.
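For the coordinator-crash case described above, one possible resolution mechanism is a durable decision log that a recovery job replays on restart. The sketch below assumes such a log exists and uses hypothetical names throughout (`recover`, `RecordingParticipant`, and the `TRYING`/`CONFIRMING`/`CANCELLING` markers); it also assumes participants expose idempotent `confirm` and `cancel` calls keyed by transaction ID.

```python
def recover(txn_log, participants):
    """Finish transactions that a crashed coordinator left half-done.

    txn_log maps txn_id to the last durably recorded decision.
    """
    for txn_id, decision in list(txn_log.items()):
        if decision == "CONFIRMING":
            # A commit decision was recorded: roll forward.
            for p in participants:
                p.confirm(txn_id)
            txn_log[txn_id] = "CONFIRMED"
        elif decision in ("TRYING", "CANCELLING"):
            # No commit decision was recorded: release everything.
            for p in participants:
                p.cancel(txn_id)
            txn_log[txn_id] = "CANCELLED"
        # Finished transactions are left untouched.


class RecordingParticipant:
    """Test double that records which calls it received."""

    def __init__(self):
        self.calls = []

    def confirm(self, txn_id):
        self.calls.append(("confirm", txn_id))

    def cancel(self, txn_id):
        self.calls.append(("cancel", txn_id))


log = {"t1": "CONFIRMING", "t2": "TRYING", "t3": "CONFIRMED"}
participant = RecordingParticipant()
recover(log, [participant])
```

The key design choice is that the coordinator writes its decision before sending the first Confirm, so recovery never has to guess which way an in-flight transaction was heading.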

When to Use TCC

TCC fits well when your business logic naturally supports reservation semantics. Inventory allocation, booking systems, credit reservations, seat holds. If you can model the operation as “tentatively take X and later either commit or release,” TCC gives you a clean structure.

TCC gets awkward when operations are transformations rather than reservations. If Step 2 depends on the output of Step 1 in a way that does not fit reservation semantics, you end up stuffing intermediate state into Try and carrying it forward to Confirm. This works but loses the elegance.

For an overview of distributed transaction patterns including TCC, see Distributed Transactions. For reliable message delivery in distributed systems, see the Outbox Pattern.

Observability Checklist

TCC transactions span multiple services and involve multiple round trips. Without observability, you cannot tell whether a failed transaction left dangling tentative reservations.

Metrics

  • TCC transaction completion rate (success vs try-fail vs confirm-fail vs cancel)
  • Try phase duration and success rate
  • Confirm phase duration and retry count
  • Cancel phase duration and how often it runs
  • Average number of participants per transaction
  • Timeout rate per phase (try timeout, confirm timeout)
  • Dangling tentative reservation count (reservations stuck in TENTATIVE state)

Logs

  • Log Try phase start with transaction ID, participant ID, and reservation data
  • Log Try phase outcome (confirmed, failed, timeout)
  • Log Confirm/Cancel phase starts and outcomes
  • Log retry attempts with attempt number and delay
  • Include idempotency key in all phase logs for correlation
  • Log participant state transitions (TENTATIVE → CONFIRMED, TENTATIVE → CANCELLED)

Alerts

  • Alert when dangling TENTATIVE reservations accumulate (cleanup is failing)
  • Alert when confirm retry count exceeds threshold
  • Alert when cancel phase runs frequently (indicates try phase instability)
  • Alert when participant times out repeatedly on try phase
  • Alert when transaction takes longer than expected threshold

Tracing

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

class TccTransaction:
    def execute(self):
        with tracer.start_as_current_span("tcc.transaction") as span:
            span.set_attribute("tcc.transaction_id", self.txn_id)
            span.set_attribute("tcc.participant_count", len(self.participants))

            # Try phase
            try_results = []
            with tracer.start_as_current_span("tcc.try_phase") as try_span:
                for participant in self.participants:
                    with tracer.start_as_current_span(f"tcc.try.{participant.name}") as p_span:
                        p_span.set_attribute("participant.name", participant.name)
                        result = participant.try_(self.request)
                        try_results.append(result)
                        p_span.set_attribute("tcc.try_success", result.success)

            if all(r.success for r in try_results):
                # Confirm phase
                with tracer.start_as_current_span("tcc.confirm_phase") as confirm_span:
                    for participant in self.participants:
                        with tracer.start_as_current_span(f"tcc.confirm.{participant.name}") as p_span:
                            result = participant.confirm()
                            p_span.set_attribute("tcc.confirm_result", str(result))
            else:
                # Cancel phase
                with tracer.start_as_current_span("tcc.cancel_phase") as cancel_span:
                    for participant in self.participants:
                        with tracer.start_as_current_span(f"tcc.cancel.{participant.name}") as p_span:
                            result = participant.cancel()
                            p_span.set_attribute("tcc.cancel_result", str(result))

Security Checklist

TCC coordination involves multiple services making state changes. Security misconfigurations can lead to unauthorized reservations or data leakage.

  • Authenticate the coordinator-to-participant RPC calls (mutual TLS or JWT tokens)
  • Authorize participants — coordinator should only call Confirm/Cancel on registered participants
  • Validate reservation data in Try phase — do not trust coordinator-supplied quantities or IDs without validation
  • Audit log all state transitions on tentative reservations (created, confirmed, cancelled)
  • Encrypt coordinator-to-participant communication in transit
  • Do not expose internal transaction IDs in error responses (use correlation IDs instead)
  • Rate-limit Try requests per participant to prevent reservation exhaustion attacks
  • Set TTL on tentative reservations so abandoned transactions auto-expire

Reservation Exhaustion Attack

A subtle TCC security concern: an attacker triggers many Try operations that succeed but never Confirm or Cancel. If tentative reservations hold inventory, the attacker can exhaust available inventory without paying.

Mitigations:

  1. TTL on tentative reservations: Auto-cancel after timeout
  2. Per-entity locking: Lock the reservation entity itself, not just the inventory
  3. Rate limiting Try: Limit how many Try requests a single client can make
  4. Verification on Confirm: Check the original request is still valid before confirming
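The first mitigation, TTL on tentative reservations, could be sketched as a periodic sweep. The `Hold` record and `expire_holds` function are hypothetical names, and the in-memory list stands in for whatever store the participant actually uses.

```python
import time
from dataclasses import dataclass


@dataclass
class Hold:
    txn_id: str
    quantity: int
    expires_at: float  # seconds, on the same clock passed to expire_holds
    status: str = "TENTATIVE"


def expire_holds(holds, now=None):
    """Cancel every tentative hold whose TTL has passed; return their IDs."""
    now = time.time() if now is None else now
    expired = []
    for hold in holds:
        if hold.status == "TENTATIVE" and hold.expires_at <= now:
            hold.status = "CANCELLED"
            expired.append(hold.txn_id)
    return expired


holds = [
    Hold("txn-1", 2, expires_at=100.0),                      # past its TTL
    Hold("txn-2", 1, expires_at=200.0),                      # still valid
    Hold("txn-3", 1, expires_at=50.0, status="CONFIRMED"),   # never expired
]
released = expire_holds(holds, now=150.0)
```

A TTL interacts with Confirm: if a hold can expire while the coordinator is still retrying, Confirm must be able to report that the reservation is gone, so the TTL is usually chosen to be comfortably longer than the coordinator's retry window.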

Conclusion

TCC gives you a structured way to coordinate distributed transactions without blocking. The three-phase model makes the contract explicit: reserve, commit, release. When your domain fits the reservation pattern, you get clean separation of concerns and better scalability than 2PC.

The trade-offs are real: idempotent operations, timeout handling, and recovery logic for dangling reservations. For high-contention scenarios with natural reservation semantics, TCC is worth the implementation effort. For simpler flows or operations that do not fit reservation semantics, a basic saga, orchestrated or choreographed, may be the better choice.

See also Event-Driven Architecture for patterns that complement TCC in microservices ecosystems.
