Exactly-Once Delivery: The Elusive Guarantee

Explore exactly-once semantics in distributed messaging - why it's hard, how Kafka and SQS approach it, and practical patterns for deduplication.

Author: GeekWorkBench. Reading time: 21 min.

Introduction

Distributed systems fail in ways that are hard to predict. A producer sends a message. The broker receives it and acknowledges. The acknowledgment gets lost in transit. The producer sends again. Now the broker has two copies. Or the consumer processes the message, crashes before committing its offset, and on restart receives the same message again.

sequenceDiagram
    Producer->Broker: Send(msg)
    Broker->Producer: ACK
    Note over Producer,Broker: ACK lost in transit
    Producer->Broker: Send(msg)
    Broker->Producer: ACK
    Note over Broker,Consumer: Two copies in broker

Any retry between producer, broker, or consumer can create duplicates. At-least-once is straightforward. Exactly-once requires coordinating all three.
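The lost-ACK case can be reproduced in a few lines. This is a minimal simulation, not a real client: `Broker`, `send_with_retry`, and the `ack_lost_on_attempts` parameter are hypothetical names chosen for illustration.

```python
class Broker:
    """In-memory stand-in for a broker's log (hypothetical, for illustration)."""

    def __init__(self):
        self.log = []

    def send(self, msg):
        self.log.append(msg)
        return "ACK"


def send_with_retry(broker, msg, ack_lost_on_attempts=(), max_attempts=3):
    """Resend until an ACK actually reaches the producer."""
    for attempt in range(1, max_attempts + 1):
        ack = broker.send(msg)
        if attempt in ack_lost_on_attempts:
            ack = None  # the broker stored the message, but the ACK never arrived
        if ack == "ACK":
            return attempt
    raise RuntimeError("no ACK received")


broker = Broker()
attempts = send_with_retry(broker, "order-12345", ack_lost_on_attempts={1})
print(attempts, broker.log)
```

The producer behaves correctly from its own point of view, yet the broker's log ends up with two copies of the same message.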

Delivery Semantics

Message systems typically offer one of three guarantees.

At-least-once means messages may be redelivered but never lost. The consumer acknowledges after processing. If processing succeeds but the ack fails, the message comes back. This is the most common setting, but it requires idempotent consumers.

At-most-once means messages may be lost but never duplicated. The consumer acknowledges before processing. If the consumer crashes after ack but before processing, the message is gone. This is rare and usually a poor trade-off.

Exactly-once means each message is processed precisely once. This requires either a transaction spanning the message and its side effects, or idempotent processing with deduplication.
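The difference between the first two guarantees comes down to where the acknowledgment sits relative to the work. The sketch below assumes a toy queue (a Python list whose head is redelivered until it is popped) and a single simulated crash; `consume` and `Crash` are illustrative names, not part of any broker API.

```python
class Crash(Exception):
    """Simulated consumer crash."""


def consume(queue, handled, ack_first, crash_once):
    """Drain `queue`; an un-acked message stays at the head for redelivery."""
    crashed = False
    while queue:
        msg = queue[0]
        try:
            if ack_first:
                queue.pop(0)                 # at-most-once: ack, then process
                if crash_once and not crashed:
                    crashed = True
                    raise Crash()            # crash between ack and processing
                handled.append(msg)
            else:
                handled.append(msg)          # at-least-once: process, then ack
                if crash_once and not crashed:
                    crashed = True
                    raise Crash()            # crash between processing and ack
                queue.pop(0)
        except Crash:
            continue                         # consumer restarts and polls again
    return handled


at_most_once = consume(["a", "b"], [], ack_first=True, crash_once=True)
at_least_once = consume(["a", "b"], [], ack_first=False, crash_once=True)
```

With the ack first, message "a" is never handled. With the ack last, "a" is handled twice, which is why at-least-once needs idempotent consumers.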

stateDiagram-v2
    [*] --> AtMostOnce: Ack before processing
    AtMostOnce --> AtLeastOnce: Ack after processing
    AtLeastOnce --> ExactlyOnce: Add deduplication/idempotency

    note right of AtMostOnce
        Risk: Message loss
        Use case: Log aggregation
    end note

    note right of AtLeastOnce
        Risk: Duplicates
        Use case: Most applications
    end note

    note right of ExactlyOnce
        Risk: Higher latency
        Use case: Financial transactions
    end note

The delivery guarantee spectrum:

| Guarantee | Duplicates | Loss | Complexity | Typical Use |
| --- | --- | --- | --- | --- |
| At-most-once | No | Yes | Low | Log aggregation |
| At-least-once | Yes | No | Medium | Most applications |
| Exactly-once | No | No | High | Financial transactions |

Kafka’s Exactly-Once Semantics

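Kafka offers exactly-once semantics within the Kafka ecosystem: when a job reads from one topic, transforms the records, and writes to another topic, Kafka can guarantee that each input record affects the output exactly once.

Kafka achieves this through idempotent producers and transactions, both introduced in version 0.11. The idempotent producer attaches sequence numbers so the broker can discard retried sends. A transaction lets the producer write messages to several partitions atomically, and consumer offsets are committed as part of the same transaction, tying message consumption to offset commit. Downstream consumers must read with isolation.level=read_committed so they never see records from aborted transactions.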
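A consume-transform-produce step might look like the sketch below. The method names follow the confluent-kafka Python client (`begin_transaction`, `send_offsets_to_transaction`, `commit_transaction`, `abort_transaction`), but the function itself, its name, and its parameters are assumptions for illustration. It takes the consumer and producer as arguments so it can be read without a running cluster.

```python
def process_one_transactionally(consumer, producer, out_topic, transform, timeout=1.0):
    """Consume one record, produce its transform, and commit the input
    offset in the same Kafka transaction.

    `consumer` and `producer` are expected to behave like confluent_kafka's
    Consumer and Producer, with the producer configured with a
    transactional.id and init_transactions() already called.
    """
    msg = consumer.poll(timeout)
    if msg is None or msg.error():
        return False

    producer.begin_transaction()
    try:
        producer.produce(out_topic, key=msg.key(), value=transform(msg.value()))
        # Attach the consumer's current positions to the transaction, so the
        # output record and the offset commit succeed or fail together.
        producer.send_offsets_to_transaction(
            consumer.position(consumer.assignment()),
            consumer.consumer_group_metadata(),
        )
        producer.commit_transaction()
    except Exception:
        producer.abort_transaction()
        raise
    return True
```

For this to hold end to end, the consumer should have automatic offset commits disabled, since the offsets travel with the transaction instead.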

flowchart LR
    Producer-->|produce| Kafka[Kafka Cluster]
    Kafka-->|consume| Consumer[Consumer]
    Consumer-->|commit offset| Kafka
    Kafka-->|write tx| Partition[Partition]
    Consumer-->|write| DB[(Database)]
    %% The offset commit and the database write are atomic only if both happen in the same transaction

For exactly-once processing outside Kafka, say writing to a database, you need to handle it yourself. Kafka's transaction cannot include the database write, so the consumer must either store the offset and the database update in a single database transaction, or make the write idempotent so that a replay has no additional effect.
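Storing the offset next to the data can be sketched with SQLite standing in for the real database. The table names (`balances`, `offsets`), the `apply_record` function, and the record fields are all hypothetical; the point is only that the business update and the offset move in one transaction.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
    conn.execute("CREATE TABLE offsets (part INTEGER PRIMARY KEY, next_offset INTEGER NOT NULL)")
    return conn


def apply_record(conn, part, offset, account, delta):
    """Apply one record and advance the stored offset in one DB transaction."""
    with conn:  # commit on success, roll back if anything raises
        row = conn.execute(
            "SELECT next_offset FROM offsets WHERE part = ?", (part,)
        ).fetchone()
        if row is not None and offset < row[0]:
            return False  # redelivered record: its effect is already committed
        cur = conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?", (delta, account)
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO balances (account, amount) VALUES (?, ?)", (account, delta)
            )
        conn.execute(
            "INSERT OR REPLACE INTO offsets (part, next_offset) VALUES (?, ?)",
            (part, offset + 1),
        )
    return True


conn = open_db()
apply_record(conn, part=0, offset=41, account="alice", delta=100)
apply_record(conn, part=0, offset=41, account="alice", delta=100)  # redelivery, skipped
balance = conn.execute("SELECT amount FROM balances WHERE account = 'alice'").fetchone()[0]
```

On restart, a consumer using this approach would read `next_offset` from the database and seek to it, rather than trusting the offset committed in Kafka.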

For more on Kafka’s partitioning and consumer groups, see Apache Kafka.

AWS SQS FIFO

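SQS offers two queue types. Standard queues provide at-least-once delivery with best-effort ordering. FIFO queues provide exactly-once processing and preserve order within a message group.

FIFO queues discard duplicates by deduplication ID within a 5-minute window. You either supply a MessageDeduplicationId yourself or enable content-based deduplication and let SQS derive one from the body. Send a message with the same deduplication ID twice within 5 minutes and SQS accepts the second send but does not enqueue another copy.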

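import json
import boto3

sqs = boto3.client('sqs')
# queue_url must point at a FIFO queue (its name ends in .fifo)

sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({'order_id': '12345', 'action': 'ship'}),
    MessageDeduplicationId='order-12345-ship',
    MessageGroupId='orders'
)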

When you enable content-based deduplication, SQS computes a SHA-256 hash of the message body (not the message attributes) and uses it as the deduplication ID. Identical bodies produce identical deduplication IDs. An explicit MessageDeduplicationId, as in the example above, takes precedence.

SQS FIFO can guarantee exactly-once within the queue itself, but your consumer still needs to be idempotent. If your consumer does not delete the message before the visibility timeout expires, SQS will deliver it again.
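The deduplication window is easy to reason about with a toy model. The class below is not the SQS API; `FifoDedupWindow` and its `now` parameter (seconds on an arbitrary clock) are assumptions made so the behavior can be run locally.

```python
import hashlib


class FifoDedupWindow:
    """Toy model of a FIFO queue's deduplication interval (not the SQS API)."""

    WINDOW_SECONDS = 300  # the 5-minute interval described above

    def __init__(self):
        self.seen = {}      # dedup id -> time it was first accepted
        self.messages = []

    def send(self, body, now, dedup_id=None):
        # With content-based deduplication the ID is a SHA-256 hash of the body.
        dedup_id = dedup_id or hashlib.sha256(body.encode()).hexdigest()
        first_seen = self.seen.get(dedup_id)
        if first_seen is not None and now - first_seen < self.WINDOW_SECONDS:
            return False  # the send is accepted, but nothing new is enqueued
        self.seen[dedup_id] = now
        self.messages.append(body)
        return True


q = FifoDedupWindow()
q.send('{"order_id": "12345"}', now=0)
q.send('{"order_id": "12345"}', now=120)   # inside the window: dropped
q.send('{"order_id": "12345"}', now=400)   # window expired: enqueued again
```

The third send shows the limit of the guarantee: a retry that arrives after the window has closed is treated as a new message, so consumer-side idempotency is still needed.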

For more on SQS features, see AWS SQS and SNS.

RabbitMQ and Idempotent Consumers

RabbitMQ does not support exactly-once semantics out of the box. It gives you at-least-once with manual acknowledgments. Your consumer must make itself idempotent.

Idempotency means processing the same message multiple times has the same effect as processing it once. Here is how to actually do it.

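Deduplication table: Store processed message IDs in a database. Rather than checking for the ID and then inserting it, which leaves a race between two consumers, insert the ID first and let the primary key reject duplicates.

INSERT INTO processed_messages (msg_id, processed_at)
VALUES ('msg-123', NOW())
ON CONFLICT (msg_id) DO NOTHING;

-- If the insert affects 1 row, the ID is new: process the message
-- If it affects 0 rows, the ID already exists: skip processing
-- Run the insert and the processing in one transaction so a failure rolls back both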
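In application code, the row count of the insert tells you which branch you are in. The sketch below uses SQLite so it runs anywhere; `INSERT OR IGNORE` is SQLite's spelling of the same idea, and the `account` table and `handle_once` function are hypothetical.

```python
import sqlite3


def handle_once(conn, msg_id, amount):
    """Record the message ID and apply its effect in one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_messages (msg_id) VALUES (?)", (msg_id,)
        )
        if cur.rowcount == 0:
            return False  # ID already present: duplicate, skip the work
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = 1", (amount,))
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (msg_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

first = handle_once(conn, "msg-123", 30)
second = handle_once(conn, "msg-123", 30)   # redelivery
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

Because the ID and the balance change commit together, a crash between the two statements rolls back both, and the redelivered message is processed as if it were new.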

Natural idempotency: Some operations are naturally idempotent. Setting a field to the same value twice leaves the database in the same state. UPDATE users SET status = 'active' WHERE id = 1 can run twice and nothing changes. Spotting these operations and using them cuts down on overhead.

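Message-level transactions: Tie the message acknowledgment to your database transaction by acknowledging only after the transaction commits. If the database update fails, the message is not acknowledged and comes back. If the commit succeeds but the ack is lost, the message also comes back, so pair this with one of the techniques above.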

For more on RabbitMQ’s acknowledgment model, see RabbitMQ.

Practical Patterns

Idempotency Keys

Every message carries a unique ID or you derive one from content. The consumer stores processed IDs. When a message arrives, check if its ID was already processed and skip if so. This approach works with any message broker.

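import hashlib

def process_message(msg):
    # Prefer an explicit key; otherwise derive a stable one from the body (assumed to be bytes).
    # Python's built-in hash() is salted per process, so it cannot be used for this.
    msg_id = msg.headers.get('x-idempotency-key') or hashlib.sha256(msg.body).hexdigest()

    if redis.exists(f"processed:{msg_id}"):
        return

    do_work(msg)
    # redis is a connected redis.Redis client; its signature is setex(name, time, value)
    redis.setex(f"processed:{msg_id}", 86400, '1')

Give the keys a TTL so the store does not grow without bound, and make that TTL longer than your maximum retry window so a late retry is still recognized as a duplicate.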

Outbox Pattern

The outbox pattern addresses the dual-write problem. Instead of writing to the database and publishing a message in two separate steps, you write both to the database in a single transaction.

flowchart TD
    Service-->|1 write| DB[(Outbox Table)]
    DB-->|2 read| Relay[Outbox Relay]
    Relay-->|3 publish| Broker[Message Broker]
    %% Atomic write ensures consistency

The outbox table lives in the same database as your business data. You write the business record and the outbox event in the same transaction. A separate process polls the outbox and publishes to the broker.

The key guarantee: if the business record exists, the event will eventually be published. The event is never published without the corresponding record.
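A polling outbox can be sketched with SQLite and a plain callable standing in for the broker. The schema, `create_order`, `relay_once`, and the `publish` callback are hypothetical names used only to show the shape of the pattern.

```python
import json
import sqlite3


def create_order(conn, order_id, total):
    """Write the business row and its event in one transaction."""
    with conn:
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "order_created", "order_id": order_id}),),
        )


def relay_once(conn, publish):
    """Publish pending events, marking each one only after publish() returns."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, payload in rows:
        publish(payload)  # may raise; the row then stays pending and is retried
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total INTEGER)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
)

sent = []
create_order(conn, "12345", 4200)
relay_once(conn, sent.append)
```

If the relay crashes after `publish` returns but before the row is marked, the event is published again on the next pass. The relay is therefore at-least-once, and consumers should deduplicate on an event ID.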

For more on these patterns, see Event-Driven Architecture and Asynchronous Communication.

Transactional Outbox with CDC

In high-throughput systems, polling the outbox adds latency. Change Data Capture tools like Debezium tail the database transaction log and publish when records change. This is faster but adds operational complexity.

The pattern stays the same: write to the outbox atomically with business data, let CDC handle propagation.

When Exactly-Once Matters

You do not always need exactly-once. For many use cases, at-least-once with idempotent consumers is sufficient and simpler. A log aggregator can handle duplicates. A notification system can usually tolerate a duplicate email if it is not critical.

Exactly-once matters for financial transactions (duplicate charges are serious), inventory operations (double decrements understate the stock on hand), unique constraints (double user creation may violate uniqueness), and external API calls (retrying a non-idempotent API creates duplicate side effects).

Measure the cost of duplicates in your system before implementing exactly-once. The complexity is real and the operational burden is higher.

Saga Pattern vs Outbox Pattern

The saga pattern and outbox pattern solve related but distinct problems. Both address exactly-once in distributed systems, but they handle different failure scenarios.

The saga pattern coordinates multiple services in a distributed transaction. When step 3 fails, saga executes compensating transactions to undo steps 1 and 2. This handles partial failure without distributed locks.

The outbox pattern handles the dual-write problem within a single service. Instead of writing to the database and publishing a message in two separate steps, you write both in a single database transaction. If the business record exists, the event will eventually be published.

| Aspect | Saga Pattern | Outbox Pattern |
| --- | --- | --- |
| Problem solved | Multi-service distributed transactions | Single-service dual-write consistency |
| Failure handling | Compensating transactions | Eventual publication |
| Idempotency needed | Yes (compensating transactions must be idempotent) | Yes (outbox relay may retry) |
| Complexity | High (requires careful compensation design) | Medium (requires polling outbox) |
| Consistency model | Eventual | Stronger eventual (within a single service) |
| Use when | Multiple services must update in coordinated way | Service needs to publish events atomically with database changes |

Use the saga pattern when you have multiple independent services each making state changes that must be coordinated. Use the outbox pattern when a single service needs to publish events reliably without a distributed transaction coordinator.
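The compensation logic of a saga can be shown with a small orchestrator. This is a minimal in-process sketch: `run_saga` and the step names are hypothetical, and a real saga would persist its progress and call remote services.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure, undo in reverse."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log = []


def fail_shipping():
    raise RuntimeError("carrier unavailable")


ok = run_saga([
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (fail_shipping, lambda: log.append("cancel shipment")),
])
```

Only the steps that completed are compensated, and in reverse order. The failed step's own compensation never runs because the step never took effect.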

For more on saga, see Saga Pattern. For outbox, see Outbox Pattern.

Idempotent Consumer with Redis Deduplication

Redis works well as a fast deduplication store for idempotent consumers. Store processed message IDs with a TTL longer than your retry window:

import redis
import json
import hashlib

class RedisIdempotentConsumer:
    def __init__(self, redis_client: redis.Redis, ttl_seconds: int = 86400):
        self.redis = redis_client
        self.ttl = ttl_seconds

    def _get_dedup_key(self, message) -> str:
        """Derive a unique key from the message."""
        msg_id = message.get('message_id')
        if msg_id:
            return f"dedup:{msg_id}"

        # Fallback: hash the message content
        content = json.dumps(message, sort_keys=True)
        content_hash = hashlib.sha256(content.encode()).hexdigest()[:16]
        return f"dedup:hash:{content_hash}"

    def is_duplicate(self, message) -> bool:
        """Check if this message was already processed."""
        key = self._get_dedup_key(message)
        return self.redis.exists(key) == 1

    def mark_processed(self, message) -> None:
        """Mark message as processed with TTL."""
        key = self._get_dedup_key(message)
        self.redis.setex(key, self.ttl, '1')

    def process(self, message: dict) -> bool:
        """
        Process message idempotently.
        Returns True if processed, False if duplicate.
        """
        if self.is_duplicate(message):
            return False

        # Process the actual work
        do_work(message)

        # Mark as processed
        self.mark_processed(message)
        return True

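# Usage in a consumer loop
# (do_work and receive_messages are application-specific and defined elsewhere)
redis_client = redis.Redis()  # defaults to localhost:6379
consumer = RedisIdempotentConsumer(redis_client, ttl_seconds=86400)

for message in receive_messages():
    msg_id = message.get('message_id', '<no id>')
    if consumer.process(message):
        print(f"Processed message {msg_id}")
    else:
        print(f"Skipped duplicate {msg_id}")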

Set the TTL longer than your maximum retry window. If a message might be retried for up to 1 hour due to broker retry policies, set TTL to 2 hours or more. If you process millions of messages per day, watch Redis memory usage.
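Checking for a key and setting it later leaves a gap in which two consumers can both decide a message is new. One way to close it, sketched here, is to claim the key atomically with Redis `SET ... NX EX` before doing the work. `claim` and `process_once` are hypothetical helpers; `redis_client` is assumed to be a redis-py client, whose `set` method accepts `nx` and `ex` arguments.

```python
def claim(redis_client, msg_id, ttl_seconds=86400):
    """Atomically claim a message ID; True means this caller should process it."""
    # SET key value NX EX ttl: succeeds only if the key does not exist yet.
    return bool(redis_client.set(f"dedup:{msg_id}", "1", nx=True, ex=ttl_seconds))


def process_once(redis_client, msg_id, work):
    """Run `work` only if this call wins the claim for `msg_id`."""
    if not claim(redis_client, msg_id):
        return False
    try:
        work()
    except Exception:
        redis_client.delete(f"dedup:{msg_id}")  # release the claim so a retry can run
        raise
    return True
```

The trade-off is the opposite failure mode: if the consumer dies after claiming but before finishing, the message looks like a duplicate until the key expires. Which risk is preferable depends on whether a skipped message or a repeated one costs more.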

Kafka Exactly-Once with External Systems

Kafka transactions guarantee exactly-once within the Kafka ecosystem. When writing to external systems like databases, you need additional handling.

Kafka Transactions with JDBC

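The Kafka Connect JDBC sink connector does not wrap the offset commit and the database write in one transaction; it delivers records at least once. Database writes become effectively exactly-once when they are idempotent. One common way to do that with this connector is to key each record on a unique field and upsert on that key, so a redelivered record overwrites its row instead of adding a new one:

# Kafka Connect JDBC sink configuration: copy the id field into the record key
transforms=insertIdempotenceKey
transforms.insertIdempotenceKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.insertIdempotenceKey.fields=id

# Upsert on the record key so a redelivered record overwrites its row
insert.mode=upsert
pk.mode=record_key

The goal is that records delivered to external systems take effect exactly once.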

For the Kafka Streams word count application, each update can be made idempotent by writing the current total for a word, keyed by that word. If a count is written again because of reprocessing, the database ends up in the same state. When your stream processing produces deterministic outputs and the sink overwrites by key, at-least-once delivery is often enough, and full exactly-once guarantees add complexity you may not need.
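A keyed overwrite is what makes a repeated write harmless. The sketch below assumes SQLite as a stand-in for the sink database and a hypothetical `word_counts` table; `INSERT OR REPLACE` is SQLite's form of an upsert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_counts (word TEXT PRIMARY KEY, total INTEGER NOT NULL)")


def write_count(conn, word, total):
    """Overwrite the row for `word` with the latest absolute total."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO word_counts (word, total) VALUES (?, ?)",
            (word, total),
        )


# The same update delivered twice leaves the same row behind.
write_count(conn, "kafka", 3)
write_count(conn, "kafka", 3)
rows = conn.execute("SELECT word, total FROM word_counts").fetchall()
```

The same would not hold if the sink applied increments (`total = total + 1`), because each redelivery would then add again.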

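Google Pub/Sub Exactly-Once

Pub/Sub offers exactly-once delivery as an option on pull subscriptions. With it enabled, a message is not redelivered once it has been acknowledged successfully, or while its ack deadline is still running. Unacknowledged messages are retried for up to 7 days by default. The guarantee is keyed on the message ID that Pub/Sub assigns at publish time, so a publisher retry can still produce two messages with different IDs. To let subscribers catch those, attach your own idempotency key as a message attribute:

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()

# Extra keyword arguments to publish() become message attributes.
# idempotency_key is an application-level attribute, not a Pub/Sub setting.
future = publisher.publish(
    'projects/my-project/topics/my-topic',
    data=b'event data',
    idempotency_key='my-unique-idempotency-key'
)

The idempotency_key attribute does not make Pub/Sub deduplicate anything on the publish side. If you publish with the same key multiple times, Pub/Sub delivers each copy, and the subscriber uses the key to process only the first.

On the subscriber side, the callback acknowledges each message explicitly after processing. If your subscriber crashes before acknowledging, the message is redelivered once the ack deadline expires: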

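from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = 'projects/my-project/subscriptions/my-sub'

def callback(message):
    try:
        process(message.data)  # process() is your application's handler
        message.ack()
    except Exception:
        # Nack and let Pub/Sub redeliver
        message.nack()

# subscribe() returns a future; block on it to keep receiving messages
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull_future.result()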

Pub/Sub exactly-once works when:

  1. Publisher attaches an idempotency key as a message attribute, since Pub/Sub does not deduplicate publisher retries or identical content
  2. Subscriber acknowledges within the ack deadline
  3. Subscriber is idempotent (safe to process the same message twice)

Pub/Sub does not provide exactly-once if the subscriber crashes after processing but before acknowledging. In that case, the message is redelivered, so the processing must be idempotent.
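A subscriber that tolerates redelivery might wrap its handler as sketched below. The `message` object is assumed to expose `attributes`, `message_id`, `data`, `ack()`, and `nack()`, as Pub/Sub messages do in the Python client. `make_callback`, `handle`, and the in-memory `seen_keys` set are hypothetical; a real deployment would keep the seen keys in a shared store such as a database or Redis.

```python
def make_callback(handle, seen_keys):
    """Build a subscriber callback that skips idempotency keys it has seen."""

    def callback(message):
        # Falls back to the Pub/Sub-assigned ID when no attribute was set.
        key = message.attributes.get("idempotency_key", message.message_id)
        if key in seen_keys:
            message.ack()      # duplicate: acknowledge without reprocessing
            return
        try:
            handle(message.data)
        except Exception:
            message.nack()     # ask Pub/Sub to redeliver
            return
        seen_keys.add(key)
        message.ack()

    return callback
```

The key is recorded only after `handle` succeeds, so a failed attempt is retried rather than mistaken for a duplicate.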

Comparing Broker Support

| Broker | Exactly-Once Support | Notes |
| --- | --- | --- |
| Kafka | Yes (transactions) | Within Kafka ecosystem |
| SQS FIFO | Yes (deduplication) | Within SQS, 5-min window |
| RabbitMQ | No (build it yourself) | At-least-once with idempotency |
| Google Pub/Sub | Yes | Delivery guarantee with retry window |

Trade-off Comparison Tables

Idempotency Implementation Strategies

| Strategy | Complexity | Latency Overhead | Storage Required | Best For |
| --- | --- | --- | --- | --- |
| Deduplication table | Medium | 1-2 DB round trips | Grows with message volume | High-value transactions |
| Redis TTL keys | Low | 1-2 Redis ops | Keys with TTL | High-throughput, short retry windows |
| Natural idempotency | Low | None | None | Operations that set deterministic values |
| Message-level transactions | High | Transaction overhead | None | When message ack must align with DB commit |

Message Broker Exactly-Once Comparison

| Broker | Exactly-Once Scope | Deduplication Window | Consumer Idempotency Required? |
| --- | --- | --- | --- |
| Kafka | Within Kafka ecosystem only | N/A (transactional) | Yes for external systems |
| SQS FIFO | Within SQS, queue-level | 5 minutes | Yes, after visibility timeout |
| RabbitMQ | Not supported natively | N/A | Yes, mandatory |
| Google Pub/Sub | Delivery guarantee with retry | 7 days | Yes, after crash before ack |

Outbox Pattern Deployment Options

| Approach | Latency | Complexity | Consistency Guarantee | Operational Overhead |
| --- | --- | --- | --- | --- |
| Polling outbox | Higher (poll interval) | Low | Eventual (within poll window) | Low |
| CDC (Debezium) | Low (log-based) | High | Near real-time | High (additional infrastructure) |
| Polling + CDC | Flexible | Medium | Tunable | Medium |

Saga vs Outbox Decision Matrix

| Condition | Use Saga Pattern | Use Outbox Pattern |
| --- | --- | --- |
| Multi-service distributed transaction | Yes | No |
| Single-service dual-write problem | No | Yes |
| Cross-service consistency required | Yes | No |
| Event publication alongside business data | No | Yes |
| Compensating transactions available | Yes | No |
| Low operational complexity desired | No | Yes |

Production Failure Scenarios

Scenario 1: The Payment Processor Crash

A payment service processes an order and publishes a “payment successful” event. The consumer processes the payment, updates the order status, but crashes before deleting the message from SQS. With SQS FIFO’s visibility timeout, the message becomes visible again after the timeout. If the consumer is not idempotent, the payment is processed twice, resulting in a duplicate charge.

Mitigation: Implement idempotent consumers using deduplication tables or Redis-based dedup keys with TTL longer than the visibility timeout.

Scenario 2: The Kafka Transaction Timeout

A consumer reads from Kafka inside a transaction and also writes to a database. The Kafka transaction commits successfully, but the database write, which Kafka's transaction cannot cover, times out. The Kafka offset is committed, but the downstream database update never happens, leaving the system in an inconsistent state where the offset indicates the message was processed but the side effect did not occur.

Mitigation: Commit the database write before the offset, store the offset in the database alongside the write, or use the transactional outbox pattern to decouple the writes.

Scenario 3: The RabbitMQ Network Partition

During a network partition, RabbitMQ's publisher confirms may time out. The producer assumes the message was not received and retries. After the partition heals, the original message is delivered along with the retry, creating duplicates. The consumer processes both but fails to detect the duplicate because the deduplication logic has a bug in its hash generation.

Mitigation: Use explicit message IDs for deduplication, validate hash generation logic, and ensure deduplication table entries survive restarts.

Scenario 4: The Pub/Sub Ack Deadline Miss

A Pub/Sub subscriber processes a message that requires calling an external API. The API call takes longer than the ack deadline. Pub/Sub redelivers the message to another subscriber instance. Both instances call the external API with the same payload, creating duplicate side effects on the external system.

Mitigation: Extend the ack deadline for long-running operations, implement idempotency keys at the external API level, and use distributed locking if necessary.

Scenario 5: The Outbox Relay Crash

A service writes to the outbox table and business tables in a single transaction. The outbox relay reads an event, marks it as processed, and crashes before publishing it to the broker. The event is lost because the outbox no longer lists it as pending. Downstream consumers never receive the event, and the business state becomes inconsistent with the expected side effects.

Mitigation: Mark an event as processed only after the broker confirms the publish. This gives the relay at-least-once delivery semantics, so include the event ID in each message and let consumers use it to discard duplicates.

Quick Recap

  • Understand that exactly-once refers to processing semantics, not transport guarantees
  • Identify whether your system needs exactly-once or if at-least-once with idempotent consumers is sufficient
  • Choose the right message broker based on your exactly-once support requirements (Kafka, SQS FIFO, Pub/Sub)
  • Implement idempotency keys for message deduplication across retries
  • Use the outbox pattern to solve the dual-write problem atomically
  • Consider CDC tools like Debezium for high-throughput outbox relay
  • Design consumers to be idempotent regardless of broker guarantees
  • Apply natural idempotency where operations allow (e.g., SET same value)
  • Use saga pattern with compensating transactions for multi-service coordination
  • Monitor Redis memory usage if using it for deduplication at high message volumes
  • Set TTL longer than maximum retry window to prevent premature key expiration
  • Design stream processing outputs to be deterministic when possible

Interview Questions

1. What makes exactly-once delivery hard in distributed systems?
  • Partial failures cause message duplication when acks get lost
  • Consumers can crash after processing but before committing offsets
  • Any retry across producer-broker-consumer chain creates duplicates
2. How do at-least-once, at-most-once, and exactly-once differ?
  • At-least-once: redelivers but never loses; requires idempotent consumers
  • At-most-once: loses messages but never duplicates; consumer acks before processing
  • Exactly-once: processed precisely once; needs transactions or idempotent deduplication
3. How does Kafka implement exactly-once within its ecosystem?
  • Transactions (0.11+) write messages and offset commits atomically
  • Ties message consumption to offset commit in a single transaction
  • Only applies within Kafka; external systems need extra handling
4. What is content-based deduplication in SQS FIFO?
  • SQS hashes the message body to generate deduplication IDs
  • Identical bodies within 5 minutes get dropped automatically
  • You can also supply an explicit MessageDeduplicationId
5. Why does RabbitMQ need idempotent consumers?
  • RabbitMQ only guarantees at-least-once with manual acks
  • Use deduplication tables with ON CONFLICT DO NOTHING
  • Or leverage natural idempotency (setting a field to the same value)
  • Message-level transactions tie ack to database commit
6. What is the outbox pattern and what problem does it solve?
  • Writes the business record and outbox event in one database transaction
  • A relay process polls the outbox and publishes to the broker
  • If the business record exists, the event will eventually publish
7. When do you choose saga over outbox?
  • Saga: multi-service distributed transactions with compensating actions
  • Outbox: single-service dual-write consistency (one database, one transaction)
  • Saga coordinates across services; outbox handles reliable event publication
8. How does Redis help with message deduplication?
  • Use message_id or content hash as the dedup key
  • SETEX with TTL longer than your retry window prevents premature expiry
  • Fast in-memory lookups scale well for high-throughput systems
9. What are compensating transactions in sagas?
  • When a step fails, undo previous steps by running counter-transactions
  • Each saga step needs a corresponding compensation
  • Compensations must be idempotent since they can be retried
10. How does Google Pub/Sub deliver exactly-once?
  • Exactly-once delivery is enabled per pull subscription and keyed on the Pub/Sub-assigned message ID
  • Publisher retries can still create duplicates, so attach an idempotency key as a message attribute
  • Subscribers must ack within the deadline; crash before ack means redelivery
11. What happens in Pub/Sub when a consumer crashes after processing but before ack?
  • Message gets redelivered after the ack deadline expires
  • Subscriber must be idempotent to survive this safely
  • Exactly-once is violated unless your consumer handles duplicates
12. What is natural idempotency and why does it matter?
  • Operations that produce the same result regardless of repetitions
  • Example: UPDATE users SET status='active' WHERE id=1 changes nothing if run twice
  • Reduces need for external deduplication stores and overhead
13. How does CDC relate to the outbox pattern?
  • CDC tools like Debezium tail the transaction log instead of polling
  • Faster propagation in high-throughput systems
  • Trade-off: more operational complexity than polling outbox
14. What are the costs of implementing exactly-once?
  • Higher latency from transactional coordination
  • Operational complexity: outbox polling, CDC, saga coordination
  • Consumer idempotency is non-negotiable regardless of broker support
  • At-least-once + idempotent consumers is simpler for most use cases
15. How does SQS visibility timeout interact with idempotency?
  • Consumer has a visibility window to process and delete the message
  • If it times out without deleting, SQS redelivers
  • Your consumer needs to handle this redelivery idempotently
16. What is an idempotency key in message publishing?
  • A unique identifier the publisher attaches to enable broker-side deduplication
  • Kafka's idempotent producer uses a producer ID and sequence numbers; SQS FIFO uses MessageDeduplicationId
  • Lets publishers retry safely without creating duplicate messages
17. Why measure duplicate cost before implementing exactly-once?
  • Not every system needs it; at-least-once + idempotent consumers often suffices
  • Log aggregators and non-critical notifications tolerate duplicates fine
  • Financial transactions, inventory, and unique constraints may actually need it
18. How does Kafka JDBC sink achieve exactly-once?
  • Delivers at least once; relies on idempotent writes rather than a shared transaction
  • Keys each record on a unique field so a redelivered record overwrites its row
  • Failed DB write means offset not committed, so the record is retried rather than lost
19. Why does deterministic stream processing make exactly-once unnecessary?
  • Reprocessing produces the same result if outputs are deterministic and writes overwrite by key
  • Kafka Streams word count is idempotent when it stores the current total per word
  • If both hold, at-least-once delivery is often enough
20. Compare exactly-once support across Kafka, SQS FIFO, RabbitMQ, and Pub/Sub.
  • Kafka: yes via transactions (within Kafka only)
  • SQS FIFO: yes via deduplication IDs (5-minute window)
  • RabbitMQ: no native support, build it yourself
  • Pub/Sub: yes via exactly-once delivery on pull subscriptions (7-day retry window)

Conclusion

Exactly-once delivery is not a single mechanism but a combination of transport guarantees, idempotent processing, and sometimes distributed transactions. Most systems should start with at-least-once and idempotent consumers. Add exactly-once semantics only where the cost of duplicates justifies the implementation complexity.

The outbox pattern and idempotency keys are the practical tools here. They work across brokers and do not require special broker features.

For related reading, see Message Queue Types, Distributed Transactions, and Pub-Sub Patterns.
