Exactly-Once Delivery: The Elusive Guarantee
Explore exactly-once semantics in distributed messaging - why it's hard, how Kafka and SQS approach it, and practical patterns for deduplication.
Every message processed exactly once, no duplicates, no loss. That is exactly-once delivery, and it is harder than it sounds. The name is misleading because exactly-once usually refers to processing semantics, not transport. The network can deliver a message multiple times. What matters is that the downstream system processes it only once.
This matters for financial transactions, inventory updates, and any system where duplicates create real problems. A payment processed twice is a disaster. An inventory decrement applied twice is a stock discrepancy.
Why Exactly-Once Is Hard
Distributed systems fail partially. A producer sends a message. The broker receives it and acknowledges. The acknowledgment gets lost in transit. The producer resends. Now the broker has two copies. Or the consumer processes the message, crashes before committing its offset, and on restart receives the same message again.
```mermaid
sequenceDiagram
    Producer->>Broker: Send(msg)
    Broker->>Producer: ACK
    Note over Producer,Broker: ACK lost in transit
    Producer->>Broker: Send(msg)
    Broker->>Producer: ACK
    Note over Broker: Two copies in broker
```
Any retry between producer, broker, or consumer can produce duplicates. At-least-once is easy. Getting exactly-once requires coordination across all three.
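This failure mode is easy to reproduce in a toy model. The sketch below (hypothetical names, no real broker) simulates a producer that retries when the ACK is lost, leaving the broker holding two copies of the same message:

```python
class Broker:
    """Toy broker: stores every message it receives."""
    def __init__(self):
        self.log = []

    def receive(self, msg):
        self.log.append(msg)
        return "ACK"

def send_with_retry(broker, msg, ack_lost_first_time=True):
    """At-least-once producer: resend whenever no ACK arrives."""
    ack = broker.receive(msg)
    if ack_lost_first_time:
        # The broker stored the message, but the ACK never reached us,
        # so from the producer's point of view the send failed.
        ack = broker.receive(msg)  # retry -> duplicate in the broker
    return ack

broker = Broker()
send_with_retry(broker, {"order_id": "12345"})
print(len(broker.log))  # 2 copies of the same message
```

The producer did nothing wrong: retrying on a missing ACK is exactly what at-least-once requires. Deduplication has to happen downstream.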
Delivery Semantics
Message systems typically offer one of three guarantees.
At-least-once means messages may be redelivered but never lost. The consumer acknowledges after processing. If processing succeeds but the ack fails, the message comes back. This is the most common setting, but it requires idempotent consumers.
At-most-once means messages may be lost but never duplicated. The consumer acknowledges before processing. If the consumer crashes after ack but before processing, the message is gone. This is rare and usually a poor trade-off.
Exactly-once means each message is processed precisely once. This requires either a transaction spanning the message and its side effects, or idempotent processing with deduplication.
```mermaid
stateDiagram-v2
    [*] --> AtMostOnce: Ack before processing
    AtMostOnce --> AtLeastOnce: Ack after processing
    AtLeastOnce --> ExactlyOnce: Add deduplication/idempotency
    note right of AtMostOnce
        Risk: Message loss
        Use case: Log aggregation
    end note
    note right of AtLeastOnce
        Risk: Duplicates
        Use case: Most applications
    end note
    note right of ExactlyOnce
        Risk: Higher latency
        Use case: Financial transactions
    end note
```
The delivery guarantee spectrum:
| Guarantee | Duplicates | Loss | Complexity | Typical Use |
|---|---|---|---|---|
| At-most-once | No | Yes | Low | Metrics, non-critical logging |
| At-least-once | Yes | No | Medium | Most applications |
| Exactly-once | No | No | High | Financial transactions |
Kafka’s Exactly-Once Semantics
Kafka offers exactly-once semantics within the Kafka ecosystem: it can guarantee that a message produced to a topic is consumed, and its results committed, exactly once by consumers inside Kafka.
Kafka achieves this through idempotent producers and transactions, both introduced in version 0.11. When you enable transactions, Kafka writes messages to partitions as part of a transaction, and consumer offsets are committed as part of the same transaction, tying message consumption to offset commit.
```mermaid
graph LR
    Producer -->|produce| Kafka[Kafka Cluster]
    Kafka -->|consume| Consumer[Consumer]
    Consumer -->|commit offset| Kafka
    Kafka -->|write tx| Partition[Partition]
    Consumer -->|write| DB[(Database)]
    %% Atomic only if offset commit and DB write share a transaction
```
For exactly-once processing outside Kafka, say writing to a database, you need to handle it yourself. The consumer must write the offset and the database update in a single transaction, or rely on idempotent semantics to deduplicate.
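One way to picture that single transaction is with SQLite standing in for the target database (table and function names here are illustrative, not a Kafka API): the consumer applies the message and advances its offset in one transaction, so a crash can never separate the two:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE offsets (part INTEGER PRIMARY KEY, next_offset INTEGER)")

def consume(db, part, offset, order_id):
    """Apply the message and advance the offset atomically."""
    with db:  # one transaction: both writes commit, or neither does
        db.execute("INSERT OR REPLACE INTO orders VALUES (?, ?)",
                   (order_id, "shipped"))
        db.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?)",
                   (part, offset + 1))

consume(db, 0, 41, "order-12345")
print(db.execute("SELECT next_offset FROM offsets WHERE part = 0").fetchone()[0])  # 42
```

On restart, the consumer reads its stored offset from the database rather than trusting the broker, so a replayed message is either skipped or harmlessly reapplied.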
For more on Kafka’s partitioning and consumer groups, see Apache Kafka.
AWS SQS FIFO
SQS offers two queue types. Standard queues provide at-least-once delivery. FIFO queues provide exactly-once processing and preserve order.
FIFO queues deduplicate messages within a 5-minute window, using either an explicit `MessageDeduplicationId` or, with content-based deduplication enabled, a hash of the message body. Send the same message twice within 5 minutes and SQS drops the second copy.
```python
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({'order_id': '12345', 'action': 'ship'}),
    MessageDeduplicationId='order-12345-ship',
    MessageGroupId='orders'
)
```
When content-based deduplication is enabled, SQS derives the deduplication ID from a SHA-256 hash of the message body. Identical bodies produce identical deduplication IDs.
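That derivation is easy to reproduce locally. This standalone sketch mirrors the idea (it is not the SQS implementation, just the same hash-the-body technique):

```python
import hashlib
import json

def content_dedup_id(body: str) -> str:
    """Derive a deduplication ID from the message body, SQS-style (SHA-256)."""
    return hashlib.sha256(body.encode()).hexdigest()

body = json.dumps({'order_id': '12345', 'action': 'ship'})
first = content_dedup_id(body)
second = content_dedup_id(body)
print(first == second)  # identical bodies -> identical dedup IDs
```

The flip side: any difference in the body, even a reordered JSON key or a timestamp field, produces a different ID, so content-based deduplication only works when retries resend byte-identical payloads.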
SQS FIFO can guarantee exactly-once within the queue itself, but your consumer still needs to be idempotent. If your consumer times out and does not delete the message within the visibility timeout window, SQS will deliver it again.
For more on SQS features, see AWS SQS and SNS.
RabbitMQ and Idempotent Consumers
RabbitMQ does not support exactly-once semantics out of the box. It gives you at-least-once with manual acknowledgments. Your consumer must make itself idempotent.
Idempotency means processing the same message multiple times has the same effect as processing it once. Here is how to actually do it.
Deduplication table: Store processed message IDs in a database. Before processing, check if the ID exists.
```sql
INSERT INTO processed_messages (msg_id, processed_at)
VALUES ('msg-123', NOW())
ON CONFLICT (msg_id) DO NOTHING;
-- If the insert succeeds, process the message
-- If the insert is a no-op (duplicate), skip processing
```
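The same guard translates directly to application code. In this sketch, SQLite's `INSERT OR IGNORE` plays the role of `ON CONFLICT DO NOTHING`:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE processed_messages (msg_id TEXT PRIMARY KEY, processed_at TEXT)"
)

def claim(db, msg_id):
    """Return True if we claimed this message, False if already processed."""
    cur = db.execute(
        "INSERT OR IGNORE INTO processed_messages VALUES (?, datetime('now'))",
        (msg_id,),
    )
    db.commit()
    return cur.rowcount == 1  # 1 row inserted = first time seen

print(claim(db, "msg-123"))  # True: process the message
print(claim(db, "msg-123"))  # False: duplicate, skip it
```

Because the primary key enforces uniqueness, two concurrent consumers racing on the same message ID cannot both claim it: the database arbitrates.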
Natural idempotency: Some operations are naturally idempotent. Setting a field to the same value twice leaves the database in the same state. `UPDATE users SET status = 'active' WHERE id = 1` can run twice and nothing changes. Spotting these operations and using them cuts down on overhead.
Message-level transactions: Combine the message acknowledgment with your database transaction. If the database update fails, the message is not acknowledged and comes back.
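A sketch of that coupling, with a stub standing in for a real RabbitMQ channel (a real consumer would call pika's `basic_ack`; all names here are illustrative):

```python
import sqlite3

class StubChannel:
    """Stands in for a RabbitMQ channel; records acks instead of sending them."""
    def __init__(self):
        self.acked = []

    def basic_ack(self, delivery_tag):
        self.acked.append(delivery_tag)

def handle(db, channel, delivery_tag, user_id):
    try:
        with db:  # commit or roll back the business write
            db.execute("UPDATE users SET status = 'active' WHERE id = ?", (user_id,))
        channel.basic_ack(delivery_tag)  # ack only after the commit succeeded
    except sqlite3.Error:
        pass  # no ack: the broker will redeliver the message

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO users VALUES (1, 'pending')")
channel = StubChannel()
handle(db, channel, delivery_tag=7, user_id=1)
print(channel.acked)  # [7]
```

Note the ordering: commit first, ack second. The reverse order reintroduces at-most-once, because a crash between ack and commit loses the update.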
For more on RabbitMQ’s acknowledgment model, see RabbitMQ.
Practical Patterns
Idempotency Keys
Every message carries a unique ID or you derive one from content. The consumer stores processed IDs. When a message arrives, check if its ID was already processed and skip if so. This approach works with any message broker.
```python
def process_message(msg):
    msg_id = msg.headers.get('x-idempotency-key') or hash(msg.body)
    if redis.exists(f"processed:{msg_id}"):
        return  # already processed, skip
    do_work(msg)
    redis.setex(f"processed:{msg_id}", 86400, '1')  # setex(name, time, value)
```
Set the TTL longer than your maximum retry window to prevent unbounded growth.
Outbox Pattern
The outbox pattern addresses the dual-write problem. Instead of writing to the database and publishing a message in two separate steps, you write both to the database in a single transaction.
```mermaid
graph TD
    Service -->|1. write| DB[(Outbox Table)]
    DB -->|2. read| Relay[Outbox Relay]
    Relay -->|3. publish| Broker[Message Broker]
    %% The single atomic write ensures consistency
```
The outbox table lives in the same database as your business data. You write the business record and the outbox event in the same transaction. A separate process polls the outbox and publishes to the broker.
The key guarantee: if the business record exists, the event will eventually be published. The event is never published without the corresponding record.
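A minimal sketch of the pattern with SQLite (table and function names are illustrative): the business row and the outbox event commit together, and a relay drains the outbox afterwards:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
db.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, event TEXT, published INTEGER DEFAULT 0)"
)

def place_order(db, order_id, total):
    with db:  # single transaction: business record + event, or neither
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total))
        db.execute("INSERT INTO outbox (event) VALUES (?)",
                   (f"order_placed:{order_id}",))

def relay(db, publish):
    """Poll unpublished events and hand each to the broker."""
    rows = db.execute("SELECT id, event FROM outbox WHERE published = 0").fetchall()
    for row_id, event in rows:
        publish(event)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

published = []
place_order(db, "order-12345", 99.0)
relay(db, published.append)
print(published)  # ['order_placed:order-12345']
```

If the relay crashes between publishing and marking the row, the event is published again on the next poll, which is why outbox delivery is at-least-once and downstream consumers still need idempotency.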
For more on these patterns, see Event-Driven Architecture and Asynchronous Communication.
Transactional Outbox with CDC
In high-throughput systems, polling the outbox adds latency. Change Data Capture tools like Debezium tail the database transaction log and publish when records change. This is faster but adds operational complexity.
The pattern stays the same: write to the outbox atomically with business data, let CDC handle propagation.
When Exactly-Once Matters
You do not always need exactly-once. For many use cases, at-least-once with idempotent consumers is sufficient and simpler. A log aggregator can handle duplicates. A notification system can usually tolerate a duplicate email if it is not critical.
Exactly-once matters for financial transactions (duplicate charges are serious), inventory operations (double decrements overstate deductions), unique constraints (double user creation may violate uniqueness), and external API calls (retrying a non-idempotent API creates duplicate side effects).
Measure the cost of duplicates in your system before implementing exactly-once. The complexity is real and the operational burden is higher.
Saga Pattern vs Outbox Pattern
The saga pattern and outbox pattern solve related but distinct problems. Both address exactly-once in distributed systems, but they handle different failure scenarios.
The saga pattern coordinates multiple services in a distributed transaction. When step 3 fails, the saga executes compensating transactions to undo steps 1 and 2. This handles partial failure without distributed locks.
The outbox pattern handles the dual-write problem within a single service. Instead of writing to the database and publishing a message in two separate steps, you write both in a single database transaction. If the business record exists, the event will eventually be published.
| Aspect | Saga Pattern | Outbox Pattern |
|---|---|---|
| Problem solved | Multi-service distributed transactions | Single-service dual-write consistency |
| Failure handling | Compensating transactions | Eventual publication |
| Idempotency needed | Yes (compensating transactions must be idempotent) | Yes (outbox relay may retry) |
| Complexity | High (requires careful compensation design) | Medium (requires polling outbox) |
| Consistency model | Eventual | Stronger eventual (within a single service) |
| Use when | Multiple services must update in coordinated way | Service needs to publish events atomically with database changes |
Use the saga pattern when you have multiple independent services each making state changes that must be coordinated. Use the outbox pattern when a single service needs to publish events reliably without a distributed transaction coordinator.
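A toy saga makes the compensation flow concrete (service calls are stubbed as local functions; all names are illustrative): when the third step fails, the first two are undone in reverse order:

```python
def run_saga(steps):
    """Each step is (action, compensation). Undo completed steps on failure."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()  # compensations must themselves be idempotent
        return False
    return True

def fail():
    raise RuntimeError("shipping failed")

log = []
steps = [
    (lambda: log.append("reserve_inventory"), lambda: log.append("release_inventory")),
    (lambda: log.append("charge_payment"),    lambda: log.append("refund_payment")),
    (fail,                                    lambda: None),
]
ok = run_saga(steps)
print(ok, log)
# False ['reserve_inventory', 'charge_payment', 'refund_payment', 'release_inventory']
```

Running compensations in reverse order matters: refund before releasing inventory mirrors the original sequence, so intermediate states remain ones the system has already seen.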
For more on saga, see Saga Pattern. For outbox, see Outbox Pattern.
Idempotent Consumer with Redis Deduplication
Redis works well as a fast deduplication store for idempotent consumers. Store processed message IDs with a TTL longer than your retry window:
```python
import redis
import json
import hashlib

class RedisIdempotentConsumer:
    def __init__(self, redis_client: redis.Redis, ttl_seconds: int = 86400):
        self.redis = redis_client
        self.ttl = ttl_seconds

    def _get_dedup_key(self, message) -> str:
        """Derive a unique key from the message."""
        msg_id = message.get('message_id')
        if msg_id:
            return f"dedup:{msg_id}"
        # Fallback: hash the message content
        content = json.dumps(message, sort_keys=True)
        content_hash = hashlib.sha256(content.encode()).hexdigest()[:16]
        return f"dedup:hash:{content_hash}"

    def is_duplicate(self, message) -> bool:
        """Check if this message was already processed."""
        key = self._get_dedup_key(message)
        return self.redis.exists(key) == 1

    def mark_processed(self, message) -> None:
        """Mark message as processed with TTL."""
        key = self._get_dedup_key(message)
        self.redis.setex(key, self.ttl, '1')

    def process(self, message: dict) -> bool:
        """
        Process message idempotently.
        Returns True if processed, False if duplicate.
        """
        if self.is_duplicate(message):
            return False
        do_work(message)  # the actual work
        self.mark_processed(message)
        return True

# Usage in a consumer loop
consumer = RedisIdempotentConsumer(redis_client, ttl_seconds=86400)
for message in receive_messages():
    if consumer.process(message):
        print(f"Processed message {message['message_id']}")
    else:
        print(f"Skipped duplicate {message['message_id']}")
```
Set the TTL longer than your maximum retry window. If a message might be retried for up to 1 hour due to broker retry policies, set TTL to 2 hours or more. If you process millions of messages per day, watch Redis memory usage.
Kafka Exactly-Once with External Systems
Kafka transactions guarantee exactly-once within the Kafka ecosystem. When writing to external systems like databases, you need additional handling.
Idempotent Writes with the Kafka JDBC Sink
The Kafka Connect JDBC sink connector cannot join a Kafka transaction that spans the database. Instead, it can achieve effectively-once writes through idempotent upserts: each record is keyed on a primary key, so a redelivered record overwrites the existing row rather than duplicating it.
```properties
# Kafka Connect JDBC sink configuration for idempotent (upsert) writes
insert.mode=upsert
pk.mode=record_key
pk.fields=id
transforms=insertIdempotenceKey
transforms.insertIdempotenceKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.insertIdempotenceKey.fields=id
```
With upserts keyed on a stable ID, reprocessing a record produces the same database state, which is exactly-once from the database's point of view.
Consider a Kafka Streams word-count application: each count update is idempotent, so if a count is recalculated due to reprocessing, the database write produces the same result. Design your stream processing to produce deterministic outputs, and exactly-once guarantees become unnecessary complexity.
Google Pub/Sub Exactly-Once
Pub/Sub offers exactly-once delivery as a subscription-level setting. When enabled, Pub/Sub will not redeliver a message once it has been successfully acknowledged, and unacknowledged messages are retained for up to 7 days. Deduplication happens on the delivery side: publishing the same payload twice still creates two distinct messages, so a publisher-side idempotency key has to travel as an ordinary message attribute that your consumer checks:
```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Attach an application-level idempotency key as a message attribute
future = publisher.publish(
    'projects/my-project/topics/my-topic',
    data=b'event data',
    idempotency_key='my-unique-idempotency-key'
)
```
The idempotency_key here is a custom attribute, not a Pub/Sub feature: the subscriber reads it and skips messages it has already processed.
On the subscriber side, your callback acknowledges explicitly after processing. If the subscriber crashes before acknowledging, the message is redelivered once the ack deadline expires:
```python
subscriber = pubsub_v1.SubscriberClient()
subscription_path = 'projects/my-project/subscriptions/my-sub'

def callback(message):
    try:
        process(message.data)
        message.ack()
    except Exception:
        # Nack and let Pub/Sub redeliver
        message.nack()

subscriber.subscribe(subscription_path, callback=callback)
```
Pub/Sub exactly-once works when:
- Exactly-once delivery is enabled on the subscription
- The subscriber acknowledges within the ack deadline
- The subscriber is idempotent (safe to process the same message twice)
Pub/Sub does not prevent redelivery if the subscriber crashes after processing but before acknowledging. In that case the message comes back, and processing it a second time must be safe.
Comparing Broker Support
| Broker | Exactly-Once Support | Notes |
|---|---|---|
| Kafka | Yes (transactions) | Within Kafka ecosystem |
| SQS FIFO | Yes (deduplication) | Within SQS, 5-min window |
| RabbitMQ | No (build it yourself) | At-least-once with idempotency |
| Google Pub/Sub | Yes (subscription setting) | No redelivery after successful ack; 7-day retention |
Summary
Exactly-once delivery is not a single mechanism but a combination of transport guarantees, idempotent processing, and sometimes distributed transactions. Most systems should start with at-least-once and idempotent consumers. Add exactly-once semantics only where the cost of duplicates justifies the implementation complexity.
The outbox pattern and idempotency keys are the practical tools here. They work across brokers and do not require special broker features.
For related reading, see Message Queue Types, Distributed Transactions, and Pub-Sub Patterns.