Apache Kafka: Distributed Streaming Platform
Learn how Apache Kafka handles distributed streaming with partitions, consumer groups, exactly-once semantics, and event-driven architecture patterns.
Kafka is not a message queue. It is a distributed streaming platform built for continuous data streams, not batch message processing. This distinction matters. Queues deliver messages point-to-point. Kafka publishes and consumes streams of events with durable storage and replay capability.
This post covers Kafka’s core concepts: topics and partitions, consumer groups, exactly-once semantics, and practical use cases.
Topics and partitions
Kafka organizes data into topics. Unlike a queue where messages are consumed and deleted, Kafka topics are logs. Messages are appended and kept for a configurable retention period: hours, days, or indefinitely.
Topic: order-events
Partition 0: [msg1, msg2, msg3, msg5, msg8]
Partition 1: [msg4, msg6, msg7, msg9]
Partition 2: [msg10, msg11, msg12]
Each topic splits into partitions for parallelism. Partitions distribute across brokers. Within a partition, messages have a monotonically increasing offset that uniquely identifies each one.
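The log abstraction can be sketched as a toy single-partition log (illustrative, not Kafka's API): appends return monotonically increasing offsets, and reading from any offset gives you replay:

```python
class PartitionLog:
    """Toy model of one Kafka partition: an append-only list of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append a message and return its offset (its position in the log)."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset):
        """Return every message at or after the given offset: replay."""
        return self._messages[offset:]


log = PartitionLog()
for msg in ["order-created", "order-paid", "order-shipped"]:
    log.append(msg)

print(log.read_from(1))  # ['order-paid', 'order-shipped']
```

Nothing is deleted on read; a consumer is just a cursor (offset) into this list, which is why replay is free.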
Message keying
Producers can specify a key when publishing:
producer.send(new ProducerRecord("order-events", orderId, orderJson));
Kafka hashes the key to determine the partition. Messages with the same key always go to the same partition, which means ordering per key. All events for the same order arrive in the same partition, in order.
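Kafka's default partitioner hashes the key with murmur2 and takes the result modulo the partition count. A minimal sketch of the property, using CRC-32 as a stand-in hash (so the exact partition numbers will differ from real Kafka):

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Kafka's default partitioner uses murmur2; CRC-32 stands in here
    # to demonstrate the same property: deterministic key -> partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every event for the same order lands in the same partition, in order.
p1 = partition_for("order-1042", 3)
p2 = partition_for("order-1042", 3)
assert p1 == p2
```

The per-key ordering guarantee follows directly: same key, same hash, same partition, and within a partition the log preserves append order.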
Partition assignment
Partitions are assigned to brokers at topic creation time:
Broker 1: Partition 0, Partition 3
Broker 2: Partition 1, Partition 4
Broker 3: Partition 2, Partition 5
The leader broker for each partition handles reads and writes. Followers replicate for fault tolerance.
Consumer groups
Kafka consumers belong to consumer groups. Each message in a topic is delivered to exactly one consumer within each group.
graph LR
Producer -->|publish| Topic[Topic with 3 Partitions]
Topic -->|P0| CG1[Consumer Group A]
Topic -->|P1| CG1
Topic -->|P2| CG1
Topic -->|P0| CG2[Consumer Group B]
Topic -->|P1| CG2
Topic -->|P2| CG2
Group A might have one consumer processing all partitions. Group B might have three consumers, each owning one partition. Both groups receive all messages independently.
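A toy round-robin assignment makes the ownership model concrete (real Kafka ships range, round-robin, and sticky assignors; the function name here is illustrative):

```python
def assign_partitions(partitions, consumers):
    """Round-robin partition assignment within one consumer group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


# Group B: three consumers, one partition each
print(assign_partitions([0, 1, 2], ["c1", "c2", "c3"]))
# Group A: a single consumer owns everything
print(assign_partitions([0, 1, 2], ["c1"]))
```

Note that each partition belongs to exactly one consumer per group, so a group can never usefully run more consumers than there are partitions.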
Rebalancing
When a consumer joins or leaves a group, Kafka rebalances partition assignments. The system redistributes partitions to maintain even load.
Rebalancing has a cost: during rebalancing, no consumer processes messages. If rebalances happen frequently, throughput suffers.
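One mitigation, sketched below under simplified assumptions (real Kafka implements this idea in the sticky and cooperative-sticky assignors), is to let surviving consumers keep their partitions and redistribute only the orphaned ones:

```python
def sticky_rebalance(old_assignment, surviving_consumers, partitions):
    """Keep existing ownership where possible; redistribute only orphans."""
    new_assignment = {c: list(old_assignment.get(c, []))
                      for c in surviving_consumers}
    owned = {p for ps in new_assignment.values() for p in ps}
    orphans = [p for p in partitions if p not in owned]
    # Hand each orphaned partition to the currently least-loaded consumer.
    for p in orphans:
        target = min(new_assignment, key=lambda c: len(new_assignment[c]))
        new_assignment[target].append(p)
    return new_assignment


old = {"c1": [0, 1], "c2": [2, 3], "c3": [4, 5]}
# c3 leaves: c1 and c2 keep their partitions; only 4 and 5 move.
print(sticky_rebalance(old, ["c1", "c2"], [0, 1, 2, 3, 4, 5]))
```

Minimizing partition movement matters because every moved partition means flushed caches, re-fetched state, and a longer pause before consumption resumes.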
Offset management
Kafka tracks consumption progress via offsets. Consumers commit offsets to mark what they have processed:
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
for (ConsumerRecord<String, String> record : records) {
process(record);
}
consumer.commitSync();
}
If a consumer crashes after processing but before committing, it receives the same messages on restart. This is at-least-once delivery. With idempotent processing, duplicates are harmless.
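A minimal sketch of idempotent processing, assuming each message carries a stable ID; in production the seen-set typically lives in the output store itself, for example as a unique-key constraint:

```python
class IdempotentProcessor:
    """Make at-least-once redelivery harmless by remembering processed IDs."""

    def __init__(self):
        self._seen = set()
        self.results = []

    def process(self, message_id, payload):
        if message_id in self._seen:
            return False          # duplicate: already applied, skip it
        self.results.append(payload)  # the actual side effect
        self._seen.add(message_id)
        return True


proc = IdempotentProcessor()
proc.process("msg-1", "charge $10")
proc.process("msg-1", "charge $10")  # redelivered after a crash: ignored
assert proc.results == ["charge $10"]
```

The side effect and the dedup check must be atomic with respect to each other (one database transaction, or an upsert keyed by the message ID); otherwise a crash between them reopens the duplicate window.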
Exactly-once semantics
Exactly-once is the gold standard: each message is processed exactly once, with no duplicates and no loss. Kafka offers exactly-once semantics via transactions.
The problem
Exactly-once is hard because processing involves multiple systems:
Kafka -> Consumer -> Database
If the consumer crashes after writing to the database but before committing the Kafka offset, the message is reprocessed on restart, causing a duplicate database write.
Kafka transactions
Kafka transactions solve this by atomically committing offsets and output:
producer.initTransactions();
producer.beginTransaction();
producer.send(producerRecord);
// Commit the consumed offsets in the same transaction. `offsetsToCommit`
// is a Map<TopicPartition, OffsetAndMetadata> built from the records
// just processed (the Consumer API has no offsets() shortcut for this).
producer.sendOffsetsToTransaction(offsetsToCommit, consumer.groupMetadata());
producer.commitTransaction();
If the transaction commits, the output records and the consumed offsets are committed atomically. If it aborts, neither is applied, and the consumer processes the message again.
When you need exactly-once
Most use cases do not need exactly-once. At-least-once with idempotent processing is simpler and performs better. Consider exactly-once only when:
- Duplicate processing has serious consequences (financial transactions, inventory updates)
- Your output system does not support idempotent writes
- The cost is acceptable
For most event processing pipelines, at-least-once plus idempotency is the right choice.
End-to-end exactly-once flow
The exactly-once flow in practice:
sequenceDiagram
participant Producer
participant Kafka as Kafka Cluster
participant Consumer as Consumer App
participant DB as Output DB
participant Coordinator as Transaction Coordinator
Producer->>Kafka: send() with transactional ID
Kafka->>Producer: acknowledged
Consumer->>Kafka: poll() receives record
Consumer->>DB: write to database
Consumer->>Coordinator: sendOffsetsToTransaction()
Coordinator->>Kafka: commit offsets + data atomically
Kafka->>Coordinator: commit confirmed
Coordinator->>Consumer: offset commit confirmed
Note over Consumer,DB: If crash here: replay from committed offset, skip already-written records
The transaction coordinator bundles the offset commit with any output records written back to Kafka: both commit or both abort. The external database write itself sits outside the Kafka transaction, which is why the consumer must resume from the last committed offset and skip, or idempotently overwrite, records already written to the output system.
Without transactions, there is a window between “wrote to database” and “committed offset.” Crash in that window and you get duplicates. Transactions close that window.
Broker replication and ISR
Kafka replicates partitions across brokers for fault tolerance. Each partition has a leader and multiple followers. Followers replicate messages from the leader, staying in sync.
graph LR
subgraph Broker-1
L1[Partition 0 Leader]
end
subgraph Broker-2
F1[Partition 0 Follower - ISR]
end
subgraph Broker-3
F2[Partition 0 Follower - ISR]
end
Producer -->|writes| L1
L1 -->|replicate| F1
L1 -->|replicate| F2
L1 -->|consume| Consumer
In-Sync Replicas (ISR) are replicas that have fully caught up with the leader. By default (unclean leader election disabled), only ISR members can become leader after a failure. The replication.factor setting controls how many replicas exist, and min.insync.replicas defines the minimum ISR size for acknowledging writes when acks=all.
For example, with replication.factor=3 and min.insync.replicas=2, a partition has 3 replicas. Writes acknowledge when at least 2 replicas (the leader plus 1 follower) have persisted the message.
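The acknowledgment rule can be sketched as a simplified model (not broker code; the real broker returns a NotEnoughReplicas error when the ISR is too small):

```python
def can_ack_write(isr_size: int, min_insync_replicas: int, acks: str) -> bool:
    """Can the leader acknowledge a produce request?

    With acks=all, the write needs at least min.insync.replicas in-sync
    copies (leader included). With acks=1, the leader alone suffices.
    """
    if acks == "all":
        return isr_size >= min_insync_replicas
    return isr_size >= 1


# replication.factor=3, min.insync.replicas=2:
assert can_ack_write(isr_size=3, min_insync_replicas=2, acks="all")
assert can_ack_write(isr_size=2, min_insync_replicas=2, acks="all")
# One more follower drops out of the ISR: writes are rejected.
assert not can_ack_write(isr_size=1, min_insync_replicas=2, acks="all")
```

This is the availability/durability dial: with these settings the partition tolerates one replica failure for writes, and a second failure makes the partition read-only rather than risking data loss.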
Key configuration defaults
| Parameter | Default | Description | Production recommendation |
|---|---|---|---|
| replication.factor | 1 | Number of replicas per partition | 3 for critical topics |
| min.insync.replicas | 1 | Minimum ISR size to acknowledge writes | 2 (requires acks=all) |
| retention.ms | 7 days | Message retention period | Based on replay requirements |
| acks | all (since Kafka 3.0; previously 1) | Producer acknowledgment level | all for critical data |
| compression.type | none (producer) / producer (broker) | Compression codec | lz4 or zstd |
| max.in.flight.requests.per.connection | 5 | Unacknowledged requests per connection | Up to 5 with enable.idempotence=true (ordering preserved) |
Kafka Streams example
Kafka Streams is a client library for building real-time stream processing applications. The word count example is the canonical demonstration. It counts occurrences of words across an infinite stream of text events:
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import java.util.Arrays;
import java.util.Properties;
public class WordCountApplication {
public static void main(String[] args) {
Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
StreamsBuilder builder = new StreamsBuilder();
// Source: read from input topic
KStream<String, String> textLines = builder.stream("text-lines-topic");
// Process: split into words, group, count
textLines
.flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
.groupBy((key, word) -> word)
.count(Materialized.as("word-counts-store"))
.toStream()
.to("word-counts-output-topic");
KafkaStreams streams = new KafkaStreams(builder.build(), config);
streams.start();
}
}
How it works:
- flatMapValues splits each input line into lowercase words
- groupBy groups by word, discarding the message key
- count maintains a running count per word in a state store
- toStream().to(...) writes results to the output topic
Scaling behavior: the groupBy step changes the key to the word itself, so Kafka Streams routes records through an internal repartition topic. Every occurrence of “kafka”, regardless of which input partition it arrived on, lands on the same task, and the resulting count is a correct global count per word. The cost is an extra hop through the repartition topic.
Fault tolerance: Kafka Streams backs each state store with a compacted changelog topic in Kafka. If a stream processor crashes, a replacement instance restores the store from the changelog and resumes from the last committed offset without data loss.
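A toy model of the repartition step that groupBy triggers, using CRC-32 as a stand-in for Kafka's hash: every occurrence of a word is routed to exactly one task, so that task's count for the word is global:

```python
import zlib
from collections import Counter


def repartition_and_count(partitioned_lines, num_partitions=3):
    """Toy model of the Streams word count: after grouping by word,
    every occurrence of a word is routed to the same partition, so each
    per-partition Counter holds the global count for its words."""
    tasks = [Counter() for _ in range(num_partitions)]
    for lines in partitioned_lines:                  # input partitions
        for line in lines:
            for word in line.lower().split():
                dest = zlib.crc32(word.encode()) % num_partitions
                tasks[dest][word] += 1               # "repartition topic"
    return tasks


# "kafka" appears in two different input partitions...
counts = repartition_and_count([["kafka streams"], ["kafka rocks"]])
# ...but exactly one task counted it, and its count is global.
totals = sum(counts, Counter())
assert totals["kafka"] == 2
assert sum(1 for t in counts if "kafka" in t) == 1
```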
Partition count sizing
The number of partitions determines parallelism for both producers and consumers. Getting it right at topic creation matters: you can add partitions later but never remove them, and adding partitions changes the key-to-partition mapping, breaking per-key ordering for existing keys.
Factors that drive partition count:
| Factor | Impact | Consideration |
|---|---|---|
| Desired consumer parallelism | More partitions means more concurrent consumers | One partition per consumer in a group |
| Producer throughput target | Each partition handles about 10 MB/s | Partitions should be >= target_MB_s / 10 |
| Maximum broker scale | Kafka performance degrades past roughly 4000 partitions per broker | Consider broker count when sizing |
| Key cardinality | High-cardinality keys distribute evenly | Avoid partitions that become hot spots |
Sizing formula:
import math

def calculate_partition_count(
target_throughput_mbps: float,
max_consumer_parallelism: int,
num_brokers: int,
replication_factor: int = 3
) -> dict:
"""
Calculate recommended partition count for a topic.
"""
# Throughput-based: each partition handles ~10 MB/s reliably
partitions_for_throughput = math.ceil(target_throughput_mbps / 10)
# Consumer-based: need at least as many partitions as consumers
partitions_for_consumers = max_consumer_parallelism
# Broker-based: avoid too many partitions per broker
# Guideline: < 4000 partitions per broker for good performance
max_partitions_per_broker = 4000
partitions_for_broker_capacity = num_brokers * max_partitions_per_broker
recommended = max(partitions_for_throughput, partitions_for_consumers)
recommended = min(recommended, partitions_for_broker_capacity)
# Account for replication: total partitions including replicas
total_partitions_in_cluster = recommended * replication_factor
return {
'recommended_partitions': recommended,
'based_on': 'throughput' if partitions_for_throughput >= partitions_for_consumers else 'consumer_parallelism',
'throughput_achievable_mbps': recommended * 10,
'max_consumer_parallelism': recommended,
'total_partition_slots_used': total_partitions_in_cluster,
'partitions_per_broker': recommended / num_brokers
}
# Example: 100 MB/s target, 30 consumers, 6 brokers
result = calculate_partition_count(100, 30, 6)
# Returns: recommended=30 (based on consumer count),
# throughput_achievable=300 MB/s,
# partitions_per_broker=5
Practical guidelines:
- Start conservative: 6-12 partitions for most topics
- Increase when consumer lag appears and more consumers cannot be added
- Monitor per-partition throughput to detect hot spots
- When repartitioning is needed, create a new topic and migrate data with a consumer that reads from both
Producer and consumer implications:
- More partitions means more producer connections and higher client-side memory
- More partitions means longer leader election time after broker failure
- Consumers with many partitions need more heap for offset tracking
Kafka use cases
Kafka excels at specific workloads:
Event streaming
Capture events from many sources and distribute to multiple consumers:
Clickstream -> Kafka -> Analytics
-> Personalization
-> Fraud Detection
-> Audit Log
Message bus replacement
Replace traditional message queues with Kafka for better throughput and replay capability.
Data integration
Connect different systems without point-to-point coupling:
Database CDC -> Kafka -> Search Index
-> Cache
-> Data Warehouse
-> ML Pipeline
Change Data Capture from databases fits this pattern well. Any system can consume the stream without touching the source database.
Kafka vs traditional queues
| Aspect | Kafka | Traditional Queue |
|---|---|---|
| Retention | Days/weeks/forever | Until consumed |
| Replay | Yes | No (usually) |
| Ordering | Per partition | Per queue |
| Throughput | Very high | Moderate |
| Consumer groups | Independent per group | Shared or exclusive |
Kafka’s retention and replay make it unique. You can reprocess historical data if your processing logic changes. Traditional queues cannot do this.
For broader event-driven patterns, see our post on event-driven architecture. For pub/sub patterns that overlap with Kafka’s topic model, see pub/sub patterns.
When to use and when not to use
When to use Apache Kafka
- High-throughput event streaming: when you need to process millions of events per second (clickstreams, IoT sensors, logs)
- Event sourcing: when you need to store and replay event history for audit trails or read model rebuilding
- Data pipeline backbone: when connecting multiple systems (CDC, ETL, data lake ingestion) without point-to-point coupling
- Multiple independent consumers: when many different consumer groups need to process the same data stream independently
- Ordered per-key processing: when messages with the same key must maintain order (same partition)
- Long-term message retention: when you need replay capability for historical data analysis
When not to use Apache Kafka
- Simple task queues: when you just need point-to-point work distribution with single consumer
- Low-volume point-to-point messaging: when the overhead of managing partitions and consumer groups is not justified
- Request/response: when you need synchronous responses from specific services
- Message priority: when you need priority queuing (Kafka does not support this)
- Small teams with limited operational capacity: when Kafka’s operational complexity (partition management, replication, leader election) exceeds team capabilities
- Very low latency requirements: when sub-millisecond latency is critical (Kafka adds latency for batching and acknowledgment)
Production failure scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Broker goes down | Partition leader election; temporary unavailability | Configure replication factor of 3; use ISR configuration |
| Controller failure | Cluster-wide coordination pauses | Run multiple brokers; use ZooKeeper/KRaft for controller election |
| Network partition | Followers fall out of ISR; potential data loss | Monitor ISR size; alert when replicas fall behind |
| Producer retry storm | Duplicate messages after transient failures | Enable idempotent producer; design idempotent consumers |
| Consumer rebalance storm | Throughput drops during rebalancing | Use sticky partition assignment; avoid frequent consumer restarts |
| Offset commit failure | Messages reprocessed or skipped | Use transactional producers with exactly-once semantics when needed |
| Partition imbalance | Some brokers overloaded while others idle | Monitor partition distribution; use Cruise Control for rebalancing |
| Data loss on leader change | Under-replicated partitions lose messages | Ensure min.insync.replicas >= 2; acks=all on producers |
Observability checklist
Metrics to monitor
- Consumer lag: difference between latest offset and consumer position (critical for SLAs)
- Under-replicated partitions: partitions without full replication
- ISR size: In-Sync Replicas count per partition
- Message throughput: messages and bytes per second per topic
- Request latency: P99 producer and consumer request latencies
- Disk usage: broker disk utilization and growth rate
- Consumer group status: active members and partition assignments
- Controller status: leader elections and controller changes
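Consumer lag, the first metric in the list, is simple arithmetic once you have the log-end and committed offsets (in practice fetched via the admin API; the function name here is illustrative):

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: latest offset in the log minus the group's
    committed position. Total lag is what alerts usually fire on."""
    lag = {p: log_end_offsets[p] - committed_offsets.get(p, 0)
           for p in log_end_offsets}
    return lag, sum(lag.values())


log_end = {0: 1500, 1: 1200, 2: 900}
committed = {0: 1500, 1: 1100, 2: 400}
per_partition, total = consumer_lag(log_end, committed)
assert per_partition == {0: 0, 1: 100, 2: 500}
assert total == 600
```

Watching per-partition lag (not just the total) is what reveals hot partitions: one partition at 500 behind while the others sit at zero points at skewed keys, not slow consumers.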
Logs to capture
- Broker startup and shutdown events
- Partition leader election events
- Consumer group rebalancing events
- Producer acknowledgment failures and retries
- Controller changes and election events
- Under-replicated partition events
- Disk space warnings
Alerts to configure
- Consumer lag exceeds SLA threshold (for example, more than 5 minutes behind)
- Under-replicated partitions greater than 0
- Broker disk usage above 80%
- Producer error rate above 1%
- Consumer group has no active members
- Controller is unavailable
- Messages per second exceeds capacity threshold
- Request latency P99 exceeds threshold
ZooKeeper and KRaft storage requirements
Kafka needs somewhere to store cluster metadata — ZooKeeper traditionally, KRaft in Kafka 3.3+.
What is stored:
- Partition leadership and ISR membership
- Consumer group offsets and membership
- Access control lists (ACLs)
- Topic configurations
- Delegation token information
ZooKeeper/KRaft storage ≈
partitions × (leadership + ISR state)
+ consumer_groups × (member offsets + metadata)
+ topics × (configs + ACLs)
+ delegation_tokens
Typical storage needs:
| Cluster Size | Topics | Partitions | Consumer Groups | ZooKeeper/KRaft Storage |
|---|---|---|---|---|
| Small (3 brokers) | 50 | 200 | 30 | 50-200 MB |
| Medium (6 brokers) | 200 | 1,000 | 150 | 200-500 MB |
| Large (12 brokers) | 500 | 5,000 | 500 | 1-3 GB |
| Very large (24+ brokers) | 1,000+ | 20,000+ | 2,000+ | 5-10 GB |
Storage is not the real bottleneck. Znode count is. ZooKeeper falls over when there are millions of child znodes — which happens when you have many consumer group members or partition replicas. KRaft mode removes ZooKeeper dependency and scales better.
If you are still on ZooKeeper, migrate to KRaft. Live ZooKeeper-to-KRaft migration has been generally available since Kafka 3.6 — no downtime needed.
Security checklist
- Authentication: use SASL/PLAIN or SCRAM for client authentication; mTLS for certificate-based auth
- Authorization: implement ACLs to restrict topic access; principle of least privilege
- Encryption in transit: enable SSL/TLS for all broker and client connections
- Encryption at rest: Kafka has no built-in at-rest encryption; use disk or volume encryption, and encrypt sensitive fields client-side where required
- Schema validation: validate message schemas with Confluent Schema Registry
- Data sanitization: sanitize message keys and values to prevent injection
- Audit logging: capture authorizer and admin-operation logs (or your platform's audit features) for security review
- Network segmentation: place brokers in private networks; restrict inter-broker communication
Common pitfalls and anti-patterns
Pitfall 1: too many partitions
Each partition increases Kafka’s overhead (file handles, memory, leader elections). Creating thousands of partitions when you only need dozens causes unnecessary complexity. Start with fewer partitions and increase only when needed.
Pitfall 2: ignoring consumer lag
If consumers cannot keep up with producers, lag grows indefinitely. Monitor lag continuously and scale consumers or optimize processing logic.
Pitfall 3: not using compression
Uncompressed messages waste network bandwidth and storage. Enable compression (LZ4, ZSTD, or Snappy) on producers.
Pitfall 4: auto.offset.reset = earliest without understanding consequences
When a consumer group has no committed offset (a new group, or offsets that expired), this setting replays all messages from the beginning of the log. In production with large retention, this can overwhelm consumers. Use it deliberately, not as a default.
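A config fragment showing the three standard values, with the choice made explicit rather than inherited:

```properties
# Consumer configuration: decide the reset behavior deliberately.
# Applies only when the group has no committed offset (a new group,
# or offsets expired past the retention window).
auto.offset.reset=latest     # default: start from the log end
# auto.offset.reset=earliest # replays the entire retained log
# auto.offset.reset=none     # fail fast instead of guessing
```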
Pitfall 5: sending sensitive data unencrypted
Kafka does not encrypt messages by default. If you send PII, credentials, or sensitive data, enable encryption or do not send such data through Kafka.
Pitfall 6: not planning for partition count
You can add partitions to a topic but never remove them, and adding partitions remaps keys, breaking per-key ordering for existing keys. Plan partition count up front based on expected throughput and consumer parallelism requirements.
Quick recap
Key points
- Kafka is a distributed log, not a queue; messages are retained and can be replayed
- Topics divide into partitions for parallelism; same key goes to same partition
- Consumer groups enable independent consumption by different services
- At-least-once delivery is the default; exactly-once requires Kafka transactions
- Rebalancing occurs when consumers join or leave; frequent rebalances hurt throughput
- ZooKeeper (or KRaft in newer versions) manages cluster metadata
Pre-deployment checklist
- [ ] Replication factor set to 3 for critical topics
- [ ] min.insync.replicas configured appropriately
- [ ] Consumer lag monitoring configured and alerts set
- [ ] Producer retries and idempotency configured
- [ ] Compression enabled on producers
- [ ] Schema Registry deployed for schema validation
- [ ] ACLs configured for topic access control
- [ ] TLS/SSL encryption enabled for all connections
- [ ] Partition count planned based on throughput requirements
- [ ] Retention policy configured (hours, days, weeks)
- [ ] Dead letter topic configured for failed messages
- [ ] Consumer group offset reset policy defined
- [ ] Backup and disaster recovery plan documented
Conclusion
Kafka is a distributed log that happens to support messaging patterns. Topics are logs, partitions are shards, consumer groups enable independent consumption. The offset-based consumption model gives you replay capability that traditional queues lack.
For high-throughput event streaming, Kafka delivers. The operational complexity is real (ZooKeeper or KRaft for metadata, partition management, replication), but the scalability and durability are legitimate.
If you need true exactly-once, Kafka transactions are available but costly. For most pipelines, at-least-once with idempotent consumers is simpler and sufficient.