Apache Kafka: Distributed Streaming Platform

Learn how Apache Kafka handles distributed streaming with partitions, consumer groups, exactly-once semantics, and event-driven architecture patterns.

Kafka is not a message queue. It is a distributed streaming platform built for continuous data streams, not batch message processing. This distinction matters. Queues deliver messages point-to-point. Kafka publishes and consumes streams of events with durable storage and replay capability.

This post covers Kafka’s core concepts: topics and partitions, consumer groups, exactly-once semantics, and practical use cases.

Topics and partitions

Kafka organizes data into topics. Unlike a queue where messages are consumed and deleted, Kafka topics are logs. Messages are appended and kept for a configurable retention period: hours, days, or indefinitely.

Topic: order-events
Partition 0: [msg1, msg2, msg3, msg5, msg8]
Partition 1: [msg4, msg6, msg7, msg9]
Partition 2: [msg10, msg11, msg12]

Each topic splits into partitions for parallelism. Partitions distribute across brokers. Within a partition, messages have a monotonically increasing offset that uniquely identifies each one.
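
The log-with-offsets model can be sketched in a few lines of Python. This toy in-memory `Topic` class is purely illustrative, not how the broker actually stores data:

```python
class Topic:
    """Toy in-memory model of a topic: one append-only log per partition."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: str) -> int:
        """Append and return the message's offset (its index in the log)."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> str:
        """Reading never deletes: the same offset can be read again later."""
        return self.partitions[partition][offset]

topic = Topic(num_partitions=3)
first = topic.append(0, "msg1")   # offset 0
second = topic.append(0, "msg2")  # offset 1: offsets grow monotonically
```

Two properties fall out of this structure: offsets within a partition are strictly increasing, and consuming a message does not remove it from the log.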

Message keying

Producers can specify a key when publishing:

producer.send(new ProducerRecord<>("order-events", orderId, orderJson));

Kafka hashes the key to determine the partition. Messages with the same key always go to the same partition, which means ordering per key. All events for the same order arrive in the same partition, in order.
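
A hedged sketch of key-based partitioning: the Java client actually uses murmur2 hashing, but CRC32 stands in here so the example needs only the standard library.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic key -> partition mapping. The Java client uses murmur2;
    CRC32 is a stand-in for illustration."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Every event for the same order lands on the same partition, hence in order.
p = partition_for("order-42", 3)
```

Because the hash is deterministic, repeated sends with the same key always choose the same partition, which is exactly what preserves per-key ordering.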

Partition assignment

Partitions are assigned to brokers at topic creation time:

Broker 1: Partition 0, Partition 3
Broker 2: Partition 1, Partition 4
Broker 3: Partition 2, Partition 5

The leader broker for each partition handles reads and writes. Followers replicate for fault tolerance.

Consumer groups

Kafka consumers belong to consumer groups. Each message in a topic is delivered to exactly one consumer within each group, while separate groups each receive the full stream.

graph LR
    Producer -->|publish| Topic[Topic with 3 Partitions]
    Topic -->|P0| CG1[Consumer Group A]
    Topic -->|P1| CG1
    Topic -->|P2| CG1
    Topic -->|P0| CG2[Consumer Group B]
    Topic -->|P1| CG2
    Topic -->|P2| CG2

Group A might have one consumer processing all partitions. Group B might have three consumers, each owning one partition. Both groups receive all messages independently.
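
That split can be modeled with a simple assignment function. Kafka's real assignors (range, round-robin, sticky) are pluggable, so treat this round-robin sketch as illustrative only:

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Round-robin partition assignment within one consumer group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Group A: a single consumer owns every partition.
group_a = assign_partitions([0, 1, 2], ["a1"])
# Group B: three consumers, one partition each; both groups see all data.
group_b = assign_partitions([0, 1, 2], ["b1", "b2", "b3"])
```

The key invariant in any assignor: each partition belongs to exactly one consumer per group, so parallelism within a group is capped by the partition count.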

Rebalancing

When a consumer joins or leaves a group, Kafka rebalances partition assignments. The system redistributes partitions to maintain even load.

Rebalancing has a cost: during rebalancing, no consumer processes messages. If rebalances happen frequently, throughput suffers.

Offset management

Kafka tracks consumption progress via offsets. Consumers commit offsets to mark what they have processed:

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record);
    }
    consumer.commitSync();
}

If a consumer crashes after processing but before committing, it receives the same messages on restart. This is at-least-once delivery. With idempotent processing, duplicates are harmless.
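
Idempotent processing can be as simple as tracking processed message IDs. This in-memory sketch assumes each message carries a unique ID; in practice the seen-ID set would live in the output store, not process memory:

```python
processed_ids = set()   # in practice: a unique constraint in the output store
results = []

def process_idempotently(message_id: str, payload: str) -> None:
    """Redelivered duplicates are detected by ID and skipped."""
    if message_id in processed_ids:
        return                       # already handled before the crash
    results.append(payload)          # the actual side effect
    processed_ids.add(message_id)

process_idempotently("m1", "ship order")
process_idempotently("m1", "ship order")  # at-least-once redelivery: a no-op
```

With this in place, the crash-before-commit scenario above produces a harmless duplicate delivery rather than a duplicate side effect.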

Exactly-once semantics

Exactly-once is the gold standard: each message is processed precisely once, no duplicates, no loss. Kafka offers exactly-once semantics via transactions.

The problem

Exactly-once is hard because processing involves multiple systems:

Kafka -> Consumer -> Database

If the consumer crashes after writing to the database but before committing the Kafka offset, the message is reprocessed on restart, causing a duplicate database write.

Kafka transactions

Kafka transactions solve this by atomically committing offsets and output:

producer.initTransactions();
producer.beginTransaction();
producer.send(producerRecord);
// offsets: a Map<TopicPartition, OffsetAndMetadata> built from the records just processed
producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
producer.commitTransaction();

If the transaction commits, the output records and the offset commit become visible atomically. If it aborts, neither takes effect and the consumer reads the message again.

When you need exactly-once

Most use cases do not need exactly-once. At-least-once with idempotent processing is simpler and performs better. Consider exactly-once only when:

  • Duplicate processing has serious consequences (financial transactions, inventory updates)
  • Your output system does not support idempotent writes
  • The cost is acceptable

For most event processing pipelines, at-least-once plus idempotency is the right choice.

End-to-end exactly-once flow

The exactly-once flow in practice:

sequenceDiagram
    participant Producer
    participant Kafka as Kafka Cluster
    participant Consumer as Consumer App
    participant DB as Output DB
    participant Coordinator as Transaction Coordinator

    Producer->>Kafka: send() with transactional ID
    Kafka->>Producer: acknowledged

    Consumer->>Kafka: poll() receives record
    Consumer->>DB: write to database
    Consumer->>Coordinator: sendOffsetsToTransaction()
    Coordinator->>Kafka: commit offsets + data atomically
    Kafka->>Coordinator: commit confirmed
    Coordinator->>Consumer: offset commit confirmed

    Note over Consumer,DB: If crash here, replay from last committed offset; DB writes must be idempotent

The transaction coordinator bundles the Kafka output records and the offset commit together: they either both commit or both abort. Note the limit, though: writes to an external database sit outside the Kafka transaction. Those still need to be idempotent, or the offsets must be stored in the same database transaction as the output. When the consumer restarts, it resumes from the last committed offset and the idempotent sink absorbs any replayed records.

Without transactions, there is a window between “wrote to database” and “committed offset.” Crash in that window and you get duplicates. Transactions close that window.
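
For an external database, one way to close that window is to commit the offset in the same database transaction as the output. A sketch with SQLite standing in for the output store; the schema and the `handle` helper are invented for illustration:

```python
import sqlite3

# Invented schema: one table for output rows, one for stored offsets.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, payload TEXT)")
db.execute("CREATE TABLE offsets (part INTEGER PRIMARY KEY, pos INTEGER)")

def handle(part: int, offset: int, order_id: str, payload: str) -> None:
    """Write the output row and advance the stored offset in ONE DB transaction."""
    with db:  # sqlite3 connection as context manager: atomic commit/rollback
        db.execute("INSERT OR IGNORE INTO orders VALUES (?, ?)", (order_id, payload))
        db.execute(
            "INSERT INTO offsets VALUES (?, ?) "
            "ON CONFLICT(part) DO UPDATE SET pos = excluded.pos",
            (part, offset + 1),  # next offset to resume from
        )

handle(0, 41, "o-1", "shipped")
handle(0, 41, "o-1", "shipped")  # replay after a crash: no duplicate row
```

On restart, the consumer seeks to the offset stored in the database rather than the one in Kafka, so the output and the consumption position can never disagree.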

Broker replication and ISR

Kafka replicates partitions across brokers for fault tolerance. Each partition has a leader and multiple followers. Followers replicate messages from the leader, staying in sync.

graph LR
    subgraph Broker-1
        L1[Partition 0 Leader]
    end
    subgraph Broker-2
        F1[Partition 0 Follower - ISR]
    end
    subgraph Broker-3
        F2[Partition 0 Follower - ISR]
    end
    Producer -->|writes| L1
    L1 -->|replicate| F1
    L1 -->|replicate| F2
    L1 -->|consume| Consumer

In-Sync Replicas (ISR) are replicas that have fully caught up with the leader. Only ISR members can become leaders after a failure. The replication.factor setting controls how many replicas exist, and min.insync.replicas defines the minimum ISR size for acknowledging writes.

For example, with replication.factor=3 and min.insync.replicas=2, a partition has 3 replicas. Writes acknowledge when at least 2 replicas (the leader plus 1 follower) have persisted the message.
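
The acknowledgment rule can be expressed directly. `write_accepted` is a hypothetical helper for illustration, not a Kafka API:

```python
def write_accepted(isr_size: int, min_insync: int, acks: str) -> bool:
    """Whether the broker acknowledges a produce, given the current ISR size."""
    if acks == "all":
        return isr_size >= min_insync  # need the configured quorum in sync
    return isr_size >= 1               # acks=1: the leader alone suffices

# replication.factor=3, min.insync.replicas=2:
healthy = write_accepted(3, 2, "all")   # all replicas in sync: accepted
one_down = write_accepted(2, 2, "all")  # one follower lagging: still accepted
two_down = write_accepted(1, 2, "all")  # only the leader left: rejected
```

This is why replication.factor=3 with min.insync.replicas=2 is the common production pairing: it tolerates one replica failure without losing write availability.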

Key configuration defaults

| Parameter | Default | Description | Production recommendation |
| --- | --- | --- | --- |
| replication.factor | 1 | Number of replicas per partition | 3 for critical topics |
| min.insync.replicas | 1 | Minimum ISR to acknowledge a write | 2 (requires acks=all) |
| retention.ms | 7 days | Message retention period | Based on replay requirements |
| acks | 1 (leader); all since Kafka 3.0 | Acknowledgment required | all for critical data |
| compression.type | producer | Compression codec | lz4 or zstd |
| max.in.flight.requests.per.connection | 5 | Unacknowledged requests per connection | Up to 5 with idempotence enabled; 1 without it if ordering matters |

Kafka Streams example

Kafka Streams is a client library for building real-time stream processing applications. The word count example is the canonical demonstration. It counts occurrences of words across an infinite stream of text events:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;

import java.util.Arrays;
import java.util.Properties;

public class WordCountApplication {
    public static void main(String[] args) {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Source: read from input topic
        KStream<String, String> textLines = builder.stream("text-lines-topic");

        // Process: split into words, group, count
        textLines
            .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
            .groupBy((key, word) -> word)
            .count(Materialized.as("word-counts-store"))
            .toStream()
            .to("word-counts-output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), config);
        streams.start();
    }
}

How it works:

  1. flatMapValues splits each input line into lowercase words
  2. groupBy re-keys the stream by word, discarding the original key (this triggers a repartition through an internal topic)
  3. count maintains a running count per word in a state store
  4. toStream.to writes results to the output topic

Scaling behavior: the groupBy step repartitions the stream by word through an internal topic, so all occurrences of a word are routed to the same task regardless of which input partition they arrived on. The count per word is therefore already global. The trade-off is the extra hop through the repartition topic, which adds latency and broker load.

Fault tolerance: Kafka Streams backs each state store with a changelog topic in Kafka. If a stream processor crashes, another instance restores the store from the changelog and resumes processing without data loss.
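
The same topology can be mimicked in plain Python to see the data flow without a cluster. `word_count` is an illustrative stand-in, not Kafka Streams code:

```python
import re
from collections import Counter

def word_count(lines):
    """Plain-Python analogue of the topology: flatMapValues -> groupBy -> count."""
    counts = Counter()
    for line in lines:                                # each input record
        for word in re.split(r"\W+", line.lower()):   # flatMapValues: split to words
            if word:                                  # drop empty splits
                counts[word] += 1                     # groupBy word + count
    return counts

counts = word_count(["Kafka streams", "kafka counts words"])
```

The real version differs mainly in that the counts live in a replicated state store and the input never ends; the per-record transformation is the same.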

Partition count sizing

The number of partitions determines parallelism for both producers and consumers. Getting it right at topic creation matters: partitions can be added later but never removed, and adding them changes the key-to-partition mapping, which breaks per-key ordering for existing keys.

Factors that drive partition count:

| Factor | Impact | Consideration |
| --- | --- | --- |
| Desired consumer parallelism | More partitions means more concurrent consumers | One partition per consumer in a group |
| Producer throughput target | Each partition handles about 10 MB/s | Partitions should be >= target_MB_s / 10 |
| Maximum broker scale | Kafka performance degrades past roughly 4,000 partitions per broker | Consider broker count when sizing |
| Key cardinality | High-cardinality keys distribute evenly | Avoid partitions that become hot spots |

Sizing formula:

import math

def calculate_partition_count(
    target_throughput_mbps: float,
    max_consumer_parallelism: int,
    num_brokers: int,
    replication_factor: int = 3
) -> dict:
    """
    Calculate recommended partition count for a topic.
    """
    # Throughput-based: each partition handles ~10 MB/s reliably
    partitions_for_throughput = math.ceil(target_throughput_mbps / 10)

    # Consumer-based: need at least as many partitions as consumers
    partitions_for_consumers = max_consumer_parallelism

    # Broker-based: avoid too many partitions per broker
    # Guideline: < 4000 partitions per broker for good performance
    max_partitions_per_broker = 4000
    partitions_for_broker_capacity = num_brokers * max_partitions_per_broker

    recommended = max(partitions_for_throughput, partitions_for_consumers)
    recommended = min(recommended, partitions_for_broker_capacity)

    # Account for replication: total partitions including replicas
    total_partitions_in_cluster = recommended * replication_factor

    return {
        'recommended_partitions': recommended,
        'based_on': 'throughput' if partitions_for_throughput >= partitions_for_consumers else 'consumer_parallelism',
        'throughput_achievable_mbps': recommended * 10,
        'max_consumer_parallelism': recommended,
        'total_partition_slots_used': total_partitions_in_cluster,
        'partitions_per_broker': recommended / num_brokers
    }

# Example: 100 MB/s target, 30 consumers, 6 brokers
result = calculate_partition_count(100, 30, 6)
# Returns: recommended=30 (based on consumer count),
#          throughput_achievable=300 MB/s,
#          partitions_per_broker=5

Practical guidelines:

  • Start conservative: 6-12 partitions for most topics
  • Increase when consumer lag appears and more consumers cannot be added
  • Monitor per-partition throughput to detect hot spots
  • When repartitioning is needed, create a new topic and migrate data with a consumer that reads from both

Producer and consumer implications:

  • More partitions means more producer connections and higher client-side memory
  • More partitions means longer leader election time after broker failure
  • Consumers with many partitions need more heap for offset tracking

Kafka use cases

Kafka excels at specific workloads:

Event streaming

Capture events from many sources and distribute to multiple consumers:

Clickstream -> Kafka -> Analytics
                     -> Personalization
                     -> Fraud Detection
                     -> Audit Log

Message bus replacement

Replace traditional message queues with Kafka for better throughput and replay capability.

Data integration

Connect different systems without point-to-point coupling:

Database CDC -> Kafka -> Search Index
                     -> Cache
                     -> Data Warehouse
                     -> ML Pipeline

Change Data Capture from databases fits this pattern well. Any system can consume the stream without touching the source database.

Kafka vs traditional queues

| Aspect | Kafka | Traditional Queue |
| --- | --- | --- |
| Retention | Days/weeks/forever | Until consumed |
| Replay | Yes | No (usually) |
| Ordering | Per partition | Per queue |
| Throughput | Very high | Moderate |
| Consumer groups | Independent per group | Shared or exclusive |

Kafka’s retention and replay make it unique. You can reprocess historical data if your processing logic changes. Traditional queues cannot do this.
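
Replay is just reading the retained log again from an earlier offset; a minimal sketch:

```python
def replay(partition_log: list, from_offset: int = 0) -> list:
    """Re-read a partition from any retained offset, something a queue cannot do."""
    return partition_log[from_offset:]

partition = ["evt1", "evt2", "evt3", "evt4"]
reprocessed = replay(partition, from_offset=0)   # full historical reprocess
tail = replay(partition, from_offset=2)          # resume mid-log
```

The real mechanism is `consumer.seek()` to the desired offset; the point is that the data is still there to seek back to.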

For broader event-driven patterns, see our post on event-driven architecture. For pub/sub patterns that overlap with Kafka’s topic model, see pub/sub patterns.

When to use and when not to use

When to use Apache Kafka

  • High-throughput event streaming: when you need to process millions of events per second (clickstreams, IoT sensors, logs)
  • Event sourcing: when you need to store and replay event history for audit trails or read model rebuilding
  • Data pipeline backbone: when connecting multiple systems (CDC, ETL, data lake ingestion) without point-to-point coupling
  • Multiple independent consumers: when many different consumer groups need to process the same data stream independently
  • Ordered per-key processing: when messages with the same key must maintain order (same partition)
  • Long-term message retention: when you need replay capability for historical data analysis

When not to use Apache Kafka

  • Simple task queues: when you just need point-to-point work distribution with single consumer
  • Low-volume point-to-point messaging: when the overhead of managing partitions and consumer groups is not justified
  • Request/response: when you need synchronous responses from specific services
  • Message priority: when you need priority queuing (Kafka does not support this)
  • Small teams with limited operational capacity: when Kafka’s operational complexity (partition management, replication, leader election) exceeds team capabilities
  • Very low latency requirements: when sub-millisecond latency is critical (Kafka adds latency for batching and acknowledgment)

Production failure scenarios

| Failure | Impact | Mitigation |
| --- | --- | --- |
| Broker goes down | Partition leader election; temporary unavailability | Configure replication factor of 3; use ISR configuration |
| Controller failure | Cluster-wide coordination pauses | Run multiple brokers; use ZooKeeper/KRaft for controller election |
| Network partition | Followers fall out of ISR; potential data loss | Monitor ISR size; alert when replicas fall behind |
| Producer retry storm | Duplicate messages after transient failures | Enable idempotent producer; design idempotent consumers |
| Consumer rebalance storm | Throughput drops during rebalancing | Use sticky partition assignment; avoid frequent consumer restarts |
| Offset commit failure | Messages reprocessed or skipped | Use transactional producers with exactly-once semantics when needed |
| Partition imbalance | Some brokers overloaded while others idle | Monitor partition distribution; use Cruise Control for rebalancing |
| Data loss on leader change | Under-replicated partitions lose messages | Ensure min.insync.replicas >= 2; acks=all on producers |

Observability checklist

Metrics to monitor

  • Consumer lag: difference between latest offset and consumer position (critical for SLAs)
  • Under-replicated partitions: partitions without full replication
  • ISR size: In-Sync Replicas count per partition
  • Message throughput: messages and bytes per second per topic
  • Request latency: P99 producer and consumer request latencies
  • Disk usage: broker disk utilization and growth rate
  • Consumer group status: active members and partition assignments
  • Controller status: leader elections and controller changes
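
Consumer lag, the first metric above, is a simple per-partition subtraction. This sketch assumes the log-end and committed offsets come from your monitoring source:

```python
def consumer_lag(log_end_offsets: dict, committed_offsets: dict) -> dict:
    """Per-partition lag: log-end offset minus the consumer's committed position."""
    return {p: log_end_offsets[p] - committed_offsets.get(p, 0)
            for p in log_end_offsets}

# Hypothetical offsets pulled from monitoring:
lag = consumer_lag({0: 1500, 1: 980}, {0: 1480, 1: 980})
total_lag = sum(lag.values())  # alert when this exceeds the SLA threshold
```

Tracking lag per partition, not just the total, also surfaces hot partitions: one partition falling behind while the rest are current points at key skew rather than under-provisioned consumers.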

Logs to capture

  • Broker startup and shutdown events
  • Partition leader election events
  • Consumer group rebalancing events
  • Producer acknowledgment failures and retries
  • Controller changes and election events
  • Under-replicated partition events
  • Disk space warnings

Alerts to configure

  • Consumer lag exceeds SLA threshold (for example, more than 5 minutes behind)
  • Under-replicated partitions greater than 0
  • Broker disk usage above 80%
  • Producer error rate above 1%
  • Consumer group has no active members
  • Controller is unavailable
  • Messages per second exceeds capacity threshold
  • Request latency P99 exceeds threshold

ZooKeeper and KRaft storage requirements

Kafka needs somewhere to store cluster metadata — ZooKeeper traditionally, KRaft in Kafka 3.3+.

What is stored:

  • Partition leadership and ISR membership
  • Consumer group offsets and membership
  • Access control lists (ACLs)
  • Topic configurations
  • Delegation token information

ZooKeeper/KRaft storage ≈
    partitions × (leadership + ISR state)
  + consumer_groups × (member offsets + metadata)
  + topics × (configs + ACLs)
  + delegation_tokens
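
The formula can be turned into a rough estimator. The per-item MB costs below are invented assumptions fitted to the sizing table, covering snapshot and log history as well as live state; they are not measured znode sizes:

```python
def estimate_metadata_mb(partitions: int, consumer_groups: int, topics: int,
                         members_per_group: int = 3,
                         delegation_tokens: int = 0) -> float:
    """Rough metadata footprint in MB following the formula above.
    Per-item costs are illustrative assumptions, not measured values."""
    return (partitions * 0.2                              # leadership + ISR state
            + consumer_groups * members_per_group * 0.1   # offsets + membership
            + topics * 0.05                               # configs + ACLs
            + delegation_tokens * 0.01)

# Medium cluster: ~1,000 partitions, 150 consumer groups, 200 topics
medium = estimate_metadata_mb(partitions=1000, consumer_groups=150, topics=200)
```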

Typical storage needs:

| Cluster size | Topics | Partitions | Consumer groups | ZooKeeper/KRaft storage |
| --- | --- | --- | --- | --- |
| Small (3 brokers) | 50 | 200 | 30 | 50-200 MB |
| Medium (6 brokers) | 200 | 1,000 | 150 | 200-500 MB |
| Large (12 brokers) | 500 | 5,000 | 500 | 1-3 GB |
| Very large (24+ brokers) | 1,000+ | 20,000+ | 2,000+ | 5-10 GB |

Storage is not the real bottleneck. Znode count is. ZooKeeper falls over when there are millions of child znodes — which happens when you have many consumer group members or partition replicas. KRaft mode removes ZooKeeper dependency and scales better.

If you are still on ZooKeeper, migrate to KRaft. Kafka 3.6+ supports live ZooKeeper-to-KRaft migration, no downtime needed.

Security checklist

  • Authentication: use SASL/PLAIN or SCRAM for client authentication; mTLS for certificate-based auth
  • Authorization: implement ACLs to restrict topic access; principle of least privilege
  • Encryption in transit: enable SSL/TLS for all broker and client connections
  • Encryption at rest: Kafka has no built-in at-rest encryption; use disk or filesystem encryption, or encrypt sensitive fields client-side before producing
  • Schema validation: validate message schemas with Confluent Schema Registry
  • Data sanitization: sanitize message keys and values to prevent injection
  • Audit logging: capture admin operations via authorizer logs (Apache Kafka has no built-in audit log; Confluent and other platforms add one)
  • Network segmentation: place brokers in private networks; restrict inter-broker communication

Common pitfalls and anti-patterns

Pitfall 1: too many partitions

Each partition increases Kafka’s overhead (file handles, memory, leader elections). Creating thousands of partitions when you only need dozens causes unnecessary complexity. Start with fewer partitions and increase only when needed.

Pitfall 2: ignoring consumer lag

If consumers cannot keep up with producers, lag grows indefinitely. Monitor lag continuously and scale consumers or optimize processing logic.

Pitfall 3: not using compression

Uncompressed messages waste network bandwidth and storage. Enable compression (LZ4, ZSTD, or Snappy) on producers.

Pitfall 4: auto.offset.reset = earliest without understanding consequences

This setting replays all messages from the beginning of the log. In production with large retention, this can overwhelm consumers. Use it deliberately, not as a default.

Pitfall 5: sending sensitive data unencrypted

Kafka does not encrypt messages by default. If you send PII, credentials, or sensitive data, enable encryption or do not send such data through Kafka.

Pitfall 6: not planning for partition count

Partition count can only grow after topic creation, never shrink, and adding partitions remaps keys to different partitions, breaking per-key ordering for existing data. If you need a clean remapping, you must recreate the topic. Plan partition count based on expected throughput and consumer parallelism requirements.

Quick recap

Key points

  • Kafka is a distributed log, not a queue; messages are retained and can be replayed
  • Topics divide into partitions for parallelism; same key goes to same partition
  • Consumer groups enable independent consumption by different services
  • At-least-once delivery is the default; exactly-once requires Kafka transactions
  • Rebalancing occurs when consumers join or leave; frequent rebalances hurt throughput
  • ZooKeeper (or KRaft in newer versions) manages cluster metadata

Pre-deployment checklist

- [ ] Replication factor set to 3 for critical topics
- [ ] min.insync.replicas configured appropriately
- [ ] Consumer lag monitoring configured and alerts set
- [ ] Producer retries and idempotency configured
- [ ] Compression enabled on producers
- [ ] Schema Registry deployed for schema validation
- [ ] ACLs configured for topic access control
- [ ] TLS/SSL encryption enabled for all connections
- [ ] Partition count planned based on throughput requirements
- [ ] Retention policy configured (hours, days, weeks)
- [ ] Dead letter topic configured for failed messages
- [ ] Consumer group offset reset policy defined
- [ ] Backup and disaster recovery plan documented

Conclusion

Kafka is a distributed log that happens to support messaging patterns. Topics are logs, partitions are shards, consumer groups enable independent consumption. The offset-based consumption model gives you replay capability that traditional queues lack.

For high-throughput event streaming, Kafka delivers. The operational complexity is real (ZooKeeper or KRaft for metadata, partition management, replication), but the scalability and durability are legitimate.

If you need true exactly-once, Kafka transactions are available but costly. For most pipelines, at-least-once with idempotent consumers is simpler and sufficient.
