Apache Kafka: Distributed Streaming Platform

Learn how Apache Kafka handles distributed streaming with partitions, consumer groups, exactly-once semantics, and event-driven architecture patterns.

published: reading time: 35 min read author: GeekWorkBench

Introduction

Apache Kafka is a distributed streaming platform built for high-throughput, fault-tolerant event streaming at scale. Originally developed at LinkedIn and later open-sourced through the Apache Foundation, Kafka has become the backbone of event-driven architectures across industries — powering real-time analytics pipelines, event sourcing systems, microservices communication, and log aggregation at companies like Netflix, Uber, and Airbnb.

Unlike traditional message queues that delete messages after consumption, Kafka persists messages in immutable logs. Producers write events to topics; consumers read from those topics independently. This durability and the ability to replay events make Kafka well-suited for systems where late-arriving consumers or audit trails matter.

This post covers the core concepts that make Kafka work: topics and partitions, consumer groups, offset management, exactly-once semantics, broker replication, dead letter queues, and backpressure handling. By the end, you’ll understand how Kafka achieves its legendary throughput, how to design consumer groups for parallelism, and how to avoid the common pitfalls in production.

Core Concepts

Topics and partitions

Kafka organizes data into topics. Unlike a queue where messages are consumed and deleted, Kafka topics are logs. Messages are appended and kept for a configurable retention period: hours, days, or indefinitely.

Topic: order-events
Partition 0: [msg1, msg2, msg3, msg5, msg8]
Partition 1: [msg4, msg6, msg7, msg9]
Partition 2: [msg10, msg11, msg12]

Each topic splits into partitions for parallelism. Partitions distribute across brokers. Within a partition, messages have a monotonically increasing offset that uniquely identifies each one.

Message keying

Producers can specify a key when publishing:

producer.send(new ProducerRecord("order-events", orderId, orderJson));

Kafka hashes the key to determine the partition. Messages with the same key always go to the same partition, which means ordering per key. All events for the same order arrive in the same partition, in order.

Partition assignment

Partitions assign to brokers at topic creation time:

Broker 1: Partition 0, Partition 3
Broker 2: Partition 1, Partition 4
Broker 3: Partition 2, Partition 5

The leader broker for each partition handles reads and writes. Followers replicate for fault tolerance.

Consumer groups

Kafka consumers belong to consumer groups. Each message in a topic delivers to one consumer within each group.

graph LR
    Producer -->|publish| Topic[Topic with 3 Partitions]
    Topic -->|P0| CG1[Consumer Group A]
    Topic -->|P1| CG1
    Topic -->|P2| CG1
    Topic -->|P0| CG2[Consumer Group B]
    Topic -->|P1| CG2
    Topic -->|P2| CG2

Group A might have one consumer processing all partitions. Group B might have three consumers, each owning one partition. Both groups receive all messages independently.

Rebalancing

When a consumer joins or leaves a group, Kafka rebalances partition assignments. The system redistributes partitions to maintain even load.

Rebalancing has a cost: during rebalancing, no consumer processes messages. If rebalances happen frequently, throughput suffers.

Offset management

Kafka tracks consumption progress via offsets. Consumers commit offsets to mark what they have processed:

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record);
    }
    consumer.commitSync();
}

If a consumer crashes after processing but before committing, it receives the same messages on restart. This is at-least-once delivery. With idempotent processing, duplicates are harmless.

Exactly-Once Delivery

Overview

Exactly-once is the gold standard: each message processes precisely once, no duplicates, no loss. Kafka offers exactly-once semantics via transactions.

The problem

Exactly-once is hard because processing involves multiple systems:

Kafka -> Consumer -> Database

If the consumer crashes after writing to the database but before committing the Kafka offset, the message reprocesses on restart, causing a duplicate database write.

Kafka transactions

Kafka transactions solve this by atomically committing offsets and output:

producer.initTransactions();
producer.beginTransaction();
producer.send(producerRecord);
producer.sendOffsetsToTransaction(consumer.offsets(), consumer.groupMetadata());
producer.commitTransaction();

If the transaction commits, the message writes and the offset commits atomically. If it aborts, neither happens. The consumer reads the message again.

When you need exactly-once

Most use cases do not need exactly-once. At-least-once with idempotent processing is simpler and performs better. Consider exactly-once only when:

  • Duplicate processing has serious consequences (financial transactions, inventory updates)
  • Your output system does not support idempotent writes
  • The cost is acceptable

For most event processing pipelines, at-least-once plus idempotency is the right choice.

End-to-end exactly-once flow

The exactly-once flow in practice:

sequenceDiagram
    participant Producer
    participant Kafka as Kafka Cluster
    participant Consumer as Consumer App
    participant DB as Output DB
    participant Coordinator as Transaction Coordinator

    Producer->>Kafka: send() with transactional ID
    Kafka->>Producer: acknowledged

    Consumer->>Kafka: poll() receives record
    Consumer->>DB: write to database
    Consumer->>Coordinator: sendOffsetsToTransaction()
    Coordinator->>Kafka: commit offsets + data atomically
    Kafka->>Coordinator: commit confirmed
    Coordinator->>Consumer: offset commit confirmed

    Note over Consumer,DB: If crash here: replay from committed offset, skip already-written records

The transaction coordinator bundles the database write and the offset commit together. They either both commit or both abort. When the consumer restarts, it resumes from the last committed offset — skipping any records that were already written to the output system.

Without transactions, there is a window between “wrote to database” and “committed offset.” Crash in that window and you get duplicates. Transactions close that window.

Broker Replication and Fault Tolerance

Replication and ISR

Kafka replicates partitions across brokers for fault tolerance. Each partition has a leader and multiple followers. Followers replicate messages from the leader, staying in sync.

graph LR
    subgraph Broker-1
        L1[Partition 0 Leader]
    end
    subgraph Broker-2
        F1[Partition 0 Follower - ISR]
    end
    subgraph Broker-3
        F2[Partition 0 Follower - ISR]
    end
    Producer -->|writes| L1
    L1 -->|replicate| F1
    L1 -->|replicate| F2
    L1 -->|consume| Consumer

In-Sync Replicas (ISR) are replicas that have fully caught up with the leader. Only ISR members can become leaders after a failure. The replication.factor setting controls how many replicas exist, and min.insync.replicas defines the minimum ISR size for acknowledging writes.

For example, with replication.factor=3 and min.insync.replicas=2, a partition has 3 replicas. Writes acknowledge when at least 2 replicas (the leader plus 1 follower) have persisted the message.

Key configuration defaults

ParameterDefaultDescriptionProduction recommendation
replication.factor1Number of replicas per partition3 for critical topics
min.insync.replicas1Minimum ISR for acknowledge2 (requires acks=all)
retention.ms7 daysMessage retention periodBased on replay requirements
acks1 (leader)Acknowledgment requiredall for critical data
compression.typeproducerCompression codeclz4 or zstd
max.in.flight.requests.per.connection5Unacknowledged requests1 for exactly-once

Dead Letter Queues

When processing fails and you cannot retry, messages need somewhere to go. Dead letter queues (DLQs) catch messages that consumer processing repeatedly fails on.

@Bean
public DeadLetterPublishingPostProcessor deadLetterPublishingPostProcessor(
        ConcurrentKafkaListenerContainerFactory<String, String> factory) {

    DefaultErrorHandler errorHandler = new DefaultErrorHandler(
        new FixedBackOff(1000L, 3)  // 3 retries, 1 second apart
    );

    // Send to DLQ after retries exhausted
    errorHandler.setBackOffMultiplier(2);
    factory.setCommonErrorHandler(errorHandler);

    return new DeadLetterPublishingPostProcessor(
        factory.getKafkaTemplate(),
        (record, exception) -> new TopicPartition(
            record.topic() + ".DLQ",  // convention: topic.DLQ
            record.partition()
        )
    );
}

This sends failed messages to order-events.DLQ after 3 retry attempts. The DLQ preserves the original topic, partition, and key so you can investigate without losing context.

DLQ design considerations

  • Monitoring: Alert when DLQ depth exceeds zero — messages piling up means something is wrong upstream
  • Retention: DLQ retention is often shorter than main topic; set a TTL or explicit cleanup job
  • Reprocessing: DLQ messages can be reprocessed by a debugging consumer or manually re-published to the original topic after fixing the issue
  • Causality tracking: Include the original exception stack trace or error code in the message value for debugging

Backpressure Handling

Kafka producers send messages faster than consumers can process them. Backpressure management prevents unbounded lag growth.

Consumer-side backpressure

max.poll.records limits how many messages a consumer fetches per poll:

config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);  // process 100, then poll again
config.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);  // 5 minute max poll interval

fetch.min.bytes and fetch.max.wait.ms control how much data the consumer waits for:

config.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024 * 1024);  // wait for at least 1MB
config.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);  // or 500ms, whichever comes first

Producer-side backpressure

Producer buffer memory (buffer.memory) and batch.size interact to create natural backpressure. If the broker is slow and the send buffer fills, send() blocks or throws an exception:

config.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024);  // 32 MB buffer
config.put(ProducerConfig.BATCH_SIZE_CONFIG, 16 * 1024);  // 16 KB batches
config.put(ProducerConfig.LINGER_MS_CONFIG, 5);  // wait up to 5ms to batch

Backpressure signals to watch

SignalMeaningAction
Consumer lag growingProducers outpacing consumersAdd consumers or optimize processing
Producer send() blockingBroker throughput saturatedAdd brokers or reduce producer load
Under-replicated partitions > 0Followers falling behind leaderCheck disk I/O, network, or broker load
Request timeout exceptionsBrokers too slow to respondIncrease request.timeout.ms or scale

Delivery Guarantees

Kafka provides three delivery guarantee levels. Understanding when each applies matters for system design.

At-most-once delivery

Messages may be lost but are never duplicated. This happens when consumers commit offsets before processing:

consumer.commitAsync();  // commit before processing
process(record);  // if crash here, message is lost

Use at-most-once when:

  • Duplicate messages cost more than missed messages (sensor data aggregation, metrics)
  • You need lowest possible latency and can tolerate gaps

At-least-once delivery (default)

Messages are never lost but may be duplicated. Consumer commits after processing:

process(record);  // do work first
consumer.commitSync();  // then commit offset

If the consumer crashes after processing but before committing, the same messages reprocess on restart. With idempotent operations (writes with unique keys), duplicates are safe.

Use at-least-once when:

  • Missing messages is worse than duplicates (inventory updates, payment processing)
  • Your consumers are idempotent

Exactly-once delivery

Each message processes exactly once, no duplicates, no loss. Requires Kafka transactions as shown earlier in this post.

Use exactly-once when:

  • Duplicate processing has serious consequences
  • Your output system cannot handle idempotent writes
  • The performance cost is acceptable
GuaranteeMessage LossDuplicatesLatencyBest Use Case
At-most-onceYesNoLowestTelemetry, metrics, gaps acceptable
At-least-onceNoYesMediumMost use cases; idempotent consumers
Exactly-onceNoNoHighestFinancial, inventory, idempotency-impossible

Consumer Group Coordination

Consumer group behavior depends on partition assignment strategy and coordination needs.

Sticky assignor

The default sticky assignor minimizes partition movement during rebalances. When a consumer leaves, its partitions stay together when reassigned rather than being scattered across remaining consumers.

config.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
    StickyAssignor.class.getName());

Benefit: fewer message reorderings during rebalance since partitions that were together stay together.

Cooperative sticky assignor

For minimal disruption during rebalances, use cooperative sticky assignor. It allows incremental rebalancing without stopping consumption entirely:

config.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
    CooperativeStickyAssignor.class.getName());

The consumer continues processing while partition ownership shifts incrementally.

Standalone consumers

Sometimes you need a single consumer not part of a group:

consumer.subscribe(List.of("topic"));  // group-based
// vs
consumer.assign(List.of(new TopicPartition("topic", 0)));  // standalone, manual assignment

Standalone consumers manually assign partitions and manage their own offsets. Useful for:

  • One-off administrative tasks (consuming from beginning to rebuild state)
  • Specialized processing pipelines that should not interfere with normal consumers
  • Debugging and testing

Maximum parallelism calculation

For a given topic, maximum consumer parallelism equals partition count:

Topic with 12 partitions
  → Up to 12 consumers in the group (each gets 1 partition)
  → Adding more consumers does nothing (no extra partitions)

If you need 20 consumers for parallel processing:
  → Topic must have at least 20 partitions

Partition count is fixed at topic creation. Plan accordingly.

Kafka Streams example

Kafka Streams is a client library for building real-time stream processing applications. The word count example is the canonical demonstration. It counts occurrences of words across an infinite stream of text events:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;

import java.util.Arrays;
import java.util.Properties;

public class WordCountApplication {
    public static void main(String[] args) {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Source: read from input topic
        KStream<String, String> textLines = builder.stream("text-lines-topic");

        // Process: split into words, group, count
        textLines
            .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
            .groupBy((key, word) -> word)
            .count(Materialized.as("word-counts-store"))
            .toStream()
            .to("word-counts-output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), config);
        streams.start();
    }
}

How it works:

  1. flatMapValues splits each input line into lowercase words
  2. groupBy groups by word, discarding the message key
  3. count maintains a running count per word in a state store
  4. toStream.to writes results to the output topic

Scaling behavior: each partition processes words independently. The word “kafka” appearing in partitions 0 and 1 produces two separate counts that must be aggregated downstream if you need a global count. For true global word count, either use a single partition or aggregate in a subsequent step.

Fault tolerance: Kafka Streams checkpointing persists state to Kafka topics. If a stream processor crashes, it resumes from the last checkpointed position without data loss.

Topic-Specific Deep Dives

Advanced Partition Sizing

The number of partitions determines parallelism for both producers and consumers. Getting it right at topic creation matters since partition count is immutable.

Partition count sizing

Factors that drive partition count:

FactorImpactConsideration
Desired consumer parallelismMore partitions means more concurrent consumersOne partition per consumer in a group
Producer throughput targetEach partition handles about 10 MB/sPartitions should be >= target_MB_s / 10
Maximum broker scaleKafka performance degrades past roughly 4000 partitions per brokerConsider broker count when sizing
Key cardinalityHigh-cardinality keys distribute evenlyAvoid partitions that become hot spots

Sizing formula:

def calculate_partition_count(
    target_throughput_mbps: float,
    max_consumer_parallelism: int,
    num_brokers: int,
    replication_factor: int = 3
) -> dict:
    """
    Calculate recommended partition count for a topic.
    """
    # Throughput-based: each partition handles ~10 MB/s reliably
    partitions_for_throughput = math.ceil(target_throughput_mbps / 10)

    # Consumer-based: need at least as many partitions as consumers
    partitions_for_consumers = max_consumer_parallelism

    # Broker-based: avoid too many partitions per broker
    # Guideline: < 4000 partitions per broker for good performance
    max_partitions_per_broker = 4000
    partitions_for_broker_capacity = num_brokers * max_partitions_per_broker

    recommended = max(partitions_for_throughput, partitions_for_consumers)
    recommended = min(recommended, partitions_for_broker_capacity)

    # Account for replication: total partitions including replicas
    total_partitions_in_cluster = recommended * replication_factor

    return {
        'recommended_partitions': recommended,
        'based_on': 'throughput' if partitions_for_throughput >= partitions_for_consumers else 'consumer_parallelism',
        'throughput_achievable_mbps': recommended * 10,
        'max_consumer_parallelism': recommended,
        'total_partition_slots_used': total_partitions_in_cluster,
        'partitions_per_broker': recommended / num_brokers
    }

# Example: 100 MB/s target, 30 consumers, 6 brokers
result = calculate_partition_count(100, 30, 6)
# Returns: recommended=30 (based on consumer count),
#          throughput_achievable=300 MB/s,
#          partitions_per_broker=5

Practical guidelines:

  • Start conservative: 6-12 partitions for most topics
  • Increase when consumer lag appears and more consumers cannot be added
  • Monitor per-partition throughput to detect hot spots
  • When repartitioning is needed, create a new topic and migrate data with a consumer that reads from both

Producer and consumer implications:

  • More partitions means more producer connections and higher client-side memory
  • More partitions means longer leader election time after broker failure
  • Consumers with many partitions need more heap for offset tracking

Kafka use cases

Kafka excels at specific workloads:

Event streaming

Capture events from many sources and distribute to multiple consumers:

Clickstream -> Kafka -> Analytics
                     -> Personalization
                     -> Fraud Detection
                     -> Audit Log

Message bus replacement

Replace traditional message queues with Kafka for better throughput and replay capability.

Data integration

Connect different systems without point-to-point coupling:

Database CDC -> Kafka -> Search Index
                     -> Cache
                     -> Data Warehouse
                     -> ML Pipeline

Change Data Capture from databases fits this pattern well. Any system can consume the stream without touching the source database.

Kafka vs traditional queues

AspectKafkaTraditional Queue
RetentionDays/weeks/foreverUntil consumed
ReplayYesNo (usually)
OrderingPer partitionPer queue
ThroughputVery highModerate
Consumer groupsIndependent per groupShared or exclusive

Kafka’s retention and replay make it unique. You can reprocess historical data if your processing logic changes. Traditional queues cannot do this.

For broader event-driven patterns, see our post on event-driven architecture. For pub/sub patterns that overlap with Kafka’s topic model, see pub/sub patterns.

Trade-off Analysis

When designing Kafka-based systems, understanding trade-offs helps you make informed decisions.

Throughput vs Durability

ApproachThroughputDurabilityLatencyComplexity
acks=1 (leader only)HighestLowLowestLowest
acks=all + min.isr=2MediumHighMediumMedium
acks=all + min.isr=3LowerHighestHigherMedium
Exactly-once modeLowestHighestHighestHighest

Partition Count vs Overhead

PartitionsMax ConsumersMemory OverheadLeader Election Time
6-126-12LowFast (< 100ms)
50-10050-100MediumMedium (1-2s)
500+500+HighSlow (10+s)
4000+/brokerDependsVery highVery slow

Retention vs Storage Cost

RetentionStorage MultiplierReplay WindowBest For
1 hour1x baselineLimitedReal-time only
7 days5-7x baselineOne weekMost applications
30 days20-30x baselineOne monthRegulatory compliance
IndefiniteVariableFull historyEvent sourcing, audit logs

Consumer Scaling Constraints

ScenarioLimiting FactorSolution
Lag growing, spare partitionsConsumer count < partitionsAdd consumers up to partition count
Lag growing, no spare partitionsPartition count limits parallelismIncrease partitions (recreate topic)
Lag growing, many consumers but still behindProcessing bottleneckOptimize logic, scale horizontally with more partitions
Rebalances causing throughput dropsFrequent consumer restartsUse sticky/cooperative sticky assignor

Exactly-once vs At-least-once Decision Matrix

FactorUse At-least-onceUse Exactly-once
Duplicate processing costLow (metrics, analytics)High (financial, inventory)
Output system supportAnyMust support idempotent writes
Throughput requirementHighModerate to high
Operational complexityLowerHigher
End-to-end latency toleranceLowHigher

Production Failure Scenarios

Understanding how Kafka fails in production helps you design more resilient systems.

Scenario 1: Broker network partition

When a broker loses network connectivity but doesn’t crash:

Broker-2 becomes unreachable
  → Partition 0 leader (on Broker-2) stops responding
  → Followers on Broker-1 and Broker-3 detect leader timeout
  → Kafka controller triggers leader election
  → ISR shrinks to exclude unreachable broker
  → min.insync.replicas check: if remaining < min.insync.replicas, writes fail
  → Producer receives NotEnoughReplicasException

Impact: Temporary unavailability of the partition. If unclean.leader.election=true and no ISR available, potential message loss.

Scenario 2: Zombie consumer problem

Consumer crashes but doesn’t leave group gracefully:

Consumer-1 processes messages and writes to database
Consumer-1 crashes AFTER database write but BEFORE offset commit
  → Consumer-1's partition sits unassigned
  → No consumer processes those messages (lag grows)
  → After session.timeout expires, Kafka triggers rebalance
  → New consumer picks up partition, replays same messages
  → Without idempotent processing: duplicate writes occur

The fix: idempotent consumers handle this gracefully. If you cannot make processing idempotent, use exactly-once semantics.

Scenario 3: Schema Registry mismatch

Producer and consumer evolve schemas independently:

Producer v1: { "userId": "string", "action": "string" }
Producer v2: { "userId": "string", "action": "string", "metadata": { "source": "string" } }
Consumer still expects v1 schema
  → Deserialization fails
  → Message goes to DLQ (if configured) or consumer crashes
  → Backlog of unprocessed messages accumulates

The fix: use Schema Registry with compatibility checking. Never evolve schemas in incompatible ways.

Scenario 4: Clock skew in clustered deployment

Brokers on different machines have clock skew:

Broker-1 clock: 1000
Broker-2 clock: 990 (10 seconds behind)
Broker-3 clock: 1010 (10 seconds ahead)

Leader write at timestamp 1005 (Broker-1 time)
  → Broker-2 sees this as future timestamp (1005 > 990)
  → Log segment index corrupted on Broker-2 replica
  → During leader election, Broker-2's replica deemed invalid
  → ISR shrinks, potential data loss if Broker-1 goes down

The fix: use NTP synchronization across all brokers. Monitor clock drift with automated alerts.

Scenario 5: Partition reassignment during peak load

Operations team rebalances partitions while producers are at full throughput:

Original: Broker-1 has partitions 0,1,2; Broker-2 has 3,4,5
Reassignment triggered:
  → New replicas start copying from leaders (network spike)
  → ISR temporarily expands (old + new replicas receiving)
  → Controller overwhelmed by metadata updates
  → Request latency P99 spikes to 10+ seconds
  → Producer buffers fill, send() blocks
  → Consumer lag grows as brokers struggle to keep up

The fix: schedule reassignments during low-traffic windows. Use Cruise Control for automated, throttled reassignment.

FailureImpactMitigation
Broker goes downPartition leader election; temporary unavailabilityConfigure replication factor of 3; use ISR configuration
Controller failureCluster-wide coordination pauseRun multiple brokers; use ZooKeeper/KRaft for controller election
Network partitionFollowers fall out of ISR; potential data lossMonitor ISR size; alert when replicas fall behind
Producer retry stormDuplicate messages after transient failuresEnable idempotent producer; design idempotent consumers
Consumer rebalance stormThroughput drops during rebalancingUse sticky partition assignment; avoid frequent consumer restarts
Offset commit failureMessages reprocessed or skippedUse transactional producers with exactly-once semantics when needed
Partition imbalanceSome brokers overloaded while others idleMonitor partition distribution; use Cruise Control for rebalancing
Data loss on leader changeUnder-replicated partitions lose messagesEnsure min.insync.replicas >= 2; acks=all on producers

Common Pitfalls / Anti-Patterns

Pitfall 1: too many partitions

Each partition increases Kafka’s overhead (file handles, memory, leader elections). Creating thousands of partitions when you only need dozens causes unnecessary complexity. Start with fewer partitions and increase only when needed.

Not planning for partition count

Partition count is fixed at topic creation. If you need more later, you must recreate the topic. Plan partition count based on expected throughput and consumer parallelism requirements.

Pitfall 2: ignoring consumer lag

If consumers cannot keep up with producers, lag grows indefinitely. Monitor lag continuously and scale consumers or optimize processing logic.

Pitfall 3: not using compression

Uncompressed messages waste network bandwidth and storage. Enable compression (LZ4, ZSTD, or Snappy) on producers.

Pitfall 4: auto.offset.reset = earliest without understanding consequences

This setting replays all messages from the beginning of the log. In production with large retention, this can overwhelm consumers. Use it deliberately, not as a default.

Pitfall 5: sending sensitive data unencrypted

Kafka does not encrypt messages by default. If you send PII, credentials, or sensitive data, enable encryption or do not send such data through Kafka.

Interview Questions

1. What is the fundamental difference between Kafka and a traditional message queue?

Expected answer points:

  • Kafka is a distributed streaming platform built around the concept of a durable log, not a queue
  • Messages in Kafka topics are retained for a configurable period (hours, days, indefinitely) rather than being deleted upon consumption
  • This retention enables replay capability: consumers can re-read historical data from any point in the log
  • Traditional queues typically delete messages after consumption and do not support replay
  • Kafka's topic model supports multiple independent consumer groups, each reading the same data independently
2. How does Kafka guarantee ordering within a partition?

Expected answer points:

  • Within a partition, messages have monotonically increasing offsets that define total order
  • Producers can specify a message key; Kafka hashes the key to determine partition assignment
  • All messages with the same key go to the same partition, guaranteeing order per key
  • Consumers read messages in offset order from their assigned partitions
  • Cross-partition ordering is not guaranteed; only per-partition ordering is maintained
3. Explain the difference between at-least-once, at-most-once, and exactly-once delivery semantics in Kafka.

Expected answer points:

  • At-most-once: consumer commits offsets before processing; messages may be lost but never duplicated (lowest latency)
  • At-least-once (default): consumer processes messages then commits offsets; messages may be duplicated but never lost
  • Exactly-once: uses Kafka transactions to atomically commit offsets and output writes; no duplicates, no loss (highest latency)
  • The choice depends on use case: at-least-once with idempotent consumers handles most scenarios well
  • Exactly-once should only be used when duplicate processing has serious consequences and the cost is acceptable
4. What is a Consumer Group and how does it affect message consumption?

Expected answer points:

  • A consumer group is a set of consumers cooperating to consume messages from a topic
  • Each partition is delivered to exactly one consumer within a group (ensuring parallelism)
  • Different consumer groups each receive all messages independently
  • When a consumer joins or leaves, Kafka rebalances partition assignments across the group
  • Rebalancing temporarily pauses consumption; frequent rebalances hurt throughput
5. What is ISR (In-Sync Replicas) and why does it matter for durability?

Expected answer points:

  • ISR are replicas that have fully caught up with the partition leader
  • Only ISR members can become leader after a failure
  • The min.insync.replicas setting determines the minimum ISR size for acknowledging writes
  • With replication.factor=3 and min.insync.replicas=2, writes acknowledge when at least 2 replicas persist the message
  • If all in-sync replicas fall behind or fail, the partition becomes unavailable for writes
6. How do you handle messages that fail processing repeatedly in Kafka?

Expected answer points:

  • Dead Letter Queues (DLQs) catch messages that exhaust retries
  • Configure a DefaultErrorHandler with FixedBackOff for retry behavior (e.g., 3 retries, 1 second apart)
  • DLQ preserves original topic, partition, and key for debugging
  • Monitor DLQ depth — messages piling up indicate upstream issues
  • DLQ messages can be reprocessed after fixing the underlying issue
7. What are the key configuration parameters for ensuring durability in Kafka?

Expected answer points:

  • replication.factor = 3 (or higher) for critical topics to ensure redundancy
  • min.insync.replicas = 2 (requires acks=all) so writes persist to multiple replicas
  • acks = all (wait for all ISR to acknowledge before confirming write)
  • retention.ms configured based on replay requirements (hours, days, or indefinite)
  • Enable producer idempotency to prevent duplicates during retries
8. How does the partition count affect Kafka performance and scalability?

Expected answer points:

  • Partition count determines maximum consumer parallelism (one consumer per partition per group)
  • Each partition handles roughly 10 MB/s throughput; size partitions accordingly
  • Kafka performance degrades past ~4000 partitions per broker
  • Partition count is immutable after topic creation; plan ahead
  • More partitions means higher overhead (file handles, memory, leader election time)
9. What is the difference between ZooKeeper and KRaft modes in Kafka?

Expected answer points:

  • ZooKeeper: traditional metadata management (partition leadership, ISR, consumer offsets, ACLs)
  • KRaft: Kafka's built-in consensus protocol (Kafka 3.3+) removes ZooKeeper dependency
  • KRaft scales better — ZooKeeper struggles with millions of znodes (common with many consumer groups)
  • KRaft enables simpler cluster setup and faster controller election
  • Kafka 3.3+ supports live migration from ZooKeeper to KRaft without downtime
10. How would you design a Kafka-based system to handle backpressure?

Expected answer points:

  • Consumer-side: limit max.poll.records and configure fetch.min.bytes/fetch.max.wait.ms
  • Producer-side: buffer.memory and batch.size create natural backpressure when broker is slow
  • Monitor consumer lag — growing lag signals producers outpacing consumers
  • Watch for under-replicated partitions, producer send() blocking, and request timeout exceptions
  • Scale consumers horizontally (if partitions allow) or optimize processing logic
  • Scale brokers if broker throughput is the bottleneck
11. What is log compaction in Kafka and when would you use it?

Expected answer points:

  • Log compaction retains the latest message for each key within a partition, discarding older messages with the same key
  • Unlike time-based retention which deletes messages after a period, compaction keeps the most recent value for each key indefinitely
  • Use cases: maintaining a lookup table or changelog where only the latest state matters (e.g., customer profile updates)
  • Enables Kafka as a key-value store or database for event sourcing where you need the current state, not full history
  • Compaction runs in the background and does not block normal writes
12. How does Kafka handle schema evolution with Schema Registry?

Expected answer points:

  • Schema Registry stores and validates message schemas (Avro or JSON Schema) separately from Kafka
  • Producers and consumers register and retrieve schemas by subject name, enabling schema validation at publish time
  • Schema compatibility modes control evolution: BACKWARD, FORWARD, FULL, NONE
  • BACKWARD compatibility allows consumers reading new data to work with old schemas (most common)
  • Without Schema Registry, incompatible schema changes cause deserialization errors or silent data corruption
  • Schema Registry also compresses message payloads by storing schema IDs instead of full schemas in each message
13. What is Kafka Connect and what are the key components?

Expected answer points:

  • Kafka Connect is a framework for scalably and reliably streaming data between Kafka and external systems
  • Connectors are plugins that define how to interact with source systems (databases, S3, JDBC) or sink systems
  • Workers are the processes that execute connectors; they can run in standalone or distributed mode
  • Converters handle serialization/deserialization of record keys and values (Avro, JSON, Parquet)
  • Transforms are optional lightweight modifications to records (e.g., filtering, adding fields)
  • Offset storage: Connect manages offset tracking internally, typically in a Kafka topic
14. Explain the role of the Transaction Coordinator in Kafka exactly-once semantics.

Expected answer points:

  • The Transaction Coordinator is a Kafka broker component that manages the two-phase commit protocol for transactions
  • Phase 1 (prepare): producer sends commit request; coordinator writes a prepare marker to all involved partitions
  • Phase 2 (commit): coordinator writes a commit marker atomically across all partitions
  • If the producer crashes before commit, the coordinator detects and aborts the transaction on restart
  • Consumer uses the transaction marker to filter out aborted transactions via isolation.level=read_committed
  • The transactional.id ensures exactly-once semantics even across producer restarts
15. How does the consumer partition assignment strategy affect performance?

Expected answer points:

  • Range assignor: assigns partitions contiguously per topic; can cause imbalance if topic partition counts differ
  • RoundRobin assignor: distributes partitions evenly across consumers regardless of topic; better balance
  • Sticky assignor: minimizes partition movement during rebalances, reducing reprocessing overhead
  • Cooperative sticky assignor: allows incremental rebalancing without full stop-the-world pauses
  • Choice affects rebalance duration, message ordering during rebalance, and consumer CPU utilization
  • For stateful consumers, sticky assignment prevents costly state migration
16. What are the trade-offs between increasing partitions versus adding consumers?

Expected answer points:

  • Adding consumers only helps if there are spare partitions; max parallelism equals partition count
  • Increasing partitions increases parallelism but is permanent and costly: more file handles, memory overhead, longer leader election
  • More partitions also increases end-to-end latency (more replication, more producer batching decisions)
  • Rebalancing with many partitions takes longer, temporarily impacting throughput
  • Rule of thumb: target 10-100 MB/s per partition throughput, start conservative (6-12 partitions)
  • If lag is growing and no spare partitions, you must add partitions (requires topic recreation or new topic)
17. How do you monitor Kafka consumer lag and what are acceptable thresholds?

Expected answer points:

  • Consumer lag is the difference between the latest offset and the consumer's committed offset
  • Kafka exposes lag per partition via KafkaConsumer.metrics() or tools like kafka-consumer-groups
  • JMX metrics: ConsumerLag and FetchManager metrics in the kafka.consumer group
  • Monitoring tools: Confluent Control Center, Kafka Manager, Prometheus with JMX exporter, Datadog
  • Acceptable threshold depends on SLA: for 5-minute SLA, lag should stay under 5 minutes
  • Growing lag signals producers outpacing consumers; investigate processing bottlenecks or scale consumers
18. What is the purpose of Kafka's retention policy and how do you choose the right setting?

Expected answer points:

  • Retention policy determines how long messages are kept before being eligible for deletion
  • Configured via retention.ms (time-based) or retention.bytes (size-based) per topic
  • Choose based on use case: event sourcing needs long retention for replay, real-time analytics may need shorter
  • Consider downstream consumers needing to reprocess historical data if processing logic changes
  • Longer retention increases storage costs; balance between replay window and cost
  • For compliance or audit requirements, retention may need to be days or weeks
19. How does Kafka achieve fault tolerance at the broker level?

Expected answer points:

  • Partition replication: each partition has a leader and ISR followers on different brokers
  • If a broker fails, Kafka automatically elects a new leader from ISR members
  • min.insync.replicas ensures writes persist to multiple replicas before acknowledgment
  • Controller broker manages partition leadership and cluster coordination (backed by ZooKeeper or KRaft)
  • unclean.leader.election setting controls behavior when no ISR is available: can lose messages if set to true
  • racks configuration allows placing replicas in different physical locations for rack-aware failure handling
20. What are the key differences between Kafka and other streaming platforms like Apache Pulsar or RabbitMQ?

Expected answer points:

  • Kafka uses partition-centric model with consumer groups; Pulsar uses subscription types (exclusive, failover, shared, key-shared)
  • Pulsar separates storage into tiered storage (BookKeeper for real-time, Apache Tiered Storage for historical)
  • RabbitMQ is a traditional broker with exchanges and queues, not a log-based system; no native replay
  • Pulsar supports geo-replication out-of-the-box; Kafka requires MirrorMaker
  • Kafka has a larger ecosystem and community; Pulsar offers better multi-tenancy and geo-replication
  • For pure message queuing with acknowledgments, RabbitMQ excels; for high-throughput event streaming with replay, Kafka excels

Further Reading

Conclusion

Key points

  • Kafka is a distributed log, not a queue; messages are retained and can be replayed
  • Topics divide into partitions for parallelism; same key goes to same partition
  • Consumer groups enable independent consumption by different services
  • At-least-once delivery is the default; exactly-once requires Kafka transactions
  • Rebalancing happens when consumers join or leave; frequent rebalances hurt throughput
  • ZooKeeper (or KRaft in newer versions) manages cluster metadata

Configuration Essentials

Core Setup

  • Partition count planned for target throughput and consumer parallelism
  • Replication factor set to 3 for critical topics
  • min.insync.replicas configured to 2 (requires acks=all)
  • Retention policy set based on replay requirements

Producer Configuration

  • Idempotent producer enabled
  • Compression enabled (LZ4 or ZSTD recommended)
  • acks=all for critical data
  • retries configured with appropriate backoff

Consumer Configuration

  • Consumer group offset reset policy defined
  • max.poll.records tuned for processing capacity
  • session.timeout appropriate for your use case
  • Partition assignment strategy selected (sticky or cooperative sticky for production)

Operations and Monitoring

Observability checklist

Metrics to monitor
  • Consumer lag: difference between latest offset and consumer position (critical for SLAs)
  • Under-replicated partitions: partitions without full replication
  • ISR size: In-Sync Replicas count per partition
  • Message throughput: messages and bytes per second per topic
  • Request latency: P99 producer and consumer request latencies
  • Disk usage: broker disk utilization and growth rate
  • Consumer group status: active members and partition assignments
  • Controller status: leader elections and controller changes
Logs to capture
  • Broker startup and shutdown events
  • Partition leader election events
  • Consumer group rebalancing events
  • Producer acknowledgment failures and retries
  • Controller changes and election events
  • Under-replicated partition events
  • Disk space warnings
Alerts to configure
  • Consumer lag exceeds SLA threshold (for example, more than 5 minutes behind)
  • Under-replicated partitions greater than 0
  • Broker disk usage above 80%
  • Producer error rate above 1%
  • Consumer group has no active members
  • Controller is unavailable
  • Messages per second exceeds capacity threshold
  • Request latency P99 exceeds threshold

ZooKeeper and KRaft storage requirements

Kafka needs somewhere to store cluster metadata — ZooKeeper traditionally, KRaft in Kafka 3.3+.

What is stored:

  • Partition leadership and ISR membership
  • Consumer group offsets and membership
  • Access control lists (ACLs)
  • Topic configurations
  • Delegation token information
ZooKeeper/KRaft storage ≈
    partitions × (leadership + ISR state)
  + consumer_groups × (member offsets + metadata)
  + topics × (configs + ACLs)
  + delegation_tokens

Typical storage needs:

Cluster SizeTopicsPartitionsConsumer GroupsZooKeeper/KRaft Storage
Small (3 brokers)502003050-200 MB
Medium (6 brokers)2001,000150200-500 MB
Large (12 brokers)5005,0005001-3 GB
Very large (24+ brokers)1,000+20,000+2,000+5-10 GB

Storage is not the real bottleneck. Znode count is. ZooKeeper falls over when there are millions of child znodes — which happens when you have many consumer group members or partition replicas. KRaft mode removes ZooKeeper dependency and scales better.

If you are still on ZooKeeper, migrate to KRaft. Kafka 3.3+ supports live migration — no downtime needed.

Deployment Readiness

Pre-deployment checklist

- [ ] Replication factor set to 3 for critical topics
- [ ] min.insync.replicas configured appropriately
- [ ] Consumer lag monitoring configured and alerts set
- [ ] Producer retries and idempotency configured
- [ ] Compression enabled on producers
- [ ] Schema Registry deployed for schema validation
- [ ] ACLs configured for topic access control
- [ ] TLS/SSL encryption enabled for all connections
- [ ] Partition count planned based on throughput requirements
- [ ] Retention policy configured (hours, days, weeks)
- [ ] Dead letter topic configured for failed messages
- [ ] Consumer group offset reset policy defined
- [ ] Backup and disaster recovery plan documented

Security

  • SASL/SCRAM or mTLS configured for authentication
  • ACLs configured for topic access control
  • TLS/SSL encryption enabled for all connections
  • Sensitive data encryption configured

Security checklist

  • Authentication: use SASL/PLAIN or SCRAM for client authentication; mTLS for certificate-based auth
  • Authorization: implement ACLs to restrict topic access; principle of least privilege
  • Encryption in transit: enable SSL/TLS for all broker and client connections
  • Encryption at rest: use disk encryption or Kafka’s Secret API for sensitive data
  • Schema validation: validate message schemas with Confluent Schema Registry
  • Data sanitization: sanitize message keys and values to prevent injection
  • Audit logging: enable Kafka’s audit logging for admin operations
  • Network segmentation: place brokers in private networks; restrict inter-broker communication

Category

Related Posts

Exactly-Once Delivery: The Elusive Guarantee

Explore exactly-once semantics in distributed messaging - why it's hard, how Kafka and SQS approach it, and practical patterns for deduplication.

#distributed-systems #messaging #kafka

Ordering Guarantees in Distributed Messaging

Understand how message brokers provide ordering guarantees, from FIFO queues to causal ordering across partitions, and the trade-offs in distributed systems.

#distributed-systems #messaging #kafka

Dead Letter Queues: Handling Message Failures Gracefully

Design and implement Dead Letter Queues for reliable message processing. Learn DLQ patterns, retry strategies, monitoring, and recovery workflows.

#data-engineering #dead-letter-queue #kafka