Event Sourcing
Storing state changes as immutable events. Event store implementation, event replay, schema evolution, and comparison with traditional CRUD approaches.
Most systems store current state. You have an Order with status “shipped.” That is what the database knows. If you want to know the order’s history, you add created_at and updated_at columns, or a separate audit log, or a history table.
Event sourcing stores the history directly. Instead of “order status is shipped,” you store “order was placed,” “payment received,” “order shipped.” The current state comes from replaying these events.
This changes how you think about data: current state becomes a derived value, and every change is recorded.
The Core Idea
In event sourcing, you store events, not state. An event represents something that happened:
- OrderPlaced
- PaymentReceived
- OrderShipped
- OrderDelivered
To find the current state of an order, replay all its events in order. To find history, read the events directly.
```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

# Events
@dataclass
class OrderPlaced:
    order_id: str
    customer_id: str
    items: list
    timestamp: datetime

@dataclass
class PaymentReceived:
    order_id: str
    amount: Decimal
    timestamp: datetime

@dataclass
class OrderShipped:
    order_id: str
    tracking_number: str
    timestamp: datetime

# Event store appends events
event_store.append(OrderPlaced(order_id="123", ...))
event_store.append(PaymentReceived(order_id="123", ...))
event_store.append(OrderShipped(order_id="123", ...))

# To get current state, replay events
def get_order_state(order_id):
    events = event_store.get_events_for(order_id)
    state = Order()
    for event in events:
        state = apply_event(state, event)
    return state
```
Real implementations handle concurrency, snapshots, and versioning. The principle holds.
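Concurrency handling usually takes the form of optimistic locking: the writer passes the stream version it last saw, and the append is rejected if another writer got there first. A minimal sketch (the `ConcurrencyError` name and in-memory stream are illustrative, not a real event store API):

```python
class ConcurrencyError(Exception):
    """Raised when the stream moved past the version the writer expected."""

class EventStream:
    """Toy in-memory stream demonstrating optimistic concurrency."""

    def __init__(self):
        self.events = []

    @property
    def version(self):
        return len(self.events)

    def append(self, event, expected_version):
        # Fail fast instead of silently interleaving with a concurrent
        # writer; the caller reloads the stream and retries.
        if self.version != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, stream is at {self.version}"
            )
        self.events.append(event)
```

On conflict, the caller replays the newer events, re-validates its command against the updated state, and retries the append.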
```mermaid
flowchart LR
    Command[("Command<br/>PlaceOrder")] --> Append[("Event Store<br/>Append-only")]
    Append --> Stream[("Event Stream<br/>Order-123")]
    Stream --> Proj1[("Projection<br/>Order State")]
    Stream --> Proj2[("Projection<br/>Customer History")]
    Stream --> Proj3[("Projection<br/>Analytics Events")]
    Proj1 -.->|periodic| Snapshot[("Snapshot<br/>Version N")]
    Snapshot -->|faster rebuild| Proj1
    ReadModel1[("Read Model<br/>Order Detail")] <--> Proj1
    ReadModel2[("Read Model<br/>Customer View")] <--> Proj2
    ReadModel3[("Read Model<br/>Dashboard")] <--> Proj3
```
Event Store Implementation
An event store is an append-only log of events. Properties:
- Events are immutable once written
- Events are organized by aggregate (entity)
- Events are ordered by timestamp or sequence number
- Events include metadata: timestamp, user, correlation ID
Simple implementations use a relational table:
```sql
CREATE TABLE events (
    event_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    aggregate_id VARCHAR(100) NOT NULL,
    aggregate_type VARCHAR(50) NOT NULL,
    event_type VARCHAR(100) NOT NULL,
    event_data JSON NOT NULL,
    metadata JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_aggregate (aggregate_id, created_at)
);
```
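Appending and reading against a table of this shape is a few lines of code. A toy sketch, substituting SQLite for the MySQL DDL above so it runs self-contained (function names are illustrative):

```python
import json
import sqlite3

# In-memory database standing in for the events table above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id INTEGER PRIMARY KEY AUTOINCREMENT,
        aggregate_id TEXT NOT NULL,
        aggregate_type TEXT NOT NULL,
        event_type TEXT NOT NULL,
        event_data TEXT NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

def append_event(aggregate_id, aggregate_type, event_type, data):
    # Append-only: there is no UPDATE or DELETE path.
    conn.execute(
        "INSERT INTO events (aggregate_id, aggregate_type, event_type, event_data) "
        "VALUES (?, ?, ?, ?)",
        (aggregate_id, aggregate_type, event_type, json.dumps(data)),
    )

def get_events_for(aggregate_id):
    # Ordered by the monotonic event_id, which stands in for a sequence number.
    rows = conn.execute(
        "SELECT event_type, event_data FROM events "
        "WHERE aggregate_id = ? ORDER BY event_id",
        (aggregate_id,),
    )
    return [(event_type, json.loads(event_data)) for event_type, event_data in rows]
```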
Production event stores like EventStoreDB, Axon, and Marten provide stream management, snapshots for performance, projections for building read models, and subscriptions for reacting to new events.
Event Replay for State Reconstruction
Given all events for an aggregate, you can reconstruct its state at any point in time. This works for debugging, auditing, and recovery.
“Show me the order state as it was on March 15 at 3pm”: replay events up to that timestamp.
“Why did this order get cancelled?”: read the events and see exactly what happened and in what order.
Bug in projection logic? Rebuild the read model from scratch by replaying all events.
Snapshots improve performance for long-lived aggregates. Store periodic snapshots and replay only events since the last snapshot instead of replaying everything from the beginning.
```python
# With snapshots
def get_order_state(order_id, as_of_date):
    snapshot = event_store.get_snapshot_before(order_id, as_of_date)
    events = event_store.get_events_since(order_id, snapshot.version, until=as_of_date)
    state = snapshot.data
    for event in events:
        state = apply_event(state, event)
    return state
```
Eventual Consistency and Read Model Rebuilding
Event sourcing typically uses CQRS. Commands produce events. Events are stored. Separate read models consume events and build query-optimized views.
The read model is eventually consistent. Events are written, then published to consumers who update read models. There is a delay between a command succeeding and the read model reflecting that change.
This delay is typically milliseconds in normal operation but can stretch under load or during rebuilding.
Read model rebuilding is useful. You can add new projections to existing data, rebuild after fixing bugs, create new read models for new features from historical data, and test projections against the same events.
```python
# Rebuild a read model from scratch
def rebuild_read_model(projection, from_version=0):
    events = event_store.get_all_events(from_version=from_version)
    for event in events:
        projection.handle(event)
    projection.persist()
```
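A projection in this interface is just an event handler plus a persist step. A hypothetical order-count projection, assuming events are plain dicts with a `type` field:

```python
class OrderCountProjection:
    """Hypothetical read model: number of orders placed per customer."""

    def __init__(self):
        self.orders_per_customer = {}

    def handle(self, event):
        # React only to the event types this read model cares about;
        # everything else is ignored.
        if event["type"] == "OrderPlaced":
            customer = event["customer_id"]
            self.orders_per_customer[customer] = (
                self.orders_per_customer.get(customer, 0) + 1
            )

    def persist(self):
        # A real projection would write to its own query-optimized store
        # (a table, a cache, a search index); a no-op here.
        pass
```

Because the projection is derived entirely from events, deleting its state and replaying the log rebuilds it exactly.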
Comparing with Traditional CRUD
Traditional CRUD stores current state directly. Update overwrites previous state.
| Aspect | CRUD | Event Sourcing |
|---|---|---|
| Data stored | Current state | History of changes |
| Update behavior | Overwrite | Append |
| History | Optional, must be designed | Built-in |
| Debugging | Limited visibility | Full audit trail |
| Read models | Direct from primary store | Built from events |
| Storage | Typically less | Typically more |
| Complexity | Simpler | More complex |
CRUD works for simple domains where current state is what matters and history is not important. Event sourcing pays off when history matters—audit requirements, complex debugging, temporal queries.
Event sourcing storage growth is manageable with snapshots, event pruning for closed aggregates, and columnar storage for event data.
Handling Event Schema Evolution
Events are immutable but schemas change. A ProductCreated event from three years ago might have different fields than one created today.
Versioning approaches:
Upcasting: write an upcaster that transforms old event versions to new versions during replay.
Event versioning: include version numbers in events and handle different versions in projection logic.
Schema migration: migrate old event data to new schema when reading.
```python
# Upcaster example
class OrderPlacedUpcaster:
    def can_upcast(self, event_data):
        return event_data.get('version', 0) < 2

    def upcast(self, event_data):
        if event_data.get('version', 0) < 1:
            event_data['customer_email'] = None
        if event_data.get('version', 0) < 2:
            event_data['customer_id'] = str(event_data['customer_id'])
        event_data['version'] = 2
        return event_data
```
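Upcasters are typically applied as a chain when events are loaded, so projections only ever see the latest schema. A sketch with an illustrative v0-to-v1 upcaster (names are hypothetical):

```python
class AddEmailUpcaster:
    """Illustrative v0 -> v1 upcaster: backfill a field old events lack."""

    def can_upcast(self, event_data):
        return event_data.get("version", 0) < 1

    def upcast(self, event_data):
        event_data.setdefault("customer_email", None)
        event_data["version"] = 1
        return event_data

def load_events(raw_events, upcasters):
    # Run every registered upcaster over each stored event as it is
    # loaded; stored data is never rewritten, only its in-memory form.
    upgraded = []
    for event_data in raw_events:
        for upcaster in upcasters:
            if upcaster.can_upcast(event_data):
                event_data = upcaster.upcast(event_data)
        upgraded.append(event_data)
    return upgraded
```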
Most teams add versioning when needed. Over-engineering upfront costs more than dealing with old events later.
When to Use / When Not to Use Event Sourcing
Use event sourcing when:
- Audit trails are a compliance requirement
- You need to reconstruct state at any point in time
- Debugging requires understanding how state evolved
- Read models may need to be rebuilt from scratch
- You are already using CQRS and want complete history
- Your domain involves long-running processes with multiple steps
Do not use event sourcing when:
- CRUD is simpler and history is not important
- Your team lacks experience with distributed systems patterns
- Storage costs are a significant constraint
- You need strongly consistent reads immediately after every write, with no tolerance for projection lag
- Schema changes are frequent and upfront versioning overhead is too high
Event Sourcing vs CRUD vs State Storage Trade-offs
| Dimension | CRUD (State Storage) | Event Sourcing | State + Events Hybrid |
|---|---|---|---|
| Storage model | Current state | Immutable events | Both |
| History preserved | None (overwrites) | Full history | Limited event log |
| Update behavior | Overwrite | Append-only | Append events |
| Debugging visibility | Current state only | Full replay | Recent events + snapshots |
| Read model rebuild | N/A | Full from scratch | From last snapshot |
| Storage growth | Controlled | Unbounded (needs pruning) | Managed with TTL |
| Schema evolution | Migrate existing rows | Upcasters on replay | Versioned events |
| Complexity | Low | High | Medium |
| Best for | Simple CRUD apps | Audit-critical domains | Most event-driven systems |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Event replay explosion | Long aggregates take minutes to rebuild | Use snapshots, limit aggregate size, archive old events |
| Schema evolution gaps | Old events fail to deserialize | Upcasters, version numbers in events, backward-compatible schemas |
| Eventual consistency lag | Read models stale after writes | Monitor projection lag, set SLA alerts, design UI for async updates |
| Eventual consistency in multi-aggregate queries | Cross-aggregate reads return inconsistent state | Design aggregates to be consistency boundaries, compensate with sagas |
| Event store write throughput | Single stream becomes bottleneck | Partition by aggregate type, use event store clustering |
| Snapshot corruption | Rebuilds produce wrong state | Verify snapshots, maintain event replay as source of truth |
Capacity Estimation: Event Store Growth and Snapshot Frequency
Event stores grow indefinitely unless you manage them. Understanding growth rates and snapshot frequency is essential for operational planning.
Event store growth formula:
```
events_per_day = commands_per_day × events_per_command
storage_per_day = events_per_day × avg_event_size
storage_per_year = storage_per_day × 365
```
For an order processing system handling 100,000 orders per day with 5 events per order on average and 400 bytes per event:
- Events per day: 500,000
- Storage per day: 500,000 × 400 = 200MB
- Storage per year: 200MB × 365 = 73GB
- With snapshots (1 snapshot per aggregate per week): add ~20% overhead → ~88GB per year
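These back-of-the-envelope numbers are easy to script for your own workload (the 20% snapshot overhead is the assumption used above, not a universal constant):

```python
def event_store_growth(commands_per_day, events_per_command,
                       avg_event_bytes, snapshot_overhead=0.20):
    """Back-of-the-envelope event store growth, per the formulas above."""
    events_per_day = commands_per_day * events_per_command
    bytes_per_year = events_per_day * avg_event_bytes * 365
    return {
        "events_per_day": events_per_day,
        "gb_per_year": bytes_per_year / 1e9,
        "gb_per_year_with_snapshots": bytes_per_year * (1 + snapshot_overhead) / 1e9,
    }

# The order-processing example: 100k orders/day, 5 events each, 400 bytes
growth = event_store_growth(100_000, 5, 400)
```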
Snapshot frequency planning:
```
events_between_snapshots = desired_rebuild_time_ms / avg_event_process_time_ms
snapshot_frequency_days = events_between_snapshots / events_per_day_per_aggregate
```
If rebuilding an aggregate from scratch takes 30 seconds (30,000ms) and each event takes 1ms to process, you need a snapshot every 30,000 events. With an aggregate receiving 100 events per day, that is one snapshot every 300 days — roughly yearly. For aggregates receiving 1,000 events per day (an active user account), you need snapshots every 30 days.
The practical approach: snapshot when the aggregate crosses a fixed event count threshold (every 100 or 1000 events) rather than on a time schedule. This keeps rebuild time bounded regardless of aggregate activity level.
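The count-based policy is a one-line check at append time. A sketch with a toy in-memory snapshot store (the threshold and names are illustrative):

```python
SNAPSHOT_THRESHOLD = 1000  # snapshot every N events, not on a time schedule

class SnapshotStore:
    """Toy snapshot store keyed by aggregate id."""

    def __init__(self):
        self.snapshots = {}

    def save(self, aggregate_id, version, state):
        self.snapshots[aggregate_id] = (version, state)

def maybe_snapshot(store, aggregate_id, version, state, events_since_snapshot):
    # Count-based rather than time-based: rebuild cost stays bounded
    # no matter how active the aggregate is.
    if events_since_snapshot >= SNAPSHOT_THRESHOLD:
        store.save(aggregate_id, version, state)
        return True
    return False
```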
Snapshot storage formula:
```
snapshot_size = aggregate_state_serialized_bytes
total_snapshot_storage = snapshot_size × number_of_aggregates × snapshots_per_aggregate_kept
```
For 1M aggregates, each snapshot averaging 10KB, keeping 2 snapshots per aggregate (current + previous for safety): 1M × 10KB × 2 = 20GB. Manageable. For 100M aggregates at the same rate: 2TB — requiring compression or archival strategy.
Observability Hooks: Monitoring Event Processing Health
Key metrics for event sourcing systems: projection lag, event processing latency, and event store append latency.
```sql
-- PostgreSQL: check event stream staleness (for tracked consumer groups)
SELECT
    consumer_id,
    last_processed_event_id,
    last_processed_timestamp,
    EXTRACT(EPOCH FROM (now() - last_processed_timestamp)) AS lag_seconds
FROM projection_consumers
WHERE is_active = true
ORDER BY lag_seconds DESC;
```
Key alerts:
```yaml
# Projection lag alert
- alert: EventSourcingProjectionLag
  expr: event_projection_lag_seconds > 300
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Projection {{ $labels.projection }} lag exceeds 5 minutes"

# Event store append latency (quantile over the histogram bucket rate)
- alert: EventStoreAppendLatencyHigh
  expr: histogram_quantile(0.95, rate(event_append_duration_seconds_bucket[5m])) > 0.1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Event store append P95 latency above 100ms"

# Snapshot age alert (detect aggregates not being snapshotted)
- alert: SnapshotAgeExceedsThreshold
  expr: max by (aggregate_type) (event_count_since_last_snapshot) > 50000
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Aggregate type {{ $labels.aggregate_type }} has aggregates without recent snapshots"
```
Real-World Case Study: Axon Framework at a Major Bank
A major European bank rebuilt their trading platform on Axon Framework (Java-based CQRS/ES framework) to handle regulatory compliance requirements. Every trade must have a complete, immutable audit trail. Event sourcing was chosen because regulators required the ability to reconstruct state at any point in time — not just current state, but state as it existed on any historical date.
Their deployment: 50 aggregate types, 200,000 daily trades, 50-100 events per trade (pricing, risk checks, approvals). Event store grew from 100GB in year one to 8TB over five years. Projection lag was their primary operational headache — the risk calculation projection took 45 minutes to catch up after a batch of overnight trades loaded at market open.
The resolution: they implemented parallelized snapshot rebuilding. Instead of one projection worker processing events sequentially, they partitioned the event log by aggregate ID and processed 16 partitions in parallel, reducing projection rebuild time from 45 minutes to under 5 minutes. They also introduced a “warm standby” projection that intentionally lagged by 5 minutes — if the primary projection failed, the warm standby could take over after catching up at most 5 minutes of events.
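The partitioning trick works because events for different aggregates are independent: shard the log by a stable hash of the aggregate ID, and per-aggregate ordering survives while partitions replay in parallel. A single-process sketch of the idea (real deployments would hand each shard to a separate worker):

```python
import zlib
from collections import defaultdict

def partition_events(events, partitions=16):
    """Shard an event log by aggregate_id so shards can be replayed in
    parallel while each aggregate's event order is preserved."""
    shards = defaultdict(list)
    for event in events:
        # crc32 is stable across runs and processes, so one aggregate
        # always maps to the same partition.
        shard = zlib.crc32(event["aggregate_id"].encode()) % partitions
        shards[shard].append(event)
    return shards
```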
Interview Questions
Q: Your event store has 10 years of data and adding a new projection requires replaying all 10 years. How do you manage this?
Archive old events to cold storage (object storage like S3), keep recent events in the hot event store. New projections start from the most recent snapshot in the hot store and only replay the events since that snapshot. If you need full history replay (regulatory requirements), run it offline in a separate environment during a maintenance window. The key insight: plan archival strategy before you need it. Implement snapshots and tiered storage from the beginning, not as a retrofit.
Q: Two aggregates in the same event stream need to stay consistent with each other. How do you handle this?
Aggregates should be designed as consistency boundaries — each aggregate’s invariants are protected internally, and cross-aggregate consistency is handled through sagas or workflows. If you need two aggregates to change together atomically, they should be a single aggregate. If they must remain separate for scalability or ownership reasons, use a saga: emit a CrossAggregateReferenceCreated event from aggregate A, have a saga process that event and emit a compensating command to aggregate B, and handle the eventual consistency explicitly. Do not try to use distributed transactions (2PC) across aggregates — that defeats the purpose of having separate consistency boundaries.
Q: How do you handle schema evolution for events?
Events are immutable once written, so old events must deserialize with new code. Use upcasters: functions that transform old event schemas to the current schema on read. Version your events explicitly in the payload: v1, v2. When the schema changes, add a v2 upcaster that reads v1 events and produces v2 representations. Keep upcasters for all historical versions — never delete old upcasters. Test by replaying events from the beginning of your event store and verifying the final aggregate state matches expected results.
Q: What is the relationship between snapshots and event replay?
Snapshots are a performance optimization. Without snapshots, reconstructing aggregate state requires replaying all events from the beginning of the aggregate’s history. With snapshots, you periodically save the aggregate’s state at a given event sequence number. Reconstructing the aggregate loads the snapshot and replays only events after the snapshot sequence. Without snapshots, an aggregate with 10 years of history would be slow to load. With yearly snapshots, you replay at most 1 year of events. Snapshots do not affect correctness — event replay from event 0 always produces the same state. They only affect performance.
See Also
- Microservices Roadmap - Event sourcing and CQRS are foundational patterns for event-driven microservice architectures, where services communicate through immutable event logs rather than direct database calls
Conclusion
Event sourcing stores state changes as immutable events. Current state is derived by replaying events. History is preserved automatically. Read models are built by consuming events.
Event sourcing enables complete audit trails, easy debugging, and rebuildable read models. It adds complexity—eventual consistency, schema evolution, event store management. It works best with CQRS and fits domains where history matters.
Use event sourcing when you need the audit trail, when debugging requires understanding how state evolved, or when building read models that might need rebuilding. Avoid it when CRUD is simpler and history is not important.
Related Posts
- CQRS Pattern — Separating command and query models
- CQRS and Event Sourcing — The patterns combined
- Event-Driven Architecture — Asynchronous messaging foundations