Event-Driven Architecture: Events, Commands, and Patterns

Learn event-driven architecture fundamentals: event sourcing, CQRS, event correlation, choreography vs orchestration, and implementation patterns.



Event-driven architecture (EDA) is a way of designing systems where components communicate by producing and consuming events. Instead of asking for data (request-response), services emit events that other services react to. This decoupling in time and space gives you flexibility, scalability, and resilience that request-driven designs struggle with.

This post covers the core concepts of EDA: events vs commands, event sourcing, CQRS, correlation, and the choreography vs orchestration debate.

Events vs Commands

The distinction between events and commands is fundamental to EDA but often confused.

Commands are requests for a specific action. They are directed: “do this.” A command expects one handler to process it. If you send CreateOrder, one service processes it.

Events are facts that something happened. They are broadcast: “this happened.” An event can have zero or many listeners. When OrderPlaced occurs, the notification service, analytics service, and fulfillment service can all react.

graph LR
    subgraph Commands
        C[CreateOrder] --> S[Order Service]
    end

    subgraph Events
        E[OrderPlaced] --> N[Notification]
        E --> A[Analytics]
        E --> F[Fulfillment]
    end

    S --> E
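The dispatch difference can be sketched in a few lines of Python. This is an illustrative toy bus, not a real messaging framework: a command is routed to exactly one registered handler, while an event fans out to every subscriber, including none.

```python
class Bus:
    def __init__(self):
        self._command_handlers = {}   # command type -> single handler
        self._event_subscribers = {}  # event type -> list of handlers

    def register_command(self, command_type, handler):
        # Commands have exactly one handler; a second registration is an error.
        if command_type in self._command_handlers:
            raise ValueError(f"{command_type} already has a handler")
        self._command_handlers[command_type] = handler

    def subscribe(self, event_type, handler):
        self._event_subscribers.setdefault(event_type, []).append(handler)

    def send(self, command_type, payload):
        # Commands: routed to the one handler; missing handler is an error.
        return self._command_handlers[command_type](payload)

    def publish(self, event_type, payload):
        # Events: fan out to zero or many subscribers; none is not an error.
        for handler in self._event_subscribers.get(event_type, []):
            handler(payload)


bus = Bus()
bus.register_command("CreateOrder", lambda p: f"order {p['id']} created")
reactions = []
bus.subscribe("OrderPlaced", lambda p: reactions.append("notify"))
bus.subscribe("OrderPlaced", lambda p: reactions.append("analytics"))

result = bus.send("CreateOrder", {"id": 1})  # one handler, returns a result
bus.publish("OrderPlaced", {"id": 1})        # fan-out, no return value
```

Note that `send` returns a value while `publish` does not: a command can have a response, an event is fire-and-forget.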

Naming Conventions

Commands are verb-noun: CreateUser, CancelOrder, UpdateInventory.

Events are a noun followed by a past-tense verb: UserCreated, OrderCancelled, InventoryUpdated.

Follow this convention and the distinction becomes self-documenting.

Event Sourcing

Event sourcing is a pattern where you store the full history of state changes as events, rather than just the current state. The current state is derived by replaying events.

graph LR
    E1[UserRegistered] --> Store[(Event Store)]
    E2[EmailChanged] --> Store
    E3[NameUpdated] --> Store

    Store -->|replay| State[Current State]

Instead of:

UPDATE users SET name = 'Alice' WHERE id = 1;

You append:

UserNameUpdated { user_id: 1, old_name: 'Bob', new_name: 'Alice', timestamp: ... }
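A minimal in-memory event store makes the append-and-replay cycle concrete. This is a sketch with assumed event shapes, not a production store:

```python
events = []  # append-only log: the single source of truth

def append(event):
    events.append(event)

def current_state(user_id):
    """Derive current state by replaying every event for this user."""
    state = {}
    for e in events:
        if e["user_id"] != user_id:
            continue
        if e["type"] == "UserRegistered":
            state = {"name": e["name"], "email": e["email"]}
        elif e["type"] == "UserNameUpdated":
            state["name"] = e["new_name"]
        elif e["type"] == "EmailChanged":
            state["email"] = e["new_email"]
    return state

append({"type": "UserRegistered", "user_id": 1,
        "name": "Bob", "email": "bob@example.com"})
append({"type": "UserNameUpdated", "user_id": 1,
        "old_name": "Bob", "new_name": "Alice"})
# current_state(1) -> {"name": "Alice", "email": "bob@example.com"}
```

The UPDATE is gone entirely; the "current" name only exists as the result of the fold over the log.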

Benefits of Event Sourcing

  • Complete audit trail: You have every state change ever, immutable and auditable
  • Temporal queries: Ask “what was the customer’s address on January 15th?”
  • Rebuildable read models: If a projection is wrong, or a new one is needed, rebuild it by replaying the events
  • Decoupled write and read: Events are the source of truth, projections are derived

Challenges of Event Sourcing

  • Event schema evolution: Old events must still deserialize correctly as your schema changes
  • Projections can be slow: Replaying millions of events takes time
  • Eventual consistency: Read models lag behind writes as projections catch up
  • Complexity: More moving parts than simple CRUD

Handling Schema Evolution

Events are immutable, but your event schemas change. Use versioning strategies:

  • Upcasting: A transformer that converts old event versions to new versions before processing
  • Event versioning: Include version number in event type (UserCreatedV2)
  • Snapshots: Periodically capture state to limit replay depth
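An upcaster can be sketched as a pure function applied before any handler sees an event. The event shapes here are assumptions for illustration: a V1 UserCreated stored a single name field, and V2 splits it into first and last names.

```python
def upcast(event):
    """Convert a V1 UserCreated event to the current V2 shape."""
    if event["type"] == "UserCreated" and event.get("version", 1) == 1:
        # V1 stored "name"; V2 splits it. Old events are never rewritten
        # in the store; they are transformed on the way out.
        first, _, last = event["name"].partition(" ")
        return {"type": "UserCreated", "version": 2,
                "first_name": first, "last_name": last}
    return event  # already current; pass through unchanged

old = {"type": "UserCreated", "version": 1, "name": "Alice Smith"}
new = upcast(old)
```

Because upcasting happens at read time, handlers only ever deal with the latest version of each event type.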

CQRS: Command Query Responsibility Segregation

CQRS separates read and write models. Writes go through one model (optimized for commands). Reads go through a different model (optimized for queries).

Writes: API -> Command Handler -> Event Store
Reads:  Query -> Read Model (materialized view)

The event store is the write side. Read models are projections built from events.

graph LR
    Command[Command] --> CommandHandler[Command Handler]
    CommandHandler --> EventStore[(Event Store)]

    EventStore -->|project| ReadModel1[Read Model: Orders by User]
    EventStore -->|project| ReadModel2[Read Model: Orders by Date]
    EventStore -->|project| ReadModel3[Read Model: Revenue Dashboard]
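Projections are essentially fold functions over the event stream. A sketch with two read models fed by the same events (field names are illustrative):

```python
from collections import defaultdict

orders_by_user = defaultdict(list)    # read model 1: orders by user
revenue_by_date = defaultdict(float)  # read model 2: revenue dashboard

def project(event):
    # Each projection consumes the same stream and keeps only what it needs.
    if event["type"] == "OrderPlaced":
        orders_by_user[event["user_id"]].append(event["order_id"])
        revenue_by_date[event["date"]] += event["amount"]

stream = [
    {"type": "OrderPlaced", "order_id": "o1", "user_id": 1,
     "date": "2024-01-15", "amount": 20.0},
    {"type": "OrderPlaced", "order_id": "o2", "user_id": 1,
     "date": "2024-01-15", "amount": 5.0},
]
for event in stream:
    project(event)
```

Adding a third read model means adding another projection function and replaying the stream; the write side never changes.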

When to Use CQRS

CQRS shines when:

  • Read and write workloads differ significantly
  • You need multiple, optimized read representations of the same data
  • Teams can work on command and query sides independently
  • You are already using event sourcing

CQRS adds complexity. If your domain is simple (CRUD with basic queries), it is overkill.

Event Correlation

In distributed systems, a single business transaction spans multiple services. Tracking such transactions requires correlation IDs.

A correlation ID is a unique identifier attached to all events resulting from the same originating action:

UserSignsUp (correlation_id: abc-123)
  -> AccountCreated (correlation_id: abc-123)
     -> WelcomeEmailSent (correlation_id: abc-123)
     -> AnalyticsEvent (correlation_id: abc-123)

With correlation IDs, you can trace a complete transaction across services and logs.

Implementation

Pass the correlation ID through the entire call chain. The ID is generated once, at the originating action:

import uuid

def handle_signup(user_data):
    # This is the start of the chain, so the correlation ID is minted here.
    correlation_id = str(uuid.uuid4())
    event = {
        'type': 'UserSignedUp',
        'correlation_id': correlation_id,
        'user_id': user_data['id'],
        'email': user_data['email']
    }
    event_bus.publish(event)

Each service logs its correlation ID, enabling distributed tracing.
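Downstream consumers reuse the incoming ID rather than minting a new one; only the first event in a chain generates it. A minimal sketch, with a stub bus so it is self-contained:

```python
published = []  # stand-in for a real broker, so the sketch runs on its own

class EventBus:
    def publish(self, event):
        published.append(event)

event_bus = EventBus()

def handle_user_signed_up(event):
    # Propagate the existing correlation_id; never generate a new one here.
    event_bus.publish({
        'type': 'WelcomeEmailSent',
        'correlation_id': event['correlation_id'],
        'user_id': event['user_id'],
    })

handle_user_signed_up({'type': 'UserSignedUp',
                       'correlation_id': 'abc-123', 'user_id': 1})
```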

Choreography vs Orchestration

When a business transaction spans multiple services, who coordinates it? Two approaches:

Choreography

Services react to events and emit their own events. No central coordinator.

OrderService emits OrderPlaced
  -> InventoryService reserves stock, emits InventoryReserved
       -> PaymentService charges, emits PaymentCharged
       -> ShippingService schedules shipping

Each service knows only its own part. The overall flow emerges from the event chain.
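That emergent chain can be sketched with a toy in-memory bus (illustrative names, following the past-tense event convention from earlier; in production the bus would be a broker such as Kafka):

```python
subscribers = {}
log = []  # order in which events occur

def subscribe(event_type, handler):
    subscribers.setdefault(event_type, []).append(handler)

def publish(event_type, payload):
    log.append(event_type)
    for handler in subscribers.get(event_type, []):
        handler(payload)

# Each service registers its own reaction; none knows the full workflow.
subscribe("OrderPlaced", lambda order: publish("InventoryReserved", order))
subscribe("InventoryReserved", lambda order: publish("PaymentCharged", order))
subscribe("PaymentCharged", lambda order: publish("ShippingScheduled", order))

publish("OrderPlaced", {"order_id": 1})
# log: OrderPlaced, InventoryReserved, PaymentCharged, ShippingScheduled
```

Notice that no single piece of code describes the whole flow; it exists only as the sum of the subscriptions.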

Pros:

  • Services are truly decoupled
  • No single point of failure
  • Easy to add new consumers

Cons:

  • Behavior is scattered across services
  • Hard to see the overall transaction
  • Difficult to implement transactions that must succeed or fail together

Orchestration

A central orchestrator coordinates the transaction:

graph LR
    Orch[Order Orchestrator] -->|Reserve| Inventory[Inventory Service]
    Orch -->|Charge| Payment[Payment Service]
    Orch -->|Schedule| Shipping[Shipping Service]

The orchestrator knows the complete workflow. It decides what to do next based on responses.

Pros:

  • Behavior is centralized and visible
  • Easier to implement complex workflows with branches and compensation
  • Transaction boundaries are explicit

Cons:

  • The orchestrator becomes a central point of coupling
  • Risk of “smart middleware” anti-pattern (business logic in the orchestrator)
  • Single point of failure (mitigated by workflow engines with persistence)
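An orchestrated workflow with compensation (saga-style) can be sketched as follows. The names and step shapes are hypothetical; real workflow engines persist this state, but the control flow is the same: undo completed steps in reverse order when a later step fails.

```python
def run_workflow(steps):
    """steps: list of (action, compensation) callables. True on success."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            # A step failed: compensate everything that already succeeded,
            # newest first, instead of relying on a distributed transaction.
            for undo in reversed(completed):
                undo()
            return False
    return True

calls = []

def charge_payment():
    raise RuntimeError("card declined")  # simulated failure at step 2

steps = [
    (lambda: calls.append("reserve_inventory"),
     lambda: calls.append("release_inventory")),
    (charge_payment,
     lambda: calls.append("refund_payment")),
    (lambda: calls.append("schedule_shipping"),
     lambda: calls.append("cancel_shipping")),
]

ok = run_workflow(steps)
# ok is False; calls == ["reserve_inventory", "release_inventory"]
```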

Which to Use

Choreography works well for simple, linear workflows where services are truly independent. Orchestration works better for workflows that need branching and compensation logic.

In practice, many systems use a hybrid: orchestration for the core business workflow, choreography for peripheral side effects.

When to Use / When Not to Use

When to Use Event-Driven Architecture

  • Microservices with independent scaling: When services should be decoupled and scale independently
  • High write volume with audit requirements: When you need complete audit trails of all state changes
  • Multiple read models: When different consumers need different representations of the same data (CQRS)
  • Real-time event processing: When you need immediate reaction to events (notifications, analytics, monitoring)
  • System integration: When integrating multiple systems that evolve independently
  • Temporal queries: When you need to query historical state (“what was the balance on date X?”)

When Not to Use Event-Driven Architecture

  • Simple request-response logic: When your use case is straightforward CRUD operations
  • Strong consistency requirements: When you need immediate consistency across all services
  • Small teams with limited expertise: When operational complexity exceeds team capacity
  • Low-latency synchronous requirements: When sub-millisecond response times are critical
  • Simple workflows: When workflow steps are few and do not benefit from decoupling
  • Batch-oriented processing: When your use case is bulk data processing rather than real-time events

Trade-off Analysis

| Factor | EDA | Request-Response | Notes |
| --- | --- | --- | --- |
| Coupling | Loose: producers don’t know consumers | Tight: client knows service | EDA enables independent evolution |
| Scalability | Horizontal: event broker handles load | Vertical: service handles load | EDA scales better for high write volume |
| Consistency | Eventual | Strong (immediate) | EDA requires handling lag |
| Latency | Low for publish, variable for consume | Predictable | EDA has higher average latency |
| Debugging | Harder: distributed transactions | Easier: synchronous calls | EDA needs correlation IDs and tracing |
| Event Schema | Must be versioned and stable | API contracts only | EDA requires schema governance |
| Infrastructure | Event broker needed (Kafka, RabbitMQ) | Simple HTTP/REST | EDA adds infrastructure complexity |
| Data Recovery | Natural: replay events | Point-in-time snapshots | EDA provides better recovery |
| Learning Curve | Steep: new patterns | Gentle: familiar REST | EDA requires a paradigm shift |

Production Failure Scenarios

| Failure | Impact | Mitigation |
| --- | --- | --- |
| Event broker failure | Events not published or consumed | Use broker clustering with replication; implement the outbox pattern |
| Consumer crash mid-processing | Events may be lost or reprocessed | Acknowledge only after successful processing; make consumers idempotent |
| Out-of-order event delivery | State becomes inconsistent | Use sequence numbers or timestamps; implement conflict resolution |
| Event schema version mismatch | Consumers cannot deserialize events | Implement schema versioning; use upcasters or a schema registry |
| Poison event blocking consumer | Consumer stops processing | Configure dead letter queues; implement retry limits |
| Cascading failures | One service failure triggers failures in others | Implement circuit breakers; design for partial availability |
| Event replay overload | New consumer overwhelms system with replays | Throttle replay rate; process in batches with backpressure |
| Correlation ID loss | Cannot trace transactions across services | Propagate correlation IDs through all events; log them |
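Two of these mitigations, idempotent consumption and a dead letter queue with a retry limit, can be sketched together in memory. This is illustrative only: real systems persist the dedupe store and use the broker’s DLQ facilities.

```python
processed_ids = set()    # dedupe store; in production, a durable table
dead_letter_queue = []
MAX_ATTEMPTS = 3

def consume(event, handler):
    if event["event_id"] in processed_ids:
        return "duplicate"            # redelivery is safely ignored
    for _ in range(MAX_ATTEMPTS):
        try:
            handler(event)
            processed_ids.add(event["event_id"])  # "ack" only after success
            return "processed"
        except Exception:
            continue                  # retry up to the limit
    dead_letter_queue.append(event)   # poison event: park it for inspection
    return "dead-lettered"

def always_fails(event):
    raise ValueError("bad payload")

r1 = consume({"event_id": "e1"}, lambda e: None)  # "processed"
r2 = consume({"event_id": "e1"}, lambda e: None)  # "duplicate"
r3 = consume({"event_id": "e2"}, always_fails)    # "dead-lettered"
```

Marking the event as processed only after the handler succeeds is what makes a crash mid-processing safe: the event is redelivered and either reprocessed or deduplicated.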

Observability Checklist

Metrics to Monitor

  • Event publish rate: Events published per second per event type
  • Event consumption lag: Time between event publication and processing completion
  • Consumer error rate: Failed event processing attempts per consumer
  • Dead letter queue depth: Failed events awaiting investigation
  • Event processing duration: Time to process each event type
  • Eventual consistency lag: Time between write and read model update (for CQRS)
  • Orchestration workflow completion time: End-to-end workflow duration

Logs to Capture

  • Event publication with event type, correlation ID, and timestamp
  • Event consumption start and completion
  • Event processing failures with full context
  • Dead letter queue arrivals with failure reason
  • Workflow state transitions (for orchestration)
  • Correlation ID propagation across service boundaries
  • Schema version mismatches

Alerts to Configure

  • Event consumption lag exceeds SLA threshold
  • Consumer error rate exceeds threshold
  • Dead letter queue accumulating messages
  • Event publish rate anomalies (spike or drop)
  • Workflow timeout violations
  • Broker connection failures
  • Consumer group rebalancing events

Security Checklist

  • Event content security: Do not include sensitive data in events without encryption
  • Event broker authentication: Use authentication (mTLS, SASL) for event broker connections
  • Event broker authorization: Implement access controls for producing and consuming events
  • Schema security: Validate event schemas; prevent injection through event content
  • Event encryption in transit: Enable TLS for all broker connections
  • Event encryption at rest: Enable disk encryption if broker persists events
  • Audit of event access: Log who consumed which events and when
  • Data classification: Classify events by sensitivity; apply appropriate controls

Common Pitfalls / Anti-Patterns

Pitfall 1: Using Events for Queries

Events are for notifications, not queries. If you find yourself requesting data through events and waiting for responses, use a direct API call instead.

Pitfall 2: Mixing Commands and Events

Commands expect one handler; events expect zero or more handlers. Sending a command as an event creates confusion about who should handle it and whether responses are expected.

Pitfall 3: Not Planning for Event Schema Evolution

Events are immutable once published. If you change the schema, old events must still deserialize correctly. Plan versioning from the start using upcasters or versioned event types.

Pitfall 4: Over-Engineering with Choreography

Choreography is simple until it is not. Complex workflows with compensation logic are easier to manage with orchestration. Do not default to choreography for everything.

Pitfall 5: Ignoring Eventual Consistency

Read models lag behind writes in EDA. If your business logic assumes immediate consistency, you will have bugs. Design UIs and expectations around eventual consistency.

Pitfall 6: Creating Too Many Event Types

If every database write creates a unique event type, you have an event catalog nightmare. Use a smaller set of event types with sufficient payload to handle multiple use cases.

Quick Recap

Key Points

  • Events are facts that something happened; commands are requests for specific actions
  • Event sourcing stores state changes as events for replay and audit
  • CQRS separates read and write models for independent optimization
  • Correlation IDs enable distributed tracing across service boundaries
  • Choreography lets services react to events without central coordination
  • Orchestration uses a central coordinator for complex workflows
  • Event schema evolution requires backward-compatible versioning strategies
  • Design for eventual consistency; read models lag behind writes

Pre-Deployment Checklist

- [ ] Event schema versioning strategy defined (upcasters or versioned types)
- [ ] Event schema registry deployed for validation
- [ ] Correlation ID propagation implemented across all services
- [ ] Dead letter queue configured for failed event processing
- [ ] Idempotent event consumers implemented
- [ ] Circuit breakers configured for downstream service calls
- [ ] Event broker clustering and replication configured
- [ ] Monitoring for event consumption lag configured
- [ ] Alert thresholds set for dead letter queue depth
- [ ] Event retention policy defined (for event sourcing replay)
- [ ] Read model rebuild procedure documented
- [ ] Schema evolution testing implemented (old events deserialize correctly)
- [ ] Security controls (authentication, authorization, encryption) configured
- [ ] Workflow compensation logic tested (for orchestration)

Event-Driven Architecture with Kafka

Apache Kafka is a common backbone for event-driven systems. Its durable log, replay capability, and consumer group model map well to EDA patterns.

For pub/sub patterns that complement EDA, see pub/sub patterns. For understanding messaging infrastructure that supports EDA, see message queue types.

Conclusion

Event-driven architecture is not a silver bullet. It trades request-response simplicity for flexibility, scalability, and resilience. Events decouple services in time and space. Event sourcing gives you a complete audit trail and rebuildable views. CQRS separates read and write optimization. Correlation IDs enable distributed tracing.

The complexity is real. You deal with eventual consistency, schema evolution, and distributed debugging. Before adopting EDA wholesale, start with a bounded context where the benefits are clear: high write volume, audit requirements, multiple read models, or genuinely decoupled services.

Start simple for most applications. Add event-driven elements incrementally as requirements demand.
