Event-Driven Architecture: Events, Commands, and Patterns
Learn event-driven architecture fundamentals: event sourcing, CQRS, event correlation, choreography vs orchestration, and implementation patterns.
Event-driven architecture (EDA) is a way of designing systems where components communicate by producing and consuming events. Instead of asking for data (request-response), services emit events that other services react to. This decoupling in time and space gives you flexibility, scalability, and resilience that request-driven designs struggle with.
This post covers the core concepts of EDA: events vs commands, event sourcing, CQRS, correlation, and the choreography vs orchestration debate.
Events vs Commands
The distinction between events and commands is fundamental to EDA but often confused.
Commands are requests for a specific action. They are directed: “do this.” A command expects one handler to process it. If you send CreateOrder, one service processes it.
Events are facts that something happened. They are broadcast: “this happened.” An event can have zero or many listeners. When OrderPlaced occurs, the notification service, analytics service, and fulfillment service can all react.
```mermaid
graph LR
  subgraph Commands
    C[CreateOrder] --> S[Order Service]
  end
  subgraph Events
    E[OrderPlaced] --> N[Notification]
    E --> A[Analytics]
    E --> F[Fulfillment]
  end
  S --> E
```
Naming Conventions
Commands are verb-noun: CreateUser, CancelOrder, UpdateInventory.
Events are noun-verb past tense: UserCreated, OrderCancelled, InventoryUpdated.
Follow this convention and the distinction becomes self-documenting.
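The dispatch semantics differ too, and a small sketch makes the contrast concrete. This is a hypothetical in-memory pair of buses, not any particular library's API: a command bus enforces exactly one handler, while an event bus broadcasts to however many subscribers exist.

```python
class CommandBus:
    """Directed dispatch: each command type has exactly one handler."""
    def __init__(self):
        self._handlers = {}

    def register(self, command_type, handler):
        if command_type in self._handlers:
            raise ValueError(f"{command_type} already has a handler")
        self._handlers[command_type] = handler

    def send(self, command_type, payload):
        # One handler processes the command and may return a result.
        return self._handlers[command_type](payload)


class EventBus:
    """Broadcast dispatch: zero or many listeners per event type."""
    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event_type, handler):
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        # Notify everyone; the publisher gets nothing back.
        for handler in self._subscribers.get(event_type, []):
            handler(payload)
```

Note the asymmetry: `send` returns the handler's result, while `publish` returns nothing, because a fact does not expect a reply.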
Event Sourcing
Event sourcing is a pattern where you store the full history of state changes as events, rather than just the current state. The current state is derived by replaying events.
```mermaid
graph LR
  E1[UserRegistered] --> Store[(Event Store)]
  E2[EmailChanged] --> Store
  E3[NameUpdated] --> Store
  Store -->|replay| State[Current State]
```
Instead of:

```sql
UPDATE users SET name = 'Alice' WHERE id = 1;
```

You append:

```
UserNameUpdated { user_id: 1, old_name: 'Bob', new_name: 'Alice', timestamp: ... }
```
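Replay is just a left fold over the event stream. A minimal sketch, using the event names from the diagram above (field names are illustrative):

```python
def apply(state, event):
    """Apply one event to the current state, returning the new state."""
    if event["type"] == "UserRegistered":
        return {"user_id": event["user_id"], "name": event["name"], "email": event["email"]}
    if event["type"] == "EmailChanged":
        return {**state, "email": event["new_email"]}
    if event["type"] == "UserNameUpdated":
        return {**state, "name": event["new_name"]}
    return state  # unknown event types are ignored


def replay(events):
    """Derive current state by folding apply() over the full history."""
    state = None
    for event in events:
        state = apply(state, event)
    return state
```

Because `apply` is a pure function of state and event, the same history always yields the same state, which is what makes rebuilding projections safe.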
Benefits of Event Sourcing
- Complete audit trail: You have every state change ever, immutable and auditable
- Temporal queries: Ask “what was the customer’s address on January 15th?”
- Replayability: If your read model is wrong, rebuild it from events
- Decoupled write and read: Events are the source of truth, projections are derived
Challenges of Event Sourcing
- Event schema evolution: Old events must still deserialize correctly as your schema changes
- Projections can be slow: Replaying millions of events takes time
- Eventual consistency: Read models lag behind writes as projections catch up
- Complexity: More moving parts than simple CRUD
Handling Schema Evolution
Events are immutable, but your event schemas change. Use versioning strategies:
- Upcasting: A transformer that converts old event versions to new versions before processing
- Event versioning: Include a version number in the event type (UserCreatedV2)
- Snapshots: Periodically capture state to limit replay depth
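An upcaster sits between the event store and your handlers, translating old shapes into the current one so handlers only ever see the latest version. A sketch, assuming a hypothetical v1 `UserCreated` whose `name` field was renamed to `display_name` in v2:

```python
def upcast(event):
    """Convert a v1 UserCreated event to the current v2 shape; pass others through."""
    if event["type"] == "UserCreated" and event.get("version", 1) == 1:
        return {
            **event,
            "version": 2,
            # v2 renamed 'name' to 'display_name'; default it from the v1 field.
            "display_name": event["name"],
        }
    return event  # already current
```

Chains of upcasters (v1 to v2, v2 to v3) keep each migration small; the stored events themselves are never rewritten.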
CQRS: Command Query Responsibility Segregation
CQRS separates read and write models. Writes go through one model (optimized for commands). Reads go through a different model (optimized for queries).
Writes: API -> Command Handler -> Event Store
Reads: Query -> Read Model (materialized view)
The event store is the write side. Read models are projections built from events.
```mermaid
graph LR
  Command[Command] --> CommandHandler[Command Handler]
  CommandHandler --> EventStore[(Event Store)]
  EventStore -->|project| ReadModel1[Read Model: Orders by User]
  EventStore -->|project| ReadModel2[Read Model: Orders by Date]
  EventStore -->|project| ReadModel3[Read Model: Revenue Dashboard]
```
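A projection is just a consumer that folds events into a query-optimized structure. A minimal sketch of the "Orders by User" read model from the diagram (event fields are illustrative):

```python
def project_orders_by_user(events):
    """Build a denormalized user_id -> order summaries view from the event stream."""
    read_model = {}
    for event in events:
        if event["type"] == "OrderPlaced":
            read_model.setdefault(event["user_id"], []).append(
                {"order_id": event["order_id"], "total": event["total"]}
            )
        elif event["type"] == "OrderCancelled":
            # Cancelled orders drop out of this particular view.
            orders = read_model.get(event["user_id"], [])
            read_model[event["user_id"]] = [
                o for o in orders if o["order_id"] != event["order_id"]
            ]
    return read_model
```

The "Orders by Date" and "Revenue Dashboard" models would be separate functions over the same stream, each shaped for its own queries. That independence is the point of CQRS.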
When to Use CQRS
CQRS shines when:
- Read and write workloads differ significantly
- You need multiple, optimized read representations of the same data
- Teams can work on command and query sides independently
- You are already using event sourcing
CQRS adds complexity. If your domain is simple (CRUD with basic queries), it is overkill.
Event Correlation
In distributed systems, a single business transaction spans multiple services. Tracking such transactions requires correlation IDs.
A correlation ID is a unique identifier attached to all events resulting from the same originating action:
```
UserSignsUp (correlation_id: abc-123)
  -> AccountCreated (correlation_id: abc-123)
  -> WelcomeEmailSent (correlation_id: abc-123)
  -> AnalyticsEvent (correlation_id: abc-123)
```
With correlation IDs, you can trace a complete transaction across services and logs.
Implementation
Pass correlation ID through the entire call chain:
```python
import uuid

def handle_signup(user_data):
    correlation_id = str(uuid.uuid4())
    event = {
        'type': 'UserSignedUp',
        'correlation_id': correlation_id,
        'user_id': user_data['id'],
        'email': user_data['email'],
    }
    event_bus.publish(event)
```
Each service logs its correlation ID, enabling distributed tracing.
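The critical rule is that only the origin of a chain creates a correlation ID; every downstream service copies it forward. A sketch of a consumer that propagates it (the `causation_id` field is an optional convention, not required by correlation itself):

```python
import uuid

def handle_user_signed_up(event, event_bus):
    """Downstream consumer: reuse the incoming correlation_id on everything it emits."""
    # Fall back to a fresh ID only if the chain somehow lost it.
    correlation_id = event.get("correlation_id") or str(uuid.uuid4())
    event_bus.publish({
        "type": "AccountCreated",
        "correlation_id": correlation_id,        # propagated, never regenerated
        "causation_id": event.get("event_id"),   # which specific event caused this one
        "user_id": event["user_id"],
    })
```

With `correlation_id` grouping the whole transaction and `causation_id` linking parent to child, you can reconstruct the full event tree from logs alone.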
Choreography vs Orchestration
When a business transaction spans multiple services, who coordinates it? Two approaches:
Choreography
Services react to events and emit their own events. No central coordinator.
```
OrderService: OrderPlaced -> InventoryService: InventoryReserved
  -> PaymentService: PaymentCharged -> ShippingService: ShippingScheduled
```
Each service knows only its own part. The overall flow emerges from the event chain.
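The emergent flow can be sketched with an in-memory bus (real systems would use Kafka, RabbitMQ, or similar; service and event names are illustrative). Each handler knows one incoming event and one fact it announces, and the end-to-end order flow is nowhere written down:

```python
class Bus:
    """Tiny in-memory event bus; the log records publish order for tracing."""
    def __init__(self):
        self.handlers = {}
        self.log = []

    def subscribe(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)

    def publish(self, event):
        self.log.append(event["type"])
        for handler in self.handlers.get(event["type"], []):
            handler(event, self)


# Each service reacts to one event and emits its own.
def inventory_on_order_placed(event, bus):
    bus.publish({"type": "InventoryReserved", "order_id": event["order_id"]})

def payment_on_inventory_reserved(event, bus):
    bus.publish({"type": "PaymentCharged", "order_id": event["order_id"]})

def shipping_on_payment_charged(event, bus):
    bus.publish({"type": "ShippingScheduled", "order_id": event["order_id"]})


bus = Bus()
bus.subscribe("OrderPlaced", inventory_on_order_placed)
bus.subscribe("InventoryReserved", payment_on_inventory_reserved)
bus.subscribe("PaymentCharged", shipping_on_payment_charged)
```

Publishing a single `OrderPlaced` triggers the whole chain, which is both the appeal (loose coupling) and the drawback (no single place to read the workflow).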
Pros:
- Services are truly decoupled
- No single point of failure
- Easy to add new consumers
Cons:
- Behavior is scattered across services
- Hard to see the overall transaction
- Difficult to implement transactions that must succeed or fail together
Orchestration
A central orchestrator coordinates the transaction:
```mermaid
graph LR
  Orch[Order Orchestrator] -->|Reserve| Inventory[Inventory Service]
  Orch -->|Charge| Payment[Payment Service]
  Orch -->|Schedule| Shipping[Shipping Service]
```
The orchestrator knows the complete workflow. It decides what to do next based on responses.
Pros:
- Behavior is centralized and visible
- Easier to implement complex workflows with branches and compensation
- Transaction boundaries are explicit
Cons:
- The orchestrator becomes a central point of coupling
- Risk of “smart middleware” anti-pattern (business logic in the orchestrator)
- Single point of failure (mitigated by workflow engines with persistence)
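A minimal orchestrator sketch, including the compensation logic mentioned above (the service objects and their method names are stand-ins, not a real workflow engine's API). Each step records its undo action; on failure, completed steps are compensated in reverse order, the saga pattern in miniature:

```python
def run_order_workflow(order, inventory, payment, shipping):
    """Run the order steps in sequence; compensate completed steps on failure."""
    completed = []  # (step name, compensation fn) for each successful step
    steps = [
        ("reserve", lambda: inventory.reserve(order), lambda: inventory.release(order)),
        ("charge", lambda: payment.charge(order), lambda: payment.refund(order)),
        ("schedule", lambda: shipping.schedule(order), lambda: shipping.cancel(order)),
    ]
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            # Undo everything that succeeded, newest first.
            for _, undo in reversed(completed):
                undo()
            return "failed"
    return "completed"
```

Production workflow engines add what this sketch omits: persisted state so a crashed orchestrator can resume, timeouts, and retries per step.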
Which to Use
Choreography works well for simple, linear workflows where services are truly independent. Orchestration works better for complex workflows with branching and compensation logic.
In practice, many systems use a hybrid: orchestration for the core business workflow, choreography for peripheral side effects.
When to Use / When Not to Use
When to Use Event-Driven Architecture
- Microservices with independent scaling: When services should be decoupled and scale independently
- High write volume with audit requirements: When you need complete audit trails of all state changes
- Multiple read models: When different consumers need different representations of the same data (CQRS)
- Real-time event processing: When you need immediate reaction to events (notifications, analytics, monitoring)
- System integration: When integrating multiple systems that evolve independently
- Temporal queries: When you need to query historical state (“what was the balance on date X?”)
When Not to Use Event-Driven Architecture
- Simple request-response logic: When your use case is straightforward CRUD operations
- Strong consistency requirements: When you need immediate consistency across all services
- Small teams with limited expertise: When operational complexity exceeds team capacity
- Low-latency synchronous requirements: When sub-millisecond response times are critical
- Simple workflows: When workflow steps are few and do not benefit from decoupling
- Batch-oriented processing: When your use case is bulk data processing rather than real-time events
Trade-off Analysis
| Factor | EDA | Request-Response | Notes |
|---|---|---|---|
| Coupling | Loose - producers don’t know consumers | Tight - client knows service | EDA enables independent evolution |
| Scalability | Horizontal - event broker handles load | Vertical - service handles load | EDA scales better for high write volume |
| Consistency | Eventual | Strong (immediate) | EDA requires handling lag |
| Latency | Low for publish, variable for consume | Predictable | EDA has higher average latency |
| Debugging | Harder - distributed transactions | Easier - synchronous calls | EDA needs correlation IDs and tracing |
| Event Schema | Must be versioned and stable | API contracts only | EDA requires schema governance |
| Infrastructure | Event broker needed (Kafka, RabbitMQ) | Simple HTTP/REST | EDA adds infrastructure complexity |
| Data Recovery | Natural - replay events | Point-in-time snapshots | EDA provides better recovery |
| Learning Curve | Steep - new patterns | Gentle - familiar REST | EDA requires paradigm shift |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Event broker failure | Events not published or consumed | Use broker clustering with replication; implement outbox pattern |
| Consumer crash mid-processing | Events may be lost or reprocessed | Use acknowledgment after successful processing; idempotent consumers |
| Out-of-order event delivery | State becomes inconsistent | Use sequence numbers or timestamps; implement conflict resolution |
| Event schema version mismatch | Consumers cannot deserialize events | Implement schema versioning; use upcasters or schema registry |
| Poison event blocking consumer | Consumer stops processing | Configure dead letter queues; implement retry limits |
| Cascading failures | One service failure triggers failures in others | Implement circuit breakers; design for partial availability |
| Event replay overload | New consumer overwhelms system with replays | Throttle replay rate; process in batches with backpressure |
| Correlation ID loss | Cannot trace transactions across services | Propagate correlation IDs through all events; log correlation |
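Two of the mitigations above, idempotent consumers and dead letter queues, compose naturally. A sketch under simplified assumptions (an in-memory dedupe set and DLQ list standing in for durable storage):

```python
def consume(event, processed_ids, process, dead_letters, max_retries=3):
    """Process an event at most once; retry transient failures; park poison events."""
    event_id = event["event_id"]
    if event_id in processed_ids:
        return "duplicate"  # at-least-once delivery means redeliveries are normal
    for _attempt in range(max_retries):
        try:
            process(event)
            processed_ids.add(event_id)  # record success only after processing completes
            return "processed"
        except Exception:
            continue  # transient failure: retry
    # Retries exhausted: park the event so the consumer keeps moving.
    dead_letters.append(event)
    return "dead-lettered"
```

In a real deployment the dedupe set and DLQ would live in durable infrastructure (the broker's DLQ, a database table), and processing plus the dedupe write should be atomic, but the control flow is the same.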
Observability Checklist
Metrics to Monitor
- Event publish rate: Events published per second per event type
- Event consumption lag: Time between event publication and processing completion
- Consumer error rate: Failed event processing attempts per consumer
- Dead letter queue depth: Failed events awaiting investigation
- Event processing duration: Time to process each event type
- Eventual consistency lag: Time between write and read model update (for CQRS)
- Orchestration workflow completion time: End-to-end workflow duration
Logs to Capture
- Event publication with event type, correlation ID, and timestamp
- Event consumption start and completion
- Event processing failures with full context
- Dead letter queue arrivals with failure reason
- Workflow state transitions (for orchestration)
- Correlation ID propagation across service boundaries
- Schema version mismatches
Alerts to Configure
- Event consumption lag exceeds SLA threshold
- Consumer error rate exceeds threshold
- Dead letter queue accumulating messages
- Event publish rate anomalies (spike or drop)
- Workflow timeout violations
- Broker connection failures
- Consumer group rebalancing events
Security Checklist
- Event content security: Do not include sensitive data in events without encryption
- Event broker authentication: Use authentication (mTLS, SASL) for event broker connections
- Event broker authorization: Implement access controls for producing and consuming events
- Schema security: Validate event schemas; prevent injection through event content
- Event encryption in transit: Enable TLS for all broker connections
- Event encryption at rest: Enable disk encryption if broker persists events
- Audit of event access: Log who consumed which events and when
- Data classification: Classify events by sensitivity; apply appropriate controls
Common Pitfalls / Anti-Patterns
Pitfall 1: Using Events for Queries
Events are for notifications, not queries. If you find yourself requesting data through events and waiting for responses, use a direct API call instead.
Pitfall 2: Mixing Commands and Events
Commands expect one handler; events expect zero or more handlers. Sending a command as an event creates confusion about who should handle it and whether responses are expected.
Pitfall 3: Not Planning for Event Schema Evolution
Events are immutable once published. If you change the schema, old events must still deserialize correctly. Plan versioning from the start using upcasters or versioned event types.
Pitfall 4: Over-Engineering with Choreography
Choreography is simple until it is not. Complex workflows with compensation logic are easier to manage with orchestration. Do not default to choreography for everything.
Pitfall 5: Ignoring Eventual Consistency
Read models lag behind writes in EDA. If your business logic assumes immediate consistency, you will have bugs. Design UIs and expectations around eventual consistency.
Pitfall 6: Creating Too Many Event Types
If every database write creates a unique event type, you have an event catalog nightmare. Use a smaller set of event types with sufficient payload to handle multiple use cases.
Quick Recap
Key Points
- Events are facts that something happened; commands are requests for specific actions
- Event sourcing stores state changes as events for replay and audit
- CQRS separates read and write models for independent optimization
- Correlation IDs enable distributed tracing across service boundaries
- Choreography lets services react to events without central coordination
- Orchestration uses a central coordinator for complex workflows
- Event schema evolution requires backward-compatible versioning strategies
- Design for eventual consistency; read models lag behind writes
Pre-Deployment Checklist
- [ ] Event schema versioning strategy defined (upcasters or versioned types)
- [ ] Event schema registry deployed for validation
- [ ] Correlation ID propagation implemented across all services
- [ ] Dead letter queue configured for failed event processing
- [ ] Idempotent event consumers implemented
- [ ] Circuit breakers configured for downstream service calls
- [ ] Event broker clustering and replication configured
- [ ] Monitoring for event consumption lag configured
- [ ] Alert thresholds set for dead letter queue depth
- [ ] Event retention policy defined (for event sourcing replay)
- [ ] Read model rebuild procedure documented
- [ ] Schema evolution testing implemented (old events deserialize correctly)
- [ ] Security controls (authentication, authorization, encryption) configured
- [ ] Workflow compensation logic tested (for orchestration)
Event-Driven Architecture with Kafka
Apache Kafka is a common backbone for event-driven systems. Its durable log, replay capability, and consumer group model map well to EDA patterns.
For pub/sub patterns that complement EDA, see pub/sub patterns. For understanding messaging infrastructure that supports EDA, see message queue types.
Conclusion
Event-driven architecture is not a silver bullet. It trades request-response simplicity for flexibility, scalability, and resilience. Events decouple services in time and space. Event sourcing gives you a complete audit trail and rebuildable views. CQRS separates read and write optimization. Correlation IDs enable distributed tracing.
The complexity is real. You deal with eventual consistency, schema evolution, and distributed debugging. Before adopting EDA wholesale, start with a bounded context where the benefits are clear: high write volume, audit requirements, multiple read models, or genuinely decoupled services.
Start simple for most applications. Add event-driven elements incrementally as requirements demand.