CQRS Pattern
Separate read and write models. Command vs query models, eventual consistency implications, event sourcing integration, and when CQRS makes sense.
Most systems use the same model for reading and writing. You have a Customer table. You insert customers, update customers, delete customers, query customers. One schema, one set of operations.
CQRS—Command Query Responsibility Segregation—splits this. Separate models handle reads and writes. The write model handles business logic and state transitions. The read model handles queries and projections.
This enables different data structures optimized for each operation. Writes use a normalized structure enforcing integrity. Reads use a denormalized structure serving queries efficiently.
Introduction
CQRS separates command (write) models from query (read) models, letting each be optimized independently. In a typical CRUD system, you use the same schema for inserting orders and displaying order summaries. In CQRS, the write side uses a normalized model enforcing business rules; the read side uses denormalized projections optimized for query patterns. This separation enables different teams to work on reads and writes independently and lets each side scale according to its own load profile.
This guide covers the command versus query distinction, how CQRS integrates with event sourcing (commands create events, queries read projections), the eventual consistency implications of having separate models, and the scenarios where CQRS provides enough benefit to justify the added complexity versus when simpler approaches suffice.
Command Model vs Query Model
A command represents intent to change state. “Place order,” “update shipping address,” “cancel subscription.” Commands return success or failure, not data.
A query represents a request for data. “Show my orders,” “what is my account balance,” “list products in category.” Queries return data without changing state.
In a typical CRUD system, these look identical. A POST to /orders might create an order and return it. A GET to /orders lists orders. Both hit the same database and often the same table structure.
CQRS separates these. Commands go to one model, queries to another.
```python
class PlaceOrderCommand:
    def __init__(self, customer_id, line_items):
        self.customer_id = customer_id
        self.line_items = line_items


class GetCustomerOrdersQuery:
    def __init__(self, customer_id):
        self.customer_id = customer_id
```
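To make the split concrete, here is a minimal sketch of how these two message types might be dispatched to separate models. `OrderWriteModel`, `OrderReadModel`, and their in-memory storage are hypothetical names introduced for illustration, and the read model is updated inline only to keep the example short.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class PlaceOrderCommand:
    customer_id: str
    line_items: tuple  # (sku, quantity) pairs


@dataclass(frozen=True)
class GetCustomerOrdersQuery:
    customer_id: str


class OrderReadModel:
    """Hypothetical query side: denormalized summaries keyed by customer."""

    def __init__(self):
        self._by_customer = {}

    def apply_order_placed(self, order_id, customer_id, item_count):
        self._by_customer.setdefault(customer_id, []).append(
            {"order_id": order_id, "item_count": item_count}
        )

    def handle(self, query: GetCustomerOrdersQuery):
        # Queries return data and never change state.
        return list(self._by_customer.get(query.customer_id, []))


class OrderWriteModel:
    """Hypothetical command side: enforces rules, returns success or failure."""

    def __init__(self, read_model):
        self._orders = {}
        self._ids = count(1)
        self._read_model = read_model

    def handle(self, command: PlaceOrderCommand) -> bool:
        if not command.line_items:
            return False  # business rule: an order needs at least one item
        order_id = next(self._ids)
        self._orders[order_id] = command
        # Updated inline for brevity; a real system would usually do this asynchronously.
        self._read_model.apply_order_placed(
            order_id, command.customer_id, len(command.line_items)
        )
        return True


read_model = OrderReadModel()
write_model = OrderWriteModel(read_model)
accepted = write_model.handle(PlaceOrderCommand("c-1", (("sku-1", 2),)))
rejected = write_model.handle(PlaceOrderCommand("c-1", ()))
orders = read_model.handle(GetCustomerOrdersQuery("c-1"))
```

Note that the command handler returns only a success flag, while the query handler returns data shaped for display rather than the stored command.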
Eventual Consistency Implications
CQRS typically involves separate data stores for reads and writes. The write store holds authoritative state. The read store updates asynchronously from the write store.
Reads might return stale data. A user places an order, gets success, immediately queries their order history, and might not see the new order yet because the read store has not caught up.
This eventual consistency is a direct consequence of the architectural choice. You gain performance and scalability through asynchronous updates. You pay with temporary inconsistency.
The impact depends on your domain. Financial systems often cannot tolerate stale reads. Social feeds can accept stale data—your new post does not need to appear instantly.
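The stale-read window can be reproduced in a few lines. This is a toy sketch under simplifying assumptions: both stores are plain dictionaries, and the `pending` queue stands in for whatever asynchronous channel carries changes to the read side.

```python
from collections import deque

write_store = {}   # authoritative state
read_store = {}    # query-side copy, updated later
pending = deque()  # changes not yet applied to the read store


def place_order(order_id, total):
    write_store[order_id] = {"total": total}
    pending.append((order_id, total))  # projection work is deferred
    return True


def run_projection():
    """Drain pending changes into the read store; returns how many were applied."""
    applied = 0
    while pending:
        order_id, total = pending.popleft()
        read_store[order_id] = {"total": total}
        applied += 1
    return applied


place_order("o-1", 40)
stale_read = read_store.get("o-1")  # read store has not caught up yet
run_projection()
fresh_read = read_store.get("o-1")  # visible once the projection has run
```

The gap between `place_order` returning and `run_projection` finishing is the inconsistency window the user may observe.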
Handling Consistency in User Interfaces
CQRS often requires careful UI design:
- Show confirmation immediately after commands succeed
- Refresh or poll read views after writes
- Use optimistic UI updates that assume success
- Handle conflicts gracefully when they surface
If a user places an order and the confirmation page shows “processing,” the temporary inconsistency is visible but acceptable. If the confirmation page shows an empty order list, users lose confidence.
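One way to implement "poll read views after writes" is to have the command return the version it produced and wait until the read model reports at least that version. The sketch below assumes a hypothetical `read_version` callable supplied by the caller; the timeout and interval values are arbitrary.

```python
import time


def wait_for_version(read_version, expected_version, timeout_s=2.0, interval_s=0.05):
    """Poll read_version() until it reaches expected_version or the timeout expires.

    read_version is a hypothetical callable returning the last event version
    the read model has applied.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_version() >= expected_version:
            return True
        if time.monotonic() >= deadline:
            return False  # caller can show a "processing" state instead
        time.sleep(interval_s)


# Simulated read model that catches up on the third poll.
versions = iter([4, 4, 5])
ok_fast = wait_for_version(lambda: next(versions), expected_version=5)

# Simulated read model that stays behind until the timeout.
ok_stuck = wait_for_version(lambda: 4, expected_version=5, timeout_s=0.1)
```

When the wait returns `False`, the UI can fall back to the "processing" message rather than rendering an empty list.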
Event Sourcing Integration
CQRS pairs naturally with event sourcing. When commands succeed, they produce events. Events are stored in an event store. Read models consume events and build projections.
```mermaid
flowchart LR
    Command[("Command<br/>PlaceOrder")] --> Validator[("Validate<br/>Business Rules")]
    Validator -->|valid| EventStore[("Event Store<br/>Append-only log")]
    Validator -->|invalid| Reject[("Reject<br/>Return Error")]
    EventStore -->|publish| Bus[("Event Bus<br/>or Stream")]
    Bus --> Projection1[("Projection 1<br/>Orders by Customer")]
    Bus --> Projection2[("Projection 2<br/>Revenue Dashboard")]
    Bus --> Projection3[("Projection 3<br/>Inventory Status")]
    Projection1 --> ReadModel1[("Read Model 1<br/>Customer Order View")]
    Projection2 --> ReadModel2[("Read Model 2<br/>Revenue Report")]
    Projection3 --> ReadModel3[("Read Model 3<br/>Stock Levels")]
    ReadModel1 --> Query1[("Query<br/>GET /customers/123/orders")]
    ReadModel2 --> Query2[("Query<br/>GET /reports/revenue")]
    ReadModel3 --> Query3[("Query<br/>GET /products/sku/inventory")]
```
Event sourcing provides a complete audit trail. Every state change is recorded as an event. Current state is derived by replaying events. This makes read models rebuildable from scratch at any time.
Event sourcing is not required for CQRS. You can use CQRS with a traditional write database and separate read stores updated via change data capture. But event sourcing makes the architecture cleaner because events are the natural bridge between command and query sides.
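The following sketch shows the core idea in miniature: an append-only log on the write side and a projection that folds events into a read model. `InMemoryEventStore`, the event names, and the payload fields are assumptions made for this example, not the API of any particular event store.

```python
from collections import defaultdict


class InMemoryEventStore:
    """Hypothetical append-only log; real stores also persist and publish."""

    def __init__(self):
        self._events = []

    def append(self, event_type, data):
        event = {"seq": len(self._events) + 1, "type": event_type, "data": data}
        self._events.append(event)
        return event

    def read_all(self):
        return list(self._events)


def project_orders_by_customer(events):
    """Fold events into a read model: customer_id -> list of open order IDs."""
    view = defaultdict(list)
    for event in events:
        data = event["data"]
        if event["type"] == "OrderPlaced":
            view[data["customer_id"]].append(data["order_id"])
        elif event["type"] == "OrderCancelled":
            view[data["customer_id"]].remove(data["order_id"])
    return dict(view)


store = InMemoryEventStore()
store.append("OrderPlaced", {"customer_id": "c-1", "order_id": "o-1"})
store.append("OrderPlaced", {"customer_id": "c-1", "order_id": "o-2"})
store.append("OrderCancelled", {"customer_id": "c-1", "order_id": "o-1"})

# Rebuilding from scratch is just replaying the log through the projection.
orders_by_customer = project_orders_by_customer(store.read_all())
```

Because the projection is a pure function of the event sequence, a second read model with a different shape can be added later by writing another fold over the same log.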
When CQRS Makes Sense
CQRS provides value when:
- Read and write workloads differ significantly in complexity or volume
- Different user roles need different views of the same data
- You need to scale reads and writes independently
- Eventual consistency is acceptable
- You benefit from separate optimization for each operation
A content management system might use CQRS. Authors write content; readers consume it. Write operations involve validation and publishing pipelines. Read operations involve search, filtering, and personalization. These workloads have little in common.
A real-time collaboration tool might use CQRS. User actions generate events that update a shared document. The read model projects current state from events. Multiple users see the same document with minimal latency.
When CQRS Is Overkill
CQRS adds significant complexity. Two models to maintain, synchronization between them, eventual consistency to explain. This complexity is only justified if separation provides concrete value.
CQRS is overkill when:
- Read and write operations are similar in complexity
- You do not need to scale reads and writes independently
- Strong consistency is required
- Your team lacks experience with distributed patterns
- Simplicity matters more than optimization
A simple CRUD application with balanced read/write workloads rarely benefits from CQRS. The added complexity outweighs the benefits.
Relationship with Read/Write Splitting
Read/write splitting also separates reads and writes, but at the infrastructure level rather than the architectural level.
Read/write splitting routes queries to replicas while writes go to the primary. Data is identical modulo replication lag. CQRS separates the actual models—reads and writes might use completely different schemas.
Read/write splitting is simpler. Use it when you need to scale reads but your models remain fundamentally the same. CQRS is more powerful but more complex. Use it when you need fundamentally different read and write representations.
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Projection lag causing stale reads | User sees outdated data after write | Show “processing” state in UI, poll for updates, set max acceptable lag SLAs |
| Event ordering violations | Read model ends up in inconsistent state | Use sequence numbers or causation IDs, idempotent projections |
| Command redelivery (client retries, queue replays) | Duplicate commands applied, data corruption | Idempotent command handlers, deduplication via correlation IDs |
| Read model rebuild blocking writes | Event store frozen during snapshot | Build the new projection alongside the live one from a read-only replay, then switch over once it has caught up |
| Multiple read models out of sync | Different queries return conflicting data | Accept eventual consistency windows, clearly document lag expectations |
| Event schema changes breaking projections | Old events cause projection crashes | Version events, use upcasters, maintain backward compatibility |
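Two of the mitigations in the table, sequence numbers and idempotent projections, can be combined in one small handler. This is a sketch rather than a production design: `RevenueProjection` is a hypothetical class, and it assumes a single gap-free sequence starting at 1.

```python
class RevenueProjection:
    """Hypothetical projection that tolerates duplicate and out-of-order delivery."""

    def __init__(self):
        self.total = 0
        self.last_seq = 0  # checkpoint: highest sequence applied so far
        self._buffer = {}  # events that arrived ahead of a gap

    def handle(self, seq, amount):
        if seq <= self.last_seq or seq in self._buffer:
            return  # duplicate delivery: already applied or already buffered
        self._buffer[seq] = amount
        # Apply buffered events only while they are contiguous with the checkpoint.
        while self.last_seq + 1 in self._buffer:
            self.last_seq += 1
            self.total += self._buffer.pop(self.last_seq)


projection = RevenueProjection()
for seq, amount in [(1, 10), (3, 30), (1, 10), (2, 20), (3, 30)]:
    projection.handle(seq, amount)
```

Event 3 arrives early and waits in the buffer until event 2 fills the gap; the repeated deliveries of events 1 and 3 are ignored, so the total counts each amount once.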
CQRS vs Traditional CRUD Trade-offs
| Dimension | CQRS | Traditional CRUD |
|---|---|---|
| Read/write model complexity | Separate models, each optimized | Single model serves both |
| Consistency model | Eventual consistency by default | Strong consistency typically |
| Write performance | Can be higher if read model is separate | Tied to read model structure |
| Read performance | Highly optimized per query shape | Limited by normalized schema |
| Operational complexity | High — dual models, sync logic | Low — single model |
| Horizontal scaling | Independent read/write scaling | Coupled scaling |
| Eventual consistency window | Variable — can be ms to seconds | N/A — always consistent |
| Best for | Complex domains, asymmetric workloads | Simple domains, balanced operations |
Capacity Estimation: Event Store Sizing and Projection Rebuild Time
CQRS with event sourcing requires sizing the event store and understanding projection rebuild times.
Event store storage formula:
avg_event_bytes = avg_event_payload_bytes + overhead_per_event
total_event_store_bytes = avg_event_bytes × events_per_stream × number_of_streams × duplication_factor
For an order management system with 10M orders, each generating ~20 events on average (placed, paid, shipped, etc.), with 500 bytes average event payload:
- Total events: 200M
- Avg event size: 500 + 100 bytes (metadata, timestamps, stream ID) = 600 bytes
- Raw event store: 200M × 600 = 120GB
- With snapshots and projection checkpoints adding roughly 30% on top (an assumed duplication_factor of 1.3): 120GB × 1.3 ≈ 156GB total
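The same arithmetic as a small helper, useful for trying other workloads. The function name and the use of decimal gigabytes are choices made for this sketch; the 1.3 duplication factor is the assumption used in the example above.

```python
def event_store_bytes(payload_bytes, overhead_bytes, events_per_stream,
                      streams, duplication_factor=1.0):
    """Apply the storage formula from the text."""
    avg_event_bytes = payload_bytes + overhead_bytes
    return avg_event_bytes * events_per_stream * streams * duplication_factor


GB = 10**9  # decimal gigabytes, matching the figures in the text

raw_gb = event_store_bytes(500, 100, 20, 10_000_000) / GB
total_gb = event_store_bytes(500, 100, 20, 10_000_000, duplication_factor=1.3) / GB
```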
Projection rebuild time formula:
projection_rebuild_time = (events_per_stream × avg_event_process_time) × number_of_streams / parallelism
For a customer_order_history projection processing 20 events per order at 1ms per event, for 10M orders with 10 parallel workers:
- Sequential time: 200M events × 1ms = 200,000 seconds
- With 10 workers: 20,000 seconds ≈ 5.5 hours
This is why snapshots matter: rebuilding from a snapshot and replaying only the events recorded after it reduces rebuild time dramatically. If snapshots are taken every 100 events and the last snapshot was at event 900 of a 950-event stream, you replay only 50 events instead of 950.
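The rebuild estimate and the snapshot shortcut can both be expressed in a few lines. The helper names are hypothetical, the estimate assumes work divides evenly across workers, and the running-total aggregate is a toy chosen only to show the replay step.

```python
def rebuild_seconds(total_events, seconds_per_event, workers=1):
    """Idealized estimate: assumes work divides evenly across workers."""
    return total_events * seconds_per_event / workers


def events_to_replay(stream_length, last_snapshot_at):
    """Events that must be replayed after loading the latest snapshot."""
    return stream_length - last_snapshot_at


def load_state(snapshot_state, events_after_snapshot, apply):
    """Rebuild current state from a snapshot plus the events recorded after it."""
    state = snapshot_state
    for event in events_after_snapshot:
        state = apply(state, event)
    return state


# 200M events at 1ms each across 10 workers, as in the example above.
full_rebuild_hours = rebuild_seconds(200_000_000, 0.001, workers=10) / 3600

# A 1000-event stream whose latest snapshot was taken at event 900.
replayed = events_to_replay(stream_length=1000, last_snapshot_at=900)

# Toy aggregate: state is a running total, each event adds an amount.
balance = load_state(snapshot_state=450, events_after_snapshot=[5, 10, -3],
                     apply=lambda state, amount: state + amount)
```

The full rebuild works out to a little over five and a half hours, while the snapshot path touches only the tail of each stream.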
Read model storage: Read models are typically denormalized projections stored in a query-optimized format. A customer_order_history read model for 10M customers, averaging 20 orders each, storing order summary (100 bytes per order): 10M × 20 × 100 = 20GB. Read model storage grows with both customer count and order count, unlike event store which only grows with event count.
Real-World Case Study: GetEventStore at Scale
GetEventStore is an open-source event store used by many organizations implementing CQRS and event sourcing. One production deployment handled a financial trading platform with 50,000 daily active traders, each generating 200-500 events per trading session.
The event store grew to 2TB over 3 years, serving 50M events. The challenge was not storage — it was projection rebuild time when adding new read models. A new trader_performance_summary projection required scanning all 50M events, which took 18 hours on their infrastructure.
Their solution: event stream truncation. Events older than 18 months were archived to cold storage (S3), keeping the hot event store at 400GB with 18 months of recent events. New projections could start from a snapshot at the 18-month boundary and replay only the recent 18 months of events, reducing rebuild time from 18 hours to 90 minutes.
The lesson: plan for event store archival from the beginning. Most long-lived CQRS systems eventually need new projections, and rebuilding from scratch across years of events is expensive. Snapshotting combined with event archival keeps that cost bounded.
Related Posts
- Microservices Roadmap - CQRS and event sourcing are companion patterns for building event-driven microservices where command and query responsibilities are separated and services communicate through an event bus
See Also
- CQRS and Event Sourcing — The full pattern combining CQRS with event sourcing
- Event-Driven Architecture — Asynchronous messaging patterns
- Consistency Models — Strong vs eventual consistency tradeoffs
Interview Questions
The projection lag indicates events are accumulating faster than the projection can process them. The fixes, in order of preference: add more projection workers (parallelism), optimize the projection query (add indexes on the read model), batch event processing in the projection (process 100 events per transaction instead of 1), or reduce the event volume going to that projection (use a separate stream). If lag is acceptable to the business, document the SLA explicitly and add a last_updated timestamp to the read model so applications can display staleness to users.
The key question: does the new read model need 5 years of history, or only recent data? If recent-only: take a snapshot of current state, create a new projection starting from the snapshot + replaying only events since the snapshot. If full history: plan for a long rebuild window (hours to days depending on event volume), run the rebuild offline during maintenance, and do not serve the new read model publicly until rebuild completes. Meanwhile, backfill the read model in batches and monitor error rates — old events may have schema issues that need upcaster logic.
This is a concurrency conflict. Without conflict resolution, both commands might pass aggregate version checks but produce invalid state. The standard solution: use optimistic concurrency with expected version numbers. Each aggregate has a version. Commands include expected_version. If the aggregate's current version does not match expected, the command is rejected and the client retries. For CQRS, this means the command handler must load the aggregate, check version, and emit events — if two commands race, one succeeds and one gets a ConcurrencyException.
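The expected-version check can be sketched as follows. `EventStream` is a hypothetical in-memory stand-in for an event store stream; the important detail is that the version comparison and the append happen as one atomic step, which here is approximated with a lock.

```python
import threading


class ConcurrencyException(Exception):
    pass


class EventStream:
    """Hypothetical in-memory stream with an atomic expected-version check."""

    def __init__(self):
        self._events = []
        self._lock = threading.Lock()

    @property
    def version(self):
        return len(self._events)

    def append(self, expected_version, new_events):
        # The check and the write happen under one lock, so two racing
        # writers cannot both succeed against the same version.
        with self._lock:
            if len(self._events) != expected_version:
                raise ConcurrencyException(
                    f"expected version {expected_version}, found {len(self._events)}"
                )
            self._events.extend(new_events)
            return len(self._events)


stream = EventStream()
seen_by_a = stream.version  # both handlers load the aggregate at version 0
seen_by_b = stream.version

stream.append(seen_by_a, ["ItemReserved"])      # first writer wins
try:
    stream.append(seen_by_b, ["ItemReserved"])  # second writer is rejected
    conflict = False
except ConcurrencyException:
    conflict = True

# The losing client reloads and retries against the new version.
final_version = stream.append(stream.version, ["ReservationQueued"])
```

On retry the handler should re-run its business rules against the reloaded state, since the winning command may have made the losing one invalid.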
CQRS is overkill when your read and write workloads are roughly symmetric: reads are about as complex as writes, and you have no specific scalability concerns. It is also overkill for simple CRUD applications where a single model handles both adequately. The additional infrastructure (event bus, separate read models, synchronization logic) adds operational complexity that must be justified by actual performance or modeling benefits. If you cannot name a specific problem CQRS solves for your domain, start with a well-designed CRUD model.
A command represents intent to change state: "PlaceOrder," "UpdateAddress." It is directed at a specific aggregate and can be rejected if business rules are violated. An event represents something that happened: "OrderPlaced," "AddressUpdated." It is a fact about the past and cannot be rejected. The distinction matters because commands flow synchronously to their aggregate and return success or failure, while events flow asynchronously from the aggregate to interested consumers (read models, other services). Once an event is emitted, it cannot be undone, only compensated by a subsequent event. If you model user actions as events when they should be commands, you lose both the ability to reject invalid state transitions and the synchronous success/failure response.
Use the outbox pattern: store the event in a transactional outbox table within the same database transaction as the aggregate state change. A separate process polls the outbox and publishes events. This guarantees at-least-once delivery: if the command succeeds and the outbox write succeeds, the event will eventually be published. Without the outbox pattern, you face a dilemma: commit the aggregate state without publishing (data is saved but downstream systems do not know), or publish before commit (event is published but aggregate state might not be saved). The outbox pattern solves both by making event publication a byproduct of the same transaction that writes the aggregate.
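A minimal sketch of the outbox pattern using the standard-library `sqlite3` module. The table layout, the `place_order` and `relay_once` functions, and the `publish` callback are all assumptions made for this example; a real relay would run in a separate process and publish to a broker.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(order_id):
    # One transaction covers both the state change and the outbox row.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "OrderPlaced", "order_id": order_id}),),
        )


def relay_once(publish):
    """Publish unpublished rows in order; a crash before the UPDATE causes a resend."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


published = []
place_order("o-1")
first_pass = relay_once(published.append)
second_pass = relay_once(published.append)
```

Because a row is marked as published only after `publish` returns, a crash in between leads to a duplicate rather than a lost event, which is why consumers need to be idempotent.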
Read models converge through event ordering guarantees. Each event has a sequence number or timestamp. A projection processes events in order and applies them to the read model. If a projection processes events 1-1000, its read model reflects state at event 1000. If another projection processes the same events, both reach the same state (deterministic event application). The convergence depends on: events being published in order (event store enforces this per stream), projections being idempotent (processing event 5 twice does not produce wrong state), and no race conditions in projection updates. Set explicit eventual consistency SLAs: "read model X is guaranteed to be within 30 seconds of write model." Monitor projection lag and alert if any projection exceeds the SLA.
Sagas handle multi-step business processes that span multiple aggregates. In CQRS, each aggregate is a consistency boundary: changes to one aggregate are atomic, but changes across aggregates require coordination. A saga orchestrates this coordination by issuing commands to each aggregate and handling responses or timeouts. Example: a "PlaceOrder" saga issues a command to the Order aggregate, then issues a command to the Payment aggregate, then issues a command to the Inventory aggregate. If any step fails, the saga issues compensating commands to undo previous steps. Without a saga or a similar coordination mechanism (such as event choreography), a business process that spans multiple aggregates has nothing to drive it forward or to undo partial work. With sagas, you accept eventual consistency across aggregates but maintain consistency within each.
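The compensation logic can be sketched as a small orchestrator. `run_saga` and the step names are hypothetical, the actions are placeholders for commands sent to aggregates, and the sketch ignores timeouts and persistence of saga state.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action raises, compensations for the completed steps run in
    reverse order. Returns (succeeded, log).
    """
    log = []
    completed = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception:
            log.append(f"{name}: failed")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}: compensated")
            return False, log
        log.append(f"{name}: ok")
        completed.append((name, compensation))
    return True, log


def reserve_inventory():
    raise RuntimeError("out of stock")  # simulated failure in the last step


succeeded, log = run_saga([
    ("create_order", lambda: None, lambda: None),
    ("charge_payment", lambda: None, lambda: None),
    ("reserve_inventory", reserve_inventory, lambda: None),
])
```

Compensations run in reverse order so that the payment is refunded before the order itself is cancelled.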
Use upcasters: functions that transform old event versions to the current version during replay. When an event schema evolves (add field, rename field, change type), you write an upcaster that transforms v1 events to v2 structure. Projections always receive the current version — they do not need to know about old schemas. Store the version number in each event payload. When replaying, the event store or projection framework runs the appropriate upcaster chain based on the event's version. Keep all upcasters forever — never delete them, as old events remain in the store. Test upcasters by replaying the full event history and verifying aggregate state matches expected values. Versioning strategy: backward-compatible additions (new optional field) do not need upcasters; breaking changes (rename, remove, change type) require upcasters.
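An upcaster chain might look like the sketch below. The event shape, the two schema changes, and the `UPCASTERS` registry are invented for illustration; frameworks that support upcasting have their own registration mechanisms.

```python
def upcast_v1_to_v2(data):
    # v2 renamed "name" to "full_name" (hypothetical schema change).
    data = dict(data)
    data["full_name"] = data.pop("name")
    return data


def upcast_v2_to_v3(data):
    # v3 added a required "currency" field; old events get a default.
    return {**data, "currency": "USD"}


UPCASTERS = {1: upcast_v1_to_v2, 2: upcast_v2_to_v3}
CURRENT_VERSION = 3


def upcast(event):
    """Return a copy of the event transformed to CURRENT_VERSION."""
    version, data = event["version"], event["data"]
    while version < CURRENT_VERSION:
        data = UPCASTERS[version](data)
        version += 1
    return {**event, "version": version, "data": data}


old_event = {"type": "CustomerRegistered", "version": 1, "data": {"name": "Ada"}}
current_event = upcast(old_event)
```

The stored event is never modified; the transformation happens on read, so projections only ever see the version 3 shape.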
Yes, you can implement CQRS with traditional databases. The write side uses a normalized schema optimized for write integrity (any RDBMS works). The read side uses denormalized projections stored as tables or as a separate read-optimized database (different schema, potentially different technology). PostgreSQL handles both well: write store as transactional tables, read models as separate tables updated via triggers or application-level event consumers. You can also use different databases for each side: PostgreSQL for writes, Elasticsearch for read-heavy search projections, Redis for real-time counters. CQRS is an architectural pattern, not a technology requirement. The key is that writes and reads use different models — the underlying storage technology can be the same or different depending on your scale and query needs.
The read model rebuild problem: adding a new read model projection requires replaying all historical events, which can take hours or days if the event store is large. Solutions: snapshot + incremental replay (snapshot at event N, replay only events N+1 to now), event archival (archive events older than 2 years to cold storage, new projections replay only recent events plus the snapshot), or parallel replay (partition the event log and replay in parallel across multiple workers). Plan for this from day one: implement snapshots at fixed event-count intervals, implement event archival to cold storage, and design your projection framework to support parallel replay. Without this, adding a new read model to a 5-year-old CQRS system is a multi-day operation.
In CQRS, reporting queries should use dedicated read models, not query across write-side aggregates. If you need a report that spans multiple aggregates (e.g., orders + customers + products), build a reporting projection that consumes events from all three aggregate types and produces a denormalized read model (e.g., `order_detail_view` with customer name, product name, order total). This is the fundamental CQRS design: do not query the write model for reads. The reporting projection runs asynchronously, consuming events and building a view optimized for reporting queries. If you need ad-hoc reporting without a pre-built projection, either use a separate reporting database (data warehouse or analytics DB) that receives the same events, or accept that CQRS is not the right pattern for highly ad-hoc query workloads.
Optimistic concurrency prevents two commands from racing to modify the same aggregate. Each aggregate has a version number. A command includes expected_version. When the command handler processes the command, it loads the aggregate, checks if current version matches expected, and if not, rejects the command with a ConcurrencyException. Both commands might arrive simultaneously; one succeeds (version matches), one fails (version changed). The failing client retries with the new version. Without optimistic concurrency, two simultaneous commands could both pass validation and produce an invalid aggregate state (e.g., two orders placed for the same inventory item). The check must be atomic with the write: enforce it where events are appended (an expected-version append in the event store, or a conditional write in the database), because a check made only in memory cannot stop two handlers that both loaded the same version.
Test command handlers by sending commands and verifying events are produced: load aggregate from snapshot, apply command, assert expected events emitted. Test projections by replaying events and asserting read model state: start with empty read model, process event sequence, assert read model matches expected state. Use integration tests to verify the full flow: command → event → projection → read model. Key testing patterns: command handler tests should be deterministic (no external dependencies), projection tests should use real event sequences including edge cases (empty events, high-sequence events, interleaved streams). Mock the event store for unit tests; use a real event store for integration tests. Also test idempotency: if a projection processes the same event twice, the read model should be correct (not duplicated).
CQRS is a pattern about separating read and write models. Event-driven architecture (EDA) is a pattern about communicating through events rather than direct calls. CQRS can exist without EDA: you could have separate read and write models with synchronous propagation (for example, read tables updated in the same transaction as the write) and no asynchronous messaging. EDA can exist without CQRS: microservices communicating via events but each maintaining its own state directly. They are complementary and often used together: CQRS provides the architectural split between commands and queries, EDA provides the async messaging backbone that connects the two. When people say "CQRS system," they usually mean CQRS, event sourcing, and EDA together, but the pure patterns are distinct.
A 4-hour rebuild window for 500M events calls for parallelism and strategic snapshotting. Strategy: implement parallel replay by partitioning the event stream across multiple workers (e.g., partition by aggregate ID hash, 10 parallel workers each processing 50M events). Use coarse-grained snapshots taken at year boundaries: over a 5-year history, that gives 4 snapshots. A new read model starts from the nearest suitable snapshot (e.g., the one taken 2 years ago) and replays only the remaining 2 years of events (~200M events, assuming about 100M events per year). With 10 parallel workers at 100K events/second each: 200M / (10 × 100K) = 200 seconds for the replay portion. Add snapshot loading time and validation, and the rebuild finishes well under 4 hours. Without snapshots, a full 500M-event replay at the same rate takes 500M / (10 × 100K) = 500 seconds across 10 workers, or 500M / 100K = 5,000 seconds ≈ 83 minutes on a single worker. Both fit within the window at this throughput, but the margin shrinks quickly if per-event processing is slower than assumed.
Multiple projections consuming the same stream can drift apart if one falls behind or errors during processing. Synchronization strategies: assign each projection a cursor (last processed sequence number) persisted separately; if a projection errors, it resumes from its cursor without affecting other projections. For resync, pause all projections, take a consistent snapshot of the read model, reset cursors to a known sequence, and replay events. Alternatively, use a "seek-behind" consumer that always reads from the minimum cursor across all projections. If projections fall out of sync (return different data for the same query), the fix is always a full rebuild of the divergent projection from the snapshot plus replay.
Domain events are emitted by aggregates and represent business facts within a bounded context — "OrderPlaced," "PaymentReceived." They are part of the aggregate's internal model and carry rich payload for the domain. Integration events are cross-service messages that carry only the data needed for external consumers — typically a reduced payload with IDs and timestamps. Use domain events within the write side; use integration events when crossing service boundaries via the event bus. If your event bus serves multiple consumers with different needs, transform domain events into integration events at the bus level so each consumer gets the data contract it expects.
A command router maps incoming commands to their target aggregate instance. The routing logic is typically: extract the aggregate ID from the command payload, resolve which aggregate type handles this command type, load the aggregate from the event store by ID, and dispatch the command. Without routing, commands would have no target. A command router is the component that connects the external interface (API endpoint receiving a command) to the internal aggregate. Example: a CancelOrderCommand with orderId=123 routes to the Order aggregate with ID 123. If the aggregate does not exist, the router returns "not found" rather than creating a new aggregate.
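The routing steps described above can be sketched as a small registry. `CommandRouter`, `OrderAggregate`, and the dictionary used as a repository are hypothetical; in an event-sourced system the repository lookup would load the aggregate by replaying its stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CancelOrderCommand:
    order_id: str


class OrderAggregate:
    def __init__(self, order_id):
        self.order_id = order_id
        self.status = "placed"

    def handle(self, command):
        if self.status == "cancelled":
            return "rejected"
        self.status = "cancelled"
        return "ok"


class CommandRouter:
    """Hypothetical router: command type -> (ID extractor, aggregate repository)."""

    def __init__(self):
        self._routes = {}

    def register(self, command_type, get_id, repository):
        self._routes[command_type] = (get_id, repository)

    def dispatch(self, command):
        get_id, repository = self._routes[type(command)]
        aggregate = repository.get(get_id(command))
        if aggregate is None:
            return "not found"  # never create an aggregate implicitly
        return aggregate.handle(command)


orders = {"123": OrderAggregate("123")}  # stands in for loading from the event store
router = CommandRouter()
router.register(CancelOrderCommand, lambda c: c.order_id, orders)

first = router.dispatch(CancelOrderCommand("123"))
second = router.dispatch(CancelOrderCommand("123"))
missing = router.dispatch(CancelOrderCommand("999"))
```

The router only locates the target; the decision to accept or reject the command stays inside the aggregate.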
CQRS uses message queues (Kafka, RabbitMQ, Azure Service Bus) to publish events from the write side to the read side. The flow: command handler emits domain event → event bus publishes event → queue delivers to projection consumers. Common pitfalls: at-least-once delivery causing duplicate projections (mitigate with idempotent projection handlers), ordering violations when events for the same aggregate arrive on different partitions (mitigate by keying on aggregate ID), and event bus becoming a bottleneck if projections are synchronous (mitigate with async consumers and lag monitoring). Always design for message redelivery — assume every message will be delivered more than once.
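Keying on aggregate ID works because every event for one aggregate lands on the same partition and therefore keeps its relative order. The sketch below illustrates the idea with a stable hash; `partition_for` is a hypothetical helper, and real brokers apply their own hash function to the message key.

```python
import zlib


def partition_for(aggregate_id: str, partition_count: int) -> int:
    """Map an aggregate ID to a stable partition index.

    zlib.crc32 is used because Python's built-in hash() for strings is
    randomized per process and would not give stable routing.
    """
    return zlib.crc32(aggregate_id.encode("utf-8")) % partition_count


events = [("order-1", "OrderPlaced"), ("order-2", "OrderPlaced"), ("order-1", "OrderPaid")]
partitions = {}
for aggregate_id, event_type in events:
    index = partition_for(aggregate_id, 8)
    partitions.setdefault(index, []).append((aggregate_id, event_type))
```

Ordering is guaranteed only within a partition, so events for different aggregates may still be observed in a different order than they were written.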
Further Reading
Architecture Resources:
- CQRS Pattern Microsoft Docs — Microsoft CQRS reference
- Event Sourcing and CQRS — Combining CQRS with event sourcing
- Udi Dahan: CQRS — Original CQRS clarification post
Implementation Guides:
- Event Store Documentation — Event store implementation guide
- Axon Framework — CQRS/ES framework for Java
- Marten — Event sourcing and CQRS for .NET/PostgreSQL
Design Patterns:
- Event Sourcing — Full event sourcing pattern discussion
- Event-Driven Architecture — Asynchronous messaging patterns
- Microservices — CQRS within microservice architecture
Conclusion
CQRS separates command and query responsibilities into different models. Commands handle state changes; queries handle data retrieval. The separation enables independent optimization but introduces eventual consistency.
CQRS works well for complex domains with asymmetric read/write workloads, or when paired with event sourcing. It adds significant complexity and is easy to overuse.
Start with a simple CRUD model. If profiling reveals asymmetric needs, consider CQRS as a targeted solution, not a default architecture.