CQRS Pattern
Separate read and write models. Command vs query models, eventual consistency implications, event sourcing integration, and when CQRS makes sense.
Most systems use the same model for reading and writing. You have a Customer table. You insert customers, update customers, delete customers, query customers. One schema, one set of operations.
CQRS—Command Query Responsibility Segregation—splits this. Separate models handle reads and writes. The write model handles business logic and state transitions. The read model handles queries and projections.
This enables different data structures optimized for each operation. Writes use a normalized structure enforcing integrity. Reads use a denormalized structure serving queries efficiently.
Command Model vs Query Model
A command represents intent to change state. “Place order,” “update shipping address,” “cancel subscription.” Commands return success or failure, not data.
A query represents a request for data. “Show my orders,” “what is my account balance,” “list products in category.” Queries return data without changing state.
In a typical CRUD system, these look identical. A POST to /orders might create an order and return it. A GET to /orders lists orders. Both hit the same database and often the same table structure.
CQRS separates these. Commands go to one model, queries to another.
```python
class PlaceOrderCommand:
    def __init__(self, customer_id, line_items):
        self.customer_id = customer_id
        self.line_items = line_items


class GetCustomerOrdersQuery:
    def __init__(self, customer_id):
        self.customer_id = customer_id
```
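The split becomes concrete in the handlers. A minimal sketch, assuming an in-memory stand-in for both stores (the class and parameter names here are illustrative, not from any specific framework): the command handler validates intent and mutates state, returning only success or failure, while the query handler returns data and never mutates anything.

```python
from types import SimpleNamespace


class CommandResult:
    """Commands return success or failure, not data."""
    def __init__(self, success, error=None):
        self.success = success
        self.error = error


class OrderCommandHandler:
    """Write side: validates the command and changes state."""
    def __init__(self, write_store):
        self.write_store = write_store  # hypothetical in-memory stand-in

    def handle(self, command):
        if not command.line_items:
            return CommandResult(False, "order must contain at least one line item")
        self.write_store.setdefault(command.customer_id, []).append(command.line_items)
        return CommandResult(True)


class OrderQueryHandler:
    """Read side: returns data, never mutates state."""
    def __init__(self, read_model):
        self.read_model = read_model

    def handle(self, query):
        return self.read_model.get(query.customer_id, [])


# Usage, with a SimpleNamespace standing in for PlaceOrderCommand:
store = {}
result = OrderCommandHandler(store).handle(
    SimpleNamespace(customer_id="c1", line_items=["widget"]))
```

Note the asymmetry: the command handler's return value carries no order data, so a client that wants to display the new order must go through the query side.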
Eventual Consistency Implications
CQRS typically involves separate data stores for reads and writes. The write store holds authoritative state. The read store updates asynchronously from the write store.
Reads might return stale data. A user places an order, gets success, immediately queries their order history, and might not see the new order yet because the read store has not caught up.
This eventual consistency is a direct consequence of the architectural choice. You gain performance and scalability through asynchronous updates. You pay with temporary inconsistency.
The impact depends on your domain. Financial systems often cannot tolerate stale reads. Social feeds can accept stale data—your new post does not need to appear instantly.
Handling Consistency in User Interfaces
CQRS often requires careful UI design:
- Show confirmation immediately after commands succeed
- Refresh or poll read views after writes
- Use optimistic UI updates that assume success
- Handle conflicts gracefully when they surface
If a user places an order and the confirmation page shows “processing,” the temporary inconsistency is visible but acceptable. If the confirmation page shows an empty order list, users lose confidence.
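One common way to implement the "refresh or poll" approach is to bound the wait with a deadline. A minimal sketch, assuming a callable that fetches from the read model (the function name and parameters are illustrative, not a real API):

```python
import time


def wait_for_order(fetch_order, order_id, timeout_s=5.0, interval_s=0.25):
    """Poll the read model until a just-written order appears, or give up.

    fetch_order is any callable returning the order or None. While this
    loop runs, the UI can show a "processing" state; on timeout it falls
    back to that state rather than showing an empty list.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        order = fetch_order(order_id)
        if order is not None:
            return order              # projection caught up: show the order
        if time.monotonic() >= deadline:
            return None               # still propagating: keep "processing" view
        time.sleep(interval_s)
```

The timeout doubles as an informal SLA: if the projection routinely exceeds it, that is a signal to investigate projection lag rather than raise the timeout.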
Event Sourcing Integration
CQRS pairs naturally with event sourcing. When commands succeed, they produce events. Events are stored in an event store. Read models consume events and build projections.
```mermaid
flowchart LR
    Command[("Command<br/>PlaceOrder")] --> Validator[("Validate<br/>Business Rules")]
    Validator -->|valid| EventStore[("Event Store<br/>Append-only log")]
    Validator -->|invalid| Reject[("Reject<br/>Return Error")]
    EventStore -->|publish| Bus[("Event Bus<br/>or Stream")]
    Bus --> Projection1[("Projection 1<br/>Orders by Customer")]
    Bus --> Projection2[("Projection 2<br/>Revenue Dashboard")]
    Bus --> Projection3[("Projection 3<br/>Inventory Status")]
    Projection1 --> ReadModel1[("Read Model 1<br/>Customer Order View")]
    Projection2 --> ReadModel2[("Read Model 2<br/>Revenue Report")]
    Projection3 --> ReadModel3[("Read Model 3<br/>Stock Levels")]
    ReadModel1 --> Query1[("Query<br/>GET /customers/123/orders")]
    ReadModel2 --> Query2[("Query<br/>GET /reports/revenue")]
    ReadModel3 --> Query3[("Query<br/>GET /products/sku/inventory")]
```
Event sourcing provides a complete audit trail. Every state change is recorded as an event. Current state is derived by replaying events. This makes read models rebuildable from scratch at any time.
Event sourcing is not required for CQRS. You can use CQRS with a traditional write database and separate read stores updated via change data capture. But event sourcing makes the architecture cleaner because events are the natural bridge between command and query sides.
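A projection is ultimately just a fold over the event stream. A minimal sketch, assuming simple dict-shaped events (the event fields here are assumptions for illustration, not a fixed schema):

```python
def project_orders_by_customer(events):
    """Fold an append-only event stream into an 'orders by customer' read model.

    Replaying the full stream from scratch reproduces the read model exactly,
    which is what makes read models rebuildable at any time.
    """
    read_model = {}
    for event in events:
        if event["type"] == "OrderPlaced":
            read_model.setdefault(event["customer_id"], []).append(
                {"order_id": event["order_id"], "status": "placed"})
        elif event["type"] == "OrderShipped":
            # Find the projected order and update its denormalized status.
            for order in read_model.get(event["customer_id"], []):
                if order["order_id"] == event["order_id"]:
                    order["status"] = "shipped"
    return read_model
```

To rebuild the read model (for example after changing the projection logic), you discard the current state and rerun this function over the full stream.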
When CQRS Makes Sense
CQRS provides value when:
- Read and write workloads differ significantly in complexity or volume
- Different user roles need different views of the same data
- You need to scale reads and writes independently
- Eventual consistency is acceptable
- You benefit from separate optimization for each operation
A content management system might use CQRS. Authors write content; readers consume it. Write operations involve validation and publishing pipelines. Read operations involve search, filtering, and personalization. These workloads have nothing in common.
A real-time collaboration tool might use CQRS. User actions generate events that update a shared document. The read model projects current state from events. Multiple users see the same document with minimal latency.
When CQRS Is Overkill
CQRS adds significant complexity. Two models to maintain, synchronization between them, eventual consistency to explain. This complexity is only justified if separation provides concrete value.
CQRS is overkill when:
- Read and write operations are similar in complexity
- You do not need to scale reads and writes independently
- Strong consistency is required
- Your team lacks experience with distributed patterns
- Simplicity matters more than optimization
A simple CRUD application with balanced read/write workloads rarely benefits from CQRS. The added complexity outweighs the benefits.
Relationship with Read/Write Splitting
Read/write splitting also separates reads and writes, but at the infrastructure level rather than the architectural level.
Read/write splitting routes queries to replicas while writes go to the primary. Data is identical modulo replication lag. CQRS separates the actual models—reads and writes might use completely different schemas.
Read/write splitting is simpler. Use it when you need to scale reads and writes but your models remain fundamentally the same. CQRS is more powerful but more complex. Use it when you need fundamentally different read and write representations.
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Projection lag causing stale reads | User sees outdated data after write | Show “processing” state in UI, poll for updates, set max acceptable lag SLAs |
| Event ordering violations | Read model ends up in inconsistent state | Use sequence numbers or causation IDs, idempotent projections |
| Command replay from event store | Duplicate commands applied, data corruption | Idempotent command handlers, deduplication via correlation IDs |
| Read model rebuild blocking writes | Event store frozen during snapshot | Use live rebuild with zero-downtime projection rebuild strategies |
| Multiple read models out of sync | Different queries return conflicting data | Accept eventual consistency windows, clearly document lag expectations |
| Event schema changes breaking projections | Old events cause projection crashes | Version events, use upcasters, maintain backward compatibility |
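Two of the mitigations above, sequence numbers and idempotent projections, can be combined in one guard. A minimal sketch, with an in-memory dict standing in for the read model's own database (all names here are illustrative):

```python
class IdempotentProjection:
    """Apply events at most once and in order, using per-stream sequence numbers."""

    def __init__(self):
        self.last_seq = {}   # stream_id -> highest sequence number applied
        self.state = {}      # stream_id -> projected state (simplified)

    def apply(self, stream_id, seq, event):
        applied = self.last_seq.get(stream_id, 0)
        if seq <= applied:
            return False     # duplicate or replayed event: skip silently
        if seq != applied + 1:
            # A gap means events arrived out of order; fail loudly rather
            # than build an inconsistent read model.
            raise ValueError(
                f"gap in stream {stream_id}: expected {applied + 1}, got {seq}")
        self.state[stream_id] = event   # project (simplified: keep latest event)
        self.last_seq[stream_id] = seq
        return True
```

In a real deployment the `last_seq` bookkeeping must be committed in the same transaction as the read-model write, otherwise a crash between the two reintroduces duplicates.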
CQRS vs Traditional CRUD Trade-offs
| Dimension | CQRS | Traditional CRUD |
|---|---|---|
| Read/write model complexity | Separate models, each optimized | Single model serves both |
| Consistency model | Eventual consistency by default | Strong consistency typically |
| Write performance | Can be higher if read model is separate | Tied to read model structure |
| Read performance | Highly optimized per query shape | Limited by normalized schema |
| Operational complexity | High — dual models, sync logic | Low — single model |
| Horizontal scaling | Independent read/write scaling | Coupled scaling |
| Eventual consistency window | Variable — can be ms to seconds | N/A — always consistent |
| Best for | Complex domains, asymmetric workloads | Simple domains, balanced operations |
Capacity Estimation: Event Store Sizing and Projection Rebuild Time
CQRS with event sourcing requires sizing the event store and understanding projection rebuild times.
Event store storage formula:
```
avg_event_bytes = avg_event_payload_bytes + overhead_per_event
total_event_store_bytes = avg_event_bytes × events_per_stream × number_of_streams × duplication_factor
```
For an order management system with 10M orders, each generating ~20 events on average (placed, paid, shipped, etc.), with 500 bytes average event payload:
- Total events: 200M
- Avg event size: 500 + 100 bytes (metadata, timestamps, stream ID) = 600 bytes
- Raw event store: 200M × 600 = 120GB
- With snapshot and index overhead (roughly 30% on top of raw events): 120GB × 1.3 ≈ 156GB total
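The same arithmetic as a small calculator, using the worked example's numbers (the 100-byte overhead and 1.3 snapshot factor are the assumptions stated above):

```python
def event_store_size_gb(streams, events_per_stream, payload_bytes,
                        overhead_bytes=100, snapshot_factor=1.3):
    """Estimate event store size per the formula above (1 GB = 10**9 bytes)."""
    total_events = streams * events_per_stream
    raw_bytes = total_events * (payload_bytes + overhead_bytes)
    return raw_bytes * snapshot_factor / 1e9


# 10M order streams, ~20 events each, 500-byte payloads:
event_store_size_gb(10_000_000, 20, 500)  # ≈ 156 GB
```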
Projection rebuild time formula:
```
projection_rebuild_time = (events_per_stream × avg_event_process_time) × number_of_streams / parallelism
```
For a customer_order_history projection processing 20 events per order at 1ms per event, for 10M orders with 10 parallel workers:
- Sequential time: 200M events × 1ms = 200,000 seconds
- With 10 workers: 20,000 seconds ≈ 5.5 hours
This is why snapshots matter: rebuild from snapshot + replay of only recent events reduces rebuild time dramatically. If snapshots are taken every 100 events and the last snapshot was at event 950 of 1000, you replay only 50 events instead of 1000.
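The snapshot-then-replay pattern can be sketched as a generic helper, assuming the event store yields (version, event) pairs and the projection exposes its fold function (all names here are illustrative):

```python
def rebuild_from_snapshot(snapshot_state, snapshot_version, events, apply_fn):
    """Restore state from the latest snapshot, then replay only newer events.

    apply_fn(state, event) -> state is the projection's fold function.
    Returns the rebuilt state and how many events were actually replayed.
    """
    state = snapshot_state
    replayed = 0
    for version, event in events:
        if version <= snapshot_version:
            continue            # already folded into the snapshot: skip
        state = apply_fn(state, event)
        replayed += 1
    return state, replayed
```

With the example above, a snapshot at version 950 of a 1000-event stream means only 50 events pass the version check and get replayed.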
Read model storage: Read models are typically denormalized projections stored in a query-optimized format. A customer_order_history read model for 10M customers, averaging 20 orders each, storing order summary (100 bytes per order): 10M × 20 × 100 = 20GB. Read model storage grows with both customer count and order count, unlike event store which only grows with event count.
Real-World Case Study: GetEventStore at Scale
GetEventStore is an open-source event store used by many organizations implementing CQRS and event sourcing. One production deployment handled a financial trading platform with 50,000 daily active traders, each generating 200-500 events per trading session.
The event store grew to 2TB over 3 years, serving 50M events. The challenge was not storage — it was projection rebuild time when adding new read models. A new trader_performance_summary projection required scanning all 50M events, which took 18 hours on their infrastructure.
Their solution: event stream truncation. Events older than 2 years were archived to cold storage (S3), keeping the hot event store at 400GB with 18 months of recent events. New projections could start from a snapshot at the 2-year boundary and replay only the recent 18 months of events — reducing rebuild time from 18 hours to 90 minutes.
The lesson: plan for event store archival from the beginning. Every CQRS implementation will eventually need to add new projections, and rebuilding from scratch across years of events is expensive. Snapshotting + event archival is the only practical path.
Interview Questions
Q: Your read model is consistently 30 seconds behind the write model. Users complain they see data that is 30 seconds stale. How do you fix this?
The projection lag indicates events are accumulating faster than the projection can process them. The fixes, in order of preference: add more projection workers (parallelism), optimize the projection query (add indexes on the read model), batch event processing in the projection (process 100 events per transaction instead of 1), or reduce the event volume going to that projection (use a separate stream). If lag is acceptable to the business, document the SLA explicitly and add a last_updated timestamp to the read model so applications can display staleness to users.
Q: You need to add a new read model to an existing system with 5 years of events. How do you approach this?
The key question: does the new read model need 5 years of history, or only recent data? If recent-only: take a snapshot of current state, create a new projection starting from the snapshot + replaying only events since the snapshot. If full history: plan for a long rebuild window (hours to days depending on event volume), run the rebuild offline during maintenance, and do not serve the new read model publicly until rebuild completes. Meanwhile, backfill the read model in batches and monitor error rates — old events may have schema issues that need upcaster logic.
Q: What happens when two commands targeting the same aggregate arrive simultaneously?
This is a concurrency conflict. Without conflict resolution, both commands might pass aggregate version checks but produce invalid state. The standard solution: use optimistic concurrency with expected version numbers. Each aggregate has a version. Commands include expected_version. If the aggregate’s current version does not match expected, the command is rejected and the client retries. For CQRS, this means the command handler must load the aggregate, check version, and emit events — if two commands race, one succeeds and one gets a ConcurrencyException.
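The expected-version check described above can be sketched in a few lines (a minimal in-memory illustration, not any particular framework's aggregate base class):

```python
class ConcurrencyException(Exception):
    pass


class Aggregate:
    """Optimistic concurrency: commands carry the version they expect."""

    def __init__(self):
        self.version = 0
        self.events = []

    def execute(self, expected_version, event):
        if expected_version != self.version:
            # Another command won the race; caller should reload and retry.
            raise ConcurrencyException(
                f"expected version {expected_version}, current is {self.version}")
        self.events.append(event)
        self.version += 1


order = Aggregate()
order.execute(0, "OrderPlaced")   # succeeds, version becomes 1
# A racing command that also loaded version 0 would now raise
# ConcurrencyException, because the aggregate has moved to version 1.
```

Event stores typically enforce the same check at append time (append to stream only if the stream is at the expected version), so the guard holds even with multiple command-handler instances.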
Q: When is CQRS overkill?
CQRS is overkill when your read and write workloads are roughly symmetric — the same queries you use for reading are roughly the same complexity as your writes, and you do not have specific scalability concerns. It is also overkill for simple CRUD applications where a single model handles both adequately. The additional infrastructure (event bus, separate read models, synchronization logic) adds operational complexity that must be justified by actual performance or modeling benefits. If you cannot name a specific problem CQRS solves for your domain, start with a well-designed CRUD model.
Related Posts
- Microservices Roadmap - CQRS and event sourcing are companion patterns for building event-driven microservices where command and query responsibilities are separated and services communicate through an event bus
Conclusion
CQRS separates command and query responsibilities into different models. Commands handle state changes; queries handle data retrieval. The separation enables independent optimization but introduces eventual consistency.
CQRS works well for complex domains with asymmetric read/write workloads, or when paired with event sourcing. It adds significant complexity and is easy to overuse.
Start with a simple CRUD model. If profiling reveals asymmetric needs, consider CQRS as a targeted solution, not a default architecture.
See Also
- CQRS and Event Sourcing — The full pattern combining CQRS with event sourcing
- Event-Driven Architecture — Asynchronous messaging patterns
- Consistency Models — Strong vs eventual consistency tradeoffs