CQRS Pattern
Separate read and write models. Command vs query models, eventual consistency implications, event sourcing integration, and when CQRS makes sense.
Most systems use the same model for reading and writing. You have a Customer table. You insert customers, update customers, delete customers, query customers. One schema, one set of operations.
CQRS—Command Query Responsibility Segregation—splits this. Separate models handle reads and writes. The write model handles business logic and state transitions. The read model handles queries and projections.
This enables different data structures optimized for each operation. Writes use a normalized structure enforcing integrity. Reads use a denormalized structure serving queries efficiently.
Command Model vs Query Model
A command represents intent to change state. “Place order,” “update shipping address,” “cancel subscription.” Commands return success or failure, not data.
A query represents a request for data. “Show my orders,” “what is my account balance,” “list products in category.” Queries return data without changing state.
In a typical CRUD system, these look identical. A POST to /orders might create an order and return it. A GET to /orders lists orders. Both hit the same database and often the same table structure.
CQRS separates these. Commands go to one model, queries to another.
```python
class PlaceOrderCommand:
    def __init__(self, customer_id, line_items):
        self.customer_id = customer_id
        self.line_items = line_items


class GetCustomerOrdersQuery:
    def __init__(self, customer_id):
        self.customer_id = customer_id
```
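The split becomes concrete in the handlers. A minimal sketch, assuming an in-memory stand-in for both stores (the class and parameter names here are illustrative, not from any specific framework): the command handler validates intent and mutates state, returning only success or failure, while the query handler returns data and never mutates anything.

```python
from types import SimpleNamespace


class CommandResult:
    """Commands return success or failure, not data."""
    def __init__(self, success, error=None):
        self.success = success
        self.error = error


class OrderCommandHandler:
    """Write side: validates the command and changes state."""
    def __init__(self, write_store):
        self.write_store = write_store  # hypothetical in-memory stand-in

    def handle(self, command):
        if not command.line_items:
            return CommandResult(False, "order must contain at least one line item")
        self.write_store.setdefault(command.customer_id, []).append(command.line_items)
        return CommandResult(True)


class OrderQueryHandler:
    """Read side: returns data, never mutates state."""
    def __init__(self, read_model):
        self.read_model = read_model

    def handle(self, query):
        return self.read_model.get(query.customer_id, [])


# Usage, with a SimpleNamespace standing in for PlaceOrderCommand:
store = {}
result = OrderCommandHandler(store).handle(
    SimpleNamespace(customer_id="c1", line_items=["widget"]))
```

Note the asymmetry: the command handler's return value carries no order data, so a client that wants to display the new order must go through the query side.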
Eventual Consistency Implications
CQRS typically involves separate data stores for reads and writes. The write store holds authoritative state. The read store updates asynchronously from the write store.
Reads might return stale data. A user places an order, gets success, immediately queries their order history, and might not see the new order yet because the read store has not caught up.
This eventual consistency is a direct consequence of the architectural choice. You gain performance and scalability through asynchronous updates. You pay with temporary inconsistency.
The impact depends on your domain. Financial systems often cannot tolerate stale reads. Social feeds can accept stale data—your new post does not need to appear instantly.
Handling Consistency in User Interfaces
CQRS often requires careful UI design:
- Show confirmation immediately after commands succeed
- Refresh or poll read views after writes
- Use optimistic UI updates that assume success
- Handle conflicts gracefully when they surface
If a user places an order and the confirmation page shows “processing,” the temporary inconsistency is visible but acceptable. If the confirmation page shows an empty order list, users lose confidence.
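One common way to implement the "refresh or poll" approach is to bound the wait with a deadline. A minimal sketch, assuming a callable that fetches from the read model (the function name and parameters are illustrative, not a real API):

```python
import time


def wait_for_order(fetch_order, order_id, timeout_s=5.0, interval_s=0.25):
    """Poll the read model until a just-written order appears, or give up.

    fetch_order is any callable returning the order or None. While this
    loop runs, the UI can show a "processing" state; on timeout it falls
    back to that state rather than showing an empty list.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        order = fetch_order(order_id)
        if order is not None:
            return order              # projection caught up: show the order
        if time.monotonic() >= deadline:
            return None               # still propagating: keep "processing" view
        time.sleep(interval_s)
```

The timeout doubles as an informal SLA: if the projection routinely exceeds it, that is a signal to investigate projection lag rather than raise the timeout.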
Event Sourcing Integration
CQRS pairs naturally with event sourcing. When commands succeed, they produce events. Events are stored in an event store. Read models consume events and build projections.
```mermaid
flowchart LR
    Command[("Command<br/>PlaceOrder")] --> Validator[("Validate<br/>Business Rules")]
    Validator -->|valid| EventStore[("Event Store<br/>Append-only log")]
    Validator -->|invalid| Reject[("Reject<br/>Return Error")]
    EventStore -->|publish| Bus[("Event Bus<br/>or Stream")]
    Bus --> Projection1[("Projection 1<br/>Orders by Customer")]
    Bus --> Projection2[("Projection 2<br/>Revenue Dashboard")]
    Bus --> Projection3[("Projection 3<br/>Inventory Status")]
    Projection1 --> ReadModel1[("Read Model 1<br/>Customer Order View")]
    Projection2 --> ReadModel2[("Read Model 2<br/>Revenue Report")]
    Projection3 --> ReadModel3[("Read Model 3<br/>Stock Levels")]
    ReadModel1 --> Query1[("Query<br/>GET /customers/123/orders")]
    ReadModel2 --> Query2[("Query<br/>GET /reports/revenue")]
    ReadModel3 --> Query3[("Query<br/>GET /products/sku/inventory")]
```
Event sourcing provides a complete audit trail. Every state change is recorded as an event. Current state is derived by replaying events. This makes read models rebuildable from scratch at any time.
Event sourcing is not required for CQRS. You can use CQRS with a traditional write database and separate read stores updated via change data capture. But event sourcing makes the architecture cleaner because events are the natural bridge between command and query sides.
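A projection is ultimately just a fold over the event stream. A minimal sketch, assuming simple dict-shaped events (the event fields here are assumptions for illustration, not a fixed schema):

```python
def project_orders_by_customer(events):
    """Fold an append-only event stream into an 'orders by customer' read model.

    Replaying the full stream from scratch reproduces the read model exactly,
    which is what makes read models rebuildable at any time.
    """
    read_model = {}
    for event in events:
        if event["type"] == "OrderPlaced":
            read_model.setdefault(event["customer_id"], []).append(
                {"order_id": event["order_id"], "status": "placed"})
        elif event["type"] == "OrderShipped":
            # Find the projected order and update its denormalized status.
            for order in read_model.get(event["customer_id"], []):
                if order["order_id"] == event["order_id"]:
                    order["status"] = "shipped"
    return read_model
```

To rebuild the read model (for example after changing the projection logic), you discard the current state and rerun this function over the full stream.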
When CQRS Makes Sense
CQRS provides value when:
- Read and write workloads differ significantly in complexity or volume
- Different user roles need different views of the same data
- You need to scale reads and writes independently
- Eventual consistency is acceptable
- You benefit from separate optimization for each operation
A content management system might use CQRS. Authors write content; readers consume it. Write operations involve validation and publishing pipelines. Read operations involve search, filtering, and personalization. These workloads have nothing in common.
A real-time collaboration tool might use CQRS. User actions generate events that update a shared document. The read model projects current state from events. Multiple users see the same document with minimal latency.
When CQRS Is Overkill
CQRS adds significant complexity. Two models to maintain, synchronization between them, eventual consistency to explain. This complexity is only justified if separation provides concrete value.
CQRS is overkill when:
- Read and write operations are similar in complexity
- You do not need to scale reads and writes independently
- Strong consistency is required
- Your team lacks experience with distributed patterns
- Simplicity matters more than optimization
A simple CRUD application with balanced read/write workloads rarely benefits from CQRS. The added complexity outweighs the benefits.
Relationship with Read/Write Splitting
Read/write splitting also separates reads and writes, but at the infrastructure level rather than the architectural level.
Read/write splitting routes queries to replicas while writes go to the primary. Data is identical modulo replication lag. CQRS separates the actual models—reads and writes might use completely different schemas.
Read/write splitting is simpler. Use it when you need to scale reads and writes but your models remain fundamentally the same. CQRS is more powerful but more complex. Use it when you need fundamentally different read and write representations.
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Projection lag causing stale reads | User sees outdated data after write | Show “processing” state in UI, poll for updates, set max acceptable lag SLAs |
| Event ordering violations | Read model ends up in inconsistent state | Use sequence numbers or causation IDs, idempotent projections |
| Command replay from event store | Duplicate commands applied, data corruption | Idempotent command handlers, deduplication via correlation IDs |
| Read model rebuild blocking writes | Event store frozen during snapshot | Use live rebuild with zero-downtime projection rebuild strategies |
| Multiple read models out of sync | Different queries return conflicting data | Accept eventual consistency windows, clearly document lag expectations |
| Event schema changes breaking projections | Old events cause projection crashes | Version events, use upcasters, maintain backward compatibility |
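Two of the mitigations above, sequence numbers and idempotent projections, can be combined in one guard. A minimal sketch, with an in-memory dict standing in for the read model's own database (all names here are illustrative):

```python
class IdempotentProjection:
    """Apply events at most once and in order, using per-stream sequence numbers."""

    def __init__(self):
        self.last_seq = {}   # stream_id -> highest sequence number applied
        self.state = {}      # stream_id -> projected state (simplified)

    def apply(self, stream_id, seq, event):
        applied = self.last_seq.get(stream_id, 0)
        if seq <= applied:
            return False     # duplicate or replayed event: skip silently
        if seq != applied + 1:
            # A gap means events arrived out of order; fail loudly rather
            # than build an inconsistent read model.
            raise ValueError(
                f"gap in stream {stream_id}: expected {applied + 1}, got {seq}")
        self.state[stream_id] = event   # project (simplified: keep latest event)
        self.last_seq[stream_id] = seq
        return True
```

In a real deployment the `last_seq` bookkeeping must be committed in the same transaction as the read-model write, otherwise a crash between the two reintroduces duplicates.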
CQRS vs Traditional CRUD Trade-offs
| Dimension | CQRS | Traditional CRUD |
|---|---|---|
| Read/write model complexity | Separate models, each optimized | Single model serves both |
| Consistency model | Eventual consistency by default | Strong consistency typically |
| Write performance | Can be higher if read model is separate | Tied to read model structure |
| Read performance | Highly optimized per query shape | Limited by normalized schema |
| Operational complexity | High — dual models, sync logic | Low — single model |
| Horizontal scaling | Independent read/write scaling | Coupled scaling |
| Eventual consistency window | Variable — can be ms to seconds | N/A — always consistent |
| Best for | Complex domains, asymmetric workloads | Simple domains, balanced operations |
Capacity Estimation: Event Store Sizing and Projection Rebuild Time
CQRS with event sourcing requires sizing the event store and understanding projection rebuild times.
Event store storage formula:
```
avg_event_bytes = avg_event_payload_bytes + overhead_per_event
total_event_store_bytes = avg_event_bytes × events_per_stream × number_of_streams × duplication_factor
```
For an order management system with 10M orders, each generating ~20 events on average (placed, paid, shipped, etc.), with 500 bytes average event payload:
- Total events: 200M
- Avg event size: 500 + 100 bytes (metadata, timestamps, stream ID) = 600 bytes
- Raw event store: 200M × 600 = 120GB
- With snapshot and index overhead (roughly 30% on top of raw events): 120GB × 1.3 ≈ 156GB total
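The same arithmetic as a small calculator, using the worked example's numbers (the 100-byte overhead and 1.3 snapshot factor are the assumptions stated above):

```python
def event_store_size_gb(streams, events_per_stream, payload_bytes,
                        overhead_bytes=100, snapshot_factor=1.3):
    """Estimate event store size per the formula above (1 GB = 10**9 bytes)."""
    total_events = streams * events_per_stream
    raw_bytes = total_events * (payload_bytes + overhead_bytes)
    return raw_bytes * snapshot_factor / 1e9


# 10M order streams, ~20 events each, 500-byte payloads:
event_store_size_gb(10_000_000, 20, 500)  # ≈ 156 GB
```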
Projection rebuild time formula:
```
projection_rebuild_time = (events_per_stream × avg_event_process_time) × number_of_streams / parallelism
```
For a customer_order_history projection processing 20 events per order at 1ms per event, for 10M orders with 10 parallel workers:
- Sequential time: 200M events × 1ms = 200,000 seconds
- With 10 workers: 20,000 seconds ≈ 5.5 hours
This is why snapshots matter: rebuild from snapshot + replay of only recent events reduces rebuild time dramatically. If snapshots are taken every 100 events and the last snapshot was at event 950 of 1000, you replay only 50 events instead of 1000.
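The snapshot-then-replay pattern can be sketched as a generic helper, assuming the event store yields (version, event) pairs and the projection exposes its fold function (all names here are illustrative):

```python
def rebuild_from_snapshot(snapshot_state, snapshot_version, events, apply_fn):
    """Restore state from the latest snapshot, then replay only newer events.

    apply_fn(state, event) -> state is the projection's fold function.
    Returns the rebuilt state and how many events were actually replayed.
    """
    state = snapshot_state
    replayed = 0
    for version, event in events:
        if version <= snapshot_version:
            continue            # already folded into the snapshot: skip
        state = apply_fn(state, event)
        replayed += 1
    return state, replayed
```

With the example above, a snapshot at version 950 of a 1000-event stream means only 50 events pass the version check and get replayed.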
Read model storage: Read models are typically denormalized projections stored in a query-optimized format. A customer_order_history read model for 10M customers, averaging 20 orders each, storing order summary (100 bytes per order): 10M × 20 × 100 = 20GB. Read model storage grows with both customer count and order count, unlike event store which only grows with event count.
Real-World Case Study: GetEventStore at Scale
GetEventStore is an open-source event store used by many organizations implementing CQRS and event sourcing. One production deployment handled a financial trading platform with 50,000 daily active traders, each generating 200-500 events per trading session.
The event store grew to 2TB over 3 years, serving 50M events. The challenge was not storage — it was projection rebuild time when adding new read models. A new trader_performance_summary projection required scanning all 50M events, which took 18 hours on their infrastructure.
Their solution: event stream truncation. Events older than 2 years were archived to cold storage (S3), keeping the hot event store at 400GB with 18 months of recent events. New projections could start from a snapshot at the 2-year boundary and replay only the recent 18 months of events — reducing rebuild time from 18 hours to 90 minutes.
The lesson: plan for event store archival from the beginning. Every CQRS implementation will eventually need to add new projections, and rebuilding from scratch across years of events is expensive. Snapshotting + event archival is the only practical path.
Interview Questions
Q: Your read model is consistently 30 seconds behind the write model. Users complain they see data that is 30 seconds stale. How do you fix this?
The projection lag indicates events are accumulating faster than the projection can process them. The fixes, in order of preference: add more projection workers (parallelism), optimize the projection query (add indexes on the read model), batch event processing in the projection (process 100 events per transaction instead of 1), or reduce the event volume going to that projection (use a separate stream). If lag is acceptable to the business, document the SLA explicitly and add a last_updated timestamp to the read model so applications can display staleness to users.
Q: You need to add a new read model to an existing system with 5 years of events. How do you approach this?
The key question: does the new read model need 5 years of history, or only recent data? If recent-only: take a snapshot of current state, create a new projection starting from the snapshot + replaying only events since the snapshot. If full history: plan for a long rebuild window (hours to days depending on event volume), run the rebuild offline during maintenance, and do not serve the new read model publicly until rebuild completes. Meanwhile, backfill the read model in batches and monitor error rates — old events may have schema issues that need upcaster logic.
Q: What happens when two commands targeting the same aggregate arrive simultaneously?
This is a concurrency conflict. Without conflict resolution, both commands might pass aggregate version checks but produce invalid state. The standard solution: use optimistic concurrency with expected version numbers. Each aggregate has a version. Commands include expected_version. If the aggregate’s current version does not match expected, the command is rejected and the client retries. For CQRS, this means the command handler must load the aggregate, check version, and emit events — if two commands race, one succeeds and one gets a ConcurrencyException.
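The expected-version check described above can be sketched in a few lines (a minimal in-memory illustration, not any particular framework's aggregate base class):

```python
class ConcurrencyException(Exception):
    pass


class Aggregate:
    """Optimistic concurrency: commands carry the version they expect."""

    def __init__(self):
        self.version = 0
        self.events = []

    def execute(self, expected_version, event):
        if expected_version != self.version:
            # Another command won the race; caller should reload and retry.
            raise ConcurrencyException(
                f"expected version {expected_version}, current is {self.version}")
        self.events.append(event)
        self.version += 1


order = Aggregate()
order.execute(0, "OrderPlaced")   # succeeds, version becomes 1
# A racing command that also loaded version 0 would now raise
# ConcurrencyException, because the aggregate has moved to version 1.
```

Event stores typically enforce the same check at append time (append to stream only if the stream is at the expected version), so the guard holds even with multiple command-handler instances.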
Q: When is CQRS overkill?
CQRS is overkill when your read and write workloads are roughly symmetric — the same queries you use for reading are roughly the same complexity as your writes, and you do not have specific scalability concerns. It is also overkill for simple CRUD applications where a single model handles both adequately. The additional infrastructure (event bus, separate read models, synchronization logic) adds operational complexity that must be justified by actual performance or modeling benefits. If you cannot name a specific problem CQRS solves for your domain, start with a well-designed CRUD model.
Related Posts
- Microservices Roadmap - CQRS and event sourcing are companion patterns for building event-driven microservices where command and query responsibilities are separated and services communicate through an event bus
Conclusion
CQRS separates command and query responsibilities into different models. Commands handle state changes; queries handle data retrieval. The separation enables independent optimization but introduces eventual consistency.
CQRS works well for complex domains with asymmetric read/write workloads, or when paired with event sourcing. It adds significant complexity and is easy to overuse.
Start with a simple CRUD model. If profiling reveals asymmetric needs, consider CQRS as a targeted solution, not a default architecture.
See Also
- CQRS and Event Sourcing — The full pattern combining CQRS with event sourcing
- Event-Driven Architecture — Asynchronous messaging patterns
- Consistency Models — Strong vs eventual consistency tradeoffs