Read/Write Splitting
Master-slave replication for read scaling. Routing strategies, consistency lag considerations, and when this pattern helps vs hurts your architecture.
Read/Write Splitting
Databases hit bottlenecks. When reads and writes compete for the same resources, something gives. Read/write splitting attempts to solve this by funneling writes to one place and scattering reads across replicas.
Sounds simple. It is simple, until you need a read to reflect a write that just happened.
Introduction
Read/write splitting routes writes to a primary database and reads to replicas. The goal is to reduce primary load by offloading reads while writes go to a single authoritative node. The complexity comes from replication lag: a read immediately after a write might return stale data, and application code that assumes read-your-writes consistency breaks.
This guide covers how master-slave replication works (streaming replication in PostgreSQL, binlog-based in MySQL), the routing strategies (application-level and proxy-level), consistency lag considerations, and when read/write splitting actually helps versus when it adds unnecessary complexity. It also covers the sticky-session problem and how to architect for it.
How Master-Slave Replication Works
A single primary node handles all writes. Replicas pull changes from the primary’s transaction log and apply them locally. PostgreSQL uses streaming replication. MySQL uses binlog-based replication. Managed services like RDS, Aurora, and Cloud SQL wrap this in a managed endpoint.
The lag between primary commit and replica visibility depends on load, network, and whether you use synchronous or asynchronous replication. Under light load, lag stays under 100ms. Under heavy write load, it can stretch into seconds or more.
Synchronous replication holds the write until a replica confirms the change. This guarantees consistency but adds latency. Asynchronous replication is faster but leaves you with the lag problem.
Routing Strategies
You can route queries at the application level or through a proxy.
flowchart LR
App["Application"] --> Proxy["Proxy<br/>(ProxySQL / HAProxy)"]
Proxy -->|writes| Primary[("Primary DB")]
Proxy -->|reads| R1[("Replica 1")]
Proxy -->|reads| R2[("Replica 2")]
Proxy -->|reads| R3[("Replica 3")]
Primary -->|replicate| R1
Primary -->|replicate| R2
Primary -->|replicate| R3
Application-level routing embeds routing logic in your code. Proxy-level routing centralizes rules. Both approaches handle the basic write-to-primary, read-to-replica pattern.
Application-Level Routing
Your code decides where each query goes. You can route specific tables to the primary, pick replicas based on sharding keys, or choose based on current replica lag.
class DatabaseRouter:
def __init__(self, primary, replicas):
self.primary = primary
self.replicas = replicas
def get_read_node(self):
return min(self.replicas, key=lambda r: r.lag)
def route(self, query):
if query.is_write:
return self.primary
elif query.requires_consistency:
return self.primary
else:
return self.get_read_node()
The problem: every routing decision lives in application code. Developers have to know which queries tolerate stale data and which cannot. This becomes a convention that’s hard to enforce and easy to break.
Proxy-Level Routing
ProxySQL, HAProxy, and pgpool-II sit between your application and the database cluster. You define routing rules once and the proxy enforces them.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 0, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 1, 1);
Proxies work when you cannot change application code or want centralized routing logic. The downside is added infrastructure and another component that can fail.
Consistency Lag: The Core Issue
Replica lag is the gap between a write committing on the primary and that write appearing on a replica.
Think about what happens when a user creates a record and immediately reads it:
- Application writes to primary:
INSERT INTO orders VALUES (...) - Primary commits
- Application reads from replica:
SELECT * FROM orders WHERE ... - Replica has not yet received or applied the change
- User sees nothing
This is the read-after-write consistency problem. Solutions exist but each costs something:
- Read from primary after writes: reliable but defeats read scaling
- Synchronous replication: eliminates lag but slows all writes
- Session-based sticky routing: same replica for a user’s session has lower lag
- Track LSN: remember the write position and route reads to replicas that caught up
Most applications can tolerate stale reads for dashboards and analytics. Few can tolerate it for content the user just created.
Statement-Based vs Row-Based Replication
MySQL offers two replication formats:
Statement-Based Replication (SBR)
Replicates the SQL statements themselves. The replica runs the same INSERT, UPDATE, or DELETE.
Advantages: compact logs, less storage, deterministic for pure data operations.
Disadvantages: nondeterministic functions like NOW() or UUID() produce different results. Triggers and stored procedures can behave differently. Certain query optimizations cause drift between primary and replica.
Row-Based Replication (RBR)
Replicates the actual row changes—what changed and how.
Advantages: deterministic, handles nondeterministic functions correctly, clear visibility into changes.
Disadvantages: much larger binary logs for bulk operations. You cannot see the SQL that caused the change, which makes debugging harder.
Most production systems use row-based replication. MySQL 8.0 defaults to it. PostgreSQL’s logical replication uses a similar approach.
Statement-Based vs Row-Based Replication Trade-offs
| Dimension | Statement-Based (SBR) | Row-Based (RBR) |
|---|---|---|
| Log size | Compact — stores SQL text | Large — stores row diffs |
| Nondeterministic functions | Risky (NOW, UUID, RAND) | Safe — actual values replicated |
| Triggers and stored procedures | Can diverge on replica | Correct by default |
| Bulk operation overhead | Low | High for large changes |
| Debugging | SQL visible in logs | Must reconstruct from row images |
| Default in MySQL | Before 8.0 | MySQL 8.0+ |
| Use when | Simple, deterministic SQL | Complex queries, triggers, UDFs |
When to Use / When Not to Use Read/Write Splitting
Use read/write splitting when:
- Read-heavy workload (80%+ reads is a common target)
- Application tolerates eventual consistency for most reads
- Replica lag stays bounded under production load
- Vertical scaling is no longer sufficient
Product catalogs, content management systems, analytics dashboards, and search results work well.
Do not use read/write splitting when:
- Read-after-write consistency is required — users see their own stale data
- Write volume outpaces what replicas can apply — lag becomes unbounded
- Foreign key constraints span both write and read paths — referential integrity checks need primary
- You expect it to be transparent — routing logic bleeds into application code
I worked on a system where read replicas were supposed to offload the primary. But every feature dealing with the user’s own data—shopping carts, order history, account settings—had to hit the primary anyway. After accounting for those queries, only 30% of reads went to replicas instead of the expected 80%. The replicas sat idle while the primary stayed busy.
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Read-after-write returns stale data | Users see their own stale data after writing | Read from primary after writes, use sticky sessions, or track LSN |
| Replica lag grows unbounded under write load | Read replicas return very old data | Monitor lag, add replicas, switch to synchronous replication |
| Proxy routes write to replica | Write appears to succeed but is not replicated | Configure proxy rules carefully, test routing before deploying |
| Read replica promoted without verifying lag | Data loss when replica becomes primary | Check lag before promotion, use semi-synchronous replication |
| Bulk query overwhelms replica | Replica falls behind, affects all reads from it | Set replica to read-only, use separate pool for heavy analytics |
Capacity Estimation: Replica Lag Under Write Load
Replica lag grows when the replica cannot apply changes as fast as the primary generates them. The key formula: lag_seconds = (write_throughput_per_second * average_event_size) / (replica_apply_rate). Under normal conditions, replica apply rate keeps up with write throughput. Under heavy write bursts, lag accumulates faster than it can be resolved.
For PostgreSQL streaming replication, the replica lag metric is pg_stat_replication.write_lsn - replay_lsn. This is the difference between the WAL position the primary has written and the position the replica has replayed. Under a sustained write rate of 10,000 transactions per second, each generating 10KB of WAL, you produce 100MB of WAL per second. If the replica’s replay rate is 80MB per second (because of slower disk or higher CPU load), you accumulate 20MB of lag per second — roughly 12 seconds of lag after one minute.
The practical implication: size your replicas to handle your peak write throughput. If your primary can sustain 5,000 writes per second but your replica is running on half the CPU, the replica will fall behind during peak load even if it keeps up at average load. Monitor replica CPU and disk throughput alongside replication lag — lag often precedes a CPU or I/O saturation problem on the replica.
For MySQL with parallel replication (MariaDB or MySQL 8.0 with slave_parallel_workers), the formula improves because the replica can apply changes in parallel. With 4 worker threads instead of 1, replica throughput roughly quadruples. Enable parallel replication when replica lag starts growing despite adequate CPU and I/O headroom.
Observability Hooks: Lag Metrics and Replica Heartbeat Monitoring
For PostgreSQL, the primary monitoring query is pg_stat_replication:
SELECT
client_addr,
state,
sent_lsn - replay_lsn AS lag_bytes,
(sent_lsn - replay_lsn)::float / 1024 / 1024 AS lag_mb,
write_lsn - replay_lsn AS write_lag,
flush_lsn - replay_lsn AS flush_lag
FROM pg_stat_replication
WHERE application_name IN ('replica_1', 'replica_2');
The state column tells you whether replication is streaming (streaming) or lagging (catchup). Alert when lag_mb exceeds your RPO threshold — if your RPO is 30 seconds and writes generate 100MB per second, alert at 3GB lag.
For MySQL, SHOW REPLICA STATUS\G exposes Seconds_Behind_Master — but this metric is misleading. It measures the replica’s SQL thread delay, not the actual replication lag. If the replica’s IO thread is disconnected, Seconds_Behind_Master shows zero even though the replica is hours behind. Better metrics: Read_Master_Log_Pos compared to Exec_Master_Log_Pos shows how far the replica has applied changes, and Relay_Log_Space shows how much unapplied data is accumulating on the replica.
For both databases, set up heartbeat tables on the primary that the replica reads. The heartbeat records the current timestamp on the primary. The replica reads it and calculates actual lag as now() - heartbeat_timestamp. This works even during replication disconnects and is more reliable than relying on WAL positions alone.
The critical alert: replica lag exceeding your RPO. If your RPO is 5 minutes, alert at 4 minutes — giving a 1-minute window to investigate and potentially promote the replica before your RPO is breached.
Quick Recap Checklist
- All writes go to primary; reads distribute to replicas
- Read-after-write consistency requires primary reads after writes
- Replica lag grows under heavy write load — monitor continuously
- Session-based sticky routing reduces lag for user-scoped reads
- Row-based replication (RBR) safer than statement-based (SBR) for nondeterministic functions
- Proxy-level routing centralizes rules; application-level routing offers flexibility
- Bulk analytics queries should hit dedicated replica, not shared pool
- Connection pooling prevents connection storms during failover
- Track LSN to route reads to replicas that have caught up
- Test routing rules before deploying to production
Related Posts
- Database Replication — The foundations of how data copies between nodes
- Database Scaling — Vertical vs horizontal scaling strategies
- Relational Databases — Understanding ACID and transaction isolation
Interview Questions
The read sees eventual consistency with a lag of 2-3 seconds. The write that just happened on the primary is not visible on the replica for 2-3 seconds. If the application needs read-after-write consistency, it must either read from the primary after writes or use synchronous replication. For most web applications, 2-3 seconds of lag is acceptable. For financial or inventory systems, it is not.
The lagging replica is a bottleneck — it cannot replay WAL entries as fast as the primary generates them. This is usually caused by slower disk I/O (the replica is applying changes from disk rather than cache), less CPU than the other replicas, or a long-running query on that replica blocking the replication apply thread. Check the replica's CPU, disk throughput, and pg_stat_activity for blocking queries. The fix is either to upgrade the replica's hardware, isolate it from other workloads, or use a load balancer that routes reads away from the lagging replica until it catches up.
Asynchronous replication: the primary commits writes and responds to the client immediately, then replicates to replicas in the background. Replicas may lag by milliseconds to seconds. Read/write splitting with asynchronous replication means some reads return stale data. Synchronous replication: the primary waits for at least one replica to confirm the write before committing. This guarantees the replica has the write immediately, eliminating stale reads on replicas that have confirmed. The tradeoff is write latency — every write now waits for a network round-trip to the replica.
ProxySQL uses the mysql_query_rules table to define routing. Key rules: for transactions marked as read-only (via SET TRANSACTION READ ONLY), route to the replica hostgroup; for writes (INSERT, UPDATE, DELETE, or transactions not marked read-only), route to the primary hostgroup. You also need to configure mysql_servers to define primaries and replicas with their hostgroups, and mysql_users to authenticate. The scheduler in ProxySQL can also run periodic checks to verify replica lag and dynamically adjust weights. The rule order matters — ProxySQL evaluates rules top-down and applies the first matching rule. Use match_digest for regex matching on query patterns and apply=1 to ensure the rule is applied.
GTID (Global Transaction Identifier) assigns a unique UUID + sequence number to each transaction. File-position replication uses binlog filename and byte offset to track replication position. GTID simplifies failover because you specify the GTID position to resume from rather than a file+offset. It also makes it easier to detect skipped transactions and verify replica consistency. Choose GTID when you need automatic position tracking, easier failover, or when using MySQL 8.0+ (GTID is the default). Choose file-position when using older MySQL versions, when you need finer-grained control over replication position, or when your existing tooling depends on binlog coordinates.
Statement-based replication (SBR) logs SQL statements — lower storage but nondeterministic for functions like NOW() or UUID(). Row-based replication (RBR) logs actual row changes — deterministic but larger binlogs for bulk operations. Mixed mode switches between them: uses SBR for deterministic statements, RBR for nondeterministic ones. For read/write splitting, row-based is safer because it correctly replicates all changes regardless of function determinism. MySQL 8.0 defaults to row-based. SBR can cause data drift between primary and replica when nondeterministic functions produce different values.
Auto-commit means every statement is its own transaction. In read/write splitting, you want to route multi-statement transactions to the primary to maintain consistency. With auto-commit, each SELECT is a separate transaction — the router can send each to a different replica, which means two reads in the same logical operation hit different replicas with different lag states. Additionally, if the ORM sends BEGIN implicitly or uses connection pooling with transaction interleaving, a read can be routed to a replica while the previous write is still pending on the primary. Fix by disabling auto-commit at the application level, using explicit transactions for related query groups, or ensuring the routing layer tracks transaction context.
Configure max lag tolerance in your routing layer. ProxySQL has max_replication_lag per server — if lag exceeds this, the server is excluded from the active pool. Application-level: check pg_stat_replication (PostgreSQL) or SHOW REPLICA STATUS (MySQL) and only route reads to replicas whose lag is below your tolerance. Use weighted routing: replicas with lower lag get more weight. For PostgreSQL, synchronous_commit = remote_apply ensures replicas are fully current before the primary commits, eliminating lag at the cost of write latency. Alternatively, use a connection pool that monitors lag and temporarily excludes lagging replicas.
Sticky routing binds a user's session to a specific replica for a period after they write. The idea: the replica that just received the write (or is closest to the primary) stays associated with that user for subsequent reads. This reduces read-after-write inconsistency because the user's reads hit the same replica that has already received their writes. Implementation: track the LSN or GTID position after writes, and route the user's reads to a replica that has replayed past that position. Most load balancers and proxies support session affinity (sticky cookies or source IP). The tradeoff is reduced load distribution — some replicas may be overloaded if their associated users are more active.
pgpool-II is a PostgreSQL-specific pooling and routing solution. It supports read/write splitting via primary_router_mode and query routing rules. ProxySQL is database-agnostic (works with MySQL, PostgreSQL, MariaDB). pgpool-II runs inline — queries pass through pgpool — while ProxySQL runs as a sidecar proxy. pgpool-II can be simpler for pure PostgreSQL setups but adds latency as queries pass through it. ProxySQL offers more granular routing rules, better health checking, and is more flexible for multi-database environments. Both support connection pooling, health checks, and query routing. Choose pgpool-II for a tightly integrated PostgreSQL solution; choose ProxySQL for heterogeneous database environments or when you need more routing control.
Implement a tiered routing strategy. Route inventory checks, payment processing, and any query that must reflect recent writes to the primary. Route product browsing, search results, and cached content to replicas. At the application level, annotate queries: mark some as requires_consistency=true and those go to primary; others go to replicas. Some ORMs support this via hinting or explicit read/write mode. Alternatively, use synchronous replication for the primary + one replica (synchronous target) so that replica is always current — this replica handles the consistency-sensitive reads while other replicas handle purely eventual-consistent reads.
Key metrics: replica lag per replica (alert if > 30 seconds), primary CPU and connection count, replica CPU and connection count, query latency by routing target (primary vs replica), error rate on replicas, and replication slot age (PostgreSQL). Set alerts: replica lag > your RPO threshold (e.g., 30s), primary connections > 80% of max, replica CPU > 80%, query latency p99 exceeding your SLA. Track the ratio of reads going to primary vs replicas — if primary reads are high (above 20% for read-heavy workloads), your routing is misconfigured. Monitor replication I/O and SQL thread status in MySQL to catch replica failures early.
Connection pooling (PgBouncer, ProxySQL pooling) sits in front of your routing layer. If the pool is not shard-aware, connections from the pool may hit different replicas for the same session, defeating sticky routing and increasing read-after-write inconsistency. Pitfalls: if you use transaction-mode pooling and read/write splitting at the proxy level, a single connection may be used for both reads and writes within a transaction, causing routing confusion. The fix: ensure the pool is configured to route connections appropriately based on query type, or implement routing at the application level before the pool. Also ensure the pool does not hold connections to replicas that are lagging behind — pool health checks should include replication lag.
Cascading replication compounds lag. If primary commits at T0, replica1 receives at T0 + 100ms (network), and replica2 receives from replica1 at T0 + 200ms. The lag for replica2 is effectively double the single-hop lag. For read/write splitting, cascading makes sense only if the intermediate replica (replica1) also serves reads — otherwise you are adding latency for no benefit. Use cascading when you need many replicas and the primary cannot handle all replication streams simultaneously. Monitor lag at each tier — a lagging intermediate replica causes all downstream replicas to lag equally. If latency matters for your reads, avoid cascading or use synchronous replication to the first tier.
A query joins across multiple tables that were recently updated. The write committed on the primary. The replica has applied the change to one table but not yet to a joined table due to different replication apply times. The read hits the replica, sees the updated first table but stale joined table, and returns rows that never existed together on the primary — silently incorrect join results with no error. This is a read-after-write inconsistency that manifests as wrong query results rather than missing data. The application sees what appears to be correct data but is actually a cross-section of old and new states. The fix: use primary for queries that join recently-written tables, or use synchronous replication.
Prepared statements cache execution plans. In PostgreSQL, a generic plan is cached for a prepared statement regardless of parameter values. For partitioned tables, this can be problematic — the cached plan may be suboptimal for different parameter values. For read/write splitting specifically, if a prepared statement is first executed with a write (routed to primary) and then reused for reads (routed to replica), the plan may be optimized for the primary's statistics rather than the replica's. Use PREPARE ... FOR EXECUTE with explicit execution rather than generic prepared statement reuse across different query types. For read/write splitting with replicas, prefer client-side parameter binding with custom plans rather than generic cached plans.
Replication slots prevent WAL from being removed on the primary until the replica acknowledges receiving it. This ensures that if a replica falls behind or disconnects, the primary retains the WAL needed to resync. In a read/write splitting setup, if a replica falls behind and the replication slot is not properly configured, the primary's disk can fill with WAL that cannot be recycled — causing the primary to stop accepting writes. Conversely, if a slot is misconfigured with max_slot_wal_keep_size too small, the primary may remove WAL that a lagging replica still needs, and when that replica reconnects it cannot resume and must be re-initialized, causing data loss. Always monitor replication slot age and ensure wal_keep_size or max_slot_wal_keep_size is set appropriately for your replica lag tolerance.
Vitess is a database middleware that sits in front of MySQL and handles sharding and read/write splitting transparently. It routes reads to replicas and writes to primaries based on SQL parsing — it understands SQL and can determine query type without application hints. Unlike application-level routing (where routing logic is in your code) or proxy-level routing (like ProxySQL with rules), Vitess provides a single logical database endpoint and handles routing internally. It also handles failover, resharding, and connection pooling automatically. The tradeoff is operational complexity — Vitess requires its own cluster management. For read/write splitting without full sharding, Vitess is heavyweight; ProxySQL or application-level routing is simpler. Use Vitess when you need both read/write splitting and horizontal sharding together.
Mixed transactions are the hardest problem in read/write splitting. A transaction like BEGIN; SELECT (read); INSERT (write); SELECT (read); COMMIT must route the read to a replica but the write to the primary — but within a single transaction, you cannot have two different connections to two different servers. The router must detect this pattern and route the entire transaction to the primary, defeating read scaling for that transaction. Some proxies (ProxySQL) detect transaction type from the first statement and pin the connection — if the first statement is a write, the whole transaction goes to primary; if read-only, it can go to replica. Application-level routers handle this explicitly by tracking transaction state. The failure mode: an ORM that uses auto-commit or interleaves statements can route reads to replicas within a transaction that has pending writes on the primary, returning stale data for those reads. The safest approach: at the application layer, mark transactions as read-only or read-write explicitly, and route accordingly. If your framework auto-commits every statement, every statement is a new transaction and routing is per-statement (not per-batch), which is easier to handle.
Long-running transactions that started on the primary before failover may fail or produce inconsistent results on the new primary — the transaction's session state is lost, locks may be held differently, and the new primary has a different replication position. If the transaction was read-only, it may simply fail to commit. If it was a write transaction, the writes may be lost entirely. Strategies to minimize failures: keep transaction durations short so they are less likely to span a failover event. Use application-level transaction retry logic with idempotent operations — catch connection errors and re-execute failed transactions. Avoid multi-statement transactions that hold locks across the failover window. For read/write splitting specifically, use sticky sessions to route the same user to the same replica, reducing the chance that a transaction spans a failover. Monitor in-flight transaction count and alert before failover — a spike in long-running transactions indicates higher risk during promotion.
Further Reading
Official Documentation:
- PostgreSQL High Availability — Streaming replication, replica configuration
- MySQL Replication — Binary log formats, replica configuration
Tools:
- ProxySQL — SQL-aware proxy for MySQL and PostgreSQL routing
- HAProxy — TCP/HTTP load balancer with database support
- pgpool-II — PostgreSQL connection pooler and replication proxy
- PgBouncer — Lightweight PostgreSQL connection pooler
Deep Dives:
- ProxySQL Read/Write Splitting — Rule-based routing configuration
- MySQL Replication HA Solutions — Comparison of replication-based HA approaches
Conclusion
Read/write splitting is a real technique with real limits. The key questions: how much read traffic can tolerate stale data, and how much does write volume exceed what a single node handles? Answer those honestly before committing.
Start with your database’s replication documentation and measure replica lag under actual production load.
Category
Related Posts
Database Capacity Planning: A Practical Guide
Plan for growth before you hit walls. This guide covers growth forecasting, compute and storage sizing, IOPS requirements, and cloud vs on-prem decisions.
Connection Pooling: HikariCP, pgBouncer, and ProxySQL
Learn connection pool sizing, HikariCP, pgBouncer, and ProxySQL, timeout settings, idle management, and when pooling helps or hurts performance.
Database Monitoring: Metrics, Tools, and Alerting
Keep your PostgreSQL database healthy with comprehensive monitoring. This guide covers query latency, connection usage, disk I/O, cache hit ratios, and alerting with pg_stat_statements and Prometheus.