Read/Write Splitting

Master-slave replication for read scaling. Routing strategies, consistency lag considerations, and when this pattern helps vs hurts your architecture.


Databases hit bottlenecks. When reads and writes compete for the same resources, something gives. Read/write splitting attempts to solve this by funneling writes to one place and scattering reads across replicas.

Sounds simple. It is simple, until you need a read to reflect a write that just happened.

How Master-Slave Replication Works

A single primary node handles all writes. Replicas pull changes from the primary’s transaction log and apply them locally. PostgreSQL uses streaming replication. MySQL uses binlog-based replication. Managed services like RDS, Aurora, and Cloud SQL wrap this in a managed endpoint.

The lag between a commit on the primary and its visibility on a replica depends on load, network, and whether replication is synchronous or asynchronous. Under light load, lag typically stays under 100ms. Under heavy write load, it can stretch into seconds or more.

Synchronous replication holds the write until a replica confirms the change. This guarantees consistency but adds latency. Asynchronous replication is faster but leaves you with the lag problem.
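In PostgreSQL, for example, this trade-off is a configuration choice on the primary. A minimal sketch (the standby names are illustrative, not a tuned recommendation):

```ini
# postgresql.conf on the primary (sketch; standby names are illustrative)
synchronous_standby_names = 'FIRST 1 (replica_1, replica_2)'

# 'on' waits for a listed standby to confirm each commit (synchronous);
# 'remote_apply' additionally waits until the standby has applied the
# change, making it visible to reads there; 'local' commits without
# waiting on any standby (asynchronous).
synchronous_commit = on
```

Switching between these modes changes write latency immediately, which is why many systems set `synchronous_commit` per transaction rather than globally.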

Routing Strategies

You can route queries at the application level or through a proxy.

flowchart LR
    App["Application"] --> Proxy["Proxy<br/>(ProxySQL / HAProxy)"]
    Proxy -->|writes| Primary[("Primary DB")]
    Proxy -->|reads| R1[("Replica 1")]
    Proxy -->|reads| R2[("Replica 2")]
    Proxy -->|reads| R3[("Replica 3")]
    Primary -->|replicate| R1
    Primary -->|replicate| R2
    Primary -->|replicate| R3

Application-level routing embeds routing logic in your code. Proxy-level routing centralizes rules. Both approaches handle the basic write-to-primary, read-to-replica pattern.

Application-Level Routing

Your code decides where each query goes. You can route specific tables to the primary, pick replicas based on sharding keys, or choose based on current replica lag.

class DatabaseRouter:
    """Routes each query to the primary or the least-lagged replica."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def get_read_node(self):
        # Prefer the replica that is currently least behind the primary.
        return min(self.replicas, key=lambda r: r.lag)

    def route(self, query):
        # Writes and consistency-sensitive reads must hit the primary;
        # everything else can tolerate replica lag.
        if query.is_write or query.requires_consistency:
            return self.primary
        return self.get_read_node()

The problem: every routing decision lives in application code. Developers have to know which queries tolerate stale data and which cannot. This becomes a convention that’s hard to enforce and easy to break.

Proxy-Level Routing

ProxySQL, HAProxy, and pgpool-II sit between your application and the database cluster. You define routing rules once and the proxy enforces them.

INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 0, 1);

INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 1, 1);

Proxies work when you cannot change application code or want centralized routing logic. The downside is added infrastructure and another component that can fail.

Consistency Lag: The Core Issue

Replica lag is the gap between a write committing on the primary and that write appearing on a replica.

Think about what happens when a user creates a record and immediately reads it:

  1. Application writes to primary: INSERT INTO orders VALUES (...)
  2. Primary commits
  3. Application reads from replica: SELECT * FROM orders WHERE ...
  4. Replica has not yet received or applied the change
  5. User sees nothing

This is the read-after-write consistency problem. Solutions exist but each costs something:

  • Read from primary after writes: reliable but defeats read scaling
  • Synchronous replication: eliminates lag but slows all writes
  • Session-based sticky routing: pin each user’s session to one replica for monotonic reads; the user’s own writes can still be invisible until that replica catches up
  • Track LSN: remember the write position and route reads to replicas that caught up

Most applications can tolerate stale reads for dashboards and analytics. Few can tolerate it for content the user just created.
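The LSN-tracking approach can be sketched in plain Python. This is a simulation with illustrative names; in a real PostgreSQL setup, the commit position would come from pg_current_wal_lsn() on the primary and each replica’s progress from pg_last_wal_replay_lsn():

```python
# Sketch of LSN-based read routing, simulated without a real database.
# A session remembers the highest LSN it has written; reads go to any
# replica that has already replayed past that position, else the primary.

class Node:
    def __init__(self, name, replayed_lsn=0):
        self.name = name
        self.replayed_lsn = replayed_lsn

class LsnRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.last_write_lsn = 0  # highest LSN this session has written

    def record_write(self, commit_lsn):
        # Called after each write with the commit position on the primary.
        self.last_write_lsn = max(self.last_write_lsn, commit_lsn)

    def node_for_read(self):
        # Any replica that has replayed past our last write is safe to read.
        caught_up = [r for r in self.replicas
                     if r.replayed_lsn >= self.last_write_lsn]
        return caught_up[0] if caught_up else self.primary

router = LsnRouter(Node("primary"),
                   [Node("replica_1", replayed_lsn=100),
                    Node("replica_2", replayed_lsn=90)])

router.record_write(95)             # our write committed at LSN 95
print(router.node_for_read().name)  # replica_1 has replayed 100 >= 95
router.record_write(200)            # a later write; no replica has caught up
print(router.node_for_read().name)  # falls back to the primary
```

The cost of this scheme is bookkeeping: the session (or a cookie, or a header) has to carry the last-written position between requests.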

Statement-Based vs Row-Based Replication

MySQL offers two replication formats:

Statement-Based Replication (SBR)

Replicates the SQL statements themselves. The replica runs the same INSERT, UPDATE, or DELETE.

Advantages: compact logs, less storage, deterministic for pure data operations.

Disadvantages: nondeterministic functions like NOW() or UUID() produce different results. Triggers and stored procedures can behave differently. Certain query optimizations cause drift between primary and replica.

Row-Based Replication (RBR)

Replicates the actual row changes—what changed and how.

Advantages: deterministic, handles nondeterministic functions correctly, clear visibility into changes.

Disadvantages: much larger binary logs for bulk operations. You cannot see the SQL that caused the change, which makes debugging harder.

Most production systems use row-based replication. MySQL 8.0 defaults to it. PostgreSQL’s logical replication uses a similar approach.

Statement-Based vs Row-Based Replication Trade-offs

| Dimension | Statement-Based (SBR) | Row-Based (RBR) |
| --- | --- | --- |
| Log size | Compact — stores SQL text | Large — stores row diffs |
| Nondeterministic functions | Risky (NOW, UUID, RAND) | Safe — actual values replicated |
| Triggers and stored procedures | Can diverge on replica | Correct by default |
| Bulk operation overhead | Low | High for large changes |
| Debugging | SQL visible in logs | Must reconstruct from row images |
| Default in MySQL | Before 8.0 | MySQL 8.0+ |
| Use when | Simple, deterministic SQL | Complex queries, triggers, UDFs |

When to Use / When Not to Use Read/Write Splitting

Use read/write splitting when:

  • Read-heavy workload (80%+ reads is a common target)
  • Application tolerates eventual consistency for most reads
  • Replica lag stays bounded under production load
  • Vertical scaling is no longer sufficient

Product catalogs, content management systems, analytics dashboards, and search results work well.

Do not use read/write splitting when:

  • Read-after-write consistency is required — users see their own stale data
  • Write volume outpaces what replicas can apply — lag becomes unbounded
  • Foreign key constraints span both write and read paths — referential integrity checks need primary
  • You expect it to be transparent — routing logic bleeds into application code

I worked on a system where read replicas were supposed to offload the primary. But every feature dealing with the user’s own data—shopping carts, order history, account settings—had to hit the primary anyway. After accounting for those queries, only 30% of reads went to replicas instead of the expected 80%. The replicas sat idle while the primary stayed busy.

Production Failure Scenarios

| Failure | Impact | Mitigation |
| --- | --- | --- |
| Read-after-write returns stale data | Users see their own stale data after writing | Read from primary after writes, use sticky sessions, or track LSN |
| Replica lag grows unbounded under write load | Read replicas return very old data | Monitor lag, add replicas, switch to synchronous replication |
| Proxy routes write to replica | Write appears to succeed but is not replicated | Configure proxy rules carefully, test routing before deploying |
| Read replica promoted without verifying lag | Data loss when replica becomes primary | Check lag before promotion, use semi-synchronous replication |
| Bulk query overwhelms replica | Replica falls behind, affects all reads from it | Set replica to read-only, use separate pool for heavy analytics |

Capacity Estimation: Replica Lag Under Write Load

Replica lag grows when the replica cannot apply changes as fast as the primary generates them. Backlog accumulates at the difference between the two rates: backlog_growth_per_second = wal_generation_rate - replica_apply_rate, and the replica's lag in seconds is roughly accumulated_backlog / wal_generation_rate. Under normal conditions, the replica's apply rate keeps up with write throughput. Under heavy write bursts, backlog accumulates faster than the replica can drain it.

For PostgreSQL streaming replication, byte lag is the difference between the WAL position the primary has written and the position the replica has replayed (the write_lsn and replay_lsn columns of pg_stat_replication). Under a sustained write rate of 10,000 transactions per second, each generating 10KB of WAL, you produce 100MB of WAL per second. If the replica’s replay rate is 80MB per second (because of slower disk or higher CPU load), you accumulate 20MB of backlog per second — roughly 12 seconds of lag after one minute.
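The arithmetic from those numbers, worked out explicitly:

```python
# Worked version of the figures above: WAL generated at 100 MB/s,
# replayed at 80 MB/s, sustained for one minute.
wal_generation_mb_per_s = 100.0   # primary produces 100 MB of WAL per second
replica_apply_mb_per_s = 80.0     # replica can replay 80 MB per second
burst_seconds = 60

# Backlog grows at the difference between the two rates.
backlog_mb = (wal_generation_mb_per_s - replica_apply_mb_per_s) * burst_seconds

# Seconds behind: the oldest unapplied WAL was written backlog / generation
# rate seconds ago on the primary's timeline.
seconds_behind = backlog_mb / wal_generation_mb_per_s

print(backlog_mb)       # 1200.0 MB of unapplied WAL
print(seconds_behind)   # 12.0 seconds of lag
```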

The practical implication: size your replicas to handle your peak write throughput. If your primary can sustain 5,000 writes per second but your replica is running on half the CPU, the replica will fall behind during peak load even if it keeps up at average load. Monitor replica CPU and disk throughput alongside replication lag — lag often precedes a CPU or I/O saturation problem on the replica.

For MySQL with parallel replication (slave_parallel_threads on MariaDB, or replica_parallel_workers on MySQL 8.0 — named slave_parallel_workers before 8.0.26), the picture improves because the replica can apply independent transactions concurrently. Going from 1 to 4 worker threads can raise apply throughput substantially, though rarely by a full 4x, since transactions that touch the same rows must still apply in order. Enable parallel replication when replica lag keeps growing despite adequate CPU and I/O headroom.
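A minimal sketch of enabling parallel apply on a MySQL 8.0 replica (8.0.26+ variable names; older versions use the slave_parallel_* equivalents, and the worker count here is illustrative):

```sql
-- Run on the replica; the SQL thread must be stopped to change these.
STOP REPLICA SQL_THREAD;
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';  -- parallelize by commit group
SET GLOBAL replica_parallel_workers = 4;             -- number of apply threads
SET GLOBAL replica_preserve_commit_order = ON;       -- keep external commit order
START REPLICA SQL_THREAD;
```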

Observability Hooks: Lag Metrics and Replica Heartbeat Monitoring

For PostgreSQL, the primary monitoring query is pg_stat_replication:

SELECT
    client_addr,
    state,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) / 1024 / 1024 AS lag_mb,
    pg_wal_lsn_diff(write_lsn, replay_lsn) AS write_lag_bytes,
    pg_wal_lsn_diff(flush_lsn, replay_lsn) AS flush_lag_bytes
FROM pg_stat_replication
WHERE application_name IN ('replica_1', 'replica_2');

The state column tells you whether the replica is keeping up (streaming) or replaying a backlog (catchup). Alert when lag_mb exceeds your RPO threshold — if your RPO is 30 seconds and writes generate 100MB per second, alert at 3GB of lag.
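The threshold arithmetic is worth making explicit: alert when unapplied WAL exceeds what your RPO allows at your observed write rate.

```python
# Alert-threshold arithmetic for the example above (illustrative numbers).
rpo_seconds = 30            # recovery point objective
wal_rate_mb_per_s = 100     # observed WAL generation rate

# RPO worth of WAL, converted from MB to GB.
alert_threshold_gb = rpo_seconds * wal_rate_mb_per_s / 1000
print(alert_threshold_gb)   # 3.0 GB of lag
```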

For MySQL, SHOW REPLICA STATUS\G exposes Seconds_Behind_Source (Seconds_Behind_Master in versions before 8.0.22) — but this metric is misleading. It measures the SQL thread’s delay applying the relay log, not the true lag behind the primary. If the replica’s IO thread is disconnected, it shows zero even though the replica is hours behind. Better signals: the gap between Read_Master_Log_Pos and Exec_Master_Log_Pos (Read_Source_Log_Pos and Exec_Source_Log_Pos in 8.0.22+) shows how much fetched data the replica has yet to apply, and Relay_Log_Space shows how much unapplied data is accumulating on the replica.

For both databases, set up heartbeat tables on the primary that the replica reads. The heartbeat records the current timestamp on the primary. The replica reads it and calculates actual lag as now() - heartbeat_timestamp. This works even during replication disconnects and is more reliable than relying on WAL positions alone.
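The heartbeat idea reduces to very little code. A simulation with plain timestamps (in production the primary UPDATEs a one-row heartbeat table on a timer, replication copies it, and a monitor reads it on the replica; all names here are illustrative):

```python
# Heartbeat-based lag measurement, simulated with dicts standing in for
# the primary's table and its replicated copy on the replica.
import time

def write_heartbeat(table):
    # Runs on the primary, typically once per second.
    table["heartbeat_ts"] = time.time()

def measure_lag(table, now=None):
    # Runs against the replica: how old is the newest heartbeat visible there?
    now = time.time() if now is None else now
    return now - table["heartbeat_ts"]

primary_table = {}
write_heartbeat(primary_table)
replica_table = dict(primary_table)   # pretend replication copied the row

# If replication stalls, the replica's heartbeat stops advancing and the
# measured lag keeps growing -- even when WAL positions look healthy.
lag = measure_lag(replica_table, now=replica_table["heartbeat_ts"] + 4.2)
print(round(lag, 1))   # 4.2 seconds behind
```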

The critical alert: replica lag exceeding your RPO. If your RPO is 5 minutes, alert at 4 minutes — giving a 1-minute window to investigate and potentially promote the replica before your RPO is breached.

Interview Questions

Q: Your application writes to the primary and immediately reads from a replica. Under asynchronous replication, the replica is consistently 2-3 seconds behind. What consistency guarantee does the read see?

The read sees eventual consistency with a lag of 2-3 seconds. The write that just happened on the primary is not visible on the replica for 2-3 seconds. If the application needs read-after-write consistency, it must either read from the primary after writes or use synchronous replication. For most web applications, 2-3 seconds of lag is acceptable. For financial or inventory systems, it is not.

Q: You have 3 read replicas but notice that under heavy write load, one replica consistently falls behind while the other two stay current. What is happening and how do you fix it?

The lagging replica is a bottleneck — it cannot replay WAL entries as fast as the primary generates them. This is usually caused by slower disk I/O (the replica is applying changes from disk rather than cache), less CPU than the other replicas, or a long-running query on that replica blocking the replication apply thread. Check the replica’s CPU, disk throughput, and pg_stat_activity for blocking queries. The fix is either to upgrade the replica’s hardware, isolate it from other workloads, or use a load balancer that routes reads away from the lagging replica until it catches up.

Q: What is the difference between synchronous and asynchronous replication in the context of read/write splitting?

Asynchronous replication: the primary commits writes and responds to the client immediately, then replicates to replicas in the background. Replicas may lag by milliseconds to seconds. Read/write splitting with asynchronous replication means some reads return stale data. Synchronous replication: the primary waits for at least one replica to confirm the write before committing. This guarantees the replica has the write immediately, eliminating stale reads on replicas that have confirmed. The tradeoff is write latency — every write now waits for a network round-trip to the replica.

Q: You are designing a read/write splitting strategy for an e-commerce application. Which queries should go to replicas and which should go to the primary?

All writes go to the primary. For reads: product catalog browsing, search results, and cached user profiles can go to replicas because small staleness is acceptable — a product price that is 10 seconds old is fine for browsing. For reads immediately after writes: shopping cart contents, order confirmation pages, and inventory counts should go to the primary because users just wrote that data and expect to see it immediately. Session data and user preference reads depend on the business requirement — if a user changes their display name and immediately sees the old name, that may be confusing but is not a correctness problem for most applications.

Conclusion

Read/write splitting is a real technique with real limits. The key questions: how much read traffic can tolerate stale data, and how much does write volume exceed what a single node handles? Answer those honestly before committing.

Start with your database’s replication documentation and measure replica lag under actual production load.

