Distributed Caching: Scaling Cache Across Multiple Nodes
A comprehensive guide to distributed caching — consistent hashing, cache sharding, replica consistency, cache clustering, and handling the unique challenges of multi-node cache environments.
Introduction
Single-node caching works until it doesn’t.
| Limitation | What Happens |
|---|---|
| Memory | Your 16GB cache node fills up. You need 64GB. |
| Requests | Your cache serves 100k requests/second. You need 500k. |
| Availability | Your cache node crashes. Every request hits the database. |
Distributed caching solves all three. But it introduces coordination problems.
Core Concepts
Client-Side Sharding
The client decides which cache node to use based on the key. No proxy needed.
graph LR
A[Client] --> B{Which node?}
B -->|key1| C[Node 1]
B -->|key2| D[Node 2]
B -->|key3| E[Node 3]
import hashlib
from bisect import bisect
class ShardedCache:
def __init__(self, nodes):
self.nodes = nodes
self.ring = {}
self.sorted_keys = []
self._build_ring()
def _build_ring(self):
for node in self.nodes:
for i in range(150):
key = f"{node}:{i}"
hash_key = int(hashlib.md5(key.encode()).hexdigest(), 16)
self.ring[hash_key] = node
self.sorted_keys.append(hash_key)
self.sorted_keys.sort()
def _get_node(self, key):
if not self.ring:
return None
hash_key = int(hashlib.md5(key.encode()).hexdigest(), 16)
idx = bisect(self.sorted_keys, hash_key)
if idx >= len(self.sorted_keys):
idx = 0
return self.ring[self.sorted_keys[idx]]
def get(self, key):
node = self._get_node(key)
return node.cache.get(key)
def set(self, key, value):
node = self._get_node(key)
node.cache.set(key, value)
Pros: No proxy layer, low latency, simple for small clusters. Cons: Client manages sharding, adding nodes requires moving data.
Proxy-Based Routing
A proxy (like Twemproxy or Redis Cluster client support) routes requests. Clients don’t know about the cluster topology.
graph LR
A[Client] --> B[Proxy]
B --> C[Node 1]
B --> D[Node 2]
B --> E[Node 3]
Clients send all requests to the proxy. The proxy handles sharding logic. This is how managed services like Redis Enterprise and ElastiCache work.
Pros: Client simplicity, topology hidden from clients. Cons: Extra hop (proxy), proxy is a SPOF unless HA, more latency.
Redis Cluster
Redis Cluster is Redis’s native distributed mode. It handles sharding, replication, and failover automatically.
# Minimum Redis Cluster setup (6 nodes = 3 primaries + 3 replicas)
redis-cli --cluster create \
192.168.1.1:7000 \
192.168.1.2:7000 \
192.168.1.3:7000 \
192.168.1.4:7000 \
192.168.1.5:7000 \
192.168.1.6:7000 \
--cluster-replicas 1
import redis
# Redis Cluster client handles topology automatically
r = redis.RedisCluster(
host='192.168.1.1',
port=7000,
skip_full_coverage_check=True
)
# Works the same as single Redis
r.set('key', 'value')
r.get('key')
Redis Cluster splits keys across 16,384 slots. Each primary node owns a subset of slots. If a primary fails, its replica promotes automatically.
Topic-Specific Deep Dives
Cache Coherence
When you have multiple cache nodes, coherence becomes a problem. If you update data in Node 1, how do you invalidate that same data in Node 2?
The Problem
graph TD
A[Write to DB] --> B[Update Node 1]
B -.->|Async| C[Node 2 has stale data]
D[Read from Node 2] --> E[Stale data returned]
Solutions
1. Write-Through All Nodes
On write, update all cache nodes.
def write_to_all_nodes(key, value, nodes):
for node in nodes:
node.set(key, value)
# Problem: More write latency, potential inconsistency if one fails
This works but is slow and fragile. One slow node drags everything down.
2. Invalidation Broadcasting
On write, broadcast invalidation to all nodes.
def invalidate_across_cluster(key, nodes, pubsub):
# Publish invalidation event
pubsub.publish('cache:invalidate', key)
# Subscribe on each node
for node in nodes:
node.delete(key)
# Subscribe to invalidation
pubsub.subscribe('cache:invalidate')
Redis pub/sub is useful here. When data changes, publish an invalidation message. All cache instances subscribe and delete the key.
# On write
redis_cluster.publish('invalidation', json.dumps({'key': 'user:123'}))
# On each cache node
pubsub = redis_cluster.pubsub()
pubsub.subscribe('invalidation')
for message in pubsub.listen():
if message['type'] == 'message':
data = json.loads(message['data'])
cache.delete(data['key'])
3. Consistent Hashing with Virtual Nodes
Use consistent hashing so keys map to the same nodes reliably. Adding a node only moves some keys.
class VirtualNodeSharding:
def __init__(self, nodes, replicas=150):
self.replicas = replicas
self.ring = {}
self.sorted_keys = []
for node in nodes:
for i in range(replicas):
# Virtual node for better distribution
key = f"{node}:vnodes:{i}"
hash_key = self._hash(key)
self.ring[hash_key] = node
self.sorted_keys.append(hash_key)
self.sorted_keys.sort()
def _hash(self, key):
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def get_node(self, key):
hash_key = self._hash(key)
idx = bisect(self.sorted_keys, hash_key)
if idx >= len(self.sorted_keys):
idx = 0
return self.ring[self.sorted_keys[idx]]
Session Store Patterns
Sessions are a common distributed caching use case. User sessions need to be available across all application instances.
Redis Session Store
import redis
import json
import secrets
class RedisSessionStore:
def __init__(self, redis_cluster, ttl=86400):
self.redis = redis_cluster
self.ttl = ttl
def create_session(self, user_id, data=None):
session_id = secrets.token_urlsafe(32)
session_key = f"session:{session_id}"
session_data = {
'user_id': user_id,
'created_at': time.time(),
'data': data or {}
}
self.redis.setex(
session_key,
self.ttl,
json.dumps(session_data)
)
return session_id
def get_session(self, session_id):
session_key = f"session:{session_id}"
data = self.redis.get(session_key)
if not data:
return None
return json.loads(data)
def update_session(self, session_id, data):
session_key = f"session:{session_id}"
session = self.get_session(session_id)
if not session:
return False
session['data'].update(data)
session['updated_at'] = time.time()
self.redis.setex(
session_key,
self.ttl,
json.dumps(session)
)
return True
def destroy_session(self, session_id):
session_key = f"session:{session_id}"
self.redis.delete(session_key)
Session affinity vs distributed sessions
If your application runs on multiple instances, sessions need to be shared. Two options:
-
Sticky sessions: Route each user to the same instance. Simpler, but instance failure loses sessions.
-
Distributed sessions: Sessions stored in Redis, available to all instances. More resilient, slightly slower.
For anything critical, use distributed sessions. Sticky sessions will ruin your day when an instance goes down.
High Availability Patterns
Replication
Add replica nodes that copy data from the primary. Reads can go to replicas, spreading load.
# Redis replication config (on replica)
replicaof 192.168.1.1 7000
replica-read-only yes
# Read from replica, write to primary
def get_from_replica(key):
return replica_redis.get(key)
def set_to_primary(key, value):
return primary_redis.setex(key, ttl, value)
def get_cached(key):
# Try primary first (has freshest data)
val = primary_redis.get(key)
if val:
return val
# Fall back to replica
return replica_redis.get(key)
Automatic Failover
Redis Sentinel handles failover automatically. When a primary fails, Sentinel promotes a replica and updates clients.
# Sentinel configuration
sentinel monitor mymaster 192.168.1.1 7000 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
from redis.sentinel import Sentinel
sentinel = Sentinel([
('192.168.1.1', 26379),
('192.168.1.2', 26379),
('192.168.1.3', 26379)
], socket_timeout=0.1)
# Get master and replica connections
master = sentinel.master_for('mymaster')
replica = sentinel.slave_for('mymaster')
# Automatic failover handling
master.set('key', 'value') # Writes always go to master
replica.get('key') # Reads can go to replica
Trade-off Analysis
This section consolidates the key architectural decisions and their implications across distributed caching implementations.
Consistency vs Availability
The classic CAP theorem trade-off manifests directly in cache cluster design:
| Approach | Consistency | Availability | Typical Use Case |
|---|---|---|---|
| Redis Cluster (async) | Eventual | High | Web applications, content caching |
| Redis Sentinel + writes | Strong (sync write) | Moderate (failover) | Financial data, inventory |
| Write-through all nodes | Strong | Low (any node down) | Small clusters, critical consistency |
| Pub/Sub invalidation | Eventual | High | Multi-region, social media |
Latency vs Consistency Trade-offs
| Strategy | Read Latency | Write Latency | Staleness Window |
|---|---|---|---|
| Primary for all ops | ~0.5-1ms | ~0.5-1ms | None (always fresh) |
| Read from replica | ~0.3-0.5ms | ~0.5-1ms | Replication lag (typically <100ms) |
| Probabilistic early exp | ~0.5-1ms | Same as write | TTL × probabilistic window |
| Stale-while-revalidate | ~0.3ms (serve stale) | Background refresh | Configurable, can be seconds |
Memory vs Performance
| Configuration | Memory Efficiency | Performance | Failure Mode |
|---|---|---|---|
| No replication | 100% usable | Highest throughput | Single node failure = data loss |
| 1 replica per master | 50% usable | ~70% of no-replica | 1 node fail = promoted replica serves |
| 2 replicas per master | 33% usable | ~50% of no-replica | More resilience, higher cost |
| Hot key replication | Lower (replicated keys) | Even load distribution | Complexity in invalidation |
Client-Side Sharding vs Proxy-Based Routing
| Factor | Client-Side Sharding | Proxy-Based Routing |
|---|---|---|
| Latency | Lowest (direct) | +1 hop (proxy) |
| Operations complexity | Higher (each client) | Lower (centralized) |
| Scalability | Limited by client | Proxy must scale |
| Failure domain | Client failures affect cache access | Proxy is SPOF unless HA |
| Topology management | Client must track | Proxy handles all |
TTL Selection Trade-offs
| TTL Setting | Cache Efficiency | Data Freshness | Server Load |
|---|---|---|---|
| Very short (<60s) | Low hit rate | Near real-time | High DB load |
| Short (1-5 min) | Moderate hit rate | Fresh enough for most | Moderate DB load |
| Long (1-24 hours) | High hit rate | Stale data risk | Low DB load |
| Adaptive TTL | Variable | Better freshness | Complexity in implementation |
When to Use / When Not to Use
| Architecture | When to Use | When Not to Use |
|---|---|---|
| Client-Side Sharding | Small to medium clusters (<10 nodes); low latency critical; team comfortable with client logic | Large clusters; multi-language environments; teams want simplicity |
| Proxy-Based Routing | Multi-client support; topology hidden; managed service environments | Extra hop latency unacceptable; proxy is single point of failure |
| Redis Cluster | Large datasets (>10GB); need automatic failover; want native sharding | Small datasets; need strict ordering; transaction-heavy workloads |
| Write-Through All Nodes | Small clusters; strong consistency required; write volume manageable | Large clusters; high write volume; eventual consistency acceptable |
| Pub/Sub Invalidation | Multi-region; eventual consistency acceptable; need real-time invalidation | Strict consistency required; high invalidation frequency |
| Consistent Hashing | Frequent node additions/removals; want minimal key remapping | Fixed cluster size; need strict key distribution |
| Sticky Sessions | Simple deployments; session loss acceptable on failure; cost-sensitive | High availability required; sessions must survive instance failures |
| Distributed Sessions | HA critical; multi-instance deployments; sessions must persist | Simple single-instance apps; session loss tolerable |
Decision Guide
graph TD
A[Scale Need] --> B{Memory > Single Node?}
B -->|Yes| C{Need HA?}
B -->|No| D[Single Node Cache]
C -->|Yes| E[Redis Cluster or Replicated Setup]
C -->|No| F{Client Simplicity?}
F -->|Yes| G[Proxy-Based Routing]
F -->|No| H[Client-Side Sharding]
E --> I{Need Automatic Failover?}
I -->|Yes| J[Sentinel or Redis Cluster]
I -->|No| K[Manual Replica Promotion]
Capacity Estimation
Planning a distributed cache cluster requires understanding how data and load distribute across nodes.
Cluster memory sizing formula:
total_cache_memory = sum(memory_per_node × number_of_nodes)
effective_capacity = total_cache_memory × (1 - replication_overhead) × (1 - fragmentation_factor)
usable_capacity = effective_capacity × 0.7 # Keep 30% headroom for spikes
For a Redis Cluster with 3 masters × 16GB each and 1 replica per master (total 6 nodes):
- Total raw memory: 48GB
- Replication overhead: 33% (1 replica per master): 48GB × 0.67 = 32GB effective
- Fragmentation factor: typically 1.1-1.5× depending on allocator: 32GB / 1.2 = 26.7GB
- Usable at 70%: ~18.7GB for cached data
Slot distribution in Redis Cluster: 16,384 slots divided across N masters. With 3 masters: each owns ~5,461 slots. When adding nodes, Redis Cluster migrates slots in units — one slot at a time. The cluster slots command shows current ownership. Plan slot assignments in advance using redis-cli --cluster import or manual cluster setslot commands during maintenance windows.
Hot key detection: Even with consistent hashing, hot keys cause uneven load. The formula for identifying hot keys:
hot_key_probability = (requests_per_key / total_requests) × number_of_nodes
If 1% of keys receive 50% of traffic on a 10-node cluster, those keys are 5× hotter than average. Mitigation: replicate hot keys to all nodes and route randomly, use key spreading (split user:123 into user:123:shard0 … user:123:shard9), or move hot key handling to a dedicated single-node Redis with replica read scaling.
Memcached cluster write capacity: With N nodes in a consistent hash ring, each write goes to one node. Write throughput = single_node_throughput × N (assuming even distribution). If one Memcached node handles 200k ops/sec, a 10-node cluster handles ~2M ops/sec total, but any single key’s write rate is still bound by one node.
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Single cache node crashes | Keys on that node missed; potential database overload | Replica promotion; circuit breaker; replica reads during failover |
| Network partition | Cluster split-brain or unavailable | Redis Cluster handles automatically; Sentinel monitors |
| Proxy failure (Twemproxy) | All cache requests fail | Run multiple proxies with load balancer; keep-alive health checks |
| Shard rebalancing | Temporary key remapping; some misses during migration | Use stable cluster configuration; avoid frequent resharding |
| Invalidation broadcast lost | Stale data on some nodes | Implement periodic reconciliation; use version vectors |
| Replica lag | Stale reads from replicas | Monitor lag; route consistency-critical reads to primary |
| Sentinel election failure | No automatic failover possible | Run 3+ Sentinels; ensure proper quorum configuration |
Common Pitfalls / Anti-Patterns
1. Assuming Hashing Is Evenly Distributed
Poor hash functions or small keyspaces cause hot spots.
# BAD: Modulo-based sharding creates hot spots
def get_node(key, num_nodes):
hash_val = hash(key)
return hash_val % num_nodes # Uneven for small keyspaces
# GOOD: Use consistent hashing with virtual nodes
class ConsistentHashSharding:
def __init__(self, nodes, replicas=150):
self.ring = {}
for node in nodes:
for i in range(replicas):
key = f"{node}:{i}"
self.ring[md5(key)] = node
2. Not Handling Node Failure Gracefully
Assuming all nodes always available leads to cascading failures.
# BAD: No failure handling
def get(key):
return nodes[key % len(nodes)].get(key) # Crashes if node down
# GOOD: Retry on different node, circuit breaker
def get_with_fallback(key, max_retries=3):
for attempt in range(max_retries):
try:
node = get_node_for_key(key)
return node.get(key)
except ConnectionError:
logger.warning("node_connection_failed", node=node, attempt=attempt)
# Circuit breaker would prevent repeated attempts
return get_from_database_fallback(key)
3. Pub/Sub Invalidation Without Guarantees
Pub/Sub is fire-and-forget; messages can be lost.
# BAD: Assuming pub/sub invalidation always works
def on_data_update(key, value):
db.update(key, value)
redis_cluster.publish('invalidation', key) # Fire and forget
# GOOD: Verify invalidation, periodic reconciliation
def on_data_update(key, value):
db.update(key, value)
# Publish invalidation
redis_cluster.publish('invalidation', key)
# Also delete locally to handle the common case
redis_cluster.delete(key)
# For strict consistency: check all nodes periodically
schedule_reconciliation(key)
4. Ignoring Slot Migration Complexity
Adding nodes to Redis Cluster requires migrating slots.
graph LR
A[Before Migration] --> B[During Migration<br/>Slot 123 split]
B --> C[Node1 owns 0-5460]
B --> D[Node2 owns 5461-10922<br/>+ Slot 123 migrating]
C --> E[After Migration<br/>Node1 owns 0-5460]
D --> E
# During slot migration:
# 1. Cluster must be in stable state (no other migrations)
# 2. Target node must accept IMPORTING slot
# 3. Source node must accept MIGRATING slot
# 4. Keys must be moved one by one
# 5. Slot ownership transferred after all keys moved
# Don't add nodes during high traffic - migration causes temporary unavailability
5. Single Sentinel for HA
Sentinel needs quorum to elect a new master.
# BAD: Single Sentinel
# If Sentinel crashes, no failover possible
# GOOD: 3+ Sentinels with proper quorum
sentinel monitor mymaster 192.168.1.1 7000 2 # Quorum of 2
sentinel monitor mymaster 192.168.1.2 7000 2 # On different machines
sentinel monitor mymaster 192.168.1.3 7000 2 # Quorum = 2 of 3
Cache Stampede Prevention
Here’s a scenario I’ve seen play out in production: a popular cache key expires, and suddenly 50 requests hit the database at once. That’s the thundering herd — cache expiration creates traffic spikes that can flatten your database before you know what happened.
Probabilistic Early Expiration
Instead of waiting for TTL to expire, refresh the cache probabilistically before it expires:
import random
import time
class ProbabilisticEarlyExpiration:
def __init__(self, cache_client, beta=0.3):
self.cache = cache_client
self.beta = beta # Higher = more aggressive early refresh
def get(self, key, fetch_func):
cached = self.cache.get(key)
if cached is not None:
# Check if we should early-refresh
metadata = self.cache.get(f"{key}:meta")
if metadata:
age, original_ttl = float(metadata['age']), float(metadata['ttl'])
# Probabilistic early expiration formula from XFetch
probability = self.beta * (age / original_ttl) ** 2
if random.random() < probability:
# Async refresh in background (or sync for simplicity)
try:
new_value = fetch_func(key)
self.cache.set(key, new_value, ttl=original_ttl)
except Exception:
pass # Don't fail if refresh fails
return cached
# Cache miss - fetch and store
value = fetch_func(key)
ttl = self._estimate_ttl(value)
self.cache.set(key, value, ttl=ttl)
self.cache.set(f"{key}:meta", {'age': 0, 'ttl': ttl})
return value
Lock-Based Cache Fill
Use distributed locks so only one request fetches and fills the cache:
import redis
import threading
class LockBasedCacheFill:
def __init__(self, redis_client, lock_ttl=5):
self.redis = redis_client
self.lock_ttl = lock_ttl
def get_or_fill(self, key, fetch_func, ttl=300):
cached = self.redis.get(key)
if cached is not None:
return cached
lock_key = f"lock:{key}"
# Try to acquire lock
acquired = self.redis.set(lock_key, "1", nx=True, ex=self.lock_ttl)
if acquired:
try:
value = fetch_func(key)
self.redis.setex(key, ttl, value)
return value
finally:
self.redis.delete(lock_key)
else:
# Another request is filling - wait and retry
time.sleep(0.1)
cached = self.redis.get(key)
if cached:
return cached
# Fallback after timeout - fetch directly
return fetch_func(key)
Background Refresh with Lease
Give requests a lease on the cache entry so they know when to refresh:
def get_with_lease(self, key, fetch_func, ttl=300):
cached, lease_id = self.redis.get_with_lease(key)
if cached:
remaining_ttl = self.redis.ttl(key)
# If less than 20% of original TTL remains, trigger background refresh
if remaining_ttl < ttl * 0.2:
# Fire-and-forget background refresh
thread = threading.Thread(target=self._background_fill, args=(key, fetch_func))
thread.start()
return cached
return self._fill_cache(key, fetch_func, ttl)
TTL Selection Deep Dive
TTL is one of those settings that seems trivial but will bite you in production. Set it too short and you lose most of your cache benefit. Too long and you’re serving stale data without realizing it.
TTL Selection Framework
| Data Type | Typical TTL Range | Rationale |
|---|---|---|
| User session | 15 min - 24 hours | Session must outlast user activity, but stale sessions hurt |
| Product catalog | 1 hour - 24 hours | Updates are infrequent, stale data has business cost |
| Social media feed | 30 sec - 5 min | Freshness critical, staleness very visible |
| Analytics aggregates | 5 min - 1 hour | Source data changes slowly, aggregate is cheap to refresh |
| Configuration flags | 30 sec - 5 min | Feature flags need fast propagation |
| Search index | 1 hour - 24 hours | Index rebuild is expensive, updates are batched |
| Thumbnail / media URL | 24 hours - 7 days | URLs change rarely but CDN caching matters |
TTL Refresh Strategy
def adaptive_ttl(key, value, access_pattern):
"""
Dynamically set TTL based on access patterns and data type.
"""
base_ttl = {
'session': 86400, # 24 hours
'catalog': 3600, # 1 hour
'feed': 60, # 1 minute
'config': 300, # 5 minutes
'static': 86400 * 7, # 7 days
}
ttl = base_ttl.get(access_pattern, 3600)
# Reduce TTL for frequently changing data
if value.get('updated_at'):
age = time.time() - value['updated_at']
if age > ttl * 0.8:
ttl = int(ttl * 0.5) # Halve TTL for aging data
# Increase TTL for read-heavy data that rarely changes
read_count = value.get('read_count', 0)
if read_count > 10000:
ttl = int(ttl * 1.5)
return ttl
Handling TTL vs Freshness Trade-offs
The max-age header and Redis TTL serve different purposes:
# Redis TTL controls when cache expires internally
# HTTP max-age controls what downstream clients see
# For a product catalog cached at CDN edge:
response.set_header('Cache-Control', 'public, max-age=300') # CDN caches 5 min
redis.setex('product:123', 3600, data) # Redis expires in 1 hour
# Invalidation: when product updates, publish to pub/sub
redis.publish('invalidation', 'product:123')
# Application deletes the key: redis.delete('product:123')
# Next request fetches fresh data, repopulates cache with new TTL
Cache Warming Strategies
Cold cache is one of those things that feels fine in testing and absolutely destroys you at 3am on a Monday. When a cache restarts or scales, you need a strategy to repopulate it without hammering your database.
Proactive Warming
On cache startup or scale-out, pre-populate with estimated hot keys:
class CacheWarmer:
def __init__(self, cache_client, db_client):
self.cache = cache_client
self.db = db_client
def warm_from_access_log(self, access_log_path, top_n=1000):
"""
Warm cache from recent access patterns.
access_log_path: path to access log with key names
"""
from collections import Counter
key_counts = Counter()
with open(access_log_path) as f:
for line in f:
key = self._extract_key_from_log(line)
if key:
key_counts[key] += 1
# Get top N most accessed keys
hot_keys = [k for k, _ in key_counts.most_common(top_n)]
# Batch fetch from database and populate cache
self._batch_fill(hot_keys, batch_size=100)
def _batch_fill(self, keys, batch_size=100):
for i in range(0, len(keys), batch_size):
batch = keys[i:i + batch_size]
# Fetch all from DB in one query
rows = self.db.fetch_many("SELECT * FROM items WHERE id IN (%s)", batch)
for row in rows:
self.cache.setex(f"item:{row['id']}", 3600, json.dumps(row))
def warm_from_key_pattern(self, patterns):
"""
Warm from known key patterns (e.g., all products in category).
"""
for pattern in patterns:
# e.g., "product:*" - scan matching keys
cursor = 0
while True:
cursor, keys = self.cache.scan(cursor, match=pattern, count=100)
for key in keys:
# Re-populate with fresh data
value = self.db.fetch_one(key.replace('cache:', ''))
if value:
self.cache.setex(key, 3600, json.dumps(value))
if cursor == 0:
break
Warm Cache on Node Addition
When adding a new cache node (cluster scale-out), pre-warm it:
graph TD
A[New Node Added] --> B{Key ownership?}
B --> C[Scan hot keys from other nodes]
C --> D[Batch fetch from database]
D --> E[Populate new node]
E --> F[Monitor hit rate on new node]
F --> G{Hit rate > threshold?}
G -->|Yes| H[Node ready]
G -->|No| I[Increase warm priority]
I --> C
def warm_new_node(new_node, cluster, hot_keys):
"""
Pre-populate a new cluster node with hot keys.
"""
for key_batch in chunked(hot_keys, 50):
# Get current value from existing nodes
values = cluster.mget(key_batch)
# Set on new node with same TTL
for key, value in zip(key_batch, values):
if value:
ttl = cluster.ttl(key)
new_node.setex(key, ttl, value)
Lazy vs Eager Warming
| Strategy | Pros | Cons | Best For |
|---|---|---|---|
| Lazy | No upfront cost, only heat what gets used | Initial requests hit DB | Large caches, infrequent cold starts |
| Eager | Instant hot cache, no DB spikes | Wastes resources on unused keys | Predictable traffic, scheduled maintenance |
| Hybrid | Prioritizes hot keys, fills on-demand | Complexity | Production systems with known hot set |
Multi-Tier Caching
Most production systems don’t get far with a single cache layer. At scale, you end up stacking L1 (in-process memory), L2 (Redis or Memcached), and L3 (CDN) — each with different latency, capacity, and cost characteristics.
Common Cache Hierarchy
graph TB
A[Request] --> B[L1: In-Process<br/>Process memory<br/>~1MB, sub-ms]
B -->|miss| C[L2: Distributed Cache<br/>Redis/Memcached<br/>~10GB, <1ms]
C -->|miss| D[L3: CDN Edge<br/>Geographic PoPs<br/>~100GB, 5-20ms]
D -->|miss| E[Database<br/>ms latency]
L1 + L2 Implementation
import threading
import time
class TwoTierCache:
def __init__(self, l1_size=1000, l2_client=None):
# L1: Simple in-memory dict with LRU
self.l1 = {} # key -> (value, expiry)
self.l1_access = {} # key -> last access time for LRU
self.l1_lock = threading.Lock()
self.l1_size = l1_size
# L2: Redis
self.l2 = l2_client
def get(self, key):
# Try L1 first
with self.l1_lock:
if key in self.l1:
value, expiry = self.l1[key]
if time.time() < expiry:
self.l1_access[key] = time.time()
return value
del self.l1[key]
del self.l1_access[key]
# Try L2
if self.l2:
value = self.l2.get(key)
if value:
# Populate L1
self._l1_set(key, value, ttl=300)
return value
return None
def set(self, key, value, ttl=300):
# Write to both tiers
self._l1_set(key, value, ttl)
if self.l2:
self.l2.setex(key, ttl, value)
def _l1_set(self, key, value, ttl):
with self.l1_lock:
# Evict LRU if at capacity
if len(self.l1) >= self.l1_size and key not in self.l1:
lru_key = min(self.l1_access, key=self.l1_access.get)
del self.l1[lru_key]
del self.l1_access[lru_key]
self.l1[key] = (value, time.time() + ttl)
self.l1_access[key] = time.time()
def invalidate(self, key):
with self.l1_lock:
self.l1.pop(key, None)
self.l1_access.pop(key, None)
if self.l2:
self.l2.delete(key)
CDN as L3: Cache Aside with Tier-Aware Client
class TierAwareCacheClient:
def __init__(self, cdn_client, redis_client, db_client):
self.cdn = cdn_client # L3: CDN (e.g., Cloudflare API)
self.redis = redis_client # L2: Redis
self.db = db_client # Origin
def get(self, key):
# Try CDN (L3) - lowest latency for distant users
value = self.cdn.get(key)
if value:
return value
# Try Redis (L2)
value = self.redis.get(key)
if value:
# Populate CDN for next request
self.cdn.set(key, value, ttl=300)
return value
# Cache miss - fetch from DB (L1 effectively)
value = self.db.fetch(key)
if value:
# Write to both L2 and L3
self.redis.setex(key, 3600, value)
self.cdn.set(key, value, ttl=300)
return value
Quick Recap Checklist
- Distributed caching solves memory, throughput, and availability limits of single node
- Client-side sharding gives lowest latency but requires client logic
- Proxy-based routing simplifies clients but adds hop and potential SPOF
- Redis Cluster provides automatic sharding, replication, and failover
- Cache coherence across nodes requires explicit strategy (pub/sub, write-all, etc.)
- Session stores are a common distributed cache use case — distributed sessions beat sticky sessions
- Sentinel handles failover for Redis masters; Redis Cluster handles it natively
- Consistent hashing with virtual nodes (150+ replicas) minimizes remapping on node changes
- Probabilistic early expiration (XFetch) prevents cache stampede without locks
- Lock-based cache fill guarantees single DB hit but adds coordination overhead
- Multi-tier caching (L1/L2/L3) balances latency, capacity, and cost
- Cache invalidation must cascade across all tiers — L1 local, L2 pub/sub, L3 CDN purge
- Monitor replication lag and alert if >100ms to catch stale read issues
- Slot-aware clients must handle MOVED redirects correctly during failover
- Keep 30% memory headroom to avoid eviction under pressure causing latency spikes
Copy/Paste Checklist
Observability Checklist
Metrics to Track
Cluster-Level:
cluster_nodes- Number of nodes, their states, and rolescluster_slots_fail- Number of slots not covered by reachable nodescluster_known_nodes- Total known nodes in cluster
Node-Level:
connected_slaves- Number of replicas connected to each mastermaster_repl_offset- Replication positionslave_repl_offset- Replica lag calculation
# Cluster health check
def check_cluster_health(cluster):
health = {
'healthy': True,
'nodes': {},
'issues': []
}
for node in cluster.nodes:
info = node.info()
health['nodes'][node.name] = {
'role': info.get('role'),
'connected': info.get('connected') == 'true',
'memory': info.get('used_memory'),
}
if info.get('role') == 'master' and int(info.get('connected_slaves', 0)) == 0:
health['issues'].append(f"Master {node.name} has no replicas")
health['healthy'] = len(health['issues']) == 0
return health
# Monitor replication lag
def check_replication_lag(primary, replica):
primary_offset = primary.info()['master_repl_offset']
replica_info = replica.info()
replica_offset = replica_info.get('master_repl_offset', 0)
lag_bytes = primary_offset - replica_offset
# Rough estimate: 1MB = ~1 second at typical network speeds
lag_ms = (lag_bytes / 1024 / 1024) * 1000
if lag_ms > 100:
logger.warning("replication_lag_high",
lag_ms=lag_ms,
lag_bytes=lag_bytes)
Logs to Capture
logger = structlog.get_logger()
# Log cluster topology changes
def log_cluster_event(event_type, node_id, old_state, new_state):
logger.warning("cluster_topology_change",
event_type=event_type,
node_id=node_id,
old_state=old_state,
new_state=new_state,
timestamp=time.time())
# Log failover events
def log_failover(primary_id, new_primary_id):
logger.critical("cache_failover_completed",
old_primary=primary_id,
new_primary=new_primary_id,
timestamp=time.time())
Alert Rules
- alert: CacheClusterNodeDown
expr: cache_cluster_connected_nodes < expected_nodes
for: 1m
labels:
severity: critical
annotations:
summary: "Cache cluster node(s) unavailable"
- alert: CacheReplicationLag
expr: cache_replication_lag_bytes > 10485760 # 10MB
for: 5m
labels:
severity: warning
annotations:
summary: "Cache replication lag exceeds 10MB"
- alert: CacheClusterSlotFail
expr: cache_cluster_slots_fail > 0
for: 30s
labels:
severity: critical
annotations:
summary: "Cache cluster has unreachable slots"
Security Checklist
- Node authentication - Use Redis ACLs or cluster authentication
- Inter-node encryption - Encrypt traffic between cluster nodes
- Network isolation - Cluster nodes should be on private network
- Key prefix namespacing - Prevent collisions in shared cluster
- Monitor cluster events - Watch for unauthorized topology changes
- Secure Sentinel - Sentinel instances should require authentication
- Backup cluster state - Regularly backup cluster configuration
- Validate node identity - Verify node certificates if using TLS
# Redis Cluster secure configuration
# On each node:
requirepass your-cluster-auth
cluster-preferred-endpoint-type ipv4
tls-cluster yes
# In redis.conf:
# Bind to internal network only
bind 10.0.1.1 -::1
# ACL for application user
user appuser on >apppassword ~app:* on >appdb ~* +get +set +del +exists -flushall -flushdb
# Redis Cluster minimum setup (6 nodes)
redis-cli --cluster create \
192.168.1.1:7000 \
192.168.1.2:7000 \
192.168.1.3:7000 \
192.168.1.4:7000 \
192.168.1.5:7000 \
192.168.1.6:7000 \
--cluster-replicas 1
# Sentinel configuration (3 nodes)
sentinel monitor mymaster 192.168.1.1 7000 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
# Pub/Sub invalidation subscription
redis-cli SUBSCRIBE cache:invalidate
# Message format: {"key": "user:123", "action": "delete"}
# Cluster health check
redis-cli cluster info
redis-cli cluster nodes | grep master
# Python Redis Cluster
r = redis.RedisCluster(
host='192.168.1.1',
port=7000,
skip_full_coverage_check=True,
read_from_replicas=True # Read from replicas for scale
)
# Deployment checklist:
# [ ] Minimum 3 masters + replicas for HA
# [ ] Sentinel quorum = (sentinels / 2) + 1
# [ ] Monitor replication lag on all replicas
# [ ] Test failover manually before going to production
# [ ] Implement application-level retry logic for cluster operations
# [ ] Use slot-aware client for efficient routing
# [ ] Set up alerts for cluster node failures
Real-World Case Studies
Redis Cluster Failure at Scale
A major e-commerce platform ran a 24-node Redis Cluster (12 masters, 12 replicas) for session storage and cart data. During a peak traffic event, a network switch failed causing a brief partition between 4 nodes and the rest of the cluster.
The cluster’s quorum-based failover correctly detected that the 4 isolated nodes could not form a majority and did not promote themselves — correct behavior. But the remaining 20 nodes experienced a cascade: the primary for one of the hot key slots was among the isolated nodes. The replica for that slot was in the majority partition. Failover promoted the replica to primary.
The problem: the application was using a slot-aware client that cached slot-to-node mapping. When failover changed slot ownership, the client’s cached mapping became stale. Some application instances still routed requests for the affected slot to the demoted node (now a replica). The replica rejected writes, and those application instances had no mechanism to refresh the slot map — they kept retrying the same failed node.
The recovery took 8 minutes. The fix: application code was updated to handle MOVED errors by refreshing the slot map and retrying, and the slot refresh timeout was reduced from 60 seconds to 5 seconds. Monitoring was added for MOVED error rates per application instance.
The lesson: cluster failover works, but your application’s cluster client must handle MOVED redirects correctly. A slot-aware client that does not honor MOVED creates a partial failure that bypasses cluster resilience. Test failover explicitly — trigger manual failover and observe whether your application recovers cleanly.
Monitoring Distributed Cache
You need visibility into your distributed cache. Key metrics:
# Check cluster health
def check_cluster_health(cluster):
health = {
'nodes': [],
'total_keys': 0,
'total_memory': 0,
'hit_rate': 0
}
for node in cluster.get_nodes():
info = node.info()
health['nodes'].append({
'host': node.host,
'role': node.role,
'connected': node.connected
})
health['total_keys'] += info.get('keys', 0)
health['total_memory'] += info.get('used_memory', 0)
return health
# Alert thresholds
ALERT_THRESHOLDS = {
'memory_usage_percent': 80, # Alert if >80% memory used
'hit_rate_percent': 80, # Alert if hit rate <80%
'connected_clients': 10000, # Alert if >10k clients
'replication_lag_ms': 100 # Alert if replica lag >100ms
}
Interview Questions
Expected answer points:
- Consistent hashing maps keys to nodes using a hash ring with virtual nodes (150+ per physical node)
- When a node is added or removed, only K/N keys are remapped (where K is virtual nodes per node, N is total nodes) — far less than traditional modulo hashing which remaps all keys
- Virtual nodes ensure even key distribution even when physical nodes have different capacities
- Trade-off: still requires some data movement during rebalancing, but impact is bounded and predictable
Expected answer points:
- Cache-aside: application checks cache first, falls back to database on miss, then populates cache — most common pattern, application controls everything
- Read-through: cache automatically fetches from database on miss — simpler for application but less control over serialization
- Write-through: application writes to cache, cache synchronously writes to database — simpler reads but slower writes
- Use cache-aside when: you need control over serialization, multiple data sources, or asymmetric read/write loads
- Use write-through when: reads far outnumber writes and you want simplified read path
Expected answer points:
- Probabilistic early expiration (XFetch): before TTL expires, randomly refresh based on key age and a beta parameter — reduces simultaneous expiration
- Lock-based cache fill: first request acquires a distributed lock and fills; others wait and retry — guarantees single fill but adds lock latency
- Background refresh with lease: serve stale data while refreshing asynchronously — best for latency-sensitive workloads
- Other approaches: random jitter on TTL, staggered TTLs for grouped keys, proactive warming of critical keys
Expected answer points:
- Redis Sentinel: monitoring, notification, and automatic failover for a single master — requires external proxy or client-side logic for routing
- Redis Cluster: native sharding, replication, and failover within the Redis server — no external components needed but shards are independently available
- Sentinel is simpler operationally but requires careful client configuration for failover
- Cluster provides better write scalability (multiple masters) but requires slot-aware clients and has limitations on transactions across slots
- Cluster cannot guarantee ordering across slots; Sentinel can if you use a single master-replica pair
Expected answer points:
- Pub/Sub broadcast: when data updates, publish invalidation event to all nodes — simple but unreliable (messages can be lost)
- Write-through all nodes: synchronously update all cache replicas on write — strong consistency but slow and fragile if nodes fail mid-write
- Version vectors or vector clocks: track which nodes have which version of data — enables conflict resolution but adds complexity
- Periodic reconciliation: periodically scan and compare cache state across nodes — catches missed invalidations but is not real-time
- Hybrids: publish invalidation + also delete locally (handles common case) + periodic reconciliation (catches missed cases)
Expected answer points:
- L1 (in-process): process-local memory, smallest capacity (~1MB), fastest latency — store most frequently accessed hot keys
- L2 (Redis): distributed cache, larger capacity (~10GB), sub-millisecond latency — primary cache for most data
- L3 (CDN): geographically distributed, largest capacity, higher latency (5-20ms) — cache static content, media, responses for distant users
- On read: check L1 → L2 → L3 → database; on write: write to all tiers with appropriate TTLs
- Invalidation must cascade: delete L1 locally + delete L2 in Redis + purge CDN
- Trade-offs: complexity of tier management, consistency across tiers, cost of CDN for dynamic content
Expected answer points:
- Formula: usable_capacity = (sum of node memory) × (1 - replication_overhead) × (1 - fragmentation_factor) × 0.7
- Example: 3 masters × 16GB = 48GB raw; with 1 replica per master, 33% overhead → 32GB effective; fragmentation 1.2× → 26.7GB; 30% headroom → ~18.7GB usable
- Account for replication overhead: each replica mirrors its master (2× raw memory for master+replica pair)
- Fragmentation factor accounts for memory allocator overhead (typically 1.1-1.5×) — depends on key size distribution and allocator settings
- Hot key analysis: identify keys receiving disproportionate traffic; replicate hot keys across nodes or split them
Expected answer points:
- Cluster-level: cluster_nodes (node count), cluster_slots_fail (unreachable slots), cluster_known_nodes (total known nodes)
- Node-level: connected_slaves (replica count per master), master_repl_offset vs slave_repl_offset (replication lag in bytes)
- Memory usage: used_memory_human vs maxmemory — alert if >80% utilized
- Hit rate: keyspace_hits / (keyspace_hits + keyspace_misses) — alert if <80%
- Replica lag: alert if lag_ms > 100ms — stale reads affect correctness
- Client connections: connected_clients — alert if >10k or suddenly spikes
- Slot migration in progress: if cluster is resharding, temporary unavailability and misses expected
Expected answer points:
- Redis Cluster has 16,384 slots; when adding nodes, slots are migrated one-by-one from existing nodes to new nodes
- Migration process: target node accepts IMPORTING, source node accepts MIGRATING, keys are moved one by one, then slot ownership transfers
- During migration, some keys may be temporarily unavailable — requests for migrating slot keys may get MOVED error
- Slot-aware clients handle MOVED by refreshing slot map and retrying on correct node
- Best practices: schedule during low-traffic windows, avoid concurrent migrations, test migration process in staging
- After migration, application clients must handle MOVED errors correctly — if client caches slot map without refreshing, partial failure occurs
Expected answer points:
- Sticky sessions: load balancer routes each user to the same application instance — sessions live in that instance's memory
- Distributed sessions: sessions stored in shared cache (Redis), accessible from any application instance
- Sticky sessions: simpler, faster (no network hop), but instance failure loses all sessions for that user
- Distributed sessions: more resilient, enables horizontal scaling (any instance can serve any user), slightly slower due to cache access
- Choose sticky sessions when: deploying to a single instance or small number of stable instances, session loss on failure is acceptable, cost-sensitive
- Choose distributed sessions when: multi-instance deployment, high availability required, sessions must survive instance failures, horizontal scaling needed
Expected answer points:
- Redis Cluster has 16,384 slots distributed across master nodes; migration moves slots one at a time from source to target node
- Migration process: target node enters IMPORTING state, source enters MIGRATING state, keys are scanned and moved one by one via MIGRATE command
- During migration, keys for the migrating slot may be on either node — cluster ensures atomic operations during transition
- Clients receive MOVED errors when requesting keys on wrong node; slot-aware clients refresh slot map and retry on correct node
- Operational best practices: schedule during low-traffic windows, ensure cluster is stable (no other migrations), monitor slot fail count
- Avoid adding nodes during peak traffic — resharding causes temporary unavailability for keys being migrated
Expected answer points:
- Redis Sentinel: provides monitoring, notification, and automatic failover for a single master + replicas — does NOT shard data
- Redis Cluster: provides sharding (data distributed across multiple masters), replication per shard, and automatic failover — handles both scaling and HA
- Choose Sentinel when: dataset fits on single node (~50GB), you need HA for reads/writes to one master, can tolerate external proxy or client-side redirect logic
- Choose Cluster when: dataset exceeds single node capacity, need write scaling (multiple masters), want native sharding with slot-aware clients
- Sentinel limitations: no data sharding, all writes go to single master, cluster-wide operations require external tooling
- Cluster limitations: cannot guarantee ordering across slots, multi-key operations limited to same slot, transactions across slots not supported
Expected answer points:
- Eager warming: pre-populate cache from database before serving traffic — no cold-start latency for users but delays cluster startup
- Lazy warming: serve traffic immediately, populate cache on demand — users experience cache misses initially but cluster starts faster
- Proactive warming approach: use access logs to identify hot keys (top N by request count), batch-fetch from DB, populate cache before going live
- Node addition warming: scan existing nodes for hot keys, fetch values, populate new node with same TTL — monitor hit rate to validate
- Trade-offs: eager wastes resources on keys that may never be accessed; lazy causes initial latency spike and DB overload risk
- Hybrid approach: warm top 1000-5000 hot keys eagerly (covers 80% of traffic), lazy-warm everything else
Expected answer points:
- XFetch formula: probability = beta × (age / original_ttl)^2 — as key ages toward expiration, probability of early refresh increases
- When probability triggers, one requestor synchronously refreshes cache; others continue using stale value until refresh completes
- Advantage: no coordination overhead (no locks), keeps cache warm naturally, only triggers refresh when approaching expiration
- Preference for XFetch when: latency tolerance exists (stale OK briefly), lock infrastructure unavailable, want simplicity over strict guarantees
- Lock-based fill advantage: guarantees exactly one DB request per expired key, no stale reads during regeneration
- Preference for locks when: staleness completely unacceptable, can tolerate lock acquisition latency, need strict per-key guarantees
- Hybrid: use probabilistic early refresh as first line of defense, fall back to lock-based fill if cache miss occurs
Expected answer points:
- Invalidation must cascade: delete from L1 locally + delete from L2 in Redis + purge from CDN
- For L1 (in-process): local deletion is immediate; no cross-node coordination needed
- For L2 (Redis): delete key + publish invalidation event via pub/sub; all instances subscribed to invalidation channel delete the key
- For L3 (CDN): use CDN purge API (e.g., Cloudflare API, Akamai purge) after Redis invalidation completes
- Ordering matters: invalidate L1/L2 first (fast), then CDN purge (slower) — ensures subsequent requests don't repopulate from stale CDN
- For strict consistency: implement version numbers or ETags; include version in cache key; changing data bumps version and naturally makes old keys stale
- Alternative: use short TTL on CDN responses (5-10 min) so stale content naturally expires while invalidation cascade runs
Expected answer points:
- Network latency: cross-region RTT can be 100-200ms+ — replication lag increases, client timeouts more frequent
- Cluster topology: Redis Cluster does not natively support multi-region — all nodes must be in low-latency network; cross-region requires application-level sharding
- Replication: async by default — writes in one region may take time to replicate to another; potential for data loss during partition
- Failover complexity: network partition between regions may cause split-brain or prolonged unavailability
- Alternative approaches: separate Redis clusters per region with application-level replication; use Redis Enterprise Global (multi-master); use CRDT-based solutions
- Monitoring: set up alerts for replication lag per region, track cross-region network latency, monitor cluster_slots_fail for network issues
- Cost trade-offs: multi-region provides disaster recovery but increases operational complexity and cost significantly
Expected answer points:
- Detection: use Redis MONITOR or slow log to identify keys with high access counts; analyze with redis-cli --bigkeys or MEMORY USAGE
- Hot key identification formula: hot_key_probability = (requests_per_key / total_requests) × number_of_nodes
- Mitigation strategy 1 - Key replication: split hot key into multiple keys (user:123:shard0, user:123:shard1...) and randomize replica selection
- Mitigation strategy 2 - Replica splitting: replicate hot key to all nodes; route requests to random replica to spread load
- Mitigation strategy 3 - Dedicated hot key cluster: move hot keys to separate single-node Redis with replica read scaling
- Application-level: use consistent hashing with hot key awareness — manually pin hot keys to nodes with lower utilization
- Monitoring: track hit rate per node, alert on node-level imbalance, use Redis INFO stats for keyspace_hits per database
Expected answer points:
- Authentication: use Redis ACLs (not just requirepass) — create users with minimal permissions (e.g., app user can only GET/SET/DEL on app:* keys)
- TLS: enable encrypted connections between clients and cluster nodes; also encrypt inter-node replication traffic
- Network isolation: bind cluster nodes to private network interfaces only (bind 10.0.1.1); never expose cluster on public IPs
- Key namespacing: use prefixes (app:user:, app:product:) to prevent collisions when multiple applications share cluster
- Sentinel security: Sentinel instances require authentication; they control master election so compromised Sentinel = cluster compromise
- Monitoring: watch for unauthorized topology changes (CLUSTER SETSLOT, CLUSTER meet), monitor failed authentication attempts
- Backup: regular cluster state backup (redis-cli --cluster save or rdbcopy); test restoration procedure
- Compliance: for PCI/DSS or HIPAA, enable AOF persistence, TLS, ACLs, and audit logging of all commands
Expected answer points:
- Async replication: write completes on primary after local write, replica updated later — fast writes but potential data loss if primary fails
- Sync replication: write completes only after replica acknowledges — no data loss but higher latency, any replica failure blocks writes
- Redis Cluster uses async replication by default — replicas may lag up to a few hundred milliseconds during normal operation
- Consistency implications: async = eventual consistency, reads from replicas may return stale data; writes may be lost if primary fails before replica update
- Mitigation: track replication lag (master_repl_offset vs slave_repl_offset) and alert when lag exceeds threshold (e.g., 100ms)
- For stronger guarantees: use WAIT command to wait for replica acknowledgment (synchronous behavior) — blocks until N replicas acknowledge
- Trade-off: WAIT improves durability but adds latency; unsuitable for latency-sensitive workloads; still eventual consistency if majority fails
Expected answer points:
- Data capacity: 10GB working set + 30% headroom = ~13GB raw capacity needed per replica; with 1 replica per master, total 20GB across master+replica
- Read throughput: Redis handles ~100k ops/sec per node comfortably; 3-node cluster (1 master + 1 replica each) handles 100k reads if read from replicas
- Write throughput: writes go to master; single master handles all writes — partition writes across multiple keys to scale write throughput
- Cluster configuration: 3 masters × 16GB each = 48GB raw, 33% replication overhead, 1.2x fragmentation = ~26GB effective, 70% usable = ~18GB — exceeds 10GB requirement
- Memory estimation formula: usable_capacity = (sum of node memory) × (1 - replication_overhead) × (1 - fragmentation_factor) × 0.7
- Hot keys consideration: if hot keys exist, replicate them across nodes or use key spreading; monitor per-node load distribution
- Growth buffer: plan for 6-12 months growth; if 10GB is current, plan for 30GB nodes to accommodate growth without immediate resharding
Further Reading
- Redis Cluster Tutorial — Official scaling documentation
- Memcached Consistent Hashing — How memcached distributes keys across nodes
- Cache Stampede Problem — Academic paper on probabilistic early expiration (XFetch)
- Redis Sentinel Documentation — High availability setup for Redis
- AWS ElastiCache Best Practices — Production cluster management guide
- Netflix Tech Blog: EVCache — Real-world distributed cache deployment at scale
Conclusion
Distributed caching solves scale and availability problems but introduces complexity. Start with a single node, add replication for read scaling, move to clustering when you need more memory or write throughput.
The coherence problem is real. Pick a strategy that matches your consistency requirements. Eventual consistency via pub/sub invalidation works for most use cases. Strong consistency is expensive.
Monitor everything. In distributed systems, you cannot eyeball whether the cache is healthy.
Best Practices Summary
These are the things I see teams trip over most often in production. Skim them now so you’re not debugging cache issues at midnight.
Architecture: Prefer Redis Cluster for clusters with more than 10 nodes or datasets larger than 50GB — it handles sharding and failover natively so you don’t have to. Use consistent hashing with virtual nodes (150+ replicas per node) so that adding or removing nodes only remaps a fraction of your keys. Never run a single Sentinel — deploy at least 3 on separate machines with quorum = (sentinels / 2) + 1. And test that your slot-aware client actually handles MOVED redirects correctly; I’ve seen clusters fail gracefully while applications crumbled because clients cached stale slot maps.
Coherence: Pub/Sub invalidation is eventually consistent — messages can be lost, full stop. Pair it with local deletion on write so the common case works without waiting for pub/sub delivery. Use periodic reconciliation to catch the cases pub/sub misses. Write-through all nodes is tempting if you need strong consistency but it falls apart fast as you scale — one slow node poisons everything.
Performance: Probabilistic early expiration (XFetch) keeps cache warm without locks or coordination overhead. Lock-based cache fill prevents concurrent database hits but adds latency for whoever wins the race. Read from replicas for read-heavy workloads; keep the primary for writes. And set alerts for replication lag — if replicas fall more than 100ms behind, you’re serving stale data without knowing it.
Operations: Keep 30% memory headroom at all times. Cache eviction under pressure causes unpredictable latency spikes that are hard to debug. Slot migration is disruptive — schedule during low-traffic windows, never during peak. Hot keys deserve attention: if 1% of keys receive 50% of your traffic, replicate those keys or split them across nodes.
Security: Use Redis ACLs, not just requirepass. TLS between cluster nodes keeps inter-node traffic private. Bind cluster nodes to private networks only. And namespace your keys with prefixes to avoid collisions when multiple applications share a cluster.
Category
Related Posts
Cache Patterns: Thundering Herd, Stampede Prevention, and Cache Warming
A comprehensive guide to advanced cache patterns — thundering herd, cache stampede prevention with distributed locking and probabilistic early expiration, and cache warming strategies.
Caching Strategies: A Practical Guide
Learn the main caching patterns — cache-aside, write-through, write-behind, and refresh-ahead — plus how to pick TTLs, invalidate stale data, and distribute caches across nodes.
Cache Stampede Prevention: Protecting Your Cache
Learn how single-flight, request coalescing, and probabilistic early expiration prevent cache stampedes that can overwhelm your database.