Caching Strategies: Cache-Aside, Write-Through, and More
Master five caching strategies for production systems. Learn cache-aside vs write-through, avoid cache stampede, and scale with these patterns.
Caching works. When done right, it takes latency from hundreds of milliseconds down to microseconds and keeps your database from melting under load. Get it wrong, though, and you’ll spend weeks chasing stale data bugs that only show up at 3am during peak traffic.
This guide covers five caching strategies, when each one makes sense, and the real trade-offs between them.
Why Caching Matters
Most applications have data that changes rarely but gets hit constantly. User profiles, product listings, config values, session data. Without caching, every request for this data pounds the database, even when nothing’s changed since last Tuesday.
The numbers tell the story:
| Approach | Typical Latency | Requests per Second (per node) |
|---|---|---|
| Database query | 5-50ms | 1,000-10,000 |
| Cache hit | 0.1-1ms | 100,000-1,000,000 |
| Cache miss (with cache) | 5-51ms | Same as database |
Here’s the thing though: a cache that serves stale data is worse than no cache. And a cache that needs constant babysitting to stay valid is just overhead you don’t need.
The Five Caching Strategies
Cache-Aside (Lazy Loading)
This is what most people mean when they say “caching.” Your application checks the cache first, loads from the database on a miss, then populates the cache for next time.
```mermaid
sequenceDiagram
    participant Client
    participant Cache
    participant Database

    Client->>Cache: GET user:123
    Cache-->>Client: Cache miss
    Client->>Database: SELECT * FROM users WHERE id = 123
    Database-->>Client: User data
    Client->>Cache: SET user:123 (ttl=3600)
    Cache-->>Client: OK

    Client->>Cache: GET user:123
    Cache-->>Client: User data (cached)
```
Implementation:
```python
import json

# Assumes a module-level `redis` client and `db` database handle

def get_user(user_id):
    # Try cache first
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)

    # Cache miss - load from database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # Populate cache with TTL
    redis.setex(cache_key, 3600, json.dumps(user))
    return user
```
Write operation:
```python
def update_user(user_id, data):
    # Update database first
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    # Invalidate cache
    redis.delete(f"user:{user_id}")
```
Pros:
- Simple to implement
- Cache only contains data that’s actually requested
- No cache stampede on startup (cold cache is expected)
- Easy to reason about
Cons:
- First request after cache miss is always slow
- Cache and database can temporarily diverge (eventual consistency)
- Three network round-trips on cache miss (check, read, write)
Use this when: reads dominate your workload and you can live with brief inconsistency.
Read-Through
Same idea as cache-aside, but the cache library handles the miss logic for you. You just ask the cache for data; it fetches from the database automatically if needed.
```mermaid
sequenceDiagram
    participant Client
    participant Cache
    participant Database

    Client->>Cache: GET user:123
    Cache->>Cache: Check in-memory store
    alt Cache miss
        Cache->>Database: SELECT * FROM users WHERE id = 123
        Database-->>Cache: User data
        Cache->>Cache: Store in memory
    end
    Cache-->>Client: User data
```
Implementation with Redis: plain Redis has no built-in read-through, so in practice the pattern is a thin wrapper that owns the miss logic. Callers only talk to the cache layer; the loader runs automatically:

```python
def make_read_through(redis, loader, ttl=3600):
    """Return a getter whose loader runs automatically on a cache miss."""
    def get(key):
        cached = redis.get(key)
        if cached is not None:
            return json.loads(cached)
        value = loader(key)  # the cache layer fetches from the database
        redis.setex(key, ttl, json.dumps(value))
        return value
    return get

get_user = make_read_through(
    redis, lambda key: db.query("SELECT * FROM users WHERE id = ?", key.split(":")[1]))
```
Most caching libraries (like Spring Cache, Django’s cache framework, Go’s groupcache) implement read-through natively:
```go
// Using groupcache: the registered Getter runs automatically on a miss
var users = groupcache.NewGroup("users", 64<<20, groupcache.GetterFunc(
	func(ctx context.Context, key string, dest groupcache.Sink) error {
		data, err := loadUserFromDB(key) // hypothetical database lookup
		if err != nil {
			return err
		}
		return dest.SetBytes(data)
	}))

func getUser(ctx context.Context, userID int64) ([]byte, error) {
	var data []byte
	key := fmt.Sprintf("user:%d", userID)
	err := users.Get(ctx, key, groupcache.AllocatingByteSliceSink(&data))
	return data, err
}
```
Pros:
- Cleaner application code
- Concurrent misses for the same key can be coalesced into a single database fetch (groupcache does this)
- Cache handles the fetch-and-store atomically
Cons:
- Less control over cache logic
- All caches must implement the same pattern
- Can mask cache behavior from developers
Use this when: you want caching to be infrastructure, not application logic.
Write-Through
Every write goes to cache and database together. The operation doesn’t return until both succeed.
```mermaid
sequenceDiagram
    participant Client
    participant Cache
    participant Database

    Client->>Cache: SET user:123
    Cache->>Database: UPDATE users SET ...
    Database-->>Cache: OK
    Cache-->>Client: OK
```
Implementation:
```python
def update_user(user_id, data):
    cache_key = f"user:{user_id}"
    # Write to the database first...
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    # ...then write through to the cache. Note the two writes are not
    # atomic: a crash between them leaves the cache stale until the TTL
    redis.setex(cache_key, 3600, json.dumps(data))
    return data
```
Pros:
- Strong consistency between cache and database
- Cache is always warm with latest data
- No cache invalidation logic needed
Cons:
- Write latency increases (two writes instead of one)
- Cache can be knocked out by write-heavy workloads
- Cache might be populated with data that’s never read
Use this when: consistency matters more than write speed and your writes are infrequent relative to reads.
Write-Behind (Write-Back)
You write to the cache and it batches the database writes to happen later, in the background.
```mermaid
sequenceDiagram
    participant Client
    participant Cache
    participant Database
    participant WriteBuffer

    Client->>Cache: SET user:123
    Cache->>WriteBuffer: Queue write
    Cache-->>Client: OK (fast)
    Note over WriteBuffer: Background worker
    WriteBuffer->>Database: Batch UPDATE
    Database-->>WriteBuffer: OK
```
Implementation:
```python
import asyncio
import json
from collections import deque

class WriteBehindCache:
    def __init__(self, redis, db, batch_size=100, flush_interval=1.0):
        self.redis = redis
        self.db = db
        self.write_queue = deque()
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        # Must be constructed inside a running event loop
        asyncio.create_task(self._flush_loop())

    async def set(self, key, value):
        # The cache write happens immediately...
        self.redis.setex(key, 3600, json.dumps(value))
        # ...the database write is queued for later
        self.write_queue.append((key, value))
        if len(self.write_queue) >= self.batch_size:
            await self._flush()

    async def _flush(self):
        if not self.write_queue:
            return
        batch = []
        while self.write_queue and len(batch) < self.batch_size:
            batch.append(self.write_queue.popleft())
        # Write the batch to the database (executemany would cut round-trips)
        for key, value in batch:
            self.db.execute(
                "UPDATE users SET ... WHERE id = ?",
                value['id'],
                value,
            )

    async def _flush_loop(self):
        while True:
            await asyncio.sleep(self.flush_interval)
            await self._flush()
```
Pros:
- Very low write latency
- Batching reduces database load
- Cache handles burst writes gracefully
Cons:
- Risk of data loss if cache fails before flush
- Complexity in handling partial failures
- Cache and database can significantly diverge
- Harder to debug (writes happen asynchronously)
Use this when: you’re collecting metrics or events and losing a few writes won’t ruin your day.
Refresh-Ahead (Proactive Caching)
The cache automatically refreshes entries before they expire. Popular data stays perpetually warm, so users never hit a cache miss.
```mermaid
sequenceDiagram
    participant Cache
    participant Database
    participant Refresher

    Note over Cache: Entry TTL = 300s
    Refresher->>Cache: Check TTL
    Refresher->>Database: SELECT (background)
    Refresher->>Cache: SET (reset TTL)
    loop Every 60 seconds
        Refresher->>Cache: Check popular entries
        Refresher->>Database: Refresh if TTL < 60s
    end
```
Implementation:
```python
import json
import time
from threading import Thread

class RefreshAheadCache:
    def __init__(self, redis, db, ttl=300, refresh_threshold=0.8):
        self.redis = redis
        self.db = db
        self.ttl = ttl
        self.refresh_threshold = refresh_threshold
        self.popular_keys = set()
        # Background refresher thread (daemon so it dies with the process)
        self.running = True
        self.thread = Thread(target=self._refresh_loop, daemon=True)
        self.thread.start()

    def track_access(self, key):
        """Track frequently accessed keys"""
        self.popular_keys.add(key)

    def get(self, key):
        value = self.redis.get(key)
        if value:
            self.track_access(key)
            return json.loads(value)
        return None

    def _should_refresh(self, key):
        """Refresh once 80% of the lifetime has elapsed (threshold=0.8),
        i.e. when less than 20% of the TTL remains"""
        ttl = self.redis.ttl(key)
        return 0 < ttl < self.ttl * (1 - self.refresh_threshold)

    def _refresh_loop(self):
        while self.running:
            for key in list(self.popular_keys):
                if self._should_refresh(key):
                    # Refresh in the background before the entry expires
                    data = self.db.query(
                        "SELECT * FROM users WHERE id = ?",
                        key.split(':')[1]
                    )
                    self.redis.setex(key, self.ttl, json.dumps(data))
            time.sleep(10)  # Check every 10 seconds
```
Pros:
- Eliminates cache miss latency for popular items
- Users never wait for cache to repopulate
- Smoother performance under varying loads
Cons:
- Wasted resources refreshing items not actually needed
- Complexity in tracking truly popular keys
- Risk of refreshing stale data
- Additional logic to determine refresh threshold
Use this when: you have a known set of hot data and read latency matters more than wasted cycles.
Choosing the Right Strategy
| Strategy | Read Performance | Write Performance | Consistency | Complexity |
|---|---|---|---|---|
| Cache-Aside | Good (after miss) | Good (DB write + invalidate) | Eventual | Low |
| Read-Through | Good | Same as DB | Eventual | Low |
| Write-Through | Good | Slower (two writes) | Strong | Medium |
| Write-Behind | Good | Best | Eventual | High |
| Refresh-Ahead | Best | Same as DB | Eventual (kept fresh) | High |
How to decide
Which latency matters more, reads or writes?
- Reads: cache-aside, read-through, or refresh-ahead
- Writes: write-behind or write-through
How synced do cache and database need to be?
- Tight consistency: write-through
- Eventual is fine: cache-aside or write-behind
What happens if the cache goes down before flushing?
- Can’t lose writes: write-through
- A few lost writes are okay: write-behind
Is access predictable?
- Unpredictable: cache-aside
- Known hot set: refresh-ahead
Common Pitfalls
Cache Stampede
When a popular entry expires, multiple requests pile up trying to rebuild it simultaneously.
Fix with probabilistic early expiration:
```python
import json
import random

def get_with_stampede_protection(key, fetch_func, ttl=3600, beta=1.0):
    value = redis.get(key)
    remaining = redis.ttl(key)
    if value is not None and remaining > 0:
        # Simplified XFetch: the closer the entry is to expiring, the more
        # likely this caller refreshes it early, spreading rebuilds out
        if random.random() < beta * (1 - remaining / ttl):
            fresh = fetch_func(key)
            redis.setex(key, ttl, json.dumps(fresh))
            return fresh
        return json.loads(value)
    # Genuine miss: fetch and repopulate
    fresh = fetch_func(key)
    redis.setex(key, ttl, json.dumps(fresh))
    return fresh
```
Or use a lock:
```python
import json
import time

def get_with_lock(key, fetch_func, lock_timeout=5):
    value = redis.get(key)
    if value:
        return json.loads(value)
    # Try to acquire lock
    lock = redis.lock(f"lock:{key}", timeout=lock_timeout)
    if lock.acquire(blocking=True, blocking_timeout=1):
        try:
            # Double-check after acquiring lock
            value = redis.get(key)
            if value:
                return json.loads(value)
            # fetch_func is expected to populate the cache
            return fetch_func(key)
        finally:
            lock.release()
    # Didn't get lock: wait for the winner to populate, then re-read
    time.sleep(0.1)
    value = redis.get(key)
    return json.loads(value) if value else fetch_func(key)
```
Cache Penetration
Requests for things that don’t exist skip the cache and hammer the database.
Fix by caching the null:
```python
def get_user_cached(user_id):
    cache_key = f"user:{user_id}"
    # Check cache (assumes a client created with decode_responses=True)
    value = redis.get(cache_key)
    if value == "NULL":  # Cached null marker
        return None
    if value:
        return json.loads(value)

    # Load from database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    if user:
        redis.setex(cache_key, 3600, json.dumps(user))
    else:
        # Cache the null for a shorter period
        redis.setex(cache_key, 60, "NULL")
    return user
```
Production Failure Scenarios
Understanding what fails and how to recover is critical for production caching systems.
| Failure | Impact | Mitigation |
|---|---|---|
| Cache node goes down | All requests hit database directly, potential cascade failure | Use connection pooling with automatic retry; implement circuit breaker pattern |
| Cache memory exhausted | Eviction kicks in aggressively, hit rate drops to 0% | Monitor memory usage; set appropriate maxmemory limits; alert at 70% threshold |
| Network partition between app and cache | Requests hang or timeout; database overload | Set reasonable socket timeouts (100-500ms); fail gracefully to database |
| Thundering herd on cache restart | All clients hit database simultaneously | Pre-warm cache on restart; use staggered TTLs; implement request coalescing |
| Stale data served after write | User sees outdated content | Use write-through for consistency-critical data; implement cache invalidation on writes |
| Cache credential rotation | Brief outage or authentication failures | Use connection pooling with lazy reconnection; rotate during low-traffic windows |
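The circuit-breaker mitigation from the table can be sketched as a small state machine in front of the cache client. This is a minimal illustration, not a production implementation; the thresholds and names are assumptions:

```python
import time

class CacheCircuitBreaker:
    """After max_failures consecutive cache errors, stop calling the cache
    for reset_timeout seconds and go straight to the database, so a dead
    cache node does not add a timeout to every request."""

    def __init__(self, max_failures=5, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.reset_timeout:
            # Half-open: let one probe request through
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.time()
```

Usage: wrap each `redis.get` in a try/except; call `record_failure()` on errors and fall through to the database, and skip the cache entirely while `allow()` returns False.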
Observability Checklist
Monitor these metrics and set up alerts for production cache health.
Metrics to Track
- Hit Rate: `hits / (hits + misses)` - should stay above 80-90% for well-tuned caches
- Memory Usage: `used_memory / maxmemory` - alert at 70%, critical at 80%
- Eviction Count: `evicted_keys` - indicates memory pressure
- Connection Count: `connected_clients` - sudden drops indicate connection issues
connected_clients- sudden drops indicate connection issues - Command Latency: P50, P95, P99 for GET/SET operations
- Replication Lag: For replicated setups, lag should stay below 100ms
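The hit-rate metric above can be read straight from Redis's own lifetime counters. A small sketch, assuming a redis-py style client (`info("stats")` returns `keyspace_hits` and `keyspace_misses`):

```python
def hit_rate(hits: int, misses: int) -> float:
    """Hit rate as a fraction; 0.0 when there is no traffic yet."""
    total = hits + misses
    return hits / total if total else 0.0

def redis_hit_rate(client) -> float:
    """Pull lifetime counters via INFO (redis-py: client.info('stats'))."""
    stats = client.info("stats")
    return hit_rate(stats["keyspace_hits"], stats["keyspace_misses"])
```

Note these counters are lifetime totals; for alerting you want the rate over a window, which is what the Prometheus expression below computes from the exported counters.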
Logs to Capture
```python
# Log cache operations for debugging
import json
import time

import structlog

logger = structlog.get_logger()

def get_user(user_id):
    cache_key = f"user:{user_id}"
    start = time.time()
    cached = redis.get(cache_key)
    if cached:
        logger.info("cache_hit", key=cache_key, latency_ms=(time.time() - start) * 1000)
        return json.loads(cached)

    # Cache miss - this should be rare in production
    logger.warning("cache_miss", key=cache_key)
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(cache_key, 3600, json.dumps(user))
    return user
```
Alert Rules
```yaml
# Prometheus alert rules for Redis
- alert: CacheHitRateLow
  expr: redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total) < 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Cache hit rate below 80%"

- alert: CacheMemoryExhausted
  expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.8
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Cache memory above 80% capacity"
```
Security Checklist
Cache security is often overlooked until a breach happens.
- Never expose Redis/Memcached directly to the internet - Bind to localhost or private network only
- Use authentication - Redis `requirepass` or Memcached SASL authentication
- Enable TLS - For connections crossing network boundaries
- Validate key namespaces - Use prefixes like `app:env:table:` to prevent key collisions
- Sanitize cache keys - User input should never become cache keys without validation
- Implement rate limiting - Prevent cache exhaustion attacks
- Audit cache access - Log who accessed what, especially for sensitive data
- Never cache sensitive data - PII, passwords, tokens, payment info should never enter the cache
```
# Redis secure configuration
bind 127.0.0.1 -::1
requirepass your-strong-password-here
tls-replication yes
tls-auth-clients no
```
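The rate-limiting item on the checklist is often implemented in the cache itself. A fixed-window sketch using only standard Redis commands (INCR and EXPIRE, shown with a redis-py style client; the limit and window values are illustrative):

```python
def allow_request(r, client_id: str, limit: int = 100, window: int = 60) -> bool:
    """Fixed-window rate limiter: at most `limit` requests per `window` seconds.

    The window boundary is coarse (a burst can straddle two windows);
    sliding-window variants exist, but this is enough to blunt
    cache-exhaustion abuse.
    """
    key = f"ratelimit:{client_id}"
    count = r.incr(key)
    if count == 1:
        # First request in this window: start the expiry clock
        r.expire(key, window)
    return count <= limit
```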
Anti-Patterns
These mistakes catch teams at scale. Avoid them.
1. Caching Without Measuring
Adding Redis doesn’t automatically make things faster. Measure before and after.
```python
# BAD: Blind caching - no visibility into whether it helps
def get_user(user_id):
    return redis.get(f"user:{user_id}") or db.query(...)

# GOOD: Measure first, cache second
def get_user(user_id):
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached:
        metrics.increment("cache.hit")
        return json.loads(cached)
    metrics.increment("cache.miss")
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(cache_key, 3600, json.dumps(user))
    return user
```
2. Using Cache as Primary Store
The cache is not the source of truth. The database is.
```python
# BAD: Cache as primary - data loss on cache failure
def update_user(user_id, data):
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))  # Database not updated!
    return data

# GOOD: Write to database first, invalidate cache
def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    redis.delete(f"user:{user_id}")  # Invalidate, don't update
    return data
```
3. No TTLs on Data
Unlimited TTLs mean stale data and unbounded memory growth.
```python
# BAD: No expiration
redis.set(f"user:{user_id}", json.dumps(user))

# GOOD: Always set reasonable TTLs
redis.setex(f"user:{user_id}", 3600, json.dumps(user))  # 1 hour default
redis.setex(f"session:{session_id}", 86400, json.dumps(session))  # 24 hours for sessions
```
4. Ignoring Cache Stampede
The database can handle one request rebuilding a cache entry. It cannot handle 1000 simultaneous requests.
```python
# BAD: No stampede protection
def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # This runs for ALL concurrent requests on cache miss
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user

# GOOD: Lock-based stampede protection
def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    lock = redis.lock(f"lock:user:{user_id}", timeout=5)
    if lock.acquire(blocking=True, blocking_timeout=1):
        try:
            # Double-check after acquiring lock
            cached = redis.get(f"user:{user_id}")
            if cached:
                return json.loads(cached)
            user = db.query("SELECT * FROM users WHERE id = ?", user_id)
            redis.setex(f"user:{user_id}", 3600, json.dumps(user))
            return user
        finally:
            lock.release()
    # Didn't get lock - wait for the winner to populate, then re-read
    time.sleep(0.1)
    cached = redis.get(f"user:{user_id}")
    return json.loads(cached) if cached else None
```
5. Cache Key Collisions
Generic key names cause collisions in shared cache infrastructure.
```python
# BAD: Generic keys
redis.set("user", user_data)  # Collides with other "user" keys
redis.set("config", config)  # Collides system-wide

# GOOD: Namespaced keys
redis.setex(f"myapp:prod:user:{user_id}", 3600, json.dumps(user_data))
redis.setex(f"myapp:prod:config:{env}", 86400, json.dumps(config))
```
Quick Recap
Key Takeaways
- Cache-aside is the default strategy for most read-heavy workloads
- Write-through ensures strong consistency but increases write latency
- Write-behind batches writes for performance but risks data loss
- Refresh-ahead eliminates misses for popular items but adds complexity
- Always implement stampede protection when cache misses could cascade
- Monitor hit rate, memory usage, and eviction counts continuously
Copy/Paste Checklist
```markdown
# Cache-Aside Implementation Checklist
- [ ] Check cache first (redis.get)
- [ ] On miss, query database
- [ ] Populate cache with TTL (redis.setex)
- [ ] On write, invalidate cache (redis.delete), don't update
- [ ] Implement stampede protection with locks
- [ ] Cache null values with short TTL to prevent penetration
- [ ] Monitor hit rate - should be >80%
- [ ] Set appropriate TTLs based on data freshness requirements
- [ ] Log cache hits and misses for observability
- [ ] Use circuit breaker for cache failures

# TTL Selection Guide
- [ ] User profiles: 15-60 minutes
- [ ] Session data: 24 hours
- [ ] API responses: 5-30 minutes
- [ ] Static config: 1-24 hours
- [ ] Product catalog: 1-24 hours
- [ ] Real-time data: no caching, or a very short TTL (30-60 seconds)
```
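Whatever base TTLs you pick from the guide above, it helps to add jitter so keys written together do not all expire in the same instant (the staggered-TTL mitigation from the failure table). A small helper, with an assumed default of 10% spread:

```python
import random

def jittered_ttl(base_ttl: int, jitter: float = 0.1) -> int:
    """Return base_ttl plus or minus up to `jitter` fraction, so a batch of
    keys cached at the same moment expires spread out over time."""
    spread = int(base_ttl * jitter)
    return base_ttl + random.randint(-spread, spread)

# Usage sketch: redis.setex(key, jittered_ttl(3600), payload)
```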
See Also
- Cache Eviction Policies — LRU, LFU, FIFO, and when each makes sense
- Redis & Memcached — The tools behind these strategies
- Cache Patterns — Stampede protection, tiered caching, and more
- Distributed Caching — Scaling out across multiple nodes
Capacity Estimation: Cache Size vs Hit Rate
The relationship between cache size and hit rate is not linear. Adding more cache memory gives diminishing returns beyond a certain point.
The working set model: your hit rate depends on how much of your frequently-accessed data fits in cache. If 80% of your requests hit 20% of your data, and that 20% fits in cache, you can achieve a 95%+ hit rate with a relatively small cache. If access is uniformly distributed, even a large cache yields only modest hit rates.
The formula for estimating required cache size: `working_set_bytes = unique_keys_per_second * avg_value_size * avg_ttl_seconds`. If you have 10,000 requests per second, the average value is 1KB, and you want a 5-minute TTL window, your working set is 10,000 * 1,000 * 300 = 3GB minimum for a fully-utilized cache before evictions. In practice, you need 1.5-2x that because LRU/LFU policies do not perfectly track the working set.
The hit rate curve: start at 0% hit rate with no cache, rapid climb as cache grows to cover the hot working set, then diminishing returns as cache size exceeds working set. Plot your hit rate against cache size to find the knee of the curve — the point where adding more cache stops helping significantly. This is your target cache size.
For cache-aside specifically, the miss penalty matters more than raw hit rate. A cache miss does a full database round-trip. If your database latency is 10ms and cache latency is 0.5ms, each miss costs 9.5ms extra. At 99% hit rate, only 1% of requests pay the miss penalty. At 95% hit rate, 5% pay it — a 5x increase in slow queries.
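The two formulas in this section are easy to turn into a back-of-envelope calculator. The default latencies below are the example figures from the text, not measurements:

```python
def working_set_bytes(unique_keys_per_second: float,
                      avg_value_bytes: float,
                      avg_ttl_seconds: float) -> float:
    """Working-set estimate; an upper bound, since it assumes every
    request writes a distinct key."""
    return unique_keys_per_second * avg_value_bytes * avg_ttl_seconds

def effective_latency_ms(hit_rate: float,
                         cache_ms: float = 0.5,
                         db_ms: float = 10.0) -> float:
    """Average request latency for a given hit rate."""
    return hit_rate * cache_ms + (1 - hit_rate) * db_ms

# The article's example: 10,000 rps, 1 KB values, 5-minute TTL -> 3 GB
# working_set_bytes(10_000, 1_000, 300) == 3_000_000_000
```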
Real-World Case Study: YouTube’s Cache Hierarchy
YouTube’s caching infrastructure is one of the most studied in the industry. Their approach uses multiple cache layers: L1 (in-memory, per-machine), L2 (distributed cache), and CDN at the edge. Understanding their architecture explains why most companies settle on a two-tier approach.
YouTube’s L1 cache is a small in-memory cache on each application server. It handles the most frequently accessed items — popular videos, trending content. L1 hit rate alone is often 50-60% because many users on the same machine access the same popular content.
The L2 distributed cache (originally Memcached, later moved to custom infrastructure) handles cache misses from L1. L2 is sharded across many machines to provide petabyte-scale capacity. Cache misses from L2 go to storage (BigTable).
The CDN handles the edge, serving popular content from points of presence close to users. YouTube’s CDN cache hit rate is over 90% for video streaming — once a video becomes popular, it propagates to CDN PoPs and subsequent requests rarely hit origin.
The lesson: YouTube does not rely on a single cache tier. They use L1 to handle the ultra-hot set with extremely low latency, L2 for the warm cache, and CDN for the long tail of popular-but-not-ultra-popular content. Most companies should design for two tiers (local cache + distributed cache) before adding a CDN.
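The two-tier pattern recommended above can be sketched as a small per-process dict (L1) in front of a shared distributed cache (L2). Everything here is illustrative, not YouTube's actual design: `l2` is assumed to be a redis-py style client, and the crude clear-all L1 eviction stands in for a real LRU:

```python
import json
import time

class TwoTierCache:
    def __init__(self, l2, l1_ttl=5, l2_ttl=3600, l1_max=1024):
        self.l1 = {}          # key -> (expires_at, value)
        self.l2 = l2
        self.l1_ttl = l1_ttl  # keep L1 short so it stays reasonably fresh
        self.l2_ttl = l2_ttl
        self.l1_max = l1_max

    def get(self, key, loader):
        entry = self.l1.get(key)
        if entry and entry[0] > time.time():
            return entry[1]                      # L1 hit
        raw = self.l2.get(key)
        if raw is not None:
            value = json.loads(raw)              # L2 hit
        else:
            value = loader(key)                  # miss: go to origin
            self.l2.setex(key, self.l2_ttl, json.dumps(value))
        if len(self.l1) >= self.l1_max:
            self.l1.clear()                      # crude eviction for the sketch
        self.l1[key] = (time.time() + self.l1_ttl, value)
        return value
```

The short L1 TTL is the key design choice: it bounds how stale a per-machine copy can get without any cross-machine invalidation protocol.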
Real-World Case Study: Twitter’s Cache Warming Strategy
Twitter has a unique caching challenge: events (tweets, likes, follows) have a short window of high read traffic, then traffic drops off a cliff. A tweet from a celebrity gets millions of reads in the first hour, then readership drops to hundreds per day.
Twitter’s solution is aggressive cache warming: when a tweet is published, Twitter pushes it into the timelines of active followers’ caches rather than waiting for cache misses. This is the fanout-on-write pattern — write to caches at publish time rather than computing at read time.
The tradeoff is write amplification. Every tweet from a celebrity with 10 million followers requires 10 million cache writes. Twitter manages this by limiting fanout to active users only and using hybrid push/pull for lower-activity accounts. Inactive users’ timelines are computed on read from the tweet author’s tweet store.
The operational lesson: cache warming trades write amplification for read latency. For content with rapid decay in read traffic (news, social posts, live events), warming the cache at write time reduces read latency at the cost of higher write overhead. For evergreen content, cache-aside with long TTLs is simpler and more efficient.
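The fanout-on-write idea can be sketched in a few lines. This is a toy under stated assumptions, not Twitter's code: `timelines` stands in for a cache of per-user timeline lists, `active_followers` for a follower index, and the fanout limit is arbitrary:

```python
def publish_with_fanout(timelines, active_followers, author_id, post,
                        fanout_limit=10_000):
    """Push a new post into each active follower's cached timeline at
    publish time; above the limit, fall back to pull-on-read and return 0."""
    followers = active_followers.get(author_id, [])
    if len(followers) > fanout_limit:
        # Too expensive to push everywhere (write amplification):
        # these timelines will be computed on read instead
        return 0
    for follower_id in followers:
        timelines[follower_id].insert(0, post)  # newest first
    return len(followers)
```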
Conclusion
There is no best strategy. Cache-aside is the default because it covers the most cases with the least complexity. But you’ll encounter situations where write-through or refresh-ahead fits better.
Start simple. Measure your hit rate. Add complexity only when data tells you to.