Cache Stampede Prevention: Protecting Your Cache from Thundering Herds

Learn how single-flight, request coalescing, and probabilistic early expiration prevent cache stampedes that can overwhelm your database.
A cache stampede happens when a popular cache entry expires and dozens (or hundreds) of requests all miss simultaneously, each one hitting your database to rebuild the same value. Your cache was supposed to protect the database — but instead, the expiration creates a thundering herd that overwhelms whatever you were trying to shield.
This happens more often than you’d think. A single cache key for “top 100 products” or “user session data” can expire at a bad moment and send a flood of requests to your database. The fix isn’t just “use a longer TTL.” You need actual stampede prevention patterns.
This guide covers the main approaches: single-flight, request coalescing, probabilistic early expiration, and lock-based cache refresh.
Why Cache Stampedes Happen
The classic scenario: you cache a result with a 60-second TTL. At second 60, 200 requests arrive within milliseconds of each other. Every single one checks the cache, finds nothing, and rushes to the database.
```python
import json

# This code stampedes. `redis` and `db` are assumed to be configured
# client instances (e.g. redis.Redis() and a database wrapper).
def get_product_catalog():
    cached = redis.get("product_catalog")
    if cached:
        return json.loads(cached)
    # 200 requests arrive here at the same time
    result = db.query("SELECT * FROM products")
    redis.setex("product_catalog", 60, json.dumps(result))
    return result
```
The cache didn’t help at all. Every request still hit the database.
Single-Flight Pattern
The single-flight pattern ensures only one request fetches the data while all others wait for that same result. Concurrent requests to the same key share a single in-flight request.
Instead of 200 database calls, you get 1. The other 199 requests wait on the same task.
```python
import asyncio
from typing import Any, Awaitable, Callable

class SingleFlight:
    def __init__(self):
        self.in_flight: dict[str, asyncio.Task] = {}
        self.lock = asyncio.Lock()

    async def get(self, key: str, fetch_fn: Callable[[], Awaitable[Any]]) -> Any:
        # Check if a request for this key is already in flight
        async with self.lock:
            if key in self.in_flight:
                # Another request is already fetching this key
                task = self.in_flight[key]
            else:
                # No in-flight request: create one and register cleanup
                task = asyncio.create_task(self._fetch(key, fetch_fn))
                self.in_flight[key] = task
                task.add_done_callback(lambda _: self._cleanup(key))
        return await task

    async def _fetch(self, key: str, fetch_fn) -> Any:
        return await fetch_fn()

    def _cleanup(self, key: str):
        self.in_flight.pop(key, None)
```
Go Single-Flight Implementation
Go’s golang.org/x/sync/singleflight is the canonical implementation:
```go
var sf singleflight.Group

func getProductCatalog() ([]Product, error) {
	v, err, _ := sf.Do("product_catalog", func() (interface{}, error) {
		// Only one request reaches the database
		return fetchFromDB()
	})
	if err != nil {
		return nil, err
	}
	return v.([]Product), nil
}
```
Request Coalescing
Request coalescing takes single-flight further by batching multiple requests that arrive within a window. Instead of one request per key, you wait a short time (e.g., 5ms) to collect all requests for the same key, then make a single database call.
The window_ms tuning matters. Too short and you don’t coalesce enough. Too long and latency suffers.
```python
import asyncio
from typing import Any, Awaitable, Callable

class RequestCoalescer:
    def __init__(self, window_ms: float = 5.0):
        self.window_ms = window_ms
        self.pending: dict[str, asyncio.Future] = {}
        self.lock = asyncio.Lock()

    async def get(self, key: str, fetch_fn: Callable[[], Awaitable[Any]]) -> Any:
        async with self.lock:
            future = self.pending.get(key)
            if future is None:
                # First request for this key becomes the leader
                future = asyncio.get_running_loop().create_future()
                self.pending[key] = future
                leader = True
            else:
                leader = False
        if not leader:
            # Followers simply wait for the leader's result
            return await future
        try:
            # Hold the coalescing window open so more requests can pile on
            await asyncio.sleep(self.window_ms / 1000)
            # Fetch once, then wake everyone who arrived during the window
            future.set_result(await fetch_fn())
        except Exception as exc:
            future.set_exception(exc)
        finally:
            async with self.lock:
                self.pending.pop(key, None)
        return await future
```
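As with single-flight, a quick standalone demo makes the behavior concrete. It inlines a condensed coalescer (key name and fetch function are illustrative): a burst of fifty concurrent calls inside the window collapses into a single fetch.

```python
import asyncio

class Coalescer:
    # Condensed coalescer, inlined so the demo runs standalone: the first
    # caller becomes the leader, waits out the window, fetches once, and
    # shares the result with everyone who arrived during the window.
    def __init__(self, window_ms: float = 5.0):
        self.window_ms = window_ms
        self.pending: dict[str, asyncio.Future] = {}

    async def get(self, key, fetch_fn):
        future = self.pending.get(key)
        if future is not None:
            return await future  # follower: wait for the leader's result
        future = asyncio.get_running_loop().create_future()
        self.pending[key] = future
        try:
            await asyncio.sleep(self.window_ms / 1000)  # coalescing window
            future.set_result(await fetch_fn())
        except Exception as exc:
            future.set_exception(exc)
        finally:
            self.pending.pop(key, None)
        return await future

async def main():
    co = Coalescer(window_ms=5.0)
    fetches = 0

    async def fetch():
        nonlocal fetches
        fetches += 1
        return 42

    # Fifty concurrent requests arrive within the window
    results = await asyncio.gather(*(co.get("top_products", fetch) for _ in range(50)))
    return fetches, results

fetches, results = asyncio.run(main())
print(fetches, len(results))  # 1 50: one database call answered fifty requests
```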
Probabilistic Early Expiration
Instead of waiting for the TTL to expire, you probabilistically refresh the cache before it actually expires. Keys that are frequently accessed get refreshed more often, spreading the load over time rather than having all requests hit at once.
The canonical formulation is XFetch: on each cache hit, refresh early when `now - delta * beta * ln(random()) >= expiry`, where `delta` is how long the recomputation takes and `beta` (typically 1.0) controls how aggressively you refresh. Since `ln(random())` is negative, each read has a small chance of triggering a refresh, and that chance grows as the expiry approaches, so hot keys refresh before they ever expire. For more on eviction policies that interact with this pattern, see cache eviction policies.
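A minimal sketch of this probabilistic check, assuming `delta` (the observed recompute time) is stored alongside the cached value; names here are illustrative:

```python
import math
import random
import time

def should_refresh(expiry: float, delta: float, beta: float = 1.0) -> bool:
    """XFetch check: True means recompute now, before the key expires.

    expiry: absolute expiration timestamp of the cached value
    delta:  how long the last recomputation took, in seconds
    beta:   > 1.0 refreshes more eagerly, < 1.0 less eagerly
    """
    rnd = random.random() or 1e-12  # guard against ln(0)
    # ln(rnd) is negative, so this pushes "now" forward by a random amount
    return time.time() - delta * beta * math.log(rnd) >= expiry

# Far from expiry the gamble almost never fires; past expiry it always does.
```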
A simpler frequency-based variant tracks how often a key is read and gives hot keys a higher chance of an early background refresh:

```python
import asyncio
import json
import random

def should_refresh_early(access_frequency: float, random_factor: float = 0.3) -> bool:
    """Higher access frequency = higher chance of early refresh."""
    probability = min(1.0, access_frequency * random_factor)
    return random.random() < probability

async def get_with_probabilistic_refresh(key: str, fetch_fn, ttl: int = 60):
    # `redis` is a configured client instance, as in the earlier examples
    value = redis.get(key)
    if value:
        # Track how hot this key is over the TTL window
        access_count = redis.hincrby("access_counts", key, 1)
        access_frequency = access_count / ttl
        if should_refresh_early(access_frequency):
            # Refresh asynchronously in the background; refresh_async is a
            # helper that re-fetches and setex-es the key
            asyncio.create_task(refresh_async(key, fetch_fn, ttl))
        return json.loads(value)
    # Cache miss - must wait for the fetch
    result = fetch_fn()
    redis.setex(key, ttl, json.dumps(result))
    return result
```
This works well for read-heavy workloads where some keys are much more popular than others.
Lock-Based Cache Refresh
Instead of letting all requests miss when a key expires, you give one request the “lock” to rebuild the cache while others either return stale data or wait.
```python
import json
import threading
import time

# `redis` is a configured client instance (e.g. redis.Redis())

STALE_TTL = 30  # Keep serving stale data for 30 extra seconds

def get_with_lock_refresh(key: str, fetch_fn, ttl: int = 60):
    cached = redis.get(key)
    if cached:
        value, expires_at = json.loads(cached)
        is_stale = time.time() > expires_at
        if not is_stale:
            return value
        # Stale - try to acquire the refresh lock
        lock_key = f"lock:{key}"
        lock_acquired = redis.set(lock_key, "1", nx=True, ex=10)
        if lock_acquired:
            # We got the lock - refresh in the background
            def background_refresh():
                try:
                    result = fetch_fn()
                    new_expires = time.time() + ttl
                    redis.setex(key, ttl + STALE_TTL, json.dumps([result, new_expires]))
                finally:
                    redis.delete(lock_key)
            threading.Thread(target=background_refresh, daemon=True).start()
        # Return stale data while the refresh happens in the background
        return value
    # No cached value at all - must fetch synchronously
    result = fetch_fn()
    expires_at = time.time() + ttl
    redis.setex(key, ttl + STALE_TTL, json.dumps([result, expires_at]))
    return result
```
This is essentially the stale-while-revalidate pattern that CDNs like Cloudflare and Fastly use: serve stale content while refreshing in the background.
Choosing a Strategy
| Strategy | When it works | When it doesn’t |
|---|---|---|
| Single-flight | Low to medium concurrency, simple cases | High concurrency with many different keys |
| Request coalescing | Bursts of identical requests | Very low latency requirements |
| Probabilistic early expiration | Popular keys with high variance | Uniform access patterns |
| Lock-based refresh | Willing to serve stale data | Fresh data required always |
Combining Approaches
Most production systems layer multiple strategies:
- Use single-flight at the application layer to deduplicate concurrent requests
- Add probabilistic early expiration on popular keys to spread load over time
- Keep a short stale TTL as a safety net for the long tail
```python
import asyncio

# The helpers below (single_flight, should_probabilistic_refresh, is_stale,
# background_refresh, acquire_refresh_lock, get_cached_or_fetch) are the
# pieces built earlier in this guide, wired together.

async def get_cached_with_layers(key: str, fetch_fn, ttl: int = 60):
    # Layer 1: single-flight deduplication of concurrent requests
    return await single_flight.get(key, lambda: refresh_with_layers(key, fetch_fn, ttl))

async def refresh_with_layers(key: str, fetch_fn, ttl: int):
    # Layer 2: probabilistic early refresh spreads load on hot keys
    if should_probabilistic_refresh(key, ttl):
        asyncio.create_task(background_refresh(key, fetch_fn, ttl))
    # Layer 3: lock-based refresh once the entry has gone stale
    if is_stale(key):
        await acquire_refresh_lock(key, fetch_fn, ttl)
    return get_cached_or_fetch(key, fetch_fn)
```
Common Mistakes
Setting TTLs too uniformly: If every key expires at the same time (e.g., all set at application startup), you create synchronized stampedes. Add jitter:
```python
import random

# Spread expirations so keys set together don't all expire together
ttl_with_jitter = base_ttl + random.randint(0, int(base_ttl * 0.1))
```
Ignoring the lock timeout: If your lock acquisition doesn’t have a timeout, a crashed refresh process leaves the lock hanging forever. Always set EX on the lock key.
Serving stale data without bounds: The stale-while-revalidate pattern requires a maximum staleness threshold. Without one, you could serve data that’s hours old.
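Enforcing a bound is cheap: store the original expiry alongside the value and treat anything past a hard ceiling as a miss. A minimal sketch, with an illustrative ceiling (in real use, pass `time.time()` as `now`):

```python
MAX_STALE = 300  # hard ceiling: never serve data more than 5 minutes past expiry

def classify(expires_at: float, now: float) -> str:
    """'fresh': serve; 'stale': serve but refresh; 'miss': too old to serve."""
    if now <= expires_at:
        return "fresh"
    if now - expires_at <= MAX_STALE:
        return "stale"
    return "miss"

print(classify(100.0, now=50.0))    # fresh
print(classify(100.0, now=250.0))   # stale: 150s past expiry, under the 300s cap
print(classify(100.0, now=1000.0))  # miss: beyond the bound, treat as a cache miss
```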
Monitoring for Stampedes
Watch these signals:
```shell
# Redis: watch for bursts of expirations
# (requires keyspace notifications: notify-keyspace-events "Ex")
redis-cli config set notify-keyspace-events Ex
redis-cli psubscribe '__keyevent@*__:expired'
```

```sql
-- Database: detect thundering herds (spikes of queries in a single second)
SELECT COUNT(*), DATE_TRUNC('second', created_at) AS t
FROM queries
WHERE created_at > NOW() - INTERVAL '10 seconds'
GROUP BY t
HAVING COUNT(*) > 100;
Set alerts on cache hit rate drops — a sudden drop often signals a stampede.
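A hit-rate alert can be as simple as comparing a rolling window of hits and misses against a threshold. A sketch (window size and threshold are illustrative):

```python
from collections import deque

class HitRateMonitor:
    def __init__(self, window: int = 1000, threshold: float = 0.5):
        self.events = deque(maxlen=window)  # True = hit, False = miss
        self.threshold = threshold

    def record(self, hit: bool) -> None:
        self.events.append(hit)

    def hit_rate(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 1.0

    def stampede_suspected(self) -> bool:
        # A sudden run of misses drags the rolling rate under the threshold
        return len(self.events) == self.events.maxlen and self.hit_rate() < self.threshold

monitor = HitRateMonitor(window=10, threshold=0.5)
for _ in range(10):
    monitor.record(True)   # healthy traffic: all hits
healthy = monitor.stampede_suspected()
for _ in range(8):
    monitor.record(False)  # stampede: a burst of misses
suspected = monitor.stampede_suspected()
print(healthy, suspected)  # False True
```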
Next Steps
Cache stampede prevention is one piece of a larger caching strategy. For more on caching patterns, read about caching strategies and distributed caching.
For handling the database side of concurrent writes, the locking and concurrency guide covers pessimistic and optimistic approaches.
For understanding how cache stampedes relate to overall database scaling, it’s worth knowing how caches fit into the scale-out picture.