Cache Stampede Prevention: Protecting Your Cache from Thundering Herds
A cache stampede happens when a popular cache entry expires and dozens (or hundreds) of requests all miss simultaneously, each one hitting your database to rebuild the same value. Your cache was supposed to protect the database — but instead, the expiration creates a thundering herd that overwhelms whatever you were trying to shield.
This happens more often than you’d think. A single cache key for “top 100 products” or “user session data” can expire at a bad moment and send a flood of requests to your database. The fix isn’t just “use a longer TTL.” You need actual stampede prevention patterns.
This guide covers the main approaches: single-flight, request coalescing, probabilistic early expiration, and lock-based cache refresh.
Why Cache Stampedes Happen
The classic scenario: you cache a result with a 60-second TTL. At second 60, 200 requests arrive within milliseconds of each other. Every single one checks the cache, finds nothing, and rushes to the database.
# This code stampedes
def get_product_catalog():
    cached = redis.get("product_catalog")
    if cached:
        return json.loads(cached)
    # 200 requests arrive here at the same time
    result = db.query("SELECT * FROM products")
    redis.setex("product_catalog", 60, json.dumps(result))
    return result
The cache didn’t help at all. Every request still hit the database.
Single-Flight Pattern
The single-flight pattern ensures only one request fetches the data while all others wait for that same result. Concurrent requests to the same key share a single in-flight request.
Instead of 200 database calls, you get 1. The other 199 requests wait on the same task.
import asyncio
from typing import Any, Awaitable, Callable


class SingleFlight:
    def __init__(self):
        self.in_flight: dict[str, asyncio.Task] = {}
        self.lock = asyncio.Lock()

    async def get(self, key: str, fetch_fn: Callable[[], Awaitable[Any]]) -> Any:
        # Check if request is already in flight
        async with self.lock:
            if key in self.in_flight:
                # Another request is already fetching this key
                task = self.in_flight[key]
            else:
                # No in-flight request, create one
                task = asyncio.create_task(self._fetch(key, fetch_fn))
                self.in_flight[key] = task
                # Remove the entry once the fetch finishes, success or failure
                task.add_done_callback(lambda _: self._cleanup(key))
        # shield() stops one cancelled waiter from cancelling the shared fetch
        return await asyncio.shield(task)

    async def _fetch(self, key: str, fetch_fn: Callable[[], Awaitable[Any]]) -> Any:
        return await fetch_fn()

    def _cleanup(self, key: str):
        self.in_flight.pop(key, None)
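To see the effect, the following sketch fires a burst of concurrent requests at a compact version of the single-flight idea and counts how often the fetch function actually runs. The names `SingleFlightDemo`, `slow_fetch`, and `run_burst` are illustrative only, and the 50 ms sleep is an assumed stand-in for a database query.

```python
import asyncio
from typing import Any, Awaitable, Callable


class SingleFlightDemo:
    """Compact single-flight: one task per key, shared by all callers."""

    def __init__(self) -> None:
        self.in_flight: dict[str, asyncio.Task] = {}

    async def get(self, key: str, fetch_fn: Callable[[], Awaitable[Any]]) -> Any:
        task = self.in_flight.get(key)
        if task is None:
            # First caller for this key starts the fetch
            task = asyncio.create_task(fetch_fn())
            self.in_flight[key] = task
            task.add_done_callback(lambda _: self.in_flight.pop(key, None))
        # shield() keeps one cancelled caller from cancelling the shared task
        return await asyncio.shield(task)


async def run_burst(n: int = 200) -> tuple[int, list[Any]]:
    """Send n concurrent requests for one key; return (fetch calls, results)."""
    calls = 0

    async def slow_fetch() -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)  # assumed stand-in for a database query
        return "catalog"

    sf = SingleFlightDemo()
    results = await asyncio.gather(
        *(sf.get("product_catalog", slow_fetch) for _ in range(n))
    )
    return calls, results


if __name__ == "__main__":
    calls, results = asyncio.run(run_burst())
    print(f"{len(results)} requests, {calls} fetch call(s)")
```

Note that this deduplicates only within one process. With several application instances, each instance can still send its own request to the database.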
Go Single-Flight Implementation
Go’s golang.org/x/sync/singleflight is the canonical implementation:
var sf singleflight.Group

func getProductCatalog() ([]Product, error) {
	v, err, _ := sf.Do("product_catalog", func() (interface{}, error) {
		// Only one request reaches the database
		return fetchFromDB()
	})
	if err != nil {
		// v may be nil on error, so skip the type assertion
		return nil, err
	}
	return v.([]Product), nil
}
Request Coalescing
Request coalescing takes single-flight further by batching multiple requests that arrive within a window. Instead of one request per key, you wait a short time (e.g., 5ms) to collect all requests for the same key, then make a single database call.
The window_ms tuning matters. Too short and you don’t coalesce enough. Too long and latency suffers.
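import asyncio
from typing import Any, Awaitable, Callable


class RequestCoalescer:
    def __init__(self, window_ms: float = 5.0):
        self.window_ms = window_ms
        # One shared future per key; every request for that key awaits it
        self.pending: dict[str, asyncio.Future] = {}
        self.lock = asyncio.Lock()

    async def get(self, key: str, fetch_fn: Callable[[], Awaitable[Any]]) -> Any:
        async with self.lock:
            future = self.pending.get(key)
            is_first = future is None
            if is_first:
                future = asyncio.get_running_loop().create_future()
                self.pending[key] = future

        if not is_first:
            # Someone else is already collecting requests for this key.
            # shield() stops a cancelled waiter from cancelling the shared future.
            return await asyncio.shield(future)

        # We're the first request: wait for the window, then fetch
        try:
            # Wait for the coalescing window
            await asyncio.sleep(self.window_ms / 1000)
            # Fetch once, then wake everyone
            result = await fetch_fn()
            future.set_result(result)
            return result
        except Exception as exc:
            # Fail every waiting request with the same error
            future.set_exception(exc)
            future.exception()  # mark as retrieved so asyncio doesn't warn
            raise
        finally:
            if not future.done():
                future.cancel()  # we were cancelled; release the waiters
            async with self.lock:
                self.pending.pop(key, None)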
Probabilistic Early Expiration
Instead of waiting for the TTL to expire, you probabilistically refresh the cache before it actually expires. Keys that are frequently accessed get refreshed more often, spreading the load over time rather than having all requests hit at once.
The XFetch formula: recompute when now - delta * beta * ln(random()) >= expiry, where delta is how long the value takes to recompute and beta (default 1) controls how eagerly to refresh. Because ln(random()) is negative, the check fires with a probability that grows as expiry approaches. For more on eviction policies that interact with this pattern, see cache eviction policies.
import json
import math
import random
import time

import redis

r = redis.Redis()


def should_refresh_early(expires_at: float, delta: float, beta: float = 1.0) -> bool:
    """
    Probabilistic early expiration (XFetch).
    delta: seconds the last recomputation took.
    beta: > 1 refreshes earlier, < 1 refreshes later.
    """
    # 1 - random() is in (0, 1], so log() never receives 0
    return time.time() - delta * beta * math.log(1.0 - random.random()) >= expires_at


def get_with_probabilistic_refresh(key: str, fetch_fn, ttl: int = 60):
    cached = r.get(key)
    if cached:
        value, delta, expires_at = json.loads(cached)
        if not should_refresh_early(expires_at, delta):
            return value
    # Cache miss, or this request won the early refresh: recompute and time it
    start = time.time()
    result = fetch_fn()
    delta = time.time() - start
    r.setex(key, ttl, json.dumps([result, delta, time.time() + ttl]))
    return result
This works well for read-heavy workloads where some keys are much more popular than others.
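One published formulation of probabilistic early expiration is XFetch, which triggers a recompute when `now - delta * beta * ln(rand()) >= expiry`. The sketch below estimates by simulation how likely a single request is to trigger a refresh at different distances from expiry. The function names, the trial count, and the 0.5-second recompute time are assumptions chosen for illustration.

```python
import math
import random
from typing import Optional


def xfetch_should_refresh(now: float, expires_at: float, delta: float,
                          beta: float = 1.0,
                          rng: Optional[random.Random] = None) -> bool:
    """Return True if this request should recompute the value early.

    delta is the time the last recomputation took, in seconds.
    """
    source = rng or random
    # 1 - random() lies in (0, 1], so log() is always defined
    gap = -delta * beta * math.log(1.0 - source.random())
    return now + gap >= expires_at


def refresh_probability(seconds_left: float, delta: float, beta: float = 1.0,
                        trials: int = 20_000, seed: int = 0) -> float:
    """Estimate the per-request refresh probability by simulation."""
    rng = random.Random(seed)
    hits = sum(
        xfetch_should_refresh(0.0, seconds_left, delta, beta, rng)
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for seconds_left in (10.0, 2.0, 0.5, 0.0):
        p = refresh_probability(seconds_left, delta=0.5)
        print(f"{seconds_left:>4}s before expiry: p = {p:.3f}")
```

Since `-ln(rand())` is exponentially distributed, the per-request probability works out to `exp(-seconds_left / (delta * beta))`. A hot key rolls these dice on every request, so some request is very likely to refresh it shortly before expiry, while a cold key simply expires.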
Lock-Based Cache Refresh
Instead of letting all requests miss when a key expires, you give one request the “lock” to rebuild the cache while others either return stale data or wait.
import json
import threading
import time

import redis

r = redis.Redis()
STALE_TTL = 30  # Keep serving stale data for 30 extra seconds


def get_with_lock_refresh(key: str, fetch_fn, ttl: int = 60):
    cached = r.get(key)
    if cached:
        value, expires_at = json.loads(cached)
        is_stale = time.time() > expires_at
        if not is_stale:
            return value
        # Stale - try to acquire refresh lock
        lock_key = f"lock:{key}"
        lock_acquired = r.set(lock_key, "1", nx=True, ex=10)
        if lock_acquired:
            # We got the lock - refresh in background
            def background_refresh():
                try:
                    result = fetch_fn()
                    new_expires = time.time() + ttl
                    r.setex(key, ttl + STALE_TTL, json.dumps([result, new_expires]))
                finally:
                    r.delete(lock_key)

            threading.Thread(target=background_refresh, daemon=True).start()
        # Return stale data while refresh happens in background
        return value
    # No cache at all - must fetch synchronously
    result = fetch_fn()
    expires_at = time.time() + ttl
    r.setex(key, ttl + STALE_TTL, json.dumps([result, expires_at]))
    return result
This is essentially the stale-while-revalidate pattern used by CDNs such as Cloudflare and Fastly: serve stale content while refreshing in the background.
Choosing a Strategy
| Strategy | When it works | When it doesn’t |
|---|---|---|
| Single-flight | Low to medium concurrency, simple cases | High concurrency with many different keys |
| Request coalescing | Bursts of identical requests | Very low latency requirements |
| Probabilistic early expiration | Popular keys with high variance | Uniform access patterns |
| Lock-based refresh | Willing to serve stale data | Fresh data required always |
Combining Approaches
Most production systems layer multiple strategies:
- Use single-flight at the application layer to deduplicate concurrent requests
- Add probabilistic early expiration on popular keys to spread load over time
- Keep a short stale TTL as a safety net for the long tail
# single_flight is a shared SingleFlight instance; should_probabilistic_refresh,
# background_refresh, is_stale, acquire_refresh_lock, and get_cached_or_fetch
# are placeholders for the helpers described in the sections above.
async def get_cached_with_layers(key: str, fetch_fn, ttl: int = 60):
    # Layer 1: Single-flight deduplication
    result = await single_flight.get(key, lambda: refresh_with_layers(key, fetch_fn, ttl))
    return result


async def refresh_with_layers(key: str, fetch_fn, ttl: int):
    # Layer 2: Probabilistic early refresh
    if should_probabilistic_refresh(key, ttl):
        asyncio.create_task(background_refresh(key, fetch_fn, ttl))
    # Layer 3: Lock-based refresh if needed
    if is_stale(key):
        await acquire_refresh_lock(key, fetch_fn, ttl)
    return get_cached_or_fetch(key, fetch_fn)
Common Mistakes
Setting TTLs too uniformly: If every key expires at the same time (e.g., all set at application startup), you create synchronized stampedes. Add jitter:
import random
ttl_with_jitter = base_ttl + random.randint(0, int(base_ttl * 0.1))
Ignoring the lock timeout: If your lock acquisition doesn’t have a timeout, a crashed refresh process leaves the lock hanging forever. Always set EX on the lock key.
Serving stale data without bounds: The stale-while-revalidate pattern requires a maximum staleness threshold. Without one, you could serve data that’s hours old.
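A maximum staleness threshold can be enforced with a small classification step before serving anything from the cache. The sketch below is one possible shape, not a prescribed implementation: `Entry`, `classify`, and the `max_stale` parameter are hypothetical names, and the cache itself is left out so the logic stays visible.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Entry:
    value: Any
    expires_at: float  # when the value stops being fresh


def classify(entry: Optional[Entry], max_stale: float,
             now: Optional[float] = None) -> str:
    """Return 'fresh', 'stale' (serve it and refresh), or 'miss' (must refetch)."""
    if entry is None:
        return "miss"
    now = time.time() if now is None else now
    if now <= entry.expires_at:
        return "fresh"
    if now - entry.expires_at <= max_stale:
        return "stale"
    # Past the staleness bound: treat it as a miss rather than serve it
    return "miss"
```

A caller would serve the value for `fresh` and `stale`, kick off a background refresh for `stale`, and fetch synchronously (or return an error) for `miss`.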
Monitoring for Stampedes
Watch these signals:
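# Redis: detect many keys expiring at once
# Enable expired-key notifications, then subscribe to the event channel
redis-cli CONFIG SET notify-keyspace-events Ex
redis-cli PSUBSCRIBE '__keyevent@*__:expired'

-- Database: detect thundering herds
-- (assumes a query log table named queries with a created_at column)
SELECT COUNT(*), DATE_TRUNC('second', created_at) AS t
FROM queries
WHERE created_at > NOW() - INTERVAL '10 seconds'
GROUP BY t
HAVING COUNT(*) > 100;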
Set alerts on cache hit rate drops — a sudden drop often signals a stampede.
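A hit-rate alert can be approximated in-process with a sliding window over recent lookups. This is a minimal sketch under assumed defaults: the `HitRateMonitor` class, the 1000-lookup window, and the 80% threshold are illustrative choices, and a production setup would more likely export these counts to a metrics system.

```python
from collections import deque


class HitRateMonitor:
    """Tracks cache hit rate over the last `window` lookups."""

    def __init__(self, window: int = 1000, alert_below: float = 0.8) -> None:
        self.samples: deque[bool] = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, hit: bool) -> None:
        self.samples.append(hit)

    def hit_rate(self) -> float:
        if not self.samples:
            return 1.0  # no traffic yet, nothing to alert on
        return sum(self.samples) / len(self.samples)

    def should_alert(self) -> bool:
        # Only alert once the window is full, to avoid noise at startup
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and self.hit_rate() < self.alert_below
```

Call `record(True)` on every hit and `record(False)` on every miss, then poll `should_alert()` from whatever alerting loop is already in place.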
Trade-off Analysis
| Strategy | Latency Impact | Complexity | Staleness Risk | Best For |
|---|---|---|---|---|
| Single-flight | Minimal (in-memory) | Low | None | Low concurrency, simple cases |
| Request coalescing | +5-10ms window delay | Medium | None | Bursts of identical requests |
| Probabilistic early exp | None (async refresh) | Medium | Low | Popular keys, read-heavy |
| Lock-based refresh | +lock latency | Medium | Configurable | Strict freshness required |
| Stale-while-revalidate | None (serve stale) | Low | Medium | Non-critical data, high throughput |
| Factor | Single-Flight | Probabilistic Early | Lock-Based |
|---|---|---|---|
| Implementation complexity | Low | Medium | Medium |
| DB hits during stampede | 1 (all wait) | 0 (early refresh) | 1 (lock holder) |
| Staleness served | None | Possible (early exp) | Configurable |
| Lock contention risk | Low | None | Medium (lock overhead) |
| Extra memory overhead | None | Access counters | Lock keys |
Real-world Failure Scenarios
Scenario 1: Redis Lock Deadlock After Deploy
What happened: A new deployment changed the lock key prefix from lock: to Lock: (capitalization). The existing lock cleanup logic only matched lowercase lock:. Within 2 hours, 47 stale locks accumulated across production nodes.
Root cause: Lock keys created before the deploy used the old prefix. After the deploy, new requests acquired locks under the new Lock: prefix, but the cleanup logic still searched only lock:*, so abandoned Lock: keys were never removed.
Impact: Cache stampede on the product_catalog key caused 3400 database queries in 10 minutes. Database CPU hit 95%. Response latency p99 jumped from 12ms to 8,400ms.
Lesson learned: Always include lock timeouts (TTL) as a safety net. Never rely solely on explicit lock release. Add monitoring for stuck locks, for example a watchdog job that alerts on lock keys that outlive their expected TTL. Test lock cleanup logic as part of deployment validation.
Scenario 2: Probabilistic Refresh Collisions on Flash Sale
What happened: During a flash sale, a product page cache key had 30-second TTL. The probabilistic early expiration used a beta of 0.3 but the traffic spike meant 500 concurrent requests all hit the cache at the same time before any had a chance to refresh.
Root cause: The probabilistic formula assumed gradual access patterns. During a flash sale, all users arrived simultaneously and the probability calculation never triggered early refresh because TTL was too short and access was too bursty.
Impact: 500 requests all saw cache miss at once, all hit the database, database crashed under load. The stampede lasted 3 minutes until automatic scaling finally caught up.
Lesson learned: Probabilistic early expiration works for gradual traffic patterns, not burst patterns. For flash sales or expected traffic spikes, use proactive cache warming before the event. Combine probabilistic refresh with request coalescing for burst protection.
Scenario 3: Stale Data Served Indefinitely After Bug Fix
What happened: A bug fix changed the product price calculation logic. The cache was set with a 24-hour TTL and stale-while-revalidate of 3600 seconds. After the fix deployed, users saw stale prices for up to 25 hours.
Root cause: The stale-while-revalidate setting allowed the cache to serve stale data for up to 1 hour past expiration while fetching in background. But the background refresh was also failing (the new calculation threw errors on cached data format), so the stale data kept being served from cache without ever being refreshed.
Impact: Customers purchased products at incorrect prices for 25 hours until the cache naturally expired. Estimated $47,000 in overcharges that required manual refund processing.
Lesson learned: Stale-while-revalidate requires monitoring of the refresh success rate. If background refresh fails repeatedly, you must invalidate the key rather than keep serving stale data. Implement a maximum staleness threshold beyond which the cache must return an error or bypass itself.
Real-world Case Study: Twitter’s Cache Stampede Protection
Twitter faces a unique cache stampede challenge: when a celebrity tweets, millions of users simultaneously try to fetch the same data. Their approach combines multiple strategies:
Fan-out on write: When a tweet is published, Twitter proactively pushes the tweet into the timelines of active followers’ caches rather than waiting for cache misses. This is write-time cache population — the fanout-on-write pattern.
Probabilistic early expiration with tenant-aware TTLs: Twitter uses dynamic TTLs based on tweet popularity. High-engagement tweets get shorter TTLs to prevent synchronized expiration. Lower-engagement tweets get longer TTLs.
Deduped fetch windows: During a spike event (like a live tweet from a verified account), Twitter temporarily extends the request coalescing window from 5ms to 50ms to aggregate more requests before making a single database call.
The lesson: for extreme traffic patterns, pre-warming and shorter TTLs beat reactive stampede prevention. If you know an event will cause a thundering herd, warm the cache before the herd arrives.
Interview Questions
What is a cache stampede, and why does it happen?
A cache stampede occurs when a popular cache entry expires and multiple concurrent requests all miss the cache simultaneously, each one hitting the database to rebuild the same value.
It happens because:
- Caches protect databases by storing hot data in memory
- When a popular key expires, the cache offers no protection
- All requests that arrive within the expiration window rush to the database
- If the database is not scaled for this sudden load, it crashes or slows dramatically
Common triggers: uniform TTLs expiring at once, cache restarts, traffic spikes on popular content.
How does the single-flight pattern work, and what are its trade-offs?
Single-flight ensures only one request fetches the data while all other concurrent requests wait for that same result. All requests to the same key share a single in-flight database query.
Trade-offs:
- Pro: Guarantees only 1 database hit regardless of concurrent requests
- Pro: Simple to implement with existing libraries (Go singleflight, Python async patterns)
- Con: All waiting requests experience the full fetch latency (no early return)
- Con: If the fetching request fails, all waiters fail together
- Con: Does not help for keys that haven't been requested yet (cold start)
Best for: medium concurrency where requests naturally cluster (same user actions, same popular data).
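How does XFetch (probabilistic early expiration) work, and when would you prefer it over lock-based refresh?
XFetch formula: recompute when `now - delta × beta × ln(rand()) >= expiry`, where delta is the recompute time and beta (default 1) tunes how eagerly to refresh. As a key approaches its expiry, the probability of an early refresh rises exponentially.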
Prefer XFetch when:
- Traffic patterns are gradual rather than bursty (XFetch assumes gradual access)
- Serving stale data is acceptable for a short window
- Lock infrastructure is unavailable or too complex
- You want simplicity over strict guarantees
Prefer lock-based refresh (with waiters blocking on the lock rather than serving stale data) when:
- Staleness is completely unacceptable
- Lock acquisition latency is acceptable
- You need exactly one fresh value served
Hybrid approach: use XFetch as first line of defense, fall back to lock if cache miss occurs anyway.
How does request coalescing differ from single-flight?
Request coalescing batches multiple requests that arrive within a time window (e.g., 5ms) into a single database call. Single-flight immediately deduplicates in-flight requests; coalescing waits to aggregate.
Key differences:
- Single-flight: only one request fetches, others wait for that exact result
- Coalescing: wait a short window to collect multiple requests, then make one call
- Coalescing can handle more concurrent requests but adds latency (the window delay)
- Single-flight has zero window overhead but no aggregation benefit
Use coalescing when: bursty traffic patterns occur (flash sales, product launches), and slight latency increase from windowing is acceptable.
How would you implement lock-based cache refresh with Redis?
Lock-based refresh: only one request acquires a distributed lock and refreshes the cache. Other requests wait and retry or serve stale data.
Critical implementation details:
- Lock must have TTL: `SET lock:key "1" NX EX 10` — prevents deadlock if holder crashes
- On lock acquire: fetch fresh data, update cache, release lock
- On lock fail: wait briefly, retry cache read, optionally serve stale data
- Use separate lock key (e.g., `lock:product:123`) separate from cache key (`product:123`)
Common mistakes: forgetting lock TTL, not handling lock acquisition failures gracefully, using same key for lock and cache data.
What is stale-while-revalidate, and when does it help or hurt?
Stale-while-revalidate (an HTTP `Cache-Control` directive) allows serving stale cached data while fetching fresh data in the background. The cache returns immediately but triggers an async refresh.
Helps when:
- Some staleness is acceptable (non-critical data, read-heavy workloads)
- You want consistent latency (never wait for revalidation)
- Refresh failures are tolerable (stale data continues to be served)
Hurts when:
- Data must be fresh (prices, inventory, financial data)
- Refresh failures go unnoticed and stale data accumulates
- No maximum staleness bound is set — stale data can be served indefinitely
Why do uniform TTLs cause stampedes, and how do you fix them?
If all cache entries are set at application startup with the same TTL, they all expire at the same time (e.g., midnight). When the first key expires, the traffic is still manageable; but when the second, third, and hundredth key expire simultaneously, they create a synchronized stampede.
Fix: add jitter to TTLs:
ttl_with_jitter = base_ttl + random.randint(0, int(base_ttl * 0.1))
With up to 10% jitter added, expirations are spread over a window 10% of the base TTL wide instead of all landing on the same instant. The cache still provides good protection, but expiration is staggered.
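The effect of jitter is easy to check with a small simulation. The sketch below assumes 1000 keys written at the same moment with a one-hour base TTL (both numbers are arbitrary) and counts how many keys expire in the worst single second, with and without jitter. The helper name `expiry_histogram` is hypothetical.

```python
import random
from collections import Counter


def expiry_histogram(n_keys: int, base_ttl: int, jitter_fraction: float,
                     seed: int = 0) -> Counter:
    """Count how many keys expire in each second after a simultaneous start."""
    rng = random.Random(seed)
    max_jitter = int(base_ttl * jitter_fraction)
    return Counter(base_ttl + rng.randint(0, max_jitter) for _ in range(n_keys))


if __name__ == "__main__":
    for fraction in (0.0, 0.1):
        hist = expiry_histogram(n_keys=1000, base_ttl=3600, jitter_fraction=fraction)
        worst = max(hist.values())
        print(f"jitter={fraction:.0%}: worst second has {worst} expirations")
```

Without jitter every key lands in the same second. With 10% jitter the same keys are spread over a 360-second window, so each second sees only a handful of expirations.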
How do you prepare the cache for a known traffic spike?
For known events (flash sales, product launches, live streams):
- Proactive warming: Before the event starts, identify popular keys from historical access patterns and pre-populate cache
- Staggered TTLs: Set different base TTLs for different key groups so expiration is naturally spread out
- Reduce TTLs during event: Temporarily shorten TTLs on hot keys to increase refresh frequency and reduce staleness risk
- Request coalescing window extension: Temporarily increase the coalescing window to aggregate more requests during burst
Timeline: start warming 30-60 minutes before event. Focus on top 100-500 most popular keys. Rate-limit warming to avoid DB overload during warmup.
How do you detect a cache stampede in production?
Key signals:
- Cache hit rate drop: Sudden drop from 95%+ to 50% or lower indicates cache misses spiking
- Database CPU spike: Database CPU hitting 80-100% without corresponding application traffic increase
- Response latency p99 spike: Latency jumping from 10ms to 1000ms+ for cached endpoints
- Identical query patterns: Many simultaneous queries for the same key (visible in slow query log)
- Redis expired events: keyspace notifications (`notify-keyspace-events Ex`, channel `__keyevent@0__:expired`) showing many hot keys expiring at once
Set alerts on cache hit rate drop below 80% and database CPU exceeding 70%.
How do eviction policies interact with stampede prevention?
Eviction policies and stampede prevention address different problems but interact:
- Eviction policy: which entry to remove when cache is full
- Stampede prevention: how to rebuild an entry on a miss without overloading the database
- With LRU: recently accessed items stay in cache but still expire at TTL, so stampede risk remains
- With LFU: frequently accessed items are protected from eviction but may be at higher stampede risk (more popular = larger thundering herd when they expire)
- Probabilistic early expiration works well with LFU because high-frequency keys get more early refresh opportunities
Best practice: use both — set appropriate TTLs AND implement stampede protection. Neither alone is sufficient for high-traffic production systems.
What is the difference between pessimistic and optimistic locking for cache refresh?
Pessimistic locking: assume conflicts will happen, acquire exclusive lock before refresh. Redis `SETNX` is pessimistic: only one request gets the lock.
Optimistic locking (CAS): allow concurrent reads, but check version before write. If version changed during read, retry. More concurrency but may require multiple retries.
For cache stampede prevention:
- Pessimistic (lock-based): simpler, guarantees single refresh, but lock overhead and potential deadlock
- Optimistic (CAS): higher concurrency but complex retry logic, still may have multiple DB hits
Redis does not have native CAS, but you can implement it using version keys: `WATCH key:version` + `MULTI` + `EXEC`.
How do you handle cache stampedes in a multi-tenant system?
Multi-tenant stampede challenges: one tenant's traffic spike can affect others' cache performance.
Mitigation strategies:
- Key namespacing: Use tenant prefixes (`tenant:123:product:*`) so stampede on one tenant's keys doesn't affect others
- Per-tenant rate limiting: Apply rate limits per tenant on cache access to prevent one tenant's thundering herd from consuming all cache capacity
- Tenant-aware TTLs: Different tenants can have different TTL configurations based on their SLA
- Isolation: Consider dedicated cache pools for high-traffic tenants if their SLAs require it
Monitoring: track hit rate and latency per tenant to detect when one tenant's behavior affects the shared infrastructure.
During a stampede, should you serve stale data or return errors?
Serving stale data:
- Pro: User gets a response (possibly usable data)
- Pro: Database load is reduced
- Con: Data may be incorrect/stale
- Con: Business logic may depend on fresh data
Serving errors:
- Pro: Data is guaranteed fresh or no response
- Pro: No risk of stale business decisions
- Con: User sees error instead of data
- Con: Database still experiences load (fallback path still hits DB)
Decision framework: non-critical data (recommendations, feeds) → serve stale; critical data (prices, inventory, auth) → serve error or bypass cache entirely during stampede.
How does consistent hashing reduce stampede impact?
Consistent hashing maps keys to nodes based on a hash ring. When a node is added or removed, only about K/N keys remap (where K is total keys, N is nodes).
Stampede impact reduction:
- Without consistent hashing (e.g., `hash(key) % N`): adding a node remaps nearly all keys, causing mass cache misses
- With consistent hashing: only ~1/N of keys remap, limiting thundering herd to a fraction of keys
- Virtual nodes (150+ per physical node) further smooth distribution
Best practice: use consistent hashing with virtual nodes in distributed cache deployments. When adding nodes, do it during low-traffic windows and consider pre-warming the new node.
What is a cache lease, and how does it prevent stampedes?
A lease is a token granted to the first requestor to refresh a cache entry. Other requestors learn that a lease is outstanding and know to wait or serve stale data.
Lease pattern workflow:
- Request arrives, finds cache miss
- Cache grants a lease (unique ID) to this requestor
- Other requestors see lease exists, wait briefly or get stale data
- Lease holder fetches fresh data, updates cache, releases lease
- Other requestors get updated value
Benefits: explicit coordination, lease has TTL so dead holders don't block indefinitely, requestors can serve stale while waiting.
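The workflow above can be sketched with an in-memory lease table. This is an illustration only: a real cache would hold lease state on the server side, and `LeaseTable`, its methods, and the 10-second lease TTL are hypothetical.

```python
import time
import uuid
from typing import Optional


class LeaseTable:
    """In-memory sketch of cache leases, keyed by cache key."""

    def __init__(self, lease_ttl: float = 10.0) -> None:
        self.lease_ttl = lease_ttl
        # key -> (lease_id, expires_at)
        self._leases: dict[str, tuple[str, float]] = {}

    def acquire(self, key: str, now: Optional[float] = None) -> Optional[str]:
        """Return a lease ID if the caller should refresh, else None."""
        now = time.monotonic() if now is None else now
        current = self._leases.get(key)
        if current is not None and current[1] > now:
            return None  # someone else holds a live lease: wait or serve stale
        lease_id = uuid.uuid4().hex
        self._leases[key] = (lease_id, now + self.lease_ttl)
        return lease_id

    def release(self, key: str, lease_id: str) -> bool:
        """Release only if the caller still owns the lease."""
        current = self._leases.get(key)
        if current is not None and current[0] == lease_id:
            del self._leases[key]
            return True
        return False
```

Checking the lease ID on release matters: a holder whose lease already expired must not remove the lease that a later requestor acquired.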
How does a circuit breaker help during a cache stampede?
A circuit breaker monitors the rate of cache failures. When failures exceed a threshold, the circuit "opens" and the application bypasses the cache entirely, going directly to the database.
Why it helps during stampede:
- Cache is overloaded during stampede — more cache requests = more failures
- Circuit breaker stops sending requests to failing cache, reducing load on cache
- Database gets fewer (but smarter) requests via direct path
- After cooldown period, circuit breaker half-opens, testing if cache recovered
Implementation: track cache errors vs successes. If error rate > threshold (e.g., 50% in 10 seconds), open circuit. After timeout, allow test requests through.
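A minimal version of that state machine might look like the sketch below. It is a simplification: it counts errors since the last reset rather than over a true 10-second sliding window, and the class name, `min_calls`, and `cooldown` parameters are assumptions for the example.

```python
import time
from typing import Optional


class CacheCircuitBreaker:
    """Opens when the cache error rate crosses a threshold."""

    def __init__(self, threshold: float = 0.5, min_calls: int = 20,
                 cooldown: float = 10.0) -> None:
        self.threshold = threshold
        self.min_calls = min_calls  # don't judge on too few samples
        self.cooldown = cooldown
        self.errors = 0
        self.calls = 0
        self.opened_at: Optional[float] = None

    def allow_request(self, now: Optional[float] = None) -> bool:
        """True if the cache should be tried; False means bypass it."""
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        # Half-open: after the cooldown, let test requests through
        return now - self.opened_at >= self.cooldown

    def record(self, success: bool, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if success:
                # Test request succeeded: close the circuit and start fresh
                self.opened_at, self.errors, self.calls = None, 0, 0
            else:
                self.opened_at = now  # still failing: restart the cooldown
            return
        self.calls += 1
        self.errors += 0 if success else 1
        if self.calls >= self.min_calls and self.errors / self.calls > self.threshold:
            self.opened_at = now
```

Callers check `allow_request()` before each cache operation and report the outcome with `record()`.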
How do Redis and Memcached differ for stampede prevention?
Redis advantages for stampede prevention:
- Native `SET key value NX EX seconds` for distributed locks (one atomic command)
- Pub/Sub for invalidation broadcasting across nodes
- Lua scripts for atomic multi-step lock + fetch + release
- Rich data structures for tracking access frequency (supports LFU)
Memcached limitations:
- No native lock command: must implement with `add` (set only if absent)
- No pub/sub — invalidation requires external coordination
- No Lua scripting — lock + fetch requires separate operations
- Multi-threaded but global lock on each operation creates contention
For Memcached: use local in-process cache (L1) in front of Memcached to absorb stampedes before they reach Memcached layer.
What is the difference between a cache stampede and cache penetration?
Cache stampede: popular key expires, many requests hit database simultaneously.
Cache penetration: requests for keys that don't exist in cache OR database, every request hits database.
Both overwhelm the database but have different causes.
Preventing cache stampede:
- Probabilistic early expiration, single-flight, locks
- Jittered TTLs, cache warming
Preventing cache penetration:
- Cache null values with short TTL for non-existent keys
- Bloom filters to quickly reject impossible keys
- Input validation to reject malformed keys
Both can coexist: use stampede prevention for hot keys, penetration prevention for non-existent key patterns.
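Caching null results is the simplest of the penetration defenses to show in code. The sketch below uses an in-memory dictionary in place of a real cache and a sentinel object to distinguish "known missing" from "not cached". `NegativeCache`, the `load` callback, and both TTL defaults are hypothetical.

```python
import time
from typing import Any, Callable, Optional

_MISSING = object()  # sentinel meaning "known not to exist"


class NegativeCache:
    """In-memory sketch: caches 'not found' results with a short TTL."""

    def __init__(self, ttl: float = 300.0, negative_ttl: float = 30.0) -> None:
        self.ttl = ttl
        self.negative_ttl = negative_ttl
        # key -> (value or _MISSING, expires_at)
        self._store: dict[str, tuple[Any, float]] = {}

    def get(self, key: str, load: Callable[[str], Optional[Any]],
            now: Optional[float] = None) -> Optional[Any]:
        now = time.monotonic() if now is None else now
        hit = self._store.get(key)
        if hit is not None and hit[1] > now:
            return None if hit[0] is _MISSING else hit[0]
        value = load(key)  # expected to return None when the key does not exist
        if value is None:
            self._store[key] = (_MISSING, now + self.negative_ttl)
        else:
            self._store[key] = (value, now + self.ttl)
        return value
```

The short negative TTL limits how long a newly created record stays invisible, while still absorbing repeated lookups for keys that do not exist.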
How do you test stampede protection?
Testing approaches:
- Load testing: Simulate thousands of concurrent requests for the same key with `wrk` or `k6`
- Chaos injection: Manually expire a hot key in production (or staging) and observe behavior
- Failure mode testing: Kill a cache node and measure how quickly stampede protection kicks in
- TTLB (time-to-last-byte) measurement: Measure latency under stampede conditions with and without protection
Metrics to validate:
- Database queries per hot key during a stampede (should be 1, not N)
- p99 latency during stampede (should remain reasonable, not spike to seconds)
- Error rate during stampede (should be 0%, not increase)
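Before reaching for `wrk` or `k6`, the first metric can be checked in a unit test. The sketch below releases a batch of threads against an empty in-memory cache and counts backend queries, once without protection and once with a simple lock plus double-check. Every name here (`CountingBackend`, `stampede`, and so on) is made up for the example, and the 50 ms delay is an assumed query time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class CountingBackend:
    """Stand-in for a database that counts the queries it receives."""

    def __init__(self, delay: float = 0.05) -> None:
        self.delay = delay
        self.queries = 0
        self._lock = threading.Lock()

    def load(self) -> str:
        with self._lock:
            self.queries += 1
        time.sleep(self.delay)  # simulated query time
        return "value"


def unprotected_get(cache: dict, backend: CountingBackend) -> str:
    if "hot" not in cache:
        cache["hot"] = backend.load()
    return cache["hot"]


def make_protected_get() -> Callable[[dict, CountingBackend], str]:
    lock = threading.Lock()

    def protected_get(cache: dict, backend: CountingBackend) -> str:
        if "hot" in cache:
            return cache["hot"]
        with lock:
            # Re-check: another thread may have filled the cache while we waited
            if "hot" not in cache:
                cache["hot"] = backend.load()
        return cache["hot"]

    return protected_get


def stampede(get_fn, n: int = 50) -> int:
    """Release n threads at once against an empty cache; return the query count."""
    cache: dict = {}
    backend = CountingBackend()
    barrier = threading.Barrier(n)

    def worker(_: int) -> str:
        barrier.wait()  # line everyone up so they miss together
        return get_fn(cache, backend)

    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(worker, range(n)))
    assert all(r == "value" for r in results)
    return backend.queries


if __name__ == "__main__":
    print("unprotected queries:", stampede(unprotected_get))
    print("protected queries:", stampede(make_protected_get()))
```

The same harness shape works for any of the strategies in this guide: swap in the protected getter under test and assert on the query count.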
How would you design stampede protection for a real-time bidding system?
Real-time bidding has unique constraints: sub-millisecond latency required, cache misses cost money (auction opportunities lost), data must be fresh (current bid prices).
Architecture:
- No stale-while-revalidate: freshness is critical, never serve stale bid data
- Proactive cache warming: pre-populate hot bid data before auction windows (sports events, prime time)
- Lock-based refresh with short locks: 50ms TTL locks, aggressive retry
- Read-after-write: immediately read updated cache after write to ensure consistency
- Per-key rate limiting: limit how often a specific key can be refreshed to prevent thrashing
- Graceful degradation: if cache unavailable, query DB directly but log the cache miss
Monitoring: track bid update latency (should be <5ms p99), cache hit rate (target >99%), auction opportunities lost due to cache latency.
Further Reading
Documentation
- Redis Cache Stampede Prevention — Official documentation on thundering herd mitigation
- Memcached Lock Pattern — Distributed locking implementation for Memcached
- AWS Caching Best Practices — Cloud-native cache strategies including stampede prevention
Papers and Technical References
- Probabilistic Early Expiration Paper — The XFetch algorithm for cache stampede prevention
- ARC: Adaptive Replacement Cache — IBM’s self-tuning cache algorithm combining LRU and LFU
Related Blog Posts
- Caching Strategies — Core caching patterns including cache-aside, write-through, and refresh-ahead
- Cache Eviction Policies — How LRU and LFU interact with stampede prevention
- Distributed Caching — Multi-node cache stampede protection with consistent hashing
- Redis vs Memcached — Stampede prevention implementation differences between engines
Tools
| Tool | Purpose |
|---|---|
| Redis Cache Stampede Toolkit | Reference implementations for lock-based and probabilistic stampede prevention |
| Guava Cache | Java library with built-in stampede protection (`LoadingCache` makes concurrent callers for the same key wait on a single load) |
Conclusion
Cache stampede prevention is one piece of a larger caching strategy. For more on caching patterns, read about caching strategies and distributed caching.
For handling the database side of concurrent writes, the locking and concurrency guide covers pessimistic and optimistic approaches.
For understanding how cache stampedes relate to overall database scaling, it is worth knowing how caches fit into the scale-out picture.