Cache Stampede Prevention: Protecting Your Cache

Learn how single-flight, request coalescing, and probabilistic early expiration prevent cache stampedes that can overwhelm your database.

Reading time: 7 min

Cache Stampede Prevention: Protecting Your Cache from Thundering Herds

A cache stampede happens when a popular cache entry expires and dozens (or hundreds) of requests all miss simultaneously, each one hitting your database to rebuild the same value. Your cache was supposed to protect the database — but instead, the expiration creates a thundering herd that overwhelms whatever you were trying to shield.

This happens more often than you’d think. A single cache key for “top 100 products” or “user session data” can expire at a bad moment and send a flood of requests to your database. The fix isn’t just “use a longer TTL.” You need actual stampede prevention patterns.

This guide covers the main approaches: single-flight, request coalescing, probabilistic early expiration, and lock-based cache refresh.

Why Cache Stampedes Happen

The classic scenario: you cache a result with a 60-second TTL. At second 60, 200 requests arrive within milliseconds of each other. Every single one checks the cache, finds nothing, and rushes to the database.

# This code stampedes
# (assumes `redis` and `db` are preconfigured client objects and json is imported)
def get_product_catalog():
    cached = redis.get("product_catalog")
    if cached:
        return json.loads(cached)

    # 200 requests arrive here at the same time:
    # every one of them misses and runs the same query
    result = db.query("SELECT * FROM products")
    redis.setex("product_catalog", 60, json.dumps(result))
    return result

The cache didn’t help at all. Every request still hit the database.

Single-Flight Pattern

The single-flight pattern ensures only one request fetches the data while all others wait for that same result. Concurrent requests to the same key share a single in-flight request.

Instead of 200 database calls, you get 1. The other 199 requests wait on the same task.

import asyncio
from typing import Any

class SingleFlight:
    def __init__(self):
        self.in_flight: dict[str, asyncio.Task] = {}
        self.lock = asyncio.Lock()

    async def get(self, key: str, fetch_fn) -> Any:
        # Check if a request for this key is already in flight
        async with self.lock:
            if key in self.in_flight:
                # Another request is already fetching this key
                task = self.in_flight[key]
            else:
                # No in-flight request, create one
                task = asyncio.create_task(self._fetch(key, fetch_fn))
                self.in_flight[key] = task
                task.add_done_callback(
                    lambda _: self._cleanup(key)
                )

        return await task

    async def _fetch(self, key: str, fetch_fn) -> Any:
        result = await fetch_fn()
        return result

    def _cleanup(self, key: str):
        self.in_flight.pop(key, None)
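To see the effect end to end, here is a compact, self-contained variant of the same pattern (a module-level dict instead of a class, and a hypothetical fetch_catalog standing in for the database): 200 concurrent callers produce exactly one fetch.

```python
import asyncio

in_flight: dict[str, asyncio.Task] = {}
fetch_count = 0

async def fetch_catalog() -> list[str]:
    global fetch_count
    fetch_count += 1
    await asyncio.sleep(0.01)  # stand-in for a slow database query
    return ["product-1", "product-2"]

async def get(key: str) -> list[str]:
    # All concurrent callers for `key` share one in-flight task.
    # No lock needed here: there is no await between the check and the insert,
    # and the event loop is single-threaded.
    task = in_flight.get(key)
    if task is None:
        task = asyncio.create_task(fetch_catalog())
        in_flight[key] = task
        task.add_done_callback(lambda _: in_flight.pop(key, None))
    return await task

async def main() -> None:
    results = await asyncio.gather(*(get("product_catalog") for _ in range(200)))
    assert all(r == ["product-1", "product-2"] for r in results)

asyncio.run(main())
```

Awaiting the same `asyncio.Task` from many coroutines is safe; every awaiter receives the same result (or the same exception, which is usually what you want: one failed fetch fails the whole batch instead of retrying 200 times).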

Go Single-Flight Implementation

Go’s golang.org/x/sync/singleflight is the canonical implementation:

var sf singleflight.Group

func getProductCatalog() ([]Product, error) {
    v, err, _ := sf.Do("product_catalog", func() (interface{}, error) {
        // Only one request reaches the database
        return fetchFromDB()
    })
    if err != nil {
        // Check the error before the type assertion: asserting a nil
        // interface would panic
        return nil, err
    }
    return v.([]Product), nil
}

Request Coalescing

Request coalescing takes single-flight further by batching multiple requests that arrive within a window. Instead of one request per key, you wait a short time (e.g., 5ms) to collect all requests for the same key, then make a single database call.

The window_ms tuning matters. Too short and you don’t coalesce enough. Too long and latency suffers.

import asyncio
from typing import Any

class RequestCoalescer:
    def __init__(self, window_ms: float = 5.0):
        self.window_ms = window_ms
        self.pending: dict[str, asyncio.Future] = {}
        self.lock = asyncio.Lock()

    async def get(self, key: str, fetch_fn) -> Any:
        async with self.lock:
            future = self.pending.get(key)
            if future is None:
                # First request for this key becomes the batch leader
                future = asyncio.get_running_loop().create_future()
                self.pending[key] = future
                is_leader = True
            else:
                is_leader = False

        if not is_leader:
            # Join the open batch and wait for the leader's result
            return await future

        try:
            # Hold the coalescing window open so more requests can join
            await asyncio.sleep(self.window_ms / 1000)

            # Fetch once, then wake everyone waiting on the future
            result = await fetch_fn()
            future.set_result(result)
            return result
        except Exception as exc:
            future.set_exception(exc)
            raise
        finally:
            async with self.lock:
                self.pending.pop(key, None)
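A self-contained sketch of the window behavior (hypothetical `coalesced_get` with module-level state, and a 50 ms window for timing headroom): four requests arriving staggered inside one window still trigger a single fetch.

```python
import asyncio

pending: dict[str, asyncio.Future] = {}
fetch_count = 0

async def coalesced_get(key: str, window_s: float = 0.05) -> str:
    global fetch_count
    fut = pending.get(key)
    if fut is not None:
        return await fut  # join the batch already open for this key
    fut = asyncio.get_running_loop().create_future()
    pending[key] = fut
    await asyncio.sleep(window_s)   # hold the window open for late arrivals
    pending.pop(key, None)          # close the batch before fetching
    fetch_count += 1                # stand-in for the single database call
    fut.set_result(f"value:{key}")
    return fut.result()

async def main() -> None:
    async def arrive(i: int) -> str:
        await asyncio.sleep(0.01 * i)  # staggered arrivals inside the window
        return await coalesced_get("top100")

    results = await asyncio.gather(*(arrive(i) for i in range(4)))
    assert results == ["value:top100"] * 4

asyncio.run(main())
```

Note the trade-off the window creates: every request in the batch, including the first, pays the full window as added latency. That is why the window is millisecond-scale in practice.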

Probabilistic Early Expiration

Instead of waiting for the TTL to expire, you probabilistically refresh the cache before it actually expires. Keys that are frequently accessed get refreshed more often, spreading the load over time rather than having all requests hit at once.

The canonical rule (the "XFetch" algorithm) refreshes when now - delta * beta * ln(random()) >= expiry, where delta is how long the last recomputation took and beta (typically around 1.0) tunes how eager the early refresh is. The example below uses a simpler frequency-based heuristic in the same spirit. For more on eviction policies that interact with this pattern, see cache eviction policies.
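A minimal sketch of that rule (names are assumptions here: `delta` is the cost of the last recomputation, `beta` an eagerness knob):

```python
import math
import random

def should_recompute(now: float, expiry: float, delta: float, beta: float = 1.0) -> bool:
    # XFetch-style check: math.log(random.random()) is negative, so the
    # subtraction pushes `now` forward by a random amount. The probability
    # of firing rises smoothly as expiry approaches; a larger delta or
    # beta makes early refreshes more likely.
    return now - delta * beta * math.log(random.random()) >= expiry
```

At or past the expiry this always fires; far from expiry with a small `delta`, it almost never does, so refreshes spread out instead of clustering at the TTL boundary.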

import asyncio
import json
import random

def should_refresh_early(access_frequency: float, random_factor: float = 0.3) -> bool:
    """
    Probabilistic early expiration.
    Higher access frequency = higher chance of early refresh.
    """
    probability = min(1.0, access_frequency * random_factor)
    return random.random() < probability

async def get_with_probabilistic_refresh(key: str, fetch_fn, ttl: int = 60):
    # Assumes `redis` is a preconfigured client and refresh_async is a
    # helper that re-runs fetch_fn and rewrites the key
    value = redis.get(key)
    if value:
        access_count = redis.hincrby("access_counts", key, 1)
        access_frequency = access_count / ttl

        if should_refresh_early(access_frequency):
            # Refresh in the background while callers keep the cached value
            asyncio.create_task(refresh_async(key, fetch_fn, ttl))

        return json.loads(value)

    # Cache miss - must wait for fetch
    result = fetch_fn()
    redis.setex(key, ttl, json.dumps(result))
    return result

This works well for read-heavy workloads where some keys are much more popular than others.

Lock-Based Cache Refresh

Instead of letting all requests miss when a key expires, you give one request the “lock” to rebuild the cache while others either return stale data or wait.

import json
import threading
import time

# `redis` below is assumed to be a configured client instance (e.g. redis.Redis())

STALE_TTL = 30  # Keep serving stale data for 30 extra seconds

def get_with_lock_refresh(key: str, fetch_fn, ttl: int = 60):
    cached = redis.get(key)
    if cached:
        value, expires_at = json.loads(cached)
        is_stale = time.time() > expires_at

        if not is_stale:
            return value

        # Stale - try to acquire refresh lock
        lock_key = f"lock:{key}"
        lock_acquired = redis.set(lock_key, "1", nx=True, ex=10)

        if lock_acquired:
            # We got the lock - refresh in background
            def background_refresh():
                try:
                    result = fetch_fn()
                    new_expires = time.time() + ttl
                    redis.setex(key, ttl + STALE_TTL, json.dumps([result, new_expires]))
                finally:
                    redis.delete(lock_key)

            threading.Thread(target=background_refresh, daemon=True).start()

        # Return stale data while refresh happens in background
        return value

    # No cache at all - must fetch synchronously
    result = fetch_fn()
    expires_at = time.time() + ttl
    redis.setex(key, ttl + STALE_TTL, json.dumps([result, expires_at]))
    return result

This is essentially the stale-while-revalidate pattern that Cloudflare, Fastly, and CDNs use — serve stale content while refreshing in the background.
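In HTTP terms, this is the `stale-while-revalidate` Cache-Control extension standardized in RFC 5861. A response that opts in looks like:

```http
Cache-Control: max-age=60, stale-while-revalidate=30
```

For up to 30 seconds after the 60-second freshness window ends, a cache may serve the stale copy while it refetches in the background, which is exactly the behavior the Python code above implements by hand.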

Choosing a Strategy

Strategy                       | When it works                           | When it doesn't
------------------------------ | --------------------------------------- | -----------------------------------------
Single-flight                  | Low to medium concurrency, simple cases | High concurrency with many different keys
Request coalescing             | Bursts of identical requests            | Very low latency requirements
Probabilistic early expiration | Popular keys with high variance         | Uniform access patterns
Lock-based refresh             | Willing to serve stale data             | Fresh data always required

Combining Approaches

Most production systems layer multiple strategies:

  1. Use single-flight at the application layer to deduplicate concurrent requests
  2. Add probabilistic early expiration on popular keys to spread load over time
  3. Keep a short stale TTL as a safety net for the long tail

# Sketch: single_flight, should_probabilistic_refresh, background_refresh,
# is_stale, acquire_refresh_lock, and get_cached_or_fetch are thin wrappers
# around the patterns built in the sections above
async def get_cached_with_layers(key: str, fetch_fn, ttl: int = 60):
    # Layer 1: Single-flight deduplication
    result = await single_flight.get(key, lambda: refresh_with_layers(key, fetch_fn, ttl))
    return result

async def refresh_with_layers(key: str, fetch_fn, ttl: int):
    # Layer 2: Probabilistic early refresh
    if should_probabilistic_refresh(key, ttl):
        asyncio.create_task(background_refresh(key, fetch_fn, ttl))

    # Layer 3: Lock-based refresh if the entry has gone stale
    if is_stale(key):
        await acquire_refresh_lock(key, fetch_fn, ttl)

    return get_cached_or_fetch(key, fetch_fn)

Common Mistakes

Setting TTLs too uniformly: If every key expires at the same time (e.g., all set at application startup), you create synchronized stampedes. Add jitter:

import random
ttl_with_jitter = base_ttl + random.randint(0, int(base_ttl * 0.1))

Ignoring the lock timeout: If your lock acquisition doesn’t have a timeout, a crashed refresh process leaves the lock hanging forever. Always set EX on the lock key.

Serving stale data without bounds: The stale-while-revalidate pattern requires a maximum staleness threshold. Without one, you could serve data that’s hours old.

Monitoring for Stampedes

Watch these signals:

# Redis: detect many keys expiring at once
# (requires notify-keyspace-events to include "Ex")
redis-cli psubscribe '__keyevent@0__:expired'

-- Database: Detect thundering herds
SELECT COUNT(*), DATE_TRUNC('second', created_at) as t
FROM queries
WHERE created_at > NOW() - INTERVAL '10 seconds'
GROUP BY t
HAVING COUNT(*) > 100;

Set alerts on cache hit rate drops — a sudden drop often signals a stampede.

Next Steps

Cache stampede prevention is one piece of a larger caching strategy. For more on caching patterns, read about caching strategies and distributed caching.

For handling the database side of concurrent writes, the locking and concurrency guide covers pessimistic and optimistic approaches.

For understanding how cache stampedes relate to overall database scaling, it’s worth knowing how caches fit into the scale-out picture.

Related Posts

Cache Eviction Policies: LRU, LFU, FIFO, and More Explained

Learn LRU, LFU, FIFO, and TTL eviction policies. Understand trade-offs with real-world performance implications for caching.

#caching #algorithms #system-design

Caching Strategies: Cache-Aside, Write-Through, and More

Master five caching strategies for production systems. Learn cache-aside vs write-through, avoid cache stampede, and scale with these patterns.

#caching #system-design #performance

Distributed Caching: Multi-Node Cache Clusters

Scale caching across multiple nodes. Learn about cache clusters, consistency models, session stores, and cache coherence patterns.

#distributed-systems #caching #scalability