Cache Eviction Policies: LRU, LFU, FIFO, and Beyond
A comprehensive guide to cache eviction policies — LRU, LFU, FIFO, Random — with implementation trade-offs, decision frameworks, and real-world case studies.
Introduction
This guide covers the most widely-used cache eviction policies, from simple strategies like FIFO and Random to sophisticated ones like LRU and LFU, with implementation trade-offs, decision frameworks, and real-world case studies.
Core Concepts
LRU (Least Recently Used)
LRU evicts the item that hasn’t been accessed for the longest time. The idea is that recently used items are more likely to be used again soon.
graph LR
A[Cache Full] --> B[New item arrives]
B --> C{Is item in cache?}
C -->|No| D[Evict LRU item]
D --> E[Add new item]
C -->|Yes| F[Update position]
Implementation with LinkedHashMap (Java):
import java.util.LinkedHashMap;
import java.util.Map;
public class LRUCache<K, V> extends LinkedHashMap<K, V> {
private final int capacity;
public LRUCache(int capacity) {
super(capacity, 0.75f, true); // accessOrder = true for LRU
this.capacity = capacity;
}
@Override
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
return size() > capacity;
}
}
Redis LRU Implementation:
Redis doesn’t have a true LRU implementation because that would require tracking access order for every item, which is expensive. Instead, it samples a few random keys and evicts the least recently used among them.
# Configure Redis to use LRU eviction
maxmemory 100mb
maxmemory-policy allkeys-lru
# Or for volatile-lru (only evict keys with TTL)
maxmemory-policy volatile-lru
Good for:
- Access patterns where recent items get hit again soon
- Stable popular items
- Working sets that fit comfortably in cache
Struggles with:
- One-time scans (every item touched once, never again)
- Large working sets that overflow memory
- Write-heavy loads with big objects
LFU (Least Frequently Used)
LFU evicts the item with the lowest access count. If an item was accessed 100 times and another only 5 times, LFU keeps the 100-time item.
graph LR
A[Cache Full] --> B[New item arrives]
B --> C{Item exists?}
C -->|No| D[Evict LFU item]
D --> E[Add new item, count=1]
C -->|Yes| F[Increment count]
F --> E
Implementation:
from collections import Counter
import heapq
class LFUCache:
def __init__(self, capacity):
self.capacity = capacity
self.cache = {} # key -> (value, freq, timestamp)
self.freq_heap = [] # (freq, timestamp, key)
self.counter = Counter()
self.timestamp = 0
def get(self, key):
if key not in self.cache:
return None
value, freq, ts = self.cache[key]
self.counter[key] += 1
self.cache[key] = (value, freq + 1, self.timestamp)
heapq.heappush(self.freq_heap,
(freq + 1, self.timestamp, key))
self.timestamp += 1
return value
def put(self, key, value):
if self.capacity == 0:
return
if key in self.cache:
self.cache[key] = (value, self.counter[key] + 1, self.timestamp)
else:
if len(self.cache) >= self.capacity:
# Evict least frequently used
while self.freq_heap:
freq, ts, k = heapq.heappop(self.freq_heap)
if k in self.cache and self.cache[k][1] == freq:
del self.cache[k]
del self.counter[k]
break
self.cache[key] = (value, 1, self.timestamp)
self.counter[key] = 1
heapq.heappush(self.freq_heap,
(self.cache[key][1], self.timestamp, key))
self.timestamp += 1
Redis LFU Implementation:
# Enable LFU
maxmemory-policy allkeys-lfu
# Or volatile-lfu (only keys with TTL)
maxmemory-policy volatile-lfu
# Configure LFU decay time (how often freq counters decrease)
lfu-decay-time 1
Good for:
- Items with genuinely different popularity levels
- Stable access counts over time
- Rewarding consistently popular items
Struggles with:
- New items that need time to build up access count
- Old items that stay even after they stop being popular
- Frequency counter overflow if not tuned
FIFO (First In, First Out)
FIFO is the simplest policy: whatever entered the cache first leaves first. No access tracking, no frequency counting. Just a queue.
graph LR
A[Cache Full] --> B[New item arrives]
B --> C[Evict oldest item]
C --> D[Add new item]
Implementation:
from collections import deque
class FIFOCache:
def __init__(self, capacity):
self.capacity = capacity
self.cache = {}
self.queue = deque()
def put(self, key, value):
if key in self.cache:
return
if len(self.cache) >= self.capacity:
oldest = self.queue.popleft()
if oldest in self.cache:
del self.cache[oldest]
self.cache[key] = value
self.queue.append(key)
def get(self, key):
return self.cache.get(key)
Good for:
- Uniform access patterns
- Situations where simplicity beats optimal hit rate
- Debugging (predictable behavior)
Struggles with:
- Most real-world access patterns
- Frequently accessed items getting evicted if they happened to arrive first
- Performance compared to LRU
TTL (Time To Live)
TTL isn’t really an eviction policy in the traditional sense. Instead, items automatically expire after a set time. The cache doesn’t need to decide what to evict; expired items are either ignored on access or cleaned up by a background process.
graph LR
A[Item added] --> B[Timer starts]
B --> C{TTL expired?}
C -->|Yes| D[Item invalid]
C -->|No| E[Item valid]
Redis TTL:
# Set a 5-minute TTL on a key
SET user:123 "data"
EXPIRE user:123 300
# Or use SETEX for atomic set + expire
SETEX user:123 300 "data"
# Check remaining TTL
TTL user:123
Lazy expiration vs active cleanup:
class TTLCache:
def __init__(self, default_ttl=3600):
self.default_ttl = default_ttl
self.cache = {} # key -> (value, expiry_time)
def get(self, key):
if key not in self.cache:
return None
value, expiry = self.cache[key]
if time.time() > expiry:
del self.cache[key]
return None
return value
def put(self, key, value, ttl=None):
ttl = ttl or self.default_ttl
expiry = time.time() + ttl
self.cache[key] = (value, expiry)
Good for:
- Data that naturally expires (sessions, temporary data)
- Cache consistency without manual invalidation
- Data that might change on the backend
Struggles with:
- Finding the right TTL value (too short = cache churn, too long = stale data)
- Access patterns (ignores them entirely)
- Memory waste from expired items still sitting around
Random Replacement
Random replacement picks a random item to evict when the cache is full. No tracking, no history, no complexity.
graph LR
A[Cache Full] --> B[New item arrives]
B --> C[Pick random item]
C --> D[Evict it]
D --> E[Add new item]
Implementation:
import random
class RandomCache:
def __init__(self, capacity):
self.capacity = capacity
self.cache = {}
def put(self, key, value):
if key in self.cache:
self.cache[key] = value
return
if len(self.cache) >= self.capacity:
# Pick random key to evict
evict_key = random.choice(list(self.cache.keys()))
del self.cache[evict_key]
self.cache[key] = value
def get(self, key):
return self.cache.get(key)
Redis random eviction:
# Evict random keys
maxmemory-policy allkeys-random
# Or only random among keys with TTL
maxmemory-policy volatile-random
Good for:
- Uniform or unpredictable access patterns
- Extreme simplicity requirements
- Baseline comparison for other policies
- Certain distributed caching scenarios
Struggles with:
- Non-uniform access patterns
- Need for predictable performance
- Most real-world use cases
Topic-Specific Deep Dives
Comparing the Policies
| Policy | Complexity | Hit Rate | Memory Overhead | Best For |
|---|---|---|---|---|
| LRU | Medium | Good | Medium | Temporal locality |
| LFU | High | Best | High | Stable popularity |
| FIFO | Low | Poor | Low | Uniform access |
| TTL | Low | Variable | Low | Freshness required |
| Random | Very Low | Variable | None | Uniform access, simplicity |
flowchart TD
A[Access Pattern Analysis] --> B{Is access uniform?}
B -->|Yes| C{Random item acceptable?}
B -->|No| D{Do popular items stay popular?}
D -->|Yes| E[LFU]
D -->|No| F{Is recent data more popular?}
F -->|Yes| G[LRU]
F -->|No| H[Consider TTL or Random]
C -->|Yes| I[Random or FIFO]
C -->|No| J[LRU]
Trade-off Analysis (H1)
| Policy | Hit Rate | Memory Overhead | CPU Cost | Scan Resistance | Tunability | Best For |
|---|---|---|---|---|---|---|
| LRU | Good (temporal locality) | Very Low | Very Low | Low | None | General-purpose, web serving, APIs |
| LFU | Better (frequency reward) | Low (counter per entry) | Medium | Medium | None | Read-heavy, ranking systems, trending content |
| FIFO | Poor | Very Low | Very Low | None | None | Baselines, memory budget constraints |
| Random | Poor | Very Low | Very Low | High | None | Baselines, memory budget constraints |
| TTL | Depends on TTL | None | Low | N/A | TTL value | Time-sensitive data, session expiry |
Key Takeaways:
- Default: LRU for most use cases — simple, effective, low overhead
- Read-heavy/frequency-based: LFU — rewards consistent access patterns
- Memory-constrained: LRU over FIFO/Random — both have poor hit rates
- Variable workloads: Consider ARC or 2Q for adaptive behavior
- Redis: Use
volatile-lruorvolatile-lfuto only evict keys with TTL set
Real-World Considerations
Redis-Specific Notes
Redis 4.0+ introduced LFU and improved LRU. The maxmemory-policy setting controls what happens when memory limit is reached:
# Volatile policies (only evict keys with an expiry)
maxmemory-policy volatile-lru # Remove LRU keys with TTL
maxmemory-policy volatile-lfu # Remove LFU keys with TTL
maxmemory-policy volatile-ttl # Remove keys with shortest TTL
maxmemory-policy volatile-random # Random removal of keys with TTL
# Allkeys policies (evict any key)
maxmemory-policy allkeys-lru # LRU on all keys
maxmemory-policy allkeys-lfu # LFU on all keys
maxmemory-policy allkeys-random # Random on all keys
Approximate LRU in Redis
True LRU requires maintaining access order for all items, which is expensive. Redis samples N keys and picks the best among them:
# Number of samples to check (default 5)
maxmemory-samples 10
Higher values approach true LRU but use more CPU.
Memory vs Speed
There’s a fundamental trade-off in eviction policy design:
- More tracking = better decisions but more CPU/memory overhead
- Less tracking = faster operations but potentially worse hit rates
For most applications, LRU with a small samples count (3-5) strikes a good balance. The hit rate difference between true LRU and 5-sample LRU is usually negligible.
Cache Stampede Prevention
Here’s a fun way to take down your origin: let a popular cache entry expire, then watch as every concurrent request for that entry piles up at your database at the same time. That’s the thundering herd, and it can saturate your database before your cache even gets a chance to help.
graph TD
A[Hot entry expires] --> B[Request 1 misses cache]
A --> C[Request 2 misses cache]
A --> D[Request N misses cache]
B --> E[All hit origin simultaneously]
C --> E
D --> E
E --> F[Origin overload / failure]
Probabilistic Early Expiration (XFetch)
Instead of waiting for TTL to expire exactly, probabilistically extend some entries early to spread the load:
import hashlib
import random
import time
def get_with_probabilistic_early_expiry(cache, key, ttl, delta=0.1):
"""
Implements XFetch: probabilistically refreshes entries
before they actually expire to prevent thundering herds.
"""
value, expiry = cache.get(key)
if value is None:
return None
# If within delta window, probabilistically refresh
time_left = expiry - time.time()
if time_left < ttl * delta:
# Higher time_left means lower probability of refresh
# This spreads the refresh load over the delta window
prob_refresh = 1 - (time_left / (ttl * delta))
if random.random() < prob_refresh:
# This thread will refresh - others may get stale value
return value, "stale", expiry # Return stale, background refresh
return value, "fresh", expiry
def background_refresh(cache, key, compute_fn, ttl):
"""Background refresh worker - non-blocking"""
try:
fresh_value = compute_fn()
cache.set(key, fresh_value, ttl)
except Exception:
pass # Log and continue, don't block requests
Lease / Callback Pattern
Instead of letting all requests hit the origin, assign a “lease” to one requester while others wait:
import threading
class LeaseCache:
def __init__(self):
self.cache = {}
self.leases = {} # key -> holder thread ID
self.lock = threading.Lock()
def get(self, key, compute_fn, ttl=3600):
value = self.cache.get(key)
if value is not None:
return value
with self.lock:
if key in self.leases:
# Another thread is computing - wait or return stale
holder = self.leases[key]
# Wait briefly then return stale or recompute
return self.cache.get(key) # May be None
# This thread gets the lease
self.leases[key] = threading.get_ident()
try:
value = compute_fn()
self.cache[key] = value
return value
finally:
with self.lock:
del self.leases[key]
def get_with_stale(self, key, compute_fn, ttl=3600, max_staleness=60):
"""
Return stale value if available, refresh in background.
Best for non-critical data where slight staleness is acceptable.
"""
entry = self.cache.get(key)
if entry is not None:
value, timestamp = entry
if time.time() - timestamp < max_staleness:
# Serve stale immediately, refresh async
thread = threading.Thread(target=self._refresh, args=(key, compute_fn, ttl))
thread.start()
return value, "stale"
# No stale available - must block for fresh
return self.get(key, compute_fn, ttl), "fresh"
def _refresh(self, key, compute_fn, ttl):
try:
value = compute_fn()
self.cache[key] = (value, time.time())
except Exception:
pass
Redis-based Stampede Prevention
# Use SETNX-based lock for single-flight (only one request recomputes)
SETNX lock:user:123 "{timestamp}"
EXPIRE lock:user:123 10
# If lock acquired: recompute, set cache
# If lock failed: wait, then retry cache read
# Better: Redlock for distributed stampede prevention
# Use probabilistic early expiry with jitter
TTL = base_ttl + random.uniform(0, base_ttl * 0.1)
When to Use Each Approach
| Approach | Best For | Trade-off |
|---|---|---|
| Probabilistic early expiry | Read-heavy, tolerate slight staleness | Some requests get stale data |
| Lease/callback | Strict consistency requirements | First request pays full latency |
| Distributed lock (Redlock) | Multi-node deployments | Added complexity, lock overhead |
| Stale-while-revalidate | Non-critical data, high read throughput | Stale data exposure window |
Cache Warming Strategies
After a restart, a deploy, or an accidental flush, your cache starts empty. Every request becomes a cache miss and hits your origin at once. Cache warming is how you bridge that gap — getting popular data back in before production traffic becomes a problem.
Proactive Warming on Startup
import redis
import json
def warm_cache_from_persistence(redis_client, warm_keys_path):
"""
Load popular keys from a warm_keys.json file on startup.
This file should be generated from production access logs.
"""
with open(warm_keys_path, 'r') as f:
warm_entries = json.load(f)
warmed = 0
for entry in warm_entries:
key = entry['key']
value = entry['value']
ttl = entry.get('ttl', 3600)
# Only warm if key doesn't already exist
if not redis_client.exists(key):
redis_client.setex(key, ttl, value)
warmed += 1
return warmed
# Example warm_keys.json
# [
# {"key": "product:featured:2026", "value": "...", "ttl": 300},
# {"key": "user:session:12345", "value": "...", "ttl": 1800}
# ]
Background Warming with Priority Queue
import heapq
import threading
from collections import deque
class BackgroundWarmingCache:
def __init__(self, redis_client, max_warm_per_second=100):
self.redis = redis_client
self.max_warm_per_second = max_warm_per_second
self.warm_queue = [] # (priority, key, compute_fn)
self.warmed_count = 0
self.lock = threading.Lock()
threading.Thread(target=self._warming_worker, daemon=True).start()
def enqueue_warm(self, key, compute_fn, priority=1):
"""Priority 1 = highest (warm first), higher = lower priority"""
with self.lock:
heapq.heappush(self.warm_queue, (priority, key, compute_fn))
def _warming_worker(self):
while True:
with self.lock:
if not self.warm_queue:
time.sleep(0.1)
continue
priority, key, compute_fn = heapq.heappop(self.warm_queue)
# Only warm if not already cached
if not self.redis.exists(key):
try:
value = compute_fn()
self.redis.set(key, value)
with self.lock:
self.warmed_count += 1
except Exception:
pass # Skip failed warm, log in production
# Rate limiting
time.sleep(1.0 / self.max_warm_per_second)
On-Demand Warming (First-Request Triggered)
def get_with_on_demand_warm(cache, key, compute_fn, ttl=3600):
"""
On cache miss, return a synthetic stale value immediately
if one is available, then trigger background warm.
"""
stale = cache.get_stale(key)
if stale is not None:
# Return stale immediately, warm in background
threading.Thread(target=cache.set, args=(key, compute_fn(), ttl)).start()
return stale, "warm_stale"
# No stale available - must compute
value = compute_fn()
cache.set(key, value, ttl)
return value, "fresh"
# Usage in endpoint
@app.route('/api/product/<id>')
def get_product(id):
value, source = get_with_on_demand_warm(
cache=redis,
key=f"product:{id}",
compute_fn=lambda: db.fetch_product(id)
)
return {"data": value, "cache_source": source}
Warming from Database Query Logs
import psycopg2
def generate_warm_keys_from_query_logs(db_conn, top_n=1000, hours=24):
"""
Analyze slow query log to identify frequently accessed keys.
Useful for generating warm_keys.json automatically.
"""
cursor = db_conn.cursor()
cursor.execute("""
SELECT cache_key, COUNT(*) as hits
FROM query_log
WHERE timestamp > NOW() - INTERVAL '%s hours'
GROUP BY cache_key
ORDER BY hits DESC
LIMIT %s
""", (hours, top_n))
return [{"key": row[0], "priority": i} for i, row in enumerate(cursor.fetchall())]
Cache Warming Decision Matrix
| Strategy | Latency Impact | Complexity | Best Scenario |
|---|---|---|---|
| Proactive (file load) | None (background) | Low | Known popular keys, restart scenarios |
| Priority queue | Minimal (rate-limited) | Medium | Large warm sets, production traffic |
| On-demand (stale-first) | None (stale served) | Medium | Real-time, any key access pattern |
| Query log analysis | None (offline analysis) | High | Continuous optimization, A/B testing |
Multi-Level Cache Hierarchy
Real systems don’t use a single cache. They stack caches at multiple levels, each with different capacity, latency, and eviction characteristics. Understanding these hierarchies is critical for designing systems that scale.
graph TD
subgraph "L1 - CPU Core (ns)"
A[L1 Cache<br/>32-64KB<br/>~1ns]
end
subgraph "L2 - CPU Core (ns)"
B[L2 Cache<br/>256KB-1MB<br/>~4ns]
end
subgraph "L3 - Shared (ns)"
C[L3 Cache<br/>8-64MB<br/>~10-20ns]
end
subgraph "Memory (μs)"
D[RAM<br/>GB scale<br/>~100ns]
end
subgraph "SSD (μs)"
E[NVMe SSD<br/>~100μs]
end
subgraph "Disk (ms)"
F[HDD/Cloud Storage<br/>~10ms]
end
A --> B --> C --> D --> E --> F
Inclusive vs Exclusive Caches
Inclusive cache: Lower level contains all entries from upper level. Simple, wastes some space.
Exclusive cache: Levels don’t duplicate. More capacity per footprint, more complex eviction coordination.
graph LR
subgraph "Inclusive Cache Hierarchy"
I1[L1] --> I2[L2]
I2 --> I3[L3]
end
subgraph "Exclusive Cache Hierarchy"
E1[L1] --> E2[L2]
E2 --> E3[L3]
end
| Property | Inclusive | Exclusive |
|---|---|---|
| Capacity efficiency | Lower (duplication) | Higher (no duplication) |
| Eviction complexity | Lower | Higher (must check all levels) |
| Hit validation | Simple (check L1, done) | Must check all levels |
| Used by | Most modern CPUs | Some HPC architectures |
| Cache coherency | Simpler | More complex |
Write Policy in Multi-Level Caches
graph TD
A[CPU Write] --> B{L1 Hit?}
B -->|Yes| C[Write to L1]
B -->|No| D{L2 Hit?}
D -->|Yes| E[Write to L2, fetch from L1]
D -->|No| F{L3 Hit?}
F -->|Yes| G[Write to L3, propagate up]
F -->|No| H[Write to Memory]
H --> I[Invalidate caches]
Write-through: Every write updates all levels immediately. Simple coherency, higher write traffic.
Write-back: Write only to highest level, propagate to lower levels on eviction. Lower write traffic, complex coherency.
Cache Line Eviction Coordination
When L1 evicts an entry that exists in L2/L3, the entry moves down rather than being discarded:
class MultiLevelCache:
def __init__(self, l1, l2, l3):
self.l1 = l1 # Small, fast
self.l2 = l2 # Medium
self.l3 = l3 # Large, slow
def evict_from_l1(self, key, value):
"""L1 eviction -> promoted to L2"""
if self.l2.has_capacity():
self.l2.set(key, value)
else:
evicted_l2_key, evicted_l2_val = self.l2.evict_lru()
self.l3.set(evicted_l2_key, evicted_l2_val) # L2 evicted -> L3
self.l2.set(key, value)
def get(self, key):
if value := self.l1.get(key):
return value, "L1"
if value := self.l2.get(key):
self.l1.set(key, value) # Promote to L1
return value, "L2"
if value := self.l3.get(key):
self.l2.set(key, value) # Promote to L2
self.l1.set(key, value) # Also promote to L1
return value, "L3"
return None, "miss"
Software-Level Cache Hierarchies (Redis tiering)
# Redis Tiered Cache (Redis 7.0+ with DRAM + NVMe)
# Configure Redis with memory + storage engine
activerehashing yes
# Use as tiered: hot data in memory, warm data on disk
# For application-level tiering
tiered-cache:
l1: # In-memory Redis
maxmemory: 64gb
policy: allkeys-lru
l2: # SSD-backed Redis
maxmemory: 512gb
storage-engine: devours
eviction-policy: allkeys-lru
Hit Rate in Multi-Level Cache
def calculate_multilevel_hit_rate(l1_hits, l2_hits, l3_hits, total):
"""
Calculate effective hit rate across cache hierarchy.
Total effective hit = L1 + L2*(1-L1_rate) + L3*(1-L1_rate)*(1-L2_rate)
"""
l1_rate = l1_hits / total
l2_rate = l2_hits / total
l3_rate = l3_hits / total
effective_hit = l1_rate + l2_rate * (1 - l1_rate) + l3_rate * (1 - l1_rate) * (1 - l2_rate)
return effective_hit
# Example:
# L1 hit rate: 80%, L2 hit rate: 15%, L3 hit rate: 4%
# effective_hit = 0.80 + 0.15*(0.20) + 0.04*(0.20)*(0.85)
# effective_hit = 0.80 + 0.03 + 0.0068 = 0.8368 = 83.7%
Write Policy Deep Dive
Caching isn’t only about reads. Write patterns determine cache freshness and storage efficiency. The three primary write policies serve different consistency and performance needs.
graph LR
subgraph Write-Through
A[Write] --> B[Update Cache]
B --> C[Update Storage]
C --> D[Return]
end
subgraph Write-Back
A2[Write] --> B2[Update Cache]
B2 --> D2[Return fast]
D2 -.-> C2[Update Storage later]
end
subgraph Write-Around
A3[Write] --> C3[Update Storage]
C3 --> D3[Return]
D3 -.-> B3[Cache may get entry]
end
Write-Through
Every write updates both cache and storage. Simple consistency — cache always reflects stored data.
def write_through(cache, db, key, value):
"""
Write-through: synchronous write to both cache and DB.
Strong consistency, higher write latency.
"""
cache.set(key, value) # Update cache first (or after, configurable)
db.update(key, value) # Synchronous DB write
return value # Only return after both succeed
# Use when:
# - Data must not be lost (financial transactions)
# - Cache and DB must always be consistent
# - Write amplification is acceptable
Trade-off table:
| Factor | Write-Through | Write-Back | Write-Around |
|---|---|---|---|
| Write latency | High (sync storage) | Low (cache only) | Medium (storage only) |
| Consistency | Strong | Weak (power loss risk) | Storage is truth |
| Storage hits | Every write | On eviction | Only on read |
| Data durability | Immediate | On eviction | Immediate |
| Complexity | Low | High (eviction logic) | Medium |
Write-Back (Write-Behind)
Write to cache only, defer storage writes until eviction or checkpoint. Very fast for writes, risky on crash.
class WriteBackCache:
def __init__(self, cache, db, flush_interval=30):
self.cache = cache
self.db = db
self.dirty = {} # key -> value (written but not persisted)
self.flush_interval = flush_interval
# Background flush thread
threading.Thread(target=self._periodic_flush, daemon=True).start()
def write(self, key, value):
self.cache.set(key, value)
self.dirty[key] = value # Mark as dirty
def _periodic_flush(self):
while True:
time.sleep(self.flush_interval)
self.flush_all()
def flush_all(self):
for key, value in self.dirty.items():
self.db.update(key, value)
self.dirty.clear()
def __del__(self):
self.flush_all() # Ensure dirty writes on shutdown
Write-Around
Write goes directly to storage, bypassing cache. Cache stays uncluttered with one-time-write data.
def write_around(cache, db, key, value):
"""
Write-around: write to storage, cache on next read.
Keeps cache clean of one-time-write data.
"""
db.update(key, value) # Always write to storage
cache.delete(key) # Invalidate any existing cache entry
# Cache will be populated on next read
Use cases:
| Policy | Best For |
|---|---|
| Write-through | Critical data, financial, inventory where loss = loss |
| Write-back | High-write-throughput (logs, metrics), acceptable loss |
| Write-around | One-time writes (logs, events), read-heavy workloads |
Redis Write Patterns
# Write-through in Redis (synchronous SET + DB write)
SET user:123 "data"
HSET users:123 name "Alice" email "alice@example.com"
# For true write-through, wrap in MULTI/EXEC transaction
MULTI
HSET user:123 name "Alice"
LPUSH user:123:actions "login"
EXEC # Atomic, but not durable
# For durability: use Redis RDB/AOF + application-level write-through
# Application writes to Redis, Redis persists asynchronously
# On crash, restore from AOF + reconcile with DB
# Write-behind pattern with Redis
# 1. Write to Redis (fast)
SET session:abc "{data}"
# 2. Background job flushes to DB periodically
# Use Redis Stream or just a cron job
Read-Through vs Write-Through
| Pattern | Description | When to Use |
|---|---|---|
| Read-through | Cache automatically loads from storage on miss | Always (caching library handles) |
| Write-through | Cache and storage updated simultaneously | Strong consistency requirements |
| Write-behind | Cache updated, storage async | High write throughput |
Capacity Estimation
Different eviction policies have different memory overheads for tracking access. This matters when you’re trying to size your cache.
Memory overhead per million keys (approximate Redis):
| Policy | Metadata Overhead per Key | Per 1M Keys |
|---|---|---|
| FIFO | ~8 bytes (queue pointer) | ~8MB |
| Random | 0 bytes (no tracking) | 0MB |
| LRU (approximate, 5 samples) | ~16 bytes (access timestamp) | ~16MB |
| LFU | ~24 bytes (counter + decay timestamp) | ~24MB |
| TTL-only | 0 bytes (Redis native) | 0MB |
Working set size estimation:
working_set_bytes = unique_keys_per_second × avg_key_size × avg_ttl_seconds
cache_hit_probability = min(1, cache_size / working_set_bytes)
For a session cache with 100,000 unique sessions per second, average session size 2KB, average TTL 30 minutes (1800 seconds): working set = 100,000 × 2KB × 1800 = 360GB. A 64GB cache gives hit probability of 64/360 = 17.8% — too low, session cache is not fitting in cache. A 512GB cache gives 512/360 = 142% — working set fits with headroom.
Redis LRU vs LFU memory: LFU needs two values per key: the access counter and a decay timestamp. Counter overflow is a real concern — Redis LFU counters are 8-bit (0-255). After 255 increments, the counter stops growing unless decay kicks in. The lfu-decay-time controls how often counters are halved: lfu-decay-time 10 means every 10 minutes, all counters are multiplied by 0.5. This prevents counter freeze but means occasional access keeps items alive longer than expected.
Memcached slab class sizing: Memcached organizes memory into slab classes of 1MB each, divided into fixed-size chunks. For LRU with eviction, choose chunk size matching your average value size. If your values average 200 bytes but your slab class uses 128-byte chunks, you waste memory. If you use 1MB chunks for 200-byte values, you waste even more. The slabs reassign command can rebalance slab classes in production.
When to Use / When Not to Use
| Policy | When to Use | When Not to Use |
|---|---|---|
| LRU | Access has temporal locality (recent items accessed again); stable working set that fits in memory | One-time scans that touch every item; large working sets that exceed memory |
| LFU | Items have stable popularity over time; you want to reward consistently hot items | Rapidly changing access patterns; new items need time to build frequency |
| FIFO | Debugging or testing; uniform access where simplicity is paramount | Production systems with real access patterns; any case where hit rate matters |
| TTL | Data that naturally expires (sessions, temporary tokens); freshness-critical data | Static reference data; data without natural expiration |
| Random | Uniform access patterns; extreme simplicity requirements; baseline comparison | Non-uniform access patterns; any case where specific items must be retained |
Decision Guide
graph TD
A[Access Pattern] --> B{Is access temporal?}
B -->|Yes| C{Working set fits memory?}
C -->|Yes| K[LRU]
C -->|No| F[LRU with smaller cache or LFU]
K --> D{Is frequency stable?}
F --> D
D -->|Yes| G[LFU]
D -->|No| H{Need freshness?}
H -->|Yes| I[TTL]
H -->|No| J[Random or FIFO]
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| LRU with full memory | Old items hog cache, new popular items evicted immediately | Monitor eviction rates; use volatile-lru to only evict items with TTL |
| LFU frequency counter overflow | Items stop being properly ranked, hit rate drops | Tune lfu-decay-time; use maxmemory-samples for better approximation |
| TTL items not expiring | Memory fills with dead entries; cache becomes ineffective | Enable active expiration (lazyfree_lazy_expire in Redis); monitor expires stat |
| Random evicts hot items | Users experience random slow requests; inconsistent performance | Switch to LRU or LFU; track what’s actually being evicted |
| Cache restart with no persistence | All items lost, cold start causes database stampede | Implement cache warming; use Redis RDB persistence or replica for warm restart |
Common Pitfalls / Anti-Patterns
1. Assuming LRU Means “Oldest Recently Used”
LRU evicts the least recently used - the item whose last access was longest ago. Not the item with the oldest timestamp in absolute terms.
# WRONG understanding: LRU keeps oldest items
# The item added 1 year ago but accessed 1 minute ago is NOT
# the LRU item. The item added 1 second ago but not accessed
# since is the LRU item.
# RIGHT understanding:
# LRU = Least Recently Used = longest time since last access
2. Mixing Volatile and Non-Volatile Keys
Using both volatile-lru and non-volatile (no TTL) keys can cause memory issues.
# PROBLEM: Cache fills with non-volatile keys
# volatile-lru can only evict keys WITH TTL
# If keys without TTL fill memory, volatile-lru has nothing to evict
# SOLUTION: Either use only volatile keys with TTL, or use allkeys-lru
# to evict from entire cache
maxmemory-policy allkeys-lru # Evicts anything
# OR
maxmemory-policy volatile-lru # Only evicts keys with TTL
# Never mix strategies - pick one approach
3. Setting TTL Too Long for Volatile Policies
If using volatile-lru or volatile-lfu, keys without TTL never get evicted and can fill memory.
# PROBLEM: Some keys have no TTL, some have very long TTL
redis.setex("user:123", 86400, data) # 24 hour TTL
redis.set("config", json.dumps(config)) # NO TTL - never expires
# After cache fills with NO-TTL keys, volatile-lru can't free space
# because it only touches keys with TTL
# SOLUTION: Always use SETEX/SET with EX option; never use SET without TTL
redis.setex("config", 86400, json.dumps(config)) # Always with TTL
4. Ignoring Eviction Rate in Production
Low eviction count in dev, high eviction count in production means memory miscalculation.
# Development: Small dataset, everything fits
# Production: Large dataset, cache too small
# Monitor in production:
# redis-cli INFO stats | grep evicted_keys
# If evicted_keys > 0 frequently, cache is too small or policy wrong
5. Not Accounting for LRU Approximation in Redis
Redis doesn’t do true LRU. It samples keys. Small samples = inaccurate LRU.
# Default 5 samples - may miss true LRU candidate
maxmemory-samples 5
# For more accurate LRU (more CPU, better decisions)
maxmemory-samples 10
# For very large caches with non-uniform access
maxmemory-samples 16
Quick Recap
Key Bullets
- LRU is the default choice for most use cases because temporal locality is common
- LFU rewards consistency - items accessed frequently stay cached longer
- TTL is not an eviction policy per se - it’s a freshness mechanism
- FIFO and Random are baselines, not production choices for hit-rate-critical workloads
- Redis samples ~5 keys for LRU approximation (configurable with
maxmemory-samples) volatile-*policies only evict keys that have a TTL set
Copy/Paste Checklist
# Redis eviction policy configuration
# For general caching (evict any key):
maxmemory 100mb
maxmemory-policy allkeys-lru
# For cache with some persistent keys:
maxmemory 100mb
maxmemory-policy allkeys-lru # OR allkeys-lfu
# For cache where items MUST have TTL:
maxmemory 100mb
maxmemory-policy volatile-lru # Only evict keys WITH TTL
maxmemory-policy volatile-lfu
# For pure TTL-based expiration (no eviction):
maxmemory-policy volatile-ttl # Evicts keys closest to expiration
maxmemory 100mb
# For debugging or very specific cases:
maxmemory-policy allkeys-random
maxmemory-policy volatile-random
# Monitoring commands
INFO stats | grep -E "keyspace|evicted|expired"
OBJECT freq user:123 # Check access frequency of a key
DEBUG OBJECT LRU user:123 # Check LRU position of a key
Security Checklist
- Configure memory limits - Prevent cache from consuming all system memory
- Disable dangerous commands -
FLUSHDB,FLUSHALLcan wipe entire cache - Use ACLs - Restrict which commands each user/role can execute
- Bind to internal interfaces - Never expose Redis/Memcached to public internet
- Monitor for abnormal eviction patterns - Could indicate attack or misconfiguration
- Validate key inputs - Prevent injection of extremely long or special character keys
- Set up replication authentication - Replicas should require auth to sync from primary
# Redis security configuration
rename-command FLUSHDB ""
rename-command FLUSHALL ""
rename-command DEBUG ""
maxmemory 100mb
maxmemory-policy allkeys-lru
Observability Checklist
Metrics to Track
evicted_keys- Number of items evicted per second; indicates memory pressureexpired_keys- Number of items that expired via TTLstat_keyspace_hits/misses- Cache hit ratio calculationused_memory- Current memory usagemaxmemory- Configured memory limit
# Redis INFO stats to monitor
INFO stats | grep -E "keyspace|evicted|expired"
# Keyspace hits:123456789
# Keyspace misses:12345678
# Expiredkeys:1234567
# Evictedkeys:12345
Logs to Capture
import structlog
logger = structlog.get_logger()
# Monitor eviction events
def check_eviction_risk(redis_client, threshold=0.7):
info = redis_client.info('memory')
used = info['used_memory']
maxmem = info['maxmemory']
if maxmem > 0:
usage_ratio = used / maxmem
if usage_ratio > threshold:
logger.warning("cache_memory_pressure",
usage_percent=f"{usage_ratio * 100:.1f}%",
evicted_recently=info.get('evicted_keys', 0))
return usage_ratio
Alert Rules
- alert: HighEvictionRate
expr: rate(redis_evicted_keys_total[5m]) > 100
for: 5m
labels:
severity: warning
annotations:
summary: "High cache eviction rate - memory pressure"
- alert: LowHitRate
expr: redis_keyspace_hits_total / (redis_keyspace_hits_total + redis_keyspace_misses_total) < 0.7
for: 10m
labels:
severity: critical
annotations:
summary: "Cache hit rate below 70%"
Real-World Case Studies
Real-World Case Study: Netflix
Netflix streams video to 250+ million subscribers. Their caching strategy is among the most sophisticated in the industry.
Netflix uses a tiered caching approach. The bottom tier is origin storage (Amazon S3). Above that, regional caches handle requests for entire geographic regions. At the top, edge caches serve individual streams. The eviction policy at each tier differs based on content lifecycle.
For the “continue watching” feature — which shows a user’s in-progress videos — Netflix uses LFU eviction. The logic: videos that 10,000 people are actively watching should stay cached over videos that 10 people are watching. LFU keeps the most-requested content. The counter decay is tuned aggressively (lfu-decay-time 1) because a video’s popularity can drop dramatically after it leaves the “trending” window.
For the “top 10” recommendations list, they use LRU. The list changes daily, and yesterday’s list becomes irrelevant quickly. LRU naturally evicts yesterday’s recommendations as today’s take over.
The lesson: one eviction policy does not fit all use cases. Netflix runs multiple cache clusters for different features, each tuned to its specific access pattern. Auditing which content lives in which cache tier and whether the policy matches the content’s access pattern is ongoing operational work.
Interview Questions
The working set does not fit in the available cache. With LRU, if your most frequently accessed data exceeds available memory, those hot items get evicted by newer items that are accessed once and never again (the "one-time access" problem). Diagnose with redis-cli --latency-histogram to see response time distribution, and OBJECT freq to check access frequency of keys. The fix: either increase cache memory, reduce working set size (trim keys or reduce TTL), or move to LFU which rewards consistently popular items over one-time accesses.
With 1M unique endpoints and 100K capacity, 90% of cache entries will never be revisited (assuming uniform distribution). LRU would constantly evict to make room for new entries that arrive once and are never requested again. Better approaches: use API response fingerprinting to identify which endpoints are actually popular (cache only top N), or use LFU to identify truly popular endpoints. TTL alone is not sufficient — without LRU/LFU, the popular endpoints still get evicted when one-time-access items fill the cache. The practical answer is to investigate actual access distribution first: if 20% of endpoints account for 80% of requests, filter your cache to those first.
LFU counters decay via lfu-decay-time. Every N minutes (default 1), Redis multiplies all LFU counters by 0.5, effectively halving them. A key accessed once (counter = 1) becomes 0.5 → 0 → never considered "least frequently used" for eviction purposes. An item accessed 100 times (counter = 100) becomes 50, then 25, then 12.5. This means LFU with decay correctly ages out old popularity, but if decay is too aggressive, genuinely popular items get evicted because their counters dropped too far. The counter is 8-bit (0-255 max), so items with very high counts can saturate. If decay time is too long, counters never shrink and old popularity data poisons the cache. Tuning lfu-decay-time requires understanding your content lifecycle — video streaming with week-long popularity windows needs different decay than news with hour-long windows.
volatile-lru but some keys have no TTL. The cache fills up and evictions stop working. Why?volatile-lru only considers keys with a TTL for eviction. If keys without TTL (no expiration) fill 80% of your cache, volatile-lru can only evict from the 20% that have TTL. When the volatile subset fills, volatile-lru runs out of candidates and maxmemory cannot be respected. The solution: either use allkeys-lru to evict from the entire cache regardless of TTL, or ensure ALL keys in the cache have a TTL set. Mixing persistent keys (no TTL) with volatile keys in the same Redis instance when using volatile-lru is an anti-pattern — those persistent keys will eventually fill memory and cause eviction failure.
The pattern of recovery suggests a cold cache problem triggered by the deploy. Common causes: (1) Application restarted and cache was flushed or inaccessible, (2) New deployment changed cache key format making old keys incompatible, (3) Traffic shift exposed a larger working set that wasn't warmed. Diagnosis steps: check `redis-cli INFO stats` for `keyspace_hits/misses` ratio, compare `evicted_keys` count before/after deploy, review application logs for cache connection errors, and verify key naming scheme didn't change. The fix: implement cache warming from a warm key file on startup, use rolling deploys that keep some instances serving with warm caches while others restart, or use blue-green cache warmup where new instances warm up before receiving traffic.
Use a sliding window rate limiter with Redis MULT/EXEC across cluster nodes. Each request increments a counter key scoped to the user+window (e.g., `ratelimit:user123:window_1700000000`) using `INCR` with `EXPIRE` to auto-clean windows. Since Redis Cluster partitions by key hash, the rate limit key must be single-key for atomicity — use a hash slot calculator or proxy to route all rate limit operations through a single node, or use Redisson's RRateLimiter which handles distributed coordination. For 10K RPS, a single Redis node can handle ~50K ops/sec with pipelining, so co-locating rate limiting on one node is acceptable if you partition by tenant (e.g., `ratelimit:tenant_a:window`). Alternative: use a dedicated rate-limit Redis node outside the cluster to avoid impacting primary workloads.
Cache-aside (lazy loading): Application checks cache first, on miss reads from DB and populates cache. Write goes to DB first, then invalidates cache. Pros: only popular data fills cache, no wasted writes. Cons: first request is always slow (cache miss), potential stale data on write failures. Best for: read-heavy workloads where most data is never accessed.
Read-through: Cache automatically loads data from storage on miss — application only talks to cache. Pros: simpler application code, cache handles loading transparently. Cons: cache must understand storage format, more complexity in cache layer. Best for: when you control the cache implementation and want transparent caching.
Write-through: Every write goes to cache AND storage simultaneously. Pros: cache always consistent with storage, no data loss on crash. Cons: higher write latency (synchronous), cache can be flooded with one-time writes. Best for: write-heavy workloads requiring strong consistency (financial data, inventory).
Three diagnostic approaches: First, use `redis-cli --latency-history` to capture latency percentiles over time — if p99 exceeds 10ms for non-BLPOP commands, cache is the culprit. Second, check `redis-cli INFO latency_stats` which records command-level latencies — look for commands with high `latency_us` values. Third, run `redis-cli --intrinsic-latency 100` to measure the baseline hardware latency (should be under 1ms). If intrinsic latency is fine but commands are slow, it's contention or big keys. Also monitor `instantaneous_ops_per_sec` — if ops drop during your spike windows while CPU is high, you have command queueing. Check `MEMORY USAGE` for key size bloat that could cause slow serialization. For production, instrument your application with cache-level tracing: log cache hit/miss/latency for every request and correlate slow requests with cache misses.
Raw calculation: 10M sessions × 500 bytes = 5GB. Your 512MB cache can only hold ~1M sessions. At 10% cache coverage, hit rate would be ~10% assuming uniform access — which is likely the realistic range for LRU. But there are critical factors: (1) Key overhead — each Redis key name adds ~40-60 bytes, so 10M keys × 60 bytes = 600MB just for key names, leaving essentially 0 bytes for values. (2) Redis memory allocation is ~1.5x the raw data size due to overhead, dict entries, and sdsalloc. (3) If access is Zipfian (20% of users = 80% of traffic), caching the top 2M popular sessions would give you 80% hit rate on 20% of traffic. Solution: use session TTL to bound working set, or add more memory (16GB+ for 10M sessions at 500 bytes each with overhead). Always use `redis-cli --bigkeys` to identify oversized keys before production.
When `maxmemory` is reached, Redis triggers the eviction sequence: (1) freeMemoryIfNeeded() is called before any new write operation. (2) If used_memory > maxmemory, eviction begins. (3) Redis samples maxmemory-samples keys (default 5) using OBJECT idleTime(key) to get LRU/LFU idle. (4) It selects the best candidate (highest idle time for LRU, lowest frequency for LFU). (5) If volatile-* policy, only keys with TTL are sampled. (6) The eviction candidate is deleted with dbDelete(). (7) If memory is still over limit, repeat up to maxmemory-eviction-tenacity attempts (default 10). (8) If all attempts fail, Redis returns an out-of-memory error on the write command. The evicted_keys metric in INFO stats tracks total evictions. Important: LRU in Redis is approximate — it doesn't scan all keys, just samples. With small maxmemory-samples, you may evict a recently-used key if it happened to be in the sampled set.
Cache stampede occurs when a popular cache entry expires simultaneously, causing all concurrent requests to miss and hit the origin at the same time. This can saturate or crash the backing database. Primary prevention strategies include: (1) Probabilistic early expiration (XFetch) — extend some entries before actual TTL to spread the refresh load over a time window. (2) Lease/callback pattern — assign a lease to the first requester; others wait or receive stale data. (3) Distributed locking (Redlock) — only one requester recomputes, others wait. (4) Stale-while-revalidate — serve stale data immediately while refreshing in background. The best choice depends on consistency requirements: strict consistency needs locks, high-throughput systems with tolerable staleness benefit from probabilistic approaches. In Redis, use SETNX-based locks or implement probabilistic TTL jitter.
ARC is a self-tuning cache algorithm introduced by IBM that combines LRU and LFU by maintaining four lists: T1 (recent LRU), T2 (frequent LFU), B1 (evicted from T1), and B2 (evicted from T2). The key innovation is that ARC dynamically adjusts the size of T1 vs T2 based on workload. If a miss occurs on a recently seen item (in B1), ARC concludes LRU is too aggressive and grows T1. If a miss occurs on a frequently seen item (in B2), ARC grows T2. This feedback loop adapts to workload changes without manual tuning. ARC achieves better hit rates than pure LRU or LFU in workloads with varying locality patterns. Redis does not natively support ARC, but it can be implemented at the application level or used in specialized caches like `cachegrand`.
Cold start after deploy is a classic cache warming problem. Strategies: (1) Proactive warming from persistence — on startup, load a pre-generated list of warm keys (top N from production access logs) from JSON file. Generate this list offline from access logs. (2) Priority queue background warming — use a heap to warm keys in priority order (highest priority = most popular first), rate-limited to avoid overwhelming the origin during warmup. (3) On-demand stale-first warming — serve stale data immediately on cache miss while triggering background recomputation; handles any key access pattern. (4) Rolling deploys — restart instances in batches so some instances maintain warm caches while others restart. (5) Blue-green cache warmup — new instances warm up fully before receiving production traffic. For Redis, save top keys with DUMP and restore on startup. Monitor `evicted_keys` and `keyspace_hits/misses` to validate warmup effectiveness.
Write-through: synchronously writes to both cache and storage. Strong consistency, higher write latency (waits for storage), but cache always reflects stored data. Best for: financial transactions, inventory systems where data loss is unacceptable.
Write-back (write-behind): writes to cache only, defers storage writes until eviction or checkpoint. Very fast writes, risky on crash (data loss if power loss), complex coherency. Best for: high-write-throughput workloads like logs, metrics, where acceptable data loss is defined in the SLA.
Write-around: writes directly to storage, bypassing cache. Keeps cache clean of one-time-write data. Best for: write-heavy workloads with read-heavy access patterns (e.g., logging events that are rarely reread). Cache gets populated only on subsequent reads.
Choice depends on: (1) Write-to-read ratio — write-heavy = write-behind, read-heavy = write-through. (2) Consistency requirements — financial = write-through. (3) Data durability needs — write-back needs battery-backed cache or persistency to avoid loss.
Modern CPUs use a tiered cache hierarchy: L1 (32-64KB, ~1ns), L2 (256KB-1MB, ~4ns), L3 (8-64MB, ~10-20ns), all much faster than RAM (~100ns) and storage (~100μs to ms). Multi-level caching exploits spatial and temporal locality.
Inclusive cache: Lower level contains all entries from upper level. Simple coherency, wastes some capacity due to duplication. Used by most modern CPUs.
Exclusive cache: Levels don't duplicate entries. Higher capacity per footprint, more complex eviction coordination (must check all levels on miss). Used in some HPC architectures.
For software caching: inclusive = simpler validation (check L1, if miss check L2, you're done). Exclusive = more capacity but must check all levels on miss. In multi-level software caches (Redis tiering), evicted entries from L1 should be promoted to L2 before discarding — this coordination prevents thrashing when hot entries accidentally get evicted from all levels.
Memcached divides memory into slab classes (default 1MB each), each containing fixed-size chunks. Slab class 1 might have 64-byte chunks, class 2 with 128-byte chunks, up to class 42 with 1MB chunks. When storing an item, Memcached finds the smallest slab class that fits the value size.
Memory waste occurs in two ways: (1) Internal fragmentation: if your average value is 200 bytes but slab class uses 128-byte chunks, items don't fit and Memcached uses the next class up (256 bytes), wasting 56 bytes per item. At scale, this adds up. (2) External fragmentation: total free memory exists but in wrong slab classes — slab 10 (512 bytes) is full while slab 15 (10KB) is mostly empty.
The `slabs reassign` command can rebalance slab classes in production but requires careful planning. For capacity planning, profile your actual value size distribution and configure `max_chunk_size` and growth factor accordingly. Redis uses a similar approach with its memory allocator (jemalloc) but abstracts this less visibly.
Cache-aside (lazy loading): Application checks cache first, on miss reads from DB and populates cache. Write goes to DB first, then invalidates cache. Pros: only popular data fills cache, no wasted writes on one-time data. Cons: first request always slow (cold miss), potential stale data if write fails before cache invalidation. Best for: read-heavy workloads where most data is never accessed.
Read-through: Cache automatically loads data from storage on miss — application only talks to cache, cache handles loading transparently. Pros: simpler application code. Cons: cache must understand storage format, adds complexity in cache layer. Best for: when you control both cache and storage and want transparent caching.
Write-through: Every write updates cache AND storage simultaneously. Pros: cache always consistent with storage, no data loss on crash. Cons: higher write latency (synchronous), cache fills with one-time writes. Best for: write-heavy workloads requiring strong consistency (financial, inventory).
Cache-aside is the most common pattern in web applications (e.g., Django cache framework, Spring Cache). Read-through is common in content delivery networks and OS page caches.
Redis LFU counters are 8-bit, ranging 0-255. When an item is accessed very frequently (e.g., a top-100 hot key with millions of hits), its counter saturates at 255 and stops growing. Meanwhile, a key with counter 254 will be evicted over one with 255, even if the 254 key deserves to stay more. This breaks LFU's ranking accuracy.
`lfu-decay-time` controls how often counters are halved. Default is 1 minute. Setting it too short (e.g., 1) means occasional accesses keep items alive too long (a key accessed once stays at 0.5 → 0.25 → ... and gets re-accessed before hitting 0). Setting it too long (e.g., 30) means genuinely popular items lose their ranking too quickly after a lull.
For content with week-long popularity windows (e.g., video streaming, news), use decay times in the 5-15 minute range. For rapidly changing trending content (social media, live events), use aggressive decay (1-3 minutes). Monitor `OBJECT freq` on known hot keys to verify counters are decaying as expected. Consider using LRU for workloads where item popularity changes rapidly rather than relying on frequency counting.
Working set calculation: 5,000 unique tokens/sec × 100 bytes × 900 seconds (15 min TTL) = 450MB working set. A 1GB cache should theoretically hold this comfortably.
But with LRU, hit rate depends on access distribution. If access is uniform: probability of a request being for a token in cache = cache_size / working_set = 1024MB / 450MB > 1 = 100% (everything fits). However, in practice: (1) LRU with maxmemory-samples 5 may evict slightly-recently-used tokens. (2) Key name overhead (~50 bytes per key in Redis) means 5M tokens × 50 bytes = 250MB just for key names. (3) With 100-byte values, effective capacity is ~750MB for values, leaving some headroom.
Expected hit rate with LRU: if access is uniform and working set fits, ~95%+ hit rate. If access is Zipfian (some tokens hit much more frequently), LRU will naturally keep popular tokens. If using LFU, tokens accessed frequently will be rewarded and maintain high hit rates. With TTL of 15 minutes and 5,000 TPS, token churn is manageable.
For distributed rate limiting on Redis Cluster with 50K RPS: (1) Sliding window counter: each request increments a key `ratelimit:{tenant}:{window_timestamp}` using INCR with EXPIRE on the window. Cluster partitions by key hash, so all operations for a tenant must hit the same node — use hashtag `{tenant}` in the key.
(2) Token bucket with Redis MULTI/EXEC: atomically decrement tokens and check remaining. Requires single-key access for atomicity.
(3) Architecture: since 50K RPS exceeds single Redis node capacity (~30-50K ops/sec depending on payload), partition by tenant across multiple nodes. Use a proxy layer (e.g., Redis Cluster proxy, or application-level routing) to route rate limit keys for each tenant to the same node.
(4) Alternative: dedicated rate-limit Redis: co-locate rate limiting on a single powerful node (e.g., Redis on a high-CPU instance) separate from your cluster. 50K RPS is well within single-node capacity with pipelining.
Key design: ensure atomicity by keeping all rate-limit operations for a tenant on one node. For very high throughput, consider local in-memory rate limiting with periodic Redis sync for distributed enforcement.
Further Reading
Official Documentation
- Redis Memory Optimization — Deep dive on Redis memory management and eviction policies
- Memcached Wiki — Official memcached documentation including slab allocator internals
- AWS ElastiCache Best Practices — Cloud-native cache deployment guidance
Papers and Technical References
- Mythical PYR Scale: LRU vs LFU — Academic comparison of LRU and LFU in real-world workloads
- ARC: A Self-Tuning, Low Overhead Replacement Cache — IBM paper on adaptive replacement cache combining LRU and LFU
- Redis LRU Approximation — How Redis implements approximate LRU with
maxmemory-samples
Tools and Utilities
- redis-cli —big-Keys — Identify large keys that consume memory
- MEMORY USAGE — Check memory footprint of specific keys
- memkeys — Real-time Memcached key monitoring tool
- cachegrand — High-performance modern cache with native LRU/LFU support
Related Blog Posts
- Caching Strategies — How to use caches effectively with these eviction policies
- Redis & Memcached — Implementing these policies in real caching engines
- Distributed Caching — Scaling eviction across multiple nodes
- System Design Fundamentals — Capacity planning, consistency, and the foundations behind cache design
Books
- Designing Data-Intensive Applications by Martin Kleppmann — Chapter on caching, databases, and trade-offs
- Redis in Action by Josiah Carlson — Practical Redis including advanced eviction patterns
- Building Microservices by Sam Newman — Caching patterns in distributed systems
Conclusion
Pick your eviction policy based on your access patterns. LRU is the default for most use cases because temporal locality is common. LFU rewards consistency. FIFO and random are mostly for specialized cases or as baselines.
The biggest mistake people make is treating eviction policy as a one-time decision. Monitor your hit rate. If it drops, your access patterns might have changed and a different policy might serve you better.
Category
Related Posts
Bloom Filters: Space-Efficient Probabilistic Data Structure
How Bloom filters provide memory-efficient membership testing with configurable false positive rates for caches, databases, and distributed systems.
Cache Patterns: Thundering Herd, Stampede Prevention, and Cache Warming
A comprehensive guide to advanced cache patterns — thundering herd, cache stampede prevention with distributed locking and probabilistic early expiration, and cache warming strategies.
Distributed Caching: Scaling Cache Across Multiple Nodes
A comprehensive guide to distributed caching — consistent hashing, cache sharding, replica consistency, cache clustering, and handling the unique challenges of multi-node cache environments.