Redis vs Memcached: Choosing an In-Memory Data Store
A comprehensive comparison of Redis and Memcached — data structures, persistence, clustering, Lua scripting, pub/sub, and guidance on when to choose each.
Introduction
Redis and Memcached both sit in front of your database and cache frequently accessed data in memory, and developers often treat them as interchangeable. They are not: Redis is a data structure server that also happens to support caching, while Memcached is a caching engine with a simpler data model. That distinction shapes what you can build with them and how you debug them at 2am.
This is not a “both are good” comparison. I will tell you when each one makes sense.
Core Concepts
Memcached stores strings and nothing but strings. You give it a key, you get back a value. That is the whole API.
Redis supports strings, lists, hashes, sets, sorted sets, bitmaps, hyperloglogs, geospatial indexes, and streams. It can act as a cache, a session store, a message broker, a rate limiter, and a real-time analytics engine.
One eviction detail worth knowing early: the volatile-lru and allkeys-lru policies use an approximated (sampled) LRU algorithm, not true LRU. Redis picks a random sample of keys and evicts the least recently used among them, which saves the memory a full LRU list would need. volatile-lru samples only keys with a TTL set, while allkeys-lru samples across all keys.
# Memcached: everything is a string
memcached.set("user:123", json.dumps(user_data))
user_data = json.loads(memcached.get("user:123"))
# Redis: native data structures
redis.hset("user:123", mapping=user_data)
user_data = redis.hgetall("user:123")
Performance depends on what you are doing with them.
Redis Data Structures
Strings
Both handle simple string values. Redis just has more ways to manipulate them.
# Memcached
set key "value"
get key
# Redis
set key "value"
get key
# Redis extras
append key " more" # Append to existing
incr count # Atomic increment
decr count # Atomic decrement
setrange key 0 "re" # Overwrite bytes
getrange key 0 3 # Substring retrieval
Lists
Memcached does not have lists. Redis does.
# Redis lists: ordered, push/pop from either end
redis.lpush("queue:jobs", "job1", "job2", "job3")
redis.rpop("queue:jobs") # Returns "job1" (oldest)
redis.lrange("queue:jobs", 0, -1) # Get all
# Common use: recently viewed items, job queues, activity logs
redis.lpush("user:123:views", product_id)
redis.ltrim("user:123:views", 0, 19) # Keep last 20
Sets and Sorted Sets
Memcached has no sets. Redis has both.
# Redis sets: unique, unordered
redis.sadd("user:123:likes", "product1", "product2", "product3")
redis.smembers("user:123:likes")
redis.sismember("user:123:likes", "product1") # O(1) check
# Redis sorted sets: scored sets for leaderboards, priorities
redis.zadd("leaderboard", {"player1": 100, "player2": 200, "player3": 150})
redis.zrevrange("leaderboard", 0, 9, withscores=True) # Top 10
redis.zrank("leaderboard", "player2") # Get rank
Hashes
Memcached has no hashes. Redis does.
# Redis hashes: objects without serialization overhead
redis.hset("user:123", mapping={"name": "Alice", "email": "alice@example.com"})
redis.hget("user:123", "name") # "Alice"
redis.hgetall("user:123") # All fields
# vs Memcached requiring JSON serialization
memcached.set("user:123", json.dumps({"name": "Alice", "email": "..."}))
Eviction Policies
Both evict data when memory is full, but Redis lets you choose the policy and Memcached does not.
# Memcached eviction
# Memcached always evicts by LRU, tracked per slab class
# (modern versions use a segmented LRU). There is no policy setting.
# -m <MB>: memory limit for item storage
# -M: disable eviction and return an error when memory is full
memcached -m 1024
# Redis maxmemory policies
# allkeys-lru, allkeys-lfu, allkeys-random
# volatile-lru, volatile-lfu, volatile-random, volatile-ttl
# noeviction
maxmemory 100mb
maxmemory-policy allkeys-lru
The two are not equivalent. Redis offers eight policies, including LFU (Least Frequently Used), TTL-based, and random eviction. Memcached is LRU-only; its only switch is -M, which turns eviction off.
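The sampled LRU that Redis uses can be illustrated with a toy in-process cache. This is a minimal sketch, not Redis's implementation: `SampledLRUCache` and its parameters are hypothetical names, and the default `sample_size=5` mirrors the default of Redis's `maxmemory-samples` setting.

```python
import random
import time


class SampledLRUCache:
    """Toy cache that evicts with sampled (approximated) LRU."""

    def __init__(self, capacity, sample_size=5, rng=None, clock=time.monotonic):
        self.capacity = capacity
        self.sample_size = sample_size
        self.rng = rng or random.Random()
        self.clock = clock
        self.values = {}
        self.last_access = {}

    def get(self, key):
        if key not in self.values:
            return None
        self.last_access[key] = self.clock()  # record recency on every read
        return self.values[key]

    def set(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            self._evict_one()
        self.values[key] = value
        self.last_access[key] = self.clock()

    def _evict_one(self):
        # Look at a small random sample instead of maintaining a global LRU list,
        # then evict the least recently used key within that sample.
        count = min(self.sample_size, len(self.values))
        candidates = self.rng.sample(list(self.values), count)
        victim = min(candidates, key=self.last_access.__getitem__)
        del self.values[victim]
        del self.last_access[victim]
        return victim
```

A larger sample gets closer to true LRU at the cost of more work per eviction; when the sample covers every key, the behavior is exact LRU.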
Cache Invalidation Strategies
“Cache aside” (lazy loading) is the most common pattern, but there are several strategies with different trade-offs.
Write-Through Cache
Data is written to both cache and database synchronously. Reads always hit cache.
def set_user(user_id, data):
    # Write to the database first, then the cache, in the same request
    db.users.update(user_id, data)
    redis.set(f"user:{user_id}", json.dumps(data))
    return data

def get_user(user_id):
    # Cache is kept fresh by set_user
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Fallback only on a cache miss (cold start or eviction)
    data = db.users.get(user_id)
    redis.set(f"user:{user_id}", json.dumps(data))
    return data
Pros: Reads see fresh data as long as both writes succeed. Simple read path (always read from cache).
Cons: Write latency is higher (two writes). The two writes are not atomic: if one succeeds and the other fails, cache and database disagree until the key is rewritten or expires, so give cached entries a TTL as a backstop.
Best for: Read-heavy workloads where data must always be current, such as configuration data and reference data.
Write-Behind Cache (Write-Back)
Data is written to cache only. Database is updated asynchronously.
def set_user(user_id, data):
    # Write to cache only: fast
    redis.set(f"user:{user_id}", json.dumps(data))
    # Async write to database via queue
    queue.enqueue("db:users:upsert", {"user_id": user_id, "data": data})
    return data
Pros: Very fast writes. Reduces database load during write spikes.
Cons: Risk of data loss if cache fails before database is updated. Requires additional infrastructure (write queue, retry logic). Cache can be inconsistent across nodes during propagation.
Best for: Write-heavy workloads where occasional data loss is acceptable (metrics, analytics, leaderboards), session data.
Cache-Aside (Lazy Loading)
The application manages cache explicitly — reads populate cache on miss, writes update database and invalidate cache.
def get_user(user_id):
    # Read from cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Cache miss: load from database
    data = db.users.get(user_id)
    # Populate cache for next time
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))
    return data

def set_user(user_id, data):
    # Write to database
    db.users.update(user_id, data)
    # Invalidate cache, do not update it
    redis.delete(f"user:{user_id}")
    return data
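Pros: Simple. Writes invalidate rather than update, so a stale value normally lives only until the next read repopulates it (a read that races with a write can still re-cache the old value, which is one reason entries also carry a TTL). Read-heavy workloads naturally populate cache. No write amplification.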
Cons: Cache miss penalty — first read after invalidation or startup is slow. “Thundering herd” problem on popular keys.
Best for: Read-heavy workloads, most general-purpose caching, when database is source of truth.
TTL and Invalidation Details
TTL Selection Criteria
Choosing TTL is a trade-off between staleness and cache efficiency.
| Data Type | Recommended TTL | Rationale |
|---|---|---|
| User sessions | 24-48 hours | Sessions have natural expiry; 48h covers timezone gaps |
| Configuration data | 5-15 minutes | Changes need to propagate; short enough to recover |
| API responses (public) | 1-5 minutes | Fresh data important; short TTL limits staleness |
| Analytics / aggregates | 15-60 minutes | Tolerates some staleness; longer = better hit rate |
| Product catalog | 1-24 hours | Updates are infrequent; long TTL = better hit rate |
| Leaderboards | 30-300 seconds | Needs near-real-time accuracy; short TTL required |
Quick heuristic: Pick TTL based on how stale your data is allowed to be. If you cannot answer “how stale can this be?”, set it to 60 seconds or less.
Invalidation Patterns
Delete vs Expire:
# Delete: immediate removal
redis.delete("user:123")
# Expire: time-based removal
redis.setex("temp:data", 300, value) # Auto-removes in 5 min
# Use expire for: temporary data, cached computations
# Use delete for: data that changed, explicit updates
Invalidation on update vs refresh on read:
# Option A: Invalidate on write (cache-aside)
def update_user(user_id, data):
    db.users.update(user_id, data)
    redis.delete(f"user:{user_id}") # Next read fetches fresh

# Option B: Refresh on write (write-through variant)
def update_user(user_id, data):
    db.users.update(user_id, data)
    redis.setex(f"user:{user_id}", 3600, json.dumps(data)) # Write new value
# Option A (invalidate) is preferred because:
# - Avoids write amplification (only invalidates, does not rewrite)
# - Simpler: no need to serialize and store on every write
# - Cache stays consistent if write fails (delete does not happen)
Avoiding the Thundering Herd
When a popular cache key expires, many requests simultaneously hit the database.
# BAD: Many requests see cache miss, all hit database
def get_product(product_id):
    cached = redis.get(f"product:{product_id}")
    if not cached:
        data = db.products.get(product_id) # ALL requests hit DB
        redis.setex(f"product:{product_id}", 300, json.dumps(data))
    return json.loads(cached) if cached else data
# GOOD: Single request refreshes, others wait
import time

def get_product_safe(product_id, lock_ttl=10, max_attempts=50):
    cache_key = f"product:{product_id}"
    lock_key = f"lock:product:{product_id}"
    for _ in range(max_attempts):
        cached = redis.get(cache_key)
        if cached:
            return json.loads(cached)
        # Try to acquire lock (nx = only if absent, ex = auto-expire if the holder crashes)
        if redis.set(lock_key, "1", nx=True, ex=lock_ttl):
            try:
                # We got the lock: refresh from database
                data = db.products.get(product_id)
                redis.setex(cache_key, 300, json.dumps(data))
                return data
            finally:
                # Release even if the database call raised
                redis.delete(lock_key)
        # Another request is refreshing: wait and retry
        time.sleep(0.1)
    # Gave up waiting: load directly instead of retrying forever
    return db.products.get(product_id)
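The same "one request refreshes, the rest wait" idea also applies inside a single process, where many threads can miss on the same key before any of them reaches Redis. The following is a minimal sketch under that assumption; `SingleFlight`, `cache` (any dict-like object), and `loader` are hypothetical names, not part of any Redis client.

```python
import threading


class SingleFlight:
    """Let only one thread per key run the loader; the others reuse its result."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def _lock_for(self, key):
        # The guard protects creation of the per-key lock
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, cache, loader):
        value = cache.get(key)
        if value is not None:
            return value
        with self._lock_for(key):
            # Re-check: another thread may have filled the cache while we waited
            value = cache.get(key)
            if value is None:
                value = loader(key)
                cache[key] = value
            return value
```

This only deduplicates work within one process. Across application servers the Redis lock is still what limits the database to one refresh per key.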
Alternative: Probabilistic early expiration (XFetch):
import math
import random

def get_with_xfetch(key, beta=1.0):
    """XFetch: probabilistic early expiration to prevent thundering herd.

    Returns (value, should_refresh). value is None on a cache miss.
    """
    value = redis.get(key)
    if value is None:
        return None, True  # Miss: caller must recompute
    ttl = redis.ttl(key)  # Seconds remaining; negative if no TTL is set
    if ttl > 0:
        # Refresh early with probability exp(-ttl / beta), which rises as expiry nears
        if random.random() < math.exp(-ttl / beta):
            # Background refresh (in production, use separate thread/queue)
            return value, True  # Flag tells the caller to refresh
    return value, False
# Usage: serve stale data while refreshing in background
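The function above decides from the remaining TTL alone. The usual formulation of XFetch also weights the decision by how long the value took to compute (delta), so that expensive values are refreshed earlier. Below is a sketch of that variant against an in-memory dict rather than Redis; `XFetchCache` and its method names are hypothetical, and the injectable `clock` and `rng` exist only to make it testable.

```python
import math
import random
import time


class XFetchCache:
    """In-memory illustration of probabilistic early recomputation."""

    def __init__(self, beta=1.0, rng=None, clock=time.time):
        self.beta = beta
        self.rng = rng or random.Random()
        self.clock = clock
        self.store = {}  # key -> (value, delta, expiry)

    def fetch(self, key, ttl, recompute):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None:
            value, delta, expiry = entry
            u = 1.0 - self.rng.random()  # in (0, 1], so log(u) is defined
            # log(u) <= 0, so this pushes "now" forward by a random amount
            # that grows with the recompute cost (delta) and with beta.
            if now - delta * self.beta * math.log(u) < expiry:
                return value
        started = self.clock()
        value = recompute()
        delta = self.clock() - started  # how long the recompute took
        self.store[key] = (value, delta, self.clock() + ttl)
        return value
```

With `beta=1.0` the early refreshes cluster in the last few multiples of `delta` before expiry; a larger `beta` refreshes earlier and more often.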
Cache Warming and Cold Starts
When the Cache Starts Cold
When a cache starts empty (restart, deployment, failure), every request hits the database.
# BAD: Cold cache causes database overload at startup
# All 10,000 users hit DB simultaneously after Redis restart
# GOOD: Pre-warm cache before taking traffic
def warm_cache():
    """Run at startup before accepting traffic"""
    popular_keys = db.products.get_top_100() # Identify hot data
    for product in popular_keys:
        redis.setex(f"product:{product.id}", 3600, json.dumps(product))
    # Now safe to take traffic

# Better: Progressive warming with rate limiting
def warm_cache_progressive():
    keys_to_warm = get_keys_by_priority() # Sort by access frequency
    for i, key in enumerate(keys_to_warm, start=1):
        data = db.fetch(key)
        redis.setex(f"cache:{key}", get_ttl_for(key), data)
        # Rate limit: pause after every 1000 keys to avoid overwhelming DB
        if i % 1000 == 0:
            time.sleep(1)
Keeping Frequently-Used Data Hot
Monitor cache temperature — how often each key is accessed.
# Track key access frequency
def access_key(key):
    # Increment access counter in a sorted set (atomic)
    redis.zincrby("key:access", 1, key)
    return redis.get(f"cache:{key}")

# Analyze access patterns weekly
def analyze_access():
    # Find top 1000 accessed keys: priority for staying cached
    hot_keys = redis.zrevrange("key:access", 0, 999, withscores=True)
    # Extend the TTL on hot keys that are close to expiring
    for key, score in hot_keys:
        if isinstance(key, bytes):
            key = key.decode()  # Clients without decode_responses return bytes
        current_ttl = redis.ttl(f"cache:{key}")
        if 0 < current_ttl < 3600: # Exists and has less than 1 hour left
            redis.expire(f"cache:{key}", 86400) # Extend to 24 hours
Warming Patterns in Practice
import threading
import time

# Pattern 1: Scheduled pre-warming before high-traffic events
def warm_for_black_friday():
    """Run 30 minutes before expected traffic spike"""
    # Pre-compute and cache popular product pages
    top_products = db.get_products_by_category("popular", limit=500)
    for product in top_products:
        redis.setex(f"product:{product.id}", 7200, compute_product_page(product))
    # Pre-warm user sessions that will be active
    active_user_ids = db.get_users_logged_in_recently(limit=10000)
    for user_id in active_user_ids:
        redis.setex(f"session:{user_id}", 172800, load_user_session(user_id))

# Pattern 2: Proactive caching on database write
def write_with_proactive_cache(user_id, data):
    # Write to database
    db.users.update(user_id, data)
    # Proactively cache the result
    redis.setex(f"user:{user_id}", 86400, json.dumps(data))
    # Also cache related data
    redis.setex(f"user:{user_id}:profile", 86400, json.dumps(data["profile"]))

# Pattern 3: Background refresh for critical keys
def start_background_refresh(key, compute_fn, ttl=300):
    """Refresh key in background before expiry"""
    def refresh():
        while True:
            value = compute_fn()
            redis.setex(key, ttl, value)
            time.sleep(ttl * 0.8) # Refresh at 80% of TTL
    thread = threading.Thread(target=refresh, daemon=True)
    thread.start()
Persistence
Memcached: Pure Memory
Memcached is pure memory. It never touches disk. When it restarts, everything is gone.
# Memcached has no persistence options
# Restart = empty cache
This sounds like a drawback, but for pure caching it is fine. Your source of truth is the database anyway.
Redis: Optional Persistence
Redis can persist to disk, so data can survive a restart.
# RDB snapshots: point-in-time dumps
# (redis.conf comments go on their own line, not after a directive)
# Save if at least 1 key changed in 900 seconds
save 900 1
# Save if at least 10 keys changed in 300 seconds
save 300 10
# Save if at least 10000 keys changed in 60 seconds
save 60 10000
# AOF (Append Only File): every write logged
appendonly yes
# fsync every second (balance of speed/safety)
appendfsync everysec
# Or no persistence at all (pure cache mode)
save ""
Redis persistence is configurable. You can turn it off for pure caching or enable it for durability.
Performance and Clustering
Performance
Raw performance depends on your workload. Here is a general comparison:
| Operation | Memcached | Redis |
|---|---|---|
| GET/SET (simple) | Very fast | Fast |
| MGET/MSET (batch) | Faster | Slower (per-key overhead) |
| INCR (atomic counter) | Fast | Very fast |
| Sets/Lists/Hashes | Not supported | Depends on operation |
| Memory efficiency | Better (simple values) | Depends on data structures |
For simple string caching, Memcached often uses less memory per key. For complex data structures, Redis’s overhead is usually worth it.
Redis executes commands on a single thread: one command at a time across all connections (Redis 6+ can offload network I/O to extra threads, but command execution stays serial). Memcached is multi-threaded. A single Redis instance can still saturate network bandwidth on simple workloads, while Memcached makes better use of multiple cores for raw throughput.
# Redis pipelining: batch commands to reduce round trips
pipe = redis.pipeline()
for key in keys:
    pipe.get(key)
results = pipe.execute() # One round trip for all
Clustering and Distribution
Memcached
Memcached has no native clustering. You shard across instances manually using consistent hashing.
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, servers, vnodes=150):
        self.servers = servers
        self.ring = {}
        self.sorted_keys = []
        for server in servers:
            # Each server appears at many points on the ring (virtual nodes)
            for i in range(vnodes):
                hash_key = self._hash(f"{server}:{i}")
                self.ring[hash_key] = server
                self.sorted_keys.append(hash_key)
        self.sorted_keys.sort()

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_server(self, key):
        # First ring point at or after the key's hash, wrapping to the start
        idx = bisect.bisect_left(self.sorted_keys, self._hash(key))
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]
It works. You are just managing the sharding yourself.
Redis
Redis has built-in clustering with automatic sharding.
# Redis Cluster configuration
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 15000
# Automatic sharding, replication, and failover
# Your application sees a single logical database
Redis Cluster partitions keys across nodes automatically. It also supports replication for read scaling and failover.
Capacity Estimation: Memory-per-Key and Cluster Slot Planning
Memory per key differs significantly between Redis and Memcached.
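Approximate Redis memory breakdown per key (string value); exact figures vary by version, allocator, and encoding, so measure with MEMORY USAGE on real data: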
| Component | Size |
|---|---|
| Key pointer | ~56 bytes (SDS allocator) |
| Value storage | Actual value size |
| Redis object overhead | ~16 bytes |
| Dictionary entry (if in hash) | ~32 bytes |
| Total minimum per key | ~72 bytes + value |
Approximate Memcached memory breakdown per key (string value); exact figures vary by version and slab configuration:
| Component | Size |
|---|---|
| Key | Key length |
| Value | Actual value size |
| flags byte | 1 byte |
| CAS token (optional) | 8 bytes |
| Expiry time | 4 bytes |
| Overhead per item | ~25 bytes |
| Total minimum per item | ~25 bytes + key + value |
For a cache with 1 million keys, each storing a 200-byte value:
- Redis string: ~72M overhead + 200M data = ~272M total
- Memcached: ~25M overhead + key_space + 200M data = ~240M + key_space
Memcached wins on simple string workloads by 10-20% memory efficiency. Redis pays the overhead for richer data structures.
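The arithmetic above can be reproduced with a few lines. This sketch uses the per-key overhead figures from the tables (72 and 25 bytes) and assumes a 20-byte average key for Memcached, which is an illustrative value and not from the text; `estimate_cache_bytes` is a hypothetical helper.

```python
def estimate_cache_bytes(num_keys, value_bytes, per_key_overhead, key_bytes=0):
    """Rough total = keys * (fixed overhead + key length + value length)."""
    return num_keys * (per_key_overhead + key_bytes + value_bytes)


# 1 million keys, 200-byte values, overheads taken from the tables above
redis_total = estimate_cache_bytes(1_000_000, 200, per_key_overhead=72)
memcached_total = estimate_cache_bytes(
    1_000_000, 200, per_key_overhead=25, key_bytes=20  # 20-byte keys: assumption
)
saving = 1 - memcached_total / redis_total

print(f"Redis:     {redis_total / 1e6:.0f} MB")
print(f"Memcached: {memcached_total / 1e6:.0f} MB")
print(f"Saving:    {saving:.1%}")
```

The saving shrinks as values grow, because the fixed overhead becomes a smaller share of each entry.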
Redis Cluster slot planning: 16,384 slots divided across N master nodes. For a 6-node cluster (3 masters + 3 replicas), each master owns ~5,461 slots. Slot ownership determines which node stores which keys. The formula: slot = CRC16(key) % 16384. When planning capacity, ensure each master has headroom: if one master owns 5,461 slots and your average key is 1KB with 1,000 keys per slot, that node holds about 5.5 million keys and needs roughly 5.5GB before overhead. Plan for 2x headroom.
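The slot formula can be computed client-side, which helps when checking how a key-naming scheme spreads across masters. A minimal sketch follows: Redis Cluster uses the XMODEM variant of CRC16, and when a key contains a non-empty `{...}` section (a hash tag) only that part is hashed. The function names here are hypothetical.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 cluster slots."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Hash only the tag when it is non-empty, e.g. "{user:123}:profile"
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


def keys_per_slot(keys):
    """Count how many of the given keys land in each slot."""
    counts = {}
    for key in keys:
        slot = key_slot(key)
        counts[slot] = counts.get(slot, 0) + 1
    return counts
```

Keys that share a hash tag always land in the same slot, which is what makes multi-key operations on them possible in a cluster, but it also concentrates load if one tag is very popular.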
Memcached cluster sizing: No slots; consistent hashing distributes keys. Target 150-200 virtual nodes per physical node for even distribution. As a rough rule, the coefficient of variation (CV) of per-node load depends on the virtual node count V rather than on the number of physical nodes N: CV ≈ 1/√V. Keeping CV below 0.3 needs only V ≥ 12, and V = 150 gives CV ≈ 0.08, which is why 150-200 is a comfortable target. Fewer virtual nodes means higher variance in distribution.
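How evenly a ring spreads keys can be checked empirically before deploying it. The sketch below builds an MD5-based ring like the `ConsistentHash` class earlier and measures the coefficient of variation of each node's share of the hash space; `ring_share_cv` and the `node{n}:{v}` naming are assumptions made for illustration.

```python
import hashlib
import statistics


def ring_share_cv(num_nodes, vnodes_per_node):
    """Coefficient of variation of the hash-space share owned by each node."""
    ring_size = 2 ** 128  # MD5 output space
    points = []
    for node in range(num_nodes):
        for v in range(vnodes_per_node):
            digest = hashlib.md5(f"node{node}:{v}".encode()).hexdigest()
            points.append((int(digest, 16), node))
    points.sort()

    owned = [0] * num_nodes
    # Each point owns the arc from the previous point up to itself;
    # the first point also owns the wrap-around arc after the last one.
    previous = points[-1][0] - ring_size
    for position, node in points:
        owned[node] += position - previous
        previous = position

    shares = [arc / ring_size for arc in owned]
    return statistics.pstdev(shares) / statistics.mean(shares)


for vnodes in (1, 10, 150):
    print(vnodes, round(ring_share_cv(10, vnodes), 3))
```

Running it for a few values of `vnodes_per_node` shows the imbalance shrinking as virtual nodes are added, which is the reason the ring is not built from one point per server.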
Comparative Analysis Tables
Cache Invalidation Strategy Comparison
| Strategy | Write Latency | Read Consistency | Data Loss Risk | Complexity | Best For |
|---|---|---|---|---|---|
| Write-Through | High (sync) | Always fresh | None | Low | Read-heavy, consistency-critical |
| Write-Behind | Low (async) | Eventually consistent | High | High | Write spikes, high-throughput |
| Cache-Aside | Low (1 write) | Strong (on invalidation) | Low | Medium | Read-heavy, general purpose |
Redis vs Memcached: When to Use Which
| Factor | Redis | Memcached | Winner |
|---|---|---|---|
| Data structures | Strings, Lists, Sets, Hashes, Sorted Sets | Strings only | Redis |
| Memory efficiency | Higher per-key overhead (~72 bytes + value) | Lower per-key overhead (~25 bytes + value) | Memcached |
| Persistence | RDB snapshots, AOF logs | None (pure memory) | Redis |
| Clustering | Built-in cluster with slots | Client-side consistent hashing | Redis |
| Threading model | Single-threaded command execution (no locks) | Multi-threaded (fine-grained locking in current versions) | Memcached (throughput), Redis (consistency) |
| Atomic operations | INCR, SETNX, Lua scripts | incr/decr, add, CAS tokens | Redis |
| Pub/Sub | Native support | Not supported | Redis |
| Latency | Sub-millisecond, predictable | Sub-millisecond, predictable | Tie |
| Operational complexity | Higher (config, persistence) | Lower (stateless) | Memcached |
| Production maturity | Very mature at scale | Mature | Tie |
Eviction Policy Comparison
| Policy | Redis Support | Memcached Support | Behavior |
|---|---|---|---|
| LRU (Least Recently Used) | allkeys-lru, volatile-lru (sampled approximation) | Always on, per slab class (no policy setting) | Evict least recently accessed |
| LFU (Least Frequently Used) | allkeys-lfu, volatile-lfu | Not supported | Evict least frequently accessed |
| TTL | volatile-ttl | Not supported as a policy | Evict shortest TTL first |
| Random | allkeys-random, volatile-random | Not supported | Random eviction |
| No eviction | noeviction | -M flag | Return error on OOM |
Redis LFU advantage: LFU tracks access frequency, not just recency. Under LRU, a one-off scan that touches many cold keys makes them look recent and can push out keys that are read constantly. LFU keeps the frequently read keys, and its counters decay over time so that data which was hot last week does not stay pinned forever. Memcached does not have this capability.
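Redis does not keep an exact hit count per key for LFU. It uses a small probabilistic counter that grows roughly logarithmically with the number of accesses. The sketch below is modeled on that scheme as a simplified illustration: new keys start at 5, the counter saturates at 255, and `log_factor` plays the role of the `lfu-log-factor` setting (default 10). The function name and the omission of time-based decay are simplifications.

```python
import random

LFU_INIT_VAL = 5   # starting counter for a new key
LFU_MAX = 255      # the counter fits in 8 bits


def lfu_log_incr(counter, log_factor=10, rng=random):
    """Probabilistically increment an LFU counter on one access."""
    if counter >= LFU_MAX:
        return LFU_MAX
    base = max(counter - LFU_INIT_VAL, 0)
    # The higher the counter already is, the less likely another increment
    probability = 1.0 / (base * log_factor + 1)
    if rng.random() < probability:
        counter += 1
    return counter


def simulate_hits(hits, log_factor=10, seed=0):
    """Return the counter value after a number of accesses."""
    rng = random.Random(seed)
    counter = LFU_INIT_VAL
    for _ in range(hits):
        counter = lfu_log_incr(counter, log_factor, rng)
    return counter
```

Because each increment gets less likely, a key needs far more than ten times the traffic to look ten times hotter, which keeps a short burst from dominating keys with sustained access.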
Connection Management Trade-offs
| Aspect | Single Connection | Connection Pool | Presized Pool |
|---|---|---|---|
| Setup cost | High (connect latency) | Medium (pool creation) | Low |
| Concurrent requests | Poor (blocks) | Good | Best |
| Resource usage | Low (1 socket) | Medium (N sockets) | Medium |
| Complexity | Simple | Moderate | Simple |
| Best for | Scripts, short-lived | Web applications | High-throughput |
When to Use / When Not to Use
Memcached makes sense for simple string caching — HTML fragments, API responses, session data — where maximum memory efficiency matters and you do not need complex data structures. It scales horizontally via consistent hashing, and the operational surface is small. Use it for database query results that fit naturally in key-value form, and for things that do not change often and benefit from sub-millisecond access.
Redis makes sense when you need lists, sets, sorted sets, or hashes; when you want optional persistence; when you need atomic counters for rate limiting or distributed locks; when you need pub/sub for real-time features or chat; when you want built-in clustering; or when you are building leaderboards, job queues, or caching with complex data access patterns. Lua scripting adds atomic multi-step operations without race conditions.
A Practical Decision Framework
Do you need anything beyond simple string key-value?
YES -> Redis
NO -> Does memory efficiency matter more than features?
YES -> Memcached
NO -> Redis (for easier operations and clustering)
If you are not sure, start with Redis. The extra memory usage is negligible for most workloads. If you later find memory is tight and profiling shows Memcached is meaningfully better, switch.
Production Failure Scenarios and Trade-off Analysis
| Failure | Impact | Mitigation |
|---|---|---|
| Redis/Memcached OOM | Cache returns errors; application falls back to database | Monitor used_memory/maxmemory ratio; set alerts at 70% threshold |
| Redis fork for RDB save | Latency spike during fork; copy-on-write can raise memory use, up to roughly double under heavy writes | Schedule RDB saves during low traffic and leave memory headroom (AOF rewrites also fork) |
| Memcached restart | All data lost immediately (no persistence) | Design for cold cache; implement application-level cache warming |
| Redis replica lag | Reads from replica may return stale data | Monitor replica offset lag in INFO replication (master_repl_offset versus each replica's offset); read from primary for consistency-critical data |
| Connection pool exhaustion | Requests timeout waiting for connection | Size connection pool appropriately; implement request queuing with timeout |
| Single-threaded Redis blocking | Long commands block all other commands | Avoid KEYS, SMEMBERS on large sets; use pipeline/batch operations |
| Memcached multi-thread contention | High CPU under heavy load | Scale horizontally with consistent hashing; consider Redis for complex workloads |
Detailed Failure Scenarios
Case 1: Redis OOM During Peak Traffic
What happened: A Redis instance reached maxmemory during a flash sale. Eviction policy was noeviction. Redis started returning errors instead of serving requests.
Root cause: The maxmemory-policy was set to noeviction (return error on OOM) instead of allkeys-lru. Additionally, the application was not handling cache errors gracefully — it failed fast instead of falling back to the database.
Impact: 12% of requests failed during a 45-minute window. The database was under-provisioned for the fallback load and also started timing out.
Lesson learned: Always use an eviction policy that allows Redis to keep serving. Implement circuit breakers so the application falls back to the database gracefully when cache errors spike. Monitor evicted_keys and used_memory metrics.
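The circuit breaker mentioned in the lesson can be sketched in a few lines. This is an illustrative outline rather than a production implementation: `CacheCircuitBreaker`, `cache_fn`, and `fallback_fn` are hypothetical names, and the thresholds are placeholders to tune.

```python
import time


class CacheCircuitBreaker:
    """Skip the cache for a cool-down period after repeated cache errors."""

    def __init__(self, failure_threshold=5, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def _is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: let the next call probe the cache again
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(self, cache_fn, fallback_fn):
        if self._is_open():
            return fallback_fn()  # do not touch the cache while open
        try:
            result = cache_fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback_fn()
        self.failures = 0  # any success resets the failure streak
        return result
```

A breaker only helps if the database can absorb the fallback traffic, so it belongs alongside capacity planning and request shedding rather than in place of them.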
Case 2: Memcached Restart Storm
What happened: A Memcached node was restarted for a configuration update. Within 90 seconds, the database was overwhelmed by cold-cache requests from all application servers simultaneously.
Root cause: No cache warming strategy. All 50 application instances started with empty local caches and hit the database for the same popular keys simultaneously. The database had no protection against this concurrent access pattern.
Impact: Average response time jumped from 15ms to 8,400ms. Database CPU hit 100%. The restart took 3 minutes longer than expected because the database was too overloaded to respond to health checks.
Lesson learned: Implement cache warming before taking a cache node out of rotation. Use consistent hashing with virtual nodes so individual key popularity does not spike on single nodes after restart. Consider using a local L1 cache (in-memory LRU) in front of Memcached to absorb cold-start load.
Case 3: Keyspace Introspection During Peak Hours
What happened: A developer ran redis-cli --bigkeys on a production Redis instance during peak hours to find memory-heavy keys.
Root cause: The --bigkeys flag walks the entire keyspace with SCAN and issues a size command for every key it finds. On a 50GB Redis instance with millions of keys, that meant millions of extra commands competing with application traffic for the single command thread. The run consumed 15 seconds of CPU and starved other commands during that window.
Impact: P99 latency spiked from 5ms to 12,000ms. The application saw 200+ connection timeouts. The on-call engineer spent 45 minutes diagnosing why Redis was suddenly unresponsive.
Lesson learned: Do not run keyspace-wide introspection (--bigkeys, MEMORY USAGE across unknown keys, KEYS *) against a production primary at peak. Run it against a replica or during a maintenance window, throttle it with the redis-cli -i option, and use SCAN with COUNT limits for any custom introspection. For a quick overview, use MEMORY STATS and INFO memory instead.
Common Pitfalls and Anti-Patterns
1. Using KEYS Command in Production
The KEYS command scans all keys and blocks Redis. Never use it in production.
# BAD: KEYS blocks Redis for seconds
all_keys = redis.keys("user:*")
# GOOD: Use SCAN for production
cursor = 0
while True:
    cursor, keys = redis.scan(cursor, match="user:*", count=100)
    process(keys)
    if cursor == 0:
        break
2. Storing Large Values Without Compression
Large values consume memory disproportionately and slow down operations.
# BAD: Storing large uncompressed data
redis.set("page:123", large_html_content) # 500KB+ per page
# GOOD: Compress large values
import zlib
compressed = zlib.compress(large_html_content.encode())
redis.setex("page:123", 3600, compressed)
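Compressing on write only works if the read path knows which values were compressed. One way to handle that, sketched below, is a one-byte marker in front of the stored bytes plus a size threshold so that small values skip compression. The threshold, marker bytes, and function names are all assumptions made for illustration.

```python
import zlib

COMPRESS_THRESHOLD = 1024  # bytes; hypothetical cut-off, tune for your data
RAW = b"\x00"              # marker: body stored as-is
ZLIB = b"\x01"             # marker: body is zlib-compressed


def encode_value(text: str) -> bytes:
    """Prepare a string for storage, compressing it when that pays off."""
    raw = text.encode("utf-8")
    if len(raw) >= COMPRESS_THRESHOLD:
        packed = zlib.compress(raw)
        if len(packed) < len(raw):  # keep raw bytes if compression did not help
            return ZLIB + packed
    return RAW + raw


def decode_value(blob: bytes) -> str:
    """Reverse encode_value using the leading marker byte."""
    marker, body = blob[:1], blob[1:]
    if marker == ZLIB:
        body = zlib.decompress(body)
    return body.decode("utf-8")
```

The encoded bytes would be passed to `setex` as in the example above, and `decode_value` applied to whatever `get` returns. Compression trades CPU in the application for memory and bandwidth in the cache, so it pays off mainly for large, repetitive values such as HTML.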
3. Not Using Connection Pooling
Each operation creating a new connection adds overhead.
# BAD: New connection each time
def get_user(user_id):
    r = redis.Redis(host='localhost', port=6379) # Connection every call
    return r.get(f"user:{user_id}")

# GOOD: Reuse connection
pool = redis.ConnectionPool(host='localhost', port=6379, max_connections=50)

def get_user(user_id):
    r = redis.Redis(connection_pool=pool)
    return r.get(f"user:{user_id}")
4. Ignoring Memcached Persistence Limitations
Memcached has no persistence. Data is lost on restart.
# BAD: Assuming Memcached persists data
memcached.set("session:123", session_data)
# ... server restarts ...
session = memcached.get("session:123") # None - data gone
# GOOD: Design for cold start
session = memcached.get("session:123")
if not session:
    session = load_from_database() # Always have fallback
    memcached.set("session:123", session, time=3600)
5. Using Redis Single Instance for Everything
Redis single-threaded nature means CPU-bound operations block everything.
# BAD: CPU-heavy operation in Redis
# This blocks all other commands
redis.sort("large-list") # O(N log N) - blocks Redis
# GOOD: Move CPU work to application
# (for very large lists, read in ranges instead of 0..-1)
data = redis.lrange("large-list", 0, -1)
sorted_data = sorted(data) # Application handles sorting
Quick Recap
Redis offers data structures (lists, sets, hashes, sorted sets) that Memcached cannot match. Memcached is more memory-efficient for simple strings. Redis persistence (RDB/AOF) survives restarts; Memcached does not. Redis Cluster provides automatic sharding; Memcached requires client-side sharding. Both evict by LRU, and Redis adds LFU, TTL-based, and random policies. Redis's single-threaded command execution is a feature, since every command runs atomically, but CPU-heavy operations block everything.
Best Practices Summary
Redis: use connection pooling (never a new connection per request); set maxmemory and an eviction policy like allkeys-lru for most caching workloads; never run KEYS, SMEMBERS, or SORT on large sets; use pipelining for batch operations; enable slow log monitoring at 10ms threshold; use hashes for objects instead of JSON strings; set reasonable TTLs; rename dangerous commands in production (rename-command FLUSHDB ""); monitor memory fragmentation (mem_fragmentation_ratio > 1.5 indicates problems); use Lua scripts for atomic multi-step operations.
Memcached: use consistent hashing for sharding with 150-200 virtual nodes per physical node; use consistent key naming with service prefixes (users:123, sessions:abc); store serialized data efficiently with MessagePack or Protocol Buffers instead of JSON for 20-30% smaller payloads; tune slab sizing for your value sizes (the growth factor -f and minimum chunk size -n control how much memory small values waste, while the 1MB default is the maximum item size, set with -I); monitor evictions, since high rates indicate the cache is too small or TTLs are misconfigured; leave UDP disabled (the default since 1.5.6, after it was abused for amplification attacks) unless you have a specific need and a locked-down network; use multi-get for batch operations.
Observability Checklist
Security Checklist
- Enable authentication: Redis 6+ supports ACLs. Memcached supports SASL authentication. Never run without auth in production.
- Bind to internal IPs only: use bind 127.0.0.1 or a specific private interface address such as bind 10.0.0.5 (bind takes addresses, not CIDR ranges) to prevent unauthorized access. No public IP exposure.
- Encrypt in transit: Use TLS for Redis and Memcached if crossing network boundaries. Redis 6+ has native TLS support.
- Limit commands: Rename dangerous commands, for example rename-command FLUSHDB "", rename-command CONFIG "", and rename-command KEYS "".
- Set maxmemory: Prevent cache from consuming all available RAM and causing system instability.
- Use firewall rules: Restrict access to cache ports (6379 for Redis, 11211 for Memcached) to application servers only.
Metrics to Track
Redis:
# Core metrics via Redis INFO
INFO memory # used_memory, maxmemory, mem_fragmentation_ratio
INFO stats # total_commands_processed, keyspace_hits, keyspace_misses
INFO replication # master_link_status, master_last_io_seconds_ago, master_repl_offset
INFO clients # connected_clients, blocked_clients
# Calculate hit rate
# hit_rate = keyspace_hits / (keyspace_hits + keyspace_misses)
Memcached:
# Stats command
stats
# Items: curr_items, total_items, evictions
# Memory: bytes, limit_maxbytes
# Hit rate: get_hits, get_misses
# Calculate hit rate
# hit_rate = get_hits / (get_hits + get_misses)
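The hit-rate formulas above are easy to get wrong at startup, when both counters are zero. A small sketch of the calculation, including a parser for the `STAT name value` lines that the Memcached `stats` command returns; the helper names and the sample numbers are made up for illustration.

```python
def hit_rate(hits, misses):
    """Fraction of lookups served from cache; 0.0 when there is no traffic yet."""
    total = hits + misses
    return hits / total if total else 0.0


def parse_memcached_stats(raw):
    """Turn the text response of `stats` into a dict of name -> string value."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2].strip()
    return stats


# Sample response with invented numbers
sample = "STAT get_hits 900\r\nSTAT get_misses 100\r\nSTAT evictions 7\r\nEND\r\n"
stats = parse_memcached_stats(sample)
rate = hit_rate(int(stats["get_hits"]), int(stats["get_misses"]))
print(f"hit rate: {rate:.1%}")
```

The same `hit_rate` function works for Redis by passing `keyspace_hits` and `keyspace_misses` from `INFO stats`. Both sets of counters are cumulative since startup, so trend dashboards usually work from the difference between two samples.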
Logs to Capture
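import structlog
import time

logger = structlog.get_logger()

class CacheMetrics:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.start_time = time.time()

    def timed_get(self, key):
        """GET a key and log hit/miss with the measured latency."""
        started = time.perf_counter()
        value = self.redis.get(key)
        latency_ms = (time.perf_counter() - started) * 1000
        self.track_operation("get", key, hit=value is not None, latency_ms=latency_ms)
        return value

    def track_operation(self, operation, key, hit=True, latency_ms=None):
        logger.info("cache_operation",
                    operation=operation,
                    key=key,
                    cache_hit=hit,
                    latency_ms=latency_ms)

    def log_memory_pressure(self):
        info = self.redis.info('memory')
        used = info['used_memory']
        maxmem = info['maxmemory']
        # maxmemory is 0 when no limit is configured
        if maxmem > 0 and used / maxmem > 0.8:
            logger.warning("cache_memory_critical",
                           used_mb=used / 1024 / 1024,
                           max_mb=maxmem / 1024 / 1024,
                           fragmentation=info.get('mem_fragmentation_ratio', 1))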
Interview Questions
Memcached wins on raw memory efficiency for simple strings because it has minimal per-key overhead. For Redis in memory-constrained environments, use hashes instead of string serialization: HSET user:123 name Alice email alice@example.com stores all fields in one Redis key with shared overhead, versus one key per field or JSON serialization in a string key. Set maxmemory conservatively with maxmemory-policy allkeys-lru. Use the MEMORY USAGE command to identify large keys. Small hashes and lists are stored in a compact encoding (listpack in Redis 7+, ziplist in earlier versions) as long as they stay under the configured size thresholds, so keeping them small saves memory. For pure string caching where memory is critical, Memcached remains the pragmatic choice.
Redis executes commands on a single thread, so one long-running command blocks everything queued behind it. SLOWLOG GET 10 returns the 10 most recent entries that exceeded slowlog-log-slower-than, which defaults to 10,000 microseconds (10ms). Common culprits: SORT on large collections, KEYS pattern scans (never use in production; use SCAN instead), SMEMBERS on large sets, ZRANGEBYSCORE on large sorted sets without LIMIT, or FLUSHDB during peak traffic. For complex operations on large datasets, move the work to the application side: fetch the raw data and process it there. Also check fork latency if you use RDB persistence or AOF rewrites. fork() blocks the main thread while page tables are copied, and the pause grows with dataset size; latest_fork_usec in INFO stats shows how long the last one took.
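As an illustration of replacing KEYS with incremental iteration, here is a minimal sketch. It assumes `client` behaves like a redis-py client, whose `scan(cursor, match, count)` call returns a `(next_cursor, keys)` pair; the helper name `scan_keys` is hypothetical. redis-py also ships `scan_iter`, which wraps the same loop.

```python
def scan_keys(client, pattern, batch_size=500):
    """Yield keys matching pattern using incremental SCAN calls instead of KEYS.

    Each SCAN call does a bounded amount of work, so other clients are served
    between batches. SCAN may return the same key more than once, so callers
    that need uniqueness should deduplicate.
    """
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=batch_size)
        yield from keys
        # The server signals the end of the iteration with cursor 0
        if cursor == 0:
            break
```

Usage would look like `for key in scan_keys(client, "session:*"): ...`, with `count` treated as a hint for the amount of work per call rather than an exact page size.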
RDB snapshots are point-in-time dumps: compact and fast to restore, but you lose every write since the last snapshot if the instance crashes. AOF logs every write operation: better durability and configurable fsync intervals, at the cost of larger files and slower writes. For a pure cache where the database is the source of truth, RDB is usually sufficient; if Redis restarts with an empty cache, the application repopulates it from the database. Enable AOF only when you need durability guarantees for cached data, or when repopulating a cold cache after a restart would cost more than the extra storage and write overhead. The appendfsync everysec setting is a good balance: roughly one second of writes can be lost in the worst case, but it is much faster than appendfsync always.
Memcached is multi-threaded, so its worker threads contend on shared locks. Older releases serialized most operations behind a global cache lock; newer releases use finer-grained item locks, but very hot keys and LRU maintenance can still contend. If the workload is dominated by very small gets and sets, lock overhead can exceed the parallelism benefit. High CPU with low throughput is the signature of lock contention in Memcached. Workarounds: use connection pooling to multiplex connections, partition your keys across multiple Memcached instances to reduce per-instance contention, or switch to Redis, where single-threaded command execution avoids lock contention for most workloads. Profile with the stats command: counters such as conn_yields and listen_disabled_num, together with per-thread CPU usage, are the usual signals, since stock builds do not expose a dedicated lock-contention ratio.
A fixed-window rate limiter in Redis uses INCR and EXPIRE: INCR increments a counter stored under a key such as rate_limit:{window}, where window is the timestamp rounded down to the interval, and EXPIRE sets a TTL equal to the window length. If the count exceeds the limit, reject the request. A token bucket instead stores a token count and a last-refill timestamp (for example in a hash), adds tokens in proportion to elapsed time, and takes one token per request. A sliding window uses a sorted set with timestamps as scores, which is more accurate but requires ZREMRANGEBYSCORE and ZCARD on every request. Token bucket allows burstiness within limits; fixed window is simpler but has boundary spikes; sliding window is the most accurate but the most expensive. For distributed rate limiting, Redis atomic operations are essential: a Lua script ensures the check and the increment happen as one atomic step.
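To make the token bucket concrete, here is a minimal in-process sketch of the refill-and-take logic. It is not a distributed limiter: in Redis, the two fields it tracks (`tokens` and `updated_at`) would live under one key and the body of `allow` would run inside a Lua script so that it stays atomic. The class name and the injectable `clock` parameter are assumptions made for illustration and testing.

```python
import time


class TokenBucket:
    """In-process token bucket: bursts up to `capacity`, refills at a steady rate."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated_at = clock()

    def allow(self, cost=1.0):
        """Refill based on elapsed time, then try to take `cost` tokens."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# A bucket of 2 tokens refilled at 1 token per second, driven by a fake clock
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_sec=1, clock=lambda: fake_now[0])
print(bucket.allow(), bucket.allow(), bucket.allow())  # burst of 2, then rejected
fake_now[0] = 1.0
print(bucket.allow())  # one token has been refilled after a second
```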
A 30-second replica lag means any read from the replica can return data that is up to 30 seconds stale. For many use cases this is acceptable; for leaderboards, like counts, or session data it causes visible inconsistency: users might see outdated counts, missing likes, or stale leaderboard positions. Mitigation: monitor the gap between the primary's master_repl_offset and each replica's offset in INFO replication, along with master_last_io_seconds_ago and master_link_down_since_seconds on the replica. If lag is caused by network or primary load, fix the root cause first. For read-heavy workloads that tolerate some staleness, apply a lag threshold in your application and read from the primary when lag exceeds your SLA. For consistency-critical reads (like financial data), always read from the primary. Consider read timeouts on replicas: if the replica cannot keep up, it is better to fail the read than serve stale data.
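The lag-threshold routing described above might look like the following sketch. It assumes the input is a dict shaped like a replica's INFO replication output (as redis-py returns from `info("replication")`) and uses `master_last_io_seconds_ago` as a rough staleness proxy rather than an exact lag measurement. The function names and the 5-second default are hypothetical.

```python
def replica_is_fresh(replication_info, max_idle_seconds=5):
    """Decide whether a replica looks healthy enough to serve reads."""
    if replication_info.get("role") != "slave":
        return False
    if replication_info.get("master_link_status") != "up":
        return False
    # Reported as -1 when the link to the primary is down
    idle = int(replication_info.get("master_last_io_seconds_ago", -1))
    return 0 <= idle <= max_idle_seconds


def choose_read_target(replication_info, consistency_critical=False):
    """Route consistency-critical or stale-replica reads to the primary."""
    if consistency_critical or not replica_is_fresh(replication_info):
        return "primary"
    return "replica"


# Hand-written samples standing in for INFO replication on a replica
healthy = {"role": "slave", "master_link_status": "up", "master_last_io_seconds_ago": 1}
lagging = {"role": "slave", "master_link_status": "up", "master_last_io_seconds_ago": 30}
print(choose_read_target(healthy), choose_read_target(lagging))
```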
The thundering herd problem occurs when a popular cache key expires or a cache becomes empty, and thousands of requests simultaneously try to refresh the same key from the database. Both Redis and Memcached suffer from this because they are typically used as shared caches. Prevention strategies: (1) probabilistic early expiration (XFetch), where each reader may refresh a key shortly before it expires, with a probability that rises as expiry approaches and as recomputation gets more expensive; (2) distributed locks, where only one request refreshes the cache while others wait and retry; (3) cache warming, where you proactively populate the cache before expected traffic spikes; (4) request coalescing, where concurrent requests for the same key are merged into one database query whose result is returned to all of them. For Memcached specifically, a local in-process L1 cache in front of it absorbs most thundering herd patterns because hot keys stay in process memory.
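Probabilistic early expiration reduces to a one-line decision per read. The sketch below follows the usual XFetch rule, refresh when `now - delta * beta * ln(rand) >= expiry`, where `delta` is how long the value took to recompute. The function name and the injectable `rng` parameter are assumptions for illustration; storing `expires_at` and `recompute_seconds` alongside the cached value is left to the caller.

```python
import math
import random


def should_refresh_early(now, expires_at, recompute_seconds, beta=1.0, rng=random.random):
    """Return True if this reader should recompute the value before it expires.

    The random term is exponentially distributed, so refreshes become more
    likely as expiry approaches and as recomputation gets more expensive.
    beta > 1 refreshes earlier, beta < 1 later.
    """
    # rng() is in [0, 1), so 1 - rng() is in (0, 1] and log() is defined
    jitter = -recompute_seconds * beta * math.log(1.0 - rng())
    return now + jitter >= expires_at


# With a real RNG, only a small share of readers refresh 10s ahead of expiry
random.seed(0)
early = sum(should_refresh_early(100.0, 110.0, recompute_seconds=2.0) for _ in range(10_000))
print(early, "of 10000 readers would refresh early")
```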
Redis Cluster uses hash slot distribution: 16,384 slots, with each key assigned to slot CRC16(key) % 16384. Each master node owns a subset of slots. When you add or remove nodes, slots are moved between masters; the migration is started by an operator (for example with redis-cli --cluster reshard or rebalance) and runs online, with clients following MOVED and ASK redirections. Memcached has no server-side clustering; clients shard with consistent hashing and virtual nodes (typically 150-200 per physical node). When you add or remove a node, only about K/N keys are remapped, where K is total keys and N is nodes, and the remapped keys become cache misses rather than being migrated. Key difference: Redis Cluster needs at least 3 master nodes, and resharding adds redirections that can briefly slow requests for slots in flight. Memcached consistent hashing is simpler: no special nodes, just hash ring membership. For Redis: use for complex workloads needing replication, multiple data types, and built-in HA. For Memcached: use when you need simple, stateless sharding and can manage failover at the application layer.
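The slot calculation is small enough to reproduce. This sketch relies on Python's `binascii.crc_hqx` with an initial value of 0, which computes the CRC16 variant (XMODEM, polynomial 0x1021) that the Redis Cluster specification describes, and it applies the hash tag rule: if the key contains a non-empty `{...}` section, only that section is hashed. The function name `key_hash_slot` is hypothetical.

```python
import binascii


def key_hash_slot(key):
    """Compute the Redis Cluster hash slot for a key, honoring {hash tags}."""
    data = key.encode("utf-8") if isinstance(key, str) else key
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Use the tag only if a closing brace exists and the tag is non-empty
        if end > start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


# Keys sharing a hash tag land in the same slot, so multi-key commands work on them
print(key_hash_slot("{user:1}:profile") == key_hash_slot("{user:1}:sessions"))
```

Hash tags are the standard way to keep related keys on one node, at the cost of less even distribution if a single tag becomes very hot.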
Strings store serialized objects (JSON, msgpack). A single string key holds the entire object. Hashes store field-value pairs directly: each field is a separate entry inside one hash key, not a separate top-level key. Choose Strings when: the entire object is always read or written as a whole, you need to store pre-serialized data from external systems, or you want string operations like APPEND, or INCR on a key that holds a single number. Choose Hashes when: you frequently read or write individual fields (partial access patterns), you want to avoid serialization/deserialization overhead, or you want atomic per-field updates such as HINCRBY. Note that a TTL normally applies to the whole key; per-field expiration (HEXPIRE) exists only in Redis 7.4 and later. Memory: for objects with few fields (< 10), hashes have less overhead because small hashes use a compact encoding that packs field names and values together. Large hashes fall back to a hash table with per-field overhead and can be more memory-intensive than JSON in a single string. Benchmark your specific access patterns. Rule of thumb: if you access < 50% of fields at a time, hashes usually win.
A common pattern is L1 (local in-memory) + L2 (Memcached) + L3 (Redis) + Database. Memcached handles simple string caching for page fragments, rendered HTML, and API responses that benefit from its memory efficiency. Redis handles complex data (sorted sets for leaderboards, lists for queues, hashes for user objects), session storage with TTL, pub/sub for real-time features, and rate limiting with atomic operations. Concretely: use Memcached for cached database query results that are simple key-value at the page level (e.g., product:123 → HTML fragment). Use Redis for anything requiring data structures (like sets for "users who liked this post"), session data with TTL, distributed locks (SETNX), rate limiting counters, and pub/sub channels. The architectural principle: Memcached is a dumb, fast cache for immutable or rarely-changed data. Redis is a data store that happens to cache well. Start with Redis for everything; add Memcached only when memory is demonstrably constrained and profiling shows Memcached is meaningfully more efficient for specific workloads.
Choose Memcached when you need pure string key-value caching and memory efficiency is critical. If your data fits naturally as key-value pairs, you do not need atomic counters, data structures, or persistence, and your team prefers operational simplicity, Memcached wins. It is also the right choice when horizontal scaling via consistent hashing is acceptable and you do not need built-in clustering. For everything beyond simple string caching — sorted sets, hashes, pub/sub, Lua scripting, or persistence — use Redis.
Redis addresses cache stampede (thundering herd) through several mechanisms. Distributed locking via SET with the NX and EX options (or the older SETNX) ensures only one request refreshes a hot key while others wait. Probabilistic early expiration (XFetch) lets readers randomly refresh a key shortly before it expires, weighted by how long recomputation takes, which prevents mass expiration events. The WAIT command does not prevent stampedes by itself, but it can reduce stale replica reads after a refresh by blocking until replicas acknowledge the write. Application-level patterns like merging concurrent requests for the same key into a single database query also help. Memcached lacks built-in stampede prevention, so use a local in-process L1 cache in front of Memcached to absorb hot key access spikes.
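A lock-guarded refresh can be sketched as follows. It assumes `client` behaves like a redis-py client created with `decode_responses=True` (so `get` returns strings) and uses `set(..., nx=True, ex=...)` as the lock primitive. The function name, the `lock:` key prefix, and the retry parameters are hypothetical. The release step is best-effort: the get-then-delete pair is not atomic, and a production version would do the compare-and-delete in a Lua script.

```python
import time
import uuid


def get_with_refresh_lock(client, key, loader, ttl=300, lock_ttl=10,
                          retries=20, wait=0.05, sleep=time.sleep):
    """Return the cached value, letting only one caller rebuild it on a miss."""
    lock_key = f"lock:{key}"
    for _ in range(retries):
        cached = client.get(key)
        if cached is not None:
            return cached
        token = uuid.uuid4().hex
        # SET NX EX succeeds for exactly one caller until the lock expires
        if client.set(lock_key, token, nx=True, ex=lock_ttl):
            try:
                value = loader()
                client.set(key, value, ex=ttl)
                return value
            finally:
                # Best-effort release: only delete the lock if we still own it
                if client.get(lock_key) == token:
                    client.delete(lock_key)
        # Someone else is refreshing: back off briefly, then re-check the cache
        sleep(wait)
    # Lock holder never finished within our budget; fall back to the source
    return loader()
```

The lock TTL should comfortably exceed the expected load time, otherwise a second caller can acquire the lock while the first is still rebuilding the value.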
Write-through writes synchronously to both cache and database — strong consistency, simple reads, but higher write latency and potential write amplification. Write-behind writes to cache only and async flushes to database — fast writes, handles spikes, but risks data loss if cache fails before database write and requires additional infrastructure (write queue, retry logic). Cache-aside (lazy loading) is the most common pattern: writes go directly to database, cache is invalidated on write; reads populate cache on miss. Best for read-heavy workloads where the database is the source of truth.
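Cache-aside is the easiest of the three to show in code. This is a minimal sketch, assuming `client` exposes `get`, `setex(name, time, value)`, and `delete` the way a redis-py client does; `load_from_db` and `save_to_db` are hypothetical callables standing in for your data layer, and values are assumed to be JSON-serializable.

```python
import json


def read_through_cache(client, key, load_from_db, ttl=300):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    value = load_from_db()
    if value is not None:
        # Cache only real results; missing rows are looked up again next time
        client.setex(key, ttl, json.dumps(value))
    return value


def write_and_invalidate(client, key, save_to_db, value):
    """Cache-aside write: the database is the source of truth, so update it first."""
    save_to_db(value)
    # Invalidate rather than overwrite; the next read repopulates the cache
    client.delete(key)
```

Deleting on write instead of updating the cached copy avoids caching a value that a concurrent writer has already replaced, at the cost of one extra miss per write.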
Estimate by measuring your working set size: total unique keys accessed within a typical traffic window multiplied by average entry size (key plus value plus per-key overhead). For Redis, budget roughly 72 bytes of per-key overhead on top of the key and value. For Memcached, budget roughly 50 bytes of per-item header on top of the key and value. Treat both figures as rough starting points and confirm them against real data. Target a cache size where your working set fits with 20-30% headroom for traffic spikes. Monitor eviction rates: evictions above 1% of requests indicate the cache is too small or TTLs are misconfigured. Use MEMORY USAGE in Redis to identify large keys. For Memcached, stats shows curr_items and evictions. Size for peak plus headroom, not average load.
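The sizing arithmetic can be written down as a small helper. This is a back-of-the-envelope sketch: the function name is hypothetical, the 72-byte overhead is the rough Redis figure quoted above, and the key count and sizes in the example are made-up inputs rather than measurements.

```python
import math


def estimate_cache_bytes(num_keys, avg_key_bytes, avg_value_bytes,
                         per_key_overhead, headroom=0.25):
    """Estimate memory for a working set: entries * entry size, plus headroom."""
    per_entry = avg_key_bytes + avg_value_bytes + per_key_overhead
    return math.ceil(num_keys * per_entry * (1 + headroom))


# 1M keys, 40-byte keys, 400-byte values, ~72 bytes of Redis overhead, 25% headroom
needed = estimate_cache_bytes(1_000_000, 40, 400, per_key_overhead=72)
print(needed)  # 640000000
```

Here each entry costs 512 bytes, the working set is 512 MB, and the 25% headroom brings the target to 640 MB. The result is a floor for maxmemory, not a substitute for checking used_memory under production traffic.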
For Redis: used_memory/maxmemory ratio (alert at 70%), keyspace_hits/keyspace_misses for hit rate, evicted_keys and expired_keys for eviction pressure, the offset gap between the primary and each replica in INFO replication for replica lag, mem_fragmentation_ratio for memory fragmentation, the slow log for commands over the slowlog-log-slower-than threshold (10ms by default), and connected_clients for connection pool pressure. For Memcached: get_hits/get_misses for hit rate, curr_items and evictions for cache pressure, bytes/limit_maxbytes for memory usage, and curr_connections, conn_yields, and listen_disabled_num for connection saturation.
Consistent hashing distributes keys across cache nodes so that adding or removing a node remaps only about K/N keys (where K is total keys and N is nodes), minimizing cache misses during topology changes. Memcached clients typically implement consistent hashing with virtual nodes (150-200 per physical node) for this purpose. Redis Cluster instead uses hash slots (16,384 total, calculated as CRC16(key) % 16384) and moves whole slots between nodes when nodes are added or removed. Both approaches limit the blast radius of node additions or failures, but Redis Cluster tracks slot ownership server-side and redirects clients, while Memcached requires a client-side implementation of the consistent hashing ring.
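A client-side ring with virtual nodes fits in a few dozen lines. The sketch below is a generic illustration, not the algorithm of any particular Memcached client: it uses MD5 purely as a well-distributed hash, and the class name, the `node#index` virtual node naming, and the default of 160 virtual nodes are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=160):
        self.vnodes = vnodes
        self._points = []  # sorted positions on the ring
        self._owner = {}   # position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        """Place `vnodes` points for this node on the ring."""
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        """Drop all points that belong to this node."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Return the node owning the first point clockwise from the key's hash."""
        if not self._points:
            raise ValueError("ring is empty")
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"key:{i}" for i in range(2000)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add("cache-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

Every key that moves ends up on the new node, and the rest keep their original owner, which is exactly the property that keeps hit rates stable when the pool changes.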
Key failure modes and their mitigations. OOM (cache returns errors, application falls back to DB): use maxmemory-policy allkeys-lru and alert at 70% memory. Cold start (cache restarts empty, DB overwhelmed): warm the cache before putting nodes back into rotation. Connection pool exhaustion (requests time out): size the pool appropriately and implement request queuing with a timeout. Replica lag (stale reads): monitor the replication offset gap and master_last_io_seconds_ago, and read from the primary for consistency-critical data. Redis fork blocking (the fork for an RDB save or AOF rewrite pauses command processing): schedule RDB saves during low-traffic windows and use AOF with appendfsync everysec. Single-threaded blocking (long commands block all others): avoid KEYS and SMEMBERS on large sets in production. Memcached lock contention (high CPU, low throughput under load): partition across more instances and use connection multiplexing.
LRU (Least Recently Used) evicts based on access recency: the most recently accessed key survives longest. LFU (Least Frequently Used) evicts based on access frequency: the least frequently accessed key is removed first. The difference matters for workloads where data is accessed frequently for a burst period, then becomes cold. With LRU, a single recent access can keep a key alive even if it had not been touched in days before that. With LFU and counter decay, a key accessed 1,000 times last week but not this week will be evicted before a key accessed 10 times this week. Use LFU when: your working set changes gradually (popular items stay popular), you want to preserve frequently-accessed data during traffic spikes, or you need to prevent cold data from being retained by one-time access events. Redis tunes LFU with two settings: lfu-decay-time (how many minutes must pass before a key's counter decays) and lfu-log-factor (how quickly the logarithmic counter saturates). New keys start with a small nonzero counter so they are not evicted immediately. Memcached does not support LFU.
Redis Pipeline batches multiple commands into a single network round trip: the client sends N commands, Redis processes them sequentially, and the client receives N responses. There is no atomicity guarantee (commands from other clients can interleave). Best for: improving throughput on bulk reads/writes when each command is independent. MGET/MSET are native batch commands: MGET key1 key2 key3 retrieves multiple keys in one command, which is more efficient than pipelining individual GETs because Redis processes it as a single operation. Lua scripts are atomic: Redis executes the entire script without interleaving other commands, making them safe for read-check-write patterns. Scripts are compiled once and cached, and EVALSHA runs a cached script by its SHA1 digest, so the real cost is the script's own runtime, which blocks every other client while it executes. Blocking commands cannot block inside a script. Trade-off summary: pipeline for throughput on independent ops, MGET/MSET for native batch efficiency, Lua for atomic multi-step operations.
Decision framework: start with Redis for new projects. If the team has operational experience with Memcached and the workload is purely simple string key-value, stay with Memcached. If you need data structures (sets, sorted sets, hashes, lists), atomic counters, pub/sub, persistence, or built-in clustering, move to Redis. Migration risks: data loss during transition if both caches are running simultaneously (cache keys diverge); increased operational complexity — Redis requires monitoring for fork times, AOF/RDB trade-offs, and memory fragmentation; connection pool sizing is different — Memcached multi-threaded model handles concurrency differently than Redis single-threaded model; application code changes — replacing memcached.get/set with redis.hgetall or redis.lrange is not a drop-in replacement. Mitigation: run both in parallel during transition, use feature flags to route traffic, implement thorough testing before cutting over, and plan for 2x operational monitoring during the transition period.
Conclusion
Memcached is simpler and more memory-efficient for pure string caching. Redis is more capable. For basic caching, they are comparable. But Redis’s data structures unlock patterns that would be painful or impossible with Memcached.
I default to Redis for new projects. The operational simplicity of having one system for caching, sessions, pub/sub, and rate limiting usually beats the memory efficiency gains of Memcached.
That said, if you are caching primarily string data and memory is tight, Memcached still earns its place.