Caching Strategies: Cache-Aside, Write-Through, and More

Master five caching strategies for production systems. Learn cache-aside vs write-through, avoid cache stampede, and scale with these patterns.

Reading time: 18 min


Caching works. When done right, it takes latency from hundreds of milliseconds down to microseconds and keeps your database from melting under load. Get it wrong, though, and you’ll spend weeks chasing stale data bugs that only show up at 3am during peak traffic.

This guide covers five caching strategies, when each one makes sense, and the real trade-offs between them.


Why Caching Matters

Most applications have data that changes rarely but gets hit constantly. User profiles, product listings, config values, session data. Without caching, every request for this data pounds the database, even when nothing’s changed since last Tuesday.

The numbers tell the story:

Approach                | Typical Latency | Requests per Second (per node)
Database query          | 5-50ms          | 1,000-10,000
Cache hit               | 0.1-1ms         | 100,000-1,000,000
Cache miss (with cache) | 5-51ms          | Same as database

Here’s the thing though: a cache that serves stale data is worse than no cache. And a cache that needs constant babysitting to stay valid is just overhead you don’t need.


The Five Caching Strategies

Cache-Aside (Lazy Loading)

This is what most people mean when they say “caching.” Your application checks the cache first, loads from the database on a miss, then populates the cache for next time.

sequenceDiagram
    participant Client
    participant Cache
    participant Database

    Client->>Cache: GET user:123
    Cache-->>Client: Cache miss

    Client->>Database: SELECT * FROM users WHERE id = 123
    Database-->>Client: User data

    Client->>Cache: SET user:123 (ttl=3600)
    Cache-->>Client: OK

    Client->>Cache: GET user:123
    Cache-->>Client: User data (cached)

Implementation:

def get_user(user_id):
    # Try cache first
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)

    if cached:
        return json.loads(cached)

    # Cache miss - load from database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # Populate cache with TTL
    redis.setex(cache_key, 3600, json.dumps(user))

    return user

Write operation:

def update_user(user_id, data):
    # Update database first
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)

    # Invalidate cache
    redis.delete(f"user:{user_id}")

Pros:

  • Simple to implement
  • Cache only contains data that’s actually requested
  • No cache stampede on startup (cold cache is expected)
  • Easy to reason about

Cons:

  • First request after cache miss is always slow
  • Cache and database can temporarily diverge (eventual consistency)
  • Three network round-trips on cache miss (check, read, write)

Use this when: reads dominate your workload and you can live with brief inconsistency.


Read-Through

Same idea as cache-aside, but the cache library handles the miss logic for you. You just ask the cache for data; it fetches from the database automatically if needed.

sequenceDiagram
    participant Client
    participant Cache
    participant Database

    Client->>Cache: GET user:123
    Cache->>Cache: Check in-memory store

    alt Cache miss
        Cache->>Database: SELECT * FROM users WHERE id = 123
        Database-->>Cache: User data
        Cache->>Cache: Store in memory
    end

    Cache-->>Client: User data

Implementation with Redis (a loader is registered once; callers only ever see get):

def make_read_through(redis, loader, ttl=3600):
    """Wrap Redis with a loader so callers never see the miss logic."""
    def get(key):
        cached = redis.get(key)
        if cached:
            return json.loads(cached)

        # The cache layer itself fetches from the source and stores the result
        value = loader(key)
        redis.setex(key, ttl, json.dumps(value))
        return value

    return get

get_user_cached = make_read_through(
    redis, lambda key: db.query("SELECT * FROM users WHERE id = ?", key.split(":")[1]))

Several caching libraries implement read-through natively - Spring's @Cacheable, Caffeine's LoadingCache, Go's groupcache:

// Using groupcache (read-through implementation)
var users = groupcache.NewGroup("users", 64<<20, groupcache.GetterFunc(
    func(ctx context.Context, key string, dest groupcache.Sink) error {
        // Called only on cache miss; groupcache coalesces concurrent
        // requests for the same key into a single load
        data, err := db.GetUserJSON(ctx, key) // hypothetical loader
        if err != nil {
            return err
        }
        return dest.SetBytes(data)
    }))

func getUser(ctx context.Context, userID int64) ([]byte, error) {
    key := fmt.Sprintf("user:%d", userID)
    var data []byte
    err := users.Get(ctx, key, groupcache.AllocatingByteSliceSink(&data))
    return data, err
}

Pros:

  • Cleaner application code
  • Concurrent misses for the same key can be coalesced into a single database fetch
  • Cache handles the fetch-and-store atomically

Cons:

  • Less control over cache logic
  • All caches must implement the same pattern
  • Can mask cache behavior from developers

Use this when: you want caching to be infrastructure, not application logic.


Write-Through

Every write goes to cache and database together. The operation doesn’t return until both succeed.

sequenceDiagram
    participant Client
    participant Cache
    participant Database

    Client->>Cache: SET user:123
    Cache->>Database: UPDATE users SET ...
    Database-->>Cache: OK

    Cache-->>Client: OK

Implementation:

def update_user(user_id, data):
    # Write to cache AND database
    cache_key = f"user:{user_id}"

    # Write to the database first
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)

    # Write-through to cache
    redis.setex(cache_key, 3600, json.dumps(data))

    return data

Pros:

  • Strong consistency between cache and database
  • Cache is always warm with latest data
  • No cache invalidation logic needed

Cons:

  • Write latency increases (two writes instead of one)
  • Write-heavy workloads churn the cache and can evict hot read data
  • Cache might be populated with data that’s never read

Use this when: consistency matters more than write speed and your writes are infrequent relative to reads.


Write-Behind (Write-Back)

You write to the cache and it batches the database writes to happen later, in the background.

sequenceDiagram
    participant Client
    participant Cache
    participant Database
    participant WriteBuffer

    Client->>Cache: SET user:123
    Cache->>WriteBuffer: Queue write
    Cache-->>Client: OK (fast)

    Note over WriteBuffer: Background worker

    WriteBuffer->>Database: Batch UPDATE
    Database-->>WriteBuffer: OK

Implementation:

import asyncio
from collections import deque

class WriteBehindCache:
    def __init__(self, redis, db, batch_size=100, flush_interval=1.0):
        self.redis = redis
        self.db = db
        self.write_queue = deque()
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        # Must be constructed inside a running event loop
        asyncio.create_task(self._flush_loop())

    async def set(self, key, value):
        self.redis.setex(key, 3600, json.dumps(value))
        self.write_queue.append((key, value))

        if len(self.write_queue) >= self.batch_size:
            await self._flush()

    async def _flush(self):
        if not self.write_queue:
            return

        batch = []
        while self.write_queue and len(batch) < self.batch_size:
            batch.append(self.write_queue.popleft())

        # Batch write to database
        for key, value in batch:
            self.db.execute(
                "UPDATE users SET ... WHERE id = ?",
                value['id'],
                value
            )

    async def _flush_loop(self):
        while True:
            await asyncio.sleep(self.flush_interval)
            await self._flush()

Pros:

  • Very low write latency
  • Batching reduces database load
  • Cache handles burst writes gracefully

Cons:

  • Risk of data loss if cache fails before flush
  • Complexity in handling partial failures
  • Cache and database can significantly diverge
  • Harder to debug (writes happen asynchronously)

Use this when: you’re collecting metrics or events and losing a few writes won’t ruin your day.


Refresh-Ahead (Proactive Caching)

The cache automatically refreshes entries before they expire. Popular data stays perpetually warm, so users never hit a cache miss.

sequenceDiagram
    participant Cache
    participant Database
    participant Refresher

    Note over Cache: Entry TTL = 300s

    Refresher->>Cache: Check TTL
    Refresher->>Database: SELECT (background)
    Refresher->>Cache: SET (reset TTL)

    loop Every 60 seconds
        Refresher->>Cache: Check popular entries
        Refresher->>Database: Refresh if TTL < 60s
    end

Implementation:

import time
from threading import Thread

class RefreshAheadCache:
    def __init__(self, redis, db, ttl=300, refresh_threshold=0.8):
        self.redis = redis
        self.db = db
        self.ttl = ttl
        self.refresh_threshold = refresh_threshold
        self.popular_keys = set()

        # Background refresher thread
        self.running = True
        self.thread = Thread(target=self._refresh_loop)
        self.thread.start()

    def track_access(self, key):
        """Track frequently accessed keys"""
        self.popular_keys.add(key)

    def get(self, key):
        value = self.redis.get(key)
        if value:
            self.track_access(key)
            return json.loads(value)
        return None

    def _should_refresh(self, key):
        """Check if key needs proactive refresh"""
        ttl = self.redis.ttl(key)
        return ttl > 0 and ttl < (self.ttl * self.refresh_threshold)

    def _refresh_loop(self):
        while self.running:
            for key in list(self.popular_keys):
                if self._should_refresh(key):
                    # Refresh in background
                    data = self.db.query(
                        "SELECT * FROM users WHERE id = ?",
                        key.split(':')[1]
                    )
                    self.redis.setex(key, self.ttl, json.dumps(data))

            time.sleep(10)  # Check every 10 seconds

Pros:

  • Eliminates cache miss latency for popular items
  • Users never wait for cache to repopulate
  • Smoother performance under varying loads

Cons:

  • Wasted resources refreshing items not actually needed
  • Complexity in tracking truly popular keys
  • Risk of refreshing stale data
  • Additional logic to determine refresh threshold

Use this when: you have a known set of hot data and read latency matters more than wasted cycles.


Choosing the Right Strategy

Strategy      | Read Performance  | Write Performance | Consistency | Complexity
Cache-Aside   | Good (after miss) | Best              | Eventual    | Low
Read-Through  | Good              | Same as DB        | Eventual    | Low
Write-Through | Good              | Good              | Strong      | Medium
Write-Behind  | Good              | Best              | Eventual    | High
Refresh-Ahead | Best              | Same              | Near-strong | High

How to decide

Which latency matters more, reads or writes?

  • Reads: cache-aside, read-through, or refresh-ahead
  • Writes: write-behind or write-through

How synced do cache and database need to be?

  • Tight consistency: write-through
  • Eventual is fine: cache-aside or write-behind

What happens if the cache goes down before flushing?

  • Can’t lose writes: write-through
  • A few lost writes are okay: write-behind

Is access predictable?

  • Unpredictable: cache-aside
  • Known hot set: refresh-ahead
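
The checklist above can be condensed into a rough helper. This is a sketch only — real decisions weigh more factors (team experience, existing infrastructure, data shape) than four booleans:

```python
def choose_strategy(read_heavy, strict_consistency, can_lose_writes, hot_set_known):
    """Map the decision checklist to a recommended strategy (rough heuristic)."""
    if strict_consistency:
        return "write-through"   # can't lose writes, cache must match DB
    if can_lose_writes and not read_heavy:
        return "write-behind"    # write latency matters, some loss is OK
    if hot_set_known:
        return "refresh-ahead"   # known hot set, read latency is king
    return "cache-aside"         # the sensible default
```

For example, a read-heavy workload with no strict consistency needs maps to cache-aside, the default recommended throughout this guide.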

Common Pitfalls

Cache Stampede

When a popular entry expires, multiple requests pile up trying to rebuild it simultaneously.

Fix with probabilistic early expiration:

import math
import random

def get_with_stampede_protection(key, fetch_func, delta=1.0, beta=1.0):
    """Probabilistic early expiration ("XFetch"). delta is the expected
    cost of recomputing the value, in seconds."""
    value = redis.get(key)
    ttl = redis.ttl(key)

    if value is not None:
        # As expiry approaches, an increasing fraction of requests
        # volunteer to rebuild the entry early, so the herd never forms
        if ttl > 0 and -delta * beta * math.log(random.random()) >= ttl:
            return fetch_func(key)
        return json.loads(value)

    # Entry genuinely missing or expired - normal rebuild
    return fetch_func(key)

Or use a lock:

def get_with_lock(key, fetch_func, lock_timeout=5):
    value = redis.get(key)
    if value:
        return json.loads(value)

    # Try to acquire lock
    lock = redis.lock(f"lock:{key}", timeout=lock_timeout)
    if lock.acquire(blocking=True, blocking_timeout=1):
        try:
            # Double-check after acquiring lock
            value = redis.get(key)
            if value:
                return json.loads(value)
            return fetch_func(key)
        finally:
            lock.release()

    # Didn't get lock - wait briefly for the holder to repopulate
    time.sleep(0.1)
    value = redis.get(key)
    return json.loads(value) if value else None

Cache Penetration

Requests for things that don’t exist skip the cache and hammer the database.

Fix by caching the null:

def get_user_cached(user_id):
    cache_key = f"user:{user_id}"

    # Check cache
    value = redis.get(cache_key)
    if value == "NULL":  # Cached null marker (assumes decode_responses=True)
        return None
    if value:
        return json.loads(value)

    # Load from database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    if user:
        redis.setex(cache_key, 3600, json.dumps(user))
    else:
        # Cache null for shorter period
        redis.setex(cache_key, 60, "NULL")

    return user

Production Failure Scenarios

Understanding what fails and how to recover is critical for production caching systems.

Failure                                 | Impact                                                        | Mitigation
Cache node goes down                    | All requests hit database directly; potential cascade failure | Connection pooling with automatic retry; circuit breaker pattern
Cache memory exhausted                  | Aggressive eviction kicks in; hit rate drops sharply          | Monitor memory usage; set appropriate maxmemory limits; alert at 70%
Network partition between app and cache | Requests hang or time out; database overload                  | Short socket timeouts (100-500ms); fail gracefully to database
Thundering herd on cache restart        | All clients hit database simultaneously                       | Pre-warm cache on restart; staggered TTLs; request coalescing
Stale data served after write           | User sees outdated content                                    | Write-through for consistency-critical data; invalidate cache on writes
Cache credential rotation               | Brief outage or authentication failures                       | Connection pooling with lazy reconnection; rotate during low-traffic windows
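
The circuit-breaker mitigation can be sketched as follows. Names and thresholds here are illustrative, not from any particular library — tune them to your traffic:

```python
import json
import time

class CacheCircuitBreaker:
    """Stop calling a failing cache so requests fall through to the
    database quickly instead of waiting on timeouts."""
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None = circuit closed (cache in use)

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: let a probe through after the cooldown period
        return time.time() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.time()

def get_user(user_id, breaker, redis, db):
    if breaker.allow():
        try:
            cached = redis.get(f"user:{user_id}")
            breaker.record_success()
            if cached:
                return json.loads(cached)
        except ConnectionError:
            breaker.record_failure()  # circuit opens after repeated failures
    # Cache down or missed - serve from the source of truth
    return db.query("SELECT * FROM users WHERE id = ?", user_id)
```

When the circuit is open, every request skips the cache entirely, which keeps p99 latency bounded during a cache outage instead of adding a timeout to every call.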

Observability Checklist

Monitor these metrics and set up alerts for production cache health.

Metrics to Track

  • Hit Rate: hits / (hits + misses) - should stay above 80-90% for well-tuned caches
  • Memory Usage: used_memory / maxmemory - alert at 70%, critical at 80%
  • Eviction Count: evicted_keys - indicates memory pressure
  • Connection Count: connected_clients - sudden drops indicate connection issues
  • Command Latency: P50, P95, P99 for GET/SET operations
  • Replication Lag: For replicated setups, lag should stay below 100ms
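
With redis-py, most of these metrics come straight out of INFO — a minimal sketch (field names follow Redis's INFO "stats" and "memory" sections):

```python
def cache_health(redis_client):
    """Pull core cache-health metrics from Redis INFO sections."""
    stats = redis_client.info("stats")
    memory = redis_client.info("memory")

    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses

    return {
        "hit_rate": hits / total if total else None,  # target: > 0.8
        "evicted_keys": stats["evicted_keys"],        # nonzero = memory pressure
        "used_memory": memory["used_memory"],
        "maxmemory": memory["maxmemory"],             # 0 means no limit configured
    }
```

Feed the result into your metrics pipeline on a fixed interval rather than computing it per request.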

Logs to Capture

# Log cache operations for debugging
import structlog
logger = structlog.get_logger()

def get_user(user_id):
    cache_key = f"user:{user_id}"
    start = time.time()

    cached = redis.get(cache_key)
    if cached:
        logger.info("cache_hit", key=cache_key, latency_ms=(time.time() - start) * 1000)
        return json.loads(cached)

    # Cache miss - this should be rare in production
    logger.warning("cache_miss", key=cache_key)
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(cache_key, 3600, json.dumps(user))

    return user

Alert Rules

# Prometheus alert rules for Redis
- alert: CacheHitRateLow
  expr: rate(redis_keyspace_hits_total[5m]) / (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m])) < 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Cache hit rate below 80%"

- alert: CacheMemoryExhausted
  expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.8
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Cache memory above 80% capacity"

Security Checklist

Cache security is often overlooked until a breach happens.

  • Never expose Redis/Memcached directly to the internet - Bind to localhost or private network only
  • Use authentication - Redis requirepass or Memcached SASL authentication
  • Enable TLS - For connections crossing network boundaries
  • Validate key namespaces - Use prefixes like app:env:table: to prevent key collisions
  • Sanitize cache keys - User input should never become cache keys without validation
  • Implement rate limiting - Prevent cache exhaustion attacks
  • Audit cache access - Log who accessed what, especially for sensitive data
  • Never cache sensitive data - PII, passwords, tokens, payment info should never enter the cache

# Redis secure configuration
bind 127.0.0.1 -::1
requirepass your-strong-password-here
tls-replication yes
tls-auth-clients no

Common Pitfalls / Anti-Patterns

These mistakes catch teams at scale. Avoid them.

1. Caching Without Measuring

Adding Redis doesn’t automatically make things faster. Measure before and after.

# BAD: Blind caching
def get_user(user_id):
    return redis.get(f"user:{user_id}") or db.query(...)

# GOOD: Measure first, cache second
def get_user(user_id):
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached:
        metrics.increment("cache.hit")
        return json.loads(cached)

    metrics.increment("cache.miss")
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(cache_key, 3600, json.dumps(user))
    return user

2. Using Cache as Primary Store

The cache is not the source of truth. The database is.

# BAD: Cache as primary - data loss on cache failure
def update_user(user_id, data):
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))  # Database not updated!
    return data

# GOOD: Write to database first, invalidate cache
def update_user(user_id, data):
    db.execute("UPDATE users SET ... WHERE id = ?", user_id, data)
    redis.delete(f"user:{user_id}")  # Invalidate, don't update
    return data

3. No TTLs on Data

Unlimited TTLs mean stale data and unbounded memory growth.

# BAD: No expiration
redis.set(f"user:{user_id}", json.dumps(user))

# GOOD: Always set reasonable TTLs
redis.setex(f"user:{user_id}", 3600, json.dumps(user))  # 1 hour default
redis.setex(f"session:{session_id}", 86400, json.dumps(session))  # 24 hours for sessions

4. Ignoring Cache Stampede

The database can handle one request rebuilding a cache entry. It cannot handle 1000 simultaneous requests.

# BAD: No stampede protection
def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # This runs for ALL concurrent requests on cache miss
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user

# GOOD: Lock-based stampede protection
def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    lock = redis.lock(f"lock:user:{user_id}", timeout=5)
    if lock.acquire(blocking=True, blocking_timeout=1):
        try:
            # Double-check after acquiring lock
            cached = redis.get(f"user:{user_id}")
            if cached:
                return json.loads(cached)
            user = db.query("SELECT * FROM users WHERE id = ?", user_id)
            redis.setex(f"user:{user_id}", 3600, json.dumps(user))
            return user
        finally:
            lock.release()

    # Didn't get lock - wait briefly for the holder to repopulate
    time.sleep(0.1)
    cached = redis.get(f"user:{user_id}")
    return json.loads(cached) if cached else None

5. Cache Key Collisions

Generic key names cause collisions in shared cache infrastructure.

# BAD: Generic keys
redis.set("user", user_data)  # Collides with other "user" keys
redis.set("config", config)   # Collides system-wide

# GOOD: Namespaced keys
redis.setex(f"myapp:prod:user:{user_id}", 3600, json.dumps(user_data))
redis.setex(f"myapp:prod:config:{env}", 86400, json.dumps(config))

Quick Recap

Key Bullets

  • Cache-aside is the default strategy for most read-heavy workloads
  • Write-through ensures strong consistency but increases write latency
  • Write-behind batches writes for performance but risks data loss
  • Refresh-ahead eliminates misses for popular items but adds complexity
  • Always implement stampede protection when cache misses could cascade
  • Monitor hit rate, memory usage, and eviction counts continuously

Copy/Paste Checklist

# Cache-Aside Implementation Checklist
- [ ] Check cache first (redis.get)
- [ ] On miss, query database
- [ ] Populate cache with TTL (redis.setex)
- [ ] On write, invalidate cache (redis.delete), don't update
- [ ] Implement stampede protection with locks
- [ ] Cache null values with short TTL to prevent penetration
- [ ] Monitor hit rate - should be >80%
- [ ] Set appropriate TTLs based on data freshness requirements
- [ ] Log cache hits and misses for observability
- [ ] Use circuit breaker for cache failures

# TTL Selection Guide
- [ ] User profiles: 15-60 minutes
- [ ] Session data: 24 hours
- [ ] API responses: 5-30 minutes
- [ ] Static config: 1-24 hours
- [ ] Product catalog: 1-24 hours
- [ ] Real-time data: No caching or very short TTL (30-60 seconds)


Capacity Estimation: Cache Size vs Hit Rate

The relationship between cache size and hit rate is not linear. Adding more cache memory gives diminishing returns beyond a certain point.

The working set model: your hit rate depends on how much of your frequently-accessed data fits in cache. If 80% of your requests hit 20% of your data, and that 20% fits in cache, you can achieve 95%+ hit rate with relatively small cache. If access is uniformly distributed, even a large cache provides modest hit rates.

The formula for estimating required cache size: working_set_bytes = unique_keys_per_second * avg_value_size * avg_ttl_seconds. If you have 10,000 requests per second, average value is 1KB, and you want a 5-minute TTL window, your working set is 10,000 × 1,000 × 300 = 3GB minimum for a fully-utilized cache before evictions. In practice, you need 1.5-2x that because LRU/LFU policies do not perfectly track the working set.

The hit rate curve: start at 0% hit rate with no cache, rapid climb as cache grows to cover the hot working set, then diminishing returns as cache size exceeds working set. Plot your hit rate against cache size to find the knee of the curve — the point where adding more cache stops helping significantly. This is your target cache size.

For cache-aside specifically, the miss penalty matters more than raw hit rate. A cache miss does a full database round-trip. If your database latency is 10ms and cache latency is 0.5ms, each miss costs 9.5ms extra. At 99% hit rate, only 1% of requests pay the miss penalty. At 95% hit rate, 5% pay it — a 5x increase in slow queries.
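
The working-set and miss-penalty arithmetic above is easy to script. A sketch using this section's numbers (0.5ms cache latency and 10ms database latency are the assumed defaults):

```python
def working_set_bytes(keys_per_second, avg_value_bytes, ttl_seconds, headroom=1.5):
    """Estimated cache size: working set plus headroom for imperfect LRU tracking."""
    return keys_per_second * avg_value_bytes * ttl_seconds * headroom

def avg_latency_ms(hit_rate, cache_ms=0.5, db_ms=10.0):
    """Expected request latency: a miss pays the cache check plus the DB trip."""
    return hit_rate * cache_ms + (1 - hit_rate) * (cache_ms + db_ms)

# The numbers from the text: 10,000 keys/s, 1 KB values, 300 s TTL -> 3 GB minimum
minimum = working_set_bytes(10_000, 1_000, 300, headroom=1.0)
```

Running avg_latency_ms for hit rates of 0.99 vs 0.95 makes the 5x-more-slow-queries point concrete: the 9.5ms miss penalty lands on 1% of requests in one case and 5% in the other.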

Real-World Case Study: YouTube’s Cache Hierarchy

YouTube’s caching infrastructure is one of the most studied in the industry. Their approach uses multiple cache layers: L1 (in-memory, per-machine), L2 (distributed cache), and CDN at the edge. Understanding their architecture explains why most companies settle on a two-tier approach.

YouTube’s L1 cache is a small in-memory cache on each application server. It handles the most frequently accessed items — popular videos, trending content. L1 hit rate alone is often 50-60% because many users on the same machine access the same popular content.

The L2 distributed cache (originally Memcached, later moved to custom infrastructure) handles cache misses from L1. L2 is sharded across many machines to provide petabyte-scale capacity. Cache misses from L2 go to storage (BigTable).

The CDN handles the edge, serving popular content from points of presence close to users. YouTube’s CDN cache hit rate is over 90% for video streaming — once a video becomes popular, it propagates to CDN PoPs and subsequent requests rarely hit origin.

The lesson: YouTube does not rely on a single cache tier. They use L1 to handle the ultra-hot set with extremely low latency, L2 for the warm cache, and CDN for the long tail of popular-but-not-ultra-popular content. Most companies should design for two tiers (local cache + distributed cache) before adding a CDN.
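
A two-tier read path of the kind described here can be sketched as follows — a plain dict stands in for the per-machine L1, and any client with get/setex (such as Redis) serves as L2:

```python
import time

class TwoTierCache:
    """L1: small in-process cache with a short TTL. L2: shared distributed cache.
    Reads fall through L1 -> L2 -> loader, populating each tier on the way back."""
    def __init__(self, l2, loader, l1_ttl=5.0, l2_ttl=300):
        self.l1 = {}          # key -> (value, expires_at)
        self.l2 = l2          # object with get/setex, e.g. a Redis client
        self.loader = loader  # fetches from the source of truth
        self.l1_ttl = l1_ttl
        self.l2_ttl = l2_ttl

    def get(self, key):
        entry = self.l1.get(key)
        if entry and entry[1] > time.time():
            return entry[0]                       # L1 hit: no network at all

        value = self.l2.get(key)
        if value is None:
            value = self.loader(key)              # missed both tiers
            self.l2.setex(key, self.l2_ttl, value)

        self.l1[key] = (value, time.time() + self.l1_ttl)
        return value
```

The short L1 TTL is the consistency knob: a stale value lives on one machine for at most l1_ttl seconds after the L2 copy changes.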

Real-World Case Study: Twitter’s Cache Warming Strategy

Twitter has a unique caching challenge: events (tweets, likes, follows) have a short window of high read traffic, then traffic drops off a cliff. A tweet from a celebrity gets millions of reads in the first hour, then readership drops to hundreds per day.

Twitter’s solution is aggressive cache warming: when a tweet is published, Twitter pushes it into the timelines of active followers’ caches rather than waiting for cache misses. This is the fanout-on-write pattern — write to caches at publish time rather than computing at read time.

The tradeoff is write amplification. Every tweet from a celebrity with 10 million followers requires 10 million cache writes. Twitter manages this by limiting fanout to active users only and using hybrid push/pull for lower-activity accounts. Inactive users’ timelines are computed on read from the tweet author’s tweet store.

The operational lesson: cache warming trades write amplification for read latency. For content with rapid decay in read traffic (news, social posts, live events), warming the cache at write time reduces read latency at the cost of higher write overhead. For evergreen content, cache-aside with long TTLs is simpler and more efficient.
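
A minimal fanout-on-write sketch under the simplifications described above — active followers only, a fanout cap, and a plain dict standing in for per-user timeline caches (all names here are hypothetical, not Twitter's actual system):

```python
def publish_post(post, followers, timeline_caches, active_cutoff, fanout_limit=10_000):
    """Push a new post into active followers' cached timelines at write time.
    followers: iterable of (follower_id, last_seen) pairs.
    Inactive followers are skipped; their timelines are computed on read
    instead (the hybrid push/pull approach)."""
    pushed = 0
    for follower_id, last_seen in followers:
        if last_seen < active_cutoff or pushed >= fanout_limit:
            continue  # pull-on-read path for inactive users or past the cap
        timeline_caches.setdefault(follower_id, []).insert(0, post)
        pushed += 1
    return pushed
```

The returned count is the write amplification for this post: one publish turned into that many cache writes, which is exactly the cost the activity cutoff and fanout cap are there to bound.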

Conclusion

There is no best strategy. Cache-aside is the default because it covers the most cases with the least complexity. But you’ll encounter situations where write-through or refresh-ahead fits better.

Start simple. Measure your hit rate. Add complexity only when data tells you to.

Related Posts

Cache Eviction Policies: LRU, LFU, FIFO, and More Explained

Learn LRU, LFU, FIFO, and TTL eviction policies. Understand trade-offs with real-world performance implications for caching.

#caching #algorithms #system-design

Cache Patterns: Stampede, Thundering Herd, Tiered Caching

Learn advanced cache patterns for production systems. Solve cache stampede, implement cache warming, and design tiered caching architectures.

#caching #patterns #system-design

Distributed Caching: Multi-Node Cache Clusters

Scale caching across multiple nodes. Learn about cache clusters, consistency models, session stores, and cache coherence patterns.

#distributed-systems #caching #scalability