CDN Deep Dive: Content Delivery Networks Explained

A comprehensive guide to CDNs — how they work, PoP architecture, anycast routing, cache invalidation strategies, SSL/TLS termination, and real-world performance trade-offs.

Reading time: 41 min · Author: Geek Workbench

Introduction

A CDN sits between your server and your users. It caches your content at edge locations worldwide, so users fetch it from a nearby server instead of one halfway across the world. The result: faster page loads, lower bandwidth costs at your origin, and protection against traffic spikes.

But CDNs are more than just static file caches. Modern CDNs handle request routing, DDoS protection, image optimization, and edge computing.


Core Concepts

The Basic Idea

graph LR
    A[User in Tokyo] --> B[CDN Edge<br/>Tokyo]
    A -->|Without CDN| C[Origin Server<br/>Virginia, USA]

    B --> D[Cache HIT<br/>Fast response]
    C --> E[No edge cache<br/>Slow response]

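Without CDN: a user in Tokyo fetches from Virginia. Each round trip takes roughly 150-200ms, and a new HTTPS connection needs several round trips before the first byte arrives.

With CDN: the same user fetches from the Tokyo edge, where a round trip takes a few milliseconds.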

Points of Presence (PoPs)

CDNs maintain a network of servers called Points of Presence (PoPs). Each PoP has edge servers that serve cached content and handle requests.

A large CDN like Cloudflare or Fastly runs dozens to hundreds of PoPs globally. When a user makes a request, the CDN steers it to a nearby PoP, either with anycast routing or with DNS answers tuned to the user's location.

graph TD
    A[User Request] --> B[DNS Resolution]
    B --> C[Anycast Routing]
    C --> D[Nearest PoP]
    D --> E{Content Cached?}
    E -->|Yes| F[Return from cache]
    E -->|No| G[Fetch from origin]
    G --> H[Cache at PoP]
    H --> F

Anycast Routing

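CDNs like Cloudflare use anycast for routing: many PoPs announce the same IP prefix over BGP, and each user's traffic follows whichever path their ISP's routers consider best. That is usually, but not always, the geographically closest PoP, because BGP optimizes for routing policy and AS-path length rather than physical distance.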

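# How anycast works (simplified)
# Every PoP announces the same prefix via BGP, e.g. 104.16.100.0/24
# (announcements are prefixes; /24 is the smallest most networks accept)
# The user's ISP routers see several paths and pick the best one by BGP policy,
# usually the shortest AS path, not the shortest physical distance

# BGP view (illustrative AS number)
# Tokyo PoP:    104.16.100.0/24 originated by AS12345
# Virginia PoP: 104.16.100.0/24 originated by AS12345
# A user in Tokyo is usually routed to the Tokyo PoP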
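To make the "best path, not shortest distance" point concrete, here is a toy model of BGP best-path selection in Python. It is a minimal sketch that assumes only two tie-breakers, higher local preference and then shorter AS path; real BGP evaluates several more (origin type, MED, eBGP over iBGP, IGP cost). The PoP names and private-range AS numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One path to the same anycast prefix, as seen by the user's ISP router."""
    pop: str               # CDN PoP that originated this announcement (hypothetical)
    as_path: tuple         # ASes the announcement traversed
    local_pref: int = 100  # ISP policy knob; higher wins


def best_path(routes):
    """Simplified BGP decision: highest local_pref first, then shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


# Both PoPs announce the same prefix from the same origin AS.
routes = [
    Route("Tokyo", as_path=(64500, 64512)),
    Route("Virginia", as_path=(64501, 64502, 64512)),
]
print(best_path(routes).pop)  # Tokyo: shorter AS path

# An ISP peering preference can override AS-path length:
routes.append(Route("Singapore", as_path=(64503, 64504, 64512), local_pref=200))
print(best_path(routes).pop)  # Singapore: local_pref is compared first
```

The second case shows how anycast can land a Tokyo user in Singapore: the ISP's own policy outranks AS-path length.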

Topic-Specific Deep Dives

Cache Headers

CDNs respect HTTP cache headers. Getting these right is essential for CDN performance.

Cache-Control

# Don't cache at all (private content)
Cache-Control: private, no-store

# Cache everywhere for 1 hour
Cache-Control: public, max-age=3600

# Cache at the edge for 1 day; browsers must revalidate before every reuse
Cache-Control: public, s-maxage=86400, max-age=0

# Stale-while-revalidate: after max-age expires, serve stale for up to 60s while refetching in the background
Cache-Control: public, max-age=3600, stale-while-revalidate=60

# Immutable: content never changes (perfect for versioned assets)
Cache-Control: public, max-age=31536000, immutable
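A CDN decides how long it may reuse a response from these directives. The sketch below is a simplified reading of the shared-cache rules in RFC 9111 (no-store and private block shared caching, s-maxage overrides max-age); it ignores heuristic freshness, Expires, and the stale-* extensions, and the function name is hypothetical.

```python
def shared_cache_ttl(cache_control: str) -> int:
    """Seconds a shared cache (CDN) may reuse a response without revalidating.

    Returns 0 when the response must not be stored by a shared cache or must
    be revalidated before every reuse.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')

    # private (even private="field") is treated conservatively as "do not share"
    if "no-store" in directives or "private" in directives:
        return 0
    if "no-cache" in directives:
        return 0
    # s-maxage applies only to shared caches and wins over max-age
    for key in ("s-maxage", "max-age"):
        if directives.get(key, "").isdigit():
            return int(directives[key])
    return 0  # no explicit lifetime; real caches may apply heuristics instead


print(shared_cache_ttl("public, s-maxage=86400, max-age=0"))  # 86400
print(shared_cache_ttl("private, no-store"))                  # 0
```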

When to Use What

| Header | Use Case |
|---|---|
| private, no-store | User-specific data, credentials, payment info |
| public, max-age=3600 | API responses that change hourly |
| public, s-maxage=86400, max-age=0 | Static content cached at the edge, revalidated by browsers |
| public, max-age=31536000, immutable | Versioned assets (JS bundles, images with content hashes) |

Vary Header

The Vary header tells the CDN that a response differs depending on the values of certain request headers.

# Cache different versions based on Accept-Encoding
Vary: Accept-Encoding

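# Cache a separate copy per Authorization value (effectively one entry per user;
# for most authenticated content, Cache-Control: private, no-store is safer)
Vary: Authorization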

# Common combination
Vary: Accept-Encoding, Accept-Language

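Warning: every distinct combination of values for the headers listed in Vary creates a separate cache entry. Varying on high-cardinality headers such as User-Agent or Cookie fragments the cache and drives the hit ratio down.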
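The fragmentation warning follows directly from how the cache key is built. This sketch assumes a hypothetical key format (URL plus each Vary-listed request header, hashed); real CDNs normalize headers such as Accept-Encoding before keying, but the multiplication effect is the same.

```python
import hashlib


def cache_key(url: str, request_headers: dict, vary: str) -> str:
    """Cache key = URL + the values of every request header named in Vary."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    parts = [url]
    for name in sorted(h.strip().lower() for h in vary.split(",") if h.strip()):
        parts.append(f"{name}={lowered.get(name, '')}")
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()[:16]


# Same URL, different Accept-Language: two separate cache entries.
vary = "Accept-Encoding, Accept-Language"
a = cache_key("/home", {"Accept-Encoding": "br", "Accept-Language": "en"}, vary)
b = cache_key("/home", {"Accept-Encoding": "br", "Accept-Language": "ja"}, vary)
print(a == b)  # False

# Vary: User-Agent multiplies entries by the number of distinct UA strings.
agents = [f"Mozilla/5.0 build-{i}" for i in range(1000)]
keys = {cache_key("/home", {"User-Agent": ua}, "User-Agent") for ua in agents}
print(len(keys))  # 1000
```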


Cache Invalidation

Sometimes you need to force the CDN to discard cached content.

Purge Methods

# Cloudflare API: purge specific URLs
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE/purge_cache" \
  -H "Authorization: Bearer ***" \
  -H "Content-Type: application/json" \
  --data '{"files":["https://example.com/style.css"]}'

# Purge everything
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE/purge_cache" \
  -H "Authorization: Bearer ***" \
  -H "Content-Type: application/json" \
  --data '{"purge_everything": true}'
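For deploy scripts, the same purge calls can be built in Python with only the standard library. This sketch constructs the request without sending it; ZONE_ID and TOKEN are placeholders, and the body shapes mirror the curl examples above.

```python
import json
import urllib.request

API = "https://api.cloudflare.com/client/v4/zones/{zone}/purge_cache"


def build_purge_request(zone_id, token, *, files=None, tags=None, everything=False):
    """Build (but do not send) a purge_cache request for exactly one purge mode."""
    modes = [m for m in (files, tags) if m] + ([True] if everything else [])
    if len(modes) != 1:
        raise ValueError("choose exactly one of files, tags, everything")
    if files:
        body = {"files": list(files)}
    elif tags:
        body = {"tags": list(tags)}
    else:
        body = {"purge_everything": True}
    return urllib.request.Request(
        API.format(zone=zone_id),
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = build_purge_request("ZONE_ID", "TOKEN", files=["https://example.com/style.css"])
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```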

Cache Tags (Surrogate Keys)

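Some CDNs support cache tagging for granular invalidation. Cloudflare reads tags from the Cache-Tag response header; Fastly calls them surrogate keys and reads the Surrogate-Key header.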

# Set cache tag on response
Cache-Tag: product, product-123, category-electronics

# Purge by tag
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE/purge_cache" \
  -H "Authorization: Bearer ***" \
  -H "Content-Type: application/json" \
  --data '{"tags":["product-123"]}'

This lets you invalidate all pages containing product 123 without purging the entire cache.
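Conceptually, tag purging needs a reverse index from each tag to the cached URLs that carry it. The toy in-memory cache below is a hypothetical illustration of that index, not how any particular CDN stores it.

```python
from collections import defaultdict


class TaggedCache:
    """Toy edge cache with a reverse index from tag to cached URLs."""

    def __init__(self):
        self.entries = {}               # url -> body
        self.by_tag = defaultdict(set)  # tag -> {url, ...}

    def store(self, url, body, cache_tag_header=""):
        self.entries[url] = body
        for tag in cache_tag_header.split(","):
            if tag.strip():
                self.by_tag[tag.strip()].add(url)

    def purge_tag(self, tag):
        """Drop every entry carrying `tag` and return the purged URLs.

        Other tag sets may still list purged URLs; a real implementation
        would clean those up too.
        """
        urls = self.by_tag.pop(tag, set())
        for url in urls:
            self.entries.pop(url, None)
        return sorted(urls)


cache = TaggedCache()
cache.store("/products/123", "<html>...</html>", "product, product-123, category-electronics")
cache.store("/category/electronics", "<html>...</html>", "category-electronics, product-123")
cache.store("/products/456", "<html>...</html>", "product, product-456, category-electronics")
print(cache.purge_tag("product-123"))  # ['/category/electronics', '/products/123']
print(sorted(cache.entries))           # ['/products/456']
```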


CDN Configuration for Static Sites

Typical Configuration

# HTML: short cache, revalidate
Cache-Control: public, max-age=60, stale-while-revalidate=300

# Static assets with hashes (immutable)
Cache-Control: public, max-age=31536000, immutable

# Images: medium cache
Cache-Control: public, max-age=86400, stale-while-revalidate=3600
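Build tools or origin servers often pick these headers from the file path. This hypothetical helper mirrors the three policies above, treating a 6+ character hex segment before the extension as a content hash (an assumption about your bundler's naming) and falling back to no-cache for anything unrecognized.

```python
import re

# Assumed naming: name.<hex hash>.ext for bundled, versioned assets
HASHED_ASSET = re.compile(r"\.[0-9a-f]{6,}\.(?:js|css|woff2?|png|jpe?g|webp|avif|svg)$")
IMAGE = re.compile(r"\.(?:png|jpe?g|gif|webp|avif|svg)$")


def cache_control_for(path: str) -> str:
    """Map a request path to one of the Cache-Control policies above."""
    if HASHED_ASSET.search(path):
        return "public, max-age=31536000, immutable"
    if IMAGE.search(path):
        return "public, max-age=86400, stale-while-revalidate=3600"
    if path.endswith(".html") or path.endswith("/"):
        return "public, max-age=60, stale-while-revalidate=300"
    # Unrecognized: let caches store it but revalidate before each reuse
    return "no-cache"


for p in ("/assets/app.a3f5d8.js", "/img/hero.jpg", "/blog/post/", "/app.js"):
    print(p, "->", cache_control_for(p))
```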

Pretty URLs vs File Paths

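Static site generators produce file paths like /blog/post/index.html, but the same page is often reachable as /blog/post/, /blog/post, and /blog/post/index.html. Each URL is a separate cache key, so purging one leaves the others stale. Pick one canonical form and redirect the rest to it.

For assets, use an origin-pull CDN: on a cache miss the CDN fetches the file from your origin and caches the result, and your HTML points to the CDN URLs.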

<!-- Instead of -->
<img src="/assets/image.png" />

<!-- Use CDN URL -->
<img src="https://cdn.example.com/assets/image.png" />
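Canonicalizing the page URLs that static site generators produce takes a small helper. This sketch assumes the trailing-slash form is canonical and that any last path segment containing a dot is a file; the edge or origin would 301-redirect other forms to its output.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def canonical_page_url(url: str) -> str:
    """Map /blog/post, /blog/post/ and /blog/post/index.html to one cache key."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    elif not path.endswith("/") and "." not in posixpath.basename(path):
        path += "/"  # directory-style page without its trailing slash
    # Fragments never reach the server, so drop them
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


for u in ("https://example.com/blog/post",
          "https://example.com/blog/post/index.html",
          "https://example.com/assets/app.css"):
    print(canonical_page_url(u))
```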

Cloudflare Configuration

Cloudflare is one of the most widely used CDNs. Here’s how to configure its caching.

Page Rules

Page Rules control caching behavior per URL pattern.

# Rule 1: Cache everything static
# Pattern: *example.com/static/*
# Settings:
#   - Cache Level: Cache Everything
#   - Edge Cache TTL: 1 month
#   - Browser Cache TTL: 1 year

# Rule 2: Don't cache API
# Pattern: *example.com/api/*
# Settings:
#   - Cache Level: Bypass

# Rule 3: Cache HTML at the edge for a short time
# Pattern: *example.com/blog/*
# Settings:
#   - Cache Level: Cache Everything (Standard does not cache HTML)
#   - Edge Cache TTL: 1 hour
#   - Browser Cache TTL: 30 minutes
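Cloudflare applies only the highest-priority Page Rule that matches a URL, so ordering matters. The sketch below models that first-match behavior with Python's fnmatch; the settings dictionaries are a hypothetical encoding of the three rules, and fnmatch treats [ and ? as special where Cloudflare patterns do not.

```python
from fnmatch import fnmatchcase

DAY = 86400

# Rules in priority order (hypothetical encoding of the rules above)
PAGE_RULES = [
    ("*example.com/static/*", {"cache_level": "cache_everything",
                               "edge_ttl": 30 * DAY, "browser_ttl": 365 * DAY}),
    ("*example.com/api/*", {"cache_level": "bypass"}),
    ("*example.com/blog/*", {"cache_level": "cache_everything",
                             "edge_ttl": 3600, "browser_ttl": 1800}),
]


def matching_rule(url: str):
    """Return the settings of the first matching rule, or None if none match."""
    for pattern, settings in PAGE_RULES:
        if fnmatchcase(url, pattern):  # '*' matches any run of characters, including '/'
            return settings
    return None


print(matching_rule("https://example.com/api/users"))
print(matching_rule("https://example.com/about"))
```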

Caching Level

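# No Query String: serves from cache only when the request has no query string
# Ignore Query String: serves the same cached copy regardless of query string
# Standard (default): caches static file types; the query string is part of the cache key
# Cache Everything (Page Rule setting): also caches HTML and other content types
# Bypass (Page Rule setting): never cache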

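Argo Smart Routing and Tiered Cache

Argo Smart Routing does not add a cache layer: it sends cache misses across Cloudflare's backbone along the fastest measured path to your origin. The hit-rate improvement comes from Tiered Cache, where lower-tier PoPs fetch from an upper-tier PoP instead of going straight to origin.

# Enable in the Cloudflare dashboard (menu names change over time):
# Traffic -> Argo Smart Routing
# Caching -> Tiered Cache

# Smart Routing lowers origin-fetch latency on misses;
# Tiered Cache raises the hit ratio and cuts origin requests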

Performance Impact

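Here’s what a CDN typically does for page load times on cache hits (illustrative ranges; real numbers depend on geography, payload, and hit ratio):

| Metric | Without CDN | With CDN |
|---|---|---|
| TTFB | 200-500ms | 5-50ms |
| Page load | 3-5s | 1-2s |
| Origin egress | Every request | Cache misses only |

TTFB (Time To First Byte) drops dramatically on a cache hit because the edge answers from its own memory or disk without contacting origin.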


Edge Computing

Modern CDNs support edge computing: running code at edge locations instead of just caching.

Cloudflare Workers

// Cloudflare Worker (service-worker syntax): runs at the edge
addEventListener("fetch", (event) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  const url = new URL(request.url);

  // API requests: proxy to origin and add a CORS header to the response
  if (url.pathname.startsWith("/api/")) {
    const response = await fetch(request);
    const headers = new Headers(response.headers);
    // "*" lets any site read the response; restrict it for credentialed APIs
    headers.set("Access-Control-Allow-Origin", "*");
    return new Response(response.body, {
      status: response.status,
      statusText: response.statusText,
      headers: headers,
    });
  }

  // Everything else: pass through to origin
  return fetch(request);
}

Edge Use Cases

| Use Case | Description |
|---|---|
| A/B testing | Route users to variants at edge |
| Geolocation | Serve different content based on user location |
| Authentication | Validate tokens before hitting origin |
| Rate limiting | Limit requests at edge, before they reach origin |
| Personalization | Modify content per user at edge |
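For A/B testing at the edge, assignment has to be consistent across requests and across PoPs without shared state; hashing a stable identifier achieves that. This sketch assumes a user ID taken from a cookie and a hypothetical experiment name. The variant must also be part of the cache key (or the response marked private) so one user's variant is never served to another.

```python
import hashlib


def ab_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically assign a user to 'A' or 'B' for one experiment.

    Every PoP computes the same answer from the same inputs, so no shared
    state is needed and assignment stays stable across requests.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "A" if bucket < split else "B"


print(ab_variant("user-42", "checkout-redesign"))
```

Including the experiment name in the hash keeps a user's bucket in one experiment independent of their bucket in another.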

Capacity Estimation: Edge Cache Sizing, PoP Count, and Bandwidth Planning

Sizing a CDN deployment requires understanding how much cache lives at the edge and how much origin bandwidth you actually need.

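Edge cache sizing formula:

edge_cache_per_PoP = disk_size_per_edge_node × number_of_edge_nodes_per_PoP
total_edge_capacity = edge_cache_per_PoP × number_of_PoPs

Raw capacity overstates how much distinct content the CDN can hold: popular objects are replicated in every PoP, so the unique working set that stays hot is closer to edge_cache_per_PoP. Hit ratio applies to traffic, not to disk:

served_from_cache = total_traffic × cache_hit_ratio
fetched_from_origin = total_traffic × (1 - cache_hit_ratio)

For a CDN with 100 PoPs, each having 10 edge nodes with 1TB SSD each:

  • Per PoP: 10TB
  • Total: 1PB raw capacity
  • If users pull 100TB in a day at a 90% hit ratio, about 90TB is served from cache and 10TB is fetched from origin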

Origin bandwidth planning:

origin_bandwidth = total_bandwidth × (1 - byte_hit_ratio)
peak_origin_bandwidth = origin_bandwidth × peak_factor

If your site serves 10Gbps total with an 80% hit ratio, 2Gbps reaches origin. With a 3× peak factor, origin must handle 6Gbps, which a typical origin server with a 10GbE NIC can manage. For 50Gbps total at a 70% hit ratio, origin sees 15Gbps on average and about 45Gbps at a 3× peak, which requires load balancers and multiple origin servers.
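The arithmetic above fits in a small helper. This sketch assumes the hit ratio is measured in bytes; a request hit ratio can look healthier than the byte ratio when the misses are large files.

```python
def origin_bandwidth_gbps(total_gbps: float, byte_hit_ratio: float,
                          peak_factor: float = 1.0) -> float:
    """Bandwidth reaching origin: the bytes the CDN could not serve from cache."""
    if not 0.0 <= byte_hit_ratio <= 1.0:
        raise ValueError("byte_hit_ratio must be between 0 and 1")
    return total_gbps * (1.0 - byte_hit_ratio) * peak_factor


print(round(origin_bandwidth_gbps(10, 0.8, peak_factor=3), 1))  # 6.0
print(round(origin_bandwidth_gbps(50, 0.7), 1))                 # 15.0
print(round(origin_bandwidth_gbps(50, 0.7, peak_factor=3), 1))  # 45.0
```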

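PoP count planning: latency is bounded by the distance to the nearest PoP, and each PoP usefully covers a region of roughly radius r:

RTT_to_user ≈ 2 × distance_to_nearest_PoP / (200 km/ms in fiber) + routing overhead   (about 1ms per 100km)
number_of_PoPs_needed ≈ coverage_area / (π × avg_PoP_radius²)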

Cloudflare has 300+ PoPs globally, Fastly has 80+. For most companies, using an existing CDN means you inherit their PoP count. If you are evaluating multi-CDN, measure your user geographic distribution and match to CDN PoP coverage in those regions.

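Cache hit rate estimation: every unique object misses at least once per PoP (per TTL) before it can hit, which gives a best-case estimate:

max_hit_ratio ≈ 1 - (unique_objects_per_day × PoPs_serving_them) / total_requests_per_day

This holds only while the working set (unique_objects × avg_object_size) fits in each PoP's cache; beyond that, evictions push the hit ratio lower. A static site with 1,000 unique objects and 1M daily requests has high hit rate potential: 99.9% through a single PoP, about 90% if 100 PoPs each fetch every object once. Dynamic APIs with millions of unique query strings per day will always have a low hit ratio regardless of cache size.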
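The same estimate in code, under the stated assumptions (each object requested at every PoP, working set fits in cache, nothing expires within the period); the function name is hypothetical.

```python
def max_hit_ratio(unique_objects: int, requests: int, pops: int = 1) -> float:
    """Best-case request hit ratio when compulsory misses are the only misses."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    compulsory_misses = min(requests, unique_objects * pops)  # one miss per object per PoP
    return 1.0 - compulsory_misses / requests


print(round(max_hit_ratio(1_000, 1_000_000), 3))            # 0.999
print(round(max_hit_ratio(1_000, 1_000_000, pops=100), 3))  # 0.9
print(max_hit_ratio(1_000_000, 1_000_000))                  # 0.0 (every request unique)
```

The drop from 99.9% to 90% with more PoPs is exactly the effect tiered caching and origin shields attack: upper tiers absorb the per-PoP compulsory misses.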


Deep Dive: QUIC and HTTP/3 for CDN

HTTP/3 (RFC 9114) uses QUIC (RFC 9000) as its transport protocol, bringing significant CDN improvements:

Why QUIC Benefits CDN Traffic

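graph LR
    A[User] --> B[TCP + TLS 1.2]
    A --> C[QUIC + TLS 1.3]
    B --> D[TCP 1 RTT + TLS 2 RTT<br/>3 RTT of handshakes]
    C --> E[Combined handshake<br/>1 RTT, or 0-RTT on resumption]

| Feature | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Transport | TCP | TCP | QUIC (UDP) |
| Handshake (new connection) | 1 RTT TCP + 1-2 RTT TLS | 1 RTT TCP + 1-2 RTT TLS | 1 RTT (TLS 1.3 built in), 0-RTT on resumption |
| Head-of-line blocking | Yes (one request at a time per connection) | Yes (at the TCP layer) | No (streams are independent) |
| Connection migration | Breaks | Breaks | Survives network switches |
| Packet loss | Stalls that connection | Stalls all streams | Stalls only the affected stream |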

CDN Edge QUIC Support

# Cloudflare: HTTP/3 is a per-zone setting (Network -> HTTP/3 (with QUIC))
# Check for HTTP/3 support (requires a curl build with HTTP/3 enabled)
curl -I --http3 https://example.com

# Response headers advertise HTTP/3 via Alt-Svc:
# alt-svc: h3=":443"; ma=86400

# Fastly (VCL): call h3.alt_svc(); in vcl_recv to advertise HTTP/3

QUIC Benefits for CDN

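| Benefit | Impact |
|---|---|
| 0-RTT resumption | Lower latency for returning users (reuses TLS session state; replay-safe only for idempotent requests) |
| No transport head-of-line blocking | Better performance on lossy networks (mobile) |
| Connection migration | Seamless on WiFi → cellular transition |
| Improved loss recovery | Unambiguous packet numbers make retransmission and RTT estimates more accurate than in TCP |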

QUIC CDN Considerations

| Concern | Mitigation |
|---|---|
| UDP blocking | Some corporate firewalls block UDP 443; clients fall back to HTTP/2 over TCP |
| CDN vendor support | Check CDN QUIC support (Cloudflare ✅, Fastly ✅, Akamai ✅) |
| Debugging complexity | QUIC is encrypted end to end; use curl -v, or Wireshark's QUIC dissector with an SSLKEYLOGFILE key log |
| Middleboxes | Some middleboxes interfere with UDP; browsers fall back to TCP transparently |

Recommendation: HTTP/3 is now well-supported by major CDNs. Enable it globally — the performance improvements (especially on mobile) are significant and there’s minimal downside for static content delivery.


Multi-CDN Strategy Deep Dive

Using multiple CDN providers eliminates single CDN dependency and can improve performance by routing users to the best-performing network for their region.

Why Multi-CDN?

| Factor | Single CDN | Multi-CDN |
|---|---|---|
| Availability | Single point of failure | Redundancy across providers |
| Performance | Limited by one network | Best-of-breed routing per region |
| Cost | Simpler pricing | More complex, but volume discounts possible |
| Operational Complexity | Low | Higher - requires coordination |

Implementation Approaches

Geographic Split:

# Route by geography (GeoDNS), with a secondary CDN as failover
# Asia-Pacific: Cloudflare
# Americas + Europe: Fastly
# Secondary failover for all regions

# DNS steering example (pseudocode; real syntax depends on your
# DNS provider, e.g. Route 53 geolocation records or NS1 filter chains)
# if client_continent == "AS":
#     answer CNAME -> Cloudflare hostname
# else:
#     answer CNAME -> Fastly hostname
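The split-plus-failover logic reduces to an ordered preference list per region. This is a minimal sketch of what a steering service might compute; the continent codes, CDN names, and the healthy set (which would come from external health checks) are assumptions.

```python
# Hypothetical steering table: CDNs in preference order per continent code
STEERING = {
    "AS": ["cloudflare", "fastly"],
    "NA": ["fastly", "cloudflare"],
    "EU": ["fastly", "cloudflare"],
}
DEFAULT = ["fastly", "cloudflare"]


def pick_cdn(continent: str, healthy: set) -> str:
    """Return the first healthy CDN in the continent's preference list."""
    for cdn in STEERING.get(continent, DEFAULT):
        if cdn in healthy:
            return cdn
    return "origin"  # every CDN is down: serve from origin directly


print(pick_cdn("AS", {"cloudflare", "fastly"}))  # cloudflare
print(pick_cdn("AS", {"fastly"}))                # fastly (failover)
```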

Performance-Based Routing:

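// Illustrative only: latency measured from an edge worker reflects the
// edge's network, not the user's. Production steering usually aggregates
// real-user measurements (RUM) per region and applies them in DNS.
addEventListener("fetch", (event) => {
  event.respondWith(routeToFastest(event.request));
});

async function routeToFastest(request) {
  // measureLatency() is a hypothetical helper that returns milliseconds
  const [latencyA, latencyB] = await Promise.all([
    measureLatency("https://cdn-a.example.com/ping"),
    measureLatency("https://cdn-b.example.com/ping"),
  ]);
  // Redirect to the faster CDN hostname
  const url = new URL(request.url);
  url.hostname = latencyA <= latencyB ? "cdn-a.example.com" : "cdn-b.example.com";
  return Response.redirect(url.toString(), 302);
}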
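A sketch of the RUM-based alternative: aggregate latency samples reported by real browsers and pick the best CDN per region. The sample tuples are hypothetical; the sketch uses the median rather than the mean so a few slow outliers do not flip the choice.

```python
from collections import defaultdict
from statistics import median


def fastest_cdn_by_region(samples):
    """samples: iterable of (region, cdn, latency_ms) from real-user measurements.

    Returns {region: CDN with the lowest median latency in that region}.
    """
    by_region = defaultdict(lambda: defaultdict(list))
    for region, cdn, latency_ms in samples:
        by_region[region][cdn].append(latency_ms)
    return {
        region: min(cdns, key=lambda c: median(cdns[c]))
        for region, cdns in by_region.items()
    }


samples = [
    ("JP", "cdn-a", 20), ("JP", "cdn-a", 24), ("JP", "cdn-b", 35), ("JP", "cdn-b", 30),
    ("DE", "cdn-a", 40), ("DE", "cdn-b", 18), ("DE", "cdn-b", 22),
]
print(fastest_cdn_by_region(samples))  # {'JP': 'cdn-a', 'DE': 'cdn-b'}
```

The resulting map is what a DNS steering policy would consume, refreshed periodically as new samples arrive.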

Provider Comparison

| Provider | PoP Count (vendor-reported) | Strengths | Best For |
|---|---|---|---|
| Cloudflare | 300+ | Security, DDoS, Workers | General purpose, security-first |
| Fastly | 80+ | Real-time purging, VCL | Dynamic content, API acceleration |
| Akamai | 4,000+ | Enterprise scale, media | Large enterprises, streaming |
| AWS CloudFront | 600+ | AWS ecosystem integration | AWS-native workloads |

Multi-CDN Key Takeaways

  • Single CDN creates a single point of failure
  • Multi-CDN requires automated failover logic
  • Geographic splitting is simpler than performance-based routing
  • Consider multi-CDN if your SLA requires >99.9% uptime

WAF & DDoS Protection

CDNs provide the first line of defense at the network edge. Modern WAFs (Web Application Firewalls) operate at layer 7 to block application-layer attacks.

WAF Rule Categories

| Category | Attack Types | Example Rules |
|---|---|---|
| SQL Injection | Data exfiltration | Block OR 1=1, UNION SELECT |
| XSS | Session hijacking, defacement | Block <script>, onerror= |
| Path Traversal | File access | Block ../, ..\ |
| Rate Limiting | Brute force, DoS | Block >100 req/min per IP |
| Bot Detection | Scraping, credential stuffing | Block known bot signatures |

Cloudflare WAF Configuration

# Cloudflare custom WAF rule: block SQL injection attempts
# Expression (the matches operator requires a Business or Enterprise plan):
#   (http.request.uri.query matches "(?i)union.*select|or 1=1|--")
# Action: Block (returns 403)

# Cloudflare managed rulesets
# Cloudflare Managed Ruleset
# Cloudflare OWASP Core Ruleset
# Cloudflare Exposed Credentials Check Managed Ruleset

# Custom rule examples
# (cf.threat_score gt 30) -> Block
# (http.request.uri.path contains "/wp-login") -> Managed Challenge
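Signature rules like these are simple to write and easy to trip on benign traffic. This naive matcher, with patterns echoing the WAF table above (illustrative, not any vendor's ruleset), flags real attacks and also a harmless search query, which is why managed rulesets add scoring and exceptions.

```python
import re
from urllib.parse import unquote_plus

# Naive signatures (illustrative only)
SIGNATURES = {
    "sqli": re.compile(r"union\s+.*select|\bor\s+1\s*=\s*1", re.IGNORECASE),
    "xss": re.compile(r"<script|onerror\s*=", re.IGNORECASE),
    "traversal": re.compile(r"\.\./|\.\.\\"),
}


def waf_verdict(query_string: str) -> list:
    """Return the names of signatures matching the URL-decoded query string."""
    decoded = unquote_plus(query_string)  # attackers percent-encode payloads
    return [name for name, rx in SIGNATURES.items() if rx.search(decoded)]


print(waf_verdict("id=1%20OR%201%3D1"))              # ['sqli']
print(waf_verdict("file=..%2F..%2Fetc%2Fpasswd"))    # ['traversal']
print(waf_verdict("q=union+members+select+a+plan"))  # ['sqli'] (false positive)
```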

DDoS Mitigation Layers

graph TD
    A[User Traffic] --> B[CDN Edge<br/>Anycast]
    B --> C{DDoS detected?}
    C -->|Volume-based| D[Rate Limiting<br/>Traffic Shaping]
    C -->|Protocol-based| E[SYN Proxy<br/>Challenge]
    C -->|Application-based| F[WAF Rules<br/>Bot Management]
    D --> G[Legitimate Traffic<br/>Passes Through]
    E --> G
    F --> G
    C -->|Malicious| H[Blocked<br/>Rate Limited]

Rate Limiting at Edge

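// Cloudflare Worker: approximate per-IP rate limiting with Workers KV
// Fixed one-minute windows. KV is eventually consistent, so counts are
// approximate and can undercount under bursts; for exact limits use
// Cloudflare's rate limiting rules or Durable Objects instead.
const RATE_LIMIT = 100; // requests per window
const WINDOW_MS = 60 * 1000;

// CDN_KV is a KV namespace binding configured for this Worker
async function rateLimit(request) {
  const ip = request.headers.get("CF-Connecting-IP");
  // The key includes the window number, so each minute starts a fresh counter
  const windowId = Math.floor(Date.now() / WINDOW_MS);
  const key = `rl:${ip}:${windowId}`;

  const current = await CDN_KV.get(key);
  const count = current ? parseInt(current, 10) : 0;

  if (count >= RATE_LIMIT) {
    return new Response("Rate Limit Exceeded", { status: 429 });
  }

  // Read-then-write is not atomic; concurrent requests may see the same count
  await CDN_KV.put(key, (count + 1).toString(), {
    expirationTtl: (2 * WINDOW_MS) / 1000, // KV requires a TTL of at least 60s
  });
  return null; // null means "allowed": continue to origin
}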
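For comparison, here is the sliding window counter technique, which smooths the burst a fixed window allows at its boundary by weighting the previous window's count. It is an in-memory, single-process sketch with an injectable clock; at the edge the counters would need a strongly consistent store, which is outside this example.

```python
import time


class SlidingWindowLimiter:
    """Sliding window counter rate limiter (in memory, single process).

    Estimated requests in the last `window` seconds =
    previous_window_count * (unused fraction of it) + current_window_count.
    """

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.counts = {}  # (key, window_index) -> requests counted in that window

    def allow(self, key):
        now = self.clock()
        idx = int(now // self.window)
        elapsed = (now % self.window) / self.window  # fraction of current window used
        prev = self.counts.get((key, idx - 1), 0)
        curr = self.counts.get((key, idx), 0)
        if prev * (1.0 - elapsed) + curr >= self.limit:
            return False
        self.counts[(key, idx)] = curr + 1
        self.counts.pop((key, idx - 2), None)  # this key's window from two periods ago
        return True


t = [0.0]
limiter = SlidingWindowLimiter(limit=3, window=60.0, clock=lambda: t[0])
print([limiter.allow("203.0.113.7") for _ in range(4)])  # [True, True, True, False]
```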

WAF Key Takeaways

  • WAF operates at layer 7, blocking application-layer attacks
  • Rate limiting prevents brute force and DoS at the edge
  • DDoS protection should be multi-layered (volume, protocol, app)
  • WAF rules need tuning to avoid false positives

CDN Cost Optimization

CDN costs come from bandwidth, requests, and optional features. Understanding the billing model helps optimize spend.

Cost Components

| Component | Typical Cost | Optimization |
|---|---|---|
| Bandwidth (egress) | $0.02-0.10/GB | Increase cache hit ratio |
| Requests (GET/POST) | $0.10-1.00/million | Reduce request count, coalesce |
| Origin transfer | $0.01-0.05/GB | Use origin shield |
| Optional features | $5-100+/month | Workers, Image Resizing, Argo |

Tiered Caching for Cost

graph LR
    A[User] --> B[Edge PoP<br/>Tier 3]
    B -->|miss| C[Regional Cache<br/>Tier 2]
    C -->|miss| D[Origin Shield<br/>Tier 1]
    D -->|miss| E[Origin Server]

    C -.->|cache warm| F[Shared cache<br/>reduces origin hits]
    D -.->|cache warm| G[Origin shield<br/>reduces origin load]

Origin Shield Benefits

Origin shield caches at an intermediate tier between edge and origin:

  • Often reduces origin bandwidth by 50-90% compared with every edge PoP fetching directly (workload-dependent)
  • Lowers origin server load by consolidating misses
  • Geographic shielding - one origin shield serves many edge nodes

# Cloudflare tiered cache topology
# Tier 1: Upper tier / origin shield (a PoP close to your origin, e.g. Virginia for a us-east origin)
# Tier 2: Regional caches (a subset of larger PoPs)
# Tier 3: Edge cache (closest to user)

# Enable via Cloudflare dashboard:
# Caching -> Tiered Cache -> Smart Tiered Caching
# (Cloudflare picks the upper tier closest to your origin automatically)

Cache Hit Ratio Impact on Cost

origin_egress_cost ≈ requests × avg_response_size × (1 - hit_ratio) × origin_egress_price_per_GB

Example:
- 100M requests/month
- 50KB average response
- Origin egress: $0.05/GB

At 70% hit ratio:
  - 30M requests hit origin: 30M × 50KB = 1,500 GB
  - Cost: $75/month

At 95% hit ratio:
  - 5M requests hit origin: 5M × 50KB = 250 GB
  - Cost: $12.50/month

Savings: $62.50/month (83% reduction in origin egress). The CDN still bills for delivering all 100M × 50KB = 5,000 GB to users, whatever the hit ratio.
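The worked example can be checked in code. This sketch uses decimal units (1 GB = 1,000,000 KB), as the example does, and covers origin egress only.

```python
def monthly_origin_egress_cost(requests, avg_response_kb, hit_ratio, price_per_gb):
    """Cost of serving cache misses from origin, in decimal units (1 GB = 1,000,000 KB)."""
    miss_requests = requests * (1.0 - hit_ratio)
    miss_gb = miss_requests * avg_response_kb / 1_000_000
    return miss_gb * price_per_gb


at_70 = monthly_origin_egress_cost(100_000_000, 50, 0.70, 0.05)
at_95 = monthly_origin_egress_cost(100_000_000, 50, 0.95, 0.05)
print(f"${at_70:.2f} -> ${at_95:.2f}, saving {1 - at_95 / at_70:.0%}")
# $75.00 -> $12.50, saving 83%
```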

CDN Cost Optimization Key Takeaways

  • Bandwidth is the largest cost driver — maximize cache hit ratio
  • Origin shield reduces origin bandwidth by 50-90%
  • Request coalescing collapses concurrent misses for the same object into one origin request
  • Enable features only when needed: Argo is billed per GB, Workers per request

Mobile and IoT CDN Considerations

Mobile users and IoT devices have unique requirements: intermittent connectivity, varying network quality, and battery constraints.

AMP (Accelerated Mobile Pages)

Google’s AMP Cache fetches valid AMP pages and serves them from Google’s own infrastructure, for example when users open them from Google Search:

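<!-- AMP HTML: stripped-down HTML for fast mobile loading -->
<!-- Google's AMP Cache can serve this page directly from search results -->
<!DOCTYPE html>
<html amp>
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width" />
    <!-- The AMP page points to its canonical (non-AMP) version; the
         canonical page links back here with rel="amphtml" -->
    <link rel="canonical" href="https://example.com/article.html" />
    <script async src="https://cdn.ampproject.org/v0.js"></script>
    <!-- Required <style amp-boilerplate> block omitted for brevity -->
  </head>
  <body>
    <amp-img src="hero.jpg" width="600" height="400" layout="responsive">
    </amp-img>
  </body>
</html>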

Brotli Compression for Mobile

Brotli typically compresses text assets (HTML, CSS, JS) 15-25% better than gzip, saving bandwidth on mobile. Images and video are already compressed and gain little:

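# Enable Brotli at CDN
# Cloudflare: Speed -> Optimization -> Brotli
# Fastly: dynamic Brotli via VCL in vcl_fetch: set beresp.brotli = true;

# Choose the Brotli level by content type, not by device
# Static assets: pre-compress at build time with level 11 (maximum compression)
# Dynamic responses: compress on the fly at a lower level (about 4-6) to keep TTFB low
# Decompression cost on the device is similar at every level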
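Whatever the level, the CDN picks an encoding per request from the client's Accept-Encoding header, which is also why compressed responses need Vary: Accept-Encoding. This is a simplified negotiation sketch that honors q-values and a server preference order; it ignores parameters other than q, and the function name is hypothetical.

```python
def choose_encoding(accept_encoding: str, available=("br", "gzip")) -> str:
    """Pick the best encoding the client accepts, honoring q-values.

    Ties are broken by the order of `available` (server preference).
    Returns "identity" when no compressed encoding is acceptable.
    """
    weights = {}
    for item in accept_encoding.split(","):
        coding, _, params = item.strip().partition(";")
        params = params.strip()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if coding.strip():
            weights[coding.strip().lower()] = q

    best, best_q = "identity", 0.0
    for coding in available:
        q = weights.get(coding, weights.get("*", 0.0))  # "*" covers unlisted codings
        if q > best_q:
            best, best_q = coding, q
    return best


print(choose_encoding("gzip, deflate, br"))       # br
print(choose_encoding("gzip;q=1.0, br;q=0.5"))    # gzip
```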

Edge Computing for Mobile

// Mobile-specific edge logic
async function handleMobileRequest(request) {
  const headers = request.headers;
  const userAgent = headers.get("User-Agent") ?? "";
  const saveData = headers.get("Save-Data");

  // Rough mobile detection from the User-Agent string
  const isMobile = /mobile|android|iphone/i.test(userAgent);

  // Serve a lightweight version to mobile and data-saver users.
  // The rewritten /mobile path gets its own cache entry, so the cache
  // never mixes mobile and desktop responses under one key.
  if (saveData === "on" || isMobile) {
    const url = new URL(request.url);
    url.pathname = "/mobile" + url.pathname;
    return fetch(new Request(url, request));
  }

  return fetch(request);
}

IoT Considerations

| Challenge | CDN Solution |
|---|---|
| Limited storage | Smallest response via compression |
| Intermittent connectivity | Long TTL + stale-while-revalidate / stale-if-error |
| Low power | Reduce round trips via keepalive |
| Certificate management | Managed TLS at edge |
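
# IoT-specific cache headers
# Longer TTL for slowly changing shared data (e.g. device configuration)
Cache-Control: public, max-age=3600

# Shorter TTL for fleet-wide commands (need freshness)
Cache-Control: public, max-age=60

# Per-device commands are device-specific: never cache them at the CDN
Cache-Control: private, no-store

# Request header the device sends so the CDN can return compressed responses
Accept-Encoding: gzip, deflate, br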

Mobile & IoT Key Takeaways

  • AMP provides cached mobile pages via Google’s CDN
  • Brotli compresses 15-25% better than gzip for mobile
  • Edge computing can serve different content to mobile vs desktop
  • IoT benefits from compression and long TTLs with revalidation

Trade-off Analysis

When selecting CDN configurations, you often trade one benefit for another. Here are the key trade-offs:

Cache TTL Trade-offs

| Content Type | Long TTL | Short TTL |
|---|---|---|
| Static assets (hashed) | ✅ Cache forever, maximum performance | ❌ Unnecessary origin fetches |
| HTML pages | ⚠️ Risk stale content | ✅ Always fresh, more origin traffic |
| API responses | ❌ May serve outdated data | ✅ Always current |
| Images | ✅ Good for unchanging assets | ⚠️ Miss rate increases |

Recommendation: Use content-hashed filenames for immutable assets (long TTL), short TTL with stale-while-revalidate for HTML, and no-store for dynamic content.

CDN Provider Trade-offs

| Factor | Build Your Own | Use Major Provider (Cloudflare/Fastly/Akamai) |
|---|---|---|
| Cost | High CapEx, low marginal cost | OpEx based on usage |
| Control | Full control | Limited to provider’s features |
| Scale | Limited to your PoP investment | Instant global scale |
| Complexity | You manage everything | Managed by provider |

Recommendation: Use a major provider unless you have needs no provider can meet, such as the custom hardware and ultra-low latency demanded by high-frequency trading.

Origin Shield Trade-offs

| Scenario | With Shield | Without Shield |
|---|---|---|
| Origin bandwidth | 50-90% reduction | Full bandwidth from all PoPs |
| Origin load | Reduced (shared cache) | High (every PoP hits origin) |
| Cache consistency | Slight lag (stale at shield) | Immediate consistency |
| Cost | Additional tier cost | Simpler, cheaper |
| Complexity | Tiered cache management | Simple two-tier |

Recommendation: Enable origin shield for any production traffic. The cost savings on origin bandwidth typically far exceed the additional CDN cost.


When to Use / When Not to Use

| Use Case | When to Use CDN | When Not to Use CDN |
|---|---|---|
| Static Assets | Versioned JS/CSS bundles, images, fonts | Rarely updated content needing instant purging |
| HTML Pages | Mostly static blog/documentation sites | Frequently updated real-time dashboards |
| API Responses | Public APIs with identical responses for all users | User-specific, authenticated, or dynamic content |
| Video/Streaming | VOD, large file distribution | Live streaming (use specialized streaming CDN) |
| Global User Base | Users distributed across geographies | Localized single-region user base |
| Traffic Spikes | Expecting sudden popularity surges | Predictable steady traffic patterns |

When CDN is Essential

  • Public website serving users globally
  • Static-first architecture with versioned assets
  • Cost optimization for bandwidth-heavy content
  • DDoS protection and security layer needed
  • SEO improvement via fast global load times

When CDN is Overkill

  • Fully dynamic, personalized content on every request
  • Internal/private applications behind VPN
  • Real-time data (stock prices, live sports)
  • Content that changes constantly (search results)
  • Applications with very small, localized user bases

Production Failure Scenarios

| Failure | Impact | Mitigation |
|---|---|---|
| Cache doesn’t purge | Users see stale content after update | Use cache tags; implement versioning; purge on deploy |
| CDN origin pull storm | Cache miss wave hits origin | Implement origin shield; gradual cache warming; rate limiting |
| SSL certificate expiry | CDN serves no content over HTTPS | Automated cert renewal (Let’s Encrypt); monitoring |
| CDN goes down globally | All traffic fails or falls back to slow origin | Configure origin fallback; multi-CDN strategy |
| Regional PoP failure | Users in region experience latency/outage | CDN redundancy; anycast routing |
| Cache poisoning | Malicious content cached and served | Validate origin responses; signed URLs; integrity checks |
| Header misconfiguration | Private content cached publicly | Audit cache headers; use private for user data |
| Query string abuse | Different cache entry per query param | Ignore or normalize query strings; bust caches with content-hashed filenames |
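Query-string normalization, the mitigation in the last row, can be expressed as a cache-key function. The ignored parameter list here is hypothetical; only parameters you know never change the response belong in it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that never change the response body
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid", "session"}


def normalize_cache_key(url: str) -> str:
    """Drop ignored params and sort the rest so equivalent URLs share one key."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


print(normalize_cache_key("https://Example.com/page?b=2&utm_source=x&a=1"))
# https://example.com/page?a=1&b=2
```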

Common Pitfalls / Anti-Patterns

Caching Personalized Content

Don’t cache content that varies per user.

# BAD: Personalized page cached publicly
Cache-Control: public, max-age=3600
# User sees another user's data!

# GOOD: User-specific pages should not be cached
Cache-Control: private, no-store
# Or don't cache at CDN at all for auth-required pages

Query String Cache Key Duplication

CDNs treat /page?session=123 and /page?session=456 as different URLs.

# BAD: Version query string that changes on every build
# /app.js?v=1.0.0.12345  (changes every build, even if app.js did not)
# /app.js?v=1.0.0.12346  (new cache entry each time!)

# GOOD: Content-hashed filenames (cache forever)
# /app.a3f5d8.js  (hash changes only when the content changes)
# Cache-Control: public, max-age=31536000, immutable
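Bundlers generate these names for you, but the mechanism is just a short digest of the file's bytes. A minimal sketch, assuming SHA-256 truncated to 8 hex characters (the helper name is hypothetical):

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path, content, length=8):
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:length]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


v1 = hashed_name("/assets/app.js", b"console.log('v1');")
v2 = hashed_name("/assets/app.js", b"console.log('v2');")
print(v1, v2)  # two different names, so changed files get fresh cache keys
```

Unchanged files keep the same name across builds, so their cached copies stay valid.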

Setting Long TTLs Without Invalidation Plan

Long cache + no purge = stale content.

# BAD: Long TTL, no invalidation strategy
Cache-Control: public, max-age=31536000
# Content updated but CDN serves old version for a year

# GOOD: Reasonable TTL with purging
Cache-Control: public, max-age=86400  # 1 day
# On deploy: Purge old version via API

Over-Caching HTML

HTML changes often. Don’t cache it long unless you’re certain.

# BAD: Long TTL on HTML
Cache-Control: public, max-age=31536000
# Homepage updated, users see old version for a year

# GOOD: Short TTL with revalidation
Cache-Control: public, max-age=60, stale-while-revalidate=300
# Serve stale briefly while fetching update

Not Using Edge Computing Wisely

Edge functions add complexity; don’t overuse them.

// BAD: Heavy computation at edge
addEventListener("fetch", (event) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  // This runs on EVERY request - expensive!
  const result = await doHeavyComputation(request);
  return new Response(result);
}

// GOOD: Simple routing, lightweight transforms
async function handleRequest(request) {
  const url = new URL(request.url);

  // Heavy work stays at origin: pass API requests straight through
  if (url.pathname.startsWith("/api/")) {
    return fetch(request);
  }

  // Lightweight edge logic for static content: add a header
  const response = await fetch(request);
  const headers = new Headers(response.headers);
  // request.cf.colo is the IATA code of the Cloudflare data center handling the request
  headers.set("X-Edge-Location", request.cf?.colo ?? "unknown");
  // Preserve the origin status; omitting it would turn 404s into 200s
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}

Missing Origin Shield

Without origin shield, every cache miss hits your origin directly.

graph TD
    subgraph with_shield [With origin shield]
        A[User] --> B[CDN Edge]
        B -->|miss| C[Origin Shield<br/>warm shared cache]
        C -->|miss| D[Origin Server]
    end

    subgraph without_shield [Without origin shield]
        E[User] --> F[CDN Edge]
        F -->|miss| G[Origin Server]
    end
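
# Enable origin shielding (menu names vary by provider and change over time)
# Cloudflare: Caching -> Tiered Cache -> Smart Tiered Caching
# CloudFront: enable Origin Shield on the origin, in the Region closest to it
# Fastly: choose a shield PoP near your origin in the backend settings

# This reduces origin load by caching at an intermediate tier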

Quick Recap

Observability Checklist

CDN Provider Metrics

  • cdn.bandwidth_bytes - Total bandwidth served
  • cdn.cache_hit_ratio - Percentage served from cache vs origin
  • cdn.requests_total - Total requests
  • cdn.latency_p95_p99 - Edge response times
  • cdn.origin_fetch_time - Time spent fetching from origin when cache miss

Application Metrics

# Check cache headers on responses
curl -I https://example.com/assets/app.js

# Look for:
# X-Cache: HIT/MISS (exact values vary by CDN)
# CF-Cache-Status: HIT/MISS/EXPIRED/REVALIDATED/DYNAMIC/BYPASS
# Age: seconds since the response was fetched or revalidated from origin

Logs to Capture

# Log CDN cache status for debugging
import structlog

logger = structlog.get_logger()

def log_cdn_response(response, url):
    # response: a requests/httpx Response. Its headers object is
    # case-insensitive, which matters because HTTP/2 lowercases header names.
    headers = response.headers

    cache_status = headers.get('X-Cache', headers.get('CF-Cache-Status', 'Unknown'))
    age = headers.get('Age', '0')

    logger.info("cdn_response",
        url=url,
        cache_status=cache_status,
        age_seconds=age,
        content_type=headers.get('Content-Type'),
        content_length=headers.get('Content-Length'))

# Cloudflare analytics: the GraphQL Analytics API
# (POST https://api.cloudflare.com/client/v4/graphql) supersedes the
# older zones/$ZONE/analytics/dashboard endpoint

Alert Rules

# CDN-specific alerts (metric names are illustrative; match your exporter)
- alert: CDNCacheHitRateLow
  expr: |
    sum(rate(cdn_cache_hits_total[15m]))
      / (sum(rate(cdn_cache_hits_total[15m])) + sum(rate(cdn_origin_fetches_total[15m])))
      < 0.7
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "CDN cache hit rate below 70%"

- alert: CDNOriginLatencyHigh
  expr: histogram_quantile(0.95, sum by (le) (rate(cdn_origin_fetch_duration_seconds_bucket[5m]))) > 2
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "CDN origin fetch p95 latency above 2 seconds"

- alert: CDNBandwidthAnomaly
  # Compare with the same time last week so normal daily peaks don't fire
  expr: rate(cdn_bandwidth_bytes_total[5m]) > 2 * rate(cdn_bandwidth_bytes_total[5m] offset 7d)
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "CDN bandwidth more than double the same time last week"

Implementation Checklist

Security Checklist

  • Set appropriate Cache-Control headers — Never cache private/personal content
  • Use private for user-specific data — User profiles, auth tokens, payment info
  • Implement signed URLs — For premium content that shouldn’t be shared
  • Validate Host headers at origin: prevent host header injection attacks
  • Enable DDoS protection — Most CDNs provide this; ensure it’s configured
  • Monitor for cache poisoning — Unexpected content in cache
  • Use WAF rules — Block malicious requests at CDN edge
  • Certificate management — Automated renewals; prevent expiry
  • Rotate credentials on compromise: if CDN API keys leak, revoke and rotate them immediately
  • Respect Vary headers — Cache different versions appropriately
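
// Security headers via CDN (Cloudflare Workers example)
// originResponse is the result of fetch(request); its headers are
// immutable, so copy the response before modifying them
const response = new Response(originResponse.body, originResponse);
response.headers.set('X-Content-Type-Options', 'nosniff');
response.headers.set('X-Frame-Options', 'DENY');
// The legacy XSS auditor is deprecated; disable it and rely on Content-Security-Policy
response.headers.set('X-XSS-Protection', '0');
response.headers.set('Referrer-Policy', 'strict-origin-when-cross-origin');

# Signed URL for premium content
# Cloudflare Stream: signed tokens for time-limited access
# Generic shape (exact format varies by provider):
# https://example.com/video.mp4?exp=1699999999&token=***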
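A generic HMAC-signed URL scheme looks like this. It is a hypothetical sketch, not Cloudflare Stream's token format: the signature covers the path and expiry, verification uses a constant-time comparison, and the URL is assumed to have no existing query string.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"replace-with-a-shared-secret"  # hypothetical key shared by app and edge


def _signature(path: str, exp: int) -> str:
    return hmac.new(SECRET, f"{path}:{exp}".encode(), hashlib.sha256).hexdigest()


def sign_url(url: str, ttl_s: int, now=None) -> str:
    """Append exp and token parameters granting access for ttl_s seconds."""
    exp = int((time.time() if now is None else now) + ttl_s)
    token = _signature(urlsplit(url).path, exp)
    return f"{url}?{urlencode({'exp': exp, 'token': token})}"


def verify_url(signed: str, now=None) -> bool:
    """Check expiry and signature; reject anything malformed."""
    parts = urlsplit(signed)
    qs = parse_qs(parts.query)
    try:
        exp = int(qs["exp"][0])
        token = qs["token"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > exp:
        return False
    return hmac.compare_digest(_signature(parts.path, exp), token)


link = sign_url("https://example.com/video.mp4", ttl_s=300, now=1_700_000_000)
print(verify_url(link, now=1_700_000_100))  # True: within the 5-minute window
print(verify_url(link, now=1_700_000_301))  # False: expired
```

Because the path is signed, swapping in another file name invalidates the token.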

Key Takeaways

  • CDNs reduce latency by serving from geographically close edge locations
  • Cache-Control headers control what and how long CDN caches content
  • Static assets with content hashes should use immutable for maximum caching
  • HTML should have short TTL with stale-while-revalidate for smoothness
  • Always use private for user-specific, authenticated, or sensitive content
  • Cache invalidation (purge) is the hardest problem — design around it
  • Edge computing adds capability but adds complexity — use sparingly
  • Origin shield prevents cache stampede from overwhelming your origin

Copy/Paste Checklist

# CDN header checklist for static site
# HTML (short cache, revalidate)
Cache-Control: public, max-age=60, stale-while-revalidate=300

# Versioned static assets (immutable)
Cache-Control: public, max-age=31536000, immutable

# Images (medium cache)
Cache-Control: public, max-age=86400, stale-while-revalidate=3600

# API responses (don't cache or very short)
Cache-Control: private, no-store
# OR for public APIs
Cache-Control: public, max-age=60

# User-specific content (never CDN cache)
Cache-Control: private, no-store

# Cloudflare purge all
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE/purge_cache" \
  -H "Authorization: Bearer ***" \
  -H "Content-Type: application/json" \
  --data '{"purge_everything": true}'

# Cloudflare purge by tag
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE/purge_cache" \
  -H "Authorization: Bearer ***" \
  -H "Content-Type: application/json" \
  --data '{"tags":["product-123"]}'

# Deployment checklist:
# Set up cache headers before going live
# Test cache behavior with curl -I
# Implement versioning/hashing for static assets
# Plan cache purge strategy for deployments
# Enable origin shield to protect origin
# Set up monitoring for cache hit rate
# Configure alerts for origin fetch latency
# Review security headers (X-Frame-Options, CSP, etc.)

Best Practices Summary

Cache Header Strategy

| Content Type | Cache-Control Header | Why |
|---|---|---|
| HTML pages | public, max-age=60, stale-while-revalidate=300 | Fresh enough, serves stale during revalidation |
| Versioned JS/CSS (hashed) | public, max-age=31536000, immutable | Never changes, cache forever |
| Images | public, max-age=86400, stale-while-revalidate=3600 | Medium freshness, saves bandwidth |
| API responses | private, no-store | Never cache user-specific data |
| Authenticated API | private, no-store (or private, no-cache to let browsers revalidate) | Never let shared caches store private content |

Performance Checklist

  • Enable Brotli compression (text assets typically compress smaller than with gzip, often around 15-25%)
  • Enable origin shield to reduce origin load
  • Use stale-while-revalidate to serve stale during revalidation
  • Set up tiered caching for large content catalogs
  • Monitor cache hit ratio (target 90%+ for static sites)
  • Keep Vary minimal: Vary: Accept-Encoding is usually the only one you need, since every extra header splits the cache

Operations Checklist

  • Set up automated cache purge on deployment
  • Configure multi-CDN failover
  • Enable CDN health checks and origin fallback
  • Monitor TTLs — too long causes stale content, too short increases origin load
  • Use cache tags for granular invalidation
  • Log and alert on cache poisoning attempts

Real-World Case Studies


On June 21, 2022, Cloudflare suffered an outage affecting 19 of its data centers, which together handle a significant share of its global traffic. The root cause was a network configuration change made as part of a long-running project to increase resilience at its busiest locations: a reordering of BGP prefix-advertisement policy terms caused those data centers to withdraw a critical subset of IP prefixes, so traffic could no longer reach them.

The impact: many sites behind Cloudflare were unreachable for users whose traffic was routed to the affected locations. The outage began at 06:27 UTC, and all affected data centers were back online by 07:42 UTC, roughly 75 minutes later.

The lesson for CDN-dependent infrastructure: a single CDN is a single point of failure, even one that offers a 100% uptime SLA. The correct mitigations:

  1. Multi-CDN strategy: Route different percentages of traffic to different CDNs. If one fails, you fail over to the other. This adds complexity but eliminates single CDN dependency.

  2. Origin fallback: Configure your origin as a direct fallback. During a CDN outage, some users will get slower responses from origin, but the site stays up.

  3. Health checks and automatic failover: Use a load balancer or DNS-based failover that detects CDN unavailability and routes traffic elsewhere. Run the health checks and the failover DNS outside the CDN they monitor; otherwise the same outage disables the mechanism meant to route around it.

The irony: the outage was caused by a change intended to make those data centers more resilient, and the new policy was misconfigured. This is a reminder that deploys to critical infrastructure deserve extra scrutiny, staged rollouts, and the ability to roll back immediately.


Interview Questions

1. How does a CDN choose which PoP to route a user request to?

A CDN typically uses anycast routing to send requests to a nearby PoP. Multiple PoPs announce the same IP prefix via BGP, and internet routers forward each packet along the best BGP path, which is usually, but not always, the topologically closest PoP. DNS-based routing can also be used: the CDN's authoritative DNS server returns a different PoP IP depending on where the client appears to be, inferred from the EDNS Client Subnet (ECS) option when the resolver sends it, or otherwise from the recursive resolver's IP.

  • Anycast: same IP announced from all PoPs, routing protocol picks shortest path
  • DNS routing: the CDN's authoritative DNS server returns a PoP IP based on the client's estimated location
  • Some CDNs also consider PoP load and health in their routing decisions
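Anycast needs no code on the CDN side, because BGP makes the choice. The DNS-based option, though, is a decision function inside the authoritative DNS server. A minimal sketch of one such policy, assuming the server already has a latitude/longitude for the client and picks the nearest healthy PoP by great-circle distance; the PoP list, coordinates, `load` field, and 0.85 threshold are hypothetical, and real CDNs rely on measured latency and much richer signals:

```python
import math
from dataclasses import dataclass


@dataclass
class PoP:
    name: str
    ip: str            # address the DNS answer would contain
    lat: float
    lon: float
    healthy: bool = True
    load: float = 0.0  # fraction of capacity in use (hypothetical metric)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def choose_pop(client_lat: float, client_lon: float, pops: list,
               max_load: float = 0.85) -> PoP:
    """Closest PoP that is healthy and below max_load."""
    candidates = [p for p in pops if p.healthy and p.load < max_load]
    if not candidates:
        raise RuntimeError("no healthy PoP available")
    return min(candidates,
               key=lambda p: haversine_km(client_lat, client_lon, p.lat, p.lon))


pops = [
    PoP("tokyo", "192.0.2.10", 35.68, 139.69),
    PoP("singapore", "192.0.2.20", 1.35, 103.82),
    PoP("virginia", "192.0.2.30", 38.95, -77.45),
]

# A client geolocated to Osaka gets the Tokyo PoP...
print(choose_pop(34.69, 135.50, pops).name)  # tokyo
# ...until Tokyo fails its health check; then the next-closest PoP wins.
pops[0].healthy = False
print(choose_pop(34.69, 135.50, pops).name)  # singapore
```

Health and load act as filters before distance is considered, which mirrors the last bullet: proximity decides only among PoPs that can actually take the traffic.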

2. What is the difference between `Cache-Control: private` and `Cache-Control: no-store`?

They restrict caching in different ways:

  • private: Only private caches (the user's browser) may store the response; shared caches such as CDNs and proxies must not. Appropriate for user-specific content that shouldn't be stored on intermediate proxies.
  • no-store: Tells all caches (browser, CDN, proxies) not to store any part of the response. More restrictive. Used for highly sensitive data like credentials or payment info.

In practice: private is sufficient for most user-specific content. no-store should be reserved for highly sensitive data where even browser caching is a concern.
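A small sketch of how the two directives map onto cache layers. It assumes a simplified parser that only looks at `no-store` and `private`; real caches apply more RFC 9111 rules (for example, qualified `private="..."` fields and requests carrying an Authorization header), and the function names are hypothetical:

```python
def parse_cache_control(value: str) -> dict:
    """Parse 'public, max-age=60' into {'public': None, 'max-age': '60'}."""
    directives: dict = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def who_may_store(cache_control: str) -> dict:
    """Which layers may store the response, judged only by no-store/private."""
    d = parse_cache_control(cache_control)
    if "no-store" in d:
        return {"browser": False, "cdn": False}   # nobody stores it
    if "private" in d:
        return {"browser": True, "cdn": False}    # browser only
    return {"browser": True, "cdn": True}


print(who_may_store("private, max-age=300"))  # {'browser': True, 'cdn': False}
print(who_may_store("private, no-store"))     # {'browser': False, 'cdn': False}
print(who_may_store("public, max-age=60"))    # {'browser': True, 'cdn': True}
```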

3. How does `stale-while-revalidate` improve user experience?

stale-while-revalidate allows the CDN to serve a cached response even after it expires, while simultaneously fetching a fresh copy in the background.

  • User benefit: Gets an immediate response (no waiting for revalidation)
  • Freshness benefit: Next user gets the updated content
  • Example: Cache-Control: public, max-age=3600, stale-while-revalidate=60 — serve stale for up to 60 seconds past expiry while revalidating

This is ideal for content that changes occasionally but doesn't need to be perfectly fresh on every request.
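The directive comes from RFC 5861. A sketch of how a cache might classify a stored response by its age (seconds since it was fetched) under `max-age` and `stale-while-revalidate`; the function name and state labels are illustrative, not any CDN's API:

```python
def freshness_state(age_s: int, max_age: int, swr: int = 0) -> str:
    """Classify a cached response by age.

    fresh            -> serve from cache, no origin contact
    stale-revalidate -> serve the cached copy now, refresh in the background
    expired          -> must fetch or revalidate before responding
    """
    if age_s < max_age:
        return "fresh"
    if age_s < max_age + swr:
        return "stale-revalidate"
    return "expired"


# Cache-Control: public, max-age=3600, stale-while-revalidate=60
for age in (10, 3600, 3630, 3700):
    print(age, freshness_state(age, max_age=3600, swr=60))
# 10 fresh
# 3600 stale-revalidate
# 3630 stale-revalidate
# 3700 expired
```

Only the first request in the stale window triggers a background fetch at a given cache; once that fetch completes, the entry is fresh again and the window no longer applies.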

4. What is cache invalidation and what strategies exist?

Cache invalidation is the process of forcing the CDN to discard or update cached content before its TTL expires.

  • Purge by URL: Delete specific cached URLs via API
  • Purge by tag: Delete all cached responses with a specific cache tag (e.g., all pages showing product-123)
  • Purge everything: Flush entire cache (expensive, use sparingly)
  • TTL expiration: Natural invalidation when max-age passes

Cache invalidation is one of the hardest problems in distributed systems because a purge has to reach every PoP (and any shield tier) worldwide, and until it does, different users can see different versions.
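Purge by tag works because the cache keeps an index from each tag to the cache keys that carry it. A minimal in-memory sketch of that idea with hypothetical class and method names (on Cloudflare the tags come from the origin's `Cache-Tag` response header; Fastly uses `Surrogate-Key`):

```python
from collections import defaultdict


class TaggedCache:
    """Toy edge cache supporting purge by URL and purge by tag."""

    def __init__(self) -> None:
        self.entries: dict = {}                  # url -> body
        self.tag_index = defaultdict(set)        # tag -> set of urls

    def put(self, url: str, body: bytes, tags: list) -> None:
        self.entries[url] = body
        for tag in tags:
            self.tag_index[tag].add(url)

    def purge_url(self, url: str) -> None:
        self.entries.pop(url, None)

    def purge_tag(self, tag: str) -> int:
        """Remove every entry carrying `tag`; return how many were removed."""
        removed = 0
        for url in self.tag_index.pop(tag, set()):
            if self.entries.pop(url, None) is not None:
                removed += 1
        return removed


cache = TaggedCache()
cache.put("/products/123", b"...", ["product-123"])
cache.put("/category/shoes", b"...", ["category-shoes", "product-123"])
cache.put("/products/456", b"...", ["product-456"])

print(cache.purge_tag("product-123"))  # 2
print(sorted(cache.entries))           # ['/products/456']
```

One purge call removes the product page and every listing that shows the product, which is what makes tags preferable to enumerating URLs.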

5. What is origin shield and why is it important?

Origin shield is an intermediate caching tier between CDN edge nodes and your origin server. It reduces origin load by serving as a shared cache for multiple edge PoPs.

  • Without shield: Every cache miss at every edge hits origin directly (thundering herd problem)
  • With shield: Only the shield cache misses hit origin; edges share the shield's cache
  • Benefit: Can cut origin requests and bandwidth substantially (the reduction depends on traffic patterns and how many PoPs share the shield) and protects against cache stampedes

Cloudflare calls this "Tiered Cache" (originally part of Argo); Fastly calls it "Shielding"; Amazon CloudFront calls it "Origin Shield."
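A back-of-the-envelope simulation of the with/without difference. When a newly published object is first requested at many edges, each edge misses once; without a shield each miss reaches origin, with one only the shield's first miss does. This is a toy model (a single URL, no TTLs, no concurrency) with a hypothetical function name:

```python
def origin_fetches(edges_requesting: list, use_shield: bool) -> int:
    """Count origin fetches when each listed edge requests one new URL.

    Each edge has its own cache; the optional shield is one shared tier.
    """
    edge_cache = set()
    shield_cached = False
    fetches = 0
    for edge in edges_requesting:
        if edge in edge_cache:
            continue                  # edge HIT
        if use_shield and shield_cached:
            edge_cache.add(edge)      # edge MISS, shield HIT
            continue
        fetches += 1                  # miss travels all the way to origin
        shield_cached = use_shield    # the shield keeps a copy if it exists
        edge_cache.add(edge)
    return fetches


# 300 PoPs (hypothetical) each request the URL twice.
requests = [f"pop-{i}" for i in range(300)] * 2
print(origin_fetches(requests, use_shield=False))  # 300
print(origin_fetches(requests, use_shield=True))   # 1
```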

6. How would you design a cache invalidation strategy for a static site with 10,000 pages that updates content on every deploy?

A multi-layered approach works best:

  • Content-hashed filenames: Every deploy generates new hashes. Old URLs naturally become uncached as TTL passes. New URLs cache immediately.
  • Tag-based invalidation: Tag all pages with content type or section. On deploy, purge only affected tags.
  • Short TTL on HTML: max-age=60, stale-while-revalidate=300 means a cached page can lag behind origin by roughly 6 minutes at most (up to 60 s fresh plus up to 300 s served stale while revalidating)
  • Cache warming: After purge, proactively fetch critical pages to warm the cache

Avoid purging everything — it causes a thundering herd on your origin as all PoPs refill simultaneously.
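Content-hashed filenames are normally produced by the build tool, but the idea fits in a few lines. A sketch assuming a SHA-256 digest truncated to 8 hex characters; the truncation length and the `name.hash.ext` pattern are arbitrary choices, not a standard:

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path: str, content: bytes, length: int = 8) -> str:
    """Turn 'assets/app.js' into 'assets/app.<hash>.js'.

    Same content -> same name (stays cached); changed content -> new URL,
    so the asset itself never needs a purge.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


v1 = hashed_name("assets/app.js", b"console.log('v1');")
v2 = hashed_name("assets/app.js", b"console.log('v2');")
print(v1 != v2)  # True: a deploy that changes the file changes its URL
```

Only the HTML that references these names needs a short TTL; the hashed assets themselves can use max-age=31536000, immutable.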

7. What is the difference between a CDN and a reverse proxy?

A CDN is a globally distributed network of PoPs that acts as a reverse proxy plus additional features.

  • Reverse proxy: Sits in front of your origin, caches responses, forwards requests. Single location or few locations.
  • CDN: Reverse proxy distributed across hundreds of PoPs worldwide. Adds DDoS protection, WAF, edge computing, performance optimization, analytics.

Technically, every CDN edge is a reverse proxy. But not every reverse proxy is a CDN (e.g., nginx running in front of your app is a reverse proxy but not a CDN).

8. What happens when a CDN cache miss occurs? Walk through the full request flow.

  1. User makes request to CDN edge (e.g., Tokyo)
  2. Edge checks local cache: MISS
  3. Edge checks origin shield cache (if configured): MISS
  4. The shield (or the edge itself, if no shield is configured) requests the content from the origin server
  5. Origin responds with content and cache headers (e.g., max-age=3600)
  6. The shield and the edge each store the response according to those headers
  7. Edge returns the response to the user with a cache status header (e.g., X-Cache: MISS; Cloudflare uses CF-Cache-Status)

Subsequent requests for the same URL at that PoP within the TTL window are served from cache (X-Cache: HIT). Other PoPs still miss on their first request but can be filled from the shield.
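The numbered flow can be modelled as a two-tier lookup. In this toy sketch (hypothetical names, a fixed max-age, the clock passed in explicitly) each request returns the path it took, so you can see a first MISS, later HITs, an edge refilled from the shield, and a full miss once both copies expire:

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    store: dict = field(default_factory=dict)  # url -> (body, expires_at)

    def get(self, url: str, now: float):
        hit = self.store.get(url)
        if hit and hit[1] > now:
            return hit[0]
        return None

    def put(self, url: str, body: str, now: float, max_age: int) -> None:
        # Simplification: ignores the Age header, so an edge refilled from
        # the shield gets a full TTL instead of the remaining one.
        self.store[url] = (body, now + max_age)


def fetch(url: str, edge: Tier, shield: Tier, now: float,
          max_age: int = 3600) -> tuple:
    """Return (body, trace) for one request through edge -> shield -> origin."""
    body = edge.get(url, now)
    if body is not None:
        return body, "edge HIT"
    body = shield.get(url, now)
    if body is None:
        body = f"origin content for {url}"    # stand-in for the origin fetch
        shield.put(url, body, now, max_age)
        trace = "edge MISS, shield MISS, origin fetch"
    else:
        trace = "edge MISS, shield HIT"
    edge.put(url, body, now, max_age)
    return body, trace


tokyo, osaka, shield = Tier("tokyo"), Tier("osaka"), Tier("shield")
print(fetch("/a", tokyo, shield, now=0)[1])     # edge MISS, shield MISS, origin fetch
print(fetch("/a", tokyo, shield, now=10)[1])    # edge HIT
print(fetch("/a", osaka, shield, now=20)[1])    # edge MISS, shield HIT
print(fetch("/a", tokyo, shield, now=4000)[1])  # edge MISS, shield MISS, origin fetch
```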

9. How does the `Vary` header impact CDN caching?

The Vary header tells the CDN that it should cache separate versions of a response based on certain request headers.

  • Vary: Accept-Encoding: Cache separate versions for gzip vs Brotli vs uncompressed
  • Vary: Accept-Language: Cache separate versions for en vs fr vs ja
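  • Vary: Authorization: Cache a separate entry per distinct Authorization value, which in practice means per user; it is usually better not to cache such responses at a CDN at all

Warning: Each Vary header multiplies your cache entries. With Vary: Accept-Encoding, Accept-Language, Authorization, the number of variants per URL is the product of the distinct values seen for each header. Too many variants fragment the cache and reduce the hit ratio.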
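A sketch of how a cache could fold Vary-listed request headers into the cache key, and why entries multiply. The key format is hypothetical, and the value counts in the example are made up; real CDNs also normalize values (for example, collapsing Accept-Encoding into a few buckets) before keying:

```python
import math


def cache_key(url: str, vary: list, request_headers: dict) -> str:
    """Build a cache key from the URL plus each request header named in Vary."""
    headers = {k.lower(): v for k, v in request_headers.items()}
    parts = [url]
    for name in sorted(h.lower() for h in vary):
        parts.append(f"{name}={headers.get(name, '')}")
    return "|".join(parts)


print(cache_key("/home", ["Accept-Encoding"], {"Accept-Encoding": "br"}))
# /home|accept-encoding=br

# Worst case, entries per URL = product of distinct values per Vary header
# (hypothetical counts: 3 encodings, 10 languages, 50,000 distinct tokens).
distinct_values = {"accept-encoding": 3, "accept-language": 10, "authorization": 50_000}
print(math.prod(distinct_values.values()))  # 1500000
```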

10. What is a multi-CDN strategy and when would you use it?

Multi-CDN uses two or more CDN providers simultaneously, routing traffic between them for improved reliability and/or performance.

  • Failover: If CDN-A goes down, traffic automatically routes to CDN-B
  • Geographic routing: Route Asia-Pacific to Cloudflare, Americas to Fastly based on performance
  • Feature specialization: Use one CDN for static assets, another for video streaming

When to use: SLA requires >99.9% uptime, critical infrastructure where CDN failure is business-critical, or when a single CDN lacks PoP coverage in key regions.

Complexity cost: Multi-CDN requires DNS-based routing, health checks, and coordination logic. Only worth it if your SLA genuinely requires it.
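The routing part of that coordination logic can be sketched as a weighted choice over the providers that currently pass health checks. Provider names and weights are hypothetical, and the function takes an optional pre-drawn random number `r` only so it can be tested deterministically:

```python
import random
from typing import Optional


def pick_cdn(weights: dict, healthy: set, r: Optional[float] = None) -> str:
    """Weighted choice among healthy CDNs; unhealthy ones get no traffic.

    `r` is a number in [0, 1); pass one in for deterministic tests.
    """
    live = [(name, w) for name, w in sorted(weights.items())
            if name in healthy and w > 0]
    if not live:
        raise RuntimeError("no healthy CDN: fall back to origin")
    total = sum(w for _, w in live)
    point = (random.random() if r is None else r) * total
    for name, w in live:
        point -= w
        if point < 0:
            return name
    return live[-1][0]  # guard against floating-point rounding


weights = {"cdn-a": 70, "cdn-b": 30}
print(pick_cdn(weights, {"cdn-a", "cdn-b"}, r=0.5))  # cdn-a (0.5 lands in the first 70%)
print(pick_cdn(weights, {"cdn-a", "cdn-b"}, r=0.8))  # cdn-b
print(pick_cdn(weights, {"cdn-b"}, r=0.5))           # cdn-b: cdn-a failed its health check
```

In production this decision usually lives in a DNS traffic-steering service, which should itself not depend on either CDN.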

11. How do you debug CDN cache behavior when users report seeing stale content?

A systematic debugging approach helps identify the root cause:

  • Check CDN headers: Inspect CF-Cache-Status, X-Cache, Age, and Cache-Control headers on responses
  • Verify purge was successful: Use CDN API to check purge status; some CDNs show propagation time
  • Check for query string differences: /page vs /page?v=123 are different cache entries
  • Examine cache tags: If using cache tags, verify the correct tags are set and no older cached pages have stale tags
  • Bypass cache for testing: Use the CDN's development/bypass mode or request the origin directly to confirm origin behavior (some CDNs ignore a client's Cache-Control: no-cache request header)
  • Check edge vs shield: Purge at edge may not invalidate shield cache — verify shield was also purged

Common causes: TTL too long, purge not propagated, query strings creating unique entries, shield cache holding stale content.
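The header check in the first bullet is easy to script. This sketch takes response headers however you obtained them (for example, copied from `curl -I` output) and flags a few of the causes listed above. It is a heuristic with hypothetical function names; header names and status values differ between CDNs:

```python
from typing import Optional


def max_age_of(cache_control: str) -> Optional[int]:
    """Shared-cache TTL: s-maxage if present, else max-age, else None."""
    values = {}
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        values[name.lower()] = arg
    for directive in ("s-maxage", "max-age"):
        if values.get(directive, "").isdigit():
            return int(values[directive])
    return None


def diagnose(headers: dict) -> list:
    """Human-readable hints about why a response might be stale."""
    h = {k.lower(): v for k, v in headers.items()}
    status = h.get("cf-cache-status") or h.get("x-cache", "")
    hints = [f"cache status: {status or 'unknown'}"]
    age = int(h["age"]) if h.get("age", "").isdigit() else None
    ttl = max_age_of(h.get("cache-control", ""))
    if age is not None:
        hints.append(f"object age: {age}s")
    if ttl is not None and ttl >= 86400:
        hints.append(f"long TTL ({ttl}s): stale copies can live until purged")
    if age is not None and ttl is not None and age > ttl:
        hints.append("Age exceeds TTL: likely served stale "
                     "(stale-while-revalidate/stale-if-error)")
    return hints


for line in diagnose({"CF-Cache-Status": "HIT", "Age": "90000",
                      "Cache-Control": "public, max-age=86400"}):
    print(line)
```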

12. What is the difference between CDN caching and browser caching, and how do they interact?

CDN and browser caching serve different purposes and have different scopes:

  • Browser cache: Caches on the user's device for that specific browser instance. Limited to one user.
  • CDN cache: Caches at edge nodes globally, serving multiple users. Much larger hit radius.
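  • Interaction: The browser checks its own cache first. On a miss or an expired entry, it requests the resource from the CDN, which answers from its edge cache or fetches from origin. The Cache-Control headers on that response then set how long the browser keeps it (max-age) and how long the CDN keeps it (s-maxage if present, otherwise max-age).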

Header directives work differently: s-maxage is only respected by shared proxies (CDN), not browsers. private prevents CDN caching but allows browser caching.

Design for both: Use content-hashed filenames for long-term browser caching, short TTL on HTML for CDN freshness.

13. How would you handle cache invalidation for a live sports website with real-time scores?

Live sports websites require special handling due to the freshness vs performance trade-off:

  • Short TTL with frequent purging: Set max-age=5-10 seconds, purge on score updates
  • Surrogate keys: Tag pages by game/team; purge all pages related to a game when score updates
  • Edge computing for dynamic parts: Cache static shell at CDN, fetch scores via API at edge
  • Stale-while-revalidate: Serve slightly stale score (5-10s old) while fetching fresh data
  • Separate cache for scores API: Score endpoint has different cache rules than page content

Use WebSocket or Server-Sent Events for truly real-time updates rather than polling. CDN edge functions can fan-out to push updates to connected users.

14. What are the security implications of using a CDN, and how do you mitigate them?

CDNs introduce both security benefits and new attack surfaces:

  • DDoS protection: CDN absorbs volumetric attacks before they reach origin
  • Shared risk: If CDN PoP is compromised, cached content could be malicious
  • Header injection: CDN must validate and sanitize origin response headers
  • Cache poisoning: Attacker could poison CDN cache with malicious content
  • TLS termination: CDN decrypts traffic — ensure CDN provider is trusted

Mitigations:

  • Use CDN WAF rules to block common attack patterns
  • Implement Content Security Policy (CSP) headers via CDN
  • Validate origin responses before caching: don't cache responses that depend on request headers missing from the cache key, and don't cache error pages unless you intend to
  • Use signed URLs for premium content
  • Monitor for unexpected cache behavior indicating poisoning
  • Ensure CDN has proper SOC 2 compliance if handling sensitive data
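Signed URLs are commonly an HMAC over the path and an expiry time that the edge checks before serving. A sketch of that general scheme; the `expires`/`sig` parameter names, the signed string format, and the shared secret are assumptions, since each CDN defines its own token format:

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"replace-with-a-real-secret"  # hypothetical key shared by origin and edge


def _signature(path: str, expires: int) -> str:
    msg = f"{path}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_s: int, now: Optional[int] = None) -> str:
    """Return path?expires=...&sig=..., valid for ttl_s seconds."""
    expires = (int(time.time()) if now is None else now) + ttl_s
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: Optional[int] = None) -> bool:
    """Edge-side check: signature must match and the link must not be expired."""
    parts = urlsplit(url)
    qs = parse_qs(parts.query)
    try:
        expires = int(qs["expires"][0])
        sig = qs["sig"][0]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if current >= expires:
        return False
    # Constant-time comparison avoids leaking the signature via timing.
    return hmac.compare_digest(sig, _signature(parts.path, expires))


url = sign_url("/videos/premium.mp4", ttl_s=300, now=1_000)
print(verify_url(url, now=1_100))                              # True
print(verify_url(url, now=1_400))                              # False: expired
print(verify_url(url.replace("premium", "other"), now=1_100))  # False: path changed
```

Because the expiry is part of the signed string, a user cannot extend a link by editing the `expires` parameter.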

15. How does a CDN affect SEO, and which cache headers help or hurt it?

CDN can significantly impact SEO through page load speed, which is a Google ranking factor:

  • Positive effects: Faster TTFB and lower latency, which improve Core Web Vitals such as LCP
  • Negative effects: Incorrect caching can serve stale content or prevent crawling

Cache headers affecting SEO:

  • Cache-Control: no-store on HTML: Not penalized by search engines as such, but every request goes to origin, which can slow TTFB
  • Long TTL without versioning: Updated content not picked up until TTL expires
  • Missing Vary: Accept-Encoding: A shared cache may serve a compressed body to a client that did not ask for it, so crawlers or users can receive unreadable content

Best practices: Match TTLs to how often content changes (short TTLs on HTML, long TTLs on content-hashed static assets), implement cache purge on content update, and use stale-while-revalidate to balance freshness and performance.

16. Explain the "thundering herd" problem in a CDN context and how to prevent it.

The thundering herd problem occurs when many users request the same uncached content simultaneously, overwhelming the origin:

  • Scenario: Popular content expires from cache; all users requesting simultaneously cause 1000s of origin requests
  • Impact: Origin server overwhelmed, slow responses, potential origin crash

Solutions:

  • Origin shield: Only one shield request hits origin per cache miss (others wait for shield)
  • Cache warming: Proactively fetch critical content after purge before users request it
  • Stale-while-revalidate: Serve stale content to some users while one request refreshes
  • Request coalescing: CDN deduplicates concurrent requests for same URL to single origin request
  • Jittered TTLs: Add random offset to TTL so cache entries don't expire simultaneously

17. What is the role of a CDN in a DDoS attack, and how does anycast help mitigate DDoS?

CDN plays a dual role in DDoS scenarios:

  • First line of defense: CDN absorbs attack traffic at edge before it reaches origin
  • Attack amplifier: If CDN is compromised, it could be used to amplify attacks

Anycast helps DDoS mitigation:

  • Traffic distribution: Anycast routes attack traffic to multiple PoPs, diluting impact
  • Geographic isolation: Attack targeting one region doesn't affect other regions
  • BGP-based filtering: CDN can announce specific IP ranges from clean PoPs only

Additional CDN DDoS features: Rate limiting, challenge pages (CAPTCHA), fingerprinting bot patterns, automatic traffic normalization.
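Edge rate limiting is commonly built on a token bucket per client key. A minimal sketch; the 5 req/s rate, burst of 10, and keying by client IP are hypothetical limits, and the clock is passed in so the behaviour is deterministic:

```python
class TokenBucket:
    """Allows `capacity` requests in a burst, refilling at `rate` per second."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity       # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets: dict = {}


def allow_request(client_ip: str, now: float, rate: float = 5, burst: float = 10) -> bool:
    """Per-client-IP limiter (hypothetical limits: 5 req/s, bursts of 10)."""
    if client_ip not in buckets:
        buckets[client_ip] = TokenBucket(rate, burst, now)
    return buckets[client_ip].allow(now)


# A bot fires 15 requests at the same instant: 10 pass (the burst), 5 are limited.
decisions = [allow_request("203.0.113.7", now=0.0) for _ in range(15)]
print(decisions.count(True), decisions.count(False))  # 10 5
# One second later, 5 tokens have refilled.
print(sum(allow_request("203.0.113.7", now=1.0) for _ in range(10)))  # 5
```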

18. How do you measure CDN effectiveness and what metrics should you track?

Key CDN metrics to track:

  • Cache hit ratio: Percentage of requests served from cache vs origin (target: 90%+ for static)
  • TTFB (Time to First Byte): Lower is better; CDN should show dramatic improvement over direct origin
  • Origin bandwidth: Tracks how much traffic still hits your origin servers
  • Request rate: Total requests through CDN vs directly to origin
  • Edge response time: CDN processing time at edge nodes

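Calculate CDN savings with the byte hit ratio (share of bytes served from cache) rather than the request hit ratio, since a few large uncached files can dominate bandwidth:

origin_bw_saved = total_bw × byte_hit_ratio
cost_saved = origin_bw_saved × origin_bandwidth_cost_per_gb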

Set up monitoring dashboards and alerts for cache hit ratio dropping below threshold, origin latency spiking, or unexpected origin bandwidth increases.
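A sketch of computing both ratios from edge log records, showing how far they can diverge. The record format, the sample traffic, the 10,000 GB monthly volume, and the $0.09/GB origin egress price are all made-up illustrations:

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    cache_status: str   # "HIT" or "MISS" as reported by the edge
    bytes_sent: int


def hit_ratios(records: list) -> tuple:
    """Return (request hit ratio, byte hit ratio)."""
    hits = [r for r in records if r.cache_status == "HIT"]
    req_ratio = len(hits) / len(records)
    byte_ratio = sum(r.bytes_sent for r in hits) / sum(r.bytes_sent for r in records)
    return req_ratio, byte_ratio


# 9 small cached responses and 1 large uncached download (hypothetical sample).
logs = [LogRecord("HIT", 50_000)] * 9 + [LogRecord("MISS", 4_550_000)]
req_ratio, byte_ratio = hit_ratios(logs)
print(f"{req_ratio:.0%} of requests, {byte_ratio:.0%} of bytes")  # 90% of requests, 9% of bytes

total_gb = 10_000                           # monthly edge egress (hypothetical)
origin_gb_saved = total_gb * byte_ratio
cost_saved = origin_gb_saved * 0.09         # hypothetical $/GB origin egress
print(round(origin_gb_saved), round(cost_saved, 2))  # 900 81.0
```

Using the 90% request ratio here would overstate the savings tenfold.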

19. What is the relationship between CDN and HTTP/2 or HTTP/3, and why does it matter?

CDNs are critical for HTTP/2 and HTTP/3 performance due to connection handling:

  • HTTP/2 multiplexing: CDN maintains persistent connections to origin; handles many concurrent streams over few connections
  • HPACK compression: CDN efficiently compresses HTTP headers for HTTP/2
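  • Connection reuse: CDN keeps a pool of warm connections to origin and reuses them across many user requests (not to be confused with HTTP/2 connection coalescing, where a browser reuses one connection for several hostnames)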
  • HTTP/3 (QUIC): CDN handles QUIC handshake and connection migration; fallback to HTTP/2 if QUIC blocked

Performance impact: Without CDN, each user would need a separate connection to origin. CDN aggregates thousands of users through persistent connections, dramatically reducing origin load.

CDN also enables HTTP/3 support before origin servers implement it — user connects to CDN over HTTP/3, CDN connects to origin over HTTP/2 or HTTP/3.

20. Design a CDN configuration for an e-commerce platform with 1M products and frequent flash sales.

E-commerce CDN configuration requires balancing cache freshness with performance:

  • Product pages: Medium TTL (1-4 hours), tag-based invalidation on price/stock changes
  • Home page and category pages: Short TTL (5-15 min) with stale-while-revalidate
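  • Images (product photos): Long TTL (24-48h); with content-hashed filenames, max-age=31536000, immutable is safe because a changed photo gets a new URL
  • Inventory/price API: Very short TTL or no-cache: public, s-maxage=5 if prices are the same for every user, or private, max-age=5 if they are personalized (private keeps the CDN from caching them)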
  • Shopping cart/checkout: Never CDN cache; origin only with private, no-store
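The list above can be expressed as a route table that the origin (or an edge rule) uses to set Cache-Control. A sketch; the path prefixes are hypothetical, the TTLs are illustrative values taken from the ranges above, and s-maxage is used so the CDN and browsers can get different lifetimes:

```python
HOME = "public, s-maxage=300, max-age=60, stale-while-revalidate=300"

# (path prefix, Cache-Control); first match wins, so order matters.
POLICIES = [
    ("/cart", "private, no-store"),
    ("/checkout", "private, no-store"),
    ("/api/inventory", "public, s-maxage=5, max-age=0"),  # shared prices only
    ("/images/", "public, max-age=172800, immutable"),     # 48h; hashed filenames
    ("/products/", "public, s-maxage=3600, max-age=60, stale-while-revalidate=300"),
    ("/category/", "public, s-maxage=600, max-age=60, stale-while-revalidate=300"),
]


def cache_control_for(path: str) -> str:
    """Return the Cache-Control header for a request path."""
    if path == "/":
        return HOME
    for prefix, header in POLICIES:
        if path.startswith(prefix):
            return header
    return "private, no-store"  # unknown routes: safest default


print(cache_control_for("/products/123"))
print(cache_control_for("/checkout/pay"))  # private, no-store
```

Defaulting unknown routes to private, no-store means a forgotten rule costs performance rather than leaking a personalized page.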

Flash sale preparation:

  • Pre-warm cache for top-selling products before sale starts
  • Reduce TTL on flash sale items to 1-5 minutes during event
  • Use edge computing to queue excess traffic rather than overwhelming origin
  • Implement rate limiting at edge to prevent bots and abuse
  • Configure multi-CDN failover for sale-critical availability

Monitoring: Watch cache hit ratio during flash sales (traffic patterns unusual), monitor origin load, set up autoscaling triggers.


Further Reading

RFCs and Standards

  • RFC 7234: HTTP/1.1 Caching (obsoleted by RFC 9111)
  • RFC 9111: HTTP Caching (current caching specification)
  • RFC 9000: QUIC transport protocol

Tools and Utilities

| Tool | Purpose |
| --- | --- |
| CacheCheck | Test CDN cache behavior |
| HTTP Archive | Crawl and analyze CDN usage patterns |
| WebPageTest | Measure CDN impact on page load |
| Cloudflare Radar | Global internet traffic insights |
| SpeedCurve | Monitor CDN performance over time |

Conclusion

CDNs are essential infrastructure for any public website. They reduce latency, cut costs, and add a layer of protection. But they require careful configuration — wrong cache headers can leak private data or serve stale content for days.

Start with sensible defaults: short cache on HTML, long cache on static assets with content hashes, and add edge computing only when you have clear performance gains to show for it.

