System Design: Twitter Feed Architecture and Scalability

Deep dive into Twitter system design. Learn about feed generation, fan-out, timeline computation, search, notifications, and scaling challenges.

published: reading time: 30 min read author: GeekWorkBench

System Design: Twitter Feed Architecture and Scalability

Twitter handles millions of users posting billions of tweets. The core challenge is not just storing tweets but delivering them to followers in real-time. This case study examines how to design a Twitter-like social media platform.

This is an advanced system design topic. If you are new to distributed systems, start with our Database Scaling and Caching Strategies guides.

Introduction

The Twitter timeline architecture shows what it takes to build a system that’s simultaneously read-heavy, write-heavy, and real-time at scale. This case study covers data modeling, production hardening, and the tradeoffs in between.

Requirements Analysis

Functional Requirements

Users should be able to:

  • Post tweets (text, images, videos)
  • Follow/unfollow other users
  • View a timeline of tweets from followed users
  • Search for tweets and users
  • Like, retweet, and reply to tweets
  • Receive notifications for interactions

Non-Functional Requirements

The system needs:

  • Timeline latency under 200ms for 99th percentile
  • Support for 500 million users
  • Handle 200 million daily active users
  • Process 500 million tweets per day (6,000 tweets per second peak)
  • 99.99% availability

Data Models

Core Entities

-- Users table
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    username VARCHAR(15) NOT NULL UNIQUE,
    email VARCHAR(255) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL,
    display_name VARCHAR(50),
    bio TEXT,
    avatar_url VARCHAR(500),
    follower_count BIGINT DEFAULT 0,
    following_count BIGINT DEFAULT 0,
    verified BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Tweets table
CREATE TABLE tweets (
    id BIGSERIAL PRIMARY KEY,
    author_id BIGINT NOT NULL REFERENCES users(id),
    parent_id BIGINT REFERENCES tweets(id),  -- For replies
    content TEXT NOT NULL,
    media_urls JSONB DEFAULT '[]',
    retweet_of BIGINT REFERENCES tweets(id),
    like_count BIGINT DEFAULT 0,
    retweet_count BIGINT DEFAULT 0,
    reply_count BIGINT DEFAULT 0,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    deleted BOOLEAN DEFAULT FALSE,

    INDEX idx_tweets_author (author_id),
    INDEX idx_tweets_created (created_at DESC)
);

-- Follows table
CREATE TABLE follows (
    follower_id BIGINT NOT NULL REFERENCES users(id),
    following_id BIGINT NOT NULL REFERENCES users(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    PRIMARY KEY (follower_id, following_id)
);

-- Likes table
CREATE TABLE likes (
    user_id BIGINT NOT NULL REFERENCES users(id),
    tweet_id BIGINT NOT NULL REFERENCES tweets(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    PRIMARY KEY (user_id, tweet_id)
);

NoSQL Alternative

For extreme write throughput, use a wide-column store:

{
  "table": "tweets",
  "key": "tweet_id",
  "columns": {
    "author_id": "bigint",
    "content": "text",
    "created_at": "timestamp",
    "media": ["list of urls"]
  },
  "clustering": ["created_at DESC"]
}

Timeline Architecture

The timeline is where Twitter’s architecture gets interesting. There are two main approaches: push (fan-out on write) and pull (fan-out on read).

Push Model (Fan-out on Write)

When a user posts a tweet, push it to all followers’ timelines immediately:

sequenceDiagram
    participant U as User Posts Tweet
    participant API as API Server
    participant FF as Fan-out Service
    participant Q as Message Queue
    participant TM as Timeline Cache

    U->>API: POST /tweets {content: "Hello"}
    API->>DB: Store tweet
    DB-->>API: Tweet created
    API->>FF: Fan-out tweet
    FF->>DB: Get followers (10K)
    FF->>Q: Enqueue fan-out jobs
    Q->>TM: Push to each timeline cache

This approach provides fast reads but expensive writes. For celebrities with millions of followers, fan-out becomes problematic.

Pull Model (Fan-out on Read)

When a user loads their timeline, aggregate tweets from all followed users:

sequenceDiagram
    participant U as User Views Timeline
    participant API as API Server
    participant TM as Timeline Cache
    participant DB as Tweet DB
    participant MR as Merge & Rank

    U->>API: GET /timeline
    API->>TM: Get cached timeline IDs
    TM-->>API: Timeline IDs (200 tweets)
    API->>MR: Request tweet details
    MR->>DB: Batch fetch tweets
    DB-->>MR: Tweet objects
    MR-->>API: Enriched tweets
    API-->>U: Timeline response

Hybrid approach combines both: pull for celebrities, push for regular users.

Feed Generation

Timeline Consistency Model

The hybrid push/pull architecture introduces a fundamental consistency challenge: how do you merge pre-computed tweets (pushed) with on-demand tweets (pulled) while maintaining chronological order?

The Problem

When a user loads their timeline, they receive:

  1. Pre-computed tweets from regular users (cached in timeline store)
  2. On-demand tweets from celebrities (fetched at read time)

These two sources arrive separately and must be merged. The merge must preserve reverse-chronological order without comparing actual timestamps.

The Solution: Snowflake ID Sorting

Snowflake IDs provide a self-ordering property. Each ID encodes:

  • 41 bits: timestamp (milliseconds since Twitter epoch)
  • 5 bits: datacenter ID
  • 5 bits: worker ID
  • 12 bits: sequence number

Since datacenter clocks are synchronized and IDs increment within each datacenter, ID ordering correlates directly with time ordering. The merge algorithm:

def merge_timeline(pushed_tweets, pulled_tweets, limit=200):
    # Both lists pre-sorted by Snowflake ID descending
    merged = []
    p_idx, pu_idx = 0, 0

    while len(merged) < limit:
        p_tweet = pushed_tweets[p_idx] if p_idx < len(pushed_tweets) else None
        pu_tweet = pulled_tweets[pu_idx] if pu_idx < len(pulled_tweets) else None

        if p_tweet is None:
            merged.append(pulled_tweets[pu_idx])
            pu_idx += 1
        elif pu_tweet is None:
            merged.append(pushed_tweets[p_idx])
            p_idx += 1
        elif p_tweet.id > pu_tweet.id:  # ID encodes time
            merged.append(p_tweet)
            p_idx += 1
        else:
            merged.append(pu_tweet)
            pu_idx += 1

    return merged

Consistency Guarantees:

  • Eventual consistency for pushed tweets: Due to async fan-out, there’s a small window where a new tweet exists but hasn’t been pushed to all timelines
  • Strong consistency for pulled tweets: Celebrity tweets are fetched directly from DB at read time
  • No duplicate tweets: The timeline service deduplicates by tweet ID during merge
  • No ordering anomalies: Snowflake IDs eliminate clock synchronization issues

Cache Invalidation Strategy:

When a user unfollows someone, their cached timeline retains old tweets from that user until TTL expiry. This is acceptable because:

  1. The user explicitly chose to unfollow
  2. The stale data naturally ages out
  3. Forced cache invalidation would be expensive

For scenarios requiring stronger guarantees (like blocking), use write-through invalidation: when you block a user, explicitly remove their tweets from your timeline cache.

Timeline Service

class TimelineService:
    def __init__(self, cache: RedisCluster, db: Database):
        self.cache = cache
        self.db = db

    async def get_timeline(
        self,
        user_id: int,
        limit: int = 200,
        cursor: str = None
    ) -> TimelineResponse:
        # Try cache first
        cached = await self.cache.get(f"timeline:{user_id}")
        if cached and not cursor:
            return self._parse_cached_timeline(cached)

        # Fetch from multiple sources
        following = await self._get_following(user_id)
        tweets = await self._fetch_tweets(user_id, following, limit, cursor)

        return TimelineResponse(
            tweets=tweets,
            cursor=self._generate_cursor(tweets)
        )

    async def _fetch_tweets(
        self,
        user_id: int,
        following: List[int],
        limit: int,
        cursor: str
    ) -> List[Tweet]:
        # Split into celebrities and regular users
        celebrities, regular = await self._categorize_users(following)

        # Fetch in parallel
        celebrity_tweets = await self._fetch_celebrity_tweets(celebrities, limit)
        regular_tweets = await self._fetch_regular_tweets(
            user_id, regular, limit - len(celebrity_tweets), cursor
        )

        # Merge and rank
        merged = self._merge_tweets(celebrity_tweets, regular_tweets)
        return self._rank_tweets(merged)[:limit]

Caching Timeline

Cache timelines with user-specific keys:

CACHE_CONFIG = {
    "timeline_ttl": 3600,        # 1 hour
    "max_timeline_size": 800,    # Store 800 tweets, return 200
    "min_followers_for_fanout": 10000  # Celebrity threshold
}

async def cache_timeline(user_id: int, tweets: List[Tweet]):
    key = f"timeline:{user_id}"
    serialized = serialize_tweets(tweets)

    await self.cache.setex(
        key,
        CACHE_CONFIG["timeline_ttl"],
        serialized
    )

Fan-out Service

The fan-out service distributes tweets to follower timelines:

High-Follower Handling

class FanoutService:
    async def fanout_tweet(self, tweet: Tweet):
        author_id = tweet.author_id
        follower_count = await self.db.fetch_val(
            "SELECT follower_count FROM users WHERE id = $1",
            author_id
        )

        if follower_count > CACHE_CONFIG["min_followers_for_fanout"]:
            # Celebrity: skip fan-out, users pull on read
            await self._mark_tweet__celebrity(tweet)
        else:
            # Regular user: fan-out to all followers
            await self._fanout_to_followers(tweet)

    async def _fanout_to_followers(self, tweet: Tweet):
        async with self.db.pool.acquire() as conn:
            async for row in conn.cursor(
                "SELECT follower_id FROM follows WHERE following_id = $1",
                tweet.author_id
            ):
                follower_id = row["follower_id"]
                await self._push_to_timeline(follower_id, tweet)

    async def _push_to_timeline(self, user_id: int, tweet: Tweet):
        key = f"timeline:{user_id}"
        tweet_preview = {
            "id": tweet.id,
            "author_id": tweet.author_id,
            "created_at": tweet.created_at.isoformat()
        }

        # Add to front of list
        await self.cache.lpush(key, serialize_tweet_preview(tweet_preview))
        # Trim to max size
        await self.cache.ltrim(key, 0, CACHE_CONFIG["max_timeline_size"] - 1)

Search Architecture

Twitter search requires full-text search across billions of tweets.

Elasticsearch Mapping

{
  "mappings": {
    "properties": {
      "tweet_id": { "type": "long" },
      "author_id": { "type": "long" },
      "content": {
        "type": "text",
        "analyzer": "twitter_analyzer",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "hashtags": { "type": "keyword" },
      "mentions": { "type": "keyword" },
      "created_at": { "type": "date" },
      "engagement_score": { "type": "integer" }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "twitter_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "snowball"]
        }
      }
    }
  }
}

Search Query

async def search_tweets(
    query: str,
    limit: int = 20,
    since_id: str = None,
    max_id: str = None
) -> SearchResponse:
    must_clauses = [
        {"match": {"content": query}}
    ]

    filter_clauses = [
        {"range": {"created_at": {"gte": "2020-01-01"}}}
    ]

    if since_id:
        filter_clauses.append({"range": {"tweet_id": {"gt": int(since_id)}}})

    result = await es.search(
        index="tweets",
        body={
            "query": {
                "bool": {
                    "must": must_clauses,
                    "filter": filter_clauses
                }
            },
            "sort": [
                {"_score": "desc"},
                {"tweet_id": "desc"}
            ],
            "size": limit
        }
    )

    return SearchResponse(
        tweets=[Tweet(**hit["_source"]) for hit in result["hits"]["hits"]],
        count=result["hits"]["total"]["value"]
    )

Notification System

Notification Types

TypeTriggerDelivery
Mention@username in tweetImmediate
ReplyReply to user’s tweetImmediate
LikeSomeone likes your tweetBatch (hourly)
FollowSomeone follows youBatch (hourly)
RetweetSomeone retweets your tweetBatch (hourly)

Notification Pipeline

graph LR
    A[Tweet Event] --> B[Event Router]
    B --> C[Real-time Queue]
    B --> D[Batch Queue]
    C --> E[Push Service]
    C --> F[WebSocket]
    D --> G[Batch Processor]
    G --> H[Email Service]
    G --> I[Push Notifications]

Notification Service

class NotificationService:
    def __init__(self, queue: KafkaProducer, db: Database):
        self.queue = queue
        self.db = db

    async def notify_mention(self, tweet: Tweet, mentioned_users: List[int]):
        for user_id in mentioned_users:
            await self.queue.send("notifications", {
                "type": "mention",
                "user_id": user_id,
                "tweet_id": tweet.id,
                "author_id": tweet.author_id,
                "created_at": datetime.utcnow().isoformat()
            })

    async def notify_followers(self, follower_ids: List[int], event: Dict):
        batch = {
            "type": "follow",
            "user_ids": follower_ids,
            "actor_id": event["actor_id"],
            "created_at": datetime.utcnow().isoformat()
        }
        await self.queue.send_batch("notifications-batch", [batch])

Caching Patterns

Multi-Layer Caching

CACHE_LAYERS = [
    {"name": "tweet", "ttl": 86400, "size": "16MB"},
    {"name": "timeline", "ttl": 3600, "size": "100MB"},
    {"name": "user", "ttl": 900, "size": "16MB"},
    {"name": "social-graph", "ttl": 300, "size": "50MB"}
]

Cache-Aside for Tweets

async def get_tweet(tweet_id: int) -> Optional[Tweet]:
    # L1: Memcached
    cached = await mc.get(f"tweet:{tweet_id}")
    if cached:
        return Tweet(**json.loads(cached))

    # L2: Redis cluster
    cached = await redis.get(f"tweet:{tweet_id}")
    if cached:
        tweet = Tweet(**json.loads(cached))
        await mc.set(f"tweet:{tweet_id}", json.dumps(tweet.dict()))
        return tweet

    # Database
    tweet = await db.fetch_one(
        "SELECT * FROM tweets WHERE id = $1", tweet_id
    )

    if tweet:
        await redis.setex(f"tweet:{tweet_id}", 3600, json.dumps(tweet))
        await mc.set(f"tweet:{tweet_id}", json.dumps(tweet.dict()))

    return tweet

Write Path Optimization

Write-Behind Cache

async def post_tweet(user_id: int, content: str) -> Tweet:
    # Write to database
    tweet = await db.create(
        "INSERT INTO tweets (author_id, content) VALUES ($1, $2) RETURNING *",
        user_id, content
    )

    # Update timeline cache asynchronously
    asyncio.create_task(self._update_timeline_caches(user_id, tweet))

    # Index for search asynchronously
    asyncio.create_task(self._index_tweet(tweet))

    return tweet

Tweet ID Generation

Use snowflake IDs for tweet ordering:

class SnowflakeID:
    def __init__(self, datacenter_id: int = 0, worker_id: int = 0):
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.timestamp = 0
        self.sequence = 0

    def generate(self) -> int:
        timestamp = int(time.time() * 1000) - 1609459200000  # Epoch offset

        if timestamp == self.timestamp:
            self.sequence = (self.sequence + 1) & 4095
        else:
            self.sequence = 0

        self.timestamp = timestamp

        return (
            (timestamp << 22) |
            (self.datacenter_id << 17) |
            (self.worker_id << 12) |
            self.sequence
        )

API Design

Core Endpoints

MethodEndpointDescription
POST/api/v2/tweetsCreate tweet
DELETE/api/v2/tweets/:idDelete tweet
GET/api/v2/users/:id/timelineGet user timeline
GET/api/v2/timelines/homeGet home timeline
GET/api/v2/tweets/searchSearch tweets
POST/api/v2/users/:id/followFollow user
DELETE/api/v2/users/:id/followUnfollow user

Response Format

{
  "data": {
    "id": "1234567890",
    "text": "Hello, Twitter!",
    "author_id": "9876543210",
    "created_at": "2026-03-22T10:30:00Z",
    "like_count": 1523,
    "retweet_count": 342,
    "reply_count": 89
  },
  "meta": {
    "result_count": 1,
    "next_token": "b26v89c19zqg8o3fosdk7n9l"
  }
}

Scalability Challenges

The Celebrity Problem

Users like @elonmusk have millions of followers. Fanning out their tweets to all timelines takes time and resources.

Solutions:

  • Skip fan-out for high-follower accounts
  • Use CDN for celebrity content
  • Implement rate limiting on celebrity tweets

Hot Keys

Popular tweets create hot spots in the database.

Solutions:

  • Partition by hash(tweet_id)
  • Use eventual consistency for counts
  • Implement client-side rate limiting

Write Amplification

Fan-out multiplies write load.

Solutions:

  • Async fan-out via queues
  • Hybrid push/pull model
  • Periodic batch updates

Trade-off Analysis

FactorPush (Fan-out on Write)Pull (Fan-out on Read)Hybrid Approach
Read latencyLow (pre-computed)High (compute on read)Medium
Write latencyHigh (fan-out to all)Low (single write)Medium
Memory usageHigh (all timelines)Low (compute on need)Medium
Celebrity postsProblematic (fan-out storm)Efficient (pull)Efficient
ConsistencyEventual (async fan-out)Strong (read own)Tunable
ImplementationComplex (queues, workers)SimplerModerate

Production Failure Scenarios

Failure ScenarioImpactMitigation
Fan-out queue backlogTimeline updates delayed for hoursAuto-scale fan-out workers; prioritize active users
Celebrity tweet viralMassive fan-out burst crashes serversRate limit celebrity tweets; pre-compute popular timelines
Timeline cache missCold start: timeline loads from DB, high latencyPre-warm caches for trending users; use background refresh
Search index lagNew tweets not appearing in searchAccept search lag; prioritize by engagement
Hot tweet partitionSingle partition overloaded by likes/retweetsShard by tweet_id; use eventual consistency for counts
Tweet DB primary failureWrites fail; no new tweetsUse multi-primary replication; promote read replica

Common Pitfalls / Anti-Patterns

Pitfall 1: Fanning Out to All Followers Synchronously

Problem: Synchronous fan-out on every tweet creates massive latency spikes for users with many followers.

Solution: Use async fan-out via message queues. Return success to user immediately, fan-out in background.

Pitfall 2: Not Handling the Celebrity Problem

Problem: A user with 10M followers causes fan-out storms that overwhelm the system.

Solution: Implement hybrid model. Skip fan-out for users above a follower threshold; their tweets are pulled on read.

Pitfall 3: Storing Full Tweets in Timeline Cache

Problem: Caching 800 tweets × 5KB each = 4MB per user timeline in cache.

Solution: Store only tweet IDs and timestamps in timeline cache; fetch full tweet objects separately with multi-get.

Pitfall 4: Ignoring Engagement Calculation Cost

Problem: Calculating “trending” or “ranked” timeline requires scanning engagement data on every read.

Solution: Pre-compute engagement scores; update asynchronously; use Redis sorted sets for real-time ranking.


Real-world Failure Scenarios

Scenario 1: Twitter API Rate Limit Storm (2020)

What happened: During a major global event, Twitter’s API experienced a massive surge in requests that overwhelmed rate limiting infrastructure, causing cascading failures across third-party applications and internal services.

Root cause: Rate limiting tokens were stored in a single Redis cluster without geographic distribution. When the cluster experienced elevated latency, all token validation requests queued up, causing timeouts across the board.

Impact: Third-party Twitter clients became unusable for several hours. Internal services that relied on the Twitter API for real-time data also experienced degradation, affecting timeline generation and search functionality.

Lesson learned: Distribute rate limiting state across multiple Redis clusters with geographic proximity to API consumers. Implement circuit breakers that gracefully degrade rather than cascade when backends are slow.

Scenario 2: Tweet Storage Corruption

What happened: A bug in Twitter’s distributed tweet storage system caused partial data corruption for tweets containing certain Unicode characters, rendering them un retrievable and causing gaps in user timelines.

Root cause: The tweet compaction process did not properly handle multi-byte UTF-8 sequences in edge cases. Corrupted records were compacted and replaced the original, making the data permanently unavailable.

Impact: Affected users saw gaps in their timelines with no explanation. The bug persisted for several days before detection, and the corrupted data was unrecoverable.

Lesson learned: Implement checksums for all stored data. Never overwrite original records during compaction — write the new compacted record to a separate location and atomically update the pointer. Regular integrity checks on stored data are essential.

Scenario 3: Search Index Desynchronization

What happened: Twitter’s real-time search index fell out of sync with the primary tweet database after a network partition, causing newly posted tweets to be invisible in search results for up to 30 minutes.

Root cause: The search indexing pipeline used eventual consistency semantics. When the network partition occurred, the indexing queue accumulated a backlog that took significant time to drain after connectivity was restored.

Impact: Users searching for trending topics during the partition window found incomplete results, leading to complaints about “missing tweets” in search. The desynchronization was not visible to users who only browsed their timeline.

Lesson learned: For real-time search use cases, use synchronous indexing with acknowledgment before confirming the write to the user. Implement index health monitoring and alerts for queue depth and indexing lag metrics.

Quick Recap

  • Timeline delivery: Hybrid push/pull
  • Celebrity handling: Pull on read (skip fan-out)
  • Tweet ordering: Snowflake IDs (time-sortable)
  • Cache strategy: Multi-layer (L1 Memcached, L2 Redis)
  • Write path: Write-behind caching
  • Fan-out: Async via message queues
  • Search: Elasticsearch with custom analyzer
  • Notifications: Real-time (mentions) + batch (likes/follows)

Copy/Paste Checklist

- [ ] Implement hybrid push/pull for celebrity accounts
- [ ] Cache timeline IDs, not full tweet objects
- [ ] Use async fan-out via message queue
- [ ] Partition by tweet_id for hot tweet handling
- [ ] Monitor fan-out queue depth and auto-scale workers
- [ ] Pre-warm caches for trending users
- [ ] Implement rate limiting on tweet publication
- [ ] Use snowflake IDs for ordering

Observability Checklist

Metrics to Capture

  • tweets_published_total (counter) - By author tier (celebrity vs regular)
  • timeline_load_duration_seconds (histogram) - P50, P95, P99
  • timeline_cache_hit_ratio (gauge) - Hit vs miss rate
  • fanout_queue_depth (gauge) - Messages pending fan-out
  • search_indexing_lag_seconds (gauge) - Time from tweet to searchable
  • engagement_events_total (counter) - Likes, retweets, replies by tier

Logs to Emit

{
  "timestamp": "2026-03-22T10:30:00Z",
  "event": "tweet_published",
  "author_id": "123456",
  "author_tier": "celebrity",
  "follower_count": 15000000,
  "fanout_queued": true,
  "fanout_queue_depth": 50000
}

Alerts to Configure

AlertThresholdSeverity
Timeline P99 > 500ms500ms for 5 minWarning
Fan-out queue > 1M1000000 pendingCritical
Cache hit ratio < 80%80% for 10 minWarning
Search lag > 5 min300sWarning

Security Checklist

  • Rate limiting on tweet publication (prevent spam)
  • Tweet content moderation (pre-scan for abuse)
  • Anti-scraping measures (limit bulk reads)
  • DM encryption (end-to-end for private messages)
  • OAuth 2.0 for all API access
  • IP-based access controls for internal services
  • Audit logging of admin operations

Capacity Estimation

QPS Calculations

Daily active users: 200 million
Tweets per user per day: 2 (average)
Peak QPS: 200M × 2 / 86400 ≈ 4,600 tweets/second (average)
Peak spike factor: 10x → 46,000 tweets/second

Timeline reads:
- 50% users check timeline 3x/day
- Average timeline: 200 tweets
- Read QPS: 200M × 0.5 × 3 / 86400 ≈ 3,500 reads/second
- Peak: ~35,000 reads/second

Storage Estimation

Tweets stored: 500 million tweets/day × 365 days × 5 years
= 500M × 365 × 5 = 912.5 billion tweets

Per tweet (avg): 500 bytes content + 200 bytes metadata = 700 bytes
Total: 912.5B × 700 bytes ≈ 639 TB

With 3x replication and indices: ~2 PB storage needed

Interview Questions

1. How does Twitter handle celebrity users with millions of followers?

Twitter uses a hybrid approach. For regular users (under 10,000 followers), tweets are fanned out to follower timelines on write. For celebrities and high-follower accounts, fan-out is skipped. Their tweets are pulled on read when users load their timeline. This prevents fan-out storms that would overwhelm the system.

2. Why does Twitter store only tweet IDs in timeline cache, not full tweet objects?

Memory is the constraint. Twitter caches 800 tweet IDs per user timeline (to serve 200 tweets on request). Each tweet object is around 5KB with metadata. If you cached the full objects, that's 4MB per user timeline. With hundreds of millions of users, you'd need hundreds of petabytes of cache.

Storing only IDs keeps the cache compact. The IDs are small, and you can fetch the actual tweet content with a multi-get operation — one request that pulls 200 tweet objects from cache or database. This is fast because tweet objects for popular content are themselves cached in a separate layer.

3. How does Twitter ensure message ordering in timelines?

Tweet IDs are Snowflake IDs, which encode a timestamp. Each ID is 64 bits: 41 bits for timestamp (milliseconds since a custom epoch), 5 bits for datacenter, 5 bits for worker, and 12 bits for sequence number. The key property is that IDs are monotonically increasing within a datacenter.

When merging tweets from celebrities (pulled at read time) with tweets from regular users (pushed at write time), the client sorts by tweet ID. Since IDs encode time, this gives you chronological order without comparing actual timestamps.

This sidesteps clock synchronization issues. If you used wall-clock timestamps for ordering, you'd need all servers synchronized via NTP, and even small clock skew could cause ordering violations.

4. What happens when you like a tweet?

The like action is lightweight — it's not synchronous on the hot path. The client sends the like request, the server records it in the database with eventual consistency (not immediately consistent), and returns success. If the database write fails, the user sees an error; if it succeeds, the like count might take a moment to update everywhere.

Notifications follow from that: if the tweet author has immediate notifications enabled, they get a push right away. If they've chosen batched notifications, they get a digest later. The distinction matters because a celebrity with millions of followers can't afford immediate fan-out of every like notification.

5. What is the trade-off between push (fan-out on write) and pull (fan-out on read) timeline models?

Push (fan-out on write) means timelines are pre-computed. When you post, the system fans out your tweet to everyone's cached timelines. Reads are fast — the timeline is ready. But writes are expensive, especially for users with many followers.

Pull (fan-out on read) means you write once (to the tweet database) and everyone reads your tweet when they load their timeline. Writes are cheap. Reads are expensive — every timeline load requires merging tweets from everyone you follow.

The hybrid model splits the difference: regular users get pushed, celebrities get pulled. You get fast reads for most content without the fan-out cost for high-follower accounts.

6. How does Snowflake ID generation work and why is it important for Twitter?

A Snowflake ID is 64 bits split into: timestamp (41 bits, custom epoch), datacenter (5 bits), worker (5 bits), and sequence (12 bits). Each datacenter-worker pair generates IDs independently, so there's no coordination needed between machines.

Why it matters for Twitter: you get globally unique IDs that sort by creation time without a central coordinator. No database sequence, no Zookeeper, no consensus protocol. The tradeoff is that you need enough datacenters and workers — with 5 bits each, you get 32 datacenters and 32 workers per datacenter, plenty for any practical deployment.

7. Design the search functionality for Twitter. How would you handle full-text search across billions of tweets?

Elasticsearch is the standard approach for this scale. You index tweets with a custom analyzer that handles Twitter-specific patterns: hashtags, mentions, URLs, and slang. The index is partitioned by time period (monthly or weekly) so old tweets can be kept in cheaper storage.

On the query side, you need to balance relevance with recency. Twitter uses a hybrid scoring: text match score weighted against engagement metrics (likes, retweets) and recency. Trending topics get a boost via a separate signal.

For performance, results are cached for popular queries. The index itself is eventually consistent — there will be a small lag (seconds to minutes) between when a tweet is posted and when it becomes searchable.

8. How would you design Twitter's notification system to handle billions of notifications?

Notifications get split into two paths based on urgency. Immediate notifications (mentions, replies) go through a real-time queue and get pushed via WebSocket or push notification immediately. Batched notifications (likes, follows, retweets) accumulate in a batch queue and get processed hourly.

The notification service itself is stateless — it reads from Kafka and writes to a notification store (a distributed database with user-specific partitions). The store indexes notifications by user_id and created_at for efficient retrieval.

For delivery confirmation, you track read status. Unread counts are cached separately and invalidated when new notifications arrive. The main tradeoff is between immediacy and system load — Twitter chose to batch non-critical notifications to keep the system stable.

9. What is the hot key problem in Twitter's database and how do you solve it?

Popular tweets (think viral content or a celebrity post) create hot spots — a single partition receives way more traffic than others. Reads for that tweet's data overwhelm the database.

Solutions include: sharding by hash(tweet_id) so hot tweets spread across partitions, using eventual consistency for like/retweet counts (not requiring strong consistency), and adding a caching layer specifically for popular tweets. You can also rate-limit clients to prevent a single tweet from causing a cascade.

10. How does write-behind caching work and when would you use it?

Write-behind (also called write-back) caches write to the database first, then asynchronously update the cache. The user gets a fast response, and the cache population happens in the background.

Twitter uses this for tweet posting: the write goes to the database (source of truth), and then background tasks update timeline caches and search indexes. This keeps the write path fast and resilient. If the background task fails, you either retry or accept a stale cache until TTL expiry.

The tradeoff is that if the database write succeeds but the cache update fails, you have inconsistency. You need monitoring to detect and recover from these cases.

11. Explain the timeline merge problem. How do you correctly merge tweets from push and pull sources?

The merge problem: your timeline has pre-computed tweets from regular users (pushed) and needs to pull celebrity tweets on read. The challenge is sorting them together correctly.

Twitter's solution uses Snowflake IDs as the sort key. Both pushed and pulled tweets have IDs that encode their creation time. When merging, you sort by ID descending. Since IDs are monotonically increasing within a datacenter, this gives you chronological order without comparing actual timestamps.

The practical challenge is ensuring you get enough tweets. If you're pulling celebrity tweets at read time, you need to know which celebrities a user follows to fetch the right tweets. That follow-graph lookup adds latency, which is why there's a social graph cache.

12. How would you design the follower/following data model at scale?

The follows table is a many-to-many relationship: user_id → many → user_id. The primary access patterns are "who does user X follow?" and "who follows user Y?" Both queries need to be fast.

Sharding strategy: partition follows by follower_id for "who does user X follow" queries, and by following_id for "who follows user Y" queries. Or use a hybrid approach where some data is replicated for both access patterns.

The counts (follower_count, following_count) are denormalized for fast access. They're updated asynchronously because they don't need strong consistency — you can accept temporary inaccuracies while the system catches up.

13. What caching layers does Twitter use and why?

Twitter uses a multi-layer cache hierarchy. L1 is an in-process cache (like Memcached) for the fastest hits. L2 is Redis cluster for shared cache across servers. Then the database.

Different data has different cache strategies: tweets cache for high-read, timeline cache for user-specific timelines, user cache for profile data, and social graph cache for the follower/following relationships. Each layer has its own TTL based on how often the data changes.

The social graph cache is particularly important because the follow relationships don't change as often as tweet content, but they're queried on every timeline load.

14. How does Twitter handle rate limiting and why is it necessary?

Rate limiting prevents abuse and keeps the system stable under load. Twitter limits both reads (how many timelines you can fetch) and writes (how many tweets you can post) per user, per window.

Implementation: a token bucket or sliding window counter per user_id, stored in a fast cache (Redis). When a request comes in, you check if the user has tokens left. If not, you reject with a 429 response.

Rate limiting is especially important for celebrity tweets — without it, a single viral tweet could cause a fan-out burst that overwhelms the system. Limiting how often a single user can post also helps reduce spam and coordinate attacks.

15. Design the data storage strategy for tweets over a 5-year period.

Over 5 years, Twitter stores roughly 900 billion tweets. At 700 bytes per tweet (content + metadata), that's about 630PB before replication, or ~2PB with 3x replication and indices.

Storage is tiered by age. Recent tweets (last 30-90 days) are kept in fast storage (SSD) for quick access. Older tweets move to cheaper storage (HDD or cold storage). This is called tiered storage or data aging.

Data is partitioned by time period (monthly or weekly partitions). This makes it easy to drop old data or move it to cheaper storage. Queries for recent tweets hit recent partitions; queries for old tweets span more partitions but are less frequent.

16. How would you handle a scenario where the fan-out queue backs up?

Queue backup means timeline updates are delayed — users might not see new tweets for hours. The impact is visibility, not data loss.

Mitigations: auto-scaling fan-out workers based on queue depth, prioritizing active users (people who login frequently) over inactive ones, and using backpressure to slow down new tweet ingestion when the queue is overwhelmed.

For the user experience, you might show a "timeline may be outdated" warning or load tweets directly from the database instead of the cache. The key is that the timeline remains functional even if it's stale — you don't fail completely, you degrade gracefully.

17. What is the role of a message queue in Twitter's architecture?

Kafka is the backbone for async processing. Tweet ingestion goes through Kafka: the write API publishes to a topic, and fan-out workers consume from that topic to update timeline caches. Notification processing similarly uses queues — the write path is decoupled from the fan-out and notification paths.

Queues provide three benefits: buffering (absorb spikes without overwhelming downstream services), decoupling (write path doesn't wait for fan-out to complete), and fault tolerance (failed messages can be retried). If a fan-out worker crashes, the message stays in the queue and another worker picks it up.

18. How does Twitter compute trending topics?

Trending is computed from engagement velocity — how fast a topic is gaining likes, retweets, and replies. A simple approach counts mentions in the last N minutes and ranks by velocity. More sophisticated systems weight by follower count and engagement quality.

The pipeline: tweets are streamed through Kafka, a storm or flink job computes rolling mention counts for hashtags and search queries, results are stored in a sorted set (like Redis sorted sets) with time-based expiry, and the top K trending items are served to clients.

The main tradeoff is between sensitivity (detecting trends quickly) and noise (avoiding spikes from scheduled events or coordinated campaigns). Geographic segmentation also matters — what's trending globally might differ from what's trending in Tokyo.

19. Design the API endpoint for fetching a user's timeline.

The primary endpoint is GET /api/v2/timelines/home. Query parameters include max_id (for pagination, exclusive upper bound), since_id (exclusive lower bound), and limit (default 200, max 800).

Response format is JSON with a data array of tweet objects and a meta object containing result_count and next_token for cursor-based pagination.

The implementation checks the timeline cache first (Redis). If there's a cache hit and no cursor is provided, return cached timeline IDs and batch-fetch tweet objects. On cache miss, fetch from the database and rebuild the cache asynchronously.

20. What happens when a Twitter user with 10 million followers posts a tweet?

The write goes to the database and returns success immediately. The fan-out service checks the follower count and sees it's above the celebrity threshold (around 10K), so it skips fan-out entirely.

When followers load their timeline, the system fetches the celebrity's recent tweets separately (pull-based fan-out on read). This merge happens in the timeline service: the cached timeline for the user includes pre-computed tweets from regular users, and at read time the system pulls the latest tweets from celebrities they follow.

The tradeoff: some read latency for celebrity tweets (they're not pre-computed in the timeline cache), but massive savings on write side. A single celebrity tweet would otherwise generate millions of cache updates. At 10M followers, that fan-out could take hours and overwhelm the system.


Further Reading

For more on caching patterns, see our Caching Strategies guide. For database sharding, see Horizontal Sharding. For real-time features like Twitter uses, see the Design Chat System case study.


Conclusion

Twitter’s architecture balances read and write optimization through a hybrid push/pull model. The key decisions:

  • Push for regular users, pull for celebrities
  • Multi-layer caching throughout
  • Eventual consistency for non-critical data
  • Async processing for expensive operations

Category

Related Posts

System Design: Netflix Architecture for Global Streaming

Deep dive into Netflix architecture. Learn about content delivery, CDN design, microservices, recommendation systems, and streaming protocols.

#system-design #case-study #netflix

System Design: URL Shortener from Scratch

Deep dive into URL shortener architecture. Learn hash function design, redirect logic, data storage, rate limiting, and high-availability.

#system-design #case-study #url-shortener

Amazon Architecture: Lessons from the Pioneer of Microservices

Learn how Amazon pioneered service-oriented architecture, the famous 'two-pizza team' rule, and how they built the foundation for AWS.

#microservices #amazon #architecture