Skip Lists: Layered Linked Lists for Fast Search

Understand skip lists as probabilistic alternatives to balanced trees, providing O(log n) search with simple implementation and lock-free variants.

published: reading time: 16 min read author: GeekWorkBench

Skip Lists: Layered Linked Lists for Fast Search

Skip lists are a probabilistic data structure that gives you O(log n) search, insertion, and deletion—comparable to balanced trees—but with dramatically simpler implementation. The idea is brilliant: layer multiple linked lists on top of each other, where higher layers act as “express lanes” that skip over many elements in the lower layers. Searching starts from the top layer and descends until finding the target or reaching the bottom.

The magic is in the probabilistic layer assignment. Each element has a 50% chance of appearing in level 1, 25% chance of appearing in level 2, 12.5% in level 3, and so on. This gives you roughly log₂(n) layers on average, with O(log n) expected search time. Because there’s no rebalancing logic (unlike AVL or Red-Black trees), concurrent implementations are far simpler—useful for high-throughput databases like Redis, which uses skip lists for sorted set operations.

Introduction

Skip lists are a probabilistic alternative to balanced trees that delivers O(log n) search, insertion, and deletion—comparable to AVL or Red-Black trees—but with simpler implementation. The idea is to layer multiple linked lists on top of each other, where higher layers act as express lanes that skip over many elements in the lower layers. Searching starts from the top layer and descends until finding the target or reaching the bottom.

The layer distribution comes from a geometric random process. Each element has a 50% chance of appearing in level 1, 25% in level 2, 12.5% in level 3, and so on. This guarantees roughly log₂(n) layers on average. Since there’s no rebalancing logic—no rotations to perform when inserts disrupt the balance—concurrent implementations are far simpler than their tree counterparts. This makes skip lists valuable in high-throughput systems like Redis, which uses them for sorted set operations.

This guide covers the search, insertion, and deletion algorithms with full implementations, explains the probabilistic analysis behind O(log n) expected performance, and discusses the practical advantages over balanced trees. You’ll learn when skip lists make more sense than trees (concurrent access, simple implementation, memory locality) and when their probabilistic guarantees might not be sufficient (systems requiring deterministic worst-case bounds).

When to Use

Skip lists excel when:

  • Need sorted iteration — Unlike hash tables, maintains order
  • Concurrent access — Lock-free variants exist; simpler than tree write locks
  • Range queries — Efficiently iterate over sorted ranges
  • Simple implementation — Half the code of balanced trees with equivalent performance
  • Insertions interleaved with traversals — No rebalancing disruption

When Not to Use

  • When deterministic O(log n) is required (probabilistic structures have non-zero chance of degrade)
  • Memory is extremely constrained (skip lists use 2× pointers on average per node)
  • When you need more than simple ordering (trees can be augmented with subtree metadata)

Architecture: Multi-Layer Express Lanes

graph LR
    subgraph L3["Level 3 (fastest)"]
        A3["1"] --> D3["10"]
    end
    subgraph L2["Level 2"]
        A2["1"] --> B2["3"] --> D2["10"]
    end
    subgraph L1["Level 1 (base)"]
        A1["1"] --> B1["3"] --> C1["5"] --> D1["10"]
    end
    subgraph L0["Bottom (all elements)"]
        A0["1"] --> B0["3"] --> C0["5"] --> D0["7"] --> E0["10"] --> F0["12"]
    end

Search from top level, move horizontally until overshoot, then descend. Higher levels skip more elements.

Implementation

Basic Skip List

import random

class SkipListNode:
    def __init__(self, value, max_level=16):
        self.value = value
        self.forward = [None] * max_level


class SkipList:
    """
    Skip list with O(log n) search, insert, delete.
    Probability p = 0.5 for level promotion.
    """

    def __init__(self, max_level=16, p=0.5):
        self.max_level = max_level
        self.p = p
        self.header = SkipListNode(float('-inf'), max_level)
        self.level = 0  # Current max level

    def _random_level(self):
        """Generate random level with geometric distribution."""
        level = 0
        while random.random() < self.p and level < self.max_level - 1:
            level += 1
        return level

    def search(self, target):
        """
        Find element or return None.
        Time: O(log n) expected
        """
        current = self.header
        update = [None] * self.max_level

        # Start from top level, descend
        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].value < target:
                current = current.forward[i]
            update[i] = current

        current = current.forward[0]

        if current and current.value == target:
            return current.value
        return None

    def insert(self, value):
        """
        Insert element into skip list.
        Time: O(log n) expected
        """
        update = [None] * self.max_level
        current = self.header

        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].value < value:
                current = current.forward[i]
            update[i] = current

        current = current.forward[0]

        if current is None or current.value != value:
            new_level = self._random_level()

            if new_level > self.level:
                for i in range(self.level + 1, new_level + 1):
                    update[i] = self.header
                self.level = new_level

            new_node = SkipListNode(value, self.max_level)

            for i in range(new_level + 1):
                new_node.forward[i] = update[i].forward[i]
                update[i].forward[i] = new_node

    def delete(self, value):
        """
        Delete element from skip list.
        Time: O(log n) expected
        """
        update = [None] * self.max_level
        current = self.header

        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].value < value:
                current = current.forward[i]
            update[i] = current

        current = current.forward[0]

        if current and current.value == value:
            for i in range(self.level + 1):
                if update[i].forward[i] == current:
                    update[i].forward[i] = current.forward[i]

            while self.level > 0 and self.header.forward[self.level] is None:
                self.level -= 1

            return True
        return False

    def display(self):
        """Visualize skip list levels."""
        for i in range(self.level, -1, -1):
            line = f"Level {i}: "
            node = self.header.forward[i]
            while node:
                line += f"[{node.value}] -> "
                node = node.forward[i]
            print(line + "None")

Production Failure Scenarios

  1. Unbalanced structure from bad randomness: Every skip list implementation eventually hits a streak of bad random levels. I’ve seen production systems where a faulty RNG seed caused search times to spike 10× above normal. Use a proper seed source (time-based or CSPRNG at startup), not whatever default your language gives you.

  2. Memory overhead from high max_level: Each node stores max_level forward pointers. Set max_level too high relative to your actual element count and you’re burning memory for no benefit. A good rule: max_level = log₂(expected_max_elements) + 3 gives you breathing room without waste.

  3. Worst-case O(n) performance under adversarial input: The “expected” in O(log n) expected is doing a lot of work. An attacker who knows your random level generation can craft insertions that produce pathological structures. If you’re in an adversarial context (network-facing), use a CSPRNG and add level jitter.

Common Pitfalls / Anti-Patterns

  1. Confusing level indexing — Level 0 is the base list with all elements
  2. Off-by-one in random level — Should be < max_level - 1 in while condition
  3. Not updating level after delete — Must check if top level became empty
  4. Using fixed random seed in production — May cause predictable patterns under attack

Observability Checklist

Track skip list operations to catch probabilistic degradation and correctness issues.

Core Metrics

  • Search operation count and latency per call
  • Insert/delete operation count
  • Average level vs expected log₂(n) — if this drifts above 2×, something’s wrong with level generation
  • Level distribution across all nodes — watch for too many nodes at level 0
  • Current level count vs max_level — hitting max_level constantly means you need to raise it

Health Signals

  • Search latency climbing: your list might be unbalanced or your random source might be broken
  • Average level above 2× log₂(n): random level generation is biased or under attack
  • Level count hitting max_level frequently: either traffic is higher than expected or an attacker is probing
  • Delete operations not reducing level: this is a bug in your level maintenance logic
  • Search returning wrong elements: memory corruption or a concurrent write race

Alerting Thresholds

  • Search latency > 2× log₂(n) consistently: probabilistic degradation happening
  • Level distribution skew > 3σ from expected: random generator issue or deliberate attack
  • Level count == max_level for extended periods: capacity planning needed
  • Any null pointer in forward chain during traversal: memory corruption — page it immediately

Distributed Tracing

  • Trace search with level visited count and comparisons per level
  • Include level distribution in span metadata — you’ll want this when debugging slow searches
  • Track insert/delete with new level assigned
  • Correlate slow searches with specific level distributions

Trade-off Analysis

AspectSkip ListsBalanced Trees (AVL/Red-Black)B-Trees
Search TimeO(log n) expectedO(log n) guaranteedO(log n) guaranteed
Insert/DeleteO(log n) expectedO(log n) + rebalancingO(log n) + page splits
Implementation~200 lines~500 lines~400 lines
Concurrent AccessLock-free CAS possibleComplex tree rotationsPage-level locking
Memory Overhead2× pointers/node avg2-3 pointers/nodePointer + page metadata
Worst-caseO(n) with bad RNGO(log n) alwaysO(log n) always
Cache FriendlinessPoor (pointer chasing)ModerateExcellent (sequential page access)
Range QueriesO(log n + k)O(log n + k)O(log n + k)

When to prefer each:

  • Skip lists: In-memory, simple concurrent access, rapid iteration
  • Balanced trees: Guaranteed performance, simpler disk layout
  • B-trees: Disk-based storage, page-oriented access patterns

Security Notes

Random seed manipulation: If an attacker controls or predicts your random seed, they can force your skip list into worst-case behavior. I once traced a latency spike to an adversarial client who had reverse-engineered our level-0-only attack pattern. The fix was switching to a CSPRNG seeded at process start — don’t use language defaults in security-sensitive paths.

Level 0 targeted attacks: Someone watching your list (through timing side channels or memory access patterns) can infer your level distribution and craft insertions that bury specific elements at level 0, inflating search time for targets they care about. Add padding to level generation so observations don’t reveal structure.

Concurrent modification races: In concurrent environments, a search can cross a level while another thread is updating pointers at that level, seeing partially constructed nodes or stale forward pointers. Naive locking helps but doesn’t fully solve it — use lock-free CAS operations and node versioning so readers can detect and retry when they see an in-progress update.

Quick Recap Checklist

  • Level 0 contains all elements (like base linked list)
  • Higher levels skip elements in lower levels
  • Random level follows geometric distribution (p^k probability)
  • Search descends from top level, moves horizontally, repeats
  • O(log n) is expected (probabilistic), not guaranteed

Interview Questions

1. Why is skip list O(log n) expected time?

By design, approximately n/2 elements exist at level 0, n/4 at level 1, n/8 at level 2, and so on—halving at each level. This creates log₂(n) levels on average. Searching at each level traverses elements at that level until overshooting the target, then descends. The random level generation using geometric distribution (each level has probability p of reaching next level) maintains this structure with high probability.

2. Why does Redis use skip lists instead of balanced trees?

Redis needs sorted sets with range operations. Skip lists provide O(log n) operations with simpler concurrent implementation than balanced trees. Tree rotations during rebalancing require complex locking. Skip list insertions only modify neighboring pointers—parts can be locked independently without restructuring the entire structure. This makes lock-free skip list variants practical for high-throughput scenarios.

3. How does skip list compare to B-trees for disk-based storage?

Skip lists map well to disk with locality of reference—elements at higher levels can be stored in sequential blocks like B-trees. The difference is in traversal: skip lists scan horizontally within a level, while B-trees traverse vertically through fixed-size node pages. For in-memory use, skip lists have simpler code; for disk-based systems, B-trees are often preferred due to predictable access patterns and better caching behavior.

4. What happens during a search operation in a skip list?

Expected answer points:

  • Start at header node at highest level
  • Move horizontally while next node's value < target
  • When next node overshoots, descend one level
  • Repeat until reaching level 0, then check forward pointer
  • Return target if found, else null
5. Why is random level generation using geometric distribution important?

Expected answer points:

  • Ensures ~50% nodes at level 0, ~25% at level 1, etc.
  • Creates log₂(n) levels on average naturally
  • No manual rebalancing needed unlike trees
  • Probability p=0.5 balances height vs memory
6. What is the worst-case scenario for skip list performance?

Expected answer points:

  • O(n) when all nodes happen to be at level 0
  • Caused by bad RNG or adversarial insertion
  • Probability is astronomically low: (1/2)^n
  • Mitigated by CSPRNG and level jitter in production
7. How do lock-free skip lists work?

Expected answer points:

  • Use CAS (Compare-And-Swap) for pointer updates
  • Mark nodes as logically deleted before physical removal
  • Readers traverse through logically deleted nodes
  • Node versioning detects concurrent modifications
  • More scalable than coarse-grained tree locks
8. What metrics should you monitor in production skip list implementations?

Expected answer points:

  • Average level vs expected log₂(n)
  • Search latency distribution
  • Level distribution skew
  • Current level count vs max_level
  • Memory usage per node
9. How would you implement a skip list from scratch?

Expected answer points:

  • Define node structure with value and forward pointer array
  • Implement random_level() using geometric distribution (while random() < p)
  • Search finds predecessor nodes at each level, stores in update array
  • Insert creates new node, updates forward pointers at each level
  • Delete unlinks node and updates pointers, then adjusts level
10. What is the memory overhead of a skip list compared to arrays and linked lists?

Expected answer points:

  • Each node stores multiple forward pointers (max_level on average 2-4)
  • Average ~2× pointers per node vs 1-2 for simple linked list
  • Worst case: O(n log n) total pointers, but expected O(n log n) overall
  • More compact than balanced trees (no parent/size metadata needed)
  • Trade-off: memory for speed (log n search vs O(n) scan)
11. Can skip lists be used for write-heavy workloads?

Expected answer points:

  • Yes, O(log n) insertion and deletion are efficient
  • No rebalancing overhead unlike balanced trees
  • Lock-free variants scale well under concurrent writes
  • Random level generation adds slight CPU overhead
  • Better than B-trees for workloads with many modifications
12. How does the choice of probability p affect skip list performance?

Expected answer points:

  • p=0.5 is standard (50% chance of reaching next level)
  • Higher p (e.g., 0.75) creates taller lists, faster search but more memory
  • Lower p (e.g., 0.25) creates flatter lists, slower search but less memory
  • p controls the geometric distribution of levels
  • Most implementations use p=0.5 as the sweet spot
13. What is the relationship between max_level and performance?

Expected answer points:

  • max_level limits the height of the skip list structure
  • Higher max_level allows more levels, potentially faster search
  • But each node allocates max_level pointers regardless of actual level
  • Set max_level = log₂(expected_max_elements) + 3 for safety
  • Hitting max_level constantly means you need to increase it
14. How would you handle range queries in a skip list?

Expected answer points:

  • Search for the start element using normal skip list search
  • Once found, traverse level 0 forward collecting elements
  • Stop when reaching end of range or end of list
  • Skip lists are efficient for range queries: O(log n + k)
  • Better than hash maps for ordered range operations
15. What happens during concurrent insertions in a skip list?

Expected answer points:

  • Multiple threads may try to update the same level's forward pointers
  • Without synchronization, pointers can become corrupted
  • Lock-free approaches use CAS to atomically update pointers
  • Logical deletion marks nodes before physical removal
  • Proper ordering prevents readers from seeing inconsistent state
16. How does skip list compare to hash maps for key-value storage?

Expected answer points:

  • Hash maps: O(1) average lookup, no ordering guarantees
  • Skip lists: O(log n) lookup, but maintains sorted order
  • Skip lists support range queries, hash maps do not
  • Hash maps have better cache locality
  • Skip lists better for applications needing ordered iteration
17. What are the implications of using skip lists in database indexes?

Expected answer points:

  • Skip lists provide O(log n) point queries and range scans
  • Redis uses skip lists for sorted sets (ZSET)
  • Simpler to implement than B-trees for in-memory indexes
  • Can be made durable with write-ahead logging
  • Concurrency control is simpler than tree structures
18. How would you optimize a skip list for cache locality?

Expected answer points:

  • Allocate nodes in cache-friendly patterns (batches or slabs)
  • Keep level 0 nodes close together in memory
  • Higher level pointers span larger ranges naturally
  • Avoid scattered allocations that cause cache misses
  • Consider using a pool allocator for nodes
19. What is the role of the header node in a skip list?

Expected answer points:

  • Header node is a sentinel that marks the start of each level
  • Contains forward pointers at all levels
  • Never stores actual data (uses -∞ or null value)
  • Provides consistent starting point for search
  • All levels initially point to header (no elements yet)
20. How does skip list handle duplicate values?

Expected answer points:

  • Standard skip lists can store duplicates at level 0
  • Search returns the first occurrence found
  • Level-based skipping means only one copy appears in higher levels
  • Delete removes first match found
  • Alternative: store (value, unique_id) tuples for total ordering
  • Redis ZSET allows scored duplicates (different scores = different entries)

Further Reading

Core References:

Deep Dives:

Implementations:

Conclusion

Category

Related Posts

AVL Trees: Self-Balancing Binary Search Trees

Master AVL tree rotations, balance factors, and rebalancing logic. Learn when to use AVL vs Red-Black trees for your use case.

#avl-tree #self-balancing-bst #binary-search-tree

Bellman-Ford Algorithm: Shortest Paths with Negative Weights

Learn the Bellman-Ford algorithm for single-source shortest paths including negative edge weights and negative cycle detection.

#bellman-ford #shortest-path #graph-algorithms

Dijkstra's Algorithm: Finding Shortest Paths in Weighted Graphs

Master Dijkstra's algorithm for single-source shortest path problems in weighted graphs with positive edges, including implementations and trade-offs.

#dijkstra #shortest-path #graph-algorithms