Consistent Hashing: Data Distribution Across Systems
Learn how consistent hashing works in caches, databases, and CDNs, including hash rings, virtual nodes, node additions, and redistribution strategies.
Regular hashing breaks when nodes join or leave. With simple modulo hashing, adding a server remaps nearly every key. Consistent hashing solves this. When nodes join or leave, only neighboring keys remap.
I first ran into consistent hashing in Memcached clients. The docs warned against using regular hash % server_count distribution. They had a point. Our cache hit rate dropped every time we added a server, until we switched to consistent hashing.
The Problem with Regular Hashing
Regular hash distribution uses a simple formula: server = hash(key) % num_servers. This works until you add or remove a server.
# Regular hashing distribution
# Note: Python's built-in hash() is salted per process; it is used here
# only to illustrate the modulo pattern, not for stable routing
servers = ['server-1', 'server-2', 'server-3', 'server-4']
for key in ['user:1', 'user:2', 'user:3', 'user:4']:
    server_idx = hash(key) % len(servers)
    print(f"{key} -> {servers[server_idx]}")

# Adding a server changes everything
servers.append('server-5')
for key in ['user:1', 'user:2', 'user:3', 'user:4']:
    server_idx = hash(key) % len(servers)
    print(f"{key} -> {servers[server_idx]}")  # Most keys map differently
With 4 servers, key user:1 might map to server-1. With 5 servers, it could end up on server-3 instead. Almost every key remaps to a different server. Cache miss rates spike. Database load follows.
This causes real problems in production. Rolling out a new cache server triggers a wave of cache misses. Every miss hits the database. Databases often cannot handle this sudden load.
How Consistent Hashing Works
Consistent hashing arranges both keys and servers on a hash ring. Keys map to the first server clockwise on the ring.
graph TD
Ring[Hash Ring] -->|hash=50| K1[Key A]
Ring -->|hash=150| K2[Key B]
Ring -->|hash=250| K3[Key C]
Ring -->|hash=75| S1[Server 1: owns 276-75 wrap]
Ring -->|hash=175| S2[Server 2: owns 76-175]
Ring -->|hash=275| S3[Server 3: owns 176-275]
K1 --> S1
K2 --> S2
K3 --> S3
In this example, Key A (hash 50) maps to Server 1 (hash 75). Key B (hash 150) maps to Server 2 (hash 175). Key C (hash 250) maps to Server 3 (hash 275). A key hashing past 275 would wrap around to Server 1.
When you add Server 4 at hash 125, between Server 1 and Server 2, only the keys between them remap: keys hashing in 76-125 move from Server 2 to Server 4, keys hashing in 126-175 stay on Server 2, and everything else is untouched.
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, nodes=None):
        self.ring = {}
        self.sorted_keys = []
        if nodes:
            for node in nodes:
                self.add_node(node)

    def hash(self, key):
        # Stable 32-bit hash (Python's built-in hash() is salted per process)
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % (2**32)

    def add_node(self, node):
        key = self.hash(node)
        self.ring[key] = node
        self.sorted_keys.append(key)
        self.sorted_keys.sort()

    def get_node(self, key):
        hash_val = self.hash(key)
        # First node clockwise from the key's position
        idx = bisect.bisect_left(self.sorted_keys, hash_val)
        if idx == len(self.sorted_keys):
            idx = 0  # Wrap around to first node
        return self.ring[self.sorted_keys[idx]]
Virtual Nodes
Basic consistent hashing has a distribution problem. With few servers, keys do not distribute evenly. One server might get twice as many keys as another.
Virtual nodes solve this. Each physical server appears multiple times on the ring. This smooths out the distribution.
graph TD
Ring[Hash Ring] -->|vn:10| S1[Server 1]
Ring -->|vn:10| S2[Server 2]
Ring -->|vn:10| S3[Server 3]
S1a[Server 1: hash 25] --> Ring
S1b[Server 1: hash 125] --> Ring
S1c[Server 1: hash 225] --> Ring
With 3 physical servers and 100 virtual nodes each, the ring has 300 points. Each server handles roughly 1/3 of the keys. Adding a server adds 100 points and takes 1/4 of the keys from each existing server.
import hashlib

class ConsistentHash:
    def __init__(self, num_virtual=100):
        self.num_virtual = num_virtual
        self.ring = {}
        self.sorted_keys = []

    def hash(self, key):
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % (2**32)

    def add_node(self, node):
        for i in range(self.num_virtual):
            key = self.hash(f"{node}:vn{i}")
            self.ring[key] = node
            self.sorted_keys.append(key)
        self.sorted_keys.sort()

    def remove_node(self, node):
        for i in range(self.num_virtual):
            key = self.hash(f"{node}:vn{i}")
            del self.ring[key]
            self.sorted_keys.remove(key)
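To see the smoothing effect, here is a self-contained sketch (node names and key counts are illustrative, and MD5 serves only as a stable stand-in hash) that measures the busiest-to-idlest load ratio with and without virtual nodes:

```python
import bisect
import hashlib

def stable_hash(key):
    # Stable 32-bit hash; Python's built-in hash() is salted per process
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % (2**32)

def build_ring(nodes, num_virtual):
    ring = {}
    for node in nodes:
        for i in range(num_virtual):
            ring[stable_hash(f"{node}:vn{i}")] = node
    return ring, sorted(ring)

def lookup(ring, points, key):
    # First virtual node clockwise from the key's position
    idx = bisect.bisect(points, stable_hash(key))
    return ring[points[idx % len(points)]]

def load_ratio(num_virtual, nodes=("a", "b", "c"), num_keys=30000):
    ring, points = build_ring(nodes, num_virtual)
    counts = {n: 0 for n in nodes}
    for i in range(num_keys):
        counts[lookup(ring, points, f"key:{i}")] += 1
    return max(counts.values()) / max(min(counts.values()), 1)

print(load_ratio(1))    # One point per node: typically badly skewed
print(load_ratio(200))  # 200 virtual nodes each: ratio close to 1
```

With a single point per node the arc sizes are essentially random, so the load ratio is usually well above 1; with hundreds of virtual nodes per server the ratio settles near 1.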
Adding and Removing Nodes
When a node joins, it claims its portion of the ring. When a node leaves, its keys are claimed by the next clockwise neighbor.
# Before adding Server 4
# Server 1: keys 0-99
# Server 2: keys 100-199
# Server 3: keys 200-255

# After adding Server 4 (hash 125)
# Server 1: keys 0-99
# Server 4: keys 100-124
# Server 2: keys 125-199
# Server 3: keys 200-255
Only keys between the new node and its predecessor remap. In the example, keys 100-124 move from Server 2 to Server 4. Other keys stay where they are.
With virtual nodes, the redistribution is more granular. Each virtual node claims a small portion of keys. The impact of adding or removing a node is spread evenly.
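To make the difference concrete, this sketch (server names and key counts are illustrative) counts how many of 5,000 keys change servers when a fifth server joins, under modulo hashing versus a virtual-node ring:

```python
import bisect
import hashlib

def h(key):
    # Stable 32-bit hash for reproducible results
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % (2**32)

def build_points(nodes, vnodes=100):
    # 100 virtual nodes per server, sorted into ring order
    return sorted((h(f"{n}:vn{i}"), n) for n in nodes for i in range(vnodes))

def ring_lookup(points, key):
    idx = bisect.bisect(points, (h(key), ""))
    return points[idx % len(points)][1]

keys = [f"user:{i}" for i in range(5000)]
old_nodes = ["s1", "s2", "s3", "s4"]
new_nodes = old_nodes + ["s5"]

# Modulo hashing: a key moves whenever hash % 4 != hash % 5
moved_mod = sum(h(k) % 4 != h(k) % 5 for k in keys)

# Consistent hashing: only keys claimed by the new server's points move
old_points, new_points = build_points(old_nodes), build_points(new_nodes)
moved_ring = sum(ring_lookup(old_points, k) != ring_lookup(new_points, k)
                 for k in keys)

print(f"modulo: {moved_mod}/5000 moved; ring: {moved_ring}/5000 moved")
```

Modulo hashing moves roughly 4 out of 5 keys (a key stays only when hash % 4 equals hash % 5), while the ring moves roughly the fifth of keys the new server now owns.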
Applications
Distributed Caches
Memcached clients use consistent hashing. The client knows the cache servers and uses consistent hashing to figure out which server holds each key. Adding cache servers only affects keys that map to the new server.
Redis Cluster uses consistent hashing concepts, though with hash slots rather than a continuous ring. 16384 slots distribute across nodes. Key remapping happens at slot boundaries.
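Redis Cluster's slot computation is small enough to sketch. It uses the CRC16/XMODEM variant (polynomial 0x1021, zero initial value) modulo 16384; below is a straightforward bitwise version:

```python
def crc16_xmodem(data: bytes) -> int:
    # Bitwise CRC16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def redis_slot(key: str) -> int:
    return crc16_xmodem(key.encode()) % 16384

print(redis_slot("user:1000"))  # A slot in [0, 16383]
```

One detail worth knowing: if a key contains a hash tag such as {user:1000}.profile, Redis hashes only the substring inside the braces, which lets you force related keys into the same slot for multi-key operations.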
Amazon DynamoDB
DynamoDB uses consistent hashing for data distribution. Virtual nodes map to physical nodes. Data partitions redistribute when capacity changes.
Table partitions in DynamoDB distribute based on partition key hash. Consistent hashing ensures even distribution across partitions.
Content Delivery Networks
CDNs use consistent hashing to direct requests to origin servers. When an origin server fails, only requests for that server’s keys reroute. Other origins keep serving their keys without interruption.
Replication with Consistent Hashing
Consistent hashing extends to replication. A key’s primary replica sits on its assigned node. Additional replicas sit on subsequent nodes on the ring.
def get_replicas(self, key, num_replicas=3):
    replicas = []
    hash_val = self.hash(key)
    # Walk clockwise from the key's position, skipping duplicate
    # physical nodes (virtual nodes repeat on the ring)
    for node_key in self.sorted_keys:
        if node_key >= hash_val and self.ring[node_key] not in replicas:
            replicas.append(self.ring[node_key])
            if len(replicas) == num_replicas:
                return replicas
    # Wrap around if needed
    for node_key in self.sorted_keys:
        if self.ring[node_key] not in replicas:
            replicas.append(self.ring[node_key])
            if len(replicas) == num_replicas:
                break
    return replicas
With replication factor 3, each key lives on 3 nodes. If one node fails, other replicas serve the data. The system continues operating without data loss.
Trade-offs
Consistent hashing has costs. The ring structure requires coordination. In distributed systems, nodes must agree on ring state.
Client-side consistent hashing puts complexity in clients. All clients must implement the same hashing logic. Configuration must stay synchronized.
Some systems use centralized routing. A coordinator tracks the ring and routes requests. This simplifies clients but introduces a dependency on the coordinator.
Consistent hashing trades simplicity for resilience. Regular hashing is simpler but breaks when things change. Consistent hashing handles change gracefully.
Comparison: Partitioning Strategies
| Dimension | Consistent Hashing | Range Partitioning | Hash Partitioning (Modulo) |
|---|---|---|---|
| Key Distribution | Even across nodes | Sequential ranges | Based on hash mod N |
| Add/Remove Nodes | Minimal remapping (neighboring keys only) | Entire ranges shift | Nearly all keys remap |
| Range Queries | Inefficient (may span multiple nodes) | Efficient (keys in range on same node) | Inefficient |
| Hot Spot Risk | Popular single keys can overload one node | Higher (sequential access concentrates on one range; mitigate by splitting) | Low (if hash spreads evenly) |
| Complexity | Medium-High | Low | Very Low |
| Metadata Overhead | Ring state per node | Partition map | None |
| Use Cases | Distributed caches, DHTs, CDNs | Time-series data, ordered scans | Key-value stores, random access |
When each strategy works best:
| Strategy | Best For | Avoid When |
|---|---|---|
| Consistent Hashing | Multi-node caching, horizontal scaling, replication | Keys need ordering, range scans |
| Range Partitioning | Time-series, lexicographic keys, ordered retrieval | High cardinality keys, random access |
| Hash Partitioning | Simple key-value, even distribution needed | Range queries required, key locality matters |
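For contrast, range partitioning routes by comparing the key against sorted split points rather than hashing it, which is why ordered scans stay on one node. A minimal sketch (the boundaries and node names are made up):

```python
import bisect

# Illustrative split points: node i owns keys below boundaries[i];
# the last node owns everything from the final boundary upward
boundaries = ["g", "n", "t"]
nodes = ["node-1", "node-2", "node-3", "node-4"]

def range_lookup(key):
    # Binary search over the split points picks the owning node
    return nodes[bisect.bisect(boundaries, key)]

print(range_lookup("apple"))   # node-1
print(range_lookup("mango"))   # node-2
print(range_lookup("peach"))   # node-3
print(range_lookup("zebra"))   # node-4
```

A range scan from "mango" to "melon" touches only node-2, whereas under any hashing scheme those keys would scatter across nodes.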
Rendezvous Hashing: An Alternative Approach
Rendezvous hashing (also called highest random weight hashing) is an alternative to consistent hashing that avoids the ring structure entirely. It works by computing a hash score for each key-server pair and picking the server with the highest score.
import hashlib
def rendezvous_hash(key, servers):
"""Pick server with highest hash score for this key."""
best_server = None
best_score = -1
for server in servers:
score = int(hashlib.sha256(f"{key}:{server}".encode()).hexdigest(), 16)
if score > best_score:
best_score = score
best_server = server
return best_server
# Example
key = "user:123"
servers = ["cache-1", "cache-2", "cache-3"]
print(rendezvous_hash(key, servers)) # Deterministic, but spreads keys evenly
The key difference from consistent hashing: with rendezvous hashing, adding a server only affects keys that previously hashed to that specific server — not neighboring keys on a ring. No virtual nodes needed.
| Aspect | Consistent Hashing | Rendezvous Hashing |
|---|---|---|
| Structure | Hash ring | Pairwise hash scoring |
| Adding a node | Remaps ~1/N of keys, taken from ring neighbors | Remaps only the keys the new server now wins |
| Virtual nodes needed | Yes (for even distribution) | No |
| Computation per lookup | O(log N) binary search | O(N) — must hash all servers |
| Server-pinned requests | Easy (same ring lookup) | Easy (same scoring) |
Rendezvous hashing shines when the server set is small to moderate and you want minimal redistribution on changes, without maintaining any ring state. The tradeoff is the O(N) lookup cost: for large server pools, consistent hashing's O(log N) scales better.
Ring Rebalancing: Step by Step
When you add or remove nodes, the ring redistributes keys. Here is the sequence during a node addition:
sequenceDiagram
participant R as Ring State
participant N1 as Node 1
participant N2 as Node 2
participant N3 as Node 3
participant N4 as New Node 4
Note over R: Initial state: N1, N2, N3 on ring
Note over R: N1 owns [0-100), N2 owns [100-200), N3 owns [200-300)
N4->>R: Compute position: hash("node4") = 150
Note over R: N4 inserts at position 150
R->>N2: N4 added between N1 and N2
N2->>R: Keys in [100, 150) migrate to N4
N4->>R: Acknowledged
Note over R: New state: N1 owns [0-100), N4 owns [100-150), N2 owns [150-200), N3 owns [200-300)
For controlled rebalancing in production, add nodes gradually. Add one node, wait for the cluster to stabilize and data to redistribute, then add the next. In Redis Cluster, for example, you add an empty node with redis-cli --cluster add-node and then migrate slots to it in batches with redis-cli --cluster reshard.
If you need to move a large volume of keys quickly (for maintenance or rebalancing), use a double-linking approach: the old owner continues serving requests while the new owner pulls data in the background. Once caught up, switch the routing atomically.
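A minimal sketch of the double-linking idea, with plain dicts standing in for the two nodes' storage (all names here are illustrative):

```python
class MigratingRouter:
    """Serve from the old owner while the new owner warms up, then cut over.
    Dicts stand in for node storage."""
    def __init__(self, old_node, new_node):
        self.old_node = old_node
        self.new_node = new_node
        self.cut_over = False

    def get(self, key):
        if self.cut_over:
            return self.new_node.get(key)
        # During migration the old owner remains authoritative for reads
        return self.old_node.get(key)

    def put(self, key, value):
        # Dual-write so the new owner converges while migration runs
        self.old_node[key] = value
        self.new_node[key] = value

    def background_copy(self):
        # New owner pulls whatever it is still missing (async in practice)
        for key, value in self.old_node.items():
            self.new_node.setdefault(key, value)

    def finish(self):
        self.background_copy()
        self.cut_over = True  # Atomic routing switch

old, new = {"a": 1, "b": 2}, {}
router = MigratingRouter(old, new)
router.put("c", 3)
print(router.get("a"))  # Served by the old owner pre-cutover
router.finish()
print(router.get("a"))  # Served by the new owner post-cutover
```

The key property: at no point does a read miss data, because the old owner answers until the new owner has caught up and the switch is a single flag flip.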
MurmurHash vs MD5 vs SHA-256: Choosing a Hash Function
The hash function you use affects both distribution quality and performance. Here is how the main options compare:
import hashlib
import mmh3 # MurmurHash3 (install: pip install mmh3)
def benchmark_hash(data, iterations=100000):
import time
# MD5
start = time.time()
for _ in range(iterations):
hashlib.md5(data)
md5_time = time.time() - start
# SHA-256
start = time.time()
for _ in range(iterations):
hashlib.sha256(data)
sha_time = time.time() - start
# MurmurHash3
start = time.time()
for _ in range(iterations):
mmh3.hash(data)
mmh3_time = time.time() - start
return md5_time, sha_time, mmh3_time
Typical results on a modern laptop for 100,000 hashes of a 32-byte key:
| Hash Function | Speed (relative to SHA-256) | Output Bits | Collision Resistance |
|---|---|---|---|
| MurmurHash3 | ~10x faster | 32 or 128 | Low (not cryptographic) |
| MD5 | ~2x faster | 128 | Broken (do not use for security) |
| SHA-256 | baseline (1x) | 256 | High (cryptographic) |
Use MurmurHash3 for consistent hashing in production systems: it is fast and distributes keys evenly. The non-cryptographic nature usually does not matter for load distribution, as long as clients cannot craft adversarial keys designed to collide.
Use SHA-256 only if you need cryptographic guarantees (for example, preventing hash collision attacks where a client crafts keys that all hash to the same server). Most production deployments use MMH3 or xxHash for performance.
When to Use and When Not to Use Consistent Hashing
When to Use Consistent Hashing:
- You need to distribute keys across multiple servers or shards
- Server additions or removals should not cause massive key remapping
- You are building a distributed cache, load balancer, or data store
- You need horizontal scaling with minimal key redistribution
- You want to achieve uniform key distribution across nodes
When Not to Use Consistent Hashing:
- You have only a single server (add complexity only when needed)
- Your system does not change frequently (static distribution is fine)
- You need ordered key retrieval (consistent hashing does not maintain order)
- You have strict partitioning requirements (use deterministic partitioning)
- The operational complexity is not justified for your scale
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Node failure | Keys mapped to failed node become unavailable | Implement replication, use virtual nodes for graceful degradation |
| Network partition | Ring becomes split, keys route incorrectly | Use quorum reads/writes, implement failure detection |
| Ring state inconsistency | Different clients route to different nodes | Use centralized ring state (e.g., ZooKeeper, etcd) |
| Hot spots on popular keys | Single key overwhelms one node | Implement key splitting, use replication for popular keys |
| Virtual node misconfiguration | Uneven distribution, some nodes overloaded | Monitor distribution metrics, validate virtual node count |
| Client ring cache staleness | Client uses outdated ring configuration | Implement cache invalidation, use short TTLs |
| Hash collision attacks | Malicious keys all hash to same point | Use salted or keyed hashes, validate key inputs |
| Addition of many nodes simultaneously | Temporary redistribution chaos | Add nodes gradually, use controlled rebalancing |
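The hot-key mitigation in the table can be sketched as salted copies: write a popular key under several aliases so the copies land on (usually) different nodes, and spread reads across them at random. The placement function below is a simple hash-mod stand-in for a real ring lookup, and all names are illustrative:

```python
import hashlib
import random

NODES = ["cache-1", "cache-2", "cache-3", "cache-4", "cache-5"]

def place(key):
    # Stand-in for a ring lookup: stable hash -> node
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def hot_key_write_nodes(key, fanout=3):
    # Write the hot key under several salted aliases so the copies
    # spread across the cluster instead of piling onto one node
    return [place(f"{key}#copy{i}") for i in range(fanout)]

def hot_key_read_node(key, fanout=3):
    # Reads pick one alias at random, spreading load across the copies
    return place(f"{key}#copy{random.randrange(fanout)}")

print(hot_key_write_nodes("user:popular"))
print(hot_key_read_node("user:popular"))
```

The cost is write amplification (every update touches all copies) and the need to know which keys are hot, so this is usually gated behind a hot-key detector.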
Observability Checklist
Metrics to Monitor:
- Request distribution across nodes (detect hot spots)
- Cache hit/miss ratio for distributed caches
- Node availability and health
- Key distribution balance (variance across nodes)
- Request latency by node
- Ring state version/consistency across clients
- Failed request count due to routing errors
- Replication lag for replicated keys
Logs to Capture:
- Node join and leave events
- Ring state changes
- Routing decisions for debugging
- Hot spot detection events
- Failure detection and failover events
- Client configuration changes
Alerts to Set:
- Node becomes unavailable
- Distribution skew exceeds threshold (e.g., > 20% from mean)
- Cache hit ratio drops significantly
- Request routing failures
- Ring state inconsistency detected
- Latency spike on specific nodes
# Example: consistent hashing distribution monitoring
def monitor_distribution(ring, sample_keys):
    distribution = {}
    for key in sample_keys:
        node = ring.get_node(key)
        distribution[node] = distribution.get(node, 0) + 1
    mean = sum(distribution.values()) / len(distribution)
    variance = sum((v - mean) ** 2 for v in distribution.values()) / len(distribution)
    return {
        'distribution': distribution,
        'mean': mean,
        'stddev': variance ** 0.5,
        'skew_pct': (max(distribution.values()) - mean) / mean * 100,
    }
Security Checklist
- Secure ring state storage (ZooKeeper, etcd access controls)
- Encrypt communication between nodes and clients
- Authenticate node-to-node communication
- Implement rate limiting to prevent hash collision attacks
- Audit ring configuration changes
- Use secure bootstrapping for new nodes
- Protect client-to-node connections
- Monitor for unauthorized ring modifications
- Use TLS for all inter-node communication
Common Pitfalls and Anti-Patterns
- Too few virtual nodes: Insufficient virtual nodes cause uneven distribution. Use at least 100-200 virtual nodes per physical node.
- Not handling node failures gracefully: Without replication, failed nodes lose all their keys. Implement replica placement.
- Inconsistent ring state across clients: Different clients with different ring views route to wrong nodes. Use centralized ring state or strong synchronization.
- Ignoring hash function quality: Poor hash functions cause uneven distribution or predictability. Use well-tested functions such as MurmurHash3 or xxHash, or SHA-256 when you need collision resistance.
- Not testing redistribution: If you have never tested adding/removing nodes, you do not know the impact. Run chaos tests.
- Using consistent hashing when static partitioning suffices: If nodes never change, simple modulo hashing is fine. Do not add complexity unnecessarily.
- Forgetting about replication alongside consistent hashing: Consistent hashing handles distribution but not durability. Plan replication separately.
- Assuming linear scalability: Adding nodes does not proportionally increase capacity, due to coordination and rebalancing overhead.
Multi-Dimensional Consistent Hashing
Standard consistent hashing gives you one axis of distribution. That works fine until you need two axes. A geo-distributed database might want data-locality for both user region AND tenant ID. A single hash value cannot capture both.
Multi-dimensional consistent hashing solves this by treating each attribute as its own hash ring. Query each ring separately, then combine the results through weighted voting.
class ConsistentHashRing:
"""Single-dimension consistent hash ring for use in multi-dimensional hashing."""
def __init__(self, nodes=None, virtual_nodes=100):
self.virtual_nodes = virtual_nodes
self.ring = {}
self.sorted_keys = []
if nodes:
for node in nodes:
self.add_node(node)
def _hash(self, key):
import hashlib
return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % (2**32)
def add_node(self, node):
for i in range(self.virtual_nodes):
key = self._hash(f"{node}:vn{i}")
self.ring[key] = node
self.sorted_keys.append(key)
self.sorted_keys.sort()
def get_node(self, key):
if not self.sorted_keys:
return None
hash_val = self._hash(key)
import bisect
idx = bisect.bisect(self.sorted_keys, hash_val)
if idx >= len(self.sorted_keys):
idx = 0
return self.ring[self.sorted_keys[idx]]
class MultiDimensionalHasher:
"""
Multi-dimensional consistent hashing.
Each dimension is an independent hash ring. Nodes are selected
by combining votes from each dimension with weights.
"""
def __init__(self, dimensions, nodes=None, virtual_nodes=100):
"""
Initialize with `dimensions` independent hash rings.
dimensions: list of dimension names, e.g., ['region', 'tenant']
"""
self.dimension_names = dimensions
self.rings = [ConsistentHashRing(nodes, virtual_nodes) for _ in dimensions]
self.weights = [1.0] * len(dimensions)
def get_node(self, key_attrs):
"""
key_attrs: dict mapping dimension name to key value, e.g.,
{'region': 'us-east', 'tenant': 'tenant-42'}
Returns the node with the highest weighted vote across all dimensions.
"""
if isinstance(key_attrs, dict):
# If passed as dict, extract values in dimension order
attrs = [key_attrs.get(dim, '') for dim in self.dimension_names]
else:
# If passed as list/tuple, assume same order as dimensions
attrs = list(key_attrs)
votes = {}
for i, attr in enumerate(attrs):
node = self.rings[i].get_node(attr)
votes[node] = votes.get(node, 0) + self.weights[i]
return max(votes, key=votes.get)
    def get_replicas(self, key_attrs, num_replicas=3):
        """Get up to num_replicas distinct nodes, drawn from each dimension's winner."""
        if isinstance(key_attrs, dict):
            attrs = [key_attrs.get(dim, '') for dim in self.dimension_names]
        else:
            attrs = list(key_attrs)
        # One candidate per dimension; dedupe, then rank by dimension weight
        all_candidates = []
        for i, attr in enumerate(attrs):
            node = self.rings[i].get_node(attr)
            if node not in [c[0] for c in all_candidates]:
                all_candidates.append((node, self.weights[i]))
        all_candidates.sort(key=lambda x: x[1], reverse=True)
        return [c[0] for c in all_candidates[:num_replicas]]
# Example: geo-distributed database with region and tenant dimensions
hasher = MultiDimensionalHasher(
dimensions=['region', 'tenant'],
nodes=['node-us-east-1', 'node-us-east-2', 'node-eu-west-1', 'node-ap-south-1'],
virtual_nodes=50
)
# Keys are routed based on both region AND tenant
key1 = {'region': 'us-east', 'tenant': 'tenant-42'}
key2 = {'region': 'us-east', 'tenant': 'tenant-42'} # Same tenant, same region
key3 = {'region': 'eu-west', 'tenant': 'tenant-42'} # Different region
print(hasher.get_node(key1))  # Deterministic for this region/tenant pair
print(hasher.get_node(key2))  # Same node as key1 (identical attributes)
print(hasher.get_node(key3))  # May land elsewhere: the region vote changed
How the voting works:
- For each dimension, hash the attribute value and find the winning node on that dimension’s ring
- Tally votes: each dimension contributes its winner with that dimension’s weight
- Return the node with the highest total vote
The result is deterministic placement along multiple axes: identical attribute combinations always land on the same node, while different tenants in the same region spread across nodes. Note that plain hashing gives no name-based affinity (us-east data does not automatically land on a node named us-east-1); that requires weighting or topology-aware placement on top.
Use case: geo-distributed databases. Systems in the mold of DynamoDB Global Tables or Cassandra multi-DC replication need both properties: reads hitting the nearest replica for latency, and tenant-level isolation. Hashing tenant_id on one dimension and region on another is one way to get both.
Trade-offs:
| Aspect | Single-Dimension CH | Multi-Dimensional CH |
|---|---|---|
| Routing logic | O(log N) per dimension | O(D * log N) where D = dimensions |
| Data locality | One axis only | Multiple axes |
| Implementation | Simpler | More complex |
| Debugging | Easier to trace | Must track multiple rings |
Handling Hash Collisions Gracefully
Hash functions map arbitrary input to fixed-size output. With a large enough key space, two different keys can produce the same hash value. This is a collision. In consistent hashing, collisions create ambiguity: which node owns the key?
Most consistent hashing implementations use 32-bit or 64-bit hash values. With a 32-bit hash, the birthday bound means you can expect a collision once you have on the order of 65,000 keys, so at production scale collisions are routine rather than theoretical.
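The birthday approximation p ≈ 1 - exp(-n(n-1)/2N) quantifies this. For a 32-bit hash space (N = 2^32), collision odds climb quickly with key count:

```python
import math

def collision_probability(num_keys, hash_bits=32):
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2^bits))
    space = 2 ** hash_bits
    return 1 - math.exp(-num_keys * (num_keys - 1) / (2 * space))

for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9} keys: {collision_probability(n):.1%} chance of a 32-bit collision")
```

Around 100,000 keys the odds of at least one 32-bit collision are already above two in three; at a million keys a collision is essentially certain. This is why 64-bit hashes are the safer default for large key spaces.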
How Collisions Manifest
When two keys hash to the same position on the ring, the first node clockwise from that position handles both keys. This is correct behavior, but it creates uneven distribution if many keys collide at the same point.
import hashlib
def simple_hash(key):
"""Standard hash - collisions happen."""
return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % (2**32)
# Different keys, same hash? For any two specific keys this is
# overwhelmingly unlikely, but across millions of keys 32-bit
# collisions are expected (birthday bound: ~65k keys)
key1 = "user:1234567"
key2 = "user:2345678"
hash1 = simple_hash(key1)
hash2 = simple_hash(key2)
print(f"Key 1 hash: {hash1}")
print(f"Key 2 hash: {hash2}")
print(f"Same? {hash1 == hash2}")
Consistent Collision Handling with Double Hashing
The standard approach is double hashing: use a secondary hash function when the first produces a collision.
import hashlib
import mmh3 # MurmurHash3
class ConsistentHashWithCollisionHandling:
def __init__(self, nodes=None, virtual_nodes=100):
self.virtual_nodes = virtual_nodes
self.ring = {}
self.sorted_keys = []
if nodes:
for node in nodes:
self.add_node(node)
def _hash_primary(self, key):
"""Primary hash function - use for ring position."""
return mmh3.hash(str(key)) % (2**32)
def _hash_secondary(self, key):
"""Secondary hash function - used for collision resolution."""
# Use a different algorithm and salt
return int(hashlib.sha256(f"{key}:secondary".encode()).hexdigest(), 16) % (2**32)
def add_node(self, node):
for i in range(self.virtual_nodes):
key = mmh3.hash(f"{node}:vn{i}") % (2**32)
self.ring[key] = node
self.sorted_keys.append(key)
self.sorted_keys.sort()
    def get_node(self, key):
        if not self.sorted_keys:
            return None
        hash_val = self._hash_primary(key)
        # If the key's hash lands exactly on an occupied virtual-node
        # position, break the tie with the secondary hash so placement
        # stays deterministic and well-distributed
        if hash_val in self.ring:
            hash_val = self._hash_secondary(key)
        # First node clockwise from the (possibly re-hashed) position
        for node_key in self.sorted_keys:
            if node_key >= hash_val:
                return self.ring[node_key]
        # Wrap around to the first node
        return self.ring[self.sorted_keys[0]]

    def get_node_with_stats(self, key):
        """Get node and report whether the secondary hash was consulted."""
        collided = self._hash_primary(key) in self.ring
        return self.get_node(key), collided
Trade-offs of Double Hashing
| Aspect | Single Hash | Double Hashing |
|---|---|---|
| Collision probability | Higher (birthday paradox) | Lower (two independent hashes) |
| Lookup speed | O(log N) | O(log N) or O(2*log N) worst case |
| Implementation | Simpler | Slightly more complex |
| Memory overhead | None | None |
| Distribution quality | May degrade with many collisions | More even distribution |
Double hashing adds minimal overhead. The secondary hash is only consulted when two keys share a primary hash, which is rare for well-chosen hash functions. The O(log N) lookup stays the same — you just traverse the ring twice in the worst case.
Production Considerations
Monitor collision rates. Track how often the secondary hash is used. A rate above 0.1% suggests your hash function has poor distribution for your key set.
Use independent hash families. The secondary hash should use a different algorithm than the primary. If both use MD5, they may share collision patterns. Mixing MD5 with MurmurHash3 gives you independent distributions.
Consider bounded collision domains. If your keys have known structure (e.g., user:ID where ID is numeric), ensure your hash function distributes that specific pattern well. Some hash functions perform poorly on sequential or low-entropy keys.
Jump hashing is an alternative worth knowing about. Google's Jump Consistent Hashing sidesteps the ring (and ring-position collisions) entirely: it maps a 64-bit key directly to one of num_buckets numbered buckets, and growing from N to N+1 buckets moves only about 1/(N+1) of keys. Its main constraint is that buckets are numbered 0..N-1, so you can only add or remove capacity at the end of the range, which makes it a better fit for fixed pools of shards than for clusters where arbitrary nodes fail.
def jump_hash(key, num_buckets):
    """
    Google's Jump Consistent Hashing.
    key must be an unsigned 64-bit integer.
    """
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step
        key = (key * 2862933555777941757 + 1) % (2**64)
        # Next candidate jump, derived from the top bits of the LCG state
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b
Jump hashing avoids ring-position collisions altogether: keys map straight to bucket numbers, so there is no ring position to collide on. For systems that grow and shrink a pool of shards, this avoids rebalancing complexity entirely.
Real-World Case Studies
Amazon DynamoDB: Consistent Hashing with Partition Management
DynamoDB uses consistent hashing but with a twist: it splits the table's hash key space into partitions and distributes those partitions across storage nodes. Each partition is served by multiple nodes for replication. When you provision throughput, DynamoDB allocates capacity across partitions, not across a raw consistent hash ring.
The practical lesson: pure consistent hashing handles distribution well but does not handle hot spots. DynamoDB’s partition key architecture means that a single partition can handle roughly 1000 WCUs and 3000 RCUs. If your access pattern sends all traffic to one partition key (say, user_id = “alice” for every request), you saturate one partition regardless of your total provisioned throughput. The fix is adding a random suffix to partition keys — this distributes load across multiple partitions.
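The random-suffix fix looks like this in outline (the shard count and helper names are illustrative, not DynamoDB API):

```python
import random

SHARD_COUNT = 10

def sharded_partition_key(user_id: str) -> str:
    # Writes: append a random suffix so one logical key spreads
    # across up to SHARD_COUNT physical partitions
    return f"{user_id}.{random.randrange(SHARD_COUNT)}"

def all_shard_keys(user_id: str):
    # Reads: fan out across every suffix and merge the results
    return [f"{user_id}.{i}" for i in range(SHARD_COUNT)]

print(sharded_partition_key("alice"))  # e.g. "alice.7"
print(all_shard_keys("alice")[:3])
```

The trade-off is read amplification: a single logical read becomes SHARD_COUNT physical reads, so this pattern fits write-heavy hot keys better than read-heavy ones.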
DynamoDB also offers Streams, which tracks item-level changes with sequence numbers. Tracking the last sequence number read enables read-your-writes patterns within a session.
Cassandra: Ketama and the Memcached Precedent
Cassandra’s token ring descends from the same lineage as ketama, the consistent hashing scheme popularized by Memcached client libraries. Ketama places many points per server on an MD5-based ring (about 160 in the original implementation), specifically to avoid the “thundering herd” of cache misses when servers are added or removed.
The known gotcha: with few physical nodes and too few ring points per node, distribution skews badly, and small clusters could see 30-40% imbalance between nodes. Later versions improved this, but the operational lesson stands: validate your distribution metrics on any consistent hashing cluster, especially when node counts are low.
Memcached: The Client-Side Hashing Problem
Memcached itself has no server-side consistent hashing — the client decides which server to use. This means every client library must implement consistent hashing independently. This caused fragmentation: some clients used ketama, others used simple modulo, others used rendezvous hashing.
The operational failure mode: rolling out a new Memcached client version that changed the hashing algorithm. All keys that hashed differently would miss, flooding the database. This happened at several companies including Reddit — a client update caused cache miss rates to spike, which translated directly into database CPU spikes.
The mitigation: hash ring versioning. When introducing a new hash algorithm, run both the old and new ring simultaneously for a migration window, and drain keys from the old ring before decommissioning it.
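A sketch of ring versioning on the read path, using two hash-mod placement functions as stand-ins for the old and new rings (all names illustrative): reads try the new ring first, fall back to the old one, and backfill on a hit so traffic drains toward the new ring:

```python
import hashlib

def place(key, algo):
    # Two stand-in "ring versions": different hash algorithms give
    # different placements for the same key
    nodes = ["cache-1", "cache-2", "cache-3"]
    if algo == "v1":
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    else:  # "v2"
        digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

class DualRingCache:
    """During a hashing-algorithm migration: read via the new ring first,
    fall back to the old ring, and backfill on a hit."""
    def __init__(self):
        self.stores = {"cache-1": {}, "cache-2": {}, "cache-3": {}}

    def get(self, key):
        new_node = place(key, "v2")
        if key in self.stores[new_node]:
            return self.stores[new_node][key]
        old_node = place(key, "v1")
        value = self.stores[old_node].get(key)
        if value is not None:
            # Backfill so future reads hit the new ring directly
            self.stores[new_node][key] = value
        return value

    def put(self, key, value):
        # Writes go to the new ring only
        self.stores[place(key, "v2")][key] = value

cache = DualRingCache()
cache.stores[place("user:1", "v1")]["user:1"] = "cached-value"  # Pre-migration data
print(cache.get("user:1"))  # Found via old-ring fallback, then backfilled
```

Once the fallback hit rate drops near zero, the old ring can be decommissioned without a miss storm.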
Interview Questions
Q: You have a distributed cache with 10 nodes using consistent hashing and 100 virtual nodes per physical node. A node fails. How many keys, approximately, need to be remapped?
A: With 10 physical nodes and 100 virtual nodes each, there are 1000 points on the ring. Each physical node owns roughly 1/10 of the ring. When one node fails, its 100 virtual nodes are removed, and keys that previously mapped to those points are redistributed to the next clockwise nodes. In total, approximately 1/10 of keys (10%) would need to remap — specifically, the keys that were mapped to the 100 virtual node points of the failed node.
Q: What is the difference between consistent hashing and rendezvous hashing? When would you choose rendezvous over consistent hashing?
A: Rendezvous hashing (highest random weight) computes a score for each key-server pair and picks the server with the highest score. It avoids the ring structure entirely and guarantees that adding N servers only affects keys that were previously mapped to those N servers — not neighboring keys. Consistent hashing’s advantage is O(log N) lookup versus O(N) for rendezvous. You would choose rendezvous when you have a fixed server set, need minimal redistribution on changes, and can afford the O(N) lookup cost, or when you need deterministic behavior without a distributed ring state.
Q: Virtual nodes solve a specific problem in consistent hashing. Explain the problem and the trade-off introduced.
A: Virtual nodes solve the uneven distribution problem when physical node counts are small. Without virtual nodes, a 3-node ring has only 3 hash points, and some nodes can end up with far more keys than others. With 100 virtual nodes per physical node, the law of large numbers smooths the distribution. The trade-off is increased metadata overhead: instead of storing N nodes, you store N×V virtual node entries, and the sorted ring is correspondingly larger (lookups remain logarithmic in ring size). On the upside, when a node fails its load is shed in many small slices to many different successors rather than dumped wholesale on a single neighbor.
Q: Your distributed cache hit rate dropped from 95% to 60% after a routine deployment that added two new cache nodes. What happened and how would you diagnose it?
A: The most likely cause is that the new nodes were added to the consistent hash ring and immediately started receiving requests for their hash ranges without being pre-warmed, so those ranges serve cold-cache misses. But the math deserves scrutiny: adding 2 nodes to a 10-node ring remaps roughly 2/12 ≈ 17% of keys, which should produce a temporary miss spike, not a sustained 35-percentage-point drop. Possible additional causes: the new nodes have different configuration causing connection issues, the hash algorithm or client library changed, or the node addition triggered a rebalancing that invalidated a large portion of existing cached data. Diagnosis: compare cache key distribution before and after, check client logs for connection errors, verify that all clients share the same hash ring state, and check whether pre-warming ran after the deployment.
Q: A system uses consistent hashing for data distribution across 5 nodes with replication factor 3. One node is down for 30 minutes. Describe what happens to reads and writes during this window.
A: With replication factor 3, the coordinator writes each key to the three replicas responsible for its ring position — typically the first three distinct physical nodes clockwise from the key. If one replica sits on the failed node, a write still succeeds as long as the configured consistency level can be met by the two surviving replicas: ONE and QUORUM succeed, ALL fails. For reads at consistency level ONE, contacting a stale replica returns stale data. At QUORUM, reads against the two surviving replicas still return fresh data, because any QUORUM write during the outage reached at least one of them. The key risk is a second failure: if another replica for the same key range becomes unreachable, QUORUM operations on that range fail. After the node recovers, hinted handoff or anti-entropy repair replays what it missed, so up to 30 minutes of writes must catch up.
Quick Recap
Key Bullets:
- Consistent hashing minimizes key redistribution when nodes join or leave
- Virtual nodes smooth out uneven distribution across the ring
- Keys map to the first node clockwise on the ring
- Replication placement extends consistent hashing for durability
- Ring state consistency across clients is critical for correct routing
- Monitor distribution balance to detect hot spots and configuration errors
Copy/Paste Checklist:
# Minimal consistent hashing implementation
import hashlib
import bisect

class ConsistentHash:
    def __init__(self, nodes=None, virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self.ring = {}          # hash point -> physical node
        self.sorted_keys = []   # sorted hash points for binary search
        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % (2**32)

    def add_node(self, node):
        # Place virtual_nodes points on the ring for this physical node
        for i in range(self.virtual_nodes):
            key = self._hash(f"{node}:vn{i}")
            self.ring[key] = node
            bisect.insort(self.sorted_keys, key)

    def remove_node(self, node):
        for i in range(self.virtual_nodes):
            key = self._hash(f"{node}:vn{i}")
            self.ring.pop(key)
            self.sorted_keys.remove(key)

    def get_node(self, key):
        if not self.sorted_keys:
            return None
        hash_val = self._hash(key)
        # First ring point clockwise from the key's position
        idx = bisect.bisect(self.sorted_keys, hash_val)
        if idx >= len(self.sorted_keys):
            idx = 0  # wrap around past the top of the ring
        return self.ring[self.sorted_keys[idx]]

# Usage
ch = ConsistentHash(['node1', 'node2', 'node3'], virtual_nodes=200)
print(ch.get_node('user:123'))  # Returns which node handles this key
Related Posts
- Distributed Systems Roadmap - Consistent hashing is a foundational concept for distributed systems, enabling data partitioning, leader election, and replication across node clusters without requiring global coordination
Conclusion
Consistent hashing solves the distribution problem in large-scale systems. When nodes join or leave, only neighboring keys remap. Virtual nodes smooth out uneven distribution.
You will find this pattern in distributed caches, databases, and CDNs. Memcached clients, DynamoDB, and Cassandra all use variations of consistent hashing.
For related reading, see Horizontal Sharding for database sharding strategies, and Database Scaling for scaling approaches.