Vector Clocks: Detecting Concurrent Updates in Distributed Systems
Lamport timestamps tell you that if A happens-before B, then T(A) < T(B). But they do not let you determine whether two events are causally related or concurrent. Vector clocks solve this limitation.
The idea: instead of a single number, each event gets a vector of counters, one per node. This vector encodes the full causal history of the event.
This post builds on Logical Clocks. Read that first if you are unfamiliar with Lamport timestamps.
The Problem with Lamport Clocks
Consider two users editing the same document offline. Both start from version 1. User A edits the title. User B edits the footer. Both come back online and try to sync.
sequenceDiagram
participant A as Device A
participant B as Device B
participant Server
Note over A: Version 1
Note over B: Version 1
A->>Server: Update: title changed
Note over Server: VC_A: [A:2, B:1]
B->>Server: Update: footer changed
Note over Server: VC_B: [A:1, B:2]
Note over Server: VC_A and VC_B are concurrent!<br/>Neither is a subset of the other
With Lamport clocks, both updates might have timestamp 2. Which came first? You cannot tell. They are concurrent.
With vector clocks, each update carries its full causal history. The server can see that neither history is a subset of the other, so the updates are truly concurrent.
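The difference is easy to see in a few lines. This standalone sketch gives both updates a Lamport timestamp and a vector clock (plain objects, values taken from the diagram above); only the vector form reveals concurrency:

```javascript
// Lamport timestamps: each device keeps a single counter.
const lamportA = 2; // Device A's title edit
const lamportB = 2; // Device B's footer edit
// lamportA === lamportB tells us nothing about causality.

// Vector clocks: one counter per device.
const vcA = { A: 2, B: 1 }; // A's edit: has seen A:2, B:1
const vcB = { A: 1, B: 2 }; // B's edit: has seen A:1, B:2

// vc1 happened-before vc2 iff every component of vc1 is <= vc2's
// and at least one is strictly smaller.
function happenedBefore(vc1, vc2) {
  const nodes = new Set([...Object.keys(vc1), ...Object.keys(vc2)]);
  let strictlyLess = false;
  for (const n of nodes) {
    if ((vc1[n] || 0) > (vc2[n] || 0)) return false;
    if ((vc1[n] || 0) < (vc2[n] || 0)) strictlyLess = true;
  }
  return strictlyLess;
}

const concurrent = !happenedBefore(vcA, vcB) && !happenedBefore(vcB, vcA);
console.log(concurrent); // true: neither edit saw the other
```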
How Vector Clocks Work
The Data Structure
A vector clock is a map from node IDs to counters:
// Vector clock representation
class VectorClock {
constructor(nodeId) {
this.nodeId = nodeId;
this.clocks = new Map();
}
// Get counter for this node
get(nodeId) {
return this.clocks.get(nodeId) || 0;
}
// Increment this node's counter
tick() {
this.clocks.set(this.nodeId, this.get(this.nodeId) + 1);
return this.clone();
}
// Merge with another vector clock (on message receive)
merge(other) {
for (const [node, counter] of other.clocks) {
this.clocks.set(node, Math.max(this.get(node), counter));
}
return this;
}
// Clone the vector clock
clone() {
const copy = new VectorClock(this.nodeId);
copy.clocks = new Map(this.clocks);
return copy;
}
}
The Three Rules
Vector clocks follow three rules:
- Local event: Increment your own counter
- Send message: Include current vector clock
- Receive message: Set each component to max(local, received), then increment your own
// Rule 1: Local event
function localEvent(clock, nodeId) {
clock.clocks.set(nodeId, clock.get(nodeId) + 1);
return clock.clone();
}
// Rule 2: Send
function send(clock, message) {
const ts = clock.tick();
message.vectorClock = ts;
return { message, timestamp: ts };
}
// Rule 3: Receive
function receive(clock, nodeId, message) {
for (const [node, counter] of message.vectorClock.clocks) {
clock.clocks.set(node, Math.max(clock.get(node), counter));
}
clock.clocks.set(nodeId, clock.get(nodeId) + 1);
return clock.clone();
}
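Putting the three rules together, here is a minimal two-node exchange. This is a self-contained sketch using plain-object clocks rather than the class above; the helper names `tick` and `mergeVC` are my own:

```javascript
// Minimal clock helpers over plain objects of shape { nodeId: counter }
const tick = (vc, me) => ({ ...vc, [me]: (vc[me] || 0) + 1 });
const mergeVC = (a, b) => {
  const out = { ...a };
  for (const [n, c] of Object.entries(b)) out[n] = Math.max(out[n] || 0, c);
  return out;
};

let a = {}; // Node A's clock
let b = {}; // Node B's clock

a = tick(a, "A"); // Rule 1: local event at A -> { A: 1 }
const msg = { vc: tick(a, "A") }; // Rule 2: tick, attach clock to message
a = msg.vc; // A's clock after the send: { A: 2 }

b = tick(b, "B"); // local event at B -> { B: 1 }
b = tick(mergeVC(b, msg.vc), "B"); // Rule 3: componentwise max, then tick

// b now dominates msg.vc: the receive is causally after the send.
```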
Comparing Vector Clocks
The Comparison Operation
Given two vector clocks VC1 and VC2:
- VC1 < VC2 if VC1[node] <= VC2[node] for all nodes, and VC1[node] < VC2[node] for at least one node
- VC1 > VC2 if VC2 < VC1 by the same definition (swap the two clocks)
- VC1 || VC2 (concurrent) if neither VC1 < VC2 nor VC1 > VC2
// Compare two vector clocks
function compare(clock1, clock2) {
let lessThan = false;
let greaterThan = false;
// Get all node IDs
const allNodes = new Set([...clock1.clocks.keys(), ...clock2.clocks.keys()]);
for (const node of allNodes) {
const v1 = clock1.get(node);
const v2 = clock2.get(node);
if (v1 < v2) lessThan = true;
if (v1 > v2) greaterThan = true;
}
if (lessThan && !greaterThan) return -1; // clock1 < clock2
if (greaterThan && !lessThan) return 1; // clock1 > clock2
return 0; // equal or concurrent
}
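To exercise the comparison, here is the same logic over plain objects, checked against all three orderings (a standalone sketch; `cmp` mirrors the `compare` function above):

```javascript
// Same comparison logic as compare(), over plain { node: counter } objects.
function cmp(vc1, vc2) {
  let less = false;
  let greater = false;
  const nodes = new Set([...Object.keys(vc1), ...Object.keys(vc2)]);
  for (const n of nodes) {
    const v1 = vc1[n] || 0;
    const v2 = vc2[n] || 0;
    if (v1 < v2) less = true;
    if (v1 > v2) greater = true;
  }
  if (less && !greater) return -1;
  if (greater && !less) return 1;
  return 0; // equal or concurrent
}

console.log(cmp({ A: 1, B: 1 }, { A: 2, B: 1 })); // -1: first precedes second
console.log(cmp({ A: 2, B: 1 }, { A: 1, B: 1 })); //  1: second precedes first
console.log(cmp({ A: 2, B: 1 }, { A: 1, B: 2 })); //  0: concurrent
```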
Visualizing the Ordering
graph TD
A[Start: A:1, B:0, C:0] --> B[A:2, B:0, C:0]
A --> C[A:1, B:1, C:0]
B --> D[A:2, B:1, C:0]
C --> E[A:1, B:2, C:0]
D --> F[A:2, B:2, C:0]
E --> F
F --> G[A:2, B:2, C:1]
C -.->|concurrent| B
E -.->|concurrent| D
The key insight: when neither vector clock is a subset of the other, they are concurrent.
Vector Clocks in Dynamo-style Systems
Amazon Dynamo popularized vector clocks for conflict detection in distributed databases. Many real systems use variants of this approach.
Riak’s Vector Clock Implementation
Riak, an open-source Dynamo implementation, uses vector clocks extensively:
// Riak-style vector clock operations
class RiakVectorClock {
constructor(nodeId) {
this.nodeId = nodeId; // ID of the node performing updates
this.vclock = []; // Array of [nodeId, counter] pairs
}
// Create a new vclock after a local update
descend(other) {
const result = new RiakVectorClock(this.nodeId);
if (other) {
// Copy all entries from the other vclock
for (const [node, counter] of other.vclock) {
result.vclock.push([node, counter]);
}
}
// Increment our own counter
const myEntry = result.vclock.find((e) => e[0] === this.nodeId);
if (myEntry) {
myEntry[1]++;
} else {
result.vclock.push([this.nodeId, 1]);
}
return result;
}
// Merge two vclocks
merge(other) {
const result = new RiakVectorClock(this.nodeId);
const map = new Map();
// Start with entries from both
for (const [node, counter] of this.vclock) map.set(node, counter);
for (const [node, counter] of other.vclock) {
map.set(node, Math.max(map.get(node) || 0, counter));
}
// Convert back to array
result.vclock = Array.from(map.entries());
return result;
}
}
Conflict Detection
When a GET request returns multiple values with concurrent vector clocks, a conflict exists:
// GET with vector clock conflict detection
async function get(key) {
const replicas = await getReplicas(key);
const results = await Promise.all(replicas.map((r) => r.get(key)));
// Group by vector clock
const byClock = new Map();
for (const result of results) {
const vcKey = JSON.stringify(result.vectorClock);
if (!byClock.has(vcKey)) {
byClock.set(vcKey, []);
}
byClock.get(vcKey).push(result);
}
// If multiple distinct vector clocks, we have conflicts
if (byClock.size > 1) {
return {
hasConflicts: true,
values: Array.from(byClock.values()).map((group) => group[0]),
vectorClocks: Array.from(byClock.keys()),
};
}
// No conflict - return the value
return {
hasConflicts: false,
value: results[0].value,
};
}
Practical Implementation
Storage Overhead
Vector clocks grow with the number of nodes that have updated an object. In systems with thousands of nodes, this becomes expensive.
// Storage overhead example
// With 100 nodes, each doing updates:
// VC size = 100 entries * ~50 bytes = 5KB per object
// Optimization: prune old entries
// Dynamo stores a timestamp with each (node, counter) pair and drops
// the oldest pair once the clock exceeds a size threshold (truncation)
// A simpler heuristic: discard a vclock dominated by enough recent ones
function prune(vclock, recentVclocks, threshold = 3) {
// isSubset(a, b) is an assumed helper: true when every counter in a <= b's
let dominatedBy = 0;
for (const recent of recentVclocks) {
if (isSubset(vclock, recent)) {
dominatedBy++;
}
}
// Dominated by enough recent vclocks: safe to discard this one
return dominatedBy >= threshold;
}
Message Passing with Vector Clocks
// Send with vector clock
async function put(nodeId, key, value, vclock) {
const newVclock = vclock.descend();
const message = {
type: "put",
key,
value,
vclock: newVclock,
timestamp: Date.now(),
};
await network.send(nodeId, message);
return newVclock;
}
// Receive and merge
async function receive(message) {
const { vclock: msgVclock } = message;
// Merge incoming vclock with local
const mergedVclock = localVclock.merge(msgVclock);
// If this causally subsumes our previous state, apply update
if (isSubset(localVclock, msgVclock)) {
applyUpdate(message);
}
localVclock = mergedVclock;
}
Version Vectors vs Vector Clocks
There is an important distinction between Version Vectors and true Vector Clocks. Understanding the difference matters for system design.
What is a Version Vector?
A Version Vector (VV) is a special case of vector clock used for conflict detection in replicated data. The key constraint: a version vector has exactly one entry per replica, and those entries represent the version number at that replica.
// Version Vector structure (simplified)
class VersionVector {
constructor(replicas) {
this.versions = new Map();
for (const replica of replicas) {
this.versions.set(replica, 0);
}
}
// Increment version at a specific replica
increment(replica) {
this.versions.set(replica, this.versions.get(replica) + 1);
}
// Merge with another version vector
merge(other) {
for (const [replica, version] of other.versions) {
this.versions.set(
replica,
Math.max(this.versions.get(replica) || 0, version),
);
}
}
}
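A quick check of version-vector semantics: two replicas write independently, and neither vector dominates the other. This standalone sketch re-declares a minimal `VersionVector` (with an added `dominates` helper of my own) so it runs on its own:

```javascript
// Minimal re-declaration of the VersionVector above, plus a dominance check.
class VersionVector {
  constructor(replicas) {
    this.versions = new Map(replicas.map((r) => [r, 0]));
  }
  increment(replica) {
    this.versions.set(replica, (this.versions.get(replica) || 0) + 1);
  }
  // true if this vector causally follows or equals the other
  dominates(other) {
    for (const [r, v] of other.versions) {
      if ((this.versions.get(r) || 0) < v) return false;
    }
    return true;
  }
}

const r1 = new VersionVector(["r1", "r2", "r3"]);
const r2 = new VersionVector(["r1", "r2", "r3"]);
r1.increment("r1"); // write handled at replica 1
r2.increment("r2"); // independent write handled at replica 2

// Neither dominates the other: the writes are concurrent.
console.log(r1.dominates(r2), r2.dominates(r1)); // false false
```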
Key Differences
| Aspect | Version Vector | True Vector Clock |
|---|---|---|
| Size | O(r) where r = number of replicas | O(n) where n = number of nodes |
| Causality tracking | Per-replica versions only | Full causal history |
| Conflict detection | Yes | Yes |
| Conflict resolution | Application-specific | Application-specific |
| Space efficiency | Better for fixed replica sets | Better for dynamic node sets |
| Used in | Riak, Voldemort | Dynamo, COPS, general purpose |
Space Efficiency Comparison
Version Vectors are more space-efficient when you have a fixed, known number of replicas:
Scenario: 3-node replicated database with 1 update/second per node
Version Vector size:
- 3 entries × ~50 bytes = 150 bytes per object
- Grows only if replica count changes
True Vector Clock size (in a 100-node cluster):
- 100 entries × ~50 bytes = 5,000 bytes per object
- Grows linearly with node count
Quantitative Bounds
Vector clocks can grow significantly in large clusters:
// Space calculation for vector clocks
function calculateVectorClockSize(
nodeCount,
avgNodeIdLength = 16,
counterBytes = 8,
) {
// Each entry: node ID (string) + counter (int64)
const bytesPerEntry = avgNodeIdLength + counterBytes;
const totalBytes = nodeCount * bytesPerEntry;
return {
perObject: totalBytes,
perObjectKB: (totalBytes / 1024).toFixed(2),
perObjectMB: (totalBytes / (1024 * 1024)).toFixed(4),
};
}
// Examples:
calculateVectorClockSize(10);
// { perObject: 240, perObjectKB: '0.23', perObjectMB: '0.0002' }
calculateVectorClockSize(50);
// { perObject: 1200, perObjectKB: '1.17', perObjectMB: '0.0011' }
calculateVectorClockSize(100);
// { perObject: 2400, perObjectKB: '2.34', perObjectMB: '0.0023' }
calculateVectorClockSize(500);
// { perObject: 12000, perObjectKB: '11.72', perObjectMB: '0.0114' }
Note that vector clock size grows with the number of distinct nodes that have updated an object, not with time: in a large cluster where hundreds of nodes have written the same key, per-object clocks of tens of kilobytes are possible.
When to Use Each
Use Version Vectors when:
- You have a fixed number of replicas (3-7 typical)
- You need conflict detection for a replicated database
- Space efficiency is important
Use True Vector Clocks when:
- Nodes can join/leave dynamically
- You need full causal history tracking
- You’re building a general-purpose distributed system
DynamoDB: Timestamp-Based, Not Vector Clocks
Amazon DynamoDB uses a timestamp-based approach for conflict resolution, not true vector clocks. This is a common point of confusion.
DynamoDB’s Approach
DynamoDB resolves conflicting writes with a timestamp-based last-writer-wins policy:
// DynamoDB's timestamp-based conflict resolution
// Each item has a scalar attribute used for last-write-wins
{
"user:123": {
"name": "Alice",
"_meta": {
"updatedAt": 1712345678901, // Physical timestamp (server time)
"updatedBy": "dynamo-db",
"vectorClock": { /* Not used in standard DynamoDB */ }
}
}
}
// Conflict resolution: last-write-wins based on updatedAt
// If two writes arrive, the one with larger updatedAt wins
Key Points About DynamoDB
- No true vector clocks: DynamoDB does not track causal history across updates
- Last-write-wins: Conflicts are resolved by timestamp, not by detecting concurrency
- Per-item versioning: Each item has its own version, not a global vector clock
- Conditional writes: Applications can use conditional expressions to detect conflicts
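The conditional-write pattern can be sketched without any SDK: read a version, then apply the write only if the stored version is unchanged. This is a hypothetical in-memory store standing in for DynamoDB's condition expressions, not DynamoDB API code:

```javascript
// Hypothetical in-memory store illustrating optimistic concurrency:
// a write succeeds only if the caller's expected version still matches.
const store = new Map(); // key -> { value, version }

function conditionalPut(key, value, expectedVersion) {
  const current = store.get(key);
  const currentVersion = current ? current.version : 0;
  if (currentVersion !== expectedVersion) {
    return { ok: false, reason: "version mismatch" }; // conflict detected
  }
  store.set(key, { value, version: currentVersion + 1 });
  return { ok: true, version: currentVersion + 1 };
}

// Two clients read version 0, then both try to write.
const first = conditionalPut("user:123", { name: "Alice" }, 0);
const second = conditionalPut("user:123", { name: "Alice Chen" }, 0);
console.log(first.ok, second.ok); // true false: the second write must retry
```

The losing writer learns about the conflict and can re-read and retry, which is the closest a last-write-wins store gets to concurrency detection.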
Comparison with True Vector Clock Systems
| Feature | DynamoDB | True VC Systems (e.g., Riak, Voldemort) |
|---|---|---|
| Conflict detection | No (uses LWW) | Yes |
| Concurrent update detection | No | Yes |
| Multiple valid values | No | Yes |
| Application-controlled resolution | Limited | Full control |
| Causality tracking | No | Yes |
When This Matters
DynamoDB’s approach works well when:
- Last-write-wins is acceptable for your use case
- You don’t need to detect true concurrent updates
- You want simplicity over fine-grained conflict handling
True vector clocks matter when:
- Concurrent updates to the same key are possible
- You need to know if updates are causally related
- You want application-level conflict resolution
Debugging: Real Conflict Scenario Walkthrough
Let me walk through a real conflict scenario to show how vector clocks work in practice:
sequenceDiagram
participant A as Node A (us-east-1)
participant B as Node B (us-west-2)
participant C as Node C (eu-west-1)
Note over A: Initial: VC=[A:1, B:0, C:0]
Note over B: Initial: VC=[A:0, B:1, C:0]
Note over C: Initial: VC=[A:0, B:0, C:1]
A->>A: Local update<br/>VC=[A:2, B:0, C:0]
Note over A: User edits profile
B->>B: Local update<br/>VC=[A:0, B:2, C:0]
Note over B: User edits same profile
A->>B: Sync: VC=[A:2, B:0, C:0]
B->>A: Sync: VC=[A:0, B:2, C:0]
Note over A: After merge:<br/>VC=[A:2, B:2, C:0]
Note over B: After merge:<br/>VC=[A:2, B:2, C:0]
Note over A,B: CONFLICT DETECTED!<br/>The writes carried [A:2,B:0,C:0]<br/>and [A:0,B:2,C:0]<br/>Neither is a subset of the other
Step-by-Step Debug Session
// Debugging vector clock conflicts in a distributed database
// Step 1: Log the vector clocks at each node
const nodeAVClock = { A: 2, B: 0, C: 0 }; // A's write, before any sync
const nodeBVClock = { A: 0, B: 2, C: 0 }; // B's concurrent write
// Step 2: Compare them
function compareVCs(vc1, vc2) {
let lessThan = false;
let greaterThan = false;
const allNodes = new Set([...Object.keys(vc1), ...Object.keys(vc2)]);
for (const node of allNodes) {
const v1 = vc1[node] || 0;
const v2 = vc2[node] || 0;
if (v1 < v2) lessThan = true;
if (v1 > v2) greaterThan = true;
}
if (lessThan && !greaterThan) return "vc1 < vc2";
if (greaterThan && !lessThan) return "vc1 > vc2";
if (!lessThan && !greaterThan) return "EQUAL";
return "CONCURRENT";
}
console.log(compareVCs(nodeAVClock, nodeBVClock));
// Output: CONCURRENT
// Step 3: Analyze the conflict
// A's write carries A:2 but B:0 -- A had not yet seen B's update
// B's write carries B:2 but A:0 -- B had not yet seen A's update
// Neither node saw the other's update before writing, so the
// updates are concurrent and must be reconciled
// Step 4: Resolution strategies
// Strategy 1: Last-write-wins (use physical timestamps)
// Strategy 2: Keep both (application resolves)
// Strategy 3: Merge (CRDT if possible)
// Strategy 4: Flag for manual resolution
Real Debugging Output Example
// Actual debugging output from a Riak cluster
{
"key": "user:123:profile",
"siblings": [
{
"value": { "name": "Alice", "bio": "Software engineer" },
"vclock": "vclock:[{A,2},{B,2},{C,1}]",
"vector_clock": { "A": 2, "B": 2, "C": 1 }
},
{
"value": { "name": "Alice Chen", "bio": "Engineer" },
"vclock": "vclock:[{A,1},{B,3},{C,0}]",
"vector_clock": { "A": 1, "B": 3, "C": 0 }
}
],
"conflict_detected": true,
"reason": "Neither version dominates the other",
"vc_comparison": {
"v1_subsumes_v2": false,
"v2_subsumes_v1": false,
"status": "concurrent"
}
}
Resolution Flow
flowchart TD
A[Conflict Detected] --> B{Are values<br/>mergeable?}
B -->|Yes - CRDT| C[Apply CRDT merge<br/>Automatic resolution]
B -->|No| D{Are values<br/>similar?}
D -->|Yes| E[Semantic merge<br/>e.g., document merge]
D -->|No| F[Present both to user<br/>or application]
C --> G[Resolved - single value]
E --> H[Resolved - merged value]
F --> I[Awaiting resolution]
Why Not Just Use Timestamps?
Timestamp-Based Conflict Resolution Fails
Last-write-wins uses wall-clock timestamps for conflict resolution. This fails in predictable ways:
// Timestamp-based resolution fails
async function resolveWithTimestamps(local, remote) {
// Assumes: larger timestamp = newer
// Problem: clocks are not synchronized
// Problem: clock skew can be seconds or minutes
if (remote.timestamp > local.timestamp) {
return remote; // But remote might be causally older!
}
return local;
}
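Here is that failure made concrete: a causally newer write loses because its node's wall clock runs behind (the timestamps are illustrative):

```javascript
// Node B's clock is 30 seconds behind. B reads A's write, then updates it,
// so B's write is causally NEWER -- yet it carries a SMALLER timestamp.
const writeA = { value: "v1", timestamp: 1712345700000 }; // A's clock
const writeB = { value: "v2", timestamp: 1712345670000 }; // B's skewed clock

function lastWriteWins(a, b) {
  return b.timestamp > a.timestamp ? b : a;
}

const winner = lastWriteWins(writeA, writeB);
console.log(winner.value); // "v1" -- the causally older value wins, silently
```

A vector clock would have shown writeA < writeB and kept "v2"; the timestamp comparison throws B's update away with no error.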
Vector Clocks Capture Causality
Vector clocks let you distinguish between causally related updates and concurrent updates:
- Causally related updates: A caused B, so you keep B
- Concurrent updates: neither caused the other, both are valid choices
// With vector clocks
async function resolveWithVectorClocks(local, remote) {
const vc1 = parseVectorClock(local.vclock);
const vc2 = parseVectorClock(remote.vclock);
const cmp = compare(vc1, vc2);
if (cmp < 0) {
// VC1 < VC2: local causally precedes remote
return remote;
} else if (cmp > 0) {
// VC1 > VC2: remote causally precedes local
return local;
} else {
// Concurrent: both are valid, keep both for application to resolve
return { hasConflict: true, local, remote };
}
}
This is why Dynamo-style systems such as Riak and Voldemort use vector clocks for conflict detection, while timestamp-only systems like Cassandra and DynamoDB accept last-write-wins semantics instead.
Dotted Version Vectors
Production systems often use a variant called Dotted Version Vectors (DVVs), which pair a version vector with a "dot" identifying the specific write event, giving more precise conflict detection with bounded size.
// A DVV pairs a causal-context vclock with a dot for this specific write
class DottedVersionVector {
constructor(nodeId) {
this.nodeId = nodeId;
this.vclock = new Map(); // node -> counter (causal context)
this.dot = null; // [node, counter] identifying this specific update
}
// When creating a new version:
advance() {
this.vclock.set(this.nodeId, (this.vclock.get(this.nodeId) || 0) + 1);
this.dot = [this.nodeId, this.vclock.get(this.nodeId)];
}
// When syncing with a peer:
sync(peerDvv) {
// Merge causal contexts component-wise
for (const [node, counter] of peerDvv.vclock) {
this.vclock.set(node, Math.max(this.vclock.get(node) || 0, counter));
}
}
}
DVVs are used in systems such as Riak 2.0 and provide more precise conflict detection than plain version vectors.
Real-World Usage Patterns
Cassandra: Timestamps Instead of Vector Clocks
Cassandra, originally developed at Facebook, deliberately chose not to use vector clocks: conflicts are resolved with per-column physical timestamps and last-write-wins. That keeps metadata small and reads simple, at the cost of being unable to detect concurrent updates.
CRDTs with Vector Clocks
Conflict-Free Replicated Data Types (CRDTs) work well with vector clocks. When two CRDT operations are concurrent, you apply both and the CRDT merge handles it automatically.
// GCounter with vector clock
class VClockGCounter {
constructor(nodeId) {
this.nodeId = nodeId;
this.vclock = new VectorClock(nodeId);
this.counters = new Map(); // node -> count
}
increment() {
this.vclock = this.vclock.tick();
this.counters.set(this.nodeId, (this.counters.get(this.nodeId) || 0) + 1);
}
merge(other) {
// Merge vector clocks
this.vclock.merge(other.vclock);
// Merge counters (CRDT merge: max for G-Counter)
for (const [node, count] of other.counters) {
this.counters.set(node, Math.max(this.counters.get(node) || 0, count));
}
}
value() {
return Array.from(this.counters.values()).reduce((a, b) => a + b, 0);
}
}
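Convergence is easy to check directly: two replicas increment independently, merge in either order, and agree on the same total. This is a stripped-down, self-contained G-Counter without the clock bookkeeping of the class above:

```javascript
// Stripped-down G-Counter: per-node counts; merge takes the max per node.
class GCounter {
  constructor(nodeId) {
    this.nodeId = nodeId;
    this.counts = new Map();
  }
  increment() {
    this.counts.set(this.nodeId, (this.counts.get(this.nodeId) || 0) + 1);
  }
  merge(other) {
    for (const [n, c] of other.counts) {
      this.counts.set(n, Math.max(this.counts.get(n) || 0, c));
    }
  }
  value() {
    return [...this.counts.values()].reduce((a, b) => a + b, 0);
  }
}

const x = new GCounter("A");
const y = new GCounter("B");
x.increment();
x.increment(); // A has counted 2
y.increment(); // B has counted 1, concurrently

x.merge(y); // concurrent increments merge without conflict
y.merge(x);
console.log(x.value(), y.value()); // 3 3 -- both replicas converge
```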
Limitations
Memory and Bandwidth
Vector clocks grow with node count. In large clusters, this becomes significant. Most production systems prune or limit vclock size.
Complexity
Vector clocks add complexity to application code. For simple use cases, eventual consistency with last-write-wins is simpler.
Pruning Trade-offs
Pruning vector clocks to save space can cause you to lose the ability to detect conflicts. You might incorrectly assume one update causally precedes another.
Conclusion
Vector clocks extend Lamport timestamps to capture full causal history. The key advance: you can determine whether two events are causally related or concurrent.
Dynamo-style systems use vector clocks for conflict detection. When concurrent updates are detected, the application or a resolution strategy decides which to keep.
Key takeaways:
- Vector clocks track per-node counters, not a single value
- If VC1 is a subset of VC2, VC1 causally precedes VC2
- If neither VC is a subset of the other, events are concurrent
- Concurrent events require conflict resolution
- Vector clocks enable automatic conflict detection in distributed databases
For related topics, see Logical Clocks, Consistency Models, and Geo-Distribution.
When to Use / When Not to Use Vector Clocks
| Scenario | Recommendation |
|---|---|
| Distributed database with multi-version concurrency | Use vector clocks |
| Offline-first applications with sync | Use vector clocks |
| Collaborative editing | Use vector clocks |
| Single-region database with strong consistency | Might not need |
| Simple key-value store with LWW | Might not need |
| Systems with thousands of nodes | Consider DVVs or pruning |
When TO Use Vector Clocks
- Building a Dynamo-style eventually-consistent database
- Implementing offline-first sync
- Collaborative editing with conflict detection
- Any system where concurrent updates to the same data are possible
When NOT to Use Vector Clocks
- Strongly consistent systems using consensus (Raft, Paxos)
- Single-node databases with no replication
- Simple caches where last-write-wins is acceptable
- Very large node counts where storage overhead matters
Production Considerations
Monitoring Conflict Rates
Track how often conflicts occur:
// Count conflicts
metrics.counter("dvv_conflicts_total").inc();
metrics.gauge("dvv_conflict_rate", (conflictsDetected / totalGets) * 100);
Pruning Strategies
// Time-based pruning (lastUpdate(node) is an assumed lookup of the
// node's last write time; pruned entries lose causal history)
function pruneByTime(vclock, maxAge = 86400000) {
const cutoff = Date.now() - maxAge;
const pruned = new Map();
for (const [node, counter] of vclock.clocks) {
if (lastUpdate(node) > cutoff) {
pruned.set(node, counter);
}
}
return pruned;
}
// Size-based pruning
function pruneBySize(vclock, maxEntries = 50) {
if (vclock.clocks.size <= maxEntries) return vclock.clocks;
// Keep entries with the highest counters (the most active writers)
const sorted = Array.from(vclock.clocks.entries()).sort(
(a, b) => b[1] - a[1],
);
return new Map(sorted.slice(0, maxEntries));
}
Debugging Tips
When debugging vector clock issues, log both the raw vclock and a human-readable causal chain:
function describeCausalHistory(vclock) {
const history = [];
for (const [node, counter] of vclock.clocks) {
history.push(`${node}:${counter}`);
}
return history.sort().join(" -> ");
}
Quick Recap
- Vector clocks track per-node counters for full causal history
- If VC1 is a subset of VC2, VC1 causally precedes VC2
- Concurrent updates have VCs where neither is a subset of the other
- Use for conflict detection in eventually-consistent databases
- Pruning strategies manage storage overhead in large clusters
For more on distributed systems, see CAP Theorem, Consistency Models, and PACELC Theorem.