Vector Clocks: Detecting Concurrent Updates in Distributed Systems

Learn how vector clocks track full causal history to detect concurrent updates, enable conflict detection, and support automatic conflict resolution in distributed databases.


Vector Clocks: Detecting Concurrent Updates

Lamport timestamps tell you that if A happens-before B, then T(A) < T(B). But they do not let you determine whether two events are causally related or concurrent. Vector clocks solve this limitation.

The idea: instead of a single number, each event gets a vector of counters, one per node. This vector encodes the full causal history of the event.

This post builds on Logical Clocks. Read that first if you are unfamiliar with Lamport timestamps.


The Problem with Lamport Clocks

Consider two users editing the same document offline. Both start from version 1. User A edits the title. User B edits the footer. Both come back online and try to sync.

sequenceDiagram
    participant A as Device A
    participant B as Device B
    participant Server

    Note over A: Version 1
    Note over B: Version 1

    A->>Server: Update: title changed
    Note over Server: VC_A: [A:2, B:1]
    B->>Server: Update: footer changed
    Note over Server: VC_B: [A:1, B:2]

    Note over Server: VC_A and VC_B are concurrent!<br/>Neither is a subset of the other

With Lamport clocks, both updates might have timestamp 2. Which came first? You cannot tell. They are concurrent.

With vector clocks, each update carries its full causal history. Server can see that neither history is a subset of the other, meaning the updates are truly concurrent.
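Here is that scenario in miniature (a sketch with plain objects; the helper name `concurrent` is mine):

```javascript
// With Lamport clocks, both offline edits end up with the same timestamp:
const lamportA = 2; // Device A's title edit
const lamportB = 2; // Device B's footer edit
// lamportA === lamportB tells us nothing about causality.

// With vector clocks, each edit carries per-node counters:
const vcA = { A: 2, B: 1 }; // title edit
const vcB = { A: 1, B: 2 }; // footer edit

// vcA has a larger A-counter, vcB a larger B-counter:
// neither dominates the other, so the edits are concurrent.
function concurrent(vc1, vc2) {
  const nodes = new Set([...Object.keys(vc1), ...Object.keys(vc2)]);
  let less = false;
  let greater = false;
  for (const n of nodes) {
    if ((vc1[n] || 0) < (vc2[n] || 0)) less = true;
    if ((vc1[n] || 0) > (vc2[n] || 0)) greater = true;
  }
  return less && greater;
}

console.log(concurrent(vcA, vcB)); // true
```

If B had instead synced A's edit before making its own, its clock would have been [A:2, B:2] and the comparison would show a causal order rather than concurrency.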


How Vector Clocks Work

The Data Structure

A vector clock is a map from node IDs to counters:

// Vector clock representation
class VectorClock {
  constructor(nodeId) {
    this.nodeId = nodeId;
    this.clocks = new Map();
  }

  // Get counter for this node
  get(nodeId) {
    return this.clocks.get(nodeId) || 0;
  }

  // Increment this node's counter
  tick() {
    this.clocks.set(this.nodeId, this.get(this.nodeId) + 1);
    return this.clone();
  }

  // Merge with another vector clock (on message receive)
  merge(other) {
    for (const [node, counter] of other.clocks) {
      this.clocks.set(node, Math.max(this.get(node), counter));
    }
    return this;
  }

  // Clone the vector clock
  clone() {
    const copy = new VectorClock(this.nodeId);
    copy.clocks = new Map(this.clocks);
    return copy;
  }
}

The Three Rules

Vector clocks follow three rules:

  1. Local event: Increment your own counter
  2. Send message: Include current vector clock
  3. Receive message: Set each component to max(local, received), then increment your own
// Rule 1: Local event
function localEvent(clock, nodeId) {
  clock.clocks.set(nodeId, clock.get(nodeId) + 1);
  return clock.clone();
}

// Rule 2: Send
function send(clock, nodeId, recipientId, message) {
  const ts = clock.tick();
  message.vectorClock = ts;
  return { message, timestamp: ts };
}

// Rule 3: Receive
function receive(clock, nodeId, message) {
  for (const [node, counter] of message.vectorClock.clocks) {
    clock.clocks.set(node, Math.max(clock.get(node), counter));
  }
  clock.clocks.set(nodeId, clock.get(nodeId) + 1);
  return clock.clone();
}

Comparing Vector Clocks

The Comparison Operation

Given two vector clocks VC1 and VC2:

  • VC1 < VC2 if VC1[node] <= VC2[node] for all nodes, and VC1[node] < VC2[node] for at least one node
  • VC1 > VC2 if VC2 < VC1 by the same definition
  • VC1 || VC2 (concurrent) if neither VC1 < VC2 nor VC1 > VC2
// Compare two vector clocks
function compare(clock1, clock2) {
  let lessThan = false;
  let greaterThan = false;

  // Get all node IDs
  const allNodes = new Set([...clock1.clocks.keys(), ...clock2.clocks.keys()]);

  for (const node of allNodes) {
    const v1 = clock1.get(node);
    const v2 = clock2.get(node);

    if (v1 < v2) lessThan = true;
    if (v1 > v2) greaterThan = true;
  }

  if (lessThan && !greaterThan) return -1; // clock1 < clock2
  if (greaterThan && !lessThan) return 1; // clock1 > clock2
  return 0; // concurrent (or identical)
}

Visualizing the Ordering

graph TD
    A[Start: A:1, B:0, C:0] --> B[A:2, B:0, C:0]
    A --> C[A:1, B:1, C:0]
    B --> D[A:2, B:1, C:0]
    C --> E[A:1, B:2, C:0]

    D --> F[A:2, B:2, C:0]
    E --> F

    F --> G[A:2, B:2, C:1]

    C -.->|concurrent| B
    E -.->|concurrent| D

The key insight: when neither vector clock is a subset of the other, they are concurrent.
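That check can be written directly (a sketch; `dominates` and `isConcurrent` are helper names I'm introducing here):

```javascript
// true if vc2 has seen everything vc1 has seen:
// every counter in vc1 is <= the matching counter in vc2
function dominates(vc2, vc1) {
  for (const [node, counter] of vc1) {
    if ((vc2.get(node) || 0) < counter) return false;
  }
  return true;
}

// Concurrent: neither clock dominates the other
function isConcurrent(vc1, vc2) {
  return !dominates(vc1, vc2) && !dominates(vc2, vc1);
}

// Nodes D and E from the graph above:
const d = new Map([["A", 2], ["B", 1], ["C", 0]]);
const e = new Map([["A", 1], ["B", 2], ["C", 0]]);
console.log(isConcurrent(d, e)); // true, matching the dotted edges
```

Node F ([A:2, B:2, C:0]) dominates both D and E, which is exactly why it sits after them in the causal order.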


Vector Clocks in Dynamo-style Systems

Amazon Dynamo popularized vector clocks for conflict detection in distributed databases. Many real systems use variants of this approach.

Riak’s Vector Clock Implementation

Riak, an open-source Dynamo implementation, uses vector clocks extensively:

// Riak-style vector clock operations
class RiakVectorClock {
  constructor(nodeId) {
    this.nodeId = nodeId;
    this.vclock = []; // Array of [nodeId, counter] pairs
  }

  // Create a new vclock after local update
  descend(other) {
    const result = new RiakVectorClock(this.nodeId);

    if (other) {
      // Copy all entries from other vclock
      for (const [node, counter] of other.vclock) {
        result.vclock.push([node, counter]);
      }
    }

    // Increment our counter
    const myEntry = result.vclock.find((e) => e[0] === this.nodeId);
    if (myEntry) {
      myEntry[1]++;
    } else {
      result.vclock.push([this.nodeId, 1]);
    }

    return result;
  }

  // Merge two vclocks
  merge(other) {
    const result = new RiakVectorClock(this.nodeId);
    const map = new Map();

    // Start with entries from both
    for (const [node, counter] of this.vclock) map.set(node, counter);
    for (const [node, counter] of other.vclock) {
      map.set(node, Math.max(map.get(node) || 0, counter));
    }

    // Convert back to array
    result.vclock = Array.from(map.entries());

    return result;
  }
}

Conflict Detection

When a GET request returns multiple values with concurrent vector clocks, a conflict exists:

// GET with vector clock conflict detection
async function get(key) {
  const replicas = await getReplicas(key);
  const results = await Promise.all(replicas.map((r) => r.get(key)));

  // Group by vector clock
  const byClock = new Map();
  for (const result of results) {
    const vcKey = JSON.stringify(result.vectorClock);
    if (!byClock.has(vcKey)) {
      byClock.set(vcKey, []);
    }
    byClock.get(vcKey).push(result);
  }

  // If multiple distinct vector clocks, we have conflicts
  if (byClock.size > 1) {
    return {
      hasConflicts: true,
      values: Array.from(byClock.values()).map((group) => group[0]),
      vectorClocks: Array.from(byClock.keys()),
    };
  }

  // No conflict - return the value
  return {
    hasConflicts: false,
    value: results[0].value,
  };
}

Practical Implementation

Storage Overhead

Vector clocks grow with the number of nodes that have updated an object. In systems with thousands of nodes, this becomes expensive.

// Storage overhead example
// With 100 nodes, each doing updates:
// VC size = 100 entries * ~50 bytes = 5KB per object

// Optimization: prune stale vclocks
// (Dynamo itself stores a timestamp with each entry and truncates the
// oldest entries once the vclock exceeds a size threshold)

// true if every counter in vc1 is <= the matching counter in vc2
function isSubset(vc1, vc2) {
  for (const [node, counter] of vc1.clocks) {
    if (vc2.get(node) < counter) return false;
  }
  return true;
}

// A stored vclock can be discarded once enough recent vclocks dominate it
function prune(vclock, recentVclocks, threshold = 3) {
  let dominatedBy = 0;

  for (const recent of recentVclocks) {
    if (isSubset(vclock, recent)) {
      dominatedBy++;
    }
  }

  // This vclock is dominated by enough others - prune it
  return dominatedBy >= threshold;
}

Message Passing with Vector Clocks

// Send with vector clock
async function put(nodeId, key, value, vclock) {
  const newVclock = vclock.descend();

  const message = {
    type: "put",
    key,
    value,
    vclock: newVclock,
    timestamp: Date.now(),
  };

  await network.send(nodeId, message);
  return newVclock;
}

// Receive and merge
async function receive(message) {
  const { vclock: msgVclock } = message;

  // Merge incoming vclock with local
  const mergedVclock = localVclock.merge(msgVclock);

  // If this causally subsumes our previous state, apply update
  if (isSubset(localVclock, msgVclock)) {
    applyUpdate(message);
  }

  localVclock = mergedVclock;
}

Version Vectors vs Vector Clocks

There is an important distinction between Version Vectors and true Vector Clocks. Understanding the difference matters for system design.

What is a Version Vector?

A Version Vector (VV) is a special case of vector clock used for conflict detection in replicated data. The key constraint: a version vector has exactly one entry per replica, and those entries represent the version number at that replica.

// Version Vector structure (simplified)
class VersionVector {
  constructor(replicas) {
    this.versions = new Map();
    for (const replica of replicas) {
      this.versions.set(replica, 0);
    }
  }

  // Increment version at a specific replica
  increment(replica) {
    this.versions.set(replica, this.versions.get(replica) + 1);
  }

  // Merge with another version vector
  merge(other) {
    for (const [replica, version] of other.versions) {
      this.versions.set(
        replica,
        Math.max(this.versions.get(replica) || 0, version),
      );
    }
  }
}

Key Differences

Aspect               | Version Vector                | True Vector Clock
Size                 | O(r), r = number of replicas  | O(n), n = number of nodes
Causality tracking   | Per-replica versions only     | Full causal history
Conflict detection   | Yes                           | Yes
Conflict resolution  | Application-specific          | Application-specific
Space efficiency     | Better for fixed replica sets | Better for dynamic node sets
Used in              | Riak, Voldemort               | COPS, general-purpose systems

Space Efficiency Comparison

Version Vectors are more space-efficient when you have a fixed, known number of replicas:

Scenario: 3-node replicated database with 1 update/second per node

Version Vector size:
  - 3 entries × ~50 bytes = 150 bytes per object
  - Grows only if replica count changes

True Vector Clock size (in a 100-node cluster):
  - 100 entries × ~50 bytes = 5,000 bytes per object
  - Grows linearly with node count

Quantitative Bounds

Vector clocks can grow significantly in large clusters:

// Space calculation for vector clocks
function calculateVectorClockSize(
  nodeCount,
  avgNodeIdLength = 16,
  counterBytes = 8,
) {
  // Each entry: node ID (string) + counter (int64)
  const bytesPerEntry = avgNodeIdLength + counterBytes;
  const totalBytes = nodeCount * bytesPerEntry;

  return {
    perObject: totalBytes,
    perObjectKB: (totalBytes / 1024).toFixed(2),
    perObjectMB: (totalBytes / (1024 * 1024)).toFixed(4),
  };
}

// Examples:
calculateVectorClockSize(10);
// { perObject: 240, perObjectKB: '0.23', perObjectMB: '0.0002' }

calculateVectorClockSize(50);
// { perObject: 1200, perObjectKB: '1.17', perObjectMB: '0.0011' }

calculateVectorClockSize(100);
// { perObject: 2400, perObjectKB: '2.34', perObjectMB: '0.0023' }

calculateVectorClockSize(500);
// { perObject: 12000, perObjectKB: '11.72', perObjectMB: '0.0114' }

Vclock size is driven by the number of distinct nodes that have ever updated an object, not by update frequency. In a large cluster where hundreds of nodes eventually write to a hot object, the clock alone can reach tens of kilobytes per object (roughly 24 KB for 1,000 writers at 24 bytes per entry, using the estimate above).

When to Use Each

Use Version Vectors when:

  • You have a fixed number of replicas (3-7 typical)
  • You need conflict detection for a replicated database
  • Space efficiency is important

Use True Vector Clocks when:

  • Nodes can join/leave dynamically
  • You need full causal history tracking
  • You’re building a general-purpose distributed system

DynamoDB: Timestamp-Based, Not Vector Clocks

Amazon DynamoDB uses a timestamp-based approach for conflict resolution, not true vector clocks. This is a common point of confusion.

DynamoDB’s Approach

DynamoDB resolves conflicting writes with last-writer-wins based on timestamps (this is how its global tables reconcile concurrent updates):

// DynamoDB's timestamp-based conflict resolution
// Each item has a scalar attribute used for last-write-wins

{
  "user:123": {
    "name": "Alice",
    "_meta": {
      "updatedAt": 1712345678901,  // Physical timestamp (server time)
      "updatedBy": "dynamo-db",
      "vectorClock": { /* Not used in standard DynamoDB */ }
    }
  }
}

// Conflict resolution: last-write-wins based on updatedAt
// If two writes arrive, the one with larger updatedAt wins

Key Points About DynamoDB

  1. No true vector clocks: DynamoDB does not track causal history across updates
  2. Last-write-wins: Conflicts are resolved by timestamp, not by detecting concurrency
  3. Per-item versioning: Each item has its own version, not a global vector clock
  4. Conditional writes: Applications can use conditional expressions to detect conflicts
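As a sketch of point 4: an application can reject stale writes with a conditional expression on a version attribute it manages itself. The request shape below follows DynamoDB's UpdateItem API; the table and attribute names are hypothetical.

```javascript
// Optimistic concurrency via a DynamoDB-style conditional update.
// "Profiles", "userId", "version" are hypothetical names; the parameter
// shape follows the UpdateItem API.
function buildConditionalUpdate(key, newName, expectedVersion) {
  return {
    TableName: "Profiles",
    Key: { userId: { S: key } },
    UpdateExpression: "SET #n = :name, version = :next",
    ConditionExpression: "version = :expected", // fails if someone wrote first
    ExpressionAttributeNames: { "#n": "name" },
    ExpressionAttributeValues: {
      ":name": { S: newName },
      ":expected": { N: String(expectedVersion) },
      ":next": { N: String(expectedVersion + 1) },
    },
  };
}

const params = buildConditionalUpdate("user:123", "Alice Chen", 4);
// If a concurrent writer already bumped version to 5, this request fails
// with ConditionalCheckFailedException instead of silently overwriting.
```

This gives you conflict *avoidance* for a single item, but unlike vector clocks it cannot tell you what the concurrent write was.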

Comparison with True Vector Clock Systems

Feature                           | DynamoDB      | Vector Clock Systems (e.g., Riak)
Conflict detection                | No (uses LWW) | Yes
Concurrent update detection       | No            | Yes
Multiple valid values (siblings)  | No            | Yes
Application-controlled resolution | Limited       | Full control
Causality tracking                | No            | Yes

When This Matters

DynamoDB’s approach works well when:

  • Last-write-wins is acceptable for your use case
  • You don’t need to detect true concurrent updates
  • You want simplicity over fine-grained conflict handling

True vector clocks matter when:

  • Concurrent updates to the same key are possible
  • You need to know if updates are causally related
  • You want application-level conflict resolution

Debugging: Real Conflict Scenario Walkthrough

Let me walk through a real conflict scenario to show how vector clocks work in practice:

sequenceDiagram
    participant A as Node A (us-east-1)
    participant B as Node B (us-west-2)
    participant C as Node C (eu-west-1)

    Note over A: Initial: VC=[A:1, B:0, C:0]
    Note over B: Initial: VC=[A:0, B:1, C:0]
    Note over C: Initial: VC=[A:0, B:0, C:1]

    A->>A: Local update<br/>VC=[A:2, B:0, C:0]

    Note over A: User edits profile

    B->>B: Local update<br/>VC=[A:0, B:2, C:0]

    Note over B: User edits same profile

    A->>B: Sync: VC=[A:2, B:0, C:0]
    B->>A: Sync: VC=[A:0, B:2, C:0]

    Note over A: After merge:<br/>VC=[A:2, B:2, C:0]
    Note over B: After merge:<br/>VC=[A:2, B:2, C:0]

    Note over A,B: CONFLICT DETECTED!<br/>Versions [A:2,B:0,C:0] and [A:0,B:2,C:0]<br/>are concurrent - neither dominates

Step-by-Step Debug Session

// Debugging vector clock conflicts in a distributed database

// Step 1: Log the vector clocks carried by each conflicting version
const versionAVClock = { A: 2, B: 0, C: 0 }; // A's edit, before syncing
const versionBVClock = { A: 0, B: 2, C: 0 }; // B's edit, before syncing

// Step 2: Compare them
function compareVCs(vc1, vc2) {
  let lessThan = false;
  let greaterThan = false;

  const allNodes = new Set([...Object.keys(vc1), ...Object.keys(vc2)]);

  for (const node of allNodes) {
    const v1 = vc1[node] || 0;
    const v2 = vc2[node] || 0;

    if (v1 < v2) lessThan = true;
    if (v1 > v2) greaterThan = true;
  }

  if (lessThan && !greaterThan) return "vc1 < vc2";
  if (greaterThan && !lessThan) return "vc1 > vc2";
  if (!lessThan && !greaterThan) return "EQUAL";
  return "CONCURRENT";
}

console.log(compareVCs(versionAVClock, versionBVClock));
// Output: CONCURRENT

// Step 3: Analyze the conflict
// Version A's clock has A:2 but B:0 - A never saw B's edit
// Version B's clock has B:2 but A:0 - B never saw A's edit
// Neither clock dominates the other, so the edits are concurrent
// (after syncing, both nodes hold the merged clock [A:2, B:2, C:0],
// but they hold two sibling values that still need resolving)

// Step 4: Resolution strategies
// Strategy 1: Last-write-wins (use physical timestamps)
// Strategy 2: Keep both (application resolves)
// Strategy 3: Merge (CRDT if possible)
// Strategy 4: Flag for manual resolution

Real Debugging Output Example

// Actual debugging output from a Riak cluster
{
  "key": "user:123:profile",
  "siblings": [
    {
      "value": { "name": "Alice", "bio": "Software engineer" },
      "vclock": "vclock:[{A,2},{B,2},{C,1}]",
      "vector_clock": { "A": 2, "B": 2, "C": 1 }
    },
    {
      "value": { "name": "Alice Chen", "bio": "Engineer" },
      "vclock": "vclock:[{A,1},{B,3},{C,0}]",
      "vector_clock": { "A": 1, "B": 3, "C": 0 }
    }
  ],
  "conflict_detected": true,
  "reason": "Neither version dominates the other",
  "vc_comparison": {
    "v1_subsumes_v2": false,
    "v2_subsumes_v1": false,
    "status": "concurrent"
  }
}

Resolution Flow

flowchart TD
    A[Conflict Detected] --> B{Are values<br/>mergeable?}

    B -->|Yes - CRDT| C[Apply CRDT merge<br/>Automatic resolution]
    B -->|No| D{Are values<br/>similar?}

    D -->|Yes| E[Semantic merge<br/>e.g., document merge]
    D -->|No| F[Present both to user<br/>or application]

    C --> G[Resolved - single value]
    E --> H[Resolved - merged value]
    F --> I[Awaiting resolution]
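The flow above can be sketched as a dispatcher (illustrative; the merge functions are assumptions supplied by the caller, not part of any library):

```javascript
// Dispatch a detected conflict along the branches of the flowchart.
// crdtMerge and semanticMerge are caller-supplied strategies.
function resolveConflict(local, remote, { crdtMerge, semanticMerge } = {}) {
  if (crdtMerge) {
    // Mergeable CRDT: automatic resolution
    return { status: "resolved", value: crdtMerge(local.value, remote.value) };
  }
  if (semanticMerge) {
    // Similar values: attempt a semantic merge (e.g. document merge)
    const merged = semanticMerge(local.value, remote.value);
    if (merged !== null) return { status: "resolved", value: merged };
  }
  // Otherwise surface both versions to the application or user
  return { status: "awaiting", siblings: [local, remote] };
}

// Example: a G-Counter-style max merge resolves automatically
const out = resolveConflict({ value: 3 }, { value: 5 }, { crdtMerge: Math.max });
// out = { status: "resolved", value: 5 }
```

Without any merge strategy, the dispatcher falls through to the "awaiting resolution" branch, matching the diagram.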

Why Not Just Use Timestamps?

Timestamp-Based Conflict Resolution Fails

Last-write-wins uses wall-clock timestamps for conflict resolution. This fails in predictable ways:

// Timestamp-based resolution fails
async function resolveWithTimestamps(local, remote) {
  // Assumes: larger timestamp = newer
  // Problem: clocks are not synchronized
  // Problem: clock skew can be seconds or minutes
  if (remote.timestamp > local.timestamp) {
    return remote; // But remote might be causally older!
  }
  return local;
}

Vector Clocks Capture Causality

Vector clocks let you distinguish between causally related updates and concurrent updates:

  1. Causally related updates: A caused B, so you keep B
  2. Concurrent updates: neither caused the other, both are valid choices
// With vector clocks
async function resolveWithVectorClocks(local, remote) {
  const vc1 = parseVectorClock(local.vclock);
  const vc2 = parseVectorClock(remote.vclock);
  const cmp = compare(vc1, vc2);

  if (cmp < 0) {
    // VC1 < VC2: local causally precedes remote
    return remote;
  } else if (cmp > 0) {
    // VC1 > VC2: remote causally precedes local
    return local;
  } else {
    // Concurrent: both are valid, keep both for application to resolve
    return { hasConflict: true, local, remote };
  }
}

This is why Dynamo-style systems such as Riak use vector clocks for conflict detection, while timestamp-based systems like Cassandra and DynamoDB accept the last-write-wins trade-off instead.


Dotted Version Vectors

Production systems often use a variant called Dotted Version Vectors (DVVs), which combine vector clocks with operation counts for more efficient conflict detection.

// DVV combines a vclock with the "dot" identifying this specific write
class DottedVersionVector {
  constructor(nodeId) {
    this.nodeId = nodeId;
    this.vclock = new Map(); // node -> counter
    this.dot = null; // [node, counter] for this specific update
  }

  // When creating a new version: advance our counter and record the dot
  advance() {
    this.vclock.set(this.nodeId, (this.vclock.get(this.nodeId) || 0) + 1);
    this.dot = [this.nodeId, this.vclock.get(this.nodeId)];
  }

  // When syncing with peer: merge vclocks entry-wise
  sync(peerDvv) {
    for (const [node, counter] of peerDvv.vclock) {
      this.vclock.set(node, Math.max(this.vclock.get(node) || 0, counter));
    }
  }
}

DVVs were adopted in Riak 2.0 and provide more precise conflict detection than plain version vectors: the dot identifies the exact write that produced a version, letting the store discard obsolete siblings instead of accumulating false concurrency.
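The extra precision comes from the dot: a sibling can be discarded when its dot is already covered by another version's clock. A minimal check (the helper name is mine):

```javascript
// A version with dot [node, counter] is obsolete relative to another
// version's clock if that clock has already seen the write the dot names.
function dotCovered(dot, vclock) {
  const [node, counter] = dot;
  return (vclock.get(node) || 0) >= counter;
}

// Version 1 was written as B's 2nd event; version 2's clock has B:3,
// so version 2 causally includes that write and version 1 can be dropped.
const dot1 = ["B", 2];
const clock2 = new Map([["A", 1], ["B", 3]]);
console.log(dotCovered(dot1, clock2)); // true
```

With plain version vectors the store cannot make this per-write judgment, so it must keep both values as siblings.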


Real-World Usage Patterns

Cassandra: A Deliberate Omission

Cassandra, which originated at Facebook, notably does not use vector clocks. It resolves conflicts with per-column timestamps (last-write-wins): each column independently keeps the value with the highest timestamp. This keeps metadata small but cannot detect concurrent updates.

CRDTs with Vector Clocks

Conflict-Free Replicated Data Types (CRDTs) work well with vector clocks. When two CRDT operations are concurrent, you apply both and the CRDT merge handles it automatically.

// GCounter with vector clock
class VClockGCounter {
  constructor(nodeId) {
    this.nodeId = nodeId;
    this.vclock = new VectorClock(nodeId);
    this.counters = new Map(); // node -> count
  }

  increment() {
    this.vclock = this.vclock.tick();
    this.counters.set(this.nodeId, (this.counters.get(this.nodeId) || 0) + 1);
  }

  merge(other) {
    // Merge vector clocks
    this.vclock.merge(other.vclock);

    // Merge counters (CRDT merge: max for G-Counter)
    for (const [node, count] of other.counters) {
      this.counters.set(node, Math.max(this.counters.get(node) || 0, count));
    }
  }

  value() {
    return Array.from(this.counters.values()).reduce((a, b) => a + b, 0);
  }
}

Limitations

Memory and Bandwidth

Vector clocks grow with node count. In large clusters, this becomes significant. Most production systems prune or limit vclock size.

Complexity

Vector clocks add complexity to application code. For simple use cases, eventual consistency with last-write-wins is simpler.

Pruning Trade-offs

Pruning vector clocks to save space can cost you the ability to detect conflicts: once an entry is dropped, a comparison may report that one update causally precedes another when they were actually concurrent.
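A small illustration of the risk, using plain Maps:

```javascript
// Two genuinely concurrent clocks:
const vc1 = new Map([["A", 2], ["B", 1]]);
const vc2 = new Map([["A", 1], ["B", 3]]);

// true if x has seen everything y has
function dominates(x, y) {
  for (const [node, counter] of y) {
    if ((x.get(node) || 0) < counter) return false;
  }
  return true;
}

console.log(!dominates(vc1, vc2) && !dominates(vc2, vc1)); // true: concurrent

// Prune A's entry from vc1 (say A has been quiet for a day):
const pruned = new Map(vc1);
pruned.delete("A");

// Now vc2 appears to dominate the pruned clock...
console.log(dominates(vc2, pruned)); // true
// ...so a store would silently drop vc1's update instead of
// flagging a conflict.
```

This is the trade-off Dynamo accepted with its truncation scheme: occasional false ordering in exchange for bounded vclock size.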


Conclusion

Vector clocks extend Lamport timestamps to capture full causal history. The key advance: you can determine whether two events are causally related or concurrent.

Dynamo-style systems use vector clocks for conflict detection. When concurrent updates are detected, the application or a resolution strategy decides which to keep.

Key takeaways:

  1. Vector clocks track per-node counters, not a single value
  2. If VC1 is a subset of VC2, VC1 causally precedes VC2
  3. If neither VC is a subset of the other, events are concurrent
  4. Concurrent events require conflict resolution
  5. Vector clocks enable automatic conflict detection in distributed databases

For related topics, see Logical Clocks, Consistency Models, and Geo-Distribution.


When to Use / When Not to Use Vector Clocks

Scenario                                             | Recommendation
Distributed database with multi-version concurrency  | Use vector clocks
Offline-first applications with sync                 | Use vector clocks
Collaborative editing                                | Use vector clocks
Single-region database with strong consistency       | Might not need them
Simple key-value store with LWW                      | Might not need them
Systems with thousands of nodes                      | Consider DVVs or pruning

When TO Use Vector Clocks

  • Building a Dynamo-style eventually-consistent database
  • Implementing offline-first sync
  • Collaborative editing with conflict detection
  • Any system where concurrent updates to the same data are possible

When NOT to Use Vector Clocks

  • Strongly consistent systems using consensus (Raft, Paxos)
  • Single-node databases with no replication
  • Simple caches where last-write-wins is acceptable
  • Very large node counts where storage overhead matters

Production Considerations

Monitoring Conflict Rates

Track how often conflicts occur:

// Count conflicts
metrics.counter("dvv_conflicts_total").inc();
metrics.gauge("dvv_conflict_rate", (conflictsDetected / totalGets) * 100);

Pruning Strategies

// Time-based pruning - assumes lastUpdate(node) returns when that node's
// entry was last bumped (tracked alongside the vclock)
function pruneByTime(vclock, maxAge = 86400000) {
  const cutoff = Date.now() - maxAge;
  const pruned = new Map();

  for (const [node, counter] of vclock.clocks) {
    if (lastUpdate(node) > cutoff) {
      pruned.set(node, counter);
    }
  }

  return pruned;
}

// Size-based pruning
function pruneBySize(vclock, maxEntries = 50) {
  if (vclock.clocks.size <= maxEntries) return vclock.clocks;

  // Keep the most active entries (highest counters)
  const sorted = Array.from(vclock.clocks.entries()).sort(
    (a, b) => b[1] - a[1],
  );

  return new Map(sorted.slice(0, maxEntries));
}

Debugging Tips

When debugging vector clock issues, log both the raw vclock and a human-readable causal chain:

function describeCausalHistory(vclock) {
  const history = [];
  for (const [node, counter] of vclock.clocks) {
    history.push(`${node}:${counter}`);
  }
  return history.sort().join(" -> ");
}

Quick Recap

  • Vector clocks track per-node counters for full causal history
  • If VC1 is a subset of VC2, VC1 causally precedes VC2
  • Concurrent updates have VCs where neither is a subset of the other
  • Use for conflict detection in eventually-consistent databases
  • Pruning strategies manage storage overhead in large clusters

For more on distributed systems, see CAP Theorem, Consistency Models, and PACELC Theorem.
