View-Stamped Replication

View-Stamped Replication (VSR) is a distributed consensus protocol in which nodes progress through numbered views and order operations with viewstamps (a view number paired with a per-view timestamp) to achieve agreement in asynchronous systems.

Updated: March 24, 2026 · 10 min read

Introduction

View-Stamped Replication (VSR) was developed by Barbara Liskov and colleagues at MIT in the late 1980s, predating Raft by more than two decades. The algorithm achieves consensus using a view-based approach where nodes progress through numbered views, with one node serving as the primary in each view.

The name comes from the viewstamp: a pair of the view number (a node's current understanding of who is primary) and a timestamp assigned to each operation within that view. Each node also tracks a status indicating whether it is operating normally, participating in a view change, or recovering.

VSR is less well-known than Paxos or Raft. It was developed independently of Paxos at around the same time, and the Raft authors cite it as an influence. Understanding VSR gives you another perspective on how consensus can work.

Views and Roles

The core abstraction in VSR is the view. A view consists of a view number and an ordered list of replicas. The first replica in the list is the primary, the second is the backup, and so on. Nodes advance through views as failures happen.
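The role assignment above can be sketched in a few lines of Python. The names (`View`, `role_of`) are illustrative, not from the VSR papers:

```python
class View:
    def __init__(self, number, replicas):
        self.number = number      # monotonically increasing view number
        self.replicas = replicas  # ordered list of replica ids

    def primary(self):
        # The first replica in the list acts as primary for this view
        return self.replicas[0]

def role_of(replica_id, view):
    # Every replica that is not the primary serves as a backup
    return "primary" if replica_id == view.primary() else "backup"

v = View(number=5, replicas=["B", "C", "A"])
role_of("B", v)  # "primary"
role_of("A", v)  # "backup"
```

When the system moves to a new view, the same function applied to the new replica ordering yields the new role assignment.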

stateDiagram-v2
    [*] --> Normal
    Normal --> Normal: Primary receives client requests
    Normal --> Normal: Backup processes prepare messages

    Normal --> ViewChange: Timeout without communication
    ViewChange --> Normal: New view established

    state ViewChange {
        [*] --> StartViewChange
        StartViewChange --> DoViewChange: Received enough VIEW-CHANGE messages
        DoViewChange --> [*]: Sent DO-VIEW-CHANGE to new primary
    }

When a backup detects that the primary is unresponsive (via timeout), it initiates a view change. It increments its local view number and sends VIEW-CHANGE messages to all replicas. Once a replica has received VIEW-CHANGE messages from a majority, it sends a DO-VIEW-CHANGE message to the primary of the new view; when that primary has collected a majority of these, it announces the new view with a STARTVIEW message.
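The majority check at the heart of this protocol can be sketched as a simple in-memory tally. This is a simplification using the message names from this post; a real implementation also carries each replica's log state in these messages:

```python
class ViewChangeTracker:
    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.votes = {}  # proposed view number -> set of sender ids

    def on_view_change(self, sender, new_view):
        # Record the sender's vote for the proposed view
        self.votes.setdefault(new_view, set()).add(sender)
        # Return True once a majority of replicas agree on the new view
        return len(self.votes[new_view]) > self.cluster_size // 2

t = ViewChangeTracker(cluster_size=3)
t.on_view_change("node2", new_view=2)  # False: only 1 of 3 so far
t.on_view_change("node3", new_view=2)  # True: 2 of 3 is a majority
```

Duplicate messages from the same sender are harmless here because votes are kept in a set.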

The Normal Phase

In the normal (non-view-change) phase, the primary processes client requests sequentially. When the primary receives a request, it assigns it a monotonically increasing transaction number and sends a PREPARE message to all backups.

Each backup receives the PREPARE message, persists the operation to its log, and sends an acknowledgment back to the primary. When the primary receives acknowledgments from a majority, it applies the operation to its state machine and responds to the client.

This is similar to how Raft handles log replication, but with a different mechanism for view changes and recovery.

sequenceDiagram
    participant C as Client
    participant P as Primary
    participant B1 as Backup 1
    participant B2 as Backup 2

    C->>P: Request: op(x)
    P->>P: Assign transaction number n=42
    P->>B1: PREPARE(view=5, n=42, op(x))
    P->>B2: PREPARE(view=5, n=42, op(x))
    B1->>P: ACK(view=5, n=42)
    B2->>P: ACK(view=5, n=42)
    P->>P: Apply op(x) to state machine
    P-->>C: Response: OK
    Note over P: Committed: transaction 42

Key points about the normal phase:

  1. The primary assigns a monotonically increasing transaction number (rather than a log index with a term, as in Raft)
  2. Operations are forwarded to all backups via PREPARE messages
  3. A majority of acknowledgments (including the primary) commits the transaction
  4. The primary applies the operation after quorum is reached, not before
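The primary's side of the normal phase can be sketched as follows. This is a minimal model using the message names from this post; the `send` callback stands in for the network layer and is an assumption of the sketch:

```python
class Primary:
    def __init__(self, view, backups):
        self.view = view
        self.backups = backups
        self.next_n = 0       # monotonically increasing transaction number
        self.acks = {}        # transaction number -> set of acking backups
        self.committed = 0    # highest committed transaction number

    def on_request(self, op, send):
        # Assign the next transaction number and broadcast PREPARE
        self.next_n += 1
        n = self.next_n
        self.acks[n] = set()
        for b in self.backups:
            send(b, ("PREPARE", self.view, n, op))
        return n

    def on_ack(self, backup, n):
        self.acks[n].add(backup)
        # The primary plus acking backups must form a majority of the cluster
        if len(self.acks[n]) + 1 > (len(self.backups) + 1) // 2:
            self.committed = max(self.committed, n)  # apply, then reply to client
            return True
        return False
```

In a three-node cluster (one primary, two backups), a single acknowledgment is enough: the primary plus one backup already form a majority, matching point 3 above.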

Handling Failures and Recovery

VSR handles node failures carefully. When a primary fails, some backup becomes the new primary through the view change protocol. When a failed node recovers, it must rejoin the group and catch up on missed operations.

Recovery involves the recovering node sending a RECOVERY message to the current primary. The primary responds with a STATE message containing the current view number and the operations the recovering node missed. This allows the recovering node to replay those operations and rejoin as a full member.

Relationship to Paxos and Raft

VSR shares conceptual territory with both Paxos and Raft but takes a different path. Like Raft, it has a clear notion of leadership within a view. Like Paxos, it uses quorum-based replication and can tolerate asynchronous communication (within limits).

The view change mechanism is more elaborate than Raft’s leader election. VSR separates the concepts of “who is primary” from “what is committed,” whereas Raft embeds these in its leader and log replication.

View Change Sequence Diagram

When the leader fails, remaining nodes detect the timeout and perform a view change:

sequenceDiagram
    participant N1 as Node 1 (old primary, failed)
    participant N2 as Node 2
    participant N3 as Node 3

    Note over N1,N3: Primary (N1) crashes
    N2->>N2: View-change timeout fires
    N2->>N1: VIEW-CHANGE(v=2, lastNormal=N1)
    N2->>N3: VIEW-CHANGE(v=2, lastNormal=N1)
    N3->>N2: VIEW-CHANGE(v=2, lastNormal=N1)
    Note over N2,N3: N2 and N3 exchange view-change messages
    N2->>N2: N2 has quorum (N2 + N3) and is next in line as primary
    N2->>N1: STARTVIEW(v=2, committed=...)
    N2->>N3: STARTVIEW(v=2, committed=...)
    Note over N2,N3: N2 becomes the new primary in view 2

Practical Applications

VSR is rarely deployed directly, but it influenced the design of several production systems. The CORFU shared-log protocol builds on related ideas, and the Zab protocol in Apache ZooKeeper takes a similar primary-plus-view-change approach, putting this family of designs at the foundation of widely deployed coordination services.

Once you know VSR, you start seeing its patterns elsewhere. ZooKeeper's leader election behavior makes more sense, and leader changes in Raft-based systems become easier to reason about. The view-change pattern is not just academic; it runs in production at scale.

When to Use VSR

VSR is a solid choice for building replicated state machines, but it is less commonly implemented than Raft. If you need a well-tested consensus algorithm with a mature ecosystem, Raft-based implementations (like etcd) are more readily available.

However, if you’re building a system that requires the specific properties of view-stamped replication, or if you’re working with existing VSR-based code, understanding the algorithm helps.

Membership Changes and Reconfiguration

VSR handles cluster membership changes through the view change mechanism. When nodes need to be added or removed, the system progresses through a new view that includes the updated membership configuration.

The reconfiguration procedure follows these steps:

sequenceDiagram
    participant O as Old Configuration
    participant L as Leader
    participant N as New Node
    participant F as Followers

    Note over O,L: Client requests configuration change
    L->>L: Prepare reconfiguration view v+1
    L->>N: STARTVIEW(v+1, new_config)
    L->>F: STARTVIEW(v+1, new_config)
    Note over N: New node catches up via STATE messages
    N-->>L: ACK(v+1)
    F-->>L: ACK(v+1)
    Note over L: Quorum reached in view v+1
    Note over L,F: New configuration active

Key points about VSR reconfiguration:

  1. View-based approach — Membership changes are embedded in view changes, giving a natural mechanism for transitioning between configurations
  2. Catch-up via STATE messages — New nodes receive the current state from the primary before participating in consensus
  3. No joint consensus required — Unlike Raft’s two-phase joint consensus, VSR transitions in a single view change once the new node catches up
  4. Transition safety — Quorum requirements prevent two conflicting configurations from operating simultaneously

The primary sends STATE messages containing the view number, committed transaction number, and the recent log entries the new node needs to replay. Once the new node acknowledges receipt and catches up, it becomes a full voting member.
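The catch-up gate described above can be sketched in a few lines. The field names here are illustrative; the key idea is that a joining node only counts toward quorum once it has replayed everything the primary has committed:

```python
class JoiningNode:
    def __init__(self):
        self.applied = 0     # highest transaction number replayed so far
        self.voting = False  # becomes True only after catching up

    def on_state(self, view, committed, operations):
        # Replay every missed operation in transaction-number order
        for n, op in sorted(operations):
            if n > self.applied:
                self.applied = n
        # Only a fully caught-up node becomes a voting member
        self.voting = self.applied >= committed
        return self.voting

node = JoiningNode()
node.on_state(view=6, committed=3,
              operations=[(1, "set x=1"), (2, "set y=2"), (3, "set x=3")])
```

Until `voting` flips to true, the node can receive PREPARE messages but its acknowledgments do not count toward commitment, which preserves the transition-safety property in point 4.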

Recovery and Checkpointing Procedure

VSR nodes can fail and recover, requiring them to rejoin the cluster with the correct state. The recovery procedure uses the RECOVERY and STATE message exchange.

sequenceDiagram
    participant R as Recovering Node
    participant P as Primary
    participant B as Backup

    R->>P: RECOVERY(v=5, n_recovered)
    Note over P: Current view = 5, committed_n = 100
    P->>R: STATE(view=5, committed=100, operations=[...])
    Note over R: Replay missed operations up to n=100
    R->>R: State machine up-to-date
    Note over R: Rejoin as full member

The recovery procedure:

  1. RECOVERY message — The recovering node sends its last known view number and last recovered transaction number to the current primary
  2. STATE response — The primary responds with the current view, the highest committed transaction number, and the log of operations after the recovering node's last recovered transaction
  3. State reconstruction — The recovering node replays the missing operations to its state machine, bringing it back up to date
  4. Return to service — Once caught up, the node resumes normal operation and can participate in quorum decisions
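The four steps above amount to a simple request/response exchange. The following sketch models both sides; the dict-based message format is an assumption for illustration:

```python
def handle_recovery(primary_log, committed, view, n_recovered):
    # Primary side: ship only the operations the recovering node is missing.
    # primary_log maps transaction number -> operation.
    missing = [(n, primary_log[n]) for n in sorted(primary_log) if n > n_recovered]
    return {"view": view, "committed": committed, "operations": missing}

def apply_state(state_machine, applied, msg):
    # Recovering side: replay missed operations in transaction-number order
    for n, op in msg["operations"]:
        state_machine.append(op)
        applied = n
    return applied

log = {1: "set x=1", 2: "set y=2", 3: "set x=3"}
reply = handle_recovery(log, committed=3, view=5, n_recovered=1)
state_machine, applied = [], 1
applied = apply_state(state_machine, applied, reply)  # now caught up at n=3
```

Because the recovering node reports `n_recovered`, the primary avoids resending operations the node already has, keeping the transfer proportional to what was missed.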

Checkpointing in VSR

Checkpointing (snapshotting) in VSR serves the same purpose as in Raft: bounding log growth for long-running clusters. The checkpoint contains:

import hashlib
import pickle

def compute_hash(state):
    # Hash a serialized copy of the state for integrity verification
    return hashlib.sha256(pickle.dumps(state)).hexdigest()

class Checkpoint:
    def __init__(self, view_number, last_transaction, state):
        self.view_number = view_number
        self.last_transaction = last_transaction  # Last committed transaction
        self.state = state                        # State machine state snapshot
        self.checksum = compute_hash(state)       # Integrity verification

Checkpoint creation:

  • The primary periodically creates checkpoints of the application state machine
  • Checkpoints are triggered based on transaction count thresholds (e.g., every 10,000 transactions)
  • Log entries up to the checkpoint’s last_transaction can be discarded
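The trigger-and-truncate logic in these bullets can be sketched as follows. The 10,000-transaction interval matches the example above; the function names and return shape are illustrative:

```python
def maybe_checkpoint(last_checkpoint_n, committed_n, state, log, interval=10_000):
    # Not enough new transactions since the last checkpoint: do nothing
    if committed_n - last_checkpoint_n < interval:
        return None, log
    # Snapshot the state machine at the current commit point
    snapshot = {"last_transaction": committed_n, "state": dict(state)}
    # Entries up to the checkpoint's last_transaction can be discarded
    trimmed = {n: op for n, op in log.items() if n > committed_n}
    return snapshot, trimmed
```

Running this after each commit keeps the in-memory log bounded: between checkpoints it holds at most `interval` entries plus whatever has committed since.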

Checkpoint recovery:

  • A recovering node receives the latest checkpoint via STATE message
  • The checkpoint includes enough information to replay from that point
  • Nodes verify checkpoint integrity via the embedded checksum

This approach means:

  • Memory usage stays bounded even with high transaction rates
  • Recovery time stays bounded by the checkpoint interval, not total transaction history
  • Network transfer during recovery stays proportional to checkpoint size, not full history

Comparison to Other Approaches

For more context on consensus in distributed systems, see my posts on CAP Theorem and Consistency Models. These explore the broader landscape of trade-offs in distributed systems.

The Two-Phase Commit post discusses a different coordination approach that is simpler but less fault-tolerant.

VSR vs Raft vs Paxos

| Aspect | VSR | Raft | Paxos |
| --- | --- | --- | --- |
| Leader concept | Primary within view | Single strong leader | Optional distinguished proposer |
| Reconfiguration | View changes with majority | Joint consensus | Complex, requires extra care |
| Recovery complexity | RECOVERY/STATE messages | Log truncation + snapshot | Depends on implementation |
| Commitment | Majority of acknowledgments | Majority of AppendEntries | Majority of Accept messages |
| Log structure | Transaction numbers | Log indices with terms | Proposal numbers |
| Node state | View number + status | Current term + votedFor | Promised proposal numbers |
| Understandability | Moderate | High (designed for clarity) | Low (proof-heavy) |
| Performance | Comparable to Raft | Optimized for throughput | Lower due to two phases |
| Influence | CORFU, Zab | etcd, CockroachDB, TiKV | Chubby, Spanner |
| Formal proof | Yes | Yes | Yes (classical) |
| Year published | 1988 | 2014 | 1998 (first paper) |

Conclusion

View-Stamped Replication is a consensus algorithm worth knowing about. It predates Raft, which cites it as an influence, emerged at roughly the same time as Paxos, and provides a coherent approach to achieving consensus through views and quorum-based replication.

If you’re interested in the history and diversity of consensus algorithms, VSR is worth studying. It shows that there are many valid approaches to the same fundamental problem.

Related Posts

Paxos Consensus Algorithm

Paxos is a family of consensus algorithms for achieving agreement in distributed systems despite failures, pioneered by Leslie Lamport.

#distributed-systems #consensus #paxos

The FLP Impossibility Result

The FLP impossibility theorem proves that no consensus algorithm can guarantee termination in an asynchronous system with even one faulty process. Understanding FLP is essential for distributed systems designers.

#distributed-systems #consensus #flp

Leader Election in Distributed Systems

Leader election is the process of designating a single node as the coordinator among a set of distributed nodes, critical for consensus protocols.

#distributed-systems #leader-election #consensus