View-Stamped Replication
View-Stamped Replication (VSR) is a distributed consensus protocol in which replicas move through numbered views, ordering operations with viewstamps: pairs of a view number and a sequence number.
Introduction
View-Stamped Replication (VSR) was developed by Brian Oki and Barbara Liskov at MIT in the late 1980s, predating Raft by more than two decades. The algorithm achieves consensus using a view-based approach in which nodes progress through numbered views, with one node serving as the primary in each view.
The name comes from the viewstamp: a view number paired with a per-view sequence number, which together identify and order each operation. Alongside the viewstamp, each node tracks its status: normal, view-change, or recovering.
VSR is less well-known than Paxos or Raft, but its ideas fed into later protocols; the Raft authors cite Viewstamped Replication as an influence. Understanding VSR gives you another perspective on how consensus can work.
Views and Roles
The core abstraction in VSR is the view. A view consists of a view number and an ordered list of replicas. The first replica in the list is the primary, the second is the backup, and so on. Nodes advance through views as failures happen.
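The mapping from view number to roles can be computed deterministically. A minimal sketch, assuming a fixed, sorted replica configuration in which the primary rotates round-robin as the view number increases (names are illustrative):

```python
# A sketch of deterministic role assignment, assuming a fixed, sorted
# replica configuration; the primary rotates round-robin with the view number.
def roles_for_view(view_number, config):
    """Rotate the configuration so the primary is first, backups follow."""
    k = view_number % len(config)
    return config[k:] + config[:k]   # index 0 is the primary for this view

config = ["replica-a", "replica-b", "replica-c"]
print(roles_for_view(0, config))  # ['replica-a', 'replica-b', 'replica-c']
print(roles_for_view(1, config))  # ['replica-b', 'replica-c', 'replica-a']
```

Because every node computes the same ordering from the view number alone, no extra messages are needed to agree on who the new primary is.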
stateDiagram-v2
[*] --> Normal
Normal --> Normal: Primary receives client requests
Normal --> Normal: Backup processes prepare messages
Normal --> ViewChange: Timeout without communication
ViewChange --> Normal: New view established
state ViewChange {
[*] --> StartViewChange
StartViewChange --> DoViewChange: Received enough VIEW-CHANGE messages
DoViewChange --> [*]: Sent DOVIEWCHANGE to new primary
}
When a backup detects that the primary is unresponsive (via timeout), it initiates a view change. It increments its local view number and sends VIEW-CHANGE messages to all replicas. Once a node has received VIEW-CHANGE messages from a majority of replicas, it sends a DOVIEWCHANGE message to the new primary; when the new primary has heard from a majority, it announces the new view with a STARTVIEW message.
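The backup's bookkeeping during this exchange can be sketched as follows, assuming a cluster of 2f + 1 nodes; the class, message tuples, and field names are illustrative, not the paper's exact wire format:

```python
# A sketch of a backup's view-change bookkeeping, assuming a 2f + 1 cluster.
# Message tuples and field names are illustrative.
class ViewChangeState:
    def __init__(self, node_id, cluster_size):
        self.node_id = node_id
        self.cluster_size = cluster_size
        self.view = 0
        self.votes = set()           # replicas that sent VIEW-CHANGE for self.view

    def quorum(self):
        return self.cluster_size // 2 + 1

    def on_timeout(self):
        """The primary went silent: advance the view and broadcast VIEW-CHANGE."""
        self.view += 1
        self.votes = {self.node_id}
        return ("VIEW-CHANGE", self.view)

    def on_view_change(self, sender, view):
        """Count VIEW-CHANGE messages; at a quorum, notify the new primary."""
        if view != self.view:
            return None              # stale or future view: ignored in this sketch
        self.votes.add(sender)
        if len(self.votes) == self.quorum():
            return ("DOVIEWCHANGE", self.view)   # relayed to the new primary
        return None
```

In a three-node cluster, one peer's VIEW-CHANGE plus the node's own vote already forms the quorum of two.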
The Normal Phase
In the normal (non-view-change) phase, the primary processes client requests sequentially. When the primary receives a request, it assigns it a monotonically increasing transaction number and forwards a PREPARE message to all backups.
Each backup receives the PREPARE message, persists the operation to its log, and sends an acknowledgment back to the primary. When the primary receives acknowledgments from a majority, it applies the operation to its state machine and responds to the client.
This is similar to how Raft handles log replication, but with a different mechanism for view changes and recovery.
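The backup's side of this exchange is small: check the view, persist, acknowledge. A sketch assuming in-order delivery within a view; the function and message shapes are illustrative:

```python
# A backup's handling of PREPARE, assuming in-order delivery within a view.
# Function signature and message shapes are illustrative.
def on_prepare(log, current_view, view, n, op):
    """Persist the operation if it belongs to our current view, then ACK."""
    if view != current_view:
        return None                  # stale or future view: needs state transfer
    log.append((n, op))              # stand-in for a durable log write
    return ("ACK", view, n)
```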
sequenceDiagram
participant C as Client
participant P as Primary
participant B1 as Backup 1
participant B2 as Backup 2
C->>P: Request: op(x)
P->>P: Assign transaction number n=42
P->>B1: PREPARE(view=5, n=42, op(x))
P->>B2: PREPARE(view=5, n=42, op(x))
B1->>P: ACK(view=5, n=42)
B2->>P: ACK(view=5, n=42)
P->>P: Apply op(x) to state machine
P-->>C: Response: OK
Note over P: Committed: transaction 42
Key points about the normal phase:
- The primary assigns a monotonically increasing transaction number (not log index like Raft)
- Operations are forwarded to all backups via PREPARE messages
- A majority of acknowledgments (including the primary) commits the transaction
- The primary applies the operation after quorum is reached, not before
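Putting those points together, the primary's side of the normal phase might look like this sketch; the class and message shapes are illustrative assumptions, not the paper's exact protocol:

```python
# A sketch of the primary's side of the normal phase: assign a transaction
# number, collect acknowledgments, commit at quorum. Names are illustrative.
class Primary:
    def __init__(self, view, cluster_size):
        self.view = view
        self.cluster_size = cluster_size
        self.next_n = 0              # monotonically increasing transaction number
        self.acks = {}               # transaction number -> acknowledging nodes
        self.committed = []          # (n, op) pairs applied to the state machine

    def on_request(self, op):
        """Assign the next transaction number and build the PREPARE message."""
        self.next_n += 1
        self.acks[self.next_n] = {"primary"}   # the primary counts toward quorum
        return ("PREPARE", self.view, self.next_n, op)

    def on_ack(self, backup, n, op):
        """Commit and apply once a majority (primary included) has acknowledged."""
        self.acks[n].add(backup)
        majority = self.cluster_size // 2 + 1
        if len(self.acks[n]) >= majority and (n, op) not in self.committed:
            self.committed.append((n, op))
            return True              # safe to reply OK to the client
        return False
```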
Handling Failures and Recovery
VSR handles node failures carefully. When a primary fails, some backup becomes the new primary through the view change protocol. When a failed node recovers, it must rejoin the group and catch up on missed operations.
Recovery involves the recovering node sending a RECOVERY message to the current primary. The primary responds with a STATE message containing the current view number and the operations the recovering node missed. This allows the recovering node to replay those operations and rejoin as a full member.
Relationship to Paxos and Raft
VSR shares conceptual territory with both Paxos and Raft but takes a different path. Like Raft, it has a clear notion of leadership within a view. Like Paxos, it uses quorum-based replication and can tolerate asynchronous communication (within limits).
The view change mechanism is more elaborate than Raft's leader election. VSR separates the question of "who is primary" from "what is committed," whereas Raft couples the two through the leader's log.
View Change Sequence Diagram
When the primary fails, the remaining nodes detect the timeout and perform a view change:
sequenceDiagram
participant N1 as Node 1 (old primary, failed)
participant N2 as Node 2
participant N3 as Node 3
Note over N1,N3: Primary (N1) crashes
N2->>N2: View-change timeout fires
N2->>N1: VIEW-CHANGE(v=2)
N2->>N3: VIEW-CHANGE(v=2)
N3->>N2: VIEW-CHANGE(v=2)
Note over N2: N2 has a quorum (N2 + N3) for view 2
N2->>N1: STARTVIEW(v=2, committed=...)
N2->>N3: STARTVIEW(v=2, committed=...)
Note over N2,N3: N2 becomes the new primary in view 2
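In the full protocol, replicas relay their logs to the new primary in DOVIEWCHANGE messages; the new primary must adopt the most up-to-date log among them before announcing STARTVIEW. A sketch with illustrative field names:

```python
# A sketch of the new primary's side of a view change: wait for a majority of
# DOVIEWCHANGE messages, adopt the most up-to-date log, announce STARTVIEW.
# Field names are illustrative.
def select_log(doviewchange_msgs):
    """Prefer the highest last-normal view, breaking ties by transaction number."""
    best = max(doviewchange_msgs, key=lambda m: (m["last_normal_view"], m["n"]))
    return best["log"]

def start_view(view, doviewchange_msgs, cluster_size):
    """Announce the new view only once a majority has reported in."""
    if len(doviewchange_msgs) < cluster_size // 2 + 1:
        return None                  # not yet a quorum
    return ("STARTVIEW", view, select_log(doviewchange_msgs))
```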
Practical Applications
VSR is rarely used directly, but it influenced several production systems. CORFU builds on VSR's ideas for distributed shared-log storage, and the Zab protocol in Apache ZooKeeper borrows heavily from its view-change approach, putting these patterns at the foundation of widely deployed coordination systems.
Once you know VSR, you start seeing its patterns elsewhere. ZooKeeper's leader election behavior makes more sense, and the view-change style of reasoning carries over to Raft-based systems such as CockroachDB's replication layer. The pattern is not just academic; it runs in production at scale.
When to Use VSR
VSR is a solid choice for building replicated state machines, but it is less commonly implemented than Raft. If you need a well-tested consensus algorithm with a mature ecosystem, Raft-based implementations (like etcd) are more readily available.
However, if you’re building a system that requires the specific properties of view-stamped replication, or if you’re working with existing VSR-based code, understanding the algorithm helps.
Membership Changes and Reconfiguration
VSR handles cluster membership changes through the view change mechanism. When nodes need to be added or removed, the system progresses through a new view that includes the updated membership configuration.
The reconfiguration procedure follows these steps:
sequenceDiagram
participant O as Old Configuration
participant L as Leader
participant N as New Node
participant F as Followers
Note over O,L: Client requests configuration change
L->>L: Prepare reconfiguration view v+1
L->>N: STARTVIEW(v+1, new_config)
L->>F: STARTVIEW(v+1, new_config)
Note over N: New node catches up via STATE messages
N-->>L: ACK(v+1)
F-->>L: ACK(v+1)
Note over L: Quorum reached in view v+1
Note over O,F: New configuration active
Key points about VSR reconfiguration:
- View-based approach — Membership changes are embedded in view changes, giving a natural mechanism for transitioning between configurations
- Catch-up via STATE messages — New nodes receive the current state from the primary before participating in consensus
- No joint consensus required — Unlike Raft’s two-phase joint consensus, VSR transitions in a single view change once the new node catches up
- Transition safety — Quorum requirements prevent two conflicting configurations from operating simultaneously
The primary sends STATE messages containing the view number, committed transaction number, and the recent log entries the new node needs to replay. Once the new node acknowledges receipt and catches up, it becomes a full voting member.
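The catch-up step can be sketched as follows, assuming the simplified STATE message described above; the field and function names are illustrative:

```python
# A sketch of new-node catch-up via a STATE message; field names illustrative.
def make_state_message(view, committed_n, log):
    """Primary packages the committed prefix the joining node must replay."""
    return {"view": view, "committed": committed_n, "operations": log[:committed_n]}

def join(state_msg):
    """The new node replays the committed prefix before voting in the new view."""
    machine = []
    for op in state_msg["operations"]:
        machine.append(op)           # stand-in for applying op to the state machine
    return {"view": state_msg["view"], "applied_up_to": state_msg["committed"],
            "machine": machine}
```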
Recovery and Checkpointing Procedure
VSR nodes can fail and recover, requiring them to rejoin the cluster with the correct state. The recovery procedure uses the RECOVERY and STATE message exchange.
sequenceDiagram
participant R as Recovering Node
participant P as Primary
participant B as Backup
R->>P: RECOVERY(v=5, n_recovered)
Note over P: Current view = 5, committed_n = 100
P->>R: STATE(view=5, committed=100, operations=[...])
Note over R: Replay operations 1-100
R->>R: State machine up-to-date
Note over R: Rejoin as full member
The recovery procedure:
- RECOVERY message — The recovering node sends its last known view number and last recovered transaction number to the current primary
- STATE response — The primary responds with the current view, the highest committed transaction number, and the committed operations the recovering node missed
- State reconstruction — The recovering node replays the missing operations to its state machine, bringing it back up to date
- Return to service — Once caught up, the node resumes normal operation and can participate in quorum decisions
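A sketch of both ends of this exchange, assuming the fields shown in the diagram above; the message shapes are illustrative:

```python
# The RECOVERY/STATE exchange, sketched with illustrative message shapes.
def handle_recovery(primary_view, primary_log, committed_n, recovered_n):
    """Primary's reply: the committed operations the node has not yet applied."""
    missing = primary_log[recovered_n:committed_n]
    return ("STATE", primary_view, committed_n, missing)

def apply_state(local_machine, state_msg):
    """Recovering node replays the missing operations, then rejoins the group."""
    _tag, view, committed_n, missing = state_msg
    local_machine.extend(missing)    # stand-in for replaying each operation
    return view, committed_n
```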
Checkpointing in VSR
Checkpointing (snapshotting) in VSR serves the same purpose as in Raft: bounding log growth for long-running clusters. The checkpoint contains:
import hashlib

def compute_hash(state):
    # Integrity hash over a serialized snapshot of the state
    return hashlib.sha256(repr(state).encode()).hexdigest()

class Checkpoint:
    def __init__(self, view_number, last_transaction, state):
        self.view_number = view_number
        self.last_transaction = last_transaction  # Last committed transaction
        self.state = state                        # State machine state snapshot
        self.checksum = compute_hash(state)       # Integrity verification
Checkpoint creation:
- The primary periodically creates checkpoints of the application state machine
- Checkpoints are triggered based on transaction count thresholds (e.g., every 10,000 transactions)
- Log entries up to the checkpoint’s last_transaction can be discarded
Checkpoint recovery:
- A recovering node receives the latest checkpoint via STATE message
- The checkpoint includes enough information to replay from that point
- Nodes verify checkpoint integrity via the embedded checksum
This approach means:
- Memory usage stays bounded even with high transaction rates
- Recovery time stays bounded by the checkpoint interval, not total transaction history
- Network transfer during recovery stays proportional to checkpoint size, not full history
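Tying the checkpointing rules together: a sketch of threshold-triggered snapshots with log truncation. The 10,000-transaction interval matches the example above, and the class and helper names are illustrative:

```python
# A sketch of threshold-triggered checkpointing with log truncation.
# Interval, class, and helper names are illustrative.
import hashlib

CHECKPOINT_INTERVAL = 10_000

def compute_hash(state):
    """Integrity hash over a serialized snapshot of the state."""
    return hashlib.sha256(repr(state).encode()).hexdigest()

class CheckpointingLog:
    def __init__(self):
        self.log = {}                # transaction number -> operation
        self.last_checkpoint = None

    def append(self, n, op, view, state):
        self.log[n] = op
        if n % CHECKPOINT_INTERVAL == 0:
            self.checkpoint(view, n, state)

    def checkpoint(self, view, n, state):
        """Snapshot the state, then discard log entries at or below n."""
        self.last_checkpoint = {"view": view, "last_transaction": n,
                                "state": state, "checksum": compute_hash(state)}
        self.log = {k: v for k, v in self.log.items() if k > n}
```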
Comparison to Other Approaches
For more context on consensus in distributed systems, see my posts on CAP Theorem and Consistency Models. These explore the broader landscape of trade-offs in distributed systems.
The Two-Phase Commit post discusses a different coordination approach that is simpler but less fault-tolerant.
VSR vs Raft vs Paxos
| Aspect | VSR | Raft | Paxos |
|---|---|---|---|
| Leader Concept | Primary within view | Single strong leader | Optional distinguished proposer |
| Reconfiguration | View changes with majority | Joint consensus | Complex, requires extra care |
| Recovery Complexity | RECOVERY/STATE messages | Log truncation + snapshot | Depends on implementation |
| Commitment | Majority of acknowledgments | Majority of AppendEntries | Majority of Accept messages |
| Log Structure | Transaction numbers | Log indices with terms | Proposal numbers |
| Node State | View number + status | Current term + votedFor | Promised proposal numbers |
| Understandability | Moderate | High (designed for clarity) | Low (proof-heavy) |
| Performance | Comparable to Raft | Optimized for throughput | Basic Paxos lower (two phases per decision) |
| Influence | CORFU, Zab | etcd, CockroachDB, TiKV | Chubby, Spanner |
| Formal Proof | Yes | Yes | Yes (classical) |
| Year Published | 1988 | 2014 | 1998 (first paper) |
Conclusion
View-Stamped Replication is a consensus algorithm worth knowing about. It predates both the publication of Paxos and the design of Raft, influenced later protocols (Raft's in particular), and provides a coherent approach to achieving consensus through views and quorum-based replication.
If you’re interested in the history and diversity of consensus algorithms, VSR is worth studying. It shows that there are many valid approaches to the same fundamental problem.
Related Posts
Paxos Consensus Algorithm
Paxos is a family of consensus algorithms for achieving agreement in distributed systems despite failures, pioneered by Leslie Lamport.
The FLP Impossibility Result
The FLP impossibility theorem proves that no consensus algorithm can guarantee termination in an asynchronous system with even one faulty process. Understanding FLP is essential for distributed systems designers.
Leader Election in Distributed Systems
Leader election is the process of designating a single node as the coordinator among a set of distributed nodes, critical for consensus protocols.