The FLP Impossibility Result

The FLP impossibility theorem proves that no consensus algorithm can guarantee termination in an asynchronous system with even one faulty process.

published: March 24, 2026 reading time: 25 min read author: GeekWorkBench updated: March 24, 2026

Quick Summary

The FLP impossibility theorem proves that in a fully asynchronous system where you can't tell if a process crashed or is just slow, no consensus algorithm can guarantee termination—and this isn't an engineering limitation but a mathematical fact. This post explains the intuition behind the proof: an adversarial scheduler can always delay the right messages to keep the system in a 'bivalent' state where any decision might be wrong. The practical takeaway is that real consensus algorithms like Raft and Paxos guarantee safety (you'll never have two nodes decide different values) but sacrifice liveness during partitions, Spanner uses GPS and atomic clocks to add timing bounds and sidestep the problem, and partial synchrony models are how most production systems work around FLP.

Introduction

In 1985, Fischer, Lynch, and Paterson published a short paper that proved a striking impossibility: no consensus algorithm can guarantee termination in a fully asynchronous system if even a single process can fail. The result is known as FLP after the authors’ initials.

This is not a practical limitation to be engineered around. It is a mathematical proof that consensus and guaranteed termination are fundamentally incompatible in asynchronous systems.

The Setup

The FLP result assumes an asynchronous system with the following properties:

Processes communicate by sending messages
Messages can be delayed arbitrarily but eventually delivered
Processes can fail by stopping (crash-stop, not Byzantine)
No clocks or timeouts are available to detect failures

This is a realistic model of many systems. Networks have variable latency; you cannot reliably distinguish a crashed process from a slow one without real-time clocks.

What Consensus Requires

Consensus algorithms typically require three properties:

Agreement: All non-faulty processes decide on the same value
Validity: The decided value must have been proposed by some process
Termination: All non-faulty processes eventually decide

FLP proves that in an asynchronous system with at least one faulty process, no algorithm can guarantee all three simultaneously.

graph TB
    subgraph "FLP Model"
        P1[Process 1]
        P2[Process 2]
        P3[Process 3]
        M1[Message Queue]
        M2[Message Queue]
        M3[Message Queue]

        P1 --> M2
        P2 --> M1
        P2 --> M3
        P3 --> M1
        P3 --> M2
    end

    Note(("No timeouts<br/>Messages delayed<br/>but eventually<br/>delivered"))

The Core Insight

The proof constructs an adversarial message scheduler. Given any protocol that appears to be working, the scheduler can delay critical messages to keep the system in a state where processes disagree but cannot gather enough information to decide.

The key is that in an asynchronous system, you cannot know whether a process has crashed or is just slow. The scheduler can exploit this uncertainty indefinitely by ensuring that whichever choice a process makes, there is always a plausible scenario where the other choice was correct.

The Bivalent Undecidable State

The proof hinges on the concept of bivalent states. A system state is bivalent if the final decision depends on future events that have not yet occurred. The proof shows that from any initial bivalent state, the adversary can always keep the system bivalent by carefully ordering message deliveries.

This means the system can be driven to a point where processes have conflicting information, yet cannot make progress because any decision might be wrong.

Concrete 2-Process Example

Consider two processes, P1 and P2, trying to agree on a binary value (0 or 1):

graph LR
    subgraph Initial
        A1[Initial State<br/>P1: initial<br/>P2: initial]
    end

    subgraph Bivalent
        B1[State A<br/>P1: decided 0<br/>P2: undecided]
        B2[State B<br/>P1: undecided<br/>P2: decided 1]
        B3[Truly Bivalent<br/>P1: if recv 0 -> 0<br/>P2: if recv 1 -> 1]
    end

    A1 -->|P1 receives 0| B1
    A1 -->|P1 receives 1| B2
    B1 -.->|scheduler delays<br/>critical msg| B3
    B2 -.->|scheduler delays<br/>critical msg| B3

The scheduler keeps the system in state B3 (bivalent) by:

Delaying the message from P1 to P2 that would confirm value 0
Delaying the message from P2 to P1 that would confirm value 1
Neither process has enough information to commit to a decision

This adversarial scheduling can continue indefinitely because from either “almost-decided” state, there exists a plausible scenario where the other value would have been correct.

Adversarial Scheduler in Action

The FLP proof constructs an explicit adversarial scheduler that keeps the system undecided. Here is the gist with two nodes:

sequenceDiagram
    participant A as Node A
    participant S as Adversary<br/>Scheduler
    participant B as Node B

    rect rgb(50, 50, 80)
        Note over A,B: Round 1: A proposes
        A->>S: Send "propose 0"
        Note over S: Intercept message
        S-xB: Delay "propose 0"
        Note over A: A waits for ack<br/>from majority (only B)<br/>Incomplete info
    end

    rect rgb(80, 50, 50)
        Note over A,B: Round 2: B proposes
        B->>S: Send "propose 1"
        Note over S: Intercept message
        S-xA: Delay "propose 1"
        Note over B: B waits for ack<br/>from majority (only A)<br/>Incomplete info
    end

    rect rgb(50, 80, 50)
        Note over A,B: Round 3: Scheduler acts again
        S-xA: Deliver B's "propose 1"<br/>to A only
        S-xB: Deliver A's "propose 0"<br/>to B only
        Note over A: A now sees<br/>conflicting info
        Note over B: B now sees<br/>conflicting info
    end

    Note over A,B: Neither node has majority<br/>System stays bivalent<br/>Scheduler repeats indefinitely

The scheduler alternates which messages it delays. Each node sees partial information but never enough to commit. The system can remain in this state forever, proving that no algorithm can guarantee termination.

Formal Problem Reduction

FLP proves impossibility through reduction. If you could solve consensus in an asynchronous system with even one possible failure, you could solve the simpler problem of Byzantine agreement, which is provably impossible.

The reduction works like this. Given a system that solves consensus despite failures, you can construct a scheduler (an adversary) that forces the system into a bivalent state, where the final outcome depends on timing, by delaying just the right messages at just the right moments. Since no algorithm can avoid this, consensus is impossible in general asynchronous systems.

This is not a construction flaw. It is a fundamental result. Any algorithm that makes progress in all failure scenarios can be forced into indecision by an adversarial scheduler that carefully delays messages to keep the system in a bivalent state.

The proof’s central point: you cannot distinguish a slow node from a crashed one in an asynchronous network. This ambiguity is what makes consensus undecidable without additional assumptions, like timing bounds or synchrony.

What This Means Practically

FLP does not mean consensus is impossible. It means consensus algorithms must make a trade-off:

They can guarantee safety (agreement and validity) but not liveness (termination), or
They can guarantee liveness under certain conditions (like synchronous networks), or
They can use randomness to guarantee termination with high probability

Common Pitfalls / Anti-Patterns

Real systems use various strategies to work around FLP:

Synchrony assumptions: If you assume bounds on message delivery, you can use timeouts to detect failures and guarantee termination. The CAP theorem captures this trade-off: during partitions, you must choose between consistency (giving up availability) or availability (giving up consistency during partition recovery).

Probabilistic termination: Some algorithms, like Ben-Or’s randomized consensus, guarantee termination with probability 1. They may run for an unbounded time in the worst case, but the probability of that happening is zero.

Lease-based approaches: As discussed in my Leader Election post, lease-based approaches assume bounded clock skew and network delays. They provide eventual detection of failures but cannot guarantee instant detection.

Real-world Failure Scenarios

Understanding FLP is one thing; seeing it manifest in production is another. These are real scenarios where FLP’s impossibility rears its head:

Scenario 1: Network Partition with Split-Brain

Two data centers lose connectivity for 30 seconds. Both sides believe the other has failed. Without timing bounds to distinguish slow from dead, each side runs its own leader election independently.

The adversarial network here is not a malicious actor but the network itself acting like the FLP adversary. When connectivity is severed, message delivery stalls indefinitely from each side’s perspective. The partition acts as a message delay engine, allowing both processes to believe the other has stopped. Both sides attempt to elect a leader simultaneously, creating two nodes that each believe they hold the authoritative lease.

This is the FLP impossibility playing out in practice. Paxos and Raft guarantee safety by refusing to commit conflicting data, but they cannot guarantee liveness during the partition. Writes are not lost, but they block. The partition is the message scheduler keeping the system in a bivalent state. Once the link heals and messages flow again, the system collapses to a single leader and progress resumes. The key insight: FLP does not prevent consensus from eventually happening, it only prevents guaranteed termination during asynchronous conditions.

The recovery process itself reveals another FLP subtlety. When the partition heals, both leaders may have accepted different sets of writes during the split. Reconciling these divergent states requires additional rounds of consensus, and those rounds again depend on message delivery. If the network becomes lossy again during recovery, the system could remain in a conflicted state longer. Real systems handle this through log compaction, snapshot reconciliation, or simple last-writer-wins strategies, but all of these are engineering workarounds built on top of the safety guarantees that FLP proves are the best you can do.

Scenario 2: Leader Crashes During Message Delay

A Raft leader sends a RequestVote message that gets delayed by 40 seconds due to network jitter. In a fully asynchronous model, there is no way to know whether that message will arrive in 40 milliseconds or 40 seconds. From the perspective of the followers, the leader has either crashed or is extraordinarily slow. Both interpretations are locally consistent with the available information.

Raft’s term checking is the specific mechanism that handles this. Every message carries a term number. If a follower receives a message from a leader with a stale term, it rejects it. The delayed RequestVote arrives with the old leader’s term. Followers have since incremented their own terms during the election that installed the new leader. The stale message is discarded. This is not a lucky coincidence. It is a deliberate choice that ensures safety even when messages are arbitrarily delayed.

Notice what FLP tells us about this situation. The delay of that single message kept the system in a bivalent state longer than it would have been otherwise. If the message had arrived before the election timeout, the old leader would have had a chance to suppress the new election. Since it was delayed, the new leader had a window to be elected and start committing entries. The old leader’s lease is now technically expired but the old leader does not know that yet. During this window, the old leader could have written data that conflicts with what the new leader committed.

Raft closes this window by requiring leaders to heartbeat their lease continuously. If the heartbeat is late, followers start an election. The old leader will then receive a RequestVote from a follower with a higher term and step down before causing harm. The 40-second delay in our example is extreme but not impossible in cloud environments where network jitter can be severe. The FLP insight is that no algorithm can guarantee this window is always zero. You can shrink it with tighter timing bounds, but without synchrony, you cannot eliminate it.

Scenario 3: Clock Skew Exploiting Asynchrony

In a system without TrueTime-style bounded clocks, an adversarial network can cause a node to believe its lease is still valid when the new leader has already taken over. This is not a bug in the consensus algorithm. It is the FLP impossibility showing up through clock skew.

The core problem is that leases are issued with timestamps. A leader obtains a lease that is valid for a bounded time. The leader assumes it can act authoritatively during that window. But in an asynchronous system, clocks drift. The leader’s lease was granted based on its view of time, which may differ from the other nodes’ view. If the new leader was elected based on a different clock reading, the old leader may still believe its lease is active even though the new leader’s lease is already committed.

This creates a split-brain scenario where two nodes believe they are simultaneously the leader. Reads from the old leader return stale data. Writes to the old leader may be lost when the old leader finally steps down. The system appears to violate safety, but it is just experiencing the consequences of clock skew interacting with the FLP impossibility. You cannot know whether the old leader is slow or dead without bounded clock uncertainty.

The adversarial scheduler in FLP is not a person but any source of uncertainty that delays critical messages. Clock skew acts like message delay. When the network delays a message that would inform the old leader to step down, the scheduler is acting through the network. Bounded clock uncertainty, as in Spanner’s TrueTime, converts this from an unbounded delay to a bounded one. The old leader’s lease uncertainty is bounded to at most 7 seconds. After that, the new leader is recognized as the sole leader. This is the partial synchrony model in practice.

Scenario 4: GC Pauses and Message Reordering

A JVM garbage collection pause of 10 or more seconds can cause heartbeat timeouts to trigger leader elections and messages in-flight to appear as if never sent. From the perspective of the other nodes, this is indistinguishable from the leader crashing. GC pauses are one of the most common causes of unexpected leader elections in production Java systems running consensus protocols.

The FLP impossibility shows up here in a particularly uncomfortable way. The GC pause is a form of process stuttering. The JVM thread that handles consensus messages is frozen. During this freeze, heartbeat messages are not sent. Followers detect the missed heartbeat and start an election. A new leader is elected. Meanwhile, the old leader’s JVM resumes, and messages that were queued during the pause are sent. These messages carry old timestamps and old terms. They arrive at followers who now have a newer leader.

This is the message reordering problem. The messages were not reordered by the network but by the pause itself. The network delivered them in the correct order, but the application state had changed during the pause. The old leader’s messages describe a state that no longer exists. The new leader’s messages describe a state that includes writes the old leader never saw.

Raft and Paxos handle this through the same mechanism: term checking and commitment constraints. The old leader’s messages are rejected because they carry stale terms. The system maintains safety. But liveness is violated during the pause. No writes can be committed while the heartbeat timeout is firing and the election runs. This is FLP’s guaranteed liveness violation in action. The system recovers once the GC completes and heartbeats resume, but that recovery time is not bounded by the protocol. It is bounded by the GC implementation, which is a JVM concern, not a consensus concern.

Scenario 5: Byzantine Failure in Practice

While FLP originally addresses crash-stop failures, real systems face Byzantine failures where nodes behave arbitrarily maliciously. A node may claim to have received messages it never received. A corrupted node may send contradictory messages to different peers. This is a stronger failure model than FLP assumed, and the impossibility results are stronger under Byzantine assumptions.

The original FLP proof shows that with just one crash-stop failure, consensus with guaranteed termination is impossible in async systems. If a single crash-stop is sufficient to break guaranteed termination, then Byzantine failures, which include crash-stop plus arbitrary malicious behavior, are also sufficient to break it. The impossibility is strictly harder to overcome under Byzantine conditions.

PBFT and similar protocols handle this by requiring more than two-thirds honest nodes. They use message authentication codes to prevent tampering, and they use view changes to remove suspected leaders. But notice what they cannot do: they cannot guarantee termination if the adversary controls message scheduling. The Byzantine FLP variant shows that no protocol can guarantee termination with even one Byzantine failure in an async system.

In practice, PBFT systems accept that during certain adversarial conditions, they will stall. The difference from crash-stop FLP is that PBFT must also detect and exclude Byzantine nodes, which requires additional rounds of communication and more replicas. The3f + 1 formula (you need 3f + 1 total nodes to tolerate f Byzantine nodes) reflects this overhead. Each additional round of detection is another opportunity for the adversarial scheduler to delay messages and keep the system bivalent. The result is the same as crash-stop consensus: safety is guaranteed, liveness is not, and bounded timing assumptions are what make PBFT work in practice, just like Raft and Paxos.

Partial Synchrony and the Dwork-Lynch Model

FLP assumes a fully asynchronous system with no timing assumptions. But what if we relax this slightly?

The Dwork-Lynch model (1988) introduced partial synchrony: the system is usually asynchronous, but eventually messages are delivered within some bounded time. This bounded time is not known a priori but exists.

graph LR
    subgraph "System Models"
        A[Asynchronous<br/>No timing<br/>assumptions] --> B[Partial Synchrony<br/>Eventually<br/>bounded delay]
        B --> C[Synchronous<br/>Known bounds<br/>always]
    end

In partial synchrony:

Initially: System behaves asynchronously (FLP applies)
Eventually: After unknown bound GST (Global Stabilization Time), timing guarantees hold
Result: Algorithms can guarantee liveness after GST while maintaining safety always

This is how practical systems work around FLP. They assume “the network will eventually be well-behaved” rather than “the network is always well-behaved.”

FLP in Practice: How Spanner and Paxos Handle This

Real systems using Paxos or Raft don’t violate FLP mathematically, but they work in practice because the assumptions underlying FLP don’t perfectly match reality:

Google Spanner uses TrueTime (bounded clock uncertainty) to provide external consistency. Spanner’s TrueTime API guarantees that uncertainty is bounded to at most 7 seconds. This means:

Spanner can use timeout-based leader election safely
After a leader failure, Spanner waits at least the maximum clock uncertainty before promoting a new leader
This effectively converts the system to partial synchrony during critical periods

Paxos-based systems (Chubby, Zookeeper) use leader leases:

The leader acquires a lease before processing requests
If the leader fails to renew, followers wait for the lease to expire before starting an election
This bounds the “asynchronous” period where the adversary could schedule messages adversarially

Raft (used in etcd, CockroachDB) relies on:

Heartbeat timeouts to detect leader failure
Election timeout randomization to break ties
Assumption that networks eventually deliver messages

These systems guarantee safety (no two nodes can be leaders simultaneously, no divergent state) but accept that during extended network partitions, liveness (ability to make progress) may be temporarily suspended.

In practice: “the network is usually reliable, and when it’s not, we sacrifice liveness for safety until it recovers.”

Relationship to CAP

FLP and CAP are related but distinct. CAP focuses on the trade-off between consistency and availability during network partitions. FLP focuses on the impossibility of guaranteed termination in asynchronous systems with failures.

Both results stem from the same underlying reality: in asynchronous systems, you cannot distinguish failures from delays. CAP accepts this and makes availability the default. FLP formalizes the impossibility and forces algorithm designers to be explicit about their assumptions.

My post on the CAP Theorem explores these trade-offs in more detail.

Why This Matters

FLP is an important result in distributed systems theory. It sets limits on what can be achieved and forces practitioners to be explicit about their assumptions.

Understanding FLP changes how you think about system design. Instead of trying to achieve impossible guarantees, you design systems that degrade gracefully under adversarial conditions.

The Broader Impact

Since 1985, researchers have built on FLP in various directions. The result has been extended to Byzantine failures (where nodes can behave arbitrarily maliciously), partial synchrony models, and different communication patterns.

The field of distributed consensus has grown substantially since FLP. Paxos, Raft, and many other algorithms have been developed with practical trade-offs in mind. FLP does not make these algorithms useless; it clarifies what they can and cannot guarantee.

Trade-off Comparison Table

The FLP result forces every consensus algorithm to make explicit trade-offs along safety/liveness dimensions. Here is how common approaches compare:

Approach	System Model	Safety	Liveness	Failure Handling	Practical Example
FLP Original	Fully Async	Yes	No	Crash-stop	Theoretical only
Paxos	Partial Sync	Yes	Yes*	Crash-stop	Chubby, Zookeeper
Raft	Partial Sync	Yes	Yes*	Crash-stop	etcd, CockroachDB
Ben-Or (Randomized)	Fully Async	Yes	Probabilistic	Crash-stop	Theoretical interest
PBFT	Partial Sync	Yes	Yes*	Byzantine	Zyzzyva, BFT-SMaRt
Dwork-Lynch	Partial Sync	Yes	Yes*	Crash-stop	Most production systems

*Liveness guaranteed only after the system stabilizes (GST in partial synchrony models).

Quick Recap Checklist

Before diving into implementation or deeper theory, verify your understanding of these core FLP concepts:

The FLP result applies specifically to fully asynchronous systems with crash-stop failures
Bivalent states are states where the final decision depends on future message ordering
An adversarial scheduler can keep a bivalent system undecided indefinitely
FLP proves you cannot simultaneously guarantee safety, liveness, and fault tolerance in async systems
Partial synchrony (Dwork-Lynch model) relaxes timing assumptions to allow liveness guarantees
Real systems like Paxos and Raft guarantee safety but may sacrifice liveness during partitions
Failure detectors and randomized algorithms are practical workarounds for the FLP impossibility
The FLP result does not mean consensus is impossible, only that guaranteed termination is impossible without additional assumptions

Interview Questions

1. What does the FLP impossibility result actually prove?

Expected answer points:

No consensus algorithm can guarantee termination in a fully asynchronous system with even one crash-stop failure
The proof shows safety and guaranteed liveness are incompatible under async timing assumptions
It is a mathematical impossibility, not a practical engineering limitation

2. What is a bivalent state in the context of the FLP proof?

Expected answer points:

A system state where the final decision depends on future message ordering that has not yet occurred
Both possible final values (0 or 1) remain plausible given current information
The adversary can keep the system bivalent by carefully ordering message deliveries

3. Why can an adversarial scheduler force a consensus algorithm to never terminate?

Expected answer points:

In async systems, you cannot distinguish a crashed process from a slow one without real-time clocks
The scheduler delays critical messages strategically, keeping each process in an "almost decided" but not committed state
From any bivalent state, the scheduler can always find a plausible scenario where the other decision value would have been correct

4. What are the three properties that consensus algorithms typically require?

Expected answer points:

Agreement: All non-faulty processes decide on the same value
Validity: The decided value must have been proposed by some process
Termination: All non-faulty processes eventually decide

5. How does the Dwork-Lynch partial synchrony model work around FLP?

Expected answer points:

The system is initially asynchronous with no timing guarantees (FLP applies)
Eventually, after a global stabilization time (GST), message delivery is bounded
Algorithms guarantee safety always, but liveness only after GST when timing assumptions hold

6. How does Google Spanner use TrueTime to work around FLP?

Expected answer points:

TrueTime provides bounded clock uncertainty (at most 7 seconds for Spanner)
This effectively converts the system to partial synchrony during critical periods
Spanner waits at least the maximum clock uncertainty before promoting a new leader after failure

7. Why does FLP not make consensus algorithms like Paxos or Raft useless?

Expected answer points:

FLP proves termination cannot be guaranteed, not that consensus is impossible
Paxos and Raft guarantee safety (no two nodes decide different values) under all conditions
They guarantee liveness under practical assumptions (eventual message delivery, partial synchrony)

8. What is the relationship between FLP and the CAP theorem?

Expected answer points:

Both stem from the same underlying reality: you cannot distinguish failures from delays in async networks
CAP focuses on consistency vs availability trade-offs during partitions
FLP formalizes the impossibility of guaranteed termination, forcing algorithm designers to be explicit about assumptions

9. How does Ben-Or's randomized consensus algorithm address FLP?

Expected answer points:

Ben-Or uses coin-flipping to achieve consensus with probability 1 in fully async systems
Termination is probabilistic rather than deterministic
The probability of non-termination in the worst case is zero, bypassing FLP's deterministic impossibility

10. What is the role of failure detectors in working around FLP?

Expected answer points:

Failure detectors provide suspicion information about process failures (possibly unreliable)
They introduce timing assumptions indirectly, converting async to partial synchrony
Algorithms using failure detectors can guarantee liveness because they can eventually suspect crashed processes

11. Why is the asynchronous communication model the key to FLP's impossibility?

Expected answer points:

Asynchronous message passing means no bounds on message delivery time
Without timing bounds, no process can know whether another has crashed or is simply slow
This uncertainty is what the adversarial scheduler exploits to keep the system bivalent indefinitely

12. What is the formal problem reduction used in the FLP proof?

Expected answer points:

FLP reduces Byzantine agreement to consensus, showing if consensus is solvable, Byzantine agreement would also be solvable
Since Byzantine agreement is provably impossible in async systems, consensus must also be impossible
The reduction constructs an adversarial scheduler that forces the system into a bivalent state

13. How do lease-based approaches like in Paxos handle FLP?

Expected answer points:

The leader acquires a lease before processing requests
If the leader fails to renew, followers wait for the lease to expire before starting election
This bounds the asynchronous window where adversarial scheduling could occur

14. What is the Global Stabilization Time (GST) in partial synchrony models?

Expected answer points:

GST is the unknown but finite time after which message delivery is bounded
Before GST, the system behaves asynchronously and FLP applies
After GST, timing assumptions hold and algorithms can guarantee liveness

15. Does FLP apply to Byzantine failures as well as crash-stop failures?

Expected answer points:

The original FLP result applies to crash-stop (only stops responding) failures
The result has been extended to Byzantine failures where nodes behave arbitrarily maliciously
Byzantine FLP variants show impossibility results are even stronger with more severe failure models

16. What is the practical implication of FLP for system designers?

Expected answer points:

You must be explicit about which guarantees your system provides under which conditions
Design for graceful degradation under adversarial network conditions
Accept that during extended network partitions, some form of liveness sacrifice is unavoidable

17. Why does Raft's election timeout randomization help with liveness?

Expected answer points:

Randomized timeouts reduce the chance of split-brain elections where multiple nodes become candidates simultaneously
It increases the probability that one node wins decisively before adversarial scheduling can interfere
Combined with heartbeat assumptions, it provides probabilistic liveness guarantees

18. Can randomized algorithms completely bypass FLP's impossibility?

Expected answer points:

Randomized algorithms like Ben-Or achieve consensus with probability 1, not deterministic certainty
They circumvent FLP by not requiring guaranteed termination in the worst case
The probability of non-termination is zero, but the running time could be unbounded in adversarial scenarios

19. What happens to consensus algorithms during network partitions in practice?

Expected answer points:

Safety is always maintained: no two nodes decide different values
Liveness may be sacrificed: the system cannot make progress until the partition heals
This is the CAP theorem in action: choosing consistency over availability during partitions

20. In the 2-process bivalent example, how does the scheduler keep the system undecided?

Expected answer points:

The scheduler delays messages from P1 to P2 confirming value 0 and from P2 to P1 confirming value 1
Neither process has enough information to commit to a decision
The system reaches a truly bivalent state where whichever choice a process makes, the other value remains plausible

Introduction

The Setup

What Consensus Requires

The Core Insight

The Bivalent Undecidable State

Concrete 2-Process Example

Adversarial Scheduler in Action

Formal Problem Reduction

What This Means Practically

Common Pitfalls / Anti-Patterns

Real-world Failure Scenarios

Scenario 1: Network Partition with Split-Brain

Scenario 2: Leader Crashes During Message Delay

Scenario 3: Clock Skew Exploiting Asynchrony

Scenario 4: GC Pauses and Message Reordering

Scenario 5: Byzantine Failure in Practice

Partial Synchrony and the Dwork-Lynch Model

FLP in Practice: How Spanner and Paxos Handle This

Relationship to CAP

Why This Matters

The Broader Impact

Trade-off Comparison Table

Quick Recap Checklist

Interview Questions

Further Reading

Category

Tags

Related Posts

Leader Election in Distributed Systems

Paxos Consensus Algorithm

View-Stamped Replication