The FLP Impossibility Result

The FLP impossibility theorem proves that no consensus algorithm can guarantee termination in an asynchronous system with even one faulty process. Understanding FLP is essential for distributed systems designers.

Updated: March 24, 2026 · 9 min read

Introduction

In 1985, Fischer, Lynch, and Paterson published a short paper that proved a striking impossibility: no consensus algorithm can guarantee termination in a fully asynchronous system if even a single process can fail. The result is known as FLP after the authors’ initials.

This is not an engineering limitation to be optimized away. It is a mathematical proof that no deterministic algorithm can provide agreement, validity, and guaranteed termination in an asynchronous system with failures.

The Setup

The FLP result assumes an asynchronous system with the following properties:

  • Processes communicate by sending messages
  • Messages can be delayed arbitrarily but eventually delivered
  • Processes can fail by stopping (crash-stop, not Byzantine)
  • No clocks or timeouts are available to detect failures

This is a realistic model of many systems. Networks have variable latency; you cannot reliably distinguish a crashed process from a slow one without real-time clocks.

What Consensus Requires

Consensus algorithms typically require three properties:

  1. Agreement: All non-faulty processes decide on the same value
  2. Validity: The decided value must have been proposed by some process
  3. Termination: All non-faulty processes eventually decide

FLP proves that in a fully asynchronous system where even one process may crash, no deterministic algorithm can guarantee all three properties simultaneously.
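To make the first two properties concrete, here is a small checker (function names are mine, purely illustrative) that validates a finished run against agreement and validity. Termination, notably, cannot be checked this way from a snapshot of decisions, and that is exactly the property FLP targets.

```python
# Hypothetical helper names; illustrative only, not part of any real protocol.

def check_agreement(decisions):
    """Agreement: all non-faulty processes decided the same value.
    `decisions` maps process id -> decided value (faulty processes omitted)."""
    return len(set(decisions.values())) <= 1

def check_validity(decisions, proposals):
    """Validity: every decided value was proposed by some process."""
    return all(v in proposals.values() for v in decisions.values())

proposals = {"p1": 0, "p2": 1, "p3": 0}
decisions = {"p1": 0, "p2": 0, "p3": 0}   # every process decided 0

print(check_agreement(decisions))            # True
print(check_validity(decisions, proposals))  # True
```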

graph TB
    subgraph "FLP Model"
        P1[Process 1]
        P2[Process 2]
        P3[Process 3]
        M1[Message Queue]
        M2[Message Queue]
        M3[Message Queue]

        P1 --> M2
        P1 --> M3
        P2 --> M1
        P2 --> M3
        P3 --> M1
        P3 --> M2
    end

    Note(("No timeouts<br/>Messages delayed<br/>but eventually<br/>delivered"))

The Core Insight

The proof constructs an adversarial message scheduler. Given any protocol that appears to be working, the scheduler can delay critical messages to keep the system in a state where processes disagree but cannot gather enough information to decide.

The key is that in an asynchronous system, you cannot know whether a process has crashed or is just slow. The scheduler can exploit this uncertainty indefinitely by ensuring that whichever choice a process makes, there is always a plausible scenario where the other choice was correct.

Bivalent States

The proof hinges on the concept of bivalent states. A system state is bivalent if both decision values are still reachable: depending on which messages are delivered next, the processes could end up deciding 0 or deciding 1. The proof shows that every protocol has some bivalent initial state, and that from any bivalent state the adversary can always reach another bivalent state by carefully ordering message deliveries.

This means the system can be driven to a point where processes have conflicting information, yet cannot make progress because any decision might be wrong.
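The notion of bivalence can be made concrete with a toy reachability check. The configuration graph below is entirely made up for illustration; in the real proof, configurations arise from a protocol's message deliveries rather than a hardcoded table.

```python
# Toy configuration graph: each configuration maps to the configurations
# reachable by delivering one pending message. "decide:0"/"decide:1" are
# decided (terminal) states.
TRANSITIONS = {
    "init":     ["a", "b"],
    "a":        ["decide:0", "b"],   # one delivery order commits to 0...
    "b":        ["a", "decide:1"],   # ...another keeps 1 reachable
    "decide:0": [],
    "decide:1": [],
}

def reachable_decisions(config, seen=None):
    """Collect every decision value reachable from `config`
    under some message-delivery schedule."""
    seen = seen if seen is not None else set()
    if config in seen:
        return set()
    seen.add(config)
    if config.startswith("decide:"):
        return {int(config.split(":")[1])}
    out = set()
    for nxt in TRANSITIONS[config]:
        out |= reachable_decisions(nxt, seen)
    return out

def bivalent(config):
    """A configuration is bivalent when both outcomes remain reachable."""
    return reachable_decisions(config) == {0, 1}

print(bivalent("init"))      # True: both 0 and 1 still possible
print(bivalent("decide:0"))  # False: univalent
```

FLP's second lemma says, in effect, that the adversary can always steer the system from one bivalent configuration to another, so the `bivalent` predicate never has to become false.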

Concrete 2-Process Example

Consider two processes, P1 and P2, trying to agree on a binary value (0 or 1):

graph LR
    subgraph Initial
        A1[Initial State<br/>P1: initial<br/>P2: initial]
    end

    subgraph Bivalent
        B1[State A<br/>P1: decided 0<br/>P2: undecided]
        B2[State B<br/>P1: undecided<br/>P2: decided 1]
        B3[Truly Bivalent<br/>P1: if recv 0 -> 0<br/>P2: if recv 1 -> 1]
    end

    A1 -->|P1 receives 0| B1
    A1 -->|P1 receives 1| B2
    B1 -.->|scheduler delays<br/>critical msg| B3
    B2 -.->|scheduler delays<br/>critical msg| B3

The scheduler keeps the system in state B3 (bivalent) by:

  1. Delaying the message from P1 to P2 that would confirm value 0
  2. Delaying the message from P2 to P1 that would confirm value 1

As a result, neither process has enough information to commit to a decision.

This adversarial scheduling can continue indefinitely because from either “almost-decided” state, there exists a plausible scenario where the other value would have been correct.
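The delay game above can be sketched as a simulation (my own toy model, not the formal FLP construction; in the real proof the adversary must eventually deliver every message, and the argument shows it can still postpone decision indefinitely — this sketch just compresses the first N rounds of delay):

```python
import collections

# Each process decides only after receiving a confirmation from its peer;
# the adversary always delays exactly the message that would let a
# process decide.

def run(rounds):
    pending = collections.deque()          # messages in flight
    decided = {"P1": None, "P2": None}
    for _ in range(rounds):
        pending.append(("P1", "P2", "confirm 0"))  # P1 tries to confirm 0
        pending.append(("P2", "P1", "confirm 1"))  # P2 tries to confirm 1
        # Adversary: requeue (delay) every confirmation instead of
        # delivering it, so neither process learns enough to decide.
        for _ in range(len(pending)):
            pending.append(pending.popleft())
    return decided

print(run(1000))   # {'P1': None, 'P2': None}: still undecided
```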

Adversarial Scheduler in Action

The FLP proof constructs an explicit adversarial scheduler that keeps the system undecided. Here is the gist with two nodes:

sequenceDiagram
    participant A as Node A
    participant S as Adversary<br/>Scheduler
    participant B as Node B

    rect rgb(50, 50, 80)
        Note over A,B: Round 1: A proposes
        A->>S: Send "propose 0"
        Note over S: Intercept message
        S-xB: Delay "propose 0"
        Note over A: A waits for ack<br/>from majority (only B)<br/>Incomplete info
    end

    rect rgb(80, 50, 50)
        Note over A,B: Round 2: B proposes
        B->>S: Send "propose 1"
        Note over S: Intercept message
        S-xA: Delay "propose 1"
        Note over B: B waits for ack<br/>from majority (only A)<br/>Incomplete info
    end

    rect rgb(50, 80, 50)
        Note over A,B: Round 3: Scheduler acts again
        S->>A: Deliver B's "propose 1"<br/>to A only
        S->>B: Deliver A's "propose 0"<br/>to B only
        Note over A: A now sees<br/>conflicting info
        Note over B: B now sees<br/>conflicting info
    end

    Note over A,B: Neither node has majority<br/>System stays bivalent<br/>Scheduler repeats indefinitely

The scheduler alternates which messages it delays. Each node sees partial information but never enough to commit. The system can remain in this state forever, proving that no algorithm can guarantee termination.

The Structure of the Proof

FLP proves impossibility directly, not by reduction to another problem. The argument has two steps. First, every consensus protocol has at least one bivalent initial configuration: if all initial configurations were univalent, there would be two configurations, differing only in the input of a single process, that must decide differently, and the crash of that one process would leave the remaining processes unable to tell the configurations apart. Second, from any bivalent configuration, the adversary can always delay and reorder message deliveries so that the system reaches another bivalent configuration. Chaining these two steps yields an infinite execution in which no process ever decides.

This is not a construction flaw in any particular algorithm. It is a fundamental result. Any algorithm that makes progress in all failure scenarios can be forced into indecision by an adversarial scheduler that carefully delays messages to keep the system bivalent.

The proof’s central point: you cannot distinguish a slow process from a crashed one in an asynchronous network. This ambiguity is what makes guaranteed termination impossible without additional assumptions, like timing bounds or synchrony.

What This Means Practically

FLP does not mean consensus is impossible. It means consensus algorithms must make a trade-off:

  • They can guarantee safety (agreement and validity) but not liveness (termination), or
  • They can guarantee liveness under certain conditions (like synchronous networks), or
  • They can use randomness to guarantee termination with high probability

Coping Strategies

Real systems use various strategies to work around FLP:

Synchrony assumptions: If you assume bounds on message delivery, you can use timeouts to detect failures and guarantee termination. The cost appears when the assumptions break: during a network partition the timing bounds are violated, and the CAP theorem’s trade-off kicks in, forcing a choice between consistency and availability until the partition heals.

Probabilistic termination: Some algorithms, like Ben-Or’s randomized consensus, achieve termination with probability 1. An execution may take an unbounded number of rounds in the worst case, but the probability of running forever is zero.
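A toy illustration of probability-1 termination (this is not Ben-Or’s actual protocol, just the flavor of its coin-flip escape hatch): two processes repeatedly flip coins, and a round "succeeds" when both land on the same value. Any single round may fail, but failing forever has probability zero, and the expected number of rounds is small.

```python
import random

def rounds_until_agreement(rng):
    """Flip two coins per round; stop when they match.
    Geometric with p = 1/2, so the expected round count is 2."""
    rounds = 0
    while True:
        rounds += 1
        if rng.choice([0, 1]) == rng.choice([0, 1]):
            return rounds

rng = random.Random(42)
samples = [rounds_until_agreement(rng) for _ in range(10000)]
print(sum(samples) / len(samples))   # close to 2, the expected value
```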

Lease-based approaches: As discussed in my Leader Election post, lease-based approaches assume bounded clock skew and network delays. They provide eventual detection of failures but cannot guarantee instant detection.

Partial Synchrony and the Dwork-Lynch-Stockmeyer Model

FLP assumes a fully asynchronous system with no timing assumptions. But what if we relax this slightly?

The partial synchrony model, introduced by Dwork, Lynch, and Stockmeyer in 1988, sits between the two extremes: the system may behave asynchronously for a while, but eventually messages are delivered within some bounded time. The bound exists but is not known a priori.

graph LR
    subgraph "System Models"
        A[Asynchronous<br/>No timing<br/>assumptions] --> B[Partial Synchrony<br/>Eventually<br/>bounded delay]
        B --> C[Synchronous<br/>Known bounds<br/>always]
    end

In partial synchrony:

  • Initially: System behaves asynchronously (FLP applies)
  • Eventually: After unknown bound GST (Global Stabilization Time), timing guarantees hold
  • Result: Algorithms can guarantee liveness after GST while maintaining safety always

This is how practical systems work around FLP. They assume “the network will eventually be well-behaved” rather than “the network is always well-behaved.”
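One standard trick under partial synchrony can be sketched in a few lines (a toy model of my own, not taken from the DLS paper): the true post-GST delay bound exists but is unknown, so the algorithm doubles its timeout after each round in which the timeout fired spuriously. Once the timeout exceeds the real bound, timeout-based failure detection becomes reliable.

```python
UNKNOWN_DELTA = 50  # actual post-GST delay bound, hidden from the algorithm

def rounds_until_reliable(initial_timeout=1):
    """Double the timeout until it exceeds the (unknown) delay bound.
    Returns (rounds spent, final timeout)."""
    timeout, rounds = initial_timeout, 0
    while timeout <= UNKNOWN_DELTA:   # timeout too short: it fires spuriously
        timeout *= 2
        rounds += 1
    return rounds, timeout

rounds, timeout = rounds_until_reliable()
print(rounds, timeout)   # 6 64: after six doublings, 64 > 50 and detection is safe
```

The point is that no round needs to know GST or the bound in advance; safety holds throughout, and liveness arrives once the growing timeout crosses the real delay.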

FLP in Practice: How Spanner and Paxos Handle This

Real systems using Paxos or Raft don’t violate FLP mathematically, but they work in practice because the assumptions underlying FLP don’t perfectly match reality:

Google Spanner uses TrueTime (bounded clock uncertainty) to provide external consistency. Spanner’s TrueTime API exposes an explicit uncertainty interval and keeps it small, on the order of milliseconds (Google has reported typical bounds under 7 ms, not seconds). This means:

  • Spanner can use timeout-based leader election safely
  • After a leader failure, Spanner waits at least the maximum clock uncertainty before promoting a new leader
  • This effectively converts the system to partial synchrony during critical periods
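The idea behind Spanner’s commit wait can be sketched with a fake clock (class and numbers below are my own illustration; the real TrueTime API returns an interval [earliest, latest] around the true time): a commit timestamp only becomes visible after the coordinator has waited out the clock uncertainty, so the timestamp is certainly in the past everywhere.

```python
class FakeTrueTime:
    """Illustrative stand-in for a TrueTime-like clock with bounded error."""
    def __init__(self, epsilon_ms):
        self.now_ms = 0
        self.epsilon = epsilon_ms
    def interval(self):
        # True time lies somewhere in [earliest, latest].
        return self.now_ms - self.epsilon, self.now_ms + self.epsilon
    def advance(self, ms):
        self.now_ms += ms

def commit_wait(tt, commit_ts):
    """Wait (here: advance the fake clock) until commit_ts is certainly
    in the past, i.e. interval().earliest > commit_ts."""
    waited = 0
    while tt.interval()[0] <= commit_ts:
        tt.advance(1)
        waited += 1
    return waited

tt = FakeTrueTime(epsilon_ms=7)
ts = tt.now_ms                # assign the commit timestamp "now"
print(commit_wait(tt, ts))    # 8: must wait just past the 7 ms uncertainty
```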

Paxos-based systems (Chubby, Zookeeper) use leader leases:

  • The leader acquires a lease before processing requests
  • If the leader fails to renew, followers wait for the lease to expire before starting an election
  • This bounds the “asynchronous” period where the adversary could schedule messages adversarially
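A minimal leader-lease sketch with a logical clock (class and names are hypothetical; real systems like Chubby use wall-clock leases with safety margins for clock skew). The invariant is the one described above: a follower cannot take over until the current lease has expired, while the holder may renew early.

```python
class Lease:
    def __init__(self, duration):
        self.duration = duration
        self.holder = None
        self.expires_at = 0

    def try_acquire(self, node, now):
        """Grant the lease on renewal by the holder, or on takeover
        after expiry; refuse everyone else."""
        if node == self.holder or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + self.duration
            return True
        return False

lease = Lease(duration=10)
print(lease.try_acquire("leader", now=0))     # True: lease granted until t=10
print(lease.try_acquire("follower", now=5))   # False: must wait for expiry
print(lease.try_acquire("leader", now=8))     # True: renewed until t=18
print(lease.try_acquire("follower", now=17))  # False: still held
print(lease.try_acquire("follower", now=18))  # True: expired, takeover allowed
```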

Raft (used in etcd, CockroachDB) relies on:

  • Heartbeat timeouts to detect leader failure
  • Election timeout randomization to break ties
  • Assumption that networks eventually deliver messages
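Raft’s tie-breaking via randomized election timeouts can be sketched as follows (the 150-300 ms range matches the Raft paper’s suggestion; the helper name is mine). Each follower draws a timeout from the range; with high probability one node times out well before the others and wins the election uncontested.

```python
import random

def pick_timeouts(nodes, rng, low=150, high=300):
    """Each node draws an election timeout in milliseconds."""
    return {node: rng.uniform(low, high) for node in nodes}

rng = random.Random(7)
timeouts = pick_timeouts(["n1", "n2", "n3"], rng)
first = min(timeouts, key=timeouts.get)
head_start = sorted(timeouts.values())[1] - timeouts[first]
# `first` campaigns alone for `head_start` ms before any rival wakes up.
print(first, round(head_start, 1))
```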

These systems guarantee safety (no two nodes can be leaders simultaneously, no divergent state) but accept that during extended network partitions, liveness (ability to make progress) may be temporarily suspended.

In practice: “the network is usually reliable, and when it’s not, we sacrifice liveness for safety until it recovers.”

Relationship to CAP

FLP and CAP are related but distinct. CAP focuses on the trade-off between consistency and availability during network partitions. FLP focuses on the impossibility of guaranteed termination in asynchronous systems with failures.

Both results stem from the same underlying reality: in asynchronous systems, you cannot distinguish failures from delays. CAP accepts this and makes availability the default. FLP formalizes the impossibility and forces algorithm designers to be explicit about their assumptions.

My post on the CAP Theorem explores these trade-offs in more detail.

Why This Matters

FLP is an important result in distributed systems theory. It sets limits on what can be achieved and forces practitioners to be explicit about their assumptions.

Understanding FLP changes how you think about system design. Instead of trying to achieve impossible guarantees, you design systems that degrade gracefully under adversarial conditions.

The Broader Impact

Since 1985, researchers have built on FLP in various directions. The result has been extended to Byzantine failures (where nodes can behave arbitrarily maliciously), partial synchrony models, and different communication patterns.

The field of distributed consensus has grown substantially since FLP. Paxos, Raft, and many other algorithms have been developed with practical trade-offs in mind. FLP does not make these algorithms useless; it clarifies what they can and cannot guarantee.

Conclusion

The FLP impossibility result tells us something fundamental about the nature of distributed systems. We cannot have both safety and guaranteed liveness in asynchronous systems with failures. Every consensus algorithm makes explicit trade-offs based on this reality.

Understanding FLP does not make distributed systems programming easier, but it does make your reasoning about these systems more sound. When something seems too good to be true, FLP reminds us why.

For more on consistency trade-offs, see my post on Consistency Models.
