Kruskal's Algorithm: Building Minimum Spanning Trees

Learn Kruskal's algorithm for finding minimum spanning trees in weighted graphs using union-find data structures.

published: April 7, 2026 reading time: 18 min read author: GeekWorkBench

Kruskal’s Algorithm: Building Minimum Spanning Trees

A Minimum Spanning Tree (MST) connects all vertices in a weighted graph using the minimum possible total edge weight, without creating any cycles. Kruskal’s algorithm builds MSTs with a greedy approach: sort all edges by weight and add them one by one, skipping any that would create a cycle.

The elegance of Kruskal’s lies in its simplicity and the clever use of the union-find data structure to efficiently detect cycles. It handles disconnected graphs by producing a Minimum Spanning Forest—one tree per connected component.

Introduction

Kruskal’s algorithm finds the Minimum Spanning Tree (MST) of a weighted undirected graph—the subset of edges that connects all vertices with the minimum possible total weight, using exactly V-1 edges and no cycles. The MST problem appears everywhere: network design (laying cable, building roads), clustering analysis, and approximation algorithms for NP-hard problems like the Traveling Salesman Problem. A correctly computed MST can save millions in infrastructure costs.

What makes Kruskal’s elegant is its greedy simplicity: sort all edges by weight, then repeatedly add the cheapest edge that doesn’t create a cycle. The algorithm seems almost too simple to guarantee optimality, yet it works—and the proof via the cut property is both beautiful and instructive. The key to efficient cycle detection is the union-find (disjoint set union) data structure, which tracks connected components in near-constant amortized time.

In this guide, you’ll implement the union-find with path compression and union by rank, understand why the greedy edge selection guarantees an MST, handle disconnected graphs to produce a Minimum Spanning Forest, and analyze when Kruskal’s outperforms Prim’s (and vice versa). You’ll also see common pitfalls like improper edge sorting and how to avoid them.

The Union-Find Data Structure

Before Kruskal’s algorithm, we need union-find for efficient cycle detection:

class UnionFind:
    """Union-Find (Disjoint Set Union) with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        """Find root with path compression."""
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # Compress path
        return self.parent[x]

    def union(self, x, y):
        """Union by rank. Returns True if merged, False if already connected."""
        root_x, root_y = self.find(x), self.find(y)

        if root_x == root_y:
            return False  # Already in same set = would create cycle

        # Union by rank: attach smaller tree under larger
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x

        self.parent[root_y] = root_x
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1

        return True

Kruskal’s Algorithm

def kruskal_mst(num_vertices, edges):
    """
    Kruskal's algorithm for Minimum Spanning Tree.

    Args:
        num_vertices: Number of vertices
        edges: List of (weight, u, v) tuples

    Returns:
        List of (weight, u, v) edges in the MST, or empty if graph is disconnected
    """
    # Sort edges by weight
    sorted_edges = sorted(edges, key=lambda x: x[0])

    uf = UnionFind(num_vertices)
    mst_edges = []
    total_weight = 0

    for weight, u, v in sorted_edges:
        if uf.union(u, v):  # If union succeeded (no cycle formed)
            mst_edges.append((weight, u, v))
            total_weight += weight

            # MST has exactly V-1 edges
            if len(mst_edges) == num_vertices - 1:
                break

    return mst_edges, total_weight

When to Use Kruskal’s Algorithm

Use Kruskal’s when:

You need a Minimum Spanning Tree
Graph is sparse (works well with adjacency list)
You want a simple, easy-to-implement solution
Graph may be disconnected (produces forest)

Don’t use Kruskal’s when:

You need single-source shortest paths (use Dijkstra)
Graph is very dense (Prim’s algorithm may be faster with adjacency matrix)
You need to preserve existing spanning tree structure

Trade-off Analysis

Aspect	Kruskal’s	Prim’s
Data Structure	Union-Find	Priority Queue
Time (Sparse)	O(E log E)	O((V + E) log V)
Time (Dense)	O(E log E) ≈ O(V² log V)	O(V²)
Implementation	Simple	Moderate
Produces	Forest for disconnected graphs	Single tree
Edge Selection	Global sorting	Local minimum

Production Failure Scenarios

Disconnected graph: Kruskal’s handles this naturally, returning a forest with total weight summing all tree components
Integer overflow in weight sums: Use larger integer types or check for overflow when accumulating total weight
Large number of edges: Sorting E edges dominates runtime—consider Prim’s for very dense graphs

Complete Example

def solve():
    # Example: Network connection problem
    # n = number of servers, connections = [(city1, city2, cost)]

    n = 4
    connections = [
        (1, 2, 1),   # (u, v, cost)
        (2, 3, 2),
        (3, 4, 3),
        (4, 1, 4),
        (1, 3, 10)
    ]

    # Convert to (cost, u, v) format for Kruskal's
    edges = [(cost, u-1, v-1) for u, v, cost in connections]

    mst, total = kruskal_mst(n, edges)

    if len(mst) == n - 1:
        print(f"MST weight: {total}")
        print("Edges:", [(u+1, v+1, w) for w, u, v in mst])
    else:
        print("Graph is disconnected, no MST exists")

Quick Recap Checklist

Sort all edges by weight
Use union-find to detect cycles
Add edge if it connects two different components
Stop after V-1 edges (full MST)
For disconnected graphs, produces a forest
Total weight is minimized

Interview Questions

1. Why does Kruskal's algorithm produce a minimum spanning tree?

Kruskal's relies on the cut property: for any cut that separates vertices into two sets, the minimum-weight edge crossing that cut belongs to some MST. By processing edges in increasing order, Kruskal always adds the globally minimum edge that doesn't create a cycle. Since adding an edge never increases the total weight of what we've built, and we only add when two components are being connected, we end up with exactly V-1 edges of minimum possible total weight.

2. What's the difference between Kruskal's and Prim's algorithms?

Both build MSTs greedily, but they grow differently. Kruskal's looks at all edges globally, sorts them, and adds the cheapest edge that doesn't create a cycle. Prim's grows a single tree from a starting vertex, always adding the cheapest edge from the tree to an outside vertex. Kruskal's is easier to implement with union-find; Prim's is often faster for dense graphs with adjacency matrix representation.

3. How does union-find detect cycles in Kruskal's?

Union-find maintains connected components. Before adding an edge (u, v), we check if u and v are already in the same component using find(). If they share a root, they're already connected through some path—adding this edge would create a cycle. If they're in different components, union() merges them and we safely add the edge. Path compression and union by rank make this nearly O(1) amortized per operation.

4. Can Kruskal's handle a graph with negative edge weights?

Yes. Kruskal's algorithm works perfectly with negative edge weights because it simply sorts edges by weight and adds them if they don't form a cycle. Negative weights don't violate the cut property that guarantees correctness. The only concern is integer overflow when summing many negative weights—use appropriate data types or arbitrary precision integers.

5. What is the time complexity of Kruskal's algorithm and what factors influence it?

The dominant factor is sorting E edges: O(E log E). After sorting, each of the E edges requires a find/union operation, which is nearly O(1) amortized with path compression and union by rank — effectively O(E α(V)), where α is the inverse Ackermann function (practically constant). The overall complexity is O(E log E), which simplifies to O(E log V) since E ≤ V², making log E ≤ 2 log V. For sparse graphs (E ≈ V), this is excellent; for dense graphs (E ≈ V²), Prim's with an adjacency matrix O(V²) may be faster.

6. How would you modify Kruskal's algorithm to find a maximum spanning tree?

Two equivalent approaches:

Negate weights: Multiply all edge weights by -1 and run standard Kruskal's (minimum on negated weights = maximum on original).
Reverse sort: Sort edges in descending order instead of ascending, then apply the same cycle-detection logic.

Both produce a Maximum Spanning Tree (MaxST) with the same O(E log E) complexity. The correctness follows from the same cut property, but applied to the complement: the heaviest edge crossing any cut belongs to some maximum spanning tree.

7. Does Kruskal's algorithm work correctly on graphs with zero-weight edges?

Yes. Zero-weight edges are handled exactly like any other edge. They are sorted by weight and added if they don't create a cycle. The only subtlety arises when multiple MSTs exist — zero-weight edges can create ties. In such cases, Kruskal's may produce any valid MST depending on the tie-breaking order. The total weight will still be minimal, though there may be multiple optimal spanning trees with the same total weight.

8. What real-world applications use Kruskal's algorithm and MSTs?

Key applications include:

Network design: Laying fiber-optic cables, electrical grids, or water pipelines at minimum cost while connecting all nodes
Cluster analysis: Single-linkage clustering uses an MST — cutting the heaviest edges produces clusters
Image segmentation: Treating pixels as graph nodes and edge weights as dissimilarity, MST-based segmentation isolates regions
Approximation algorithms: The 2-approximation for the Traveling Salesman Problem uses an MST as a backbone
Circuit design: Minimizing wire length connecting components on a printed circuit board

9. What is the cut property and why is it fundamental to Kruskal's correctness?

The cut property states: for any partition of vertices into two non-empty sets, the minimum-weight edge crossing that cut belongs to some minimum spanning tree. This is the theoretical foundation that makes Kruskal's greedy approach work.

During Kruskal's execution, each added edge defines a cut between the two components it connects. Since Kruskal processes edges globally in ascending order, when it encounters the minimum edge crossing any current cut, that edge is guaranteed to be in some MST. Adding it cannot create a sub-optimal partial tree because any other edge crossing the same cut would be equally or more expensive.

10. How does path compression improve union-find performance?

Path compression flattens the tree structure during find() operations by making every node along the search path point directly to the root. On a subsequent find() from any of those nodes, the lookup is O(1) instead of traversing a tall tree.

Combined with union by rank (attaching shorter trees under taller ones), the amortized cost per operation becomes the inverse Ackermann function α(V)—practically a constant for any realistic V (e.g., α(10^80) ≈ 4). Without path compression, operations degrade to O(log V) amortized; with it, we achieve near-constant time, making Kruskal's overall O(E log E) dominated by sorting rather than cycle detection.

11. What happens when Kruskal's encounters equal-weight edges?

Equal-weight edges can be processed in any order without affecting the total weight of the final MST. However, the specific tree structure may differ depending on which equal-weight edge is chosen first.

In practice, the sort stability (or lack thereof) determines which MST is returned. If an implementation requires a deterministic result, explicit tie-breaking by edge identifier or endpoint indices should be applied after sorting by weight. The cut property still guarantees optimality since any minimum-weight edge crossing a cut is valid—ties simply mean multiple MSTs exist with the same total weight.

12. Why is union by rank used instead of simple union or union by size?

Union by rank attaches the shorter tree under the root of the taller tree. The "rank" approximates tree height without storing extra data. This keeps trees roughly balanced, preventing the O(log V) worst case that naive union (always attaching one root to another) creates.

Union by size (actual node count) achieves similar balance but requires tracking sizes. Union by rank is preferred because rank is a single integer that rarely overflows (unlike size, which grows with component size), and rank comparisons are sufficient to prove the O(α(V)) amortized bound. The rank upper-bounds the true height of each tree, so attaching by rank never increases the maximum depth.

13. How would you parallelize Kruskal's algorithm for large-scale graphs?

Parallelizing Kruskal's has two main bottlenecks:

Sorting: Use parallel sort (e.g., GPU sort, multi-core merge sort) for the O(E log E) edge sorting phase
Union-find: The cycle checks are inherently sequential due to component merges—only independent edges (connecting different components) can be processed in parallel

A common approach: sort edges in parallel, then partition independent edges into batches that can be processed concurrently using a parallel union-find variant or a lock-free approach. Borůvka's algorithm (which performs parallel "contraction" steps) is often easier to parallelize and is frequently combined with Kruskal's in modern MST implementations.

14. What is the inverse Ackermann function and why does it appear in union-find analysis?

The inverse Ackermann function α(V) is the extremely slow-growing inverse of the Ackermann function A(n,n). For any practical input size, α(V) ≤ 4—A(4,4) is already astronomically huge, and its inverse grows so slowly that it is effectively a constant.

This function appears in the amortized analysis of union-find with path compression and union by rank: each find/union operation costs O(α(V)) amortized. This is why union-find operations are described as "practically O(1)"—no matter how many vertices, you never wait more than a handful of steps. In Kruskal's O(E log E + E·α(V)), the α(V) term is swamped by the sorting cost and is negligible in practice.

15. Can Kruskal's algorithm be adapted to find non-minimum spanning trees?

Yes, with simple modifications:

Maximum Spanning Tree: Sort edges in descending order (or negate weights), keeping the greedy strategy but always picking the heaviest non-cycling edge. Correctness follows from the same cut property applied to the maximum.
K-th Minimum Spanning Tree: Use a modified approach with edge deletions and reconnections (Eppstein's algorithm finds the K-th MST in O(E + V log V + K log V))
Second-best MST: Find the MST normally, then for each non-MST edge, compute the minimum weight increase when substituting it for an edge in the MST cycle it creates

The core greedy structure remains, but changing the selection criterion (heaviest instead of lightest, or constrained substitutions) extends Kruskal's beyond pure minimization.

16. How does Kruskal's compare when using a binary heap versus full sorting?

Both approaches yield O(E log E) overall complexity, but with different constants and practical trade-offs:

Full sorting (standard): Sorts all E edges upfront with efficient comparison sorts. One-time O(E log E) cost, no per-operation overhead.
Binary heap: Insert all edges into heap O(E), then repeatedly extract-min O(log E) × O(E). Total O(E log E), but with larger constant factors from heap operations. Useful when edges are generated incrementally.
Fibonacci heap: Extract-min becomes O(log V) (not log E), giving O(E + V log V) for Prim's—but Kruskal's still benefits less because the sorting step dominates.

For Kruskal's, full sorting is typically faster in practice due to cache locality and lower constant factors. The heap approach shines in Prim's algorithm where edge weights change dynamically.

17. What edge cases must be handled when implementing Kruskal's algorithm?

Critical edge cases include:

Disconnected graph: Algorithm produces a forest, not a single tree. Check that len(mst_edges) == V - 1 to confirm a true MST exists.
Self-loops: Edges from a vertex to itself must be ignored—they always create cycles and would corrupt the result.
Parallel edges: Multiple edges between the same two vertices: all are valid candidates; the lightest that connects two components is added. Subsequent parallel edges between already-connected vertices are rejected.
Integer overflow: Summing many weights (especially with negative edges or large dense graphs) can overflow fixed-width integers—use 64-bit or arbitrary precision.
Empty graph or single vertex: V = 0 or V = 1 yields empty MST with total weight 0.

18. How does Kruskal's handle parallel edges between the same pair of vertices?

Parallel edges (multiple edges with the same endpoints but possibly different weights) are handled naturally by the algorithm. All parallel edges are sorted together by weight. The first one that connects two different components is added; the others are skipped because they would create cycles.

This means if you have three edges A-B with weights 5, 10, and 15, only the weight-5 edge can potentially enter the MST (assuming neither endpoint is already connected via a lighter path). The heavier parallel edges are correctly excluded. If the weight-5 edge is later found to create a cycle through another path, then the weight-10 edge might be tried next, and so on.

19. What is the difference between MST and Minimum Spanning Forest?

A Minimum Spanning Tree (MST) is defined for a connected graph—it spans all vertices with minimum total weight. A Minimum Spanning Forest (MSF) is the generalization for disconnected graphs: one MST per connected component.

Kruskal's naturally produces a Minimum Spanning Forest because it processes all edges globally. When two components cannot be connected (no edge crosses the remaining cuts between them), they remain separate trees in the forest. This is a feature, not a bug—Kruskal's detects disconnectedness automatically and returns all available trees without requiring a separate connectivity check beforehand.

20. How would you find the second-best MST using Kruskal's as a base?

To find the second-best MST (the spanning tree with the second-lowest total weight):

Run Kruskal's to find the MST with total weight W₁
For each non-MST edge e with weight w_e, identify the unique path in the MST between its endpoints. Find the maximum-weight edge f on that path such that w_e > w_f
Replacing f with e gives a spanning tree with weight W₁ - w_f + w_e
The minimum across all candidates is the second-best MST

Using LCA (Lowest Common Ancestor) with binary lifting, step 2 can be answered in O(log V) per edge, making total O(E log V). This works because any spanning tree differing from the MST must replace exactly one edge—the "minimum maximum-weight edge" criterion ensures we make the smallest possible increase.

Conclusion

Kruskal’s algorithm builds a minimum spanning tree by sorting all edges and adding them one by one, skipping any that would create a cycle. Union-find makes cycle detection nearly O(1) amortized. It’s particularly clean for sparse graphs and naturally produces a forest when the input is disconnected. If your graph is dense, Prim’s algorithm with an adjacency matrix may be faster. To compare them directly, see Prim’s Algorithm: Growing a Minimum Spanning Tree.

Kruskal's Algorithm: Building Minimum Spanning Trees

Kruskal’s Algorithm: Building Minimum Spanning Trees

Introduction

The Union-Find Data Structure

Kruskal’s Algorithm

When to Use Kruskal’s Algorithm

Trade-off Analysis

Production Failure Scenarios

Complete Example

Quick Recap Checklist

Interview Questions

Further Reading

Conclusion

Kruskal's Algorithm: Building Minimum Spanning Trees

Kruskal’s Algorithm: Building Minimum Spanning Trees

Introduction

The Union-Find Data Structure

Kruskal’s Algorithm

When to Use Kruskal’s Algorithm

Trade-off Analysis

Production Failure Scenarios

Complete Example

Quick Recap Checklist

Interview Questions

Further Reading

Conclusion

Category

Tags

Related Posts

Prim's Algorithm: Growing a Minimum Spanning Tree

Bellman-Ford Algorithm: Shortest Paths with Negative Weights

Dijkstra's Algorithm: Finding Shortest Paths in Weighted Graphs