Kruskal's Algorithm: Building Minimum Spanning Trees
Learn Kruskal's algorithm for finding minimum spanning trees in weighted graphs using union-find data structures.
Kruskal’s Algorithm: Building Minimum Spanning Trees
A Minimum Spanning Tree (MST) connects all vertices in a weighted graph using the minimum possible total edge weight, without creating any cycles. Kruskal’s algorithm builds MSTs with a greedy approach: sort all edges by weight and add them one by one, skipping any that would create a cycle.
The elegance of Kruskal’s lies in its simplicity and the clever use of the union-find data structure to efficiently detect cycles. It handles disconnected graphs by producing a Minimum Spanning Forest—one tree per connected component.
Introduction
Kruskal’s algorithm finds the Minimum Spanning Tree (MST) of a weighted undirected graph—the subset of edges that connects all vertices with the minimum possible total weight, using exactly V-1 edges and no cycles. The MST problem appears everywhere: network design (laying cable, building roads), clustering analysis, and approximation algorithms for NP-hard problems like the Traveling Salesman Problem. A correctly computed MST can save millions in infrastructure costs.
What makes Kruskal’s elegant is its greedy simplicity: sort all edges by weight, then repeatedly add the cheapest edge that doesn’t create a cycle. The algorithm seems almost too simple to guarantee optimality, yet it works—and the proof via the cut property is both beautiful and instructive. The key to efficient cycle detection is the union-find (disjoint set union) data structure, which tracks connected components in near-constant amortized time.
In this guide, you’ll implement the union-find with path compression and union by rank, understand why the greedy edge selection guarantees an MST, handle disconnected graphs to produce a Minimum Spanning Forest, and analyze when Kruskal’s outperforms Prim’s (and vice versa). You’ll also see common pitfalls like improper edge sorting and how to avoid them.
The Union-Find Data Structure
Before Kruskal’s algorithm, we need union-find for efficient cycle detection:
class UnionFind:
"""Union-Find (Disjoint Set Union) with path compression and union by rank."""
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
"""Find root with path compression."""
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # Compress path
return self.parent[x]
def union(self, x, y):
"""Union by rank. Returns True if merged, False if already connected."""
root_x, root_y = self.find(x), self.find(y)
if root_x == root_y:
return False # Already in same set = would create cycle
# Union by rank: attach smaller tree under larger
if self.rank[root_x] < self.rank[root_y]:
root_x, root_y = root_y, root_x
self.parent[root_y] = root_x
if self.rank[root_x] == self.rank[root_y]:
self.rank[root_x] += 1
return True
Kruskal’s Algorithm
def kruskal_mst(num_vertices, edges):
"""
Kruskal's algorithm for Minimum Spanning Tree.
Args:
num_vertices: Number of vertices
edges: List of (weight, u, v) tuples
Returns:
List of (weight, u, v) edges in the MST, or empty if graph is disconnected
"""
# Sort edges by weight
sorted_edges = sorted(edges, key=lambda x: x[0])
uf = UnionFind(num_vertices)
mst_edges = []
total_weight = 0
for weight, u, v in sorted_edges:
if uf.union(u, v): # If union succeeded (no cycle formed)
mst_edges.append((weight, u, v))
total_weight += weight
# MST has exactly V-1 edges
if len(mst_edges) == num_vertices - 1:
break
return mst_edges, total_weight
When to Use Kruskal’s Algorithm
Use Kruskal’s when:
- You need a Minimum Spanning Tree
- Graph is sparse (works well with adjacency list)
- You want a simple, easy-to-implement solution
- Graph may be disconnected (produces forest)
Don’t use Kruskal’s when:
- You need single-source shortest paths (use Dijkstra)
- Graph is very dense (Prim’s algorithm may be faster with adjacency matrix)
- You need to preserve existing spanning tree structure
Trade-off Analysis
| Aspect | Kruskal’s | Prim’s |
|---|---|---|
| Data Structure | Union-Find | Priority Queue |
| Time (Sparse) | O(E log E) | O((V + E) log V) |
| Time (Dense) | O(E log E) ≈ O(V² log V) | O(V²) |
| Implementation | Simple | Moderate |
| Produces | Forest for disconnected graphs | Single tree |
| Edge Selection | Global sorting | Local minimum |
Production Failure Scenarios
- Disconnected graph: Kruskal’s handles this naturally, returning a forest with total weight summing all tree components
- Integer overflow in weight sums: Use larger integer types or check for overflow when accumulating total weight
- Large number of edges: Sorting E edges dominates runtime—consider Prim’s for very dense graphs
Complete Example
def solve():
# Example: Network connection problem
# n = number of servers, connections = [(city1, city2, cost)]
n = 4
connections = [
(1, 2, 1), # (u, v, cost)
(2, 3, 2),
(3, 4, 3),
(4, 1, 4),
(1, 3, 10)
]
# Convert to (cost, u, v) format for Kruskal's
edges = [(cost, u-1, v-1) for u, v, cost in connections]
mst, total = kruskal_mst(n, edges)
if len(mst) == n - 1:
print(f"MST weight: {total}")
print("Edges:", [(u+1, v+1, w) for w, u, v in mst])
else:
print("Graph is disconnected, no MST exists")
Quick Recap Checklist
- Sort all edges by weight
- Use union-find to detect cycles
- Add edge if it connects two different components
- Stop after V-1 edges (full MST)
- For disconnected graphs, produces a forest
- Total weight is minimized
Interview Questions
Kruskal's relies on the cut property: for any cut that separates vertices into two sets, the minimum-weight edge crossing that cut belongs to some MST. By processing edges in increasing order, Kruskal always adds the globally minimum edge that doesn't create a cycle. Since adding an edge never increases the total weight of what we've built, and we only add when two components are being connected, we end up with exactly V-1 edges of minimum possible total weight.
Both build MSTs greedily, but they grow differently. Kruskal's looks at all edges globally, sorts them, and adds the cheapest edge that doesn't create a cycle. Prim's grows a single tree from a starting vertex, always adding the cheapest edge from the tree to an outside vertex. Kruskal's is easier to implement with union-find; Prim's is often faster for dense graphs with adjacency matrix representation.
Union-find maintains connected components. Before adding an edge (u, v), we check
if u and v are already in the same component using find(). If they
share a root, they're already connected through some path—adding this edge would
create a cycle. If they're in different components, union() merges
them and we safely add the edge. Path compression and union by rank make this
nearly O(1) amortized per operation.
Yes. Kruskal's algorithm works perfectly with negative edge weights because it simply sorts edges by weight and adds them if they don't form a cycle. Negative weights don't violate the cut property that guarantees correctness. The only concern is integer overflow when summing many negative weights—use appropriate data types or arbitrary precision integers.
The dominant factor is sorting E edges: O(E log E). After sorting, each of the E edges requires a find/union operation, which is nearly O(1) amortized with path compression and union by rank — effectively O(E α(V)), where α is the inverse Ackermann function (practically constant). The overall complexity is O(E log E), which simplifies to O(E log V) since E ≤ V², making log E ≤ 2 log V. For sparse graphs (E ≈ V), this is excellent; for dense graphs (E ≈ V²), Prim's with an adjacency matrix O(V²) may be faster.
Two equivalent approaches:
- Negate weights: Multiply all edge weights by -1 and run standard Kruskal's (minimum on negated weights = maximum on original).
- Reverse sort: Sort edges in descending order instead of ascending, then apply the same cycle-detection logic.
Both produce a Maximum Spanning Tree (MaxST) with the same O(E log E) complexity. The correctness follows from the same cut property, but applied to the complement: the heaviest edge crossing any cut belongs to some maximum spanning tree.
Yes. Zero-weight edges are handled exactly like any other edge. They are sorted by weight and added if they don't create a cycle. The only subtlety arises when multiple MSTs exist — zero-weight edges can create ties. In such cases, Kruskal's may produce any valid MST depending on the tie-breaking order. The total weight will still be minimal, though there may be multiple optimal spanning trees with the same total weight.
Key applications include:
- Network design: Laying fiber-optic cables, electrical grids, or water pipelines at minimum cost while connecting all nodes
- Cluster analysis: Single-linkage clustering uses an MST — cutting the heaviest edges produces clusters
- Image segmentation: Treating pixels as graph nodes and edge weights as dissimilarity, MST-based segmentation isolates regions
- Approximation algorithms: The 2-approximation for the Traveling Salesman Problem uses an MST as a backbone
- Circuit design: Minimizing wire length connecting components on a printed circuit board
The cut property states: for any partition of vertices into two non-empty sets, the minimum-weight edge crossing that cut belongs to some minimum spanning tree. This is the theoretical foundation that makes Kruskal's greedy approach work.
During Kruskal's execution, each added edge defines a cut between the two components it connects. Since Kruskal processes edges globally in ascending order, when it encounters the minimum edge crossing any current cut, that edge is guaranteed to be in some MST. Adding it cannot create a sub-optimal partial tree because any other edge crossing the same cut would be equally or more expensive.
Path compression flattens the tree structure during find() operations by making every node along the search path point directly to the root. On a subsequent find() from any of those nodes, the lookup is O(1) instead of traversing a tall tree.
Combined with union by rank (attaching shorter trees under taller ones), the amortized cost per operation becomes the inverse Ackermann function α(V)—practically a constant for any realistic V (e.g., α(10^80) ≈ 4). Without path compression, operations degrade to O(log V) amortized; with it, we achieve near-constant time, making Kruskal's overall O(E log E) dominated by sorting rather than cycle detection.
Equal-weight edges can be processed in any order without affecting the total weight of the final MST. However, the specific tree structure may differ depending on which equal-weight edge is chosen first.
In practice, the sort stability (or lack thereof) determines which MST is returned. If an implementation requires a deterministic result, explicit tie-breaking by edge identifier or endpoint indices should be applied after sorting by weight. The cut property still guarantees optimality since any minimum-weight edge crossing a cut is valid—ties simply mean multiple MSTs exist with the same total weight.
Union by rank attaches the shorter tree under the root of the taller tree. The "rank" approximates tree height without storing extra data. This keeps trees roughly balanced, preventing the O(log V) worst case that naive union (always attaching one root to another) creates.
Union by size (actual node count) achieves similar balance but requires tracking sizes. Union by rank is preferred because rank is a single integer that rarely overflows (unlike size, which grows with component size), and rank comparisons are sufficient to prove the O(α(V)) amortized bound. The rank upper-bounds the true height of each tree, so attaching by rank never increases the maximum depth.
Parallelizing Kruskal's has two main bottlenecks:
- Sorting: Use parallel sort (e.g., GPU sort, multi-core merge sort) for the O(E log E) edge sorting phase
- Union-find: The cycle checks are inherently sequential due to component merges—only independent edges (connecting different components) can be processed in parallel
A common approach: sort edges in parallel, then partition independent edges into batches that can be processed concurrently using a parallel union-find variant or a lock-free approach. Borůvka's algorithm (which performs parallel "contraction" steps) is often easier to parallelize and is frequently combined with Kruskal's in modern MST implementations.
The inverse Ackermann function α(V) is the extremely slow-growing inverse of the Ackermann function A(n,n). For any practical input size, α(V) ≤ 4—A(4,4) is already astronomically huge, and its inverse grows so slowly that it is effectively a constant.
This function appears in the amortized analysis of union-find with path compression and union by rank: each find/union operation costs O(α(V)) amortized. This is why union-find operations are described as "practically O(1)"—no matter how many vertices, you never wait more than a handful of steps. In Kruskal's O(E log E + E·α(V)), the α(V) term is swamped by the sorting cost and is negligible in practice.
Yes, with simple modifications:
- Maximum Spanning Tree: Sort edges in descending order (or negate weights), keeping the greedy strategy but always picking the heaviest non-cycling edge. Correctness follows from the same cut property applied to the maximum.
- K-th Minimum Spanning Tree: Use a modified approach with edge deletions and reconnections (Eppstein's algorithm finds the K-th MST in O(E + V log V + K log V))
- Second-best MST: Find the MST normally, then for each non-MST edge, compute the minimum weight increase when substituting it for an edge in the MST cycle it creates
The core greedy structure remains, but changing the selection criterion (heaviest instead of lightest, or constrained substitutions) extends Kruskal's beyond pure minimization.
Both approaches yield O(E log E) overall complexity, but with different constants and practical trade-offs:
- Full sorting (standard): Sorts all E edges upfront with efficient comparison sorts. One-time O(E log E) cost, no per-operation overhead.
- Binary heap: Insert all edges into heap O(E), then repeatedly extract-min O(log E) × O(E). Total O(E log E), but with larger constant factors from heap operations. Useful when edges are generated incrementally.
- Fibonacci heap: Extract-min becomes O(log V) (not log E), giving O(E + V log V) for Prim's—but Kruskal's still benefits less because the sorting step dominates.
For Kruskal's, full sorting is typically faster in practice due to cache locality and lower constant factors. The heap approach shines in Prim's algorithm where edge weights change dynamically.
Critical edge cases include:
- Disconnected graph: Algorithm produces a forest, not a single tree. Check that
len(mst_edges) == V - 1to confirm a true MST exists. - Self-loops: Edges from a vertex to itself must be ignored—they always create cycles and would corrupt the result.
- Parallel edges: Multiple edges between the same two vertices: all are valid candidates; the lightest that connects two components is added. Subsequent parallel edges between already-connected vertices are rejected.
- Integer overflow: Summing many weights (especially with negative edges or large dense graphs) can overflow fixed-width integers—use 64-bit or arbitrary precision.
- Empty graph or single vertex: V = 0 or V = 1 yields empty MST with total weight 0.
Parallel edges (multiple edges with the same endpoints but possibly different weights) are handled naturally by the algorithm. All parallel edges are sorted together by weight. The first one that connects two different components is added; the others are skipped because they would create cycles.
This means if you have three edges A-B with weights 5, 10, and 15, only the weight-5 edge can potentially enter the MST (assuming neither endpoint is already connected via a lighter path). The heavier parallel edges are correctly excluded. If the weight-5 edge is later found to create a cycle through another path, then the weight-10 edge might be tried next, and so on.
A Minimum Spanning Tree (MST) is defined for a connected graph—it spans all vertices with minimum total weight. A Minimum Spanning Forest (MSF) is the generalization for disconnected graphs: one MST per connected component.
Kruskal's naturally produces a Minimum Spanning Forest because it processes all edges globally. When two components cannot be connected (no edge crosses the remaining cuts between them), they remain separate trees in the forest. This is a feature, not a bug—Kruskal's detects disconnectedness automatically and returns all available trees without requiring a separate connectivity check beforehand.
To find the second-best MST (the spanning tree with the second-lowest total weight):
- Run Kruskal's to find the MST with total weight W₁
- For each non-MST edge e with weight w_e, identify the unique path in the MST between its endpoints. Find the maximum-weight edge f on that path such that w_e > w_f
- Replacing f with e gives a spanning tree with weight W₁ - w_f + w_e
- The minimum across all candidates is the second-best MST
Using LCA (Lowest Common Ancestor) with binary lifting, step 2 can be answered in O(log V) per edge, making total O(E log V). This works because any spanning tree differing from the MST must replace exactly one edge—the "minimum maximum-weight edge" criterion ensures we make the smallest possible increase.
Further Reading
To deepen your understanding of Kruskal’s algorithm and related topics, explore these resources:
- Prim’s Algorithm: Growing a Minimum Spanning Tree — The complementary MST algorithm; compare how Prim’s grows a single tree vs. Kruskal’s global edge sorting approach
- Union-Find (Disjoint Set Union) Deep Dive — Master the data structure that makes Kruskal’s cycle detection efficient with path compression and union by rank
- The Cut Property and Cycle Property — These two theorems form the theoretical foundation for all MST algorithms. Understanding them explains why greedy MST algorithms are correct
- Borůvka’s Algorithm — The earliest MST algorithm (1926), rediscovered for parallel and distributed computing contexts
- Minimum Spanning Tree Applications — Explore real-world uses: network design, cluster analysis, image segmentation, approximation algorithms (traveling salesman problem), and circuit design
- Cormen et al., Introduction to Algorithms (CLRS) — Chapters on MST algorithms with formal proofs and complexity analysis
For a more visual approach, see how Kruskal’s algorithm processes edges in sorted order, always adding the lightest edge that connects two separate components. This greedy strategy, backed by the cut property, guarantees optimality.
Conclusion
Kruskal’s algorithm builds a minimum spanning tree by sorting all edges and adding them one by one, skipping any that would create a cycle. Union-find makes cycle detection nearly O(1) amortized. It’s particularly clean for sparse graphs and naturally produces a forest when the input is disconnected. If your graph is dense, Prim’s algorithm with an adjacency matrix may be faster. To compare them directly, see Prim’s Algorithm: Growing a Minimum Spanning Tree.
Category
Related Posts
Prim's Algorithm: Growing a Minimum Spanning Tree
Learn Prim's algorithm for finding minimum spanning trees starting from a vertex, using a greedy approach with priority queues.
Bellman-Ford Algorithm: Shortest Paths with Negative Weights
Learn the Bellman-Ford algorithm for single-source shortest paths including negative edge weights and negative cycle detection.
Dijkstra's Algorithm: Finding Shortest Paths in Weighted Graphs
Master Dijkstra's algorithm for single-source shortest path problems in weighted graphs with positive edges, including implementations and trade-offs.