SI335: Prim's MST Algorithm (get greedy!)

Reading

These notes and closely review Unit 5 Section 4.1.

Homework

Graph Sermon

In reviewing the homework, I gave a little sermon on the importance of graphs. Basically, many, many problems in computing boil down to problems about graphs. This a great thing, since we have so many existing algorithms for graphs and graph problems and so many theoretical results about graphs. These are all tools that we can bring to bear on a problem once we are able to cast it as being about graphs.

Undirected Graphs and Trees

Til now we have mostly been interested in directed graphs, i.e. graphs where the edges have a direction (visually, an arrow). Now we will consider undirected graphs, where edges are directionless connections between vertices.

Our first question is going to be this: what do we mean by a "tree" in a directionless graph? Without directed edges, the concept of parent, which was the basis of our understanding of tree in the world of directed graphs, doesn't make sense. Our definition relies on the concept of a connected graph, and of a cycle.

We say graph $G$ is connected if for any two vertices $u$ and $v$ there is a path in $G$ from $u$ to $v$.

If $G$ is a graph with edge set $E$ we say that $v_0,v_1,\ldots,v_{n-1}$ is a cycle in $G$ if for each $i\in\{1,\ldots,n-1\}$ we have $(v_{i-1},v_i)\in E$ and $(v_{n-1},v_0)\in E$.

We call undirected graph $G$ a tree if it is connected and has no cycles.

There is an equivalent characterization of trees as graphs in which for every pair of vertices $u,v$ there is a unique path from $u$ to $v$.

Undirected graph $G$ is a tree if and only if for any pair of vertices $u,v$ there is a unique path from $u$ to $v$.

Suppose $G$ is a tree. Let $u$ and $v$ be vertices in $G$, then there is at least one path from $u$ to $v$ since $G$ is connected (because it is a tree). If there were also a different path connecting $u$ to $v$ then there would be a cycle. [Note: we really ought to prove this assertion, but I guess it's obvious enough to you that it is true, that proving it wouldn't be helpful.] Thus the path is unique.

Suppose that for any pair of vertices $u,v$ there is a unique path from $u$ to $v$. In this case, $G$ is obviously connected. It also has no cycles, since a cycle would give us a pair of vertices connected by two different paths. Since $G$ is connected and has no cycles, it is a tree.

Weighted graphs and Spanning Trees

Note: we often describe a graph by the ordered pair $(V,E)$ where $V$ is the set of vertices of $G$, and $E\subseteq V\times V$ is the set of its edges.

$G'=(V',E')$ is a subgraph of graph $G=(V,E)$ if $V' \subseteq V$ and $E' \subseteq E$.

Tree $T=(V',E')$ is a spanning tree of connected graph $G = (V,E)$ if $T$ is a subgraph of $G$ and $V' = V$.

$G = (V,E,w)$ is a weighted graph if $(V,E)$ is an undirected graph and $w$ is a function that maps and edge to a weight. Weights may come from the reals, integers, positive integers, etc.

Tree $T$ is a minimum spanning tree of connected weighted graph $G$ if $T$ is spanning tree and the sum of the weights of the edges in $T$ is the minimum amongst all spanning trees.

Minimum Spanning Tree

Here is Prim's Algorithm for computing a minimum spanning tree of a graph from starting vertex $r$.

	Algorithm: Prim's MST
	Input : G, a connected, weighted, undirected graph with n vertices and e edges
        r, a vertex in G 
	Output: T a minimum weight spanning tree for G (reprented by a "parent" array)

	0. create P, the "parent" array, with each entry initialized to nil (means "unknown")
	P[r] = -1 ← means r is root, NOTE: only r is in the current MST

	1. select edge (u,v) where P[v] = nil (i.e. parent "unknown") and P[u] ≠ nil,
	such that (u,v) has minimum weight amongst all such edges
	IF no such edge exists, return P

	2. P[v] = u  ← i.e. add v to the current MST by connecting it to u

	3. goto Step 1

Proof that Prim's algorithm gives a minimum spanning tree

First we have to prove that the greedy algorithm is always part of some optimum solution.

If $T'$ is a subtree of a minimum spanning tree for $G$, then there is a complete minimum spanning tree for $G$, call it $T$, that contains $T'$ and contains the edge $(u,v)$ that is the minimum weight edge from a vertex in $T'$ to a vertex not in $T'$.

Let $T^*$ be a spanning tree that contains $T'$ but does not contain the edge $(u,v)$. We will show how to "repair" it to create a tree that still contains $T'$, but also contains $(u,v)$, without increasing the total weight of the resulting tree. The unique path from $u$ to $v$ in $T^*$ leaves $T'$ along some edge $(y,z)$, i.e. $y$ is in $T'$, but $z$ is not. If we remove $(y,z)$ from the spanning tree we have two disconnected trees, the one containing $u$ and the one containing $v$. If we then add the edge $(u,v)$ to the spanning tree, we reconnect the two trees, resulting in a new spanning tree, call it $T$. Since the weight of $(u,v)$ $\le$ the weight of $(y,z)$, the overall weight of $T$ is $\le$ the overall weight of $T^*$.

Essentially this proof is an algorithm that transforms tree $T^*$ from a minimum weight spanning tree that doesn't use edge $(u,v)$ to a minimum weight spanning tree that uses edge $(u,v)$. If we were to write this out formally we would say:
Input: Graph $G$, sub-minimum-spanning tree $T'$, complete minimum spanning tree $T^*$ that extends $T'$ to include all the vertices, and edge $(u,v)$ of minimum weight where $u\in T'$ and $v \notin T'$ such that $(u,v) \notin T^*$.
Output: complete minimum spanning tree $T$ that extends $T'$ and includes the edge $(u,v)$

→

The second thing we have to prove is that finding an optimum solution to the smaller subproblem yields an optimum solution to the original problem. At first this is a bit confusing, because the way we've phrased Prim's algorithm, it's not clear that we have a "smaller problem of the same kind" to solve. So let's imagine specifying the problem Prim's algorithm is solving in a slightly different way:

    Problem: Minimum Weighted Spanning Tree (containing proscribed subtree)
    Given: a connected Graph G (V is the set of vertices, E the set of edges)
    a subtree T' of G spanning the vertex set R, which is a subset of V
    Find : a spanning tree T of the graph G that contains T' as a subtree
    and has minimum weight over all such spanning trees.

Hopefully you see that Prim's algorithm can be viewed as solving a sequence of such problems, where $|V-R|$ gets smaller at each step. Now what we have to prove is trivial, because a solution to the "smaller sub-problem" is a solution to the original problem, and therefore finding an optimal solution to the smaller subproblem is indeed the right thing to do.