Minimum Spanning Trees
Algorithmique, Fall semester 2011/12
Acknowledgment: Slides modeled after the course CO226 at Princeton
Given: An undirected, connected graph G with positive edge weights.
Definition: A spanning tree of G is a connected acyclic subgraph T of G with the same set of vertices as G.
Goal: Find a minimum weight spanning tree of G (minimum spanning tree, or MST).
[Figure: Graph G]
[Figure: a subgraph of G that is acyclic, but not spanning]
[Figure: a subgraph of G that is spanning and acyclic, but not connected]
[Figure: a subgraph of G that is spanning and connected, but not acyclic]
[Figure: a spanning tree of G of cost 4+7+11+10+4+7+5 = 48]
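The definition of a spanning tree can be turned into a small check. The following is a minimal sketch, not part of the original slides; it assumes vertices are numbered 0..n-1 with n at least 1, and the function name `is_spanning_tree` is chosen here for illustration. It relies on the standard fact that a connected graph on n vertices with exactly n-1 edges is acyclic.

```python
def is_spanning_tree(n, tree_edges):
    """Return True iff tree_edges forms a spanning tree on vertices 0..n-1.

    tree_edges is a list of (u, v) pairs; weights are irrelevant here.
    """
    # A spanning tree on n vertices has exactly n-1 edges.
    if len(tree_edges) != n - 1:
        return False

    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)

    # With n-1 edges, "connected" already implies "acyclic",
    # so a reachability search from vertex 0 is enough.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n
```

The three non-examples in the figures (not spanning, not connected, not acyclic) are all rejected by this check, either by the edge count or by the reachability search.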
Example 1: Communication Networks
A multinational company wants to lease communication lines between its various locations.
[Figure: network of the company's locations; each possible communication line is labeled with its price]
Each communication line comes with its own price tag. The company wants to spend the least amount of money while keeping all its branches connected.
Solution given by an MST on the graph. (Why?)
Example 2: Clustering
Possible solution: Find MST. Eliminate “fat” edges.
Note: this is a “heuristic” algorithm. Needs analysis.
Example 3: Dendritic Structures in the Brain
http://cvlab.epfl.ch/research/medical/neurons/
Problem: Reconstruct shape of neurons from noisy microscopy data via automatic tools
Dendrite tracking in microscopic images using minimum spanning trees and localized EM -- by Fleuret and Fua
Example 4: Phylogenetic Trees
Genetic variability and population structure of endangered Panax ginseng in the Russian Primorye -- by Zhuravlev et al
Cuts
Cut: A cut (A,B) in a graph G=(V,E) is a partition of V into two nonempty sets A and B.
Crossing edge: Any edge connecting a vertex in A to a vertex in B.
[Figure: a cut (A,B) with its crossing edges]
Cut Property
Cut C=(A,B), tree T on A which is part of an MST, e a crossing edge of minimum weight. Then there is an MST M containing e and T.
Proof:
• Take an MST containing T. If it contains e, we are done.
• Otherwise, add the crossing edge e of minimum weight to the MST. This creates a cycle.
• The cycle has at least one other crossing edge f. Since T lies inside A, f is not an edge of T.
• The weight of f is at least equal to that of e.
• Replace f by e in the MST. This gives a new spanning tree of no larger weight, hence a new MST, which contains e and T.
• This means that the weights of e and f must have been equal.
[Figure: an MST and the cut (A,B); adding e closes a cycle through another crossing edge f, which is then replaced by e]
Prim’s Algorithm
http://www.ithistory.org/honor_roll/fame-detail.php?recordID=882
http://inserv.math.muni.cz/biografie/vojtech_jarnik.html
http://en.wikipedia.org/wiki/Edsger_W._Dijkstra
Vojtěch Jarník, 1897-1970
Robert Prim, 1921 -
Edsger Dijkstra, 1930-2002
Start with any vertex v, set tree T to singleton v.
Greedily grow tree T: at each step add to T a minimum weight edge with exactly one endpoint in T.
[Figure: successive steps of Prim's algorithm from start vertex v, showing tree nodes, tree edges, and edges connected to exactly one tree node]
Why does it work?
T is always a subtree of an MST. Proof by induction on the number of nodes in T; the final T is an MST by this result.
Start: trivial. The singleton v is part of an MST.
Step: use the cut property with the cut (nodes of T, all other nodes). The current T is in an MST by hypothesis; the crossing edge of minimum weight is in an MST together with T by the cut property.
Implementation Challenge
How do we find minimum crossing edge at every iteration?
Check all the outgoing edges: O(|E|) comparisons at every iteration, O(|E| |V|) running time in total.
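The brute-force approach just described can be sketched as follows. This is an illustrative sketch only, not code from the slides; the function name `prim_naive` and the edge-list format (u, v, weight) on vertices 0..n-1 are assumptions made here.

```python
def prim_naive(n, edges, start=0):
    """Prim's algorithm with a full scan of the edge list in every iteration.

    edges: list of (u, v, weight) for an undirected, connected graph
    on vertices 0..n-1. Returns the list of chosen tree edges.
    """
    in_tree = {start}
    tree = []
    while len(in_tree) < n:
        # Scan all edges for the cheapest one with exactly one endpoint in T.
        best = None
        for u, v, w in edges:
            if (u in in_tree) != (v in in_tree):
                if best is None or w < best[2]:
                    best = (u, v, w)
        if best is None:
            raise ValueError("graph is not connected")
        tree.append(best)
        in_tree.add(best[0])
        in_tree.add(best[1])
    return tree
```

Each of the |V|-1 iterations scans all |E| edges, which gives the O(|E| |V|) bound.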
More clever data structure:
• For every node w, keep value dist(w) that measures the “distance” of w from current tree
• At the start, dist(v) = 0, and dist(w) = infinity for all other w
• When a new node u is added to tree, check whether the neighbors of u decrease their distance to tree; if so, decrease distance.
Maintain a min-priority queue for the nodes and their distances.
Implementation
(1) dist(v) = 0, dist(w) = ∞ for w ≠ v, pred(v) = NULL
(2) Create min-priority queue Q for V with respect to dist .... Q contains all nodes that are not yet covered by the MST
(3) While Q is not empty do
    (a) u = deleteMin(Q) .... u is node with smallest distance to the current tree
    (b) if (u is not marked) then .... Multiple copies of w may be present in Q, but only one gets marked
        (i) Mark u .... u is now covered
        (ii) For all neighbors w of u do .... Update distance of neighbors of u
            a. if (dist(w) > weight of edge (u,w) and w not marked) then
                i. dist(w) = weight of edge (u,w)
                ii. pred(w) = u .... Predecessor of w in the tree
                iii. Sift up w in Q
(4) Output tree {(pred(w),w) | w in V, w ≠ v}
Analysis
Step (1), (2): O(|V|)
Step (3): the loop runs O(|E|) times (each vertex gets added to Q at most its degree many times)
Step (3)(a): O(log(|Q|)) = O(log(|E|))
Step (3)(b)(ii): the body of the inner loop runs O(|E|) times in total
Step (3)(b)(ii) a.iii.: O(log(|Q|)) = O(log(|E|))
Total: O(|E| log(|E|)) for connected graphs.
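A possible Python rendering of this priority-queue version is sketched below. It is not the slides' code: it uses the standard `heapq` module, pushes a fresh copy of a node instead of sifting it up (so several copies of a node may sit in the heap, and stale ones are skipped), and starts with only the start vertex in the heap. The names `build_adj` and `prim` and the input format are assumptions made for this sketch.

```python
import heapq
import math


def build_adj(n, edges):
    """Adjacency lists from a list of undirected (u, v, weight) edges."""
    adj = [[] for _ in range(n)]
    for u, v, weight in edges:
        adj[u].append((v, weight))
        adj[v].append((u, weight))
    return adj


def prim(n, adj, start=0):
    """Return (pred, dist) for an MST of a connected graph.

    pred[w] is the predecessor of w in the tree (None for start);
    dist[w] ends up as the weight of the tree edge (pred[w], w).
    """
    dist = [math.inf] * n
    pred = [None] * n
    marked = [False] * n
    dist[start] = 0
    heap = [(0, start)]
    while heap:
        _, u = heapq.heappop(heap)          # deleteMin
        if marked[u]:
            continue                        # stale copy of u, skip it
        marked[u] = True                    # u is now covered
        for w, weight in adj[u]:
            if not marked[w] and dist[w] > weight:
                dist[w] = weight
                pred[w] = u
                heapq.heappush(heap, (weight, w))  # new copy of w
    return pred, dist
```

The tree edges are the pairs (pred[w], w) for all w other than the start vertex. Each edge causes at most two pushes, so the heap holds O(|E|) entries and every heap operation costs O(log(|E|)).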
Kruskal’s Algorithm
http://www.voteview.com/ideal_point_Non_Metric_MDS.htm
Joseph B. Kruskal, 1928-2010
Maintains a forest which will become an MST at the end.
(1) Start from an empty forest T (no edges).
(2) Consider edges in ascending order of cost. Add the next edge in the list to T if it doesn’t create a cycle.
Example: edges in ascending order of weight, on a graph with vertices 0, ..., 7.

u  v  weight
3  5  0.18
1  7  0.21
6  7  0.25
0  2  0.29
0  7  0.31
0  1  0.32
3  4  0.34
4  5  0.40
4  7  0.46
0  6  0.51
4  6  0.51
0  5  0.60

[Figure: run of Kruskal's algorithm on this graph. Edges are added in the order 3-5, 1-7, 6-7, 0-2, 0-7, 3-4, 4-7; during the run the forest has several components.]
Why does it work?
T is always a sub-forest of an MST. Proof by induction on the number of edges in T; the final T is an MST by this result.
Start: trivial. T is a union of singleton vertices.
Step: by hypothesis, the current T is a sub-forest of an MST M.
• Edge e is the edge of minimum weight, among the edges not yet considered, that doesn’t create a cycle in T.
• If e is in M, we are done. Otherwise, adding e to M creates a cycle, and this cycle contains an edge f of M that is not in T (an edge that is part of the MST, but has not been added yet).
• The weight of e is smaller than or equal to the weight of f: f does not create a cycle in T either, so it has not been considered yet.
• Exchanging f with e creates another spanning tree which contains T as a sub-forest, and also contains e, hence contains the new T.
• The cost of the new spanning tree is smaller than or equal to the cost of the old MST, since the weight of e is smaller than or equal to the weight of f.
• Hence, the new spanning tree is an MST, and T and e belong to an MST.
[Figure: the forest T with the edges already added, the MST edges that are not added yet, and the new edge e]
[Figure sequence: step-by-step run of Kruskal’s algorithm on a larger weighted graph. Final step: all nodes covered now.]
Implementation Challenge
How do we check whether addition of a new edge creates a cycle?
Note: if G=(V,E) is the original graph, and T is the set of edges already created, then we are looking for a cycle in the graph F = (V,T) (not in G).
e=(u,v) creates cycle iff connected component of u = connected component of v
We could check whether the connected components are equal by doing a DFS starting from u and checking whether we reach v.
Implementation: 1st Version
(1) Create min-priority queue Q for E with respect to weight .... Sort the edges
(2) For all v in V set pred(v)=NULL
(3) Set T to the empty set .... Initial forest doesn’t have edges
(4) While Q is not empty do .... Continue until all edges processed (can stop when |T|=|V|-1)
    (a) (u,v) = deleteMin(Q) .... Take smallest edge off the queue
    (b) Run DFS starting from u on T .... Check whether edge creates cycle
    (c) If v is not in the connected component of u then
        (i) if pred(u)=NULL set pred(u)=v else pred(v)=u
        (ii) Add edge (u,v) to T
(5) Output T
Analysis
Step (1): O(|E|)
Step (4): the loop runs at most |E| times
Step (4)(a): O(log(|E|))
Step (4)(b): O(|T|) = O(|V|)
Total: O(|E| |V|).
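A sketch of this first version in Python follows. It is an illustration under assumptions, not the slides' code: it sorts the edge list instead of using a priority queue, omits the pred bookkeeping, and uses an iterative DFS on the current forest F = (V,T). The function name `kruskal_dfs` and the (u, v, weight) edge format are chosen here.

```python
def kruskal_dfs(n, edges):
    """Kruskal's algorithm with a DFS-based cycle check.

    edges: list of (u, v, weight) on vertices 0..n-1.
    Returns the chosen edges in the order in which they were added.
    """
    forest_adj = {v: [] for v in range(n)}  # adjacency lists of F = (V, T)
    tree = []

    def reachable(src, dst):
        # Iterative DFS on the current forest only, not on G.
        seen = {src}
        stack = [src]
        while stack:
            x = stack.pop()
            if x == dst:
                return True
            for y in forest_adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if len(tree) == n - 1:
            break                            # |T| = |V|-1: tree is complete
        if not reachable(u, v):              # (u,v) joins two components
            forest_adj[u].append(v)
            forest_adj[v].append(u)
            tree.append((u, v, w))
    return tree
```

Every DFS touches at most |V| nodes of the forest, and up to |E| edges are examined, which matches the O(|E| |V|) total.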
Better?
Step (1): Just sort edges with respect to weight. No need for Q.
Step (4): Check whether |T|=|V|-1.
Step (4)(a): Just take next edge off the list.
Step (4)(b), (c): What should we do with this? With DFS, still O(|E| |V|) total.
Data Structure
Need a good data structure to check whether components are equal.
This is done via the Union-Find data structure.
Dynamic Graph Model
We have a graph G on n vertices 0,1,....,n-1.
Edges are revealed one-by-one.
Want to keep track of the connected components of the graph as edges are revealed.
Introduce data structure UF on the set of vertices which keeps track of the components.
Operations on UF:
• Union(a,b): join the components of a and b.
• Connected(a,b): returns true iff a and b are in the same component.
Quick Find
Data Structure:
• Integer array id[] of size n
• Interpretation: p and q are connected iff they have the same id

i      0 1 2 3 4 5 6 7 8 9
id[i]  0 1 9 9 9 6 6 7 8 9

5 and 6 are connected. 2, 3, 4, and 9 are connected.

Connected(p,q): true iff p and q have the same id.
Example: id[3] = id[9] = 9, so Connected(3,9) = true.

Union(p,q): to merge the components of p and q, change all entries whose id[] equals id[p] to id[q].

After union(2,5):
i      0 1 2 3 4 5 6 7 8 9
id[i]  0 1 6 6 6 6 6 7 8 6

Problem: many values can change.

Example
[Figure: sequence of union() calls. When id[p] and id[q] differ, union() changes the entries equal to id[p] to id[q] (in red); when id[p] and id[q] match, there is no change.]
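Quick-find as described above fits in a few lines of Python. This is a minimal sketch; the class name `QuickFind` and the method names are chosen here for illustration.

```python
class QuickFind:
    """Quick-find: p and q are connected iff id[p] == id[q]."""

    def __init__(self, n):
        self.id = list(range(n))      # every vertex starts in its own component

    def connected(self, p, q):
        return self.id[p] == self.id[q]   # one comparison

    def union(self, p, q):
        pid, qid = self.id[p], self.id[q]
        if pid == qid:
            return                    # already in the same component
        # Relabel every entry equal to id[p]; this touches the whole array.
        for i in range(len(self.id)):
            if self.id[i] == pid:
                self.id[i] = qid
```

Starting from the array id = 0 1 9 9 9 6 6 7 8 9, a call to union(2, 5) relabels the entries at positions 2, 3, 4 and 9, which reproduces the array shown after union(2,5).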
Too Slow
Count number of array accesses
algorithm    init  union  connected
Quick-find   n     n      1
Defect: unions are too expensive
Quick-Union
Data Structure:
• Integer array id[] of size n
• Interpretation: id[i] is parent of i
• Root of i is id[id[id[...id[i]...]]] (keep going until no change)

i      0 1 2 3 4 5 6 7 8 9
id[i]  0 1 9 4 9 6 6 7 8 9

[Figure: the forest for this array, with p = 3 and q = 5]
3’s root is 9, 5’s root is 6.

Connected(p,q): check whether p and q have the same root.
Example: Connected(3,5) = false.

Union(p,q): to merge the components of p and q, set the id of p’s root to the id of q’s root.

After union(3,5):
i      0 1 2 3 4 5 6 7 8 9
id[i]  0 1 9 4 9 6 6 7 8 6

Only one value changes.
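The same interface with the quick-union representation might look as follows. Again a minimal sketch with names (`QuickUnion`, `root`) chosen for illustration.

```python
class QuickUnion:
    """Quick-union: id[i] is the parent of i; roots satisfy id[i] == i."""

    def __init__(self, n):
        self.id = list(range(n))

    def root(self, i):
        # Follow parent pointers until nothing changes.
        while self.id[i] != i:
            i = self.id[i]
        return i

    def connected(self, p, q):
        return self.root(p) == self.root(q)

    def union(self, p, q):
        # Hang p's root below q's root: only one array entry changes.
        self.id[self.root(p)] = self.root(q)
```

With id = 0 1 9 4 9 6 6 7 8 9, root(3) follows 3, 4, 9 and returns 9, and union(3, 5) only sets id[9] to 6.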
Also Too Slow
Count number of array accesses (worst case)

algorithm    init  union  connected
Quick-find   n     n      1
Quick-union  n     n      n

Quick-find defect: unions are too expensive
Quick-union defects:
• Trees can get tall
• Connected() too expensive (could be n array accesses)
Union-Find Data Structure
Data Structure:
• Integer array id[] of size n
• Interpretation: id[i] is parent of i
• Root of i is id[id[id[...id[i]...]]]
• size[j] is size of connected component of j (if j is root)

i        0 1 2 3 4 5 6 7 8 9
id[i]    0 1 9 4 9 6 6 7 8 9
size[i]  1 1 x x x x 2 1 1 4

Size may not be accurate for non-roots (marked x).

Connected(p,q): Check whether p and q have same root.
CompSize(p): to find the size of the component of p, find p’s root r and return size[r].
Union(p,q): to merge the components of p and q, set the id of p’s root to the id of q’s root if CompSize(p) is smaller than CompSize(q). Otherwise, set the id of q’s root to the id of p’s root.

UF.init(n) .... Initializes the data structure
    for i from 0 to n-1 do set id[i]=i, size[i]=1

UF.FindRoot(i) .... Finds the root of i
    while id[i] is not equal to i do set i = id[i]
    return i

UF.union(i,j) .... union()
    Set p = UF.FindRoot(i), q = UF.FindRoot(j)
    if p is not equal to q then
        if size[p] < size[q] then
            Set id[p] = q and size[q] = size[p]+size[q]
        else
            Set id[q] = p and size[p] = size[p]+size[q]

UF.Connected(i,j) .... Connected()
    if UF.FindRoot(i) = UF.FindRoot(j) then return true
    else return false
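The union-by-size routines can be sketched in Python as below. This is an illustration, not the course's reference code; the class and method names are chosen here. The size of a merged component is stored at its new root, since sizes are only meaningful for roots.

```python
class UnionFind:
    """Union-find with union by size: the smaller tree hangs below the larger."""

    def __init__(self, n):
        self.id = list(range(n))      # id[i] is the parent of i
        self.size = [1] * n           # size[r] is valid only if r is a root

    def find_root(self, i):
        while self.id[i] != i:
            i = self.id[i]
        return i

    def connected(self, i, j):
        return self.find_root(i) == self.find_root(j)

    def comp_size(self, i):
        return self.size[self.find_root(i)]

    def union(self, i, j):
        p, q = self.find_root(i), self.find_root(j)
        if p == q:
            return                    # already in the same component
        if self.size[p] < self.size[q]:
            p, q = q, p               # make p the root of the larger tree
        self.id[q] = p                # smaller tree goes below the larger root
        self.size[p] += self.size[q]  # update the size at the new root
```

Swapping p and q before linking is equivalent to the two branches of the pseudocode and keeps the size update in one place.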
Analysis
UF.init(n): O(n)
UF.FindRoot(i): order of the height of the tree representing the component of i
UF.union(i,j): same complexity as FindRoot
UF.Connected(i,j): same complexity as FindRoot
Analysis
What is the height of the trees formed in terms of their size?
For Quick-union the trees can have height linear in their size. This is responsible for the bad performance of root-finding.
Theorem: for the union-find data structure the height of the tree representing a component is at most log2(s) where s is the number of nodes in the tree.
Worst case:

algorithm    init  union   connected
Quick-find   n     n       1
Quick-union  n     n       n
Union-find   n     log(n)  log(n)
Proof
Induction on the size of the tree.
Height changes only during the union operation. Let T be the union of trees T1 and T2, so | T | = | T1 | + | T2 |.
Assume | T1 | ≤ | T2 |, h1 = height(T1), h2 = height(T2), h = height(T).
Then h = max{h1+1, h2}.

Case 1: h1 < h2.
So h = h2 ≤ log2(| T2 |) < log2(| T |), where the first inequality is the induction hypothesis.

Case 2: h2 ≤ h1.
So h = h1+1 ≤ log2(| T1 |) + log2(2) = log2(2 | T1 |) ≤ log2(| T1 | + | T2 |) = log2(| T |), where the first inequality is the induction hypothesis.

[Figure: T1 attached below the root of T2. In case 1 the height stays h = h2; in case 2 it becomes h = 1 + h1.]
Improvement
Path Compression: Immediately after computing the root of p set the id of each examined node to point to the root.
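[Figure: a tree on nodes 0, ..., 12 before and after UF.FindRoot(9) with p = 9; after the call, the nodes examined on the path from p to the root point directly to the root.]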
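Path compression can be added to root-finding with a second pass over the same path. The sketch below is one common two-pass way to do it, written as a standalone function on a parent array; the name `find_root` and the parameter `parent` (playing the role of id[]) are assumptions for illustration.

```python
def find_root(parent, i):
    """Return the root of i and make every node on the path point to it."""
    # First pass: locate the root.
    root = i
    while parent[root] != root:
        root = parent[root]
    # Second pass: redirect each examined node straight to the root.
    while parent[i] != root:
        nxt = parent[i]
        parent[i] = root
        i = nxt
    return root
```

For example, on the chain parent = [0, 0, 1, 2, 3], calling find_root(parent, 4) returns 0 and leaves parent == [0, 0, 0, 0, 0], so later lookups on these nodes take a single step.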
Does it Work?
Theorem: Starting from an empty data structure, any sequence of M union-find operations on N objects makes a number of array accesses at most proportional to N + M log*(N).
Without proof.

log*(n) = iterated logarithm of n

n                  log*(n)
(1, 2]             1
(2, 4]             2
(4, 16]            3
(16, 2^16]         4
(2^16, 2^65536]    5
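To get a feeling for how slowly the iterated logarithm grows, it can be computed directly: count how often log2 must be applied until the value drops to 1 or below. A small sketch (the function name `log_star` is chosen here; it uses floating-point `math.log2`, which is adequate for illustration):

```python
import math


def log_star(n):
    """Iterated logarithm: number of times log2 is applied until n <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

For instance, log_star(16) is 3 (16, 4, 2, 1) and log_star(65536) is 4, in line with the table.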
Upgraded Version of Kruskal’s Algorithm
(1) Create sorted ascending list L of edges with respect to weight
(2) UF.Init(|V|)
(3) Set T to the empty set
(4) for i from 0 to |E|-1 do
(a) (u,v) = L[i]
(b) If not UF.Connected(u,v) then
(i) UF.union(u,v)
(ii) Add edge (u,v) to T
(5) Output T
These two steps use the UF.FindRoot routine twice. Can make it more efficient by letting UF.union return a boolean value which is true iff u and v are connected.
Analysis
Step (1): O(|E| log(|E|))
Step (2): O(|V|)
Step (4)(b): O(log(|V|))
Step (4)(b)(i): O(log(|V|))
Step (4)(b)(ii): O(1)
Total: O(|E| log(|E|) + |V| + |E| log(|E|)) = O(|E| log(|E|)).
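Putting the pieces together, the upgraded version can be sketched in Python as follows. This is an illustration under assumptions: the function name `kruskal` and the (u, v, weight) edge format are chosen here, the union-find is inlined with union by size and no path compression, and `union` returns True iff it actually merged two components, which is one way to realize the single-call idea so that FindRoot is not run twice per edge.

```python
def kruskal(n, edges):
    """Kruskal's algorithm with union by size.

    edges: list of (u, v, weight) on vertices 0..n-1.
    Returns the chosen edges in the order in which they were added.
    """
    parent = list(range(n))
    size = [1] * n

    def find_root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    def union(i, j):
        """Merge the components of i and j; return True iff they differed."""
        p, q = find_root(i), find_root(j)
        if p == q:
            return False
        if size[p] < size[q]:
            p, q = q, p               # p is the root of the larger tree
        parent[q] = p
        size[p] += size[q]
        return True

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # ascending weight
        if union(u, v):               # edge joins two components: keep it
            tree.append((u, v, w))
            if len(tree) == n - 1:
                break                 # spanning tree is complete
    return tree
```

Run on the 8-vertex example edge list from the Kruskal slides, this selects 3-5, 1-7, 6-7, 0-2, 0-7, 3-4, 4-7, the same order as in the figure, for a total weight of 2.04.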