Algorithmics 07: Graph Algorithms (ut, 11.03.2009)
Advanced Algorithmics (4AP / MTAT.03.238): Graphs
Jaak Vilo
2009 Spring
Chapter 22: Elementary Graph Algorithms
http://www.cc.nctu.edu.tw/~claven/course/Algorithm/
A Simple Metabolic Pathway
Shoshanna Wodak, Jacques van Helden
Hughes, T. R. et al: “Functional Discovery via a Compendium of Expression Profiles”, Cell 102 (2000), 109-126.
Green arrows: upregulation. Red arrows: downregulation. Thickness of arrow represents certainty of direction (up/down).
A complete graph. Filter: choose a list of genes (MATING, marked in red); filter for these genes plus neighbouring genes from the graph.
[Figure: "Mutation network =4"; gene-name node labels omitted.]
[Figure: "Mutation network =2"; gene-name node labels omitted.]
[Figure: probability network with cluster labels unknown, aam, ribo, mitochondrial, mat, cos; gene-name node labels omitted.]
Probability network. Underlaid in green are groups of genes which are more interconnected. The genes are coloured according to their annotation in YPD ("cellular role"). The more interconnected genes are involved in the same cellular processes, such as mating behaviour (mat, green), amino acid metabolism (aam, red), the cos gene family (cos, light blue), mitochondrial function (mitochondrial, dark blue), the ribosome (ribo, purple), and a group of genes of unknown function (unknown, grey).
[Figure continued: gene-name node labels omitted.]
2009/3/11CS4335 Design and Analysis of Algorithms
/WANG LushengPage 15
Find a shortest path from station A to station B.
– Needs serious thinking to get a correct algorithm.
Introduction
• G = (V, E)
  – V = vertex set (nodes)
  – E = edge set (arcs)
• Graph representation
  – Adjacency list
  – Adjacency matrix
• Graph search
  – Breadth-first search (BFS)
  – Depth-first search (DFS)
• Topological sort
• Strongly connected components
[Figures: an undirected graph and a directed graph]
Graphs
• Set of nodes, |V| = n
• Set of edges, |E| = m
  – Undirected edges/graph: pairs of nodes {v, w}
  – Directed edges/graph: pairs of nodes (v, w)
• Set of neighbours of v: the set of nodes connected by an edge with v (directed: in-neighbours, out-neighbours)
• Degree of a node: number of its neighbours
• Path: a sequence of nodes such that every two consecutive nodes constitute an edge
• Length of a path: number of nodes minus 1
• Distance between two nodes: the length of the shortest path between these nodes
• Diameter: the longest distance in the graph
Lecture 1: Preliminaries
Dariusz Kowalski
Lines, cycles, trees, cliques
[Figure: examples of a line, a cycle, a tree, and a clique]
A joke:
• The boss wants to produce programs to solve the following two problems:
  – Euler circuit problem: given a graph G, find a way to go through each edge exactly once.
  – Hamilton circuit problem: given a graph G, find a way to go through each vertex exactly once.
• The two problems seem to be very similar.
• Person A takes the first problem and person B takes the second.
• Outcome: Person A quickly completes the program, whereas person B works 24 hours per day and is fired after a few months.
Euler Circuit: the original Königsberg bridge problem
Hamilton Circuit
A joke (continued):
• Why? Nobody in the company has taken CS4335.
• Explanation:
  – The Euler circuit problem can be easily solved in polynomial time.
  – The Hamilton circuit problem is proved to be NP-hard.
  – So far, nobody in the world can give a polynomial-time algorithm for an NP-hard problem.
  – Conjecture: there does not exist a polynomial-time algorithm for this problem.
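The polynomial-time solvability of the Euler circuit problem can be made concrete with Hierholzer's algorithm (a standard approach, not named on the slides). A minimal sketch for an undirected graph in which every vertex has even degree:

```python
def euler_circuit(adj):
    """Hierholzer's algorithm sketch: follow unused edges until returning
    to the start, splicing sub-tours in as dead ends are popped.
    adj maps each vertex to a list of neighbours; every degree is even."""
    g = {u: list(vs) for u, vs in adj.items()}  # consumable copy
    start = next(iter(g))
    stack, circuit = [start], []
    while stack:
        u = stack[-1]
        if g[u]:                  # unused edge left at u: extend the tour
            v = g[u].pop()
            g[v].remove(u)        # consume the undirected edge {u, v}
            stack.append(v)
        else:                     # dead end: u closes a sub-tour
            circuit.append(stack.pop())
    return circuit

# hypothetical example: a triangle a-b-c has an Euler circuit
print(euler_circuit({'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b']}))
```

Each edge is consumed exactly once, so the running time is linear in |E|, in sharp contrast to the Hamilton circuit problem.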
Complete graph
• Every node is connected to every other node
• Clique – a fully connected subgraph of a graph
• How many different subgraphs does a complete graph have?
Subgraph
• Subset of vertices (m): V' is a subset of V
• Subset of edges (n): E' is a subset of E, s.t. {u,v} in E' only if both u and v are in V'
• How many?
  – 2^m different subsets of vertices (= many!)
  – Likewise, the edge set can be any subset of the set of edges…
• The number of different possible graphs of size m, n is huge
Representation of Graphs
• Adjacency list: Θ(V+E)
  – Preferred for sparse graphs: |E| << |V|²
  – Adj[u] contains all the vertices v such that there is an edge (u, v) ∈ E
  – Weighted graph: w(u, v) is stored with vertex v in Adj[u]
  – No quick way to determine if a given edge is present in the graph
• Adjacency matrix: Θ(V²)
  – Preferred for dense graphs
  – Symmetric for an undirected graph
  – Weighted graph: store w(u, v) in the (u, v) entry
  – Easy to determine if a given edge is present in the graph
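The two representations can be sketched side by side; the 4-vertex graph here is a hypothetical example:

```python
# Hypothetical 4-vertex undirected graph with edges {0-1, 0-2, 1-2, 2-3}.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency list: Theta(V + E) space, good for sparse graphs.
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)           # undirected: store both directions

# Adjacency matrix: Theta(V^2) space, O(1) edge lookup, good for dense graphs.
mat = [[0] * n for _ in range(n)]
for u, v in edges:
    mat[u][v] = mat[v][u] = 1  # symmetric for an undirected graph

# Edge lookup: O(deg(u)) with the list, O(1) with the matrix.
print(2 in adj[0], mat[0][2] == 1)
```

The trade-off named on the slide is visible directly: the list stores only the edges that exist, while the matrix pays Θ(V²) space for constant-time edge tests.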
Representation For An Undirected Graph
[Figure]
Representation For A Directed Graph
[Figure: adjacency matrix, 1 bit per entry]
Breadth-First Search (BFS)
• Graph search: given a source vertex s, explore the edges of G to discover every vertex that is reachable from s
  – Compute the distance (smallest number of edges) from s to each reachable vertex
  – Produce a breadth-first tree with root s that contains all reachable vertices
  – Compute the shortest path from s to each reachable vertex
• BFS discovers all vertices at distance k from s before discovering any vertices at distance k+1
Data Structure for BFS
• Adjacency list
• color[u] for each vertex
  – WHITE if u has not been discovered
  – GRAY if u has been discovered but has some adjacent white vertices
    • the frontier between discovered and undiscovered vertices
  – BLACK if u and all its adjacent vertices have been discovered
• d[u] for the distance from (source) s to u
• π[u] for the predecessor of u
• FIFO queue Q to manage the set of gray vertices
  – Q stores all the gray vertices
Initialization
Set up s and initialize Q
Explore all the vertices adjacent to u and update d, π, and Q
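The BFS procedure described above can be sketched as follows; a minimal sketch over a dict-of-lists adjacency list, where an infinite distance plays the role of the WHITE colour:

```python
from collections import deque
import math

def bfs(adj, s):
    """Breadth-first search from s. Returns (d, pi): the distance and
    predecessor attributes maintained by the textbook algorithm."""
    d = {u: math.inf for u in adj}
    pi = {u: None for u in adj}
    d[s] = 0
    q = deque([s])               # FIFO queue of discovered ("gray") vertices
    while q:
        u = q.popleft()
        for v in adj[u]:
            if d[v] == math.inf: # v is still WHITE (undiscovered)
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
    return d, pi
```

Because the queue is FIFO, every vertex at distance k is dequeued before any vertex at distance k+1, which is exactly the property claimed on the previous slide.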
Analysis of BFS
• O(V+E)
  – Each vertex is en-queued (O(1)) at most once: O(V)
  – Each adjacency list is scanned at most once: O(E)
Shortest path
• Print out the vertices on a shortest path from s to v
PRINT-PATH Illustration
PRINT-PATH(G, s, u)
  PRINT-PATH(G, s, t)
    PRINT-PATH(G, s, w)
      PRINT-PATH(G, s, s) → print s
    print w
  print t
print u
Output: s w t u
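The recursion above can be sketched in Python; the predecessor map mirrors the slide's example path s → w → t → u:

```python
def print_path(pi, s, v, out):
    """Recursive PRINT-PATH: walk the predecessor chain back to s,
    emitting vertices on the way back up the recursion."""
    if v == s:
        out.append(s)
    elif pi[v] is None:
        out.append("no path")
    else:
        print_path(pi, s, pi[v], out)
        out.append(v)

# predecessor map matching the slide's example
pi = {'s': None, 'w': 's', 't': 'w', 'u': 't'}
path = []
print_path(pi, 's', 'u', path)
print(' '.join(path))   # s w t u
```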
Depth-First Search (DFS)
• DFS: search deeper in the graph whenever possible
  – Edges are explored out of the most recently discovered vertex v that still has unexplored edges leaving it
  – When all of v's edges have been explored (finished), the search backtracks to explore edges leaving the vertex from which v was discovered
  – This process continues until we have discovered all the vertices that are reachable from the original source vertex
  – If any undiscovered vertices remain, then one of them is selected as a new source and the search is repeated from that source
  – The entire process is repeated until all vertices are discovered
• DFS will create a forest of DFS-trees
Data Structure for DFS
• Adjacency list
• color[u] for each vertex
  – WHITE if u has not been discovered
  – GRAY if u is discovered but not finished
  – BLACK if u is finished
• Timestamps: 1 ≤ d[u] < f[u] ≤ 2|V|
  – d[u] records when u is first discovered (and grayed)
  – f[u] records when the search finishes examining u's adjacency list (and blackens u)
• π[u] for the predecessor of u
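The bookkeeping above can be sketched as follows; a minimal recursive sketch that records the colour, timestamp, and predecessor attributes:

```python
def dfs(adj):
    """DFS over all vertices of a dict-of-lists graph; records discovery
    and finishing timestamps d[u], f[u] and predecessors pi[u]."""
    color = {u: 'WHITE' for u in adj}
    d, f, pi = {}, {}, {u: None for u in adj}
    time = [0]                      # mutable counter shared by all calls

    def visit(u):
        time[0] += 1
        d[u] = time[0]              # u discovered: WHITE -> GRAY
        color[u] = 'GRAY'
        for v in adj[u]:
            if color[v] == 'WHITE':
                pi[v] = u
                visit(v)
        color[u] = 'BLACK'          # adjacency list fully examined
        time[0] += 1
        f[u] = time[0]

    for u in adj:                   # restart from every undiscovered vertex
        if color[u] == 'WHITE':
            visit(u)
    return d, f, pi
```

Printing the (d[u], f[u]) pairs for a small graph is a convenient way to observe the parenthesis structure stated by the theorem below: any two intervals are either disjoint or nested.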
Properties of DFS
• Time complexity: Θ(V+E)
  – Loops on lines 1-3 and 5-7 of DFS: Θ(V)
  – DFS-VISIT
    • Called exactly once for each vertex
    • Loops on lines 4-7 for a vertex v: |Adj[v]|
    • Total time = Σ_{v∈V} |Adj[v]| = Θ(E)
• DFS results in a forest of trees
• Discovery and finishing times have parenthesis structure
DFS
• Stack
BFS
• Queue
Randomised search
• Use: Randomized Queue
Another Example of DFS
Parenthesis theorem
In any depth-first search of a (directed or undirected) graph G = (V, E), for any two vertices u and v, exactly one of the following three conditions holds:
1. the intervals [d[u], f[u]] and [d[v], f[v]] are entirely disjoint, and neither u nor v is a descendant of the other in the depth-first forest,
2. the interval [d[u], f[u]] is contained entirely within the interval [d[v], f[v]], and u is a descendant of v in a depth-first tree, or
3. the interval [d[v], f[v]] is contained entirely within the interval [d[u], f[u]], and v is a descendant of u in a depth-first tree.
Proof
We begin with the case in which d[u] < d[v].
• There are two subcases to consider, according to whether d[v] < f[u] or not.
• The first subcase occurs when d[v] < f[u], so v was discovered while u was still gray. This implies that v is a descendant of u. Moreover, since v was discovered more recently than u, all of its outgoing edges are explored, and v is finished, before the search returns to and finishes u. In this case, therefore, the interval [d[v], f[v]] is entirely contained within the interval [d[u], f[u]].
• In the other subcase, f[u] < d[v], and inequality (22.2) implies that the intervals [d[u], f[u]] and [d[v], f[v]] are disjoint.
• Because the intervals are disjoint, neither vertex was discovered while the other was gray, and so neither vertex is a descendant of the other.
• The case in which d[v] < d[u] is similar, with the roles of u and v reversed in the above argument.
Classification of Edges
• Tree edges are edges in the DFS forest. Edge (u, v) is a tree edge if v was first discovered by exploring edge (u, v).
  – v is WHITE
• Back edges are those edges (u, v) connecting a vertex u to an ancestor v in a DFS tree. Self-loops, which may occur in directed graphs, are considered to be back edges.
  – v is GRAY
• Forward edges are those non-tree edges (u, v) connecting a vertex u to a descendant v in a DFS tree.
  – v is BLACK and d[u] < d[v]
• Cross edges are all other edges. They can go between vertices in the same tree, as long as one vertex is not an ancestor of the other, or they can go between vertices in different DFS trees.
  – v is BLACK and d[u] > d[v]
• In a depth-first search of an undirected graph G, every edge of G is either a tree edge or a back edge. (Theorem 22.10)
Topological Sort
• A topological sort of a directed acyclic graph (DAG) is a linear order of all its vertices such that if G contains an edge (u, v), then u appears before v in the ordering
  – If the graph is not acyclic, then no linear ordering is possible.
  – A topological sort can be viewed as an ordering of its vertices along a horizontal line so that all directed edges go from left to right
• DAGs are used in many applications to indicate precedence among events
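The DFS-based sort can be sketched as follows: run DFS and emit each vertex as it finishes, then reverse. The "getting dressed" DAG is a hypothetical example in the spirit of the pants-and-shoes remark later in this chapter:

```python
def topological_sort(adj):
    """Topological sort of a DAG: DFS, appending each vertex as it
    finishes; reversing gives decreasing finishing-time order."""
    visited, order = set(), []

    def visit(u):
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                visit(v)
        order.append(u)      # u finishes: all its descendants already placed

    for u in adj:
        if u not in visited:
            visit(u)
    return order[::-1]

# hypothetical precedence DAG: an edge (u, v) means u must come before v
deps = {'shirt': ['belt', 'tie'], 'tie': ['jacket'], 'belt': ['jacket'],
        'pants': ['belt', 'shoes'], 'jacket': [], 'shoes': []}
order = topological_sort(deps)
```

Every directed edge then goes left to right in `order`, which is exactly the horizontal-line picture described above.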
Topological Sort
• Θ(V+E)
Lemma – DAG acyclicity
• A directed graph G is acyclic if and only if a DFS of G yields no back edges
(⇐) Suppose that there is a back edge (u, v). Then vertex v is an ancestor of vertex u in the depth-first forest. There is thus a path from v to u in G, and the back edge (u, v) completes a cycle.
(⇒) Suppose that G contains a cycle c. We show that a DFS of G yields a back edge. Let v be the first vertex to be discovered in c, and let (u, v) be the preceding edge in c. At time d[v], the vertices of c form a path of white vertices from v to u. By the white-path theorem (Theorem 22.9), vertex u becomes a descendant of v in the depth-first forest. Therefore, (u, v) is a back edge.
Theorem – Topological sort (22.12)
• TOPOLOGICAL-SORT(G) produces a topological sort of a directed acyclic graph G
  – Suppose that DFS is run on a given DAG G to determine finishing times for its vertices. It suffices to show that for any pair of distinct vertices u, v, if there is an edge in G from u to v, then f[v] < f[u].
• The linear ordering corresponds to the (decreasing) finishing-time ordering
  – Consider any edge (u, v) explored by DFS(G). When this edge is explored, v cannot be gray (otherwise, (u, v) would be a back edge). Therefore v must be either white or black:
    • If v is white, v becomes a descendant of u, so f[v] < f[u] (ex. pants & shoes)
    • If v is black, it has already been finished, so f[v] has already been set: f[v] < f[u] (ex. belt & jacket)
Strongly Connected Components
• A strongly connected component of a directed graph G is a maximal set of vertices C ⊆ V such that for every pair of vertices u and v in C, we have both u ↝ v and v ↝ u; that is, vertices u and v are reachable from each other
• Use two depth-first searches, one on G and one on Gᵀ
  – Transpose of G: Gᵀ = (V, Eᵀ), where Eᵀ = {(u, v): (v, u) ∈ E}
• Θ(V+E)
• Component graph G^SCC = (V^SCC, E^SCC) is a DAG
  – one vertex for each component
  – (u, v) ∈ E^SCC if there exists at least one directed edge between the corresponding components
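The two-pass scheme above (DFS on G for finishing times, then DFS on Gᵀ in decreasing finishing-time order) is commonly known as Kosaraju's algorithm; a minimal sketch:

```python
def scc(adj):
    """Strongly connected components via two DFS passes:
    pass 1 records finishing order on G, pass 2 runs on the transpose."""
    order, visited = [], set()

    def dfs1(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs1(v)
        order.append(u)                # appended at finish time

    for u in adj:
        if u not in visited:
            dfs1(u)

    gt = {u: [] for u in adj}          # transpose: reverse every edge
    for u in adj:
        for v in adj[u]:
            gt[v].append(u)

    comps, assigned = [], set()

    def dfs2(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in gt[u]:
            if v not in assigned:
                dfs2(v, comp)

    for u in reversed(order):          # decreasing finishing time
        if u not in assigned:
            comp = []
            dfs2(u, comp)
            comps.append(comp)         # each tree of pass 2 is one SCC
    return comps
```

Both passes are plain DFS, so the whole computation is Θ(V+E), matching the bound on the slide.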
Strongly Connected Components Example

Strongly Connected Components Algorithm
Why does the strongly connected component method work?
• See CLRS (2-3 pages)
• http://www.cs.berkeley.edu/~vazirani/s99cs170/notes/lec12.pdf
…
Minimum Spanning Tree
• Definition: Given an undirected graph, and for each edge (u, v) ∈ E a weight w(u, v) specifying the cost to connect u and v, find an acyclic subset T ⊆ E that connects all of the vertices and whose total weight
  w(T) = Σ_{(u,v)∈T} w(u, v)
  is minimized
  – There may be more than one MST with the same weight
• Two classic algorithms, both O(E lg V) greedy algorithms:
  – Kruskal's algorithm
  – Prim's algorithm
Equal weights in left graph (a)
Growing a Minimum Spanning Tree (MST)
• Generic algorithm
  – Grow MST one edge at a time
  – Manage a set of edges A, maintaining the following loop invariant:
    • Prior to each iteration, A is a subset of some MST
  – At each iteration, we determine an edge (u, v) that can be added to A without violating this invariant
    • A ∪ {(u, v)} is also a subset of a MST
    • (u, v) is called a safe edge for A
GENERIC-MST
– Loop in lines 2-4 is executed |V| - 1 times
  • Any MST contains |V| - 1 edges
  • The execution time depends on how to find a safe edge
How to Find A Safe Edge?
• Theorem 23.1. Let A be a subset of E that is included in some MST, let (S, V-S) be any cut of G that respects A, and let (u, v) be a light edge crossing (S, V-S). Then edge (u, v) is safe for A
  – Cut (S, V-S): a partition of V
  – Crossing edge: one endpoint in S and the other in V-S
  – A cut respects a set A of edges if no edge in A crosses the cut
  – An edge is a light edge crossing a cut if its weight is the minimum of any edge crossing the cut
Illustration of Theorem 23.1
• A = {(a, b), (c, i), (h, g), (g, h)}
• S = {a, b, c, i, e}; V-S = {h, g, f, d}; many kinds of cuts satisfy the requirements of Theorem 23.1
• (c, f) is the light edge crossing (S, V-S) and will be a safe edge
Proof of Theorem 23.1
• Let T be a MST that includes A, and assume T does not contain the light edge (u, v), since if it does, we are done.
• Construct another MST T' that includes A ∪ {(u, v)} from T (next slide):
  – T' = T – {(x, y)} ∪ {(u, v)}
  – T' is also a MST since W(T') = W(T) – w(x, y) + w(u, v) ≤ W(T)
• (u, v) is actually a safe edge for A
  – Since A ⊆ T and (x, y) ∉ A, A ⊆ T'
  – A ∪ {(u, v)} ⊆ T'
Properties of GENERIC-MST
• As the algorithm proceeds, the set A is always acyclic
• G_A = (V, A) is a forest, and each of the connected components of G_A is a tree
• Any safe edge (u, v) for A connects distinct components of G_A, since A ∪ {(u, v)} must be acyclic
• Corollary 23.2. Let A be a subset of E that is included in some MST, and let C = (V_C, E_C) be a connected component (tree) in the forest G_A = (V, A). If (u, v) is a light edge connecting C to some other component in G_A, then (u, v) is safe for A
The Algorithms of Kruskal and Prim
• Kruskal's Algorithm
  – A is a forest
  – The safe edge added to A is always a least-weight edge in the graph that connects two distinct components
• Prim's Algorithm
  – A forms a single tree
  – The safe edge added to A is always a least-weight edge connecting the tree to a vertex not in the tree
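Kruskal's strategy can be sketched as follows; a minimal sketch using a union-find structure with path compression (the `(weight, u, v)` edge format is an assumption of this example):

```python
def kruskal(n, edges):
    """Kruskal's MST sketch: scan edges by increasing weight and add an
    edge iff its endpoints lie in distinct components of G_A, i.e. iff it
    is a safe edge by Corollary 23.2. Vertices are 0..n-1."""
    parent = list(range(n))          # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):    # least-weight edges first
        ru, rv = find(u), find(v)
        if ru != rv:                 # connects two distinct components
            parent[ru] = rv
            tree.append((u, v, w))
    return tree
```

With sorting dominating, the running time is O(E lg E) = O(E lg V), consistent with the bound stated earlier.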
Prim's Algorithm
• The edges in the set A always form a single tree
• The tree starts from an arbitrary root vertex r and grows until the tree spans all the vertices in V
• At each step, a light edge is added to the tree A that connects A to an isolated vertex of G_A = (V, A)
• Greedy, since the tree is augmented at each step with an edge that contributes the minimum amount possible to the tree's weight
Prim's Algorithm (Cont.)
• How to efficiently select the safe edge to be added to the tree?
  – Use a min-priority queue Q that stores all vertices not in the tree
    • Based on key[v], the minimum weight of any edge connecting v to a vertex in the tree
    • key[v] = ∞ if no such edge
• π[v] = parent of v in the tree
• A = {(v, π[v]): v ∈ V – {r} – Q}; finally Q = ∅
Key idea of Prim's algorithm

Select a vertex to be a tree-node
while (there are non-tree vertices)
{
    if (there is no edge connecting a tree node with a non-tree node)
        return "no spanning tree"
    select an edge of minimum weight between a tree node and a non-tree node
    add the selected edge and its new vertex to the tree
}
return tree

[Figure: example graph with vertices a, b, c, d, u, v, f and weighted edges]
Lines 1-5 initialize the min-heap (MH) to contain all vertices. Distances for all vertices, except r, are set to infinity. r is the starting vertex of the tree T; the T so far is empty.

Prim's Algorithm
1. for each u ∈ V
2.     do D[u] ← ∞
3. D[r] ← 0
4. MH ← make-heap(D, V, {})   // No edges
5. T ← ∅
6.
7. while MH ≠ ∅ do
8.     (u, e) ← MH.extractMin()       // add the closest vertex and edge to current T
9.     add (u, e) to T
10.    for each v ∈ Adjacent(u)       // get all adjacent vertices v of u
11.        do if v ∈ MH && w(u, v) < D[v]   // update D of each non-tree vertex adjacent to u
12.            then D[v] ← w(u, v)
13.                 MH.decreaseDistance(D[v], v, (u, v))   // store the current minimum weight edge and updated distance in the MH
14. return T   // T is a MST
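The pseudocode above can be sketched in Python with the standard-library heap; since `heapq` has no `decreaseDistance`, this sketch uses the common lazy-deletion substitute, pushing duplicate entries and skipping stale ones on pop (an assumption of this example, not part of the slides):

```python
import heapq

def prim(adj, r):
    """Prim's MST sketch; adj[u] = list of (v, weight) pairs.
    Lazy deletion replaces decreaseDistance: stale heap entries
    are discarded when popped. Overall O(E lg V)."""
    in_tree, tree = {r}, []
    heap = [(w, r, v) for v, w in adj[r]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)   # lightest edge leaving the tree
        if v in in_tree:
            continue                    # stale entry, skip
        in_tree.add(v)
        tree.append((u, v, w))          # safe edge: light edge across the cut
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return tree
```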
129
Properties of MST‐PRIM
• Prior to each iteration of the while loop of lines 6—11
– A = {(v, [v]): vV‐{r}‐Q} – The vertices already placed into the MST are those in V‐Q
130
– For all vertices v Q, if [v]NIL, then key[v] < and key[v] is the weight of a light edge (v, [v]) connecting v to some vertex already placed into the MST
• Line 7: identify a vertex uQ incident on a light edge crossing (V‐Q, Q) add u to V‐Q and (u, [u]) to A
• Lines 8—11: update key and of every vertex v adjacent to u but not in the tree
Illustration of MST-PRIM
u = a, adj[a] = {b, h}: π[b] = π[h] = a; key[b] = 4; key[h] = 8
u = b, adj[b] = {a, c, h}: π[h] = a; π[c] = b; key[h] = 8; key[c] = 8
u = c, adj[c] = {b, i, d, f}: π[d] = π[i] = π[f] = c; key[h] = 8; key[d] = 7; key[i] = 2; key[f] = 4
Performance of MST-PRIM
• Use a binary min-heap to implement the min-priority queue Q
  – BUILD-MIN-HEAP (line 5): O(V)
  – The body of the while loop is executed |V| times
    • EXTRACT-MIN: O(lg V)
  – The for loop in lines 8-11 is executed O(E) times altogether
    • Line 11: DECREASE-KEY operation: O(lg V)
  – Total performance = O(V lg V + E lg V) = O(E lg V)
• Use a Fibonacci heap to implement the min-priority queue Q
  – O(E + V lg V)
Single-Source Shortest Paths (Chapter 24)
Example: Predictive Mobile text Entry Messaging…
What was the message?
Problem Definition
• Given a weighted, directed graph G = (V, E) with weight function w: E → R. The weight of path p = <v0, v1, ..., vk> is the sum of the weights of its constituent edges:
  w(p) = Σ_{i=1..k} w(v_{i-1}, v_i)
• We define the shortest-path weight from u to v by
  δ(u, v) = min{ w(p) : p is a path from u to v }, if there is a path from u to v,
  δ(u, v) = ∞, otherwise.
• A shortest path from vertex u to vertex v is then defined as any path p with w(p) = δ(u, v)
Variants
• Single-source shortest-paths problem – greedy
  – Finds all the shortest paths to vertices reachable from a single source vertex s
• Single-destination shortest-paths problem
  – By reversing the direction of each edge in the graph, we can reduce this problem to a single-source problem
• Single-pair shortest-path problem
  – No algorithms for this problem are known that run asymptotically faster than the best single-source algorithm in the worst case
• All-pairs shortest-paths problem – dynamic programming
  – Can be solved faster than running the single-source shortest-path algorithm once for each vertex
Optimal Substructure of A Shortest Path
• Lemma 24.1 (Subpaths of shortest paths are shortest paths). Let p = <v1, v2, ..., vk> be a shortest path from vertex v1 to vk, and for any i and j such that 1 ≤ i ≤ j ≤ k, let p_ij = <v_i, v_{i+1}, ..., v_j> be the subpath of p from vertex v_i to v_j. Then p_ij is a shortest path from vertex v_i to v_j.
Negative-Weight Edges and Cycles
• A shortest path cannot contain a negative-weight cycle
• Of course, a shortest path cannot contain a positive-weight cycle either
Relaxation
• For each vertex v ∈ V, we maintain an attribute d[v], which is an upper bound on the weight of a shortest path from source s to v. We call d[v] a shortest-path estimate.
• π[v] is the predecessor of v on the shortest path.
Relaxation (Cont.)
• Relaxing an edge (u, v) consists of testing whether we can improve the shortest path found so far by going through u and, if so, updating d[v] and π[v] (justified by the triangle inequality)
Dijkstra's Algorithm
• Solves the single-source shortest-paths problem on a weighted, directed graph in which all edge weights are nonnegative
• Data structure
  – S: a set of vertices whose final shortest-path weights have already been determined
  – Q: a min-priority queue of the remaining vertices, keyed by their d values
• Idea
  – Repeatedly select the vertex u ∈ V – S (kept in Q) with the minimum shortest-path estimate, add u to S, and relax all edges leaving u
Dijkstra's Algorithm (Cont.)
Analysis of Dijkstra's Algorithm
• Correctness: Theorem 24.6 (loop invariant)
• Min-priority queue operations
  – INSERT (line 3)
  – EXTRACT-MIN (line 5)
  – DECREASE-KEY (line 8)
• Time analysis
  – Lines 4-8: while loop, O(V) iterations
  – Lines 7-8: for loop and relaxation, |E| in total
  – Running time depends on how the min-priority queue is implemented
    • Simple array: O(V² + E) = O(V²)
    • Binary min-heap: O((V+E) lg V)
    • Fibonacci min-heap: O(V lg V + E)
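The algorithm can be sketched with the standard-library binary heap; as `heapq` has no DECREASE-KEY, this sketch uses lazy deletion, pushing a fresh entry on every successful relaxation and skipping stale ones (an assumption of this example):

```python
import heapq
import math

def dijkstra(adj, s):
    """Dijkstra sketch; adj[u] = list of (v, w) pairs, all w >= 0.
    Returns shortest-path estimates d and predecessors pi."""
    d = {u: math.inf for u in adj}
    pi = {u: None for u in adj}
    d[s] = 0
    done = set()                    # the set S of finished vertices
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap) # minimum shortest-path estimate
        if u in done:
            continue                # stale entry, skip
        done.add(u)
        for v, w in adj[u]:         # relax every edge leaving u
            if du + w < d[v]:
                d[v] = du + w
                pi[v] = u
                heapq.heappush(heap, (d[v], v))
    return d, pi
```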
http://www.cs.utexas.edu/users/EWD/
• Edsger Wybe Dijkstra was one of the most influential members of computing science's founding generation. Among the domains in which his scientific contributions are fundamental are – algorithm design – programming languages – program design – operating systems – distributed processing – formal specification and verification – design of mathematical arguments
July 3, 2008 — IUCEE: Algorithms
Examples of shortest paths depending on start node
All pairs shortest paths
• Diameter of a graph (longest shortest path)
Transitive closure
• The transitive closure of a digraph G is a graph G' with the same vertices, and an edge between u and v in G' if there is a path from u to v in G
Transitive closure
• G[i][j] and G[j][k] => G[i][k]  (there exists a link via j)
• Repeatedly square the adjacency matrix: G * G
Complexity…
• V³ operations for each of G², G³, …, G^V
• O(V⁴)
• With repeated squaring: ceiling(log V) times V³
• Can we avoid so many passes?
Paths via 0; paths via 1 (including 0-1, 1-0); …
• How to further improve?
• Test for A[s][i] …
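One answer to the questions above is Warshall's algorithm: a single O(V³) pass that considers intermediate vertices 0, 1, …, V-1 in turn, with the "test for A[s][i]" shortcut skipping the inner loop entirely when i cannot reach k. A minimal sketch over a boolean reachability matrix:

```python
def transitive_closure(reach):
    """Warshall's algorithm sketch: after round k, reach[i][j] is True
    iff there is a path i -> j using only intermediate vertices 0..k.
    One O(V^3) pass, versus ceiling(log V) rounds of V^3 matrix squaring."""
    n = len(reach)
    for k in range(n):
        for i in range(n):
            if reach[i][k]:            # the "test for A[s][i]" shortcut:
                for j in range(n):     # inner loop runs only if i reaches k
                    if reach[k][j]:
                        reach[i][j] = True
    return reach
```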
Finding the modules
Jüri Reimand: GraphWeb. Genome Informatics, CSHL, Nov 1 2007
Public datasets for H. sapiens:
• IntAct: protein interactions (PPI), 18773 interactions
• IntAct: PPI via orthologs from IntAct, 6705 interactions
• MEM: gene expression similarity over 89 tumor datasets, 46286 interactions
• Transfac: gene regulation data, 5183 interactions
Module evaluation
• GO: Transforming growth factor beta signaling pw.; embryonic development, gastrulation; KEGG: Cell cycle, cancers, WNT pw.
• GO: Brain development; pigment granule; melanin metabolic process
• GO: JAK-STAT cascade, kinase inhibitor activity; insulin receptor signaling pw.; KEGG: Type II diabetes mellitus
Jüri Reimand: GraphWeb. Genome Informatics, CSHL, Nov 1 2007
MCL clustering algorithm
• Markov (Chain Monte Carlo) Clustering – Stijn van Dongen
  – http://www.micans.org/mcl/
• Random walks according to edge weights
• Follow the different paths according to their probability
• Regions that are traversed "often" form clusters
http://www.micans.org/mcl/intro.html
MAXIMUM FLOW
Max-Flow Min-Cut Theorem (Ford-Fulkerson's Algorithm)
What is Network Flow?
A flow network is a directed graph G = (V, E) such that each edge has a non-negative capacity c(u,v) ≥ 0.
Two distinguished vertices exist in G, namely:
• Source (denoted by s): in-degree of this vertex is 0.
• Sink (denoted by t): out-degree of this vertex is 0.
A flow in a network is an integer-valued function f defined on the edges of G satisfying 0 ≤ f(u,v) ≤ c(u,v) for every edge (u,v) in E.

• Each edge (u,v) has a non-negative capacity c(u,v).
• If (u,v) is not in E, assume c(u,v) = 0.
• We have source s and sink t.
• Assume that every vertex v in V is on some path from s to t.
Following is an illustration of a network flow:
c(s,v1) = 16, c(v1,s) = 0, c(v2,s) = 0, …
Conditions for Network Flow
For each edge (u,v) in E, the flow f(u,v) is a real-valued function that must satisfy the following 3 conditions:
• Capacity Constraint: ∀ u,v ∈ V, f(u,v) ≤ c(u,v)
• Skew Symmetry: ∀ u,v ∈ V, f(u,v) = -f(v,u)
• Flow Conservation: ∀ u ∈ V – {s,t}, Σ_{v∈V} f(u,v) = 0
The skew symmetry condition implies that f(u,u) = 0.
The Value of a Flow
The value of a flow is given by:
  |f| = Σ_{v∈V} f(s, v) = Σ_{v∈V} f(v, t)
The flow into a node is the same as the flow going out from the node, and thus the flow is conserved. Also, the total amount of flow from source s = total amount of flow into the sink t.
Example of a flow
s
1
2
t
10, 9 8,8
1,1
10,76, 6
Table illustrating flows and capacities across different edges of the graph above:
fs,1 = 9 , cs,1 = 10 (valid flow since 10 > 9)
fs,2 = 6 , cs,2 = 6 (valid flow since 6 ≥ 6)
f1,2 = 1 , c1,2 = 1 (valid flow since 1 ≥ 1)
f1,t = 8 , c1,t = 8 (valid flow since 8 ≥ 8)
f2,t = 7 , c2,t = 10 (valid flow since 10 > 7)
The flow across nodes 1 and 2 is also conserved, as flow in = flow out.
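The table's numbers can be verified directly. This short check (our own illustration) confirms the capacity constraints, conservation at nodes 1 and 2, and the flow value |f| = 15:

```python
# Edge flows and capacities from the slide's example.
flow = {('s','1'): 9, ('s','2'): 6, ('1','2'): 1, ('1','t'): 8, ('2','t'): 7}
cap  = {('s','1'): 10, ('s','2'): 6, ('1','2'): 1, ('1','t'): 8, ('2','t'): 10}

# Every edge respects its capacity.
assert all(flow[e] <= cap[e] for e in flow)

# Conservation at the internal nodes 1 and 2: inflow == outflow.
def inflow(v):  return sum(x for (u, w), x in flow.items() if w == v)
def outflow(v): return sum(x for (u, w), x in flow.items() if u == v)
assert inflow('1') == outflow('1') == 9
assert inflow('2') == outflow('2') == 7

# Value of the flow: everything leaving s equals everything entering t.
print(outflow('s'), inflow('t'))  # 15 15
```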
The Maximum Flow Problem
Given a graph G = (V,E), with:
xi,j = flow on edge (i,j)
ui,j = capacity of edge (i,j)
s = source node
t = sink node
In simple terms: maximize the s-to-t flow, while ensuring that the flow is feasible.
Maximize v
Subject to: Σj xij − Σj xji = 0 for each i ≠ s,t
Σj xsj = v
0 ≤ xij ≤ uij for all (i,j) ∈ E
Cuts of Flow Networks
A Cut in a network is a partition of V into S and T (T=V-S) such that s (source) is in S and t (target) is in T.
[Figure: example network with source s, sink t, and internal nodes 2–7; a dashed line marks a cut separating s from t. The edge-capacity labels did not survive extraction.]
Capacity of Cut (S,T)
c(S,T) = Σu∈S, v∈T c(u,v)
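A sketch of this sum in Python, reusing the small 5-edge example network from the earlier flow table (the helper name is our own):

```python
def cut_capacity(cap, S, T):
    """c(S,T): sum of c(u,v) over u in S, v in T (forward edges only)."""
    return sum(c for (u, v), c in cap.items() if u in S and v in T)

# The slide's 5-edge example network.
cap = {('s','1'): 10, ('s','2'): 6, ('1','2'): 1, ('1','t'): 8, ('2','t'): 10}
print(cut_capacity(cap, {'s'}, {'1', '2', 't'}))   # 16 = c(s,1)+c(s,2)
print(cut_capacity(cap, {'s', '1'}, {'2', 't'}))   # 15 = c(s,2)+c(1,2)+c(1,t)
```

Note that only forward (S-to-T) edges count toward the capacity of a cut; edges from T back to S are ignored.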
[Figure: the same example network with one particular cut marked.]
Capacity = 30
Min s-t Cut
A min s-t cut (also called a min cut) is a cut of minimum capacity.
[Figure: the same example network with the minimum s-t cut marked.]
Capacity = 28
Flow of Min Cut (Weak Duality)
Let f be a flow and let (S,T) be a cut. Then | f | ≤ CAP(S,T).

At a maximum flow there is a cut whose forward edges are all full (saturated) and whose backward edges are all empty; for that cut the flow equals the cut's capacity. The inequality | f | ≤ CAP(S,T), which holds for every cut, is what is referred to as weak duality.
Proof:
[Figure: a cut (S,T) with s ∈ S and t ∈ T; every unit of flow from s to t must cross an edge from S to T, so | f | ≤ CAP(S,T).]
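Weak duality can be checked exhaustively on the earlier 5-edge example: there are only four s-t cuts, and each has capacity at least the flow value 15. A sketch; the enumeration helper is our own:

```python
from itertools import combinations

# Weak duality on the slide's example: every s-t cut capacity >= |f| = 15.
cap = {('s','1'): 10, ('s','2'): 6, ('1','2'): 1, ('1','t'): 8, ('2','t'): 10}
flow_value = 15  # |f| from the earlier example

def cut_cap(S):
    # Capacity of the cut (S, V - S): forward edges leaving S.
    return sum(c for (u, v), c in cap.items() if u in S and v not in S)

# Enumerate every cut: s plus any subset of the internal nodes {1, 2}.
internal = ['1', '2']
for k in range(len(internal) + 1):
    for extra in combinations(internal, k):
        S = {'s', *extra}
        assert flow_value <= cut_cap(S)
        print(sorted(S), cut_cap(S))
```

The smallest value printed is 15, so this particular flow already matches a cut capacity and is therefore maximum.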
Methods
Max-Flow Min-Cut Theorem
• The Ford-Fulkerson Method
• The Preflow-Push Method
[Slide note (Mayank Joshi, 1/27/2008): the vertices in S are colored green, the vertices in T grey, and the edges from S to T whose capacity sum is minimum are colored pink.]
The Ford-Fulkerson Method
• Try to improve the flow, until we reach the maximum value of the flow
• The residual capacity of the network with a flow f is given by:
cf(u,v) = c(u,v) − f(u,v)

The residual capacity (rc) of an edge (i,j) equals c(i,j) − f(i,j) when (i,j) is a forward edge, and equals f(i,j) when (i,j) is a backward edge. Moreover, the residual capacity of an edge is always non-negative.
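A minimal sketch of this formula (our own helper names), using the slide's s→1 edge with capacity 10 and flow 9:

```python
def residual_capacity(cap, f, u, v):
    """c_f(u,v) = c(u,v) - f(u,v).
    For a forward edge this is the leftover room; for a backward edge
    (where c(u,v) = 0 and, by skew symmetry, f(u,v) = -f(v,u)) it
    equals f(v,u), the flow that could be pushed back."""
    return cap.get((u, v), 0) - f.get((u, v), 0)

cap = {('s', '1'): 10}
f   = {('s', '1'): 9, ('1', 's'): -9}   # skew symmetry
print(residual_capacity(cap, f, 's', '1'))  # 1  (forward: 10 - 9)
print(residual_capacity(cap, f, '1', 's'))  # 9  (backward: 0 - (-9))
```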
Original Network
[Figure: nodes s, 1, 2, t; edge labels capacity, flow: s→1: 10, 9; s→2: 6, 6; 1→2: 1, 1; 1→t: 8, 8; 2→t: 10, 7.]
Residual Network
[Figure: the corresponding residual capacities, e.g. s→1: 1 forward and 9 backward; 2→t: 3 forward and 7 backward.]
The Ford-Fulkerson Method
Begin
  x := 0;  // x is the flow
  create the residual network G(x);
  while there is some directed path from s to t in G(x) do
  begin
    let P be a path from s to t in G(x);
    ∆ := δ(P);
    send ∆ units of flow along P;
    update the r's;
  end
end  {the flow x is now maximum}
Augmenting Paths ( A Useful Concept )
Definition:
An augmenting path p is a simple path from s to t on a residual network that is an alternating sequence of vertices and edges of the form s, e1, v1, e2, v2, ..., ek, t in which no vertex is repeated, no forward edge is saturated, and no backward edge is free.
Characteristics of augmenting paths:
• We can put more flow from s to t through p.
• The edges of the residual network are the edges on which the residual capacity is positive.
• We call the maximum capacity by which we can increase the flow on p the residual capacity of p.
cf(p) = min{ cf(u,v) : (u,v) is on p }
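As a small sketch of this minimum (the residual-capacity values below are assumed purely for illustration):

```python
def path_residual_capacity(cf, path):
    """c_f(p) = min{ c_f(u,v) : (u,v) on p }, with the path given as a
    vertex sequence such as ['s', '1', 't']."""
    return min(cf[(u, v)] for u, v in zip(path, path[1:]))

# Illustrative residual capacities (assumed values).
cf = {('s', '1'): 1, ('1', 't'): 0, ('1', '2'): 1, ('2', 't'): 3}
print(path_residual_capacity(cf, ['s', '1', '2', 't']))  # 1
```

A path containing a zero-residual edge (such as s, 1, t here) has residual capacity 0, so it is not an augmenting path.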
The Ford-Fulkerson Algorithm

FORDFULKERSON(G,E,s,t)
  FOREACH e ∈ E
    f(e) ← 0
  Gf ← residual graph
  WHILE (there exists augmenting path P in Gf)
    f ← AUGMENT(f, P)
    update Gf
  ENDWHILE
  RETURN f

AUGMENT(f, P)
  b ← bottleneck(P)   // minimum residual capacity along P
  FOREACH e ∈ P
    IF (e ∈ E)
      // forward arc
      f(e) ← f(e) + b
    ELSE
      // backward arc
      f(eR) ← f(eR) − b
  RETURN f
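A runnable version of this pseudocode, as a sketch: it stores residual capacities in a dictionary and finds augmenting paths with BFS (i.e. the Edmonds-Karp flavour of Ford-Fulkerson; the slides leave the path search unspecified). All names are our own.

```python
from collections import deque

def ford_fulkerson(cap, s, t):
    """Maximum flow value via augmenting paths.
    cap: dict {(u, v): capacity} describing the network."""
    # Residual capacities; every arc also gets its reverse with capacity 0.
    cf = dict(cap)
    for (u, v) in cap:
        cf.setdefault((v, u), 0)

    def find_augmenting_path():
        # BFS from s in the residual graph, using only arcs with cf > 0.
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            if u == t:
                break
            for (x, y), r in cf.items():
                if x == u and r > 0 and y not in parent:
                    parent[y] = u
                    q.append(y)
        if t not in parent:
            return None
        path, v = [], t
        while v is not None:
            path.append(v)
            v = parent[v]
        return path[::-1]

    value = 0
    while (p := find_augmenting_path()) is not None:
        edges = list(zip(p, p[1:]))
        b = min(cf[e] for e in edges)    # bottleneck(P)
        for (u, v) in edges:             # augment along P
            cf[(u, v)] -= b
            cf[(v, u)] += b
        value += b
    return value

# The slide's 5-edge example: the max flow should equal 15.
cap = {('s','1'): 10, ('s','2'): 6, ('1','2'): 1, ('1','t'): 8, ('2','t'): 10}
print(ford_fulkerson(cap, 's', 't'))  # 15
```

Updating the reverse arc `cf[(v, u)] += b` is what lets later iterations push flow backwards, matching the backward-arc case of AUGMENT.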
Proof of correctness of the algorithm
Lemma: At each iteration all residual capacities are integers.
Proof: It's true at the beginning. Assume it's true after the first k−1 augmentations, and consider augmentation k along path P.
The residual capacity ∆ of P is the smallest residual capacity on P, which is integral.
After updating, we modify the residual capacities by 0 or ∆, and thus residual capacities stay integers.
Theorem: Ford-Fulkerson’s algorithm is finite
Proof: The capacity of each augmenting path is at least 1. The augmentation reduces the residual capacity of some edge (s,j) and doesn't increase the residual capacity of any edge (s,i). So the sum of residual capacities of edges out of s keeps decreasing, and is bounded below by 0. The number of augmentations is O(nC), where C is the largest capacity in the network.
When is the flow optimal ?
A flow f is maximum flow in G if :
(1) The residual network Gf contains no augmenting paths.
(2) | f | = c(S,T) for some cut (S,T) (a min cut).
Proof:
(1) Suppose there is an augmenting path in Gf. This implies that the flow f is not maximum, because there is a path through which more flow can be pushed. Thus if flow f is maximum, the residual network Gf has no augmenting paths.

(2) Let v = Fx(S,T) be the flow from s to t. By assumption v = CAP(S,T). By weak duality, the maximum flow is at most CAP(S,T). Thus the flow is maximum.
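Condition (2) can be seen on the running example: at maximum flow, the set S of vertices reachable from s in the residual network defines a minimum cut. A sketch; the residual values below were computed by hand from the example's final flow (f(s,1)=9, f(s,2)=6, f(1,2)=1, f(1,t)=8, f(2,t)=7), and the helper name is our own:

```python
# Final residual capacities of the slide's example at maximum flow.
cf = {('s','1'): 1, ('1','s'): 9, ('s','2'): 0, ('2','s'): 6,
      ('1','2'): 0, ('2','1'): 1, ('1','t'): 0, ('t','1'): 8,
      ('2','t'): 3, ('t','2'): 7}

def reachable(cf, s):
    """Vertices reachable from s along residual arcs with cf > 0.
    At maximum flow this set S (with T = V - S) gives a min cut."""
    S, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for (x, y), r in cf.items():
            if x == u and r > 0 and y not in S:
                S.add(y)
                stack.append(y)
    return S

S = reachable(cf, 's')
print(sorted(S))  # ['1', 's'] -- so the min cut is ({s,1}, {2,t})
```

The cut ({s,1}, {2,t}) has capacity c(s,2)+c(1,2)+c(1,t) = 6+1+8 = 15, equal to | f |, confirming optimality.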
The Ford-Fulkerson Augmenting Path Algorithm for the Maximum Flow Problem
15.082 and 6.855J (MIT OCW)
Ford-Fulkerson Max Flow (worked example)

[Figure sequence: a small network with source s, sink t, and internal nodes 2, 3, 4, 5, stepped through the algorithm; the individual edge capacities did not survive extraction. The slide captions are:]
• This is the original network, plus reversals of the arcs.
• This is the original network, and the original residual network.
• Find any s-t path in G(x).
• Determine the capacity ∆ of the path. Send ∆ units of flow in the path. Update residual capacities.
• Find any s-t path. (These two steps repeat for each augmenting path found.)
• There is no s-t path in the residual network. This flow is optimal.
• These are the nodes that are reachable from node s.
• Here is the optimal flow.
Converting Matching to Network Flow
[Figure: a bipartite matching instance converted into a flow network by adding a source s and a sink t.]
July 3, 2008, IUCEE: Algorithms