11.03.2009

Advanced Algorithmics (4AP / MTAT.03.238): Graphs
Jaak Vilo, 2009 Spring

Chapter 22: Elementary Graph Algorithms
http://www.cc.nctu.edu.tw/~claven/course/Algorithm/

A Simple Metabolic Pathway

Shoshanna Wodak, Jacques van Helden

Hughes, T. R. et al: “Functional Discovery via a Compendium of Expression Profiles”, Cell 102 (2000), 109-126.

Green arrows: upregulation. Red arrows: downregulation. The thickness of an arrow represents the certainty of the direction (up/down).

A complete graph. Filter: choose a list of genes (MATING, marked in red), then filter for these genes plus their neighbouring genes from the graph.

[Figure: mutation network, threshold = 4; the gene labels of the individual nodes are omitted here.]


[Figure: mutation network, threshold = 2; node gene labels omitted.]

[Figure: probability network; groups of genes that are more interconnected are underlaid in green. Node gene labels omitted.]

The genes are coloured according to their annotation in YPD ("cellular role"). The genes that are more interconnected are involved in the same cellular processes, such as mating behaviour (mat, green), amino acid metabolism (aam, red), the COS gene family (cos, light blue), mitochondrial function (mitochondrial, dark blue), the ribosome (ribo, purple), and a group of genes of unknown function (unknown, grey).


CS4335 Design and Analysis of Algorithms / Wang Lusheng

Find a shortest path from station A to station B.
– This needs serious thinking to get a correct algorithm.

[Figure: a map with stations A and B.]

Introduction

• G = (V, E)
  – V = vertex set (nodes)
  – E = edge set (arcs)
• Graph representation
  – Adjacency list
  – Adjacency matrix
• Graph search
  – Breadth‐first search (BFS)
  – Depth‐first search (DFS)
• Topological sort
• Strongly connected components

[Figures: an undirected graph and a directed graph.]

http://www.cc.nctu.edu.tw/~claven/course/Algorithm/

Graphs

• Set of nodes, |V| = n
• Set of edges, |E| = m
  – Undirected edges/graph: pairs of nodes {v, w}
  – Directed edges/graph: pairs of nodes (v, w)
• Set of neighbours of v: the set of nodes connected to v by an edge (directed: in‐neighbours, out‐neighbours)
• Degree of a node: the number of its neighbours
• Path: a sequence of nodes such that every two consecutive nodes constitute an edge
• Length of a path: number of nodes minus 1
• Distance between two nodes: the length of a shortest path between these nodes
• Diameter: the longest distance in the graph

Dariusz Kowalski

Lines, cycles, trees, cliques

[Figures: a line, a cycle, a clique, and a tree.]

Dariusz Kowalski


A joke:

• The boss wants to produce programs to solve the following two problems:
  – Euler circuit problem: given a graph G, find a way to go through each edge exactly once.
  – Hamilton circuit problem: given a graph G, find a way to go through each vertex exactly once.
• The two problems seem to be very similar.
• Person A takes the first problem and person B takes the second.
• Outcome: Person A quickly completes the program, whereas person B works 24 hours per day and is fired after a few months.

Euler Circuit: the original Königsberg bridge problem

[Figure.]

Hamilton Circuit

[Figure.]

A joke (continued):

• Why? Nobody in the company has taken CS4335.
• Explanation:
  – The Euler circuit problem can be easily solved in polynomial time.
  – The Hamilton circuit problem is proved to be NP‐hard.
  – So far, nobody in the world has given a polynomial‐time algorithm for an NP‐hard problem.
  – Conjecture: there does not exist a polynomial‐time algorithm for this problem.


Complete graph

• Every node is connected to every other node
• Clique – a fully connected subgraph of a graph
• How many different subgraphs does a complete graph have?

Subgraph

• Subset of vertices: V′ ⊆ V (of size m)
• Subset of edges: E′ ⊆ E, s.t. {u, v} ∈ E′ only if u and v are both in V′
• How many?
  – 2^m different subsets of vertices (= many!)
  – Likewise, the edge set can be any subset of the edges between the chosen vertices…
• The number of different possible graphs of size m, n is huge

Representation of Graphs

• Adjacency list: Θ(V + E) space
  – Preferred for sparse graphs: |E| << |V|²
  – Adj[u] contains all the vertices v such that there is an edge (u, v) ∈ E
  – Weighted graph: w(u, v) is stored with vertex v in Adj[u]
  – No quick way to determine whether a given edge is present in the graph
• Adjacency matrix: Θ(V²) space
  – Preferred for dense graphs
  – Symmetric for an undirected graph
  – Weighted graph: store w(u, v) in the (u, v) entry
  – Easy to determine whether a given edge is present in the graph
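Both representations can be sketched in a few lines. A minimal illustration; the example graph (vertices 1..5 and its edge list) is an assumption for demonstration:

```python
# Example undirected graph (assumed for illustration): vertices 1..5.
edges = [(1, 2), (1, 5), (2, 5), (2, 3), (3, 4), (4, 5)]
n = 5

# Adjacency list: O(V + E) space, preferred for sparse graphs.
adj = {v: [] for v in range(1, n + 1)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)                      # undirected: store both directions

# Adjacency matrix: O(V^2) space; edge lookup is O(1).
matrix = [[0] * (n + 1) for _ in range(n + 1)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1       # symmetric for undirected graphs

print(adj[2])                             # neighbours of vertex 2
print(matrix[1][5])                       # 1: edge {1,5} is present
```

The trade-off stated above is visible directly: `matrix[u][v]` answers edge queries in constant time, while `adj[u]` must be scanned.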

Representation For An Undirected Graph

[Figure.]

Representation For A Directed Graph

[Figure: the adjacency matrix needs only 1 bit per entry.]

11.03.2009

9

Breadth‐First Search (BFS)

• Graph search: given a source vertex s, explore the edges of G to discover every vertex that is reachable from s
  – Compute the distance (smallest number of edges) from s to each reachable vertex
  – Produce a breadth‐first tree with root s that contains all reachable vertices
  – Compute the shortest path from s to each reachable vertex
• BFS discovers all vertices at distance k from s before discovering any vertices at distance k + 1

Data Structure for BFS

• Adjacency list
• color[u] for each vertex
  – WHITE if u has not been discovered
  – GRAY if u has been discovered but has some adjacent white vertices
  – BLACK if u and all its adjacent vertices have been discovered
• The gray vertices form the frontier between discovered and undiscovered vertices
• d[u] for the distance from the source s to u
• π[u] for the predecessor of u
• FIFO queue Q to manage the set of gray vertices
  – Q stores all the gray vertices

Initialization: set up s and initialize Q.

Main loop: explore all the vertices adjacent to u and update d, π, and Q.

11.03.2009

10

Analysis of BFS

• O(V + E)
  – Each vertex is enqueued (O(1)) at most once → O(V)
  – Each adjacency list is scanned at most once → O(E)
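The BFS described above can be sketched as follows. A minimal version in which an undiscovered vertex plays the role of WHITE (membership in `d` stands for discovery); the example graph is an assumption:

```python
from collections import deque

def bfs(adj, s):
    """Breadth-first search from s; returns (d, pi) with the distance and
    predecessor of every vertex reachable from s."""
    d = {s: 0}
    pi = {s: None}
    q = deque([s])                # Q holds the gray frontier, FIFO order
    while q:
        u = q.popleft()
        for v in adj.get(u, []):
            if v not in d:        # v is white: discover it
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
    return d, pi

# Assumed example graph for illustration.
adj = {'s': ['w', 'r'], 'w': ['s', 't', 'x'], 'r': ['s', 'v'],
       't': ['w', 'u'], 'x': ['w', 'u'], 'u': ['t', 'x'], 'v': ['r']}
print(bfs(adj, 's')[0]['u'])      # distance from s to u
```

Each vertex enters the queue at most once and each adjacency list is scanned once, matching the O(V + E) bound above.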

Shortest path

• Print out the vertices on a shortest path from s to v

PRINT‐PATH Illustration

PRINT-PATH(G, s, u)
  PRINT-PATH(G, s, t)
    PRINT-PATH(G, s, w)
      PRINT-PATH(G, s, s)
      print s
    print w
  print t
print u

Output: s w t u
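The recursion above unwinds the predecessor chain back to the source before printing. A minimal sketch; the predecessor map is an assumption chosen to match the illustration:

```python
def print_path(pi, s, v, out):
    """Recursive PRINT-PATH: emit the vertices on the s-to-v path by
    following predecessors back to s, then printing on the way out."""
    if v == s:
        out.append(s)
    elif pi.get(v) is None:
        out.append("no path from %s to %s exists" % (s, v))
    else:
        print_path(pi, s, pi[v], out)
        out.append(v)

# Predecessor map assumed to match the illustration: s -> w -> t -> u.
pi = {'s': None, 'w': 's', 't': 'w', 'u': 't'}
path = []
print_path(pi, 's', 'u', path)
print(' '.join(path))   # s w t u
```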

11.03.2009

11

11.03.2009

12

Depth‐First Search (DFS)

• DFS: search deeper in the graph whenever possible
  – Edges are explored out of the most recently discovered vertex v that still has unexplored edges leaving it
  – When all of v's edges have been explored (v is finished), the search backtracks to explore edges leaving the vertex from which v was discovered
  – This process continues until we have discovered all the vertices that are reachable from the original source vertex
  – If any undiscovered vertices remain, then one of them is selected as a new source and the search is repeated from that source
  – The entire process is repeated until all vertices are discovered
• DFS will create a forest of DFS‐trees


Data Structure for DFS

• Adjacency list
• color[u] for each vertex
  – WHITE if u has not been discovered
  – GRAY if u is discovered but not finished
  – BLACK if u is finished
• Timestamps: 1 ≤ d[u] < f[u] ≤ 2|V|
  – d[u] records when u is first discovered (and grayed)
  – f[u] records when the search finishes examining u's adjacency list (and blackens u)
• π[u] for the predecessor of u
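The colour and timestamp bookkeeping above can be sketched directly. A minimal recursive version; the example graph is an assumption (shaped like the small directed graphs used in CLRS figures):

```python
def dfs(adj):
    """DFS over all vertices with discovery/finishing timestamps and the
    WHITE/GRAY/BLACK colouring described above."""
    color = {u: 'WHITE' for u in adj}
    d, f, pi = {}, {}, {u: None for u in adj}
    time = [0]                        # mutable counter shared by visits

    def visit(u):
        time[0] += 1
        d[u] = time[0]                # u discovered: gray it
        color[u] = 'GRAY'
        for v in adj[u]:
            if color[v] == 'WHITE':
                pi[v] = u
                visit(v)
        color[u] = 'BLACK'            # adjacency list fully examined
        time[0] += 1
        f[u] = time[0]

    for u in adj:                     # restart from every undiscovered vertex
        if color[u] == 'WHITE':
            visit(u)
    return d, f, pi

# Assumed example graph for illustration.
adj = {'u': ['v', 'x'], 'v': ['y'], 'x': ['v'], 'y': ['x'],
       'w': ['y', 'z'], 'z': ['z']}
d, f, pi = dfs(adj)
print(d['u'], f['u'])   # 1 8
```

Note how every timestamp lies between 1 and 2|V|, as the slide states, since each vertex consumes exactly two ticks.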


Properties of DFS

• Time complexity: Θ(V + E)
  – Loops on lines 1–3 and 5–7 of DFS: Θ(V)
  – DFS‐VISIT
    • Called exactly once for each vertex
    • Loops on lines 4–7 for a vertex v: |Adj[v]|
    • Total time = Σ_{v∈V} |Adj[v]| = Θ(E)
• DFS results in a forest of trees
• Discovery and finishing times have parenthesis structure


DFS uses a stack; BFS uses a queue.

Randomised search: use a randomized queue.

Another Example of DFS


Parenthesis theorem

In any depth‐first search of a (directed or undirected) graph G = (V, E), for any two vertices u and v, exactly one of the following three conditions holds:

1. the intervals [d[u], f[u]] and [d[v], f[v]] are entirely disjoint, and neither u nor v is a descendant of the other in the depth‐first forest,
2. the interval [d[u], f[u]] is contained entirely within the interval [d[v], f[v]], and u is a descendant of v in a depth‐first tree, or
3. the interval [d[v], f[v]] is contained entirely within the interval [d[u], f[u]], and v is a descendant of u in a depth‐first tree.


Proof

We begin with the case in which d[u] < d[v].

• There are two subcases to consider, according to whether d[v] < f[u] or not.
• The first subcase occurs when d[v] < f[u], so v was discovered while u was still gray. This implies that v is a descendant of u. Moreover, since v was discovered more recently than u, all of its outgoing edges are explored, and v is finished, before the search returns to and finishes u. In this case, therefore, the interval [d[v], f[v]] is entirely contained within the interval [d[u], f[u]].
• In the other subcase, f[u] < d[v], and inequality (22.2) implies that the intervals [d[u], f[u]] and [d[v], f[v]] are disjoint.
• Because the intervals are disjoint, neither vertex was discovered while the other was gray, and so neither vertex is a descendant of the other.
• The case in which d[v] < d[u] is similar, with the roles of u and v reversed in the above argument.

Classification of Edges

• Tree edges are edges in the DFS forest. Edge (u, v) is a tree edge if v was first discovered by exploring edge (u, v).
  – v is WHITE
• Back edges are those edges (u, v) connecting a vertex u to an ancestor v in a DFS tree. Self‐loops, which may occur in directed graphs, are considered to be back edges.
  – v is GRAY
• Forward edges are those non‐tree edges (u, v) connecting a vertex u to a descendant v in a DFS tree.
  – v is BLACK and d[u] < d[v]
• Cross edges are all other edges. They can go between vertices in the same tree, as long as one vertex is not an ancestor of the other, or they can go between vertices in different DFS trees.
  – v is BLACK and d[u] > d[v]
• In a depth‐first search of an undirected graph G, every edge of G is either a tree edge or a back edge. (Theorem 22.10)


Topological Sort

• A topological sort of a directed acyclic graph (DAG) is a linear order of all its vertices such that if G contains an edge (u, v), then u appears before v in the ordering
  – If the graph is not acyclic, then no linear ordering is possible
  – A topological sort can be viewed as an ordering of the vertices along a horizontal line so that all directed edges go from left to right
• DAGs are used in many applications to indicate precedence among events
• Running time: Θ(V + E)
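The DFS-based topological sort (insert each vertex at the front of the order as it finishes) can be sketched as follows; the "getting dressed" style DAG is an assumption for illustration:

```python
def topological_sort(adj):
    """Topological sort of a DAG via DFS finishing times, Theta(V + E).
    Sketch only: assumes adj really is acyclic."""
    visited = set()
    order = []

    def visit(u):
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                visit(v)
        order.append(u)       # u finished: all its descendants are behind it

    for u in adj:
        if u not in visited:
            visit(u)
    return order[::-1]        # reverse finishing order = topological order

# Assumed precedence DAG (edges point from earlier to later tasks).
adj = {'undershorts': ['pants'], 'pants': ['shoes', 'belt'],
       'socks': ['shoes'], 'shoes': [], 'belt': ['jacket'],
       'shirt': ['belt'], 'jacket': []}
order = topological_sort(adj)
print(order.index('pants') < order.index('belt'))   # True
```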


Lemma – acyclicity

• A directed graph G is acyclic if and only if a DFS of G yields no back edges

Suppose that there is a back edge (u, v). Then vertex v is an ancestor of vertex u in the depth‐first forest. There is thus a path from v to u in G, and the back edge (u, v) completes a cycle.

Conversely, suppose that G contains a cycle c. We show that a DFS of G yields a back edge. Let v be the first vertex to be discovered in c, and let (u, v) be the preceding edge in c. At time d[v], the vertices of c form a path of white vertices from v to u. By the white‐path theorem (Theorem 22.9), vertex u becomes a descendant of v in the depth‐first forest. Therefore, (u, v) is a back edge.

Theorem: Topological sort (22.12)

• TOPOLOGICAL‐SORT(G) produces a topological sort of a directed acyclic graph G
  – Suppose that DFS is run on a given DAG G to determine finishing times for its vertices. It suffices to show that for any pair of distinct vertices u, v, if there is an edge in G from u to v, then f[v] < f[u].
• The linear ordering corresponds to the ordering by finishing time
  – Consider any edge (u, v) explored by DFS(G). When this edge is explored, v cannot be gray (otherwise, (u, v) would be a back edge). Therefore v must be either white or black.
    • If v is white, v becomes a descendant of u, so f[v] < f[u] (e.g. pants & shoes)
    • If v is black, it has already been finished, so f[v] has already been set → f[v] < f[u] (e.g. belt & jacket)

Strongly Connected Components

• A strongly connected component of a directed graph G is a maximal set of vertices C ⊆ V such that for every pair of vertices u and v in C, we have both u ⇝ v and v ⇝ u; that is, vertices u and v are reachable from each other
• Use two depth‐first searches, one on G and one on Gᵀ
  – Transpose of G: Gᵀ = (V, Eᵀ), where Eᵀ = {(u, v) : (v, u) ∈ E}
• Running time: Θ(V + E)
• Component graph G^SCC = (V^SCC, E^SCC) is a DAG
  – one vertex for each component
  – (u, v) ∈ E^SCC if there exists at least one directed edge between the corresponding components
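The two-DFS method above (one pass on G for finishing times, one pass on the transpose Gᵀ in decreasing finishing order) can be sketched as follows; the small example graph is an assumption:

```python
def strongly_connected_components(adj):
    """Sketch of the two-DFS (Kosaraju-style) method: DFS on G to get a
    finishing order, then DFS on the transpose G^T in reverse order."""
    order, seen = [], set()

    def dfs1(u):
        seen.add(u)
        for v in adj.get(u, []):
            if v not in seen:
                dfs1(v)
        order.append(u)                    # records finishing-time order

    for u in adj:
        if u not in seen:
            dfs1(u)

    gt = {u: [] for u in adj}              # transpose: reverse every edge
    for u in adj:
        for v in adj[u]:
            gt.setdefault(v, []).append(u)

    comps, seen = [], set()
    for u in reversed(order):              # decreasing finishing times
        if u not in seen:
            comp, stack = [], [u]
            seen.add(u)
            while stack:                   # one DFS tree of G^T = one SCC
                x = stack.pop()
                comp.append(x)
                for v in gt.get(x, []):
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
            comps.append(sorted(comp))
    return comps

# Assumed example: a 3-cycle a->b->c->a plus a sink d.
adj = {'a': ['b'], 'b': ['c'], 'c': ['a', 'd'], 'd': []}
print(strongly_connected_components(adj))   # [['a', 'b', 'c'], ['d']]
```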

Strongly Connected Components Example

Strongly Connected Components Algorithm


Why does the strongly connected component method work?

• See CLRS (2–3 pages)
• http://www.cs.berkeley.edu/~vazirani/s99cs170/notes/lec12.pdf

Minimum Spanning Tree

• Definition: given an undirected graph where each edge (u, v) ∈ E has a weight w(u, v) specifying the cost to connect u and v, find an acyclic subset T ⊆ E that connects all of the vertices and whose total weight

  w(T) = Σ_{(u,v)∈T} w(u, v)

  is minimized
  – There may be more than one MST with the same weight
• Two classic greedy algorithms, O(E lg V):
  – Kruskal's algorithm
  – Prim's algorithm

[Figure: equal weights in the left graph (a).]


Growing a Minimum Spanning Tree (MST)

• Generic algorithm
  – Grow the MST one edge at a time
  – Manage a set of edges A, maintaining the following loop invariant:
    • Prior to each iteration, A is a subset of some MST
  – At each iteration, we determine an edge (u, v) that can be added to A without violating this invariant
    • A ∪ {(u, v)} is also a subset of some MST
    • (u, v) is called a safe edge for A

GENERIC‐MST

• The loop in lines 2–4 is executed |V| − 1 times
  – Any MST contains |V| − 1 edges
  – The execution time depends on how to find a safe edge

How to Find a Safe Edge?

• Theorem 23.1. Let A be a subset of E that is included in some MST, let (S, V−S) be any cut of G that respects A, and let (u, v) be a light edge crossing (S, V−S). Then edge (u, v) is safe for A.
  – Cut (S, V−S): a partition of V
  – Crossing edge: one endpoint in S and the other in V−S
  – A cut respects a set A of edges if no edge in A crosses the cut
  – A light edge crossing a cut is one whose weight is the minimum of any edge crossing the cut

Illustration of Theorem 23.1

• A = {(a, b), (c, i), (h, g), (g, h)}
• S = {a, b, c, i, e}; V−S = {h, g, f, d}; many kinds of cuts satisfy the requirements of Theorem 23.1
• (c, f) is the light edge crossing (S, V−S) and will be a safe edge

Proof of Theorem 23.1

• Let T be an MST that includes A, and assume T does not contain the light edge (u, v), since if it does, we are done.
• Construct another MST T′ that includes A ∪ {(u, v)} from T (next slide):
  – T′ = T − {(x, y)} ∪ {(u, v)}
  – T′ is also an MST, since W(T′) = W(T) − w(x, y) + w(u, v) ≤ W(T)
• (u, v) is actually a safe edge for A:
  – Since A ⊆ T and (x, y) ∉ A, we have A ⊆ T′
  – A ∪ {(u, v)} ⊆ T′


Properties of GENERIC‐MST

• As the algorithm proceeds, the set A is always acyclic
• G_A = (V, A) is a forest, and each connected component of G_A is a tree
• Any safe edge (u, v) for A connects distinct components of G_A, since A ∪ {(u, v)} must be acyclic
• Corollary 23.2. Let A be a subset of E that is included in some MST, and let C = (V_C, E_C) be a connected component (tree) in the forest G_A = (V, A). If (u, v) is a light edge connecting C to some other component in G_A, then (u, v) is safe for A.

The Algorithms of Kruskal and Prim

• Kruskal's Algorithm
  – A is a forest
  – The safe edge added to A is always a least‐weight edge in the graph that connects two distinct components
• Prim's Algorithm
  – A forms a single tree
  – The safe edge added to A is always a least‐weight edge connecting the tree to a vertex not in the tree
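Kruskal's rule above (scan edges by increasing weight, keep an edge only if it joins two distinct components) can be sketched with a union-find structure; the example graph is an assumption:

```python
def kruskal(n, edges):
    """Kruskal sketch over vertices 0..n-1; edges is a list of (w, u, v).
    An edge is kept iff its endpoints lie in different components."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):           # increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # safe edge: joins two trees
            parent[ru] = rv
            mst.append((u, v, w))
    return mst

# Assumed example graph: 4 vertices, edges given as (weight, u, v).
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(sum(w for _, _, w in kruskal(4, edges)))   # total MST weight: 6
```

With the edges pre-sorted, the union-find operations are nearly constant time, giving the O(E lg V) total stated above (dominated by the sort).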

Prim's Algorithm

• The edges in the set A always form a single tree
• The tree starts from an arbitrary root vertex r and grows until it spans all the vertices in V
• At each step, a light edge is added to the tree A that connects A to an isolated vertex of G_A = (V, A)
• Greedy, since the tree is augmented at each step with an edge that contributes the minimum amount possible to the tree's weight

Prim's Algorithm (Cont.)

• How to efficiently select the safe edge to be added to the tree?
  – Use a min‐priority queue Q that stores all vertices not in the tree
    • Based on key[v], the minimum weight of any edge connecting v to a vertex in the tree
    • key[v] = ∞ if no such edge exists
  – π[v] = parent of v in the tree
  – A = {(v, π[v]) : v ∈ V − {r} − Q}; finally Q is empty


Key idea of Prim's algorithm

select a vertex to be a tree node
while (there are non‐tree vertices) {
    if (there is no edge connecting a tree node with a non‐tree node)
        return "no spanning tree"
    select an edge of minimum weight between a tree node and a non‐tree node
    add the selected edge and its new vertex to the tree
}
return tree

[Figure: an example weighted graph; edge‐weight labels omitted.]

Prim's Algorithm (pseudocode)

Lines 1–5 initialize the min‐heap (MH) to contain all vertices; distances for all vertices except r are set to infinity. r is the starting vertex of the tree T; the T so far is empty.

1.  for each u ∈ V
2.      do D[u] ← ∞
3.  D[r] ← 0
4.  MH ← make‐heap(D, V, {})                // no edges yet
5.  T ← ∅
6.
7.  while MH ≠ ∅ do
8.      (u, e) ← MH.extractMin()             // add the closest vertex and edge to the current T
9.      add (u, e) to T
10.     for each v ∈ Adjacent(u)             // update D of each non‐tree vertex adjacent to u
11.         do if v ∈ MH && w(u, v) < D[v]
12.             then D[v] ← w(u, v)          // store the minimum‐weight edge and updated distance in the MH
13.                  MH.decreaseDistance(D[v], v, (u, v))
14. return T   // T is a MST
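The pseudocode above can be sketched in runnable form. Python's `heapq` has no `decreaseDistance`, so this sketch uses the common lazy-deletion substitute (push duplicates, skip stale entries on pop); the example graph is an assumption:

```python
import heapq

def prim(adj, r):
    """Prim sketch: grow one tree from root r, always taking the lightest
    edge crossing from the tree to a non-tree vertex."""
    in_tree, tree, total = {r}, [], 0
    heap = [(w, r, v) for v, w in adj[r]]   # (weight, tree end, outside end)
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)       # lightest crossing edge
        if v in in_tree:
            continue                        # stale entry: v already in tree
        in_tree.add(v)
        tree.append((u, v, w))
        total += w
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return tree, total

# Assumed undirected weighted example graph (adjacency lists of (v, w)).
adj = {'a': [('b', 4), ('h', 8)], 'b': [('a', 4), ('c', 8)],
       'c': [('b', 8), ('h', 2)], 'h': [('a', 8), ('c', 2)]}
print(prim(adj, 'a')[1])   # 14
```

The lazy-deletion variant keeps the same O(E lg V) bound as the decrease-key version, at the cost of a larger heap.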

Properties of MST‐PRIM

• Prior to each iteration of the while loop of lines 6–11:
  – A = {(v, π[v]) : v ∈ V − {r} − Q}
  – The vertices already placed into the MST are those in V − Q
  – For all vertices v ∈ Q, if π[v] ≠ NIL, then key[v] < ∞ and key[v] is the weight of a light edge (v, π[v]) connecting v to some vertex already placed into the MST
• Line 7: identify a vertex u ∈ Q incident on a light edge crossing (V−Q, Q); add u to V−Q and (u, π[u]) to A
• Lines 8–11: update key and π of every vertex v adjacent to u but not in the tree

Illustration of MST‐PRIM

u = a, Adj[a] = {b, h}: π[b] = π[h] = a; key[b] = 4, key[h] = 8
u = b, Adj[b] = {a, c, h}: π[h] = a, π[c] = b; key[h] = 8, key[c] = 8
u = c, Adj[c] = {b, i, d, f}: π[d] = π[i] = π[f] = c; key[h] = 8, key[d] = 7, key[i] = 2, key[f] = 4

Performance of MST‐PRIM

• Use a binary min‐heap to implement the min‐priority queue Q
  – BUILD‐MIN‐HEAP (line 5): O(V)
  – The body of the while loop is executed |V| times
    • EXTRACT‐MIN: O(lg V)
  – The for loop in lines 8–11 is executed O(E) times altogether
    • Line 11: DECREASE‐KEY operation: O(lg V)
  – Total performance = O(V lg V + E lg V) = O(E lg V)
• Use a Fibonacci heap to implement the min‐priority queue Q
  – O(E + V lg V)


See the lecture on trees…

Prim vs Kruskal vs Boruvka


Single‐Source Shortest Paths (Chapter 24)

Example: predictive mobile text entry messaging… What was the message?


Problem Definition

• Given a weighted, directed graph G = (V, E) with weight function w: E → R. The weight of path p = <v₀, v₁, ..., v_k> is the sum of the weights of its constituent edges:

  w(p) = Σ_{i=1}^{k} w(v_{i−1}, v_i)

• We define the shortest‐path weight from u to v by

  δ(u, v) = min{ w(p) : p is a path from u to v }   if there is a path from u to v,
  δ(u, v) = ∞                                        otherwise.

• A shortest path from vertex u to vertex v is then defined as any path p with w(p) = δ(u, v)

Variants

• Single‐source shortest‐paths problem (greedy)
  – Find shortest paths to all vertices reachable from a single source vertex s
• Single‐destination shortest‐paths problem
  – By reversing the direction of each edge in the graph, we can reduce this problem to a single‐source problem
• Single‐pair shortest‐path problem
  – No algorithms for this problem are known that run asymptotically faster than the best single‐source algorithm in the worst case
• All‐pairs shortest‐paths problem (dynamic programming)
  – Can be solved faster than running the single‐source shortest‐path algorithm once from each vertex

Optimal Substructure of a Shortest Path

• Lemma 24.1 (subpaths of shortest paths are shortest paths). Let p = <v₁, v₂, ..., v_k> be a shortest path from vertex v₁ to v_k, and for any i and j such that 1 ≤ i ≤ j ≤ k, let p_ij = <v_i, v_{i+1}, ..., v_j> be the subpath of p from vertex v_i to v_j. Then p_ij is a shortest path from v_i to v_j.

Negative‐Weight Edges and Cycles

• A shortest path cannot contain a negative‐weight cycle (its weight could be decreased without bound)
• Of course, a shortest path cannot contain a positive‐weight cycle either (removing the cycle gives a shorter path)

Relaxation

• For each vertex v ∈ V, we maintain an attribute d[v], which is an upper bound on the weight of a shortest path from source s to v. We call d[v] a shortest‐path estimate; π[v] is the predecessor of v on the current path.
• Relaxing an edge (u, v) consists of testing whether we can improve the shortest path to v found so far by going through u and, if so, updating d[v] and π[v]. The test is justified by the triangle inequality.


Dijkstra's Algorithm

• Solves the single‐source shortest‐paths problem on a weighted, directed graph in which all edge weights are nonnegative
• Data structures
  – S: a set of vertices whose final shortest‐path weights have already been determined
  – Q: a min‐priority queue keyed by the d values
• Idea
  – Repeatedly select the vertex u ∈ V−S (kept in Q) with the minimum shortest‐path estimate, add u to S, and relax all edges leaving u

Dijkstra's Algorithm (Cont.)

Analysis of Dijkstra's Algorithm

• Correctness: Theorem 24.6 (loop invariant)
• Min‐priority queue operations
  – INSERT (line 3)
  – EXTRACT‐MIN (line 5)
  – DECREASE‐KEY (line 8)
• Time analysis
  – Lines 4–8: while loop → O(V) iterations
  – Lines 7–8: for loop and relaxation → |E| relaxations in total
  – Running time depends on how the min‐priority queue is implemented:
    • Simple array: O(V² + E) = O(V²)
    • Binary min‐heap: O((V + E) lg V)
    • Fibonacci min‐heap: O(V lg V + E)
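The binary-min-heap variant can be sketched as follows, again with lazy deletion standing in for DECREASE-KEY; the example graph (nonnegative weights) is an assumption:

```python
import heapq

def dijkstra(adj, s):
    """Dijkstra sketch, O((V + E) lg V): repeatedly pop the vertex with
    the smallest shortest-path estimate and relax its outgoing edges."""
    d = {s: 0}
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > d.get(u, float('inf')):
            continue                              # stale heap entry
        for v, w in adj.get(u, []):
            if du + w < d.get(v, float('inf')):   # relax edge (u, v)
                d[v] = du + w
                heapq.heappush(heap, (d[v], v))
    return d

# Assumed example graph: adjacency lists of (target, weight).
adj = {'s': [('t', 10), ('y', 5)], 't': [('x', 1), ('y', 2)],
       'y': [('t', 3), ('x', 9), ('z', 2)], 'x': [('z', 4)],
       'z': [('s', 7), ('x', 6)]}
print(dijkstra(adj, 's')['x'])   # 9
```

A vertex's distance is final the first time it is popped, which is exactly the loop invariant from Theorem 24.6 and the reason nonnegative weights are required.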

http://www.cs.utexas.edu/users/EWD/

• Edsger Wybe Dijkstra was one of the most influential members of computing science's founding generation. Among the domains in which his scientific contributions are fundamental are:
  – algorithm design
  – programming languages
  – program design
  – operating systems
  – distributed processing
  – formal specification and verification
  – design of mathematical arguments

July 3, 2008, IUCEE: Algorithms

Examples of shortest paths depending on the start node


All pairs shortest paths 

• Diameter of a graph (longest shortest path)

Transitive closure

• The transitive closure of a digraph G is a graph G′ with the same vertices, and an edge between u and v whenever there is a path from u to v in G

Transitive closure via Boolean matrix multiplication

G[i][j] and G[j][k] ⇒ G[i][k]   (there exists a link via j)

Compute G · G repeatedly.

Complexity…

• V³ operations for each of G², G³, …, G^V
• O(V⁴) in total
• With repeated squaring: ⌈log V⌉ multiplications, each costing V³
• Can we avoid so many passes?


An improvement: first consider paths via vertex 0, then paths via vertex 1 (including 0–1 and 1–0), and so on, one intermediate vertex at a time.

• How to further improve?
• Test for A[s][i] …
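The "paths via vertex 0, then via vertex 1, ..." idea is, on my reading, Warshall's O(V³) transitive-closure algorithm; a minimal sketch, with the row test from the last bullet used to skip useless inner loops:

```python
def transitive_closure(a):
    """Warshall-style sketch: allow intermediate vertex k, one value of k
    at a time; each pass costs at most V^2 work, O(V^3) overall."""
    n = len(a)
    t = [row[:] for row in a]             # work on a copy of the matrix
    for k in range(n):
        for i in range(n):
            if t[i][k]:                   # row test: any path i -> k at all?
                for j in range(n):
                    if t[k][j]:
                        t[i][j] = 1       # link i -> k -> j exists
    return t

# Assumed example: edges 0->1 and 1->2, self-loops on the diagonal.
a = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
print(transitive_closure(a)[0][2])        # 1: the path 0 -> 1 -> 2 exists
```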


Finding the modules

Jüri Reimand: GraphWeb. Genome Informatics, CSHL, Nov 1 2007

Public datasets for H. sapiens:
• IntAct: protein interactions (PPI), 18773 interactions
• IntAct: PPI via orthologs from IntAct, 6705 interactions
• MEM: gene expression similarity over 89 tumor datasets, 46286 interactions
• Transfac: gene regulation data, 5183 interactions

Module evaluation

Jüri Reimand: GraphWeb. Genome Informatics, CSHL, Nov 1 2007

Example module annotations:
• GO: transforming growth factor beta signaling pathway; embryonic development, gastrulation
• GO: brain development; pigment granule; melanin metabolic process
• GO: JAK-STAT cascade, kinase inhibitor activity; insulin receptor signaling pathway; KEGG: type II diabetes mellitus
• KEGG: cell cycle, cancers, WNT pathway

MCL clustering algorithm

• Markov (Chain Monte Carlo) Clustering

– http://www.micans.org/mcl/

Stijn van Dongen

• Random walks according to edge weights

• Follow the different paths according to their probability

• Regions that are traversed “often” form clusters

Stijn van Dongen

http://www.micans.org/mcl/intro.html
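The random-walk intuition above is implemented in MCL as an alternation of "expansion" (matrix powers: let the walk run) and "inflation" (elementwise powers: boost the probable paths). A toy NumPy sketch of that loop, with a simplified cluster-extraction step of my own (this is not van Dongen's mcl implementation):

```python
import numpy as np

def mcl(adj, expansion=2, inflation=2.0, iters=50):
    """Toy MCL: adj is a symmetric, non-negative adjacency matrix."""
    M = adj + np.eye(len(adj))            # add self-loops to stabilise the walk
    M = M / M.sum(axis=0)                 # column-normalise: random-walk matrix
    for _ in range(iters):
        M = np.linalg.matrix_power(M, expansion)  # expansion: longer walks
        M = M ** inflation                         # inflation: boost strong currents
        M = M / M.sum(axis=0)                      # re-normalise columns
    # Interpret the limit matrix: the support of each row groups the vertices
    # it attracts; keying by the smallest member dedupes identical clusters.
    clusters = {}
    for row in M:
        members = frozenset(np.flatnonzero(row > 1e-6))
        if members:
            clusters[min(members)] = members
    return list(clusters.values())
```

On two disjoint triangles, the walk never crosses between the components, so the sketch recovers the two triangles as clusters.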


MAXIMUM FLOW

Max-Flow Min-Cut Theorem (Ford-Fulkerson's Algorithm)

What is Network Flow ?

A flow network is a directed graph G=(V,E) such that each edge has a non-negative capacity c(u,v) ≥ 0.

Two distinguished vertices exist in G namely :

• Source (denoted by s): in-degree of this vertex is 0.
• Sink (denoted by t): out-degree of this vertex is 0.

A flow in a network is an integer-valued function f defined on the edges of G satisfying 0 ≤ f(u,v) ≤ c(u,v) for every edge (u,v) in E.

What is Network Flow?

• Each edge (u,v) has a non-negative capacity c(u,v).

• If (u,v) is not in E assume c(u,v)=0.

• We have source s and sink t.

• Assume that every vertex v in V is on some path from s to t.

Following is an illustration of a network flow:

[Figure: example flow network with capacities c(s,v1)=16, c(v1,s)=0, c(v2,s)=0, …]

Conditions for Network Flow

For each edge (u,v) in E, the flow f(u,v) is a real-valued function that must satisfy the following 3 conditions:

• Capacity Constraint: ∀ u,v ∈ V, f(u,v) ≤ c(u,v)

• Skew Symmetry: ∀ u,v ∈ V, f(u,v) = −f(v,u)

• Flow Conservation: ∀ u ∈ V − {s,t}, Σ_{v∈V} f(u,v) = 0

Skew symmetry condition implies that f(u,u)=0.
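The three conditions can be checked mechanically. A small sketch (the function name is my own) that validates a flow given in this skew-symmetric form:

```python
def is_feasible_flow(f, c, V, s, t):
    """Check capacity constraint, skew symmetry, and flow conservation.
    f and c map vertex pairs (u, v) to values; missing pairs default to 0."""
    F = lambda u, v: f.get((u, v), 0)
    C = lambda u, v: c.get((u, v), 0)
    for u in V:
        for v in V:
            if F(u, v) > C(u, v):        # capacity constraint violated
                return False
            if F(u, v) != -F(v, u):      # skew symmetry violated
                return False
    # conservation: net flow out of every internal vertex is zero
    return all(sum(F(u, v) for v in V) == 0 for u in V if u not in (s, t))
```

Note that with skew symmetry the backward values f(v,u) = −f(u,v) are stored explicitly, so the conservation sum Σ_v f(u,v) = 0 works out using those negative entries.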

The Value of a Flow.

The value of a flow is given by:

|f| = Σ_{v∈V} f(s,v) = Σ_{v∈V} f(v,t)

The flow into a node is the same as the flow going out of it, so flow is conserved. The total amount of flow out of the source s equals the total amount of flow into the sink t.

Example of a flow

[Figure: network s → {1, 2} → t; edge labels are capacity, flow: s→1: 10,9; s→2: 6,6; 1→2: 1,1; 1→t: 8,8; 2→t: 10,7]

Table illustrating flows and capacities across the edges of the graph above:

f(s,1) = 9, c(s,1) = 10 (valid flow since 9 ≤ 10)
f(s,2) = 6, c(s,2) = 6 (valid flow since 6 ≤ 6)
f(1,2) = 1, c(1,2) = 1 (valid flow since 1 ≤ 1)
f(1,t) = 8, c(1,t) = 8 (valid flow since 8 ≤ 8)
f(2,t) = 7, c(2,t) = 10 (valid flow since 7 ≤ 10)

The flow across nodes 1 and 2 is also conserved: flow in = flow out.


The Maximum Flow Problem

Given a graph G(V,E) with:

x_ij = flow on edge (i,j)
u_ij = capacity of edge (i,j)
s = source node
t = sink node

In simple terms: maximize the s-to-t flow, while ensuring that the flow is feasible.

Maximize v
subject to  Σ_j x_sj = v
            Σ_j x_ij − Σ_j x_ji = 0 for each i ≠ s,t
            0 ≤ x_ij ≤ u_ij for all (i,j) ∈ E
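As a sanity check, this LP can be handed to an off-the-shelf solver. The sketch below, assuming SciPy's linprog, encodes the earlier example network (s→1: 10, s→2: 6, 1→2: 1, 1→t: 8, 2→t: 10), whose maximum flow is 15:

```python
from scipy.optimize import linprog

# Variable order: x = [x_s1, x_s2, x_12, x_1t, x_2t]
capacities = [10, 6, 1, 8, 10]

# Maximize x_s1 + x_s2 (the flow v out of s) == minimize -(x_s1 + x_s2)
c = [-1, -1, 0, 0, 0]

# Flow conservation at the internal nodes 1 and 2
A_eq = [
    [1, 0, -1, -1,  0],   # node 1: x_s1 - x_12 - x_1t = 0
    [0, 1,  1,  0, -1],   # node 2: x_s2 + x_12 - x_2t = 0
]
b_eq = [0, 0]

bounds = [(0, u) for u in capacities]     # 0 <= x_ij <= u_ij

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(-res.fun)                           # the maximum flow value
```

Note that the objective Σ_j x_sj = v is folded into the cost vector here instead of being a separate constraint; both formulations are equivalent.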

Cuts of Flow Networks

A Cut in a network is a partition of V into S and T (T=V-S) such that s (source) is in S and t (target) is in T.

[Figure: an example network with a cut separating s from t]

Capacity of Cut (S,T)

c(S,T) = Σ_{u∈S, v∈T} c(u,v)
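The definition translates directly to code. A one-line sketch (the function name is my own), applied to the small s–1–2–t example network used earlier rather than to the 7-node network in the figure:

```python
def cut_capacity(c, S, T):
    """c(S, T): sum of c(u, v) over u in S, v in T (forward edges only).
    c maps (u, v) -> capacity; missing pairs have capacity 0."""
    return sum(c.get((u, v), 0) for u in S for v in T)

# Capacities of the earlier example: s -> {1, 2} -> t
cap = {('s', '1'): 10, ('s', '2'): 6, ('1', '2'): 1, ('1', 't'): 8, ('2', 't'): 10}
```

For that network, the cut ({s}, {1,2,t}) has capacity 10 + 6 = 16, while ({s,1}, {2,t}) has capacity 6 + 1 + 8 = 15; only edges crossing from S to T are counted.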

[Figure: the same network with a cut of Capacity = 30]

Min s-t Cut

A min s-t cut (also called a min cut) is a cut of minimum capacity.

[Figure: a minimum s-t cut of the same network; Capacity = 28]

Flow of Min Cut (Weak Duality)

Let f be a flow and let (S,T) be a cut. Then |f| ≤ CAP(S,T); this bound is what is referred to as weak duality.

At a maximum flow, the forward edges across a minimum cut are full (saturated) and the backward edges are empty, so the maximum flow equals the capacity of that cut.

Proof:

[Figure: a flow crossing a cut (S,T); every unit of flow from s to t must cross an edge of the cut]

Methods

Max-Flow Min-Cut Theorem

• The Ford-Fulkerson Method

• The Preflow-Push Method

Note (Mayank Joshi, 1/27/2008): in the min-cut figure, the vertices in S are colored green, the vertices in T grey, and the edges from S to T whose capacities sum to the minimum are colored pink.


The Ford-Fulkerson Method

• Try to improve the flow, until we reach the maximum value of the flow

• The residual capacity of the network with a flow f is given by:

c_f(u,v) = c(u,v) − f(u,v)

The residual capacity of an edge (i,j) equals c(i,j) − f(i,j) when (i,j) is a forward edge, and equals f(i,j) when (i,j) is a backward edge. Moreover, the residual capacity of an edge is always non-negative.

[Figure: the Original Network from the earlier example, and its Residual Network]

The Ford-Fulkerson Method

begin
  x := 0;  // x is the flow
  create the residual network G(x);
  while there is some directed path from s to t in G(x) do
  begin
    let P be a path from s to t in G(x);
    ∆ := δ(P);
    send ∆ units of flow along P;
    update the residual capacities;
  end
end  {the flow x is now maximum}


Augmenting Paths (A Useful Concept)

Definition:

An augmenting path p is a simple path from s to t in the residual network: an alternating sequence of vertices and edges of the form s, e1, v1, e2, v2, …, ek, t in which no vertex is repeated, no forward edge is saturated, and no backward edge is free.

Characteristics of augmenting paths:

• We can push more flow from s to t through p.
• The edges of the residual network are the edges on which the residual capacity is positive.
• The maximum amount by which we can increase the flow on p is called the residual capacity of p:

c_f(p) = min{ c_f(u,v) : (u,v) is on p }

The Ford-Fulkerson Algorithm

FORD-FULKERSON(G, E, s, t)
  FOREACH e ∈ E
    f(e) ← 0
  Gf ← residual graph
  WHILE (there exists an augmenting path P)
    f ← AUGMENT(f, P)
    update Gf
  ENDWHILE
  RETURN f

AUGMENT(f, P)
  b ← bottleneck(P)
  FOREACH e ∈ P
    IF (e ∈ E)
      // forward arc
      f(e) ← f(e) + b
    ELSE
      // backward arc
      f(eR) ← f(eR) − b
  RETURN f
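The pseudocode above can be turned into a short, self-contained Python sketch. Choosing the augmenting path by BFS is the Edmonds-Karp rule, one valid way to implement the "find any s-t path" step (function and variable names here are my own); the sketch also extracts the min cut as the set of vertices reachable from s in the final residual network:

```python
from collections import defaultdict, deque

def max_flow(capacity, s, t):
    """Ford-Fulkerson with BFS-chosen augmenting paths (Edmonds-Karp).
    capacity: dict mapping directed edge (u, v) -> non-negative capacity."""
    flow = defaultdict(int)
    adj = defaultdict(set)
    for (u, v) in capacity:
        adj[u].add(v)
        adj[v].add(u)                      # residual (backward) arc

    def residual(u, v):                    # c_f(u,v) = c(u,v) - f(u,v) + f(v,u)
        return capacity.get((u, v), 0) - flow[(u, v)] + flow[(v, u)]

    def bfs_parents():
        """BFS parent pointers over edges with positive residual capacity."""
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    q.append(v)
        return parent

    value = 0
    while True:
        parent = bfs_parents()
        if t not in parent:                # no augmenting path: flow is maximum
            break
        path, v = [], t                    # reconstruct the s-t path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(residual(u, v) for (u, v) in path)   # bottleneck capacity
        for (u, v) in path:                # push b units: cancel backward flow first
            cancel = min(b, flow[(v, u)])
            flow[(v, u)] -= cancel
            flow[(u, v)] += b - cancel
        value += b
    # min cut: the vertices still reachable from s in the residual network
    return value, set(bfs_parents())
```

On the earlier example network (capacities s→1: 10, s→2: 6, 1→2: 1, 1→t: 8, 2→t: 10) this returns a flow value of 15 and the source side {s, 1} of a min cut, matching the flow table shown before.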


Proof of correctness of the algorithm

Lemma: At each iteration all residual capacities are integers.

Proof: It is true at the beginning. Assume it is true after the first k−1 augmentations, and consider augmentation k along path P. The residual capacity ∆ of P is the smallest residual capacity on P, which is integral. After updating, each residual capacity changes by 0 or ±∆, so residual capacities stay integers.

Theorem: Ford-Fulkerson's algorithm is finite.

Proof: The capacity of each augmenting path is at least 1. Each augmentation reduces the residual capacity of some edge (s,j) and does not increase the residual capacity of any edge (s,i). So the sum of residual capacities of edges out of s keeps decreasing and is bounded below by 0. The number of augmentations is O(nC), where C is the largest capacity in the network.

When is the flow optimal?

A flow f is a maximum flow in G if:

(1) The residual network Gf contains no more augmenting paths.
(2) |f| = c(S,T) for some cut (S,T) (a min cut).

Proof:

(1) Suppose there is an augmenting path in Gf. Then the flow f is not maximum, because more flow can be pushed along that path. Thus if the flow f is maximum, the residual network Gf has no more augmenting paths.

(2) Let v = F_x(S,T) be the flow from s to t. By assumption v = CAP(S,T). By weak duality, the maximum flow is at most CAP(S,T). Thus the flow is maximum.


The Ford-Fulkerson Augmenting Path Algorithm for the Maximum Flow Problem

15.082 and 6.855J (MIT OCW)

Ford-Fulkerson Max Flow (worked example)

[Figure: This is the original network, plus reversals of the arcs.]

[Figure: This is the original network, and the original residual network.]

[Figure: Find any s-t path in G(x).]

[Figure: Determine the capacity of the path; send that many units of flow along the path; update the residual capacities.]

[Figure: Find any s-t path. Determine the capacity of the path; send flow; update the residual capacities.]

[Figure: Find any s-t path. Determine the capacity of the path; send flow; update the residual capacities.]

[Figure: Find any s-t path. Determine the capacity of the path; send flow; update the residual capacities.]

[Figure: Find any s-t path. Determine the capacity of the path; send flow; update the residual capacities.]

[Figure: There is no s-t path in the residual network. This flow is optimal.]

[Figure: These are the nodes that are reachable from node s.]

[Figure: Here is the optimal flow.]

Converting Matching to Network Flow

[Figure: a bipartite matching instance converted to a flow network by adding a source s and a sink t]
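The reduction can be sketched in code: add s → each left vertex and each right vertex → t, all with capacity 1, and the maximum flow equals the maximum matching. With unit capacities, Ford-Fulkerson's augmenting-path search specialises to a plain DFS, so the flow network never needs to be built explicitly (function and variable names below are my own):

```python
def bipartite_max_matching(left, edges):
    """Maximum bipartite matching via the max-flow reduction, specialised
    to unit capacities. edges is a list of (left_vertex, right_vertex)."""
    adj = {u: [v for (x, v) in edges if x == u] for u in left}
    match = {}                       # right vertex -> its matched left vertex

    def augment(u, seen):
        # Try to find a partner for u, re-matching earlier choices if needed;
        # this is exactly an augmenting s-t path in the unit-capacity network.
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    return sum(augment(u, set()) for u in left)
```

For example, with left vertices {a, b} and edges a–x, b–x, b–y, the matching has size 2 (a–x and b–y); dropping the edge b–y makes both left vertices compete for x, and the matching size falls to 1.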