Basic Principles of Graph Theory
Lecture course "Networks" (VL Netzwerke), winter semester (WS) 2007/08, Edda Klipp
Max Planck Institute for Molecular Genetics
Humboldt University Berlin, Theoretical Biophysics
Networks in Metabolism and Signaling
Edda Klipp Humboldt University Berlin
Lecture 2 / WS 2007/08: Basic Principles of Graph Theory and Random Networks
Basic Principles of Graph Theory
Literature:
Sedláček, J. (1968) Einführung in die Graphentheorie. Teubner Verlagsgesellschaft, Leipzig.
Albert, R. & Barabási, A.-L. (2002) Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47-97.
Barabási, A.-L. & Oltvai, Z. N. (2004) Network biology: understanding the cell's functional organization. Nature Reviews Genetics 5, 101-113.
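Classical Examples
The problem of the ferryman, the goat, the wolf and the hay ("Fährmann, Ziege, Wolf und Heu")
[Figure: state graph of the puzzle. Each vertex lists who is on the starting bank (F = ferryman, Z = goat, W = wolf, H = hay): (F,Z,W,H), (W,H), (F,W,H), (W), (H), (F,Z,W), (F,Z,H), (F,Z), (Z), (0).]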
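The state graph of the puzzle can be explored with a breadth-first search. The following is a minimal sketch, assuming the usual rules, which are not spelled out on the slide: the boat carries the ferryman plus at most one passenger, and without the ferryman the goat may not be left with the wolf or with the hay. All function names are illustrative.

```python
from collections import deque
from itertools import combinations

ITEMS = frozenset("FZWH")  # F ferryman, Z goat, W wolf, H hay


def is_safe(bank):
    """A bank is safe if the ferryman is there or nothing can be eaten."""
    if "F" in bank:
        return True
    return not ({"Z", "W"} <= bank or {"Z", "H"} <= bank)


def safe_states():
    """All states (contents of the starting bank) with both banks safe."""
    states = []
    for size in range(len(ITEMS) + 1):
        for subset in combinations(sorted(ITEMS), size):
            start_bank = frozenset(subset)
            if is_safe(start_bank) and is_safe(ITEMS - start_bank):
                states.append(start_bank)
    return states


def neighbours(state):
    """States reachable by one crossing (ferryman alone or with one passenger)."""
    here = state if "F" in state else ITEMS - state  # bank where the ferryman is
    for cargo in [set()] + [{x} for x in here - {"F"}]:
        moved = cargo | {"F"}
        new_state = frozenset(state - moved if "F" in state else state | moved)
        if is_safe(new_state) and is_safe(ITEMS - new_state):
            yield new_state


def shortest_solution():
    """Breadth-first search from 'everyone on the starting bank' to 'nobody left'."""
    start, goal = ITEMS, frozenset()
    previous = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = previous[state]
            return path[::-1]
        for nxt in neighbours(state):
            if nxt not in previous:
                previous[nxt] = state
                queue.append(nxt)
    return None


solution = shortest_solution()
```

Under these assumed rules, `safe_states()` enumerates the vertices of the state graph and `solution` is one shortest sequence of states from (F,Z,W,H) to (0).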
The Bridges of Königsberg
In the centre of the Prussian city of Königsberg (today Kaliningrad), the river Pregel forms an island where two of its arms meet. In the 18th century, 7 bridges connected the river banks with the island. The question is whether there is a round trip that crosses each of the 7 bridges exactly once and returns to the starting point.
History
The problem of the Königsberg bridges goes back to Leonhard Euler. In 1736 he proved that no such round trip can exist. He considered the general case with an arbitrary number of islands and bridges and showed that a round trip of the desired kind is possible exactly when none of the banks has an odd number of bridges. If exactly two banks have an odd number of bridges, then there is a path that starts at one of these two banks, ends at the other, and crosses every bridge exactly once. If, as in Königsberg, more than two regions are reached by an odd number of bridges, then no path can exist that crosses every bridge exactly once.
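Euler's criterion only needs the number of bridges at each land mass. Below is a minimal sketch; the bridge layout is an assumption taken from the standard historical description (it is not listed on the slide), and the criterion as coded presupposes that all land masses are connected.

```python
from collections import Counter

# Assumed layout: the island is linked twice to the north bank, twice to the
# south bank and once to the eastern land mass; the eastern land mass is also
# linked once to each bank. Seven bridges in total.
BRIDGES = [
    ("island", "north"), ("island", "north"),
    ("island", "south"), ("island", "south"),
    ("island", "east"), ("north", "east"), ("south", "east"),
]


def odd_degree_vertices(edges):
    """Land masses that are touched by an odd number of bridges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)


def classify(edges):
    """Apply Euler's criterion to a connected (multi)graph."""
    odd = odd_degree_vertices(edges)
    if len(odd) == 0:
        return "round trip"
    if len(odd) == 2:
        return "open path between the two odd vertices"
    return "no such walk"


print(odd_degree_vertices(BRIDGES), classify(BRIDGES))
```

With this layout every land mass has an odd number of bridges, so neither a round trip nor an open path exists.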
Graphs: Definitions
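A graph is a tuple (V,E) with V a set of n vertices and E a set of m edges:
G = (V,E), with $E \subseteq V \times V$
Example: $V = \{A, B, C\}$, $E = \{(A,B), (A,C)\}$
[Figure: graph with vertices (nodes) A, B, C and edges A-B and A-C.]
Example: proteins are the vertices, interactions are the edges.
German terms: vertex = Knoten, edge = Kante, tuple = Tupel (ordered set, geordnete Menge), set = Menge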
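Graphs: Completeness
Example: $V = \{A, B, C\}$, $E = \{(A,B), (A,C)\}$, $E \subseteq V \times V$
[Figure: graph with vertices A, B, C and edges A-B and A-C.]
Edge AB has the vertices A and B.
Vertex A is incident with edge AB.
Let $E_0$ be the set of all two-element subsets of V.
A graph is complete if $E = E_0$.
If $G_1 = (V_1, E_1)$, $G_2 = (V_2, E_2)$ and
$E_1 \cup E_2 = E_0$, $E_1 \cap E_2 = \emptyset$,
then $G_1$ and $G_2$ are complementary.
[Figure: panels a) to d) with example graphs.]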
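Completeness and complementarity can be checked directly with sets. This is a small sketch in which undirected edges are represented as two-element frozensets; the helper names are illustrative and the three-vertex example reuses the vertices A, B, C.

```python
from itertools import combinations


def all_pairs(vertices):
    """E0: the set of all two-element subsets of the vertex set."""
    return {frozenset(pair) for pair in combinations(vertices, 2)}


def is_complete(vertices, edges):
    """A graph is complete if its edge set equals E0."""
    return set(edges) == all_pairs(vertices)


def complement(vertices, edges):
    """Edge set of the complementary graph on the same vertices."""
    return all_pairs(vertices) - set(edges)


V = {"A", "B", "C"}
E = {frozenset({"A", "B"}), frozenset({"A", "C"})}
E_bar = complement(V, E)
print(is_complete(V, E), E_bar)
```

Here the union of `E` and `E_bar` is E0 and their intersection is empty, which is the condition for two complementary graphs.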
Graph Types
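Undirected graphs: an edge between A and B is contained in E in both directions.
$V = \{A, B\}$, $E = \{(A,B), (B,A)\}$
Directed graphs (digraphs): a directed edge $(i,j) \in E$ points from i, the tail of the edge, to j, the head of the edge.
$V = \{A, B\}$, $E = \{(A,B)\}$
Extension: directed edge $(i,j,s) \in E$ with $s \in \{+1,-1\}$ to represent activating or inhibitory influences.
[Figure: nodes A and B joined by an undirected edge, a directed edge, and a signed directed edge.]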
Graph Types: Bipartite Graphs
The set of graph vertices is decomposed into two disjoint sets such that no two vertices within the same set are adjacent.
Bipartite graphs are used when a graph must represent two distinct classes of nodes, such as metabolites (blue circles) and reactions (yellow boxes).
[Figure: reaction node R1 connected to metabolite nodes A, B, C, D. Example: ATP + Fruc-6-P → Fruc-1,6-P2 + ADP, catalysed in reaction R1.]
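The bipartite property can be verified by checking that every edge joins the two classes. The sketch below encodes the reaction example with directed edges from substrates to R1 and from R1 to products; this edge orientation and the function name are assumptions made for illustration.

```python
METABOLITES = {"ATP", "Fruc-6-P", "Fruc-1,6-P2", "ADP"}
REACTIONS = {"R1"}

# Substrates point to the reaction node, the reaction node points to products.
EDGES = [
    ("ATP", "R1"),
    ("Fruc-6-P", "R1"),
    ("R1", "Fruc-1,6-P2"),
    ("R1", "ADP"),
]


def respects_partition(edges, part_a, part_b):
    """True if every edge connects a node of part_a with a node of part_b."""
    return all(
        (u in part_a and v in part_b) or (u in part_b and v in part_a)
        for u, v in edges
    )


print(respects_partition(EDGES, METABOLITES, REACTIONS))
```

An edge between two metabolites, for example ("ATP", "ADP"), would violate the partition and make the check fail.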
Graph Representation: Adjacency Matrix
    A B C D E F G
A   0 1 0 0 0 0 0
B   0 0 1 1 0 0 0
C   0 0 0 0 0 0 0
D   0 0 1 0 0 1 0
E   0 0 0 1 0 0 0
F   0 0 0 0 1 0 1
G   0 0 0 0 0 0 0
[Figure: example graph with nodes A to G.]
◊ Adjacency matrix A: non-zero entries represent edges (row: start vertex, column: end vertex)
- square (one row and one column per vertex)
- unique assignment of adjacency matrix to graph
- unique assignment of graph to adjacency matrix
◊ Bipartite graphs: sub-matrices for the two classes of nodes
◊ Alternative formats: edge lists, vertex lists
German term: adjacency matrix = Adjazenzmatrix (the Inzidenzmatrix is the incidence matrix, which relates vertices to edges)
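The edge-list format and the adjacency matrix carry the same information. As a minimal sketch, the function below converts an edge list into the matrix shown above, assuming that rows are start vertices and columns are end vertices; the edge list was read off the non-zero entries of that matrix.

```python
NODES = ["A", "B", "C", "D", "E", "F", "G"]

# Directed edge list read off the non-zero entries of the matrix.
EDGES = [
    ("A", "B"), ("B", "C"), ("B", "D"), ("D", "C"),
    ("D", "F"), ("E", "D"), ("F", "E"), ("F", "G"),
]


def adjacency_matrix(nodes, edges):
    """Build a square 0/1 matrix with entry [i][j] = 1 for an edge i -> j."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


matrix = adjacency_matrix(NODES, EDGES)
for name, row in zip(NODES, matrix):
    print(name, *row)
```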
Graph Theoretical Measures: Degree
Example graph: nodes A to G with the adjacency matrix given under "Graph Representation: Adjacency Matrix".
Example for node B: $k_B = 3$, $k_B^{i} = 1$, $k_B^{o} = 2$
◊ Number of edges to which a vertex is connected: degree k.
◊ For directed graphs: in-degree (edges ending at a vertex) and out-degree (edges starting at a vertex)
◊ Vertices with degree 0: isolated
German term: degree = Knotengrad
Let G be a finite graph, v the number of nodes, k the number of edges and $s_1, s_2, \dots, s_v$ the degrees of the individual nodes. Then:
$\sum_{i=1}^{v} s_i = 2k$
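In-degrees and out-degrees are the column sums and row sums of the adjacency matrix. The following sketch computes them for the example matrix and checks that the total degrees add up to twice the number of edges; the variable names are illustrative.

```python
NODES = ["A", "B", "C", "D", "E", "F", "G"]
ADJ = [
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]


def out_degrees(matrix):
    """Row sums: number of edges starting at each vertex."""
    return [sum(row) for row in matrix]


def in_degrees(matrix):
    """Column sums: number of edges ending at each vertex."""
    return [sum(column) for column in zip(*matrix)]


k_out = dict(zip(NODES, out_degrees(ADJ)))
k_in = dict(zip(NODES, in_degrees(ADJ)))
k_total = {name: k_in[name] + k_out[name] for name in NODES}
n_edges = sum(map(sum, ADJ))

print(k_total["B"], k_in["B"], k_out["B"])
print(sum(k_total.values()), 2 * n_edges)
```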
Graph Theoretical Measures: Degree
[Figure: example graph with nodes A to G.]
◊ Global connectivity properties of a graph:
Average degree <k>
Degree distribution P(k)
k_in       0    1    2
P(k_in)    1/7  4/7  2/7
k_out      0    1    2
P(k_out)   2/7  2/7  3/7
<k_in> = (4x1 + 2x2)/7 = 8/7 ≈ 1.14
Degree distributions make it possible to distinguish between different types of networks.
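A degree distribution is obtained by counting how often each degree occurs. The sketch below uses exact fractions; the two degree lists are the column sums (in-degrees) and row sums (out-degrees) of the example adjacency matrix for nodes A to G.

```python
from collections import Counter
from fractions import Fraction


def degree_distribution(degrees):
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(degrees)
    n_nodes = len(degrees)
    return {k: Fraction(c, n_nodes) for k, c in sorted(counts.items())}


def mean_degree(distribution):
    """Average degree <k> = sum over k of k * P(k)."""
    return sum(k * p for k, p in distribution.items())


# Degrees of nodes A..G taken from the example adjacency matrix.
K_IN = [0, 1, 2, 2, 1, 1, 1]
K_OUT = [1, 2, 0, 2, 1, 2, 0]

p_in = degree_distribution(K_IN)
p_out = degree_distribution(K_OUT)
print(p_in, mean_degree(p_in))
print(p_out, mean_degree(p_out))
```

Both averages must coincide, because every directed edge contributes once to an in-degree and once to an out-degree.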
Aside: Discrete Probability Distributions
Binomial distribution:
$P(k) = \binom{n}{k} p^k (1-p)^{n-k}$
Sum of all probabilities: $\sum_{k=0}^{n} P(k) = 1$
E(X) = np: expected value (cf. the mean over very many repetitions)
Var(X) = np(1-p): variance
Properties of a sample: if the desired outcome of a trial has probability p and the number of trials is n, then the binomial distribution gives the probability of obtaining k successes in total. P(k) is this probability (e.g. of drawing k black balls in n draws from an urn of balls).
[Figure: P(k) for p = 1/2.]
Aside: Discrete Probability Distributions
Poisson distribution:
$P(k) = \frac{\lambda^k}{k!} e^{-\lambda}$
Properties of a sample: as before, but for a very small probability of the individual events, e.g. because n is very large.
$\lambda$: event rate (e.g. the error rate in DNA replication)
E(X) = $\lambda$: expected value (cf. the mean over very many repetitions)
Var(X) = $\lambda$: variance
Exponential distribution:
$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
[Figure: plot of the distribution, horizontal axis from 0 to 100, vertical axis from 0.0 to 0.8.]
E(X) = $1/\lambda$
Var(X) = $1/\lambda^2$
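The statement that the Poisson distribution describes the binomial case for very small p and very large n can be checked numerically. This is a small sketch; the values of n and p are illustrative assumptions, and the Poisson rate is set to the binomial mean np.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of k events for a Poisson distribution with rate lam."""
    return lam**k * exp(-lam) / factorial(k)


n, p = 10_000, 0.0003  # illustrative: many trials, rare events
lam = n * p            # matching expected value E(X) = np

# Largest pointwise gap between the two distributions over the first 20 values of k.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(20))
print(max_gap)
```

The gap shrinks further as n grows while np is held fixed.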
Degree Distributions
[Figure caption:] Degree distribution of the World Wide Web from two different measurements: the 325,729-node sample of Albert et al. (1999) and the measurements of over 200 million pages by Broder et al. (2000).
(a) Degree distribution of the outgoing edges;
(b) degree distribution of the incoming edges. The data have been binned logarithmically to reduce noise.
Albert & Barabási, 2002, Rev Mod Phys
Degree Distributions
Albert & Barabási, 2002, Rev Mod Phys
[Figure caption:] The degree distribution of several real networks: (a) Internet at the router level. Data courtesy of Ramesh Govindan; (b) movie actor collaboration network. After Barabási and Albert 1999. Note that if TV series are included as well, which aggregate a large number of actors, an exponential cutoff emerges for large k (Amaral et al., 2000); (c) co-authorship network of high-energy physicists. After Newman (2001a, 2001b); (d) co-authorship network of neuroscientists. After Barabási et al. (2001).
Degree Distributions
Jeong H et al, 2000, Nature
[Figure caption:] Connectivity distributions P(k) for substrates. a, Archaeoglobus fulgidus (archaeon); b, E. coli (bacterium); c, Caenorhabditis elegans (eukaryote), shown on a log-log plot, counting separately the incoming (In) and outgoing (Out) links for each substrate.
k_in (k_out) corresponds to the number of reactions in which a substrate participates as a product (educt).
d, The connectivity distribution averaged over all 43 organisms.
Random Graphs
First well-known example: the model of Paul Erdős and Alfréd Rényi
History: Erdős number
The Erdős number describes the distance in the co-authorship graph relative to the mathematician Paul Erdős. In this graph, authors linked through publications are represented as nodes, and an edge exists between two of them whenever they have written a publication together.
Paul Erdős himself has the Erdős number 0; all co-authors with whom he published have the Erdős number 1. Authors who have published with co-authors of Paul Erdős have the Erdős number 2, and so on. If no connection of this kind can be established for a person, their Erdős number is ∞.
It turns out that the Erdős number of most people is either infinite or surprisingly small. The latter is mainly because Erdős published jointly with more than 500 different scientists and was well versed in many areas of mathematics.
Random Graphs
     n1  n2  n3  n4  n5
n1        x   -   x   -
n2            -   -   x
n3                x   -
n4                    -
n5
(x: pair connected, -: pair not connected)
A well-known example: the model of Paul Erdős and Alfréd Rényi
Start with N nodes.
Connect every pair of nodes with probability p:
draw a random number z ∈ [0,1]; if z < p, connect the pair.
Obtain a graph with approx. ½ pN(N-1) edges.
Degree distribution: Poisson distribution (for large N)
Average degree: <k> = ½ pN(N-1) · 2/N = p(N-1) ≈ pN
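The construction steps above can be sketched as follows. The function name, the seed parameter and the example values of N and p are illustrative assumptions; the loop visits every pair once and draws one random number z per pair.

```python
import random
from itertools import combinations


def erdos_renyi(n_nodes, p, seed=None):
    """Return the edge list of a random graph on nodes 0..n_nodes-1."""
    rng = random.Random(seed)
    edges = []
    for u, v in combinations(range(n_nodes), 2):
        z = rng.random()  # uniform random number in [0, 1)
        if z < p:
            edges.append((u, v))
    return edges


N, p = 200, 0.1  # illustrative values
edges = erdos_renyi(N, p, seed=42)
expected_edges = 0.5 * p * N * (N - 1)
average_degree = 2 * len(edges) / N
print(len(edges), expected_edges, average_degree, p * (N - 1))
```

For a single realisation the edge count fluctuates around ½ pN(N-1), so the printed values agree only approximately.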
Random Graphs: Evolution
The construction of a random graph is called its evolution.
Starting with a set of isolated vertices, the graph develops by the successive addition of random edges.
The graphs obtained at different stages of this process correspond to larger and larger connection probabilities p, eventually yielding a fully connected graph with the maximum number of edges n = N(N-1)/2 for p → 1.
Random Networks
Questions:
Are real networks really random?
Do real networks display organizing principles?
Is a typical graph connected? (depending on p)
Does it contain a triangle of connected nodes?
Does its diameter depend on its size?
Random Networks: Subgraphs
A graph G1 consisting of a set V1 of nodes and a set E1 of edges is a subgraph of a graph G=(V,E) if all nodes in V1 are also nodes of V and all edges in E1 are also edges of E.
A cycle of order k is a closed loop of k edges such that every two consecutive edges, and only those, have a common node. Average degree: 2.
The opposite of cycles are the trees, which cannot form closed loops. More precisely, a graph is a tree of order k if it has k nodes and k-1 edges, and none of its subgraphs is a cycle. Average degree of a tree of order k: <k> = 2 - 2/k (approaching 2 for large trees).
[Figure: examples of cycles: triangle (order 3) and rectangle (order 4).]
Random Networks: Subgraphs
[Figure caption:] The threshold probabilities at which different subgraphs appear in a random graph. For $pN^{3/2} \to 0$ the graph consists of isolated nodes and edges. For $p \sim N^{-3/2}$ trees of order 3 appear, while for $p \sim N^{-4/3}$ trees of order 4 appear. At $p \sim N^{-1}$ trees of all orders are present, and at the same time cycles of all orders appear. The probability $p \sim N^{-2/3}$ marks the appearance of complete subgraphs of order 4 and $p \sim N^{-1/2}$ corresponds to complete subgraphs of order 5. As z (the exponent in $p \sim N^{z}$) approaches 0, the graph contains complete subgraphs of increasing order.
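The exponents quoted above are consistent with a general rule from the random-graph literature, which is not stated on the slide and is taken here as an assumption: a subgraph with k nodes and l edges appears around p ~ N^(-k/l). The sketch below evaluates that exponent for the subgraphs mentioned in the caption.

```python
from fractions import Fraction


def threshold_exponent(n_nodes, n_edges):
    """Exponent z in p ~ N**z for a subgraph with the given node and edge counts."""
    return Fraction(-n_nodes, n_edges)


# (number of nodes, number of edges) for each subgraph type
SUBGRAPHS = {
    "tree of order 3": (3, 2),
    "tree of order 4": (4, 3),
    "cycle of order 3": (3, 3),
    "complete subgraph of order 4": (4, 6),
    "complete subgraph of order 5": (5, 10),
}

exponents = {name: threshold_exponent(*counts) for name, counts in SUBGRAPHS.items()}
for name, z in exponents.items():
    print(f"{name}: p ~ N^({z})")
```

A tree of order k has k-1 edges, a cycle of order k has k edges and a complete subgraph of order k has k(k-1)/2 edges, which is where the counts in `SUBGRAPHS` come from.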
Degree Distribution
[Figure caption:] The degree distribution that results from the numerical simulation of a random graph. We generated a single random graph with N = 10,000 nodes and connection probability p = 0.0015, and calculated the number of nodes with degree k, $X_k$. The plot compares $X_k/N$ with the expectation value of the Poisson distribution, $E(X_k)/N = P(k_i = k)$, and we can see that the deviation is small.
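A comparison of this kind can be reproduced on a smaller scale. The sketch below is not the original simulation: to keep a pure-Python run short it assumes N = 2000 and p = 0.0075, which gives roughly the same average degree pN = 15, and the names are illustrative.

```python
import random
from collections import Counter
from itertools import combinations
from math import exp, factorial


def sample_degrees(n_nodes, p, seed=0):
    """Degrees of one random graph in which each pair is linked with probability p."""
    rng = random.Random(seed)
    degree = [0] * n_nodes
    for u, v in combinations(range(n_nodes), 2):
        if rng.random() < p:
            degree[u] += 1
            degree[v] += 1
    return degree


def poisson_pmf(k, lam):
    """Poisson probability of the value k for rate lam."""
    return lam**k * exp(-lam) / factorial(k)


N, p = 2000, 0.0075  # assumed smaller example with pN = 15
degrees = sample_degrees(N, p)
counts = Counter(degrees)  # counts[k] plays the role of X_k
lam = p * (N - 1)          # average degree of the model

# Largest difference between X_k / N and the Poisson prediction.
max_deviation = max(
    abs(counts.get(k, 0) / N - poisson_pmf(k, lam))
    for k in range(max(degrees) + 1)
)
print(sum(degrees) / N, lam, max_deviation)
```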
Graph-theoretical Measure: Distance
Path: connection between two vertices u and v without repetition of nodes (i.e. no backtracking, no loops)
Shortest path length l(u,v): local measure for two nodes, the smallest number of edges on any path between u and v
Average shortest path length <l>: global network property indicating navigability
[Figure: example graph with nodes A to G.]
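Shortest path lengths and their average can be computed with a breadth-first search from every node. The sketch below uses the edges of the example graph with nodes A to G and, as an assumption, ignores edge directions; the function names are illustrative.

```python
from collections import deque

# Edges of the example graph, treated as undirected for this sketch.
EDGES = [
    ("A", "B"), ("B", "C"), ("B", "D"), ("D", "C"),
    ("D", "F"), ("E", "D"), ("F", "E"), ("F", "G"),
]


def undirected_adjacency(edges):
    """Map every node to the set of its neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def bfs_distances(adjacency, source):
    """Shortest path length l(source, v) for every reachable node v."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def average_shortest_path_length(adjacency):
    """Mean of l(u, v) over all connected pairs of distinct nodes."""
    total, pairs = 0, 0
    for u in adjacency:
        for v, d in bfs_distances(adjacency, u).items():
            if u < v:  # count every unordered pair once
                total += d
                pairs += 1
    return total / pairs


graph = undirected_adjacency(EDGES)
print(bfs_distances(graph, "A"))
print(average_shortest_path_length(graph))
```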
Graph-theoretical Measure: Distance
Breadth-first search: explores a graph level by level, first visiting all nodes adjacent to the current node, then their unvisited neighbors, and so on.
Dijkstra's algorithm: constructs the shortest-path tree from a source to every other vertex (for N vertices: $O(N^2)$)
[Figure: example graph with nodes A to G.]
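A simple array-based version of Dijkstra's algorithm has the quadratic running time quoted above, because each of the N rounds scans all vertices. The following is a minimal sketch under stated assumptions: the graph is given as a dense matrix of non-negative edge lengths, and the four-vertex example with its weights is hypothetical.

```python
import math


def dijkstra_dense(weights, source):
    """Shortest distances and tree parents; weights[i][j] is math.inf if no edge."""
    n = len(weights)
    distance = [math.inf] * n
    parent = [None] * n
    done = [False] * n
    distance[source] = 0
    for _ in range(n):
        # Select the closest unfinished vertex with an O(N) scan.
        current = None
        for v in range(n):
            if not done[v] and (current is None or distance[v] < distance[current]):
                current = v
        if current is None or distance[current] == math.inf:
            break  # the remaining vertices are unreachable
        done[current] = True
        # Relax all edges leaving the selected vertex.
        for v in range(n):
            candidate = distance[current] + weights[current][v]
            if candidate < distance[v]:
                distance[v] = candidate
                parent[v] = current
    return distance, parent


INF = math.inf
# Hypothetical undirected example with four vertices 0..3.
WEIGHTS = [
    [INF, 1, 4, INF],
    [1, INF, 2, 6],
    [4, 2, INF, 3],
    [INF, 6, 3, INF],
]

distance, parent = dijkstra_dense(WEIGHTS, 0)
print(distance, parent)
```

The `parent` list encodes the shortest-path tree: following parents from any vertex leads back to the source.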
Graph-theoretical Measure: Diameter
The diameter of a graph is the maximal distance between any pair of its nodes.
Strictly speaking, the diameter of a disconnected graph (i.e., one made up of several isolated clusters) is infinite, but it can be defined as the maximum diameter of its clusters.
Random graphs tend to have small diameters, provided p is not too small.
• If <k> = pN < 1, a typical graph is composed of isolated trees and its diameter equals the diameter of a tree.
• If <k> > 1, a giant cluster appears. The diameter of the graph equals the diameter of the giant cluster if <k> > 3.5, and is proportional to ln(N)/ln(<k>).
• If <k> > ln(N), almost every graph is totally connected. The diameters of the graphs having the same N and <k> are concentrated on a few values around ln(N)/ln(<k>).
[Figure: example graph with nodes A to G.]
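Both conventions for the diameter can be computed from breadth-first searches. The sketch below uses the example graph with nodes A to G, taken as undirected (an assumption), plus a hypothetical isolated node H to illustrate the disconnected case.

```python
import math
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest path lengths from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def diameter(adjacency):
    """Maximal distance between any pair of nodes; infinite if disconnected."""
    longest = 0
    for source in adjacency:
        distance = bfs_distances(adjacency, source)
        if len(distance) < len(adjacency):
            return math.inf
        longest = max(longest, max(distance.values()))
    return longest


def max_cluster_diameter(adjacency):
    """Alternative convention: largest diameter among the connected clusters."""
    return max(max(bfs_distances(adjacency, s).values()) for s in adjacency)


GRAPH = {
    "A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"},
    "D": {"B", "C", "E", "F"}, "E": {"D", "F"},
    "F": {"D", "E", "G"}, "G": {"F"},
}
WITH_ISOLATED_NODE = dict(GRAPH, H=set())  # H is a hypothetical extra node

print(diameter(GRAPH), diameter(WITH_ISOLATED_NODE), max_cluster_diameter(WITH_ISOLATED_NODE))
```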
Graph-theoretical Measures: Clustering
[Figure: example graph with nodes A to G.]
Clustering coefficient C(v) for node v: ratio between the number of edges linking nodes adjacent to v and the total number of possible edges among them (at most $k_v(k_v-1)/2$ for $k_v$ neighbors)
Example: C(D) = 2/6 = 1/3
Adjacent nodes: B, C, E, F
Number of links: 2
Possible number of links: 6
Idea behind it: in many networks, if node A is connected to B, and B is connected to C, then it is highly probable that A also has a direct link to C.
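The definition translates directly into a count over pairs of neighbours. In this sketch the example graph is stored as an undirected neighbour map, and nodes with fewer than two neighbours are assigned C = 0, which is a common convention and an assumption here.

```python
from fractions import Fraction
from itertools import combinations

# Example graph with nodes A to G, edges taken as undirected.
GRAPH = {
    "A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"},
    "D": {"B", "C", "E", "F"}, "E": {"D", "F"},
    "F": {"D", "E", "G"}, "G": {"F"},
}


def clustering_coefficient(adjacency, node):
    """Links among the neighbours of node divided by the possible number of links."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return Fraction(0)  # convention for nodes with fewer than two neighbours
    links = sum(
        1 for u, v in combinations(sorted(neighbours), 2) if v in adjacency[u]
    )
    return Fraction(links, k * (k - 1) // 2)


print(clustering_coefficient(GRAPH, "D"))
```

For node D the function counts the links B-C and E-F among the four neighbours, out of six possible pairs.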
Graph-theoretical Measures: Clustering
[Figure: example graph with nodes A to G.]
Average clustering coefficient <C>:
Tendency of the network to form clusters or groups
Average clustering coefficient for all nodes with k links, C(k):
Diversity of cohesiveness of local neighborhoods
Example:
C(A) = 0
C(B) = 1/3
C(C) = 1
C(D) = 1/3
C(E) = 1
C(F) = 1/3
C(G) = 0
<C> = 3/7
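The average clustering coefficient and the degree-resolved average C(k) follow from the per-node values. This sketch recomputes them for the example graph with nodes A to G, taken as undirected; assigning C = 0 to nodes with fewer than two neighbours is an assumption, and the helper names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

GRAPH = {
    "A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"},
    "D": {"B", "C", "E", "F"}, "E": {"D", "F"},
    "F": {"D", "E", "G"}, "G": {"F"},
}


def clustering_coefficient(adjacency, node):
    """Fraction of neighbour pairs of node that are themselves linked."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return Fraction(0)
    links = sum(
        1 for u, v in combinations(sorted(neighbours), 2) if v in adjacency[u]
    )
    return Fraction(links, k * (k - 1) // 2)


def average_clustering(adjacency):
    """<C>: mean of C(v) over all nodes."""
    values = [clustering_coefficient(adjacency, v) for v in adjacency]
    return sum(values) / len(values)


def clustering_by_degree(adjacency):
    """C(k): mean of C(v) over all nodes that have exactly k links."""
    groups = defaultdict(list)
    for v in adjacency:
        groups[len(adjacency[v])].append(clustering_coefficient(adjacency, v))
    return {k: sum(vals) / len(vals) for k, vals in sorted(groups.items())}


print(average_clustering(GRAPH))
print(clustering_by_degree(GRAPH))
```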
Graph-theoretical Measures: Clustering
Complex networks exhibit a large degree of clustering. In a random graph, by contrast, if we consider a node and its nearest neighbors, the probability that two of these neighbors are connected is equal to the probability that two randomly selected nodes are connected. The clustering coefficient of a random graph is therefore C = p ≈ <k>/N, which is small for large, sparse graphs.
Graph-theoretical Measures: Clustering
[Figure: clustering coefficients as predicted for random networks compared with clustering coefficients of real networks (WWW, movie actors, co-authorship, E. coli substrate graph, E. coli reaction graph, food webs, word co-occurrence, power grids, …).]