Basic Principles of Graph Theory


Page 1: Basic Principles of Graph Theory

VL Netzwerke, WS 2007/08 Edda Klipp 1

Max Planck Institute Molecular Genetics

Humboldt University Berlin, Theoretical Biophysics

Networks in Metabolism and Signaling

Edda Klipp Humboldt University Berlin

Lecture 2 / WS 2007/08: Basic Principles of Graph Theory and Random Networks

Page 2: Basic Principles of Graph Theory

Basic Principles of Graph Theory

Literature:

J. Sedláček (1968) Einführung in die Graphentheorie. Teubner Verlagsgesellschaft, Leipzig.

Albert & Barabási (2002) Statistical mechanics of complex networks. Rev Mod Phys, 74, 47-97.

Barabási & Oltvai (2004) Network biology: understanding the cell's functional organization. Nature Reviews Genetics, 5, 101-113.

Page 3: Basic Principles of Graph Theory

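Classical Examples

The problem of the ferryman, goat, wolf and hay (“Fährmann, Ziege, Wolf und Heu”)

[Figure: state graph of the puzzle. Each vertex lists the occupants of one river bank, with F = ferryman (Fährmann), Z = goat (Ziege), W = wolf, H = hay (Heu): (F,Z,W,H), (W,H), (F,W,H), (W), (H), (F,Z,W), (F,Z,H), (F,Z), (Z), (0).]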
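The state graph of this puzzle can be sketched in a few lines of Python. The rules are not spelled out on the slide, so the sketch assumes the classical ones: the ferryman crosses alone or with one passenger, and the goat may not be left with the wolf or the hay unless the ferryman is present. All function names are illustrative.

```python
from collections import deque
from itertools import combinations

ITEMS = frozenset("FZWH")  # F = ferryman, Z = goat, W = wolf, H = hay


def bank_is_safe(bank):
    """Assumed rule: without the ferryman, the goat may not stay with wolf or hay."""
    if "F" in bank:
        return True
    return not ("Z" in bank and ("W" in bank or "H" in bank))


def is_safe(left):
    """A state is the set of occupants of one bank; the other bank holds the rest."""
    return bank_is_safe(left) and bank_is_safe(ITEMS - left)


def neighbors(left):
    """Assumed rule: the ferryman crosses alone or with exactly one passenger."""
    ferryman_here = "F" in left
    same_bank = left if ferryman_here else ITEMS - left
    for cargo in [None] + sorted(same_bank - {"F"}):
        moved = {"F"} if cargo is None else {"F", cargo}
        new_left = frozenset(left - moved if ferryman_here else left | moved)
        if is_safe(new_left):
            yield new_left


def shortest_path(start, goal):
    """Breadth-first search over the state graph; returns the visited states."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Vertices of the state graph: all safe subsets of {F, Z, W, H}.
states = [frozenset(c) for r in range(5)
          for c in combinations(sorted(ITEMS), r) if is_safe(frozenset(c))]
path = shortest_path(ITEMS, frozenset())
crossings = len(path) - 1
```

Under these assumed rules the safe states are exactly the ten vertices listed for the figure, and the search finds a solution with seven crossings.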

Page 4: Basic Principles of Graph Theory

The Bridges of Königsberg

In the centre of the Prussian city of Königsberg (today Kaliningrad), the river Pregel forms an island where two of its arms meet. In the 18th century, 7 bridges connected the river banks with the island. The question is whether there is a round trip that crosses each of the 7 bridges exactly once and returns to the starting point.

History

The Königsberg bridge problem goes back to Leonhard Euler. In 1736 he proved that no such round trip can exist. He considered the general case with an arbitrary number of islands and bridges and showed that a round trip of the desired kind is possible if and only if none of the banks has an odd number of bridges. If exactly two banks have an odd number of bridges, there is a walk that starts at one of them, ends at the other, and crosses every bridge exactly once. If, as in Königsberg, more than two areas are reached by an odd number of bridges, no walk can cross every bridge exactly once.
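Euler's criterion only needs the number of bridges ending at each land area. The following sketch assumes the classical bridge layout (two bridges from the island to each bank, one from the island to the eastern land area, and one from each bank to the eastern area); the area labels are illustrative, and the graph is assumed to be connected.

```python
from collections import Counter

# Assumed classical layout of the 7 bridges as a multigraph edge list.
BRIDGES = [
    ("island", "north"), ("island", "north"),
    ("island", "south"), ("island", "south"),
    ("island", "east"),
    ("north", "east"),
    ("south", "east"),
]


def odd_degree_areas(edges):
    """Land areas that are reached by an odd number of bridges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(area for area, d in degree.items() if d % 2 == 1)


def euler_classification(edges):
    """Apply Euler's criterion to a connected multigraph."""
    odd = len(odd_degree_areas(edges))
    if odd == 0:
        return "round trip exists"
    if odd == 2:
        return "open walk exists"
    return "no walk crosses every bridge exactly once"


result = euler_classification(BRIDGES)
```

With this layout all four areas have odd degree, so the sketch reports that no such walk exists.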

Page 5: Basic Principles of Graph Theory

Graphs: Definitions

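A graph is a tuple (V,E) with V a set of n vertices and E a set of m edges:

G=(V,E), with $E \subseteq V \times V$

Example graph: $V = \{A, B, C\}$, $E = \{(A,C), (A,B)\}$

[Figure: the example graph with vertices A, B, C; labels mark an edge and a vertex (node).]

Example: Proteins – vertices, interactions – edges

German terms:
vertex – Knoten
edge – Kante
tuple – Tupel, geordnete Menge (ordered set)
set – Menge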

Page 6: Basic Principles of Graph Theory

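Graphs: Completeness

Example graph: $V = \{A, B, C\}$, $E = \{(A,C), (A,B)\}$, $E \subseteq V \times V$

[Figure: the example graph with vertices A, B, C; labels mark an edge and a vertex.]

Edge AB has the end vertices A and B.

Vertex A is incident with edge AB.

Let $E_0$ be the set of all two-element subsets of V.

A graph is complete if $E = E_0$.

If $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ with $V_1 = V_2 = V$, $E_1 \cup E_2 = E_0$ and $E_1 \cap E_2 = \emptyset$, then $G_1$ and $G_2$ are complementary.

[Figure: example graphs in panels a) to d).]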
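The definitions of completeness and complement translate directly into set operations. This is a minimal sketch in which undirected edges are stored as two-element frozensets; the helper names are illustrative.

```python
from itertools import combinations


def all_pairs(vertices):
    """E0: the set of all two-element subsets of the vertex set."""
    return {frozenset(pair) for pair in combinations(vertices, 2)}


def is_complete(vertices, edges):
    """A graph is complete if its edge set equals E0."""
    return set(edges) == all_pairs(vertices)


def complement(vertices, edges):
    """Edge set of the complementary graph on the same vertices."""
    return all_pairs(vertices) - set(edges)


V = {"A", "B", "C"}
E = {frozenset("AB"), frozenset("AC")}   # the example graph with edges AB and AC
E_comp = complement(V, E)                # edges missing from E
```

Here the complement contains the single edge BC, and the union of both edge sets gives the complete graph on A, B, C.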

Page 7: Basic Principles of Graph Theory

Graph Types

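Undirected graphs: $V = \{A, B\}$, $E = \{(A,B), (B,A)\}$

Directed graphs (digraphs): $V = \{A, B\}$, $E = \{(A,B)\}$. A directed edge $(i,j) \in E$ has i denoting the head and j denoting the tail of the edge.

Extension: directed edge $(i,j,s) \in E$ with $s \in \{+1, -1\}$ to represent activating or inhibitory influences.

[Figure: vertices A and B joined by an undirected edge, by a directed edge, and by a signed directed edge.]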

Page 8: Basic Principles of Graph Theory

Graph Types: Bipartite Graphs

[Figure: bipartite graph with vertices A, B, C, D and R1.]

The set of graph vertices is decomposed into two disjoint sets such that no two graph vertices within the same set are adjacent.

Bipartite graphs are used when a graph must represent two distinct classes of nodes, such as metabolites (blue circles) and reactions (yellow boxes).

[Figure: reaction node R1 connected to the metabolite nodes ATP, Fruc-6-P, Fruc-1,6-P2 and ADP.]
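Whether a graph is bipartite can be checked by trying to colour it with two colours so that adjacent vertices always differ. The sketch below uses a breadth-first two-colouring; the edge list for R1 is read off the metabolite labels of the figure and is an assumption about how the nodes are connected.

```python
from collections import deque


def two_coloring(edges):
    """Return {node: 0 or 1} if the undirected graph is bipartite, else None."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # neighbours get the other class
                    queue.append(v)
                elif color[v] == color[u]:
                    return None               # edge inside one class
    return color


# Assumed connections: every metabolite is linked to the reaction node R1.
reaction_edges = [("ATP", "R1"), ("Fruc-6-P", "R1"),
                  ("R1", "Fruc-1,6-P2"), ("R1", "ADP")]
coloring = two_coloring(reaction_edges)
```

For this edge list the two colour classes are the metabolites on one side and the reaction node on the other; a triangle, by contrast, cannot be two-coloured.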

Page 9: Basic Principles of Graph Theory

Graph Representation: Adjacency Matrix

      A  B  C  D  E  F  G
   A  0  1  0  0  0  0  0
   B  0  0  1  1  0  0  0
   C  0  0  0  0  0  0  0
   D  0  0  1  0  0  1  0
   E  0  0  0  1  0  0  0
   F  0  0  0  0  1  0  1
   G  0  0  0  0  0  0  0

[Figure: example graph with vertices A to G.]

◊ Adjacency matrix A: non-zero entries represent edges (here, a 1 in row i and column j is an edge from i to j)
- square
- unique assignment of adjacency matrix to graph
- unique assignment of graph to adjacency matrix

◊ Bipartite graphs: sub-matrices for the two classes of nodes

◊ Alternative formats: edge lists, vertex lists

Adjacency matrix – Adjazenzmatrix
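The conversion between an edge list and the adjacency matrix can be sketched as follows. The edge list is read off the matrix on this slide, with rows as source vertices and columns as target vertices; the function names are illustrative.

```python
NODES = ["A", "B", "C", "D", "E", "F", "G"]
EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "C"),
         ("D", "F"), ("E", "D"), ("F", "E"), ("F", "G")]


def adjacency_matrix(nodes, edges):
    """Square 0/1 matrix with a 1 in row i, column j for an edge from i to j."""
    index = {v: i for i, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


def edge_list(nodes, matrix):
    """Inverse direction: recover the directed edges from the matrix."""
    return [(nodes[i], nodes[j])
            for i, row in enumerate(matrix)
            for j, entry in enumerate(row) if entry]


A = adjacency_matrix(NODES, EDGES)
```

Converting back with `edge_list(NODES, A)` returns the same eight edges, which illustrates the one-to-one correspondence between graph and matrix for a fixed vertex order.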

Page 10: Basic Principles of Graph Theory

Graph Theoretical Measures: Degree

      A  B  C  D  E  F  G
   A  0  1  0  0  0  0  0
   B  0  0  1  1  0  0  0
   C  0  0  0  0  0  0  0
   D  0  0  1  0  0  1  0
   E  0  0  0  1  0  0  0
   F  0  0  0  0  1  0  1
   G  0  0  0  0  0  0  0

Example for vertex B: $k_B = 3$, $k_B^{i} = 1$, $k_B^{o} = 2$

[Figure: example graph with vertices A to G.]

◊ Number of edges to which a vertex is connected: degree k.

◊ For directed graphs:
in-degree – edges ending at a vertex
out-degree – edges starting at a vertex

◊ Vertices with degree 0: isolated

Degree – Knotengrad

Let G be a finite graph with v nodes, k edges and node degrees $s_1, s_2, \ldots, s_v$. Then

$\sum_{i=1}^{v} s_i = 2k$

because every edge is counted once at each of its two ends.
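In-degree, out-degree and the degree sum can be read directly off the adjacency matrix. A minimal sketch, assuming rows are source vertices and columns are target vertices as in the matrix on this slide:

```python
NODES = ["A", "B", "C", "D", "E", "F", "G"]
A = [
    [0, 1, 0, 0, 0, 0, 0],  # A
    [0, 0, 1, 1, 0, 0, 0],  # B
    [0, 0, 0, 0, 0, 0, 0],  # C
    [0, 0, 1, 0, 0, 1, 0],  # D
    [0, 0, 0, 1, 0, 0, 0],  # E
    [0, 0, 0, 0, 1, 0, 1],  # F
    [0, 0, 0, 0, 0, 0, 0],  # G
]

out_degree = [sum(row) for row in A]           # row sums: edges starting at a vertex
in_degree = [sum(col) for col in zip(*A)]      # column sums: edges ending at a vertex
degree = [i + o for i, o in zip(in_degree, out_degree)]
num_edges = sum(out_degree)

# The sum of all degrees equals twice the number of edges.
degree_sum_matches = sum(degree) == 2 * num_edges
k_B = degree[NODES.index("B")]
```

For vertex B this gives one incoming and two outgoing edges, so a total degree of 3.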

Page 11: Basic Principles of Graph Theory

Graph Theoretical Measures: Degree

[Figure: example graph with vertices A to G.]

◊ Global connectivity properties of a graph:

Average degree <k>

Degree distribution P(k)

   k_in       0    1    2
   P(k_in)    1/7  4/7  2/7

   k_out      0    1    2
   P(k_out)   2/7  2/7  3/7

<k_in> = (4×1 + 2×2)/7 = 8/7 ≈ 1.14

Degree distributions make it possible to distinguish between different types of networks.
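The degree distribution is simply the fraction of vertices with each degree. The sketch below computes it with exact fractions from the example adjacency matrix of the previous slides (rows as source vertices); the helper names are illustrative.

```python
from collections import Counter
from fractions import Fraction

A = [
    [0, 1, 0, 0, 0, 0, 0],  # A
    [0, 0, 1, 1, 0, 0, 0],  # B
    [0, 0, 0, 0, 0, 0, 0],  # C
    [0, 0, 1, 0, 0, 1, 0],  # D
    [0, 0, 0, 1, 0, 0, 0],  # E
    [0, 0, 0, 0, 1, 0, 1],  # F
    [0, 0, 0, 0, 0, 0, 0],  # G
]


def degree_distribution(degrees):
    """P(k): fraction of vertices that have degree k."""
    counts = Counter(degrees)
    return {k: Fraction(c, len(degrees)) for k, c in sorted(counts.items())}


def mean_degree(distribution):
    """<k> as the expectation value of k under P(k)."""
    return sum(k * p for k, p in distribution.items())


P_in = degree_distribution([sum(col) for col in zip(*A)])
P_out = degree_distribution([sum(row) for row in A])
mean_in = mean_degree(P_in)
mean_out = mean_degree(P_out)
```

Both averages must agree, since every directed edge contributes exactly one outgoing and one incoming end.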

Page 12: Basic Principles of Graph Theory

Aside: Discrete probability distributions

Binomial distribution:

$P(k) = \binom{n}{k} p^k (1-p)^{n-k}$

Sum of all probabilities: $\sum_{k=0}^{n} P(k) = 1$

E(X) = np   expected value (cf. the mean over very many repetitions)

Var(X) = np(1-p)   variance

Properties of a sample: if the desired outcome of a trial has probability p and the number of trials is n, the binomial distribution gives the probability of obtaining k successes in total. P(k) is the probability of, for example, drawing k black balls in n draws from a pot of balls.

[Figure: plot of P(k) for p = 1/2.]
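The properties of the binomial distribution can be checked numerically. In this sketch p = 1/2 follows the plot on the slide, while n = 20 is an arbitrary choice made for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.5   # n is an assumed value; p = 1/2 as in the plot
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                             # should be 1
mean = sum(k * q for k, q in enumerate(pmf))                 # should be n*p
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))  # should be n*p*(1-p)
```

Up to floating-point rounding, the three quantities reproduce the sum of probabilities, E(X) = np and Var(X) = np(1-p).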

Page 13: Basic Principles of Graph Theory

Aside: Discrete probability distributions

Poisson distribution:

$P(k) = \frac{\lambda^k}{k!} e^{-\lambda}$

Properties of a sample: as before, but for a very small probability of the individual events, e.g. because n is very large.

λ – event rate (e.g. error rate in DNA replication)

E(X) = λ   expected value (cf. the mean over very many repetitions)

Var(X) = λ   variance

Exponential distribution:

$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$

E(X) = 1/λ

Var(X) = 1/λ²

[Figure: plot of a distribution; horizontal axis from 0 to 100, vertical axis from 0.0 to 0.8.]

Page 14: Basic Principles of Graph Theory

Degree Distributions

Degree distribution of the World Wide Web from two different measurements: squares, the 325 729-node sample of Albert et al. (1999); circles, the measurements of over 200 million pages by Broder et al. (2000);

(a) degree distribution of the outgoing edges;

(b) degree distribution of the incoming edges. The data have been binned logarithmically to reduce noise.

Albert & Barabasi, 2002, Rev Mod Phys

Page 15: Basic Principles of Graph Theory

Degree Distributions

Albert & Barabasi, 2002, Rev Mod Phys

The degree distribution of several real networks: (a) Internet at the router level. Data courtesy of Ramesh Govindan; (b) movie actor collaboration network. After Barabasi and Albert 1999. Note that if TV series are included as well, which aggregate a large number of actors, an exponential cutoff emerges for large k (Amaral et al., 2000); (c) co-authorship network of high-energy physicists. After Newman (2001a, 2001b); (d) co-authorship network of neuroscientists. After Barabasi et al. (2001).

Page 16: Basic Principles of Graph Theory

Degree Distributions

Jeong H et al, 2000, Nature

Connectivity distributions P(k) for substrates. a, Archaeoglobus fulgidus (archaeon); b, E. coli (bacterium); c, Caenorhabditis elegans (eukaryote), shown on a log-log plot, counting separately the incoming (In) and outgoing links (Out) for each substrate. k_in (k_out) corresponds to the number of reactions in which a substrate participates as a product (educt). d, The connectivity distribution averaged over all 43 organisms.

Page 17: Basic Principles of Graph Theory

Random Graphs

First well-known example: model of Paul Erdős and Alfréd Rényi

History: Erdős number

The Erdős number describes the distance in the co-authorship graph relative to the mathematician Paul Erdős. In this graph, the authors are represented as vertices, and an edge joins two authors if they have written a publication together.

Paul Erdős himself has Erdős number 0; all co-authors he published with have Erdős number 1. Authors who have published with co-authors of Paul Erdős have Erdős number 2, and so on. If no such connection to a person can be established, that person's Erdős number is ∞.

It turns out that the Erdős number of most people is either infinite or surprisingly small. The latter is mainly because Erdős published jointly with more than 500 different scientists and was well versed in many areas of mathematics.

Page 18: Basic Principles of Graph Theory

Random Graphs

[Table: random connection of node pairs; x = connected, - = not connected]

       n1  n2  n3  n4  n5
   n1       x   -   x   -
   n2           -   -   x
   n3               x   -
   n4                   -
   n5

A well-known example: model of Paul Erdős and Alfréd Rényi

Start with N nodes.

Connect every pair of nodes with probability p: draw a random number $z \in [0, 1]$; if z < p, connect the pair.

Obtain a graph with approx. ½ pN(N-1) edges

Degree distribution: Poisson distribution (for large N)

Average degree: <k> = ½ pN(N-1) · 2/N = p(N-1) ≈ pN
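The construction steps above can be sketched as follows. The values of N, p and the random seed are arbitrary choices for illustration, and `erdos_renyi` is a hypothetical helper name.

```python
import random
from itertools import combinations


def erdos_renyi(n_nodes, p, rng):
    """Connect every pair of nodes independently with probability p."""
    edges = []
    for u, v in combinations(range(n_nodes), 2):
        z = rng.random()          # random number z in [0, 1)
        if z < p:
            edges.append((u, v))
    return edges


rng = random.Random(0)            # fixed seed so the run is reproducible
N, p = 500, 0.02                  # illustrative values
edges = erdos_renyi(N, p, rng)

expected_edges = p * N * (N - 1) / 2
mean_degree = 2 * len(edges) / N  # every edge contributes to two degrees
expected_mean_degree = p * (N - 1)
```

For a single realization, the number of edges and the average degree fluctuate around their expected values ½ pN(N-1) and p(N-1).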

Page 19: Basic Principles of Graph Theory

Random Graphs: Evolution

The construction of a random graph is called evolution.

Starting with a set of isolated vertices, the graph develops by the successive addition of random edges.

The graphs obtained at different stages of this process correspond to larger and larger connection probabilities p, eventually yielding a fully connected graph with the maximum number of edges n = N(N-1)/2 for $p \to 1$.

Page 20: Basic Principles of Graph Theory

Random Networks

Questions:

Are real networks really random?

Do real networks display organizing principles?

Is a typical graph connected? (depending on p)

Does it contain a triangle of connected nodes?

Does its diameter depend on its size?

Page 21: Basic Principles of Graph Theory

Random Networks: Subgraphs

A graph G1 consisting of a set V1 of nodes and a set E1 of edges is a subgraph of a graph G=(V,E) if all nodes in V1 are also nodes of V and all edges in E1 are also edges of E.

A cycle of order k is a closed loop of k edges such that every two consecutive edges, and only those, have a common node. Average degree: 2

The opposite of cycles are trees, which cannot form closed loops. More precisely, a graph is a tree of order k if it has k nodes and k-1 edges, and none of its subgraphs is a cycle. Since each of the k-1 edges adds 2 to the total degree, the average degree of a tree of order k is <k> = 2(k-1)/k = 2-2/k (approaching 2 for large trees).

[Figure: a triangle and a rectangle as examples of cycles.]

Page 22: Basic Principles of Graph Theory

Random Networks: Subgraphs

The threshold probabilities at which different subgraphs appear in a random graph. For $pN^{3/2} \to 0$ the graph consists of isolated nodes and edges. For $p \sim N^{-3/2}$ trees of order 3 appear, while for $p \sim N^{-4/3}$ trees of order 4 appear. At $p \sim N^{-1}$ trees of all orders are present, and at the same time cycles of all orders appear. The probability $p \sim N^{-2/3}$ marks the appearance of complete subgraphs of order 4 and $p \sim N^{-1/2}$ corresponds to complete subgraphs of order 5. As z approaches 0, where z is the exponent in $p \sim N^{z}$, the graph contains complete subgraphs of increasing order.

Page 23: Basic Principles of Graph Theory

Degree Distribution

The degree distribution that results from the numerical simulation of a random graph. We generated a single random graph with N=10 000 nodes and connection probability p=0.0015, and calculated the number of nodes with degree k, $X_k$. The plot compares $X_k/N$ with the expectation value of the Poisson distribution (13), $E(X_k)/N = P(k_i = k)$, and we can see that the deviation is small.
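A scaled-down version of this numerical experiment can be sketched as follows. To keep the run time short, the sketch assumes N = 1500 and p = 0.01 instead of the values in the caption; both choices give an average degree of about 15. The seed and the helper names are illustrative.

```python
import math
import random
from collections import Counter


def poisson_pmf(k, lam):
    """Poisson probability of observing the value k for mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def simulated_degree_fractions(n_nodes, p, rng):
    """Generate one random graph and return X_k / N for every observed degree k."""
    degree = [0] * n_nodes
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            if rng.random() < p:
                degree[u] += 1
                degree[v] += 1
    counts = Counter(degree)
    return {k: counts[k] / n_nodes for k in sorted(counts)}


rng = random.Random(1)
N, p = 1500, 0.01                      # assumed, smaller than in the caption
fractions = simulated_degree_fractions(N, p, rng)

lam = p * (N - 1)                      # average degree <k>
max_deviation = max(abs(f - poisson_pmf(k, lam)) for k, f in fractions.items())
```

`max_deviation` is the largest gap between the simulated fractions and the Poisson prediction; for graphs of this size it is expected to be small compared with the peak of the distribution.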

Page 24: Basic Principles of Graph Theory

Graph-theoretical Measure: Distance

Path: Connection between two vertices u and v without repetition of nodes (i.e. no backtracking, no loops)

Shortest path length l(u,v): local measure for two nodes

Average shortest path length <l>: global network property indicating navigability

[Figure: example graph with vertices A to G.]
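Shortest path lengths in an unweighted graph can be obtained by breadth-first search. The sketch below uses the edges of the example adjacency matrix and, as an assumption made for illustration, treats them as undirected.

```python
from collections import deque

EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "C"),
         ("D", "F"), ("E", "D"), ("F", "E"), ("F", "G")]


def undirected_adjacency(edges):
    """Neighbour sets, ignoring edge direction (assumption for this sketch)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest path length l(source, v) for every reachable vertex v."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """<l>: mean of l(u, v) over all ordered pairs of distinct, connected vertices."""
    lengths = [d for s in adj
               for t, d in bfs_distances(adj, s).items() if t != s]
    return sum(lengths) / len(lengths)


adj = undirected_adjacency(EDGES)
l_AG = bfs_distances(adj, "A")["G"]
l_mean = average_path_length(adj)
```

In the undirected version of the example, A and G are four steps apart (A, B, D, F, G).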

Page 25: Basic Principles of Graph Theory

Graph-theoretical Measure: Distance

Breadth-first search: exploration of all nodes in a graph, starting from those adjacent to the current node and moving outward one step at a time.

Dijkstra's algorithm: constructs the shortest-path tree from a source to every other vertex (for N vertices: $O(N^2)$)

[Figure: example graph with vertices A to G.]
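A simple array-based variant of Dijkstra's algorithm, which scans all unvisited vertices in every step and therefore needs on the order of N² operations, can be sketched as follows. The directed edges are taken from the example adjacency matrix; the unit edge weights are an assumption, since the slides give no weights.

```python
import math

EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "C"),
         ("D", "F"), ("E", "D"), ("F", "E"), ("F", "G")]


def dijkstra(weights, source):
    """weights: {node: {neighbour: non-negative weight}}; returns (dist, parent)."""
    dist = {v: math.inf for v in weights}
    parent = {v: None for v in weights}
    dist[source] = 0
    unvisited = set(weights)
    while unvisited:
        # Linear scan for the closest unvisited vertex: N scans of O(N) each.
        u = min(unvisited, key=dist.__getitem__)
        if dist[u] == math.inf:
            break                       # remaining vertices are unreachable
        unvisited.remove(u)
        for v, w in weights[u].items():
            if v in unvisited and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u           # edge (u, v) joins the shortest-path tree
    return dist, parent


weights = {v: {} for v in "ABCDEFG"}
for u, v in EDGES:
    weights[u][v] = 1                   # assumed unit weight for every edge

dist, parent = dijkstra(weights, "A")
```

The `parent` dictionary encodes the shortest-path tree: following it from any reachable vertex back to the source gives one shortest path.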

Page 26: Basic Principles of Graph Theory

Graph-theoretical Measure: Diameter

The diameter of a graph is the maximal distance between any pair of its nodes.

Strictly speaking, the diameter of a disconnected graph (i.e., one made up of several isolated clusters) is infinite, but it can be defined as the maximum diameter of its clusters.

Random graphs tend to have small diameters, provided p is not too small.

• If <k> = pN < 1, a typical graph is composed of isolated trees and its diameter equals the diameter of a tree.
• If <k> > 1, a giant cluster appears. The diameter of the graph equals the diameter of the giant cluster if <k> > 3.5, and is proportional to ln(N)/ln(<k>).
• If <k> > ln(N), almost every graph is totally connected. The diameters of the graphs having the same N and <k> are concentrated on a few values around ln(N)/ln(<k>).

[Figure: example graph with vertices A to G.]
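Both conventions for the diameter can be sketched with repeated breadth-first searches. The example uses the edges of the adjacency matrix from the earlier slides, treated as undirected (an assumption); the `per_cluster` flag is a hypothetical name for switching to the maximum over clusters.

```python
import math
from collections import deque

EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "C"),
         ("D", "F"), ("E", "D"), ("F", "E"), ("F", "G")]


def undirected_adjacency(edges, extra_nodes=()):
    adj = {v: set() for v in extra_nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj, per_cluster=False):
    """Maximal shortest-path distance; infinite for a disconnected graph
    unless per_cluster is set, in which case the largest cluster diameter is used."""
    best = 0
    for source in adj:
        dist = bfs_distances(adj, source)
        if len(dist) < len(adj) and not per_cluster:
            return math.inf
        best = max(best, max(dist.values()))
    return best


connected = undirected_adjacency(EDGES)
with_isolated = undirected_adjacency(EDGES, extra_nodes=["H"])  # H is an added isolated node

d_connected = diameter(connected)
d_strict = diameter(with_isolated)
d_clusters = diameter(with_isolated, per_cluster=True)
```

Adding the isolated node H makes the strict diameter infinite, while the cluster-based definition still returns the diameter of the original component.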

Page 27: Basic Principles of Graph Theory

Graph-theoretical Measures: Clustering

[Figure: example graph with vertices A to G.]

Clustering coefficient C(v) for node v: ratio between the number of edges linking nodes adjacent to v and the total number of possible edges among them (at most $k_v(k_v-1)/2$ for $k_v$ neighbors)

Example for node D:
Adjacent nodes: B, C, E, F
Number of links: 2
Possible number of links: 6
C(D) = 1/3

Idea behind it: in many networks, if node A is connected to B, and B is connected to C, then it is highly probable that A also has a direct link to C.
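The local clustering coefficient can be computed by counting the links among the neighbors of a node. This sketch uses the edges of the example adjacency matrix, treated as undirected, and assumes C(v) = 0 for nodes with fewer than two neighbors.

```python
from fractions import Fraction
from itertools import combinations

EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "C"),
         ("D", "F"), ("E", "D"), ("F", "E"), ("F", "G")]


def undirected_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering(adj, v):
    """Links among the neighbours of v divided by the possible k_v(k_v-1)/2 links."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return Fraction(0)        # assumed convention for degree 0 or 1
    links = sum(1 for a, b in combinations(sorted(neighbors), 2) if b in adj[a])
    return Fraction(links, k * (k - 1) // 2)


adj = undirected_adjacency(EDGES)
C = {v: clustering(adj, v) for v in sorted(adj)}
```

For node D the neighbors B, C, E, F share the two links B-C and E-F out of six possible ones, which gives `C["D"] == Fraction(1, 3)`.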

Page 28: Basic Principles of Graph Theory

Graph-theoretical Measures: Clustering

[Figure: example graph with vertices A to G.]

Average clustering coefficient <C>: tendency of the network to form clusters or groups

Average clustering coefficient for all nodes with k links, C(k): diversity of cohesiveness of local neighborhoods

C(A) = 0
C(B) = 1/3
C(C) = 1
C(D) = 1/3
C(E) = 1
C(F) = 1/3
C(G) = 0

<C> = 3/7

Page 29: Basic Principles of Graph Theory

Graph-theoretical Measures: Clustering

[Figure: example graph with vertices A to G.]

Complex networks exhibit a large degree of clustering. If we consider a node in a random graph and its nearest neighbors, the probability that two of these neighbors are connected is equal to the probability that two randomly selected nodes are connected. Hence the clustering coefficient of a random graph is $C_{rand} = p = \langle k \rangle / N$.
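The statement about random graphs can be checked numerically: in a graph where every pair is connected with probability p, the average clustering coefficient should be close to p. The values of N, p and the seed below are arbitrary choices for illustration, and nodes with fewer than two neighbors are assumed to have C = 0.

```python
import random
from itertools import combinations


def erdos_renyi_adjacency(n_nodes, p, rng):
    """Random graph: every pair of nodes is connected with probability p."""
    adj = {v: set() for v in range(n_nodes)}
    for u, v in combinations(range(n_nodes), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def clustering(adj, v):
    """Fraction of neighbour pairs of v that are themselves connected."""
    k = len(adj[v])
    if k < 2:
        return 0.0                # assumed convention for degree 0 or 1
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """<C>: mean of C(v) over all nodes."""
    return sum(clustering(adj, v) for v in adj) / len(adj)


rng = random.Random(2)
N, p = 300, 0.1                   # illustrative values
adj = erdos_renyi_adjacency(N, p, rng)
c_random = average_clustering(adj)
```

Comparing `c_random` with p shows how little clustering a random graph has beyond what the overall edge density already implies.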

Page 30: Basic Principles of Graph Theory

Graph-theoretical Measures: Clustering

[Figure: clustering coefficients as predicted for random networks, and clustering coefficients for real networks (WWW, movie actors, co-authorship, E. coli substrate graph, E. coli reaction graph, food webs, word co-occurrence, power grids, …).]