Networks - TU Graz
Transcript of Networks - TU Graz
Networks
Computational Social Systems 1 (VU) (706.616)
Denis Helic
ISDS, TU Graz
March 28, 2020
Denis Helic (ISDS, TU Graz) Networks March 28, 2020 1 / 111
Outline
1 Basic Definitions
2 Paths
3 Distance and Breadth-First Search
4 Approximating Distance Distribution
5 Node Structural Roles
6 Components
7 Node Degrees
8 Clustering
9 Les Miserables: Network Statistics Example
10 Random Graph
Basic Definitions
Graphs & Networks
Definition: A network is a set of items called nodes and connections between those items called links.
Terminology clarification:
- Mathematics: vertices (vertex) and edges
- Physics: sites and bonds
- Sociology: actors and ties
- Computer science: nodes and links
Graphs & Networks
Definition: A graph (network) is a pair of sets $G = (V, E)$, where $V$ denotes the set of nodes and $E$ the set of links.
- In an undirected graph, $E \subseteq [V]^2$, where $[V]^k$ is the set of all subsets of $V$ with $k$ elements
- In an undirected graph, links are (unordered) pairs of nodes
- In a directed graph, $E \subseteq V \times V$
- In a directed graph, links are ordered pairs of nodes
- In the graph theory literature, $V(G)$ and $E(G)$ are often used
Example of a simple undirected graph
Figure: Simple undirected graph
$V = \{1, 2, 3, 4, 5\}$
$E = \{\{1, 2\}, \{1, 5\}, \{2, 3\}, \{2, 4\}, \{3, 4\}, \{3, 5\}\}$
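A minimal Python sketch of this example, assuming the link set reads E = {{1, 2}, {1, 5}, {2, 3}, {2, 4}, {3, 4}, {3, 5}}. The names `V`, `E`, and `adjacency` are illustrative and not part of the slides; unordered links are modelled as `frozenset` objects so that {1, 2} and {2, 1} denote the same link.

```python
# Undirected example graph: links are unordered pairs, stored as frozensets.
V = {1, 2, 3, 4, 5}
E = {frozenset(pair) for pair in [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5)]}

# Adjacency sets: each link {u, v} makes u a neighbor of v and vice versa.
adjacency = {v: set() for v in V}
for link in E:
    u, v = tuple(link)
    adjacency[u].add(v)
    adjacency[v].add(u)

print(len(V), len(E))        # number of nodes and number of links
print(sorted(adjacency[2]))  # neighbors of node 2
```

The adjacency-set form is convenient for the path and search procedures discussed later, since looking up the neighbors of a node is a single dictionary access.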
Example of a simple directed graph
Figure: Simple directed graph
$V = \{1, 2, 3, 4, 5\}$
$E = \{(1, 3), (2, 1), (2, 3), (2, 5), (3, 2), (4, 1), (4, 5), (5, 2), (5, 3)\}$
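The directed example can be sketched in the same way, with links stored as ordered pairs (tuples). The dictionaries `successors` and `predecessors` are hypothetical helper names introduced only for this illustration.

```python
# Directed example graph: links are ordered pairs (source, target).
V = {1, 2, 3, 4, 5}
E = {(1, 3), (2, 1), (2, 3), (2, 5), (3, 2), (4, 1), (4, 5), (5, 2), (5, 3)}

successors = {v: set() for v in V}    # targets of links leaving a node
predecessors = {v: set() for v in V}  # sources of links pointing to a node
for u, v in E:
    successors[u].add(v)
    predecessors[v].add(u)

# Direction matters: (2, 1) is a link, (1, 2) is not.
print((2, 1) in E, (1, 2) in E)
print(sorted(successors[2]), sorted(predecessors[2]))
```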
Some further notation
- Simple graphs: graphs with no self-links (loops)
- Undirected graph: $\forall i \in V: \{i\} \notin E$. Since $E \subseteq [V]^2$ contains only two-element subsets, a self-link can never occur.
- Directed graph: $\forall i \in V: (i, i) \notin E$
- Number of nodes in $G$: $n = |V|$
- Number of links in $G$: $m = |E|$
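As a small illustration of this notation, the sketch below counts nodes and links in a link list and checks for self-links. The function name `has_self_link` is hypothetical, and the node count assumes that every node appears in at least one link.

```python
def has_self_link(links):
    """Return True if any link (u, v) joins a node to itself."""
    return any(u == v for u, v in links)

# Link list of the simple undirected example graph with nodes 1..5.
links = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5)]

n = len({node for link in links for node in link})  # n = |V| (no isolated nodes assumed)
m = len(links)                                      # m = |E|
print(n, m, has_self_link(links))
```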
Graphs vs. Networks
- Mathematical graph theory: an analytical approach to the study of small graphs (typically tens or hundreds of nodes)
- With the emergence of information and communication technology (ICT) we are able to analyze large graphs that exist in nature, societies, technologies, etc.
- Now we consider large-scale statistical properties of graphs
- Network science deals with the empirical analysis of large graphs (networks) that occur in different areas
Types of networks
- A set of nodes connected by links is the simplest type of network
- Different types of nodes
- Different types of links
- Nodes and links can carry weights
Types of networks
Figure: Various types of networks (panels a to d). From: The structure and function of complex networks, Newman, 2003.
Networks
- Social networks. Nodes are people and links are acquaintances, friendship, and so on.
- Communication networks. Internet: nodes are computers and links are cables connecting computers.
- Biological networks. Metabolism: nodes are substances and links are metabolic reactions.
- Information networks. Web: nodes are Web pages and links are hyperlinks connecting pages.
Networks
Figure: Social network of HP Labs constructed out of e-mail communication. From: How to search a social network, Adamic, 2005.
Networks
Figure: Network of pages and hyperlinks on a Website. From: Networks, Mark Newman, 2011.
Paths
Arpanet
Figure 2.2: A network depicting the sites on the Internet, then known as the Arpanet, in December 1970. (Image from F. Heart, A. McKenzie, J. McQuillian, and D. Walden [214]; on-line at http://som.csudh.edu/cis/lpress/history/arpamaps/.)
connections such as hyperlinks, citations, or cross-references. The list of areas in which graphs play a role is of course much broader than what we can enumerate here; Figure 2.4 gives a few further examples, and also shows that many images we encounter on a regular basis have graphs embedded in them.
2.2 Paths and Connectivity
We now turn to some of the fundamental concepts and definitions surrounding graphs. Perhaps because graphs are so simple to define and work with, an enormous range of graph-theoretic notions have been studied; the social scientist John Barnes once described graph theory as a "terminological jungle, in which any newcomer may plant a tree" [45]. Fortunately, for our purposes, we will be able to get underway with just a brief discussion of some of the most central concepts.
Figure: Image from: http://som.csudh.edu/cis/lpress/history/arpamaps/
Arpanet
Figure 2.3: An alternate drawing of the 13-node Internet graph from December 1970. (Nodes: LINC, CASE, CARN, HARV, BBN, MIT, SDC, RAND, UTAH, SRI, UCLA, STAN, UCSB.)
Paths. Although we've been discussing examples of graphs in many different areas, there are clearly some common themes in the use of graphs across these areas. Perhaps foremost among these is the idea that things often travel across the edges of a graph, moving from node to node in sequence: this could be a passenger taking a sequence of airline flights, a piece of information being passed from person to person in a social network, or a computer user or piece of software visiting a sequence of Web pages by following links.
This idea motivates the definition of a path in a graph: a path is simply a sequence of nodes with the property that each consecutive pair in the sequence is connected by an edge. Sometimes it is also useful to think of the path as containing not just the nodes but also the sequence of edges linking these nodes. For example, the sequence of nodes mit, bbn, rand, ucla is a path in the Internet graph from Figures 2.2 and 2.3, as is the sequence case, lincoln, mit, utah, sri, ucsb. As we have defined it here, a path can repeat nodes: for example, sri, stan, ucla, sri, utah, mit is a path. But most paths we consider will not do this; if we want to emphasize that the path we are discussing does not repeat nodes, we can refer to it as a simple path.
Cycles. A particularly important kind of non-simple path is a cycle, which informally is a "ring" structure such as the sequence of nodes linc, case, carn, harv, bbn, mit, linc on the right-hand-side of Figure 2.3. More precisely, a cycle is a path with at least three edges, in which the first and last nodes are the same, but otherwise all nodes are distinct. There are many cycles in Figure 2.3: sri, stan, ucla, sri is as short an example as possible according to our definition (since it has exactly three edges), while sri, stan, ucla, rand, bbn, mit, utah, sri is a significantly longer example.
In fact, every edge in the 1970 Arpanet belongs to a cycle, and this was by design: it means that if any edge were to fail (e.g. a construction crew accidentally cut through the cable), there would still be a way to get from any node to any other node. More generally, cycles
Paths
Often things travel across the links of a graph:
- A passenger taking a sequence of airline flights
- A computer user navigating the Web, or Wikipedia
- A data packet moving across a computer network, e.g. the Internet
Paths
- Path: a sequence of nodes such that each consecutive pair in the sequence is connected by a link
- For example, the sequence (MIT, BBN, RAND, UCLA) is a path in the Internet graph
- Another sequence, (CASE, LINC, MIT, UTAH, SRI, UCSB), is also a path
- But the sequence (LINC, BBN, HARV, CARN) is not a path
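The path condition can be checked mechanically. The following is a minimal sketch in Python, not the lecture's own code. Most links in `ARPANET_LINKS` are read off the paths and cycles quoted in the textbook excerpt; the three links marked "assumed" are not stated in the text and are an assumption about the figure. The function name `is_path` is hypothetical.

```python
# Links of the December 1970 Arpanet graph as an undirected link list.
# The last three links are NOT given in the text; they are assumed here.
ARPANET_LINKS = [
    ("MIT", "BBN"), ("BBN", "RAND"), ("RAND", "UCLA"),
    ("CASE", "LINC"), ("LINC", "MIT"), ("MIT", "UTAH"),
    ("UTAH", "SRI"), ("SRI", "UCSB"), ("SRI", "STAN"),
    ("STAN", "UCLA"), ("UCLA", "SRI"), ("CASE", "CARN"),
    ("CARN", "HARV"), ("HARV", "BBN"),
    ("UTAH", "SDC"), ("SDC", "RAND"), ("UCSB", "UCLA"),  # assumed
]
E = {frozenset(link) for link in ARPANET_LINKS}

def is_path(nodes, links):
    """True if every consecutive pair of nodes is connected by a link."""
    return all(frozenset((u, v)) in links for u, v in zip(nodes, nodes[1:]))

print(is_path(["MIT", "BBN", "RAND", "UCLA"], E))                  # True
print(is_path(["CASE", "LINC", "MIT", "UTAH", "SRI", "UCSB"], E))  # True
print(is_path(["LINC", "BBN", "HARV", "CARN"], E))                 # False
```

The third sequence fails at its first step, because the link list contains no link between LINC and BBN.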
Paths Formally
Definition: Let $G = (V, E)$ be a graph. Given two nodes $s, t \in V$, we define $p_{s,t} = (s, u_1, u_2, \dots, u_{k-1}, t)$ to be a path between $s$ and $t$ if $u_1, u_2, \dots, u_{k-1} \in V$ and $(s, u_1), (u_1, u_2), \dots, (u_{k-1}, t) \in E$. Let $\Pi_{s,t}$ be the set of all paths from $s$ to $t$.
- $p_{SRI,UCLA} = (SRI, UCLA)$ because $\{SRI, UCLA\} \in E$
- $p_{SRI,UCLA} = (SRI, STAN, UCLA)$ because $\{SRI, STAN\}, \{STAN, UCLA\} \in E$
Paths
- We can repeat nodes in a path
- For example, the sequence (SRI, STAN, UCLA, SRI, UTAH, MIT) is a path
- SRI is repeated
- If a path does not repeat nodes: simple path
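Whether a path is simple depends only on its node sequence. A short sketch, assuming the sequence passed in is already known to be a path; the function name `is_simple_path` is hypothetical.

```python
def is_simple_path(nodes):
    """True if no node occurs more than once in the node sequence."""
    return len(set(nodes)) == len(nodes)

print(is_simple_path(["SRI", "STAN", "UCLA", "SRI", "UTAH", "MIT"]))  # False
print(is_simple_path(["MIT", "BBN", "RAND", "UCLA"]))                 # True
```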
Cycles
- An important kind of non-simple path is a cycle
- Cycle: a path with at least three links, in which the first and the last node are the same but all other nodes are distinct
- For example, (SRI, STAN, UCLA, SRI) is a cycle
- By design, every link in the Arpanet belongs to a cycle, which makes the network robust to the failure of a single link (alternative routes)
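The cycle condition can be sketched as a check on a node sequence. This follows the textbook wording quoted earlier (at least three links, first and last node equal, all other nodes distinct). The function name `is_cycle` and the three-link set `triangle` are illustrative only.

```python
def is_cycle(nodes, links):
    """True if `nodes` is a path with at least three links whose first and
    last nodes coincide and whose other nodes are all distinct."""
    if len(nodes) < 4 or nodes[0] != nodes[-1]:
        return False
    inner = nodes[:-1]
    if len(set(inner)) != len(inner):   # a node other than the endpoint repeats
        return False
    return all(frozenset((u, v)) in links for u, v in zip(nodes, nodes[1:]))

# The three links of the shortest cycle mentioned in the text.
triangle = {frozenset(p) for p in [("SRI", "STAN"), ("STAN", "UCLA"), ("UCLA", "SRI")]}

print(is_cycle(["SRI", "STAN", "UCLA", "SRI"], triangle))  # True
print(is_cycle(["SRI", "STAN", "SRI"], triangle))          # False: only two links
```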
Distance and Breadth-First Search
Path length
- Very often we want to know how long a path is
- In transportation and communication networks it is important how many hops a packet or a person travels
- Path length: the number of links in a path
Definition: Let $G = (V, E)$ be a graph. Given two nodes $s, t \in V$ and a path $p_{s,t} = (s, u_1, u_2, \dots, u_{k-1}, t)$ from $s$ to $t$, we define the length of path $p_{s,t}$ as $|p_{s,t}| = k$.
Path length
(MIT, BBN, RAND, UCLA) has length 3; (MIT, UTAH) has length 1
Distance
- Distance: the length of the shortest path between two nodes $s$ and $t$
- We denote the distance with $\ell_{s,t}$
- In other words: $\ell_{s,t} \le |p_{s,t}|$ for all paths $p_{s,t} \in \Pi_{s,t}$
- LINC and SRI have distance 3, i.e. $\ell_{LINC,SRI} = 3$
- UTAH and RAND have distance 2, i.e. $\ell_{UTAH,RAND} = 2$
- UTAH and SRI have distance 1, i.e. $\ell_{UTAH,SRI} = 1$
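Equivalently, the distance can be written as a minimum over the set of all paths between the two nodes, using the path set $\Pi_{s,t}$ and the path length $|p_{s,t}|$ defined earlier. The worked instance uses one shortest path between LINC and SRI whose links all appear in the paths quoted in the textbook excerpt.

```latex
\ell_{s,t} = \min_{p_{s,t} \in \Pi_{s,t}} |p_{s,t}|,
\qquad
\ell_{LINC,SRI} = |(LINC, MIT, UTAH, SRI)| = 3
```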
Breadth-First Search
- For a small graph we can figure out the distance by looking at the picture
- For larger graphs we need an algorithm
- An efficient algorithm is breadth-first search
- The algorithm computes the distances from a single starting node to all other nodes
- From now on we assume that starting from an arbitrary node we can always reach all other nodes
Breadth-First Search
We begin at a given node $i$ in the network
1 We declare all neighbors of $i$ (nodes connected to $i$) to be at distance 1
2 Then we find all neighbors of these neighbors (not counting nodes that are already neighbors of $i$) and declare them to be at distance 2
3 Then we find all neighbors of the nodes from the previous step (again, not counting nodes that we already found at distance 1 and 2) and declare them to be at distance 3
(...) We continue in this way and search in successive layers, each of which is at the next distance out, until we cannot discover any new nodes
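The steps above can be sketched in Python as follows. This is a minimal illustration rather than the lecture's reference implementation: it uses a FIFO queue (`collections.deque`), which processes nodes in the same layer-by-layer order. The function name `bfs_distances` and the adjacency dictionary are assumptions; the graph is the simple undirected example with nodes 1 to 5.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Return a dict mapping each node reachable from `source` to its distance."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in distance:              # skip nodes found in an earlier layer
                distance[v] = distance[u] + 1  # v lies one layer further out than u
                queue.append(v)
    return distance

# Adjacency sets of the simple undirected example graph.
adjacency = {1: {2, 5}, 2: {1, 3, 4}, 3: {2, 4, 5}, 4: {2, 3}, 5: {1, 3}}
print(bfs_distances(adjacency, 1))
```

Starting from node 1, nodes 2 and 5 form the first layer and nodes 3 and 4 the second. Nodes that cannot be reached from the start node simply do not appear in the result.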
Breadth-First Search
Figure 2.8: Breadth-first search discovers distances to nodes one "layer" at a time; each layer is built of nodes that have an edge to at least one node in the previous layer. (Layers shown: you; your friends at distance 1; friends of friends at distance 2; friends of friends of friends at distance 3; in general, all nodes, not already discovered, that have an edge to some node in the previous layer.)
a path's length, we can talk about whether two nodes are close together or far apart in a graph: we define the distance between two nodes in a graph to be the length of the shortest path between them. For example, the distance between linc and sri is three, though to believe this you have to first convince yourself that there is no length-1 or length-2 path between them.
Breadth-First Search. For a graph like the one in Figure 2.3, we can generally figure out the distance between two nodes by eyeballing the picture; but for graphs that are even a bit more complicated, we need some kind of a systematic method to determine distances. The most natural way to do this, and also the most efficient way to calculate distances for a large network dataset using a computer, is the way you would probably do it if you
Breadth-First Search
Figure 2.9: The layers arising from a breadth-first search of the December 1970 Arpanet, starting at the node mit.
really needed to trace out distances in the global friendship network (and had the unlimited
patience and cooperation of everyone in the world). This is pictured in Figure 2.8:
(1) You first declare all of your actual friends to be at distance 1.
(2) You then find all of their friends (not counting people who are already friends of yours),
and declare these to be at distance 2.
(3) Then you find all of their friends (again, not counting people who youβve already found
at distances 1 and 2) and declare these to be at distance 3.
(...) Continuing in this way, you search in successive layers, each representing the next
distance out. Each new layer is built from all those nodes that (i) have not already
been discovered in earlier layers, and that (ii) have an edge to some node in the previous
layer.
This technique is called breadth-first search, since it searches the graph outward from a start-
ing node, reaching the closest nodes first. In addition to providing a method of determining
distances, it can also serve as a useful conceptual framework to organize the structure of a
graph, arranging the nodes based on their distances from a fixed starting point.
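The layer-by-layer procedure described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the graph is stored as a dictionary of adjacency lists; the function name `bfs_layers` and the toy friendship graph are hypothetical and not part of the lecture material.

```python
from collections import deque

def bfs_layers(adj, start):
    """Return a dict mapping every reachable node to its distance from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:          # not discovered in an earlier layer
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical toy friendship graph (undirected, stored symmetrically)
adj = {
    "you": ["ann", "bob"],
    "ann": ["you", "cat"],
    "bob": ["you", "cat"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["cat"],
}
dist = bfs_layers(adj, "you")

# Group nodes by distance to obtain the layers
layers = {}
for node, d in dist.items():
    layers.setdefault(d, []).append(node)
```

Here `layers[1]` holds the direct friends, `layers[2]` their friends that were not already found, and so on.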
Distance and Breadth-First Search
Complexity of Breadth-First Search
Recall that we denote the number of nodes in a graph by $n$ and the number of links by $m$
During a breadth-first search we have to visit every node at least once and follow each of its links at least once
Thus, we perform on the order of $n + m$ operations
The complexity of the breadth-first search algorithm is $O(n + m)$
Complexity of Breadth-First Search
Using breadth-first search we can compute the distances between all pairs of nodes in a network (all-pairs shortest paths)
We iterate over the nodes and start a BFS from each node
The complexity is $O(n(n + m)) = O(n^2 + nm)$
In a connected simple graph without self-links: $n - 1 \le m \le \frac{n(n-1)}{2}$
Since $m \ge n - 1$, the term $nm$ dominates and the overall complexity is $O(nm)$
Summarizing distances
One interesting quantity with respect to distances is the diameter
Diameter: the maximum distance between any pair of nodes in the graph (we denote it by $\ell_{max}$)
Another interesting quantity is the average distance
Average distance over all pairs of nodes in a graph:
$$\ell = \frac{1}{n(n-1)} \sum_{i \neq j} \ell_{ij}$$
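As a rough sketch, both summary quantities can be computed by running one BFS per node. The code assumes a connected, undirected graph given as a dictionary of adjacency lists; the helper names and the four-node path graph are illustrative assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter_and_average(adj):
    """Return (diameter, average distance) over all ordered pairs i != j.

    Assumes the graph is connected, so every pair has a finite distance.
    """
    n = len(adj)
    lengths = [d
               for s in adj
               for t, d in bfs_distances(adj, s).items()
               if t != s]
    return max(lengths), sum(lengths) / (n * (n - 1))

# Hypothetical example: the path graph 1 - 2 - 3 - 4
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
l_max, l_avg = diameter_and_average(path)
```

For this path graph the diameter is 3 and the average distance is 20/12, since the twelve ordered pairs have distances summing to 20.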
Summarizing distances
In many networks the diameter and the average distance are close to each other
In some graphs, however, they can be very different
Can you think of a graph where the diameter is three (or arbitrarily many) times larger than the average distance?
You need outliers in the distribution, i.e. a distant node connected by a chain of nodes to a tightly connected graph core
Distribution of distances
Interesting statistic: the distribution of distances
How many pairs of nodes have a given distance?
Typically, we normalize by the total number of pairs to obtain probabilities
We can visualize it with a histogram
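A minimal sketch of how such a distribution could be tabulated: count the ordered pairs at each distance and divide by the total number of pairs. The function names and the small path graph are assumptions made for illustration; plotting the resulting dictionary as a bar chart gives the histogram.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def distance_distribution(adj):
    """Map each distance to the fraction of reachable ordered pairs at that distance."""
    counts = Counter()
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                counts[d] += 1
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}

# Hypothetical example: the path graph 1 - 2 - 3 - 4
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
p = distance_distribution(path)
```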
Les Miserables
Distribution of distances
Figure: Distribution of distances, $p(\ell)$ plotted against the distance $\ell$: $\ell = 2.641148$, $\ell_{max} = 5$
IPython notebook
IPython Notebook example: http://kti.tugraz.at/staff/socialcomputing/courses/webscience/websci1.zip
Command line: ipython notebook --pylab=inline websci1.ipynb
Approximating Distance Distribution
Complexity of Breadth-First Search
The overall complexity is $O(nm)$
If we have $m \sim n$ this is $O(n^2)$
If $m \sim n^2$ this is $O(n^3)$
However, if $n$ is on the order of millions or billions, both situations are prohibitive for breadth-first search
We will need some method for approximating the distances
The basic idea: estimate bounds on the distance and make those bounds tight
Distance bounds
Definition
Let $SP_{s,t} \subseteq \Pi_{s,t}$ be the set of paths $p_{s,t}$ such that $|p_{s,t}| = \ell_{s,t}$.
$SP_{s,t}$ is the set of shortest paths from $s$ to $t$
The shortest-path distance, or just distance for short, is a metric
Distance is a metric
Definition
A metric on a set $X$ is a function $d: X \times X \to [0, \infty)$ such that for all $x, y, z \in X$ the following conditions hold:
1 $d(x, y) \ge 0$
2 $d(x, y) = 0 \iff x = y$
3 $d(x, y) = d(y, x)$
4 $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality)
The triangle inequality also implies the lower bound $d(x, z) \ge |d(x, y) - d(y, z)|$
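The lower-bound form follows from applying the triangle inequality twice and using symmetry. A short derivation, written out as a sketch:

```latex
\begin{align*}
d(x, y) &\le d(x, z) + d(z, y) = d(x, z) + d(y, z)
  &&\Rightarrow\quad d(x, z) \ge d(x, y) - d(y, z) \\
d(y, z) &\le d(y, x) + d(x, z) = d(x, y) + d(x, z)
  &&\Rightarrow\quad d(x, z) \ge d(y, z) - d(x, y)
\end{align*}
```

Both right-hand sides hold at once, so $d(x, z) \ge |d(x, y) - d(y, z)|$.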
Distance bounds
Given any three nodes $s$, $t$, and $u$:
$$\ell_{s,t} \le \ell_{s,u} + \ell_{u,t} \quad (1)$$
$$\ell_{s,t} \ge |\ell_{s,u} - \ell_{u,t}| \quad (2)$$
Distance bounds
Observation 1
Let $s, t, u \in V$. If there exists a path $p_{s,t} \in SP_{s,t}$ such that $u \in p_{s,t}$, then $\ell_{s,t} = \ell_{s,u} + \ell_{u,t}$.
Observation 2
Let $s, t, u \in V$. If there exists a path $p_{s,u} \in SP_{s,u}$ such that $t \in p_{s,u}$, or there exists a path $p_{t,u} \in SP_{t,u}$ such that $s \in p_{t,u}$, then $\ell_{s,t} = |\ell_{s,u} - \ell_{u,t}|$.
Figure: From "Fast Shortest Path Distance Estimation in Large Networks" by Potamias et al.
Landmarks
We will use a set of landmarks $D = \{u_1, u_2, \dots, u_d\}$
Given a graph $G$ and a set of $d$ landmarks $D$, we precompute the distances between each node in $V$ and each landmark
We perform a breadth-first search from every landmark, in $O(dm)$ overall
$d$ is small, e.g. $d \sim \log(n)$
Landmarks
Due to the triangle inequality we have:
$$\max_i |\ell_{s,u_i} - \ell_{t,u_i}| \le \ell_{s,t} \le \min_i (\ell_{s,u_i} + \ell_{t,u_i}) \quad (3)$$
In other words, with $L = \max_i |\ell_{s,u_i} - \ell_{t,u_i}|$ and $U = \min_i (\ell_{s,u_i} + \ell_{t,u_i})$, the true distance satisfies $\ell_{s,t} \in [L, U]$
Estimation is very fast: $O(d)$
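The two bounds can be sketched as follows, assuming an undirected connected graph stored as adjacency lists. The names `precompute` and `distance_bounds`, the 6-node cycle, and the choice of landmarks are hypothetical illustrations, not taken from the lecture or from Potamias et al.

```python
from collections import deque

def bfs_distances(adj, source):
    """Distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def precompute(adj, landmarks):
    """One BFS per landmark: landmark -> {node: distance}."""
    return {u: bfs_distances(adj, u) for u in landmarks}

def distance_bounds(table, s, t):
    """Lower and upper bound on the s-t distance; one pass over the landmarks."""
    lower = max(abs(d[s] - d[t]) for d in table.values())
    upper = min(d[s] + d[t] for d in table.values())
    return lower, upper

# Hypothetical example: a cycle on nodes 0..5 with landmarks 0 and 3
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
table = precompute(cycle, landmarks=[0, 3])

bounds_adjacent = distance_bounds(table, 1, 2)   # true distance is 1
bounds_via_zero = distance_bounds(table, 1, 5)   # true distance is 2
```

For the pair (1, 5) the upper bound equals the true distance because landmark 0 lies on a shortest path between them; for the pair (1, 2) no landmark does, and the interval is wider.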
Landmarks
Thus, if we have a "nice" set of landmarks $D$, the approximation is very quick
If we take the upper bound $U$ as our approximation then, following Observation 1, this approximation is exact if there is a landmark in $D$ that is on a shortest path from $s$ to $t$
If for all pairs of nodes from $V$ there exists at least one landmark in $D$ that lies on a shortest path from $s$ to $t$, then our approximation is exact for every pair
In such a case we say that the landmarks cover all pairs of nodes from $V$
Landmark selection problem
LANDMARKS-COVER
Given a graph $G = (V, E)$, select the minimum number of landmarks $D \subseteq V$ such that all pairs of nodes $(s, t) \in V \times V$ are covered.
Theorem
LANDMARKS-COVER is NP-hard.
Landmark selection problem
NODE-COVER
Given a graph $G = (V, E)$, we say $V' \subseteq V$ covers $E$ if every link has at least one endpoint in $V'$.
Figure: Node Cover (Source: Wikipedia)
Figure: Minimal Node Cover (Source: Wikipedia)
Landmark selection problem
Proof.
We reduce NODE-COVER to LANDMARKS-COVER by treating an instance of NODE-COVER as an instance of LANDMARKS-COVER on the same graph.
1 Consider a solution $D$ to LANDMARKS-COVER. $D$ covers all pairs of nodes and thus it also covers pairs at distance 1, which are connected by a single link. Therefore all links from $E$ are covered by $D$, and $D$ is a solution to NODE-COVER.
2 Consider a solution $V'$ to NODE-COVER. Every link on a shortest path $p_{s,t}$ from $s$ to $t$ has at least one endpoint in $V'$, so some node from $V'$ lies on $p_{s,t}$, and therefore $V'$ is also a solution to LANDMARKS-COVER.
Landmark selection strategies
We cannot select the landmarks optimally, so we have to select them using heuristics
The basic idea: select "central" nodes, which lie on many shortest paths
Baseline: random selection
Select nodes with many links, because the chance is higher that they are on many shortest paths
Estimate the average shortest-path length for each node and select the nodes with the smallest average
Average path estimation: randomly select a few nodes, perform BFS from those nodes, and calculate each node's average distance to them
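The three heuristics listed above might be sketched like this. Everything here is an illustrative assumption: the function names, the `samples` and `seed` parameters, and the star-shaped test graph; the code also assumes a connected graph with sortable node labels.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def random_landmarks(adj, d, seed=0):
    """Baseline: d nodes chosen uniformly at random."""
    return random.Random(seed).sample(sorted(adj), d)

def degree_landmarks(adj, d):
    """The d nodes with the most links."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:d]

def closeness_landmarks(adj, d, samples=3, seed=0):
    """The d nodes with the smallest estimated average distance.

    The average is estimated from BFS runs started at a few random seed nodes.
    """
    seeds = random.Random(seed).sample(sorted(adj), min(samples, len(adj)))
    tables = [bfs_distances(adj, s) for s in seeds]
    average = {v: sum(t[v] for t in tables) / len(tables) for v in adj}
    return sorted(adj, key=lambda v: average[v])[:d]

# Hypothetical example: a star with hub 0 and leaves 1..5
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
by_degree = degree_landmarks(star, 1)
by_closeness = closeness_landmarks(star, 1, samples=6)
by_chance = random_landmarks(star, 2)
```

On the star both informed heuristics pick the hub, which lies on every shortest path between two leaves.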
Experimental results
Datasets: Flickr-E $\sim 600K$ nodes, Flickr-I $\sim 800K$ nodes, DBLP $\sim 220K$
Figure: From "Fast Shortest Path Distance Estimation in Large Networks" by Potamias et al.
Node Structural Roles
Pivotal nodes
We say that a node $X$ is pivotal for a pair of distinct nodes $Y$ and $Z$ if $X$ lies on every shortest path between $Y$ and $Z$
$X$ is not equal to either $Y$ or $Z$
B is pivotal for (A,C) and (A,D)
However, it is not pivotal for (D,E)
Figure 2.13: In this example, node B is pivotal for two pairs: the pair consisting of A and C, and the pair consisting of A and D. On the other hand, node D is not pivotal for any pairs.
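One way to test the pivotal property, sketched under the assumption of an undirected graph stored as adjacency lists: a node lies on every shortest path between two others exactly when deleting it makes their distance strictly larger (or disconnects them). The function names and the five-node graph are hypothetical; the graph is not a transcription of Figure 2.13.

```python
from collections import deque

def distance(adj, s, t, removed=None):
    """BFS distance from s to t that ignores node `removed`; None if unreachable."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        node = queue.popleft()
        if node == t:
            return dist[node]
        for neighbor in adj[node]:
            if neighbor != removed and neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return None

def is_pivotal(adj, x, y, z):
    """True if x lies on every shortest path between the distinct nodes y and z."""
    if x in (y, z) or y == z:
        return False
    full = distance(adj, y, z)
    if full is None:               # y and z are not connected at all
        return False
    without_x = distance(adj, y, z, removed=x)
    return without_x is None or without_x > full

# Hypothetical graph with links A-B, B-C, B-D, C-E, D-E
g = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}
```

In this toy graph B is pivotal for (A, C) and (A, D), but not for (C, D), because C and D are also joined by an equally short path through E.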
(b) Give an example of a graph in which every node is pivotal for at least two different
pairs of nodes. Explain your answer.
(c) Give an example of a graph having at least four nodes in which there is a single
node X that is pivotal for every pair of nodes (not counting pairs that include
X). Explain your answer.
2. In the next set of questions, we consider a related cluster of definitions, which seek to
formalize the idea that certain nodes can play a "gatekeeping" role in a network. The
first definition is the following: we say that a node X is a gatekeeper if for some other
two nodes Y and Z, every path from Y to Z passes through X. For example, in the
graph in Figure 2.14, node A is a gatekeeper, since it lies for example on every path
from B to E. (It also lies on every path between other pairs of nodes, for example,
the pair D and E, as well as other pairs.)
This definition has a certain "global" flavor, since it requires that we think about paths
in the full graph in order to decide whether a particular node is a gatekeeper. A more
"local" version of this definition might involve only looking at the neighbors of a node.
Here's a way to make this precise: we say that a node X is a local gatekeeper if there
are two neighbors of X, say Y and Z, that are not connected by an edge. (That is,
for X to be a local gatekeeper, there should be two nodes Y and Z so that Y and Z
each have edges to X, but not to each other.) So for example, in Figure 2.14, node
A is a local gatekeeper as well as being a gatekeeper; node D, on the other hand, is a
local gatekeeper but not a gatekeeper. (Node D has neighbors B and C that are not
connected by an edge; however, every pair of nodes, including B and C, can be
connected by a path that does not go through D.)
So we have two new definitions: gatekeeper, and local gatekeeper. When faced with
Pivotal nodes
Pivotal nodes play an important role in connecting other nodes
Some nodes are more "important" than others
Can you think of an example of a graph in which every node is pivotal for at least one pair of nodes?
Can you think of an example of a graph in which every node is pivotal for at least two different pairs of nodes?
Gatekeepers
Similar to pivotal nodes is the idea that some nodes play a "gatekeeping" role in networks
We say that a node $X$ is a gatekeeper if, for some other distinct nodes $Y$ and $Z$, $X$ lies on every path between $Y$ and $Z$
$X$ is not equal to either $Y$ or $Z$
A is a gatekeeper because it lies on every path between B and E, or between D and E
Figure 2.14: Node A is a gatekeeper. Node D is a local gatekeeper but not a gatekeeper.
new mathematical definitions, a strategy that is often useful is to explore them first
through examples, and then to assess them at a more general level and try to relate
them to other ideas and definitions. Let's try this in the next few questions.
(a) Give an example (together with an explanation) of a graph in which more than
half of all nodes are gatekeepers.
(b) Give an example (together with an explanation) of a graph in which there are no
gatekeepers, but in which every node is a local gatekeeper.
3. When we think about a single aggregate measure to summarize the distances between
the nodes in a given graph, there are two natural quantities that come to mind. One is
the diameter, which we define to be the maximum distance between any pair of nodes
in the graph. Another is the average distance, which (as the term suggests) is the
average distance over all pairs of nodes in the graph.
In many graphs, these two quantities are close to each other in value. But there are
graphs where they can be very different.
(a) Describe an example of a graph where the diameter is more than three times as
large as the average distance.
(b) Describe how you could extend your construction to produce graphs in which the
diameter exceeds the average distance by as large a factor as you'd like. (That is,
for every number c, can you produce a graph in which the diameter is more than
c times as large as the average distance?)
Gatekeepers
The last definition has a "global" flavor
We have to consider paths in the full graph to decide whether a node is a gatekeeper
We can also think about a "local" version of a gatekeeper
We say that a node $X$ is a local gatekeeper if it has two distinct neighbors $Y$ and $Z$ that are not connected to each other by a link
$X$ is not equal to either $Y$ or $Z$
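Both definitions can be checked mechanically. The sketch below assumes an undirected graph stored as adjacency lists and only counts pairs that are connected in the full graph, so that "every path" is not satisfied vacuously. The function names and the five-node graph are hypothetical and do not reproduce Figure 2.14.

```python
from itertools import combinations

def reachable(adj, start, removed=None):
    """Set of nodes reachable from start when node `removed` is deleted."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adj[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen

def is_gatekeeper(adj, x):
    """Global test: deleting x separates some pair that was connected before."""
    for y in adj:
        if y == x:
            continue
        before = reachable(adj, y) - {x}
        after = reachable(adj, y, removed=x)
        if before - after:
            return True
    return False

def is_local_gatekeeper(adj, x):
    """Local test: x has two neighbors that are not linked to each other."""
    return any(z not in adj[y] for y, z in combinations(adj[x], 2))

# Hypothetical graph with links A-B, A-C, A-E, B-D, C-D
g = {
    "A": ["B", "C", "E"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
    "E": ["A"],
}
```

In this toy graph A is both a gatekeeper and a local gatekeeper, D is only a local gatekeeper, and E is neither. The global test needs graph traversals, while the local one inspects only the neighborhood of the node.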
Gatekeepers
Node A is also a local gatekeeper, e.g. B and E are neighbors of A but they are not connected to each other
Node D is a local gatekeeper for B and C but it is not a gatekeeper
Gatekeepers
Can you think of an example of a graph in which more than half of all nodes are gatekeepers?
Can you think of an example of a graph in which there are no gatekeepers but in which every node is a local gatekeeper?
Components
Connectivity
Given a graph, one important question is whether every node can reach every other node by a path
If that is the case, the graph is connected
ARPANET is a connected graph, as should always be the case with communication and transportation networks
Figure 2.3: An alternate drawing of the 13-node Internet graph from December 1970.
Paths. Although we've been discussing examples of graphs in many different areas, there
are clearly some common themes in the use of graphs across these areas. Perhaps foremost
among these is the idea that things often travel across the edges of a graph, moving from
node to node in sequence: this could be a passenger taking a sequence of airline flights, a
piece of information being passed from person to person in a social network, or a computer
user or piece of software visiting a sequence of Web pages by following links.
This idea motivates the definition of a path in a graph: a path is simply a sequence of
nodes with the property that each consecutive pair in the sequence is connected by an edge.
Sometimes it is also useful to think of the path as containing not just the nodes but also the
sequence of edges linking these nodes. For example, the sequence of nodes mit, bbn, rand,
ucla is a path in the Internet graph from Figures 2.2 and 2.3, as is the sequence case,
lincoln, mit, utah, sri, ucsb. As we have defined it here, a path can repeat nodes: for
example, sri, stan, ucla, sri, utah, mit is a path. But most paths we consider will not
do this; if we want to emphasize that the path we are discussing does not repeat nodes, we
can refer to it as a simple path.
Cycles. A particularly important kind of non-simple path is a cycle, which informally is a
"ring" structure such as the sequence of nodes linc, case, carn, harv, bbn, mit, linc
on the right-hand-side of Figure 2.3. More precisely, a cycle is a path with at least three
edges, in which the first and last nodes are the same, but otherwise all nodes are distinct.
There are many cycles in Figure 2.3: sri, stan, ucla, sri is as short an example as possible
according to our definition (since it has exactly three edges), while sri, stan, ucla, rand,
bbn, mit, utah, sri is a significantly longer example.
In fact, every edge in the 1970 Arpanet belongs to a cycle, and this was by design: it means
that if any edge were to fail (e.g. a construction crew accidentally cut through the cable),
there would still be a way to get from any node to any other node. More generally, cycles
Connectivity
But in a social network, for example, that is not always the case
Then we say that the graph is disconnected
Figure 2.6: The collaboration graph of the biological research center Structural Genomics of Pathogenic Protozoa (SGPP) [134], which consists of three distinct connected components. This graph was part of a comparative study of the collaboration patterns graphs of nine research centers supported by NIH's Protein Structure Initiative; SGPP was an intermediate case between centers whose collaboration graph was connected and those for which it was fragmented into many small components.
researchers, and there is an edge between two nodes if the researchers appear jointly on a
co-authored publication. (Thus the edges in this second figure represent a particular formal
definition of collaboration (joint authorship of a published paper) and do not attempt to
capture the network of more informal interactions that presumably take place at the research
center.)
Components. Figures 2.5 and 2.6 make visually apparent a basic fact about disconnected
graphs: if a graph is not connected, then it breaks apart naturally into a set of connected
"pieces," groups of nodes so that each group is connected when considered as a graph in
isolation, and so that no two groups overlap. In Figure 2.5, we see that the graph consists
of three such pieces: one consisting of nodes A and B, one consisting of nodes C, D, and E,
and one consisting of the rest of the nodes. The network in Figure 2.6 also consists of three
pieces: one on three nodes, one on four nodes, and one that is much larger.
To make this notion precise, we say that a connected component of a graph (often
shortened just to the term "component") is a subset of the nodes such that: (i) every node
in the subset has a path to every other; and (ii) the subset is not part of some larger set
with the property that every node can reach every other. Notice how both (i) and (ii)
Figure: Collaboration graph of a biological research center
Components
If a graph is disconnected then it breaks apart into a set of connected components
Component: a subset of nodes such that
1 every node in the subset has a path to every other node in that subset (internally connected)
2 the subset is not a part of some larger connected set (stands in isolation from the rest of the graph)
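The two conditions translate directly into a traversal: start from an unvisited node, collect everything reachable from it, and repeat. A minimal sketch, assuming an undirected graph stored as adjacency lists; the function name and the six-node example are hypothetical.

```python
def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        seen |= component          # everything reachable from start belongs together
        components.append(component)
    return components

# Hypothetical graph with links A-B, C-D, D-E and an isolated node F
g = {
    "A": ["B"], "B": ["A"],
    "C": ["D"], "D": ["C", "E"], "E": ["D"],
    "F": [],
}
components = connected_components(g)
largest_fraction = max(len(c) for c in components) / len(g)
```

The graph is connected exactly when a single component is returned; `largest_fraction` gives the share of nodes in the largest component.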
Components
Components are a first, global way of describing the structure of a network
Within a given component there might be a richer structure
The large component: a prominent node at the center and tightly linked groups at the periphery
This large component would break apart without the central node
Giant components
Consider the global friendship network, i.e. a social network of the entire world
Is this network connected?
Probably not, since a single person without friends constitutes a one-node component
"Remote tropical island"
Giant components
But people (you) have friends in other countries
You are in the same component as those friends
As well as their friends, their parents, their parents' friends, their descendants, and so on
You are in the same component as people you have never heard of, with totally different experiences, etc.
This component seems likely to contain a significant fraction of the world's population, and this is in fact true!
We call such a component a giant component
Giant components
It is an informal definition: a component that contains a significant fraction of the nodes
Typically, when a network contains a giant component, it almost always contains only one
Why?
Suppose we have two giant components with, e.g., 1 billion people in each
It takes only a single link from a node in the first component to a node in the second to connect those two components
Practically, such a link always exists
Node Degrees
Degree
Degree: the degree of a node is the number of links connected to it
Measures how "important" a node is
We denote the degree of node $i$ by $k_i$
Every link has two ends, hence there are $2m$ link ends in an undirected network
The number of link ends is equal to the sum of the degrees of all the nodes
$$2m = \sum_{i=1}^{n} k_i$$
Average degree
$$k = \frac{1}{n} \sum_{i=1}^{n} k_i = \frac{2m}{n}$$
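A quick numerical check of both identities. The link list below is meant to mirror the simple undirected example graph from the Basic Definitions slides; the variable names are illustrative.

```python
# Undirected example: 5 nodes, 6 links
nodes = [1, 2, 3, 4, 5]
links = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5)]

# Each link contributes one link end to each of its two endpoints
k = {v: 0 for v in nodes}
for i, j in links:
    k[i] += 1
    k[j] += 1

n = len(nodes)
m = len(links)
average_degree = sum(k.values()) / n
```

The degrees sum to 12, which is twice the number of links, and the average degree is 12/5.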
Degree in directed networks
In directed networks we have in-degree and out-degree
In-degree: $k_i^{in}$ is the number of ingoing links
Out-degree: $k_i^{out}$ is the number of outgoing links
The number of links is equal to the sum of in-degrees, and is also equal to the sum of out-degrees
$$m = \sum_{i=1}^{n} k_i^{in} = \sum_{i=1}^{n} k_i^{out}$$
Average in-degree and average out-degree
$$k^{in} = \frac{1}{n} \sum_{i=1}^{n} k_i^{in} = \frac{1}{n} \sum_{i=1}^{n} k_i^{out} = k^{out}$$
$$k = k^{in} = k^{out} = \frac{m}{n}$$
Denis Helic (ISDS, TU Graz) Networks March 28, 2020 63 / 111
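The same check works for the simple directed example graph from the Basic Definitions slides. The sketch below assumes each pair `(a, b)` denotes a link from `a` to `b`; the variable names are illustrative.

```python
from collections import Counter

# Directed example graph from the "Basic Definitions" section; (a, b) is a link a -> b.
links = [(1, 3), (2, 1), (2, 3), (2, 5), (3, 2), (4, 1), (4, 5), (5, 2), (5, 3)]
n = 5
m = len(links)

out_degree = Counter(a for a, _ in links)
in_degree = Counter(b for _, b in links)

# Every link leaves exactly one node and enters exactly one node.
assert sum(in_degree.values()) == m == sum(out_degree.values())
print(m / n)  # average in-degree = average out-degree
```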
Node Degrees
Degree distribution
Interesting statistic: the degree distribution
How many nodes have a given degree
Typically, we normalize by the total number of nodes to obtain probabilities
We can visualize it with a histogram
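A minimal sketch of the normalization step, assuming the degrees are already available as a list. The function name `degree_distribution` is hypothetical, and the input is the degree sequence of the undirected example graph from the Basic Definitions slides.

```python
from collections import Counter

def degree_distribution(degrees):
    """Map each degree k to the fraction p(k) of nodes that have it."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: counts[k] / n for k in sorted(counts)}

# Degrees of nodes 1..5 in the undirected example graph from "Basic Definitions".
p = degree_distribution([2, 3, 3, 2, 2])
print(p)
```

The resulting dictionary can be handed to any plotting tool as bar positions and bar heights to draw the histogram.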
Node Degrees
Degree distribution
Figure: Degree distribution histogram, $p(k)$ versus degree $k$; average degree $k = 6.597403$
Node Degrees
IPython notebook
IPython Notebook example: http://kti.tugraz.at/staff/socialcomputing/courses/webscience/websci1.zip
Command line: ipython notebook --pylab=inline websci1.ipynb
Clustering
Clustering coefficient
A special kind of path is a triangle (closed triad)
Three nodes all connected to each other
Closely related to local gatekeepers
The local clustering coefficient of a node measures how transitive the connections around it are
I.e. a friend of a friend is also a friend
Clustering coefficient: the fraction of pairs of neighbors of node $i$ that are themselves connected (we denote it by $C_i$, $\Gamma(i)$ is the set of neighbors of $i$, $E$ is the set of all links)

$$C_i = \frac{2\,|\{e_{jk}\}|}{k_i (k_i - 1)}, \quad j, k \in \Gamma(i),\ e_{jk} \in E$$
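The definition can be sketched in a few lines of Python and applied to the undirected example graph from the Basic Definitions slides. The function name `local_clustering` and the choice to return 0 for nodes with fewer than two neighbors are assumptions made here, not conventions stated in the lecture.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = 2 * (links among neighbors of i) / (k_i * (k_i - 1))."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here for nodes with fewer than two neighbors
    linked = sum(1 for j, l in combinations(neighbors, 2) if l in adj[j])
    return 2 * linked / (k * (k - 1))

# Undirected example graph from "Basic Definitions".
edges = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5)]
adj = {v: set() for v in range(1, 6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

C = {i: local_clustering(adj, i) for i in adj}
print(C)
```

In this graph node 4 has two neighbors (2 and 3) that are linked to each other, so its coefficient is 1, while the neighbors of node 1 (2 and 5) are not linked, so its coefficient is 0.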
Clustering
Clustering coefficient
Example (figure): $C_{grey} = 1$
Clustering
Clustering coefficient
Example (figure): $C_{grey} = \frac{1}{3}$
Clustering
Clustering coefficient
Example (figure): $C_{grey} = 0$
What can you say about the clustering coefficient of a local gatekeeper?
It is less than 1
Clustering
Clustering coefficient
Some statistics: average clustering coefficient

$$C = \frac{1}{n} \sum_{i} C_i$$

Clustering coefficient distribution
How many nodes have a clustering coefficient in a certain range
We can visualize it with a histogram
Clustering
Clustering coefficient distribution
Figure: Histogram of the clustering coefficient distribution, frequency $f(c)$ versus clustering coefficient $c$; average $C = 0.573137$
Les Miserables: Network Statistics Example
Les Miserables
Figure: Random graph with $n = 77$, $m = 216$
Les Miserables: Network Statistics Example
Distribution of distances
Figure: Distribution of distances, $p(\ell)$ versus distance $\ell$; average distance $\ell = 2.572796$, maximum distance $\ell_{max} = 5$
Les Miserables: Network Statistics Example
Degree distribution
Figure: Degree distribution histogram, $p(k)$ versus degree $k$; average degree $k = 5.610390$
Les Miserables: Network Statistics Example
Clustering coefficient distribution
Figure: Histogram of the clustering coefficient distribution, frequency $f(c)$ versus clustering coefficient $c$; average $C = 0.070041$
Random Graph
Modeling networks
Observe: e.g. collect data by crawling
Measure: e.g. how many nodes, how many links
Quantify: e.g. distances, degrees, clustering coefficient
Make a model: e.g. how networks are created
Predict with the model and validate: e.g. implement and evaluate (compare with the real networks)
Apply: engineering approach to implementing the model in software
Random Graph
Random graphs
Random graph: a model network where some properties take fixed values and other properties are random
The simplest model: we fix $n$ and $m$ but place links at random
Iterate over links, for each link select a random pair of nodes
Create a simple graph, i.e. no self-links and no multi-links are allowed
We will call this model $G(n, m)$
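One way to sketch this model in Python is to enumerate all node pairs and draw $m$ of them without replacement. This is an implementation choice assumed here (it guarantees a simple graph directly), not necessarily the procedure used in the course notebook; the function name `sample_gnm` and the node labels `0..n-1` are illustrative. The values $n = 77$, $m = 216$ are those of the random graph in the Les Miserables example.

```python
import random
from itertools import combinations

def sample_gnm(n, m, seed=None):
    """Draw a simple graph with n nodes and m links uniformly at random."""
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n), 2))  # no self-links by construction
    return rng.sample(all_pairs, m)              # no repeated pairs, so no multi-links

links = sample_gnm(77, 216, seed=1)
print(len(links), 2 * len(links) / 77)  # number of links and average degree
```

Enumerating all pairs costs memory proportional to $n^2$, so for large sparse networks a rejection-based approach (draw a pair, skip it if already present) would be the more practical design.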
Random Graph
$G(n, m)$ model
Equivalent definition: we choose a graph uniformly at random among all graphs with $n$ nodes and $m$ links
Mathematically, this is a proper definition
A random graph defines an ensemble of networks, i.e. a probability distribution $P(G)$ over possible networks:
$P(G) = \frac{1}{\Omega}$ if a network has $n$ nodes and $m$ links and 0 otherwise
$\Omega$ is the total number of such networks

$$\Omega = \binom{\binom{n}{2}}{m}$$
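The ensemble size is easy to evaluate for small cases. The sketch below uses Python's `math.comb`; the function name `num_gnm_graphs` is hypothetical, and the values $n = 5$, $m = 6$ are those of the undirected example graph from the Basic Definitions slides.

```python
from math import comb

def num_gnm_graphs(n, m):
    """Number of simple graphs with n labeled nodes and m links."""
    return comb(comb(n, 2), m)

omega = num_gnm_graphs(5, 6)
print(omega, 1 / omega)  # ensemble size and the probability P(G) of each member
```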
Random Graph
Random graphs
Properties of random graphs: mean values over the ensemble (typical behavior)
Calculated as expectations of a random variable with respect to the probability distribution

Definition
The expectation of a random property $X(G)$ of a random graph ensemble $G$ is

$$E[X] = \sum_{G} P(G) X(G)$$

when this sum is "well-defined", otherwise the expectation does not exist.
Random Graph
$G(n, m)$ vs. $G(n, p)$ model
Some mean values are easy to calculate, e.g. the number of links is exactly $m$
Average degree $k = \frac{2m}{n}$
Other properties are more difficult to calculate
A better approach is to fix $n$ and $p$, where $p$ is the probability of a link between any pair of nodes
This model is $G(n, p)$
Random Graph
$G(n, p)$ model
Technical definition is again in terms of an ensemble, i.e. a probability distribution over all possible networks
What is the total number of possible simple graphs?

$$\Omega = 2^{\binom{n}{2}}$$

But $P(G)$ is not uniform anymore, i.e. in general $P(G) \neq \frac{1}{\Omega}$
Some graphs are more probable than other graphs in the case of $G(n, p)$
What is the probability of a graph that has exactly $m$ links?

$$P(G) = p^m (1 - p)^{\binom{n}{2} - m}$$

Other names for $G(n, p)$: Erdős–Rényi, Bernoulli, Poisson
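A minimal sketch of the $G(n, p)$ model, assuming one independent coin flip per node pair. The function names `sample_gnp` and `graph_probability` and the node labels `0..n-1` are illustrative choices, not names from the lecture.

```python
import random
from itertools import combinations

def sample_gnp(n, p, seed=None):
    """Include each of the n*(n-1)/2 possible links independently with probability p."""
    rng = random.Random(seed)
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]

def graph_probability(n, m, p):
    """P(G) for one particular graph with n nodes and m links."""
    pairs = n * (n - 1) // 2
    return p**m * (1 - p) ** (pairs - m)

links = sample_gnp(21, 0.1, seed=1)
print(len(links), graph_probability(21, len(links), 0.1))
```

For $p = 0.5$ the probability no longer depends on $m$ and every graph is equally likely, which recovers the uniform value $1/\Omega$.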
Random Graph
Mean number of links
Probability of a particular simple graph $G$ with $m$ links:

$$P(G) = p^m (1 - p)^{\binom{n}{2} - m}$$

The number of graphs with $n$ nodes and $m$ links: $\binom{\binom{n}{2}}{m}$
The total probability of drawing a graph with $m$ links from the ensemble

$$P(m) = \binom{\binom{n}{2}}{m} p^m (1 - p)^{\binom{n}{2} - m} \tag{4}$$

This is a binomial distribution
The expected (mean) number of links:

$$E[m] = \sum_{m=0}^{\binom{n}{2}} m P(m) \tag{5}$$
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
...
.
Random Graph
Linearity of expectation
Theorem
Suppose $X$ and $Y$ are discrete r.v. such that $E[X] < \infty$ and $E[Y] < \infty$. Then,

$$E[aX] = aE[X], \quad \forall a \in \mathbb{R}$$

$$E[X + Y] = E[X] + E[Y]$$

Proof left as an exercise ;)
Random Graph
Bernoulli random variable
PMF

$$p(x) = \begin{cases} 1 - p & \text{if } x = 0 \\ p & \text{if } x = 1 \end{cases}$$

Bernoulli r.v. with parameter $p$
Models situations with two outcomes
E.g. coin flip
Random Graph
Binomial random variable
Suppose $X_1, \dots, X_n$ are independent and identical Bernoulli r.v.
The Binomial r.v. with parameters $(n, p)$ is

$$X = X_1 + \dots + X_n$$

Models the number of successes in $n$ Bernoulli trials
E.g. the number of heads in $n$ coin flips

PMF

$$p(k) = \binom{n}{k} (1 - p)^{n-k} p^k$$
Random Graph
Expectation: Bernoulli r.v.
PMF

$$p(x) = \begin{cases} 1 - p & \text{if } x = 0 \\ p & \text{if } x = 1 \end{cases}$$

$$E[X] = (1 - p) \cdot 0 + p \cdot 1 = p$$
Random Graph
Expectation: Binomial r.v.
Binomial is the sum of $X_1, \dots, X_n$, independent and identical Bernoulli r.v.

$$X = X_1 + \dots + X_n$$

$$E[X] = E[X_1 + \dots + X_n] = E[X_1] + \dots + E[X_n] = np$$
Random Graph
Mean number of links and mean degree

$$E[m] = \sum_{m=0}^{\binom{n}{2}} m P(m) = \binom{n}{2} p = \frac{n(n-1)}{2} p \tag{6}$$

$$k = E[k] = \frac{2E[m]}{n} = \frac{2}{n} \frac{n(n-1)}{2} p = (n - 1)p \tag{7}$$
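As a numerical sanity check, the sum over $m$ can be evaluated directly and compared with the closed form. This is a minimal sketch; the function name `expected_links` and the values $n = 21$, $p = 0.1$ are chosen here for illustration.

```python
from math import comb

def expected_links(n, p):
    """E[m] computed directly as the sum of m * P(m) over the binomial P(m)."""
    pairs = comb(n, 2)
    return sum(m * comb(pairs, m) * p**m * (1 - p) ** (pairs - m)
               for m in range(pairs + 1))

n, p = 21, 0.1
mean_links = expected_links(n, p)
print(mean_links, comb(n, 2) * p)        # direct sum vs. closed form for E[m]
print(2 * mean_links / n, (n - 1) * p)   # mean degree vs. (n - 1) * p
```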
Random Graph
Degree distribution
A given node is connected with independent probability $p$ to each of the $(n - 1)$ other nodes
Probability of being connected to a particular set of $k$ other nodes and to no others: $p^k (1 - p)^{n-1-k}$
There are $\binom{n-1}{k}$ ways of selecting $k$ nodes from $n - 1$ nodes
The total probability of having degree $k$:

$$p(k) = \binom{n-1}{k} p^k (1 - p)^{n-1-k} \tag{8}$$

This is again a binomial distribution
Thus, $G(n, p)$ has a binomial degree distribution
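The binomial degree distribution can be tabulated directly. The sketch below checks that the probabilities sum to 1 and that the mean equals $(n - 1)p$; the function name `degree_pmf` is hypothetical, and $n = 21$ with $p = 0.10$ and $p = 0.40$ are used as sample parameters.

```python
from math import comb

def degree_pmf(n, p, k):
    """Probability that a node of G(n, p) has degree k (binomial with n - 1 trials)."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

n = 21
for p in (0.10, 0.40):
    pmf = [degree_pmf(n, p, k) for k in range(n)]
    mean = sum(k * q for k, q in enumerate(pmf))
    print(p, round(sum(pmf), 6), round(mean, 6))  # total probability and mean degree
```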
Random Graph
Degree distribution (Binomial)
Figure: Degree distribution $p(k)$ of a random graph ($n = 21$) for differing $p$ values, $p = 0.10$ and $p = 0.40$
Random Graph
Degree distribution
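In many cases $n$ is large, $p$ is small and the average degree, i.e. $(n - 1)p$, is constant
Let us introduce $\lambda = (n - 1)p$

$$p(k) = \frac{(n-1)(n-2)\dots(n-k)}{k!} \frac{\lambda^k}{(n-1)^k} \left(1 - \frac{\lambda}{n-1}\right)^{n-1-k}$$

$$= \frac{(n-1)(n-2)\dots(n-k)}{(n-1)^k} \frac{\lambda^k}{k!} \frac{\left(1 - \frac{\lambda}{n-1}\right)^{n-1}}{\left(1 - \frac{\lambda}{n-1}\right)^{k}}$$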
Random Graph
Degree distribution
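What is $\lim_{n \to \infty} p(k)$?

$$\lim_{n \to \infty} \frac{(n-1)(n-2)\dots(n-k)}{(n-1)^k} = 1$$

$$\lim_{n \to \infty} \left(1 - \frac{\lambda}{n-1}\right)^{k} = 1$$

$$\lim_{n \to \infty} \left(1 - \frac{\lambda}{n-1}\right)^{n-1} = \ ?$$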
Random Graph
Degree distribution
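The definition of $e$: $e = \lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^{x}$
Let us substitute: $n - 1 = -x\lambda$ (so $|x| \to \infty$ as $n \to \infty$)

$$\lim_{n \to \infty} \left(1 - \frac{\lambda}{n-1}\right)^{n-1} = \lim_{n \to \infty} \left(1 + \frac{1}{x}\right)^{-x\lambda} = \lim_{n \to \infty} \left(\left(1 + \frac{1}{x}\right)^{x}\right)^{-\lambda} = e^{-\lambda} \tag{9}$$

The limit defining $e$ also holds for $x \to -\infty$, which is the case here since $\lambda > 0$
Put it all together: $p(k) = \frac{\lambda^k}{k!} e^{-\lambda}$
Poisson distribution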
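The convergence can be observed numerically by keeping $\lambda$ fixed and letting $n$ grow. This is a minimal sketch; the function names and the chosen values of $n$ and $\lambda$ are illustrative assumptions.

```python
from math import comb, exp, factorial

def binomial_pmf(n, lam, k):
    """Degree distribution of G(n, p) with p = lam / (n - 1)."""
    p = lam / (n - 1)
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

def poisson_pmf(lam, k):
    """Poisson probability of degree k with mean lam."""
    return lam**k / factorial(k) * exp(-lam)

lam = 2.0
for n in (21, 1001, 100001):
    gap = max(abs(binomial_pmf(n, lam, k) - poisson_pmf(lam, k)) for k in range(21))
    print(n, gap)  # the largest pointwise gap shrinks as n grows
```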
Random Graph
Degree distribution (Poisson)
Figure: Poisson degree distribution $p(k)$ of a random graph ($n = 21$) for differing $\lambda$ values, $\lambda = 2.0$ and $\lambda = 8.0$
Random Graph
Degree distribution (Poisson approximation)
Figure: Degree distribution $p(k)$ of a random graph ($n = 21$); Poisson approximation ($\lambda = 8.0$) of the binomial ($p = 0.40$)
Random Graph
Clustering coefficient
Probability that two neighbors of a node are themselves connected
What is this probability?
Links are placed independently, so it is simply the link probability:

$$C = p$$
Random Graph
Giant component in $G(n, p)$
What is the size of the largest component in $G(n, p)$?
How does it relate to $n$ and $p$?
Is there a giant component (GC)?
How many nodes does it occupy?
Random Graph
Giant component in $G(n, p)$
Two special cases
When $p = 0$ there are no links in the graph and we have $n$ components of size 1
The size of the largest component is constant and does not depend on $n$
When $p = 1$ there are links between all pairs (complete graph) and the largest component is of size $n$
There is a GC and its size depends on $n$, i.e. it is exactly $n$
Random Graph
Giant component in $G(n, p)$
Two qualitatively different states (phases)
What needs to happen with the random graph when we start with $p = 0$ and slowly increase $p$ until we reach $p = 1$?
With increasing $p$ the largest component will become bigger until it turns into a GC
Until it reaches the size of $n$
Transition between two extremes, a.k.a. phase transition
There will be a critical value for $p$ at which the phase transition occurs
Random Graph
Giant component in $G(n, p)$
In the second case the size of the GC is in proportion to $n$, i.e. it is exactly $n$
That will be our definition of a GC: a component whose size grows in proportion to $n$
With this definition we can calculate the size of the GC in $G(n, p)$
With $u$ we denote the fraction of nodes that do not belong to the GC
The size of the GC (as a fraction of $n$): $S = 1 - u$
$u$ is the probability that a randomly chosen node does not belong to the GC
Random Graph
Giant component in $G(n, p)$
Probability of $i \in V$ and $i \notin GC$ is $u$
For the above to hold, one of the following must be true $\forall j \in V$, $j \neq i$:
(1) $(i, j) \notin E$, or
(2) $(i, j) \in E$ and $j \notin GC$
Random Graph
Giant component in $G(n, p)$
Probability of (1): $1 - p$
Probability of (2): $pu$
Total prob. of $i$ not connected to GC via $j$: $1 - p + pu$
Total prob. of $i$ not connected to GC via any other node: $(1 - p + pu)^{n-1}$

$$u = (1 - p + pu)^{n-1} = (1 - p(1 - u))^{n-1} \tag{10}$$
Random Graph
Giant component in $G(n, p)$
With $k = p(n - 1)$:

$$u = \left(1 - \frac{k}{n-1}(1 - u)\right)^{n-1}$$

$$\ln(u) = (n - 1) \ln\left(1 - \frac{k}{n-1}(1 - u)\right)$$

What happens in the limit of large network size, i.e. when $n \to \infty$?
$\frac{k}{n-1}(1 - u) \to 0$
Taylor expansion of $\ln$ about 1: $\ln(1 - x) \approx -x$ for small $x$
Random Graph
Giant component in $G(n, p)$

$$\ln(u) \approx -(n - 1) \frac{k}{n-1} (1 - u)$$

$$\ln(u) \approx -k(1 - u)$$

$$u \approx e^{-k(1 - u)}$$

With $S = 1 - u$:

$$1 - S = e^{-kS}$$

$$S = 1 - e^{-kS}$$
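The equation $S = 1 - e^{-kS}$ can also be solved numerically. The sketch below uses plain fixed-point iteration, which is one possible approach assumed here; the function name `giant_component_size`, the tolerance and the iteration cap are illustrative. Starting from $S = 1$ avoids getting stuck at the trivial solution $S = 0$, which always exists.

```python
from math import exp

def giant_component_size(k, tol=1e-12, max_iter=100_000):
    """Solve S = 1 - exp(-k * S) by fixed-point iteration, starting from S = 1."""
    s = 1.0
    for _ in range(max_iter):
        s_next = 1.0 - exp(-k * s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

for k in (0.5, 2.0):
    print(k, giant_component_size(k))
```

For $k = 0.5$ the iteration collapses towards 0, while for $k = 2.0$ it settles at a positive fraction of roughly 0.8 of the nodes.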
Random Graph
Giant component in $G(n, p)$
Expression for the size of the GC in the limit of large network size
No closed-form solution, but we can solve it graphically
Figure: Giant component in a random graph. Curves $y = S$ and $y = 1 - e^{-kS}$ for $k = 2.0$, $k = 0.5$ and $k = ?$, plotted against $S$
Random Graph
Giant component in $G(n, p)$
Phase transition occurs when the gradients of both sides are equal at $S = 0$
Take derivatives of both sides and substitute $S = 0$:

$$1 = k e^{-kS}$$

$$k = 1$$

Recollect $k = p(n - 1)$
For $k < 1$ no GC
For $k > 1$ GC
For $k = 1$ phase transition
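The threshold can be observed in a small simulation. This sketch samples $G(n, p)$ with $p = k/(n - 1)$ and tracks components with a union-find structure, which is an implementation choice made here; the function name `largest_component_fraction`, the network size 1000 and the seed are assumptions for illustration.

```python
import random
from itertools import combinations

def largest_component_fraction(n, k, seed=None):
    """Sample G(n, p) with p = k / (n - 1); return the largest component's share of nodes."""
    rng = random.Random(seed)
    p = k / (n - 1)
    parent = list(range(n))  # union-find forest over the nodes

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in combinations(range(n), 2):
        if rng.random() < p:
            parent[find(a)] = find(b)  # merge the two components

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

for k in (0.5, 1.0, 2.0):
    print(k, largest_component_fraction(1000, k, seed=1))
```

Below the threshold the largest component should cover only a small share of the nodes, while above it the share should be close to the solution of $S = 1 - e^{-kS}$; individual runs fluctuate because the network is finite.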
Random Graph
Giant components demo
NetLogo example: http://www.netlogoweb.org/launch#http://www.netlogoweb.org/assets/modelslib/SampleModels/Networks/GiantComponent.nlogo
Random Graph
Diameter of $G(n, p)$
Average degree: $k$
Starting at a random node $i$
At distance 1 we have $k$ other nodes
At distance 2 we have $k \cdot k$ other nodes
At distance $d$ we have $k^d$ other nodes
We can repeat this until $k^d \approx n$
Or equivalently $d \approx \frac{\ln(n)}{\ln(k)}$
Diameter grows as the logarithm of the number of nodes
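A short sketch of this estimate, assuming an average degree greater than 1. The function name `diameter_estimate` is hypothetical; the first call uses the number of nodes and average degree of the random graph from the Les Miserables example.

```python
from math import log

def diameter_estimate(n, k):
    """Rough estimate d ~ ln(n) / ln(k); assumes average degree k > 1."""
    return log(n) / log(k)

# Number of nodes and average degree of the random graph in the Les Miserables example.
print(diameter_estimate(77, 5.610390))
for n in (10**3, 10**6, 10**9):
    print(n, diameter_estimate(n, 10))  # multiplying n by 1000 adds a constant
```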