Graph Theory and Network Measurement

Page 1: Graph Theory and Network Measurement

Graph Theory and Network Measurement

Social and Economic Networks

Jafar Habibi

MohammadAmin Fazli

Page 2: Graph Theory and Network Measurement

ToC

• Network Representation

• Basic Graph Theory Definitions

• (SE) Network Statistics and Characteristics

• Some Graph Theory

• Readings:

  • Chapter 2 from the Jackson book

  • Chapter 2 from the Kleinberg book

Page 3: Graph Theory and Network Measurement

Network Representation

• $N = \{1, 2, \ldots, n\}$ is the set of nodes (vertices)

• A graph (N,g) is represented by an $n \times n$ matrix $g = [g_{ij}]$, where $g_{ij}$ represents a link (relation, edge) between node i and node j

• Weighted network: $g_{ij} \in \mathbb{R}$

• Unweighted network: $g_{ij} \in \{0, 1\}$

• Undirected network: $g_{ij} = g_{ji}$

Page 4: Graph Theory and Network Measurement

Network Representation

• Edge list representation: $g = \{12, 23\}$

• Edge addition and deletion: $g + ij$, $g - ij$

• Network isomorphism between (N, g) and (N’, g’): there exists a bijection $f: N \to N'$ such that $g_{ij} = g'_{f(i)f(j)}$

• (N’,g’) is a subnetwork of (N,g) if $N' \subseteq N$, $g' \subseteq g$

• Induced (restricted) graphs: $[g|_S]_{ij} = g_{ij}$ if $i \in S$ and $j \in S$, and 0 otherwise
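
The representations above can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected, unweighted network without self-loops; the helper names (`link`, `adjacency_matrix`, `add_link`, `delete_link`, `restrict`) are hypothetical and not part of the slides.

```python
# Nodes are labeled 1..n as on the slides; a link ij is stored as frozenset({i, j}).
def link(i, j):
    return frozenset((i, j))

def adjacency_matrix(n, g):
    """Return the n x n 0/1 matrix [g_ij] of an undirected, unweighted network."""
    m = [[0] * n for _ in range(n)]
    for e in g:
        i, j = tuple(e)          # assumes no self-loops, so each link has two endpoints
        m[i - 1][j - 1] = 1
        m[j - 1][i - 1] = 1
    return m

def add_link(g, i, j):           # g + ij
    return g | {link(i, j)}

def delete_link(g, i, j):        # g - ij
    return g - {link(i, j)}

def restrict(g, s):              # g|_S: keep only links with both endpoints in S
    return {e for e in g if e <= set(s)}

g = {link(1, 2), link(2, 3)}     # the edge list g = {12, 23}
matrix = adjacency_matrix(3, g)
```

Storing links as unordered pairs makes the symmetry $g_{ij} = g_{ji}$ automatic; a directed network would need ordered pairs instead.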

Page 5: Graph Theory and Network Measurement

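Paths and Cycles

• A Walk is a sequence of edges connecting a sequence of nodes: $W = i_1 i_2, i_2 i_3, i_3 i_4, \ldots, i_{k-1} i_k$ with $\forall p: i_p i_{p+1} \in g$

• A Path is a walk in which no node repeats

• A Cycle is a walk which starts and ends at the same node ($i_k = i_1$) and in which no other node repeats

• The number of walks between two nodes: the number of walks of length m between nodes i and j is the ij-th entry of $g^m$, the m-th power of the adjacency matrix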
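
Walk counting can be illustrated with the standard fact that entry (i, j) of the m-th power of the adjacency matrix equals the number of walks of length m from i to j. The sketch below uses plain nested lists; the function names are illustrative assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walk_counts(g, m):
    """Return g^m for m >= 1; entry [i][j] counts walks of length m from i to j."""
    result = g
    for _ in range(m - 1):
        result = mat_mul(result, g)
    return result

# Path network 1 - 2 - 3 (rows and columns are 0-indexed)
g = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
w2 = walk_counts(g, 2)
```

Here `w2[1][1]` is 2 because the middle node has two walks of length 2 back to itself (via either end node).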

Page 6: Graph Theory and Network Measurement

Components & Connectedness

• (N,g) is connected if every two nodes in g are connected by some path.

• A component of a network (N,g) is a non-empty subnetwork (N’,g’) such that

  • (N’,g’) is connected

  • If $i \in N'$ and $ij \in g$ then $j \in N'$ and $ij \in g'$

• Strong connectivity and strongly connected components are defined for directed graphs.

• C(N,g) = C(g) = set of g’s connected components

• The link ij is a bridge iff g-ij has more components than g

• A giant component is a component which contains a significant fraction of nodes.

  • There is usually at most one giant component
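
As a sketch of these definitions, components can be found by a graph search and the bridge test applied literally by comparing component counts with and without a link. The function names and the example network are assumptions made for illustration; faster bridge-finding algorithms exist.

```python
def components(nodes, g):
    """Return the connected components of (N, g) as a list of node sets."""
    neighbors = {i: set() for i in nodes}
    for i, j in g:
        neighbors[i].add(j)
        neighbors[j].add(i)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i] - comp:
                comp.add(j)
                stack.append(j)
        seen |= comp
        comps.append(comp)
    return comps

def is_bridge(nodes, g, link):
    """ij is a bridge iff g - ij has more components than g."""
    without = [e for e in g if e != link]
    return len(components(nodes, without)) > len(components(nodes, g))

nodes = [1, 2, 3, 4, 5]
g = [(1, 2), (2, 3), (1, 3), (3, 4)]   # triangle 1-2-3, pendant node 4, isolated node 5
comps = components(nodes, g)
```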

Page 7: Graph Theory and Network Measurement

Special Kinds of Graphs

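• Star: one center node is linked to every other node, and there are no other links

• Complete Graph: every pair of nodes is linked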

Page 8: Graph Theory and Network Measurement

Special Kinds of Graphs

• Tree: a connected network with no cycle

  • A connected network is a tree iff it has n-1 links

• A tree has at least two leaves

• In a tree, there is a unique path between any pair of nodes

• Forest: a union of trees

• Cycle: a connected graph with n edges in which the degree of every node is 2.
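
The tree characterization (connected with exactly n-1 links) translates directly into a check. This is a minimal sketch assuming nodes are labeled 0..n-1 with n >= 1 and that the link list has no duplicates; `is_tree` and `leaves` are hypothetical helper names.

```python
def is_tree(n, links):
    """Connected and exactly n - 1 links (nodes are 0..n-1, n >= 1)."""
    if len(links) != n - 1:
        return False
    neighbors = {i: set() for i in range(n)}
    for i, j in links:
        neighbors[i].add(j)
        neighbors[j].add(i)
    reached, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in neighbors[i] - reached:
            reached.add(j)
            stack.append(j)
    return len(reached) == n

def leaves(n, links):
    """Nodes of degree 1."""
    degree = [0] * n
    for i, j in links:
        degree[i] += 1
        degree[j] += 1
    return [i for i in range(n) if degree[i] == 1]

star = [(0, 1), (0, 2), (0, 3)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle_plus_isolated = [(0, 1), (1, 2), (0, 2)]   # n - 1 links on 4 nodes, not connected
```

The last example shows why connectedness must be checked as well as the link count.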

Page 9: Graph Theory and Network Measurement

Neighborhood

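• $N_i(g) = \{ j : g_{ij} = 1 \}$

• $N_i^2(g) = N_i(g) \cup \left( \bigcup_{j \in N_i(g)} N_j(g) \right)$

• $N_i^k(g) = N_i(g) \cup \left( \bigcup_{j \in N_i(g)} N_j^{k-1}(g) \right)$

• $N_S^k(g) = \bigcup_{i \in S} N_i^k(g)$

• Degree: $d_i(g) = \# N_i(g)$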

• For directed graphs, out-degree and in-degree are defined
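
The recursive definition of the k-neighborhood can be written almost verbatim in code. The sketch below assumes a 0/1 adjacency matrix stored as nested lists; it follows the recursion literally, so it is meant for small examples (a breadth-first search would be the efficient choice).

```python
def neighborhood(g, i):
    """N_i(g): nodes j with g[i][j] == 1."""
    return {j for j, x in enumerate(g[i]) if x == 1}

def k_neighborhood(g, i, k):
    """N_i^k(g) = N_i(g) united with N_j^(k-1)(g) over all j in N_i(g)."""
    result = neighborhood(g, i)
    if k <= 1:
        return result
    for j in neighborhood(g, i):
        result |= k_neighborhood(g, j, k - 1)
    return result

def degree(g, i):
    """d_i(g) = #N_i(g)."""
    return len(neighborhood(g, i))

# Path 0 - 1 - 2 - 3
g = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
```

Note that under this definition node i itself belongs to its 2-neighborhood whenever it has a neighbor, since it is a neighbor of its neighbors.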

Page 10: Graph Theory and Network Measurement

Degree Distribution

• Degree distribution of a network is a description of relative frequencies of nodes that have different degrees.

• P(d) is the fraction of nodes that have degree d under the degree distribution P.

• Most social and economic networks have scale-free degree distributions

• A scale-free (power-law) distribution P(d) satisfies: $P(d) = c d^{-\gamma}$

• Free of scale: $P(2)/P(1) = P(20)/P(10) = 2^{-\gamma}$

Page 11: Graph Theory and Network Measurement

Degree Distribution

Page 12: Graph Theory and Network Measurement

Degree Distribution

• Scale-free distributions have fat tails

  • For large degrees, the number of nodes with that degree is much larger than in random graphs.

• On a log-log plot a power law is a straight line: $\log P(d) = \log c - \gamma \log(d)$
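
Because a power law is linear in logs, the exponent can be estimated by fitting a line to the points (log d, log P(d)). The sketch below is only an illustration under stated assumptions: the data are synthetic, the function names are hypothetical, and an ordinary least-squares fit on a log-log plot is a crude estimator on real data.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """P(d): fraction of nodes with degree d."""
    n = len(degrees)
    return {d: c / n for d, c in Counter(degrees).items()}

def fit_power_law(p):
    """Least-squares line through (log d, log P(d)); returns (c, gamma)."""
    pts = [(math.log(d), math.log(q)) for d, q in p.items() if d > 0 and q > 0]
    m = len(pts)
    mean_x = sum(x for x, _ in pts) / m
    mean_y = sum(y for _, y in pts) / m
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in pts)
             / sum((x - mean_x) ** 2 for x, _ in pts))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic exact power law P(d) proportional to d^-2 on d = 1..50 (illustrative only)
raw = {d: d ** -2.0 for d in range(1, 51)}
total = sum(raw.values())
p = {d: q / total for d, q in raw.items()}
c, gamma = fit_power_law(p)
```

On this exact power law the fit recovers an exponent of 2 up to rounding, and the ratios `p[2] / p[1]` and `p[20] / p[10]` coincide, which is the scale-free property.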

Page 13: Graph Theory and Network Measurement

Diameter & Average Path Length

• The distance between two nodes is the length of the shortest path between them.

• The diameter of a network is the largest distance between any two nodes.

• The diameter is not a good measure of typical path lengths, but it works as an upper bound

• Average path length is a better measure.
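
Both measures follow from shortest-path distances, which breadth-first search computes in an unweighted network. This is a minimal sketch assuming a connected network given as a neighbor dictionary; the names are illustrative.

```python
from collections import deque

def distances_from(neighbors, source):
    """Breadth-first search: distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist

def diameter_and_average_path_length(neighbors):
    """Assumes a connected network with at least two nodes."""
    lengths = []
    for i in neighbors:
        d = distances_from(neighbors, i)
        lengths.extend(l for j, l in d.items() if j != i)
    return max(lengths), sum(lengths) / len(lengths)

# Path 1 - 2 - 3 - 4
neighbors = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
diameter, avg = diameter_and_average_path_length(neighbors)
```

For this path the diameter is 3 while the average path length is 5/3, a small example of the diameter overstating typical distances.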

Page 14: Graph Theory and Network Measurement

Diameter & Average Path Length

• The tale of six degrees of separation

  • The diameter of SENs (social and economic networks) is 6!!!

• Based on Milgram’s Experiment

• The true story:

  • The diameter of SENs may be high

  • The average path length is low [$O(\log n)$]

Page 15: Graph Theory and Network Measurement

Diameter & Average Path Length

• The distance distribution in the graph of all active Microsoft Instant Messenger user accounts

Page 16: Graph Theory and Network Measurement

Cliquishness & Clustering

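• A clique is a maximal complete subgraph of a given network: a set $S \subseteq N$ such that $g|_S$ is a complete network and, for any $i \in N \setminus S$, $g|_{S \cup \{i\}}$ is not complete.

• Removing an edge from a network may destroy the whole clique structure (e.g. consider removing an edge from a complete graph).

• An approximation: the clustering coefficient, the fraction of pairs of links ij and ik sharing a node i for which the link jk is also present:

$Cl(g) = \frac{\sum_{i;\, j \neq i;\, k \neq j;\, k \neq i} g_{ij} g_{ik} g_{jk}}{\sum_{i;\, j \neq i;\, k \neq j;\, k \neq i} g_{ij} g_{ik}}$

• This is the overall clustering coefficient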

Page 17: Graph Theory and Network Measurement

Cliquishness & Clustering

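• Individual Clustering Coefficient for node i: $Cl_i(g) = \frac{\sum_{j \neq i;\, k \neq j;\, k \neq i} g_{ij} g_{ik} g_{jk}}{\sum_{j \neq i;\, k \neq j;\, k \neq i} g_{ij} g_{ik}}$

• Average Clustering Coefficient: $Cl^{Avg}(g) = \frac{1}{n} \sum_i Cl_i(g)$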

• These values may differ
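
A small computation shows how the two coefficients can differ. The sketch assumes the usual definitions: overall clustering is the fraction of link pairs ij, ik sharing a node i for which jk is also a link, and average clustering is the mean of the same fraction computed node by node. Giving a node with fewer than two neighbors a clustering of 0 is a convention chosen here, not something the slides specify.

```python
from itertools import combinations

def clustering(neighbors):
    """Return (overall, average) clustering of an undirected network.

    neighbors maps each node to the set of its neighbors.
    """
    total_pairs = 0       # pairs of links ij, ik sharing a node i
    total_closed = 0      # such pairs for which jk is also a link
    individual = []
    for i, n_i in neighbors.items():
        pairs = closed = 0
        for j, k in combinations(n_i, 2):
            pairs += 1
            if k in neighbors[j]:
                closed += 1
        total_pairs += pairs
        total_closed += closed
        # Assumed convention: fewer than two neighbors gives clustering 0.
        individual.append(closed / pairs if pairs else 0.0)
    overall = total_closed / total_pairs if total_pairs else 0.0
    average = sum(individual) / len(individual)
    return overall, average

# Triangle 1-2-3 with a pendant node 4 attached to node 3
neighbors = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
overall, average = clustering(neighbors)
```

In this example the overall coefficient is 3/5 while the average coefficient is 7/12, because the average weights every node equally and the overall measure weights nodes by their number of neighbor pairs.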

Page 18: Graph Theory and Network Measurement

Cliquishness & Clustering

Page 19: Graph Theory and Network Measurement

Cliquishness & Clustering

• Average clustering goes to 1

• Overall clustering goes to 0

Page 20: Graph Theory and Network Measurement

Transitivity

• For a directed graph g, one can keep track of the percentage of transitive triples: among the cases where $ij \in g$ and $jk \in g$, the fraction in which $ik \in g$ also holds.

Page 21: Graph Theory and Network Measurement

Centrality

• Centrality measures quantify how central a node is.

• Different measures for centrality have been developed.

• Four general categories:

  • Degree: how connected a node is

  • Closeness: how easily a node can reach other nodes

  • Betweenness: how important a node is in terms of connecting other nodes

  • Neighbors’ characteristics: how important, central or influential a node’s neighbors are

Page 22: Graph Theory and Network Measurement

Degree Centrality

• A simple measure: $\frac{d_i(g)}{n - 1}$

Page 23: Graph Theory and Network Measurement

Closeness Centrality

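• A simple measure: $\left( \frac{\sum_{j \neq i} l(i,j)}{n - 1} \right)^{-1}$, where $l(i,j)$ is the distance between i and j

• Another measure (decay centrality): $\sum_{j \neq i} \delta^{l(i,j)}$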

• What does it measure for 𝛿 = 1?
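
Degree, closeness and decay centrality can all be computed from breadth-first-search distances. The following is a minimal sketch with hypothetical function names, assuming an undirected network given as a neighbor dictionary (and, for closeness, a connected one).

```python
from collections import deque

def bfs_distances(neighbors, source):
    """Distance l(source, j) for every reachable node j."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist

def degree_centrality(neighbors, i):
    return len(neighbors[i]) / (len(neighbors) - 1)

def closeness_centrality(neighbors, i):
    """(n - 1) divided by the sum of distances; assumes a connected network."""
    dist = bfs_distances(neighbors, i)
    return (len(neighbors) - 1) / sum(l for j, l in dist.items() if j != i)

def decay_centrality(neighbors, i, delta):
    """Sum over reachable j != i of delta ** l(i, j)."""
    dist = bfs_distances(neighbors, i)
    return sum(delta ** l for j, l in dist.items() if j != i)

# Star with center 0 and leaves 1, 2, 3
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

Trying `decay_centrality(star, 1, 1.0)` is one way to explore the question about $\delta = 1$: every reachable node then contributes exactly 1 to the sum.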

Page 24: Graph Theory and Network Measurement

Betweenness Centrality

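• A simple measure: $Ce_i^B(g) = \sum_{k \neq j:\, i \notin \{k,j\}} \frac{P_i(kj) / P(kj)}{(n-1)(n-2)/2}$, where $P(kj)$ is the number of shortest paths between k and j and $P_i(kj)$ is the number of those that pass through i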
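
One common normalized form of betweenness averages, over all pairs k, j not containing i, the share of shortest k-j paths that pass through i. The sketch below assumes that definition, an undirected network with at least three nodes, and hypothetical function names; it recomputes all shortest-path counts, so it is suited to small examples only.

```python
from collections import deque
from itertools import combinations

def shortest_path_counts(neighbors, source):
    """BFS returning (dist, count): distances and numbers of shortest paths from source."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                count[j] = 0
                queue.append(j)
            if dist[j] == dist[i] + 1:
                count[j] += count[i]
    return dist, count

def betweenness(neighbors, i):
    n = len(neighbors)
    info = {k: shortest_path_counts(neighbors, k) for k in neighbors}
    dist_i, count_i = info[i]
    total = 0.0
    for k, j in combinations([v for v in neighbors if v != i], 2):
        dist_k, count_k = info[k]
        if j not in dist_k:
            continue                      # k and j are not connected
        if i in dist_k and dist_k[i] + dist_i[j] == dist_k[j]:
            # shortest k-j paths through i = (k-i paths) * (i-j paths)
            total += count_k[i] * count_i[j] / count_k[j]
    return total / ((n - 1) * (n - 2) / 2)

star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

The center of a star lies on every shortest path between leaves, so its betweenness is 1, while each leaf has betweenness 0.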

Page 25: Graph Theory and Network Measurement

Neighbor-Related Measures

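• Katz prestige: $P_i^K(g) = \sum_{j \neq i} g_{ij} \frac{P_j^K(g)}{d_j(g)}$

• If we define $\hat{g}_{ij} = \frac{g_{ij}}{d_j(g)}$, we have $P^K(g) = \hat{g} P^K(g)$, or $(I - \hat{g}) P^K(g) = 0$

• Calculating Katz prestige reduces to finding the unit eigenvector of $\hat{g}$ (an eigenvector with eigenvalue 1).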
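
A unit eigenvector of the normalized matrix can be approximated by repeated multiplication. The sketch below makes several assumptions worth flagging: $d_j(g)$ is taken to be the j-th column sum of g, every column sum is nonzero, the network is connected, and the iterate is averaged with its predecessor (a choice made here so the iteration also settles on bipartite networks). The function name is hypothetical.

```python
def katz_prestige(g, iterations=1000):
    """Approximate the unit eigenvector of the column-normalized matrix, scaled to sum to 1.

    g[i][j] = 1 means a link between i and j; every column must have a nonzero sum.
    """
    n = len(g)
    col_sum = [sum(g[i][j] for i in range(n)) for j in range(n)]   # d_j(g), assumed column sum
    g_hat = [[g[i][j] / col_sum[j] for j in range(n)] for i in range(n)]
    p = [1.0 / n] * n
    for _ in range(iterations):
        gp = [sum(g_hat[i][j] * p[j] for j in range(n)) for i in range(n)]
        # Averaging with the previous iterate keeps the same fixed points
        # but avoids oscillation on bipartite networks.
        p = [(p[i] + gp[i]) / 2 for i in range(n)]
    return p

# Triangle 0-1-2 with a pendant node 3 attached to node 2 (degrees 2, 2, 3, 1)
g = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
p = katz_prestige(g)
```

On this undirected example the result is proportional to the degrees (2, 2, 3, 1), scaled so the entries sum to 1.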

Page 26: Graph Theory and Network Measurement

Eigenvectors & Eigenvalues

• For an $n \times n$ matrix T, an eigenvector v is a nonzero $n \times 1$ vector for which $\exists \lambda: Tv = \lambda v$

• Left-hand eigenvector: $vT = \lambda v$ (with v a $1 \times n$ row vector)

• Perron-Frobenius Theorem: if T is a non-negative column-stochastic matrix (the sum of entries in each column is one), then there exists a non-negative right-hand eigenvector v with corresponding eigenvalue $\lambda = 1$.

• The same is true for left-hand eigenvectors and row-stochastic matrices.

Page 27: Graph Theory and Network Measurement

Eigenvectors & Eigenvalues

• How to calculate: $(T - \lambda I) v = 0$

• For this equation to have a non-zero solution v, $T - \lambda I$ must be singular (non-invertible): $\det(T - \lambda I) = 0$
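
For a 2 x 2 matrix the determinant condition is a quadratic in $\lambda$ that can be solved directly. The sketch below is a worked illustration with an example matrix chosen here (it is not from the slides); the function name is hypothetical.

```python
import cmath

def eigenvalues_2x2(t):
    """Roots of det(T - lambda I) = lambda^2 - trace * lambda + det = 0."""
    (a, b), (c, d) = t
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

# A column-stochastic example (each column sums to 1)
t = [[0.9, 0.5],
     [0.1, 0.5]]
lam1, lam2 = eigenvalues_2x2(t)
```

For this matrix the characteristic polynomial is $\lambda^2 - 1.4\lambda + 0.4$, with roots 1 and 0.4 up to floating-point rounding; `cmath` is used so that matrices with complex eigenvalues are handled as well.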

Page 28: Graph Theory and Network Measurement

Neighbor-Related Measures

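• Computing Katz prestige for the following network:

• Katz prestige ≈ degree! On an undirected network, $P_i^K(g) = d_i(g)$ solves the defining equation, since $\sum_j g_{ij} d_j(g) / d_j(g) = d_i(g)$.

• Not interesting on undirected networks, but interesting on directed networks.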

Page 29: Graph Theory and Network Measurement

Neighbor-Related Measures

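• To solve the problem: Eigenvector Centrality: $\lambda C_i^e(g) = \sum_j g_{ij} C_j^e(g)$, or in matrix form $\lambda C^e(g) = g C^e(g)$

• Katz2: $P^{K2}(g, a) = a g \mathbb{1} + a^2 g^2 \mathbb{1} + a^3 g^3 \mathbb{1} + \cdots$, where $\mathbb{1}$ is the $n \times 1$ vector of ones and a is small enough for the sum to converge, so

$P^{K2}(g, a) = (I + a g + a^2 g^2 + \cdots) a g \mathbb{1} = (I - a g)^{-1} a g \mathbb{1}$

• Bonacich: $Ce^B(g, a, b) = (I - b g)^{-1} a g \mathbb{1}$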
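
The closed forms above can be approximated without a matrix inverse: the vector $(I - bg)^{-1} a g \mathbb{1}$ is the fixed point of $x \leftarrow a g \mathbb{1} + b g x$. The sketch below assumes b is smaller than the reciprocal of the largest eigenvalue of g, so that the iteration converges; the function names are hypothetical.

```python
def bonacich_centrality(g, a, b, iterations=200):
    """Approximate (I - b g)^{-1} a g 1 by iterating x <- a g 1 + b g x.

    Converges when b is smaller than 1 / (largest eigenvalue of g).
    """
    n = len(g)
    base = [a * sum(row) for row in g]            # the vector a g 1
    x = base[:]
    for _ in range(iterations):
        x = [base[i] + b * sum(g[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

def katz2_prestige(g, a, iterations=200):
    """Second Katz measure: the special case b = a of the formula above."""
    return bonacich_centrality(g, a, a, iterations)

# Star with center 0 and two leaves
g = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
x = katz2_prestige(g, 0.5)
```

For this star with a = 0.5 the fixed point can be checked by hand: the center satisfies $x_0 = 1 + x_1$ and each leaf satisfies $x_1 = 0.5 + 0.5 x_0$, giving 3 for the center and 2 for each leaf.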

Page 30: Graph Theory and Network Measurement

Final Discussion about Centrality Measures

Page 31: Graph Theory and Network Measurement

Matching

• A matching is a subset of edges with no common end-point.

• Finding the maximum matching is an interesting problem, especially in bipartite graphs (recall Matching Markets)

  • A bipartite network (N,g) is one for which N can be partitioned into two sets A and B such that each edge in g lies between A and B.

• A perfect matching covers all vertices.

• Philip Hall's Theorem: For a bipartite graph (N,g), there exists a matching of a set $C \subseteq A$ if and only if $\forall S \subseteq C: |N_S(g)| \geq |S|$

Proof: see the whiteboard.
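
Hall's condition and a maximum matching can be compared on a small example. The sketch below uses the standard augmenting-path method for bipartite matching and a brute-force check of the condition over all subsets (exponential, so for tiny sets only); the function names and the example network are assumptions made for illustration.

```python
from itertools import combinations

def max_matching(adj, a_side):
    """Augmenting-path maximum matching; adj maps each node in A to its neighbors in B."""
    match_of_b = {}                       # b -> matched node in A

    def try_assign(a, visited):
        for b in adj[a]:
            if b in visited:
                continue
            visited.add(b)
            if b not in match_of_b or try_assign(match_of_b[b], visited):
                match_of_b[b] = a
                return True
        return False

    for a in a_side:
        try_assign(a, set())
    return {a: b for b, a in match_of_b.items()}

def hall_condition(adj, c):
    """Check |N_S(g)| >= |S| for every non-empty subset S of C."""
    c = list(c)
    for size in range(1, len(c) + 1):
        for s in combinations(c, size):
            if len(set().union(*(adj[a] for a in s))) < size:
                return False
    return True

adj = {"a1": ["b1", "b2"], "a2": ["b1"], "a3": ["b1"]}
matching = max_matching(adj, ["a1", "a2", "a3"])
```

Here a2 and a3 compete for the single node b1, so Hall's condition fails for the full set and the maximum matching covers only two of the three nodes in A.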

Page 32: Graph Theory and Network Measurement

Set Covering and Independent Set

• Independent Set: a subset of nodes $A \subseteq N$ such that $ij \notin g$ for all $i, j \in A$

• Consider two graphs (N,g) and (N,g’) such that $g \subset g'$.

  • Any independent set of g’ is an independent set of g.

  • If $g \neq g'$, there exist independent sets of g that are not independent sets of g’.

• Free-rider game on networks:

  • Each player either buys the book or borrows it for free from one of the book owners in his neighborhood.

  • Indirect borrowing is not permitted.

  • Each player prefers paying for the book over not having it.

  • In equilibrium, the nodes of a maximal independent set pay for the book.
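
The link between equilibria and maximal independent sets can be checked on a small network. The sketch assumes that a player who can borrow strictly prefers borrowing to buying; the greedy construction and the function names are illustrative choices, and the greedy scan yields one maximal independent set among possibly many.

```python
def maximal_independent_set(neighbors):
    """Greedy: scan nodes in order, adding a node unless a neighbor was already added."""
    chosen = set()
    for i in neighbors:
        if not (neighbors[i] & chosen):
            chosen.add(i)
    return chosen

def is_equilibrium(neighbors, buyers):
    """No buyer has a buying neighbor (assumed: they would rather borrow),
    and every non-buyer has at least one buying neighbor."""
    for i in neighbors:
        has_buying_neighbor = bool(neighbors[i] & buyers)
        if i in buyers and has_buying_neighbor:
            return False
        if i not in buyers and not has_buying_neighbor:
            return False
    return True

# Path 1 - 2 - 3 - 4
neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
buyers = maximal_independent_set(neighbors)
```

On this path the greedy scan selects nodes 1 and 3, and the check confirms that nobody wants to deviate; the set {2} alone is independent but not maximal, and node 4 would then be left without the book.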

Page 33: Graph Theory and Network Measurement

Coloring

• Example: We have a network of researchers in which an edge between nodes i and j means that i or j wants to attend the other's presentation. How many time slots are needed to schedule all the presentations?

• With one color per time slot, we should color the vertices so that no two neighboring nodes get the same color: the Coloring Problem.

• The minimum number of colors needed is the chromatic number

• There are many results; the most famous is the 4-color theorem: every planar graph can be colored with 4 colors.

  • A planar graph is a graph which can be drawn in a way that no two edges cross each other.

Page 34: Graph Theory and Network Measurement

Coloring

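• Intuition: The 6-color problem:

  • Any planar graph can be colored with 6 colors.

• Proof sketch:

  • Euler formula: $v + f = e + 2$

  • $e \leq 3v - 6$

  • $\delta \leq 5$ (some node has degree at most 5)

  • Recursive coloring: remove such a node, color the rest, then give the node a color not used by its at most 5 neighbors

• Four colors are sometimes needed, e.g. for the complete network on 4 nodes, which is planar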
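
The recursive coloring idea can be sketched iteratively: repeatedly remove a node of minimum remaining degree, then re-insert the nodes in reverse order, giving each the smallest color unused by its already colored neighbors. On a planar network every removed node has at most 5 remaining neighbors, so at most 6 colors appear. The function name and the wheel example are assumptions for illustration.

```python
def recursive_coloring(neighbors):
    """Remove minimum-degree nodes one by one, then color in reverse removal order."""
    remaining = {i: set(n_i) for i, n_i in neighbors.items()}
    order = []
    while remaining:
        i = min(remaining, key=lambda v: len(remaining[v]))
        order.append(i)
        for j in remaining.pop(i):
            remaining[j].discard(i)
    color = {}
    for i in reversed(order):             # re-insert nodes in reverse removal order
        used = {color[j] for j in neighbors[i] if j in color}
        color[i] = next(c for c in range(len(neighbors)) if c not in used)
    return color

# Wheel: hub 0 joined to every node of the 5-cycle 1-2-3-4-5 (a planar network)
wheel = {0: {1, 2, 3, 4, 5},
         1: {0, 2, 5}, 2: {0, 1, 3}, 3: {0, 2, 4}, 4: {0, 3, 5}, 5: {0, 4, 1}}
color = recursive_coloring(wheel)
```

This greedy scheme guarantees 6 colors on planar networks but does not in general find the chromatic number.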

Page 35: Graph Theory and Network Measurement

Eulerian Tours & Hamilton Cycles

• Euler Tour: a closed walk which passes through every edge exactly once

• Euler theorem: A connected network g has a closed walk that involves each link exactly once if and only if the degree of each node is even.

• Proof sketch:

  • Induction on the number of edges
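
Euler's condition is easy to test, and when it holds a tour can be constructed. The sketch below uses Hierholzer's algorithm, a standard constructive method (it is not the inductive proof mentioned on the slide), and assumes a connected simple network in which every degree is even; the names are illustrative.

```python
def has_euler_tour(neighbors):
    """Euler's condition for a connected network: every node has even degree."""
    return all(len(n_i) % 2 == 0 for n_i in neighbors.values())

def euler_tour(neighbors, start):
    """Hierholzer's algorithm; assumes a connected network with all degrees even."""
    unused = {i: list(n_i) for i, n_i in neighbors.items()}
    stack, tour = [start], []
    while stack:
        i = stack[-1]
        if unused[i]:
            j = unused[i].pop()
            unused[j].remove(i)           # mark link ij as used from both endpoints
            stack.append(j)
        else:
            tour.append(stack.pop())
    return tour

# Two triangles sharing node 0: 0-1-2 and 0-3-4
bowtie = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}
tour = euler_tour(bowtie, 0)
```

The returned list of nodes starts and ends at the start node, and each of the 6 links appears exactly once between consecutive entries.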

Page 36: Graph Theory and Network Measurement

Eulerian Tours & Hamilton Cycles

• Hamilton Cycle: a cycle that passes through all vertices

• Dirac theorem: If a network has 𝑛 ≥ 3 nodes and each node has degree of at least n/2, then the network has a Hamilton cycle.

• Proof sketch:

  • The graph is connected

  • Consider the longest path and prove it is in fact a cycle

  • Consider a node outside this cycle

Page 37: Graph Theory and Network Measurement

Eulerian Tours and Hamilton Cycles

• Chvátal Theorem: Order the nodes of a network of $n \geq 3$ nodes in increasing order of their degrees, so that node 1 has the lowest degree and node n has the highest degree. If, for every $i < n/2$, $d_i \leq i$ implies $d_{n-i} \geq n - i$, then the network has a Hamilton cycle.
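
Both sufficient conditions depend only on the degree sequence, so they are simple to check. The sketch below uses hypothetical function names and illustrative degree sequences; note that these conditions are sufficient but not necessary, so a network can fail both and still have a Hamilton cycle.

```python
def dirac_condition(degrees):
    """n >= 3 and every degree is at least n / 2."""
    n = len(degrees)
    return n >= 3 and all(2 * d >= n for d in degrees)

def chvatal_condition(degrees):
    """For sorted d_1 <= ... <= d_n: d_i <= i with i < n/2 implies d_{n-i} >= n - i."""
    d = sorted(degrees)
    n = len(d)
    if n < 3:
        return False
    for i in range(1, n):
        if 2 * i >= n:
            break
        if d[i - 1] <= i and d[n - i - 1] < n - i:
            return False
    return True

complete_4 = [3, 3, 3, 3]        # complete network on 4 nodes
mixed = [2, 3, 3, 4, 4]          # illustrative degree sequence on 5 nodes
cycle_5 = [2, 2, 2, 2, 2]        # the 5-cycle, which is itself a Hamilton cycle
```

The sequence `mixed` fails Dirac's condition (a node has degree 2 < 5/2) but satisfies Chvátal's, while the 5-cycle fails both despite being Hamiltonian.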
