(Tentative) Network Analysis with networkX : Fundamentals of network theory-2
Kyunghoon Kim
Network Analysis with networkX
Fundamentals of network theory-2
2014. 05. 28.
UNIST Mathematical Sciences
Kyunghoon Kim
5/28/2014 Fundamentals of network theory-2 1
Kyunghoon Kim
Indexing
5/28/2014 Fundamentals of network theory-2 2
Documents:
1: Google Glasses is a computer with a head-mounted display.
2: He wore thick glasses. He worked in google corporation.
3: He wore glasses to be able to read signs at a distance.

Index (word: documents containing it):
google: 1 2
glasses: 1 2 3
is: 1
a: 1 3
computer: 1
with: 1
head-mounted: 1
display: 1
he: 2 3
wore: 2 3
thick: 2
worked: 2
in: 2
corporation: 2
to: 3
be: 3
able: 3
read: 3
…
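The index on this slide can be built in a few lines. The following is a minimal sketch, not the method of any particular search engine: the helper names `tokenize`, `build_index` and `search` are hypothetical, and the tokenizer (lowercase, hyphenated words kept whole) is an assumption.

```python
import re

# The three example documents from the slide, numbered 1 to 3.
documents = {
    1: "Google Glasses is a computer with a head-mounted display.",
    2: "He wore thick glasses. He worked in google corporation.",
    3: "He wore glasses to be able to read signs at a distance.",
}

def tokenize(text):
    # Lowercase words; keep hyphenated words such as "head-mounted" whole.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def build_index(docs):
    # Map each word to the sorted list of documents that contain it.
    index = {}
    for doc_id, text in docs.items():
        for word in set(tokenize(text)):
            index.setdefault(word, []).append(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def search(index, *words):
    # Documents that contain every query word (intersection of the lists).
    hits = [set(index.get(word, [])) for word in words]
    return sorted(set.intersection(*hits)) if hits else []

index = build_index(documents)
print(index["glasses"])            # documents containing "glasses"
print(search(index, "he", "wore"))  # documents containing both words
```

A query is answered by looking up each word and intersecting the document lists, so the documents themselves never have to be scanned at query time.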
Kyunghoon Kim
Indexing with position
5/28/2014 Fundamentals of network theory-2 4
Documents:
1: Google Glass is a computer with a head-mounted display.
2: He wore thick glasses. He worked in google corporation.
3: He wore glasses to be able to read signs at a distance.

Index (word: document-position of each occurrence):
google: 1-1 2-8
glasses: 1-2 2-4 3-3
is: 1-3
a: 1-4 1-7 3-11
computer: 1-5
with: 1-6
head-mounted: 1-8
display: 1-9
he: 2-1 2-5 3-1
wore: 2-2 3-2
thick: 2-3
worked: 2-6
in: 2-7
corporation: 2-9
to: 3-4 3-7
be: 3-5
able: 3-6
read: 3-8
…
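Storing positions makes phrase queries possible: two words form a phrase only if they occur at consecutive positions in the same document. A minimal sketch under the same assumptions as before (hypothetical helper names, a simple lowercase tokenizer, no stemming, so "Glass" and "glasses" are treated as different words here):

```python
import re

documents = {
    1: "Google Glass is a computer with a head-mounted display.",
    2: "He wore thick glasses. He worked in google corporation.",
    3: "He wore glasses to be able to read signs at a distance.",
}

def tokenize(text):
    # Lowercase words; keep hyphenated words such as "head-mounted" whole.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def build_positional_index(docs):
    # Map each word to a list of (document, position) pairs, positions from 1.
    index = {}
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text), start=1):
            index.setdefault(word, []).append((doc_id, position))
    return index

def phrase_search(index, phrase):
    # Documents in which the words of the phrase appear consecutively.
    words = tokenize(phrase)
    if not words:
        return []
    matches = set()
    for doc_id, start in index.get(words[0], []):
        if all((doc_id, start + k) in index.get(word, [])
               for k, word in enumerate(words)):
            matches.add(doc_id)
    return sorted(matches)

positional = build_positional_index(documents)
print(positional["to"])                          # every occurrence of "to"
print(phrase_search(positional, "wore glasses"))  # adjacent only in one document
```

Documents 2 and 3 both contain "wore" and "glasses", but only document 3 contains them next to each other, which is exactly what the position information reveals.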
Kyunghoon Kim
Indexing with position & metatag
5/28/2014 Fundamentals of network theory-2 6
Documents:
1: <title>New Google Glass</title><body>Google Glass is a computer with a head-mounted display.</body>
2: <title>Daily life of David</title><body>He wore thick glasses. He worked in google corporation.</body>
3: <title>Black Glasses</title><body>He wore glasses to be able to read signs at a distance.</body>

Index (word: document-position of each occurrence):
google: 1-1 2-8
glasses: 1-2 2-4 3-3
is: 1-3
a: 1-4 1-7 3-11
computer: 1-5
with: 1-6
head-mounted: 1-8
display: 1-9
he: 2-1 2-5 3-1
wore: 2-2 3-2
thick: 2-3
worked: 2-6
in: 2-7
corporation: 2-9
<title>: #
</title>: #
<body>: #
</body>: #
Kyunghoon Kim
Indexing with position & metatag
5/28/2014 Fundamentals of network theory-2 7
Altavista
“Constrained Searching of an index”, 1999
Kyunghoon Kim
“uncanny knack for returning extremely relevant results.” – PC Magazine
The technology that launched google
5/28/2014 Fundamentals of network theory-2 8
Screenshot of “google.stanford.edu” 1997, http://blogoscoped.com/archive/2006-04-21-n63.html
Kyunghoon Kim
A hyperlink is a reference to data that the reader can directly follow either by clicking or by hovering or that is followed automatically.
– Merriam-Webster.com
Hyperlink
5/28/2014 Fundamentals of network theory-2 9
Kyunghoon Kim
Hyperlink Trick
5/28/2014 Fundamentals of network theory-2 10
Barny’s tomato pasta recipe
Cook Pasta Sheets, one at a time,
for 1 minute each. And before
serving, place pasta in the bottom
of a soup bowl.
Tony’s tomato pasta
Bring to a boil and add pasta.
Cook for 10 minutes. Drain
on paper towel.
I like Barny’s recipe. I enjoyed Tony’s recipe. Tony’s recipe is amazing!
I’m in admiration of Tony’s recipe.
Kyunghoon Kim
Authority Trick
5/28/2014 Fundamentals of network theory-2 12
Barny’s tomato pasta recipe
Cook Pasta Sheets, one at a time,
for 1 minute each. And before
serving, place pasta in the bottom
of a soup bowl.
Tony’s tomato pasta
Bring to a boil and add pasta.
Cook for 10 minutes. Drain
on paper towel.
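I like Barny’s recipe.
I enjoyed Tony’s recipe. Tony’s recipe is amazing!
I’m in admiration of Tony’s recipe.
[Figure: the page linking to Barny’s recipe has authority score 100; each of the three pages linking to Tony’s recipe has authority score 1. Resulting scores: A (Barny’s recipe) = 100, B (Tony’s recipe) = 3.]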
Kyunghoon Kim
The iterates will not converge no matter how long the process is run.
Cycle
5/28/2014 Fundamentals of network theory-2 13
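The problem with cycles can be reproduced with a small experiment. This is a sketch under stated assumptions: the four-page web below is hypothetical, and the update rule (a page's score is the sum of the scores of the pages linking to it, with a score of 1 for pages that nobody links to) is one simple reading of the authority trick.

```python
# Hypothetical four-page web: A -> B -> C -> A form a cycle, and D links to A.
links = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}

def authority_step(scores):
    # One round of the authority trick: sum the scores of all linking pages.
    new_scores = {}
    for page in links:
        incoming = [src for src, targets in links.items() if page in targets]
        new_scores[page] = sum(scores[src] for src in incoming) if incoming else 1
    return new_scores

def iterate(steps):
    # Start with a score of 1 everywhere and apply the update repeatedly.
    scores = {page: 1 for page in links}
    for _ in range(steps):
        scores = authority_step(scores)
    return scores

for steps in (3, 30, 300):
    print(steps, iterate(steps))
```

Each trip around the cycle feeds the score contributed by D back into A again, so the scores of A, B and C keep growing instead of settling down.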
Kyunghoon Kim
There is a person who is randomly surfing the internet.
The surfer starts off at a single web page selected at random from the entire World Wide Web.
The surfer then examines all the hyperlinks on the page, picks one of them at random, and clicks on it.
The new page is then examined and one of its hyperlinks is chosen at random.
This process continues…
Random Surfer Trick
5/28/2014 Fundamentals of network theory-2 18
Kyunghoon Kim
import networkx as nx
from matplotlib import pyplot as plt
G = nx.DiGraph()
G.add_edges_from([(1,2),(2,3),(2,12),(3,4),(3,8),(3,12),(4,8),(5,1),
(6,1),(6,2),(7,2),(7,12),(8,12),(8,13),(8,14),(8,15),(8,9),(9,16),
(10,5),(10,6),(10,7),(11,10),(12,16),(13,16),(14,16),(15,16),
(16,11)])
plt.clf()
nx.draw_spring(G)
Random Surfer Trick
5/28/2014 Fundamentals of network theory-2 19
Example
Kyunghoon Kim
spring layout – places nodes using the Fruchterman-Reingold force-directed algorithm
circular layout – places nodes in a circle
random layout – positions nodes using a uniform distribution in the unit square
shell layout – places nodes in concentric circles
spectral layout – positions nodes using eigenvectors of the graph Laplacian
Layout of NetworkX
5/28/2014 Fundamentals of network theory-2 20
e.g., pos = nx.spectral_layout(G)
nx.draw(G, pos)
Kyunghoon Kim
nx.spring_layout produces different positions each time it is run.
To get fixed positions, we use nx.spectral_layout.
Just be careful with the following case:
networkx.spectral_layout
5/28/2014 Fundamentals of network theory-2 21
Kyunghoon Kim
import numpy as np

def addvalue(values, node):
    # Count one visit; nodes are numbered from 1, array indices from 0.
    values[node-1] += 1

lennode = len(G.nodes())
values = np.zeros(lennode)
Random Surfer Trick – Outline
5/28/2014 Fundamentals of network theory-2 22
Kyunghoon Kim
# Pick the first page uniformly at random (nodes are numbered 1..lennode).
randomvector = np.random.rand(lennode)
selectednode = np.where(randomvector == max(randomvector))[0][0] + 1
print("First Node")
print(selectednode)
addvalue(values, selectednode)
# Follow one of the outgoing links of the current page, chosen at random.
nextnodes = list(G.edges(selectednode))  # list() so the edges can be indexed
randomvector = np.random.rand(len(nextnodes))
selectedpos = np.where(randomvector == max(randomvector))[0][0]
selectednode = nextnodes[selectedpos][1]
print("Second Node")
print(selectednode)
Random Surfer Trick – Outline (Cont.)
5/28/2014 Fundamentals of network theory-2 23
First Node
7
Second Node
2
Kyunghoon Kim
def nextstep(selectednode):
    # Move from the current page to a random page it links to.
    nextnodes = list(G.edges(selectednode))  # list() so the edges can be indexed
    randomvector = np.random.rand(len(nextnodes))
    selectedpos = np.where(randomvector == max(randomvector))[0][0]
    selectednode = nextnodes[selectedpos][1]
    print("choices", nextnodes)
    print(selectednode)
    addvalue(values, selectednode)
    return selectednode

for i in range(5):
    selectednode = nextstep(selectednode)
Random Surfer Trick – Outline (Cont.)
5/28/2014 Fundamentals of network theory-2 24
First Node
5
choices [(5, 1)]
1
choices [(1, 2)]
2
choices [(2, 3), (2, 12)]
3
choices [(3, 8), (3, 12), (3, 4)]
12
choices [(12, 16)]
16
>>> values
array([ 1., 1., 1., 0., 1., 0., 0., 0., 0., 0.,
0., 1., 0., 0., 0., 1.])
Kyunghoon Kim
for i in range(20000):
    selectednode = nextstep(selectednode)
plt.clf()
# nodelist makes the sizes line up with nodes 1..16, the order used in values
nx.draw_networkx_nodes(G, pos, nodelist=sorted(G), node_size=values)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
Random Surfer Trick – Outline (Cont.)
5/28/2014 Fundamentals of network theory-2 25
Kyunghoon Kim
for i in range(10000000):
    selectednode = nextstep(selectednode)
plt.clf()
# nodelist makes the sizes line up with nodes 1..16, the order used in values
nx.draw_networkx_nodes(G, pos, nodelist=sorted(G), node_size=10000*values/sum(values))
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
Random Surfer Trick – Outline (Cont.)
5/28/2014 Fundamentals of network theory-2 26
Kyunghoon Kim
There is one twist: a restart probability (15%).
With this probability, the surfer does not click on one of the available hyperlinks.
Instead, the surfer restarts the procedure by picking another page at random from the whole web (because of boredom, a browser error, etc.).
Twist
5/28/2014 Fundamentals of network theory-2 27
Kyunghoon Kim
def randomstart(G):
    # Restart: jump to a page chosen uniformly at random.
    lennode = len(G.nodes())
    randomvector = np.random.rand(lennode)
    selectednode = np.where(randomvector == max(randomvector))[0][0] + 1
    addvalue(values, selectednode)
    return selectednode

def nextstep(selectednode):
    # Follow a random outgoing link of the current page.
    nextnodes = list(G.edges(selectednode))  # list() so the edges can be indexed
    randomvector = np.random.rand(len(nextnodes))
    selectedpos = np.where(randomvector == max(randomvector))[0][0]
    selectednode = nextnodes[selectedpos][1]
    addvalue(values, selectednode)
    return selectednode
Random Surfer Trick with Twist
5/28/2014 Fundamentals of network theory-2 28
Kyunghoon Kim
selectednode = randomstart(G)
for i in range(1000000):
    if np.random.rand(1)[0] > 0.15:
        selectednode = nextstep(selectednode)  # 85%: click a link
    else:
        selectednode = randomstart(G)  # 15%: restart on a random page
    if i % 100000 == 0:
        print(i)
plt.clf()
# nodelist makes the sizes line up with nodes 1..16, the order used in values
nx.draw_networkx_nodes(G, pos, nodelist=sorted(G), node_size=10000*values/sum(values))
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
Random Surfer Trick with Twist (Cont.)
5/28/2014 Fundamentals of network theory-2 29
Kyunghoon Kim
plt.clf()
pos = nx.spectral_layout(G)
# nodelist makes the sizes line up with nodes 1..16, the order used in values
nx.draw_networkx_nodes(G, pos, nodelist=sorted(G), node_size=10000*values/sum(values))
nx.draw_networkx_edges(G, pos)
percent = 100*values/sum(values)  # share of visits per node, in percent
labels = ['%.1f' % p for p in percent]
labels = dict(zip(range(1,lennode+1), labels))
nx.draw_networkx_labels(G, pos, labels)
Random Surfer Trick with Twist
5/28/2014 Fundamentals of network theory-2 30
Kyunghoon Kim
What is the connection between random surfer model and the authority trick that we would like to use for ranking web pages?
The percentages calculated from random surfer simulations turn out to be exactly what we need to measure a page’s authority.
Surfer authority score= percentage of time that a random surfer would spend visiting that page
Connection
5/28/2014 Fundamentals of network theory-2 33
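The claim that visit percentages measure authority can be checked against a direct calculation. The sketch below is an illustration under stated assumptions, not the code from the slides: it reuses the edge list of the 16-node example graph, simulates the surfer with a 15% restart probability using Python's `random` module, and compares the visit shares with the solution of the linear system x = 0.85 P x + (0.15/n) 1, where P holds the link-following probabilities. The function names `simulate` and `exact` are hypothetical.

```python
import random
import numpy as np

# Edge list of the 16-node example graph used earlier in the deck.
edges = [(1,2),(2,3),(2,12),(3,4),(3,8),(3,12),(4,8),(5,1),
         (6,1),(6,2),(7,2),(7,12),(8,12),(8,13),(8,14),(8,15),(8,9),(9,16),
         (10,5),(10,6),(10,7),(11,10),(12,16),(13,16),(14,16),(15,16),
         (16,11)]
n = 16
out_links = {v: [] for v in range(1, n + 1)}
for src, dst in edges:
    out_links[src].append(dst)

def simulate(steps, restart=0.15, seed=0):
    # Fraction of visits per page for a surfer who restarts with probability `restart`.
    rng = random.Random(seed)
    visits = np.zeros(n)
    node = rng.randint(1, n)
    visits[node - 1] += 1
    for _ in range(steps):
        if rng.random() < restart:
            node = rng.randint(1, n)             # restart on a random page
        else:
            node = rng.choice(out_links[node])   # click a random link
        visits[node - 1] += 1
    return visits / visits.sum()

def exact(restart=0.15):
    # P[i, j] = probability of clicking from page j+1 to page i+1.
    P = np.zeros((n, n))
    for src, targets in out_links.items():
        for dst in targets:
            P[dst - 1, src - 1] = 1.0 / len(targets)
    alpha = 1 - restart
    # Long-run visit shares solve x = alpha * P x + (restart / n) * 1.
    return np.linalg.solve(np.eye(n) - alpha * P, np.full(n, restart / n))

simulated = simulate(200000)
expected = exact()
print(np.round(100 * simulated, 1))  # percentages from the simulation
print(np.round(100 * expected, 1))   # percentages from the linear system
```

The two rows of percentages should agree up to sampling noise. networkx also provides `nx.pagerank(G, alpha=0.85)`, which should produce the same ranking for this graph.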
Kyunghoon Kim
Hyperlink Trick: the main idea was that a page with many incoming links should receive a high ranking.
Tricks for ranking web pages
5/28/2014 Fundamentals of network theory-2 35
pos = nx.circular_layout(G)
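The hyperlink trick on its own amounts to counting incoming links. A minimal sketch on the 16-node example graph, using `G.in_degree()` from networkx (the variable names `in_links` and `ranking` are hypothetical):

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([(1,2),(2,3),(2,12),(3,4),(3,8),(3,12),(4,8),(5,1),
                  (6,1),(6,2),(7,2),(7,12),(8,12),(8,13),(8,14),(8,15),(8,9),(9,16),
                  (10,5),(10,6),(10,7),(11,10),(12,16),(13,16),(14,16),(15,16),
                  (16,11)])

# Number of incoming links per page.
in_links = dict(G.in_degree())

# Rank pages by incoming links, breaking ties by node number.
ranking = sorted(in_links, key=lambda node: (-in_links[node], node))
for node in ranking[:5]:
    print(node, in_links[node])
```

This ranking ignores who is linking: a link from an obscure page counts as much as a link from a popular one, which is the gap the authority trick addresses.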
Kyunghoon Kim
Authority Trick: the main idea was that an incoming link from a highly authoritative page should improve a page’s ranking more than an incoming link from a less authoritative page.
Connection: An incoming link from a popular page will have more opportunities to be followed than a link from an unpopular page.
Tricks for ranking web pages
5/28/2014 Fundamentals of network theory-2 37
Kyunghoon Kim
Authority Trick
Tricks for ranking web pages
5/28/2014 Fundamentals of network theory-2 38
Kyunghoon Kim
The random surfer model takes into account both the quantity (hyperlink trick) and the quality (authority trick) of incoming links at each page.
This model works perfectly well whether or not there are cycles in the hyperlinks.
Random Surfer Model
5/28/2014 Fundamentals of network theory-2 42
Kyunghoon Kim
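Divide by out-degree:
  with constant term: PageRank, $\mathbf{x} = D(D - \alpha A)^{-1}\mathbf{1}$
  without constant term: Degree centrality, $\mathbf{x} = A D^{-1}\mathbf{x}$
No division:
  with constant term: Katz centrality, $\mathbf{x} = (I - \alpha A)^{-1}\mathbf{1}$
  without constant term: Eigenvector centrality, $\mathbf{x} = \kappa_1^{-1} A \mathbf{x}$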
Pagerank with Linear Algebra
5/28/2014 Fundamentals of network theory-2 44
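The four formulas can be evaluated directly with numpy. This is a sketch under stated assumptions: the graph is a small hypothetical undirected one (so out-degree equals degree), and the two values of α are illustrative choices, not values given in the slides.

```python
import numpy as np

# Hypothetical undirected graph: triangle 0-1-2 plus vertex 3 attached to vertex 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]
ones = np.ones(n)
I = np.eye(n)

# D_ii = max(out-degree of i, 1); for an undirected graph this is the degree.
degree = A.sum(axis=0)
D = np.diag(np.maximum(degree, 1))

# Leading eigenvalue and eigenvector (eigh sorts eigenvalues in ascending order).
eigenvalues, eigenvectors = np.linalg.eigh(A)
kappa1 = eigenvalues[-1]
eigenvector_c = np.abs(eigenvectors[:, -1])

alpha_katz = 0.5 / kappa1  # assumed value; must stay below 1/kappa1
alpha_pr = 0.85            # assumed value; must stay below 1

katz = np.linalg.solve(I - alpha_katz * A, ones)        # x = (I - alpha A)^{-1} 1
pagerank = D @ np.linalg.solve(D - alpha_pr * A, ones)  # x = D (D - alpha A)^{-1} 1

print("degree     ", degree)
print("eigenvector", np.round(eigenvector_c, 3))
print("Katz       ", np.round(katz, 3))
print("PageRank   ", np.round(pagerank, 3))
```

All four measures rank vertex 2 highest on this graph; they differ in how much weight a neighbor passes on and in whether every vertex receives a constant baseline.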
Kyunghoon Kim
Matrix Equation
Linear Transformation
Eigenvalue
Eigenvector
Eigenvector centrality
Katz centrality
Pagerank centrality
Contents with Linear Algebra
5/28/2014 Fundamentals of network theory-2 45
Kyunghoon Kim
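If $A$ is an $m \times n$ matrix with columns $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$, and if $\mathbf{x}$ is in $\mathbb{R}^n$, then the product of $A$ and $\mathbf{x}$ is the linear combination of the columns of $A$ using the corresponding entries in $\mathbf{x}$ as weights; that is,
The matrix equation Ax=b
5/28/2014 Fundamentals of network theory-2 46
$$A\mathbf{x} = \begin{bmatrix}\mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n\end{bmatrix}\begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n$$
(left-hand side: matrix equation; right-hand side: vector equation)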
Kyunghoon Kim
The matrix equation Ax=b
5/28/2014 Fundamentals of network theory-2 47
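$$A\mathbf{x} = \begin{bmatrix}\mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n\end{bmatrix}\begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n$$
$$\begin{bmatrix}1 & 2 & -1\\ 0 & -5 & 3\end{bmatrix}\begin{bmatrix}4\\ 3\\ 7\end{bmatrix} = 4\begin{bmatrix}1\\ 0\end{bmatrix} + 3\begin{bmatrix}2\\ -5\end{bmatrix} + 7\begin{bmatrix}-1\\ 3\end{bmatrix} = \begin{bmatrix}3\\ 6\end{bmatrix}$$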
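The equivalence of the matrix product and the linear combination of columns can be verified numerically. A minimal sketch, assuming the example on this slide is the 2 × 3 matrix with rows (1, 2, -1) and (0, -5, 3) applied to the vector (4, 3, 7):

```python
import numpy as np

# Assumed example values (signs inferred from the result on the slide).
A = np.array([[1, 2, -1],
              [0, -5, 3]])
x = np.array([4, 3, 7])

# Matrix equation: the product A x.
product = A @ x

# Vector equation: the columns of A weighted by the entries of x.
combination = sum(x[j] * A[:, j] for j in range(A.shape[1]))

print(product)
print(combination)
```

Both lines print the same vector, which is the point of the slide: multiplying by A is the same as mixing its columns with the entries of x as weights.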
Kyunghoon Kim
from matplotlib import pyplot as plt
import numpy as np
pointset = [[1,2],[2,2],[2,3],[3,4],[3,5],[2,5],[1,5],[1,3],[1,2]]
plt.plot(*np.transpose(pointset), marker='.')
plt.axis([0, 6, 0, 6])
Example – Plot of point set
5/28/2014 Fundamentals of network theory-2 48
Kyunghoon Kim
from matplotlib import pyplot as plt
import numpy as np
pointset = [[1,2],[2,2],[2,3],[3,4],[3,5],[2,5],[1,5],[1,3],[1,2]]
plt.plot(*np.transpose(pointset), marker='.')
plt.axis([0, 6, 0, 6])
newset = []
for i in pointset:
    x = np.matrix(i).transpose()
    A = np.matrix([[0,1],[1,0]])  # swaps the x and y coordinates
    Ax = A*x
    Ax = list(np.array(Ax).reshape(-1,))
    newset.append(Ax)
plt.plot(*np.transpose(newset), marker='.')
Example (Cont.) – Plot of reflection (across the line y = x)
5/28/2014 Fundamentals of network theory-2 49
Kyunghoon Kim
newset = []
for i in pointset:
    x = np.matrix(i).transpose()
    deg = np.pi*35/180  # 35 degrees converted to radians
    A = np.matrix([[np.cos(deg),-np.sin(deg)],[np.sin(deg),np.cos(deg)]])  # counterclockwise
    Ax = A*x
    Ax = list(np.array(Ax).reshape(-1,))
    newset.append(Ax)
plt.plot(*np.transpose(newset), marker='.')
plt.axis([-6, 6, -6, 6])
Example (Cont.) – Plot of rotation
5/28/2014 Fundamentals of network theory-2 50
Kyunghoon Kim
In two dimensions,
Rotation matrix
5/28/2014 Fundamentals of network theory-2 51
http://librairie.immateriel.fr/fr/read_book/9780596516130/ch11s02
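For reference, the standard two-dimensional matrix for a counterclockwise rotation by an angle θ, which is the form used in the rotation code with θ = 35°, can be written as follows. This is the textbook formula, not a transcription of the slide's image.

```latex
R(\theta) =
\begin{bmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{bmatrix},
\qquad
R(\theta)\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix}
x\cos\theta - y\sin\theta \\
x\sin\theta + y\cos\theta
\end{bmatrix}
```

For example, with θ = 90° the point (1, 0) is sent to (0, 1).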
Kyunghoon Kim
newset = []
for i in pointset:
    x = np.matrix(i).transpose()
    A = np.matrix([[1.5, 0],[0, 1]])  # stretch the x coordinate by 1.5
    Ax = A*x
    Ax = list(np.array(Ax).reshape(-1,))
    newset.append(Ax)
plt.plot(*np.transpose(newset), marker='.')
# Other matrices to try:
# A = np.matrix([[-1.8, 0],[0, 1]])
# A = np.matrix([[1, 0],[0, -1]])
Example (Cont.) – Plot of dilation
5/28/2014 Fundamentals of network theory-2 52
Kyunghoon Kim
Linear Transformation
5/28/2014 Fundamentals of network theory-2 55
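$$\begin{bmatrix}3 & -2\\ 1 & 0\end{bmatrix}\begin{bmatrix}2\\ 1\end{bmatrix} = \begin{bmatrix}4\\ 2\end{bmatrix} = 2\begin{bmatrix}2\\ 1\end{bmatrix}$$
$$A\mathbf{x} = \lambda\mathbf{x}$$
The number $\lambda$ is the eigenvalue. It tells whether the special vector $\mathbf{x}$ is stretched or shrunk or reversed or left unchanged when it is multiplied by $A$.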
Kyunghoon Kim
The eigenvalues are a new way to see into the heart of a matrix.
Almost all vectors change direction, when they are multiplied by 𝐴. Certain exceptional vectors x are in the same direction as 𝐴x. Those are the eigenvectors.
Multiply an eigenvector by 𝐴, and the vector 𝐴x is a number 𝜆 times the original x.
Eigenvalues and Eigenvectors
5/28/2014 Fundamentals of network theory-2 56
Kyunghoon Kim
If (𝐴 − 𝜆𝐼)x=0 has a nonzero solution, then (𝐴 − 𝜆𝐼) cannot have an inverse, i.e., is not invertible. The determinant of 𝐴 − 𝜆𝐼 must be zero.
Example – eigenvalue and eigenvector
5/28/2014 Fundamentals of network theory-2 57
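$$\begin{bmatrix}3 & -2\\ 1 & 0\end{bmatrix}\begin{bmatrix}2\\ 1\end{bmatrix} = \begin{bmatrix}4\\ 2\end{bmatrix}$$
$$A\mathbf{x} = \lambda\mathbf{x} \quad\Longrightarrow\quad (A - \lambda I)\mathbf{x} = \mathbf{0}$$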
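The determinant condition can be checked with numpy. A minimal sketch, assuming the example matrix is the 2 × 2 matrix with rows (3, -2) and (1, 0), for which the vector (2, 1) is doubled:

```python
import numpy as np

# Assumed example matrix and vector.
A = np.array([[3.0, -2.0],
              [1.0, 0.0]])
x = np.array([2.0, 1.0])

print(A @ x)  # the same direction as x, stretched by the eigenvalue

# np.linalg.eig returns the eigenvalues and, as columns, the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)
for lam, v in zip(eigenvalues, eigenvectors.T):
    # For an eigenvalue, det(A - lam I) vanishes and A v equals lam v.
    det = np.linalg.det(A - lam * np.eye(2))
    print(lam, round(det, 10), np.allclose(A @ v, lam * v))
```

Setting det(A − λI) = λ² − 3λ + 2 = 0 by hand gives the same two eigenvalues, λ = 1 and λ = 2.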
Kyunghoon Kim
A natural extension of the simple degree centrality is eigenvector centrality.
Not all neighbors are equivalent.
A vertex’s importance is increased by having connections to other vertices that are themselves important.
Eigenvector Centrality
5/28/2014 Fundamentals of network theory-2 58
Kyunghoon Kim
Let $x_i = 1$ for all $i$.
We define $x_i'$ to be the sum of the centralities of $i$’s neighbors:
$$x_i' = \sum_j A_{ij} x_j$$
We can write this expression in matrix notation as $\mathbf{x}' = A\mathbf{x}$.
Repeating this process to make better estimates, we have after $t$ steps a vector $\mathbf{x}(t)$.
Eigenvector Centrality
5/28/2014 Fundamentals of network theory-2 59
Kyunghoon Kim
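$$\mathbf{x}(t) = A^t\mathbf{x}(0).$$
Let’s write $\mathbf{x}(0)$ as a linear combination of the eigenvectors $\mathbf{v}_i$ of the adjacency matrix:
$$\mathbf{x}(0) = \sum_i c_i\mathbf{v}_i$$
Then
$$\mathbf{x}(t) = A^t\sum_i c_i\mathbf{v}_i = \sum_i c_i k_i^t\mathbf{v}_i = k_1^t\sum_i c_i\left(\frac{k_i}{k_1}\right)^t\mathbf{v}_i$$
where the $k_i$ are the eigenvalues of $A$ and $k_1$ is the largest eigenvalue.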
Eigenvector Centrality
5/28/2014 Fundamentals of network theory-2 60
Kyunghoon Kim
Since $k_i/k_1 < 1$ for all $i \neq 1$, all terms in the sum other than the first decay exponentially as $t$ becomes large.
We get $\mathbf{x}(t) \to c_1 k_1^t\mathbf{v}_1$ as $t \to \infty$.
The limiting vector of centralities is simply proportional to the leading eigenvector of the adjacency matrix.
Equivalently we could say that the centrality $\mathbf{x}$ satisfies $A\mathbf{x} = k_1\mathbf{x}$.
Eigenvector Centrality
5/28/2014 Fundamentals of network theory-2 61
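The limiting argument can be observed numerically by repeating x′ = Ax from the all-ones vector. This is a sketch under stated assumptions: the graph is a small hypothetical undirected one, and the vector is rescaled after every step, which changes only its length and keeps the numbers from growing like k₁ᵗ.

```python
import numpy as np

# Hypothetical undirected graph: triangle 0-1-2 plus vertex 3 attached to vertex 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def power_iteration(A, steps=100):
    x = np.ones(A.shape[0])        # x_i = 1 for all i
    for _ in range(steps):
        x = A @ x                  # x' = A x
        x = x / np.linalg.norm(x)  # rescale; the direction is what matters
    return x

x = power_iteration(A)

# Reference: leading eigenpair from numpy (eigh sorts eigenvalues ascending).
eigenvalues, eigenvectors = np.linalg.eigh(A)
k1 = eigenvalues[-1]
v1 = np.abs(eigenvectors[:, -1])

print(np.round(x, 4))
print(np.round(v1, 4))
```

After enough steps the iterate matches the leading eigenvector, and vertex 2, which has the most neighbors, receives the largest centrality.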
Kyunghoon Kim
$$A\mathbf{x} = k_1\mathbf{x}.$$
The centrality $x_i$ of vertex $i$ is proportional to the sum of the centralities of $i$’s neighbors:
$$x_i = k_1^{-1}\sum_j A_{ij}x_j$$
It can be large for either of two reasons:
1. a vertex has many neighbors
2. a vertex has important neighbors
Eigenvector Centrality
5/28/2014 Fundamentals of network theory-2 62
Kyunghoon Kim
Katz centrality.
5/28/2014 Fundamentals of network theory-2 63
Kyunghoon Kim
Degree centrality
$$\mathbf{x} = A D^{-1}\mathbf{x}$$
$D$ is the diagonal matrix with elements $D_{ii} = \max(k_i^{\text{out}}, 1)$.
Degree Centrality
5/28/2014 Fundamentals of network theory-2 64
Kyunghoon Kim
MacCormick, John. Nine Algorithms that Changed the Future: The Ingenious Ideas that Drive Today's Computers. Princeton University Press, 2011.
Newman, Mark. Networks: an introduction. Oxford University Press, 2010.
Strang, Gilbert. Introduction to linear algebra. SIAM, 2003.
References
5/28/2014 Fundamentals of network theory-2 65