Transcript of Google PageRank - Free University of Bozen-Bolzano (ricci/ISR/slides-2015/1B-Google.pdf, 2015-03-05)
Google PageRank
Francesco Ricci Faculty of Computer Science Free University of Bozen-Bolzano [email protected]
Content
p Linear Algebra
p Matrices
p Eigenvalues and eigenvectors
p Markov chains
p Google PageRank
Literature
p C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. Chapter 21
p Markov chains description on Wikipedia
p Amy N. Langville & Carl D. Meyer, Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, 2006.
p Google is the leading search and online advertising company - founded by Larry Page and Sergey Brin (Ph.D. students at Stanford University)
p “googol”, or 10^100, is the mathematical term Google was named after
p Google’s success in search is largely based on its PageRank™ algorithm
p Gartner reckons that Google now makes use of more than 1 million servers, spitting out search results, images, videos, emails and ads
p Google reports that it spends some 200 to 250 million US dollars a year on IT equipment.
Matrices
p A matrix is a rectangular array of numbers
p aij is the element of matrix A in row i and column j
p A is said to be an n x m matrix if it has n rows and m columns
p A square matrix is an n x n matrix
p The transpose AT of a matrix A is the matrix obtained by exchanging the rows and the columns
A = [ a11 a12 a13    = [ 1 2 3
      a21 a22 a23 ]      4 5 6 ]

AT = [ a11 a21    = [ 1 4
       a12 a22        2 5
       a13 a23 ]      3 6 ]
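As a quick check, the transpose example above can be reproduced in a few lines of Python (a sketch; the matrix-as-list-of-rows representation and the helper name `transpose` are our choices, not part of the slides):

```python
# Transpose of a matrix stored as a list of rows:
# row i, column j of A becomes row j, column i of the result.
def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

A = [[1, 2, 3],
     [4, 5, 6]]          # a 2x3 matrix

print(transpose(A))      # [[1, 4], [2, 5], [3, 6]]
```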
Exercise
p What is the size of these matrices?
p Compute their transpose
Exercise
p What is the size of these matrices?
p Compute their transpose
The sizes are 2x3, 3x1 and 3x4. The transposes are:

[   1  20 ]
[   9   5 ]    (3x2)
[ -13  -6 ]

[ 4 1 8 ]      (1x3)

[  0  1  2 ]
[ -1  0  1 ]   (4x3)
[ -2 -1  0 ]
[ -3 -2 -1 ]
Matrices
p A square matrix is diagonal iff aij = 0 for all i≠j
p The identity matrix 1 is the diagonal matrix with 1's along the diagonal
p A symmetric matrix A satisfies the condition A = AT

A = [ a11    0 ]        1 = [ 1 0 ]
    [   0  a22 ]            [ 0 1 ]
Exercise
p Is a diagonal matrix symmetric?
p Make an example of a symmetric matrix
p Make an example of a 2x3 symmetric matrix
Exercise
p Is a diagonal matrix symmetric?
  n YES: if it is diagonal then aij = 0 for all i≠j, hence aij = aji for all i≠j
p Make an example of a symmetric matrix

  [ 1 2 ]
  [ 2 3 ]

p Make an example of a 2x3 symmetric matrix
  n Impossible: a symmetric matrix is a square matrix
Vectors
p A vector v is a one-dimensional array of numbers (is an n x 1 matrix – column vector)
p Example:
p The standard form of a vector is a column vector
p The transpose of a column vector vT = (3 5 7) is a row vector.
v = [ 3
      5
      7 ]
Operation on matrices
p Addition: A = (aij), B = (bij), C = (cij) = A + B
  n cij = aij + bij
p Scalar multiplication: λ is a number, λA = (λaij)
p Multiplication: if A and B are compatible, i.e., the number of columns of A is equal to the number of rows of B, then
  n C = (cij) = AB
  n cij = Σk aik bkj
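The three operations above can be sketched in Python (helper names are ours; matrices are lists of rows):

```python
# Elementwise addition, scalar multiplication and the row-by-column product.
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(lam, A):
    return [[lam * a for a in row] for row in A]

def matmul(A, B):
    # c_ij = sum_k a_ik * b_kj; A must have as many columns as B has rows.
    assert len(A[0]) == len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

print(add([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[6, 8], [10, 12]]
print(scale(2, [[1, 2], [3, 4]]))                # [[2, 4], [6, 8]]
print(matmul([[1, 2, 3], [4, 5, 6]],
             [[1, 4], [2, 5], [3, 6]]))          # [[14, 32], [32, 77]]
```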
Examples

p If AB = 1, then B is said to be the inverse of A and is denoted A-1
p If a matrix has an inverse, it is called invertible or nonsingular

[ 1 2 3 ] [ 1 4 ]   [ 1*1+2*2+3*3  1*4+2*5+3*6 ]   [ 14 32 ]
[ 4 5 6 ] [ 2 5 ] = [ 4*1+5*2+6*3  4*4+5*5+6*6 ] = [ 32 77 ]
          [ 3 6 ]

[ 1 1 ] [ 1 -1 ]   [ 1 0 ]
[ 0 1 ] [ 0  1 ] = [ 0 1 ]

The first product is symmetric. Is it a general fact? Is AAT always symmetric?
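The answer to the slide's question is yes: (AAT)T = (AT)T AT = AAT, so the product of a matrix with its transpose is always symmetric. A small numerical check (helper names are ours):

```python
# Check numerically that A * AT is symmetric for the example matrix.
def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]
P = matmul(A, transpose(A))
print(P)                 # [[14, 32], [32, 77]]
print(P == transpose(P)) # True: A * AT is symmetric
```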
Exercise
p Compute the following operations
Exercise
p Compute the following operations
Rank of a Matrix
p The row (column) rank of a matrix is the maximum number of rows (columns) that are linearly independent
p The vectors v1, …, vn are linearly independent iff there is no linear combination a1v1+ … + anvn (with coefficients ai not all 0) of the vectors that is equal to 0
p Example 1: (1 2 3), (1 4 6), and (0 2 3) are linearly dependent: show it
p Example 2: (1 2 3) and (1 4 6) are not linearly dependent: show it
p The kernel of a matrix A is the subspace of vectors v such that Av=0
Exercise solution
p 1*(1 2 3)T -1*(1 4 6)T + 1*(0 2 3)T =(0 0 0)T
p (1 -1 1)T is in the kernel of the matrix:
p a*(1 2 3) + b*(1 4 6) = (0 0 0)
  n Then a = -b and also a = -2b, so a = b = 0: contradiction.
[ 1 1 0 ]   [  1 ]   [ 0 ]
[ 2 4 2 ]   [ -1 ] = [ 0 ]
[ 3 6 3 ]   [  1 ]   [ 0 ]
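The kernel claim above can be verified mechanically (a sketch; `matvec` is our helper name):

```python
# Check that (1, -1, 1) is in the kernel of the matrix from the exercise.
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 1, 0],
     [2, 4, 2],
     [3, 6, 3]]
print(matvec(A, [1, -1, 1]))   # [0, 0, 0]
```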
Rank and Determinant
p Theorem. An n x n square matrix is nonsingular iff it has full rank (i.e., n).
p Theorem. A matrix has full column rank iff there is no nonzero vector in its null space
p Theorem. An n x n matrix A is singular iff det(A) = 0
p A[ij] is the ij minor, i.e., the matrix obtained by deleting the i-th row and the j-th column from A.
det(A) = a11                                   if n = 1
det(A) = Σj=1..n (-1)^(1+j) a1j det(A[1j])     if n > 1
Exercise
p Compute the determinant of the following matrices
[ 1 1 0 ]        [ 1 1 ]
[ 2 4 2 ]        [ 2 4 ]
[ 3 6 3 ]
Exercise
p Compute the determinant of the following matrices
[ 1 1 ]
[ 2 4 ] : det = 1*4 - 1*2 = 2

[ 1 1 0 ]
[ 2 4 2 ] : det = 1*(4*3 - 2*6) - 1*(2*3 - 2*3) + 0 = 0
[ 3 6 3 ]
http://www.bluebit.gr/matrix-calculator/
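The determinants above can be recomputed with a direct implementation of the Laplace expansion from the earlier slide (a sketch, exponential-time, fine only for tiny matrices):

```python
# Determinant by Laplace expansion along the first row.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor A[1j]: delete the first row and column j.
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 1], [2, 4]]))                     # 2
print(det([[1, 1, 0], [2, 4, 2], [3, 6, 3]]))    # 0
```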
Eigenvectors and Eigenvalues
p Definition. If M is a square matrix, v is a nonzero vector and λ is a number such that
  n M v = λ v
p then v is said to be a (right) eigenvector of M with eigenvalue λ
p If v is an eigenvector of M with eigenvalue λ, then so is any nonzero multiple of v
p Only the direction matters.
Example

p The matrix

  M = [ 2 -3
        1 -2 ]

p has two (right) eigenvectors:
  n v1 = (1 1)t and v2 = (3 1)t

Prove that. Is it singular?
Example

p The matrix

  M = [ 2 -3
        1 -2 ]

p has two eigenvectors:
  n v1 = (1 1)t and v2 = (3 1)t
p Mv1 = (-1 -1)t = -1 v1
  n The eigenvalue is -1
p Mv2 = (3 1)t = 1 v2
  n The eigenvalue is 1

Is it singular?
Transformation
p There is a lot of distortion in these directions (1 0)t, (1 1)t, (0 1)t
M = [ 2 -3
      1 -2 ]
Transformation along eigenvectors
p There are two independent directions which are not twisted at all by the matrix M: (1 1) and (3 1)
p one of them, (1 1), is flipped
p We see less distortion if our box is oriented in the two special directions.
Results
p Theorem: every square matrix has at least one eigenvector
p The usual situation is that an n x n matrix has n linearly independent eigenvectors
p If there are n of them, they are a useful basis for Rn.
p Unfortunately, it can happen that there are fewer than n of them.
Finding Eigenvectors
p M v = λ v
  n v is an eigenvector and λ is an eigenvalue
p If λ = 0, then finding eigenvectors is the same as finding nonzero vectors in the null space – iff det(M) = 0, i.e., the matrix is singular
p If λ != 0, then finding the eigenvectors is equivalent to finding the null space for the matrix M – λ1 (1 is the identity matrix)
p The matrix M – λ1 has a non zero vector in the null space iff det(M – λ1) = 0
p det(M – λ1) = 0 is called the characteristic equation.
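For a 2x2 matrix the characteristic equation is a quadratic, so the eigenvalues can be sketched directly (assumes real eigenvalues; the helper name is ours):

```python
import math

# For M = [[a, b], [c, d]], det(M - lambda*1) = 0 expands to
# lambda^2 - (a+d)*lambda + (a*d - b*c) = 0 (trace and determinant).
def eigenvalues_2x2(M):
    (a, b), (c, d) = M
    trace, determ = a + d, a * d - b * c
    disc = math.sqrt(trace ** 2 - 4 * determ)   # assumes real eigenvalues
    return ((trace + disc) / 2, (trace - disc) / 2)

M = [[2, -3], [1, -2]]
print(eigenvalues_2x2(M))   # (1.0, -1.0)
```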
Exercise
M = [ 2 -3
      1 -2 ]

Find the eigenvalues and the eigenvectors of this matrix.
1) Find the solutions λ of the characteristic equation (eigenvalues)
2) Find the eigenvectors corresponding to the found eigenvalues.
Exercise Solution
p det(M – λ1) = 0 n (2 - λ)(-2 - λ) + 3 = λ2 - 1
p The solutions are +1 and -1
M = [ 2 -3
      1 -2 ]

Find the eigenvalues and the eigenvectors of this matrix.
Exercise Solution
p det(M – λ1) = 0 n (2 - λ)(-2 - λ) + 3 = λ2 - 1
p The solutions are +1 and -1 p Now we have to solve the set of linear
equations n Mv=v (for the first eigenvalue)
M = [ 2 -3
      1 -2 ]

Find the eigenvalues and the eigenvectors of this matrix.
Exercise Solution
p det(M – λ1) = 0 n (2 - λ)(-2 - λ) + 3 = λ2 - 1
p The solutions are +1 and -1 p Now we have to solve the set of linear
equations n Mv=v (for the first eigenvalue)
n Has solution x=3y, (3 1)t – and all vectors obtained multiplying this with a scalar.
M = [ 2 -3
      1 -2 ]

2x - 3y = x
x - 2y = y

Find the eigenvalues and the eigenvectors of this matrix.
Algorithm
p To find the eigenvalues and eigenvectors of M: n First find the eigenvalues by solving the
characteristic equation. Call the solutions λ1,..., λn. (There is always at least one eigenvalue, and there are at most n of them.)
n For all λk, find the null space of M – λk1: the existence of a nonzero vector in this null space is guaranteed. Any such vector is an eigenvector.
Graphs
p A directed graph G is a pair (V,E), where V is a finite set and E is a binary relation on V
  n V is the vertex set of G: it contains the vertices
  n E is the edge set of G: it contains the edges
p In an undirected graph G=(V,E) the edges consist of unordered pairs of vertices
p The in-degree of a vertex v (directed graph) is the number of edges entering in v
p The out-degree of a vertex v (directed graph) is the number of edges leaving v.
The Web as a Directed Graph
Assumption 1: A hyperlink between pages denotes author-perceived relevance (quality signal)
Assumption 2: The anchor of the hyperlink describes the target page (textual context)
[Figure: Page A links to Page B via a hyperlink with anchor text]
Ranking web pages
p To count inlinks: enter link:www.mydomain.com in the Google search form
p Web pages are not equally “important”
  n www.unibz.it vs. www.stanford.edu
  n Inlinks as votes
p www.stanford.edu has 3200 inlinks
p www.unibz.it has 352 inlinks (Feb 2013)
p Are all inlinks equal? n Recursive question!
Simple recursive formulation
p Each link’s vote is proportional to the importance of its source page
p If page P with importance x has n outlinks, each link gets x/n votes
[Figure: a page with importance 1000 and three outlinks passes 333 along each link]
Simple “flow” model
The web in 1839
[Figure: three-page graph. Yahoo links to itself and to Amazon (each link carries y/2); Amazon links to Yahoo and to Microsoft (each carries a/2); Microsoft links to Amazon (carrying m)]
y = y /2 + a /2 a = y /2 + m m = a /2
a, m, and y are the importance of these pages
Solving the flow equations
p 3 equations, 3 unknowns, no constants
  n No unique solution
  n If you multiply a solution by a constant (λ) you obtain another solution - try with (2 2 1)
p Additional constraint forces uniqueness
  n y + a + m = 1 (normalization)
  n y = 2/5, a = 2/5, m = 1/5
  n These are the scores of the pages under the assumption of the flow model
p Gaussian elimination method works for small examples, but we need a better method for large graphs.
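The flow equations, their invariance under scaling, and the normalized solution can all be checked numerically (a sketch; `is_solution` is our helper name):

```python
# Verify the flow equations y = y/2 + a/2, a = y/2 + m, m = a/2.
def is_solution(y, a, m, tol=1e-12):
    return (abs(y - (y / 2 + a / 2)) < tol and
            abs(a - (y / 2 + m)) < tol and
            abs(m - a / 2) < tol)

y, a, m = 2 / 5, 2 / 5, 1 / 5
print(is_solution(y, a, m))              # True
print(abs(y + a + m - 1) < 1e-12)        # True: normalized
print(is_solution(2 * y, 2 * a, 2 * m))  # True: any scalar multiple also solves
```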
Matrix formulation
p Matrix M has one row and one column for each web page (square matrix)
p Suppose page i has n outlinks
  n If i links to j, then Mij = 1/n
  n Else Mij = 0
p M is a row stochastic matrix
  n Rows sum to 1
p Suppose r is a vector with one entry per web page
  n ri is the importance score of page i
  n Call it the rank vector
Example

        y    a    m
  y [ 1/2  1/2    0 ]
  a [ 1/2    0  1/2 ] = M
  m [   0    1    0 ]

y = y/2 + a/2
a = y/2 + m
m = a/2

(y a m) = (y a m)M

[Figure: the Yahoo/Amazon/Microsoft link graph with edge labels y/2, y/2, a/2, a/2, m]
Power Iteration Solution

        y    a    m
  y [ 1/2  1/2    0 ]
  a [ 1/2    0  1/2 ] = M
  m [   0    1    0 ]

(y a m) = (y a m)M

(1/3 1/3 1/3)
(1/3 1/3 1/3)M = (1/3 1/2 1/6)
(1/3 1/2 1/6)M = (5/12 1/3 1/4)
(5/12 1/3 1/4)M = (3/8 11/24 1/6)
…
converges to (2/5 2/5 1/5)
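The iteration shown above can be sketched in a few lines (the stopping criterion is simplified to a fixed number of steps):

```python
# Power iteration on the 3-page example: repeatedly multiply the
# rank row vector r by M (r' = rM) until it settles.
M = [[0.5, 0.5, 0.0],   # y
     [0.5, 0.0, 0.5],   # a
     [0.0, 1.0, 0.0]]   # m

r = [1 / 3, 1 / 3, 1 / 3]
for _ in range(50):
    r = [sum(r[i] * M[i][j] for i in range(3)) for j in range(3)]

print([round(x, 3) for x in r])   # [0.4, 0.4, 0.2], i.e. (2/5 2/5 1/5)
```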
States and probabilities
[Figure: two-state transition diagram. DRY stays DRY with probability 0.85 and moves to RAIN with probability 0.15; RAIN moves to DRY with probability 0.38 and stays RAIN with probability 0.62]
Composing transitions
0.44 = 0.38*0.15 + 0.62*0.62 - what kind of operation on the matrix is this?
Composing transitions

p The probabilities of the 12-hour transitions are given by squaring the matrix representing the probabilities of the 6-hour transitions
  n P(rain-in-12hours|rain-now) = P(rain-in-12hours|rain-in-6hours)*P(rain-in-6hours|rain-now) + P(rain-in-12hours|dry-in-6hours)*P(dry-in-6hours|rain-now) = .62*.62 + .15*.38 = .44
  n P(dry-in-12hours|rain-now) = P(dry-in-12hours|rain-in-6hours)*P(rain-in-6hours|rain-now) + P(dry-in-12hours|dry-in-6hours)*P(dry-in-6hours|rain-now) = .38*.62 + .85*.38 = .56

         dry  rain
  dry  [ .85  .15 ]
  rain [ .38  .62 ]

  A² = [ .78  .22 ]
       [ .56  .44 ]
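The matrix squaring can be reproduced directly (a sketch; the printed values match the slide up to rounding):

```python
# Squaring the 6-hour weather transition matrix gives the 12-hour one.
A = [[0.85, 0.15],   # dry  -> (dry, rain)
     [0.38, 0.62]]   # rain -> (dry, rain)

A2 = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]

print([[round(x, 2) for x in row] for row in A2])
```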
Behavior in the limit

A  = [ .85 .15 ; .38 .62 ]
A² = [ .78 .22 ; .56 .44 ]
A³ = [ .75 .25 ; .64 .36 ]
A⁴ = [ .73 .27 ; .68 .32 ]
A⁵ = [ .72 .28 ; .70 .30 ]
A⁶ = [ .72 .28 ; .71 .29 ]
A⁷ = [ .72 .28 ; .71 .29 ]
A⁸ = [ .72 .28 ; .72 .28 ]
A⁹ = [ .72 .28 ; .72 .28 ]
A∞ = [ .72 .28 ; .72 .28 ]
Behavior in the limit
p If a,b <=1, and a+b=1, i.e., (a b) is a generic state with a certain probability a to be dry and b=1-a to be rain, then
p In particular (.72 .28)A=(.72 .28), i.e., it is a (left) eigenvector with eigenvalue 1
p The eigenvector (.72 .28) represents the limit situation starting from a generic situation (a b): it is called the stationary distribution.
(a b) A∞ = (a b) [ .72 .28 ; .72 .28 ] = (.72 .28)
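The convergence to (.72 .28) from any starting distribution can be checked numerically (a sketch; `step` is our helper name and 100 iterations is an arbitrary cutoff):

```python
# Iterating (a b) A^n from any starting distribution converges to the
# stationary distribution, which the slides round to (.72 .28).
A = [[0.85, 0.15],
     [0.38, 0.62]]

def step(p):
    # One left multiplication: p' = p A.
    return [p[0] * A[0][0] + p[1] * A[1][0],
            p[0] * A[0][1] + p[1] * A[1][1]]

for start in ([1.0, 0.0], [0.0, 1.0], [0.3, 0.7]):
    p = start
    for _ in range(100):
        p = step(p)
    print([round(x, 2) for x in p])   # [0.72, 0.28] every time
```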
Exercise
p Find one (left) eigenvector of the matrix below: n Solve first the characteristic equation (to find
the eigenvalues) n and then find the left eigenvector
corresponding to the largest eigenvalue
[ .85 .15
  .38 .62 ]
Exercise Solution
p Characteristic equation
det [ .85-λ    .15
      .38    .62-λ ] = (0.85-λ)(0.62-λ) - 0.15*0.38 = λ² - 1.47λ + 0.47

λ = (1.47 ± √(1.47² - 4*0.47)) / 2

Solutions: λ = 1 and λ = 0.47

p Left eigenvector for λ = 1:

(x y) [ .85 .15   = (x y)     0.85x + 0.38y = x
        .38 .62 ]             x + y = 1

0.85x + 0.38(1-x) = x, i.e. -0.53x + 0.38 = 0

x = 0.38/0.53 = 0.72, y = 1 - 0.72 = 0.28
Markov Chain
p A Markov chain is a sequence X1, X2, X3, ... of random variables (for each Xn, the probabilities of all its possible values sum to 1) with the property:
p Markov property: the conditional probability distribution of the next future state Xn+1 given the present and past states is a function of the present state Xn alone
p If the state space is finite then the transition probabilities can be described with a matrix Pij = P(Xn+1 = j | Xn = i), i, j = 1, …, m
[ .85 .15 ]   [ P(Xn+1=1 | Xn=1)   P(Xn+1=2 | Xn=1) ]
[ .38 .62 ] = [ P(Xn+1=1 | Xn=2)   P(Xn+1=2 | Xn=2) ]
Example: Web
p Xt is the page visited by a user (random surfer) at time t;
p At every time t the user can be in one among m pages (states)
p We assume that when a user is on page i at time t, then the probability to be on page j at time t+1 depends only on the fact that the user is on page i, and not on the pages previously visited.
Probabilities
[Figure: a small web of five pages P0…P4 plus a Goal page, with transition probabilities:
P(P1|P0) = 1.0
P(P0|P1) = 0.05, P(P1|P1) = 0.1, P(P2|P1) = 0.4, P(P3|P1) = 0.3, P(P4|P1) = 0.15
P(P1|P2) = 1.0
P(P1|P3) = 0.5, P(P4|P3) = 0.5]

In this example there are 5 states and the probability to jump from a page/state to another is not constant (it is not 1/(# of outlinks of the node)) … as we have assumed before in the simple web graph

This is not a Markov chain! (why?)
Examples
p Pij = P(Xn+1 = j | Xn = i), i, j = 1, …, m
p (1, 0, 0, …, 0) P = (P11, P12, P13, …, P1n)
  n if at time n the chain is in state 1, then at time n+1 it is in state j with probability P1j, i.e., the first row of P gives the probabilities of the next states
p (0.5, 0.5, 0, …, 0) P = (P11·0.5 + P21·0.5, …, P1n·0.5 + P2n·0.5)
  n this is a linear combination of the first two rows.
Stationary distribution

p A stationary distribution is an m-dimensional vector π (entries summing to 1) which satisfies the equation:

  πT P = πT

p Where π is a (column) vector and πT (row vector) is the transpose of π
p A stationary distribution always exists, but is not guaranteed to be unique (can you make an example of a Markov chain with more than one stationary distribution?)
p If there is only one stationary distribution then

  πT = limn xT Pn

p Where x is a generic distribution over the m states (i.e., it is an m-dimensional vector whose entries are <= 1 and sum to 1)
Random Walk Interpretation
p Imagine a random web surfer
  n At any time t, the surfer is on some page P
  n At time t+1, the surfer follows an outlink from P uniformly at random
  n Ends up on some page Q linked from P
  n Process repeats indefinitely
p Let p(t) be a vector whose ith component is the probability that the surfer is at page i at time t
  n p(t) is a probability distribution on pages
55
The stationary distribution

p Where is the surfer at time t+1?
  n Follows a link uniformly at random
  n p(t+1) = p(t)M
p Suppose the random walk reaches a state such that p(t+1) = p(t)M = p(t)
  n Then p(t) is a stationary distribution for the random walk
p Our rank vector r = p(t) satisfies r = rM.
Ergodic Markov chains

p A Markov chain is ergodic if:
  n Informally: there is a path from any state to any other, and the states are not partitioned into sets such that all state transitions occur cyclically from one set to another
  n Formally: for any start state, after a finite transient time T0, the probability of being in any state at any fixed time T > T0 is nonzero

(Figure) Not ergodic (even/odd): the probability of being in a given state at a fixed time, e.g., after 500 transitions, is always either 0 or 1 depending on the initial state.
Ergodic Markov chains

p For any ergodic Markov chain, there is a unique long-term visit rate for each state
  n Steady-state probability distribution
p Over a long time period, we visit each state in proportion to this rate
p It doesn't matter where we start.
p Note: non-ergodic Markov chains may still have a steady state.
Non Ergodic Example

p Three states, with transition matrix

       ⎛  0   0.5  0.5 ⎞
  P =  ⎜ 0.2   0   0.8 ⎟
       ⎝  0    0    1  ⎠

  (state 1 moves to state 2 or 3 with probability 0.5 each; state 2 moves to state 1 with probability 0.2 and to state 3 with probability 0.8; state 3 loops on itself with probability 1)
p It is easy to show that the steady state (left eigenvector) is πT = (0 0 1), πTP = πT, i.e., it is the state 3
p The user will always reach the state 3 and will stay there (spider trap)
p This is a non-ergodic Markov chain (with a steady state).
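Both claims (πTP = πT and the drift into the spider trap) can be verified numerically, assuming the transition matrix reconstructed above:

```python
import numpy as np

# Transition matrix of the 3-state example: state 3 (index 2) is absorbing
P = np.array([[0.0, 0.5, 0.5],
              [0.2, 0.0, 0.8],
              [0.0, 0.0, 1.0]])

pi = np.array([0.0, 0.0, 1.0])   # claimed steady state

x = np.array([1.0, 0.0, 0.0])    # start in state 1
for _ in range(100):
    x = x @ P                     # probability mass drains into the absorbing state
```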
Random teleports

p The Google solution for spider traps (not for dead ends)
p At each time step, the random surfer has two options:
  n With probability β, follow a link at random
  n With probability 1-β, jump to some page uniformly at random
  n Common values for β are in the range 0.8 to 0.9
p The surfer will teleport out of a spider trap within a few time steps
Matrix formulation

p Suppose there are N pages
  n Consider a page i, with set of outlinks O(i)
  n We have
    p Mij = 1/|O(i)| when i links to j
    p and Mij = 0 otherwise
  n The random teleport is equivalent to
    p adding a teleport link from i to every other page with probability (1-β)/N
    p reducing the probability of following each outlink from 1/|O(i)| to β/|O(i)|
    p Equivalent: tax each page a fraction (1-β) of its score and redistribute it evenly.
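The two adjustments amount to the single matrix update A = βM + (1-β)/N; a sketch on a hypothetical 3-page graph:

```python
import numpy as np

beta = 0.85
N = 3

# Hypothetical link structure: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
M = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

# Teleport adjustment: scale every outlink by beta, add (1-beta)/N everywhere
A = beta * M + (1 - beta) / N
```

Note that A stays stochastic (every row still sums to 1) as long as M has no all-zero rows.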
Example

p Simple example with 6 pages; page 1 has four outlinks, to pages 2, 3, 4, and 5
p P(5|1) = P(4|1) = P(3|1) = P(2|1) = β/4 + (1-β)/6
p P(1|1) = P(6|1) = (1-β)/6
p P(*|1) = 4[β/4 + (1-β)/6] + 2(1-β)/6 = 1
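The check that P(*|1) = 1 is plain arithmetic; below it is verified with β = 0.85 (the value used on the next slides):

```python
beta = 0.85

# Four outlinks from page 1 (to 2, 3, 4, 5) plus two pure-teleport targets (1, 6)
p_outlink = beta / 4 + (1 - beta) / 6   # P(2|1) = P(3|1) = P(4|1) = P(5|1)
p_teleport = (1 - beta) / 6             # P(1|1) = P(6|1)

total = 4 * p_outlink + 2 * p_teleport  # sums to 1 for any beta in (0, 1)
```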
Google Page Rank

p Construct the NxN matrix A as follows
  n Aij = βMij + (1-β)/N
p Verify that A is a stochastic matrix
p The page rank vector r is the principal eigenvector of this matrix
  n satisfying r = rA
  n The score of each page ri satisfies the following:

    ri = β Σk∈I(i) [ rk / |O(k)| ] + (1-β)/N

  p I(i) is the set of nodes that have a link to page i
  p O(k) is the set of links exiting from k
p r is the stationary distribution of the random walk with teleports.
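The per-page equation can be iterated directly from out-link lists, without materializing A; a sketch on a hypothetical 3-page graph:

```python
import numpy as np

beta = 0.85
# Hypothetical out-link lists: page -> pages it links to
outlinks = {0: [1, 2], 1: [2], 2: [0]}
N = len(outlinks)

r = np.full(N, 1.0 / N)
for _ in range(100):
    r_new = np.full(N, (1 - beta) / N)       # the (1-beta)/N term of every page
    for k, dests in outlinks.items():
        for i in dests:
            r_new[i] += beta * r[k] / len(dests)   # beta * r_k / |O(k)|
    r = r_new
```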
Example (β = 0.85)

p Same 6-page graph as the previous slide

       0.03 0.24 0.24 0.24 0.24 0.03
       0.03 0.03 0.45 0.03 0.03 0.45
  A =  0.03 0.03 0.03 0.03 0.88 0.03
       0.03 0.88 0.03 0.03 0.03 0.03
       0.03 0.03 0.03 0.03 0.03 0.88
       0.03 0.03 0.03 0.88 0.03 0.03

p P(4|1) = 0.24 = 0.85/4 + 0.15/6
p P(6|1) = 0.03 = 0.15/6
p P(4|6) = 0.88 = 0.85/1 + 0.15/6

  A30 = (all six rows identical)
        0.03 0.23 0.13 0.24 0.14 0.24

p Stationary distribution = (0.03 0.23 0.13 0.24 0.14 0.24)
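The matrix A above can be rebuilt from the link structure implied by its rows (1→{2,3,4,5}, 2→{3,6}, 3→{5}, 4→{2}, 5→{6}, 6→{4}) and raised to the 30th power to check the stationary distribution; a sketch:

```python
import numpy as np

beta, N = 0.85, 6
# 0-indexed out-link lists read off the nonzero pattern of A above
outlinks = {0: [1, 2, 3, 4], 1: [2, 5], 2: [4], 3: [1], 4: [5], 5: [3]}

M = np.zeros((N, N))
for i, dests in outlinks.items():
    M[i, dests] = 1.0 / len(dests)

A = beta * M + (1 - beta) / N

A30 = np.linalg.matrix_power(A, 30)  # every row approaches the stationary distribution
pi = A30[0]
```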
Dead ends

p Pages with no outlinks are "dead ends" for the random surfer (dangling nodes)
  n Nowhere to go on next step
p When there are dead ends the matrix is no longer stochastic (the sum of the row elements is not 1)
p This is true even if we add the teleport
  n because the probability of following each teleport link is only (1-β)/N and there are just N of these teleports, hence their total probability is (1-β), not 1
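A quick numerical check of this point: with a dangling page, the corresponding row of A sums to (1-β) rather than 1. Sketch with a hypothetical 3-page graph in which page 2 is a dead end:

```python
import numpy as np

beta, N = 0.85, 3

# Page 2 (index 2) has no outlinks: its row in M is all zeros
M = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

A = beta * M + (1 - beta) / N
row_sums = A.sum(axis=1)   # the dead-end row sums only to (1 - beta)
```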
Dealing with dead-ends

p 1) Teleport
  n Follow random teleport links with probability 1.0 from dead-ends (i.e., for those pages set β = 0)
  n Adjust matrix accordingly
p 2) Prune and propagate
  n Preprocess the graph to eliminate dead-ends
  n Might require multiple passes (why?)
  n Compute page rank on reduced graph
  n Approximate values for dead ends by propagating values from reduced graph
Computing page rank

p Key step is matrix-vector multiply
  n rnew = roldA
p Easy if we have enough main memory to hold A, rold, rnew
p Say N = 1 billion pages
  n We need 4 bytes (32 bits) for each entry (say)
  n 2 billion entries for vectors rnew and rold, approx 8GB
  n Matrix A has N² entries, i.e., 10¹⁸
    p it is a large number!
Sparse matrix formulation

p Although A is a dense matrix, it is obtained from a sparse matrix M
  n ~10 links per node, approx 10N entries
p We can restate the page rank equation
  n r = βrM + [(1-β)/N]N (see the Google Page Rank slide)
  n [(1-β)/N]N is an N-vector with all entries (1-β)/N
p So in each iteration, we need to:
  n Compute rnew = βroldM
  n Add a constant value (1-β)/N to each entry in rnew
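The two per-iteration steps can be sketched as follows (a dense NumPy array stands in for the sparse M; the tiny graph is illustrative):

```python
import numpy as np

beta, N = 0.85, 3
# Illustrative link matrix M (stored sparsely in practice): 0->{1,2}, 1->{2}, 2->{0}
M = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

r = np.full(N, 1.0 / N)
for _ in range(100):
    r_new = beta * (r @ M)       # step 1: multiply by the (sparse) link matrix
    r_new += (1 - beta) / N      # step 2: add the constant teleport term everywhere
    r = r_new
```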
Sparse matrix encoding

p Encode the sparse matrix using only its nonzero entries
  n Space roughly proportional to the number of links
  n say 10N, or 4*10*1 billion = 40GB
  n still won't fit in memory, but will fit on disk

  source node | degree | destination nodes
       0      |   3    | 1, 5, 7
       1      |   5    | 17, 64, 113, 117, 245
       2      |   2    | 13, 23
Basic Algorithm

p Assume we have enough RAM to fit rnew, plus some working memory
  n Store rold and matrix M on disk

Basic Algorithm:
p Initialize: rold = [1/N]N
p Iterate:
  n Update: perform a sequential scan of M and rold and update rnew
  n Write out rnew to disk as rold for the next iteration
  n Every few iterations, compute |rnew - rold| and stop if it is below a threshold
    p This needs to read both vectors into memory
Update step

  src | degree | destinations
   0  |   3    | 1, 5, 6
   1  |   4    | 17, 64, 113, 117
   2  |   2    | 13, 23

(Figure: the vectors rnew and rold, each with entries indexed 0 … 6.)

Initialize all entries of rnew to (1-β)/N
For each page p (out-degree n):
  Read into memory: p, n, dest1, …, destn, rold(p)
  for j = 1 … n: rnew(destj) += β*rold(p)/n

The old value in 0 contributes to updating only the new values in 1, 5, and 6.
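The scatter loop above (with the inner bound being the out-degree n) can be sketched as follows; page 0's link list (1, 5, 6) is taken from the slide, the rows for pages 1 and 2 are hypothetical stand-ins kept within the 0..6 index range:

```python
import numpy as np

beta, N = 0.85, 7          # 7 pages, indexed 0..6 as in the slide's vectors

# Link lists: page 0's row is from the slide; rows 1 and 2 are hypothetical
links = {0: [1, 5, 6], 1: [2, 4], 2: [0, 3, 6]}

r_old = np.full(N, 1.0 / N)
r_new = np.full(N, (1 - beta) / N)        # initialize all entries to (1-beta)/N

for p, dests in links.items():
    n = len(dests)                         # out-degree of page p
    for d in dests:
        r_new[d] += beta * r_old[p] / n    # scatter p's old score to its destinations
```

As the slide notes, rold(0) contributes only to rnew(1), rnew(5), and rnew(6).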