Posted on 24-Dec-2015
The PageRank Citation Ranking: Bringing Order to the Web
Larry Page et al., Stanford University, Technical Report, 1998
Presented by: Ratiya Komalarachun

Contents
Motivation
Related work
Background knowledge
PageRank & random surfer model
Implementation
Application
Conclusion

Motivation
The web is heterogeneous and unstructured
There is no quality control on the web
Commercial interests try to manipulate rankings

Related Work
Academic citation analysis
Link-based analysis
Clustering methods based on link structure
Hubs & Authorities model, based on an eigenvector calculation

Hubs & Authorities Model
[Figure: hubs pointing to authorities]

Hubs & Authorities Model
Mutually reinforcing relationship
“A good hub is a page that points to many good authorities”
“A good authority is a page that is pointed to by many good hubs”
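
The mutually reinforcing relationship can be sketched as an iteration in which authority scores are rebuilt from hub scores and hub scores from authority scores. This is only a minimal illustration: the function name `hits`, the fixed iteration count, the Euclidean normalization and the toy graph are all assumptions, not details taken from the slides.

```python
import math

def hits(links, iterations=50):
    """Hub/authority iteration on a dict {page: [pages it points to]}."""
    pages = sorted(set(links) | {t for outs in links.values() for t in outs})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for src, outs in links.items():
            for dst in outs:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical graph: three hub-like pages pointing at two candidate authorities.
links = {"h1": ["x", "y"], "h2": ["x", "y"], "h3": ["x"]}
hub, auth = hits(links)
```

In this toy graph, `x` is pointed to by all three hubs, so it ends up with the larger authority score, and `h1` and `h2` score higher as hubs than `h3`.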

Link Structure of the Web
Forward links (outedges)
Backlinks (inedges)
Approximation of importance / quality
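
To make the two link directions concrete, the sketch below derives backlinks from a table of forward links. The function name `backlinks` and the three-page graph are hypothetical.

```python
from collections import defaultdict

def backlinks(forward):
    """Invert {page: [pages it links to]} into {page: [pages linking to it]}."""
    back = defaultdict(list)
    for src, outs in forward.items():
        for dst in outs:
            back[dst].append(src)
    return dict(back)

# Hypothetical graph: A links to B and C, B links to C, C links to A.
forward = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
back = backlinks(forward)
```

Here `C` has two backlinks (from `A` and `B`), so a simple backlink count would treat it as the most important page of the three.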

PageRank
A page has high rank if the sum of the ranks of its backlinks is high
Backlinks coming from important pages convey more importance to a page
Problems: dangling links, rank sinks

Dangling Links
Links that point to a page with no outgoing links

PageRank Calculation
$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$$
Given: $R(u)$ = rank of $u$, $R(v)$ = rank of $v$,
$c < 1$ (used for normalization), $N_v$ = number of links from $v$,
$B_u$ = the set of pages that point to $u$
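
A single evaluation of this rule for one page might look like the sketch below. The function name, the dictionaries and all numeric values (including c = 0.9) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def simplified_rank(u, backlinks, out_degree, rank, c):
    """R(u) = c * sum over v in B_u of R(v) / N_v."""
    return c * sum(rank[v] / out_degree[v] for v in backlinks[u])

# Hypothetical data: u is pointed to by v1 (2 forward links) and v2 (1 forward link).
backlinks = {"u": ["v1", "v2"]}
out_degree = {"v1": 2, "v2": 1}
rank = {"v1": 0.4, "v2": 0.2}

r_u = simplified_rank("u", backlinks, out_degree, rank, c=0.9)
```

Each backlink contributes its source's rank divided evenly among that source's forward links, so `v1` and `v2` both pass 0.2 to `u` before the factor c is applied.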

PageRank Calculation
[Figure: simplified PageRank calculation; node and edge labels 100, 50, 50, 9, 3, 3, 3, 53, 50]
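
The surviving numbers are consistent with the propagation rule if they are read as follows (an interpretation, since the figure itself was lost in extraction): a page of rank 100 with two forward links and a page of rank 9 with three forward links both point to the same page, and c is taken as 1.

```latex
R(u) = \frac{100}{2} + \frac{9}{3} = 50 + 3 = 53
```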

Rank Sink
A cycle of pages that is pointed to by some incoming link
Problem: the cycle accumulates rank but never passes any rank to pages outside it
[Figure: loop of pages acting as a rank sink; labels .6, .6, .6, .6]
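
The effect can be reproduced on a small hypothetical graph: pages `x` and `y` link to each other, `x` also links into a loop `a` -> `b` -> `a`, and nothing links back out of the loop. The sketch applies the simplified propagation rule with c = 1 and a uniform starting vector, both of which are assumptions made for illustration.

```python
def propagate(links, rank):
    """One step of simplified rank propagation (c = 1)."""
    new = {p: 0.0 for p in rank}
    for v, outs in links.items():
        for u in outs:
            new[u] += rank[v] / len(outs)
    return new

# Hypothetical graph: x <-> y outside the loop, x also feeds the loop a <-> b.
links = {"x": ["y", "a"], "y": ["x"], "a": ["b"], "b": ["a"]}
rank = {p: 0.25 for p in links}

loop_mass = [rank["a"] + rank["b"]]
for _ in range(60):
    rank = propagate(links, rank)
    loop_mass.append(rank["a"] + rank["b"])
```

The share of rank held by the loop never decreases from one step to the next, while the rank of `x` and `y` drains away.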

Escape Term
Solution: rank source
E(u) is some vector over the web pages (uniform, a favorite page, etc.)
$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + c\,E(u)$$

Matrix Notation
$$R = c\,(A^T + E e^T)\,R$$
R is the dominant eigenvector and c is the dominant eigenvalue of $(A^T + E e^T)$ because c is maximized ($e$ is the all-ones vector)
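
The step from the escape-term equation to the matrix form can be spelled out as below. This sketch assumes that $A$ has entries $A_{v,u} = 1/N_v$ when $v$ links to $u$ and 0 otherwise, that $e$ is the all-ones column vector, and that $R$ is nonnegative and normalized so that $\|R\|_1 = 1$. These conventions follow the original report rather than anything stated on the slide.

```latex
\begin{aligned}
R &= c\,(A^T R + E) \\
  &= c\,(A^T R + E\,e^T R) && \text{since } e^T R = \|R\|_1 = 1 \\
  &= c\,(A^T + E e^T)\,R
\end{aligned}
```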

Computing PageRank
- $R_0 \leftarrow S$ (initialize vector over web pages)
Loop:
- $R_{i+1} \leftarrow A^T R_i$ (new ranks: sum of normalized backlink ranks)
- $d \leftarrow \|R_i\|_1 - \|R_{i+1}\|_1$ (compute normalizing factor)
- $R_{i+1} \leftarrow R_{i+1} + dE$ (add escape term)
- $\delta \leftarrow \|R_{i+1} - R_i\|_1$ (control parameter)
While $\delta > \epsilon$ (stop when converged)
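
The loop can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `pagerank`, the uniform choices for S and E, the tolerance and the four-page graph are all assumptions.

```python
def pagerank(links, E, epsilon=1e-10, max_iter=1000):
    """Iterate the loop above on a dict {page: [pages it links to]}."""
    pages = sorted(links)
    R = {p: 1.0 / len(pages) for p in pages}  # R_0 <- S (uniform S is an assumption)
    for _ in range(max_iter):
        # R_{i+1} <- A^T R_i: each page splits its rank over its forward links.
        new = {p: 0.0 for p in pages}
        for v, outs in links.items():
            for u in outs:
                new[u] += R[v] / len(outs)
        # d <- ||R_i||_1 - ||R_{i+1}||_1 (entries are nonnegative, so plain sums).
        d = sum(R.values()) - sum(new.values())
        # R_{i+1} <- R_{i+1} + d E: redistribute the lost rank through E.
        for p in pages:
            new[p] += d * E[p]
        # delta <- ||R_{i+1} - R_i||_1
        delta = sum(abs(new[p] - R[p]) for p in pages)
        R = new
        if delta <= epsilon:
            break
    return R

# Hypothetical graph; "d" has no forward links, so rank leaks there each step.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": []}
E = {p: 1.0 / len(links) for p in links}  # uniform E (an assumption)
ranks = pagerank(links, E)
```

In this sketch `d` is nonzero only because page `d` has no forward links. On a graph where every page has at least one forward link, the propagation step preserves the total rank and the escape term contributes nothing.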

Random Surfer Model
PageRank vs. the random surfer model
E(u) = “the random surfer gets bored periodically and jumps to a different page”, and so is not kept in a loop forever
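
The random surfer can also be simulated directly. In the sketch below, the jump probability of 0.15, the uniform choice of jump target, the fixed seed and the three-page graph are illustrative assumptions, not values from the slides.

```python
import random

def random_surfer(links, steps, jump_prob=0.15, seed=0):
    """Estimate visit frequencies for a surfer who follows links and sometimes jumps."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        outs = links[current]
        if not outs or rng.random() < jump_prob:
            current = rng.choice(pages)  # bored (or stuck): jump to a random page
        else:
            current = rng.choice(outs)   # otherwise follow a random forward link
    return {p: n / steps for p, n in visits.items()}

# Hypothetical graph: a and c form a loop; b is only reachable by jumping.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
freq = random_surfer(links, steps=100_000)
```

Without the jumps the surfer would circle between `a` and `c` forever. With them, `b` is still visited occasionally, though far less often than the pages in the loop.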

Implementation
Computing resources
- 24 million pages
- 75 million URLs
- Process 550 pages/sec
Memory and disk storage
- Weight vector (4-byte float)
- Matrix A (linear access)
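
As a rough check on the memory requirement, assuming one 4-byte weight is stored per URL:

```latex
75 \times 10^{6}\ \text{URLs} \times 4\ \text{bytes} = 3 \times 10^{8}\ \text{bytes} \approx 300\ \text{MB}
```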

Implementation
Assign a unique integer ID to each URL
Sort the link structure and remove dangling links
Make an initial assignment of the ranks
Iterate until convergence
Add back the dangling links and recompute
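
The first two steps can be sketched as follows. The helper names `assign_ids` and `remove_dangling`, the repeated-pass removal strategy and the example URLs are assumptions made for illustration; adding the dangling links back and recomputing is not shown.

```python
def assign_ids(edges):
    """Map each URL to a unique integer ID in order of first appearance."""
    ids = {}
    for src, dst in edges:
        for url in (src, dst):
            ids.setdefault(url, len(ids))
    return ids

def remove_dangling(edges):
    """Repeatedly drop links whose target page has no forward links of its own."""
    edges = set(edges)
    while True:
        sources = {s for s, _ in edges}
        kept = {(s, d) for s, d in edges if d in sources}
        if kept == edges:
            return sorted(kept)  # sorted by source ID, then target ID
        edges = kept  # removing one dangling link may expose another

# Hypothetical link list; u3.example has no forward links.
edges = [
    ("u0.example", "u1.example"),
    ("u1.example", "u0.example"),
    ("u1.example", "u2.example"),
    ("u2.example", "u3.example"),
]
ids = assign_ids(edges)
int_edges = [(ids[s], ids[d]) for s, d in edges]
core = remove_dangling(int_edges)
```

Dropping the link into `u3.example` leaves `u2.example` without forward links, so the link into it is dropped on the next pass, and only the two links between pages 0 and 1 remain.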

Convergence Properties
Based on the theory of random walks on graphs
Converges in roughly O(log |V|) iterations, because the web graph G is rapidly mixing

Searching with PageRank
Title-based search
Comparison with AltaVista

Sample Results

Some Applications
Estimating web traffic
Backlink prediction
User navigation

Conclusion
PageRank is a global ranking based on the web's graph structure
PageRank uses backlink information to bring order to the web
PageRank can separate out representative pages as cluster centers
PageRank supports a great variety of applications