Link Building
Martin Olsen
Department of Computer Science
Aarhus University
Date posted: 20-Dec-2015
Outline
• Motivation and Introduction
• Contribution
  - Link Building
  - Communities in Networks
  - Hedonic Games
  - Simple Games
What is Search Engine Optimization (SEO)?
• "... in 2012, companies will spend almost $9 billion on search engine optimization …" (The New York Times, January 2009)
• Objective of SEO: a link to your page appears on page 1 of the search results
PageRank. Random Surfer Perspective
[Figure: directed graph with nodes 1 to 10 and 1000 random surfers, 100 at each node.]
The random surfer zaps with probability 0.15.
PageRank. Random Surfer Perspective
[Figure: distribution of the 1000 random surfers after one tick. Node counts shown: 355 = 4 · 85 + 15, 270, 143 = 85 + 85/2 + 15, 100, 58, and 15 at each of five nodes.]
PageRank. Random Surfer Perspective
[Figure: stationary distribution of the 1000 random surfers after 50 ticks. Node counts shown: 281, 280, 254, 66, 43, and 15 at each of five nodes.]
PageRank. Random Surfer Perspective
$\pi_i = P(X = i) = \frac{E(\#\text{visits to node } i)}{\#\text{total visits}}$
[Figure: the graph with the stationary distribution as probabilities: 0.281, 0.280, 0.254, 0.066, 0.043, and 0.015 at each of five nodes.]
PageRank. Random Surfer Perspective
[Figure: the graph with PageRank values 0.281, 0.280, 0.254, 0.066, 0.043, and 0.015 at each of five nodes.]
• PageRank ranking: 1, 2, 4, 3, 6
• PageRank is an important ingredient of the ranking mechanism
• Relevance counts as well!
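The tick arithmetic on the slides (for example 143 = 85 + 85/2 + 15) can be sketched in a few lines of Python. This is a minimal sketch on a hypothetical 4-node graph, not the 10-node graph from the figures, whose links did not survive in the transcript. The names `tick` and `out_links` are illustrative, and sending the surfers of a node without out-links to all nodes uniformly is an assumption.

```python
def tick(counts, out_links, zap=0.15):
    """Return the expected surfer counts after one step of the random surfer."""
    nodes = list(counts)
    total = sum(counts.values())
    # Zapping surfers are spread uniformly over all nodes.
    new_counts = {v: zap * total / len(nodes) for v in nodes}
    for u in nodes:
        # Assumption: a node without out-links sends its surfers everywhere.
        targets = out_links[u] or nodes
        share = (1 - zap) * counts[u] / len(targets)
        for v in targets:
            new_counts[v] += share
    return new_counts


# Hypothetical graph: node -> list of nodes it links to.
out_links = {1: [2], 2: [1], 3: [1, 2], 4: [1]}
start = {v: 100.0 for v in out_links}  # 400 surfers, 100 per node

# Node 2 receives 85 from node 1, 85/2 from node 3 and 15 zapping surfers.
after_one = tick(start, out_links)

# Repeat the tick 50 times to approximate the stationary distribution.
stationary = dict(start)
for _ in range(50):
    stationary = tick(stationary, out_links)

total = sum(stationary.values())
pagerank = {v: c / total for v, c in stationary.items()}
ranking = sorted(pagerank, key=pagerank.get, reverse=True)
```

Dividing the stationary counts by the number of surfers gives the visit probabilities, which is the normalisation used when the slides move from counts such as 281 to values such as 0.281.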
Contribution/Link Building
• The Computational Complexity of Link Building (Cocoon ´08). Olsen
• Maximizing PageRank with new Backlinks (submitted). Olsen
• MILP for Link Building (In preparation). Olsen, Viglas
The Link Building Problem. Formal Definition
LINK BUILDING
Instance: G(V, E), t ∈ V, k ∈ Z+
Solution: S ⊆ V \ {t} with |S| = k maximizing π_t after adding S × {t} to E
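The definition can be turned directly into an exhaustive search. The following is a minimal sketch under stated assumptions, not the method of the cited papers: `pagerank` is a plain power iteration with zapping probability 0.15, nodes without out-links are assumed to spread their weight uniformly, and the 5-node graph is hypothetical.

```python
from itertools import combinations


def pagerank(nodes, edges, alpha=0.85, iters=200):
    """Approximate PageRank by power iteration (uniform zapping)."""
    out = {u: [v for (a, v) in edges if a == u] for u in nodes}
    n = len(nodes)
    pi = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - alpha) / n for u in nodes}
        for u in nodes:
            # Assumption: a node without out-links spreads its weight uniformly.
            targets = out[u] or nodes
            for v in targets:
                new[v] += alpha * pi[u] / len(targets)
        pi = new
    return pi


def link_building(nodes, edges, t, k):
    """Try every S in V minus {t} with |S| = k and keep the best one for t."""
    best_set, best_value = None, -1.0
    candidates = [u for u in nodes if u != t]
    for S in combinations(candidates, k):
        new_edges = set(edges) | {(s, t) for s in S}
        value = pagerank(nodes, new_edges)[t]
        if value > best_value:
            best_set, best_value = set(S), value
    return best_set, best_value


# Hypothetical instance: G(V, E), target t = 1, k = 2.
nodes = [1, 2, 3, 4, 5]
edges = {(1, 2), (2, 3), (2, 5), (3, 1), (4, 3), (5, 4)}
baseline = pagerank(nodes, edges)[1]
best_set, best_value = link_building(nodes, edges, t=1, k=2)
```

The search performs one PageRank computation per candidate set, so its cost grows with the number of k-subsets of V \ {t}; it is only meant to make the problem statement concrete on small graphs.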
Link Building is not Trivial
[Figure: three panels showing an 8-node graph (nodes 1 to 8) with PageRank values. Largest value shown per panel: 0.272, 0.367, 0.375.]
PageRank Topology Theorem *)
z_up: the expected number of visits to p for a random surfer starting at u prior to the first zapping event.
[Formula: the increase in PageRank for a new link between nodes i and j, expressed in terms of the z values of i and j, out_i, and the factor 0.85.]
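The quantities z_up can be computed numerically. This is a minimal sketch, assuming a row-stochastic transition matrix P, zapping probability 0.15, and that the start at u counts as a visit, so that Z is the sum of (0.85 · P)^t over t ≥ 0. The 3-node matrix and the function name `visits_before_zap` are hypothetical.

```python
def visits_before_zap(P, alpha=0.85, iters=300):
    """Return Z with Z[u][p] = expected visits to p from u before the first zap."""
    n = len(P)
    identity = [[1.0 if u == p else 0.0 for p in range(n)] for u in range(n)]
    Z = [row[:] for row in identity]
    for _ in range(iters):
        # Fixed-point iteration Z <- I + alpha * P * Z, i.e. a truncated series.
        Z = [[identity[u][p] + alpha * sum(P[u][w] * Z[w][p] for w in range(n))
              for p in range(n)]
             for u in range(n)]
    return Z


# Hypothetical graph: 0 -> 1, 1 -> 0 and 2, 2 -> 0.
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
Z = visits_before_zap(P)
n = len(P)

# With uniform zapping, PageRank is the scaled column sum of Z.
pagerank = [(1 - 0.85) / n * sum(Z[u][p] for u in range(n)) for p in range(n)]
```

Each row of Z sums to 1/0.15, the expected number of steps before the first zap, and the column sums scaled by 0.15/n give the PageRank values.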
k-REGULAR INDEPENDENT SET ≤FPT LINK BUILDING
• Does the graph contain an independent set of size k? Can we turn this question into a Link Building problem?
[Figure: the graph built by the reduction, with nodes i, j, x and y; the optimal solution is marked "OPT!".]
• Basic idea: make z_ij relatively big
• LINK BUILDING is W[1]-hard *): LINK BUILDING solvable in time f(k)·n^c ⇒ k-REGULAR INDEPENDENT SET solvable in time f(k)·n^c ⇒ W[1] = FPT
• Another result: FPTAS for LINK BUILDING ⇒ NP = P
Upper Bound: k = 1 fixed
The dashed link can be found in time corresponding to O(1) PageRank computations with a randomized scheme *).
[Figure: the 8-node graph before and after adding the dashed link. Largest PageRank value shown: 0.272 before, 0.338 after.]
Upper Bound: Mixed Integer Linear Programming Approach *)
• Price for link from i: $\frac{\pi_i}{out_i + 1}$
• Compute the cheapest set of new incoming links that would make node 5 rank highest
[Figure: the 8-node graph after the modification. Largest PageRank value shown: 0.200.]
Contribution/Communities in Networks
• Communities in Large Networks: Identification and Ranking (WAW ´06). Olsen
What is a Community?
• Informally: A community C is a set of nodes with relatively many links between them
• Assumption/Observation: A CS site has relatively many CS links!
• Formal definition based on assumption *):
  $w_{iC} = \frac{\#\{(i, j) : j \in C\}}{out_i}$
  ∀ v ∉ C, u ∈ C: w_vC ≤ w_uC
A Greedy Approach for Detecting Members of a Community *)
Repeat until C is a Community:
• Find v ∉ C with maximum attention to C
• C ← C ∪ {v}
• Update attentions
Use two priority queues holding elements in C and V \ C
[Figure: 1) Old C, 2) New C]
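The greedy loop can be sketched as follows. This is a minimal sketch, assuming the attention of a node to C is the fraction of its out-links that point into C and that a node without out-links has attention 0. For brevity it recomputes attentions in every round instead of maintaining the two priority queues mentioned on the slide, and the 6-node graph is hypothetical.

```python
def attention(v, C, out_links):
    """Fraction of v's out-links that point into C (0 if v has no out-links)."""
    links = out_links[v]
    return sum(1 for j in links if j in C) / len(links) if links else 0.0


def is_community(C, out_links):
    """Check that no outside node pays more attention to C than a member does."""
    outside = [v for v in out_links if v not in C]
    inside = [u for u in out_links if u in C]
    if not outside or not inside:
        return True
    return (max(attention(v, C, out_links) for v in outside)
            <= min(attention(u, C, out_links) for u in inside))


def greedy_community(seed, out_links):
    """Grow C from the seed by repeatedly adding the most attentive outsider."""
    C = set(seed)
    while not is_community(C, out_links):
        v = max((u for u in out_links if u not in C),
                key=lambda u: attention(u, C, out_links))
        C.add(v)
    return C


# Hypothetical graph: two tightly linked groups joined by the link c -> x.
out_links = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
community = greedy_community({"a"}, out_links)
```

Starting from the single representative `a`, the loop adds `b` and then `c` and stops, because no remaining node links into the set. The loop always terminates, since the full node set trivially satisfies the definition.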
An Experiment. A Danish CS Community
• Crawl of the dk-domain with 180,468 sites in total
• Representatives = 4 CS sites
• CS-Community with 556 sites
• Minimum attention, u ∈ C: 15.8%
• Maximum attention, v ∉ C: 15.4%
Ranking:
1) www.daimi.au.dk (CS U Aarhus)
2) www.diku.dk (CS U Copenhagen)
3) www.itu.dk (ITU Copenhagen)
4) www.cs.auc.dk (CS U Aalborg)
5) www.brics.dk (CS PhD School)
6) www.imm.dtu.dk (Informatics/Mathematical modeling DTU Copenhagen)
…
17) www.imada.sdu.dk (CS/Mathematics U Southern Denmark)
Other Results
• Computing nontrivial communities according to the definition given is NP-hard.
• A simple model for the evolution of communities is presented.
• These communities probably obey the definition for large n if the out-degree of the nodes is (log n).
Contribution/Hedonic Games
• Nash Stability in Additively Separable Hedonic Games Is NP-Hard (CiE ´07). Olsen
• Extended version: Nash Stability in Additively Separable Hedonic Games and Community Structures (Theory of Computing Systems ´09). Olsen
An Additively Separable Hedonic Game
Five waterholes w1, …, w5 with capacities 1, 2, 3, 4 and 8 l/h respectively.
Two buffaloes b1 and b2 that hate each other. They are only thirsty if they have a parasite on their back in which case they have to drink 9 l/h.
Two gigantic parasites p1 and p2. They only want to sit on b1 and b2 respectively.
An Additively Separable Hedonic Game
[Figure: the game drawn as a graph on w1, …, w5, b1, b2, p1 and p2 with edge payoffs 1, 2, 3, 4, 8, −9 and −19; one Nash equilibrium for the game is marked.]
PARTITION ≤ NE in ASHG ∈ NPC *)
Community Structures in Networks
• Put a 1 on each connection between two dolphins. The community structure is an NE!
• NE community structure? NEs are NP-hard to compute even with symmetric and positive payoffs *)
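Checking whether a given partition is a Nash equilibrium is easy even though finding one is hard. This is a minimal sketch, assuming the usual notion of Nash stability for additively separable hedonic games: a player's utility is the sum of its payoffs to the other members of its coalition, and no player may gain by moving to another coalition or by going alone. The small graph with payoff 1 on each connection is hypothetical and is not the dolphin network from the slide.

```python
def utility(i, coalition, payoff):
    """Sum of the symmetric payoffs between i and the other coalition members."""
    return sum(payoff.get(frozenset((i, j)), 0) for j in coalition if j != i)


def is_nash_stable(partition, payoff):
    """Return True if no player gains by switching coalition or going alone."""
    for coalition in partition:
        for i in coalition:
            current = utility(i, coalition, payoff)
            # Going alone always yields utility 0.
            if current < 0:
                return False
            for other in partition:
                if other is not coalition and utility(i, other, payoff) > current:
                    return False
    return True


# Hypothetical network: two triangles joined by the connection c-x.
connections = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"),
               ("x", "y"), ("y", "z"), ("x", "z")]
payoff = {frozenset(pair): 1 for pair in connections}

stable = is_nash_stable([{"a", "b", "c"}, {"x", "y", "z"}], payoff)
unstable = is_nash_stable([{"a", "b"}, {"c", "x", "y", "z"}], payoff)
```

In the second partition player `c` has utility 1 but would get 2 by joining `a` and `b`, so the partition is not stable. With only non-negative payoffs the partition that puts everyone in one coalition is always stable.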
Contribution/Simple Games
• On the Complexity of Problems on Simple Games (submitted). Freixas, Molinero, Olsen, Serna
Open Problems/Future Work
• In the thesis we show LINK BUILDING ∈ APX. Is there a PTAS for LINK BUILDING?
• Surgical Link Building:
  - Isolate the Community C
  - Model all pages in V \ C as one page
  - Use MILP
• Use information on distribution of PageRank
• Does the stuff presented really work?
• Thank You!
Link Building. A Real World Example
Dear X
We are trying to get more links to our website to help improve its rating on the search engines.
We were wondering if you could put a link to our site … on your webpage or blog.
If you have a website or a Blog and put a link to our page on it then to say thank you for each month it is up, I will give you …
Source: An e-mail to a colleague X
Link Building is not Trivial. 2nd Example
Assumption: Obtaining a link from one green node is slightly better for node 1 compared to obtaining a link from one blue node.
Now node 1 can pick three incoming links for free. What should node 1 choose?
[Figure: node 1 together with green nodes and blue nodes.]
Fixed Parameter Tractability: FPT and W[1]
• FPT: solvable in time f(k)·n^c. Example: k-VERTEX COVER
• W[1]: k-INDEPENDENT SET, solvable in time O(n^{0.792k}), and k-REGULAR INDEPENDENT SET are complete for W[1]
• LINK BUILDING is W[1]-hard *)
Upper Bound: Mixed Integer Linear Programming Approach *)
• The dashed links show the cheapest modification that will bring node 5 to the top of the ranking, computed using a MILP approach.
• Alternatively we could go for the maximum improvement in the ranking for a given budget.
• Price for (i, j): $\frac{\pi_i}{out_i + 1}$
[Figure: the 8-node graph before and after the modification. Largest PageRank value shown: 0.272 before, 0.200 after.]