Link Building. Martin Olsen, Department of Computer Science, Aarhus University


Link Building

Martin Olsen

Department of Computer Science

Aarhus University


Outline

• Motivation and Introduction
• Contribution
  • Link Building
  • Communities in Networks
  • Hedonic Games
  • Simple Games

• ... in 2012, companies will spend almost $9 billion on search engine optimization …

The New York Times, January 2009

Objective of SEO: A link to your page appears here on page 1

What is Search Engine Optimization (SEO)?

www as a Graph


PageRank. Random Surfer Perspective

Random Surfer: Zaps with probability 0.15

[Figure: example graph with nodes 1 to 10 and 1000 random surfers, 100 on each node.]

PageRank. Random Surfer Perspective

Random Surfer: Zaps with probability 0.15

[Figure: the same graph with 1000 random surfers. Distribution after one tick: 143 = 85 + 85/2 + 15, 270, 100, 58, 355 = 4 · 85 + 15, and 15 on each of the remaining five nodes.]

PageRank. Random Surfer Perspective

Random Surfer: Zaps with probability 0.15

[Figure: the same graph with 1000 random surfers. Stationary distribution after 50 ticks: 281, 280, 254, 43, 66, and 15 on each of the remaining five nodes.]
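The tick-by-tick view of the surfers can be sketched in a few lines of Python. The graph here is hypothetical, since the edges of the slide's 10-node example do not survive in the transcript. Every node is assumed to have at least one out-link, and the names `tick` and `OUT_LINKS` are illustrative only.

```python
def tick(surfers, out_links, zap=0.15):
    """Return the expected number of surfers on each node after one tick."""
    n = len(surfers)
    total = sum(surfers.values())
    # Zapping surfers jump to a node chosen uniformly at random.
    after = {v: zap * total / n for v in surfers}
    # The remaining surfers follow an out-link chosen uniformly at random.
    for u, count in surfers.items():
        share = (1 - zap) * count / len(out_links[u])
        for v in out_links[u]:
            after[v] += share
    return after


# Hypothetical 5-node graph (not the graph on the slide).
OUT_LINKS = {1: [2], 2: [1], 3: [1, 2], 4: [1], 5: [1, 4]}

start = {v: 200.0 for v in OUT_LINKS}      # 1000 surfers, 200 per node
after_one_tick = tick(start, OUT_LINKS)    # node 1: 30 + 170 + 85 + 170 + 85

surfers = start
for _ in range(50):                        # 50 ticks, as on the slide
    surfers = tick(surfers, OUT_LINKS)
pagerank = {v: count / 1000 for v, count in surfers.items()}
```

In this sketch a node without in-links keeps only the zap share (30 surfers here), which mirrors the nodes holding 15 surfers in the slide's example.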

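PageRank. Random Surfer Perspective

Random Surfer: Zaps with probability 0.15

$$\pi_i = P(X = i) = \frac{E(\#\text{visits to node } i)}{\#\text{total visits}}$$

[Figure: the same graph labelled with PageRank values: 0.281, 0.280, 0.254, 0.043, 0.066, and 0.015 on each of the remaining five nodes.]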
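The PageRank of node i can be read as the fraction of all visits that a single random surfer pays to i. A minimal Monte Carlo sketch of that reading follows. It uses a hypothetical 5-node graph because the slide's graph is not recoverable, and `simulate_visits`, the seed and the step count are assumptions made for illustration.

```python
import random


def simulate_visits(out_links, steps, zap=0.15, seed=0):
    """Estimate PageRank as (#visits to node i) / (#total visits)."""
    rng = random.Random(seed)
    nodes = list(out_links)
    visits = {v: 0 for v in nodes}
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < zap:
            current = rng.choice(nodes)               # zap: uniform restart
        else:
            current = rng.choice(out_links[current])  # follow a random out-link
    return {v: visits[v] / steps for v in nodes}


# Hypothetical graph; every node has at least one out-link.
OUT_LINKS = {1: [2], 2: [1], 3: [1, 2], 4: [1], 5: [1, 4]}
estimate = simulate_visits(OUT_LINKS, steps=200_000)
```

With more steps the estimates settle toward the stationary distribution. Nodes without in-links (3 and 5 here) are reached only after a zap.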

PageRank. Random Surfer Perspective

Random Surfer: Zaps with probability 0.15

[Figure: the same graph labelled with the PageRank values 0.281, 0.280, 0.254, 0.043, 0.066, and 0.015 on each of the remaining five nodes.]

PageRank Ranking: 1, 2, 4, 3, 6
PageRank is an important ingredient of the ranking mechanism
Relevance counts as well!

Link Building is an Important Aspect of SEO


Contribution/Link Building

The Computational Complexity of Link Building (Cocoon '08), Olsen

Maximizing PageRank with new Backlinks (submitted), Olsen

MILP for Link Building (in preparation), Olsen, Viglas

The Link Building Problem. Formal Definition

LINK BUILDING

Instance: $G(V, E)$, $t \in V$, $k \in \mathbb{Z}^+$

Solution: $S \subseteq V \setminus \{t\}$ with $|S| = k$ maximizing $\pi_t$ after adding $S \times \{t\}$ to $E$
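For very small instances the problem can be solved by exhaustive search: try every set of k source nodes, recompute PageRank, and keep the best. The sketch below does exactly that. It is a brute-force illustration, not the method of the talk. The instance is hypothetical, and the handling of dangling nodes (a uniform jump) is an assumption.

```python
from itertools import combinations


def pagerank(nodes, edges, alpha=0.85, iterations=200):
    """Power iteration; dangling nodes are assumed to jump uniformly."""
    n = len(nodes)
    out = {u: [v for (a, v) in edges if a == u] for u in nodes}
    pi = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1 - alpha) / n for v in nodes}
        for u in nodes:
            targets = out[u] or nodes
            for v in targets:
                nxt[v] += alpha * pi[u] / len(targets)
        pi = nxt
    return pi


def link_building(nodes, edges, t, k):
    """Try every S with |S| = k and t not in S; return the best S and pi_t."""
    best_set, best_value = None, -1.0
    candidates = [v for v in nodes if v != t]
    for subset in combinations(candidates, k):
        new_edges = set(edges) | {(s, t) for s in subset}
        value = pagerank(nodes, new_edges)[t]
        if value > best_value:
            best_set, best_value = set(subset), value
    return best_set, best_value


# Hypothetical instance: 6 nodes, target t = 6, k = 2 new backlinks.
NODES = [1, 2, 3, 4, 5, 6]
EDGES = {(1, 2), (2, 1), (3, 1), (3, 2), (4, 3), (5, 4), (6, 1)}
best_set, best_value = link_building(NODES, EDGES, t=6, k=2)
baseline = pagerank(NODES, EDGES)[6]
```

The search performs one PageRank computation per candidate set, so its cost grows roughly like n^k and it is only practical for tiny graphs.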


Link Building is not Trivial

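[Figure: three panels, each showing an 8-node graph (nodes 1 to 8) with a PageRank value on every node. Values per panel, in extraction order:
Panel 1: 0.272, 0.096, 0.069, 0.085, 0.060, 0.091, 0.078, 0.250
Panel 2: 0.367, 0.039, 0.049, 0.070, 0.035, 0.049, 0.060, 0.331
Panel 3: 0.375, 0.054, 0.054, 0.054, 0.042, 0.042, 0.042, 0.337]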

PageRank Topology Theorem *)

$z_{up}$: The expected number of visits to p for a random surfer starting at u prior to the first zapping event

[Equation garbled in the transcript: the surviving fragments are $out_i$, the factor 0.85 and the quantities $z_{ii}$, $z_{ij}$, $z_{ji}$ and $z_{jj}$.]

[Figure: nodes i and j, with the label "increase in PageRank".]
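The expected number of visits to p for a surfer starting at u before the first zap (written z[u][p] below) can be computed as the series sum over k of (0.85 · P)^k, where P is the link-following transition matrix. The sketch below assumes that the visit at time 0 counts, that every node has an out-link, and uses a hypothetical 5-node graph. `expected_visits` is an illustrative name, not taken from the talk.

```python
def expected_visits(out_links, alpha=0.85, terms=400):
    """z[u][p]: expected visits to p, starting at u, before the first zap.

    Assumes the visit at time 0 counts and truncates the series
    sum_k (alpha * P)^k after `terms` terms.
    """
    nodes = list(out_links)
    z = {u: {p: 0.0 for p in nodes} for u in nodes}
    for u in nodes:
        # mass[p]: probability of being at p after k steps without a zap
        mass = {p: 0.0 for p in nodes}
        mass[u] = 1.0
        for _ in range(terms):
            for p in nodes:
                z[u][p] += mass[p]
            step = {p: 0.0 for p in nodes}
            for v in nodes:
                for w in out_links[v]:
                    step[w] += alpha * mass[v] / len(out_links[v])
            mass = step
    return z


# Hypothetical graph; every node has at least one out-link.
OUT_LINKS = {1: [2], 2: [1], 3: [1, 2], 4: [1], 5: [1, 4]}
z = expected_visits(OUT_LINKS)
n = len(OUT_LINKS)
# PageRank from z: after each zap the surfer restarts at a uniform random node.
pagerank = {p: 0.15 / n * sum(z[u][p] for u in OUT_LINKS) for p in OUT_LINKS}
```

Each row of z sums to 1/0.15, the expected number of steps before a zap, and the last line recovers PageRank from the column sums of z.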

Does the graph contain an independent set of size k? Can we turn this question into a Link Building problem?

k-REGULAR INDEPENDENT SET ≤_FPT LINK BUILDING

[Figure: the graph used in the reduction, with nodes i, j, x and y and the optimal solution marked OPT.]

Basic idea: Make $z_{ij}$ relatively big

LINK BUILDING is W[1]-hard *):
LINK BUILDING solvable in time $f(k) \cdot n^c$ $\Rightarrow$ k-REGULAR INDEPENDENT SET solvable in time $f(k) \cdot n^c$ $\Rightarrow$ W[1] = FPT

Another result: FPTAS for LINK BUILDING $\Rightarrow$ NP = P

Upper Bound: k = 1 fixed

The dashed link can be found in time corresponding to O(1) PageRank computations with a randomized scheme *).

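[Figure: two panels showing the 8-node graph (nodes 1 to 8) with a PageRank value on every node, one of them with the dashed link. Values per panel, in extraction order:
Panel 1: 0.272, 0.096, 0.069, 0.085, 0.060, 0.091, 0.078, 0.250
Panel 2: 0.338, 0.070, 0.060, 0.048, 0.048, 0.060, 0.070, 0.306]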

Upper Bound: Mixed Integer Linear Programming Approach *)

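Price for link from i: [formula garbled in the transcript; the surviving fragments are a symbol indexed by i, $out_i$ and the constant 1]

Compute the cheapest set of new incoming links that would make node 5 rank highest

[Figure: the 8-node graph (nodes 1 to 8) with PageRank values, in extraction order: 0.187, 0.061, 0.049, 0.189, 0.036, 0.099, 0.200, 0.178]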

A Quiz: Which of the two situations would be optimal for Martin?


Contribution/Communities in Networks

Communities in Large Networks: Identification and Ranking (WAW '06), Olsen

Communities in Networks

Dolphins in Doubtful Sound [Newman, Girvan ´04]:


What is a Community?

Informally: A community C is a set of nodes with relatively many links between them

Assumption/Observation: A CS site has relatively many CS links!

Formal definition based on assumption *):

$$w_{iC} = \frac{\#\{(i, j) : j \in C\}}{out_i}$$

$$\forall v \notin C, \forall u \in C: w_{vC} \le w_{uC}$$

A Greedy Approach for Detecting Members of a Community *)

Repeat until C is a Community:
• Find $v \notin C$ with maximum attention to C
• $C \leftarrow C \cup \{v\}$
• Update attentions

Use two priority queues holding elements in $C$ and $V \setminus C$

[Figure: 1) Old C 2) New C]
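The greedy loop can be sketched as follows. For brevity this version recomputes attentions on every round instead of maintaining the two priority queues mentioned on the slide, so it illustrates the logic but not the efficient implementation. The graph and the names `attention`, `is_community` and `grow_community` are hypothetical.

```python
def attention(v, community, out_links):
    """Fraction of v's out-links that point into the community."""
    links = out_links[v]
    if not links:
        return 0.0
    return sum(1 for j in links if j in community) / len(links)


def is_community(community, out_links):
    """Check: every outsider pays at most as much attention as every member."""
    outside = [v for v in out_links if v not in community]
    if not outside:
        return True
    weakest_member = min(attention(u, community, out_links) for u in community)
    keenest_outsider = max(attention(v, community, out_links) for v in outside)
    return keenest_outsider <= weakest_member


def grow_community(representatives, out_links):
    """Greedily add the outsider with maximum attention until C is a community."""
    community = set(representatives)
    while not is_community(community, out_links):
        outside = [v for v in out_links if v not in community]
        best = max(outside, key=lambda v: attention(v, community, out_links))
        community.add(best)
    return community


# Hypothetical graph: nodes 1-4 link mostly to each other, as do nodes 5-7.
OUT_LINKS = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [1, 2, 3, 5],
             5: [6, 7], 6: [5, 7], 7: [5, 6, 4]}
cs_community = grow_community({1, 2}, OUT_LINKS)
```

Starting from the representatives {1, 2}, the loop adds node 3 and then node 4, and stops at {1, 2, 3, 4}, where the keenest outsider (node 7, attention 1/3) is below the weakest member (node 4, attention 3/4).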


An Experiment. A Danish CS Community

• Crawl of the dk-domain with 180,468 sites in total
• Representatives = 4 CS sites
• CS-Community with 556 sites
• Minimum attention, $\min_{u \in C} w_{uC}$: 15.8%
• Maximum attention, $\max_{v \notin C} w_{vC}$: 15.4%

Ranking:

1) www.daimi.au.dk (CS U Aarhus)
2) www.diku.dk (CS U Copenhagen)
3) www.itu.dk (ITU Copenhagen)
4) www.cs.auc.dk (CS U Aalborg)
5) www.brics.dk (CS PhD School)
6) www.imm.dtu.dk (Informatics/Mathematical modeling DTU Copenhagen)
…
17) www.imada.sdu.dk (CS/Mathematics U Southern Denmark)

Other Results

Computing nontrivial communities according to the definition given is NP-hard

A simple model for the evolution of communities is presented.

These communities probably obey the definition for large n if the out-degree of the nodes is (log n).

Contribution/Hedonic Games

Nash Stability in Additively Separable Hedonic Games Is NP-Hard (CiE '07), Olsen

Extended version: Nash Stability in Additively Separable Hedonic Games and Community Structures (Theory of Computing Systems '09), Olsen

An Additively Separable Hedonic Game

Five waterholes w1, …, w5 with capacities 1, 2, 3, 4 and 8 l/h respectively.

Two buffaloes b1 and b2 that hate each other. They are only thirsty if they have a parasite on their back in which case they have to drink 9 l/h.

Two gigantic parasites p1 and p2. They only want to sit on b1 and b2 respectively.


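[Figure: the game drawn as a graph on the waterholes w1, …, w5, the buffaloes b1 and b2 and the parasites p1 and p2, with payoffs on the edges (the values 1, 2, 3, 4, 8, -9 and -19 appear), showing one Nash Equilibrium for the game.]

PARTITION ≤ NE in ASHG: NE in ASHG is NP-complete (NPC) *)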

Community Structures in Networks

Put a 1 on each connection between two dolphins. The community structure is a NE!

NE community structure?
NE's are NP-hard to compute even with symmetric and positive payoffs *)
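Checking whether a given community structure is a Nash equilibrium is straightforward, in contrast to finding one. The sketch below assumes symmetric payoffs of 1 on each connection and 0 elsewhere, and treats leaving to be alone (utility 0) as an allowed move. The network and the names `utility` and `is_nash_stable` are hypothetical.

```python
def utility(player, coalition, payoff):
    """Additively separable utility: sum of payoffs to the other members."""
    return sum(payoff.get(frozenset((player, other)), 0)
               for other in coalition if other != player)


def is_nash_stable(partition, payoff):
    """True if no player gains by moving to another coalition or going alone."""
    for home in partition:
        for player in home:
            current = utility(player, home, payoff)
            alternatives = [utility(player, other, payoff)
                            for other in partition if other is not home]
            alternatives.append(0)  # leaving to form a singleton coalition
            if max(alternatives) > current:
                return False
    return True


# Hypothetical network: two triangles joined by the edge c-d, payoff 1 per edge.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
PAYOFF = {frozenset(edge): 1 for edge in EDGES}

communities = [{"a", "b", "c"}, {"d", "e", "f"}]
lopsided = [{"a", "b"}, {"c", "d", "e", "f"}]
```

Here `communities` is Nash stable, while in `lopsided` player c would rather join a and b (utility 2 instead of 1).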

Contribution/Simple Games

On the Complexity of Problems on Simple Games (submitted), Freixas, Molinero, Olsen, Serna

Open Problems/Future Work

• In the thesis we show LINK BUILDING $\in$ APX. Is there a PTAS for LINK BUILDING?

• Surgical Link Building:
  • Isolate the Community C
  • Model all pages in $V \setminus C$ as one page
  • Use MILP

• Use information on distribution of PageRank

• Does the stuff presented really work?

• Thank You!

Link Building. A Real World Example

Dear X

We are trying to get more links to our website to help improve its rating on the search engines.

We were wondering if you could put a link to our site … on your webpage or blog.

If you have a website or a Blog and put a link to our page on it then to say thank you for each month it is up, I will give you …

Source: An e-mail to a colleague X


Link Building is not Trivial. 2nd Example

Assumption: Obtaining a link from one green node is slightly better for node 1 compared to obtaining a link from one blue node.

Now node 1 can pick three incoming links for free. What should node 1 choose?

[Figure: node 1 together with the green and blue nodes.]

No FPTAS for LINK BUILDING if NP ≠ P *)

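[Figure: the graph used in the reduction, with nodes i, j, x and y and the optimal solution marked OPT, together with a bound on OPT involving k, d and n that is garbled in the transcript.]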

Power Law


Fixed Parameter Tractability: FPT and W[1]

FPT: Solvable in time $f(k) \cdot n^c$
• k-VERTEX COVER

W[1]:
• k-INDEPENDENT SET, $O(n^{0.792k})$
• k-REGULAR INDEPENDENT SET
Complete for W[1]

LINK BUILDING is W[1]-hard *)


Upper Bound: Mixed Integer Linear Programming Approach *)

The dashed links show the cheapest modification that will bring node 5 to the top of the ranking. Computed using a MILP approach.

Alternatively we could go for the maximum improvement in the ranking for a given budget.

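Price for (i, j): [formula garbled in the transcript; the surviving fragments are a symbol indexed by i, $out_i$ and the constant 1]

[Figure: two panels showing the 8-node graph (nodes 1 to 8) with a PageRank value on every node, one of them with the dashed links. Values per panel, in extraction order:
Panel 1: 0.272, 0.096, 0.069, 0.085, 0.060, 0.091, 0.078, 0.250
Panel 2: 0.187, 0.061, 0.049, 0.189, 0.036, 0.099, 0.200, 0.178]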