Fast Random Walk with Restart and Its Applications

Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan. ICDM 2006, Dec. 18-22, Hong Kong.

Page 1: Fast Random Walk with Restart and Its Applications

Fast Random Walk with Restart and Its Applications

Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan

ICDM 2006, Dec. 18-22, Hong Kong

Page 2: Fast Random Walk with Restart and Its Applications


Motivating Questions

• Q: How to measure the relevance?

• A: Random walk with restart

• Q: How to do it efficiently?

• A: This talk tries to answer!

Page 3: Fast Random Walk with Restart and Its Applications

Random walk with restart

[Figure: example graph with 12 nodes, numbered 1 to 12]

Page 4: Fast Random Walk with Restart and Its Applications

Random walk with restart

Ranking vector $r_4$ (query: Node 4):

Node    1     2     3     4     5     6     7     8     9     10    11    12
Score   0.13  0.10  0.13  0.22  0.13  0.05  0.05  0.08  0.04  0.03  0.04  0.02

[Figure: the 12-node example graph with each node annotated by its score]

Page 5: Fast Random Walk with Restart and Its Applications

Automatic Image Caption

• [Pan KDD04]

[Figure: graph with Image, Region and Text nodes and a Test Image; text labels shown: Jet, Plane, Runway, Candy, Texture, Background]

Page 6: Fast Random Walk with Restart and Its Applications


Neighborhood Formulation

• [Sun ICDM05]

Page 7: Fast Random Walk with Restart and Its Applications

Center-Piece Subgraph

• [Tong KDD06]

[Figure: co-authorship subgraph with weighted edges. Nodes: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan, H.V. Jagadish, Laks V.S. Lakshmanan, Heikki Mannila, Christos Faloutsos, Padhraic Smyth, Corinna Cortes, Daryl Pregibon]

Page 8: Fast Random Walk with Restart and Its Applications

Other Applications

• Content-based Image Retrieval
• Personalized PageRank
• Anomaly Detection (for node; link)
• Link Prediction [Getoor], [Jensen], …
• Semi-supervised Learning
• …

• [Put Authors]

Page 9: Fast Random Walk with Restart and Its Applications

Roadmap

• Background
– RWR: Definitions
– RWR: Algorithms
• Basic Idea
• FastRWR
– Pre-Compute Stage
– On-Line Stage
• Experimental Results
• Conclusion

Page 10: Fast Random Walk with Restart and Its Applications

Computing RWR

$r_i = c W r_i + (1 - c) e_i$

– $r_i$: ranking vector ($n \times 1$)
– $W$: adjacency matrix ($n \times n$), column-normalized
– $e_i$: starting vector ($n \times 1$)

Example (12-node graph, query node 4, $c = 0.9$, $1 - c = 0.1$):

W =
  0    1/3  1/3  1/3  0    0    0    0    0    0    0    0
  1/3  0    1/3  0    0    0    0    1/4  0    0    0    0
  1/3  1/3  0    1/3  0    0    0    0    0    0    0    0
  1/3  0    1/3  0    1/4  0    0    0    0    0    0    0
  0    0    0    1/3  0    1/2  1/2  1/4  0    0    0    0
  0    0    0    0    1/4  0    1/2  0    0    0    0    0
  0    0    0    0    1/4  1/2  0    0    0    0    0    0
  0    1/3  0    0    1/4  0    0    0    1/2  0    1/3  0
  0    0    0    0    0    0    0    1/4  0    1/3  0    0
  0    0    0    0    0    0    0    0    1/2  0    1/3  1/2
  0    0    0    0    0    0    0    1/4  0    1/3  0    1/2
  0    0    0    0    0    0    0    0    0    1/3  1/3  0

$e_4 = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)^T$

$r_4 = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02)^T$

Q: Given $e_i$, how to solve?

Page 11: Fast Random Walk with Restart and Its Applications

OnTheFly: iterate $r \leftarrow c W r + (1 - c) e$ until convergence

Example ($c = 0.9$, query node 4, $W$ the column-normalized adjacency matrix of the 12-node graph). Each column is an iterate of $r$ as shown on the slide, from the starting vector $e$ (left) to the converged ranking vector (right):

Node   e     Iterates of r                        Converged
1      0     0.3   0.12  0.19  0.14  0.16         0.13
2      0     0     0.18  0.09  0.13  0.10         0.10
3      0     0.3   0.12  0.19  0.14  0.16         0.13
4      1     0.1   0.35  0.18  0.26  0.21         0.22
5      0     0.3   0.03  0.18  0.10  0.15         0.13
6      0     0     0.07  0.04  0.06  0.05         0.05
7      0     0     0.07  0.04  0.06  0.05         0.05
8      0     0     0.07  0.06  0.08  0.07         0.08
9      0     0     0     0.02  0.01  0.02         0.04
10     0     0     0     0     0.01  0.01         0.03
11     0     0     0     0.02  0.01  0.02         0.04
12     0     0     0     0     0     0.01         0.02

• No pre-computation / light storage
• Slow on-line response: O(mE)
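The OnTheFly iteration can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the edge list is read off the non-zero pattern of the 12-node example matrix, and the function names, the tolerance and the iteration cap are assumptions made for the example.

```python
import numpy as np

# Undirected edges of the 12-node example graph (1-based node ids),
# read off the non-zero pattern of the example matrix W.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]


def column_normalized_adjacency(edges, n):
    """Build W from an undirected edge list so that every column sums to 1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0, keepdims=True)


def rwr_on_the_fly(W, i, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c W r + (1 - c) e_i until the L1 change drops below tol."""
    e = np.zeros(W.shape[0])
    e[i] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


W = column_normalized_adjacency(EDGES, 12)
r4 = rwr_on_the_fly(W, i=3)  # node 4 is index 3
print(np.round(r4, 2))
```

Nothing is stored besides $W$, but every query repeats the whole loop, with one matrix-vector product per iteration. With a sparse $W$ each product touches every edge once, which is presumably where the O(mE) on-line cost comes from.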

Page 12: Fast Random Walk with Restart and Its Applications

PreCompute: $Q^{-1} = (I - cW)^{-1}$

$Q^{-1}$ for the 12-node example ($c = 0.9$):

2.20 1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.34
1.28 2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.45
1.43 1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.33
1.29 0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32
0.91 0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.56
0.37 0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22
0.37 0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22
0.84 1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.13
0.29 0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.79
0.35 0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.80
0.39 0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.72
0.22 0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05

On-line: $r_i = (1 - c) Q^{-1} e_i$, with $1 - c = 0.1$. For query node 4 the slide shows

$r_4 = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02)^T$

• Fast on-line response
• Heavy pre-computation/storage cost: O(n^3) pre-computation, O(n^2) storage
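A minimal NumPy sketch of the PreCompute strategy follows, assuming the same 12-node example graph (edge list read off the example matrix). The helper name `rwr_precomputed` is hypothetical. The point is that the on-line stage reduces to picking one column of the stored inverse and scaling it by $1 - c$.

```python
import numpy as np

# Undirected edges of the 12-node example graph (1-based node ids).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
n, c = 12, 0.9

A = np.zeros((n, n))
for u, v in EDGES:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
W = A / A.sum(axis=0, keepdims=True)  # column-normalized adjacency matrix

# Pre-compute stage: one dense n x n inversion (O(n^3) time, O(n^2) storage).
Q_inv = np.linalg.inv(np.eye(n) - c * W)


def rwr_precomputed(i):
    """On-line stage: r_i = (1 - c) Q^{-1} e_i, a scaled column of Q^{-1}."""
    return (1 - c) * Q_inv[:, i]


r4 = rwr_precomputed(3)  # node 4 is index 3
print(np.round(r4, 2))
```

The query itself is O(n), but the dense inverse is what becomes infeasible for large graphs.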

Page 13: Fast Random Walk with Restart and Its Applications

Q: How to Balance?

[Figure: On-line vs. Off-line]

Page 14: Fast Random Walk with Restart and Its Applications

Roadmap

• Background
– RWR: Definitions
– RWR: Algorithms
• Basic Idea
• FastRWR
– Pre-Compute Stage
– On-Line Stage
• Experimental Results
• Conclusion

Page 15: Fast Random Walk with Restart and Its Applications

Basic Idea

[Figure: the 12-node example graph shown at each of the three steps]

• Find Community
• Fix the remaining
• Combine

Page 16: Fast Random Walk with Restart and Its Applications

Basic Idea: Pre-computational stage

• A few small, instead of ONE BIG, matrix inversions

[Figure: $Q^{-1}$ represented by Q-matrices plus link matrices $U$ and $V$]

Page 17: Fast Random Walk with Restart and Its Applications

Basic Idea: On-Line Stage

• A few, instead of MANY, matrix-vector multiplications

[Figure: the query vector $e_i$ is multiplied by $Q^{-1}$, represented by Q-matrices plus link matrices $U$ and $V$, to give the result $r_i$]

Page 18: Fast Random Walk with Restart and Its Applications

Roadmap

• Background
• Basic Idea
• FastRWR
– Pre-Compute Stage
– On-Line Stage
• Experimental Results
• Conclusion

Page 19: Fast Random Walk with Restart and Its Applications

Pre-compute Stage

• p1: B_Lin Decomposition
– P1.1 partition
– P1.2 low-rank approximation
• p2: Q matrices
– P2.1 computing $Q_1^{-1}$ (for each partition)
– P2.2 computing $\Lambda = (S^{-1} - c V Q_1^{-1} U)^{-1}$ (for concept space)

Page 20: Fast Random Walk with Restart and Its Applications

P1.1: partition

$W = W_1 + W_2$

– $W_1$: within-partition links
– $W_2$: cross-partition links

[Figure: the 12-node example graph before and after partitioning]

Page 21: Fast Random Walk with Restart and Its Applications

P1.1: block-diagonal $W_1$

0 1/3 1/3 1/3 0 0 0 0 0 0 0 0

1/3 0 1/3 0 0 0 0 0 0 0 0 0

1/3 1/3 0 1/3 0 0 0 0 0 0 0 0

1/3 0 1/3 0 0 0 0 0 0 0 0 0

0 0 0 0 0 1/2 1/2 0 0 0 0 0

0 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0

0 0 0 0 0 0 0 0 1/2 0 1/3 0

0 0 0 0 0 0 0 1/4 0 1/3 0 0

0 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0 0 0 1/4 0 1/3 0 1/2

0 0 0 0 0 0 0 0 0 1/3 1/3 0

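$W_1 = \begin{pmatrix} W_{1,1} & 0 & 0 \\ 0 & W_{1,2} & 0 \\ 0 & 0 & W_{1,3} \end{pmatrix}$

[Figure: the example graph split into three partitions (nodes 1-4, 5-7 and 8-12), one per diagonal block $W_{1,1}$, $W_{1,2}$, $W_{1,3}$]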

Page 22: Fast Random Walk with Restart and Its Applications

P1.2: LRA for $W_2$

$W_2 \approx U S V$

U (12 x 4):
 0      0      0      0
-0.18  -0.36   0.13  -0.90
 0      0      0      0
 0.36  -0.18   0.90   0.13
-0.40  -0.81  -0.06   0.40
 0      0      0      0
 0      0      0      0
 0.81  -0.40  -0.40  -0.06
 0      0      0      0
 0      0      0      0
 0      0      0      0
 0      0      0      0

S (4 x 4):
0.44  0     0     0
0     0.44  0     0
0     0     0.18  0
0     0     0     0.18

V (4 x 12):
0   0.60  0  -0.30   0.65  0  0  -0.32  0  0  0  0
0  -0.30  0  -0.60  -0.32  0  0  -0.65  0  0  0  0
0  -0.72  0  -0.11   0.66  0  0   0.10  0  0  0  0
0  -0.11  0   0.72   0.10  0  0  -0.66  0  0  0  0

[Figure: the 12-node example graph]
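The two decomposition steps (P1.1 and P1.2) can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the partition labels are read off the block structure of $W_1$ in the example, and a truncated SVD is used as one possible low-rank approximation, which the slides do not prescribe. The rank `t = 4` matches the four columns of $U$ on the slide.

```python
import numpy as np

# Undirected edges of the 12-node example graph (1-based node ids).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
n = 12

A = np.zeros((n, n))
for u, v in EDGES:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
W = A / A.sum(axis=0, keepdims=True)  # column-normalized adjacency matrix

# P1.1: partition labels assumed from the example (nodes 1-4, 5-7, 8-12).
labels = np.array([0] * 4 + [1] * 3 + [2] * 5)
same_partition = labels[:, None] == labels[None, :]
W1 = np.where(same_partition, W, 0.0)  # within-partition links (block-diagonal)
W2 = W - W1                            # cross-partition links

# P1.2: rank-t approximation W2 ~ U S V via truncated SVD (an assumption).
t = 4
U_full, s, Vt = np.linalg.svd(W2)
U, S, V = U_full[:, :t], np.diag(s[:t]), Vt[:t, :]

err = np.linalg.norm(W2 - U @ S @ V)
print("non-zeros in W2:", np.count_nonzero(W2), "approximation error:", err)
```

In this small example $W_2$ has only four non-zero rows, so a rank-4 factorization reproduces it up to rounding. On a real graph `t` trades accuracy against the size of the matrices that must be stored.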

Page 23: Fast Random Walk with Restart and Its Applications

$W \approx W_1 + U S V$

[Figure: $W$ approximated by the block-diagonal $W_1$ (blocks $W_{1,1}$, $W_{1,2}$, $W_{1,3}$) plus the low-rank term $U S V$; the graph view adds four nodes labeled c1 to c4 between the partitions. The matrix values repeat those of $W_1$, $U$, $S$ and $V$ shown on the P1.1 and P1.2 slides.]

Page 24: Fast Random Walk with Restart and Its Applications

p2.1 Computing $Q_1^{-1}$

$W_1 = \begin{pmatrix} W_{1,1} & 0 & 0 \\ 0 & W_{1,2} & 0 \\ 0 & 0 & W_{1,3} \end{pmatrix}$

$Q_1^{-1} = \begin{pmatrix} Q_{1,1}^{-1} & 0 & 0 \\ 0 & Q_{1,2}^{-1} & 0 \\ 0 & 0 & Q_{1,3}^{-1} \end{pmatrix}$, with $Q_{1,i}^{-1} = (I - c W_{1,i})^{-1}$

$Q_1^{-1}$ for the 12-node example:

1.85 0.88 1.08 0.88 0    0    0    0    0    0    0    0
0.88 1.52 0.88 0.52 0    0    0    0    0    0    0    0
1.08 0.88 1.85 0.88 0    0    0    0    0    0    0    0
0.88 0.52 0.88 1.52 0    0    0    0    0    0    0    0
0    0    0    0    1.58 1.29 1.29 0    0    0    0    0
0    0    0    0    0.64 1.78 1.09 0    0    0    0    0
0    0    0    0    0.64 1.09 1.78 0    0    0    0    0
0    0    0    0    0    0    0    1.42 1.00 0.79 0.89 0.76
0    0    0    0    0    0    0    0.50 1.61 0.86 0.60 0.66
0    0    0    0    0    0    0    0.59 1.30 2.29 1.35 1.64
0    0    0    0    0    0    0    0.67 0.91 1.35 2.07 1.54
0    0    0    0    0    0    0    0.38 0.66 1.09 1.02 1.95
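Because $W_1$ is block-diagonal, $Q_1^{-1}$ can be assembled from one small inversion per partition. The sketch below illustrates this on the 12-node example. The partition boundaries are an assumption read off the block structure, and the dense `Q1_inv` array is used only for readability; a real implementation would presumably keep the blocks separately.

```python
import numpy as np

# Undirected edges of the 12-node example graph (1-based node ids).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
n, c = 12, 0.9

A = np.zeros((n, n))
for u, v in EDGES:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
W = A / A.sum(axis=0, keepdims=True)

# Keep only within-partition links (assumed partitions: nodes 1-4, 5-7, 8-12).
labels = np.array([0] * 4 + [1] * 3 + [2] * 5)
W1 = np.where(labels[:, None] == labels[None, :], W, 0.0)

# One small inversion per diagonal block: Q_{1,i}^{-1} = (I - c W_{1,i})^{-1}.
blocks = [slice(0, 4), slice(4, 7), slice(7, 12)]
Q1_inv = np.zeros((n, n))
for b in blocks:
    k = b.stop - b.start
    Q1_inv[b, b] = np.linalg.inv(np.eye(k) - c * W1[b, b])

print(np.round(Q1_inv[:4, :4], 2))  # first block, compare with the slide
```

Inverting the three blocks gives the same matrix as inverting the full $I - c W_1$, while only ever factorizing matrices of size 4, 3 and 5.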

Page 25: Fast Random Walk with Restart and Its Applications

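Comparing $Q_1^{-1}$ and $Q^{-1}$

• Computing Time
– 100,000 nodes; 100 partitions
– Computing $Q_1^{-1}$ is 10,000x faster! (with equal-sized partitions and cubic inversion cost: 100 inversions of 1,000 x 1,000 blocks versus one 100,000 x 100,000 inversion)
• Storage Cost (100x saving!)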

Page 26: Fast Random Walk with Restart and Its Applications

p2.2 Computing: $\Lambda = (S^{-1} - c V Q_1^{-1} U)^{-1}$

[Figure: $\Lambda$ is formed from $S^{-1}$, $Q_1^{-1}$, $U$ and $V$; shown next to the 12-node example graph]

Page 27: Fast Random Walk with Restart and Its Applications

We have:

• $Q_1^{-1}$ (Q-matrices)
• $U$, $V$ (link matrices)

SM (Sherman-Morrison) Lemma says:

$Q^{-1} \approx Q_1^{-1} + c Q_1^{-1} U \Lambda V Q_1^{-1}$, with $\Lambda = (S^{-1} - c V Q_1^{-1} U)^{-1}$

30

Roadmap

• Background

• Basic Idea

• FastRWR– Pre-Compute Stage– On-Line Stage

• Experimental Results

• Conclusion

Page 29: Fast Random Walk with Restart and Its Applications

On-Line Stage

• Q: Given the query $e_i$, how to get the result $r_i$ from the pre-computed $Q_1^{-1}$, $U$ and $V$ instead of the full $Q^{-1}$?

• A: (SM lemma)

Page 30: Fast Random Walk with Restart and Its Applications

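On-Line Query Stage

q1: $r_0 = Q_1^{-1} e_i$
q2: $\hat{r}_i = V r_0$
q3: $\hat{r}_i = \Lambda \hat{r}_i$
q4: $\hat{r}_i = U \hat{r}_i$
q5: $\hat{r}_i = Q_1^{-1} \hat{r}_i$
q6: $r_i = (1 - c)(r_0 + c \hat{r}_i)$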

Page 31: Fast Random Walk with Restart and Its Applications

$r_i = (1 - c)\,(Q_1^{-1} e_i + c\, Q_1^{-1} U \Lambda V Q_1^{-1} e_i)$, where $r_0 = Q_1^{-1} e_i$

• q1: Find the community
• q2-q5: Compensate out-community links
• q6: Combine
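Putting the two stages together, the whole pipeline can be sketched on the 12-node example as follows. This is a hedged illustration, not the authors' implementation: the partition labels, the truncated SVD used for the low-rank step, the dense storage and the function name `fast_rwr` are all assumptions. The on-line function follows the q1 to q6 steps, each being a single matrix-vector product or a vector combination.

```python
import numpy as np

# Undirected edges of the 12-node example graph (1-based node ids).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
n, c, t = 12, 0.9, 4

A = np.zeros((n, n))
for u, v in EDGES:
    A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
W = A / A.sum(axis=0, keepdims=True)

# --- Pre-compute stage -------------------------------------------------
labels = np.array([0] * 4 + [1] * 3 + [2] * 5)   # assumed partitions
W1 = np.where(labels[:, None] == labels[None, :], W, 0.0)
W2 = W - W1

U_full, s, Vt = np.linalg.svd(W2)                # low-rank step (assumed SVD)
U, S, V = U_full[:, :t], np.diag(s[:t]), Vt[:t, :]

Q1_inv = np.zeros((n, n))
for k in np.unique(labels):                      # one small inverse per partition
    idx = np.flatnonzero(labels == k)
    block = np.ix_(idx, idx)
    Q1_inv[block] = np.linalg.inv(np.eye(len(idx)) - c * W1[block])

Lam = np.linalg.inv(np.linalg.inv(S) - c * V @ Q1_inv @ U)


# --- On-line stage -----------------------------------------------------
def fast_rwr(i):
    e = np.zeros(n)
    e[i] = 1.0
    r0 = Q1_inv @ e                    # q1: find the community
    r_hat = V @ r0                     # q2
    r_hat = Lam @ r_hat                # q3
    r_hat = U @ r_hat                  # q4
    r_hat = Q1_inv @ r_hat             # q5: out-community compensation done
    return (1 - c) * (r0 + c * r_hat)  # q6: combine


r4 = fast_rwr(3)  # node 4 is index 3
exact = (1 - c) * np.linalg.solve(np.eye(n) - c * W, np.eye(n)[:, 3])
print(np.max(np.abs(r4 - exact)))
```

On this toy graph the rank-4 factorization of $W_2$ is exact, so the result matches the direct solution up to rounding. On larger graphs a smaller rank would give an approximate ranking vector in exchange for less storage and faster queries.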

Page 32: Fast Random Walk with Restart and Its Applications

Example

• We have: $Q_1^{-1}$, $U$ and $V$ (in place of $Q^{-1}$)
• We want to compute: $r_4$

[Figure: the 12-node example graph, query node 4]

Page 33: Fast Random Walk with Restart and Its Applications

q1: Find Community

q1: $r_0 = Q_1^{-1} e_4$

[Figure: the block-diagonal $Q_1^{-1}$ (same values as on the p2.1 slide) applied to the query vector; only the community of node 4 (nodes 1 to 4) receives non-zero scores]

Page 34: Fast Random Walk with Restart and Its Applications

q2-q5: out-community

$\hat{r}_i = Q_1^{-1} U \Lambda V r_0$

[Figure: the example graph, showing the community of the query node and the remaining partitions]

Page 35: Fast Random Walk with Restart and Its Applications

q6: Combination

q6: $r_4 = 0.1\,(r_0 + 0.9\,\hat{r}_4)$

[Figure: the in-community and out-community scores combined into the final ranking vector $r_4$ on the example graph]

Page 36: Fast Random Walk with Restart and Its Applications

Roadmap

• Background
• Basic Idea
• FastRWR
– Pre-Compute Stage
– On-Line Stage
• Experimental Results
• Conclusion

Page 37: Fast Random Walk with Restart and Its Applications

Experimental Setup

• Dataset
– DBLP/authorship
– Author-Paper
– 315k nodes
– 1,800k edges
• Quality: Relative Accuracy
• Application: Center-Piece Subgraph

Page 38: Fast Random Walk with Restart and Its Applications

Query Time vs. Pre-Compute Time

[Plot: axes are Log Query Time and Log Pre-compute Time]

Page 39: Fast Random Walk with Restart and Its Applications

Query Time vs. Pre-Storage

[Plot: axes are Log Query Time and Log Storage]

Page 40: Fast Random Walk with Restart and Its Applications

Roadmap

• Background
• Basic Idea
• FastRWR
– Pre-Compute Stage
– On-Line Stage
• Experimental Results
• Conclusion

Page 41: Fast Random Walk with Restart and Its Applications

Conclusion

• FastRWR
– Reasonable quality preservation (90%+)
– 150x speed-up: query time
– Orders of magnitude saving: pre-compute & storage
• More in the paper
– The variant of FastRWR and theoretic justification
– Implementation details
  • normalization, low-rank approximation, sparse
– More experiments
  • Other datasets, other applications

Page 42: Fast Random Walk with Restart and Its Applications


Q&A

Thank you!

[email protected]

www.cs.cmu.edu/~htong

Page 43: Fast Random Walk with Restart and Its Applications

Future work

• Incremental FastRWR
• Parallel FastRWR
– Partition
– Q-matrices for each partition
• Hierarchical FastRWR
– How to compute one Q-matrix for

Page 44: Fast Random Walk with Restart and Its Applications


Possible Q?

• Why RWR?