SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

115
SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture

Transcript of SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

Page 1: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

Proximity on Large Graphs

Speaker:

Hanghang Tong

2008-4-10 15-826 Guest Lecture

Page 2: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

2

Graphs are everywhere!

Page 3: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

4

Graph Mining: the big picture Graph/Global Level

Subgraph/Community Level

Node Level We are here!

Page 4: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

5

Proximity on Graph: What?

A BH1 1

D1 1

E

F

G1 11

I J1

1 1

a.k.a Relevance, Closeness, ‘Similarity’…

Page 5: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

6

Proximity is the main tool behind…

• Link prediction [Liben-Nowell+], [Tong+]• Ranking [Haveliwala], [Chakrabarti+]• Email Management [Minkov+]• Image caption [Pan+]• Neighborhooh Formulation [Sun+]• Conn. subgraph [Faloutsos+], [Tong+], [Koren+]• Pattern match [Tong+]• Collaborative Filtering [Fouss+]• Many more…

Will return to this later

Page 6: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

7

Roadmap

• Motivation

• Part I: Definitions

• Part II: Fast Solutions

• Part III: Applications

• Conclusion

• Basic: RWR

• Variants

• Asymmetry of Prox.

• Group Prox

• Prox w/ Attributes

• Prox w/ Time

Page 7: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

8

Why not shortest path?

‘pizza delivery guy’ problem

A BD1 1

A BD1 1

E

F

G1 11

A BD1 1

E F1

1 1

A BD1 1

‘multi-facet’ relationship

Some ``bad’’ proximities

Page 8: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

9

A BD1 1

A BD1 11 E

Why not max. netflow?

No punishment on long paths

Some ``bad’’ proximities

Page 9: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

11

What is a ``good’’ Proximity?

A BH1 1

D1 1

E

F

G1 11

I J1

1 1

• Multiple Connections

• Quality of connection

•Direct & In-directed Conns

•Length, Degree, Weight…

Page 10: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

12

1

4

3

2

56

7

910

8

11

12

Random walk with restart

Page 11: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

13

Random walk with restart

Node 4

Node 1Node 2Node 3Node 4Node 5Node 6Node 7Node 8Node 9Node 10Node 11Node 12

0.130.100.130.220.130.050.050.080.040.030.040.02

1

4

3

2

56

7

910

811

120.13

0.10

0.13

0.13

0.05

0.05

0.08

0.04

0.02

0.04

0.03

Ranking vector More red, more relevant

Nearby nodes, higher scores

4r

Page 12: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

Why RWR is a good score?

14

2c 3cc ...

W 2W 3W

all paths from i to j with length 1

all paths from i to j with length 2

all paths from i to j with length 3

Page 13: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

15

Roadmap

• Motivation

• Part I: Definitions

• Part II: Fast Solutions

• Part III: Applications

• Conclusion

• Basic: RWR

• Variants

• Asymmetry of Prox.

• Group Prox

• Prox w/ Attributes

• Prox w/ Time

Page 14: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

16

Variant: escape probability

• Define Random Walk (RW) on the graph• Esc_Prob(AB)

– Prob (starting at A, reaches B before returning to A)

Esc_Prob = Pr (smile before cry)

A Bthe remaining graph

Page 15: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

17

Other Variants

• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]

• Equivalence of Random Walks– Electric Networks:

• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+]

– String Systems

• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]

Page 16: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

18

Other Variants

• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]

• Equivalence of Random Walks– Electric Networks:

• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+]

– String Systems

• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]

All are related to, or similar to random walk with restart!

Page 17: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

19

Roadmap

• Motivation

• Part I: Definitions

• Part II: Fast Solutions

• Part III: Applications

• Conclusion

• Basic: RWR

• Variants

• Asymmetry of Prox.

• Group Prox

• Prox w/ Attributes

• Prox w/ Time

Page 18: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

20

Asymmetry of Proximity [Tong+ KDD07 a]

A B

1 1

1

111

1 A B

1 1

1

10.51

0.5

What is Prox from A to B?What is Prox from B to A?

What is Prox between A and B?

Page 19: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

21

Asymmetry also exists in un-directed graphs

• Hanghang’s most important conf. is KDD

• The most important author in KDD is ...

So is love…

HanghangKDD

Page 20: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

22

Roadmap

• Motivation

• Part I: Definitions

• Part II: Fast Solutions

• Part III: Applications

• Conclusion

• Basic: RWR

• Variants

• Asymmetry of Prox.

• Group Prox

• Prox w/ Attributes

• Prox w/ Time

Page 21: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

23

Group Proximity [Tong+ 2007]

• Q: How close are Accountants to SECs?

• A: Prob (starting at any RED, reaches any GREEN before touching any RED again)

Accountant

CEO

Manager

SEC

Page 22: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

24

Proximity on Attribute Graphs

What is the proximity from node 7 to 10?If we know that…

Accountant

CEO

Manager

SEC

Page 23: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

25

Sol: Augmented graphs

12

1311

9

10

5

6

7

8

1

2

3

4

Page 24: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

26

Attributes on nodes/edges (ER graph) [Chakrabarti+ WWW07]

skip

Wrote Sent Received

In-Replied-toCited

Works

Page 25: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

27

Proximity w/ Time

• Sol #1: treat time an categorical attr. [Minkov+]

• Sol #2: aggregate slice matrices [Tong+]

Time

•Global aggregation

•Slide window

•Exponential emphasis

Page 26: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

28

Summary of Part I

• Goal: Summarize multiple … relationships

• Solutions– Basic: Random Walk with Restart– Property: Asymmetry– Variants: Esc_Prob and many others.– Generalization: Group Prox.; w/ Attr.; w/ Time

Page 27: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

29

• Motivation

• Part I: Definitions

• Part II: Fast Solutions

• Part III: Applications

• Conclusion

Roadmap

• B_Lin: RWR

• FastAllDAP: Esc_Prob

• BB_Lin: Skewed BGs

• FastUpdate: Time-Evolving

Page 28: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

Preliminary: Sherman–Morrison Lemma

30

Tv

uAA =

If:

1 11 1 1

1( )

1

TT

T

A u v AA A u v A

v A u

Then:

Page 29: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

SM Lemma: Applications

• RLS – and almost any algorithm in time series!

• Leave-one-out cross validation for LS

• Kalman filtering

• Incremental matrix decomposition

• … and all the fast sols we will introduce!

31

Page 30: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

32

Computing RWR

1

43

2

5 6

7

9 10

811

12

0.13 0 1/3 1/3 1/3 0 0 0 0 0 0 0 0

0.10 1/3 0 1/3 0 0 0 0 1/4 0 0 0

0.13

0.22

0.13

0.050.9

0.05

0.08

0.04

0.03

0.04

0.02

0

1/3 1/3 0 1/3 0 0 0 0 0 0 0 0

1/3 0 1/3 0 1/4 0 0 0 0 0 0 0

0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0

0 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0

0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0

0 0 0 0 0 0 0 1/4 0 1/3 0 0

0 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0 0 0 1/4 0 1/3 0 1/2

0 0 0 0 0 0 0 0 0 1/3 1/3 0

0.13 0

0.10 0

0.13 0

0.22

0.13 0

0.05 00.1

0.05 0

0.08 0

0.04 0

0.03 0

0.04 0

2 0

1

0.0

n x n n x 1n x 1

Ranking vector Starting vectorAdjacent matrix

1

(1 )i i ir cWr c e

Restart p

Page 31: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

33

Beyond RWR

P-PageRank[Haveliwala]

PageRank[Haveliwala]

RWR[Pan, Sun]

SM Learning[Zhou, Zhu]

RL in CBIR[He]

Fast RWR (B_Lin) Finds the Root Solution !

: Maxwell Equation for Web![Chakrabarti]

Page 32: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

34

• RWR is the building block for computing…– Escape Probability (augmented w/sink)

[Tong+]– ..Effective ConductancResistance Dist.

Commute Time– MRF (special structure) [Cohen]

• Similar Idea of B_Lin to compute other measurements

Beyond RWR

Page 33: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

35

• Q: Given query i, how to solve it?

0 1/3 1/3 1/3 0 0 0 0 0 0 0 0

1/3 0 1/3 0 0 0 0 1/4 0 0 0 0

1/3 1/3 0 1/3 0 0 0 0 0 0 0 0

1/3 0 1/3 0 1/4

0.9

0 0 0 0 0 0 0

0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0

0 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0

0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0

0 0 0 0 0 0 0 1/4 0 1/3 0 0

0 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0

0

0

0

0

00.1

0

0

0

0

0 0 1/4 0 1/3 0 1/2 0

0 0 0 0 0 0 0 0 0 1/3 1/3

1

0 0

??

Adjacent matrix Starting vector

Page 34: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

36

1

43

2

5 6

7

9 10

8 11

120.130.10

0.13

0.130.05

0.05

0.08

0.04

0.02

0.04

0.03

OntheFly: 0 1/3 1/3 1/3 0 0 0 0 0 0 0 0

1/3 0 1/3 0 0 0 0 1/4 0 0 0 0

1/3 1/3 0 1/3 0 0 0 0 0 0 0 0

1/3 0 1/3 0 1/4

0.9

0 0 0 0 0 0 0

0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0

0 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0

0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0

0 0 0 0 0 0 0 1/4 0 1/3 0 0

0 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0

0

0

0

0

00.1

0

0

0

0

0 0 1/4 0 1/3 0 1/2 0

0 0 0 0 0 0 0 0 0 1/3 1/3

1

0 0

0

0

0

1

0

0

0

0

0

0

0

0

0.13

0.10

0.13

0.22

0.13

0.05

0.05

0.08

0.04

0.03

0.04

0.02

1

43

2

5 6

7

9 10

811

12

0.3

0

0.3

0.1

0.3

0

0

0

0

0

0

0

0.12

0.18

0.12

0.35

0.03

0.07

0.07

0.07

0

0

0

0

0.19

0.09

0.19

0.18

0.18

0.04

0.04

0.06

0.02

0

0.02

0

0.14

0.13

0.14

0.26

0.10

0.06

0.06

0.08

0.01

0.01

0.01

0

0.16

0.10

0.16

0.21

0.15

0.05

0.05

0.07

0.02

0.01

0.02

0.01

0.13

0.10

0.13

0.22

0.13

0.05

0.05

0.08

0.04

0.03

0.04

0.02

No pre-computation/ light storage

Slow on-line response O(mE)

ir

ir

Page 35: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

37

0.20 0.13 0.14 0.13 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.34

0.28 0.20 0.13 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.45

0.14 0.13 0.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.33

0.13 0.10 0.13 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32

0.09 0.09 0.09 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.56

0.03 0.04 0.04 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22

0.03 0.04 0.04 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22

0.08 0.11 0.04 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.13

0.03 0.04 0.03 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.79

0.04 0.04 0.04 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.80

0.04 0.05 0.04 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.72

0.02 0.03 0.02 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05

4

PreCompute

1 2 3 4 5 6 7 8 9 10 11 12r r r r r r r r r r r r

1

43

2

5 6

7

9 10

8 11

120.130.10

0.13

0.130.05

0.05

0.08

0.04

0.02

0.04

0.03

13

2

5 6

7

9 10

811

12

[Haveliwala]

R:

Page 36: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

38

2.20 1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.34

1.28 2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.45

1.43 1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.33

1.29 0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32

0.91 0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.56

0.37 0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22

0.37 0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22

0.84 1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.13

0.29 0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.79

0.35 0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.80

0.39 0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.72

0.22 0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05

PreCompute:

1

43

2

5 6

7

9 10

8 11

120.130.10

0.13

0.130.05

0.05

0.08

0.04

0.02

0.04

0.03

1

43

2

5 6

7

9 10

811

12

Fast on-line response

Heavy pre-computation/storage costO(n ) O(n )

0.13

0.10

0.13

0.22

0.13

0.05

0.05

0.08

0.04

0.03

0.04

0.02

3 2

Page 37: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

39

Q: How to Balance?

On-line Off-line

Page 38: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

40

B_Lin: Basic Idea[Tong+]

1

43

2

5 6

7

9 10

8 11

120.130.10

0.13

0.130.05

0.05

0.08

0.04

0.02

0.04

0.03

1

43

2

5 6

7

9 10

811

12

Find Community

Fix the remaining

Combine1

43

2

5 6

7

9 10

8 11

12

1

43

2

5 6

7

9 10

8 11

12

5 6

7

9 10

811

12

1

43

2

5 6

7

9 10

8 11

12

1

43

2

5 6

7

9 10

8 11

12

1

43

2

Page 39: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

41

Pre-computational stage

• Q: • A: A few small, instead of ONE BIG, matrices inversions

Efficiently compute and store Q-1

Page 40: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

42

• Q: Efficiently recover one column of Q• A: A few, instead of MANY, matrix-vector multiplication

On-Line Query Stage

+

0

0

0

0

0

0

1

0

0

0

0

0

-1

ie ir

Page 41: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

43

Pre-compute Stage

• p1: B_Lin Decomposition– P1.1 partition– P1.2 low-rank approximation

• p2: Q matrices– P2.1 computing (for each partition)– P2.2 computing (for concept space)

11Q

Page 42: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

44

P1.1: partition

1

43

2

5 6

7

9 10

811

12

1

43

2

5 6

7

9 10

811

12

Within-partition links cross-partition links

skip

Page 43: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

45

P1.1: block-diagonal

1

43

2

5 6

7

9 10

811

12

1

43

2

5 6

7

9 10

811

12

skip

Page 44: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

46

P1.2: LRA for

31

4

2

5 6

7

9 10

811

12

1

43

2

5 6

7

9 10

811

12

|S| << |W2|~

skip

Page 45: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

47

+

=

skip

Page 46: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

48

p2.1 Computing

c

skip

Page 47: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

49

Comparing and

• Computing Time– 100,000 nodes; 100 partitions– Computing 100,00x is Faster!

• Storage Cost – 100x saving!

Q 1,1

Q 1,2

Q 1,k

11Q

=

11Q

11Q

skip

Page 48: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

50

• Q: How to fix the green portions?

W +~

~~

11Q

+ ?

skip

Page 49: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

51

p2.2 Computing:

1S UV=_

-1

1

43

2

5 6

7

9 10

811

12

Q 1,1

Q 1,2

Q 1,k

skip

Page 50: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

52

SM Lemma says:

We have:

Communities Bridges

1 1 11 1

11U VcQQ Q Q

skip

Page 51: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

53

On-Line Stage

• Q

+

Query

0

0

0

0

0

0

1

0

0

0

0

0

Result

?

• A (SM lemma)

Pre-Computation

ie ir

skip

Page 52: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

54

On-Line Query Stage

q1:q2:q3:q4:q5:q6:

skip

Page 53: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

55

skip

Page 54: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

56

Query Time vs. Pre-Compute Time

Log Query Time

Log Pre-compute Time

•Quality: 90%+ •On-line:

•Up to 150x speedup•Pre-computation:

•Two orders saving

Page 55: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

57

• Motivation

• Part I: Definitions

• Part II: Fast Solutions

• Part III: Applications

• Conclusion

Roadmap

• B_Lin: RWR

• FastAllDAP: Esc_Prob

• BB_Lin: Skewed BGs

• FastUpdate: Time-Evolving

Page 56: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

58

FastAllDAP [Tong+]

Footnote: augmented w/ universal sink as practical modification

A Bthe remaining graph

• Q: How to compute – Esc_Prob = Pr (smile before cry)?

Page 57: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

59

Solving DAP (Straight-forward way)

One matrix inversion, one proximity!

2 1,

ˆProx( )=c ( )ti j i ji j p I cP p c p

1 x (n-2) 1 x (n-2)(n-2) x (n-2)

1-c: fly-out probability (to black-hole)

Page 58: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

60

Esc_Prob(1->5) =

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

P=

I - +

-1

1 5

3

2

6

4

0.5 0.5

0.5

0.50.5

0.5

0.5

1

0.5 1

P: Transition matrix (row norm.)

2cc

Page 59: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

61

• Case 1, Medium Size Graph– Matrix inversion is feasible, but…– What if we want many proximities?– Q: How to get all (n ) proximities efficiently?– A: FastAllDAP!

• Case 2: Large Size Graph – Matrix inversion is infeasible– Q: How to get one proximity efficiently?– A: FastOneDAP!

Challenges

2

skip

Page 60: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

62

FastAllDAP

• Q1: How to efficiently compute all possible proximities on a medium size graph?– a.k.a. how to efficiently solve multiple

linear systems simultaneously?

• Goal: reduce # of matrix inversions!

Page 61: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

63

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

FastAllDAP: Observation

1 5

3

2

6

4

0.5 0.5

0.5

0.50.5

0.5

0.5

1

0.5 1

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

Need two different matrix inversions!

P=

P=

Page 62: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

64

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

FastAllDAP: Rescue

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

Redundancy among different linear systems!

P=

P=

Overlap between two gray parts!

Prox(1 5)

Prox(1 6)

Page 63: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

65

FastAllDAP: Theorem

• Theorem:

• Proof: by SM Lemma

• Example:

Page 64: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

66

FastAllDAP: Algorithm

• Alg.– Compute Q– For i,j =1,…, n, compute

• Computational Save O(1) instead of O(n )!

• Example– w/ 1000 nodes, – 1m matrix inversion vs. 1 matrix!

2

Page 65: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

67

FastAllDAP

Size of Graph

Time (sec)

Straight-Solver

FastAllDAP

1,000xfaster!

Page 66: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

68

• Motivation

• Part I: Definitions

• Part II: Fast Solutions

• Part III: Applications

• Conclusion

Roadmap

• B_Lin: RWR

• FastAllDAP: Esc_Prob

• BB_Lin: Skewed BGs

• FastUpdate: Time-Evolving

Page 67: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

RWR on Bipartite Graph

69

n

m

authors

Conferences

Author-Conf. Matrix

Observation: n >> m!

Examples: 1. DBLP: 400k aus, 3.5k confs 2. NetFlix: 2.7M usrs, 18k mvs

Page 68: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

70

• Q: Given query i, how to solve it?

0 1/3 1/3 1/3 0 0 0 0 0 0 0 0

1/3 0 1/3 0 0 0 0 1/4 0 0 0 0

1/3 1/3 0 1/3 0 0 0 0 0 0 0 0

1/3 0 1/3 0 1/4

0.9

0 0 0 0 0 0 0

0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0

0 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0

0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0

0 0 0 0 0 0 0 1/4 0 1/3 0 0

0 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0

0

0

0

0

00.1

0

0

0

0

0 0 1/4 0 1/3 0 1/2 0

0 0 0 0 0 0 0 0 0 1/3 1/3

1

0 0

RWR on Skewed bipartite graphs

??

… . . .. . …. .. .

. …. . ..

0

0

n m

Ar

… .

. .. . …. .. .

. …. . .. Ac

Page 69: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

• Step 1:

• Step 2:

• Cost:

• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes

71

BB_Lin: Pre-Computation [Tong+ 06]

M = Ac ArX

1( 0.9 )I M 3( )O m m E

2-step RWR for Conferences

All Conf-Conf Prox. Scores

Page 70: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

72

BB_Lin: Pre-Computation [Tong+ 06]

• Step 1:

• Step 2:

M = Ac ArX

1( 0.9 )I M

2-step RWR for Conferences

All Conf-Conf Prox. Scores

Page 71: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

73

BB_Lin: Pre-Computation [Tong+ 06]

• Step 1:

• Step 2:

• Cost:

• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes

M = Ac ArX

1( 0.9 )I M 3( )O m m E

2-step RWR for Conferences

All Conf-Conf Prox. Scores

Ac/ArE edges

m x m

Page 72: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

BB_Lin: On-Line Stage

74Ac/Ar

E edges

Case 1: - Conf - Conf

authors

Conferences

Read out !

Page 73: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

BB_Lin: On-Line Stage

75Ac/Ar

E edges

Case 2: - Au - Conf

authors

Conferences

1 matrix-vec!

Page 74: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

BB_Lin: On-Line Stage

76Ac/Ar

E edges

Case 3: - Au - Au

authors

Conferences

2 m atrix-vec!

Page 75: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

BB_Lin: Examples

• NetFlix dataset (2.7m user x 18k movies)– 1.5hr for pre-computation; – <1 sec for on-line

• DBLP dataset (400k authors x 3.5k confs)– A few minutes for pre-computation– <0.01 sec for on-line

77

Page 76: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

78

• Motivation

• Part I: Definitions

• Part II: Fast Solutions

• Part III: Applications

• Conclusion

Roadmap

• B_Lin: RWR

• FastAllDAP: Esc_Prob

• BB_Lin: Skewed BGs

• FastUpdate: Time-Evolving

Page 77: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

79

Challenges

• BB_Lin is good for skewed bipartite graphs– for NetFlix (2.7M nodes and 100M edges)– w/ 1.5 hr pre-computation for m x m core matrix– fraction of seconds for on-line query

• But…what if the graph is evolving over time– New edges/nodes arrive; edge weights increase…– 1.5hr itself becomes a part of on-line cost!

Page 78: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

80

1( 0.9 )I M 3( )O m m E t=0

Q: How to update the core matrix?

t=11( 0.9 )I M

~ ~3( )O m m E

?

Page 79: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

Update the core matrix

• Step 1:

• Step 2:

81

M = Ac ArX

1( 0.9 )I M ~

~

~ ?

M= X+

Rank 2 update

= + X

2( )O m 3( )O m m E

Page 80: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

Update : General Case [Tong+ 2008]

• E’ edges changed

• Involves n’ authors, m’ confs.

• Observation

82

M = Ac ArX~

min( ', ') 'n m E

n authors

m Conferences

Page 81: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

83

• Observation: – the rank of update is small!

• Algorithm:– E’ edges changed– Involves n’ authors, m’ confs.

– our Alg.

– (details in the paper)

Update : General Case

2(min( ', ') ')O n m m E3( )O m m E

min( ', ') 'n m E

83

n authors

m Conferences

Page 82: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

84

FastOneUpdate

176x speedup

40x speedup

Time (Seconds)

Datasets

Page 83: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

85

Fast-Batch-Update

Min (n’, m’)E’

Time (Seconds)Time (Seconds)

15x speed-up on average!

Page 84: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

86

Summary of Part II

• Goal: Efficiently Solve Linear System(s)

• Sols.– B_Lin: Approximate one large linear system– FastAllDAP: multiple inner-related linear

systems– BB_Lin: the intrinsic complexity is small– FastUpdate: (smooth) dynamic linear system

Page 85: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

87

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

B_Lin

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

p p p p p p

FastAllDAP

BB_Lin…FastUpdate

Page 86: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

88

• Motivation

• Part I: Definitions

• Part II: Fast Solutions

• Part III: Applications

• Conclusion

Roadmap

• Link Prediction

• NF

• gCap

• CePS

• G-Ray

• pTrack/cTrack

Page 87: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

890 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0

0.05

0.1

0.15

0.2

0.25

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18 Link Prediction: existence

no link

with link

density

density

Prox (ij)+Prox (ji)

Prox (ij)+Prox (ji)

Prox. is effective to distinguish red and blue!

Page 88: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

90

Link Prediction: direction

• Q: Given the existence of the link, what is the direction of the link?

• A: Compare prox(ij) and prox(ji)>70%

Prox (ij) - Prox (ji)

density

Page 89: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

91

Neighborhood Formulation

ICDM

KDD

SDM

Philip S. Yu

IJCAI

NIPS

AAAI M. Jordan

Ning Zhong

R. Ramakrishnan

… …

Conference Author

A: RWR! [Sun ICDM2005]

Q: what is most related conference to ICDM

Page 90: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

92

NF: example

ICDM

KDD

SDM

ECML

PKDD

PAKDD

CIKM

DMKD

SIGMOD

ICML

ICDE

0.009

0.011

0.0080.007

0.005

0.005

0.005

0.0040.004

0.004

Page 91: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

93

gCaP: Automatic Image Caption• Q

Sea Sun Sky Wave{ } { }Cat Forest Grass Tiger

{?, ?, ?,}

?A: RWR! [Pan KDD2004]

Page 92: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

94

Test Image

Sea Sun Sky Wave Cat Forest Tiger Grass

Image

Keyword

Region

Page 93: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

95

Test Image

Sea Sun Sky Wave Cat Forest Tiger Grass

Image

Keyword

Region

{Grass, Forest, Cat, Tiger}

Page 94: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

96

Center-Piece Subgraph(CePS)

A C

B

A C

B

?

Original GraphBlack: query nodes

CePS

Q

A: RWR! [Tong KDD 2006]

Red: Max (Prox(Red, A) x Prox(Red, B) x Prox(Red, C))

CePS guy

Page 95: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

97

CePS: Example

R. Agrawal Jiawei Han

V. Vapnik M. Jordan

H.V. Jagadish

Laks V.S. Lakshmanan

Heikki Mannila

Christos Faloutsos

Padhraic Smyth

Corinna Cortes

15 1013

1 1

6

1 1

4 Daryl Pregibon

10

2

11

3

16

Page 96: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

98

K_SoftAnd: Relaxation of AND

Asking AND query? No Answer!

Disconnected Communities

Noise

Page 97: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

99

1 7

5

10

11

9 8

12

13

4

3

62

0.0103

0.1617

0.1387

0.1617

0.0849

0.1617

0.1617

0.0103

0.1387

0.13870.16170.1617

0.0103

2_SoftAnd

And

1_SoftAnd (OR)

x 1e-4

1 7

5

10

11

9 8

12

13

4

3

62

0.4505

0.1010

0.0710

0.1010

0.2267

0.1010

0.1010

0.4505

0.0710

0.07100.10100.1010

0.4505

1 7

5

10

11

9 8

12

13

4

3

62

0.0103

0.0046

0.0019

0.0046

0.0024

0.0046

0.0046

0.0103

0.0019

0.00190.00460.0046

0.0103

Page 98: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

100

R. Agrawal Jiawei Han

V. Vapnik M. Jordan

H.V. Jagadish

Laks V.S. Lakshmanan

Umeshwar Dayal

Bernhard Scholkopf

Peter L. Bartlett

Alex J. Smola

1510

13

3 3

5 2 2

327

4

CePS: 2 Soft_AND

Stat.

DB

Page 99: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

101

OutputInput

Attributed Data Graph

Query Graph

Matching SubgraphAccountant

CEO

Manager

SEC

Graph X-Ray

Page 100: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

102

G-Ray: How to?

matching node

matching node

matching node

matching node

Goodness = Prox (12, 4) x Prox (4, 12) x Prox (7, 4) x Prox (4, 7) x Prox (11, 7) x Prox (7, 11) x Prox (12, 11) x Prox (11, 12)

Page 101: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

103

Effectiveness: star-query

Query Result

Page 102: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

104

Effectiveness: line-query

Query

Result

Page 103: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

105

Query

Result

Effectiveness: loop-query

Page 104: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

106

pTrack

• [Given] – (1) a large, skewed time-evolving bipartite

graphs, – (2) the query nodes of interest

• [Track] – (1) top-k most related nodes for each query

node at each time step t; – (2) the proximity score (or rank of proximity)

between any two query nodes at each time step t

Author A’ Rank in KDD

Year

Page 105: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

107

Philip S. Yu’s Top-5 conferences up to each year

ICDE

ICDCS

SIGMETRICS

PDIS

VLDB

CIKM

ICDCS

ICDE

SIGMETRICS

ICMCS

KDD

SIGMOD

ICDM

CIKM

ICDCS

ICDM

KDD

ICDE

SDM

VLDB

1992 1997 2002 2007

DatabasesPerformanceDistributed Sys.

DatabasesData Mining

Page 106: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

108

KDD’s Rank wrt. VLDB over years

Rank

Year

Data Mining and Databases are more and more relavant!

Page 107: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

109

cTrack

• [Given] – (1) a large, skewed time-evolving graphs, – (2) the query nodes of interest

• [Track] – (1) top-k most central nodes at each time step t; – (2) the centrality score (or rank of centrality) for

each query node at each time step t

Page 108: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

110

Ranking of Centrality up to each year (in NIPS)

M. Jordan

G.HintonC. Koch

T. Sejnowski

Year

Rank of Influential-ness

Page 109: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

111

10 most influential authors up to each year

Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spreading over 13 years

T. Sejnowski

M. Jordan

Page 110: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

112

A BH1 1

D1 1

E

F

G1 11

I J1

1 1

RWR

Varia

nts

w/ Tim

e

w/ Attr

ibut

e

Gro

up P

orx.

Definitions

B_Lin

FastAllDAP

BB_Lin

FastUpdate

Computations

Link Prediction

NF

gCap

CePS

G-Ray

pTrack

cTrack

Applications

ProximityOn

Graphs

Weighted Multiple

Relationship

EfficientlySolve Linear

System(s)

Use Proximity as

Building block

Page 111: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

Take-home Messages

• Proximity Definitions – RWR

– and a lot of variants

• Computations– SM Lemma

113

(1 )i i ir cWr c e

1 11 1 1

1( )

1

TT

T

A u v AA A u v A

v A u

Page 112: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

References

• L. Page, S. Brin, R. Motwani, & T. Winograd. (1998), The PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Library.

• T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW, 517-526, 2002

• J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, 653-658, 2004.

• C. Faloutsos, K. S. McCurley & A. Tomkins. (2002) Fast discovery of connection subgraphs. In KDD, 118-127, 2004.

• J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, 418-425, 2005.

• W. Cohen. (2007) Graph Walks and Graphical Models. Draft.114

Page 113: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

References

• P. Doyle & J. Snell. (1984) Random walks and electric networks, volume 22. Mathematical Association America, New York.

• Y. Koren, S. C. North, and C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245–255, 2006.

• A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006.

• S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, 571-580, 2007.

• F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355-369 2007.

115

Page 114: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

References

• H. Tong & C. Faloutsos. (2006) Center-piece subgraphs: problem definition and fast solutions. In KDD, 404-413, 2006.

• H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk with Restart and Its Applications. In ICDM, 613-622, 2006.

• H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-aware proximity for graph mining. In KDD, 747-756, 2007.

• H. Tong, B. Gallagher, C. Faloutsos, & T. Eliassi-Rad. (2007) Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007.

• H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. to appear in SDM 2008.

116

Page 115: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

SCS CMU

117

Thank you!

[email protected]

www.cs.cmu.edu/~htong