Proximity on Large Graphs

39
4/8/2008 1 SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture SCS CMU Graphs are everywhere! 2 SCS CMU Graph Mining: the big picture Graph/Global Level Subgraph/ Community Level 3 Node Level We are here!

Transcript of Proximity on Large Graphs

Page 1: Proximity on Large Graphs

4/8/2008

1

SCS CMU

Proximity on Large Graphs

Speaker:

Hanghang Tong

2008-4-10 15-826 Guest Lecture

SCS CMU

Graphs are everywhere!

2

SCS CMU

Graph Mining: the big picture Graph/Global Level

Subgraph/Community Level

3Node Level We are here!

Page 2: Proximity on Large Graphs

4/8/2008

2

SCS CMU

Proximity on Graph: What?

A BH1 1

I J1

1 1

4

D1 1

EF

G1 11

a.k.a Relevance, Closeness, ‘Similarity’…

SCS CMU

Proximity is the main tool behind…

• Link prediction [Liben-Nowell+], [Tong+]• Ranking [Haveliwala], [Chakrabarti+]• Email Management [Minkov+]• Image caption [Pan+]

5

g p [ ]• Neighborhooh Formulation [Sun+]• Conn. subgraph [Faloutsos+], [Tong+], [Koren+]• Pattern match [Tong+]• Collaborative Filtering [Fouss+]• Many more…

Will return to this later

SCS CMU

Roadmap

• Motivation• Part I: Definitions• Part II: Fast Solutions

• Basic: RWR

• Variants

• Asymmetry of Prox.

• Group Prox

• Prox w/ Attributes

6

• Part III: Applications• Conclusion

Prox w/ Attributes

• Prox w/ Time

Page 3: Proximity on Large Graphs

4/8/2008

3

SCS CMU

Why not shortest path?

A BD1 1

A BD1 1

E F11 1

Some ``bad’’ proximities

7

‘pizza delivery guy’ problem

A BD1 1

EF

G1 11 A BD1 1

‘multi-facet’ relationship

SCS CMU

A BD1 1

Why not max. netflow?Some ``bad’’ proximities

8

A BD1 11 E

No punishment on long paths

SCS CMU

What is a ``good’’ Proximity?

A BH1 1

I J1

1 1

9

D1 1

EF

G1 11

• Multiple Connections

• Quality of connection

•Direct & In-directed Conns

•Length, Degree, Weight…

Page 4: Proximity on Large Graphs

4/8/2008

4

SCS CMU

1

2

910

811

12

Random walk with restart

10

4

3

56

7

11

SCS CMU

Random walk with restartNode 4

Node 1Node 2Node 3Node 4Node 5Node 6

0.130.100.130.220.130.05

13

2

910

811

120.13

0.10

0.13

0.08

0.04

0.02

0.04

0.03

11

Node 7Node 8Node 9Node 10Node 11Node 12

0.050.080.040.030.040.02

4

5 6

7

0.13

0.05

0.05

Ranking vector More red, more relevantNearby nodes, higher scores

4r

SCS CMU

Why RWR is a good score?

2c 3cc ...

W 2W 3W

12

c c

all paths from ito j with length 1

all paths from ito j with length 2

all paths from ito j with length 3

Page 5: Proximity on Large Graphs

4/8/2008

5

SCS CMU

Roadmap

• Motivation• Part I: Definitions• Part II: Fast Solutions

• Basic: RWR

• Variants

• Asymmetry of Prox.

• Group Prox

• Prox w/ Attributes

13

• Part III: Applications• Conclusion

Prox w/ Attributes

• Prox w/ Time

SCS CMU

Variant: escape probability• Define Random Walk (RW) on the graph• Esc_Prob(A B)

– Prob (starting at A, reaches B before returning to A)

14Esc_Prob = Pr (smile before cry)

A Bthe remaining graph

SCS CMU

Other Variants

• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]

• Equivalence of Random Walks

15

Equivalence of Random Walks– Electric Networks:

• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] – String Systems

• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]

Page 6: Proximity on Large Graphs

4/8/2008

6

SCS CMU

Other Variants

• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]

• Equivalence of Random WalksAll are related to, or similar to

16

Equivalence of Random Walks– Electric Networks:

• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] – String Systems

• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]

,random walk with restart!

SCS CMU

Roadmap

• Motivation• Part I: Definitions• Part II: Fast Solutions

• Basic: RWR

• Variants

• Asymmetry of Prox.

• Group Prox

• Prox w/ Attributes

17

• Part III: Applications• Conclusion

Prox w/ Attributes

• Prox w/ Time

SCS CMU

Asymmetry of Proximity [Tong+ KDD07 a]

A B

1 1

1 A B

1 1

0.5

18

111

11

10.51

What is Prox from A to B?What is Prox from B to A?

What is Prox between A and B?

Page 7: Proximity on Large Graphs

4/8/2008

7

SCS CMU

Asymmetry also exists in un-directed graphs

• Hanghang’s most important conf. is KDD• The most important author in KDD is ...

19

So is love…Hanghang

KDD

SCS CMU

Roadmap

• Motivation• Part I: Definitions• Part II: Fast Solutions

• Basic: RWR

• Variants

• Asymmetry of Prox.

• Group Prox

• Prox w/ Attributes

20

• Part III: Applications• Conclusion

Prox w/ Attributes

• Prox w/ Time

SCS CMU

Group Proximity [Tong+ 2007]

• Q: How close are Accountants to SECs?

21

• A: Prob (starting at any RED, reaches anyGREEN before touching any RED again)

Page 8: Proximity on Large Graphs

4/8/2008

8

SCS CMU

Proximity on Attribute Graphs

22

What is the proximity from node 7 to 10?If we know that…

SCS CMU

Sol: Augmented graphs

23

SCS CMU

Attributes on nodes/edges (ER graph) [Chakrabarti+ WWW07]

skip

Works

24

Wrote Sent Received

In-Replied-toCited

Page 9: Proximity on Large Graphs

4/8/2008

9

SCS CMU

Proximity w/ Time

• Sol #1: treat time an categorical attr. [Minkov+]

S l #2 t li t i [T ]

25

• Sol #2: aggregate slice matrices [Tong+]

Time

•Global aggregation

•Slide window

•Exponential emphasis

SCS CMU

Summary of Part I

• Goal: Summarize multiple … relationships

• Solutions

26

– Basic: Random Walk with Restart– Property: Asymmetry– Variants: Esc_Prob and many others.– Generalization: Group Prox.; w/ Attr.; w/ Time

SCS CMU

• Motivation• Part I: Definitions• Part II: Fast Solutions

Roadmap

• B_Lin: RWR

• FastAllDAP: Esc_Prob

• BB_Lin: Skewed BGs

27

• Part III: Applications• Conclusion

• FastUpdate: Time-Evolving

Page 10: Proximity on Large Graphs

4/8/2008

10

SCS CMU

Preliminary: Sherman–Morrison Lemma

Tv

uAA =If:

28

1 11 1 1

1( )1

TT

T

A u v AA A u v Av A u

− −− − −

⋅ ⋅= + ⋅ = −+ ⋅ ⋅

Then:

SCS CMU

SM Lemma: Applications

• RLS – and almost any algorithm in time series!

• Leave-one-out cross validation for LS• Kalman filtering• Incremental matrix decomposition• … and all the fast sols we will introduce!

29

SCS CMU

Computing RWR

9 1012

0.13 0 1/3 1/3 1/3 0 0 0 0 0 0 0 00.10 1/3 0 1/3 0 0 0 0 1/4 0 0 0 0.13

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

01/3 1/3 0 1/3 0 0 0 0 0 0 0 0

⎛ ⎞⎜⎜⎜⎜

0.13 00.10 00.13 0

⎛ ⎞ ⎛ ⎞⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟

Ranking vector Starting vectorAdjacent matrix

(1 )i i ir cWr c e= + −Restart p

30

1

43

2

5 6

7

811

120.220.130.05

0.90.050.080.040.030.040.02

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ = ×⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

1/3 0 1/3 0 1/4 0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0 0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 0 0 0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 1/3 1/3 0

⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝ ⎠

0.220.13 00.05 0

0.10.05 00.08 00.04 00.03 00.04 0

2 0

1

0.0

⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟+ ×⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

n x n n x 1n x 1

1

Page 11: Proximity on Large Graphs

4/8/2008

11

SCS CMU

Beyond RWR : Maxwell Equation for Web![Chakrabarti]

31

P-PageRank[Haveliwala]

PageRank[Haveliwala]

RWR[Pan, Sun]

SM Learning[Zhou, Zhu]

RL in CBIR[He]

Fast RWR (B_Lin) Finds the Root Solution !

SCS CMU

• RWR is the building block for computing…– Escape Probability (augmented w/sink)

[Tong+]– Effective Conductanc Resistance Dist

Beyond RWR

32

.. Effective Conductanc Resistance Dist.Commute Time

– MRF (special structure) [Cohen]• Similar Idea of B_Lin to compute other

measurements

SCS CMU

• Q: Given query i, how to solve it?

0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4 0 0 0 0 0 0 0

0001

⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

33

1/3 0 1/3 0 1/4

0.9= ×

0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0

00

0.10000

0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 0 1/3 1/3

1

0 0

⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

+ ×⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

??

Adjacent matrix Starting vector

Page 12: Proximity on Large Graphs

4/8/2008

12

SCS CMU

14

32

6

9 10

8 11120.13

0.10

0.13

0 05

0.08

0.04

0.02

0.04

0.03

OntheFly: 0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4

0.9= ×

0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 0

000

00

0.1000

1

⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

+ ×⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

000100000

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

0.130.100.130.220.130.050.050.080.04

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

34

5 6

70.13

0.05

0.05

0 0 0 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 0 0

0 0 0 1/4 0 1/3 0 1/2 0

0 0 0 0 0 0 0 0 0 1/3 1/3 0 0

⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

000

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

0.030.040.02

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

No pre-computation/ light storage

Slow on-line response O(mE)

ir ir

SCS CMU

0.20 0.13 0.14 0.13 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.340.28 0.20 0.13 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.450.14 0.13 0.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.330.13 0.10 0.13 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.320.09 0.09 0.09 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.560.03 0.04 0.04 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

PreCompute1 2 3 4 5 6 7 8 9 10 11 12r r r r r r r r r r r r

14

32

6

9 10

8 11120.13

0.10

0.13

0 05

0.08

0.04

0.02

0.04

0.03

R:

35

0.03 0.04 0.04 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.220.08 0.11 0.04 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.130.03 0.04 0.03 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.790.04 0.04 0.04 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.800.04 0.05 0.04 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.720.02 0.03 0.02 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

5 6

70.13

0.05

0.05

[Haveliwala]

SCS CMU

2.20 1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.341.28 2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.451.43 1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.331.29 0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.320.91 0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.560.37 0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

PreCompute:

14

32

6

9 10

8 11120.13

0.10

0.13

0.08

0.04

0.02

0.04

0.03

36

0.37 0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.220.84 1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.130.29 0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.790.35 0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.800.39 0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.720.22 0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

5 6

70.13

0.05

0.05

Fast on-line response

Heavy pre-computation/storage costO(n ) O(n )3 2

Page 13: Proximity on Large Graphs

4/8/2008

13

SCS CMU

Q: How to Balance?

37

On-line Off-line

SCS CMU B_Lin: Basic Idea[Tong+]

1 32

9 10

8 11120.13

0.10

0.13

0.08

0.04

0.02

0 04

0.03

13

29 10

811

12

Find Community 14

32

5 6

7

9 10

8 1112

9 10

811

12

14

32

5 6

7

9 10

8 1112

13

2

38

45 6

70.13

0.05

0.05

0.044

5 6

7

Fix the remaining

Combine1

43

2

5 6

7

9 10

8 1112

5 6

7

14

32

5 6

7

9 10

8 1112

4

SCS CMU

Pre-computational stage• Q: • A: A few small, instead of ONE BIG, matrices inversions

Efficiently compute and store Q-1

39

Page 14: Proximity on Large Graphs

4/8/2008

14

SCS CMU

• Q: Efficiently recover one column of Q• A: A few, instead of MANY, matrix-vector multiplication

On-Line Query Stage

0⎛ ⎞⎜ ⎟

-1

ie ir

40

+

00

000

1

00000

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

SCS CMU

Pre-compute Stage

• p1: B_Lin Decomposition– P1.1 partition– P1.2 low-rank approximation

• p2: Q matrices

41

p– P2.1 computing (for each partition)– P2.2 computing (for concept space)

11Q−

Λ

SCS CMU

P1.1: partition

1

43

2

5 6

9 10

811

12

1

43

29 10

811

12

skip

42

5 6

7

4

5 6

7

Within-partition links cross-partition links

Page 15: Proximity on Large Graphs

4/8/2008

15

SCS CMU

P1.1: block-diagonal

1

43

2

5 6

9 10

811

121

43

2

5 6

9 10

811

12

skip

43

5

7

5

7

SCS CMU

P1.2: LRA for

31

4

2

5 6

7

9 10

811

121

43

2

5 6

7

9 10

811

12

skip

44

|S| << |W2|~

SCS CMU

=

skip

45

+

Page 16: Proximity on Large Graphs

4/8/2008

16

SCS CMU

p2.1 Computing skip

46

c

SCS CMU

Comparing and

• Computing Time– 100,000 nodes; 100 partitions– Computing 100,00x is Faster!

St C t

11Q−

11Q−

skip

47

• Storage Cost – 100x saving!

Q1,1

Q1,2

Q1,k

=11Q−

SCS CMU

W +~ ~~

skip

48

11Q− + ?

Page 17: Proximity on Large Graphs

4/8/2008

17

SCS CMU

p2.2 Computing:

1S − UV=_

-1Q

1,1

Q1,2

Q

skip

49

1

43

2

5 6

7

9 10

8 11

12

Q1,k

SCS CMU

We have:

Communities Bridges

skip

50

SM Lemma says:1 1 1

1 11

1U VcQQ Q Q− − −− + Λ≈

SCS CMU

On-Line Stage• Q

+

000

000

1

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ?

ie ir

skip

51

+

Query

000000

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

Result

?

• A (SM lemma)Pre-Computation

Page 18: Proximity on Large Graphs

4/8/2008

18

SCS CMU

On-Line Query Stage

q1:

skip

52

q1:q2:q3:q4:q5:q6:

SCS CMU skip

53

SCS CMU

Query Time vs. Pre-Compute Time

Log Query Time

•Quality: 90%+

54

Log Pre-compute Time

•Quality: 90%+ •On-line:

•Up to 150x speedup•Pre-computation:

•Two orders saving

Page 19: Proximity on Large Graphs

4/8/2008

19

SCS CMU

• Motivation• Part I: Definitions• Part II: Fast Solutions

Roadmap

• B_Lin: RWR

• FastAllDAP: Esc_Prob

• BB_Lin: Skewed BGs

55

• Part III: Applications• Conclusion

• FastUpdate: Time-Evolving

SCS CMU

FastAllDAP [Tong+]

• Q: How to compute – Esc_Prob = Pr (smile before cry)?

56Footnote: augmented w/ universal sink as practical modification

A Bthe remaining graph

SCS CMU

Solving DAP (Straight-forward way)

2 1,

ˆProx( )=c ( )ti j i ji j p I cP p c p−→ − +i i i i

1-c: fly-out probability (to black-hole)

57

One matrix inversion, one proximity!

1 x (n-2) 1 x (n-2)(n-2) x (n-2)

Page 20: Proximity on Large Graphs

4/8/2008

20

SCS CMU1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p pp p p p p pp p p p p pp p p p p pp p p p p pp p p p p p

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

P=1 5

3

2

6

4

0.5 0.5

0.5

0.50.5

0.5

0.5

1

0.5 1

P: Transition matrix (row norm )

58

Esc_Prob(1->5)= I - +

-1P: Transition matrix (row norm.)

2cc

SCS CMU

• Case 1, Medium Size Graph– Matrix inversion is feasible, but…– What if we want many proximities?– Q: How to get all (n ) proximities efficiently?

Challenges

2

59

Q g ( ) p y– A: FastAllDAP!

• Case 2: Large Size Graph – Matrix inversion is infeasible– Q: How to get one proximity efficiently?– A: FastOneDAP! skip

SCS CMU

FastAllDAP

• Q1: How to efficiently compute all possible proximities on a medium size graph?

60

g p– a.k.a. how to efficiently solve multiple

linear systems simultaneously?• Goal: reduce # of matrix inversions!

Page 21: Proximity on Large Graphs

4/8/2008

21

SCS CMU

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p pp p p p p pp p p p p pp p p p p pp p p p p pp p p p p p

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

FastAllDAP: Observation

1 5

3

2

6

4

0.5 0.5

0.5

0.50.5

0.5

0.5

1

0.5 1

P=

61

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p pp p p p p pp p p p p pp p p p p pp p p p p pp p p p p p

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

Need two different matrix inversions!

P=

SCS CMU

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p pp p p p p pp p p p p pp p p p p pp p p p p pp p p p p p

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

FastAllDAP: Rescue

P=

Overlap between

Prox(1 5)

Prox(1 6)

62

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p pp p p p p pp p p p p pp p p p p pp p p p p pp p p p p p

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

Redundancy among different linear systems!

P=

Overlap between two gray parts!

( )

SCS CMU

FastAllDAP: Theorem

• Theorem: • Example:

63

• Proof: by SM Lemma

Page 22: Proximity on Large Graphs

4/8/2008

22

SCS CMU

FastAllDAP: Algorithm• Alg.

– Compute Q– For i,j =1,…, n, compute

64

• Computational Save O(1) instead of O(n )!

• Example– w/ 1000 nodes, – 1m matrix inversion vs. 1 matrix!

2

SCS CMU

FastAllDAP

Time (sec)Straight-Solver 1,000x

faster!

65

Size of Graph

FastAllDAP

SCS CMU

• Motivation• Part I: Definitions• Part II: Fast Solutions

Roadmap

• B_Lin: RWR

• FastAllDAP: Esc_Prob

• BB_Lin: Skewed BGs

66

• Part III: Applications• Conclusion

• FastUpdate: Time-Evolving

Page 23: Proximity on Large Graphs

4/8/2008

23

SCS CMU

RWR on Bipartite Graph

n

authorsAuthor-Conf. Matrix

Observation: n >> m!

Examples:

67

n

mConferences

Examples: 1. DBLP: 400k aus, 3.5k confs2. NetFlix: 2.7M usrs, 18k mvs

SCS CMU

• Q: Given query i, how to solve it?

0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4 0 0 0 0 0 0 0

0001

⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

RWR on Skewed bipartite graphs

… .. .

. .

68

1/3 0 1/3 0 1/4

0.9= ×

0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0

00

0.10000

0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 0 1/3 1/3

1

0 0

⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

+ ×⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

??…. .. . . …. ...

0

0

n m

Ar

… .

. .. . …. .. . . …. ... Ac

SCS CMU

• Step 1:

• Step 2:

BB_Lin: Pre-Computation [Tong+ 06]

M = Ac ArX

1( 0 9 )I M −Λ = ×

2-step RWR for Conferences

All Conf-Conf Prox. Scores

• Step 2:

• Cost:

• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes 69

( 0.9 )I MΛ = − ×3( )O m m E+ ⋅

Page 24: Proximity on Large Graphs

4/8/2008

24

SCS CMU

BB_Lin: Pre-Computation [Tong+ 06]

• Step 1:

• Step 2:

M = Ac ArX

1( 0 9 )I M −Λ = ×

2-step RWR for Conferences

All Conf-Conf Prox. Scores

70

• Step 2: ( 0.9 )I MΛ = − ×

SCS CMU

BB_Lin: Pre-Computation [Tong+ 06]

• Step 1:

• Step 2:

M = Ac ArX

1( 0 9 )I M −Λ = ×

2-step RWR for Conferences

All Conf-Conf Prox. Scores

71

• Step 2:

• Cost:

• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes

( 0.9 )I MΛ = − ×3( )O m m E+ ⋅

Ac/ArE edges

m x m

SCS CMU

BB_Lin: On-Line Stage

Case 1: - Conf - Conf

authors

Conferences

Read out !

72Ac/Ar

E edges

Page 25: Proximity on Large Graphs

4/8/2008

25

SCS CMU

BB_Lin: On-Line Stage

Case 2: - Au - Conf

authors

Conferences

73Ac/Ar

E edges

1 matrix-vec!

SCS CMU

BB_Lin: On-Line Stage

Case 3: - Au - Au

authors

Conferences

74Ac/Ar

E edges

2 m atrix-vec!

SCS CMU

BB_Lin: Examples

• NetFlix dataset (2.7m user x 18k movies)– 1.5hr for pre-computation; – <1 sec for on-line

• DBLP dataset (400k authors x 3.5k confs)– A few minutes for pre-computation– <0.01 sec for on-line

75

Page 26: Proximity on Large Graphs

4/8/2008

26

SCS CMU

• Motivation• Part I: Definitions• Part II: Fast Solutions

Roadmap

• B_Lin: RWR

• FastAllDAP: Esc_Prob

• BB_Lin: Skewed BGs

76

• Part III: Applications• Conclusion

• FastUpdate: Time-Evolving

SCS CMU

Challenges

• BB_Lin is good for skewed bipartite graphs– for NetFlix (2.7M nodes and 100M edges)– w/ 1.5 hr pre-computation for m x m core matrix– fraction of seconds for on-line query

77

• But…what if the graph is evolving over time– New edges/nodes arrive; edge weights increase…– 1.5hr itself becomes a part of on-line cost!

SCS CMU

1( 0.9 )I M −Λ = − × 3( )O m m E+ ⋅t=0

Q: How to update the core matrix?

78

t=11( 0.9 )I M −Λ = − ×

~ ~ 3( )O m m E+ ⋅

?

Page 27: Proximity on Large Graphs

4/8/2008

27

SCS CMU

Update the core matrix

• Step 1:

M = Ac ArX~ M= X+

• Step 2:

79

1( 0.9 )I M −Λ = − ×~ ~ ?Rank 2 update

= + X

2( )O m 3( )O m m E+ ⋅

SCS CMU

Update : General Case [Tong+ 2008]

M = Ac ArX~n authors

m Conferences

• E’ edges changed• Involves n’ authors, m’ confs.

• Observation

80min( ', ') 'n m E

SCS CMU

• Observation: – the rank of update is small!

• Algorithm:E’ d h d

Update : General Case

min( ', ') 'n m E n authors

m Conferences

81

– E’ edges changed– Involves n’ authors, m’ confs.

– our Alg.

– (details in the paper)

2(min( ', ') ')O n m m E+3( )O m m E+ ⋅

81

Page 28: Proximity on Large Graphs

4/8/2008

28

SCS CMU

FastOneUpdateTime (Seconds)

82

176x speedup

40x speedup

Datasets

SCS CMU

Fast-Batch-UpdateTime (Seconds)Time (Seconds)

83

Min (n’, m’)E’

15x speed-up on average!

SCS CMU

Summary of Part II

• Goal: Efficiently Solve Linear System(s)

• Sols.

84

– B_Lin: Approximate one large linear system– FastAllDAP: multiple inner-related linear

systems– BB_Lin: the intrinsic complexity is small– FastUpdate: (smooth) dynamic linear system

Page 29: Proximity on Large Graphs

4/8/2008

29

SCS CMU1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p pp p p p p pp p p p p pp p p p p pp p p p p pp p p p p p

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3 1 3 2 3 3 3 4 3 5 3 6

p p p p p pp p p p p pp p p p p p

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥B Li

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p pp p p p p pp p p p p pp p p p p pp p p p p pp p p p p p

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p pp p p p p pp p p p p pp p p p p pp p p p p pp p p p p p

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

FastAllDAP…

85

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p pp p p p p pp p p p p pp p p p p p

⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

B_Lin

1,1 1,2 1,3 1,4 1,5 1,6

2,1 2,2 2,3 2,4 2,5 2,6

3,1 3,2 3,3 3,4 3,5 3,6

4,1 4,2 4,3 4,4 4,5 4,6

5,1 5,2 5,3 5,4 5,5 5,6

6,1 6,2 6,3 6,4 6,5 6,6

p p p p p pp p p p p pp p p p p pp p p p p pp p p p p pp p p p p p

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

BB_Lin…FastUpdate

SCS CMU

• Motivation• Part I: Definitions• Part II: Fast Solutions

Roadmap

• Link Prediction

• NF

• gCap

86

• Part III: Applications• Conclusion

• CePS

• G-Ray

• pTrack/cTrack

SCS CMU

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18 Link Prediction: existence

with link

density

Prox (i j)+Prox (j i)

P i ff ti t di ti i h d d bl !

870 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0

0.05

0.1

0.15

0.2

0.25

no link

density

Prox (i j)+Prox (j i)

Prox. is effective to distinguish red and blue!

Page 30: Proximity on Large Graphs

4/8/2008

30

SCS CMU

Link Prediction: direction

• Q: Given the existence of the link, what is the direction of the link?

• A: Compare prox(i j) and prox(j i)>70%

88

>70%

Prox (i j) - Prox (j i)

density

SCS CMU

Neighborhood Formulation

KDDPhilip S. Yu

IJCAI

Ning Zhong

… …

Q: what is most relatedconference to ICDM

89

ICDM

SDM

NIPS

AAAI M. Jordan

Ning Zhong

R. Ramakrishnan

…Conference Author

A: RWR![Sun ICDM2005]

SCS CMU

NF: example

KDD

SDM

PKDD

PAKDD

ICML0.009

0.0080.007

0.005

90

ICDM

KDD

ECML

CIKM

DMKD

SIGMOD

ICML

ICDE

0.011

0.005

0.0050.004

0.004

0.004

Page 31: Proximity on Large Graphs

4/8/2008

31

SCS CMU

gCaP: Automatic Image Caption• Q

Sea Sun Sky Wave{ } { }Cat Forest Grass Tiger

91

{?, ?, ?,}

?A: RWR!

[Pan KDD2004]

SCS CMU

Image

Region

92

Test Image

Sea Sun Sky Wave Cat Forest Tiger Grass

Image

Keyword

SCS CMU

Image

Region

93

Test Image

Sea Sun Sky Wave Cat Forest Tiger Grass

Image

Keyword

Page 32: Proximity on Large Graphs

4/8/2008

32

SCS CMU

Center-Piece Subgraph(CePS)

B

?

Q

CePS guy

94

A C

?

Original GraphBlack: query nodes

CePSA: RWR! [Tong KDD 2006]

Red: Max (Prox(Red, A) x Prox(Red, B) x Prox(Red, C))

SCS CMU

CePS: Example

R. Agrawal Jiawei Han

H.V. Jagadish

Laks V.S. Lakshmanan

Heikki

15 10 13

1 110

95

V. Vapnik M. Jordan

Mannila

Christos Faloutsos

Padhraic Smyth

Corinna Cortes

1 1

6

1 1

4 Daryl Pregibon

2

11 3

16

SCS CMU

K_SoftAnd: Relaxation of AND

96

Asking AND query? No Answer!

Disconnected Communities

Noise

Page 33: Proximity on Large Graphs

4/8/2008

33

SCS CMU

0.0103

2_SoftAnd

x 1e-4 5

11 124

0.07100.10100.1010

0.4505

1 7

5

10

11

9 8

12

13

4

3

62

0.0103

0.0046

0.0019

0.0046

0.0024

0.0046

0.0046

0.0103

0.0019

0.00190.00460.0046

0.0103

97

1 7

5

10

11

9 8

12

13

4

3

62

0.0103

0.1617

0.1387

0.1617

0.0849

0.1617

0.1617

0.0103

0.1387

0.13870.16170.1617

And

1_SoftAnd (OR)

1 7

10

9 8

133

62

0.4505

0.1010

0.0710

0.1010

0.2267

0.1010

0.1010

0.4505

0.0710

SCS CMU

R. Agrawal Jiawei Han

H.V. Jagadish

Laks V.S. Lakshmanan

Umeshwar D l

1510 13

3 3

CePS: 2 Soft_ANDDB

98

V. Vapnik M. Jordan

Dayal

Bernhard Scholkopf

Peter L. Bartlett

Alex J. Smola

5 2 2

327

4

Stat.

SCS CMU OutputInputQuery Graph

99Attributed Data Graph

Matching Subgraph

Graph X-Ray

Page 34: Proximity on Large Graphs

4/8/2008

34

SCS CMU

G-Ray: How to?

matching nodematching node

matching node

100

matching node

Goodness = Prox (12, 4) x Prox (4, 12) xProx (7, 4) x Prox (4, 7) x

Prox (11, 7) x Prox (7, 11) xProx (12, 11) x Prox (11, 12)

SCS CMU

Effectiveness: star-query

101

Query Result

SCS CMU

Effectiveness: line-query

102

Query

Result

Page 35: Proximity on Large Graphs

4/8/2008

35

SCS CMU

Query

Effectiveness: loop-query

103

Result

SCS CMU

pTrack

• [Given]– (1) a large, skewed time-evolving bipartite

graphs, – (2) the query nodes of interest

Author A’ Rank in KDD

Year

104

• [Track]– (1) top-k most related nodes for each query

node at each time step t; – (2) the proximity score (or rank of proximity)

between any two query nodes at each time step t

SCS CMU

Philip S. Yu’s Top-5 conferences up to each year

ICDEICDCS

SIGMETRICSPDISVLDB

CIKMICDCSICDE

SIGMETRICSICMCS

KDDSIGMOD

ICDMCIKM

ICDCS

ICDMKDDICDESDMVLDB

105

1992 1997 2002 2007

DatabasesPerformanceDistributed Sys.

DatabasesData Mining

Page 36: Proximity on Large Graphs

4/8/2008

36

SCS CMU

KDD’s Rank wrt. VLDB over years

Rank

106

Year

Data Mining and Databases are more and more relavant!

SCS CMU

cTrack

• [Given]– (1) a large, skewed time-evolving graphs, – (2) the query nodes of interest

[T k]

107

• [Track]– (1) top-k most central nodes at each time step t; – (2) the centrality score (or rank of centrality) for

each query node at each time step t

SCS CMU

Ranking of Centrality up to each year (in NIPS)

T. SejnowskiRank of Influential-ness

108

M. Jordan

G.HintonC. Koch

Year

Page 37: Proximity on Large Graphs

4/8/2008

37

SCS CMU 10 most influential authors up to each year

T. Sejnowski

109

Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spreading over 13 years

M. Jordan

SCS CMU

FastAllDAP

BB_Lin

FastUpdate

Computations

Link Prediction

NF

gCap

CePS

G-Ray

pTrack

cTrack

Applications

EfficientlySolve Linear

System(s)

Use Proximity as

Building block

110

A BH1 1

D1 1

EF

G1 11

I J11 1

Definitions

B_LinFastAllDAPcTrack

ProximityOn

Graphs

Weighted Multiple

Relationship

System(s)

SCS CMU

Take-home Messages

• Proximity Definitions – RWR– and a lot of variants

• Computations– SM Lemma

111

1 11 1 1

1( )1

TT

T

A u v AA A u v Av A u

− −− − −

⋅ ⋅= + ⋅ = −+ ⋅ ⋅

Page 38: Proximity on Large Graphs

4/8/2008

38

SCS CMU

References• L. Page, S. Brin, R. Motwani, & T. Winograd. (1998), The

PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Library.

• T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW, 517-526, 2002

• J Y Pan H J Yang C Faloutsos & P Duygulu (2004) Automatic• J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, 653-658, 2004.

• C. Faloutsos, K. S. McCurley & A. Tomkins. (2002) Fast discovery of connection subgraphs. In KDD, 118-127, 2004.

• J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, 418-425, 2005.

• W. Cohen. (2007) Graph Walks and Graphical Models. Draft.112

SCS CMU

References• P. Doyle & J. Snell. (1984) Random walks and electric

networks, volume 22. Mathematical Association America, New York.

• Y. Koren, S. C. North, and C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245–255, 2006.

• A Agarwal S Chakrabarti & S Aggarwal (2006) Learning to• A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006.

• S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, 571-580, 2007.

• F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355-369 2007.

113

SCS CMU

References• H. Tong & C. Faloutsos. (2006) Center-piece subgraphs:

problem definition and fast solutions. In KDD, 404-413, 2006.• H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk

with Restart and Its Applications. In ICDM, 613-622, 2006.• H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-

aware proximity for graph mining In KDD 747 756 2007aware proximity for graph mining. In KDD, 747-756, 2007.• H. Tong, B. Gallagher, C. Faloutsos, & T. Eliassi-Rad. (2007)

Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007.

• H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. to appear in SDM 2008.

114

Page 39: Proximity on Large Graphs

4/8/2008

39

SCS CMU

Thank you!

115

[email protected]

www.cs.cmu.edu/~htong