Proximity on Graphs - Carnegie Mellon School of Computer...

39
H. Tong 15-826 Guest-lecture 1 SCS CMU Proximity on Graphs -Definitions, Fast Solutions, and Applications Hanghang Tong [email protected] SCS CMU - - - - - Graphs are everywhere! Internet Map [Koren 2009] Food Web [2007] Terrorist Network 2 Protein Network [Salthe 2004] Social Network [Newman 2005] Web Graph [Krebs 2002] SCS CMU Graph Mining: Big Picture John Smith Tom Jones Alan Peter Adam + Graph Level -Patterns -Laws -Generators + Subgraph Level - Community Adam Dan Jack Alice Amy Anna Beck Tom Cell Phone Message E-Mail Community + Node Level -Association -Correlation -Causality -Proximity Anna We are here! 826 guest lecture 3

Transcript of Proximity on Graphs - Carnegie Mellon School of Computer...

Page 1: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 1

SCS CMU

Proximity on Graphs-Definitions, Fast Solutions, and Applications

Hanghang [email protected]

SCS CMU

-----

Graphs are everywhere!

Internet Map [Koren 2009] Food Web [2007] Terrorist Network

2

p [ ] [ ]

Protein Network [Salthe 2004]

Social Network [Newman 2005]

Web Graph

[Krebs 2002]

SCS CMU

Graph Mining: Big Picture

John

Smith

TomJones

Alan

Peter

Adam

+ Graph Level-Patterns-Laws-Generators

+ Subgraph Level- Community

Adam

Dan

Jack

Alice

Amy Anna

Beck

Tom

Cell PhoneMessageE-Mail

Community+ Node Level

-Association-Correlation-Causality-Proximity

Anna

We are here! 826 guest lecture 3

Page 2: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 2

SCS CMU

Proximity on Graph: What?

a.k.a Relevance, Closeness, ‘Similarity’…826 guest lecture 4

SCS CMU

Proximity on Graphs: Why?

• Link prediction [Liben-Nowell+], [Tong+]• Ranking [Haveliwala], [Chakrabarti+]• Email Management [Minkov+]• Image caption [Pan+]g p [ ]• Neighborhooh Formulation [Sun+]• Conn. subgraph [Faloutsos+], [Tong+], [Koren+]• Pattern match [Tong+]• Collaborative Filtering [Fouss+]• Many more…

Will return to this later826 guest lecture 5

SCS CMU

0.2

0.25

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

Link Prediction

Prox. Hist. for a set of deleted links

density

density

Prox (i j)+Prox (j i)

Prox. is effective to ‘deleted’ and absent edges!

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090

0.05

0.1

0.15

Prox (i j)+Prox (j i)

Q: How to predict the existence of the link?A: Proximity! [Liben-Nowell + 2003]

Prox. Hist. for a set of absent links

826 guest lecture 6

Page 3: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 3

SCS CMU Neighborhood Search on graphs

… …

Conference Author

A: Proximity! [Sun+ ICDM2005]

Q: what is most related conference to ICDM?

826 guest lecture 7

SCS CMU

Image

Region Automatic Image Caption

Test Image

Sea Sun Sky Wave Cat Forest Tiger Grass

Keyword

Q: How to assign keywords to the test image?A: Proximity! [Pan+ 2004]

826 guest lecture 8

SCS CMU

A A

Center-Piece Subgraph Discovery

Input: original graph Output: CePS

CePS Node

3-9

B C B C

Q: How to find hub for the black nodes?A: Proximity! [Tong+ KDD 2006]

Page 4: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 4

SCS CMU

OutputInput

Data Graph

Query Graph

Best-Effort Pattern Match

Matching Subgraph

Q: How to find matching subgraph?A: Proximity![Tong+ KDD 2007 b]826 guest lecture

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• Basic: RWR

• Variants

• Properties

• Applications• Conclusion

• Generalizations

826 guest lecture 11

SCS CMU

Why not shortest path?Some ``bad’’ proximities

‘pizza delivery guy’ problem

A BD1 1

‘multi-facet’ relationship

826 guest lecture 12

Page 5: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 5

SCS CMU

Why not max. netflow?Some ``bad’’ proximities

No punishment on long paths

826 guest lecture 13

SCS CMU

Random Walk with Restart Node 4

Node 1Node 2Node 3Node 4Node 5Node 6

0.130.100.130.220.130.05

13

2

910

811

120.13

0.10

0.13

0.08

0.04

0.02

0.04

0.03

Node 7Node 8Node 9Node 10Node 11Node 12

0.050.080.040.030.040.02

4

5 6

7

0.13

0.05

0.05

Ranking vector More red, more relevantNearby nodes, higher scores

4r14

SCS CMU

RWR: Think of it as Wine Spill

15

1. Spill a drop of wine on cloth 2. Spread/diffuse to the neighborhood

Page 6: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 6

SCS CMU

13

2

910

811

12

RWR: Wine Spill on a Graph

16

4

3

56

7

11

wine spill on cloth RWR on a graph

Query

Same Diffusion Eq.

SCS CMU

13

2

910

811

12

Random Walk with Restart

4

3

56

7

17Same Diffusion Eq.

SCS CMU

Why RWR is a good score?

W : adjacency matrix. c: damping factor

1( )Q I cW −= − =

,( , ) i jQ i j r∝

i

j

2c 3cQ c= ...W 2W 3W

all paths from ito j with length 1

all paths from ito j with length 2

all paths from ito j with length 3

826 guest lecture 18

Page 7: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 7

SCS CMU

Why RWR is A Good Score?

Prox (A B) =A B

High proximity many, short, heavy-weighted paths

Prox (A, B) = Score (Red Path) + Score (Green Path) + Score (Blue Path) +Score (Purple Path) +

19

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• Basic: RWR

• Variants

• Properties

• Applications• Conclusion

• Generalizations

826 guest lecture 20

SCS CMU

Variant: escape probability• Define Random Walk (RW) on the graph• Esc_Prob(CMU Napa)

– Prob (starting at CMU, reaches Napa before returning to CMU)

CMU Napathe remaining graph

Esc_Prob = Pr (smile before cry) 21

Page 8: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 8

SCS CMU

Other Variants

• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]

• Equivalence of Random WalksAll are “related to” or “similar to”Equivalence of Random Walks– Electric Networks:

• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] – String Systems

• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]

All are “related to” or “similar to” random walk with restart!

826 guest lecture 22

SCS CMU

Chaptering different measurements

RWR

Esc_Prob+ Sink

Hitting Time/Commute Time

RegularizedUn-constrainedQuad Opt.

relax

4 ssp decides 1 esc_prob

KatzNormalize

+ Sink Commute Time

Effective Conductance

String System

Harmonic Func.ConstrainedQuad Opt.

Mathematic Tools

X out-degree

“voltage = position”

Physical Models

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• Basic: RWR

• Variants

• Properties

• Applications• Conclusion

• Generalizations

826 guest lecture 24

Page 9: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 9

SCS CMU

Property: Monotonicity

We want: Prox ( ) Prox ( )candi origianla b a b≤→ →A: degree preserving! [Koren+ KDD06][Tong+ KDD07a][Tong+ SDM08]

826 guest lecture25

SCS CMU

Property: Asymmetry [Tong+ KDD07 a]

What is Prox from A to B?What is Prox from B to A?

What is Prox between A and B?

826 guest lecture 26

SCS CMU

Asymmetry in un-directed graphs• Hanghang’s # 1 Conf. is CIKM• The #1 author of CIKM is ...

So is love…

HanghangCIKM

826 guest lecture27

Page 10: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 10

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• Basic: RWR

• Variants

• Properties

• Applications• Conclusion

• Generalizations

826 guest lecture 28

SCS CMU

Group Proximity [Tong+ KDD07 a]

• Q: How close are Accountants to SECs?

826 guest lecture

• A: Prob (starting at any RED, reaches anyGREEN before touching any RED again)

29

SCS CMU

Proximity on Attributed Graphs [Tong+ KDD07 b]

What is the proximity from node 7 to 10?If we know that…

826 guest lecture30

Page 11: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 11

SCS CMU

A: Augmented graphs [Tong+ KDD07 b]

826 guest lecture 31

SCS CMU

More on Generalizations

• Attributed on edges [Chakrabarti+ KDD 06]

• Proximity w/ Time– [Minkov+], [Tong+ SDM 2008], [Tong+ CIKM 2008]

• Proximity w/ Side Information [Tong+ 2008]

• …

826 guest lecture 32

SCS CMU

Summary of Proximity Definitions• Goal: Summarize multiple … relationship

• Solutions– Basic: Random Walk with Restart

• [Pan+ 2004][Sun+ 2006][Tong+ 2006][ ][ ][ g ]

– Properties: Asymmetry, monotonicity• [Koren+ 2006][Tong+ 2007] [Tong+ 2008]

– Variants: Esc_Prob and many others.• [Faloutsos+ 2004] [Koren+ 2006][Tong+ 2007]

– Generalizations: Group Prox, w/ Attr., w/ Time, w/ Side Information

• [Charkrabarti+ 2006][Tong+ 2007] [Tong+ 2008]826 guest lecture

33

Page 12: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 12

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions• Applications• Conclusion

826 guest lecture 34

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• B_Lin: RWR

• BB_Lin: Skewed BGs

• Applications• Conclusion

826 guest lecture

• FastUpdate: Time-Evolving

35

SCS CMU

Preliminary: Sherman–Morrison Lemma

Tv

uAA =If:

1 11 1 1

1( )1

TT

T

A u v AA A u v Av A u

− −− − −

⋅ ⋅= + ⋅ = −

+ ⋅ ⋅

Then:

826 guest lecture 36

Page 13: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 13

SCS CMU

SM: The block-form A

D

B

C1 1 1 1 1 1 1 1 1

1 1 1 1 1

( ) ( )( ) ( )

A B A A B D CA B CA A B D CA BC D D CA B CA D CA B

− − − − − − − − −

− − − − −

⎡ ⎤+ − − −⎡ ⎤= ⎢ ⎥⎢ ⎥ − − −⎢ ⎥⎣ ⎦ ⎣ ⎦

Or1 1 1 1 1 1

1 1 1 1 1

( ) ( )( ) ( )

A B A BD C A B D CA BC D D C A BD C D CA B

− − − − − −

− − − − −

⎡ ⎤− − −⎡ ⎤= ⎢ ⎥⎢ ⎥ − − −⎢ ⎥⎣ ⎦ ⎣ ⎦

And many other variants…Also known as Woodburg Identity

Or…

826 guest lecture 37

SCS CMU

SM Lemma: Applications

• RLS (Recursive least square)– and almost any algorithm in time series!

• Leave-one-out cross validation for LSR• Kalman filtering• Incremental matrix decomposition• …• … and all the fast sol.s we will introduce!

826 guest lecture 38

SCS CMU

Computing RWR

9 1012

0.13 0 1/3 1/3 1/3 0 0 0 0 0 0 0 00.10 1/3 0 1/3 0 0 0 0 1/4 0 0 0 0.13

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

01/3 1/3 0 1/3 0 0 0 0 0 0 0 0

⎛ ⎞⎜⎜⎜⎜

0.13 00.10 00.13 0

⎛ ⎞ ⎛ ⎞⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟

Ranking vector Starting vectorAdjacency matrix

(1 )i i ir cWr c e= + −Restart p

1

43

2

5 6

7

811

120.220.130.05

0.90.050.080.040.030.040.02

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ = ×⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

1/3 0 1/3 0 1/4 0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0 0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 1/3 1/3 0

⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝ ⎠

0.220.13 00.05 0

0.10.05 00.08 00.04 00.03 00.04 0

2 0

1

0.0

⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟+ ×⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

n x n n x 1n x 1

1

826 guest lecture 39

Page 14: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 14

SCS CMU

0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4 0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

000

00

1

⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

Q: Given query i, how to solve it?

??Query

0.9= ×0 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0

00.1

0000

0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 0 1/3 1/3 0 0

⎜ ⎟ ⎜ ⎟+ ×⎜ ⎟ ⎜ ⎟

⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

??

Adjacency matrix Starting vectorRanking vectorRanking vector

826 guest lecture 40

SCS CMU

14

32

6

9 10

8 11120.13

0.10

0.13

0 05

0.08

0.04

0.02

0.04

0.03

OntheFly: 0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4

0.9= ×

0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 0

000

00

0.1000

1

⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

+ ×⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

000100000

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

0.130.100.130.220.130.050.050.080.04

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

1

43

2

6

9 10

8 11

12

0.300.30.10.30000

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

0.120.180.120.350.030.070.070.070

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

0.190.090.190.180.180.040.040.060.02

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

0.140.130.140.260.100.060.060.080.01

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

0.160.100.160.210.150.050.050.070.02

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

0.130.100.130.220.130.050.050.080.04

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ 5 6

70.13

0.05

0.05

0 0 0 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 0 0

0 0 0 1/4 0 1/3 0 1/2 0

0 0 0 0 0 0 0 0 0 1/3 1/3 0 0

⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

000

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

0.030.040.02

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

5 6

7

000

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

000

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

00.020

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

0.010.010

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

0.010.020.01

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

0.030.040.02

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

No pre-computation/ light storage

Slow on-line response O(mE)

ir ir

826 guest lecture 41

SCS CMU

0.20 0.13 0.14 0.13 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.340.28 0.20 0.13 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.450.14 0.13 0.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.330.13 0.10 0.13 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.320.09 0.09 0.09 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.560.03 0.04 0.04 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

4

PreCompute1 2 3 4 5 6 7 8 9 10 11 12r r r r r r r r r r r r

14

32

6

9 10

8 11120.13

0.10

0.13

0 05

0.08

0.04

0.02

0.04

0.03

1 32

6

9 10

8 11

12

R:0.03 0.04 0.04 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.220.08 0.11 0.04 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.130.03 0.04 0.03 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.790.04 0.04 0.04 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.800.04 0.05 0.04 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.720.02 0.03 0.02 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

5 6

70.13

0.05

0.05

5 6

7

[Haveliwala+ 2002]c x Q

Q 826 guest lecture

42

Page 15: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 15

SCS CMU

2.20 1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.341.28 2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.451.43 1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.331.29 0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.320.91 0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.560.37 0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

PreCompute:

14

32

6

9 10

8 11120.13

0.10

0.13

0.08

0.04

0.02

0.04

0.03

1

43

29 10

8 11

12

0.130.100.130.220.130.050 05

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟0.37 0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22

0.84 1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.130.29 0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.790.35 0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.800.39 0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.720.22 0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

5 6

70.13

0.05

0.05

5 6

7

Fast on-line response

Heavy pre-computation/storage costO(n ) O(n )

0.050.080.040.030.040.02

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

3 2 43

SCS CMU

Q: How to Balance?

On-line Off-line

826 guest lecture 44

SCS CMU

B_Lin: Pre-Compute[Tong+ ICDM 2006]

10

Find Communities

10 1010

Compute & Store Within-Communities Scores

Q33Q11

1

4

3

2

5 6

7

9 10

811

12

5 6

7

9 10

811

12

1

4

3

2

45

1

4

3

2

5 6

7

9 10

811

12

5 6

7

9 10

811

12

1

4

3

2

Q22

Q11

Page 16: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 16

SCS CMU

B_Lin: On-Line[Tong+ ICDM 2006]

13

29 10

811

120.130.10

0.13

0.08

0.04

0.02

0 04

0.03

13

2

9 10

811

12

Find Communities

9 10

811

12

1

43

2

13

2

1

43

2

4

5 6

70.13

0.05

0.05

0.044

5 6

7

Estimate the impact of the bridges

Combine1

43

2

5 6

7

9 10

811

12

5 6

7

1

43

2

5 6

7

9 10

811

12

4

46

SCS CMU

B_Lin: details

+W =

W1,1

W2,2

W2,2

+~~

W1: within community Cross community 47

W1,1

W2,2

W2,2

U

S V

SCS CMU

B_Lin: details

-1 -1

W1,1

W2,2

W2,2

U

S V

WI – c ~~ I – c – cUSVW1

Easy to be inverted LRA difference

48

Page 17: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 17

SCS CMU

B_Lin: details

-1 -1

W1,1

W2,2

W2,2

U

S V

WI – c ~~ I – c – cUSVW1

Easy to be inverted LRA difference

Sherman–Morrison Lemma: If B=A-USVThen B-1 = A-1 + A-1U(S-1-VA-1U)-1VA-1

A USV

SCS CMU

B_Lin: Pre-Compute Stage• Q: Efficiently compute and store Q• A: A few small, instead of ONE BIG, inverted

matrices Proximities within communities Effect of bridges

50Footnote: Q1=(I-cW1)-1

U

VQ1,1

Q2,2

Qk,k

>Q1,1

Q2,2

Qk,k

. . .Q1=

SCS CMU

B_Lin: On-Line Stage• Q: Efficiently recover one column of Q• A: A few, instead of MANY, matrix-vector

multiplicationsriei

51

U

Vii

>

Pre-computed Query Result51

Page 18: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 18

SCS CMU

Query Time vs. Pre-Compute Time

Log Query Time

•Quality: 90%+ •On-line:

Log Pre-compute Time

•Up to 150x speedup•Pre-computation:

•Two orders saving

826 guest lecture 52

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• B_Lin: RWR

• BB_Lin: Skewed BGs

• Applications• Conclusion

826 guest lecture

• FastUpdate: Time-Evolving

53

SCS CMU

RWR on Bipartite Graph

n

authorsAuthor-Conf. Matrix

Observation: n >> m!

Examples: n

mConferences

p1. DBLP: 400k aus, 3.5k confs2. NetFlix: 2.7M usrs,18k mvs

826 guest lecture 54

Page 19: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 19

SCS CMU

• Q: Given query i, how to solve it?

0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4 0 0 0 0 0 0 0

0001

⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

RWR on Skewed bipartite graphs

… .. .

. .

m confs

1/3 0 1/3 0 1/4

0.9= ×

0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0

00

0.10000

0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 0 1/3 1/3

1

0 0

⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

+ ×⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

??…. .. . . …. ...

0

0

n m

Ar

… .

. .. . …. .. . . …. ... Ac n aus

826 guest lecture 55

SCS CMU

• Step 1:

• Step 2:

BB_Lin: Pre-Computation dd[Tong+ ICDM 06]

M = AcArX

1( 0 9 )I M −Λ = ×

2-step RWR for Conferences

All Conf-Conf Prox. Scores

m conferences

• Step 2:

• Cost:

• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes

( 0.9 )I MΛ = − ×3( )O m m E+ ⋅

n authors826 guest lecture 56

SCS CMU

• Step 1:

• Step 2:

M = AcArX

1( 0 9 )I M −Λ = ×

2-step RWR for Conferences

All Conf-Conf Prox. Scores

m conferences

BB_Lin: Pre-Computation dd[Tong+ ICDM 06] details

• Step 2: ( 0.9 )I MΛ = − ×

n authors826 guest lecture 57

Page 20: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 20

SCS CMU

• Step 1:

• Step 2:

M = Ac ArX

1( 0 9 )I M −Λ = ×

2-step RWR for Conferences

All Conf-Conf Prox. Scores

BB_Lin: Pre-Computation dd[Tong+ ICDM 06] details

• Step 2:

• Cost:

• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes

( 0.9 )I MΛ = − ×

3( )O m m E+ ⋅

Ac/ArE edges

m x m

826 guest lecture58

SCS CMU

BB_Lin: On-Line Stage

(Base) Case 1: - Conf - Conf

authors

Conferences

Read out !

Ac/ArE edges826 guest lecture 59

SCS CMU

BB_Lin: On-Line Stage

Case 2: - Au - Conf

authors

Conferences

Ac/ArE edges

1 matrix-vec!

826 guest lecture 60

Page 21: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 21

SCS CMU

BB_Lin: On-Line Stage

Case 3: - Au - Au

authors

Conferences

Ac/ArE edges

2 matrix-vec!

826 guest lecture 61

SCS CMU

BB_Lin: Examples

Dataset Off-Line Cost On-Line CostDBLP a few minutes frac. of sec.

NetFlix 1.5 hours <0.01 sec.

400k authors x 3.5k conf.s 2.7m user

x 18k movies

826 guest lecture 62

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• B_Lin: RWR

• BB_Lin: Skewed BGs

• Applications• Conclusion

826 guest lecture

• FastUpdate: Time-Evolving

63

Page 22: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 22

SCS CMU

Challenges

• BB_Lin is good for skewed bipartite graphs– for NetFlix (2.7M nodes and 100M edges)– On-line cost for query: fraction of seconds

• w/ 1.5 hr pre-computation for m x m core matrix

• But…what if the graph is evolving over time– New edges/nodes arrive; edge weights increase…– On-line cost: 1.5hr itself becomes a part of this!

826 guest lecture 64

SCS CMU

1( 0.9 )I M −Λ = − × 3( )O m m E+ ⋅t=0

Q: How to update the core matrix?

t=11( 0.9 )I M −Λ = − ×

~ ~ 3( )O m m E+ ⋅

?826 guest lecture 65

SCS CMU

Update the core matrix• Step 1:

M = Ac ArX~ M= X+

details

• Step 2:1( 0.9 )I M −Λ = − ×

~ ~ Rank 2 update

= + X

2( )O m 3( )O m m E+ ⋅826 guest lecture 66

Page 23: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 23

SCS CMU

Update : General Case

M = AcArX~

n authors

m Conferences

details

• E’ edges changed• Involves n’ authors, m’ confs.

• Observationmin( ', ') 'n m E

826 guest lecture 67

SCS CMU

• Observation: – the rank of update is small!– Real Example (DBLP Post)

• 1258 time steps

Update : General Case

min( ', ') 'n m E n authors

m Conferences

details

• 1258 time steps• E’ up to ~20,000!• min(n’,m’) <=132

• Our Algorithm2(min( ', ') ')O n m m E+

3( )O m m E+ ⋅826 guest lecture 68

SCS CMU

176xlog(Time) (Seconds)

Speed Comparison

6969

40x

Data Sets

Our

DBLP Netflix

Our

Page 24: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 24

SCS CMU

More on “Fast Solutions”

• FastAllDAP– Simultaneously solve multiple linear systems– [Tong+ KDD 2007 a]

• MT3– Multiple-Resolution Analysis on Time [Tong+ CIKM 2008]

• Fast-ProSIN– On-Line response for users’ feedback [Tong+ ICDM 2008]

826 guest lecture 70

SCS CMU

Summary of Fast Solutions

• Goal: Efficiently Solve Linear System(s)

• Sols.– B_Lin: one large linear system [Tong+ ICDM06]

– BB_Lin: the intrinsic complexity is small [Tong+ ICDM06]

– FastUpdate: dynamic linear system [Tong+ SDM08]– FastAllDAP: multiple linear systems [Tong+ KDD07 a]– MT3: [Tong+ CIKM 2008]

– Fast-ProSIN: [Tong+ 2008]826 guest lecture 71

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions• Applications• Conclusion

826 guest lecture 72

Page 25: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 25

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• Link Prediction & +

• Ranking Related Tasks• Applications• Conclusion

826 guest lecture

• Ranking Related Tasks

• User Specific Patterns

• Time Related Tasks

73

SCS CMU

i

i i

i

i i

Link Prediction: existence

826 guest lecture

i

i

i

i

Prox (deleted) >> Prox (absent) !Footnote: - Red pair: ``deleted’’; - Blue pair: ``absent’’

74

SCS CMU

0.2

0.25

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18 Link Prediction:existence

Prox. Hist. for a set of deleted links

density

density

Prox (i j)+Prox (j i)

Prox. is effective to ‘deleted’ and absent edges!

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090

0.05

0.1

0.15

Prox (i j)+Prox (j i)

Q: How to predict the existence of the link?A: Proximity! [Liben-Nowell + 2003]

Prox. Hist. for a set of absent links

826 guest lecture 75

Page 26: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 26

SCS CMU

Link Prediction: direction [Tong+ KDD 07 a]

• Q: Given the existence of the link, what is the direction of the link?

• A: Compare prox(i j) and prox(j i)>70%

i

j

i

i

i

>70%

Prox (i j) - Prox (j i)

density

826 guest lecture 76

SCS CMU

Beyond Link Prediction

• Collaborative Filtering [Fouss+]

• Name Disambiguation H. Tong

C. Faloutsos

J. Pan

P. S. Yu

– [Minkov+ SIGIR 06]

• Anomaly Nodes/Edges – ‘a’ is abnormal if the neighborhood of ‘a’ is so different– [Sun+ ICDM 2005]

826 guest lecture

Hanghang TongM.-J. Li

77

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• Link Prediction & +

• Ranking Related Tasks• Applications• Conclusion

826 guest lecture

• Ranking Related Tasks

• User Specific Patterns

• Time Related Tasks

78

Page 27: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 27

SCS CMU Neighborhood Search on graphs

… …

Conference Author

A: Proximity! [Sun+ ICDM2005]

Q: what is most related conference to ICDM?

826 guest lecture 79

SCS CMU

NF: example

826 guest lecture 80

SCS CMU

gCaP: Automatic Image Caption• Q

Sea Sun Sky Wave{ } { }Cat Forest Grass Tiger

{?, ?, ?,}

?A: Proximity!

[Pan+ KDD2004]

826 guest lecture 81

Page 28: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 28

SCS CMU

Image

Region

Test Image

Sea Sun Sky Wave Cat Forest Tiger Grass

Image

Keyword826 guest lecture 82

SCS CMU

Image

Region

Test Image

Sea Sun Sky Wave Cat Forest Tiger Grass

Image

Keyword

{Grass, Forest, Cat, Tiger}

83

SCS CMU

C-DEM: Multi-Modal Query System for DrosophilaEmbryo Databases [Guo+ VLDB 2008]

826 guest lecture 84

Page 29: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 29

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• Link Prediction & +

• Ranking Related Tasks• Applications• Conclusion

826 guest lecture

• Ranking Related Tasks

• User Specific Patterns

• Time Related Tasks

85

SCS CMU

Center-Piece Subgraph Discovery [Tong+ KDD 06]

A

• Given: a graph W, and a query set • Find: the most central node (wrt the query

set)

Q: Who is the most central node wrt the black nodes?

(e.g., master-mind criminal, common advisor/collaborator, etc)

B C

86

SCS CMU

A A

Center-Piece Subgraph Discovery [Tong+ KDD 06]

Input: original graph Output: CePS

CePS Node

3-87

B C B C

Q: How to find hub for nodes A, B, C?Our Solution: Max (Prox(A, Red) x Prox(B, Red) x Prox(C, Red))

Page 30: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 30

SCS CMU

?

CePS: Example (AND Query)

88

DBLP co-authorship network: -400,000 authors, 2,000,000 edges

Code at: http://www.cs.cmu.edu/~htong/soft.htm

SCS CMU

CePS: Example (AND Query)

10

10

1 2

1

89

DBLP co-authorship network: -400,000 authors, 2,000,000 edges

Code at: http://www.cs.cmu.edu/~htong/soft.htm

1

1 1

4 6

SCS CMU

K_SoftAnd: Relaxation of AND

Asking AND query? No Answer!

Disconnected Communities

Noise

826 guest lecture 90

Page 31: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 31

SCS CMU

CePS: 2 SoftANDDB

Stat.

SCS CMU

OutputInput

Data Graph

Query Graph

Best-Effort Pattern Match

Matching Subgraph

Q: How to find matching subgraph?A: Proximity![Tong+ KDD 2007 b]

826 guest lecture

SCS CMU

G-Ray: How to?

matching nodematching node

matching node

details

matching node

Goodness = Prox (12, 4) x Prox (4, 12) xProx (7, 4) x Prox (4, 7) xProx (11, 7) x Prox (7, 11) xProx (12, 11) x Prox (11, 12)

826 guest lecture 93

Page 32: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 32

SCS CMU

Effectiveness: star-query Databases

Query Result826 guest lecture

Bio-medicalIntelligent Agent

94

SCS CMU

Effectiveness: line-queryDatabases Learning Bio-medicalTheory

Query

Result826 guest lecture 95

SCS CMU

Outline

• Motivation• Definitions• Fast Solutions

• Link Prediction & +

• Ranking Related Tasks• Applications• Conclusion

826 guest lecture

• Ranking Related Tasks

• User Specific Patterns

• Time Related Tasks

96

Page 33: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 33

SCS CMU

Challenge• Graphs are evolving over time!

–New nodes/edges show up; –Existing nodes/edges die out;

Ed i ht h–Edge weights change…

Q: How to generalize everything? A: Track Proximity! [Tong+ SDM 2008]

826 guest lecture 97

SCS CMU

pTrack/cTrack: Trend analysis on graph level

G.Hinton

T. SejnowskiRank of Influential-ness

M. Jordan

C. Koch

Year

826 guest lecture 98

SCS CMU

pTrack: Problem Definition

• [Given]– (1) a large, skewed time-evolving bipartite

graphs, – (2) the query nodes of interest

• [Track]– (1) top-k most related nodes for each query

node at each time step t; – (2) the proximity score (or rank of proximity)

between any two query nodes at each time step t

826 guest lecture 99

Page 34: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 34

SCS CMU

pTrack: Philip S. Yu’s Top-5 conferences up to each year

ICDEICDCS

SIGMETRICSPDISVLDB

CIKMICDCSICDE

SIGMETRICSICMCS

KDDSIGMOD

ICDMCIKM

ICDCS

ICDMKDDICDESDMVLDB

1992 1997 2002 2007

DatabasesPerformanceDistributed Sys.

DatabasesData Mining

DBLP: (Au. x Conf.)- 400k aus, - 3.5k confs - 20 yrs

826 guest lecture 100

SCS CMU

KDD’s Rank wrt. VLDB over years

Prox.Rank

Year

Data Mining and Databases are getting closer & closer

826 guest lecture 101

SCS CMU

cTrack:10 most influential authors in NIPS community up to each year

T. Sejnowski

Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spreading over 13 years

M. Jordan

102

Page 35: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 35

SCS CMU

T3/MT3: How to understand time?

826 guest lecture 103

SCS CMU

A Motivating Example: InputsTime Event(e.g., Session) EntityOct. 26 Link Analysis Tom, Bob

Clustering Bob, AlanOct. 27 Classification Bob, Alan

Anomaly Detection Alan, BeckOct. 28 Party Beck, DanOct. 29 Web Search Dan, Jack

Advertising Jack, PeterOct. 30 Enterprise Search Jack, PeterOct. 31 Q & A Peter, Smith

SCS CMU

Time Cluster, rep. entities: b7,b6, b8A Motivating Example: Outputs

JackOct. 29

Oct. 30Oct. 30

Oct 28

Time ClusterRep. Entities:

``Jack’’, ``Peter’’, ``Smith’’

Abnormal TimeOct. 28

Oct. 26

Oct. 27

Rep. Entities:``Beck’’ ``Dan’’

Time ClusterRep. Entities:

``Tom’’, ``Bob’’,``Alan’’

105

Page 36: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 36

SCS CMU

Problem Definitions: (How to Understand Time in such complex context)

• Given datasets collected at different time stamps;

• Find– Q1: Time Cluster – Q2: Abnormal Time stamp– Q3: Interpretations– Q4: Right time granularity

826 guest lecture 106

SCS CMU

T3: Basic IdeaOct. 26, 2008

Oct. 27, 2008

Link Analysis

Clustering

Classification

Anomaly Dect.

Tom

Bob

AlanTT

Oct. 28, 2008

Oct. 29, 2008

Oct. 30, 2008

Oct. 31, 2008

Party

Web Search

Advertising

En. Search

Q & A

Beck

Dan

Jack

Peter

Smith

TP

826 guest lecture107

SCS CMU

Data Sets• CIKM: from cikm proceedings

• Time: Publication years (1993-2007, 15)• Event: Paper (952)• Entities: Authors (1895) & Sessions

(279)• Attribute: Keyword (158)

826 guest lecture 108

Page 37: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 37

SCS CMU

T3 on `CIKM’ Data Set Rep. Authors Rep. Keywords

James. P. CallanW. Bruce CroftJames AllanPhilip S. Yu

George KarypisCharles Clarke

WebCluster

ClassificationXML

LanguageStream

Rep. Authors Rep. KeywordsElke Rundensteiner

Daniel MirankerAndreas Henrich

Il-Yeol SongScott B Huffman

Robert J. Hall

KnowledgeSystem

UnstructuredRule

Object-orientedDeductive 109

SCS CMU

More Applications• Clustering

– Proximity as input [Ding+ KDD 2007]• Email management [Minkov+ CEAS 06].

• Business Process Management [Qu+ 2008]

• ProSINProSIN– Listen to clients’ comments [Tong+ 2008]

• Ghost Edge– Within Network Classification [Gallagher & Tong+ KDD08 b]

• …

826 guest lecture 112

SCS CMU

gCap:

pTrack/cTrack:

LP: [Liben-Nowell+][Tong+ 2007]

NF:

CePS:

G-Ray:

T3:GhostEdge:

[Gallagher & 2008

[Tong+ 2007]

[Tong+ 2006]

[Pan+ 2004]

[Pan+ 2005]

[Tong+ 2008]

FastAllDAP: [Tong+ 2007]

BB Lin: [Tong+ 2006]

FastUpdate: [Tong+ 2008]

ComputationsApplications

EfficientlySolve Linear

System(s)

MT3: [Tong+ 2008]Use Proximity as

Building block(gCap, CePS…)

Fast-ProSIN: [Tong+ 2008]

T3: & Tong+ 2008][Tong+ 2008]

RWR: [Pan+ 2004][Sun+ 2006][Tong+ 2006]

Variants: [Faloutsos+ 2004] [Koren+ 2006][Tong+ 2007]

Generalizations: [Charkrabarti+ 2006][Tong+ 2007, 2008]

Properties.: [Koren+ 2006]]Tong+ 2007, 2008]

Definitions

B_Lin: [Tong+ 2006]

BB_Lin: [Tong+ 2006]

ProximityOn

Graphs

Weighted Multiple

Relationship

System(s)

Page 38: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 38

SCS CMU

Take-home messages• Proximity Definitions

– RWR– and a lot of variants

Computations• Computations– Sherman–Morrison Lemma– Fast Incremental Computation

• Applications– Proximity as a building block

826 guest lecture 114

SCS CMU

References• L. Page, S. Brin, R. Motwani, & T. Winograd. (1998), The

PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Library.

• T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW, 517-526, 2002

• J Y Pan H J Yang C Faloutsos & P Duygulu (2004) Automatic• J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, 653-658, 2004.

• C. Faloutsos, K. S. McCurley & A. Tomkins. (2002) Fast discovery of connection subgraphs. In KDD, 118-127, 2004.

• J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, 418-425, 2005.

• W. Cohen. (2007) Graph Walks and Graphical Models. Draft.826 guest lecture 115

SCS CMU

References• P. Doyle & J. Snell. (1984) Random walks and electric

networks, volume 22. Mathematical Association America, New York.

• Y. Koren, S. C. North, and C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245–255, 2006.

• A Agarwal S Chakrabarti & S Aggarwal (2006) Learning to• A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006.

• S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, 571-580, 2007.

• F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355-369 2007.

826 guest lecture 116

Page 39: Proximity on Graphs - Carnegie Mellon School of Computer …christos/courses/826.S10/FOILS-pdf/900_proximity... · Proximity on Graphs-Definitions, Fast Solutions, and Applications

H. Tong 15-826

Guest-lecture 39

SCS CMU

References• H. Tong & C. Faloutsos. (2006) Center-piece subgraphs:

problem definition and fast solutions. In KDD, 404-413, 2006.• H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk

with Restart and Its Applications. In ICDM, 613-622, 2006.• H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-

aware proximity for graph mining In KDD 747 756 2007aware proximity for graph mining. In KDD, 747-756, 2007.• H. Tong, B. Gallagher, C. Faloutsos, & T. Eliassi-Rad. (2007)

Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007.

• H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. to appear in SDM 2008.

826 guest lecture 117

SCS CMU

References• B. Gallagher, H. Tong, T. Eliassi-Rad, C. Faloutsos. Using

Ghost Edges for Classification in Sparsely Labeled Networks. KDD 2008

• H. Tong, Y. Sakurai, T. Eliassi-Rad, and C. Faloutsos. Fast Mining of Complex Time-Stamped Events CIKM 08

• H Tong H Qu and H Jamjoom Measuring Proximity on• H. Tong, H. Qu, and H. Jamjoom. Measuring Proximity on Graphs with Side Information. ICDM 2008

826 guest lecture 118