Neighborhood Formation and Anomaly Detection in Bipartite Graphs

NEIGHBORHOOD FORMATION AND ANOMALY DETECTION IN BIPARTITE GRAPHS Jimeng Sun, Huiming Qu, Deepayan Chakrabarti & Christos Faloutsos Presented By Bhavana Dalvi


Page 1: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

NEIGHBORHOOD FORMATION AND ANOMALY DETECTION IN BIPARTITE GRAPHS

Jimeng Sun, Huiming Qu, Deepayan Chakrabarti & Christos Faloutsos

Presented by Bhavana Dalvi

Page 2: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

OUTLINE

• Motivation
• Problem Definition
• Neighborhood formation
• Anomaly detection
• Experiments
• Related work
• Conclusion and future work

Page 3: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

BIPARTITE GRAPHS AND INTERESTING QUESTIONS

Author-Paper graph

[Figure: bipartite graph with Authors on one side and Papers on the other; author a is marked]

Page 4: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

BIPARTITE GRAPHS AND INTERESTING QUESTIONS

Author-Paper graph

Which authors are most related to ‘a’?

[Figure: bipartite graph with Authors on one side and Papers on the other; author a is marked]

Page 6: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

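BIPARTITE GRAPHS AND INTERESTING QUESTIONS

Author-Paper graph

Which authors are most related to ‘a’?

[Figure: bipartite graph with Authors on one side and Papers on the other; author a is marked, and the value 0.8 appears next to node b]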

Page 7: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

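BIPARTITE GRAPHS AND INTERESTING QUESTIONS

Author-Paper graph

Which authors are most related to ‘a’?

[Figure: bipartite graph with Authors on one side and Papers on the other; author a and node b are marked, and the values 0.8, 0.6, 0.2, 0.4 appear on the graph]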

Page 8: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

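BIPARTITE GRAPHS AND INTERESTING QUESTIONS

Author-Paper graph

Which is the uncommon paper written by ‘a’?

[Figure: bipartite graph with Authors on one side and Papers on the other; author a is marked, and the values 0.8, 0.6, 0.2, 0.4 appear on the graph]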

Page 10: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

BIPARTITE GRAPHS AND INTERESTING QUESTIONS

P2P Network

Which users have preferences similar to those of a particular user?

Which files are downloaded by users with very different preferences?

[Figure: bipartite graph of users and files. Source: Jimeng Sun’s presentation at ICDM 2005]

Page 12: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

PROBLEM DEFINITION

Neighborhood formation (NF)
Input: query node q in V1
Output: relevance scores of all the nodes in V1 to q

Anomaly detection (AD)
Input: query node q in V1
Output: normality scores for nodes in V2 that link to q

[Figure: bipartite graph with node sets V1 and V2, edge set E, and query node q in V1]

Page 14: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

NEIGHBORHOOD FORMATION

Relevance(b, q) grows with the number of short paths from q to b.

A connection that links only b and q contributes more relevance than a connection that links b, q, and other nodes.

[Figure: diagrams showing nodes b and q]

Page 15: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

EXACT NF ALGORITHM: RANDOM WALK WITH RESTART

Input: a graph G and a query node q
Output: relevance scores to q

• Construct the transition matrix, where
  - every node in the graph becomes a state
  - every state has a restart probability c to jump back to the query node q
  - transition probability
• Find the steady-state probability u, which is the relevance score of all the nodes to q

[Figure: graph in which every node has an edge labeled c back to the query node q. Source: Jimeng Sun’s presentation at ICDM 2005]
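
A minimal sketch of the random walk with restart described on this slide, assuming a dense NumPy adjacency matrix and plain power iteration (the slides do not say how the steady state is computed). The function name, the default c = 0.15, and the tolerance are illustrative choices, not values from the presentation.

```python
import numpy as np

def random_walk_with_restart(adj, q, c=0.15, tol=1e-10, max_iter=1000):
    """Steady-state probabilities of a walk on `adj` that restarts at node q."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # isolated nodes keep an all-zero column
    P = adj / col_sums                 # column-normalized transition matrix
    restart = np.zeros(adj.shape[0])
    restart[q] = 1.0                   # all restart mass sits on the query node
    u = restart.copy()
    for _ in range(max_iter):
        # Follow an edge with probability 1 - c, jump back to q with probability c.
        u_next = (1.0 - c) * (P @ u) + c * restart
        if np.abs(u_next - u).sum() < tol:
            return u_next
        u = u_next
    return u

# Path graph 0 - 1 - 2, queried at node 0.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = random_walk_with_restart(path, q=0)
```

On this toy graph the scores sum to 1, and node 0 scores higher than node 2, which is two hops away from the query.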

Page 16: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

FINDING STEADY STATE PROBABILITIES

• |V1| = k, |V2| = n
• M: k × n matrix representing the weighted graph G
• Adjacency matrix: MA; transition matrix PA = col_norm(MA)
• qA: the query node ‘a’ as a (k+n) × 1 vector in which only the a-th entry is 1 and the rest are 0
• uA: steady-state probability vector with restart probability c

Bipartite structure: if k << n, the savings are significant.
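
The equations on this slide did not survive extraction. A reconstruction in the standard random-walk-with-restart form, using the slide's symbols and assuming that MA is the (k+n) × (k+n) adjacency matrix built from M and that the query node a lies in V1, might read:

```latex
M_A = \begin{pmatrix} 0 & M \\ M^{T} & 0 \end{pmatrix}, \qquad
P_A = \mathrm{col\_norm}(M_A), \qquad
u_A = (1 - c)\, P_A\, u_A + c\, q_A

% Splitting u_A into its V1 part (first k entries) and V2 part (last n entries):
u_A(1{:}k) = (1 - c)\, \mathrm{col\_norm}(M)\, u_A(k{+}1{:}k{+}n) + c\, q_A(1{:}k)
u_A(k{+}1{:}k{+}n) = (1 - c)\, \mathrm{col\_norm}(M^{T})\, u_A(1{:}k)
```

Under these assumptions, each update only multiplies by the k × n block M or its transpose, never by the full (k+n) × (k+n) matrix, which is one way to read the slide's remark about savings from the bipartite structure.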

Page 17: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

EXTENSIONS TO NF ALGORITHM

Parallel NF
• If there are multiple queries, the computation can be done in parallel.

Approximate NF
• Cluster the nodes into k partitions (preprocessing)
• Given query node q, find the partition Gi it belongs to
• Run the Exact NF algorithm only on Gi
• Set relevance = 0 for nodes not in Gi
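
The Approximate NF steps can be sketched as follows, assuming the partition labels have already been produced by some preprocessing step (the partitioner itself is not shown) and that the graph is stored as a dense adjacency matrix over all nodes. The names `rwr` and `approximate_nf` and the fixed iteration count are hypothetical choices made for this illustration.

```python
import numpy as np

def rwr(adj, q, c=0.15, n_iter=200):
    """Random walk with restart by fixed-length power iteration."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    P = adj / col_sums
    restart = np.zeros(adj.shape[0])
    restart[q] = 1.0
    u = restart.copy()
    for _ in range(n_iter):
        u = (1.0 - c) * (P @ u) + c * restart
    return u

def approximate_nf(adj, labels, q, c=0.15):
    """Run RWR only inside the partition that contains q; zero elsewhere."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    members = np.flatnonzero(labels == labels[q])   # nodes of partition Gi
    sub = adj[np.ix_(members, members)]             # subgraph induced by Gi
    local_q = int(np.flatnonzero(members == q)[0])  # index of q inside Gi
    scores = np.zeros(adj.shape[0])                 # relevance 0 outside Gi
    scores[members] = rwr(sub, local_q, c)
    return scores

# Two triangles joined by the single edge 2 - 3, split into two partitions.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
scores = approximate_nf(adj, labels=[0, 0, 0, 1, 1, 1], q=0)
```

Edges that cross partitions, such as 2 - 3 here, are simply ignored, which is where the approximation error comes from.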

Page 19: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

ANOMALY DETECTION

A node x in V2 is normal if the nodes in V1 that link to x are in the same neighbourhood.

[Figure: two example bipartite graphs over V1 and V2, one in which x has low normality and one in which x has high normality]

Page 20: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

ANOMALY DETECTION ALGORITHM

Input: node t in V2, bipartite transition matrix P
Output: normality score ns(t)

1. Set St = neighbours of t in V1
2. RSt: pairwise relevance scores for nodes in St
3. Normality score ns(t) = function(RSt), e.g. the mean over the non-diagonal elements of RSt
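
The three steps can be sketched as below, assuming the graph is given as a k × n weight matrix M (rows are V1, columns are V2) rather than as a precomputed transition matrix, and using the mean over non-diagonal elements as the scoring function. The helper names and the toy graph are hypothetical and only serve the illustration.

```python
import numpy as np

def rwr(adj, q, c=0.15, n_iter=200):
    """Random walk with restart by fixed-length power iteration."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    P = adj / col_sums
    restart = np.zeros(adj.shape[0])
    restart[q] = 1.0
    u = restart.copy()
    for _ in range(n_iter):
        u = (1.0 - c) * (P @ u) + c * restart
    return u

def normality_score(M, t, c=0.15):
    """Mean pairwise relevance among the V1 neighbours of column node t."""
    M = np.asarray(M, dtype=float)
    k, n = M.shape
    adj = np.zeros((k + n, k + n))       # V1 nodes first, then V2 nodes
    adj[:k, k:] = M
    adj[k:, :k] = M.T
    S_t = np.flatnonzero(M[:, t])        # step 1: neighbours of t in V1
    if len(S_t) < 2:
        return float("nan")              # no pairs to compare
    # Step 2: RS[i, j] = relevance of S_t[j] to the query node S_t[i].
    RS = np.array([rwr(adj, a, c)[S_t] for a in S_t])
    off_diagonal = ~np.eye(len(S_t), dtype=bool)
    return RS[off_diagonal].mean()       # step 3: mean over non-diagonal

# Rows 0-1 and rows 2-3 form two communities; column 6 bridges them.
M = np.zeros((4, 7))
M[0:2, 0:3] = 1.0
M[2:4, 3:6] = 1.0
M[0, 6] = M[2, 6] = 1.0
inside = normality_score(M, 0)
bridge = normality_score(M, 6)
```

In this toy graph the bridging column 6 receives a lower normality score than column 0, whose neighbours all sit in one community.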

Page 22: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

DATASETS

Dataset                  |V1|   |V2|   |E|    Avg. deg (V1)  Avg. deg (V2)
Conference-Author (CA)   2687   288K   662K   510            5
Author-Paper (AP)        316K   472K   1M     3              2
IMDB                     553K   204K   2.2M   4              11

Page 23: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

DO THE NEIGHBORHOODS MAKE SENSE?

[Figure: plots of relevance score (y-axis) against the most relevant neighbors (x-axis)]

The nodes (x-axis) with the highest relevance scores (y-axis) are indeed very relevant to the query node.

Page 24: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

HOW ACCURATE IS THE APPROXIMATE NF?

[Figure: precision plots, with panel settings neighborhood size = 20 and num of partitions = 10]

• Precision = fraction of overlap between ApprNF and NF among the top k neighbors
• The precision drops slowly as the number of partitions increases
• The precision remains high for a wide range of neighborhood sizes
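
The precision measure defined on this slide can be computed as in the sketch below, assuming both methods return one score per node and that the query node itself is excluded from its own neighbor list (an assumption; the slide does not say). Function names and the example scores are made up for illustration.

```python
import numpy as np

def top_k(scores, k, exclude=None):
    """Indices of the k highest-scoring nodes, optionally skipping one node."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    if exclude is not None:
        order = order[order != exclude]
    return set(order[:k].tolist())

def precision_at_k(exact_scores, approx_scores, k, query=None):
    """Fraction of the exact top-k neighbors that the approximation also finds."""
    exact = top_k(exact_scores, k, exclude=query)
    approx = top_k(approx_scores, k, exclude=query)
    return len(exact & approx) / k

exact = [0.50, 0.30, 0.10, 0.07, 0.03]    # made-up exact NF scores
approx = [0.60, 0.25, 0.00, 0.15, 0.00]   # made-up approximate NF scores
p = precision_at_k(exact, approx, k=2, query=0)
```

Here the exact top-2 neighbors of node 0 are {1, 2} and the approximate ones are {1, 3}, so the precision is 0.5.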

Page 25: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

DO THE ANOMALIES MAKE SENSE?

[Figure: plot of average normality score]

Injection:
• Inject 100 nodes in V2, each connecting to k nodes in V1, where k = avg. degree of nodes in V2
• Nodes in V1 are randomly picked such that degree = 10 * avg. degree of nodes in V1
• Assumption: this will induce connections across neighbourhoods

Page 26: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

WHAT ABOUT THE COMPUTATIONAL COST?

The computational cost drops significantly even with a small increase in the number of partitions.

Page 28: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

RELATED WORK

Random walk on graphs
• PageRank [ISDN 1998], Topic-Sensitive PageRank [WWW 2002]

Outlier detection
• Outlier detection in high dimensional data: Aggarwal and Yu [SIGMOD 2001]
• Outlier Detection Using Random Walks [ICTAI 2006]
• Find outlier clusters

Graph partitioning
• METIS package
• Spectral clustering methods
• Neighbourhoods can become personalized clusters

Page 30: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

CONCLUSIONS AND FUTURE WORK

• Solutions to two problems for bipartite graphs:
  - Neighborhood Formation (NF)
  - Anomaly Detection (AD)
• Random walk with restart, together with graph partitioning, can be used to solve NF efficiently.
• AD can be done based on the relevance scores generated by NF.
• Experiments on real datasets show good results.
• Proximity Tracking on Time-Evolving Graphs (SIAM 2008 paper)
  - Defines proximity scores in a dynamic setting
  - Efficient incremental updates

Page 31: Neighborhood Formation and Anomaly Detection in Bipartite Graphs

THANK YOU