OddBall: Spotting Anomalies in Weighted Graphs
Leman Akoglu, Mary McGlohon, Christos Faloutsos
Carnegie Mellon University
School of Computer Science
Pittsburgh, Pennsylvania, USA
Motivation
Anomaly detection in networks (graph data) has important applications:
Computer networks: spammers, port scanners
Phone-call networks: telemarketers, misbehaving customers, faulty equipment
Social networks: ‘popularity contests’
Account networks: scammers, transfer fraud
Terrorist networks: tight groups of people
PAKDD 2010 Akoglu, McGlohon, Faloutsos
Problem
Q1. Given a weighted and unlabeled graph, how can we spot strange, abnormal, extreme nodes?
Q2. Can we explain why the spotted nodes are anomalous?
Preliminaries I – What is an anomaly?
No clear and unique definition!
“An observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism.” [Hawkins, 80]
Preliminaries II – Weights
[Figure: weighted graph examples, Bipartite and Unipartite, with edge-weight labels 1, 3, $5K, $10K, $15K]
Preliminaries III – Power Laws
$\Pr[X \ge x] \sim c\,x^{-\alpha}$
$\ln \Pr[X \ge x] \sim \ln c - \alpha \ln x$
c ≥ 0, α ≥ 0
[Figure: the same distribution on a lin-lin plot and on a log-log plot; on the log-log plot the slope is -α]
‘Power Law’ Example
DBLP Keyword-to-Conference Network
Densification Power Law [Leskovec ‘05]
Weight Power Law [McGlohon ‘08]
[Figure: log-log plots; axis labels: # edges, total weight, # source nodes, # destination nodes]
‘Power Law’ Example
2004 US FEC Committees to Candidates network
Snapshot Power Law [McGlohon et al. ‘08]
[Figure: log-log plot of in-weights ($) against in-degree (# donors); e.g. John Kerry, $10M received from 1K donors]
Preliminaries IV – how to fit
Least Squares
fit to medians!
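A minimal sketch of what "least squares, fit to medians" could look like in practice. The slides do not say how points are grouped, so the logarithmic binning and the `bins_per_decade` parameter are assumptions; the function name is hypothetical.

```python
import math
from statistics import median


def fit_power_law_to_medians(xs, ys, bins_per_decade=5):
    """Fit y ~ C * x**theta by least squares on per-bin medians in log-log space."""
    # Assumption: group points into logarithmic bins of x.
    bins = {}
    for x, y in zip(xs, ys):
        if x > 0 and y > 0:
            key = math.floor(math.log10(x) * bins_per_decade)
            bins.setdefault(key, []).append((math.log(x), math.log(y)))

    # One representative point per bin: median log-x and median log-y.
    pts = [(median(p[0] for p in b), median(p[1] for p in b))
           for b in bins.values()]
    if len(pts) < 2:
        raise ValueError("need at least two non-empty bins")

    # Ordinary least squares on the median points.
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    theta = sxy / sxx                 # slope of the line in log-log space
    C = math.exp(my - theta * mx)     # intercept, mapped back from log space
    return C, theta
```

Fitting to medians rather than to every point keeps a handful of extreme nodes from dragging the line toward themselves, which matters when those nodes are exactly what the method is meant to flag.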
Problem revisited
Q1. Given a weighted and unlabeled graph, how can we spot strange, abnormal, extreme nodes?
Q2. Can we explain why the spotted nodes are anomalous?
Problem sketch
Main idea
For each node,
P.1) extract ‘ego-net’ (=1-step-away neighbors)
P.2) extract features (#edges, total weight, etc.)
P.3) extract patterns (norms)
P.4) anomaly detection: compare with the rest of the population
LLNL'10 C. Faloutsos (CMU)
Outline
1. Motivation
2. Preliminaries and Problem Definition
3. Proposed Method
a. Study of ego-nets
b. Laws and Observations
c. Anomaly detection
4. Datasets
5. Experiments
6. Discussion & Conclusion
P.1 What is an egonet?
[Figure: a graph with an ego node and its ego-net marked]
What is odd?
PwC 2009 Leman Akoglu
What is “anomalous”?
Near-star: telemarketer, port scanner, people adding friends indiscriminately, etc.
Near-clique: tightly connected people, terrorist groups?, discussion group, etc.
What is “anomalous”?
Heavy vicinity: too much money wrt number of accounts, high donation wrt number of donors, etc.
Dominant heavy link: single-minded, tight company
P.2 What features…
… should we extract to project nodes into a low-dimensional space?
features that could yield “laws”
features easy to compute and interpret
Selected Features
Ni: number of neighbors (degree) of ego i
Ei: number of edges in egonet i
Wi: total weight of egonet i
λw,i: principal eigenvalue of the weighted adjacency matrix of egonet i
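The four features can be computed for one ego roughly as follows. This is a sketch under stated assumptions, not the authors' code: the graph is undirected and stored as a symmetric dict-of-dicts, the function name `egonet_features` is hypothetical, and the principal eigenvalue is obtained with a shifted power iteration rather than whatever solver the original software uses.

```python
import math


def egonet_features(adj, ego, iters=300):
    """Return (N, E, W, lam) for the egonet of `ego`.

    adj: dict node -> {neighbor: weight}, stored symmetrically (assumption).
    """
    nodes = [ego] + list(adj[ego])            # ego plus its 1-step neighbors
    idx = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)

    # Weighted adjacency matrix restricted to the egonet.
    A = [[0.0] * n for _ in range(n)]
    for u in nodes:
        for v, w in adj[u].items():
            if v in idx:
                A[idx[u]][idx[v]] = w

    N = n - 1                                  # number of neighbors
    E = sum(1 for i in range(n) for j in range(i + 1, n) if A[i][j] != 0)
    W = sum(A[i][j] for i in range(n) for j in range(i + 1, n))

    # Power iteration on A + c*I; the shift avoids oscillation on
    # star-like (bipartite) egonets whose spectrum is symmetric around 0.
    c = max(sum(abs(a) for a in row) for row in A) or 1.0
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) + c * x[i] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]

    # Rayleigh quotient of the unit vector x gives the eigenvalue estimate.
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    lam = sum(x[i] * Ax[i] for i in range(n))
    return N, E, W, lam
```

On a unit-weight star with four leaves this returns N = E = W = 4 and an eigenvalue close to 2, and on a unit-weight clique of four nodes it returns an eigenvalue close to N = 3.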
[Figure: example egonets, each annotated with how λw,i relates to N, E and W]
λw,i = √N = √E = √W
λw,i > √N ~ √E, √W
λw,i = N ≈ √W
λw,i = W
λw,i ≈ W
λw,i √W
N: #neighbors, W: total weight
Other Features
Si: number of singleton neighbors of ego i with degree 1
max(Wi): maximum edge weight in egonet i
max(Wi, d=1): maximum edge weight to/from a degree 1 neighbor of ego i
max(di): maximum degree of the neighbors of ego i
2-step neighborhood features
P.3 What patterns?
Observation 1: Egonet Density Power Law (EDPL)
Q1: How does the number of neighbors N of the egonet relate to the number of edges E?
$E_i \propto N_i^{\alpha}$, $1 \le \alpha \le 2$
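The two bounds on α correspond to the two extreme egonet shapes. As a worked note (assuming E counts the ego's own edges as well as edges among its neighbors):

```latex
\text{star:}\quad E_i = N_i
  \;\Rightarrow\; \alpha = 1
\qquad
\text{clique:}\quad E_i = \binom{N_i + 1}{2} = \frac{N_i (N_i + 1)}{2} \propto N_i^{2}
  \;\Rightarrow\; \alpha = 2
```

Real egonets fall between the two, so points near the α = 1 boundary look like near-stars and points near the α = 2 boundary look like near-cliques.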
Observation 2: Egonet Weight Power Law (EWPL)
Q2: How does the total weight W of the egonet relate to the number of edges E?
$W_i \propto E_i^{\beta}$, $\beta \ge 1$
Observation 3: Egonet λw Power Law (ELWPL)
Q3: How does the largest eigenvalue λw of the weighted adjacency matrix of the egonet relate to the total weight W?
$\lambda_{w,i} \propto W_i^{\gamma}$, $0.5 \le \gamma \le 1$
P.4 Anomaly detection
Anomaly ≈
violates our “laws”
too far away from the rest of the points
scoredist = distance to fitting line
scoreoutl = outlierness score
score = func(scoredist, scoreoutl)
can tell what kind of anomaly a node belongs to
can sort nodes wrt their outlierness scores
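The slides leave both the distance and `func` unspecified, so the following is only one plausible instantiation. The distance form (ratio of observed to expected value, times the log of their absolute gap) and the combination rule (scale each score by its maximum, then add) are assumptions, and the outlierness scores are taken as given input from any density-based outlier detector. Function names are hypothetical.

```python
import math


def distance_to_fit(x, y, C, theta):
    """Assumed distance of the point (x, y) from the fitted law y = C * x**theta.

    Requires y > 0 and C * x**theta > 0.
    """
    expected = C * x ** theta
    ratio = max(y, expected) / min(y, expected)    # how many times off
    return ratio * math.log(abs(y - expected) + 1.0)


def combined_scores(score_dist, score_outl):
    """Assumed func: normalize each score list by its maximum, then sum."""
    max_d = max(score_dist) or 1.0
    max_o = max(score_outl) or 1.0
    return [d / max_d + o / max_o for d, o in zip(score_dist, score_outl)]
```

A point exactly on the line gets distance 0. The sign of `y - C * x**theta`, which this sketch discards, is what would indicate the side of the line a node falls on and hence the kind of anomaly.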
Datasets
Bipartite networks: |N| |E|
1. Don2Com 1.6M 2M
2. Com2Cand 6K 125K
3. Auth2Conf 421K 1M
Unipartite networks: |N| |E|
5. BlogNet 27K 126K
6. PostNet 223K 217K
7. Enron 36K 183K
8. Oregon 11K 38K
Experimental Results
Near-Clique/Star
Heavy Vicinity
Dominant Heavy Link
$87M - DNC
$25M - RNC
Experimental Results
| Anomaly / Dataset | Near-clique, Near-star | Heavy vicinity | Dominant pair, Uniform weights |
| --- | --- | --- | --- |
| Don2Com | N/A | Bush-Cheney ’04 Inc, Kerry Committee | Negative edge weights due to returns |
| Com2Cand | N/A | Liberty Congressional PAC, Aaron Russo | DNC against George Bush |
| Auth2Conf | N/A | Averill M. Law - Winter Simulation Conference | Toshio Fukuda - ICRA, PLaTD - Hans Bekic |
| PostNet | self-linking post, post w/ numerous links to diverse posts | post listed as blog homepage, post w/ single repeated link | “ThinkProgress” and “A Freethinker’s Paradise” on a leak scandal |
| BlogNet | “link blogs” devoted to a wide array of content | Automotive News Today – GM blog | “Drudge” (298 links to 4), “Nocapital” (300 links to 2) |
| Enron | Kenneth Lay (>1K contacts) | N/A | N/A |
| Oregon | 3 large ASPs, Verizon, Sprint, AT&T | N/A | N/A |
Scalability
Counting the number of edges in egonets for ALL nodes is expensive!
need to scan connections for all pairs of neighbors!
Can be reworded as counting local triangles
A fast method [Tsourakakis, 08] exists!
IDEA:
o #triangles at node i = (# closed paths of length 3 through node i) / 2
o # closed paths of length 3 through node i = (A^3)_ii
o Computing A^3 is still expensive!
o Low-rank approximation!
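The triangle identity can be checked on a small graph with a direct, exact computation. This sketch deliberately uses the naive cubic-time matrix product that the slide calls expensive, not the low-rank approximation; it assumes an unweighted, undirected graph given as a 0/1 adjacency matrix without self-loops, and the function names are hypothetical.

```python
def matmul(X, Y):
    """Plain dense matrix product of two lists of lists."""
    n, m, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def local_triangles(A):
    """Triangles through each node: (A^3)_ii / 2 for a 0/1 symmetric A."""
    A3 = matmul(matmul(A, A), A)
    return [A3[i][i] // 2 for i in range(len(A))]


def egonet_edge_counts(A):
    """E_i = N_i + triangles_i.

    The ego has N_i edges to its neighbors, and every edge between two
    neighbors closes exactly one triangle with the ego.
    """
    tri = local_triangles(A)
    return [sum(row) + t for row, t in zip(A, tri)]
```

For a triangle 0-1-2 with an extra leaf 3 attached to node 0, `local_triangles` gives `[1, 1, 1, 0]` and `egonet_edge_counts` gives `[4, 3, 3, 1]`.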
[Figure: low-rank approximation of an example matrix, n = 12, k = 4]
A (n×n):
0 1/3 1/3 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 0 0 0 0 0 0 0
1/3 1/3 0 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1/2 1/2 0 0 0 0 0
0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 0
0 0 0 0 0 0 0 1/4 0 1/3 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0 0 0 1/4 0 1/3 0 1/2
0 0 0 0 0 0 0 0 0 1/3 1/3 0
U (n×k):
0 0 0 0
-0.18 -0.36 0.13 -0.90
0 0 0 0
0.36 -0.18 0.90 0.13
-0.40 -0.81 -0.06 0.40
0 0 0 0
0 0 0 0
0.81 -0.40 -0.40 -0.06
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
U^T (k×n):
0 0.60 0 -0.30 0.65 0 0 -0.32 0 0 0 0
0 -0.30 0 -0.60 -0.32 0 0 -0.65 0 0 0 0
0 -0.72 0 -0.11 0.66 0 0 0.10 0 0 0 0
0 -0.11 0 0.72 0.10 0 0 -0.66 0 0 0 0
S (k×k):
0.44 0 0 0
0 0.44 0 0
0 0 0.18 0
0 0 0 0.18
A ≈ U S U^T
A^3 ≈ U S^3 U^T: O(n^3) ~ O(nk^2)
Prune d=1 nodes
Prune d=2 as well as d=1 nodes
smaller & sparser A matrix
Scalability – time vs. size
Time vs. number of edges.
Effect of pruning on computation time.
Solid (–): no pruning,
Dashed (−−): pruning nodes w/ d ≤1,
Dotted (…): pruning nodes w/ d ≤ 2
Computation time increases linearly
with increasing number of edges,
while decreasing with pruning.
Scalability – accuracy vs time
Time vs. accuracy.
Effect of pruning on accuracy of finding
top anomalies as in the original ranking
before pruning.
New rankings are scored using
Normalized Discounted Cumulative Gain (NDCG).
Pruning reduces time for both
Node-Iterator and Eigen-Triangle
while keeping accuracy as high as
~1 and ~.9, respectively.
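How a new ranking might be scored against the original one, as a hedged sketch: the slides do not define the gain, so using each node's original anomaly score as its relevance is an assumption, as are the function name and the `k` cutoff parameter. Scores are assumed non-negative.

```python
import math


def ndcg(original_scores, new_ranking, k=None):
    """NDCG of `new_ranking` (node ids, best first) against the original scores.

    original_scores: dict node -> anomaly score before pruning (assumed gain).
    """
    k = k or len(new_ranking)

    def dcg(order):
        # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), ...
        return sum(original_scores.get(node, 0.0) / math.log2(pos + 2)
                   for pos, node in enumerate(order[:k]))

    # The ideal ordering is the original ranking itself.
    ideal = sorted(original_scores, key=original_scores.get, reverse=True)
    best = dcg(ideal)
    return dcg(new_ranking) / best if best > 0 else 0.0
```

A ranking identical to the original scores 1, and the score drops as highly anomalous nodes slide down the list, which matches the use here of checking whether pruning preserves the top anomalies.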
Conclusion
OddBall, a fast, unsupervised method to detect abnormal nodes in weighted graphs.
Study of egonets; list of numerical features
Discovery of new patterns in density (Obs.1: EDPL), weights (Obs.2: EWPL), and principal eigenvalues (Obs.3: ELWPL).
Speed-up in feature extraction, with accuracy ~.9
Experiments on real graphs of over 1M nodes, that reveal strange/extreme nodes from many different domains
Software available online! http://www.cs.cmu.edu/~lakoglu/#tools
Related Work
Noble and Cook [KDD,03] detect anomalous sub-graphs using variants of the MDL principle.
Eberle and Holder [ICDM,07] detect unexpected/missing nodes/edges in labeled graphs.
Liu et al. [SDM,05] detect non-crashing bugs in software using frequent execution flow graphs combined with supervised classification.
Sun et al. [ICDM,05] use proximity and random walks to assess normality of nodes in bipartite graphs.
Chakrabarti [PKDD,04] spots anomalous edges as a by-product of cross-associations.
http://www.cs.cmu.edu/~lakoglu/#tools
QUESTIONS?
OddBall over time?
Rank nodes at each time tick t
Node i will have rank vector Ri = (ri,1, ri,2, …, ri,t)
Sort nodes w.r.t. |Ri <= threshold|
threshold = 3 will sort nodes w.r.t. the number of time-ticks they appear in top 3 outliers
Note: not all nodes appear at all time-ticks
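The temporal ranking can be sketched as below. The score form, (hits / active) * hits, is read off the "(26/46) * 26" example in the slides that follow; the input layout (one list of ranks per node, covering only the ticks where the node was active) and the function names are assumptions.

```python
def temporal_scores(ranks_by_node, threshold=3):
    """Map node -> (hits, active, score).

    ranks_by_node: node -> list of ranks, one per time tick the node was active.
    hits:   number of ticks at which the node ranked within `threshold`.
    score:  (hits / active) * hits, so frequent AND consistent outliers win.
    """
    result = {}
    for node, ranks in ranks_by_node.items():
        active = len(ranks)
        hits = sum(1 for r in ranks if r <= threshold)
        score = (hits / active) * hits if active else 0.0
        result[node] = (hits, active, score)
    return result


def rank_nodes(ranks_by_node, threshold=3):
    """Node ids sorted by descending temporal score."""
    scored = temporal_scores(ranks_by_node, threshold)
    return sorted(scored, key=lambda node: scored[node][2], reverse=True)
```

Weighting the raw hit count by the hit fraction explains why a node with 26 hits out of 46 active ticks can outrank one with 35 hits out of 177.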
threshold=3
- Node was active at 46 time-ticks.
- At 26 of them, it was in top-3 outliers.
- Score becomes:
(26/46) * 26
- Rank: 1
84321653332|MR|10-MAY-65|400010|15-MAR-03|ACTIVE|RCVALUE
E.g. from t=154 (MAY-2) to t=170 (MAY-18), it appears in top-3
threshold=3
- Node was active at 177 time-ticks.
- At 35 of them, it was in top-3 outliers.
- Rank: 5
84354420350|Mr|21-OCT-76|400033|23-JAN-04|ACTIVE|RCVALUE