SpotLight: Detecting Anomalies in Streaming Graphs

Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, Nina Mishra

Carnegie Mellon University · Amazon

This work was performed at Amazon.

ESWARAN, FALOUTSOS, GUHA & MISHRA

SPOTLIGHT: DETECTING ANOMALIES IN STREAMING GRAPHS

KDD 2018

Graphs are being created everywhere

INTRODUCTION

[Figure: a message exchange between You and Alice, 6 Jun 2018, 1.34am]

Many other settings…

INTRODUCTION

‣ IM/e-mail networks
‣ Computer networks
‣ Transportation networks
‣ Edit networks

As a sequence of graph snapshots

INTRODUCTION

[Figure: a timeline of graph snapshots (Monday AM, Monday PM, Tuesday AM, Tuesday PM, Wednesday AM), grouped into MORNINGS and NIGHTS]
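Turning a raw edge stream into snapshots like these amounts to bucketing timestamped edges into fixed windows. The Python sketch below is illustrative only: it assumes a hypothetical edge record `(src, dst, weight, timestamp)` with timestamps in seconds and an hourly window, and `Edge` and `to_snapshots` are our own names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Edge(NamedTuple):
    """Hypothetical edge record: directed, weighted, timestamped."""
    src: str
    dst: str
    weight: float
    timestamp: int  # seconds


Snapshot = Dict[Tuple[str, str], float]  # (src, dst) -> total weight in the window


def to_snapshots(edges: Iterable[Edge], window: int = 3600) -> List[Snapshot]:
    """Bucket edges into consecutive windows of `window` seconds.

    Weights of repeated (src, dst) pairs within a window are summed; windows
    with no edges between the first and last one become empty snapshots.
    """
    buckets: Dict[int, Dict[Tuple[str, str], float]] = defaultdict(lambda: defaultdict(float))
    for e in edges:
        buckets[e.timestamp // window][(e.src, e.dst)] += e.weight
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    return [dict(buckets.get(i, {})) for i in range(first, last + 1)]
```

For example, edges at t = 10 s and t = 20 s on the same pair land in one hourly snapshot with their weights summed, while an edge at t = 7300 s opens the third snapshot, leaving an empty one in between. Summing repeated edges is one simple aggregation choice for a weighted-edge stream, not the only one.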

But sometimes unusual events happen

INTRODUCTION

[Figure: three graph snapshots labelled Normal, Tax scam and Network failure]

Unusual events in other settings

INTRODUCTION

‣ Computer networks (e.g., port scans, denial-of-service)
‣ Transportation networks (events/weather)

[Figure: stadium]

How do we detect such anomalies in streaming graphs?

INTRODUCTION

How do we even characterize these anomalies?

Outline: INSIGHT · PROBLEM · ALGORITHM · GUARANTEES · EXPERIMENTS

INSIGHT

Anomalies tend to involve…

INSIGHT

sudden (dis)appearance of large dense directed subgraph

sudden (dis)appearance of large dense directed subgraph

INSIGHT

[Figure: two panels, each with sources on one side and destinations on the other]

INSIGHT

sudden (dis)appearance of large dense directed subgraph

[Figure: a subgraph between sources and destinations; large = many nodes, dense = many, many edges]
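To make “large” and “dense” concrete for a candidate block of sources S and destinations D, one can report how many nodes it spans and how much edge weight it holds. The helper below is a sketch under our own assumptions: the density measure (total weight divided by |S|·|D|) is one common choice among several, and `block_stats` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

Snapshot = Dict[Tuple[str, str], float]  # (src, dst) -> edge weight


def block_stats(graph: Snapshot, sources: Set[str], dests: Set[str]) -> Tuple[int, float, float]:
    """Size, mass and density of the directed block `sources` -> `dests`.

    size    -- number of distinct nodes in the block (a node may play both roles)
    mass    -- total weight of edges from `sources` to `dests`
    density -- mass / (|sources| * |dests|), the average weight per possible
               source-destination pair (an assumed density measure)
    """
    mass = sum(w for (s, d), w in graph.items() if s in sources and d in dests)
    pairs = len(sources) * len(dests)
    density = mass / pairs if pairs else 0.0
    return len(sources | dests), float(mass), density
```

A large dense block scores high on both size and density; a single heavy edge can be dense but is never large.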

INSIGHT

sudden (dis)appearance of large dense directed subgraph

[Figure: an initial and a final graph; a sudden change, or steady evolution?]

INSIGHT

[Figure: two panels, appearance and disappearance]

sudden (dis)appearance of large dense directed subgraph

Outline: INSIGHT · PROBLEM · ALGORITHM · GUARANTEES · EXPERIMENTS

PROBLEM

PROBLEM

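GIVEN: a stream of graphs over time

STREAMING MODEL
• (Un)directed, weighted edges
• Time-evolving node set
• Known node correspondence

FIND: the anomalous graphs in the stream

[Figure: a timeline of graphs, three labelled “Ok!” and one labelled “anomaly!”]

ALGORITHMIC CONSTRAINTS
• Real-time and fast detection
• Bounded working memory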

Outline: INSIGHT · PROBLEM · ALGORITHM · GUARANTEES · EXPERIMENTS

ALGORITHM

Overview of SpotLight

ALGORITHM

[Figure: Graph Sketching maps each graph G1, G2, G3, G4 to a sketch v(G1), v(G2), v(G3), v(G4); Anomaly Detection on the sketches flags v(G3) as an anomaly]

Many off-the-shelf methods for anomaly detection:

‣ Robust Random Cut Forests [Guha, Mishra, Roy & Schrijvers; ICML 2016]

‣ Light-weight Online Detector of Anomalies [Pevny; ML 2016]

‣ Randomized Space Forests [Wu, Zhang, Fan, Edwards & Yu; ICDM 2014]
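The two-stage design can be illustrated with a deliberately simple stand-in for the detectors listed above: score each sketch by its Euclidean distance to the nearest of the last few sketches. This is not Robust Random Cut Forests, LODA or Randomized Space Forests, and `nearest_neighbour_scores` with its `window` parameter is hypothetical; the point is only that the detector sees fixed-length vectors, never the graphs themselves.

```python
import math
from collections import deque
from typing import Deque, Iterable, List, Sequence


def nearest_neighbour_scores(sketches: Iterable[Sequence[float]], window: int = 24) -> List[float]:
    """Score each sketch by its distance to the closest of the previous `window` sketches.

    A simple stand-in for an off-the-shelf streaming detector: a sketch far from
    everything seen recently gets a high score. The first sketch scores 0.
    """
    recent: Deque[Sequence[float]] = deque(maxlen=window)  # bounded memory
    scores: List[float] = []
    for v in sketches:
        score = min(math.dist(v, u) for u in recent) if recent else 0.0
        scores.append(score)
        recent.append(v)
    return scores
```

Memory stays bounded by `window` stored sketches, in the spirit of the bounded-memory constraint on the PROBLEM slide.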

SpotLight randomized graph sketching

ALGORITHM

[Figure: the sketch is filled in one dimension at a time; each dimension picks a random set of sources and destinations and records the total weight of the edges between them, here giving the values 0, 100 and 20]

THREE PARAMETERS:

‣ Probability of sampling a source, ‘p’
‣ Probability of sampling a destination, ‘q’
‣ Number of sketching dimensions, ‘K’
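A direct, non-streaming reading of the three parameters: each of the K sketch coordinates draws its own random set of sources (each kept with probability p) and destinations (each kept with probability q), and records the total weight of edges from the sampled sources to the sampled destinations. This minimal sketch assumes the node universe is known up front; `sample_queries` and `spotlight_sketch` are our names, not an official API.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Graph = Dict[Tuple[str, str], float]  # (src, dst) -> edge weight
Query = Tuple[Set[str], Set[str]]     # (sampled sources, sampled destinations)


def sample_queries(sources: Sequence[str], dests: Sequence[str], p: float, q: float,
                   K: int, rng: random.Random) -> List[Query]:
    """Draw K random query subgraphs: each source kept w.p. p, each destination w.p. q.

    Draw these once and reuse them for every graph in the stream, so that the
    sketches of different graphs are comparable coordinate by coordinate.
    """
    return [({s for s in sources if rng.random() < p},
             {d for d in dests if rng.random() < q})
            for _ in range(K)]


def spotlight_sketch(graph: Graph, queries: List[Query]) -> List[float]:
    """Coordinate k = total weight of edges from query k's sources to its destinations."""
    return [float(sum(w for (s, d), w in graph.items() if s in S and d in D))
            for S, D in queries]
```

Because the same K query subgraphs are reused for every graph, a dense block that suddenly appears moves every coordinate whose query covers it by that block's total weight.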

SpotLight at work on a stream

ALGORITHM

STREAMING ANOMALY DETECTOR

Hashes: $h_S^1, h_S^2, h_S^3$: src $\to \{1, \dots, 1/p\}$ and $h_D^1, h_D^2, h_D^3$: dst $\to \{1, \dots, 1/q\}$

[Figure: a K = 3 sketch maintained over an edge stream, beside a plot of anomaly score over time. At 5pm the three counters are (0, 0, 0). Edge a → b (weight 1) is passed through the source hashes (a) and the destination hashes (b); only the third dimension counts it, giving (0, 0, 1). Edge b → c (weight 2) is counted by the second and third dimensions, giving (0, 2, 3). At 6pm the sketch of the 5-6pm graph is scored and the counters are reset to (0, 0, 0). The 6-7pm graph is processed the same way (its counters reach (1, 0, 2)), and the counters are reset again at 7pm.]
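In a stream the node set evolves, so explicit node subsets are replaced by hash functions and only K counters are kept. The sketch below is one way to realize the slide's hashes under our own assumptions: a node is selected for coordinate k when its salted BLAKE2 hash (from Python's `hashlib`) falls in bucket 0 of 1/p (or 1/q) buckets, which selects it with probability about p (or q); it assumes 1/p and 1/q are whole numbers, and the class `StreamingSpotLight` with its `update`/`flush` methods is hypothetical.

```python
import hashlib
from typing import List


class StreamingSpotLight:
    """K running sketch counters updated one edge at a time (hypothetical API)."""

    def __init__(self, K: int, p: float, q: float, seed: int = 0) -> None:
        self.K = K
        self.src_buckets = round(1 / p)  # h_S^k maps sources to {0, ..., 1/p - 1}
        self.dst_buckets = round(1 / q)  # h_D^k maps destinations to {0, ..., 1/q - 1}
        self.seed = seed
        self.counters: List[float] = [0.0] * K

    def _hash(self, node: str, k: int, role: str, buckets: int) -> int:
        # Salt with (seed, role, k) so each coordinate and role gets its own hash.
        key = f"{self.seed}|{role}|{k}|{node}".encode()
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % buckets

    def update(self, src: str, dst: str, weight: float) -> None:
        """Add an edge: coordinate k counts it iff both endpoints hash to bucket 0."""
        for k in range(self.K):
            if (self._hash(src, k, "S", self.src_buckets) == 0
                    and self._hash(dst, k, "D", self.dst_buckets) == 0):
                self.counters[k] += weight

    def flush(self) -> List[float]:
        """End of a time window: return the finished sketch and reset the counters."""
        sketch, self.counters = self.counters, [0.0] * self.K
        return sketch
```

Calling `update` for each arriving edge and `flush` at each window boundary (e.g. hourly) yields one sketch per time window, ready for any streaming anomaly detector; memory is K counters regardless of how many nodes appear.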

Outline: INSIGHT · PROBLEM · ALGORITHM · GUARANTEES · EXPERIMENTS

GUARANTEES

Intuition behind our theorems

GUARANTEES

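Deterministic Experiment: Add ‘m’ unit-weight edges.

[Figure: a graph G and two modified graphs G_B and G_R, mapped to v(G), v(G_B) and v(G_R) in the K-dimensional SpotLight space, at distances d_B and d_R from v(G)]

$$d_R - d_B > O(K m^2)$$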
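One way to see where a gap on the order of K m² can come from is a short calculation under assumptions of our own, not necessarily those of the theorem: distances are squared Euclidean; coordinate k counts edge weight from a random source set S_k (each source kept w.p. p) to a random destination set D_k (each destination kept w.p. q); G_R receives all m unit-weight edges on a single source-destination pair; and G_B receives them on m pairs with distinct sources and distinct destinations. The symbols ΔG, S_k, D_k and Z_k are our notation. Because each coordinate is a sum of edge weights, the sketch is linear, v(G + ΔG) − v(G) = v(ΔG), so only the added edges matter:

```latex
\begin{aligned}
v_k(\Delta G) &= \sum_{(s,d)} w_{\Delta G}(s,d)\,\mathbf{1}[s \in S_k]\,\mathbf{1}[d \in D_k],
  \qquad d^2 = \sum_{k=1}^{K} v_k(\Delta G)^2 \\
\text{red:}\quad v_k(\Delta G_R) &= m\,Z_k,\quad Z_k \sim \mathrm{Bernoulli}(pq)
  \;\Rightarrow\; \mathbb{E}\big[v_k(\Delta G_R)^2\big] = m^2 pq \\
\text{blue:}\quad v_k(\Delta G_B) &\sim \mathrm{Binomial}(m, pq)
  \;\Rightarrow\; \mathbb{E}\big[v_k(\Delta G_B)^2\big] = m\,pq(1-pq) + m^2 p^2 q^2 \\
\mathbb{E}[d_R^2] - \mathbb{E}[d_B^2] &= K\big(m^2 pq - m\,pq(1-pq) - m^2 p^2 q^2\big)
  = K\,pq(1-pq)\,m(m-1)
\end{aligned}
```

Under these assumptions the expected gap is positive whenever m ≥ 2 and 0 < pq < 1, and it grows like K m², matching the scale on the slide, though here for squared distances and in expectation over the random sketch.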

Thm 1: Focus-awareness in expectation

GUARANTEES

Randomized Experiment: Add ‘m’ unit-weight edges uniformly at random.

[Figure: G, G_R and G_B mapped into the K-dimensional SpotLight space, with the probability distributions of the distances d_R and d_B]

$$\mathbb{E}[d_B] < \mathbb{E}[d_R]$$

The focus-awareness property was introduced by Koutra, Vogelstein & Faloutsos [SDM 2013].
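The randomized experiment can also be simulated. In the Monte Carlo sketch below, which uses our own setup rather than the theorem's exact one, G_R receives m unit-weight edges placed uniformly at random inside a small a × b block of sources and destinations, G_B receives m edges placed uniformly at random over a much larger n_s × n_d graph, and squared Euclidean sketch distances are averaged over fresh random sketches and placements. Because each sketch coordinate is a sum of edge weights, the distance between the sketches of G and of G plus the new edges depends only on the new edges, so G itself never needs to be built. All names are hypothetical.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]  # (source id, destination id), unit weight


def added_edges(m: int, n_src: int, n_dst: int, rng: random.Random) -> List[Edge]:
    """Place m unit-weight edges uniformly at random among n_src x n_dst pairs."""
    return [(rng.randrange(n_src), rng.randrange(n_dst)) for _ in range(m)]


def sq_sketch_norm(edges: List[Edge], n_src: int, n_dst: int,
                   p: float, q: float, K: int, rng: random.Random) -> float:
    """Squared norm of a fresh K-dim sketch of the added edges.

    By linearity of the sketch, this equals the squared distance between
    the sketches of G and of G plus these edges.
    """
    total = 0.0
    for _ in range(K):
        src_sel = [rng.random() < p for _ in range(n_src)]
        dst_sel = [rng.random() < q for _ in range(n_dst)]
        v = sum(1 for s, d in edges if src_sel[s] and dst_sel[d])
        total += v * v
    return total


def mean_sq_distances(m: int, block: Tuple[int, int], full: Tuple[int, int],
                      p: float, q: float, K: int, trials: int,
                      seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimates of E[d_R^2] (focused block) and E[d_B^2] (whole graph)."""
    rng = random.Random(seed)
    red = blue = 0.0
    for _ in range(trials):
        red += sq_sketch_norm(added_edges(m, *block, rng), *block, p, q, K, rng)
        blue += sq_sketch_norm(added_edges(m, *full, rng), *full, p, q, K, rng)
    return red / trials, blue / trials
```

With a small block and a large graph one would expect the first estimate to exceed the second, in line with Thm 1; the simulation is an illustration, not a proof.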

Thm 2: Criterion for anomaly detection

GUARANTEES

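[Figure, two panels: (1) probability distributions of the distances d_R and d_B, with a decision threshold separating “normal” from “anomaly” and the resulting false negatives (FN) and false positives (FP); (2) the same distributions separated by a gap ε, with FPR ≤ δ]

“EXPECTED” GAP ➡ “HIGH PROBABILITY” GAP

Sketch size K ≥ K* ➡ $\Pr[d_R - d_B > \epsilon] \ge 1 - \delta$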
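The threshold rule pictured here can be made concrete: flag a graph when its distance exceeds a threshold, then count false positives among normal graphs and false negatives among anomalous ones. A minimal sketch, assuming distance samples for both groups are already available; `threshold_error_rates` is a hypothetical name.

```python
from typing import Sequence, Tuple


def threshold_error_rates(normal: Sequence[float], anomalous: Sequence[float],
                          threshold: float) -> Tuple[float, float]:
    """Flag a graph as anomalous when its distance exceeds `threshold`.

    Returns (false positive rate, false negative rate):
    FPR = share of normal graphs flagged, FNR = share of anomalous graphs missed.
    """
    fp = sum(1 for x in normal if x > threshold)
    fn = sum(1 for x in anomalous if x <= threshold)
    return fp / len(normal), fn / len(anomalous)
```

Raising the threshold trades false positives for false negatives; the wider the gap between the two distributions, the less that trade-off costs.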

Outline: INSIGHT · PROBLEM · ALGORITHM · GUARANTEES · EXPERIMENTS

EXPERIMENTS

The labeled DARPA dataset

EXPERIMENTS

‣ 4.5M edges in 87.7K time ticks
‣ 9.5K sources, 24K destinations
‣ Edges labeled as attack or not
‣ Stream of 1.5K hourly graphs (24% anomalous)

DARPA: Precision and recall

EXPERIMENTS

$$\text{Precision} = \frac{\#\,\text{graphs correctly flagged}}{\#\,\text{graphs flagged}} \qquad \text{Recall} = \frac{\#\,\text{graphs correctly flagged}}{\#\,\text{anomalous graphs}}$$

RHSS: Ranshous, Harenburg, Sharma & Samatova (SDM 2016)
STA: Streaming Tensor Analysis, Sun, Tao & Faloutsos (KDD 2006)
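The two definitions translate directly into code over per-graph flags and labels. This sketch is ours (the name `precision_recall` and the zero-denominator convention are assumptions), not the evaluation script used for the DARPA experiments.

```python
from typing import Sequence, Tuple


def precision_recall(flagged: Sequence[bool], anomalous: Sequence[bool]) -> Tuple[float, float]:
    """Graph-level precision and recall as defined on the slide.

    flagged[i]   -- True if the detector flagged graph i
    anomalous[i] -- True if graph i is labeled anomalous (ground truth)
    Returns 0.0 for a ratio whose denominator is zero (our convention).
    """
    if len(flagged) != len(anomalous):
        raise ValueError("flagged and anomalous must have the same length")
    correct = sum(1 for f, a in zip(flagged, anomalous) if f and a)
    n_flagged = sum(1 for f in flagged if f)
    n_anomalous = sum(1 for a in anomalous if a)
    precision = correct / n_flagged if n_flagged else 0.0
    recall = correct / n_anomalous if n_anomalous else 0.0
    return precision, recall
```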

DARPA: Challenges and successes

EXPERIMENTS

‣ SpotLight (misses small attacks)
‣ Edge Weight = SpotLight with K = p = q = 1 (also misses medium-size attacks)
‣ RHSS = edge likelihood function (also misses repeated attacks)
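Reading the Edge Weight baseline off the slide: with K = 1 and p = q = 1 there is a single sketch coordinate and every source and destination is always sampled, so SpotLight's value is just the graph's total edge weight. The check below illustrates that reduction with a compact single-graph version of the sketch; the function names are ours, and for a real stream the sampled node sets would have to be shared across graphs.

```python
import random
from typing import Dict, List, Tuple

Graph = Dict[Tuple[str, str], float]  # (src, dst) -> edge weight


def spotlight_sketch(graph: Graph, K: int, p: float, q: float, seed: int = 0) -> List[float]:
    """Single-graph K-dim sketch: per coordinate, keep each source w.p. p and each
    destination w.p. q, then sum the weights of edges from kept sources to kept
    destinations."""
    rng = random.Random(seed)
    sources = sorted({s for s, _ in graph})
    dests = sorted({d for _, d in graph})
    sketch: List[float] = []
    for _ in range(K):
        S = {s for s in sources if rng.random() < p}
        D = {d for d in dests if rng.random() < q}
        sketch.append(float(sum(w for (s, d), w in graph.items() if s in S and d in D)))
    return sketch


def edge_weight(graph: Graph) -> float:
    """The Edge Weight baseline: total edge weight of the graph."""
    return float(sum(graph.values()))
```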

Outline: INSIGHT · PROBLEM · ALGORITHM · GUARANTEES · EXPERIMENTS

CONCLUSION

Summary

CONCLUSION

PROBLEM

[Figure: a timeline of graphs labelled Ok!, anomaly!, Ok!, Ok!]

SOLUTION

SpotLight sketching
‣ Real-time
‣ Memory efficient
‣ Theoretical guarantees

Future directions

CONCLUSION

MORE CHALLENGING ANOMALIES

‣ Slow and/or small attacks

‣ Sequence of suspicious events rather than a single event

STREAMING ANOMALY ATTRIBUTION

‣ Blame a small set of sources and destinations for the anomaly

CONCLUSION

Thank you! Questions:

[email protected]

[Figure: three panels labelled ALGORITHM, THEORY and PRACTICE; the THEORY panel shows the d_R and d_B distance distributions with FPR ≤ δ at gap ε]