Nonparametric Sequential Change Detection for High...

26
Nonparametric Sequential Change Detection for High-Dimensional Problems Nonparametric Sequential Change Detection for High-Dimensional Problems Yasin Yılmaz Electrical Engineering, University of South Florida Allerton 2017

Transcript of Nonparametric Sequential Change Detection for High...

Page 1: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Nonparametric Sequential Change Detection forHigh-Dimensional Problems

Yasin YılmazElectrical Engineering, University of South Florida

Allerton 2017

Page 2: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Outline

1 Introduction

2 Background

3 ODIT: Online Discrepancy Test

4 Numerical Results

5 Conclusion

Page 3: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Introduction

Introduction

Page 4: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Introduction

Anomaly Detection

Objective: identify patterns that deviate from a nominal behavior

Applications: cybersecurity, quality control, fraud detection, faultdetection, health care, . . .

In literature typically

statistical outlier detection=

anomaly detection

However an outlier could be

nominal tail eventor

real anomalous event(e.g., mean shift)

-5 0 5 10

x

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

f(x

)

f0(x) f1(x)

Page 5: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Introduction

Anomaly Detection

Objective: identify patterns that deviate from a nominal behavior

Applications: cybersecurity, quality control, fraud detection, faultdetection, health care, . . .

In literature typically

statistical outlier detection=

anomaly detection

However an outlier could be

nominal tail eventor

real anomalous event(e.g., mean shift)

-5 0 5 10

x

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

f(x

)

f0(x) f1(x)

Page 6: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Introduction

Problem Formulation

Instead of anomaly = outlier, consideralso temporal dimension

Proposed Model

anomaly = persistent outliers

Objective

Timely and accurate detection ofanomalies in high-dimensionaldatasets

Approach

Sequential & Nonparametric anomalydetection

0 2 4 6 8 10 12 14 16 18 20

t

-4

-2

0

2

4

x(t

)

Nominal

0 2 4 6 8 10 12 14 16 18 20

t

-4

-2

0

2

4

x(t

)

Anomaly after t=10 with prob. 0.2

persistent outliers

outlier

Page 7: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Introduction

Motivating Facts: IoT Security, Smart Grid, . . .

IoT devices: 8.4B in 2017 and expected to hit 20B by 2020 1

IoT systems: highly vulnerable – needs scalable security solutions 2

Mirai IoT botnet: largest recorded DDoS attack with at least 1.1Tbps bandwidth (Oct. 2016) 2

Persirai IoT botnet targets at least 120,000 IP cams (May 2017) 3

A plausible cyberattack against the US grid: 100M people may beleft without power with up to $1 trillion of monetary loss 4

1R. Minerva, A. Biru, and D. Rotondi, “Towards a definition of the Internet ofThings (IoT),” IEEE Internet Initiative, no. 1, 2015.

2E. Bertino and N. Islam, “Botnets and Internet of Things Security,” Computer,vol. 50, no. 2, pp. 76-79, Feb. 2017.

3Trend Micro, “Persirai: New Internet of Things (IoT) Botnet Targets IPCameras”, May 9 , 2017, available online

4Trevor Maynard and Nick Beecroft, “Business Blackout,” Lloyd’s Emerging RiskReport, p. 60, May 2015.

Page 8: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Introduction

Motivating Facts: IoT Security, Smart Grid, . . .

Challenges:

Unknown anomalous distribution: parametric methods, as well assignature-based methods (e.g., antivirus) are not feasible

High-dimensional problems: even nominal distribution is difficult toknow

Nonparametric methods are needed

Timely and accurate detection is critical

Page 9: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Background

Background

Page 10: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Background

Sequential Change Detection - CUSUM

infT

supτ

supx 1,...,xT

Eτ [T − τ |T ≥ τ ] s.t. E∞[T ] ≥ β

Wt = max

Wt−1 + log

f1(x t)

f0(x t), 0

T = mint : Wt ≥ h

Page 11: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Background

Statistical Outlier Detection

Needs to know a statistical description f0 of the nominal(e.g., no attack) behavior (baseline)

Determines instances that significantly deviate from the baseline

With f0 completely known, x is outlier if∫∞x

f0(y)dy < α (p-value)

Equivalently, if x 6∈ most compact set of data points under f0(minimum volume set)

Ωα = arg minA

∫A

dy subject to

∫Af0(y)dy ≥ 1− α

-5 -4 -3 -2 -1 0 1 2 3 4 5

x

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

f 0(x

)

Uniformly most powerful test whenanomalous distribution is a linear mixtureof f0 and the uniform distribution

Coincides with minimum entropy set whichminimizes the Renyi entropy whilesatisfying the same false alarm constraint

Page 12: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Background

Geometric Entropy Minimization (GEM)

High-dimensional datasets: even if f0 isknown, very computationally expensive(if not impossible) to determine Ωα

Various methods for learning Ωα

GEM is very effective withhigh-dimensional datasets whileasymptotically achieving Ωα forlimK ,N→∞ K/N → 1− α

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

xij1t

0.3

0.35

0.4

0.45

0.5

0.55

0.6

0.65

0.7

0.75

0.8

xij

2t

L(K)

L1

L2

Training set 1Training set 2Test set

Training: Randomly partitions training set into two and formsK -kNN graph 5

XN1

K = arg minXN1

K

Lk(XN1

K ,XN2 ) =K∑i=1

k∑l=k∗

|ei(l)|γ

Test: new point x t ∈ Rd outlier if x t 6∈ XN1+1K ,

equivalently if Lt =∑k

l=k∗ |et(l)|γ > L(K)

5A. O. Hero III, “Geometric entropy minimization (GEM) for anomaly detectionand localization”, NIPS, pp. 585-592, 2006

Page 13: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

ODIT: Online Discrepancy Test

ODIT: Online Discrepancy Test

Page 14: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

ODIT: Online Discrepancy Test

Online Discrepancy Test (ODIT)

GEM lacks the temporal aspect

In GEM, x t is outlier ifLt =

∑kl=k∗ |ei(l)|γ > L(K)

In ODIT, Dt = Lt − L(K) is treated assome positive/negative evidence foranomaly

Dt approximates `t = log p(r(x t)|H1)p(r(x t)|H0)

between H1 claiming x t is anomalous andH0 claiming x t is nominal

1 2 3 4 5 6 7 8 9 10

t

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

ODIT statistic, sijt

Detection threshold, h

Assuming independence,∑T

t=1 Dt gives aggregate anomaly evidence

until time T (as∑T

t=1 `t , sufficient statistic for optimum detection)

Similar to CUSUM (optimum minimax sequential change detector),ODIT decides using

Td = mint : st ≥ h, st = maxst−1 + Dt , 0

Page 15: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

ODIT: Online Discrepancy Test

Online Discrepancy Test (ODIT)

GEM lacks the temporal aspect

In GEM, x t is outlier ifLt =

∑kl=k∗ |ei(l)|γ > L(K)

In ODIT, Dt = Lt − L(K) is treated assome positive/negative evidence foranomaly

Dt approximates `t = log p(r(x t)|H1)p(r(x t)|H0)

between H1 claiming x t is anomalous andH0 claiming x t is nominal

1 2 3 4 5 6 7 8 9 10

t

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

ODIT statistic, sijt

Detection threshold, h

Assuming independence,∑T

t=1 Dt gives aggregate anomaly evidence

until time T (as∑T

t=1 `t , sufficient statistic for optimum detection)

Similar to CUSUM (optimum minimax sequential change detector),ODIT decides using

Td = mint : st ≥ h, st = maxst−1 + Dt , 0

Page 16: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

ODIT: Online Discrepancy Test

Theoretical Justification - Asymptotic

Asymptotic Optimality - Scalarized problem

As training set grows (N2 →∞) ODIT is asymptotically optimum for

H0 : r(x t) ∼ f k0 ,∀tH1 : r(x t) ∼ f k0 , t < τ, and r(x t) ∼ f kuni , t ≥ τ

x t independent

r(x t) kNN distance

f0(x t) > 0 Lebesgue continuous

f k0 and f kuni distributions of kNN distance under f0 and uniform distr.on a d-dimensional grid with spacing rα where

∫∞rα

f k0 (r)dr = α

Page 17: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

ODIT: Online Discrepancy Test

Sketch of the Proof

For independent x t, continuous f0 > 0 defines a non-homogeneousPoisson point process with continuous rate λ(x) > 0.

Obtain a homogeneous Poisson point process with rate k by defining ad-dimensional non-homogeneous grid with volume k/λ(x) 6

For this homogeneous Poisson point process, nearest neighbor function isgiven by

Dx (rd) = kdvd(x , r)

drde−kvd (x ,r)

Under H0, r(x t) = rt comes from f k0 which can be computed usingtraining set as Lt .

Under H1, r(x t) = rα comes from f kuni which has a single atom at rα,computed as L(K).

As training set grows, Lt → rt and L(K) → rα

The optimum CUSUM test computes logDx (rα)

Dx (rt )= kc(rdt − rdα)

6Robert Gallager. 6.262 Discrete Stochastic Processes, Chapter 2. Spring 2011.Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu.License: Creative Commons BY-NC-SA.

Page 18: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

ODIT: Online Discrepancy Test

Theoretical Justification - Nonasymptotic

CUSUM procedure can be expressed in terms of a generaldiscrepancy metric, applicable to any number sequence

stop when discrepancy g(`t)7 of observations with respect to f0 is

large enough

Discrepancy and CUSUM

Tc = mint : g(`t) ≥ hc,

`t =

[log

f1(x1)

f0(x1). . . log

f1(x t)

f0(x t)

],

g(`t) = max1≤n1≤n2≤t

n2∑i=n1

`it ,0 2 4 6 8 10 12 14 16 18 20 22

t

-3.5

-3

-2.5

-2

-1.5

-1

-0.5

0

Qt

g(`t) max15j5t

Qjt

Qt =∑t

i=1 `t

7B. A. Moser et al., “On stability of distance measures for event sequences inducedby level-crossing sampling”, IEEE Trans. Signal Process., vol. 62, no. 8, pp.1987–1999, 2014.

Page 19: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

ODIT: Online Discrepancy Test

ODIT Algorithm

Initialize: s ← 0, t ← 1

Partition training set into XN1

and XN2

Determine L(K) from K -kNN

graph XN1

K

While s < h

Get new data x t andcompute Dt = Lt − L(K)

s = maxs + Dt , 0t ← t + 1

Declare anomaly

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

xij1t

0.3

0.35

0.4

0.45

0.5

0.55

0.6

0.65

0.7

0.75

0.8

xij

2t

L(K)

L1

L2

Training set 1Training set 2Test set

1 2 3 4 5 6 7 8 9 10

t

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

ODIT statistic, sijt

Detection threshold, h

Page 20: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Numerical Results

Numerical Results

Page 21: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Numerical Results

Simulations

f0 is a 2D independent Gaussian with zero mean and σ = 0.1f1 = 0.8f0 + 0.2U[0, 1]Training set 10, 000 points (N1 = 1000, N2 = 9000)α = 0.05, k = 1, K = αN1

Parametric clairvoyant CUSUM knows both f0 and f1 exactlyGeneralized CUSUM exactly knows f0, but estimates the uniformdistribution upper bound as 0.9

0 0.5 1 1.5 2 2.5 3 3.5 4

! log10 P (? = H1jH0)

4

4.5

5

5.5

6

6.5

Aver

age

Det

ection

Delay

ODITCUSUMG-CUSUM

Page 22: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Numerical Results

Cybersecurity in Smart GridCC

DA i

SM jSM 1 SM J

DA 1 DA N

xij1t xijkt xijKt

yijtyi1t yiJtsijt

siJtsi1t

uit, situ1t , s

1t

uNt , sNtzit

z1tzNt

vt, ut, st

SA kSA 1 SA K

Control center, 10 data aggregators, 1, 000 smartmeters, 10, 000 smart appliances

3% of the HANs are attacked. In each attacked HAN,each smart appliance is attacked with prob. 0.5

Baseline iid ∼ N (0.5, 0.12)

Attack data either∼ N (0.5, (0.1η)2), η > 1 (Jamming) or∼ N (0.5 + ∆, 0.12), ∆ ∈ R (False Data Injection)

Even a small mismatch between the actual andassumed parameter values degrade the performance ofCUSUM

1.5 2 2.5 3

Jamming noise level (nominal standard deviation)

0

20

40

60

80

100

120

140

160

180

200

Ave

rage

det

ection

delay

ODITCUSUMG-CUSUM

0 0.5 1 1.5 2 2.5

! log10 P (false alarm), False Data Injection

0

5

10

15

20

25

30

35

40

45

Averagedetectiondelay

ODITCUSUMG-CUSUM

Page 23: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Numerical Results

Human Activity Recognition

Online monitoring of a dynamic system using “Heterogeneity HumanActivity Recognition Dataset” 8 obtained from the UCI MachineLearning Repository

Smartwatch accelerometer data: 3.5Mdata points with 5 numeric features

6 activities: biking, sitting, standing,walking, stair up, and stair down

Focusing on activity transitions wetested online detection performance

G-CUSUM fits multivariate Gaussianmodels to baseline and anomalous dist.

Re-train after detecting a change inthe activity (N1 = 10, N2 = 20) 0 2000 4000 6000 8000 10000 12000

False alarm period

0

200

400

600

800

1000

1200

Ave

rage

det

ection

delay

ODITG-CUSUM

8A. Stisen et al., “Smart devices are different: Assessing and mitigating mobilesensing heterogeneities for activity recognition,” SenSys, 2015.

Page 24: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Conclusion

Conclusion

Page 25: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Conclusion

Conclusions

With the proliferation of IoT devices, and the ease of triggering DoSattacks even from unsophisticated malicious parties, there is anincreasing need for developing scalable and effective solutions.

A novel anomaly detection framework

Scalable: applicable to high-dimensional datasets (big data problems)Nonparametric: agnostic to data-type and protocolOnline system monitoringAsymptotically optimum for testing against uniformly distributedanomalies

Outperforms sequential change detector CUSUM that estimatesparameters from data

Outperforms even clairvoyant CUSUM in case of a small tomoderate variance increase (e.g., Jamming attack)

Page 26: Nonparametric Sequential Change Detection for High ...sis.eng.usf.edu/Presentations/Allerton17-slides.pdf · Training:Randomly partitions training set into two and forms K-kNN graph

Nonparametric Sequential Change Detection for High-Dimensional Problems

Conclusion

Questions?

Thank you!