Anomaly Detection with Extreme Value Theory
Transcript of Anomaly Detection with Extreme Value Theory
![Page 1: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/1.jpg)
Anomaly Detection with Extreme Value Theory
A. Siffer, P-A Fouque, A. Termier and C. LargouetApril 26, 2017
![Page 2: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/2.jpg)
Contents
Context
Providing better thresholds
Finding anomalies in streams
Application to intrusion detection
A more general framework
1
![Page 3: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/3.jpg)
Context
![Page 4: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/4.jpg)
General motivations
⊸ Massive usage of the Internet
• More and more vulnerabilities• More and more threats
⊸ Awareness of the sensitive data and infrastructures
2
![Page 5: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/5.jpg)
General motivations
⊸ Massive usage of the Internet• More and more vulnerabilities
• More and more threats
⊸ Awareness of the sensitive data and infrastructures
2
![Page 6: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/6.jpg)
General motivations
⊸ Massive usage of the Internet• More and more vulnerabilities• More and more threats
⊸ Awareness of the sensitive data and infrastructures
2
![Page 7: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/7.jpg)
General motivations
⊸ Massive usage of the Internet• More and more vulnerabilities• More and more threats
⊸ Awareness of the sensitive data and infrastructures
2
![Page 8: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/8.jpg)
General motivations
⊸ Massive usage of the Internet• More and more vulnerabilities• More and more threats
⊸ Awareness of the sensitive data and infrastructures
2
⇒ Network security :a major concern
![Page 9: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/9.jpg)
A Solution
⊸ IDS (Intrusion Detection System)• Monitor traffic• Detect attacks
⊸ Current methods : rule-based
• Work fine on common and well-known attacks• Cannot detect new attacks
⊸ Emerging methods : anomaly-based
• Use the network data to estimate a normal behavior• Apply algorithms to detect abnormal events (→ attacks)
3
![Page 10: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/10.jpg)
A Solution
⊸ IDS (Intrusion Detection System)• Monitor traffic• Detect attacks
⊸ Current methods : rule-based• Work fine on common and well-known attacks• Cannot detect new attacks
⊸ Emerging methods : anomaly-based
• Use the network data to estimate a normal behavior• Apply algorithms to detect abnormal events (→ attacks)
3
![Page 11: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/11.jpg)
A Solution
⊸ IDS (Intrusion Detection System)• Monitor traffic• Detect attacks
⊸ Current methods : rule-based• Work fine on common and well-known attacks• Cannot detect new attacks
⊸ Emerging methods : anomaly-based• Use the network data to estimate a normal behavior• Apply algorithms to detect abnormal events (→ attacks)
3
![Page 12: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/12.jpg)
Overview
⊸ Basic scheme
Algorithmdata alerts
⊸ Many ”standard” algorithms have been tested⊸ Complex pipelines are emerging (ensemble/hybrid techniques)
dataalerts
4
![Page 13: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/13.jpg)
Overview
⊸ Basic scheme
Algorithmdata alerts
⊸ Many ”standard” algorithms have been tested
⊸ Complex pipelines are emerging (ensemble/hybrid techniques)
dataalerts
4
![Page 14: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/14.jpg)
Overview
⊸ Basic scheme
Algorithmdata alerts
⊸ Many ”standard” algorithms have been tested⊸ Complex pipelines are emerging (ensemble/hybrid techniques)
dataalerts
4
![Page 15: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/15.jpg)
Inherent problem
⊸ Algorithms are not magic• They give some information about data (scores)
• But the decision often rely on a human choice
if score>threshold then trigger alert
⊸ The thresholds are often hard-set
• Expertise• Fine-tuning• Distribution assumption
⊸ Our idea: provide dynamic threshold with a probabilisticmeaning
5
![Page 16: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/16.jpg)
Inherent problem
⊸ Algorithms are not magic• They give some information about data (scores)• But the decision often rely on a human choice
if score>threshold then trigger alert
⊸ The thresholds are often hard-set
• Expertise• Fine-tuning• Distribution assumption
⊸ Our idea: provide dynamic threshold with a probabilisticmeaning
5
![Page 17: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/17.jpg)
Inherent problem
⊸ Algorithms are not magic• They give some information about data (scores)• But the decision often rely on a human choice
if score>threshold then trigger alert
⊸ The thresholds are often hard-set• Expertise• Fine-tuning• Distribution assumption
⊸ Our idea: provide dynamic threshold with a probabilisticmeaning
5
![Page 18: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/18.jpg)
Inherent problem
⊸ Algorithms are not magic• They give some information about data (scores)• But the decision often rely on a human choice
if score>threshold then trigger alert
⊸ The thresholds are often hard-set• Expertise• Fine-tuning• Distribution assumption
⊸ Our idea: provide dynamic threshold with a probabilisticmeaning
5
![Page 19: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/19.jpg)
Providing better thresholds
![Page 20: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/20.jpg)
My problem
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
⊸ How to set zq such that P(X€ > zq) < q ?
6
![Page 21: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/21.jpg)
My problem
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
⊸ How to set zq such that P(X€ > zq) < q ?6
![Page 22: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/22.jpg)
Solution 1: empirical approach
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
⊸ Drawbacks: stuck in the interval, poor resolution
7
![Page 23: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/23.jpg)
Solution 1: empirical approach
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
1− q q
zq
⊸ Drawbacks: stuck in the interval, poor resolution
7
![Page 24: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/24.jpg)
Solution 1: empirical approach
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
1− q q
zq
⊸ Drawbacks: stuck in the interval, poor resolution7
![Page 25: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/25.jpg)
Solution 2: Standard Model
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
⊸ Drawbacks: manual step, distribution assumption
8
![Page 26: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/26.jpg)
Solution 2: Standard Model
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
⊸ Drawbacks: manual step, distribution assumption
8
![Page 27: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/27.jpg)
Solution 2: Standard Model
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
zq
⊸ Drawbacks: manual step, distribution assumption
8
![Page 28: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/28.jpg)
Solution 2: Standard Model
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
zq
⊸ Drawbacks: manual step, distribution assumption8
![Page 29: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/29.jpg)
Realities
0 100 200 3000.000
0.005
0.010
0.015
16 18 20 22 240.00
0.05
0.10
0.15
0.20
20 40 60 80 1000.000
0.005
0.010
0.015
0.020
0 25 50 750.00
0.01
0.02
0.03
⊸ Different clients and/or temporal drift
9
![Page 30: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/30.jpg)
Results
Properties Empirical quantile Standard modelstatistical guarantees Yes Yes
easy to adapt Yes Nohigh resolution No Yes
10
![Page 31: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/31.jpg)
Inspection of extreme events
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
Probability estimation ?
11
![Page 32: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/32.jpg)
Inspection of extreme events
0 25 50 75 100 125 150 175 200Daily payment by credit card ( )
0.000
0.005
0.010
0.015
0.020
0.025
Freq
uenc
y
Probability estimation ?
11
![Page 33: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/33.jpg)
Extreme Value Theory
⊸ Main result (Fisher-Tippett-Gnedenko, 1928)
The extreme values of any distribution have nearly the samedistribution (called Extreme Value Distribution)
0 1 2 3 4 5 6x
0.0
0.1
0.2
0.3
0.4
0.5
0.6
(X>
x)
heavy tailexponential tailbounded tail
12
![Page 34: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/34.jpg)
Extreme Value Theory
⊸ Main result (Fisher-Tippett-Gnedenko, 1928)
The extreme values of any distribution have nearly the samedistribution (called Extreme Value Distribution)
0 1 2 3 4 5 6x
0.0
0.1
0.2
0.3
0.4
0.5
0.6
(X>
x)
heavy tailexponential tailbounded tail
12
![Page 35: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/35.jpg)
Extreme Value Theory
⊸ Main result (Fisher-Tippett-Gnedenko, 1928)
The extreme values of any distribution have nearly the samedistribution (called Extreme Value Distribution)
0 1 2 3 4 5 6x
0.0
0.1
0.2
0.3
0.4
0.5
0.6
(X>
x)
heavy tailexponential tailbounded tail
12
![Page 36: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/36.jpg)
An impressive analogy
⊸ Let X1, X2, . . . Xn a sequence of i.i.d. random variables with
Sn =n∑i=1
Xi Mn = max1≤i≤n
(Xi)
⊸ Central Limit TheoremSn − nµ√n
d−→ N (0, σ2)
⊸ FTG Theorem Mn − anbn
d−→ EVD(γ)
13
![Page 37: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/37.jpg)
An impressive analogy
⊸ Let X1, X2, . . . Xn a sequence of i.i.d. random variables with
Sn =n∑i=1
Xi Mn = max1≤i≤n
(Xi)
⊸ Central Limit TheoremSn − nµ√n
d−→ N (0, σ2)
⊸ FTG Theorem Mn − anbn
d−→ EVD(γ)
13
![Page 38: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/38.jpg)
An impressive analogy
⊸ Let X1, X2, . . . Xn a sequence of i.i.d. random variables with
Sn =n∑i=1
Xi Mn = max1≤i≤n
(Xi)
⊸ Central Limit TheoremSn − nµ√n
d−→ N (0, σ2)
⊸ FTG Theorem Mn − anbn
d−→ EVD(γ)
13
![Page 39: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/39.jpg)
A more practical result
⊸ Second theorem of EVT (Pickands-Balkema-de Haan, 1974)
The excesses over a high threshold follow a Generalized ParetoDistribution (with parameters γ, σ)
⊸ What does it imply ?
• we have a model for extreme events• we can compute zq for q as small as desired
14
![Page 40: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/40.jpg)
A more practical result
⊸ Second theorem of EVT (Pickands-Balkema-de Haan, 1974)
The excesses over a high threshold follow a Generalized ParetoDistribution (with parameters γ, σ)
⊸ What does it imply ?• we have a model for extreme events• we can compute zq for q as small as desired
14
![Page 41: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/41.jpg)
How to use EVT
⊸ Get some data X1, X2 . . . Xn⊸ Set a high threshold t and retrieve the excesses Yj = Xkj − t
when Xkj > t
⊸ Fit a GPD to the Yj (→ find parameters γ, σ)⊸ Compute zq such as P(X > zq) < q
EVT
q
X1, X2 . . . Xn zq
15
![Page 42: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/42.jpg)
How to use EVT
⊸ Get some data X1, X2 . . . Xn⊸ Set a high threshold t and retrieve the excesses Yj = Xkj − t
when Xkj > t⊸ Fit a GPD to the Yj (→ find parameters γ, σ)
⊸ Compute zq such as P(X > zq) < q
EVT
q
X1, X2 . . . Xn zq
15
![Page 43: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/43.jpg)
How to use EVT
⊸ Get some data X1, X2 . . . Xn⊸ Set a high threshold t and retrieve the excesses Yj = Xkj − t
when Xkj > t⊸ Fit a GPD to the Yj (→ find parameters γ, σ)⊸ Compute zq such as P(X > zq) < q
EVT
q
X1, X2 . . . Xn zq
15
![Page 44: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/44.jpg)
How to use EVT
⊸ Get some data X1, X2 . . . Xn⊸ Set a high threshold t and retrieve the excesses Yj = Xkj − t
when Xkj > t⊸ Fit a GPD to the Yj (→ find parameters γ, σ)⊸ Compute zq such as P(X > zq) < q
EVT
q
X1, X2 . . . Xn zq
15
![Page 45: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/45.jpg)
Finding anomalies in streams
![Page 46: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/46.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 47: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/47.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 48: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/48.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 49: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/49.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 50: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/50.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 51: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/51.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 52: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/52.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 53: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/53.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 54: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/54.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 55: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/55.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 56: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/56.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 57: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/57.jpg)
Streaming Peaks-Over-Threshold (SPOT) algorithm
(initial batch)
X1, X2 . . . Xn Calibration
q
0
0.10
0.20
0.30
20 40 60 80 100 120
t zq
(stream)
Xi>n Xi > zq
trigger alarmyes
no Xi > t
yesupdate model
no drop
16
![Page 58: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/58.jpg)
Can we trust that threshold zq ?
⊸ An example with ground truth : a Gaussian White Noise• 40 streams with 200 000 iid variables drawn from N (0, 1)• q = 10−3 ⇒ theoretical threshold zth ≃ 3.09
⊸ Averaged relative error
0 50000 100000 150000 200000Number of observations
0.01
0.02
0.03
0.04
Rela
tive
erro
r
17
![Page 59: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/59.jpg)
Can we trust that threshold zq ?
⊸ An example with ground truth : a Gaussian White Noise• 40 streams with 200 000 iid variables drawn from N (0, 1)• q = 10−3 ⇒ theoretical threshold zth ≃ 3.09
⊸ Averaged relative error
0 50000 100000 150000 200000Number of observations
0.01
0.02
0.03
0.04
Rela
tive
erro
r
17
![Page 60: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/60.jpg)
Application to intrusion detection
![Page 61: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/61.jpg)
About the data
⊸ Lack of relevant public datasets to test the algorithms ...
⊸ KDD99 ? See [McHugh 2000] and [Mahoney & Chan 2003]⊸ We rather use MAWI
• 15 min a day of real traffic (.pcap file)• Anomaly patterns given by the MAWILab [Fontugne et al. 2010]with taxonomy [Mazel et al. 2014]
⊸ Preprocessing step : raw .pcap → NetFlow format (onlymetadata)
18
![Page 62: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/62.jpg)
About the data
⊸ Lack of relevant public datasets to test the algorithms ...⊸ KDD99 ? See [McHugh 2000] and [Mahoney & Chan 2003]
⊸ We rather use MAWI
• 15 min a day of real traffic (.pcap file)• Anomaly patterns given by the MAWILab [Fontugne et al. 2010]with taxonomy [Mazel et al. 2014]
⊸ Preprocessing step : raw .pcap → NetFlow format (onlymetadata)
18
![Page 63: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/63.jpg)
About the data
⊸ Lack of relevant public datasets to test the algorithms ...⊸ KDD99 ? See [McHugh 2000] and [Mahoney & Chan 2003]⊸ We rather use MAWI
• 15 min a day of real traffic (.pcap file)• Anomaly patterns given by the MAWILab [Fontugne et al. 2010]with taxonomy [Mazel et al. 2014]
⊸ Preprocessing step : raw .pcap → NetFlow format (onlymetadata)
18
![Page 64: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/64.jpg)
About the data
⊸ Lack of relevant public datasets to test the algorithms ...⊸ KDD99 ? See [McHugh 2000] and [Mahoney & Chan 2003]⊸ We rather use MAWI
• 15 min a day of real traffic (.pcap file)• Anomaly patterns given by the MAWILab [Fontugne et al. 2010]with taxonomy [Mazel et al. 2014]
⊸ Preprocessing step : raw .pcap → NetFlow format (onlymetadata)
18
![Page 65: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/65.jpg)
An example to detect network syn scan
⊸ The ratio of SYN packets : relevant feature to detect networkscan [Fernandes & Owezarski 2009]
0.0 100.0 200.0 300.0 400.0 500.0 600.0 700.0 800.0Time (s)
0.0
0.2
0.4
0.6
0.8
ratio
of S
YN p
acke
ts in
50m
s tim
e wi
ndow
⊸ Goal: find peaks
19
![Page 66: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/66.jpg)
An example to detect network syn scan
⊸ The ratio of SYN packets : relevant feature to detect networkscan [Fernandes & Owezarski 2009]
0.0 100.0 200.0 300.0 400.0 500.0 600.0 700.0 800.0Time (s)
0.0
0.2
0.4
0.6
0.8ra
tio o
f SYN
pac
kets
in 5
0ms t
ime
wind
ow
⊸ Goal: find peaks
19
![Page 67: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/67.jpg)
An example to detect network syn scan
⊸ The ratio of SYN packets : relevant feature to detect networkscan [Fernandes & Owezarski 2009]
0.0 100.0 200.0 300.0 400.0 500.0 600.0 700.0 800.0Time (s)
0.0
0.2
0.4
0.6
0.8ra
tio o
f SYN
pac
kets
in 5
0ms t
ime
wind
ow
⊸ Goal: find peaks19
![Page 68: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/68.jpg)
SPOT results
⊸ Parameters : q = 10−4,n = 2000 (from the previous day record)
0.0 100.0 200.0 300.0 400.0 500.0 600.0 700.0 800.0Time (s)
0.0
0.2
0.4
0.6
0.8
ratio
of S
YN p
acke
ts in
50m
s tim
e wi
ndow
20
![Page 69: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/69.jpg)
SPOT results
⊸ Parameters : q = 10−4,n = 2000 (from the previous day record)
0.0 100.0 200.0 300.0 400.0 500.0 600.0 700.0 800.0Time (s)
0.0
0.2
0.4
0.6
0.8ra
tio o
f SYN
pac
kets
in 5
0ms t
ime
wind
ow
20
![Page 70: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/70.jpg)
Do we really flag scan attacks ?
⊸ The main parameter q: a False Positive regulator
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
FPr (%)
0.0
20.0
40.0
60.0
80.0
100.0
TPr
(%)
9. 10−6
1. 10−5 5. 10−5
8. 10−5
8. 10−6
6. 10−3
⊸ 86% of scan flows detected with less than 4% of FP
21
![Page 71: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/71.jpg)
Do we really flag scan attacks ?
⊸ The main parameter q: a False Positive regulator
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
FPr (%)
0.0
20.0
40.0
60.0
80.0
100.0TPr
(%)
9. 10−6
1. 10−5 5. 10−5
8. 10−5
8. 10−6
6. 10−3
⊸ 86% of scan flows detected with less than 4% of FP
21
![Page 72: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/72.jpg)
Do we really flag scan attacks ?
⊸ The main parameter q: a False Positive regulator
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
FPr (%)
0.0
20.0
40.0
60.0
80.0
100.0TPr
(%)
9. 10−6
1. 10−5 5. 10−5
8. 10−5
8. 10−6
6. 10−3
⊸ 86% of scan flows detected with less than 4% of FP21
![Page 73: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/73.jpg)
A more general framework
![Page 74: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/74.jpg)
SPOT Specifications
⊸ A single main parameter q• With a probabilistic meaning → P(X > zq) < q• False Positive regulator
⊸ Stream capable
• Incremental learning• Fast (∼ 1000 values/s)• Low memory usage (only the excesses)
22
![Page 75: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/75.jpg)
SPOT Specifications
⊸ A single main parameter q• With a probabilistic meaning → P(X > zq) < q• False Positive regulator
⊸ Stream capable• Incremental learning• Fast (∼ 1000 values/s)• Low memory usage (only the excesses)
22
![Page 76: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/76.jpg)
Other things ?
⊸ SPOT• performs dynamic thresholding without distribution assumption• uses it to detect network anomalies
⊸ But it could be adapted to
• compute upper and lower thresholds• other fields• drifting contexts (with an additional parameter) → DSPOT
23
![Page 77: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/77.jpg)
Other things ?
⊸ SPOT• performs dynamic thresholding without distribution assumption• uses it to detect network anomalies
⊸ But it could be adapted to
• compute upper and lower thresholds• other fields• drifting contexts (with an additional parameter) → DSPOT
23
![Page 78: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/78.jpg)
Other things ?
⊸ SPOT• performs dynamic thresholding without distribution assumption• uses it to detect network anomalies
⊸ But it could be adapted to• compute upper and lower thresholds
• other fields• drifting contexts (with an additional parameter) → DSPOT
23
![Page 79: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/79.jpg)
Other things ?
⊸ SPOT• performs dynamic thresholding without distribution assumption• uses it to detect network anomalies
⊸ But it could be adapted to• compute upper and lower thresholds• other fields
• drifting contexts (with an additional parameter) → DSPOT
23
![Page 80: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/80.jpg)
Other things ?
⊸ SPOT• performs dynamic thresholding without distribution assumption• uses it to detect network anomalies
⊸ But it could be adapted to• compute upper and lower thresholds• other fields• drifting contexts (with an additional parameter) → DSPOT
23
![Page 81: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/81.jpg)
A recent example
⊸ Thursday the 9th of February 2017
• 9h : explosion at Flamanville nuclear plant• 11h : official declaration of the incident by EDF
⊸ What about the EDF stock prices ?
24
![Page 82: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/82.jpg)
A recent example
⊸ Thursday the 9th of February 2017
• 9h : explosion at Flamanville nuclear plant• 11h : official declaration of the incident by EDF
⊸ What about the EDF stock prices ?
24
![Page 83: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/83.jpg)
A recent example
⊸ Thursday the 9th of February 2017• 9h : explosion at Flamanville nuclear plant
• 11h : official declaration of the incident by EDF
⊸ What about the EDF stock prices ?
24
![Page 84: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/84.jpg)
A recent example
⊸ Thursday the 9th of February 2017• 9h : explosion at Flamanville nuclear plant• 11h : official declaration of the incident by EDF
⊸ What about the EDF stock prices ?
24
![Page 85: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/85.jpg)
A recent example
⊸ Thursday the 9th of February 2017• 9h : explosion at Flamanville nuclear plant• 11h : official declaration of the incident by EDF
⊸ What about the EDF stock prices ?
24
![Page 86: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/86.jpg)
EDF stock prices
09:02
09:42
10:34
11:32
12:19
13:30
14:43
15:34
16:21
17:14
Time
9.05
9.10
9.15
9.20
9.25
9.30
9.35
9.40ED
F st
ock
price
()
25
![Page 87: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/87.jpg)
EDF stock prices
09:02
09:42
10:34
11:32
12:19
13:30
14:43
15:34
16:21
17:14
Time
9.05
9.10
9.15
9.20
9.25
9.30
9.35
9.40ED
F st
ock
price
()
25
![Page 88: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/88.jpg)
Conclusion
⊸ Context: A great deal of work has been done to developanomaly detection algorithms
⊸ Problem: Decision thresholds rely on either distributionassumption or expertise
⊸ Our solution: Building dynamic threshold with a probabilisticmeaning
• Application to detect network anomalies• But a general tool to monitor online time series in a blind way
⊸ Future: Adapt the method to higher dimensions
26
![Page 89: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/89.jpg)
Conclusion
⊸ Context: A great deal of work has been done to developanomaly detection algorithms
⊸ Problem: Decision thresholds rely on either distributionassumption or expertise
⊸ Our solution: Building dynamic threshold with a probabilisticmeaning
• Application to detect network anomalies• But a general tool to monitor online time series in a blind way
⊸ Future: Adapt the method to higher dimensions
26
![Page 90: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/90.jpg)
Conclusion
⊸ Context: A great deal of work has been done to developanomaly detection algorithms
⊸ Problem: Decision thresholds rely on either distributionassumption or expertise
⊸ Our solution: Building dynamic threshold with a probabilisticmeaning
• Application to detect network anomalies• But a general tool to monitor online time series in a blind way
⊸ Future: Adapt the method to higher dimensions
26
![Page 91: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/91.jpg)
Conclusion
⊸ Context: A great deal of work has been done to developanomaly detection algorithms
⊸ Problem: Decision thresholds rely on either distributionassumption or expertise
⊸ Our solution: Building dynamic threshold with a probabilisticmeaning
• Application to detect network anomalies• But a general tool to monitor online time series in a blind way
⊸ Future: Adapt the method to higher dimensions
26
![Page 92: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/92.jpg)
Conclusion
⊸ Context: A great deal of work has been done to developanomaly detection algorithms
⊸ Problem: Decision thresholds rely on either distributionassumption or expertise
⊸ Our solution: Building dynamic threshold with a probabilisticmeaning
• Application to detect network anomalies
• But a general tool to monitor online time series in a blind way
⊸ Future: Adapt the method to higher dimensions
26
![Page 93: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/93.jpg)
Conclusion
⊸ Context: A great deal of work has been done to developanomaly detection algorithms
⊸ Problem: Decision thresholds rely on either distributionassumption or expertise
⊸ Our solution: Building dynamic threshold with a probabilisticmeaning
• Application to detect network anomalies• But a general tool to monitor online time series in a blind way
⊸ Future: Adapt the method to higher dimensions
26
![Page 94: Anomaly Detection with Extreme Value Theory](https://reader035.fdocuments.in/reader035/viewer/2022071518/613c597ef237e1331c51192b/html5/thumbnails/94.jpg)
Conclusion
⊸ Context: A great deal of work has been done to developanomaly detection algorithms
⊸ Problem: Decision thresholds rely on either distributionassumption or expertise
⊸ Our solution: Building dynamic threshold with a probabilisticmeaning
• Application to detect network anomalies• But a general tool to monitor online time series in a blind way
⊸ Future: Adapt the method to higher dimensions
26