Transcript of "Unsupervised Anomaly Detection in Energy Time Series Data using Variational Recurrent Autoencoders with Attention"

Page 1: Title

17th IEEE International Conference on Machine Learning and Applications

Unsupervised Anomaly Detection in Energy Time Series Data

using Variational Recurrent Autoencoders with Attention

João Pereira & Margarida Silveira

Signal and Image Processing Group, Institute for Systems and Robotics

Instituto Superior Técnico (Lisbon, Portugal)

Orlando, Florida, USA

December 19th, 2018

Page 2: Introduction

Introduction

Anomaly detection is about finding patterns in data that do not conform to expected or normal behaviour.

[Diagram: application domains of anomaly detection: Communications (network intrusion, network KPIs monitoring); Robotics (assisted feeding, automated inspection); Energy (smart maintenance, fault detection); Healthcare (medical diagnostics, patient monitoring); Security (video surveillance, fraud detection)]


Page 4: Main Challenges

Main Challenges

- Most data in the world are unlabelled:

  Dataset $\mathcal{D} = \{(x^{(i)},\, y^{*(i)})\}_{i=1}^{N}$, where the anomaly labels $y^{*(i)}$ are typically unavailable

- Annotating large datasets is difficult, time-consuming and expensive

  [Figure: example time series segments labelled Normal and Anomalous]

- Time series have temporal structure/dependencies:

  $x = (x_1, x_2, \ldots, x_T), \quad x_t \in \mathbb{R}^{d_x}$


Page 5: The Principle in a Nutshell

The Principle in a Nutshell

- Based on a Variational Autoencoder¹ ²

[Diagram: VAE pipeline. The input, e.g. a time series $x = (x_1, x_2, \ldots, x_T)$ with $x_t \in \mathbb{R}^{d_x}$, is mapped by the encoder $q_\phi(z|x)$ to $(\mu_z, \sigma_z)$; a latent code $z \in \mathbb{R}^{d_z}$ is sampled as $z = \mu_z + \sigma_z \odot \epsilon$; the decoder $p_\theta(x|z)$ outputs the reconstruction parameters $(\mu_{x_t}, b_{x_t})_{t=1}^{T}$]

- Train a VAE on data with mostly normal patterns;

- It reconstructs normal data well, while failing to reconstruct anomalous data;

- The quality of the reconstructions is used as anomaly score.

¹ Kingma & Welling, Auto-Encoding Variational Bayes, ICLR'14
² Rezende et al., Stochastic Backpropagation and Approximate Inference in Deep Generative Models, ICML'14
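To make the principle concrete, here is a minimal, self-contained sketch of reconstruction-based anomaly scoring. It deliberately replaces the VAE with a linear autoencoder (PCA) fitted on synthetic "normal" daily curves, so it is not the authors' model; only the idea "train on normal data, score by reconstruction error" carries over. All names and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" daily curves: noisy bell-shaped PV-like profiles (illustrative only).
T = 96                                   # samples per day (15-min resolution)
t = np.linspace(0, 1, T)
normal = np.stack([np.exp(-((t - 0.5) / 0.15) ** 2) + 0.02 * rng.standard_normal(T)
                   for _ in range(200)])

# "Anomalous" day: production collapses at midday (e.g., an inverter fault).
anomaly = np.exp(-((t - 0.5) / 0.15) ** 2)
anomaly[40:60] = 0.0

# Fit a linear autoencoder on normal data only (PCA stands in for the VAE).
mean = normal.mean(axis=0)
X = normal - mean
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:3]                               # 3 principal components ~ a 3-d latent space

def anomaly_score(x):
    """L1 reconstruction error: high when the model cannot reconstruct x."""
    z = (x - mean) @ W.T                 # encode
    x_hat = z @ W + mean                 # decode
    return np.abs(x - x_hat).sum()

print("normal day score :", anomaly_score(normal[0]))
print("anomalous score  :", anomaly_score(anomaly))  # noticeably larger
```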


Page 7: Proposed Approach

Proposed Approach

[Pipeline diagram: Reconstruction Model → Detection]


Page 9: Reconstruction Model

Reconstruction Model

[Architecture diagram: the input sequence $x_1, \ldots, x_T$ ($x_t \in \mathbb{R}^{d_x}$) is corrupted with additive noise ($+n$) and fed to an encoder Bi-LSTM with hidden states $\overrightarrow{h}^e_t, \overleftarrow{h}^e_t$; a variational layer maps them to $(\mu_z, \sigma_z)$ and samples $z$; a variational self-attention mechanism forms deterministic context vectors $c^{\text{det}}_t$ from attention weights $a_{tj}$, and Linear/SoftPlus layers produce $(\mu_{c_t}, \sigma_{c_t})$ from which $c_t$ is sampled; a decoder Bi-LSTM with hidden states $\overrightarrow{h}^d_t, \overleftarrow{h}^d_t$ outputs the reconstruction parameters $(\mu_{x_t}, b_{x_t})$ for $t = 1, \ldots, T$]

Denoising Autoencoding Criterion

Corruption process: additive Gaussian noise,

$p(\tilde{x}|x): \quad \tilde{x} = x + n, \quad n \sim \mathcal{N}(0, \sigma_n^2 I)$

Vincent et al., Extracting and Composing Robust Features with Denoising Autoencoders, ICML'08
Bengio et al., Denoising Criterion for Variational Auto-Encoding Framework, ICLR'15

Learning temporal dependencies

Bidirectional Long Short-Term Memory (Bi-LSTM) network:

$h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$

- 256 units, 128 in each direction
- Sparse regularization: $\Omega(z) = \lambda \sum_{i=1}^{d_z} |z_i|$

Hochreiter & Schmidhuber, Long Short-Term Memory, Neural Computation'97
Graves et al., Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition, ICANN'05

Variational Latent Space

Variational parameters derived using neural networks:

$(\mu_z, \sigma_z) = \text{Encoder}(x)$

Sample from the approximate posterior $q_\phi(z|x)$:

$z = \mu_z + \sigma_z \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$

Kingma & Welling, Auto-Encoding Variational Bayes, ICLR'14
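A minimal PyTorch sketch of the encoder and variational layer, assuming standard layers (nn.LSTM with bidirectional=True, a Linear head for $\mu_z$, Softplus for $\sigma_z$, and the last hidden state as sequence summary). Layer sizes follow the slide (128 units per direction); everything else is an illustrative assumption, not the authors' exact code.

```python
import torch
import torch.nn as nn

class RecurrentEncoder(nn.Module):
    """Bi-LSTM encoder producing q_phi(z|x) = N(mu_z, sigma_z^2 I)."""
    def __init__(self, d_x: int, d_z: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(d_x, hidden, batch_first=True, bidirectional=True)
        self.mu = nn.Linear(2 * hidden, d_z)
        self.sigma = nn.Sequential(nn.Linear(2 * hidden, d_z), nn.Softplus())

    def forward(self, x):
        h, _ = self.lstm(x)                      # (B, T, 2*hidden): [h_fwd; h_bwd]
        h_last = h[:, -1]                        # summary of the sequence (assumption)
        mu_z, sigma_z = self.mu(h_last), self.sigma(h_last)
        eps = torch.randn_like(sigma_z)          # reparameterization trick
        z = mu_z + sigma_z * eps                 # z = mu_z + sigma_z (.) eps
        return z, mu_z, sigma_z, h               # keep h for the attention mechanism

enc = RecurrentEncoder(d_x=1, d_z=3)
x = torch.randn(4, 32, 1)                        # batch of 4 windows, T=32, d_x=1
z, mu_z, sigma_z, h = enc(x)
print(z.shape, h.shape)                          # torch.Size([4, 3]) torch.Size([4, 32, 256])
```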

Variational Self-Attention Mechanism

Combines self-attention with variational inference:

$c_t^{\text{det}} = \sum_{j=1}^{T} a_{tj} h_j, \qquad (\mu_{c_t}, \sigma_{c_t}) = \text{NN}(c_t^{\text{det}}), \qquad c_t \sim \mathcal{N}(\mu_{c_t}, \sigma_{c_t}^2 I)$

Vaswani et al., Attention Is All You Need, NIPS'17
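A NumPy sketch of the deterministic attention step and the variational layer on top of it, under simple assumptions: dot-product scores with a softmax (the exact score function is not shown on the slide) and a linear-plus-softplus map standing in for NN(·).

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_h, d_c = 32, 8, 8                      # sequence length, hidden and context sizes

H = rng.standard_normal((T, d_h))           # encoder hidden states h_1..h_T

# Attention weights a_tj: softmax over similarity scores (dot-product assumed here).
scores = H @ H.T / np.sqrt(d_h)
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)           # each row sums to 1

C_det = A @ H                               # c_t^det = sum_j a_tj h_j, shape (T, d_h)

# NN(c_t^det) -> (mu_c, sigma_c); Linear + SoftPlus as in the diagram.
W_mu, W_s = rng.standard_normal((d_h, d_c)), rng.standard_normal((d_h, d_c))
mu_c = C_det @ W_mu
sigma_c = np.log1p(np.exp(C_det @ W_s))     # softplus keeps sigma_c positive

C = mu_c + sigma_c * rng.standard_normal(mu_c.shape)   # c_t ~ N(mu_c, sigma_c^2 I)
print(A.shape, C.shape)                     # (32, 32) (32, 8)
```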

Decoder

Another Bi-LSTM:

$\text{Decoder} \equiv \text{Bi-LSTM}([z; c])$

Output

Reconstruction parameters: $(\mu_{x_t}, b_{x_t})_{t=1}^{T}$

Laplace log-likelihood:

$\log p(x_t | z_l) \propto -\lVert x_t - \mu_{x_t} \rVert_1$
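As a check on that proportionality, a small sketch of the Laplace negative log-likelihood used as reconstruction term; the scale parameter name b and the toy numbers are assumptions.

```python
import numpy as np

def laplace_nll(x, mu, b):
    """Negative log-likelihood of x under Laplace(mu, b), summed over timesteps.

    -log p(x_t | z) = |x_t - mu_t| / b_t + log(2 * b_t),
    so for fixed b it is proportional to the L1 reconstruction error.
    """
    return np.sum(np.abs(x - mu) / b + np.log(2.0 * b))

x = np.array([0.2, 0.8, 0.5])          # observed window
mu = np.array([0.25, 0.7, 0.5])        # decoder means mu_x
b = np.full(3, 0.1)                    # decoder scales b_x
print(laplace_nll(x, mu, b))
```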


Page 18: Loss Function

Loss Function

$\mathcal{L}(\theta, \phi; x^{(n)}) = \overbrace{-\,\mathbb{E}_{z \sim q_\phi(z|x^{(n)}),\; c_t \sim q^{a}_\phi(c_t|x^{(n)})}\big[\log p_\theta(x^{(n)} \mid z, c)\big]}^{\text{Reconstruction Term}} + \lambda_{KL}\Big[\underbrace{D_{KL}\big(q_\phi(z|x^{(n)}) \,\big\|\, p_\theta(z)\big)}_{\text{Latent Space KL loss}} + \eta \sum_{t=1}^{T} \underbrace{D_{KL}\big(q^{a}_\phi(c_t|x^{(n)}) \,\big\|\, p_\theta(c_t)\big)}_{\text{Attention KL loss}}\Big]$

$D_{KL}$ denotes the Kullback-Leibler divergence.
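A NumPy sketch of this objective with one latent sample, assuming standard normal priors for $z$ and $c_t$ and diagonal Gaussian posteriors (so both KL terms have the familiar closed form); the Laplace reconstruction term matches the decoder above. Shapes and values are illustrative.

```python
import numpy as np

def kl_gauss_std_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))

def loss(x, mu_x, b_x, mu_z, sigma_z, mu_c, sigma_c, lam_kl=1.0, eta=0.1):
    recon = np.sum(np.abs(x - mu_x) / b_x + np.log(2.0 * b_x))   # -log p(x|z,c), Laplace
    kl_z = kl_gauss_std_normal(mu_z, sigma_z)                     # latent space KL
    kl_c = sum(kl_gauss_std_normal(mu_c[t], sigma_c[t])           # attention KL over t
               for t in range(mu_c.shape[0]))
    return recon + lam_kl * (kl_z + eta * kl_c)

T, d_x, d_z, d_c = 32, 1, 3, 8
rng = np.random.default_rng(0)
x = rng.random((T, d_x))
print(loss(x, mu_x=x + 0.01, b_x=np.full_like(x, 0.1),
           mu_z=np.zeros(d_z), sigma_z=np.ones(d_z),
           mu_c=np.zeros((T, d_c)), sigma_c=np.ones((T, d_c))))
```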


Page 21: Training Framework

Training Framework

Optimization & Regularization

- About 270k parameters to optimize

- AMS-Grad optimizer³

- Xavier weight initialization⁴

- Denoising autoencoding criterion⁵

- Sparse regularization in the encoder Bi-LSTM⁶

- KL cost annealing⁷ (sketched below)

- Gradient clipping⁸

Training executed on a single GPU (NVIDIA GTX 1080 Ti)

³ Reddi, Kale & Kumar, On the Convergence of Adam and Beyond, ICLR'18
⁴ Glorot & Bengio, Understanding the Difficulty of Training Deep Feedforward Neural Networks, AISTATS'10
⁵ Bengio et al., Denoising Criterion for Variational Auto-Encoding Framework, AAAI'17
⁶ Arpit et al., Why Regularized Auto-Encoders Learn Sparse Representation?, ICML'16
⁷ Bowman, Vinyals et al., Generating Sentences from a Continuous Space, SIGNLL'16
⁸ Pascanu et al., On the Difficulty of Training Recurrent Neural Networks, ICML'13
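A sketch of KL cost annealing in the spirit of Bowman et al.: the KL weight is ramped from 0 to 1 early in training so the model first learns to reconstruct before the KL terms bite. The linear schedule and warm-up length are assumptions, not the paper's reported settings.

```python
def kl_weight(step: int, warmup_steps: int = 1000) -> float:
    """Linear KL annealing: 0 at the start of training, 1 after warm-up."""
    return min(1.0, step / warmup_steps)

for step in (0, 250, 500, 1000, 2000):
    print(step, kl_weight(step))   # 0.0, 0.25, 0.5, 1.0, 1.0
```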

Page 22: Proposed Approach

Proposed Approach

[Pipeline diagram: Reconstruction Model → Detection]

Page 24: Anomaly Scores

Anomaly Scores

- Use the reconstruction error as anomaly score:

  $\text{Anomaly Score} = \mathbb{E}_{z_l \sim q_\phi(z|x)}\Big[\big\lVert x - \underbrace{\mathbb{E}[p_\theta(x|z_l)]}_{\mu_x} \big\rVert_1\Big]$

- Take the variability of the reconstructions into account:

  $\text{Anomaly Score} = \underbrace{-\,\mathbb{E}_{z_l \sim q_\phi(z|x)}\big[\log p(x|z_l)\big]}_{\text{"Reconstruction Probability"}}$

[Diagram: samples $z_l \sim q_\phi(z|x) = \mathcal{N}(\mu_z, \sigma_z^2 I)$ are drawn from the approximate posterior and decoded to estimate both scores]
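A NumPy sketch of both scores estimated with L Monte Carlo samples. The decoder is faked by a small closure returning (mu_x, b_x) so the example runs standalone; in the real model it would be the Bi-LSTM decoder above. All names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_z, L = 32, 3, 64                      # window length, latent size, MC samples

x = rng.random(T)                          # window to score
mu_z, sigma_z = np.zeros(d_z), np.ones(d_z)

def decode(z):
    """Stand-in decoder: maps z to Laplace parameters (mu_x, b_x). Illustrative only."""
    mu_x = 0.5 + 0.1 * np.tanh(z.sum()) * np.ones(T)
    return mu_x, np.full(T, 0.1)

rec_err, neg_logp = 0.0, 0.0
for _ in range(L):
    z_l = mu_z + sigma_z * rng.standard_normal(d_z)     # z_l ~ q_phi(z|x)
    mu_x, b_x = decode(z_l)
    rec_err += np.abs(x - mu_x).sum()                    # ||x - mu_x||_1
    neg_logp += np.sum(np.abs(x - mu_x) / b_x + np.log(2.0 * b_x))  # -log p(x|z_l)

print("reconstruction error score    :", rec_err / L)
print("negative reconstruction prob. :", neg_logp / L)
```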


Page 27: Experiments & Results

Experiments & Results

Page 28: Time Series Data

Time Series Data

Solar PV Generation

[Plot: solar PV energy production in a day without clouds; x-axis: samples (0 to 384), y-axis: normalized energy (0.00 to 0.75)]

- Provided by (company logo in the original slide)

- Recorded every 15 min (96 samples per day)

- Data normalized to the installed capacity

- Daily seasonality

- Unlabelled


Page 29: Variational Latent Space

Variational Latent Space

[Figure: 2-D projections (t-SNE and PCA) of the z-space for normal training windows $X^{\text{normal}}_{\text{train}}$]

- T = 32 (< 96) → online mode

- d_z = 3
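A sketch of how such a 2-D view can be produced: slice the series into length-32 windows, encode each to its posterior mean $\mu_z$ (a random projection stands in for the trained encoder here), and project the 3-d codes with PCA; t-SNE would be used analogously. Everything numeric is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.random(96 * 30)                       # 30 days at 15-min resolution
T = 32

# Sliding windows of length T=32 (stride T here; stride 1 would give online mode).
windows = np.stack([series[i:i + T] for i in range(0, len(series) - T + 1, T)])

# Stand-in encoder: a fixed random projection to a 3-d "latent mean" mu_z.
W = rng.standard_normal((T, 3))
Z = windows @ W                                     # (n_windows, 3)

# PCA to 2-D for plotting.
Zc = Z - Z.mean(axis=0)
_, _, Vt = np.linalg.svd(Zc, full_matrices=False)
Z2 = Zc @ Vt[:2].T                                  # coordinates to scatter-plot
print(windows.shape, Z2.shape)                      # (90, 32) (90, 2)
```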


Page 30: Finding Anomalies in the Latent Space

Finding Anomalies in the Latent Space

[Figure: latent space $Z$ with clusters for Normal days and for the anomaly types Cloudy Day, Snow, Spike, Inverter Fault, and Brief Shading]


Page 31: Anomaly Scores

Anomaly Scores

Reconstruction error (top bar) and negative reconstruction probability (bottom bar), shown under each example.

[Figure: six example days with per-window score bars on a Low → High colour scale: Ground Truth, Brief Shading, Inverter Fault, Spike, Snow, Cloudy Day; y-axis: normalized energy]


Page 32: Attention Visualization

Attention Visualization

Attention Map

[Diagram: each context vector is a weighted sum of the hidden states, $c_t = \sum_{j=1}^{T} a_{tj} h_j$; stacking the rows of weights $a_{t\cdot}$ gives a $T \times T$ attention map]

Examples

[Figure: attention maps (input timestep vs. output timestep, in hours; attention weights on a log colour scale from $10^{-3}$ to $10^{0}$) alongside the corresponding daily energy curves (0 to 24 h, normalized energy)]
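A sketch of how such a map can be rendered from a row-stochastic attention matrix; matplotlib's LogNorm matches the $10^{-3}$ to $10^{0}$ colour scale on the slide. The matrix here is random, purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

rng = np.random.default_rng(0)
T = 96                                            # one day at 15-min resolution

# A random row-stochastic matrix standing in for the learned attention weights.
A = rng.random((T, T)) ** 4                       # skew towards small weights
A /= A.sum(axis=1, keepdims=True)

plt.imshow(A, norm=LogNorm(vmin=1e-3, vmax=1.0), origin="lower",
           extent=[0, 24, 0, 24], cmap="viridis")
plt.xlabel("Input Timestep [h]")
plt.ylabel("Output Timestep [h]")
plt.colorbar(label="Attention Weights")
plt.show()
```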


Page 38: Conclusions & Future Work

Conclusions & Future Work

Conclusions:

- Effective at detecting anomalies in time series data;

- Unsupervised;

- Suitable for both univariate and multivariate data;

- Efficient: inference and anomaly score computation are fast;

- General: works with other kinds of sequential data (e.g., text, videos).

Future work:

- Exploit the usefulness of the attention maps for detection;

- Make it robust to changes of the normal pattern over time.


Page 40: Acknowledgements

Acknowledgements

Thank you for your attention!

[email protected]

www.joao-pereira.pt
