Anomaly Detection for Time Series Using VAE-LSTM Hybrid Model · 2020-06-16 · ANOMALY DETECTION...

ANOMALY DETECTION FOR TIME SERIES USING VAE-LSTM HYBRID MODEL

Shuyu Lin1, Ronald Clark2, Robert Birke3, Sandro Schonborn3, Niki Trigoni1, Stephen Roberts1

1 University of Oxford, Oxford OX1 2JD, UK

2 Imperial College London, South Kensington, London SW7 2AZ, UK

3 ABB Future Labs, Segelhofstrasse 1K, 5404 Baden-Dattwil, Switzerland

ABSTRACT

In this work, we propose a VAE-LSTM hybrid model as an unsupervised approach for anomaly detection in time series. Our model utilizes both a VAE module for forming robust local features over short windows and an LSTM module for estimating the long-term correlation in the series on top of the features inferred from the VAE module. As a result, our detection algorithm is capable of identifying anomalies that span multiple time scales. We demonstrate the effectiveness of our detection algorithm on five real-world problems and find that our method outperforms three other commonly used detection methods.

Index Terms: Anomaly Detection, Time Series, Deep Learning, Unsupervised Learning

1. INTRODUCTION

Anomaly detection for time series is concerned with detecting unexpected system behaviours across time to provide informative insights. In many industrial applications, anomaly detection is used for monitoring sensor failures, alerting users of external attacks and detecting potential catastrophic events at an early stage [1]. Despite all the benefits, designing a good anomaly detection algorithm is extremely challenging. This is because the training data are often unbalanced, with very few labelled anomalies. Furthermore, most anomalous behaviours are not known a priori, and a good anomaly detection algorithm is expected to detect even unseen anomalies. Due to these constraints, anomaly detection algorithms often have to be trained in an unsupervised fashion.

Broadly, we may characterize three types of anomalies that commonly occur in time series, namely point, context and collective anomalies [2]. Of the three types, point anomalies are the easiest to detect, as data points can be treated independently during detection and no temporal relationships need to be considered. For this reason, simple threshold approaches or multi-layer perceptron (MLP) based methods [3] work relatively well for point anomalies. Context and collective anomalies, on the contrary, are more challenging. Context anomalies depend on the value of surrounding data points, so local information is required for their detection, and convolutional neural networks (CNNs) with larger receptive fields have been shown to work well in this case [4]. Collective anomalies occur when a series of data points together exhibits abnormal behaviour. As collective anomalies always occur in sequence over a reasonably long period, recurrent neural networks (RNNs) have been shown to be effective [5]. However, although a number of successful approaches have been proposed, none of the existing methods works well for all anomaly types.

In this paper, we propose a hybrid anomaly detection method that combines the representation learning power of a deep generative model, in the form of a variational autoencoder (VAE), with the temporal modelling ability of a long short-term memory RNN (LSTM), as shown in Figure 1. Via the VAE module, our model aims to capture the structural regularities of the time series over local windows, while the LSTM module attempts to model the longer-term trend. Neither the VAE nor the LSTM unit requires labelled anomalies for training. The code of our algorithm and of the experiments included in this paper is available at https://github.com/lin-shuyu/VAE-LSTM-for-anomaly-detection. In summary, our contributions are:

• We utilize a VAE model to summarize the local information of a short window into a low-dimensional embedding.
• We utilize an LSTM model, which acts on the low-dimensional embeddings produced by the VAE model, to manage the sequential patterns over the longer term.
• The hierarchical structure allows us to detect anomalies occurring over both short and long periods.

The remainder of the paper is structured as follows. We first briefly outline VAE and LSTM models and ways in which they have been used for anomaly detection. We then present our hybrid VAE-LSTM model, followed by detection results given by our method and other methods on real-world time series. Finally, we conclude with avenues for future research.

2. BACKGROUND AND RELATED WORK

In this section, we provide an overview of two machine learning models, namely VAEs and LSTMs, which serve as major building blocks for our anomaly detection algorithm. We also relate our work to existing anomaly detection algorithms.

978-1-5090-6631-5/20/$31.00 ©2020 IEEE, ICASSP 2020

VAE: VAEs [6, 7] are a type of generative probabilistic model known for learning embedding schemes that can infer the generation factors for the majority of the training data. This makes VAEs well suited for modelling the normal behaviours in an anomaly detection task. As a result, VAEs have been used for anomaly detection in various works with promising results [8, 9, 10, 11]. However, anomaly detection algorithms based only on VAEs often fail to detect long-term anomalies, as VAE models cannot analyse information beyond a short local window. Our approach overcomes this limitation by using a VAE as a local feature extractor and coupling it with an LSTM module to take care of long-term trends.

LSTM: LSTMs are a type of RNN that can capture long-term dependencies in the input. This makes them well suited for our task, where anomalies occur infrequently. Researchers have explored the idea of using RNN models for anomaly detection [5, 12]. Our method differs from those approaches in that our LSTM module is applied not to raw samples but to embeddings that each represent a local window. This setup makes our algorithm capable of ignoring redundant raw samples and tracking events over longer terms.

Hybrid: Hybrid models are a common approach for video analysis, where a representation learning module is used to distil spatial information in a single image frame and a sequential module is applied to model the temporal correlation across a series of frames [13, 14]. The work in [15] applies such a hybrid model to detect rare events in a surveillance video clip. The major difference between our approach and the hybrid models in video applications is that the representation learning module for videos is applied to an image, i.e. a data point at a single time stamp, whereas our representation learning module for time series processes a sequence of data points over a short period of time to form a local temporal basis for the subsequent sequential module to build upon.

3. OUR MODEL

Given a time series $X = \{x_1, x_2, \dots, x_N\}$, where $x_i \in \mathbb{R}^m$ is an $m$-dimensional reading at the $i$-th time stamp that contains information about $m$ different channels, we formulate the anomaly detection task as follows. At time $t$ ($L \le t \le N$), we are allowed to use a sequence of $L$ past readings, i.e. $S_t = [x_{t-L+1}, \dots, x_t]$, to predict a binary output $y_t \in \{0, 1\}$, with 1 indicating that an anomaly has occurred in $S_t$. This formulation allows our algorithm to be used for online detection. Figure 1 gives an overview of our detection algorithm, which consists of a VAE module for extracting local features of a short window and an LSTM module for estimating the long-term trends. In this section, we will first introduce how these two modules are trained in an unsupervised manner and then explain how our algorithm is used for anomaly detection.
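The online formulation above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `online_detect`, the pluggable `score_fn` and the toy score used in the usage lines are assumptions made purely for illustration. The sketch only shows how a buffer of the last $L$ readings yields one decision $y_t$ per time step once $t \ge L$.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Sequence, Tuple

Reading = Sequence[float]  # one m-dimensional reading x_t


def online_detect(
    stream: Iterable[Reading],
    L: int,
    score_fn: Callable[[List[Reading]], float],  # hypothetical scoring function
    threshold: float,
) -> Iterator[Tuple[int, int]]:
    """Yield (t, y_t) for every t >= L, with t counted from 1."""
    buffer: Deque[Reading] = deque(maxlen=L)  # holds S_t = [x_{t-L+1}, ..., x_t]
    for t, x in enumerate(stream, start=1):
        buffer.append(x)
        if len(buffer) < L:
            continue  # not enough history yet (t < L)
        y_t = 1 if score_fn(list(buffer)) > threshold else 0
        yield t, y_t


# Toy usage: univariate stream, score = largest absolute reading in S_t.
def max_abs(window: List[Reading]) -> float:
    return max(abs(x[0]) for x in window)


stream = [(0.1,), (0.2,), (-0.1,), (5.0,), (0.0,), (0.1,), (0.2,)]
alerts = list(online_detect(stream, L=3, score_fn=max_abs, threshold=1.0))
```

In this toy run the spike at $t = 4$ keeps the alert raised for as long as it remains inside the buffer, which mirrors the fact that $y_t$ refers to the whole window $S_t$ rather than to a single reading.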

[Figure 1: block diagram with the labels Original Window, Encoder, Embedding, LSTM, Predicted Embedding, Decoder, Reconstructed Window, Prediction Error and Anomaly Detection Score; time axis running from t-k+1 to t.]

Fig. 1. Our VAE-LSTM model detects anomalies over a sequence of $k$ consecutive windows of a given time series. The $i$-th window $w_i$ is encoded into a low-dimensional embedding $e_i$, which is fed into an LSTM model to predict the next window's embedding $e_{i+1}$. The predicted embedding is then decoded to reconstruct the original window $w_{i+1}$. The reconstruction error serves as our anomaly detection score.

3.1. Training VAE-LSTM Models

To train our VAE-LSTM models in an unsupervised manner, we first need to separate a training and a test set from the given time series. An example of training-test set separation is given in Figure 2, where we take a continuous segment of the given time series which contains no anomalies as the training data, and the rest of the time series, with anomalies, is kept as the test data for evaluation.

The VAE model consists of an encoder and a decoder. It takes a local window of $p$ consecutive readings as input, estimates a low-dimensional embedding of $q$ dimensions through the encoder and reconstructs the original window through the decoder. To train the VAE model, we generate rolling windows from the training data. For example, $w_t = [x_{t-p+1}, \dots, x_t]$ denotes the window ending at time $t$. The LSTM model operates on the VAE embeddings of a sequence of $k$ non-overlapping windows. We use $W_t = [w_{t-(k-1)\times p}, w_{t-(k-2)\times p}, \dots, w_t]$ to denote the sequence of windows ending at time $t$ and $E_t = [e_t^1, \dots, e_t^k]$ to denote the corresponding embeddings in $W_t$, with $e_t^i$ indicating the embedding of the $i$-th window in $W_t$. For training data of $N_{\text{train}}$ readings, we can generate $N_{\text{train}} - p$ rolling windows for training the VAE model and $N_{\text{train}} - pk$ rolling sequences for training the LSTM model. We reserve a randomly drawn 10% of the generated sequences from the training data as a validation set, and all the windows and sequences in the validation set are excluded from training.
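As a rough illustration of the windowing just described, the sketch below builds the rolling windows $w_t$ and the sequences $W_t$ of $k$ non-overlapping windows from a univariate series. The function names and the toy series are assumptions for illustration, and the validation split is left out. With a stride of one reading this indexing produces $N - p + 1$ windows and $N - pk + 1$ sequences, so the counts may differ by one from those quoted in the text depending on the indexing convention.

```python
from typing import List, Sequence


def rolling_windows(x: Sequence[float], p: int) -> List[List[float]]:
    """All windows w_t = [x_{t-p+1}, ..., x_t], one per admissible end time t."""
    return [list(x[end - p:end]) for end in range(p, len(x) + 1)]


def rolling_sequences(x: Sequence[float], p: int, k: int) -> List[List[List[float]]]:
    """All sequences W_t of k non-overlapping windows of length p ending at t."""
    span = k * p  # number of readings covered by one sequence
    sequences = []
    for end in range(span, len(x) + 1):
        start = end - span
        sequences.append(
            [list(x[start + i * p:start + (i + 1) * p]) for i in range(k)]
        )
    return sequences


series = list(range(10))  # toy series: x_1..x_10 stored as the values 0..9
windows = rolling_windows(series, p=2)
sequences = rolling_sequences(series, p=2, k=3)
```

Consecutive sequences overlap heavily (they are shifted by one reading), while the $k$ windows inside each sequence do not overlap, which is what lets the LSTM see a span of $k \times p$ readings through only $k$ embeddings.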

[Figure 2: number of taxis in NYC (normalised by training set mean 15234, std 6780); x-axis: timestamp (every 30 min); y-axis: normalised readings; legend: data, train-test set split, anomalies; regions: training set, test set.]

Fig. 2. An example of training and test set separation on the NYC taxi request time series.

With the remaining windows in the training set, we optimise the VAE model parameters to maximise the ELBO objective defined in [16]. After the VAE model has been optimised, we use the encoder from the trained VAE model to estimate all the embedding sequences $E_t$ in the training set. To train the LSTM model, we have the LSTM model take the first $k-1$ embeddings in a sequence $E_t$ and predict the next $k-1$ embeddings, i.e.

$$[\hat{e}_t^2, \dots, \hat{e}_t^k] = \mathrm{LSTM}([e_t^1, \dots, e_t^{k-1}]) \tag{1}$$

We optimise the LSTM model parameters by minimising the prediction error of the final embedding, i.e. $\min \|\hat{e}_t^k - e_t^k\|$. Notice that all the model parameters for both the VAE and LSTM units are optimised without anomaly labels.
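To make the training target concrete, the following sketch splits one embedding sequence $E_t$ into the LSTM input and target of Equation (1) and evaluates the final-embedding error. It is only an illustration: the function names are hypothetical, the norm is assumed to be Euclidean (the text does not specify it), and a trivial persistence predictor stands in for the LSTM.

```python
import math
from typing import List, Tuple

Embedding = List[float]  # q-dimensional code of one window


def lstm_training_pair(E_t: List[Embedding]) -> Tuple[List[Embedding], List[Embedding]]:
    """Split E_t = [e^1, ..., e^k] into input [e^1..e^{k-1}] and target [e^2..e^k]."""
    return E_t[:-1], E_t[1:]


def final_embedding_error(predicted: List[Embedding], target: List[Embedding]) -> float:
    """Euclidean norm between the last predicted and the last true embedding."""
    return math.dist(predicted[-1], target[-1])


# Toy sequence with k = 4 embeddings of dimension q = 2.
E_t = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [4.0, 6.0]]
inputs, targets = lstm_training_pair(E_t)

# Stand-in for the LSTM (assumption): predict that each embedding repeats.
predicted = list(inputs)
loss = final_embedding_error(predicted, targets)
```

In an actual training loop this scalar would be averaged over a batch of sequences and minimised with respect to the LSTM parameters, while the VAE encoder that produced $E_t$ stays fixed.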

3.2. Anomaly Detection using the VAE-LSTM Model

After training, our VAE-LSTM model can be used for anomaly detection in real time. At time $t$, the VAE-LSTM model analyses a test sequence $W_t$ that contains $k \times p$ past readings tracing back from $t$. Our model first uses the encoder from the VAE to estimate the sequence of embeddings $E_t$ in $W_t$. Then it feeds the first $k-1$ embeddings to the LSTM model to predict the next $k-1$ embeddings $[\hat{e}_t^2, \dots, \hat{e}_t^k]$, as given in Equation (1). Finally, our model reconstructs the last $k-1$ windows using the predicted embeddings and the VAE decoder, i.e.

$$\hat{w}_{t-(k-i)\times p} = \mathrm{Decoder}(\hat{e}_t^i), \quad \text{for } i = 2, \dots, k. \tag{2}$$

With the reconstructed windows, we can define a score function $d_t$ to flag anomalous behaviours by summing up the prediction errors for $W_t$, i.e.

$$d_t = \sum_{i=2}^{k} \left\| \hat{w}_{t-(k-i)\times p} - w_{t-(k-i)\times p} \right\|_2. \tag{3}$$
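The three steps of Equations (1) to (3) can be sketched as a single scoring function. This is a hedged illustration rather than the published code: `anomaly_score` and the three `toy_*` functions are hypothetical names, the toy modules (window mean as encoder, persistence as predictor, constant window as decoder) merely stand in for the trained VAE and LSTM, and the error is taken to be the Euclidean norm.

```python
import math
from statistics import fmean
from typing import Callable, List

Window = List[float]     # p readings of a univariate series
Embedding = List[float]  # q-dimensional code


def anomaly_score(
    W_t: List[Window],
    encode: Callable[[Window], Embedding],
    predict: Callable[[List[Embedding]], List[Embedding]],
    decode: Callable[[Embedding], Window],
) -> float:
    """Score d_t of Eq. (3) for one sequence of k windows."""
    E_t = [encode(w) for w in W_t]      # embeddings e^1 .. e^k
    E_hat = predict(E_t[:-1])           # predicted e^2 .. e^k, Eq. (1)
    W_hat = [decode(e) for e in E_hat]  # reconstructed windows, Eq. (2)
    # Sum of Euclidean distances between reconstructed and observed windows.
    return sum(math.dist(w_hat, w) for w_hat, w in zip(W_hat, W_t[1:]))


# Hypothetical stand-ins for the trained modules (q = 1, p = 2).
def toy_encode(w: Window) -> Embedding:
    return [fmean(w)]


def toy_predict(E: List[Embedding]) -> List[Embedding]:
    return list(E)  # persistence: the next embedding equals the current one


def toy_decode(e: Embedding) -> Window:
    return [e[0], e[0]]


normal = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
shifted = [[1.0, 1.0], [1.0, 1.0], [4.0, 5.0]]
score_normal = anomaly_score(normal, toy_encode, toy_predict, toy_decode)
score_shifted = anomaly_score(shifted, toy_encode, toy_predict, toy_decode)
```

The flat sequence is predicted perfectly and scores zero, whereas the sequence whose last window jumps away from the predicted level receives a large score, which is the behaviour the threshold on $d_t$ relies on.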

To detect an anomaly, we also need to define a threshold $\theta$ on the score function $d_t$, above which we will flag an anomaly alert $y_t = 1$ at the current $t$. The corresponding sequence $W_t$ will mark a suspicious window where the anomaly event might have occurred. When there is sufficient data, we should determine $\theta$ using a validation set that contains both normal and anomalous examples. The threshold that gives the best performance on this validation set, measured by F1 score or another metric, is the detection threshold for the given time series. When there is limited data, we can use a validation set that contains only normal samples to evaluate the distribution of the score function and define the threshold as a given percentile of this distribution, for example.
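For the limited-data case, a percentile threshold might be computed as in the sketch below. The 99th percentile, the function names and the toy scores are assumptions chosen for illustration; the text does not prescribe a particular percentile.

```python
import statistics
from typing import List, Sequence


def percentile_threshold(normal_scores: Sequence[float], percentile: int = 99) -> float:
    """Return the given percentile (1..99) of scores d_t computed on normal data."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be an integer between 1 and 99")
    # quantiles(n=100) returns the 99 cut points between the 100 percentile bins.
    cuts = statistics.quantiles(normal_scores, n=100, method="inclusive")
    return cuts[percentile - 1]


def flag(scores: Sequence[float], theta: float) -> List[int]:
    """y_t = 1 whenever the score d_t lies above the threshold theta."""
    return [1 if d > theta else 0 for d in scores]


validation_scores = [float(i) for i in range(101)]  # toy scores 0, 1, ..., 100
theta = percentile_threshold(validation_scores, percentile=99)
alerts = flag([50.0, 99.5, 120.0], theta)
```

A higher percentile trades recall for precision: fewer normal windows exceed the threshold, but mild anomalies are more likely to stay below it.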

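Table 1. Precision, recall and F1 score at the threshold that gives the best F1 score (L: detection window length).

| Dataset | Method | L | Prec | Recall | F1 |
|---|---|---|---|---|---|
| Ambient temperature | Ours | 168 | 0.806 | 1.0 | 0.892 |
| Ambient temperature | VAE [8] | 24 | 0.686 | 0.5 | 0.573 |
| Ambient temperature | LSTM-AD [5] | 24 | 1.0 | 0.5 | 0.666 |
| Ambient temperature | ARMA [17] | 24 | 0.184 | 1.0 | 0.311 |
| CPU utilization AWS | Ours | 144 | 0.694 | 1.0 | 0.819 |
| CPU utilization AWS | VAE [8] | 24 | 0.348 | 0.5 | 0.410 |
| CPU utilization AWS | LSTM-AD [5] | 24 | 0.274 | 1.0 | 0.430 |
| CPU utilization AWS | ARMA [17] | 24 | 0.234 | 1.0 | 0.380 |
| CPU utilization EC2 | Ours | 192 | 0.993 | 1.0 | 0.996 |
| CPU utilization EC2 | VAE [8] | 24 | 0.949 | 1.0 | 0.974 |
| CPU utilization EC2 | LSTM-AD [5] | 24 | 1.0 | 0.436 | 0.608 |
| CPU utilization EC2 | ARMA [17] | 24 | 0.938 | 1.0 | 0.968 |
| Machine temperature | Ours | 288 | 0.559 | 1.0 | 0.717 |
| Machine temperature | VAE [8] | 48 | 0.211 | 1.0 | 0.207 |
| Machine temperature | LSTM-AD [5] | 48 | 1.0 | 0.5 | 0.667 |
| Machine temperature | ARMA [17] | 48 | 0.142 | 1.0 | 0.248 |
| NYC taxi | Ours | 168 | 0.961 | 1.0 | 0.980 |
| NYC taxi | VAE [8] | 24 | 0.662 | 0.8 | 0.725 |
| NYC taxi | LSTM-AD [5] | 24 | 1.0 | 0.2 | 0.333 |
| NYC taxi | ARMA [17] | 24 | 0.769 | 0.4 | 0.526 |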

4. EXPERIMENTS AND RESULTS

We evaluate our VAE-LSTM algorithm on five real-world time series with actual anomaly events: ambient temperature in an office, CPU usage from Amazon Web Services (AWS) and from a server in Amazon's East Coast data center, internal temperature of an industrial machine and number of taxi passengers in New York City [18]. We compare our algorithm with three other commonly used anomaly detection algorithms for time series: VAE [8], LSTM-AD [5] and ARMA [17]. Table 1 lists the numerical results as well as the detection window length. We evaluate three metrics: precision, recall and F1 score (all metrics are evaluated at the threshold that gives the best F1 score). The detection window length for each dataset is chosen to be equal across all methods. Ours appears longer because of the hierarchical structure of our model, which allows us to detect events over longer periods. Counting the true and false positives/negatives among the detection results can be difficult, as anomalous events occur only at a single time stamp whereas all detection algorithms reason over a window. We adopt a simple strategy proposed by [11] to alleviate this issue.
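The evaluation protocol of reporting metrics at the best-F1 threshold can be sketched as follows. This is an illustrative simplification: it counts hits per scored time step and does not implement the window-based counting strategy of [11], and the function names and toy scores are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from counts; undefined ratios are reported as 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_f1_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[Optional[float], float]:
    """Sweep thresholds midway between distinct scores and keep the best F1.

    Returns (None, -1.0) if fewer than two distinct scores are given.
    """
    values = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_theta, best_f1 = None, -1.0
    for theta in candidates:
        preds: List[int] = [1 if s > theta else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
        f1 = precision_recall_f1(tp, fp, fn)[2]
        if f1 > best_f1:
            best_theta, best_f1 = theta, f1
    return best_theta, best_f1


toy_scores = [0.1, 0.2, 0.15, 0.9, 0.3, 0.8]
toy_labels = [0, 0, 0, 1, 0, 1]
theta_star, f1_star = best_f1_threshold(toy_scores, toy_labels)
```

Because the threshold is tuned on labelled data, figures obtained this way describe the best achievable operating point of each detector rather than its performance at a threshold fixed in advance.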

A visualisation of the anomaly detection results given by all four methods on the machine temperature series is shown in Figure 3. LSTM-AD achieves a high precision on most datasets but suffers from low recall, indicating that its detections are accurate but that it has a high chance of missing true anomalies (Figure 3.c). VAE has a good recall but low precision, indicating a significant number of false positive detections (Figure 3.b). ARMA does not perform well in either precision or recall, and its detection is clearly the worst (Figure 3.d).

[Figure 3: four panels, (a) Our VAE-LSTM algorithm, (b) VAE, (c) LSTM-AD, (d) ARMA, each showing machine temperature (normalised by training set mean 87.9, std 7.6); x-axis: timestamp (every 5 min); y-axis: normalised readings; legend: data, true anomalies, detected anomalies, detected anomaly windows.]

Fig. 3. Anomalies detected using our VAE-LSTM, VAE, LSTM-AD and ARMA methods on the industrial machine temperature series. Blue line: time series, red dashed line: ground truth anomalies, short green lines: detected anomalies, light yellow windows: detected anomaly windows.

In contrast, our VAE-LSTM algorithm achieves 100% recall for all datasets, meaning no missed anomalies and the ability to detect all types of anomalies. At the same time, our method also achieves a decent precision (between 0.559 and 0.993 across the five datasets in Table 1), indicating that the number of false positives is low. For example, a false positive is reported in the ambient temperature series (Figure 4.a) around time t = 2000. From visual inspection, there is an unusual spike in the highlighted window and, hence, it might be sensible to draw the attention of human supervisors. We would argue that such a precaution is indeed beneficial in failure-critical scenarios. Good performance in both precision and recall results in a high F1 score for our method, leading all other methods by a good margin. Examples of our method's detection results are given in Figures 3.a and 4.

A potential downside of our method is the delay in the anomaly detection for some cases. For example, the first anomaly in the EC2 CPU utilization series is only detected after about 150 time steps. This could be alleviated by using windows at multiple scales, and we leave this for future research.

[Figure 4: four panels showing normalised readings against timestamp, (a) Ambient office temperature (normalised by training set mean 72.0, std 3.3; timestamp every hour), (b) CPU utilization at AWS server (normalised by training set mean 37.5, std 14.4), (c) CPU utilization at EC2 server (normalised by training set mean 45.1, std 1.9), (d) The number of NYC taxi passengers (normalised by training set mean 15234, std 6780); legend: data, true anomalies, detected anomalies, detected anomaly windows.]

Fig. 4. Anomalies detected by our VAE-LSTM hybrid model.

5. CONCLUSION

In this work, we propose a VAE-LSTM hybrid model as an unsupervised learning approach for anomaly detection in time series. Our model utilizes both a VAE module for forming robust local features over a short window and an LSTM module for estimating the long-term correlations in the sequence. As a result, our detection algorithm is capable of identifying all types of anomalies that might span multiple time scales. We demonstrate the effectiveness of our detection algorithm on five real-world sequences and show that our method outperforms other commonly used detection methods.

Acknowledgement

This work was supported by the EPSRC Centre for Doctoral Training, EP/L015897/1, and the China Scholarship Council. We thank the reviewers for their useful comments.



6. REFERENCES

[1] Markus Goldstein and Seiichi Uchida, “A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data,” PLoS ONE, vol. 11, no. 4, p. e0152173, 2016.

[2] Varun Chandola, Arindam Banerjee, and Vipin Kumar, “Anomaly detection: A survey,” ACM Computing Surveys (CSUR), vol. 41, no. 3, p. 15, 2009.

[3] Anup K. Ghosh and Aaron Schwartzbard, “A study in using neural networks for anomaly and misuse detection,” in USENIX Security Symposium, 1999, vol. 99, p. 12.

[4] Tailai Wen and Roy Keyes, “Time series anomaly detection using convolutional neural networks and transfer learning,” IJCAI'19 Workshops, 2019.

[5] Pankaj Malhotra, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal, “Long short term memory networks for anomaly detection in time series,” in Proceedings. Presses universitaires de Louvain, 2015, p. 89.

[6] Diederik P. Kingma and Max Welling, “Auto-encoding variational Bayes,” CoRR, vol. abs/1312.6114, 2013.

[7] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, 2014, pp. 1278–1286.

[8] Jinwon An and Sungzoon Cho, “Variational autoencoder based anomaly detection using reconstruction probability,” 2015.

[9] Suwon Suh, Daniel Chae, Hyon-Goo Kang, and Seungjin Choi, “Echo-state conditional variational autoencoder for anomaly detection,” July 2016, pp. 1015–1022.

[10] Yuta Kawachi, Yuma Koizumi, and Noboru Harada, “Complementary set variational autoencoder for supervised anomaly detection,” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2366–2370, 2018.

[11] Haowen Xu, Yang Feng, Jie Chen, Zhaogang Wang, Honglin Qiao, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, et al., “Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications,” Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18, 2018.

[12] Kyle Hundman, Valentino Constantinou, Christopher Laporte, Ian Colwell, and Tom Soderstrom, “Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding,” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '18, 2018.

[13] Sen Wang, Ronald Clark, Hongkai Wen, and Niki Trigoni, “DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks,” in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 2043–2050.

[14] Ronald Clark, Sen Wang, Hongkai Wen, Andrew Markham, and Niki Trigoni, “VINet: Visual-inertial odometry as a sequence-to-sequence learning problem,” in Thirty-First AAAI Conference on Artificial Intelligence, 2017.

[15] Yong Shean Chong and Yong Haur Tay, “Abnormal event detection in videos using spatiotemporal autoencoder,” in International Symposium on Neural Networks. Springer, 2017, pp. 189–196.

[16] Shuyu Lin, Stephen Roberts, Niki Trigoni, and Ronald Clark, “Balancing reconstruction quality and regularisation in ELBO for VAEs,” 2019.

[17] Brandon Pincombe, “Anomaly detection in time series of graphs using ARMA processes,” 2007.

[18] Subutai Ahmad, Alexander Lavin, Scott Purdy, and Zuha Agha, “Unsupervised real-time anomaly detection for streaming data,” Neurocomputing, vol. 262, pp. 134–147, 2017.
