UBL: unsupervised behavior learning for predicting performance anomalies in virtualized cloud systems

Dean, Daniel Joseph and Nguyen, Hiep and Gu, Xiaohui, Proceedings of the 9th international conference on Autonomic computing (ICAC'12), San Jose, California, USA, 2012.

Summarized by: Drew Wicke, November 30th 2015

Overview

• Introduction

• Self-Organizing Maps (SOMs)

• Experimental Setup

• Experiments and Results

• Conclusion & Critique

Introduction

• Problem:

• Anomaly prediction in IaaS (Infrastructure as a Service) clouds

• Challenges

• VMs are black boxes to the provider

• Thousands of concurrent jobs

• Impossible to get labeled training data

Unsupervised Learning

• Unlabeled training data = No reward or error signal

• Clustering (K-means, DBSCAN, Birch, etc.)

• Latent Variables (Principal Component Analysis)

• Neural Networks (Self-Organizing Maps, Adaptive Resonance Theory)

Self-Organizing Maps

• No labeled training data

• Maps a high-dimensional space to a low-dimensional one

• Preserves the topological order of the input

• Predicts both known and unknown anomalies

SOM Training

• Data is normalized to the range [0, 100]

• 32 x 32 lattice network (1024 total neurons)

• Weights are randomly initialized in [0, 100]

• K-fold cross-validation for the learning phase (K = 3)

• Each data sample is mapped to a neuron using the Euclidean distance metric
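
The training setup above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`normalize`, `init_lattice`, `best_matching_neuron`) are hypothetical, and per-metric min-max scaling is an assumption, since the slide only states the target range.

```python
import math
import random


def normalize(samples):
    """Scale each metric (column) to [0, 100]. Min-max scaling is an assumption."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [100.0 * (x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in samples
    ]


def init_lattice(rows, cols, dim, seed=0):
    """Build a rows x cols lattice of weight vectors drawn uniformly from [0, 100]."""
    rng = random.Random(seed)
    return [[[rng.uniform(0.0, 100.0) for _ in range(dim)]
             for _ in range(cols)]
            for _ in range(rows)]


def best_matching_neuron(lattice, sample):
    """Return the (row, col) of the neuron whose weights are closest in Euclidean distance."""
    best, best_dist = None, math.inf
    for r, row in enumerate(lattice):
        for c, weights in enumerate(row):
            d = math.dist(weights, sample)
            if d < best_dist:
                best, best_dist = (r, c), d
    return best
```

With `init_lattice(32, 32, dim)` this gives the 1024-neuron lattice mentioned on the slide; `dim` is the number of monitored metrics per sample.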

Weight Update

• Weight update: W(t+1) = W(t) + N(v, t) L(t) (D(t) − W(t))

• W(t) - weight at time t

• D(t) - data input vector

• N(v, t) - neighborhood function (Gaussian) that depends on the lattice distance to a neighbor neuron v

• L(t) - learning rate (set to 0.7)

• Training iterates over the input data 10 times
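
A hedged sketch of the update rule above, applied to every neuron for one input vector. The Gaussian width `sigma` is an assumption (the slide gives no value), the learning rate is held constant at 0.7, and the function names are hypothetical.

```python
import math


def update_weights(lattice, sample, winner, learning_rate=0.7, sigma=1.0):
    """Apply W(t+1) = W(t) + N(v, t) * L(t) * (D(t) - W(t)) to each neuron in place."""
    win_r, win_c = winner
    for r, row in enumerate(lattice):
        for c, weights in enumerate(row):
            # Squared lattice distance from this neuron to the winning neuron.
            dist_sq = (r - win_r) ** 2 + (c - win_c) ** 2
            # Gaussian neighborhood: 1.0 at the winner, decaying with distance.
            n = math.exp(-dist_sq / (2.0 * sigma ** 2))
            for k in range(len(weights)):
                weights[k] += n * learning_rate * (sample[k] - weights[k])


def train(lattice, samples, epochs=10, learning_rate=0.7, sigma=1.0):
    """Iterate over the input data `epochs` times, updating around each winner."""
    cells = [(r, c) for r in range(len(lattice)) for c in range(len(lattice[0]))]
    for _ in range(epochs):
        for sample in samples:
            winner = min(cells, key=lambda rc: math.dist(lattice[rc[0]][rc[1]], sample))
            update_weights(lattice, sample, winner, learning_rate, sigma)
```

The winner moves 70% of the way toward the input on each update, while neurons farther away on the lattice move proportionally less.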

Distance Equations

• Distance function:

• w - weight vectors for neurons i and j

• Neighborhood Area Size:

• Top, Left, Right, Bottom Neurons
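
The formulas on this slide did not survive extraction, so the following is only a sketch of one way to compute a neighborhood area size from the top, left, right, and bottom neighbors. The use of Manhattan (L1) distance between weight vectors is an assumption here, as is skipping neighbors that fall outside the lattice; the function names are hypothetical.

```python
def manhattan(w_i, w_j):
    """L1 distance between the weight vectors of neurons i and j (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(w_i, w_j))


def neighborhood_area_size(lattice, r, c):
    """Sum the distances from neuron (r, c) to its top, left, right, and bottom neighbors."""
    rows, cols = len(lattice), len(lattice[0])
    total = 0.0
    for dr, dc in ((-1, 0), (0, -1), (0, 1), (1, 0)):
        nr, nc = r + dr, c + dc
        # Border neurons have fewer neighbors; missing ones contribute nothing.
        if 0 <= nr < rows and 0 <= nc < cols:
            total += manhattan(lattice[r][c], lattice[nr][nc])
    return total
```

A neuron in a densely trained (normal) region sits close to its neighbors and gets a small area size; a neuron in a sparse region gets a large one.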

Unsupervised Anomaly Prediction

• System states:

• Normal

• Pre-Failure

• Failure

Anomaly Prediction

• Threshold-based classification into the 3 system states, based on neighborhood area size

• The threshold value is a percentile (the 85th) of the sorted neighborhood area sizes

• An alarm is raised only after 3 consecutive anomalous samples
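
The thresholding and alarm filtering described above might look like the sketch below. It assumes a nearest-rank percentile and treats a sample as anomalous when the area size of its mapped neuron is at or above the threshold; both choices, and the function names, are illustrative assumptions.

```python
import math


def percentile_threshold(area_sizes, percentile=85):
    """Pick the threshold as the given percentile of the sorted area sizes (nearest rank)."""
    ordered = sorted(area_sizes)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def raise_alarms(sample_area_sizes, threshold, consecutive=3):
    """Return one flag per sample: True once `consecutive` anomalous samples occur in a row."""
    alarms, streak = [], 0
    for size in sample_area_sizes:
        # A normal sample resets the streak, which suppresses transient spikes.
        streak = streak + 1 if size >= threshold else 0
        alarms.append(streak >= consecutive)
    return alarms
```

Requiring three consecutive anomalous samples trades a short delay for fewer false alarms caused by momentary fluctuations.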

Anomaly Cause Inference

• The cause is indicated by the difference between nearby normal states and the anomalous state. This is a hint, not the exact root cause.

• Compute the per-metric differences between the anomalous neuron and 5 normal neurons near it.

• Sort the metrics by difference, high to low

• Each neuron votes for the feature it considers the cause of the problem.
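
One possible reading of the voting scheme above, as a small sketch. Each nearby normal neuron votes for the single metric that differs most from the anomalous neuron; the metric names, the example values, and the function name `infer_cause` are hypothetical.

```python
from collections import Counter


def infer_cause(anomalous_weights, normal_neighbor_weights, metric_names):
    """Tally one vote per normal neuron for the metric with the largest difference."""
    votes = Counter()
    for normal in normal_neighbor_weights:
        diffs = [abs(a - n) for a, n in zip(anomalous_weights, normal)]
        # Sort metric indices by difference, high to low; the top one gets the vote.
        ranked = sorted(range(len(diffs)), key=lambda k: diffs[k], reverse=True)
        votes[metric_names[ranked[0]]] += 1
    return votes.most_common()


ranking = infer_cause(
    [50, 95, 20],
    [[48, 30, 22], [52, 40, 18], [45, 35, 25], [50, 50, 20], [10, 90, 20]],
    ["cpu", "mem", "net"],
)
print(ranking)  # [('mem', 4), ('cpu', 1)]
```

Here four of the five normal neurons point at memory, so memory is reported as the most likely faulty metric.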

Decentralized

• The learning method is run inside a VM

• Uses residual resources

• Monitors resources and moves to a host with sufficient resources to learn

Experimental Setup

• RUBiS online auction benchmark

• NASA web server trace from July 1995 used for the request rate

• SLO violation if the average request response time > 100 ms

• Faults

• MemLeak - memory-intensive program on the VM running the database

• CpuLeak - gradually increasing CPU consumption that competes with the database for CPU

• NetHog - large number of HTTP requests sent to the web server

Experimental Setup

• IBM System S - high-performance data stream processing system (SLO: average processing time < 20 ms)

• ClarkNet web server trace from August 1995 used to modulate the data arrival rate

• Faults

• MemLeak - start a memory-intensive program in one randomly selected processing element (PE)

• CpuHog - CPU-bound program competes with a random PE

• Bottleneck - set a low CPU cap for the VM running a random PE

Experimental Setup

• Hadoop - sorting application (sample app)

• SLO violation is marked when a job does not make progress.

• 3 VMs for Map and 6 for Reduce

• 12 GB of data to process

• Faults:

• MemLeak - memory leak bug in map tasks: memory is allocated from the heap without being released

• CpuHog - inject an infinite loop bug into all map tasks

Experiment Measures

• ROC - Receiver Operating Characteristic Curves

• tradeoff between true positive rate and false positive rate

• Achieved lead time - how long before an SLO violation occurs the alarm is raised
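
As a rough illustration of these two measures, the sketch below computes one ROC point (true positive rate, false positive rate) from per-sample predictions and the lead time of a single alarm. The function names and the per-sample boolean encoding are assumptions made for the example.

```python
def roc_point(predicted, actual):
    """Return (true positive rate, false positive rate) for boolean per-sample labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def achieved_lead_time(alarm_time, violation_time):
    """Seconds between the alarm and the SLO violation; 0 if the alarm came too late."""
    return max(0.0, violation_time - alarm_time)
```

Sweeping the detection threshold and plotting the resulting `roc_point` values yields the ROC curve; each threshold setting contributes one point.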

Comparisons

• PCA (Principal Component Analysis)

• k-NN scheme (k-nearest neighbor)

• Both need normal and anomalous data, unlike the SOM, which only needs normal data

Prediction Accuracy Results

• In all experiments the SOM method achieves better prediction accuracy than PCA and k-NN

[Figure panels: RUBiS, IBM System S, Hadoop]

* UBL-kPtS: the UBL scheme using k-point moving average smoothing
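
For reference, k-point moving average smoothing can be sketched as below. This uses a trailing window that is shorter at the start of the series; how the original scheme handles the first k-1 points is not stated on the slide, so that detail is an assumption.

```python
def moving_average(values, k):
    """Smooth a metric series with a trailing k-point moving average."""
    smoothed = []
    window_sum = 0.0
    for i, value in enumerate(values):
        window_sum += value
        if i >= k:
            # Drop the sample that just left the k-point window.
            window_sum -= values[i - k]
        smoothed.append(window_sum / min(i + 1, k))
    return smoothed


print(moving_average([1, 2, 3, 4, 5], 3))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```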

Lead Time Results

• In all experiments the SOM-based UBL method achieves the longest prediction lead time

Anomaly Cause Inference Results

• System S achieves near perfect inference.

Scalability Results

System Overhead

• Overall, UBL is lightweight

Conclusions

• Black-box unsupervised behavior learning and anomaly prediction for IaaS

• Predicts unknown performance anomalies

• Provides hints to the causes of anomalies

• Prediction accuracy of up to 98% true positive rate and 1.7% false positive rate

• Advance alarms with up to 47 s lead time

Critique

• A different method could be used to initialize the SOM weights instead of random initialization (Principal Component Initialization)

• Was the method able to maintain the SLOs?

• What were the features?

• “For each fault injection we repeated the experiment 30-40 times.” Not useful for repeatability.

• No confidence intervals on the lead times

• k-NN is a supervised learning algorithm. Why not compare against another unsupervised learning method?

• What value of k was used?

• Did they mean k-means?

Interesting

• Overall a very interesting paper

• This method is patented

• Recently licensed to Google!

Xiaohui (Helen) Gu: http://www.csc.ncsu.edu/faculty/gu/

Thank You

• Questions?