ALAR2010_EMD
-
A Forecasting Capability Study of Empirical Mode Decomposition for the Arrival Time of a Parallel Batch System
Linh Ngo and Amy Apon, University of Arkansas
Doug Hoffman, Acxiom Corporation
1
-
Introduction:
Empirical Mode Decomposition (EMD)
Huang et al.
Represents non-stationary, complex time signals as a sum of Intrinsic Mode Functions (IMFs)
IMF:
The number of extrema and the number of zero crossings in the whole data set must either be equal or differ by at most one
At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero
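The first IMF condition can be checked mechanically. The following is a minimal sketch, assuming the signal is a plain list of samples; the function names are hypothetical and not from the original slides, and samples that are exactly zero are not counted as crossings.

```python
def count_extrema(x):
    """Count strict local maxima and minima among the interior samples of x."""
    count = 0
    for i in range(1, len(x) - 1):
        is_max = x[i] > x[i - 1] and x[i] > x[i + 1]
        is_min = x[i] < x[i - 1] and x[i] < x[i + 1]
        if is_max or is_min:
            count += 1
    return count


def count_zero_crossings(x):
    """Count sign changes between consecutive samples (exact zeros are ignored)."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


def meets_imf_count_condition(x):
    """True if the extrema and zero-crossing counts are equal or differ by one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1
```

A signal that oscillates around zero, such as [1, -1, 1, -1, 1], passes this check, while the same shape shifted away from zero, such as [3, 1, 3, 1, 3], does not.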
2
-
Introduction:
EMD Sifting Process
Data
Construct upper envelope cubic spline and lower envelope cubic spline
Find mean of upper and lower envelopes, and subtract this mean from data
IMF?
No: repeat the envelope step on the result
Yes: subtract IMF from data
Monotonic? Manual?
Yes: stop
No: repeat with the remaining data
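The sifting loop can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it swaps the cubic spline envelopes for piecewise-linear ones to stay dependency-free, replaces the "IMF?" test with a fixed number of sifting passes (the hypothetical `sift_iters` parameter), and stops when the residue has no interior maximum or no interior minimum, which approximates the monotonic test.

```python
def local_extrema(x):
    """Return index lists (maxima, minima) of strict interior extrema."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def linear_envelope(x, knots):
    """Piecewise-linear envelope through x at the knots, held flat past the ends.

    Assumption: a stand-in for the cubic spline envelope named on the slide.
    """
    env, j = [], 0
    for i in range(len(x)):
        if i <= knots[0]:
            env.append(x[knots[0]])
        elif i >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            while knots[j + 1] < i:
                j += 1
            a, b = knots[j], knots[j + 1]
            env.append(x[a] + (i - a) / (b - a) * (x[b] - x[a]))
    return env


def sift_once(x):
    """Subtract the mean of the two envelopes; return None if an envelope is missing."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return None
    upper = linear_envelope(x, maxima)
    lower = linear_envelope(x, minima)
    return [v - (u + l) / 2 for v, u, l in zip(x, upper, lower)]


def emd(x, max_imfs=8, sift_iters=10):
    """Peel off components until the residue has too few extrema or max_imfs is hit."""
    imfs, residue = [], list(x)
    while len(imfs) < max_imfs:
        h = sift_once(residue)
        if h is None:
            break
        for _ in range(sift_iters - 1):
            nxt = sift_once(h)
            if nxt is None:
                break
            h = nxt
        imfs.append(h)
        residue = [r - c for r, c in zip(residue, h)]
    return imfs, residue
```

By construction, adding the extracted components and the final residue reproduces the input, which is the "sum of IMFs" representation described earlier.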
3
-
Introduction:
IMF example
4
-
Introduction:
Application to Arrival Histogram
[Figure: Arrival Histogram of March 2007. Y-axis: Arrival Count Per Bucket (0 to 300). X-axis: One-Hour Bucket (1 to 721), beginning on 00:00 Wednesday March 01 and ending on 23:58 Saturday March 31.]
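An arrival histogram with one-hour buckets can be built directly from job arrival timestamps. The sketch below assumes the arrivals are available as `datetime` objects; the function name and parameters are hypothetical and not taken from the original study.

```python
from datetime import datetime


def hourly_histogram(arrivals, start, n_buckets):
    """Count arrivals per one-hour bucket, with bucket 0 starting at `start`."""
    counts = [0] * n_buckets
    for t in arrivals:
        idx = int((t - start).total_seconds() // 3600)
        if 0 <= idx < n_buckets:  # ignore arrivals outside the window
            counts[idx] += 1
    return counts
```

For a 31-day month, `n_buckets` would be 31 * 24 = 744, and the resulting list of counts is the signal that EMD decomposes.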
5
-
Introduction:
Application to Arrival Histogram
6
-
Workload Characterization using EMD
EMD and Workload Characterization
Workload Histogram decomposition
Piecewise sine fitting
Characterization Results
Improvements over hyper-exponential distribution
Requires non-trivial manual fine-tuning
Impractical compared to traditional distribution techniques
7
-
Can EMD do anything else?
Workload Forecasting
Preprocessing data for forecasting techniques
Improve accuracy and flexibility of predicted data
A common preprocessing technique to both workload characterization and forecasting
Workload characterization model that reflects actual future workload
Forecasting model with extended range and modification capability
8
-
Characterization and Forecasting:
Comparison
Original Workload
Characterization Model
Statistical Measurements
Past Workload
Forecasting Model
Exact Measurements
Future of Past Workload
No
No
Measurements
System/Simulator
Performance Measurements
Model modification to create hypothetical scenarios for capacity planning
Real Time Prediction
Exact Measurements
Scheduling/Resource Management based on prediction
Real Future Workload
Yes
No
Yes
Yes
Yes
No
No
9
-
Forecasting Feasibility Study:
Data Preprocessing
Pattern isolation:
Workloads, enterprise workloads in particular, contain patterns
9-5, M-F, monthly, yearly, holidays,
Individual patterns = Individual Signals with unique frequencies
Signal decomposition techniques
Difficulties:
Different arrival sources carry different patterns
Patterns that exist only for a period of time
10
-
Forecasting Feasibility Study:
Arrival Patterns
Arrival Time Histogram Comparison (Wednesday and Thursday, last week of May 2006)
Arrival Time Histogram Comparison (Wednesdays, June 2006)
Existence of daily arrival patterns
11
-
Forecasting Feasibility Study:
Arrival Patterns
Comparing IMFs of adjacent groups: 1.1, 1.2, 1.3, and 1.4 are the IMFs of the first group (4193 - 10,000); 2.1, 2.2, 2.3, and 2.4 are for the second group (0 - 5,000); and 3.1, 3.2, 3.3, and 3.4 are for the overall group (0 - 10,000)
IMFs generated from adjacent data subsets exhibit a sense of continuity
12
-
Forecasting Feasibility Study:
Hypothesis
IMFs can be used to isolate signals with patterns to be inputs of a forecasting technique
13
-
Forecasting Feasibility Study:
Algorithm and Evaluation Metric
Algorithm:
Use two sets of data: estimation data and prediction data
Calculate the optimal estimated weights of the estimation data set
Apply the calculated weights to the prediction data set in order to find out the future data
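The slides do not say how the optimal weights are computed. One plausible reading, shown here purely as an assumption, is an ordinary least-squares fit: the weights are chosen so that a weighted sum of the estimation components (for example, the IMFs of one day) best matches a known target day, and the same weights are then applied to the components of the prediction data. All function names below are hypothetical.

```python
def solve(a, b):
    """Solve the square system a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_weights(components, target):
    """Least-squares weights so that sum_k w[k] * components[k] approximates target.

    Assumption: least squares via the normal equations; the slides only say "optimal".
    """
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in components]
            for ci in components]
    rhs = [sum(p * t for p, t in zip(ci, target)) for ci in components]
    return solve(gram, rhs)


def apply_weights(weights, components):
    """Weighted sum of the component series, bucket by bucket."""
    length = len(components[0])
    return [sum(w * c[i] for w, c in zip(weights, components)) for i in range(length)]
```

In this reading, `fit_weights` is run on the estimation data set and `apply_weights` on the prediction data set. The solver assumes the components are linearly independent; nearly identical components would make the system ill-conditioned.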
Evaluation:
Mean Absolute Percentage Error (MAPE):
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{\left| \mathrm{measured}_i - \mathrm{predicted}_i \right|}{\mathrm{measured}_i}$$
14
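As a concrete reference, the mean absolute percentage error averages the relative error |measured - predicted| / measured over all histogram buckets. The sketch below is a minimal illustration with a hypothetical function name; it returns a fraction (multiply by 100 for a percentage) and assumes every measured value is non-zero, since the slides do not say how empty buckets are handled.

```python
def mape(measured, predicted):
    """Mean absolute percentage error as a fraction; measured values must be non-zero."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted))
    return total / len(measured)
```

For example, measured counts of 100 and 200 with predictions of 110 and 180 are each off by 10%, so the result is 0.1.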
-
Forecasting Feasibility Study:
Preliminary Results
Experiment 1: predict the first Thursday of June 2006 based on the original histograms of the first Wednesday of June 2006, and the last Wednesday and Thursday of May 2006.
Experiment 2: predict the first Thursday of June 2006 with the same data set, but this time the days are decomposed into IMFs.
Experiments 3 and 4: predict the first Thursday of June 2006 using the IMFs of the first Wednesday of June and the last Wednesday and Thursday of May. The ranges of the empirical data used by the EMD process are extended.
Experiment 3: Full month of May until the first Wednesday of June.
Experiment 4: Full month of April and May until the first Wednesday of June.
Experiment  IMF Count  Estimation MAPE  Prediction MAPE
1           0          0.0%             53.89%
2           2          13.68%           32.27%
3           10         2.65%            39.08%
4           12         0.9%             36.2%
15
-
Forecasting Feasibility Study:
Preliminary Analysis
Experiments with EMD preprocessing offer a better prediction result.
An increase in data retention from Experiment 3 to Experiment 4 improved the prediction.
Experiment 2:
Low number of IMFs
Highest estimation MAPE (13.68%)
Lowest prediction MAPE (32.27%)
16
-
Forecasting Feasibility Study:
Conclusion
EMD: Potential data processing platform
Need better predictive tools to increase the accuracy of EMD-based predictions
17
-
Proposed Work
Prediction techniques for EMD-based data
Comparison of EMD as a data preprocessing tool against traditional decomposition techniques (Wavelet and Fourier)
Application on different workloads
Effect of the length of retained data for forecasting
18
-
Questions?
19