ALAR2010_EMD

A Forecasting Capability Study of Empirical Mode Decomposition for the Arrival Time of a Parallel Batch System. Linh Ngo and Amy Apon (University of Arkansas), Doug Hoffman (Acxiom Corporation)

Transcript of ALAR2010_EMD

  • A Forecasting Capability Study of Empirical Mode Decomposition for the Arrival Time of a Parallel Batch System

    Linh Ngo and Amy Apon, University of Arkansas

    Doug Hoffman, Acxiom Corporation

  • Introduction:

    Empirical Mode Decomposition (EMD)

    Huang et al.

    Represents non-stationary, complex time signals as a sum of Intrinsic Mode Functions (IMFs)

    IMF:

    The number of extrema and the number of zero crossings in the whole data set must either be equal or differ by at most one

    At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero
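
    The first IMF condition (extrema and zero-crossing counts) can be checked directly on sampled data. The following is a minimal sketch, not taken from the slides: the helper names and the test signals are illustrative assumptions, and only strict interior extrema are counted.

```python
import math

def count_extrema(x):
    """Count strict interior local maxima and minima of a sequence."""
    return sum(
        1
        for prev, cur, nxt in zip(x, x[1:], x[2:])
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt)
    )

def count_zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)

def meets_count_condition(x):
    """True if the two counts are equal or differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1

# Hypothetical test signals: a zero-mean sine, and the same sine shifted
# upward so that it never crosses zero.
wave = [math.sin(2 * math.pi * k / 20 + 0.3) for k in range(100)]
shifted = [v + 2.0 for v in wave]

print(meets_count_condition(wave))     # zero-mean oscillation
print(meets_count_condition(shifted))  # oscillation riding on an offset
```

    The shifted signal keeps all its extrema but loses its zero crossings, which is why an offset or trend has to be sifted out before a component qualifies as an IMF.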

  • Introduction:

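    EMD Sifting Process

    Start with the data

    Construct the upper envelope cubic spline and the lower envelope cubic spline

    Find the mean of the upper and lower envelopes, and subtract this mean from the data

    IMF? If no, repeat the envelope step on the result. If yes, subtract the IMF from the data

    Monotonic? Manual? If yes, stop. If no, repeat the process on the remaining data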
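
    The sifting loop above can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the authors' implementation: it uses a fixed number of sifting passes instead of an explicit IMF test, stops when the residual has too few extrema to build envelopes, and lets the cubic splines extrapolate at the edges. The names `sift_once` and `emd` and the test signal are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(t, x, min_extrema=4):
    """One sifting pass: subtract the mean of the spline envelopes from x.

    Returns None when there are too few extrema to build envelopes.
    """
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < min_extrema or len(minima) < min_extrema:
        return None
    upper = CubicSpline(t[maxima], x[maxima])(t)  # upper envelope
    lower = CubicSpline(t[minima], x[minima])(t)  # lower envelope
    return x - (upper + lower) / 2.0

def emd(t, x, max_imfs=10, passes=5):
    """Decompose x into candidate IMFs plus a residual (simplified)."""
    residual = np.asarray(x, dtype=float)
    imfs = []
    for _ in range(max_imfs):
        if sift_once(t, residual) is None:
            break  # residual is (nearly) monotonic: stop
        h = residual
        for _ in range(passes):  # assumption: fixed pass count, no IMF test
            nxt = sift_once(t, h)
            if nxt is None:
                break
            h = nxt
        imfs.append(h)
        residual = residual - h  # subtract the IMF from the data
    return imfs, residual

# Hypothetical signal: a fast and a slow oscillation superimposed.
t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 40 * t) + 0.5 * np.sin(2 * np.pi * 5 * t)
imfs, residual = emd(t, x, max_imfs=2)
print(len(imfs), np.allclose(sum(imfs) + residual, x))
```

    By construction the extracted components and the residual always sum back to the original data, which makes the decomposition easy to sanity-check.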

  • Introduction:

    IMF example

  • Introduction:

    Application to Arrival Histogram

    [Figure: Arrival Histogram of March 2007. x-axis: one-hour buckets, beginning on 00:00 Wednesday March 01 and ending on 23:58 Saturday March 31. y-axis: arrival count per bucket.]
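
    An hourly arrival histogram of this kind can be built by counting arrival timestamps per one-hour bucket. The sketch below is an assumption about the bucketing, not the authors' code; the function name `hourly_histogram` and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

def hourly_histogram(arrivals, start, end):
    """Count arrivals per one-hour bucket over the interval [start, end)."""
    n_buckets = (end - start) // HOUR
    counts = Counter(
        (a - start) // HOUR for a in arrivals if start <= a < end
    )
    return [counts.get(b, 0) for b in range(n_buckets)]

# Hypothetical arrival timestamps within March 2007.
start = datetime(2007, 3, 1)
end = datetime(2007, 4, 1)
arrivals = [
    datetime(2007, 3, 1, 0, 5),
    datetime(2007, 3, 1, 0, 59, 59),
    datetime(2007, 3, 1, 1, 0),
    datetime(2007, 3, 31, 23, 58),
    datetime(2007, 4, 1, 0, 0),  # outside the month, ignored
]
hist = hourly_histogram(arrivals, start, end)
print(len(hist), hist[0], hist[1], hist[-1])
```

    A 31-day month yields 744 buckets, and the resulting list of counts is the time series that EMD then decomposes.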

  • Introduction:

    Application to Arrival Histogram

  • Workload Characterization using EMD

    EMD and Workload Characterization

    Workload Histogram decomposition

    Piecewise sine fitting

    Characterization Results

    Improvements over hyper-exponential distribution

    Requires non-trivial manual fine-tuning

    Impractical compared to traditional distribution techniques

  • Can EMD do anything else?

    Workload Forecasting

    Preprocessing data for forecasting techniques

    Improve accuracy and flexibility of predicted data

    A common preprocessing technique to both workload characterization and forecasting

    Workload characterization model that reflects actual future workload

    Forecasting model with extended range and modification capability

  • Characterization and Forecasting:

    Comparison

    [Diagram: comparison of the characterization and forecasting workflows, with Yes/No decision branches. Labels: Original Workload; Characterization Model; Statistical Measurements; Past Workload; Forecasting Model; Exact Measurements; Future of Past Workload; System/Simulator; Performance Measurements; Model modification to create hypothetical scenarios for capacity planning; Real Time Prediction; Exact Measurements; Scheduling/Resource Management based on prediction; Real Future Workload.]

  • Forecasting Feasibility Study:

    Data Preprocessing

    Pattern isolation:

    Workloads, enterprise workloads in particular, contain patterns

    9-5, M-F, monthly, yearly, holidays

    Individual patterns = Individual Signals with unique frequencies

    Signal decomposition techniques

    Difficulties:

    Different arrival sources carry different patterns

    Patterns that exist only for a period of time

  • Forecasting Feasibility Study:

    Arrival Patterns

    Arrival Time Histogram Comparison (Wednesday and Thursday, last week of May 2006)

    Arrival Time Histogram Comparison (Wednesdays, June 2006)

    Existence of daily arrival patterns

  • Forecasting Feasibility Study:

    Arrival Patterns

    Comparing IMFs of adjacent groups: 1.1, 1.2, 1.3, and 1.4 are the IMFs of the first group (4193 - 10,000); 2.1, 2.2, 2.3, and 2.4 are for the second group (0 - 5,000); and 3.1, 3.2, 3.3, and 3.4 are for the overall group (0 - 10,000)

    IMFs generated from adjacent data subsets exhibit a sense of continuity

  • Forecasting Feasibility Study:

    Hypothesis

    IMFs can be used to isolate signals with patterns, which then serve as inputs to a forecasting technique

  • Forecasting Feasibility Study:

    Algorithm and Evaluation Metric

    Algorithm:

    Use two sets of data: estimation data and prediction data

    Calculate the optimal estimated weights for the estimation data set

    Apply the calculated weights to the prediction data set to forecast the future data

    Evaluation:

    Mean Absolute Percentage Error (MAPE)

    $$\mathrm{MAPE} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{\left| \mathrm{predicted}_i - \mathrm{measured}_i \right|}{\mathrm{measured}_i}$$
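
    The estimate-then-predict procedure and its evaluation can be sketched as follows. The slides do not state how the weights are computed, so ordinary least squares is used here purely as an assumption. The names `fit_weights` and `mape`, and all data, are synthetic and illustrative. MAPE follows the standard definition, scaled by 100 to report a percentage, and requires non-zero measured values.

```python
import numpy as np

def mape(measured, predicted):
    """Mean absolute percentage error, in percent (measured must be non-zero)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs(predicted - measured) / measured)

def fit_weights(inputs, target):
    """Assumption: weights minimizing ||inputs @ w - target|| (least squares)."""
    w, *_ = np.linalg.lstsq(inputs, target, rcond=None)
    return w

# Synthetic estimation data: 24 hourly buckets, two input signals
# (for example, two components of a past day), and a known target.
rng = np.random.default_rng(0)
X_est = rng.uniform(50.0, 150.0, size=(24, 2))
true_w = np.array([2.0, 0.5])
y_est = X_est @ true_w

# Step 1: estimate the weights on the estimation data set.
w = fit_weights(X_est, y_est)

# Step 2: apply the same weights to the prediction data set.
X_pred = rng.uniform(50.0, 150.0, size=(24, 2))
y_forecast = X_pred @ w

print(w)
print(mape(y_est, X_est @ w))         # estimation MAPE
print(mape([100, 200], [110, 180]))   # each value is off by 10%
```

    With real data the estimation MAPE measures how well the weights fit the past, while the prediction MAPE is computed once the forecast day has actually been measured.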

  • Forecasting Feasibility Study:

    Preliminary Results

    Experiment   IMF Count   Estimation MAPE   Prediction MAPE
    1            0           0.0%              53.89%
    2            2           13.68%            32.27%
    3            10          2.65%             39.08%
    4            12          0.9%              36.2%

    Experiment 1: predict the first Thursday of June 2006 based on the original histograms of the first Wednesday of June 2006, and the last Wednesday and Thursday of May 2006.

    Experiment 2: predict the first Thursday of June 2006 with the same data set, but this time the days are decomposed into IMFs.

    Experiments 3 and 4: predict the first Thursday of June 2006 using the IMFs of the first Wednesday of June and the last Wednesday and Thursday of May.

    The range of empirical data used by the EMD process is extended.

    Experiment 3: Full month of May until the first Wednesday of June.

    Experiment 4: Full months of April and May until the first Wednesday of June.

  • Forecasting Feasibility Study:

    Preliminary Analysis

    Experiments with EMD preprocessing (2 to 4) give a lower prediction MAPE (32.27% to 39.08%) than Experiment 1 without it (53.89%).

    An increase in data retention from Experiment 3 to Experiment 4 improved the prediction (prediction MAPE 39.08% versus 36.2%).

    Experiment 2:

    Low number of IMFs

    Highest estimation MAPE of the four experiments (13.68%)

    Lowest prediction MAPE of the four experiments (32.27%)

  • Forecasting Feasibility Study:

    Conclusion

    EMD: Potential data processing platform

    Need better predictive tools to increase the accuracy of EMD-based predictions

  • Proposed Work

    Prediction techniques for EMD-based data

    Comparison of EMD as a data preprocessing tool against traditional decomposition techniques (Wavelet and Fourier)

    Application on different workloads

    Effect of the length of retained data for forecasting

  • Questions?
