Time Series Forecasting– Part I



- What is a time series?
- Components of a time series
- Evaluation methods of forecasts
- Smoothing methods of time series
- Time series decomposition

by Duong Tuan Anh

Faculty of Computer Science and Engineering

September 2011


What is a time series?

A time series is a collection of observations made sequentially in time.

[Figure: an example time series; time axis from 0 to 500, values between about 23 and 29.]

Examples: financial time series, scientific time series.

A study of a random sample of 4,000 graphics from 15 of the world’s newspapers published between 1974 and 1989 found that more than 75% of all graphics were time series.


Time series models

- Regression models: predict the response over time of the variable under study to changes in one or more of the explanatory variables.
- Deterministic models of time series
- Stochastic models of time series

All three kinds of models can be used for forecasting.


Components of a time series

The pattern or behavior of the data in a time series has several components.

Theoretically, any time series can be decomposed into:
- Trend
- Cyclical
- Seasonal
- Irregular

However, this decomposition is often not straightforward because these factors interact.


Trend component

The trend component accounts for the gradual shifting of the time series to relatively higher or lower values over a long period of time.

Trend is usually the result of long-term factors such as changes in the population, demographics, technology, or consumer preferences.


Seasonal component

The seasonal component accounts for regular patterns of variability within certain time periods, such as a year.

The variability does not always correspond with the seasons of the year (i.e. winter, spring, summer, fall).

There can be, for example, within-week or within-day “seasonal” behavior.


Cyclical component

Any regular pattern of sequences of values above and below the trend line lasting more than one year can be attributed to the cyclical component.

Usually, this component is due to multiyear cyclical movements in the economy.


Evaluating Methods of Forecasts

1. Select a forecasting method, often by intuition, previous experience, or the availability of computing resources.
2. Divide the data into two sections: an initialization part and a test part.
3. Use the forecasting technique to determine the fitted values for the initialization data set.
4. Use the forecasting technique to forecast the test data set and determine the forecast errors.
5. Evaluate the errors (MAD, MPE, MSD, MAPE).
6. Use the technique, modify it, or develop a new model.
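The evaluation steps above can be sketched in Python. This is a minimal illustration, not the author's procedure: the split point `n_init`, the helper names, and the stand-in technique `naive_forecast` (forecast = last observed value) are hypothetical, and forecasting the test part one step ahead, using actual values as they become known, is an assumption.

```python
from typing import Callable, Dict, List, Sequence

Forecaster = Callable[[Sequence[float]], float]


def naive_forecast(history: Sequence[float]) -> float:
    """Hypothetical stand-in technique: next value = last observed value."""
    return history[-1]


def evaluate_holdout(y: Sequence[float], n_init: int,
                     forecaster: Forecaster) -> Dict[str, List[float]]:
    """Split y into an initialization part and a test part, then compute the
    fitted-value errors (initialization part) and forecast errors (test part)."""
    if not 2 <= n_init < len(y):
        raise ValueError("need at least 2 initialization values and 1 test value")
    init = list(y[:n_init])
    # Fitted values for the initialization part: one step ahead, from t = 2 onward.
    fit_errors = [init[t] - forecaster(init[:t]) for t in range(1, n_init)]
    # Forecasts for the test part: one step ahead, using all data before t.
    test_errors = [y[t] - forecaster(y[:t]) for t in range(n_init, len(y))]
    return {"fit_errors": fit_errors, "test_errors": test_errors}


def mean_absolute(errors: Sequence[float]) -> float:
    """Mean of the absolute errors (MAD of the given errors)."""
    return sum(abs(e) for e in errors) / len(errors)


result = evaluate_holdout([10, 12, 11, 13, 14, 13], n_init=4,
                          forecaster=naive_forecast)
print(result["test_errors"], mean_absolute(result["test_errors"]))
```

Any other technique can be plugged in as `forecaster`, so the same split can be reused to compare methods on the same test part.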


Evaluation Methods of Forecasts

Three measures of the accuracy of the fitted models, MAPE, MAD and MSD, are computed for each of the forecasting and smoothing methods.

For all three measures, the smaller the value, the better the fit of the model.

Use these statistics to compare the fit of the different methods.

MAPE (Mean Absolute Percentage Error) measures the accuracy of fitted time series values. It expresses accuracy as a percentage.

$$\text{MAPE} = \frac{\sum_t \left| (y_t - y'_t)/y_t \right|}{n} \times 100 \qquad (y_t \neq 0)$$


MAPE, MAD, and MSD

where $y_t$ is the actual value, $y'_t$ is the fitted value and $n$ is the number of observations.

MAD (Mean Absolute Deviation) expresses accuracy in the same units as the data, which helps conceptualize the amount of error.

$$\text{MAD} = \frac{\sum_t |y_t - y'_t|}{n}$$

where $y_t$ is the actual value, $y'_t$ is the fitted value and $n$ is the number of observations.


MAPE, MAD, and MSD

MSD (Mean Squared Deviation) is more sensitive than MAD to an unusually large forecast error, because each error is squared.

$$\text{MSD} = \frac{\sum_t (y_t - y'_t)^2}{n}$$

where $y_t$ is the actual value, $y'_t$ is the fitted value and $n$ is the number of observations.
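The three formulas translate directly into code. A minimal sketch in plain Python, assuming `actual` and `fitted` are equal-length sequences aligned period by period; the function names are illustrative.

```python
from typing import Sequence


def _check(actual: Sequence[float], fitted: Sequence[float]) -> int:
    """Validate the inputs and return the number of observations n."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("actual and fitted must be non-empty and of equal length")
    return len(actual)


def mape(actual: Sequence[float], fitted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any y_t == 0."""
    n = _check(actual, fitted)
    if any(y == 0 for y in actual):
        raise ValueError("MAPE requires every actual value to be non-zero")
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(actual, fitted)) / n


def mad(actual: Sequence[float], fitted: Sequence[float]) -> float:
    """Mean absolute deviation, in the units of the data."""
    n = _check(actual, fitted)
    return sum(abs(y - f) for y, f in zip(actual, fitted)) / n


def msd(actual: Sequence[float], fitted: Sequence[float]) -> float:
    """Mean squared deviation; large errors dominate because they are squared."""
    n = _check(actual, fitted)
    return sum((y - f) ** 2 for y, f in zip(actual, fitted)) / n


actual = [100, 200, 400]
fitted = [110, 190, 400]
print(mape(actual, fitted), mad(actual, fitted), msd(actual, fitted))
```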


Methods of smoothing time series

- Arithmetic moving average
- Exponential smoothing methods
- Holt-Winters method for exponential smoothing

Smoothing a time series aims to eliminate some of the short-term fluctuations.

Smoothing can also be done to remove seasonal fluctuations, i.e., to deseasonalize a time series.

These models are deterministic in that no reference is made to the sources or nature of the underlying randomness in the series.

The models involve extrapolation techniques.


Averaging Methods

Simple Averages - quick, inexpensive (should only be used on stationary data)

Moving Average method consists of computing an average of the most recent n data values for the series and using this average for forecasting the value of the time series for the next period.

Moving averages are useful if one can assume that the item to be forecast will stay steady over time.

A series of arithmetic means is used only for smoothing; it provides an overall impression of the data over time.

$$\text{Moving average} = \frac{\sum (\text{most recent } n \text{ data values})}{n}$$
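A minimal sketch of the moving-average forecast in Python, assuming the series is a plain list ordered from oldest to newest; `n` is the number of periods averaged, and the function names are hypothetical.

```python
from typing import List, Sequence


def moving_average_forecast(y: Sequence[float], n: int) -> float:
    """Forecast for the next period: the mean of the most recent n values."""
    if not 1 <= n <= len(y):
        raise ValueError("n must be between 1 and len(y)")
    return sum(y[-n:]) / n


def moving_average_forecasts(y: Sequence[float], n: int) -> List[float]:
    """One-step-ahead forecasts for periods n+1, ..., len(y)+1.
    Element k averages y[k], ..., y[k+n-1] (0-based)."""
    if not 1 <= n <= len(y):
        raise ValueError("n must be between 1 and len(y)")
    return [sum(y[t - n:t]) / n for t in range(n, len(y) + 1)]


sales = [10, 12, 14, 16]
print(moving_average_forecast(sales, 3))
print(moving_average_forecasts(sales, 3))
```

A larger `n` averages over more periods, so the forecasts are smoother but react more slowly to a shift in the level of the series.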


Moving average methods

- Works best with stationary data.
- The smaller the number of periods n, the more weight is given to recent periods. A smaller n is desirable when there are sudden shifts in the level of the series.
- The larger n, the less weight is given to more recent periods. The larger the order of the moving average, the greater the smoothing effect. A larger n is desirable when there are wide, infrequent fluctuations in the data.
- By smoothing recent actual values, the moving average removes randomness.


Weighted Moving Averages

Weighted moving average: places more weight on recent observations. The sum of the weights needs to equal 1.

Used when a trend is present; older data are usually less important.

$$\text{WMA} = \frac{\sum (\text{weight for period } n)(\text{value in period } n)}{\sum \text{weights}}$$
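In Python, the weighted moving average might be sketched as follows. The weights in the example are hypothetical, and the convention that the last weight applies to the most recent period is an assumption of this sketch.

```python
from typing import Sequence


def weighted_moving_average(y: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the most recent len(weights) values of y.

    weights[-1] applies to the most recent value and weights[0] to the oldest
    value in the window (an ordering convention assumed for this sketch).
    Dividing by sum(weights) keeps the result correct even if the weights
    do not sum exactly to 1.
    """
    k = len(weights)
    if not 1 <= k <= len(y):
        raise ValueError("need between 1 and len(y) weights")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    recent = y[-k:]
    return sum(w * v for w, v in zip(weights, recent)) / total


# Heavier weight on the latest period, as suggested when a trend is present.
print(weighted_moving_average([10, 20, 30], [0.2, 0.3, 0.5]))
```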


Notes on Moving Averages

MA models do not provide information about forecast confidence.

We cannot calculate standard errors. We cannot explain the stochastic component of the time series, and this stochastic component creates the error in our forecast.


Exponential Smoothing Methods

- Single exponential smoothing (averaging)
- Double exponential smoothing and Holt's method
- Winters' model

Note:
- Single exponential smoothing is for series without trend and without a seasonal component.
- Double exponential smoothing is for series with trend and without a seasonal component.
- Winters' model is for series with trend and a seasonal component.


Single Exponential Smoothing

Continually revising a forecast in light of more recent experience. Averaging (smoothing) past values of a series in a decreasing (exponential) manner. The observations are weighted, with more weight being given to the more recent observations.

$$A_t = \alpha Y_{t-1} + (1 - \alpha) A_{t-1} \qquad \text{(S1)}$$

New forecast = α (old observation) + (1 - α) (old forecast)

Here we denote the original series by $Y_t$ and the smoothed series by $A_t$.

The equation can be rewritten as:

$$A_t = A_{t-1} + \alpha (Y_{t-1} - A_{t-1})$$
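A minimal Python sketch of single exponential smoothing following (S1), using the first actual value as the initial forecast (one of the starting options mentioned in this section). The function name and the `initial` parameter are hypothetical; element `i` of the result is the forecast for period `i + 1`, and the last element is the forecast for the next, not yet observed, period.

```python
from typing import List, Optional, Sequence


def single_exponential_smoothing(y: Sequence[float], alpha: float,
                                 initial: Optional[float] = None) -> List[float]:
    """Return forecasts A_1, ..., A_{n+1} using (S1):
        A_t = alpha * Y_{t-1} + (1 - alpha) * A_{t-1}.
    A_1 defaults to the first actual value Y_1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not y:
        raise ValueError("y must be non-empty")
    a = y[0] if initial is None else initial
    forecasts = [a]
    for obs in y:
        # Error-correction form: new forecast = old forecast + alpha * (error).
        a = a + alpha * (obs - a)
        forecasts.append(a)
    return forecasts


print(single_exponential_smoothing([10, 20], alpha=0.5))
```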


Single Exponential Smoothing

Looking at the formula, the new forecast is really the old forecast plus α times the error in the old forecast.

To get started, we need a smoothing constant α, an initial forecast, and an actual value. We can use the first actual value as the initial forecast, or we can average the first n observations.

The smoothing constant α serves as the weighting factor. When α is close to 1, the new forecast will include a substantial adjustment for any error that occurred in the preceding forecast. When α is close to 0, the new forecast is very similar to the old forecast.


Single Exponential Smoothing (cont.)

The smoothing constant α is not an arbitrary choice, but it generally falls between 0.1 and 0.5. If we want predictions to be stable and random variation smoothed, use a small α. If we want a rapid response, a larger α is required.


Why Exponential?

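$$A_t = \alpha Y_{t-1} + (1-\alpha) A_{t-1}$$

$$A_{t-1} = \alpha Y_{t-2} + (1-\alpha) A_{t-2}$$

$$A_{t-2} = \alpha Y_{t-3} + (1-\alpha) A_{t-3}$$

Substituting each equation into the one above it gives

$$A_t = \alpha Y_{t-1} + \alpha(1-\alpha) Y_{t-2} + \alpha(1-\alpha)^2 Y_{t-3} + \dots + \alpha(1-\alpha)^k Y_{t-k-1} + \dots$$

The weight $\alpha(1-\alpha)^k$ given to an observation that is $k+1$ periods old decreases exponentially as $k$ increases.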
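The substitution can be checked numerically. This Python sketch, with hypothetical function names and made-up data, computes the same forecast recursively and as the explicit weighted sum; for a finite series the unrolled form also keeps the term $(1-\alpha)^n$ on the starting forecast.

```python
from typing import Sequence


def ses_recursive(y: Sequence[float], alpha: float, a0: float) -> float:
    """Apply A_t = alpha*Y_{t-1} + (1-alpha)*A_{t-1} to every value in y."""
    a = a0
    for obs in y:
        a = alpha * obs + (1 - alpha) * a
    return a


def ses_unrolled(y: Sequence[float], alpha: float, a0: float) -> float:
    """Same forecast as a weighted sum: the observation k+1 periods old gets
    weight alpha*(1-alpha)**k, and the starting forecast a0 keeps (1-alpha)**n."""
    n = len(y)
    weighted = sum(alpha * (1 - alpha) ** k * y[n - 1 - k] for k in range(n))
    return weighted + (1 - alpha) ** n * a0


data = [3.0, 5.0, 4.0, 6.0]
print(ses_recursive(data, 0.3, 3.0), ses_unrolled(data, 0.3, 3.0))
print([round(0.3 * 0.7 ** k, 4) for k in range(4)])  # weights shrink geometrically
```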


[Figure: Sales data, single exponential smoothing with α = 0.1; actual, smoothed and forecast sales plotted against time. Accuracy measures: MAPE = 37.0, MAD = 134.9, MSD = 27735.5.]

The small α here smooths the data.


[Figure: Sales data, single exponential smoothing with α = 0.6; actual, smoothed and forecast sales plotted against time. Accuracy measures: MAPE = 36.5, MAD = 134.5, MSD = 22248.4.]

The large α in this example responds quickly to the data.


Tracking

Use a tracking signal (a measure of errors over time) and set limits. For example, if we forecast n periods, count the number of negative and positive errors. If the number of positive errors is substantially less or greater than n/2, then the process is out of control.

We can also use a 95% prediction interval, $\pm 1.96\sqrt{\text{MSE}}$. If a forecast error falls outside this interval, use a new optimal α.

Looking back at the α = 0.1 single exponential smoothing: $1.96\sqrt{24261} \approx \pm 305$. Observation #21 is out of control. We need to re-evaluate the α level because this technique is biased.
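Both checks are easy to automate. A minimal Python sketch, assuming the forecast errors are already available; the error values are made up, the function names are hypothetical, and judging whether a count is "substantially" different from n/2 is left to the analyst, as in the text.

```python
import math
from typing import List, Sequence, Tuple


def sign_counts(errors: Sequence[float]) -> Tuple[int, int]:
    """Number of positive and negative forecast errors (zeros are ignored)."""
    positive = sum(1 for e in errors if e > 0)
    negative = sum(1 for e in errors if e < 0)
    return positive, negative


def outside_interval(errors: Sequence[float], mse: float,
                     z: float = 1.96) -> List[int]:
    """1-based positions of errors outside +/- z * sqrt(MSE)."""
    limit = z * math.sqrt(mse)
    return [i for i, e in enumerate(errors, start=1) if abs(e) > limit]


# Limit used in the text: 1.96 * sqrt(24261) is about 305.
print(round(1.96 * math.sqrt(24261)))
errors = [100.0, -200.0, 350.0, 10.0]   # made-up errors for illustration
print(sign_counts(errors), outside_interval(errors, 24261))
```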


Exponential Smoothing Adjusted for Trend: Holt’s method

In some situations, the observed data are trending and contain information that allows the anticipation of future upward movement.

In that case, a linear trend forecast function is needed. Holt’s smoothing method allows for an evolving local linear trend in a time series and can be used to forecast.

When there is a trend, an estimate of the current slope and the current level is required.


Holt’s Method

Holt’s method uses two coefficients:
- α is the smoothing constant for the level;
- β is the trend smoothing constant, used to remove random error.

Advantage of Holt’s method: it provides flexibility in selecting the rates at which the level and trend are tracked.


Equations in Holt’s method

The exponentially smoothed series, or the current level estimate:

$$A_t = \alpha Y_t + (1-\alpha)(A_{t-1} + T_{t-1}) \qquad \text{(S2)}$$

The trend estimate:

$$T_t = \beta (A_t - A_{t-1}) + (1-\beta) T_{t-1} \qquad \text{(S3)}$$

Forecast p periods into the future:

$$Y'_{t+p} = A_t + p T_t$$

where $A_t$ = new smoothed value (estimate of the current level), $Y_t$ = new actual value at time t, $T_t$ = trend estimate, $Y'_{t+p}$ = forecast for p periods into the future, α = smoothing constant for the level, and β = smoothing constant for the trend estimate.
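A minimal Python sketch of Holt's method, implementing (S2), (S3) and the p-step forecast, and starting from the simple initialization $A_1 = Y_1$, $T_1 = 0$ described in this section. The function name and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def holt(y: Sequence[float], alpha: float, beta: float,
         p: int = 1) -> Tuple[List[float], List[float], float]:
    """Holt's linear method using (S2) and (S3).

    Starts from A_1 = Y_1 and T_1 = 0. Returns the level estimates, the trend
    estimates, and the forecast Y'_{n+p} = A_n + p * T_n.
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = float(y[0]), 0.0
    levels, trends = [level], [trend]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)   # (S2)
        trend = beta * (level - prev_level) + (1 - beta) * trend   # (S3)
        levels.append(level)
        trends.append(trend)
    return levels, trends, level + p * trend


levels, trends, forecast = holt([10, 12, 13, 15, 16], alpha=0.5, beta=0.3, p=2)
print(forecast)
```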


How to initialize Holt’s method

To get started, initial values for A and T in equations (S2) and (S3) must be determined.

One approach is to set A1 to Y1 and T1 to zero.

The second approach is to use the average of the first five or six observations as A1. T1 is then estimated by the slope of a line that is fit to these five or six observations.
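The second approach can be sketched in Python as follows, assuming k = 6 by default (the text allows five or six observations). The function name is hypothetical; the slope uses the centered form of the least-squares formula, which is algebraically equivalent to the expressions in the Appendix.

```python
from typing import Sequence, Tuple


def holt_initial_values(y: Sequence[float], k: int = 6) -> Tuple[float, float]:
    """Second initialization approach for Holt's method:
    A_1 = mean of the first k observations,
    T_1 = slope of the least-squares line fitted to them against t = 1..k."""
    if not 2 <= k <= len(y):
        raise ValueError("k must be between 2 and len(y)")
    first = y[:k]
    ts = range(1, k + 1)
    t_mean = sum(ts) / k
    y_mean = sum(first) / k
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(ts, first))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    return y_mean, sxy / sxx


print(holt_initial_values([2, 4, 6, 8, 10, 12, 100], k=6))
```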


Holt’s method

Holt exponential smoothing with parameters α = 1.0 and β = 0.099 for a time series of electricity consumption.


Winters’ Method

Winters’ method is an easy way to account for seasonality when data have a seasonal pattern.

It extends Holt’s method to include an estimate of seasonality, using three smoothing constants:
- α is the smoothing constant for the level;
- β is the trend smoothing constant, used to remove random error;
- γ is the smoothing constant for seasonality.

The level equation removes seasonal effects by dividing each observation by a seasonal index. The forecast is then modified by multiplying by a seasonal index.


Winters’ Method

The four equations used in Winters’ (multiplicative) smoothing are:

The smoothed series or level estimate:

$$A_t = \alpha \frac{Y_t}{S_{t-s}} + (1-\alpha)(A_{t-1} + T_{t-1})$$

The trend estimate:

$$T_t = \beta (A_t - A_{t-1}) + (1-\beta) T_{t-1}$$

The seasonality estimate:

$$S_t = \gamma \frac{Y_t}{A_t} + (1-\gamma) S_{t-s}$$

Forecast p periods into the future:

$$Y'_{t+p} = (A_t + p T_t) S_{t-s+p}$$

where $A_t$ = new smoothed value (estimate of the current level), $Y_t$ = new actual value at time t, $T_t$ = trend estimate, $S_t$ = seasonality estimate, $Y'_{t+p}$ = forecast for p periods into the future, α = smoothing constant for the level, β = smoothing constant for the trend estimate, γ = smoothing constant for the seasonality estimate, p = number of periods to be forecast into the future, and s = length of seasonality.

Winters’ method is also called triple exponential smoothing.


How to initialize Winters’ method

To begin Winters’ method, the initial values for the smoothed series $A_t$, the trend $T_t$ and the seasonal indices $S_t$ must be set.

One approach is to set the first estimate of $A_t$ to $Y_1$. The initial trend is set to 0 and the seasonal indices are each set to 1.0.
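Putting the four equations and this simple initialization together gives the following Python sketch of the multiplicative Winters' method. It is an illustration under stated assumptions (positive data, known season length `s`, the seasonal index for period t stored at position t mod s) and is not meant to reproduce any particular software's output; the function name and the quarterly data are hypothetical.

```python
from typing import List, Sequence, Tuple


def winters_multiplicative(y: Sequence[float], s: int, alpha: float,
                           beta: float, gamma: float,
                           p: int = 1) -> Tuple[float, float, float, List[float]]:
    """Winters' multiplicative method with the simple start
    A_1 = Y_1, T_1 = 0 and every seasonal index equal to 1.0.

    Returns (forecast for p periods ahead, final level, final trend,
    latest seasonal index for each of the s positions in the season).
    """
    if s < 1 or len(y) < 2 or p < 1:
        raise ValueError("need s >= 1, p >= 1 and at least two observations")
    level, trend = float(y[0]), 0.0
    seasonal = [1.0] * s          # seasonal[t % s] holds S_{t-s} before time t is processed
    for t in range(1, len(y)):
        obs = y[t]
        k = t % s
        prev_level = level
        level = alpha * obs / seasonal[k] + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[k] = gamma * obs / level + (1 - gamma) * seasonal[k]
    last = len(y) - 1
    # Y'_{t+p} = (A_t + p T_t) * S_{t-s+p}: the latest index for that season position.
    forecast = (level + p * trend) * seasonal[(last + p) % s]
    return forecast, level, trend, seasonal


quarterly = [100, 150, 80, 120, 110, 165, 88, 132]   # made-up quarterly data
print(winters_multiplicative(quarterly, s=4, alpha=0.4, beta=0.1, gamma=0.3)[0])
```

Because every seasonal index starts at 1.0, the early forecasts ignore seasonality until the indices have been updated over at least one full season.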


[Figure: Winters’ method applied to the sales data; actual, smoothed and forecast sales plotted against time. Smoothing constants (as labelled in the software output): Alpha (level) = 0.4, Gamma (trend) = 0.1, Delta (season) = 0.3. Accuracy measures: MAPE = 15.21, MAD = 63.55, MSD = 7636.86.]


Decomposition

Decomposition is a procedure to identify the component factors of a time series.

How the components relate to the original series: a model expresses the time series variable Y in terms of the components T (trend), C (cycle), S (seasonal) and I (irregular).

There are two such models: the additive components model and the multiplicative components model.

It is difficult to deal with the cyclical component of a time series. To keep things simple, we assume that any cycle in the data is part of the trend.

Additive model:

$$Y_t = T_t + S_t + I_t$$

Multiplicative model:

$$Y_t = T_t \times S_t \times I_t$$


Additive and multiplicative models

The additive model works best when the time series has roughly the same variability through the length of the series; that is, all the values of the series fall within a band of constant width centered on the trend.

The multiplicative model works best when the variability of the time series increases with the level; that is, the spread of the values around the trend becomes larger as the trend increases. See the figure on the next slide.

Most economic time series have seasonal variation that increases with the level of the series, so the multiplicative model is suitable for them.


(a) A time series with constant variability

(b) A time series with variability increasing with level


Trend equations

Trend can be described by a straight line or a smooth curve.

Linear trend:

$$T'_t = a + bt$$

Here $T'_t$ is the predicted value for the trend at time t. The symbol t used for the variable represents time and takes integer values 1, 2, 3, … The slope b is the average increase or decrease in T for each one-period increase in time.

Time trend equations can be fit to the data using the method of least squares. Recall that this method selects the values of the coefficients in the trend equation (e.g. a and b) so that the estimated trend values $T'_t$ are close to the actual values $Y_t$ as measured by the sum of squared errors criterion

$$\text{SSE} = \sum_t (Y_t - T'_t)^2$$

(See the Appendix for how to find a and b.)
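As a sketch, the least-squares trend line can be computed in Python with the closed-form expressions (A.10) and (A.11) from the Appendix, taking $X_i = t$. The function name and data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_trend(y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of T'_t = a + b t with t = 1, 2, ..., n,
    using the closed-form expressions for b and a (X_i = t)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = range(1, n + 1)
    sum_t = sum(ts)
    sum_y = sum(y)
    sum_ty = sum(t * v for t, v in zip(ts, y))
    sum_tt = sum(t * t for t in ts)
    b = (n * sum_ty - sum_t * sum_y) / (n * sum_tt - sum_t ** 2)
    a = (sum_y - b * sum_t) / n
    return a, b


a, b = fit_linear_trend([3, 5, 7, 9])
print(a, b, a + b * 5)   # trend estimate for the next period, t = 5
```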


Trend line for the Car Registrations Time Series


Additional trend curves

The life cycle of a new product has three stages: introduction, growth, and maturity and saturation.

A curve is needed to model the trend over the life cycle of a new product.

A simple function that allows for curvature is the quadratic trend

$$T'_t = b_0 + b_1 t + b_2 t^2$$

When a time series starts slowly and then appears to be increasing at an increasing rate, an exponential trend can be used:

$$T'_t = b_0 b_1^t$$

The coefficient $b_1$ is related to the growth rate: each one-period increase in t multiplies the trend by $b_1$, so $b_1 - 1$ is the growth rate per period.


The increase in the number of salespeople is not constant. It appears as if increasingly larger numbers of people are being added in the later years.

An exponential trend curve fitted to the salespeople data has the equation:

$$T'_t = 10.016\,(1.313)^t$$
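One common way to fit an exponential trend, sketched below in Python, is to take logarithms and fit a straight line by least squares, since $\ln T'_t = \ln b_0 + t \ln b_1$. This is an assumption, not necessarily how the salespeople equation was obtained: it minimizes squared errors of ln(y) rather than of y, so its coefficients can differ from a direct nonlinear fit. The data in the usage lines are synthetic, generated from the equation above, not the actual salespeople figures.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_trend(y: Sequence[float]) -> Tuple[float, float]:
    """Fit T'_t = b0 * b1**t (t = 1..n) by least squares on ln(y):
    ln T'_t = ln b0 + t ln b1. Requires every y > 0."""
    n = len(y)
    if n < 2 or any(v <= 0 for v in y):
        raise ValueError("need at least two positive observations")
    ts = range(1, n + 1)
    logs = [math.log(v) for v in y]
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    slope = (sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
             / sum((t - t_mean) ** 2 for t in ts))
    intercept = l_mean - slope * t_mean
    return math.exp(intercept), math.exp(slope)


# Synthetic check: data generated exactly from the salespeople equation.
synthetic = [10.016 * 1.313 ** t for t in range(1, 11)]
b0, b1 = fit_exponential_trend(synthetic)
print(round(b0, 3), round(b1, 3))
```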


Seasonality

There are several methods for measuring seasonal variation.

The basic idea: first estimate and remove the trend from the original series, and then smooth out the irregular component. This leaves data containing only seasonal variation.

The seasonal values are collected and summarized to produce a number for each observed interval of the year (week, month, quarter, and so on).


Identification of seasonal component

The identification of the seasonal component in a time series differs from trend analysis in two ways:
- The trend is determined directly from the original data, but the seasonal component is determined indirectly, after eliminating the other components from the data.
- The trend is represented by one best-fitting curve, but a separate seasonal value has to be computed for each observed interval.

If an additive decomposition is employed, estimates of the individual components are added together to produce the original series.

If a multiplicative decomposition is employed, estimates of the individual components must be multiplied together to produce the original series.


Seasonal indices

The seasonal indices measure the seasonal variation in the series.

Seasonal indices are percentages that show changes over time.

For example, with monthly data, a seasonal index of 1.0 for a particular month means the expected value for that month is 1/12 of the total for the year.

An index of 1.25 for a different month implies the observation for that month is expected to be 25% more than 1/12 of the annual total.

A monthly index of 0.80 indicates that the expected level of that month is 20% less than 1/12 the total for the year.


Seasonal adjustment

After the seasonal component has been isolated, it can be used to calculate seasonally adjusted data.

Seasonal adjustment techniques are ad hoc methods of computing seasonal indices and using those indices to deseasonalize the series by removing the seasonal variation.

For a multiplicative decomposition, the seasonally adjusted data are computed by dividing the original data by the seasonal component (i.e., the seasonal index):

deseasonalized data = raw data / seasonal index


Seasonal adjustment technique

Seasonal adjustment techniques are based on the idea that a time series $y_t$ can be represented as the product of four components:

$$y_t = T \times S \times C \times I$$

The objective is to eliminate the seasonal component S. First, we try to isolate the combined trend and cyclical components $T \times C$. This cannot be done exactly; instead, an ad hoc smoothing procedure is used to estimate $T \times C$ so that it can be removed from the original time series.

For example, suppose that $y_t$ consists of monthly data. Then a 12-month average $y^m_t$ is computed:

$$y^m_t = \frac{y_{t+6} + \dots + y_t + y_{t-1} + \dots + y_{t-5}}{12}$$

Presumably $y^m_t$ is relatively free of seasonal and irregular fluctuations and is thus an estimate of $T \times C$.

Now, we divide the original data by this estimate of $T \times C$ to obtain an estimate of the combined seasonal and irregular components $S \times I$.


Seasonal adjustment technique (cont.)

$$S \times I = \frac{y_t}{y^m_t} = z_t$$

The next step is to eliminate the irregular component I in order to obtain the seasonal index. To do this, we average the values of $S \times I$ corresponding to the same month.

In other words, suppose that $y_1$ (and hence $z_1$) corresponds to January, $y_2$ to February, etc., and there are 48 months of data. We thus compute

$$z^m_1 = (z_1 + z_{13} + z_{25} + z_{37})/4$$

$$z^m_2 = (z_2 + z_{14} + z_{26} + z_{38})/4$$

$$\vdots$$

$$z^m_{12} = (z_{12} + z_{24} + z_{36} + z_{48})/4$$


Seasonal adjustment technique (cont.)

The rationale here is that when the seasonal-irregular percentages $z_t$ are averaged for each month (each quarter if the data are quarterly), the irregular fluctuations will be largely smoothed out.

The 12 averages $z^m_1, \dots, z^m_{12}$ will then be estimates of the seasonal indices. They should sum to approximately 12.

The deseasonalization of the original series $y_t$ is now straightforward: just divide each value in the series by its corresponding seasonal index.

Thus, the seasonally adjusted series $y^a_t$ is obtained from

$$y^a_1 = y_1 / z^m_1, \quad y^a_2 = y_2 / z^m_2, \quad \dots, \quad y^a_{12} = y_{12} / z^m_{12}, \quad \text{etc.}$$
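The whole procedure can be sketched in Python. This is an illustration under assumptions: it follows the 12-term average given above, generalized to any even season length `s` (some texts use a centered 2×12 average instead); it averages whichever $z_t$ values can actually be computed, since near the ends of the series the 12-month average is unavailable and fewer than four values may enter a month's average; and it does not rescale the indices to sum exactly to s. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def seasonal_indices(y: Sequence[float], s: int = 12) -> List[float]:
    """Ratio-to-moving-average seasonal indices.

    y^m_t averages y_{t-(s/2-1)}, ..., y_{t+s/2} (for s = 12: y_{t-5}..y_{t+6}),
    z_t = y_t / y^m_t, and each index is the mean of the z_t values that fall
    in the same season position (t mod s, with t counted from 0).
    """
    if s < 2 or s % 2:
        raise ValueError("this sketch assumes an even season length")
    back, ahead = s // 2 - 1, s // 2
    sums, counts = [0.0] * s, [0] * s
    for t in range(back, len(y) - ahead):
        ym = sum(y[t - back:t + ahead + 1]) / s   # estimate of T x C
        z = y[t] / ym                             # estimate of S x I
        sums[t % s] += z
        counts[t % s] += 1
    if 0 in counts:
        raise ValueError("not enough data to estimate every seasonal index")
    return [total / c for total, c in zip(sums, counts)]


def deseasonalize(y: Sequence[float], indices: Sequence[float]) -> List[float]:
    """Seasonally adjusted series: each value divided by its seasonal index."""
    s = len(indices)
    return [v / indices[t % s] for t, v in enumerate(y)]


# Quarterly toy data: constant level 100 times fixed seasonal factors.
factors = [0.8, 1.2, 0.9, 1.1]
toy = [100 * factors[t % 4] for t in range(16)]
idx = seasonal_indices(toy, s=4)
print([round(i, 3) for i in idx], [round(v, 1) for v in deseasonalize(toy, idx)])
```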


Appendix: Least-square parameter estimates

Our goal is to minimize $\sum_i (Y_i - Y'_i)^2$, where $Y'_i = a + bX_i$ is the fitted value of Y corresponding to a particular observation $X_i$.

We minimize the expression by taking the partial derivatives with respect to a and to b, setting each equal to 0, and solving the resulting pair of simultaneous equations:

$$\frac{\partial}{\partial a} \sum_i (Y_i - a - bX_i)^2 = -2 \sum_i (Y_i - a - bX_i) \qquad \text{(A.1)}$$

$$\frac{\partial}{\partial b} \sum_i (Y_i - a - bX_i)^2 = -2 \sum_i X_i (Y_i - a - bX_i) \qquad \text{(A.2)}$$


Least-square parameter estimates

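Equating these derivatives to zero and dividing by -2, we get

$$\sum_i (Y_i - a - bX_i) = 0 \qquad \text{(A.3)}$$

$$\sum_i X_i (Y_i - a - bX_i) = 0 \qquad \text{(A.4)}$$

Finally, by rewriting Eqs. (A.3) and (A.4), we obtain the pair of simultaneous equations:

$$\sum_i Y_i = aN + b \sum_i X_i \qquad \text{(A.5)}$$

$$\sum_i X_i Y_i = a \sum_i X_i + b \sum_i X_i^2 \qquad \text{(A.6)}$$

Now we can solve for a and b simultaneously by multiplying Eq. (A.5) by $\sum_i X_i$ and Eq. (A.6) by N:

$$\sum_i X_i \sum_i Y_i = aN \sum_i X_i + b \left(\sum_i X_i\right)^2 \qquad \text{(A.7)}$$

$$N \sum_i X_i Y_i = aN \sum_i X_i + bN \sum_i X_i^2 \qquad \text{(A.8)}$$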


Least-square parameter estimates (cont.)

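Subtracting Eq. (A.7) from Eq. (A.8), we get

$$N \sum_i X_i Y_i - \sum_i X_i \sum_i Y_i = b \left[ N \sum_i X_i^2 - \left(\sum_i X_i\right)^2 \right] \qquad \text{(A.9)}$$

from which it follows that

$$b = \frac{N \sum_i X_i Y_i - \sum_i X_i \sum_i Y_i}{N \sum_i X_i^2 - \left(\sum_i X_i\right)^2} \qquad \text{(A.10)}$$

Given b, we may calculate a from Eq. (A.5):

$$a = \frac{\sum_i Y_i - b \sum_i X_i}{N} \qquad \text{(A.11)}$$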