Chapter 24
Forecasting Models
to accompany
Operations Research: Applications and Algorithms
4th edition
by Wayne L. Winston
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.
Description
We discuss two important types of forecasting methods: extrapolation methods and causal forecasting methods.
Extrapolation methods (unlike the causal forecasting methods) don’t take into account what “caused” past data; they simply assume that past trends and patterns will continue in the future.
Causal forecasting methods attempt to forecast future values of a variable (called the dependent variable) by using past data to estimate the relationship between the dependent variable and one or more independent variables.
24.1 Moving-Average Forecasting Methods
Let x1, x2,…,xt,… be observed values of a time series, where xt is the value of the time series observed during period t.
One of the most commonly used forecasting methods is the moving-average method.
We define ft,1 to be the forecast for period t+1 made after observing xt.
For the moving-average method, ft,1 = average of the last N observations = average of xt, xt-1, xt-2, …, xt-N+1
where N is a given parameter.
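The moving-average rule is easy to sketch in code. The Python fragment below is a minimal illustration; the sales figures and the choice N = 3 are hypothetical, not from the book.

```python
def moving_average_forecast(x, N):
    """One-step-ahead moving-average forecasts: the forecast made after
    observing x[t] is the average of the last N observations."""
    return [sum(x[t - N + 1:t + 1]) / N for t in range(N - 1, len(x))]

# hypothetical monthly sales; the first forecast is made after period N
sales = [20, 24, 22, 26, 25, 27]
print(moving_average_forecast(sales, N=3))
```

Here the first forecast, made after period 3, is (20 + 24 + 22)/3 = 22.0.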
Choice of N
We will use the mean absolute deviation (MAD) as our measure of forecast accuracy.
Before defining the MAD, we need to define the concept of a forecast error.
Given a forecast for xt, we define et, the error in our forecast for xt, by
et = xt - (forecast for xt)
The MAD is simply the average of the absolute values of all the et’s.
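As a small illustration (the actuals and forecasts are hypothetical numbers), the MAD can be computed directly from its definition:

```python
def mad(actuals, forecasts):
    """Mean absolute deviation: the average of |e_t|,
    where e_t = x_t - (forecast for x_t)."""
    errors = [x - f for x, f in zip(actuals, forecasts)]
    return sum(abs(e) for e in errors) / len(errors)

# hypothetical actuals vs. one-step-ahead forecasts: errors are 2, 1, 1
print(mad([26, 25, 27], [24.0, 24.0, 26.0]))  # 4/3, about 1.33
```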
We begin with an explanation of the Excel OFFSET function.
This function lets you pick out a cell range relative to a given location in the spreadsheet.
The syntax of the OFFSET function is as follows: OFFSET(reference, rows, columns, height, width)
Reference is the cell from which you base the row and column references.
Rows helps locate the upper left-hand corner of the OFFSET range; it is measured as the number of rows up or down from the reference cell.
Columns helps locate the upper left-hand corner of the OFFSET range; it is measured as the number of columns left or right of the reference cell.
Height is the number of rows in the selected range.
Width is the number of columns in the selected range.
File Offsetexample.xls contains some examples of how the OFFSET function works.
The nice thing about the OFFSET function is that it can be copied like any formula.
Moving-average forecasts perform well for a time series that fluctuates about a constant base level.
Air conditioner sales are a good example to examine: they exhibit seasonality, with the peaks and valleys of the series repeating at regular 12-month intervals.
24.2 Simple Exponential Smoothing
If a time series fluctuates about a base level, simple exponential smoothing may be used to obtain good forecasts for future values of the series.
To describe simple exponential smoothing let At = smoothed average of a time series after observing xt.
After observing xt, At is the forecast for the value of the time series during any future period.
The key equation in simple exponential smoothing is At = αxt + (1- α)At-1.
In the above equation, α is a smoothing constant that satisfies 0< α<1.
As with moving-average forecasts, we let ft,k be the forecast for xt+k made at the end of period t. Then At=ft,k
Assuming that we are trying to forecast one period ahead, our error for predicting xt is given by et=xt-ft-1,1 = xt-At-1.
Thus, our new forecast At=ft, 1 is equal to our old forecast (At-1) plus a fraction of our period t error (et).
This implies that if we “overpredict” xt, we lower our forecast, and if we “underpredict” xt, we raise our forecast.
For larger values of the smoothing constant α, more weight is given to the most recent observation.
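The smoothing recursion At = αxt + (1 - α)At-1 can be sketched in a few lines of Python; the series, α = 0.5, and the initial smoothed value A0 = 20 below are hypothetical.

```python
def exp_smooth(x, alpha, A0):
    """Simple exponential smoothing: At = alpha*xt + (1 - alpha)*At-1.
    After observing xt, At is the forecast for every future period."""
    A, out = A0, []
    for xt in x:
        A = alpha * xt + (1 - alpha) * A
        out.append(A)
    return out

# hypothetical series with alpha = 0.5 and initial smoothed value A0 = 20
print(exp_smooth([24, 22, 26], alpha=0.5, A0=20))  # [22.0, 22.0, 24.0]
```

With α = 0.5, each new observation and the old smoothed average get equal weight; a larger α would track the latest observation more closely.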
24.3 Holt’s Method: Exponential Smoothing with Trend
If we believe that a time series exhibits a linear trend, Holt’s method often yields good forecasts.
At the end of the tth period, Holt’s method yields an estimate of the base level (Lt) and the per-period trend (Tt) of the series.
To compute Lt, we take a weighted average of the following two quantities:
xt, which is an estimate of the period t base level from the current period
Lt-1 + Tt-1, which is an estimate of the period t base level based on previous data
To compute Tt, we take a weighted average of the following two quantities:
An estimate of trend from the current period, given by the increase in the smoothed base level from period t-1 to period t
Tt-1, our previous estimate of the trend
As before, we define ft,k to be the forecast for xt+k made at the end of period t. Then ft,k = Lt + kTt
To initialize Holt’s method, we need an initial estimate of the base and an initial estimate (call it T0) of the trend.
A multiplicative version of Holt’s method can be used to generate good forecasts for a series of the form xt=abtεt
Here, the value of b represents the percentage growth in the base level of the series during each period.
A Spreadsheet Implementation of the Holt Method
The file Holt.xls contains an implementation of the Holt method.
We can use an Excel one-way data table to determine values of α and β that yield a small MAD.
24.4 Winter’s Method: Exponential Smoothing with Seasonality
The appropriately named Winter’s method is used to forecast time series for which trend and seasonality are present.
To describe Winter's method, we require two definitions.
Let c = the number of periods in the length of the seasonal pattern.
Let st be an estimate of a seasonal multiplicative factor for month t obtained after observing xt.
In what follows Lt and Tt have the same meaning as they did in Holt’s method.
Each period Lt, Tt, and st are updated by using the following equations.
Again, α, β, and γ are smoothing constants, each of which is between 0 and 1.
Lt = α(xt/st-c) + (1 - α)(Lt-1 + Tt-1)
Tt = β(Lt - Lt-1) + (1 - β)Tt-1
st = γ(xt/Lt) + (1 - γ)st-c
The first equation above updates the estimate of the series base by taking a weighted average of the following two quantities:
1. Lt-1 + Tt-1, which is our base-level estimate before observing xt
2. The deseasonalized observation xt/st-c, which is an estimate of the base obtained from the current period
The second equation above is identical to the Tt equation used to update trend in the Holt method.
The third equation above updates the estimate of month t's seasonality by taking a weighted average of the following two quantities:
1. Our most recent estimate of month t's seasonality (st-c)
2. xt/Lt, which is an estimate of month t's seasonality obtained from the current month
Initialization of Winter’s Method
To obtain good forecasts with Winter’s method, we must obtain good initial estimates of base, trend, and all seasonal factors.
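One period of the Winters updates can be sketched as follows; the state values (base 100, trend 2, last season's factor 1.25) and the smoothing constants are hypothetical.

```python
def winters_update(x_t, L_prev, T_prev, s_old, alpha, beta, gamma):
    """One period of Winters's method; s_old is st-c, the seasonal
    factor estimated one full season (c periods) ago.
      Lt = alpha*(xt/st-c) + (1 - alpha)*(Lt-1 + Tt-1)
      Tt = beta*(Lt - Lt-1) + (1 - beta)*Tt-1
      st = gamma*(xt/Lt) + (1 - gamma)*st-c"""
    L = alpha * (x_t / s_old) + (1 - alpha) * (L_prev + T_prev)
    T = beta * (L - L_prev) + (1 - beta) * T_prev
    s = gamma * (x_t / L) + (1 - gamma) * s_old
    return L, T, s

# hypothetical state: base 100, trend 2, seasonal factor 1.25 from last season
L, T, s = winters_update(130, 100, 2, 1.25, alpha=0.5, beta=0.4, gamma=0.5)
print(L, T, s)
```

A forecast made at the end of period t then multiplies the extrapolated base, Lt + kTt, by the appropriate seasonal factor.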
Forecasting Accuracy
For any forecasting model in which forecast errors are normally distributed, we may use the MAD to estimate se = standard deviation of our forecast errors.
The relationship between MAD and se is given in this formula: se = 1.25MAD
24.5 Ad Hoc Forecasting
Suppose we want to determine how many tellers a bank must have working each day to provide adequate service.
Can we develop a simple forecasting model to help the bank predict the number of customers who will enter each day?
Let xt = number of customers entering the bank on day t.
We postulate that xt = B × DWt × Mt × εt, where
B = base level of customer traffic corresponding to an average day
DWt = day-of-the-week factor corresponding to the day of the week on which day t falls
Mt = month factor corresponding to the month during which day t falls
εt = random error term whose average value equals 1
To begin, we estimate B = average number of arrivals per day the bank is open.
We illustrate the estimation for Monday:
B·DWMonday = average number of arrivals on Mondays the bank is open
Similarly, we find the factors for the rest of the days that the bank is open.
To estimate Mt (say, for May), we write
B·MMay = average number of arrivals on May days for which the bank is open
In a similar fashion, we find the Mt for the remaining months.
Our simple model yielded a MAD of 79.1.
If this method were used to generate forecasts for the coming year, however, the MAD would probably exceed 79.1.
This is because we have fit our parameters to past data; there is no guarantee that future data will “know” that they should follow the same pattern as past data.
Suppose the bank manager observes that on the day after a holiday, bank traffic is much higher than the model predicts.
How can we use this information to obtain more accurate customer forecasts for days after holidays?
We find that the average value of Actual/Forecast for days after a holiday is 1.15.
Thus, for any day after a holiday, we obtain a new forecast simply by multiplying our previous forecast by 1.15.
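The factor estimation described above can be sketched in Python; the arrival counts below are hypothetical, and only three days of the week are shown.

```python
# hypothetical daily arrival counts, keyed by day of week
arrivals = {"Mon": [620, 640], "Tue": [480, 500], "Wed": [510, 530]}

all_days = [v for counts in arrivals.values() for v in counts]
B = sum(all_days) / len(all_days)  # base level: average arrivals per open day

# day-of-week factors: B * DW_d = average arrivals on day d, so DW_d = avg_d / B
DW = {d: (sum(c) / len(c)) / B for d, c in arrivals.items()}
print(round(B, 1), round(DW["Mon"], 3))
```

Monthly factors Mt are estimated the same way from monthly averages, and a forecast for day t is B·DWt·Mt (multiplied by 1.15 if the day follows a holiday).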
24.6 Simple Linear Regression
Often, we predict the value of one variable (called the dependent variable) from the value of another variable (the independent variable).
If the dependent variable and the independent variable are related in a linear fashion, simple linear regression can be used to estimate this relationship.
To illustrate simple linear regression, let’s recall the Giapetto problem.
To set up this problem, we need to determine the cost of producing a soldier and the cost of producing a train.
The values β̂0 and β̂1 minimizing F(β̂0, β̂1) = Σ(yi - β̂0 - β̂1xi)^2 are called the least squares estimates of β0 and β1. They are given by
β̂1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)^2 and β̂0 = ȳ - β̂1x̄
We call ŷ = β̂0 + β̂1x the least squares regression line.
Essentially, if the least squares line fits the points well, we will use β̂0 + β̂1xi as our prediction for yi.
Every least squares line has two properties:
It passes through the point (x̄, ȳ).
The least squares line "splits" the data points, in the sense that the sum of the vertical distances from points above the least squares line equals the sum of the vertical distances from points below it; equivalently, the errors sum to zero: Σei = 0.
How Good a Fit?
How do we determine how well the least squares line fits the data points?
To answer this question, we need to discuss three components of variation: sum of squares total (SST), sum of squares error (SSE), and sum of squares regression (SSR).
Sum of squares total is given by SST = Σ(yi - ȳ)^2. SST measures the total variation of yi about its mean.
Sum of squares error is given by SSE = Σ(yi - ŷi)^2 = Σei^2.
If the least squares line passes through all the data points, SSE = 0.
The sum of squares regression (SSR) measures the variation in y explained by the regression; it can be shown that SST = SSR + SSE.
SST is a function only of the values of y.
For a good fit, SSE will be small, so the equation above shows that SSR will be large for a good fit.
More formally, we may define the coefficient of determination (R2) for y by
R2 = SSR/SST = percentage of variation in y explained by x
A measure of linear association between x and y is the sample linear correlation rxy.
A sample correlation near +1 indicates a strong positive linear relationship between x and y; a sample correlation near -1 indicates a strong negative linear relationship between x and y; and a sample correlation near 0 indicates a weak linear relationship between x and y.
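The least squares formulas and the goodness-of-fit decomposition can be sketched as follows; the four (x, y) points are hypothetical, chosen to lie close to a straight line.

```python
def least_squares(xs, ys):
    """Least squares estimates for y = b0 + b1*x:
    b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2), b0 = ybar - b1*xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    return ybar - b1 * xbar, b1

# hypothetical data lying close to a straight line
xs, ys = [1, 2, 3, 4], [2.1, 3.9, 6.0, 8.0]
b0, b1 = least_squares(xs, ys)

# goodness of fit: SST = SSR + SSE, so R^2 = SSR/SST = 1 - SSE/SST
ybar = sum(ys) / len(ys)
sst = sum((y - ybar) ** 2 for y in ys)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r2 = 1 - sse / sst
print(b0, b1, r2)
```

For these points the fitted slope is 1.98 and the intercept 0.05, with R2 very close to 1 because the points are nearly collinear.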
Forecasting Accuracy
A measure of the accuracy of predictions derived from regression is given by the standard error of the estimate (se).
If we let n = number of observations, se is given by se = √(SSE/(n - 2))
Any observation for which y is not within 2se of ŷ is called an outlier.
T-Tests in Regression
Using a t-test, we can test the significance of a linear relationship.
We can compute the t-statistic, given by t = β̂1/StdErr(β̂1)
Assumptions Underlying the Simple Linear Regression Model
Assumption 1 The variance of the error term should not depend on the value of the independent variable x.
This assumption is called homoscedasticity.
If the variance of the error term depends on x, then we say that heteroscedasticity is present.
Assumption 2 Errors are normally distributed.
Assumption 3 The errors should be independent. This assumption is often violated when data are collected over time.
Independence of the errors implies that knowing the value of one error should tell us nothing about the value of the next (or any other) error.
Suppose a sequence of errors exhibits the following pattern: a positive error is usually followed by another positive error, and a negative error is usually followed by another negative error.
This pattern indicates that successive errors are not independent; it is referred to as positive autocorrelation.
In other words, positive autocorrelation indicates that successive errors have a positive linear relationship and are not independent.
If the sequence of errors plotted in time sequence resembles Figure 13 in the book, we have negative autocorrelation.
Here the successive errors have a negative linear relationship and are not independent.
When no obvious pattern is present, the independence assumption appears to be satisfied.
Observe that the errors “average out” to 0, so we would expect about half our errors to be positive and half to be negative.
Thus, if there is no pattern in the errors, we would expect the errors to change sign about half the time.
This observation enables us to formalize the preceding discussion as follows.
If the errors change sign very rarely, they probably violate the independence assumption, and positive autocorrelation is probably present.
If the errors change sign very often, they probably violate the independence assumption, and negative autocorrelation is probably present.
If the errors change sign about half the time, they probably satisfy the independence assumption.
If positive or negative autocorrelation is present, correcting for the autocorrelation will often result in much more accurate forecasts.
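The sign-change rule of thumb above can be sketched as a small check; the error sequence below is a hypothetical example of positive autocorrelation.

```python
def sign_change_fraction(errors):
    """Fraction of successive error pairs whose signs differ.
    Near 0.5 suggests independence; near 0 suggests positive
    autocorrelation; near 1 suggests negative autocorrelation."""
    changes = sum(1 for a, b in zip(errors, errors[1:]) if a * b < 0)
    return changes / (len(errors) - 1)

# a long run of positive errors followed by negatives: 1 sign change in 5 pairs
print(sign_change_fraction([1.2, 0.8, 0.5, -0.3, -0.9, -0.4]))  # 0.2
```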
Running Regressions with Excel
Cost.xls illustrates how to run a regression with Excel.
Important numbers in the output include: R Square This is r2.
Multiple R This is the square root of r2, with the sign of Multiple R being the same as the slope of the regression line.
Standard Error This is se
Observations This is the number of data points.
SS column The Regression entry is SSR. The Residual entry is SSE. The Total entry is SST.
Coefficients column The Intercept entry gives the estimated intercept, and the trains entry gives the estimated coefficient (slope) of the trains variable.
t stat This gives the observed t-statistic for the Intercept and the trains variable.
Standard Error column The Intercept entry gives the standard error of the intercept estimate, and the trains entry gives the standard error of the slope estimate. The coefficient entry divided by the standard error entry yields the t-statistic for the intercept or slope.
P-value For the intercept and slope, this gives Probability(|tn-2|≥|Observed t-statistic|).
Obtaining a Scatterplot with Excel
To obtain a scatterplot with Excel, let the range where your independent variable is be the X range.
24.7 Fitting Nonlinear Relationships
Often, a plot of points of the form (xi,yi) indicates that y is not a linear function of x.
The plot may indicate that there is a nonlinear relationship between x and y.
Using a Spreadsheet to Fit a Nonlinear Relationship
VCR.xls shows how we can use a spreadsheet to fit a curve to the data.
Utilizing the Excel Trend Curve
The Excel Trend Curve makes it easy to fit an equation to a set of data.
After creating an X-Y scatterplot, click on the points in the graph until they turn gold. Then select Chart > Add Trendline. Choosing Linear yields the straight line that best fits the points.
Choose Logarithmic if the scatterplot looks like e or f from Figure 17 in the book.
Choose Power if the scatterplot looks like a or b in Figure 17 of the book.
Choose Exponential if the scatterplot looks like c or d from Figure 17.
Choosing Polynomial of order n (n = 1, 2, 3, 4, 5, or 6) yields the best-fitting equation of the form
y = β0 + β1x + β2x^2 + … + βnx^n
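The polynomial trend-curve fit can be reproduced outside Excel. The sketch below assumes NumPy is available and uses hypothetical data generated from a known quadratic so the fit is checkable.

```python
import numpy as np

# hypothetical data generated from y = 1 + 2x + 3x^2, so the fit is checkable
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x ** 2

# order-2 polynomial fit, analogous to Excel's Polynomial trend curve
coeffs = np.polyfit(x, y, deg=2)  # coefficients from highest power down
print(coeffs)  # approximately [3. 2. 1.]
```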
24.8 Multiple Regression
In many situations, more than one independent variable may be useful in predicting the value of a dependent variable.
We then use multiple regression.
In multiple regression, we model the relationship between y and the k independent variables by
yi = β0 + β1x1i + β2x2i + … + βkxki + εi
where εi is an error term with mean 0, representing the fact that the actual value of yi may not equal
β0 + β1x1i + β2x2i + … + βkxki
Estimation of the βi’s
We call
ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk
the least squares regression equation.
Goodness of Fit Revisited
For multiple regression, we define SSR, SSE and SST as we did earlier.
We also find the R2 =SSR/SST = percentage of variation in y explained by the k independent variables and 1-R2 = percentage of variation in y not explained by the k independent variables.
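A multiple regression fit can be sketched with NumPy's least squares solver; the data below are hypothetical, generated from a known relationship so the recovered coefficients can be checked.

```python
import numpy as np

# hypothetical data generated from y = 4 + 2*x1 - x2 (no noise, so checkable)
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 2.0], [5.0, 4.0]])
y = 4 + 2 * X[:, 0] - X[:, 1]

# prepend a column of ones so the model includes an intercept b0
A = np.column_stack([np.ones(len(X)), X])
beta_hat, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)  # approximately [4. 2. -1.]
```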
We define the standard error of the estimate as
se = √(SSE/(n - k - 1))
Hypothesis Testing
If we have included independent variables x1, x2,…xk in a multiple regression, we often want to test H0:βi = 0 against Ha:βi ≠ 0.
To test these hypotheses, we compute
t = β̂i/StdErr(β̂i)
where StdErr(β̂i) measures the amount of uncertainty present in our estimate of βi
Choosing the Best Regression Equation
How can we choose between several regression equations having different sets of independent variables?
We usually want to choose the equation with the lowest value of se, since that will yield the most accurate forecasts. We also want the t-statistics for all variables in the equation to be significant.
These two objectives may conflict, in which case it is difficult to determine the “best” equation.
If the Cp statistic is used to choose among candidate equations, the regression chosen should have a Cp value close to k + 1.
Multicollinearity
If an estimated regression equation contains two or more independent variables that exhibit a strong linear relationship, we say that multicollinearity is present.
A strong linear relationship between some of the independent variables may make the computer’s estimates of the βi’s unreliable.
By the way, if an exact linear relationship exists between two or more independent variables, there are an infinite number of combinations of the β̂i's which will minimize the sum of the squared errors, and most computer packages will print an error message.
Dummy Variables
Often, a nonquantitative, or qualitative, independent variable may influence the dependent variable.
To model the effect of a categorical variable on a dependent variable, we define c-1 dummy variables as follows:
x1 = 1 if the observation takes on value 1 of the categorical variable; 0 otherwise
x2 = 1 if the observation takes on value 2 of the categorical variable; 0 otherwise
…
xc-1 = 1 if the observation takes on value c - 1 of the categorical variable; 0 otherwise
Interpretation of Coefficients of Dummy Variables
How do we interpret the coefficients of dummy variables?
To illustrate, let's examine how a day's payday status affects credit union traffic.
We capture the difference between paydays and nonpaydays by defining a payday variable x5 that equals 1 for paydays and 0 for nonpaydays.
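The c - 1 dummy encoding can be sketched in Python; the day labels below are hypothetical, and the last category serves as the baseline.

```python
def dummies(value, categories):
    """Encode a categorical value as c-1 dummy variables; the last
    category is the baseline and is encoded as all zeros."""
    return [1 if value == cat else 0 for cat in categories[:-1]]

days = ["Mon", "Tue", "Wed", "Thu", "Fri"]  # c = 5 categories
print(dummies("Tue", days))  # [0, 1, 0, 0]
print(dummies("Fri", days))  # [0, 0, 0, 0] -- the baseline day
```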
Multiplicative Models
Often, we believe that there is a relationship of the following form:
Y = β0 · x1^β1 · x2^β2 · … · xk^βk · ε
Thus, to estimate the βi's, we run a multiple regression with the dependent variable being ln Y and the independent variables being ln x1, ln x2, …, ln xk.
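The log-transform trick can be sketched for the one-variable case; the data below are hypothetical, generated from y = 3x^2 so the recovered parameters can be checked.

```python
import math

def loglog_fit(xs, ys):
    """Fit y = b0 * x^b1 (the one-variable multiplicative model) by
    regressing ln y on ln x; with k variables, regress ln Y on every ln xi."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    xbar, ybar = sum(lx) / n, sum(ly) / n
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(lx, ly))
          / sum((a - xbar) ** 2 for a in lx))
    b0 = math.exp(ybar - b1 * xbar)
    return b0, b1

# hypothetical data following y = 3 * x^2 exactly
b0, b1 = loglog_fit([1, 2, 3, 4], [3, 12, 27, 48])
print(b0, b1)  # approximately 3.0 2.0
```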
Heteroscedasticity and Autocorrelation in Multiple Regression
By plotting the errors in time-series sequence, we may check to see whether the errors from a multiple regression are independent.
If autocorrelation is present and the errors do not appear to be independent, then correcting for autocorrelation will usually yield better forecasts.
By plotting the errors (on the y-axis) against the predicted value of y, we can determine whether homoscedasticity or heteroscedasticity is present.
If homoscedasticity is present, the plot should show no obvious pattern, whereas if heteroscedasticity is present, the plot should show an obvious pattern indicating that the errors somehow depend on the predicted value of y.
If heteroscedasticity is present, the t-tests described in this section are invalid.
Implementing Multiple Regression on a Spreadsheet
Credit.xls includes the output for the regression
The regression output has the following interpretation: Intercept This is β̂0.
Standard Error This is se
R-Square This is R2. It means that together all the independent variables in the regression explain 84.6% of the variation in the number of customers arriving daily.
Observations This is the number of data points.
Total df This is the degrees of freedom used for the t-test of H0: βi = 0 against Ha: βi ≠ 0.
Coefficients For each independent variable, this column yields the coefficient of the independent variable in the least squares equation.
STANDARD ERROR For each independent variable, this row yields StdErr(β̂i).
t stat This gives the observed t-statistic for the Intercept and all independent variables.
Standard Error column The Intercept entry gives the standard error of the intercept estimate, and the coefficient entries give the standard error of each independent variable's coefficient.
P-value For the intercept and each independent variable in a regression with k independent variables, this gives Probability(|tn-k-1| ≥ |Observed t-statistic|).