Chapter 24
Forecasting Models
to accompany
Operations Research: Applications and Algorithms
4th edition
by Wayne L. Winston
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.
Description
We discuss two important types of forecasting methods: extrapolation methods and causal forecasting methods.
Extrapolation methods (unlike the causal forecasting methods) don’t take into account what “caused” past data; they simply assume that past trends and patterns will continue in the future.
Causal forecasting methods attempt to forecast future values of a variable (called the dependent variable) by using past data to estimate the relationship between the dependent variable and one or more independent variables.
24.1 Moving-Average Forecasting Methods
Let x1, x2,…,xt,… be observed values of a time series, where xt is the value of the time series observed during period t.
One of the most commonly used forecasting methods is the moving-average method.
We define ft,1 to be the forecast for period t+1 made after observing xt.
For the moving-average method, ft,1 = average of the last N observations = average of xt, xt-1, xt-2, …, xt-N+1
where N is a given parameter.
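The moving-average rule is easy to sketch in code. The Python fragment below is a minimal illustration; the sales figures and the choice N = 3 are hypothetical, not from the book.

```python
def moving_average_forecast(x, N):
    """One-step-ahead moving-average forecasts: the forecast made after
    observing x[t] is the average of the last N observations."""
    return [sum(x[t - N + 1:t + 1]) / N for t in range(N - 1, len(x))]

# hypothetical monthly sales; the first forecast is made after period N
sales = [20, 24, 22, 26, 25, 27]
print(moving_average_forecast(sales, N=3))
```

Here the first forecast, made after period 3, is (20 + 24 + 22)/3 = 22.0.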
Choice of N
We will use the mean absolute deviation (MAD) as our measure of forecast accuracy.
Before defining the MAD, we need to define the concept of a forecast error.
Given a forecast for xt, we define et, the error in our forecast for xt, by
et = xt - (forecast for xt)
The MAD is simply the average of the absolute values of all the et’s.
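As a small illustration (the actuals and forecasts are hypothetical numbers), the MAD can be computed directly from its definition:

```python
def mad(actuals, forecasts):
    """Mean absolute deviation: the average of |e_t|,
    where e_t = x_t - (forecast for x_t)."""
    errors = [x - f for x, f in zip(actuals, forecasts)]
    return sum(abs(e) for e in errors) / len(errors)

# hypothetical actuals vs. one-step-ahead forecasts: errors are 2, 1, 1
print(mad([26, 25, 27], [24.0, 24.0, 26.0]))  # 4/3, about 1.33
```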
We begin with an explanation of the Excel OFFSET function.
This function lets you pick out a cell range relative to a given location in the spreadsheet.
The syntax of the OFFSET function is as follows: OFFSET(reference, rows, columns, height, width)
Reference is the cell from which you base the row and column references.
Rows helps locate the upper left-hand corner of the OFFSET range; it is measured as the number of rows up or down from the reference cell.
Columns helps locate the upper left-hand corner of the OFFSET range; it is measured as the number of columns left or right of the reference cell.
Height is the number of rows in the selected range.
Width is the number of columns in the selected range.
File Offsetexample.xls contains some examples of how the OFFSET function works.
The nice thing about the OFFSET function is that it can be copied like any formula.
Moving-average forecasts perform well for a time series that fluctuates about a constant base level.
Air conditioner sales are a good example to examine: they exhibit seasonality, with the peaks and valleys of the series repeating at regular 12-month intervals.
24.2 Simple Exponential Smoothing
If a time series fluctuates about a base level, simple exponential smoothing may be used to obtain good forecasts for future values of the series.
To describe simple exponential smoothing let At = smoothed average of a time series after observing xt.
After observing xt, At is the forecast for the value of the time series during any future period.
The key equation in simple exponential smoothing is At = αxt + (1- α)At-1.
In the above equation, α is a smoothing constant that satisfies 0< α<1.
As with moving-average forecasts, we let ft,k be the forecast for xt+k made at the end of period t. Then At=ft,k
Assuming that we are trying to forecast one period ahead, our error for predicting xt is given by et=xt-ft-1,1 = xt-At-1.
Thus, our new forecast At=ft, 1 is equal to our old forecast (At-1) plus a fraction of our period t error (et).
This implies that if we “overpredict” xt, we lower our forecast, and if we “underpredict” xt, we raise our forecast.
For larger values of the smoothing constant α, more weight is given to the most recent observation.
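The smoothing recursion At = αxt + (1 - α)At-1 can be sketched in a few lines of Python; the series, α = 0.5, and the initial smoothed value A0 = 20 below are hypothetical.

```python
def exp_smooth(x, alpha, A0):
    """Simple exponential smoothing: At = alpha*xt + (1 - alpha)*At-1.
    After observing xt, At is the forecast for every future period."""
    A, out = A0, []
    for xt in x:
        A = alpha * xt + (1 - alpha) * A
        out.append(A)
    return out

# hypothetical series with alpha = 0.5 and initial smoothed value A0 = 20
print(exp_smooth([24, 22, 26], alpha=0.5, A0=20))  # [22.0, 22.0, 24.0]
```

With α = 0.5, each new observation and the old smoothed average get equal weight; a larger α would track the latest observation more closely.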
24.3 Holt’s Method: Exponential Smoothing with Trend
If we believe that a time series exhibits a linear trend, Holt’s method often yields good forecasts.
At the end of the tth period, Holt’s method yields an estimate of the base level (Lt) and the per-period trend (Tt) of the series.
To compute Lt, we take a weighted average of the following two quantities:
xt, which is an estimate of the period t base level from the current period
Lt-1 + Tt-1, which is an estimate of the period t base level based on previous data
To compute Tt, we take a weighted average of the following two quantities:
An estimate of trend from the current period, given by the increase in the smoothed base level from period t-1 to period t
Tt-1, our previous estimate of the trend
As before, we define ft,k to be the forecast for xt+k made at the end of period t. Then ft,k = Lt + kTt
To initialize Holt’s method, we need an initial estimate of the base and an initial estimate (call it T0) of the trend.
A multiplicative version of Holt’s method can be used to generate good forecasts for a series of the form xt=abtεt
Here, the value of b represents the percentage growth in the base level of the series during each period.
A Spreadsheet Implementation of the Holt Method
The file Holt.xls contains an implementation of the Holt method.
We can use an Excel one-way data table to determine values of α and β that yield a small MAD.
24.4 Winter’s Method: Exponential Smoothing with Seasonality
The appropriately named Winter’s method is used to forecast time series for which trend and seasonality are present.
To describe Winter's method, we require two definitions.
Let c = the number of periods in the length of the seasonal pattern.
Let st be an estimate of a seasonal multiplicative factor for month t obtained after observing xt.
In what follows Lt and Tt have the same meaning as they did in Holt’s method.
Each period Lt, Tt, and st are updated by using the following equations.
Again, α, β, and γ are smoothing constants, each of which is between 0 and 1.
Lt = α(xt/st-c) + (1 - α)(Lt-1 + Tt-1)
Tt = β(Lt - Lt-1) + (1 - β)Tt-1
st = γ(xt/Lt) + (1 - γ)st-c
The first equation above updates the estimate of the series base by taking a weighted average of the following two quantities:
1. Lt-1 + Tt-1, which is our base-level estimate before observing xt
2. The deseasonalized observation xt/st-c, which is an estimate of the base obtained from the current period
The second equation above is identical to the Tt equation used to update trend in the Holt method.
The third equation above updates the estimate of month t's seasonality by taking a weighted average of the following two quantities:
1. Our most recent estimate of month t's seasonality (st-c)
2. xt/Lt, which is an estimate of month t's seasonality obtained from the current month
Initialization of Winter’s Method
To obtain good forecasts with Winter’s method, we must obtain good initial estimates of base, trend, and all seasonal factors.
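One period of the Winters updates can be sketched as follows; the state values (base 100, trend 2, last season's factor 1.25) and the smoothing constants are hypothetical.

```python
def winters_update(x_t, L_prev, T_prev, s_old, alpha, beta, gamma):
    """One period of Winters's method; s_old is st-c, the seasonal
    factor estimated one full season (c periods) ago.
      Lt = alpha*(xt/st-c) + (1 - alpha)*(Lt-1 + Tt-1)
      Tt = beta*(Lt - Lt-1) + (1 - beta)*Tt-1
      st = gamma*(xt/Lt) + (1 - gamma)*st-c"""
    L = alpha * (x_t / s_old) + (1 - alpha) * (L_prev + T_prev)
    T = beta * (L - L_prev) + (1 - beta) * T_prev
    s = gamma * (x_t / L) + (1 - gamma) * s_old
    return L, T, s

# hypothetical state: base 100, trend 2, seasonal factor 1.25 from last season
L, T, s = winters_update(130, 100, 2, 1.25, alpha=0.5, beta=0.4, gamma=0.5)
print(L, T, s)
```

A forecast made at the end of period t then multiplies the extrapolated base, Lt + kTt, by the appropriate seasonal factor.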
Forecasting Accuracy
For any forecasting model in which forecast errors are normally distributed, we may use the MAD to estimate se = standard deviation of our forecast errors.
The relationship between MAD and se is given in this formula: se = 1.25MAD
24.5 Ad Hoc Forecasting
Suppose we want to determine how many tellers a bank must have working each day to provide adequate service.
Can we develop a simple forecasting model to help the bank predict the number of customers who will enter each day?
Let xt = number of customers entering the bank on day t.
We postulate that xt = B × DWt × Mt × εt, where
B = base level of customer traffic corresponding to an average day
DWt = day-of-the-week factor corresponding to the day of the week on which day t falls
Mt = month factor corresponding to the month during which day t falls
εt = random error term whose average value equals 1
To begin, we estimate B = average number of arrivals per day the bank is open.
We illustrate the estimation for Monday:
B·DWMonday = average number of arrivals on Mondays the bank is open
Similarly, we find the factors for the rest of the days that the bank is open.
To estimate Mt (say, for May), we write
B·MMay = average number of arrivals on May days for which the bank is open
In a similar fashion, we find the Mt for the remaining months.
Our simple model yielded a MAD of 79.1.
If this method were used to generate forecasts for the coming year, however, the MAD would probably exceed 79.1.
This is because we have fit our parameters to past data; there is no guarantee that future data will “know” that they should follow the same pattern as past data.
Suppose the bank manager observes that on the day after a holiday, bank traffic is much higher than the model predicts.
How can we use this information to obtain more accurate customer forecasts for days after holidays?
We find that the average value of Actual/Forecast for days after a holiday is 1.15.
Thus, for any day after a holiday, we obtain a new forecast simply by multiplying our previous forecast by 1.15.
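The factor estimation described above can be sketched in Python; the arrival counts below are hypothetical, and only three days of the week are shown.

```python
# hypothetical daily arrival counts, keyed by day of week
arrivals = {"Mon": [620, 640], "Tue": [480, 500], "Wed": [510, 530]}

all_days = [v for counts in arrivals.values() for v in counts]
B = sum(all_days) / len(all_days)  # base level: average arrivals per open day

# day-of-week factors: B * DW_d = average arrivals on day d, so DW_d = avg_d / B
DW = {d: (sum(c) / len(c)) / B for d, c in arrivals.items()}
print(round(B, 1), round(DW["Mon"], 3))
```

Monthly factors Mt are estimated the same way from monthly averages, and a forecast for day t is B·DWt·Mt (multiplied by 1.15 if the day follows a holiday).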
24.6 Simple Linear Regression
Often, we predict the value of one variable (called the dependent variable) from the value of another variable (the independent variable).
If the dependent variable and the independent variable are related in a linear fashion, simple linear regression can be used to estimate this relationship.
To illustrate simple linear regression, let’s recall the Giapetto problem.
To set up this problem, we need to determine the cost of producing a soldier and the cost of producing a train.
The values β̂0 and β̂1 minimizing F(β̂0, β̂1) = Σ(yi - β̂0 - β̂1xi)^2 are called the least squares estimates of β0 and β1. They are given by
β̂1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)^2 and β̂0 = ȳ - β̂1x̄
We call ŷ = β̂0 + β̂1x the least squares regression line.
Essentially, if the least squares line fits the points well, we will use β̂0 + β̂1xi as our prediction for yi.
Every least squares line has two properties:
It passes through the point (x̄, ȳ).
The least squares line "splits" the data points, in the sense that the sum of the vertical distances from points above the least squares line equals the sum of the vertical distances from points below it; equivalently, the errors sum to zero: Σei = 0.
How Good a Fit?
How do we determine how well the least squares line fits the data points?
To answer this question, we need to discuss three components of variation: sum of squares total (SST), sum of squares error (SSE), and sum of squares regression (SSR).
Sum of squares total is given by SST = Σ(yi - ȳ)^2. SST measures the total variation of yi about its mean.
Sum of squares error is given by SSE = Σ(yi - ŷi)^2 = Σei^2.
If the least squares line passes through all the data points, SSE = 0.
The sum of squares regression (SSR) measures the variation in y explained by the regression; it can be shown that SST = SSR + SSE.
SST is a function only of the values of y.
For a good fit, SSE will be small, so the equation above shows that SSR will be large for a good fit.
More formally, we may define the coefficient of determination (R2) for y by
R2 = SSR/SST = percentage of variation in y explained by x
A measure of linear association between x and y is the sample linear correlation rxy.
A sample correlation near +1 indicates a strong positive linear relationship between x and y; a sample correlation near -1 indicates a strong negative linear relationship between x and y; and a sample correlation near 0 indicates a weak linear relationship between x and y.
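The least squares formulas and the goodness-of-fit decomposition can be sketched as follows; the four (x, y) points are hypothetical, chosen to lie close to a straight line.

```python
def least_squares(xs, ys):
    """Least squares estimates for y = b0 + b1*x:
    b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2), b0 = ybar - b1*xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    return ybar - b1 * xbar, b1

# hypothetical data lying close to a straight line
xs, ys = [1, 2, 3, 4], [2.1, 3.9, 6.0, 8.0]
b0, b1 = least_squares(xs, ys)

# goodness of fit: SST = SSR + SSE, so R^2 = SSR/SST = 1 - SSE/SST
ybar = sum(ys) / len(ys)
sst = sum((y - ybar) ** 2 for y in ys)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r2 = 1 - sse / sst
print(b0, b1, r2)
```

For these points the fitted slope is 1.98 and the intercept 0.05, with R2 very close to 1 because the points are nearly collinear.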
Forecasting Accuracy
A measure of the accuracy of predictions derived from regression is given by the standard error of the estimate (se).
If we let n = number of observations, se is given by se = √(SSE/(n - 2))
Any observation for which y is not within 2se of ŷ is called an outlier.
T-Tests in Regression
Using a t-test, we can test the significance of a linear relationship.
We can compute the t-statistic, given by t = β̂1/StdErr(β̂1)
Assumptions Underlying the Simple Linear Regression Model
Assumption 1 The variance of the error term should not depend on the value of the independent variable x.
This assumption is called homoscedasticity.
If the variance of the error term depends on x, then we say that heteroscedasticity is present.
Assumption 2 Errors are normally distributed.
Assumption 3 The errors should be independent. This assumption is often violated when data are collected over time.
Independence of the errors implies that knowing the value of one error should tell us nothing about the value of the next (or any other) error.
Suppose a sequence of errors exhibits the following pattern: a positive error is usually followed by another positive error, and a negative error is usually followed by another negative error.
This pattern indicates that successive errors are not independent; it is referred to as positive autocorrelation.
In other words, positive autocorrelation indicates that successive errors have a positive linear relationship and are not independent.
If the sequence of errors plotted in time sequence resembles Figure 13 in the book, we have negative autocorrelation.
Here the successive errors have a negative linear relationship and are not independent.
When no obvious pattern is present, the independence assumption appears to be satisfied.
Observe that the errors “average out” to 0, so we would expect about half our errors to be positive and half to be negative.
Thus, if there is no pattern in the errors, we would expect the errors to change sign about half the time.
This observation enables us to formalize the preceding discussion as follows.
If the errors change sign very rarely, they probably violate the independence assumption, and positive autocorrelation is probably present.
If the errors change sign very often, they probably violate the independence assumption, and negative autocorrelation is probably present.
If the errors change sign about half the time, they probably satisfy the independence assumption.
If positive or negative autocorrelation is present, correcting for the autocorrelation will often result in much more accurate forecasts.
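The sign-change rule of thumb above can be sketched as a small check; the error sequence below is a hypothetical example of positive autocorrelation.

```python
def sign_change_fraction(errors):
    """Fraction of successive error pairs whose signs differ.
    Near 0.5 suggests independence; near 0 suggests positive
    autocorrelation; near 1 suggests negative autocorrelation."""
    changes = sum(1 for a, b in zip(errors, errors[1:]) if a * b < 0)
    return changes / (len(errors) - 1)

# a long run of positive errors followed by negatives: 1 sign change in 5 pairs
print(sign_change_fraction([1.2, 0.8, 0.5, -0.3, -0.9, -0.4]))  # 0.2
```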
Running Regressions with Excel
Cost.xls illustrates how to run a regression with Excel.
Important numbers in the output include: R Square This is r2.
Multiple R This is the square root of r2, with the sign of Multiple R being the same as the slope of the regression line.
Standard Error This is se
Observations This is the number of data points.
SS column The Regression entry is SSR. The Residual entry is SSE. The Total entry is SST.
Coefficients column The Intercept entry gives the estimated intercept, and the trains entry gives the estimated coefficient (slope) of the trains variable.
t stat This gives the observed t-statistic for the Intercept and the trains variable.
Standard Error column The Intercept entry gives the standard error of the intercept estimate, and the trains entry gives the standard error of the slope estimate. The coefficient entry divided by the standard error entry yields the t-statistic for the intercept or slope.
P-value For the intercept and slope, this gives Probability(|tn-2|≥|Observed t-statistic|).
Obtaining a Scatterplot with Excel
To obtain a scatterplot with Excel, let the range where your independent variable is be the X range.
24.7 Fitting Nonlinear Relationships
Often, a plot of points of the form (xi,yi) indicates that y is not a linear function of x.
The plot may indicate that there is a nonlinear relationship between x and y.
Using a Spreadsheet to Fit a Nonlinear Relationship
VCR.xls shows how we can use a spreadsheet to fit a curve to the data.
Utilizing the Excel Trend Curve
The Excel Trend Curve makes it easy to fit an equation to a set of data.
After creating an X-Y scatterplot, click on the points in the graph until they turn gold. Then select Chart > Add Trendline. Choosing Linear yields the straight line that best fits the points.
Choose Logarithmic if the scatterplot looks like e or f from Figure 17 in the book.
Choose Power if the scatterplot looks like a or b in Figure 17 of the book.
Choose Exponential if the scatterplot looks like c or d from Figure 17.
Choosing Polynomial of order n (n = 1, 2, 3, 4, 5, or 6) yields the best-fitting equation of the form
y = β0 + β1x + β2x^2 + … + βnx^n
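The polynomial trend-curve fit can be reproduced outside Excel. The sketch below assumes NumPy is available and uses hypothetical data generated from a known quadratic so the fit is checkable.

```python
import numpy as np

# hypothetical data generated from y = 1 + 2x + 3x^2, so the fit is checkable
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x ** 2

# order-2 polynomial fit, analogous to Excel's Polynomial trend curve
coeffs = np.polyfit(x, y, deg=2)  # coefficients from highest power down
print(coeffs)  # approximately [3. 2. 1.]
```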
24.8 Multiple Regression
In many situations, more than one independent variable may be useful in predicting the value of a dependent variable.
We then use multiple regression.
In multiple regression, we model the relationship between y and the k independent variables by
yi = β0 + β1x1i + β2x2i + … + βkxki + εi
where εi is an error term with mean 0, representing the fact that the actual value of yi may not equal
β0 + β1x1i + β2x2i + … + βkxki
Estimation of the βi’s
We call
ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk
the least squares regression equation.
Goodness of Fit Revisited
For multiple regression, we define SSR, SSE and SST as we did earlier.
We also find the R2 =SSR/SST = percentage of variation in y explained by the k independent variables and 1-R2 = percentage of variation in y not explained by the k independent variables.
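A multiple regression fit can be sketched with NumPy's least squares solver; the data below are hypothetical, generated from a known relationship so the recovered coefficients can be checked.

```python
import numpy as np

# hypothetical data generated from y = 4 + 2*x1 - x2 (no noise, so checkable)
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 2.0], [5.0, 4.0]])
y = 4 + 2 * X[:, 0] - X[:, 1]

# prepend a column of ones so the model includes an intercept b0
A = np.column_stack([np.ones(len(X)), X])
beta_hat, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)  # approximately [4. 2. -1.]
```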
We define the standard error of the estimate as
se = √(SSE/(n - k - 1))
Hypothesis Testing
If we have included independent variables x1, x2,…xk in a multiple regression, we often want to test H0:βi = 0 against Ha:βi ≠ 0.
To test these hypotheses, we compute
t = β̂i/StdErr(β̂i)
where StdErr(β̂i) measures the amount of uncertainty present in our estimate of βi
Choosing the Best Regression Equation
How can we choose between several regression equations having different sets of independent variables?
We usually want to choose the equation with the lowest value of se, since that will yield the most accurate forecasts. We also want the t-statistics for all variables in the equation to be significant.
These two objectives may conflict, in which case it is difficult to determine the “best” equation.
If the Cp statistic is used to choose among candidate equations, the regression chosen should have a Cp value close to k + 1.
Multicollinearity
If an estimated regression equation contains two or more independent variables that exhibit a strong linear relationship, we say that multicollinearity is present.
A strong linear relationship between some of the independent variables may make the computer’s estimates of the βi’s unreliable.
By the way, if an exact linear relationship exists between two or more independent variables, there are an infinite number of combinations of the β̂i's which will minimize the sum of the squared errors, and most computer packages will print an error message.
Dummy Variables
Often, a nonquantitative, or qualitative, independent variable may influence the dependent variable.
To model the effect of a categorical variable on a dependent variable, we define c-1 dummy variables as follows:
x1 = 1 if the observation takes on value 1 of the categorical variable; 0 otherwise
x2 = 1 if the observation takes on value 2 of the categorical variable; 0 otherwise
…
xc-1 = 1 if the observation takes on value c - 1 of the categorical variable; 0 otherwise
Interpretation of Coefficients of Dummy Variables
How do we interpret the coefficients of dummy variables?
To illustrate, let's examine how a day's payday status affects credit union traffic.
We capture the difference between paydays and nonpaydays by defining a payday variable x5 that equals 1 for paydays and 0 for nonpaydays.
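The c - 1 dummy encoding can be sketched in Python; the day labels below are hypothetical, and the last category serves as the baseline.

```python
def dummies(value, categories):
    """Encode a categorical value as c-1 dummy variables; the last
    category is the baseline and is encoded as all zeros."""
    return [1 if value == cat else 0 for cat in categories[:-1]]

days = ["Mon", "Tue", "Wed", "Thu", "Fri"]  # c = 5 categories
print(dummies("Tue", days))  # [0, 1, 0, 0]
print(dummies("Fri", days))  # [0, 0, 0, 0] -- the baseline day
```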
Multiplicative Models
Often, we believe that there is a relationship of the following form:
Y = β0 · x1^β1 · x2^β2 · … · xk^βk · ε
Thus, to estimate the βi's, we run a multiple regression with the dependent variable being ln Y and the independent variables being ln x1, ln x2, …, ln xk.
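The log-transform trick can be sketched for the one-variable case; the data below are hypothetical, generated from y = 3x^2 so the recovered parameters can be checked.

```python
import math

def loglog_fit(xs, ys):
    """Fit y = b0 * x^b1 (the one-variable multiplicative model) by
    regressing ln y on ln x; with k variables, regress ln Y on every ln xi."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    xbar, ybar = sum(lx) / n, sum(ly) / n
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(lx, ly))
          / sum((a - xbar) ** 2 for a in lx))
    b0 = math.exp(ybar - b1 * xbar)
    return b0, b1

# hypothetical data following y = 3 * x^2 exactly
b0, b1 = loglog_fit([1, 2, 3, 4], [3, 12, 27, 48])
print(b0, b1)  # approximately 3.0 2.0
```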
Heteroscedasticity and Autocorrelation in Multiple Regression
By plotting the errors in time-series sequence, we may check to see whether the errors from a multiple regression are independent.
If autocorrelation is present and the errors do not appear to be independent, then correcting for autocorrelation will usually yield better forecasts.
By plotting the errors (on the y-axis) against the predicted value of y, we can determine whether homoscedasticity or heteroscedasticity is present.
If homoscedasticity is present, the plot should show no obvious pattern, whereas if heteroscedasticity is present, the plot should show an obvious pattern indicating that the errors somehow depend on the predicted value of y.
If heteroscedasticity is present, the t-tests described in this section are invalid.
Implementing Multiple Regression on a Spreadsheet
Credit.xls includes the output for the regression
The regression output has the following interpretation: Intercept This is β̂0.
Standard Error This is se
R-Square This is R2. It means that together all the independent variables in the regression explain 84.6% of the variation in the number of customers arriving daily.
Observations This is the number of data points.
Total df This is the degrees of freedom used for the t-test of H0: βi = 0 against Ha: βi ≠ 0.
Coefficients For each independent variable, this column yields the coefficient of the independent variable in the least squares equation.
STANDARD ERROR For each independent variable, this row yields StdErr(β̂i).
t stat This gives the observed t-statistic for the Intercept and all independent variables.
Standard Error column The Intercept entry gives the standard error of the intercept estimate, and the coefficient entries give the standard error of each independent variable's coefficient.
P-value For the intercept and each independent variable in a regression with k independent variables, this gives Probability(|tn-k-1| ≥ |Observed t-statistic|).