Demand Forecasting With Multiple Regression Course Notes

8/12/2019 Demand Forecasting With Multiple Regression Course Notes

http://slidepdf.com/reader/full/demand-forecasting-with-multiple-regression-course-notes 1/26



Course Transcript

IEEE eLearning Library Biometrics for Recognition at a Distance Transcript pg. 5 / 26

Background

We developed this module partly because, in the recent Eye for Transport survey, forecasting was identified as an area where most respondents still feel they have room for improvement. Only 22% of retail and consumer product supply chain executives rated their forecasting capabilities as good or excellent, and 30% rated their forecasting as less than satisfactory or very poor. Put another way, 78% of respondents would not rate their forecasting capabilities as anything better than merely satisfactory.

Forecasting and Green Engineering

Forecasting is very important to green engineering in general and to green production in manufacturing systems in particular. Paul Anastas, a professor of chemical engineering and an EPA administrator, has developed 12 principles of green engineering. We’re not going to go over each of them, but two principles appear to be particularly relevant to explaining why forecasting is important to green engineering.

Forecasting and Green Engineering (cont.)

Principle five states that a green engineering system must be output-pulled rather than input-pushed. Manufacturing systems that adhere to the just-in-time philosophy are pull systems: the customer requests the product and pulls it through the supply chain. When it is not practical to wait until the customer requests the product, accurate forecasts are used instead of actual customer orders. Lean manufacturing describes the process of eliminating the wastes that result in overproduction, delays, inventory, over-processing and errors, which leads to a more continuous production flow based almost entirely on customer pull.

Module Rationale

Consequently, the rationale for including this module in the IEEE green production management series is that an accurate prediction of future demand is required to plan production without creating wasteful overages or shortages; it is therefore a cornerstone of successful green engineering.



Independent vs. Dependent Item Forecasts

To understand this module, it is important to understand the difference between independent and dependent demand items. An independent demand item is essentially a finished product, one that has a use all by itself. Dependent demand items are essentially the components that go into the final product. The demand for the finished product, the independent demand item, can be estimated with forecasting techniques such as those discussed in this module. The demand for components is not forecast independently; it is calculated from the forecast of the finished product by techniques such as Material Requirements Planning (MRP).
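To make the distinction concrete, here is a minimal sketch of a bill-of-materials explosion, the core calculation by which dependent (component) demand is derived from the forecast of a finished product. The function name, the product and its bill of materials are hypothetical, and the sketch computes gross requirements only; a real MRP system also nets out on-hand inventory and offsets orders by lead times.

```python
def explode(forecast, bom):
    """Gross component requirements implied by a finished-product forecast.

    forecast: {finished_product: forecast_quantity}
    bom: {item: {component: quantity_per_item}}  (hypothetical bill of materials)
    """
    requirements = {}

    def add_components(item, qty):
        # Each component is needed qty * (quantity per parent) times;
        # components may have components of their own, so recurse.
        for component, per_parent in bom.get(item, {}).items():
            need = qty * per_parent
            requirements[component] = requirements.get(component, 0) + need
            add_components(component, need)

    for product, qty in forecast.items():
        add_components(product, qty)
    return requirements


# Hypothetical example: a table needs 4 legs and 1 top; each top needs 2 panels.
bom = {"table": {"leg": 4, "top": 1}, "top": {"panel": 2}}
print(explode({"table": 100}, bom))  # {'leg': 400, 'top': 100, 'panel': 200}
```

Only the table is forecast; the 400 legs, 100 tops and 200 panels follow by calculation, which is why component demand needs no forecast of its own.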

Learning Objectives

The learning objectives of this module are that you will learn to use multiple regression for forecasting. As part of that, you will learn to postulate causal (independent) variables, to use the output from multiple regression software products to develop a robust forecasting model and, finally, to make point and interval forecasts.

Two Quantitative Approaches to Forecasting

There are two quantitative approaches to forecasting. We stress the word “quantitative” because there are also a number of qualitative approaches to forecasting, which are used when no hard data are available. For quantitative forecasting we typically use either multiple regression or time series analysis. Multiple regression develops an equation that predicts how one or more independent (causal) variables can forecast the value of a dependent variable.

Time series analysis is different in that it does not necessarily try to bring in causal variables; it simply examines data collected over time and quantifies the patterns they exhibit, such as trend and seasonal behavior, in order to project those patterns into the future and make a forecast.

What is a Good Forecast?

So what is a good forecast? I think we would all agree that a good forecast is a prediction that comes as close as possible to the future it tries to predict, but unfortunately we can only determine that in retrospect. Consequently, to assess how good a forecast might be, we select a recent, known past period as a test period and forecast for that period as if it were unknown. Once we have made that forecast, if we have several forecasts



Consequently, we would conclude that method one is the best forecasting method for the data in this example.
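The comparison of methods on a test period can be sketched as below. The transcript pages that define the error measure are missing from this extract, so scoring each method by mean squared error is an assumption here, as are the function names and the numbers.

```python
def mean_squared_error(actual, forecast):
    """Average squared difference between actual and forecast values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def best_method(actual, forecasts):
    """Return the method whose test-period forecasts have the smallest MSE.

    forecasts: {method_name: [forecast for each test-period observation]}
    """
    return min(forecasts, key=lambda name: mean_squared_error(actual, forecasts[name]))


# Hypothetical test period: actual values held back and forecast "as if unknown".
actual = [10, 12, 14]
forecasts = {"method 1": [10, 12, 13], "method 2": [9, 14, 15]}
print(best_method(actual, forecasts))  # method 1
```

Method 1 scores 1/3 against 2 for method 2, so it would be selected; any other error measure (mean absolute deviation, for instance) slots in the same way.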

Multiple Regression

Turning to the main topic of this module, multiple regression requires a sample of data that includes the values of a dependent variable, which we will always refer to as Y, and the corresponding values of some arbitrary number M of independent variables, X1, X2, through XM. M can be as small as one: if you have only one independent variable, you have what is usually referred to as simple regression. If you have more than one independent variable, you have what is normally referred to as multiple regression.

The independent variables are variables that you have reason to believe affect the value of your dependent variable Y. Once you have selected your independent variables and the dependent variable, you postulate that you will be able to obtain an equation with which to forecast the value of Y. The forecasted value of Y is usually not the same as the actual value of Y, so we refer to Y* as the forecasted value of Y. We hypothesize that Y* is equal to an intercept, which we call β0, plus a slope for each independent variable multiplied by that independent variable:

$$Y^* = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_M X_M$$

The β’s (betas) are referred to as the regression coefficients. To fit the equation we must have observed values for Y and the X’s, so the unknowns in the equation are the β’s. We use multiple regression to find the values of β0, β1, β2, through βM that give values of Y* minimizing the sum of the squared residuals, that is, minimizing $\sum (Y - Y^*)^2$. In multiple regression the test period is not a small recent period: the entire data set serves as the test period.
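As a sketch of what regression software does when it minimizes the sum of squared residuals, the function below solves the least-squares normal equations, (XᵀX)β = XᵀY, in plain Python. The function names and the small data set are hypothetical, and dedicated packages generally use more numerically robust algorithms, but for well-conditioned data the estimates agree. The `intercept=False` option corresponds to forcing the constant to zero.

```python
def fit_least_squares(X, y, intercept=True):
    """Estimate [b0, b1, ..., bM] (or [b1, ..., bM] if intercept=False) that
    minimise sum((Y - Y*)**2) by solving the normal equations (X'X) b = X'Y."""
    # Design matrix: a leading column of ones carries the intercept b0.
    rows = [([1.0] if intercept else []) + [float(v) for v in xi] for xi in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'Y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def predict(beta, xi, intercept=True):
    """Y* = b0 + b1*X1 + ... + bM*XM for one observation xi = (X1, ..., XM)."""
    terms = ([1.0] if intercept else []) + list(xi)
    return sum(b * x for b, x in zip(beta, terms))


# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [2, 5, 1, 4, 5]
beta = fit_least_squares(X, y)
print([round(b, 6) for b in beta])  # [2.0, 3.0, -1.0]
```

Because the sample data contain no noise, the fit recovers the generating coefficients exactly; with real data the residuals Y − Y* are nonzero and their squared sum is as small as any choice of β’s can make it.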

A Numerical Example

To illustrate how one goes about doing multiple regression, we take a numerical example that is a modification of an example given in the book by Makridakis and Wheelwright, which is listed in the references. We assume that we have 14 years of data and that we are trying to predict the sales of a company we call the Carolina Plate Glass Company. This company is fictitious, but we assume that its executives have gotten together and found that their principal customers are automobile producers and builders. They therefore have reason to believe that if they know the automobile production and the building contracts awarded in a particular year, they should be able to predict their sales for that year fairly well.

Regression Model for the Example

This is a hypothesis, and the executives state it as follows: sales can be predicted by an equation consisting of a constant β0, plus β1 times automobile production (X1), plus β2 times building contracts awarded (X2):

$$Y^* = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

Here β0 is the intercept, which is simply the value of Y* when X1 and X2 are both zero. X1 = 0 means there is no automobile production and X2 = 0 means no building contracts are awarded, so the intercept presumably represents the sales that would accrue from sources other than automobiles and building.

β1 is the slope with respect to automobile production: it indicates the change in predicted sales for every one-unit change in X1. Similarly, β2 is the slope with respect to building contracts: if X2 changes by one unit, β2 reflects by how much the predicted sales will change. We use multiple regression to find the values of β0, β1 and β2.

Multiple Regression Software

A number of different software products can be used to accomplish this. One that is perhaps as widely available as any other is the regression capability of Microsoft Excel, provided by its Data Analysis add-in. Our appendix shows how to use that add-in to carry out all of the steps illustrated in this module. There are many other very good software systems, perhaps better suited to the heavy-duty user of regression, including Minitab, SAS and SPSS. Many of these include much more than regression; they are complete statistical computing systems.

MR Software: Differences and Commonalities

There are both differences and commonalities among the various multiple regression software products. Multiple regression usually involves manipulating large volumes of data, and most packages differ in the manner in which data are entered and in the types of statistical analysis they provide beyond basic multiple regression. They also differ in the amount and type of output analysis, including graphics, that they provide. However, all multiple regression packages provide similar basic output, which we will refer to as BMR (basic multiple regression) output.

BMR Output

The basic multiple regression output provided by all software packages includes regression statistics; what is typically referred to as ANOVA (analysis of variance) output; residual output, which compares the actual values with the predicted values; and the correlation matrix.

Using BMR Output

To use basic multiple regression output, we have to understand that multiple regression is a statistical procedure that treats your data as a sample from a larger population. A variety of statistical tests must be conducted before a regression model is considered to yield statistically significant estimates of the population parameters. Thus the BMR output is submitted to five tests in sequence; only when all five tests are satisfied is the resulting regression model considered ready to produce statistically reliable forecasts. A model that passes all five tests is referred to as a robust model.

BMR Output for Example

To give you an idea of what basic multiple regression output looks like, here we have the output produced by Excel’s Data Analysis add-in. All regression software products produce what we refer to as the regression statistics. We will not explain them here; we will explain them as we go on.



BMR Output for Example (cont.)

They also produce residual output, shown in this table, which is simply a listing of the predicted values and the residuals, that is, the differences between the actual and the predicted values.

BMR Output for Example (cont.)

All basic multiple regression outputs also provide a correlation matrix, which describes the relationship of the independent variables with each other, as well as of each independent variable with the dependent variable. What we are talking about is the degree of association between two variables, which is referred to as their correlation.

Regression Tests for “Robustness”

As indicated, before we can use the basic multiple regression output for forecasting, we have to subject it to a sequence of five tests that will ultimately lead to a statistically reliable, or robust, model that we can use for forecasting.

The first test, test one, checks the adjusted R² to see the extent to which the independent variables explain the dependent variable. The adjusted R² gives the proportion of the overall variability of the dependent variable that is explained by the independent variables, adjusted for the number of independent variables in the model.

Different statisticians use different criteria. My recommended criterion for passing this test is an adjusted R² greater than or equal to 0.6. In other words, I would like to see models in which the independent variables explain at least 60% of the overall variability of the dependent variable.
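Test one can be sketched as below, using the standard formula for adjusted R², 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of independent variables in a model with an intercept. The function names and the R² values are hypothetical; the 0.6 threshold is the criterion stated above.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n observations and k independent variables (model with intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def passes_test_1(r2, n, k, threshold=0.6):
    """Test 1: the adjusted R² must be at least 0.6."""
    return adjusted_r2(r2, n, k) >= threshold


# Hypothetical R² values for a 14-year, two-variable model like the example.
print(round(adjusted_r2(0.90, 14, 2), 4))  # 0.8818
print(passes_test_1(0.60, 14, 2))          # False: adjusted R² is only about 0.527
```

Adding an independent variable never lowers R², but it can lower adjusted R², which is why the test uses the adjusted value.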

What do we do if this test is not met? If we have not explained enough of the variability of the dependent variable, we may want to look for additional independent variables. Can we think of any other independent variables, not yet included in the model, that might help to explain the value of the dependent variable?

If the answer is no, then we simply say that this is the best we can do and proceed, wishing that we had variables that would explain a greater percentage of the variability. This is no reason to stop; we simply continue, understanding that we wish we could do better.

Applying Test 1

Applying the test is very simple. We go to the first part of the output, the regression statistics, and look for the entry labeled “adjusted R²”. In our example that number is greater than 0.6, so test one is passed. We always recommend recording the result of each test somewhere in the regression output.

Regression Tests for “Robustness”

The second test for robustness checks the F statistic to see whether any of the independent variables explain the dependent variable. My criterion for passing this test is an F statistic greater than or equal to five.

Again, different authors prefer different thresholds, but if the F statistic is greater than or equal to five, you can be satisfied that there is at least one useful variable in the model.
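For a model with an intercept, the overall F statistic can be written in terms of R² as (R²/k) / ((1 − R²)/(n − k − 1)), which equals the ratio of the regression mean square to the residual mean square in the ANOVA table. The sketch below uses that identity and the threshold of five; the function names and the R² value are hypothetical, while 60.41 is the example’s F statistic.

```python
def f_statistic(r2, n, k):
    """Overall F statistic for a model with intercept: (R²/k) / ((1 - R²)/(n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def passes_test_2(f, threshold=5.0):
    """Test 2: the F statistic must be at least 5."""
    return f >= threshold


print(passes_test_2(60.41))                # True: the example's F statistic
print(round(f_statistic(0.90, 14, 2), 2))  # 49.5 for a hypothetical R² of 0.90
```

In practice you read F straight from the ANOVA output; the formula simply shows why a high R² with few variables and many observations produces a large F.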

If the F statistic is less than five, then unfortunately we have to discard all of the independent variables and start from scratch, to see whether we can find independent variables that do explain the dependent variable.

Applying Test 2

Applying test two, like applying test one, is a simple process. We go to the ANOVA (analysis of variance) table, where I have marked the location of the F statistic in red. In our case the F statistic is 60.41, which is well above five, so test two is satisfied.

Regression Tests for “Robustness”

The third test examines the correlation matrix to determine whether multicollinearity exists between pairs of independent variables. Multicollinearity exists when two independent variables are “strongly” correlated with each other, meaning that the absolute value of their correlation coefficient is greater than 0.70. If two variables previously thought to be independent are correlated, then one of the two is not only redundant in predicting the dependent variable, but also compounds any error associated with those variables.

Test 3 is passed if ALL the correlation coefficients between the INDEPENDENT variableshave an absolute value equal to or less than 0.70.

If only one pair of independent variables has a correlation coefficient greater than 0.70 in absolute value, we discard from the regression model the member of that pair that has the smaller correlation coefficient, in absolute value, with the dependent variable.

Regression Tests for “Robustness”

If more than one pair of independent variables has a correlation coefficient greater than 0.70 in absolute value, we select the pair with the largest correlation coefficient in absolute value. From that pair, we discard from the regression model the independent variable that has the smaller correlation coefficient, in absolute value, with the dependent variable.

We continue to eliminate independent variables from the regression model in this way until all remaining independent variables have pairwise correlation coefficients less than or equal to 0.70 in absolute value. It is important to remember that after each independent variable is removed from the regression model, the remaining data must be processed by the regression software again and all tests repeated.
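The elimination rule for test three can be sketched as a function that picks one variable to discard, or returns None when the test passes. It handles one removal at a time because, as stated above, the model must be rerun and all tests repeated after each removal. The function name and the three-variable correlation matrix in the usage lines are hypothetical; the 0.030414 value is the example’s X1–X2 correlation.

```python
def variable_to_drop(corr, y, xs, limit=0.70):
    """Apply the test-3 rule once.

    corr[a][b]: correlation between variables a and b (from the correlation matrix)
    y: name of the dependent variable; xs: independent variables still in the model.
    Returns the independent variable to discard, or None if test 3 passes.
    """
    offending = [(abs(corr[a][b]), a, b)
                 for i, a in enumerate(xs) for b in xs[i + 1:]
                 if abs(corr[a][b]) > limit]
    if not offending:
        return None
    # Take the pair with the largest |r|; drop the member less correlated with Y.
    _, a, b = max(offending)
    return a if abs(corr[a][y]) < abs(corr[b][y]) else b


# The example: only one pair, with r(X1, X2) = 0.030414.
example = {"X1": {"X2": 0.030414}, "X2": {"X1": 0.030414}}
print(variable_to_drop(example, "Y", ["X1", "X2"]))  # None, so test 3 passes
```

With a hypothetical matrix in which r(X1, X2) = 0.9, r(X1, X3) = 0.75 and X2 is less correlated with Y than X1, the function first returns X2; after a rerun without X2, the X1–X3 pair is still above 0.70 and the variable less correlated with Y is dropped next.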

Preparing to Conduct Test 3

Clearly, before we can apply test three, we have to obtain the correlation matrix from the multiple regression software that you are using.

Applying Test 3

For our example, the correlation output is shown in the top left-hand corner of this spreadsheet, and below it we explain the meaning of that information. The key point is that the test involves checking the correlation coefficients between independent variables. We have only two independent variables, X1 and X2, and the correlation coefficient between them is shown in green: 0.030414.

The other information is needed only if we have to remove a variable, which we do not in this case. The number in red is the correlation between X1 (automobile production) and sales, and the number in black is the correlation between X2 (building contracts awarded) and sales. When we use independent variables to predict a dependent variable, it is good to have high correlation between each independent variable and the dependent variable, but not between the independent variables.

Applying Test 3 (Cont.)

Having explained the meaning of the entries in the correlation matrix, we can now apply test three by checking for multicollinearity. To pass test 3, all the correlation coefficients between the independent variables must be less than or equal to 0.70 in absolute value. The only such coefficient, the green number, is 0.030414, which is less than 0.70; we conclude that there is no multicollinearity, and test 3 passes.

Regression Tests for “Robustness”

Test four for robustness performs what is referred to as a t test on each of the regression coefficients to determine whether it is statistically significantly different from zero. My criterion for passing test four is that every regression coefficient must have a t statistic whose absolute value is greater than or equal to 2.0. This corresponds approximately to a 95% confidence level, depending on whether you use a t or a z statistic. We use a 95% confidence level throughout this module.

If we find one or more t statistics less than two in absolute value, we select the regression coefficient with the smallest absolute t statistic and discard from the regression model the independent variable associated with that coefficient. Note that after each independent variable is removed from the regression model, the remaining data must be processed by the regression software again and all tests repeated. We never simply erase a variable from the regression model without rerunning the entire model with the remaining data.



Applying Test 4

Remember that in our example we have already applied tests one, two and three, and they were all satisfied. Now we apply test four. The t statistics are shown in the column marked in red. We find a t statistic less than two in absolute value, but it is associated with the intercept. In other words, the intercept is not significantly different from zero, and if something is not significantly different from zero, we might as well treat it as zero.

Test four therefore fails, and we must remove the intercept from the model. Note that there is no independent variable associated with the intercept; nevertheless, we must remove it.

Removing the Intercept

To do that, most multiple regression software allows the user to specify that the intercept, sometimes called the constant, is to be zero. So to remove the intercept we select that option and rerun the multiple regression. Again, do not simply delete the constant from the existing output: when the intercept is forced to be zero, all other output is affected.

Removing the Intercept (Cont.)

When we select the option to remove the intercept and rerun the regression for our example, we obtain the new results shown here. We perform test one again and see that it passes; we perform test two again and see that it passes. Test three never has to be repeated once it has passed, because removing the intercept does not change the correlations between the variables: since there was no multicollinearity last time, that result is grandfathered in from here on. Now we check the t statistics. The intercept is zero, and we have slightly different values for β1 (automobile production) and β2 (building contracts awarded). Their t statistics are both greater than or equal to two in absolute value, so test four now passes and we can move on to the next test.

Regression Tests for “Robustness”

The next test, test five, determines whether the distribution of the residuals appears to be Gaussian, or bell-shaped. The criterion for passing this test is that the histogram of the residuals, which many packages provide as part of the regression output, should appear to be Gaussian. If it is not, it suggests that there are independent variables that have not been identified or that the data exhibit some other correlation. When this happens, you may want to consult someone with considerable statistical expertise about how to resolve the problem. However, if all of the other tests are passed and you are stuck, we suggest that you proceed with caution, understanding that there is still something out there, not random, that is affecting the value of your sales.

Obtaining the Distribution of Residuals

The residuals are provided by almost any basic multiple regression program. Most programs will also give you a histogram of the residuals, but some do not. If yours does not, you will have to construct the histogram yourself. To do so, first decide how many class intervals to use based on the number of observations: with fewer than 25 observations, we need either five or six class intervals. These rules are given in any of the statistics books listed in the references.

Once you have decided how many class intervals you want, determine the class width, which is simply the difference between the maximum and the minimum values of the data divided by the number of class intervals. Each class limit is then the prior limit plus the class width, where the first prior limit is the minimum value of the data set. For our example this information is given here, and you simply tally how many residuals fall between -82 and -56, how many between -56 and -29, and so on.
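The class-interval arithmetic can be sketched as follows. Each class is identified by its upper limit, and a value is counted in the first class whose upper limit it does not exceed, the same convention as the bin range described in the appendix. The function names and residual values are hypothetical, since the example’s residuals are not listed in this transcript; the `i == last` guard keeps the maximum in the last class even if floating-point round-off leaves the final limit a hair below it.

```python
def class_upper_limits(data, k):
    """Upper limits of k equal-width classes; the first class starts at min(data)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    return [lo + width * (i + 1) for i in range(k)]


def tally(data, upper_limits):
    """Count values per class: a value goes in the first class whose upper limit
    it does not exceed (the maximum always lands in the last class)."""
    counts = [0] * len(upper_limits)
    last = len(upper_limits) - 1
    for v in data:
        for i, upper in enumerate(upper_limits):
            if v <= upper or i == last:
                counts[i] += 1
                break
    return counts


# Hypothetical residuals; fewer than 25 observations, so five classes.
residuals = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 5]
limits = class_upper_limits(residuals, 5)
print(limits)                    # [-3.0, -1.0, 1.0, 3.0, 5.0]
print(tally(residuals, limits))  # [3, 2, 2, 2, 1]
```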

Applying Test 5

Once we have the distribution of residuals, we find that it looks roughly like a Gaussian (bell-shaped, or normal) distribution, but it appears to be bimodal: the second class seems higher than the classes before and after it. A bimodal distribution is always a concern because it is not bell-shaped, and it is usually an indication that some other correlation, something else, is in play. To remove that effect, so that all your residuals are random and not due to some other cause, you may want to consult someone with greater statistical expertise. We don’t have that, so we look at this and say, well, we know there’s something going on here, but we’ll proceed with our analysis.



Making a Forecast (Cont.)

The regression equation was obtained from a sample of data; hence we have to think of Y* as the expected estimate of sales. The actual value of future sales is a random variable whose mean is Y* and whose variability is measured by a standard error. That standard error is found right under the adjusted R² in the regression statistics.
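The standard error reported under the regression statistics is the standard error of the estimate, sqrt(SSE / (n − p)), where SSE is the sum of squared residuals and p counts every estimated coefficient (the intercept included, if the model has one). A minimal sketch with hypothetical values:

```python
import math


def standard_error(actual, predicted, n_coefficients):
    """Standard error of the estimate: sqrt(SSE / (n - p)), where p counts every
    estimated coefficient (including the intercept, if the model has one)."""
    sse = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    return math.sqrt(sse / (len(actual) - n_coefficients))


# Hypothetical values: residuals 1, -1, 0, 1 give SSE = 3; with 2 coefficients, s = sqrt(3/2).
print(round(standard_error([3, 5, 7, 9], [2, 6, 7, 8], 2), 4))  # 1.2247
```

For the example’s final model (intercept forced to zero, two slopes), p would be 2.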

Making a Forecast (Cont.)

Keeping that in mind, let us return to our example. We have a 14-year history, and let’s assume that we want forecasts for years 15, 16, 17, 18 and 19. We had independent variable data together with our sales data for the past, but to forecast the dependent variable we must have values for X1 and X2 in the future. Where do we get them? Typically, we have to select independent variables for which estimated future values exist. In our case, both automobile production and building activity are routinely forecast by industry and governmental groups. So we look at the forecasts generated by automobile producers, builders or government economic agencies for the next several years, and use those figures as the values of the independent variables with which to calculate the dependent variable.

Making a Forecast (Cont.)

So let us now say that we have gone to those data sources and obtained estimates of future automobile production and future building contracts awarded, and we now want to forecast sales.

Making a Forecast (Cont.)

When we make a forecast, we should provide three pieces of information: the expected value, the upper confidence limit (UCL) and the lower confidence limit (LCL). Statistically, we are talking about a confidence interval about the mean, which is the expected forecast. The interval between the lower and upper confidence limits is the region about the expected value that is expected to contain 95% of the possible values of future sales.



Making a Forecast (Cont.)

Using the figures from our example, we calculate the expected sales, Y*, by simply plugging the future values of X1 and X2 into the regression equation.

The upper confidence limit is the expected sales, Y*, plus 1.96 times the standard error.

The lower confidence limit is the expected sales, Y*, minus 1.96 times the standard error.

These figures are approximations; I would call them engineering approximations. Multiple regression textbooks include additional terms to account for forecasting farther into the future. I think these values are good enough for basic purposes, understanding that we’re going to be wrong anyway.

Making a Forecast (Cont.)

Applying those formulas, we find that for year 15 the expected value is 587.01; the lower confidence limit is the expected value minus 1.96 times the standard error, and the upper confidence limit is the expected value plus 1.96 times the standard error. Again, the values of the independent variables came from industry groups. The expected value is the best single-value forecast we can make, but we are better off saying, for example, that in year 19 we are 95% confident that sales will be between 539.76 million and 683.58 million, with an expected value of 611.67 million.

Appendix

This module provides an appendix with instructions for doing everything shown in this module using the Excel Data Analysis add-in for multiple regression. This is not a recommendation that you choose this specific tool for multiple regression. There are many fine products, but we thought it would be useful to illustrate what might be involved in regression by providing this appendix.



Multiple Regression with Excel

First of all, to follow this appendix you must have Excel 2007 installed on your computer. Once you have verified that, you can follow these steps to get the Data Analysis link. You can also click the Help button for specific details.

Obtaining the Data Analysis Link

When you do, Microsoft Excel Help shows you how to load the Analysis ToolPak, which is what I call the Data Analysis add-in, and gives more specific details on exactly how to install it.

The Regression Dialog Box

To run a regression, go to Data Analysis and select Regression; this dialog box pops up.

Filling in the Regression Dialog Box

Once you have the dialog box, fill it in as shown here. The Input Y Range is the range of cells containing your dependent variable Y, from cell C8 to cell C22. The Input X Range is the matrix of all of your independent variables, which must be in adjacent columns in your data set. In this case our two independent variables are X1 and X2; the first cell of that matrix is D8 and its lower right-hand corner is E22. Those are the addresses of our dependent and independent variables. Because the ranges we entered include the column headings, Y for the dependent variable and X1 and X2 for the independent variables, we check the box labeled “Labels”. We then indicate that we want a new worksheet, meaning the regression output will appear in a separate worksheet, which we name “regression all” to indicate that we have not yet taken any variables out. Finally, we want the residuals we have talked about, so we check the box labeled “Residuals”. With the cell addresses entered, the labels indicated, the output sheet named and the residuals requested, you can go to the next step.



Explanation of Dialog Box

We have summarized these steps on this slide. Once you have completed the dialog box as shown on the prior slide, click "OK".

BMR Output for Example

After you click "OK", you get the output shown in the main body of this module: the summary output, the ANOVA output, and the residual output.
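The headline numbers in the summary and ANOVA output can be reproduced from the actual Y values and the residuals. The sketch below is a hedged illustration, not Excel's internal code: `regression_summary` is a hypothetical helper, its dictionary keys only loosely follow Excel's output labels, and it assumes a model with an intercept and k independent variables whose residuals come from a least-squares fit (only then does SS Regression equal SS Total minus SS Residual).

```python
import math
from typing import Dict, Sequence


def regression_summary(y: Sequence[float], residuals: Sequence[float], k: int) -> Dict[str, float]:
    """Summary/ANOVA figures for an intercept model with k independent variables."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)    # total variation in Y
    ss_residual = sum(e ** 2 for e in residuals)    # variation left unexplained
    ss_regression = ss_total - ss_residual          # variation explained (least-squares fit assumed)
    df_reg, df_res = k, n - k - 1
    r2 = ss_regression / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_res        # penalizes extra independent variables
    ms_res = ss_residual / df_res                   # mean square residual
    return {
        "R Square": r2,
        "Adjusted R Square": adj_r2,
        "Standard Error": math.sqrt(ms_res),
        "SS Regression": ss_regression,
        "SS Residual": ss_residual,
        "F": (ss_regression / df_reg) / ms_res,
    }
```

For a no-intercept run, Excel computes R Square on a different basis, so this sketch should not be used to check that output.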

Example for Getting Correlation Matrix

To conduct test three, we needed a correlation matrix. To obtain one, open the Data Analysis menu that appears once the add-in is loaded, find the entry "Correlation" (marked by the arrow labeled "A"), and select it. This opens a dialog box titled "Correlation". First fill in the input range, which is the matrix containing all of your dependent and independent variables. The first cell of that matrix is C6 and its bottom right-hand corner is E20. As before, we included the column labels Y, X1 and X2 in the input range, so we check "Labels" to indicate that. Next we indicate that we want a new worksheet, which we name "correlation all". Once this information is filled in, click "OK".
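As a cross-check on the Correlation tool, the pairwise Pearson correlations can be computed directly. This is a minimal sketch with hypothetical helper names (`pearson`, `correlation_matrix`); it assumes the Y, X1 and X2 columns are passed in as equal-length lists. Unlike Excel, which prints only the lower triangle, it returns the full symmetric matrix. A large correlation between X1 and X2 is what the glossary calls multicollinearity.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Full symmetric matrix of pairwise correlations between named columns."""
    names: List[str] = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}
```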

Instructions for Getting Correlation Matrix

Shown here is a summary of the information we just covered, for reference.

Example Correlation Output

When you click "OK", you get the correlation matrix that we showed in the main body of the module.





class number one has an upper limit of about -56, the second class an upper limit of about -29, and so on up to the five intervals, or classes, that we wanted.
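The transcript does not show how the class upper limits were derived. One common approach, assumed here rather than confirmed by the source, is to split the range of the residuals into k equal-width classes. The helper name `class_upper_limits` is hypothetical.

```python
from typing import List, Sequence


def class_upper_limits(values: Sequence[float], k: int) -> List[float]:
    """Upper limits of k equal-width classes spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The last limit is the maximum itself, so rounding never leaves it out.
    return [lo + width * i for i in range(1, k)] + [hi]
```

With five classes, this returns five upper limits, which is the list you would type into the bin range.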

Example of Getting Histogram with Excel (Cont.)

To use this information in Excel, go to the Excel Data Analysis menu, choose Histogram, and enter the information in the Histogram dialog box. The input range is now the range of cell addresses for the residuals, from cell C3 to cell C17. The second entry is the bin range. In Excel, a bin is simply a class interval, and what Excel wants are the upper limits of those intervals.

Enter your list of upper limits with the label "Limit" at the top; the bin range then goes from B21, where the label is, down to B26.

We decided to place the output from this worksheet at an arbitrary location, so we pick cell F3 as the output range. Then, very importantly, make sure you ask for chart output: we checked the Chart Output box, marked "G".
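The frequency table the Histogram tool produces can be reproduced with a short counting routine. This sketch assumes Excel's usual bin convention: a value is counted in the first bin whose upper limit is greater than or equal to it, and values above the last limit go into a final "More" bin. The name `bin_frequencies` is hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple


def bin_frequencies(values: Sequence[float], upper_limits: Sequence[float]) -> List[Tuple[str, int]]:
    """Count values per bin; each bin is (previous limit, upper limit]."""
    limits = sorted(upper_limits)
    counts = [0] * (len(limits) + 1)  # one extra slot for the "More" bin
    for v in values:
        # bisect_left returns the index of the first limit >= v, i.e. v's bin.
        counts[bisect.bisect_left(limits, v)] += 1
    labels = [str(u) for u in limits] + ["More"]
    return list(zip(labels, counts))
```

Because the upper limits include the largest residual, the "More" bin should come out as zero for the residual data.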

Instructions for Getting Histogram with Excel (Cont.)

Shown here is a summary of the information we just covered, for reference.

Excel Histogram Output

The Excel Histogram function returns this output. Recall the data we calculated earlier; the histogram output gives us the frequencies and the histogram itself, that is, the graph of these frequencies.

Instructions for Generating Y vs Predicted Y Graph

The last Excel function we performed was to generate the Y versus predicted Y graph: not a plot of Y against predicted Y, but the graph that shows the actual and the forecasted values together. First obtain the predicted Y data from the "regression no INT" output and the Y data from the original data set. Copy both, paste them into a new area of the spreadsheet, and highlight both columns.





Then, on the Excel Insert tab, click Line and choose one of the options (I usually choose the first). This creates the actual versus predicted Y graph.

The purpose of this graph is to see how the estimates of Y obtained from the regression compare to the actual values.

Example of Actual vs. Predicted Plot

Here we have collected the predicted Y from the "regression no INT" output and the Y from the original data, placed them side by side on this spreadsheet, and highlighted them. Next, go to the Insert tab, which is next to the Home tab at the top of the Excel window, and look for the Line graph icon marked "B". Once you click it, you can easily get this graph, which gives you a visual of how well your regression predicted your actual values.
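If you prefer to assemble the two series outside Excel, the sketch below pairs each actual Y with its predicted value from a no-intercept model (prediction = sum of coefficient times X, with no constant term) and writes them as CSV, ready to open in Excel and chart with Insert, Line. The helper names `actual_vs_predicted` and `to_csv` and the column headings are our own assumptions; the coefficients are whatever your "regression no INT" output reports.

```python
import csv
import io
from typing import List, Sequence, Tuple


def actual_vs_predicted(y: Sequence[float], x_rows: Sequence[Sequence[float]],
                        coefs: Sequence[float]) -> List[Tuple[int, float, float]]:
    """Pair each actual Y with its fitted value from a no-intercept model."""
    rows = []
    for period, (actual, xs) in enumerate(zip(y, x_rows), start=1):
        # No constant term: the prediction is the dot product of coefficients and X values.
        predicted = sum(b * x for b, x in zip(coefs, xs))
        rows.append((period, actual, predicted))
    return rows


def to_csv(rows: Sequence[Tuple[int, float, float]]) -> str:
    """Render the actual and predicted series side by side as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Period", "Actual Y", "Predicted Y"])
    writer.writerows(rows)
    return buf.getvalue()
```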

Course Summary

In this tutorial we reviewed the importance of demand forecasting and the steps involved in using multiple regression to create a demand forecast. We used the Solver Add-in for Microsoft Excel to demonstrate how to create a demand forecast using multiple regression.



Glossary


Residual

The difference between actual demand and forecasted demand.

Sum of Squares

The sum of the squared residuals over all data periods. It is the measure of forecast "goodness" in multiple regression.

Multiple Regression

A statistical method that models a dependent variable, such as demand, as a linear function of two or more independent variables, with regression coefficients chosen to minimize the sum of squared residuals.

Regression Coefficients

The coefficients associated with each term of a regression model; they minimize the sum of squared residuals and are obtained from the regression analysis.

Robust Model

A regression model that is statistically significant (passes all 5 tests for robustness)

Adjusted R Square

Indicates the percentage of the variability of the dependent variable explained by the regression model, adjusted for the number of independent variables in the model.

Multicollinearity

The existence of significant correlation between independent variables that were previously hypothesized for the model.

t-test

A test used in multiple regression to determine whether each regression coefficient is significantly different from zero.

Expected Forecast

The value of the dependent variable as computed from the regression equation.

Upper Confidence Limit of a Forecast

Together with the lower confidence limit, it forms an interval within which the true value of the forecast will fall with a predetermined probability (95% in our example).
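For reference, the glossary terms correspond to the standard formulas below, written for n data periods and k independent variables in a model with an intercept. Here b_j is a regression coefficient, s_{b_j} its standard error (Excel's "Standard Error" column in the coefficient table), and s_e the standard error of the regression. The last line is the common approximate form of the forecast confidence limits; the exact prediction interval for a new period adds a term that depends on that period's X values, and the module's own formula is not reproduced here.

```latex
\begin{aligned}
e_t &= Y_t - \hat{Y}_t \\
SS_E &= \sum_{t=1}^{n} e_t^{2} \\
R^{2} &= 1 - \frac{SS_E}{SS_T}, \qquad SS_T = \sum_{t=1}^{n} \left(Y_t - \bar{Y}\right)^{2} \\
R^{2}_{\text{adj}} &= 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - k - 1} \\
t_j &= \frac{b_j}{s_{b_j}} \\
\text{Lower, Upper limit} &\approx \hat{Y} \mp t_{\alpha/2,\; n-k-1}\, s_e
\end{aligned}
```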



