Business Analysis- Causal Models and Regression Analysis
Transcript of Business Analysis- Causal Models and Regression Analysis
-
8/3/2019 Business Analysis- Causal Models and Regression Analysis
1/36
Causal Models and
Regression Analysis
Chapter 13
Forecasting
-
In a causal forecasting model, the forecast for the quantity of interest rides piggyback on another quantity or set of quantities.
In other words, our knowledge of the value of one variable (or perhaps several variables) enables us to forecast the value of another variable.
In this model, let $y$ denote the true value of some variable of interest and $\hat{y}$ denote a predicted or forecast value for that variable.
-
Then, in a causal model,
$$\hat{y} = f(x_1, x_2, \ldots, x_n)$$
where $f$ is a forecasting rule, or function, and $x_1, x_2, \ldots, x_n$ is a set of variables.
In this representation, the $x$ variables are often called independent variables, whereas $\hat{y}$ is the dependent or response variable.
We either know the independent variables in advance or can forecast them more easily than $y$.
Then the independent variables will be used in the forecasting model to forecast the dependent variable.
-
Companies often find by looking at past performance that their monthly sales are directly related to the monthly GDP, and thus figure that a good forecast could be made using next month's GDP figure.
The only problem is that this quantity is not known, or it may just be a forecast and thus not a truly independent variable.
Using a causal forecasting model requires two conditions:
1. There must be a relationship between values of the independent and dependent variables such that the former provides information about the latter.
-
The company plans to use traffic flow (measured in the average number of cars per hour) to forecast sales (measured in average dollar sales per hour).
The firm has had five stations in operation for more than a year and has used historical data to calculate the following averages:
-
The averages are plotted in a scatter diagram.
[Figure: scatter diagram of Sales/hour ($), from $0 to $300, against Cars/hour, from 0 to 250.]
-
Now, these data will be used to construct a function that will be used to forecast sales at any proposed location by measuring the traffic flow at that location and plugging its value into the constructed function.
Least Squares Fits
The method of least squares is a formal procedure for curve fitting. It is a two-step process.
1. Select a specific functional form (e.g., a straight line or quadratic curve).
2. Within the set of functions specified in step 1, choose the specific function that minimizes the sum of the squared deviations between the data points and the function values.
-
To demonstrate the process, consider the sales-traffic flow example.
1. Assume a straight line; that is, functions of the form $y = a + bx$.
2. Draw the line in the scatter diagram and indicate the deviations between observed points and the function as $d_i$.
For example,
$$d_1 = y_1 - [a + b x_1] = 220 - [a + 150b]$$
where
$y_1$ = actual sales/hr at location 1
$x_1$ = actual traffic flow at location 1
$a$ = y-axis intercept for the function
$b$ = slope for the function
-
The value $d_1^2$ is one measure of how close the value of the function $[a + b x_1]$ is to the observed value, $y_1$; that is, it indicates how well the function fits at this one point.
[Figure: scatter diagram of Sales/hour ($) against Cars/hour with the line $y = a + bx$ drawn and the deviations $d_1, \ldots, d_5$ marked.]
-
One measure of how well the function fits overall is the sum of the squared deviations:
$$\sum_{i=1}^{5} d_i^2$$
Consider a general model with $n$ as opposed to five observations. Since each $d_i = y_i - (a + b x_i)$, the sum of the squared deviations can be written as:
$$\sum_{i=1}^{n} (y_i - [a + b x_i])^2$$
Using the method of least squares, select $a$ and $b$ so as to minimize the sum in the equation above.
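As a rough illustration of the quantity being minimized, the Python sketch below evaluates the sum of squared deviations for two candidate lines. The table of station averages is not reproduced in this transcript, so the five (cars/hour, sales/hour) pairs are assumed values, chosen only to agree with the point (150, 220) and the fitted line quoted in these slides. The function name and the second candidate line are hypothetical.

```python
# Assumed station averages (cars/hour, sales/hour). Only the first pair,
# (150, 220), is quoted in the slides; the others are illustrative values.
cars = [150, 55, 220, 130, 95]
sales = [220, 75, 250, 145, 200]


def sum_squared_deviations(a, b, xs, ys):
    """Return the sum of d_i^2, where d_i = y_i - (a + b * x_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Candidate 1: the intercept and slope reported in the slides.
sse_reported = sum_squared_deviations(57.104, 0.92997, cars, sales)
# Candidate 2: an arbitrary line through the origin, for comparison.
sse_other = sum_squared_deviations(0.0, 1.5, cars, sales)

print(f"a = 57.104, b = 0.92997: sum of squares = {sse_reported:.1f}")
print(f"a = 0,      b = 1.5:     sum of squares = {sse_other:.1f}")
```

Under these assumptions the reported line gives the smaller sum, which is what the least squares criterion asks for.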
-
Now, take the partial derivative of the sum with respect to $a$ and set the resulting expression equal to zero.
$$\sum_{i=1}^{n} -2\,(y_i - [a + b x_i]) = 0$$
A second equation is derived by following the same procedure with $b$.
$$\sum_{i=1}^{n} -2\,x_i\,(y_i - [a + b x_i]) = 0$$
Recall that the values for $x_i$ and $y_i$ are the observations, and our goal is to find the values of $a$ and $b$ that satisfy these two equations.
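As a sketch of the intermediate algebra (a step the slides do not show), dividing each condition by $-2$ and separating the sums gives two linear equations in $a$ and $b$, often called the normal equations:

$$\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$$

Solving the first equation for $a$ and substituting into the second leaves a single equation in $b$.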
-
The solution is:
$$b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}$$
$$a = \frac{1}{n} \sum_{i=1}^{n} y_i - b \, \frac{1}{n} \sum_{i=1}^{n} x_i$$
The next step is to determine the values for:
$$\sum_{i=1}^{n} x_i, \quad \sum_{i=1}^{n} x_i^2, \quad \sum_{i=1}^{n} y_i, \quad \sum_{i=1}^{n} x_i y_i$$
Note that these quantities depend only on observed data and can be found with simple arithmetic operations or automatically using Excel's predefined functions.
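The same arithmetic can be sketched in Python instead of Excel. The data table is missing from this transcript, so the five pairs below are assumed values consistent with the point (150, 220) quoted in the slides; `least_squares_line` is a hypothetical helper name.

```python
# Assumed station averages (cars/hour, sales/hour); illustrative values only.
cars = [150, 55, 220, 130, 95]
sales = [220, 75, 250, 145, 200]


def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    # The four sums that the closed-form solution depends on.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares_line(cars, sales)
print(f"a = {a:.3f}, b = {b:.5f}")
```

With these assumed inputs the result agrees, to the digits shown, with the intercept and slope that the slides report from Excel (57.104 and 0.92997).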
-
Using Excel, click on Tools > Data Analysis.
In the resulting dialog, choose Regression.
-
In the Regression dialog, enter the Y-range and X-range.
Choose to place the output in a new worksheet called Results.
Select Residual Plots and Normal Probability Plots to be created along with the output.
-
Click OK to produce the following results:
Note that $a$ (Intercept) and $b$ (X Variable 1) are reported as 57.104 and 0.92997, respectively.
-
To add the resulting least squares line, first click on the worksheet Chart 1, which contains the original scatter plot.
Next, click on the data series so that they are highlighted and then choose Add Trendline from the Chart pull-down menu.
-
Choose Linear Trend in the resulting dialog and click OK.
-
A linear trend is fit to the data:
[Figure: scatter diagram of Sales/hour ($) against Cars/hour showing the data (Series1) and the fitted trendline, Linear (Series1).]
-
One of the other summary output values that is given in Excel is: R Square = 69.4%
This is a goodness of fit measure which represents the $R^2$ statistic discussed in introductory statistics classes.
$R^2$ ranges in value from 0 to 1 and gives an indication of how much of the total variation in $Y$ from its mean is explained by the new trend line.
In fact, there are three different sums of errors:
TSS (Total Sum of Squares)
ESS (Error Sum of Squares)
RSS (Regression Sum of Squares)
-
The basic relationship between them is:
$$TSS = ESS + RSS$$
They are defined as follows:
$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
$$ESS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
$$RSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
Essentially, the ESS is the amount of variation that can't be explained by the regression.
The RSS quantity is effectively the amount of the original, total variation (TSS) that could be removed using the regression line.
-
$R^2$ is defined as:
$$R^2 = \frac{RSS}{TSS}$$
If the regression line fits perfectly, then ESS = 0 and RSS = TSS, resulting in $R^2 = 1$.
In this example, $R^2 = 0.694$, which means that approximately 70% of the variation in the $Y$ values is explained by the one explanatory variable ($X$), cars per hour.
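The three sums of squares can be checked with a few lines of Python. This is a sketch under assumptions: the data table is missing from this transcript, so the five pairs below are assumed values consistent with the point (150, 220) quoted in the slides, and the intercept and slope are the rounded values reported from Excel.

```python
# Assumed station averages (cars/hour, sales/hour); illustrative values only.
cars = [150, 55, 220, 130, 95]
sales = [220, 75, 250, 145, 200]

# Intercept and slope as reported in the slides (rounded).
a, b = 57.104, 0.92997

mean_sales = sum(sales) / len(sales)
fitted = [a + b * x for x in cars]

tss = sum((y - mean_sales) ** 2 for y in sales)           # total variation
ess = sum((y - f) ** 2 for y, f in zip(sales, fitted))    # unexplained
rss = sum((f - mean_sales) ** 2 for f in fitted)          # explained

r_squared = rss / tss
print(f"TSS = {tss:.1f}, ESS = {ess:.1f}, RSS = {rss:.1f}")
print(f"R^2 = {r_squared:.3f}")
```

Because the coefficients are rounded, TSS equals ESS + RSS only approximately here; with these assumed inputs $R^2$ comes out close to the reported 0.694.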
-
Now, returning to the original question: Should we build a station at Buffalo Grove where traffic is 183 cars/hour?
The best guess at what the corresponding sales volume would be is found by placing this $X$ value into the new regression equation:
$$\hat{y} = a + b \cdot x$$
Sales/hour = 57.104 + 0.92997 * (183 cars/hour) = $227.29
However, it would be nice to be able to state a 95% confidence interval around this best guess.
-
We can get the information to do this from Excel's Summary Output.
Excel reports that the standard error ($S_e$) is 44.18.
This quantity represents the amount of scatter in the actual data around the regression line.
The formula for $S_e$ is:
$$S_e = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k - 1}}$$
where $n$ is the number of data points (e.g., 5) and $k$ is the number of independent variables (e.g., 1).
-
This equation is equivalent to:
$$S_e = \sqrt{\frac{ESS}{n - k - 1}}$$
Once we know $S_e$, and based on the normal distribution, we can state that:
We have 68% confidence that the actual value of sales/hour is within $\pm 1\,S_e$ of the predicted value ($227.29).
We have 95% confidence that the actual value of sales/hour is within $\pm 2\,S_e$ of the predicted value ($227.29).
The 95% confidence interval is:
[227.29 - 2(44.18); 227.29 + 2(44.18)]
[$138.93; $315.65]
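The standard error and the two-standard-error interval can be reproduced with the short sketch below. It rests on assumptions: the five data pairs are assumed values (the data table is missing from this transcript) consistent with the point (150, 220) quoted in the slides, and the coefficients are the rounded values reported from Excel.

```python
import math

# Assumed station averages (cars/hour, sales/hour); illustrative values only.
cars = [150, 55, 220, 130, 95]
sales = [220, 75, 250, 145, 200]

a, b = 57.104, 0.92997   # intercept and slope as reported in the slides
n, k = len(cars), 1      # five data points, one independent variable

# S_e = sqrt(ESS / (n - k - 1))
ess = sum((y - (a + b * x)) ** 2 for x, y in zip(cars, sales))
s_e = math.sqrt(ess / (n - k - 1))

# Forecast for the proposed site at 183 cars/hour, plus or minus 2 S_e.
forecast = a + b * 183
lower, upper = forecast - 2 * s_e, forecast + 2 * s_e

print(f"S_e = {s_e:.2f}")
print(f"forecast = {forecast:.2f}, interval = [{lower:.2f}; {upper:.2f}]")
```

The upper limit may differ from the slide's $315.65 by about a cent, because the slide works from the already rounded values 227.29 and 44.18.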
-
Another value of interest in the Summary report is the t-statistic for the $X$ variable and its associated values.
The t-statistic is 2.61 and the P-value is 0.0798.
A P-value less than 0.05 represents that we have at least 95% confidence that the slope parameter ($b$) is statistically significantly different than 0 (zero).
A slope of 0 results in a flat trend line and indicates no relationship between $Y$ and $X$.
The 95% confidence limit for $b$ is [-0.205; 2.064].
Thus, we can't exclude the possibility that the true value of $b$ might be 0, which is consistent with the P-value of 0.0798 exceeding 0.05.
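The slides quote these numbers from Excel without showing how they arise. The sketch below uses the textbook simple-regression formulas, which are an assumption about what Excel computes here: the standard error of the slope is $S_e / \sqrt{\sum (x_i - \bar{x})^2}$, and the interval uses the two-sided 95% critical value of Student's t with $n - 2 = 3$ degrees of freedom (3.182, a table value). The five data pairs are assumed values, since the data table is missing from this transcript.

```python
import math

# Assumed station averages (cars/hour, sales/hour); illustrative values only.
cars = [150, 55, 220, 130, 95]
sales = [220, 75, 250, 145, 200]

a, b = 57.104, 0.92997   # intercept and slope as reported in the slides
n = len(cars)

# Scatter of the data around the line (S_e) with n - 2 degrees of freedom.
ess = sum((y - (a + b * x)) ** 2 for x, y in zip(cars, sales))
s_e = math.sqrt(ess / (n - 2))

# Standard error of the slope and its t-statistic.
mean_x = sum(cars) / n
s_xx = sum((x - mean_x) ** 2 for x in cars)
se_b = s_e / math.sqrt(s_xx)
t_stat = b / se_b

# Table value: two-sided 95% critical point of Student's t with 3 df.
T_CRIT = 3.182
ci_low, ci_high = b - T_CRIT * se_b, b + T_CRIT * se_b

print(f"t = {t_stat:.2f}")
print(f"95% interval for b: [{ci_low:.3f}; {ci_high:.3f}]")
```

The interval straddles zero, which matches the conclusion that a zero slope cannot be excluded.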
-
Also given in the Summary report is the F significance. Since there is only one independent variable, the F significance is identical to the P-value for the t-statistic.
In the case of more than one $X$ variable, the F significance tests the hypothesis that all the $X$ variable parameters as a group are statistically significantly different than zero.
-
Concerning multiple regression models, as you add other $X$ variables, the $R^2$ statistic will never decrease and will almost always increase, meaning the RSS has increased.
In this case, the Adjusted $R^2$ statistic is a more reliable indicator of the true goodness of fit because it compensates for the reduction in the ESS due to the addition of more independent variables.
Thus, it may report a decreased adjusted $R^2$ value even though $R^2$ has increased, unless the improvement in RSS more than compensates for the addition of the new independent variables.
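The slides do not state the adjustment formula. A common definition, assumed here, is $1 - (1 - R^2)(n - 1)/(n - k - 1)$, with $n$ data points and $k$ independent variables. In the sketch below, the second model (two variables, $R^2 = 0.70$) is purely hypothetical and is included only to show the penalty at work.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# The one-variable model from the slides: R^2 = 0.694 with n = 5, k = 1.
one_variable = adjusted_r_squared(0.694, n=5, k=1)

# Hypothetical: a second variable nudges R^2 up to 0.70 (n = 5, k = 2).
two_variables = adjusted_r_squared(0.70, n=5, k=2)

print(f"one variable:  adjusted R^2 = {one_variable:.3f}")
print(f"two variables: adjusted R^2 = {two_variables:.3f}")
```

In this made-up comparison the raw $R^2$ rises slightly while the adjusted value falls, because the small gain in fit does not offset the lost degree of freedom.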
-
One must proceed with caution when fitting data with a polynomial function.
For example, it is possible to find a $(k - 1)$-degree polynomial that will perfectly fit $k$ data points.
To be more specific, suppose we have seven historical observations, denoted
$$(x_i, y_i), \quad i = 1, 2, \ldots, 7$$
It is possible to find a sixth-degree polynomial
$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_6 x^6$$
that exactly passes through each of these seven data points.
-
A perfect fit gives zero for the sum of squared deviations.
However, this is deceptive, for it does not imply much about the predictive value of the model for use in future forecasting.
-
Despite the perfect fit of the polynomial function, the forecast is very inaccurate. The linear fit might provide more realistic forecasts.
Also, note that the polynomial fit has hazardous extrapolation properties (i.e., the polynomial blows up at its extremes).
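The effect can be demonstrated with a small Python sketch. The seven observations are hypothetical (the data behind the slide's chart is not in this transcript), and the Lagrange form is just one convenient way to evaluate the sixth-degree polynomial through seven points; the slides do not say how their curve was computed.

```python
# Hypothetical history: seven periods with a roughly linear upward drift.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 4, 5, 4, 6, 8, 7]


def interpolating_polynomial(x, xs, ys):
    """Evaluate the degree-(k-1) polynomial through all k points (Lagrange form)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        term = float(yj)
        for m, xm in enumerate(xs):
            if m != j:
                term *= (x - xm) / (xj - xm)
        total += term
    return total


def linear_fit(x, xs, ys):
    """Evaluate the least squares line a + b*x fitted to the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(xs, ys))
    s_xx = sum((u - mean_x) ** 2 for u in xs)
    b = s_xy / s_xx
    return (mean_y - b * mean_x) + b * x


# The polynomial reproduces every historical point exactly...
max_error_on_history = max(
    abs(interpolating_polynomial(x, xs, ys) - y) for x, y in zip(xs, ys)
)
# ...but its forecast one period ahead is far outside the observed range.
poly_forecast = interpolating_polynomial(8, xs, ys)
line_forecast = linear_fit(8, xs, ys)

print(f"largest error on history: {max_error_on_history:.2e}")
print(f"period 8 forecast: polynomial {poly_forecast:.1f}, line {line_forecast:.1f}")
```

For this made-up history, whose values all lie between 2 and 8, the polynomial forecasts about 30 for period 8, while the straight line forecasts a value between 8 and 9.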
-
Correlation Coefficient and
Coefficient of Determination
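Coefficient of determination = $r^2$.
Correlation coefficient = $r$.
$$r = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[ n \sum X_i^2 - \left( \sum X_i \right)^2 \right] \left[ n \sum Y_i^2 - \left( \sum Y_i \right)^2 \right]}}$$
Where: $Y_i$ = dependent variable.
$X_i$ = independent variable.
$n$ = number of observations.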
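As a quick check of the correlation formula, the Python sketch below computes $r$ and $r^2$ for the sales and traffic example. The five data pairs are assumed values (the data table is missing from this transcript), consistent with the point (150, 220) quoted in the slides.

```python
import math

# Assumed station averages (cars/hour, sales/hour); illustrative values only.
cars = [150, 55, 220, 130, 95]
sales = [220, 75, 250, 145, 200]

n = len(cars)
sum_x, sum_y = sum(cars), sum(sales)
sum_xy = sum(x * y for x, y in zip(cars, sales))
sum_xx = sum(x * x for x in cars)
sum_yy = sum(y * y for y in sales)

# Correlation coefficient from the raw sums.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
)

print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

With these assumed inputs, $r^2$ agrees with the R Square value of about 0.694 reported earlier, as expected for a regression with a single independent variable.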
-
Summary: Causal Forecasting Models
The goal of a causal forecasting model is to develop the best statistical relationship between a dependent variable and one or more independent variables.
The most common model approach used in practice is regression analysis. Only linear regression models are examined in this course.
In causal forecasting models, when one tries to predict a dependent variable using a single independent variable, it is called a simple regression model.
When one uses more than one independent variable to forecast the dependent variable, it is called a multiple regression model.