Business Analysis- Causal Models and Regression Analysis


  • 8/3/2019 Business Analysis- Causal Models and Regression Analysis

    1/36

    Causal Models and Regression Analysis

    Chapter 13

    Forecasting

    In a causal forecasting model, the forecast for the quantity of interest rides piggyback on another quantity or set of quantities.

    In other words, our knowledge of the value of one variable (or perhaps several variables) enables us to forecast the value of another variable.

    In this model, let \(y\) denote the true value of some variable of interest and \(\hat{y}\) denote a predicted or forecast value for that variable.

    Then, in a causal model,

    \[ \hat{y} = f(x_1, x_2, \ldots, x_n) \]

    where \(f\) is a forecasting rule, or function, and \(x_1, x_2, \ldots, x_n\) is a set of variables.

    In this representation, the \(x\) variables are often called independent variables, whereas \(\hat{y}\) is the dependent or response variable.

    We either know the independent variables in advance or can forecast them more easily than \(y\).

    Then the independent variables will be used in the forecasting model to forecast the dependent variable.

    Companies often find by looking at past performance that their monthly sales are directly related to the monthly GDP, and thus figure that a good forecast could be made using next month's GDP figure.

    The only problem is that this quantity is not known, or it may just be a forecast and thus not a truly independent variable.

    Using a causal forecasting model requires two conditions:

    1. There must be a relationship between values of the independent and dependent variables such that the former provides information about the latter.

    The company plans to use traffic flow (measured in the average number of cars per hour) to forecast sales (measured in average dollar sales per hour).

    The firm has had five stations in operation for more than a year and has used historical data to calculate the following averages:

    The averages are plotted in a scatter diagram.

    [Figure: scatter diagram of Sales/hour ($) versus Cars/hour]

    Now, these data will be used to construct a function that will be used to forecast sales at any proposed location by measuring the traffic flow at that location and plugging its value into the constructed function.

    Least Squares Fits

    The method of least squares is a formal procedure for curve fitting. It is a two-step process.

    1. Select a specific functional form (e.g., a straight line or quadratic curve).

    2. Within the set of functions specified in step 1, choose the specific function that minimizes the sum of the squared deviations between the data points and the function values.

    To demonstrate the process, consider the sales-traffic flow example.

    1. Assume a straight line; that is, functions of the form \(y = a + bx\).

    2. Draw the line in the scatter diagram and indicate the deviations between observed points and the function as \(d_i\).

    For example,

    \[ d_1 = y_1 - [a + b x_1] = 220 - [a + 150b] \]

    where
    \(y_1\) = actual sales/hr at location 1
    \(x_1\) = actual traffic flow at location 1
    \(a\) = y-axis intercept for the function
    \(b\) = slope for the function

    The value \(d_1^2\) is one measure of how close the value of the function \([a + b x_1]\) is to the observed value, \(y_1\); that is, it indicates how well the function fits at this one point.

    [Figure: scatter diagram of Sales/hour ($) versus Cars/hour, showing the line \(y = a + bx\) and the deviations \(d_1, \ldots, d_5\)]

    One measure of how well the function fits overall is the sum of the squared deviations:

    \[ \sum_{i=1}^{5} d_i^2 \]

    Consider a general model with \(n\) as opposed to five observations. Since each \(d_i = y_i - (a + b x_i)\), the sum of the squared deviations can be written as:

    \[ \sum_{i=1}^{n} \left( y_i - [a + b x_i] \right)^2 \]

    Using the method of least squares, select \(a\) and \(b\) so as to minimize the sum in the equation above.
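As a rough illustration of the quantity being minimized, the sketch below evaluates the sum of squared deviations for two arbitrary candidate lines. The five (cars/hour, sales/hour) pairs are an assumption: the data table did not survive in this transcript, and only the first pair (150, 220) appears in the text. The function name `sum_squared_deviations` is hypothetical.

```python
# Assumed data (not in the transcript): five stations.
# Only the first pair (150 cars/hour, $220/hour) is confirmed by the text.
cars = [150, 55, 220, 130, 95]      # x_i: average cars per hour
sales = [220, 75, 250, 145, 200]    # y_i: average dollar sales per hour


def sum_squared_deviations(a, b, xs, ys):
    """Return the sum of d_i^2, where d_i = y_i - (a + b * x_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Two arbitrary candidate lines y = a + b*x; the smaller sum fits better.
sse_sloped = sum_squared_deviations(50.0, 1.0, cars, sales)
sse_flat = sum_squared_deviations(178.0, 0.0, cars, sales)

print(f"a = 50,  b = 1: sum of squared deviations = {sse_sloped:.1f}")
print(f"a = 178, b = 0: sum of squared deviations = {sse_flat:.1f}")
```

Least squares picks, among all straight lines, the one for which this sum is smallest.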

    Now, take the partial derivative of the sum with respect to \(a\) and set the resulting expression equal to zero.

    \[ \sum_{i=1}^{n} -2 \left( y_i - [a + b x_i] \right) = 0 \]

    A second equation is derived by following the same procedure with \(b\).

    \[ \sum_{i=1}^{n} -2 x_i \left( y_i - [a + b x_i] \right) = 0 \]

    Recall that the values for \(x_i\) and \(y_i\) are the observations, and our goal is to find the values of \(a\) and \(b\) that satisfy these two equations.
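The two conditions can be treated as a pair of linear equations in a and b. The sketch below solves them with Cramer's rule and then checks that both partial derivatives vanish. The data values are an assumption (the data table is missing from this transcript; only the pair (150, 220) is given in the text), and the variable names are illustrative.

```python
# Assumed data (not in the transcript); only (150, 220) is confirmed.
cars = [150, 55, 220, 130, 95]      # x_i
sales = [220, 75, 250, 145, 200]    # y_i

n = len(cars)
sum_x = sum(cars)
sum_y = sum(sales)
sum_xx = sum(x * x for x in cars)
sum_xy = sum(x * y for x, y in zip(cars, sales))

# Dividing each condition by -2 and rearranging gives:
#   n * a     + sum_x * b  = sum_y
#   sum_x * a + sum_xx * b = sum_xy
det = n * sum_xx - sum_x * sum_x
a = (sum_y * sum_xx - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

# Check: both partial derivatives should be (numerically) zero.
d_da = sum(-2 * (y - (a + b * x)) for x, y in zip(cars, sales))
d_db = sum(-2 * x * (y - (a + b * x)) for x, y in zip(cars, sales))

print(f"a = {a:.3f}, b = {b:.5f}")
print(f"d/da = {d_da:.2e}, d/db = {d_db:.2e}")
```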

    The solution is:

    \[ b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2} \]

    \[ a = \frac{1}{n} \sum_{i=1}^{n} y_i - b \, \frac{1}{n} \sum_{i=1}^{n} x_i \]

    The next step is to determine the values for:

    \[ \sum_{i=1}^{n} x_i, \quad \sum_{i=1}^{n} x_i^2, \quad \sum_{i=1}^{n} y_i, \quad \sum_{i=1}^{n} x_i y_i \]

    Note that these quantities depend only on observed data and can be found with simple arithmetic operations or automatically using Excel's predefined functions.
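The same arithmetic can be sketched outside Excel. The block below computes the four sums and plugs them into the closed-form expressions for b and a. The data values are an assumption, since the data table is not in this transcript and only the pair (150, 220) is given in the text.

```python
# Assumed data (not in the transcript); only (150, 220) is confirmed.
cars = [150, 55, 220, 130, 95]      # x_i: average cars per hour
sales = [220, 75, 250, 145, 200]    # y_i: average dollar sales per hour

n = len(cars)
sum_x = sum(cars)
sum_y = sum(sales)
sum_xx = sum(x ** 2 for x in cars)
sum_xy = sum(x * y for x, y in zip(cars, sales))

# Closed-form least squares solution for the line y = a + b*x.
b = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

print(f"sum x = {sum_x}, sum x^2 = {sum_xx}")
print(f"sum y = {sum_y}, sum xy = {sum_xy}")
print(f"intercept a = {a:.3f}, slope b = {b:.5f}")
```

With these assumed values the intercept and slope round to 57.104 and 0.92997.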

    Using Excel, click on Tools, then Data Analysis. In the resulting dialog, choose Regression.

    In the Regression dialog, enter the Y-range and X-range.

    Choose to place the output in a new worksheet called Results.

    Select Residual Plots and Normal Probability Plots to be created along with the output.

    Click OK to produce the following results:

    Note that \(a\) (Intercept) and \(b\) (X Variable 1) are reported as 57.104 and 0.92997, respectively.

    To add the resulting least squares line, first click on the worksheet Chart 1, which contains the original scatter plot.

    Next, click on the data series so that they are highlighted and then choose Add Trendline from the Chart pull-down menu.

    Choose Linear Trend in the resulting dialog and click OK.

    A linear trend is fit to the data:

    [Figure: scatter diagram of Sales/hour ($) versus Cars/hour, with the data series (Series1) and the fitted line (Linear (Series1))]

    One of the other summary output values that is given in Excel is: R Square = 69.4%

    This is a goodness of fit measure which represents the \(R^2\) statistic discussed in introductory statistics classes.

    \(R^2\) ranges in value from 0 to 1 and gives an indication of how much of the total variation in \(Y\) from its mean is explained by the new trend line.

    In fact, there are three different sums of errors:

    TSS (Total Sum of Squares)

    ESS (Error Sum of Squares)

    RSS (Regression Sum of Squares)

    The basic relationship between them is:

    \[ TSS = ESS + RSS \]

    They are defined as follows:

    \[ TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \]

    \[ ESS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]

    \[ RSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \]

    Essentially, the ESS is the amount of variation that can't be explained by the regression.

    The RSS quantity is effectively the amount of the original, total variation (TSS) that could be removed using the regression line.

    \(R^2\) is defined as:

    \[ R^2 = \frac{RSS}{TSS} \]

    If the regression line fits perfectly, then ESS = 0 and RSS = TSS, resulting in \(R^2 = 1\).

    In this example, \(R^2 = .694\), which means that approximately 70% of the variation in the \(Y\) values is explained by the one explanatory variable (\(X\)), cars per hour.
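A minimal sketch of the three sums of squares and the resulting R-squared follows. The data values are an assumption (the data table is missing from this transcript; only the pair (150, 220) is given in the text), and the helper name `fit_line` is hypothetical.

```python
# Assumed data (not in the transcript); only (150, 220) is confirmed.
cars = [150, 55, 220, 130, 95]      # X_i
sales = [220, 75, 250, 145, 200]    # Y_i


def fit_line(xs, ys):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


a, b = fit_line(cars, sales)
mean_y = sum(sales) / len(sales)
fitted = [a + b * x for x in cars]          # the Y-hat values

tss = sum((y - mean_y) ** 2 for y in sales)
ess = sum((y - f) ** 2 for y, f in zip(sales, fitted))
rss = sum((f - mean_y) ** 2 for f in fitted)
r_squared = rss / tss

print(f"TSS = {tss:.1f}, ESS = {ess:.1f}, RSS = {rss:.1f}")
print(f"R^2 = {r_squared:.3f}")
```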

    Now, returning to the original question: Should we build a station at Buffalo Grove where traffic is 183 cars/hour?

    The best guess at what the corresponding sales volume would be is found by placing this \(X\) value into the new regression equation:

    \[ \hat{y} = a + b x \]

    Sales/hour = 57.104 + 0.92997 * (183 cars/hour) = $227.29

    However, it would be nice to be able to state a 95% confidence interval around this best guess.
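The point forecast can be checked with a few lines of code. The coefficients are the ones reported above; the function name `forecast_sales` is hypothetical.

```python
INTERCEPT = 57.104   # a, as reported in the regression output
SLOPE = 0.92997      # b, as reported in the regression output


def forecast_sales(cars_per_hour):
    """Predicted average sales/hour for a given traffic flow."""
    return INTERCEPT + SLOPE * cars_per_hour


buffalo_grove = forecast_sales(183)
print(f"Forecast sales/hour at 183 cars/hour: ${buffalo_grove:.2f}")
```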

    We can get the information to do this from Excel's Summary Output.

    Excel reports that the standard error (\(S_e\)) is 44.18.

    This quantity represents the amount of scatter in the actual data around the regression line.

    The formula for \(S_e\) is:

    \[ S_e = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k - 1}} \]

    where \(n\) is the number of data points (e.g., 5) and \(k\) is the number of independent variables (e.g., 1).

    This equation is equivalent to:

    \[ S_e = \sqrt{\frac{ESS}{n - k - 1}} \]

    Once we know \(S_e\) and based on the normal distribution, we can state that

    We have 68% confidence that the actual value of sales/hour is within ± 1 \(S_e\) of the predicted value ($227.29).

    We have 95% confidence that the actual value of sales/hour is within ± 2 \(S_e\) of the predicted value ($227.29).

    The 95% confidence interval is:

    [227.29 - 2(44.18); 227.29 + 2(44.18)]

    [$138.93; $315.65]
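The standard error and the plus-or-minus two standard error band can be sketched as follows. The data values are an assumption (the data table is missing from this transcript; only the pair (150, 220) is given in the text), and the helper name `fit_line` is hypothetical.

```python
import math

# Assumed data (not in the transcript); only (150, 220) is confirmed.
cars = [150, 55, 220, 130, 95]
sales = [220, 75, 250, 145, 200]


def fit_line(xs, ys):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


a, b = fit_line(cars, sales)
n = len(cars)   # number of data points
k = 1           # number of independent variables

ess = sum((y - (a + b * x)) ** 2 for x, y in zip(cars, sales))
s_e = math.sqrt(ess / (n - k - 1))

forecast = a + b * 183                 # Buffalo Grove, 183 cars/hour
lower = forecast - 2 * s_e
upper = forecast + 2 * s_e

print(f"S_e = {s_e:.2f}")
print(f"Approximate 95% interval: [{lower:.2f}; {upper:.2f}]")
```

The endpoints can differ from the hand calculation in the last digit, because the hand calculation rounds the forecast and the standard error before combining them.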

    Another value of interest in the Summary report is the t-statistic for the \(X\) variable and its associated values.

    The t-statistic is 2.61 and the P-value is 0.0798.

    A P-value less than 0.05 represents that we have at least 95% confidence that the slope parameter (\(b\)) is statistically significantly different from 0 (zero).

    A slope of 0 results in a flat trend line and indicates no relationship between \(Y\) and \(X\).

    The 95% confidence limit for \(b\) is [-0.205; 2.064].

    Thus, we can't exclude the possibility that the true value of \(b\) might be 0.
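These slope statistics can be reproduced by hand, as sketched below. Several things here are assumptions and not part of the slides: the data values (the data table is missing from this transcript; only the pair (150, 220) is given in the text), the tabulated critical value 3.182 for a two-sided 95% interval with 3 degrees of freedom, and the closed-form Student t distribution function, which holds only for 3 degrees of freedom. The name `t_cdf_df3` is hypothetical.

```python
import math

# Assumed data (not in the transcript); only (150, 220) is confirmed.
cars = [150, 55, 220, 130, 95]
sales = [220, 75, 250, 145, 200]

n = len(cars)
mean_x = sum(cars) / n
mean_y = sum(sales) / n
s_xx = sum((x - mean_x) ** 2 for x in cars)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cars, sales))

b = s_xy / s_xx                       # slope
a = mean_y - b * mean_x               # intercept
ess = sum((y - (a + b * x)) ** 2 for x, y in zip(cars, sales))
s_e = math.sqrt(ess / (n - 2))        # n - k - 1 with k = 1

se_b = s_e / math.sqrt(s_xx)          # standard error of the slope
t_stat = b / se_b

T_CRIT_DF3 = 3.182                    # table value, 95% two-sided, 3 d.f.
ci_lower = b - T_CRIT_DF3 * se_b
ci_upper = b + T_CRIT_DF3 * se_b


def t_cdf_df3(t):
    """Student t distribution function, valid for 3 degrees of freedom only."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


p_value = 2 * (1 - t_cdf_df3(abs(t_stat)))

print(f"t = {t_stat:.2f}, P-value = {p_value:.4f}")
print(f"95% interval for b: [{ci_lower:.3f}; {ci_upper:.3f}]")
```

Because the interval contains 0, the sketch agrees with the conclusion above: a zero slope cannot be ruled out at the 95% level.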

    Also given in the Summary report is the F significance. Since there is only one independent variable, the F significance is identical to the P-value for the t-statistic.

    In the case of more than one \(X\) variable, the F significance tests the hypothesis that all the \(X\) variable parameters as a group are statistically significantly different from zero.

    Concerning multiple regression models, as you add other \(X\) variables, the \(R^2\) statistic will never decrease and will generally increase, meaning the RSS has increased.

    In this case, the Adjusted \(R^2\) statistic is a reliable indicator of the true goodness of fit because it compensates for the reduction in the ESS due to the addition of more independent variables.

    Thus, it may report a decreased adjusted \(R^2\) value even though \(R^2\) has increased, unless the improvement in RSS more than compensates for the addition of the new independent variables.
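The slides do not state the adjustment itself. The sketch below assumes the standard textbook formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1). The second model, with two variables and an R-squared of 0.70, is invented purely for illustration, and the function name `adjusted_r_squared` is hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Standard adjustment for n data points and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# One-variable model from the example: R^2 = 0.694, n = 5, k = 1.
adj_one_var = adjusted_r_squared(0.694, n=5, k=1)

# Hypothetical two-variable model: R^2 rises slightly to 0.70.
adj_two_var = adjusted_r_squared(0.70, n=5, k=2)

print(f"k = 1: R^2 = 0.694, adjusted R^2 = {adj_one_var:.3f}")
print(f"k = 2: R^2 = 0.700, adjusted R^2 = {adj_two_var:.3f}")
```

In this made-up comparison the plain R-squared goes up while the adjusted value goes down, which is the situation described above.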

    One must proceed with caution when fitting data with a polynomial function.

    For example, it is possible to find a (\(k - 1\))-degree polynomial that will perfectly fit \(k\) data points.

    To be more specific, suppose we have seven historical observations, denoted

    \[ (x_i, y_i), \quad i = 1, 2, \ldots, 7 \]

    It is possible to find a sixth-degree polynomial

    \[ y = a_0 + a_1 x + a_2 x^2 + \cdots + a_6 x^6 \]

    that exactly passes through each of these seven data points.

    A perfect fit gives zero for the sum of squared deviations.

    However, this is deceptive, for it does not imply much about the predictive value of the model for use in future forecasting.

    Despite the perfect fit of the polynomial function, the forecast is very inaccurate. The linear fit might provide more realistic forecasts.

    Also, note that the polynomial fit has hazardous extrapolation properties (i.e., the polynomial blows up at its extremes).
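A small experiment makes the danger concrete. The sketch below passes a sixth-degree polynomial exactly through seven points, using Lagrange interpolation as one convenient way to obtain it, and compares its extrapolation with a least squares line. The seven observations are hypothetical and are not the data behind the slides; the function names are illustrative.

```python
def lagrange_eval(xs, ys, x):
    """Value at x of the unique polynomial through all (xs, ys) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


# Hypothetical observations: a mildly rising, noisy series.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [10, 12, 11, 14, 13, 16, 15]

# The degree-6 polynomial reproduces every observation.
poly_sse = sum((y - lagrange_eval(xs, ys, x)) ** 2 for x, y in zip(xs, ys))

a, b = fit_line(xs, ys)
line_sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Forecast two periods beyond the data.
poly_forecast = lagrange_eval(xs, ys, 9)
line_forecast = a + b * 9

print(f"Sum of squared deviations: polynomial {poly_sse:.2f}, line {line_sse:.2f}")
print(f"Forecast at x = 9: polynomial {poly_forecast:.2f}, line {line_forecast:.2f}")
```

For this series, which never leaves the range 10 to 16, the polynomial forecast at x = 9 is large and negative, while the line stays close to the recent values.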

    Correlation Coefficient and Coefficient of Determination

    Coefficient of determination = \(r^2\).

    Correlation coefficient = \(r\).

    \[ r = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[ n \sum X_i^2 - \left( \sum X_i \right)^2 \right] \left[ n \sum Y_i^2 - \left( \sum Y_i \right)^2 \right]}} \]

    Where: \(Y_i\) = dependent variable.

    \(X_i\) = independent variable.

    \(n\) = number of observations.
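As a quick check of the correlation formula, the sketch below computes r and r squared for the sales-traffic example. The data values are an assumption (the data table is missing from this transcript; only the pair (150, 220) is given in the text).

```python
import math

# Assumed data (not in the transcript); only (150, 220) is confirmed.
cars = [150, 55, 220, 130, 95]      # X_i: independent variable
sales = [220, 75, 250, 145, 200]    # Y_i: dependent variable

n = len(cars)
sum_x = sum(cars)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(cars, sales))
sum_xx = sum(x * x for x in cars)
sum_yy = sum(y * y for y in sales)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
)
r = numerator / denominator

print(f"Correlation coefficient r = {r:.3f}")
print(f"Coefficient of determination r^2 = {r ** 2:.3f}")
```

For a simple regression with one independent variable, r squared computed this way equals the R Square value from the regression output.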


    Summary: Causal Forecasting Models

    The goal of a causal forecasting model is to develop the best statistical relationship between a dependent variable and one or more independent variables.

    The most common model approach used in practice is regression analysis. Only linear regression models are examined in this course.

    In causal forecasting models, when one tries to predict a dependent variable using a single independent variable, it is called a simple regression model.

    When one uses more than one independent variable to forecast the dependent variable, it is called a multiple regression model.