Transcript of Multiple_Regression.ppt


MULTIPLE REGRESSION

Mr. Pranav Ranjan & Ms. Razia Sehdev, ICTC, LPU


The Multiple Regression Model

Idea: examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (Xi).

Multiple regression model with k independent variables:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.
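As a minimal illustration of fitting such a model by ordinary least squares (a sketch, not from the slides; the simulated data, coefficients, and variable names are made up):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    X = rng.normal(size=(n, 2))                    # two independent variables X1, X2
    eps = rng.normal(size=n)                       # random error term
    y = 5.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + eps  # Y = b0 + b1*X1 + b2*X2 + error

    X_design = sm.add_constant(X)                  # column of 1s for the Y-intercept
    model = sm.OLS(y, X_design).fit()              # ordinary least squares fit
    print(model.params)                            # estimates of b0, b1, b2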


Assumptions

The error term is normally distributed. For each fixed value of X, the distribution of Y is normal.

The mean of the error term is 0.

The variance of the error term is constant. This variance does not depend on the values assumed by X.

The error terms are uncorrelated. In other words, the observations have been drawn independently.

The regressors are independent amongst themselves.


Assumptions (continued)

Independent variables should be uncorrelated with the residuals.

The model should be properly specified.

The number of observations should be greater than the number of parameters.

The model is linear in the parameters.

Independent variables are fixed in repeated samples.

Several of these assumptions can be checked informally from the residuals of a fitted model, as in the sketch below.
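A minimal sketch of such residual checks (assuming a fitted statsmodels result named model, as in the earlier sketch; scipy assumed available):

    from scipy import stats

    resid = model.resid              # residuals e_i = y_i - y_hat_i
    print(resid.mean())              # should be close to 0
    print(stats.shapiro(resid))      # Shapiro-Wilk test of normality of the errors
    # Plotting resid against model.fittedvalues helps eyeball constant variance:
    # a funnel shape suggests the variance depends on X.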



Statistics Associated with Multiple Regression

Coefficient of multiple determination: The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, R², which is also called the coefficient of multiple determination.

Adjusted R²: R², the coefficient of multiple determination, is adjusted for the number of independent variables and the sample size to account for diminishing returns. After the first few variables, additional independent variables do not make much of a contribution.
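The standard adjustment (a textbook formula, not spelled out on the slide) is

$R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$

With the output shown later (R² = 0.52148, n = 15, k = 2), this gives 1 - 0.47852 × 14/12 ≈ 0.44172, matching the reported Adjusted R Square.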


Statistics Associated with Multiple Regression

F test: Used to test the null hypothesis that the coefficient of multiple determination in the population, R²_pop, is zero. The test statistic has an F distribution with k and (n - k - 1) degrees of freedom.
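The p-value can be read off the F distribution; a sketch using the ANOVA numbers from the output shown later (scipy assumed available):

    from scipy import stats

    msr, mse = 14730.013, 2252.776   # mean squares from the ANOVA output
    F = msr / mse                    # F = MSR / MSE = 6.5386
    p = stats.f.sf(F, 2, 12)         # upper-tail area, k = 2 and n - k - 1 = 12 df
    print(F, p)                      # ~6.5386, ~0.0120 (the "Significance F")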



Statistics Associated with Multiple Regression

Partial regression coefficient: The partial regression coefficient, b1, denotes the change in the predicted value, $\hat{Y}_i$, per unit change in X1 when the other independent variables, X2 to Xk, are held constant.


    Multiple Regression Output

Regression Statistics
    Multiple R           0.72213
    R Square             0.52148
    Adjusted R Square    0.44172
    Standard Error      47.46341
    Observations        15

ANOVA            df          SS          MS         F      Significance F
    Regression    2   29460.027   14730.013   6.53861   0.01201
    Residual     12   27033.306    2252.776
    Total        14   56493.333

                 Coefficients   Standard Error    t Stat    P-value   Lower 95%   Upper 95%
    Intercept      306.52619       114.25389     2.68285    0.01993    57.58835   555.46404
    Price          -24.97509        10.83213    -2.30565    0.03979   -48.57626    -1.37392
    Advertising     74.13096        25.96732     2.85478    0.01449    17.55303   130.70888

Estimated regression equation:

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
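Output of this form can be reproduced with, for example, statsmodels; a sketch assuming the 15 weekly observations sit in a pandas DataFrame df with columns Sales, Price, and Advertising (the DataFrame itself is an assumption, since the raw data is not on the slides):

    import statsmodels.api as sm

    X = sm.add_constant(df[["Price", "Advertising"]])  # intercept plus both predictors
    ols = sm.OLS(df["Sales"], X).fit()
    print(ols.summary())   # R Square, the ANOVA-style F test, and the coefficient table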


The Multiple Regression Equation

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.

where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.


Using the Equation to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62

Predicted sales is 428.62 pies.

Note that Advertising is in $100s, so $350 means that X2 = 3.5.
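A quick check of this arithmetic in plain Python (calling ols.predict on the fitted model from the earlier sketch would be equivalent):

    sales_hat = 306.526 - 24.975 * 5.50 + 74.131 * 3.5
    print(round(sales_hat, 2))   # 428.62 pies per week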



Multiple Coefficient of Determination (continued)

(Regression output as on the Multiple Regression Output slide.)

r² = SSR / SST = 29460.0 / 56493.3 = 0.52148

52.1% of the variation in pie sales is explained by the variation in price and advertising.



Adjusted r² (continued)

r²_adj = 0.44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.


F Test for Overall Significance (continued)

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386


With 2 and 12 degrees of freedom, the p-value for the F test is the Significance F value in the output, 0.01201.



Are Individual Variables Significant? (continued)

From the coefficient table shown earlier, the t value for Price is t = -2.306, with p-value .0398, and the t value for Advertising is t = 2.855, with p-value .0145. Since both p-values are below .05, each variable makes a significant contribution at the 5% level.
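Each t value is the coefficient divided by its standard error, and its p-value comes from a t distribution with n - k - 1 = 12 degrees of freedom; a sketch with the Price row of the output (scipy assumed available):

    from scipy import stats

    b, se = -24.97509, 10.83213        # Price coefficient and its standard error
    t = b / se                         # t = -2.30565
    p = 2 * stats.t.sf(abs(t), df=12)  # two-sided p-value, 12 degrees of freedom
    print(t, p)                        # ~ -2.306, ~0.0398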


    Multicollinearity

Multicollinearity arises when intercorrelations among the predictors are very high. This results in several problems, including:

The partial regression coefficients may not be estimated precisely; the standard errors are likely to be high.

The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.

It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.

Predictor variables may be incorrectly included or removed in stepwise regression.


Multicollinearity (continued)

A simple procedure for adjusting for multicollinearity consists of using only one of the variables in a highly correlated set of variables.

Alternatively, the set of independent variables can be transformed into a new set of predictors that are mutually independent by using techniques such as principal components analysis, as sketched below.

More specialized techniques, such as ridge regression and latent root regression, can also be used.
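A minimal sketch of the principal components idea on simulated, nearly collinear predictors (scikit-learn assumed available; not from the slides):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=200)
    X_raw = np.column_stack([x1, x1 + 0.05 * rng.normal(size=200)])  # highly correlated pair

    Z = PCA().fit_transform(X_raw)         # rotate onto orthogonal components
    print(np.corrcoef(Z, rowvar=False))    # off-diagonals ~0: components are uncorrelated
    # One would then regress Y on the leading columns of Z instead of X_raw.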


    Multicollinearity Diagnostics:

Variance Inflation Factor (VIF) measures how much the variance of a regression coefficient is inflated by multicollinearity. A VIF of 1 means a predictor is uncorrelated with the other predictors; values above 1 indicate some association between predictor variables, but generally not enough to cause problems. A common maximum acceptable VIF value is 10; anything higher would indicate a problem with multicollinearity.

Tolerance is the proportion of variance in an independent variable that is not explained by the other independent variables. If the other variables explain a lot of the variance of a particular independent variable, there is a problem with multicollinearity; thus, small values of tolerance indicate multicollinearity. The cutoff value for tolerance is typically .10: a tolerance value below .10 indicates a problem of multicollinearity.
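A sketch of both diagnostics in Python (statsmodels assumed available; the predictor matrix is simulated for illustration):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 3))
    X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)  # make X3 nearly collinear with X1

    X_design = sm.add_constant(X)
    for j in range(1, X_design.shape[1]):           # skip the constant column
        vif = variance_inflation_factor(X_design, j)
        print(f"X{j}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")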