Multiple_Regression.ppt
7/27/2019 Multiple_Regression.ppt
1/17
Mr. Pranav Ranjan & Ms. Razia Sehdev, ICTC, LPU
MULTIPLE REGRESSION
-
The Multiple Regression Model

Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi).

Multiple regression model with k independent variables:

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi

where β0 is the Y-intercept, β1 … βk are the population slopes, and εi is the random error.
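The slides state the model in symbols only; as a minimal illustration (with made-up numbers, not the pie-sales data used later), ordinary least squares recovers the slope and intercept parameters from data:

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors. The values are
# invented for illustration and contain no error term, so OLS recovers the
# coefficients exactly.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = 10.0 + 3.0 * X1 - 2.0 * X2

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(X1), X1, X2])

# Ordinary least squares: minimizes ||Y - Xb||^2 over b = (b0, b1, b2).
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(b)  # recovers [10, 3, -2] since the data are noise-free
```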
-
Assumptions

- The error term is normally distributed. For each fixed value of X, the distribution of Y is normal.
- The mean of the error term is 0.
- The variance of the error term is constant; this variance does not depend on the values assumed by X.
- The error terms are uncorrelated. In other words, the observations have been drawn independently.
- The regressors are independent amongst themselves.
-
Assumptions (continued)

- Independent variables should be uncorrelated with the residual.
- The model should be properly specified.
- The number of observations should be greater than the number of parameters.
- The model is linear in parameters.
- Independent variables are fixed in repeated samples.
-
Statistics Associated with Multiple Regression

Coefficient of multiple determination
- The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, R², which is also called the coefficient of multiple determination.

Adjusted R²
- R² is adjusted for the number of independent variables and the sample size to account for diminishing returns.
- After the first few variables, additional independent variables do not make much of a contribution.
-
Statistics Associated with Multiple Regression

F test
- Used to test the null hypothesis that the coefficient of multiple determination in the population, R²pop, is zero.
- The test statistic has an F distribution with k and (n - k - 1) degrees of freedom.
-
Statistics Associated with Multiple Regression

Partial regression coefficient
- The partial regression coefficient, b1, denotes the change in the predicted value, Ŷi, per unit change in X1 when the other independent variables, X2 to Xk, are held constant.
-
Multiple Regression Output

Regression Statistics
  Multiple R          0.72213
  R Square            0.52148
  Adjusted R Square   0.44172
  Standard Error      47.46341
  Observations        15

ANOVA          df    SS          MS          F        Significance F
  Regression    2    29460.027   14730.013   6.53861  0.01201
  Residual     12    27033.306    2252.776
  Total        14    56493.333

               Coefficients  Standard Error  t Stat    P-value  Lower 95%  Upper 95%
  Intercept      306.52619     114.25389      2.68285  0.01993   57.58835  555.46404
  Price          -24.97509      10.83213     -2.30565  0.03979  -48.57626   -1.37392
  Advertising     74.13096      25.96732      2.85478  0.01449   17.55303  130.70888

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
-
The Multiple Regression Equation

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.

b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
-
Using the Equation to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62

Predicted sales is 428.62 pies.

Note that Advertising is in $100s, so $350 means that X2 = 3.5.
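The substitution on this slide can be reproduced in a few lines (coefficients taken from the fitted equation above):

```python
# Plug the slide's values into the fitted equation.
# Advertising is measured in $100s, so $350 enters as 3.5.
price = 5.50
advertising = 3.5
sales = 306.526 - 24.975 * price + 74.131 * advertising
print(round(sales, 2))  # 428.62
```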
-
Multiple Coefficient of Determination (continued)

r² = SSR / SST = 29460.0 / 56493.3 = 0.52148

52.1% of the variation in pie sales is explained by the variation in price and advertising.
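The ratio can be checked directly from the sums of squares in the ANOVA table:

```python
# Values from the ANOVA table of the regression output.
SSR = 29460.027   # regression (explained) sum of squares
SST = 56493.333   # total sum of squares
r2 = SSR / SST
print(round(r2, 5))  # 0.52148
```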
-
Adjusted r² (continued)

r²adj = 0.44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
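The adjustment uses the standard formula r²adj = 1 - (1 - r²)(n - 1)/(n - k - 1), which reproduces the value in the output:

```python
r2 = 0.52148   # coefficient of multiple determination from the output
n, k = 15, 2   # observations and number of independent variables

# Adjusted r2 penalizes for the number of predictors relative to sample size.
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2_adj, 5))  # matches the 0.44172 in the output
```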
-
F Test for Overall Significance (continued)

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386

With 2 and 12 degrees of freedom, the p-value for the F test (Significance F) is 0.01201.
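The F ratio and its p-value (the Significance F in the output) can be reproduced from the ANOVA mean squares; scipy's F distribution supplies the upper-tail probability:

```python
from scipy import stats

MSR = 14730.013   # regression mean square = SSR / k
MSE = 2252.776    # residual mean square = SSE / (n - k - 1)
F = MSR / MSE

# Upper-tail probability of the F distribution with (k, n - k - 1) = (2, 12) df.
p_value = stats.f.sf(F, 2, 12)
print(round(F, 4), round(p_value, 5))  # 6.5386 and roughly 0.01201
```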
-
Are Individual Variables Significant? (continued)

t-value for Price is t = -2.306, with p-value .0398.
t-value for Advertising is t = 2.855, with p-value .0145.

Since both p-values are less than .05, both price and advertising are individually significant at the 5% level.
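Each t statistic is simply the coefficient divided by its standard error; a quick check against the regression output:

```python
# t statistic for each slope: coefficient / standard error (values taken from
# the regression output; the p-values use a t distribution with
# n - k - 1 = 12 degrees of freedom).
t_price = -24.97509 / 10.83213
t_adv = 74.13096 / 25.96732
print(round(t_price, 3), round(t_adv, 3))  # -2.306 2.855
```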
-
Multicollinearity

Multicollinearity arises when intercorrelations among the predictors are very high. It results in several problems, including:

- The partial regression coefficients may not be estimated precisely; the standard errors are likely to be high.
- The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.
- It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
- Predictor variables may be incorrectly included or removed in stepwise regression.
-
Multicollinearity (continued)

- A simple procedure for adjusting for multicollinearity consists of using only one of the variables in a highly correlated set of variables.
- Alternatively, the set of independent variables can be transformed into a new set of predictors that are mutually independent by using techniques such as principal components analysis.
- More specialized techniques, such as ridge regression and latent root regression, can also be used.
-
Multicollinearity Diagnostics

Variance Inflation Factor (VIF) measures how much the variance of the regression coefficients is inflated by multicollinearity. A VIF of 1 (its minimum) indicates no correlation between the independent measures; values somewhat above 1 indicate some association between predictor variables, but generally not enough to cause problems. A commonly used maximum acceptable VIF value is 10; anything higher indicates a problem with multicollinearity.

Tolerance is the amount of variance in an independent variable that is not explained by the other independent variables. If the other variables explain a lot of the variance of a particular independent variable, there is a problem with multicollinearity; thus, small values of tolerance indicate multicollinearity problems. The cutoff value for tolerance is typically .10: a tolerance value smaller than .10 indicates a problem of multicollinearity.
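As a sketch of how these diagnostics are computed (the function and data here are illustrative, not from the slides): VIF for predictor j is 1/(1 - Rj²), where Rj² comes from regressing predictor j on the remaining predictors, and tolerance is its reciprocal.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination
    from regressing column j on the remaining columns (with an intercept)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Hypothetical predictors: x3 is almost a copy of x1, so x1 and x3 should show
# large VIFs (small tolerances), while x2 stays near the minimum of 1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

v = vif(X)
print(v)        # VIFs for x1, x2, x3: x1 and x3 well above 10
print(1.0 / v)  # tolerance = 1 / VIF
```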