7/23/2019 Multiple Linear Regression 2014
Master's Program (S2) in Civil Engineering
Multiple Linear Regression
Statistics
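Multiple Linear Regression Model
Examines the linear relationship between one dependent variable (y) and two or more independent variables (xᵢ).
Population multiple linear regression model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
where β₀ is the y-intercept, β₁, ..., βₖ are the population slopes and ε is the random error.
Estimated multiple linear regression model:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
where ŷ is the estimated (or predicted) value of y, b₀ is the estimated intercept and b₁, ..., bₖ are the estimated slope coefficients.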
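Multiple Linear Regression Model
Model with two x variables:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$
(Figure: the fitted regression plane over the x₁-x₂ plane, with y on the vertical axis.)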
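Model with two x variables:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$
(Figure: an observed value yᵢ and the corresponding fitted value ŷᵢ on the regression plane; axes x₁, x₂ and y.)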
Assumptions about the model errors, e = (y - ŷ):
The errors are normally distributed
The mean of the errors is 0
The errors have constant variance (homogeneous variance)
The model errors are independent of one another
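Multiple Linear Regression Model in Matrix Notation
Multiple linear regression model with a vector of n observations on the variable y and K variables x:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1K} \\ x_{21} & x_{22} & \cdots & x_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nK} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
that is, y = Xβ + ε. An intercept is included by making one column of X a column of ones (for example, xᵢ₁ = 1 for every i).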
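Estimating the Model Parameters with the OLS Method
Ordinary least squares (OLS) chooses b to minimize the sum of squared residuals:
$$\sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e} = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$$
A digression on multivariate calculus: matrix and vector derivatives.
Derivative of a scalar with respect to a vector
Derivative of a column vector with respect to a row vector
Other derivatives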
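For reference, the standard identities behind these headings are collected below, assuming a is a K×1 vector and A is a symmetric K×K matrix, neither depending on b. The last line sketches how they combine when e'e is expanded, giving the first-order condition used in the solution.

$$
\begin{aligned}
\frac{\partial\,(\mathbf{a}'\mathbf{b})}{\partial \mathbf{b}} &= \mathbf{a} && \text{(scalar w.r.t. a vector: a } K\times 1 \text{ vector)}\\
\frac{\partial\,(\mathbf{A}\mathbf{b})}{\partial \mathbf{b}'} &= \mathbf{A} && \text{(column vector w.r.t. a row vector: a matrix)}\\
\frac{\partial\,(\mathbf{b}'\mathbf{A}\mathbf{b})}{\partial \mathbf{b}} &= 2\mathbf{A}\mathbf{b} && \text{(quadratic form, } \mathbf{A} \text{ symmetric)}\\[4pt]
\mathbf{e}'\mathbf{e} = \mathbf{y}'\mathbf{y} - 2\mathbf{b}'\mathbf{X}'\mathbf{y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}
&\;\Rightarrow\; \frac{\partial\,\mathbf{e}'\mathbf{e}}{\partial \mathbf{b}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b} = -2\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b})
\end{aligned}
$$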
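Estimating the Model Parameters with the OLS Method
Note: the derivative of a 1×1 scalar with respect to a K×1 vector is a K×1 vector.
Solution: set the first derivative equal to zero,
$$\frac{\partial\,(\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b})}{\partial \mathbf{b}} = -2\mathbf{X}'(\mathbf{y}-\mathbf{X}\mathbf{b}) = \mathbf{0}$$
Dimensions: (1×1)/(K×1) gives (-2)(n×K)'(n×1) = (-2)(K×n)(n×1) = K×1.
This yields the normal equations: X'y = X'Xb.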
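Estimating the Model Parameters with the OLS Method
Assuming the inverse exists:
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$
Note the analogy with the population relationship:
$$\boldsymbol{\beta} = [\operatorname{Var}(\mathbf{x})]^{-1}\,[\operatorname{Cov}(\mathbf{x}, y)], \qquad \mathbf{b} = \left(\frac{1}{n}\mathbf{X}'\mathbf{X}\right)^{-1}\left(\frac{1}{n}\mathbf{X}'\mathbf{y}\right)$$
This suggests something desirable about least squares: b is built from the sample counterparts of the moments that determine β.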
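A minimal Python sketch of the OLS solution, assuming NumPy is available and using the pie-sales data that appears later in this material. It builds X with a column of ones for the intercept, solves the normal equations X'Xb = X'y, and checks the first-order condition X'e = 0. The resulting b should match the estimates in the pie-sales output (306.52619, -24.97509, 74.13096).

```python
import numpy as np

# Pie sales example (15 weeks): units sold, price in $, advertising in $100s
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                  450, 490, 340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
adverts = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                    3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

# Design matrix X (n x K): the leading column of ones carries the intercept b0
X = np.column_stack([np.ones_like(price), price, adverts])
y = sales

# Solve the normal equations X'X b = X'y. Solving is numerically safer than
# forming (X'X)^-1 explicitly and gives the same b when X'X is invertible.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals e = y - Xb satisfy the first-order condition X'e = 0
e = y - X @ b

print("b   =", np.round(b, 3))
print("X'e =", np.round(X.T @ e, 8))
```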
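Estimating the Model Parameters with the OLS Method
Second derivative:
$$\frac{\partial^2\,(\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b})}{\partial\mathbf{b}\,\partial\mathbf{b}'} = \frac{\partial\,\big[\partial\,(\mathbf{y}-\mathbf{X}\mathbf{b})'(\mathbf{y}-\mathbf{X}\mathbf{b})/\partial\mathbf{b}\big]}{\partial\mathbf{b}'} = \frac{\partial\,(\text{column vector})}{\partial\,(\text{row vector})} = \frac{\partial\,\big[-2\mathbf{X}'(\mathbf{y}-\mathbf{X}\mathbf{b})\big]}{\partial\mathbf{b}'} = 2\mathbf{X}'\mathbf{X}$$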
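Does b Minimize e'e?
$$\frac{\partial^2\,\mathbf{e}'\mathbf{e}}{\partial\mathbf{b}\,\partial\mathbf{b}'} = 2\mathbf{X}'\mathbf{X} = 2\begin{bmatrix} \sum_{i=1}^{n} x_{i1}^2 & \sum_{i=1}^{n} x_{i1}x_{i2} & \cdots & \sum_{i=1}^{n} x_{i1}x_{iK} \\ \sum_{i=1}^{n} x_{i2}x_{i1} & \sum_{i=1}^{n} x_{i2}^2 & \cdots & \sum_{i=1}^{n} x_{i2}x_{iK} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_{iK}x_{i1} & \sum_{i=1}^{n} x_{iK}x_{i2} & \cdots & \sum_{i=1}^{n} x_{iK}^2 \end{bmatrix}$$
If there were a single b, we would require this to be positive, which it would be: 2x'x = 2Σᵢ xᵢ² > 0 (unless every xᵢ = 0).
The matrix counterpart of a positive number is a positive definite matrix. X'X is positive definite whenever the columns of X are linearly independent, and in that case b is the unique minimizer of e'e.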
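As a numerical illustration (a sketch assuming NumPy, using the pie-sales data that appears later), the check below confirms that X'X is positive definite through a Cholesky factorization, and that moving away from b in a few arbitrarily chosen directions d increases e'e by exactly d'X'Xd. The helper sse and the directions are illustrative choices, not part of the original material.

```python
import numpy as np

# Pie sales example data: units/week, price ($), advertising ($100s)
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                  450, 490, 340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
adverts = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                    3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])
X = np.column_stack([np.ones_like(price), price, adverts])
y = sales
XtX = X.T @ X

# A Cholesky factorization exists only for a positive definite matrix, so this
# call raises numpy.linalg.LinAlgError if X'X (hence 2X'X) is not positive definite.
np.linalg.cholesky(XtX)

b = np.linalg.solve(XtX, X.T @ y)

def sse(coef):
    """Residual sum of squares e'e for a candidate coefficient vector."""
    resid = y - X @ coef
    return float(resid @ resid)

# Because X'e = 0 at b, moving to b + d raises e'e by exactly d'X'Xd,
# which is positive for every nonzero d when X'X is positive definite.
directions = [np.array([1.0, 0.0, 0.0]),
              np.array([0.0, 0.1, 0.0]),
              np.array([0.0, 0.0, -0.1])]
for d in directions:
    print(d, sse(b + d) - sse(b), d @ XtX @ d)
```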
Multiple Coefficient of Determination
Reports the proportion of the total variation in y that is explained by all x variables taken together:
$$R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares regression}}{\text{total sum of squares}}$$
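A short sketch of the formula, plugging in the sums of squares from the pie-sales ANOVA table (SSR = 29460.027, SSE = 27033.306); the helper name r_squared is hypothetical. It also shows the equivalent form R² = 1 - SSE/SST, which relies on SST = SSR + SSE for a least squares fit with an intercept.

```python
def r_squared(ssr: float, sst: float) -> float:
    """Multiple coefficient of determination, R^2 = SSR / SST."""
    return ssr / sst

ssr, sse = 29460.027, 27033.306   # pie-sales ANOVA: regression and residual SS
sst = ssr + sse                   # SST = SSR + SSE (56493.333 in the table)
r2 = r_squared(ssr, sst)

# The two forms agree: SSR/SST and 1 - SSE/SST
print(round(r2, 5), round(1 - sse / sst, 5))
```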
Adjusted R²
R² never decreases when a new x variable is added to the model.
This can be a disadvantage when comparing models.
What is the net effect of adding a new variable?
We lose a degree of freedom when a new x variable is added.
Did the new x variable add enough explanatory power to offset the loss of one degree of freedom?
Adjusted R²
Shows the proportion of the variation in y explained by all x variables, adjusted for the number of x variables used:
$$R_A^2 = 1 - (1 - R^2)\left(\frac{n - 1}{n - k - 1}\right)$$
(where n = sample size, k = number of independent variables)
Penalizes excessive use of unimportant independent variables
Never larger than R²
Useful for comparing models
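A minimal sketch of the adjusted R² formula (the function name is hypothetical), applied to R² values reported elsewhere in this material: the pie-sales model and the full and reduced fungal-toxin models. It illustrates the comparison point above: dropping Sun lowers R² (0.9186 to 0.9150) but raises adjusted R².

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

pie = adjusted_r_squared(0.52148, n=15, k=2)       # price, advertising
toxin_4 = adjusted_r_squared(0.9186, n=10, k=4)    # Temp, Wind, Sun, Rain
toxin_3 = adjusted_r_squared(0.9150, n=10, k=3)    # Temp, Wind, Rain

print(round(pie, 4), round(toxin_4, 4), round(toxin_3, 4))
```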
Is the Model Significant?
F-Test for Overall Significance of the Model
Shows whether there is a linear relationship between all of the x variables considered together and y
Use the F test statistic
Hypotheses:
H0: β₁ = β₂ = ... = βₖ = 0 (no linear relationship)
HA: at least one βᵢ ≠ 0 (at least one independent variable affects y)
F-Test for Overall Significance
Test statistic:
$$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$$
where F has D1 = k (numerator) and D2 = n - k - 1 (denominator) degrees of freedom.
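A sketch of the test statistic computed from the pie-sales sums of squares (n = 15, k = 2). The function name is hypothetical, and the critical value 3.885 is the one quoted in the pie-sales worked example rather than computed here.

```python
def f_statistic(ssr: float, sse: float, n: int, k: int) -> float:
    """Overall F statistic: MSR / MSE with k and n - k - 1 degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

F = f_statistic(ssr=29460.027, sse=27033.306, n=15, k=2)
critical = 3.885   # F(.05; 2, 12), taken from the worked example
print(round(F, 4), "reject H0" if F > critical else "do not reject H0")
```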
F-Test for Overall Significance
H0: β₁ = β₂ = 0
HA: β₁ and β₂ not both zero
α = .05
df1 = 2, df2 = 12
Critical value: F.05 = 3.885
Test statistic: F = MSR/MSE = 6.5386
Decision: Reject H0 at α = 0.05 (6.5386 > 3.885)
Conclusion: The regression model does explain a significant portion of the variation in pie sales (there is evidence that at least one independent variable affects y).
(Figure: F distribution with the α = .05 rejection region to the right of F.05 = 3.885; do not reject H0 to its left.)
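Are Individual Variables Significant?
Use t-tests of the individual variable slopes
Shows whether there is a linear relationship between the variable xᵢ and y
Hypotheses:
H0: βᵢ = 0 (no linear relationship)
HA: βᵢ ≠ 0 (a linear relationship does exist between xᵢ and y)
Test statistic: t = bᵢ / s_bᵢ, with n - k - 1 degrees of freedom, where s_bᵢ is the standard error of bᵢ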
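A sketch of the individual t-tests in plain Python, using the coefficients and standard errors from the pie-sales output (t = bᵢ / s_bᵢ with n - k - 1 = 12 degrees of freedom). To stay self-contained it computes the two-sided p-value by integrating the Student t density with Simpson's rule instead of calling a statistics library; the helpers t_pdf and two_sided_p are hypothetical names, and the results should agree with the reported p-values to about four decimals.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3            # P(0 <= T <= |t|)
    return 1 - 2 * central

# Coefficients and standard errors from the pie-sales output (n = 15, k = 2)
n, k = 15, 2
df = n - k - 1
estimates = [("Intercept", 306.52619, 114.25389),
             ("Price", -24.97509, 10.83213),
             ("Advertising", 74.13096, 25.96732)]

results = {}
for name, coef, se in estimates:
    t = coef / se                      # t = b_i / s_bi
    results[name] = (t, two_sided_p(t, df))
    print(f"{name:12s} t = {t:8.5f}  p = {results[name][1]:.5f}")
```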
Multiple Linear Regression Model Example
A distributor of frozen dessert pies wants to evaluate the factors that influence demand:
Dependent variable: pie sales (units per week)
Independent variables: price (in $) and advertising ($100s)
Data are collected for 15 weeks
Multiple Linear Regression Model Example
Estimated regression model: Sales = b₀ + b₁(Price) + b₂(Advertising)
Week   Pie Sales   Price ($)   Advertising ($100s)
1      350         5.50        3.3
2      460         7.50        3.3
3      350         8.00        3.0
4      430         8.00        4.5
5      350         6.80        3.0
6      380         7.50        4.0
7      430         4.50        3.0
8      470         6.40        3.7
9      450         7.00        3.5
10     490         5.00        4.0
11     340         7.20        3.5
12     300         7.90        3.2
13     440         5.90        4.0
14     450         5.00        3.5
15     300         7.00        2.7
Multiple Linear Regression Model Example
Slope (bᵢ): estimates that the average value of y changes by bᵢ units for each 1-unit increase in xᵢ, holding all other variables constant.
Example: if b₁ = -20, then sales (y) are expected to decrease by an estimated 20 pies per week for each $1 increase in selling price (x₁), net of the effects of changes due to advertising (x₂).
y-intercept (b₀): the estimated average value of y when all xᵢ = 0 (assuming all xᵢ = 0 is within the range of observed values).
Multiple Coefficient of Determination
Regression Statistics
Multiple R             0.72213
R Square               0.52148
Adjusted R Square      0.44172
Standard Error        47.46341
Observations                15
ANOVA         df   SS          MS          F         Significance F
Regression     2   29460.027   14730.013   6.53861   0.01201
Residual      12   27033.306    2252.776
Total         14   56493.333
              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept        306.52619        114.25389    2.68285   0.01993    57.58835   555.46404
Price            -24.97509         10.83213   -2.30565   0.03979   -48.57626    -1.37392
Advertising       74.13096         25.96732    2.85478   0.01449    17.55303   130.70888
R² = SSR/SST = 29460.0/56493.3 = .52148
52.1% of the variation in pie sales is explained by the variation in price and advertising.
Multiple Linear Regression Model Example
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
where Sales is in number of pies per week, Price is in $ and Advertising is in $100s.
b₁ = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
b₂ = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
Using the Model to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62
Predicted sales is 428.62 pies.
Note that Advertising is in $100s, so $350 means that x₂ = 3.5.
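The same arithmetic as a tiny function (hypothetical name predict_sales), using the rounded coefficients of the fitted equation; the one step that is easy to get wrong is converting $350 into the model's $100s units.

```python
def predict_sales(price: float, advertising: float) -> float:
    """Estimated pies per week; price in $, advertising in $100s."""
    return 306.526 - 24.975 * price + 74.131 * advertising

x2 = 350 / 100   # $350 of advertising expressed in the model's $100 units
predicted = predict_sales(price=5.50, advertising=x2)
print(round(predicted, 2))
```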
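Multiple Linear Regression Model with SAS
options nodate nonumber;   /* omit the date and page numbers from the output */
/* Read the pie sales data: week, units sold, price ($), advertising ($100s) */
data pie;
input week sales price adverts;
cards;
1 350 5.5 3.3
2 460 7.5 3.3
3 350 8 3
4 430 8 4.5
5 350 6.8 3
6 380 7.5 4
7 430 4.5 3
8 470 6.4 3.7
9 450 7 3.5
10 490 5 4
11 340 7.2 3.5
12 300 7.9 3.2
13 440 5.9 4
14 450 5 3.5
15 300 7 2.7
;
/* Fit sales = b0 + b1*price + b2*adverts by least squares */
proc reg data = pie;
model sales = price adverts;
run;
quit;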
Multiple Linear Regression Model with SAS
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: sales
Analysis of Variance
                          Sum of          Mean
Source            DF     Squares        Square    F Value    Pr > F
Model              2       29460         14730       6.54    0.0120
Error             12       27033    2252.77554
Corrected Total   14       56493
Root MSE           47.46341    R-Square    0.5215
Dependent Mean    399.33333    Adj R-Sq    0.4417
Coeff Var          11.88566
Pr > F (0.0120) is the p-value of the model; R-Square (0.5215) is the R² of the model.
Multiple Linear Regression Model with SAS
Parameter Estimates
                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1     306.52619    114.25389       2.68      0.0199
price         1     -24.97509     10.83213      -2.31      0.0398
adverts       1      74.13096     25.96732       2.85      0.0145
Parameter Estimate is the estimated value of each parameter; Pr > |t| is the p-value of each variable.
Example 2: Multiple Linear Regression Model
Many possible factors, not all of them significant
Fungal toxin contamination of seed pods to be used as a drug source
Data collection
Collect batches of seed pods from a variety of locations.
For each location during June-July (when the seeds are forming), record the possible predictors:
Temp: mean noon temperature (°C)
Wind: mean wind speed (km/h)
Sun: mean daily sunshine (h)
Rain: total rainfall (cm/month)
For each batch of pods, record the dependent variable:
Toxin: concentration of toxin (mg/100g)
Toxin content (mg/100g) and weather conditions at ten sites
Temp   Wind   Sun     Rain   Toxin
20.9   13.3   6.23    13.0   18.1
25.4   10.8   8.13    22.8   28.6
28.2   10.9   10.21   11.1   15.9
23.7   8.2    6.96    7.4    19.2
26.5   9.8    9.04    13.2   19.3
23.9   12.3   7.84    5.1    14.8
26.7   10.0   6.69    15.6   21.7
30.0   12.2   8.30    13.2   16.5
24.9   10.7   9.22    20.5   23.8
22.0   15.0   8.37    13.7   19.0
Toxin Example with SAS
options nodate nonumber;
/* Read weather conditions and toxin concentration for the ten sites */
data toxin;
input Temp Wind Sun Rain Toxin;
cards;
20.9 13.3 6.23 13.0 18.1
25.4 10.8 8.13 22.8 28.6
28.2 10.9 10.21 11.1 15.9
23.7 8.2 6.96 7.4 19.2
26.5 9.8 9.04 13.2 19.3
23.9 12.3 7.84 5.1 14.8
26.7 10.0 6.69 15.6 21.7
30.0 12.2 8.30 13.2 16.5
24.9 10.7 9.22 20.5 23.8
22.0 15.0 8.37 13.7 19.0
;
/* Full model with all four candidate predictors */
proc reg data = toxin;
model Toxin = Temp Wind Sun Rain;
run;
quit;
Toxin Example with SAS
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: Toxin
Analysis of Variance
                          Sum of          Mean
Source            DF     Squares        Square    F Value    Pr > F
Model              4   139.78209      34.94552      14.11    0.0062
Error              5    12.38691       2.47738
Corrected Total    9   152.16900
Root MSE           1.57397    R-Square    0.9186
Dependent Mean    19.69000    Adj R-Sq    0.8535
Coeff Var          7.99375
The equation is significant.
Toxin Example with SAS
Parameter Estimates
                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1      31.60838      7.10506       4.45      0.0067
Temp          1      -0.42013      0.24131      -1.74      0.1421
Wind          1      -0.79356      0.29770      -2.67      0.0446
Sun           1      -0.23747      0.50857      -0.47      0.6602
Rain          1       0.70676      0.10031       7.05      0.0009
Two predictors are not significant (Temp and Sun).
Removing predictors
Temperature and Sunshine both non-significant.
DO NOT REMOVE BOTH AT ONCE.
We may find that if we remove one of these, the other becomes significant.
Removing predictors
Remove non-significant predictors one at a time until all remaining predictors are significant.
Removing predictors
Which is removed first?
Usually remove the least significant factor (highest P value) first.
But use knowledge of the system concerned: if you think a particular factor is especially important but its P value is greater than that of some other factor, you might modify the order of removal to try to preserve the important variable.
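The removal rule above can be sketched as an automated backward elimination in Python, assuming NumPy is available. It refits the toxin model, drops the predictor with the highest p-value while that p-value is at least α = 0.05, and stops once every remaining predictor is significant. The p-values come from a Simpson's-rule integration of the t density standing in for a statistics library, and all helper names (t_pdf, two_sided_p, fit, backward_eliminate) are hypothetical. Unlike the advice above, this sketch applies the rule mechanically and uses no knowledge of the system.

```python
import math
import numpy as np

# Toxin data (ten sites): Temp, Wind, Sun, Rain, Toxin
data = np.array([
    [20.9, 13.3, 6.23, 13.0, 18.1], [25.4, 10.8, 8.13, 22.8, 28.6],
    [28.2, 10.9, 10.21, 11.1, 15.9], [23.7, 8.2, 6.96, 7.4, 19.2],
    [26.5, 9.8, 9.04, 13.2, 19.3], [23.9, 12.3, 7.84, 5.1, 14.8],
    [26.7, 10.0, 6.69, 15.6, 21.7], [30.0, 12.2, 8.30, 13.2, 16.5],
    [24.9, 10.7, 9.22, 20.5, 23.8], [22.0, 15.0, 8.37, 13.7, 19.0],
])
names = ["Temp", "Wind", "Sun", "Rain"]
y = data[:, 4]

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

def fit(cols):
    """OLS with an intercept on the named predictors; returns (b, p-values)."""
    X = np.column_stack([np.ones(len(y))] + [data[:, names.index(c)] for c in cols])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    df = len(y) - X.shape[1]                     # n - k - 1
    se = np.sqrt(float(e @ e) / df * np.diag(XtX_inv))
    return b, [two_sided_p(bi / si, df) for bi, si in zip(b, se)]

def backward_eliminate(cols, alpha=0.05):
    """Remove the predictor with the highest p-value, one at a time."""
    cols = list(cols)
    while cols:
        b, p = fit(cols)
        slope_p = p[1:]                          # the intercept is never removed
        worst = max(range(len(cols)), key=lambda j: slope_p[j])
        if slope_p[worst] < alpha:               # every remaining predictor is significant
            return cols, b
        print(f"remove {cols[worst]} (p = {slope_p[worst]:.4f})")
        cols.pop(worst)
    return cols, fit(cols)[0]

kept, coef = backward_eliminate(names)
print(kept, np.round(coef, 5))
```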
Toxin: Remove the Factor with the Highest p-value
Parameter Estimates
                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1      31.60838      7.10506       4.45      0.0067
Temp          1      -0.42013      0.24131      -1.74      0.1421
Wind          1      -0.79356      0.29770      -2.67      0.0446
Sun           1      -0.23747      0.50857      -0.47      0.6602
Rain          1       0.70676      0.10031       7.05      0.0009
Remove Sunshine: it is the least significant.
Toxin: Remove the Factor with the Highest p-value
options nodate nonumber;
/* Read weather conditions and toxin concentration for the ten sites */
data toxin;
input Temp Wind Sun Rain Toxin;
cards;
20.9 13.3 6.23 13.0 18.1
25.4 10.8 8.13 22.8 28.6
28.2 10.9 10.21 11.1 15.9
23.7 8.2 6.96 7.4 19.2
26.5 9.8 9.04 13.2 19.3
23.9 12.3 7.84 5.1 14.8
26.7 10.0 6.69 15.6 21.7
30.0 12.2 8.30 13.2 16.5
24.9 10.7 9.22 20.5 23.8
22.0 15.0 8.37 13.7 19.0
;
/* Refit after removing Sun, the least significant predictor */
proc reg data = toxin;
model Toxin = Temp Wind Rain;
run;
quit;
Fungal Toxin: 3 Predictors
Analysis of Variance
                          Sum of          Mean
Source            DF     Squares        Square    F Value    Pr > F
Model              3   139.24195      46.41398      21.54    0.0013
Error              6    12.92705       2.15451
Corrected Total    9   152.16900
Root MSE           1.46782    R-Square    0.9150
Dependent Mean    19.69000    Adj R-Sq    0.8726
Coeff Var          7.45467
Parameter Estimates
                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1      31.56513      6.62535       4.76      0.0031
Temp          1      -0.47896      0.19193      -2.50      0.0468
Wind          1      -0.82177      0.27184      -3.02      0.0233
Rain          1       0.70108      0.09285       7.55      0.0003
The equation is significant, and all three predictors are now significant.
Toxin: Final Equation
Toxin = 31.6 - 0.479 × Temp - 0.822 × Wind + 0.701 × Rain
Note the plus or minus signs on the three terms:
Warm sites produce less toxin
Windy sites produce less toxin
Wet sites produce more toxin
All predictions should be made using this equation, not the full (four-predictor) equation.
Toxin: Using the Equation
Consider a potential site with Temp = 26 °C, Wind = 11 km/h and Rain = 21 cm/month.
Predicted toxin would be:
Toxin = 31.6 - 0.479 × Temp - 0.822 × Wind + 0.701 × Rain
      = 31.6 - 0.479 × 26 - 0.822 × 11 + 0.701 × 21
      = 31.6 - 12.45 - 9.04 + 14.72
      = 24.8 mg/100g
(A high value: predict that this is not a good site to choose.)
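A sketch of the same prediction (hypothetical function name predict_toxin), evaluated with both the rounded coefficients of the final equation and the unrounded SAS estimates for the three-predictor model, to show that rounding shifts the prediction by only about 0.03 mg/100g.

```python
def predict_toxin(temp, wind, rain, coef):
    """Predicted toxin (mg/100g) from the Temp, Wind, Rain equation."""
    b0, b_temp, b_wind, b_rain = coef
    return b0 + b_temp * temp + b_wind * wind + b_rain * rain

rounded = (31.6, -0.479, -0.822, 0.701)          # coefficients as written in the final equation
full = (31.56513, -0.47896, -0.82177, 0.70108)   # SAS estimates, three-predictor model
site = {"temp": 26, "wind": 11, "rain": 21}      # the potential site in the example

toxin_rounded = predict_toxin(**site, coef=rounded)
toxin_full = predict_toxin(**site, coef=full)
print(f"{toxin_rounded:.2f} {toxin_full:.2f}")
```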
Complete Multiple Regression Analysis of the Toxin Data with SAS
options nodate nonumber;
/* Read weather conditions and toxin concentration for the ten sites */
data toxin;
input Temp Wind Sun Rain Toxin;
cards;
20.9 13.3 6.23 13.0 18.1
25.4 10.8 8.13 22.8 28.6
28.2 10.9 10.21 11.1 15.9
23.7 8.2 6.96 7.4 19.2
26.5 9.8 9.04 13.2 19.3
23.9 12.3 7.84 5.1 14.8
26.7 10.0 6.69 15.6 21.7
30.0 12.2 8.30 13.2 16.5
24.9 10.7 9.22 20.5 23.8
22.0 15.0 8.37 13.7 19.0
;
/* Step 1: full model with all four predictors */
proc reg data = toxin;
model Toxin = Temp Wind Sun Rain;
run;
/* Step 2: reduced model after removing Sun */
proc reg data = toxin;
model Toxin = Temp Wind Rain;
run;
quit;
Example: Anscombe's Four Data Sets
No.   Y1      X1   Y2     X2   Y3      X3   Y4      X4
1     8.04    10   9.14   10   7.46    10   6.58    8
2     6.95    8    8.14   8    6.77    8    5.76    8
3     7.58    13   8.74   13   12.74   13   7.71    8
4     8.81    9    8.77   9    7.11    9    8.84    8
5     8.33    11   9.26   11   7.81    11   8.47    8
6     9.96    14   8.10   14   8.84    14   7.04    8
7     7.24    6    6.13   6    6.08    6    5.25    8
8     4.26    4    3.10   4    5.39    4    12.50   19
9     10.84   12   9.13   12   8.15    12   5.56    8
10    4.82    7    7.26   7    6.42    7    7.91    8
11    5.68    5    4.74   5    5.73    5    6.89    8
Anscombe's Four Data Sets
Each of the four data sets was analyzed with simple linear regression, giving the following MINITAB output:
1. Data 1: Regression Analysis: Y1 versus X1
The regression equation is
Y1 = 3.00 + 0.500 X1
Predictor   Coef     SE Coef   T      P
Constant    3.000    1.125     2.67   0.026
X1          0.5001   0.1179    4.24   0.002
S = 1.23660   R-Sq = 66.7%   R-Sq(adj) = 62.9%
Analysis of Variance
Source           DF   SS       MS       F       P
Regression        1   27.510   27.510   17.99   0.002
Residual Error    9   13.763   1.529
Total            10   41.273
Anscombe's Four Data Sets
2. Data 2: Regression Analysis: Y2 versus X2
The regression equation is
Y2 = 3.00 + 0.500 X2
Predictor   Coef     SE Coef   T      P
Constant    3.001    1.125     2.67   0.026
X2          0.5000   0.1180    4.24   0.002
S = 1.23721   R-Sq = 66.6%   R-Sq(adj) = 62.9%
Analysis of Variance
Source           DF   SS       MS       F       P
Regression        1   27.500   27.500   17.97   0.002
Residual Error    9   13.776   1.531
Total            10   41.276
Anscombe's Four Data Sets
3. Data 3: Regression Analysis: Y3 versus X3
The regression equation is
Y3 = 3.00 + 0.500 X3
Predictor   Coef     SE Coef   T      P
Constant    3.002    1.124     2.67   0.026
X3          0.4997   0.1179    4.24   0.002
S = 1.23631   R-Sq = 66.6%   R-Sq(adj) = 62.9%
Analysis of Variance
Source           DF   SS       MS       F       P
Regression        1   27.470   27.470   17.97   0.002
Residual Error    9   13.756   1.528
Total            10   41.226
Anscombe's Four Data Sets
4. Data 4: Regression Analysis: Y4 versus X4
The regression equation is
Y4 = 3.00 + 0.500 X4
Predictor   Coef     SE Coef   T      P
Constant    3.002    1.124     2.67   0.026
X4          0.4999   0.1178    4.24   0.002
S = 1.23570   R-Sq = 66.7%   R-Sq(adj) = 63.0%
Analysis of Variance
Source           DF   SS       MS       F       P
Regression        1   27.490   27.490   18.00   0.002
Residual Error    9   13.742   1.527
Total            10   41.232
Anscombe's Four Data Sets
It turns out that all four data sets, each analyzed with simple linear regression, produce nearly identical output!
The scatter plots of the individual data sets, however, look as follows:
Anscombe's Four Data Sets
(Figure: scatter plots of Y1 vs X1, Y2 vs X2, Y3 vs X3 and Y4 vs X4.)
It turns out that the four data sets show different conditions, or forms of relationship!
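A quick way to reproduce the point in plain Python: fit a simple linear regression to each of the four data sets from the table with the closed-form least squares formulas (the helper simple_ols is a hypothetical name). The coefficients and R² come out nearly identical for all four, which is exactly why the scatter plots, not the summary output, reveal the difference.

```python
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]      # X1 = X2 = X3
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
datasets = {
    "Data 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Data 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Data 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Data 4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def simple_ols(x, y):
    """Closed-form simple linear regression: returns (b0, b1, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1, sxy ** 2 / (sxx * syy)

fits = {name: simple_ols(x, y) for name, (x, y) in datasets.items()}
for name, (b0, b1, r2) in fits.items():
    print(f"{name}: y = {b0:.2f} + {b1:.3f} x, R-Sq = {100 * r2:.1f}%")
```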