Multiple Linear Regression
Sukru Acitas
Anadolu University, Department of Statistics, 26470 Eskisehir, TURKEY, [email protected].
ENM310 Experimental Design & Regression Analysis
Reference textbook
⊲ Montgomery, D. C., Peck, E. A., & Vining, G. G. (2015). Introduction to linear regression analysis. John Wiley & Sons. (Chapter 3)
Multiple Linear Regression Exp.Desg.& Reg.Ana. 2 / 35
Multiple regression model
Definition
A regression model that involves more than one regressor variable is called a multiple regression model.
Multiple regression model
The response y may be related to k regressor or predictor variables.

Statistical model

y = β0 + β1x1 + β2x2 + · · · + βkxk + ε (1)

The parameters βj (j = 0, 1, . . . , k) are called the regression coefficients.

This model describes a hyperplane in the k-dimensional space of the regressor variables xj.

The parameter βj represents the expected change in the response y per unit change in xj when all of the remaining regressor variables xi (i ≠ j) are held constant. For this reason the parameters βj are often called partial regression coefficients.
Multiple regression model
Note
Any regression model that is linear in the parameters (the β's) is a linear regression model, regardless of the shape of the surface that it generates.

Example

y = β0 + β1x1 + ε
y = β0 + β1x1 + β2x2 + ε
y = β0 + β1x1 + β2x1x2 + ε
y = β0 + β1x1³ + β2x2² + β3x3 + ε

All of these are linear regression models because each is linear in the β's. By contrast, models such as

y = β0 + √β1 x1³ + ε
y = 1/(β1x1³) + ε

are not linear regression models, since they are not linear in the parameters.
Estimation of the model parameters
The method of least squares can be used to estimate the regressioncoefficients in model (1).
Suppose that n > k observations are available; let yi denote the i-th observed response and xij denote the i-th observation (level) of regressor xj. The data will appear as in the following table.
Data for multiple linear regression
i    y     x1    x2    · · ·   xk
1    y1    x11   x12   · · ·   x1k
2    y2    x21   x22   · · ·   x2k
...  ...   ...   ...           ...
n    yn    xn1   xn2   · · ·   xnk
Estimation of the model parameters
We may write the sample regression model corresponding to Eq. (1) as follows:
Multiple linear regression model
yi = β0 + β1xi1 + β2xi2 + · · · + βkxik + εi,   i = 1, 2, . . . , n (2)

   = β0 + Σ_{j=1}^{k} βjxij + εi,   i = 1, 2, . . . , n (3)
Estimation of the model parameters
Assumptions:
The error term ε in the model has E(ε) = 0 and Var(ε) = σ², and the errors are uncorrelated.

The regressor variables are fixed (i.e., mathematical or nonrandom) variables, measured without error, and they are uncorrelated.

When testing hypotheses or constructing confidence intervals, we will have to assume that the error term ε has a normal distribution with mean 0 and variance σ².
Estimation of the model parameters
The least-squares function is

S(β0, β1, . . . , βk) = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} (yi − β0 − Σ_{j=1}^{k} βjxij)² (4)

⊲ The function S must be minimized with respect to β0, β1, β2, . . . , βk.
Estimation of the model parameters
The least-squares estimators of β0, β1, β2, . . . , βk must satisfy

∂S(β0, β1, . . . , βk)/∂β0 |_{β̂0, β̂1, . . . , β̂k} = −2 Σ_{i=1}^{n} (yi − β̂0 − Σ_{j=1}^{k} β̂jxij) = 0 (5)

and

∂S(β0, β1, . . . , βk)/∂βj |_{β̂0, β̂1, . . . , β̂k} = −2 Σ_{i=1}^{n} (yi − β̂0 − Σ_{j=1}^{k} β̂jxij) xij = 0,   j = 1, 2, . . . , k. (6)
Estimation of the model parameters
Simplifying Eqs. (5) and (6), we obtain the least-squares normal equations:
Normal equations
Σ_{i=1}^{n} yi = nβ̂0 + β̂1 Σ_{i=1}^{n} xi1 + · · · + β̂k Σ_{i=1}^{n} xik (7)

Σ_{i=1}^{n} xi1yi = β̂0 Σ_{i=1}^{n} xi1 + β̂1 Σ_{i=1}^{n} xi1² + · · · + β̂k Σ_{i=1}^{n} xi1xik (8)

...

Σ_{i=1}^{n} xikyi = β̂0 Σ_{i=1}^{n} xik + β̂1 Σ_{i=1}^{n} xikxi1 + · · · + β̂k Σ_{i=1}^{n} xik² (9)
Estimation of the model parameters
Note
There are p = k + 1 normal equations, one for each of the unknown regression coefficients. The solution to the normal equations will be the least-squares estimators β̂0, β̂1, β̂2, . . . , β̂k.

It is more convenient to deal with multiple regression models if they are expressed in matrix notation.
This allows a very compact display of the model, data, and results.
Matrix form of multiple linear regression model
In matrix notation, the model given by Eq. (3) is
Matrix notation
y = Xβ + ε (10)
Matrix form of multiple linear regression model
y = [y1, y2, . . . , yn]′,   β = [β0, β1, β2, . . . , βk]′,   ε = [ε1, ε2, . . . , εn]′, and

X =
1    x11   x12   · · ·   x1k
1    x21   x22   · · ·   x2k
...  ...   ...           ...
1    xn1   xn2   · · ·   xnk
Least-squares estimation
We wish to find the vector of least-squares estimators, β̂, that minimizes
Least-squares function: Matrix form
S(β) = Σ_{i=1}^{n} εi² = ε′ε = (y − Xβ)′(y − Xβ) (11)
Least-squares estimation
Note that S(β) may be expressed as
S(β) = y′y − β′X′y − y′Xβ + β′X′Xβ (12)
     = y′y − 2β′X′y + β′X′Xβ (13)

since β′X′y is a scalar and its transpose y′Xβ is the same scalar.
Least-squares estimation
The least-squares estimators must satisfy
∂S(β)/∂β |_{β=β̂} = −2X′y + 2X′Xβ̂ = 0 (14)
which simplifies to
Least - squares normal equations
X′y = X′Xβ̂ (15)
Least-squares estimation
Least-squares estimator
β̂ = (X′X)−1X′y (16)
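As a quick numerical check, Eq. (16) can be evaluated directly with NumPy. This is only a sketch on simulated data (the sample size, coefficients, and noise level below are arbitrary choices, not from the text); note that solving the normal equations (15) with np.linalg.solve is numerically preferable to forming (X′X)⁻¹ explicitly.

```python
import numpy as np

# Simulated data for illustration only: n = 20 observations, k = 2 regressors.
rng = np.random.default_rng(0)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, k))])  # column of 1s for beta_0
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Least-squares estimator, Eq. (16): solve the normal equations X'X beta_hat = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With such small noise, beta_hat lands very close to the coefficients used to generate the data.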
Some definitions
Fitted regression model
The fitted regression model is given by
ŷ = Xβ̂ = X(X′X)−1X′y. (17)
Hat matrix
The hat matrix is defined as
H = X(X′X)−1X′ (18)
Some definitions
Note
ŷ = Hy (19)
The hat matrix maps the vector of observed values into the vector of fitted values.

The hat matrix and its properties play a central role in regression analysis.
Some definitions
Residual
The difference between the observed value yi and the corresponding fitted value ŷi is the residual ei = yi − ŷi. The n residuals may be conveniently written in matrix notation as
e = y − ŷ. (20)
Some definitions
Alternative notation for residual
e = y − Xβ̂ (21)
= y −Hy (22)
= (I − H)y (23)
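The identities (19)-(23) are easy to verify numerically. The sketch below (simulated data, illustration only) also checks two standard properties of the hat matrix: H is symmetric and idempotent (HH = H).

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(1)
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix, Eq. (18)
y_hat = H @ y                           # fitted values, Eq. (19)
e = (np.eye(n) - H) @ y                 # residuals, Eq. (23)
```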
Estimation of σ2
As in simple linear regression, we may develop an estimator of σ² from the residual sum of squares

SSRes = Σ_{i=1}^{n} (yi − ŷi)² = Σ_{i=1}^{n} ei² = e′e (24)

Substituting e = y − Xβ̂, we have

SSRes = e′e (25)
      = (y − Xβ̂)′(y − Xβ̂) (26)
      = y′y − 2β̂′X′y + β̂′X′Xβ̂ (27)
      = y′y − β̂′X′y, (28)

where the last step uses the normal equations X′Xβ̂ = X′y.
Estimation of σ2
The residual sum of squares has n − p degrees of freedom associated with it, since p parameters are estimated in the regression model.
The residual mean square is
MSRes
MSRes = SSRes / (n − p) (29)
The expected value of MSRes is σ², so an unbiased estimator of σ² is
Estimator of σ2
σ̂² = MSRes. (30)
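A numerical sketch of Eqs. (24)-(30) on simulated data (all values below are illustrative only). The computational form SSRes = y′y − β̂′X′y from Eq. (28) is checked against the direct definition e′e.

```python
import numpy as np

# Simulated data for illustration only: true sigma = 2, so sigma^2 = 4.
rng = np.random.default_rng(2)
n, k = 50, 3
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(scale=2.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
ss_res = e @ e               # SS_Res = e'e, Eq. (24)
ms_res = ss_res / (n - p)    # MS_Res, Eq. (29); unbiased estimate of sigma^2
```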
Properties of LS estimators
β̂ is an unbiased estimator of β. That is E (β̂) = β.
The variance property of β̂ is expressed by the variance-covariance matrix:

Variance of β̂

Var(β̂) = E{(β̂ − β)(β̂ − β)′} = σ²(X′X)⁻¹.
Var(β̂) is a p × p symmetric matrix.

The j-th diagonal element of Var(β̂) is the variance of β̂j.

The (i, j)-th off-diagonal element is the covariance between β̂i and β̂j.
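In practice σ² is unknown, so the covariance matrix is estimated by plugging in MSRes. A sketch on simulated data (illustration only):

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(3)
n, k = 40, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
ms_res = e @ e / (n - p)              # plug-in estimate of sigma^2, Eq. (30)

cov_beta = ms_res * XtX_inv           # estimated Var(beta_hat), a p x p matrix
se_beta = np.sqrt(np.diag(cov_beta))  # standard errors se(beta_hat_j)
```

The diagonal of cov_beta gives the coefficient variances; the off-diagonal entries are the pairwise covariances.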
Hypothesis testing in multiple linear regression
Once we have estimated the parameters in the model, we face two immediate questions:

What is the overall adequacy of the model?

Which specific regressors seem to be important?
Test for significance of regression
The test for significance of regression is a test to determine whether there is a linear relationship between the response y and any of the regressor variables x1, x2, . . . , xk. This procedure is often thought of as an overall or global test of model adequacy.
Hypotheses
H0 : β1 = β2 = · · · = βk = 0
H1 : βj ≠ 0 for at least one j
Test for significance of regression
Test Statistic
To test the null hypothesis H0,
F0 = (SSReg/k) / (SSRes/(n − k − 1)) = MSReg/MSRes

is used. It can be shown that, under H0, F0 has an F distribution with degrees of freedom ν1 = k and ν2 = n − k − 1.
Test for significance of regression
Total sum of squares
SST = y′y − (Σ_{i=1}^{n} yi)² / n

Regression sum of squares

SSReg = β̂′X′y − (Σ_{i=1}^{n} yi)² / n

Residual sum of squares

SSRes = y′y − β̂′X′y
Test for significance of regression
Decomposition of total sum of squares
SST = SSReg + SSRes
Test for significance of regression
ANOVA Table
Source       SS       df           MS       F
Regression   SSReg    k            MSReg    F0
Residual     SSRes    n − k − 1    MSRes
Total        SST      n − 1
Reject H0 : β1 = β2 = · · · = βk = 0 if F0 > Fα,ν1,ν2.
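The sums of squares, their decomposition SST = SSReg + SSRes, and the F statistic can be sketched numerically (simulated data; the critical value quoted in the comment is approximate):

```python
import numpy as np

# Simulated data for illustration only; the regressors really do affect y here.
rng = np.random.default_rng(4)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 3.0, -2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
correction = y.sum() ** 2 / n
ss_t = y @ y - correction                    # total sum of squares
ss_reg = beta_hat @ (X.T @ y) - correction   # regression sum of squares
ss_res = ss_t - ss_reg                       # residual sum of squares

f0 = (ss_reg / k) / (ss_res / (n - k - 1))   # F statistic
# Reject H0 if f0 exceeds F_{alpha, k, n-k-1}; e.g. F_{0.05, 2, 27} is about 3.35.
```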
R2 and Adjusted R2
Two other ways to assess the overall adequacy of the model are R² and adjusted R², denoted R²Adj.

In general, R² never decreases when a regressor is added to the model, regardless of the contribution of that variable. Therefore, it is difficult to judge whether an increase in R² is really telling us anything important.

Adjusted R²

R²Adj = 1 − (SSRes/(n − p)) / (SST/(n − 1)) (31)

R²Adj will only increase on adding a variable to the model if the addition of the variable reduces the residual mean square.
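The contrast between R² and R²Adj can be demonstrated by adding a pure-noise regressor to a fitted model. This is a sketch on simulated data; the helper r2_pair is a hypothetical function written for this illustration, not part of any library.

```python
import numpy as np

# Simulated data for illustration only: y depends on x1 but not on x_noise.
rng = np.random.default_rng(5)
n = 25
x1 = rng.normal(size=n)
x_noise = rng.normal(size=n)            # a regressor unrelated to y
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

def r2_pair(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit; X includes the intercept column."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_t = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_t, 1 - (ss_res / (n - p)) / (ss_t / (n - 1))

X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([X1, x_noise])
r2_1, adj_1 = r2_pair(X1, y)
r2_2, adj_2 = r2_pair(X2, y)
# r2_2 >= r2_1 always holds for nested models; adj_2 can be smaller than adj_1.
```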
Tests on individual regression coefficients
Adding a variable to a regression model always causes the sum of squares for regression to increase and the residual sum of squares to decrease.

We must decide whether the increase in the regression sum of squares is sufficient to warrant using the additional regressor in the model.

The addition of a regressor also increases the variance of the fitted value ŷ, so we must be careful to include only regressors that are of real value in explaining the response.

Furthermore, adding an unimportant regressor may increase the residual mean square, which may decrease the usefulness of the model.
Tests on individual regression coefficients
Hypotheses

H0 : βj = 0, j = 1, 2, . . . , k
H1 : βj ≠ 0.

Test statistic

t0 = β̂j / se(β̂j),   j = 1, 2, . . . , k

Reject H0 : βj = 0 if |t0| > tα/2,n−k−1.

If H0 : βj = 0 is not rejected, this indicates that the regressor xj can be deleted from the model.

This is really a partial or marginal test because the regression coefficient β̂j depends on all of the other regressor variables xi (i ≠ j) that are in the model. Thus, this is a test of the contribution of xj given the other regressors in the model.
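The individual t statistics combine the estimator (16) with the standard errors from the diagonal of MSRes(X′X)⁻¹. A sketch on simulated data (illustration only; the critical value quoted in the comment is approximate):

```python
import numpy as np

# Simulated data for illustration only: the coefficient of x2 is truly 0.
rng = np.random.default_rng(6)
n, k = 40, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
ms_res = np.sum((y - X @ beta_hat) ** 2) / (n - p)
se = np.sqrt(ms_res * np.diag(XtX_inv))  # se(beta_hat_j)

t0 = beta_hat / se                       # one t statistic per coefficient
# Compare |t0[j]| with t_{alpha/2, n-k-1}; e.g. t_{0.025, 37} is about 2.03.
```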
Thank you :)