Topic 12: Multiple Linear Regression


Page 1: Topic 12: Multiple Linear Regression

Topic 12: Multiple Linear Regression

Page 2: Topic 12: Multiple Linear Regression

Outline

• Multiple Regression
  – Data and notation
  – Model
  – Inference

• Recall notes from Topic 3 for simple linear regression

Page 3: Topic 12: Multiple Linear Regression

Data for Multiple Regression

• Yi is the response variable
• Xi1, Xi2, … , Xi,p-1 are the p-1 explanatory (or predictor) variables
• Cases denoted by i = 1 to n

Page 4: Topic 12: Multiple Linear Regression

Multiple Regression Model

• Yi is the value of the response variable for the ith case

• β0 is the intercept
• β1, β2, … , βp-1 are the regression coefficients for the explanatory variables

Yi = β0 + β1Xi1 + β2Xi2 + … + βp-1Xi,p-1 + ei

Page 5: Topic 12: Multiple Linear Regression

Multiple Regression Model

• Xi,k is the value of the kth explanatory variable for the ith case

• ei are independent Normally distributed random errors with mean 0 and variance σ2

Yi = β0 + β1Xi1 + β2Xi2 + … + βp-1Xi,p-1 + ei

Page 6: Topic 12: Multiple Linear Regression

Multiple Regression Parameters

• β0 is the intercept
• β1, β2, … , βp-1 are the regression coefficients for the explanatory variables
• σ2 is the variance of the error term
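The model and its parameters can be sketched numerically. The following Python snippet (an illustration, not from the notes) simulates data from a hypothetical model with p − 1 = 2 predictors; all coefficient values and ranges are made-up assumptions.

```python
import numpy as np

# Minimal sketch of Yi = b0 + b1*Xi1 + b2*Xi2 + ei with illustrative values.
rng = np.random.default_rng(0)
n = 100
beta = np.array([2.0, 0.5, -1.5])       # beta0, beta1, beta2 (assumed)
X1 = rng.uniform(0, 10, n)              # hypothetical predictor 1
X2 = rng.uniform(0, 10, n)              # hypothetical predictor 2
sigma = 1.0                             # error standard deviation (assumed)
e = rng.normal(0, sigma, n)             # independent N(0, sigma^2) errors
Y = beta[0] + beta[1] * X1 + beta[2] * X2 + e
print(Y.shape)                          # (100,)
```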

Page 7: Topic 12: Multiple Linear Regression

Interesting special cases

• Yi = β0 + β1Xi + β2Xi^2 + … + βp-1Xi^(p-1) + ei (polynomial of order p-1)

• X's can be indicator or dummy variables taking the values 0 and 1 (or any other two distinct numbers)

• Interactions between explanatory variables (represented as the product of explanatory variables)

Page 8: Topic 12: Multiple Linear Regression

Interesting special cases

• Consider the model Yi = β0 + β1Xi1 + β2Xi2 + β3Xi1Xi2 + ei
• If X2 is a dummy variable:
  – Yi = β0 + β1Xi1 + ei (when X2 = 0)
  – Yi = β0 + β1Xi1 + β2 + β3Xi1 + ei (when X2 = 1)
       = (β0 + β2) + (β1 + β3)Xi1 + ei
  – Models two different regression lines at the same time
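The two-lines interpretation can be sketched directly; the coefficient values below are illustrative assumptions, not from the notes.

```python
import numpy as np

# Dummy-variable interaction model Yi = b0 + b1*Xi1 + b2*Xi2 + b3*Xi1*Xi2 + ei
# with X2 in {0, 1}; illustrative coefficient values.
b0, b1, b2, b3 = 1.0, 2.0, 3.0, -0.5
x1 = np.linspace(0, 5, 6)

line0 = b0 + b1 * x1                 # group X2 = 0: intercept b0, slope b1
line1 = (b0 + b2) + (b1 + b3) * x1   # group X2 = 1: intercept b0+b2, slope b1+b3

print(line1[0] - line0[0])           # 3.0, the intercept shift b2
```

The interaction coefficient b3 is what allows the two groups to have different slopes, not just different intercepts.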

Page 9: Topic 12: Multiple Linear Regression

Model in Matrix Form

Y = Xβ + ε

where Y is n×1, X is n×p, β is p×1, and ε is n×1

ε ~ N(0, σ2I)

Y ~ N(Xβ, σ2I)

Page 10: Topic 12: Multiple Linear Regression

Least Squares

Find b to minimize SSE = (Y − Xb)΄(Y − Xb)

Obtain the normal equations: X΄Xb = X΄Y

Page 11: Topic 12: Multiple Linear Regression

Least Squares Solution

b = (X΄X)-1X΄Y

Fitted (predicted) values:

Ŷ = Xb = X(X΄X)-1X΄Y = HY
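A minimal numerical sketch of the least squares solution and the hat matrix, using numpy on simulated data (the design and coefficients are illustrative assumptions):

```python
import numpy as np

# Simulate a hypothetical design with an intercept column and p - 1 = 2 predictors.
rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # n x p
beta = np.array([1.0, 2.0, -1.0])                               # assumed values
Y = X @ beta + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)   # b = (X'X)^-1 X'Y via normal equations
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
Y_hat = H @ Y                           # fitted values HY, same as X @ b
assert np.allclose(Y_hat, X @ b)
```

Solving the normal equations with `np.linalg.solve` avoids explicitly inverting X΄X when only b is needed; the explicit inverse is shown here only to form H.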

Page 12: Topic 12: Multiple Linear Regression

Residuals

e = Y − Ŷ = Y − HY = (I − H)Y

I − H is symmetric and idempotent: (I − H)(I − H) = (I − H)
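The symmetry and idempotency of I − H can be checked numerically; this sketch uses a small simulated design matrix (illustrative, not from the notes):

```python
import numpy as np

# Build a hypothetical n x p design with an intercept column.
rng = np.random.default_rng(2)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H                # the matrix I - H

assert np.allclose(M, M.T)       # symmetric
assert np.allclose(M @ M, M)     # idempotent
```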

Page 13: Topic 12: Multiple Linear Regression

Covariance Matrix of residuals

• Cov(e)=σ2(I-H)(I-H)΄= σ2(I-H)

• Var(ei)= σ2(1-hii)

• hii= X΄i(X΄X)-1Xi

• X΄i =(1,Xi1,…,Xi,p-1)

• Residuals are usually correlated

• Cov(ei,ej) = -σ2hij for i ≠ j

Page 14: Topic 12: Multiple Linear Regression

Estimation of σ2

s2 = MSE = SSE/dfE = (Y − Xb)΄(Y − Xb)/(n − p)

E(s2) = σ2

s = √MSE = Root MSE
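As a sketch, the unbiasedness E(s2) = σ2 of s2 = SSE/(n − p) can be checked by simulation; all parameter values below are illustrative assumptions.

```python
import numpy as np

# Average s^2 over many simulated samples from a fixed hypothetical design.
rng = np.random.default_rng(3)
n, p, sigma2 = 40, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 0.5, -0.5])       # assumed true coefficients

s2_draws = []
for _ in range(2000):
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ b
    s2_draws.append(resid @ resid / (n - p))   # SSE / dfE

print(np.mean(s2_draws))   # close to sigma2 = 4.0, since E(s^2) = sigma^2
```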

Page 15: Topic 12: Multiple Linear Regression

Distribution of b

• b = (X΄X)-1X΄Y
• Since Y ~ N(Xβ, σ2I):
  – E(b) = ((X΄X)-1X΄)Xβ = β
  – Cov(b) = σ2((X΄X)-1X΄)((X΄X)-1X΄)΄ = σ2(X΄X)-1
• σ2(X΄X)-1 is estimated by s2(X΄X)-1
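A short sketch computing s2(X΄X)-1 and the resulting coefficient standard errors on simulated data (design and coefficients are illustrative):

```python
import numpy as np

# Simulate a hypothetical regression and estimate Cov(b) by s^2 (X'X)^-1.
rng = np.random.default_rng(4)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
resid = Y - X @ b
s2 = resid @ resid / (n - p)            # MSE
cov_b = s2 * XtX_inv                    # estimated covariance of b
se_b = np.sqrt(np.diag(cov_b))          # coefficient standard errors
print(se_b.shape)                       # (3,), one standard error per coefficient
```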

Page 16: Topic 12: Multiple Linear Regression

ANOVA Table

• Sources of variation are
  – Model (SAS) or Regression (KNNL)
  – Error (SAS, KNNL) or Residual
  – Total

• SS and df add as before
  – SSM + SSE = SSTO
  – dfM + dfE = dfTotal

Page 17: Topic 12: Multiple Linear Regression

Sums of Squares

SSM = Σ (Ŷi − Ȳ)²

SSE = Σ (Yi − Ŷi)²

SSTO = Σ (Yi − Ȳ)²

(sums over i = 1 to n)

Page 18: Topic 12: Multiple Linear Regression

Degrees of Freedom

dfM = p − 1

dfE = n − p

dfTotal = n − 1

Page 19: Topic 12: Multiple Linear Regression

Mean Squares

MSM = SSM/dfM

MSE = SSE/dfE

MST = SSTO/dfTotal

Page 20: Topic 12: Multiple Linear Regression

Mean Squares

MSM = Σ (Ŷi − Ȳ)² / (p − 1)

MSE = Σ (Yi − Ŷi)² / (n − p)

MST = Σ (Yi − Ȳ)² / (n − 1)

(sums over i = 1 to n)

Page 21: Topic 12: Multiple Linear Regression

ANOVA Table

Source   SS     df        MS    F
Model    SSM    dfM       MSM   MSM/MSE
Error    SSE    dfE       MSE
Total    SSTO   dfTotal   MST
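The ANOVA quantities above can be computed directly; this sketch verifies SSM + SSE = SSTO and forms F = MSM/MSE on simulated data (all parameters are illustrative assumptions):

```python
import numpy as np

# Simulate a hypothetical regression and build the ANOVA decomposition.
rng = np.random.default_rng(5)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b
SSM = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)
SSTO = np.sum((Y - Y.mean()) ** 2)
assert np.isclose(SSM + SSE, SSTO)      # SS add up (model includes an intercept)

MSM, MSE = SSM / (p - 1), SSE / (n - p)
F = MSM / MSE
print(F)
```

Note that SSM + SSE = SSTO relies on the design containing an intercept column.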

Page 22: Topic 12: Multiple Linear Regression

ANOVA F test

• H0: β1 = β2 = … = βp-1 = 0

• Ha: βk ≠ 0, for at least one k = 1, …, p-1

• Under H0, F ~ F(p-1,n-p)

• Reject H0 if F is large, use P-value

Page 23: Topic 12: Multiple Linear Regression

P-value of F test

• The P-value for the F significance test tells us one of the following:
  – there is no evidence to conclude that any of our explanatory variables can help us model the response variable using this kind of model (P ≥ .05)
  – one or more of the explanatory variables in our model is potentially useful for predicting the response variable in a linear model (P ≤ .05)

Page 24: Topic 12: Multiple Linear Regression

R2

• The squared multiple regression correlation (R2) gives the proportion of variation in the response variable explained by all the explanatory variables

• It is usually expressed as a percent
• It is sometimes called the coefficient of multiple determination (KNNL p 226)

Page 25: Topic 12: Multiple Linear Regression

R2

• R2 = SSM/SSTO
  – the proportion of variation explained
• R2 = 1 − (SSE/SSTO)
  – 1 minus the proportion not explained
• Can express the F test in terms of R2:

F = [ R2/(p-1) ] / [ (1 - R2)/(n-p) ]
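A sketch confirming numerically that the R2 form of F agrees with MSM/MSE (the simulated data and parameters are illustrative):

```python
import numpy as np

# Simulate a hypothetical regression with p - 1 = 3 predictors.
rng = np.random.default_rng(6)
n, p = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 1.5, -0.5, 2.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b
SSM = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)
SSTO = SSM + SSE

R2 = SSM / SSTO
F_anova = (SSM / (p - 1)) / (SSE / (n - p))          # MSM / MSE
F_from_R2 = (R2 / (p - 1)) / ((1 - R2) / (n - p))    # R^2 form
assert np.isclose(F_anova, F_from_R2)
```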

Page 26: Topic 12: Multiple Linear Regression

Background Reading

• We went over KNNL 6.1 - 6.5