PSYC 3030 Review Session Gigi Luk December 7, 2004.


PSYC 3030 Review Session

Gigi Luk

December 7, 2004


Overview

Matrix
Multiple Regression
Indicator variables
Polynomial Regression
Regression Diagnostics
Model Building


Matrix: Basic Operation

Addition, Subtraction: possible only when the dimensions are the same.

Multiplication: possible only when the inside dimensions are the same, e.g. 2x3 & 3x2.

Inverse: exists only when |A| ≠ 0, i.e. A is non-singular: all rows (columns) are linearly independent.


Matrix: Inverse

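Linearly dependent:

\begin{bmatrix} 1 & 3 & 3 \\ 1 & 2 & 2 \\ 1 & 4 & 4 \end{bmatrix}

Linearly independent:

\begin{bmatrix} 3 & 2 & 4 \\ 5 & 7 & 1 \\ 2 & 8 & 9 \end{bmatrix}

A = \begin{bmatrix} 2 & 4 \\ 5 & 9 \end{bmatrix}

|A| = 2 \cdot 9 - 4 \cdot 5 = 18 - 20 = -2

A^{-1} = \frac{1}{-2} \begin{bmatrix} 9 & -4 \\ -5 & 2 \end{bmatrix} = \begin{bmatrix} 9/(-2) & -4/(-2) \\ -5/(-2) & 2/(-2) \end{bmatrix} = \begin{bmatrix} -4.5 & 2 \\ 2.5 & -1 \end{bmatrix}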
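The determinant check and the 2x2 inverse can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper names `det2` and `inv2` are hypothetical, and the matrix entries (2, 4, 5, 9) are chosen only for illustration.

```python
from fractions import Fraction

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    return a * d - b * c

def inv2(m):
    """Inverse of a 2x2 matrix; fails when the matrix is singular (|A| = 0)."""
    (a, b), (c, d) = m
    det = det2(m)
    if det == 0:
        raise ValueError("matrix is singular: |A| = 0")
    # Swap the diagonal, negate the off-diagonal, divide by the determinant.
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[2, 4], [5, 9]]          # illustrative values
A_inv = inv2(A)

# A matrix whose second row is twice the first is linearly dependent.
singular = [[1, 2], [2, 4]]
```

Calling `inv2(singular)` raises an error because its determinant is 0, which is the non-singularity condition stated above.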


Some notations

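n = sample size
p = number of parameters
c = number of distinct values of x (cf. LOF, p. 85)
g = number of statements in the family for a Bonferroni test (cf. p. 92)
J = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} (matrix of ones)
I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} (identity matrix)
H = x(x'x)^{-1}x' (hat matrix)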


Matrix: estimates & residuals

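LS estimates: x'y = (x'x)b

x'x = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}

x'y = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}

(x'x)^{-1} = \frac{1}{n \sum x_i^2 - (\sum x_i)^2} \begin{bmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{bmatrix}

Residuals: e = y - \hat{y} = y - xb = [I - H]y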
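As a rough sketch of how the normal equations are solved for one predictor plus an intercept, the Python below builds the sums that make up x'x and x'y and applies the closed-form 2x2 inverse. The data and the function name `ls_fit` are hypothetical, and exact fractions are used only to keep the arithmetic easy to check by hand.

```python
from fractions import Fraction

def ls_fit(x, y):
    """Solve x'y = (x'x)b for a simple regression with an intercept."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)              # entries of x'x
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))   # entries of x'y
    det = n * sxx - sx * sx                              # determinant of x'x
    if det == 0:
        raise ValueError("x'x is singular")
    # b = (x'x)^-1 x'y, written out element by element
    b0 = Fraction(sxx * sy - sx * sxy, det)
    b1 = Fraction(-sx * sy + n * sxy, det)
    return b0, b1

x = [1, 2, 3, 4]   # hypothetical data
y = [2, 4, 5, 7]
b0, b1 = ls_fit(x, y)

# Residuals e = y - xb
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

With an intercept in the model the residuals sum to zero and are orthogonal to x, which gives a quick sanity check on the fit.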


Matrix: Application in Regression

Source   SS                             df     MS
SSE      e'e = y'y - b'x'y              n-p    SSE/(n-p)
SSM      n\bar{y}^2 = (1/n) y'Jy        1
SSR      b'x'y - SSM                    p-1    SSR/(p-1)
SST      y'y                            n
SSTO     y'(I - J/n)y = y'y - SSM       n-1
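The sums of squares in matrix form can be checked numerically. The sketch below assumes a small hypothetical data set and takes the least-squares estimates for it as given; the variable names are illustrative only.

```python
from fractions import Fraction

x = [1, 2, 3, 4]                       # hypothetical data
y = [2, 4, 5, 7]
b = [Fraction(1, 2), Fraction(8, 5)]   # LS estimates (b0, b1) for these data
n, p = len(y), len(b)

yty = sum(v * v for v in y)                              # y'y
xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]     # x'y
btxty = sum(bk * t for bk, t in zip(b, xty))             # b'x'y

sse = yty - btxty                  # SSE = y'y - b'x'y
ssm = Fraction(sum(y) ** 2, n)     # SSM = (1/n) y'Jy = n * ybar^2
ssr = btxty - ssm                  # SSR = b'x'y - SSM
ssto = yty - ssm                   # SSTO = y'y - SSM

mse = sse / (n - p)
msr = ssr / (p - 1)
```

The identity SSTO = SSR + SSE holds by construction here, since both SSR and SSTO subtract the same SSM.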


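Matrix: Variance-Covariance

Var-cov (Y) = \sigma^2(Y) = \begin{bmatrix} var(y_1) & cov(y_1, y_2) & cov(y_1, y_3) \\ cov(y_2, y_1) & var(y_2) & cov(y_2, y_3) \\ cov(y_3, y_1) & cov(y_3, y_2) & var(y_3) \end{bmatrix}

var-cov (b) = est \sigma^2(b) = s^2(b) = MSE (x'x)^{-1}

= \begin{bmatrix} s^2(b_0) & s(b_0, b_1) \\ s(b_1, b_0) & s^2(b_1) \end{bmatrix}

= \begin{bmatrix} \frac{MSE}{n} + \frac{\bar{x}^2 MSE}{\sum (x_i - \bar{x})^2} & \frac{-\bar{x} MSE}{\sum (x_i - \bar{x})^2} \\ \frac{-\bar{x} MSE}{\sum (x_i - \bar{x})^2} & \frac{MSE}{\sum (x_i - \bar{x})^2} \end{bmatrix}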
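A short numerical check can confirm that MSE (x'x)^-1 matches the element-wise formulas for a simple regression. The x values, the MSE, and the function name `var_cov_b` below are assumptions made for illustration.

```python
from fractions import Fraction

def var_cov_b(x, mse):
    """Return s2(b) = MSE * (x'x)^-1 for a simple regression with intercept."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x'x is singular")
    inv = [[Fraction(sxx, det), Fraction(-sx, det)],
           [Fraction(-sx, det), Fraction(n, det)]]
    return [[mse * v for v in row] for row in inv]

x = [1, 2, 3, 4]          # hypothetical predictor values
mse = Fraction(1, 10)     # assumed MSE
s2b = var_cov_b(x, mse)

# Element-wise versions written in terms of deviations from the mean
xbar = Fraction(sum(x), len(x))
ss_dev = sum((v - xbar) ** 2 for v in x)          # sum of (x_i - xbar)^2
s2_b0 = mse / len(x) + xbar ** 2 * mse / ss_dev
s_b0_b1 = -xbar * mse / ss_dev
s2_b1 = mse / ss_dev
```

The diagonal of `s2b` holds the estimated variances of b0 and b1, and the off-diagonal entry is their covariance.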


Matrix: Variance-Covariance

Estimated variance of a mean response:

s^2(\hat{y}_h) = x_h' s^2(b) x_h = MSE \{ x_h'(x'x)^{-1} x_h \}

Estimated variance of a new observation:

s^2\{pred\} = MSE + s^2(\hat{y}_h)
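The two variances differ only by the extra MSE term for a new observation. The sketch below evaluates both at a single point; the inverse matrix, the MSE, and the level X_h = 3 are hypothetical values, and `quad_form` is an illustrative helper name.

```python
from fractions import Fraction

def quad_form(v, m):
    """Compute v' m v for a vector v and a square matrix m (nested lists)."""
    size = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(size) for j in range(size))

# (x'x)^-1 for the hypothetical design x = 1, 2, 3, 4 with an intercept column
xtx_inv = [[Fraction(3, 2), Fraction(-1, 2)],
           [Fraction(-1, 2), Fraction(1, 5)]]
mse = Fraction(1, 10)     # assumed MSE
x_h = [1, 3]              # intercept term and X_h = 3

s2_mean = mse * quad_form(x_h, xtx_inv)   # variance of the estimated mean response
s2_pred = mse + s2_mean                   # variance for predicting a new observation
```

Because MSE is added, the prediction variance is always larger than the variance of the mean response at the same X_h.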


Multiple Regression

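Model with two or more independent variables: y_i = β0 + β1X_i1 + β2X_i2 + ε_i

x'x = \begin{bmatrix} n & \sum x_{i1} & \sum x_{i2} \\ \sum x_{i1} & \sum x_{i1}^2 & \sum x_{i1} x_{i2} \\ \sum x_{i2} & \sum x_{i1} x_{i2} & \sum x_{i2}^2 \end{bmatrix}

x'y = \begin{bmatrix} \sum y_i \\ \sum x_{i1} y_i \\ \sum x_{i2} y_i \end{bmatrix}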


MR: R-square

Coefficient of multiple determination: R^2 = SSR/SSTO, 0 ≤ R^2 ≤ 1

Alternative: R^2 = 1 - \frac{SSE}{SSTO}

Coefficients of partial determination:

r^2_{y2.1} = \frac{SSR(x_2 \mid x_1)}{SSE(x_1)}

r^2_{y3.12} = \frac{SSR(x_3 \mid x_1, x_2)}{SSE(x_1, x_2)}
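Both coefficients can be computed from error sums of squares alone, since SSR(x2|x1) = SSE(x1) - SSE(x1,x2). The sketch below uses made-up sums of squares, and the function names are illustrative.

```python
def r_squared(ssto, sse):
    """Coefficient of multiple determination: R^2 = 1 - SSE/SSTO."""
    return 1 - sse / ssto

def partial_r_squared(sse_reduced, sse_full):
    """Coefficient of partial determination: extra SSR over the reduced-model SSE."""
    extra_ssr = sse_reduced - sse_full      # e.g. SSR(x2 | x1)
    return extra_ssr / sse_reduced

# Hypothetical sums of squares
ssto = 100.0
sse_x1 = 40.0        # SSE(x1)
sse_x1_x2 = 30.0     # SSE(x1, x2)

r2_full = r_squared(ssto, sse_x1_x2)
r2_y2_given_1 = partial_r_squared(sse_x1, sse_x1_x2)
```

The partial coefficient reads as the share of the variation left unexplained by X1 that X2 then accounts for.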


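[Figure: partition of SSTO for two predictors. SSTO = SSR(X1) + SSE(X1) = SSR(X2) + SSE(X2) = SSR(X1,X2) + SSE(X1,X2), where SSE(X1) = SSR(X2|X1) + SSE(X1,X2) and SSE(X2) = SSR(X1|X2) + SSE(X1,X2).]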


MR: Hypothesis testing

Test for regression relation (the overall test):
Ho: β1 = β2 = ... = βp-1 = 0
Ha: not all βs = 0
F* = MSR/MSE
If F* ≤ F(1-α; p-1, n-p), conclude Ho.

Test for βk:
Ho: βk = 0
Ha: βk ≠ 0
t* = bk/s(bk)
If |t*| ≤ t(1-α/2; n-p), conclude Ho.
Equivalently, (t*)^2 = F* = MSR(xk | all others)/MSE.
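The link between the t test for a single coefficient and the corresponding F test can be illustrated numerically. The estimates below come from a hypothetical simple regression with n = 4, and the critical value 4.303 is the tabled t(0.975; 2); `conclude_h0` is an illustrative helper.

```python
from fractions import Fraction
import math

# Hypothetical simple-regression results (n = 4, p = 2)
b1 = Fraction(8, 5)          # slope estimate
s2_b1 = Fraction(1, 50)      # estimated variance s2(b1)
msr = Fraction(64, 5)        # MSR with p - 1 = 1 degree of freedom
mse = Fraction(1, 10)

t_star = float(b1) / math.sqrt(s2_b1)   # t* = b1 / s(b1)
f_star = msr / mse                      # F* = MSR / MSE

def conclude_h0(statistic, critical_value):
    """Decision rule: conclude Ho when |statistic| <= critical value."""
    return abs(statistic) <= critical_value

t_critical = 4.303   # t(0.975; n - p = 2), read from a t table
keep_h0 = conclude_h0(t_star, t_critical)
```

Here (t*)^2 equals F* up to rounding, so the two tests lead to the same decision.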


MR: Hypothesis Testing (cont’)

Test for LOF:
Ho: E{Y} = βo + β1X1 + β2X2 + ... + βp-1Xp-1
Ha: E{Y} ≠ βo + β1X1 + β2X2 + ... + βp-1Xp-1
F* = [SSLF/(c-p)] / [SSPE/(n-c)]
If F* ≤ F(1-α; c-p, n-c), conclude Ho.

Test whether some βk = 0:
Ho: βh = βh+1 = ... = βp-1 = 0
Ha: not all βk in Ho = 0
F* = MSR(xh...xp-1 | x1...xh-1)/MSE
If F* ≤ F(1-α; p-h, n-p), conclude Ho.
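The lack-of-fit statistic needs replicate observations at some X levels so that pure error can be separated from lack of fit. The following sketch assumes a hypothetical data set with two replicates at each of three levels and takes the fitted line for those data as given; `lack_of_fit_f` is an illustrative name.

```python
from collections import defaultdict
from fractions import Fraction

def lack_of_fit_f(x, y, fitted, p):
    """F* = [SSLF/(c-p)] / [SSPE/(n-c)], with c distinct X levels."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    n, c = len(y), len(groups)
    # Pure error: variation of replicates around their own group mean
    sspe = sum(
        sum((yi - Fraction(sum(g), len(g))) ** 2 for yi in g)
        for g in groups.values()
    )
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sslf = sse - sspe                      # lack of fit = SSE - SSPE
    return (sslf / (c - p)) / (sspe / (n - c))

x = [1, 1, 2, 2, 3, 3]                     # hypothetical data with replicates
y = [2, 3, 4, 4, 7, 8]
b0, b1 = Fraction(-1, 3), Fraction(5, 2)   # LS line for these data
fitted = [b0 + b1 * xi for xi in x]

f_star = lack_of_fit_f(x, y, fitted, p=2)
```

The result would be compared with F(1-α; c-p, n-c), here with 1 and 3 degrees of freedom.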


MR: Extra SS (p. 141, CK)

Full: y = βo + β1X1 + β2X2    SSR(x1,x2)
Red: y = βo + β1X1    SSR(x1)

SSR(x2|x1) = SSR(x1,x2) - SSR(x1)
= SSE(x1) - SSE(x1,x2)
= Effect of X2 adjusted for X1

General Linear Test
Ho: β2 = 0    Ha: β2 ≠ 0

F* = \frac{SSE(R) - SSE(F)}{df_R - df_F} \div \frac{SSE(F)}{df_F}

For simple regression (reduced model y = βo, so SSE(R) = SSTO):

F* = \frac{SSTO - SSE(F)}{(n-1) - (n-2)} \div \frac{SSE(F)}{n-2}
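The general linear test reduces to one small function of the two error sums of squares and their degrees of freedom. The numbers below are invented for illustration (n = 20, full model with X1 and X2, reduced model with X1 only), and the function name is hypothetical.

```python
def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator

# Hypothetical values: n = 20, so df_R = n - 2 = 18 and df_F = n - 3 = 17
sse_x1 = 40.0        # SSE(R) = SSE(x1)
sse_x1_x2 = 30.0     # SSE(F) = SSE(x1, x2)

f_star = general_linear_f(sse_x1, 18, sse_x1_x2, 17)

# The numerator is the extra sum of squares SSR(x2 | x1)
extra_ss = sse_x1 - sse_x1_x2
```

The same function covers the simple-regression case by passing SSTO with n - 1 degrees of freedom as the reduced model.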


Indicator variables

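[Figure: Y = expressive vocabulary plotted against X = receptive vocabulary, with separate lines for boys and girls.]

Without the indicator: y-hat = bo + b1X1
With the indicator X2: y-hat = bo + b1X1 + b2X2

The two groups have parallel lines: intercepts bo and bo + b2, common slope = b1.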


[Figure: Y = expressive vocabulary plotted against X = receptive vocabulary, with lines of different slope for boys and girls.]

y-hat = bo + b1X1 + b2X2 + b12X1X2

If b12 ≠ 0, then there is an interaction: boys and girls have different slopes in the relation of X and Y.
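To see how the indicator and interaction coefficients translate into one line per group, the sketch below evaluates the intercept and slope for each coding of X2. The coefficient values are hypothetical, and which group is coded 0 or 1 is left open, as it is not stated here.

```python
def group_line(b0, b1, b2, b12, x2):
    """Intercept and slope of y-hat against X1 for the group coded X2 = x2."""
    intercept = b0 + b2 * x2
    slope = b1 + b12 * x2
    return intercept, slope

# Hypothetical fitted coefficients
coef = dict(b0=1.0, b1=0.5, b2=2.0, b12=0.25)

line_group0 = group_line(x2=0, **coef)   # group coded X2 = 0
line_group1 = group_line(x2=1, **coef)   # group coded X2 = 1

# With b12 = 0 the slopes coincide and only the intercepts differ.
parallel = group_line(b0=1.0, b1=0.5, b2=2.0, b12=0.0, x2=1)
```

The difference in slopes between the two groups is exactly b12, and the difference in intercepts is b2.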


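Polynomial Regression

2nd Order: Y = βo + β1X + β2X^2 + εi

3rd Order: Y = βo + β1X + β2X^2 + β3X^3 + εi

Interaction (second-order model with two predictors):

Y = βo + β1X1 + β2X2 + β11X1^2 + β22X2^2 + β12X1X2 + εi

β1X1 + β2X2: linear terms
β11X1^2 + β22X2^2: quadratic terms
β12X1X2: interaction term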
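A polynomial model is still linear in its coefficients, so it can be fitted as a multiple regression once the design matrix carries the squared and cross-product columns. The sketch below builds those columns for a second-order model with two predictors; the data points and the function name are illustrative.

```python
def second_order_row(x1, x2):
    """One design-matrix row: intercept, linear, quadratic, interaction terms."""
    return [1, x1, x2, x1 ** 2, x2 ** 2, x1 * x2]

# Hypothetical (X1, X2) observations
points = [(1, 2), (3, 0), (-1, 4)]
X = [second_order_row(x1, x2) for x1, x2 in points]

# Column order matches the coefficients bo, b1, b2, b11, b22, b12
column_names = ["1", "x1", "x2", "x1^2", "x2^2", "x1*x2"]
```

Each row of `X` lines up with `column_names`, so the fitted coefficient vector has p = 6 entries.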


PR: Partial F-test (p. 303, 5th ed.)

Test whether a 1st order model would be sufficient:
Ho: β11 = β22 = β12 = 0
Ha: not all βs in Ho = 0

F* = \frac{SSR(x_1^2, x_2^2, x_1 x_2 \mid x_1, x_2)}{3} \div \frac{SSE}{n-p}

(The numerator is divided by 3, the number of coefficients tested.)

To obtain this SSR, you need the sequential SS (see top of p. 304 in text). This test is a modified test for extra SS.


Regression Diagnostics

Collinearity:
Effects: (1) poor numerical accuracy
(2) poor precision of estimates
Danger sign: several large s(bk)
Determinant of x'x ≈ 0
Eigenvalues of c: # of eigenvalues ≈ 0 = # of linear dependencies
Condition #: (λmax/λi)^{1/2}
15-30 watch out
> 30 trouble
> 100 disaster
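The condition number rule of thumb can be sketched as follows. The example assumes two standardized predictors with a hypothetical correlation r, in which case the eigenvalues of their 2x2 correlation matrix are 1 + r and 1 - r; the label "no warning" for values below 15 is an added assumption, as are the function names.

```python
import math

def condition_indices(eigenvalues):
    """Condition number (lambda_max / lambda_i)^(1/2) for each eigenvalue."""
    lam_max = max(eigenvalues)
    return [math.sqrt(lam_max / lam) for lam in eigenvalues]

def rate(index):
    """Map a condition number to the rule-of-thumb labels."""
    if index > 100:
        return "disaster"
    if index > 30:
        return "trouble"
    if index >= 15:
        return "watch out"
    return "no warning"   # assumed label for values below 15

r = 0.998                      # hypothetical correlation between two predictors
eigenvalues = [1 + r, 1 - r]   # eigenvalues of [[1, r], [r, 1]]
indices = condition_indices(eigenvalues)
verdict = rate(max(indices))
```

As r approaches 1 the smaller eigenvalue approaches 0, which is the near-singular case signalled by a determinant close to 0.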


Regression Diagnostics

VIF (Variance Inflation Factor) = 1/(1 - R_i^2), where R_i^2 comes from regressing x_i on the other predictors
When to worry? When VIF ≈ 10

TOL (Tolerance) = 1/VIF_i
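With only two predictors, R_i^2 is the squared correlation between them, so VIF and TOL can be computed directly. The sketch below relies on that two-predictor special case; the data and the function name are hypothetical.

```python
from fractions import Fraction

def vif_two_predictors(x1, x2):
    """VIF and tolerance when the model has exactly two predictors."""
    n = len(x1)
    m1, m2 = Fraction(sum(x1), n), Fraction(sum(x2), n)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r2 = s12 ** 2 / (s11 * s22)     # R_i^2 = squared correlation of x1 and x2
    vif = 1 / (1 - r2)
    tol = 1 / vif
    return vif, tol

# Hypothetical predictor values
x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
vif, tol = vif_two_predictors(x1, x2)
```

For more than two predictors, R_i^2 would instead come from a full regression of each predictor on all the others.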


Model Building

Goals:
Make R^2 large or MSE small
Keep cost of data collection, s(b) small

Selection Criteria:
R^2: look at ∆R^2
MSE: can increase or decrease as variables are added


Model Building (cont’)

Cp ≈ p

Cp = est. of (1/σ^2) Σ{var(yhat) + [yhat_true - yhat_p]^2}
(var(yhat): random error; [yhat_true - yhat_p]^2: bias)

= SSEp/MSEall - (n - 2p)

= p + (m + 1 - p)(Fp - 1)

m: # available predictors

Fp: incremental F for predictors omitted
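The two computing formulas for Cp give the same value, which a small numerical check makes visible. All quantities below are hypothetical (n = 20 cases, m = 5 available predictors, a candidate model with p = 3 parameters), and the function names are illustrative.

```python
def mallows_cp(sse_p, mse_all, n, p):
    """Cp = SSEp / MSEall - (n - 2p)."""
    return sse_p / mse_all - (n - 2 * p)

def cp_from_incremental_f(p, m, f_p):
    """Cp = p + (m + 1 - p)(Fp - 1), with Fp the F for the omitted predictors."""
    return p + (m + 1 - p) * (f_p - 1)

n, m, p = 20, 5, 3
mse_all = 2.0                        # MSE of the model with all m predictors
sse_all = mse_all * (n - m - 1)      # its SSE, with n - m - 1 degrees of freedom
sse_p = 34.0                         # SSE of the candidate model

# Incremental F for the m + 1 - p predictors left out of the candidate model
f_p = ((sse_p - sse_all) / (m + 1 - p)) / mse_all

cp_a = mallows_cp(sse_p, mse_all, n, p)
cp_b = cp_from_incremental_f(p, m, f_p)
```

In this made-up case Fp = 1, so Cp equals p, the pattern expected when the omitted predictors contribute nothing beyond random error.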


Model Building (cont’)

Variable Selection Procedure
Choose min MSE & Cp ≈ p
SAS tools: Forward, Backward, Stepwise
Guided selection: key vars, promising vars, haystack
Substantive knowledge of the area
Examination of each var: expected sign & magnitude of coefficients