Tutorial 6 Thursday February 21 MBP 1010 Kevin Brown.

Tutorial 6

Thursday February 21, MBP 1010

Kevin Brown

Linear Regression

Requires you to define?

• Y – dependent variable (the response)
• X – independent variable(s) (the predictors)

Allows you to answer what questions?

• Is there an association? (Same question as testing the Pearson correlation coefficient.)
• What is the association? Measured as the slope.
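The link between the two questions can be checked numerically. The sketch below uses simulated data (the variables `x` and `y` are made up for illustration, not the tutorial data) and shows that the p-value for the slope in a simple linear regression matches the p-value of the Pearson correlation test, while the slope itself is the correlation rescaled by the two standard deviations.

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(50)
y <- 2 + 0.5 * x + rnorm(50)

fit <- lm(y ~ x)

# "Is there an association?": the two tests give the same p-value
p_slope <- coef(summary(fit))["x", "Pr(>|t|)"]
p_cor   <- cor.test(x, y)$p.value
print(all.equal(p_slope, p_cor))

# "What is the association?": slope = r * sd(y) / sd(x)
slope <- coef(fit)[["x"]]
print(all.equal(slope, cor(x, y) * sd(y) / sd(x)))
```

The correlation coefficient is unitless, whereas the slope is in units of Y per unit of X, which is why regression answers the second question and correlation alone does not.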

Assumes

• Linearity
• Constant residual variance (homoscedasticity)
• Residuals are normally distributed
• Errors are independent (i.e. not clustered)

Homogeneity of variance
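One common way to look at these assumptions is through the residuals of the fitted model. The following is a minimal sketch on simulated data (the variables are hypothetical, not from the tutorial): a residuals-versus-fitted plot to judge linearity and constant variance, and a normal Q-Q plot to judge normality of the residuals.

```r
# Simulated data with constant error variance, for illustration only
set.seed(2)
x <- runif(100, 0, 10)
y <- 1 + 2 * x + rnorm(100, sd = 1.5)

fit <- lm(y ~ x)
res <- resid(fit)
fitted_vals <- fitted(fit)

# Residuals vs fitted: look for an even band around zero with no
# curve (linearity) and no funnel shape (constant variance)
plot(fitted_vals, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest normal residuals
qqnorm(res)
qqline(res)
```

Independence of the errors usually cannot be read off a plot like this; it depends on how the data were collected (for example, repeated measurements on the same subject are clustered).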

Outputs “estimates”

• intercept
• slope
• standard errors
• t values
• p-values
• residual standard error (SSE – what is this?)
• R²
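Each of these outputs can be pulled out of a fitted model in R. The sketch below uses simulated data (hypothetical `x` and `y`) and also addresses the question about SSE: SSE is the sum of squared residuals, and the residual standard error is the square root of SSE divided by the residual degrees of freedom.

```r
# Simulated data, for illustration only
set.seed(3)
x <- rnorm(40)
y <- 1 + 3 * x + rnorm(40)

fit <- lm(y ~ x)
s <- summary(fit)

# Intercept and slope with their standard errors, t values and p-values
print(coef(s))

# Residual standard error and R-squared as reported by summary()
print(s$sigma)
print(s$r.squared)

# Residual standard error rebuilt from the sum of squared errors (SSE)
sse <- sum(resid(fit)^2)
rse <- sqrt(sse / df.residual(fit))
print(all.equal(rse, s$sigma))
```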

Linear regression example: height vs. weight

Extract information:

> summary(lm(HW[,2] ~ HW[,1]))

Call:
lm(formula = HW[, 2] ~ HW[, 1])

Residuals:
    Min      1Q  Median      3Q     Max
-36.490 -10.297   3.426   9.156  37.385

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2.860     18.304  -0.156    0.876
HW[, 1]       42.090      9.449   4.454 5.02e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.12 on 48 degrees of freedom
Multiple R-squared:  0.2925,  Adjusted R-squared:  0.2777
F-statistic: 19.84 on 1 and 48 DF,  p-value: 5.022e-05
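The numbers in this output are tied together by a few formulas, which can be re-derived by hand. The sketch below starts from the printed slope estimate, its standard error, the 48 residual degrees of freedom and the multiple R-squared; the variable names are chosen here for illustration. Because the inputs are rounded, the results match the printed values only to about three significant figures.

```r
# Values copied from the summary() output above
est      <- 42.090   # slope estimate
se       <- 9.449    # standard error of the slope
df_resid <- 48       # residual degrees of freedom
r2       <- 0.2925   # multiple R-squared

# t value = estimate / standard error
t_val <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_val), df_resid)

# With a single predictor, the F-statistic is the squared t value
f_val <- t_val^2

# Adjusted R-squared; n = residual df + 2 estimated coefficients
n <- df_resid + 2
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

print(c(t = t_val, p = p_val, F = f_val, adj_r2 = adj_r2))
```

The adjusted R-squared of 0.2777 is only reproduced with 48 residual degrees of freedom, which implies n = 50 observations.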

Example

• Televisions, Physicians and Life Expectancy (World Almanac Factbook 1993) example
  – Residuals & outliers
  – High leverage points & influential observations
  – Dummy variable coding
  – Transformations

• Take-home messages
  – Regression is a very flexible tool
  – Correlation ≠ causation
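High leverage points and influential observations can be flagged with standard R diagnostics. The sketch below does not use the Televisions/Physicians data; it simulates 30 ordinary points and adds one hypothetical point that is far from the others in x and off the trend in y, then computes leverage (hat values), standardized residuals and Cook's distance.

```r
# Simulated data, for illustration only
set.seed(4)
x <- c(rnorm(30), 8)                     # 31st point is extreme in x
y <- c(1 + 2 * x[1:30] + rnorm(30), 0)   # and far below the trend in y

fit <- lm(y ~ x)

lev  <- hatvalues(fit)        # leverage: how unusual each x value is
rstd <- rstandard(fit)        # standardized residuals: outliers in y
cook <- cooks.distance(fit)   # influence: how much the fit changes if dropped

# Index of the observation with the highest leverage / influence
print(unname(which.max(lev)))
print(unname(which.max(cook)))

# Compare the slope with and without the suspicious point
print(coef(fit)[["x"]])
print(coef(lm(y[-31] ~ x[-31]))[[2]])
```

A point can have high leverage without being influential if it lies on the trend of the other points; it is the combination of high leverage and a large residual that moves the fitted line.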

Dummy coding

• Creates an alternate variable that’s used for analysis

• For 2 categories you set values of …
  – reference level to 0
  – level of interest to 1
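In R, dummy coding happens automatically when a factor is used as a predictor. The sketch below uses a small made-up data set with hypothetical group labels "control" and "treated"; `relevel()` sets the reference level (coded 0) and `model.matrix()` shows the 0/1 variable that is actually used in the analysis.

```r
# Made-up data, for illustration only
group <- factor(c("control", "treated", "control", "treated", "treated", "control"))
y     <- c(5.1, 7.9, 4.8, 8.3, 8.0, 5.0)

# Make "control" the reference level (coded 0); "treated" is coded 1
group <- relevel(group, ref = "control")

# The dummy variable created behind the scenes
X <- model.matrix(~ group)
print(X)

# Intercept = mean of the reference level;
# slope = difference in means (level of interest minus reference)
fit <- lm(y ~ group)
print(coef(fit))
```

With this coding the slope has a direct reading: it is the mean of the level of interest minus the mean of the reference level.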

Do these treatments interact?

• Standard approach: ANOVA

[Figure: Treatment #1 vs. Treatment #2, labelled “Interaction”]
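A two-way ANOVA with an interaction term can be sketched as follows. The data are simulated, and the factor names `t1` and `t2` and the effect sizes are assumptions made for illustration: each treatment has its own effect, plus an extra effect when both are given together.

```r
# Simulated 2 x 2 design with 10 replicates per cell, for illustration only
set.seed(5)
d <- expand.grid(t1 = c("no", "yes"), t2 = c("no", "yes"), rep = 1:10)
d$t1 <- factor(d$t1)
d$t2 <- factor(d$t2)

both <- d$t1 == "yes" & d$t2 == "yes"
d$y <- 10 + 2 * (d$t1 == "yes") + 1 * (d$t2 == "yes") + 3 * both + rnorm(nrow(d))

# t1 * t2 expands to t1 + t2 + t1:t2 (main effects plus interaction)
fit <- aov(y ~ t1 * t2, data = d)
tab <- summary(fit)[[1]]
print(tab)

# The row for t1:t2 tests whether the effect of one treatment
# depends on the level of the other
terms_in_table <- trimws(rownames(tab))
print(terms_in_table)
```

The same model can be fitted with `lm(y ~ t1 * t2, data = d)`, in which case the interaction appears as a dummy-coded coefficient.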