Correlation/Regression - part 2


Transcript of Correlation/Regression - part 2

Page 1: Correlation/Regression - part 2

• Consider Example 2.12 in section 2.3, 1/7. Look at the scatterplot… Example 2.13 shows that the prediction line is given by the linear equation (nea = "nonexercise activity"):

predicted fat gain = 3.505 - 0.00344 x nea

• The intercept (3.505 kg) equals the predicted fat gain when the increase in non-exercise activity is 0, and the slope (-0.00344) equals the rate of change of fat gain per calorie increase in nea; i.e., the predicted fat gain decreases by about 0.00344 kg for each calorie increase in nea.

• So to get the predicted value of fat gain for an nea of, say, 400 calories, you can either estimate it graphically from the line (next page) or numerically by evaluating the equation at nea = 400 (see the short R sketch below):

predicted fat gain at nea of 400 = 3.505 - 0.00344 (400) = 2.129 kg

• But be careful about extrapolation!
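A minimal sketch in R of the same numeric prediction, using the coefficients quoted above from Example 2.13 (the variable names are just illustrative):

```r
# Predicted fat gain from the fitted line of Example 2.13
b0 <- 3.505       # intercept (kg)
b1 <- -0.00344    # slope (kg per calorie of nea)
nea <- 400
b0 + b1 * nea     # 2.129 kg, matching the hand calculation above
```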

Page 2: Correlation/Regression - part 2

The graphical method: find nea = 400 on the x-axis, draw a vertical line up to the regression line, then draw a horizontal line from that point over to the y-axis; the value where it meets the y-axis is the predicted y for that value of x.
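A small sketch of this read-off in base R, assuming the coefficients quoted from Example 2.13 (the plotting ranges are just illustrative):

```r
# Draw the prediction line and read off the fitted value at nea = 400
b0 <- 3.505; b1 <- -0.00344
plot(NA, xlim = c(0, 800), ylim = c(0, 4),
     xlab = "nea (calories)", ylab = "fat gain (kg)")
abline(a = b0, b = b1)                  # the prediction line
yhat <- b0 + b1 * 400                   # predicted fat gain at nea = 400
segments(400, 0, 400, yhat, lty = 2)    # vertical line up to the regression line
segments(0, yhat, 400, yhat, lty = 2)   # horizontal line over to the y-axis
```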

Page 3: Correlation/Regression - part 2

The least squares line makes the errors (or residuals) as small as possible by minimizing their sum of squares.
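To make "minimizing their sum of squares" concrete, here is a small sketch with made-up data (not the textbook's), comparing the least squares fit against a nearby alternative slope:

```r
# The least squares coefficients give the smallest possible sum of squared errors
set.seed(4)
x <- rnorm(25)
y <- 1 + 2 * x + rnorm(25)
fit <- lm(y ~ x)

sse <- function(b0, b1) sum((y - (b0 + b1 * x))^2)
sse(coef(fit)[1], coef(fit)[2])          # sum of squares at the least squares fit
sse(coef(fit)[1], coef(fit)[2] + 0.3)    # any other slope gives a larger value
```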

Page 4: Correlation/Regression - part 2

• The least squares process finds the values of b0 and b1 that minimize the sum of the squares of the errors, giving y-hat = b0 + b1 x, where

b1 = r (sy/sx) and b0 = ybar - b1 xbar

• As we've noted before, use R to do these calculations for you (a short sketch follows), but notice a couple of things from these equations:
– b1 and r have the same sign (since sy and sx are > 0)
– the prediction line always passes through the point (xbar, ybar)
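A minimal sketch with made-up data (the textbook's data are not reproduced here) checking the hand formulas against R's lm():

```r
# Hand formulas for the least squares line vs. R's built-in fit (made-up data)
set.seed(1)
x <- runif(30, 0, 800)
y <- 3.5 - 0.003 * x + rnorm(30, sd = 0.4)

b1 <- cor(x, y) * sd(y) / sd(x)    # slope: b1 = r * (sy/sx)
b0 <- mean(y) - b1 * mean(x)       # intercept: b0 = ybar - b1 * xbar
c(b0, b1)
coef(lm(y ~ x))                    # lm() returns the same two coefficients
```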

Page 5: Correlation/Regression - part 2

• Besides having the same sign as the slope of the regression line, the correlation coefficient r has the property that its square, r2, gives the proportion of the total variability in y explained by the regression line of y on x.

• Another important idea: if you regress y on x (i.e., treat y as the response) you will get a different line than if you regress x on y (treat x as the response), even though the value of r is the same in both cases! See Figure 2.15 on the following slide, and read about this important set of data in Example 2.17 on page 116.

Page 6: Correlation/Regression - part 2

• Regress velocity on distance (solid line) and distance on velocity (dashed line) to get two distinct lines - however, r = .7842 in both cases…
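A small sketch of the same point with simulated data (not the velocity/distance data of Example 2.17): the two regressions give different lines, while r, and hence r2, is unchanged.

```r
# Regressing y on x vs. x on y: different lines, identical correlation
set.seed(2)
x <- rnorm(30)
y <- 0.8 * x + rnorm(30, sd = 0.5)

coef(lm(y ~ x))                     # one line: y regressed on x
coef(lm(x ~ y))                     # a different line: x regressed on y
cor(x, y)                           # r is the same either way
cor(x, y)^2                         # r^2 = proportion of variability explained
summary(lm(y ~ x))$r.squared        # matches r^2 from the fit of y on x
```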

Page 7: Correlation/Regression - part 2

• Cautions about regression and correlation:
– Always look at the plot of the residuals (recall that for every observed point on the scatterplot, we have: residual at xi = observed yi - predicted yi). A plot of the residuals against the explanatory variable should show a "random" scatter around zero; see Fig. 2.20. There should be no pattern to the residuals (a short R sketch follows). Go over Example 2.20, p. 128.
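A minimal sketch of such a residual plot, with made-up data standing in for the textbook's:

```r
# Residuals plotted against the explanatory variable (made-up data)
set.seed(3)
x <- runif(40, 0, 800)                       # hypothetical nea values (calories)
y <- 3.5 - 0.0034 * x + rnorm(40, sd = 0.4)  # hypothetical fat gains (kg)

fit <- lm(y ~ x)
res <- resid(fit)                     # residual = observed y - predicted y
plot(x, res, ylab = "residual")       # should show a "random" scatter around zero
abline(h = 0, lty = 2)                # horizontal reference line at zero
```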

Page 8: Correlation/Regression - part 2

– Beware of lurking variables! Go over Examples 2.21 & 2.22 on pages 129-132 (eBook section 2.4, 2/5)…

– Look out for outliers (in either the explanatory or response variable) and influential values (in the explanatory variable). Go over these examples carefully…note #18 is influential and #15 is an outlier in the y-direction.
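A small sketch, with made-up data, of how a single influential point in the x-direction can change the fitted slope:

```r
# One extreme-x point can pull the least squares slope toward itself
set.seed(5)
x <- c(rnorm(20, mean = 5), 20)     # point 21 is far out in the x-direction
y <- c(2 + rnorm(20, sd = 0.5), 4)  # the first 20 y's are unrelated to x

coef(lm(y ~ x))                     # slope with the influential point included
coef(lm(y[-21] ~ x[-21]))           # slope is noticeably different without it
```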

Page 9: Correlation/Regression - part 2
Page 10: Correlation/Regression - part 2

Note that outliers in the y-direction can have large residuals, while outliers in the x-direction (possible influential values) might not have large residuals.
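A small sketch of that distinction, again with made-up data: the y-direction outlier leaves a large residual, while the extreme-x point (which the fitted line chases) does not.

```r
# y-direction outlier vs. x-direction (influential) point: compare residuals
set.seed(6)
x <- c(rnorm(20, mean = 5), 5, 20)             # point 22 has an extreme x
y <- c(1 + 0.5 * x[1:20] + rnorm(20, sd = 0.2),
       8,                                      # point 21: far off the line in y
       1 + 0.5 * 20)                           # point 22: extreme x, on the trend
round(resid(lm(y ~ x))[21:22], 2)              # large residual for 21, small for 22
```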

Page 11: Correlation/Regression - part 2

• All HW for Chapter 2:
– section 2.1: #2.6-2.9, 2.11, 2.13-2.15, 2.18, 2.19, 2.21, 2.26
– section 2.2: #2.29-2.32, 2.35, 2.39, 2.41, 2.43, 2.46, 2.50, 2.51
– section 2.3: #2.57-2.58, 2.62, 2.64, 2.66, 2.68, 2.73, 2.74
– section 2.4: #2.85, 2.87, 2.89, 2.94, 2.96, 2.97, 2.101
– section 2.5: #2.111-2.113, 2.119, 2.121
– section 2.6: #2.122, 2.125, 2.127, 2.129, 2.130
– Chapter 2 Exercises: Do several of those on p. 161-169