Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

20
Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 15 1

Transcript of Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Page 1: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Chapter 15

Describing Relationships: Regression, Prediction, and Causation

Chapter 15 1

Page 2: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Regression Line

A regression line is a straight line that describe how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Chapter 15 2

Page 3: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Linear Regression• Objective: To quantify the linear relationship between

an explanatory variable and a response variable.

We can then predict the average response for all subjects with a given value of the explanatory variable.

• Regression equation: y = a + bx– x is the value of the explanatory variable.– y is the average value of the response variable.

Chapter 15 3

Page 4: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Figure 15.1 Using a straight-line pattern for prediction, for Example 1. The data are the lengths of two bones in 5 fossils of the extinct beast Archaeopteryx.

Page 5: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Figure 15.2 A weaker straight-line pattern. The data are the percentage in each state who voted Democratic in the two Reagan presidential elections.

Page 6: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Least Squares

• Used to determine the “best” line.

• We want the line to be as close as possible to the data points in the vertical (y) direction (since that is what we are trying to predict).

• Least Squares: Use the line that minimizes the sum of the squares of the vertical distances of the data points from the line.

Chapter 15 6

Page 7: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Figure 15.3 A regression line aims to predict y from x. So a good regression line makes the vertical distances from the data points to the line small.

Page 8: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Prediction via Regression Line

Chapter 15 8

The regression equation is:

y = 3.6 + 0.97x

– y is the average age of all husbands who have wives of age x.

For all women aged 30, we predict the average husband age to be 32.7 years:

3.6 + (0.97)(30) = 32.7 years

Ages of Husband and Wife

Page 9: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Coefficient of Determination (r2)

• Measures usefulness of regression prediction.

• r2, the square of the correlation, measures the percentage of the variation in the values of the response variable (y) that is explained by the regression line. r=1: r2=1: regression line explains all (100%) of

the variation in y. r=.7: r2=.49: regression line explains almost half

(50%) of the variation in y.

Chapter 15 9

Page 10: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Computing r

Chapter 15 10

Page 11: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Beware of Extrapolation

• Sarah’s height was plotted against her age.

• Can you predict her height at age 42 months?

• Can you predict her height at age 30 years (360 months)?

80

85

90

95

100

30 35 40 45 50 55 60 65

age (months)

hei

gh

t (c

m)

Chapter 15 11

Page 12: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

• Regression line:y = 71.95 + .383 x

• height at age 42 months? y = 88 cm.

• height at age 30 years? y = 209.8 cm.– She is predicted to be

6' 10.5" at age 30.70

90

110

130

150

170

190

210

30 90 150 210 270 330 390

age (months)

hei

gh

t (c

m)

Chapter 15 12

Beware of Extrapolation

Page 13: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Correlation Does Not Imply Causation

Even very strong correlations may not correspond to a real causal

relationship.

Chapter 15 13

Page 14: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Chapter 15 14

Evidence of Causation• A properly conducted experiment

establishes the connection.• Other considerations:

– A reasonable explanation for a cause and effect exists.

– The connection happens in repeated trials. – The connection happens under varying

conditions.– Potential confounding factors are ruled out.– The alleged cause precedes the effect in time.

Page 15: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Chapter 15 15

Reasons Two Variables May Be Related (Correlated)

• Explanatory variable causes change in response variable.

• Response variable causes change in explanatory variable.

• Explanatory may have some cause, but is not the sole cause of changes in the response variable.

• Confounding variables may exist.• Both variables may result from a common cause

– such as, both variables changing over time.• The correlation may be merely a coincidence.

Page 16: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Chapter 15 16

Explanatory is notSole Contributor

barbecued foods are known to contain carcinogens, but other lifestyle choices may also contribute

• Explanatory: Consumption of barbecued foods

• Response: Incidence of stomach cancer

Page 17: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Chapter 15 17

Common Response(both variables change due to

common cause)

Both may result from an unhappy marriage.

• Explanatory: Divorce among men• Response: Percent abusing alcohol

Page 18: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Chapter 15 18

The Relationship May Be Just a Coincidence

Some strong correlations (or apparent associations) just by chance, even when the variables are not related in the population.

Page 19: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Chapter 15 19

Key Concepts

• Least Squares Regression Equation• R2

• Correlation does not imply causation• Confirming causation• Reasons variables may be correlated

Continued…

Page 20: Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.

Chapter 15 20

Cautions about Correlation and Regression

• Only describe linear relationships.

• Variables are both affected by outliers.

• Always plot the data before interpreting.

• Beware of extrapolation.– predicting outside of the range of x

• Beware of lurking variables.– They have important effect on the relationship among the

variables in a study, but are not included in the study.

• Association does not imply causation.