Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.
-
Upload
allyson-owens -
Category
Documents
-
view
229 -
download
4
Transcript of Chapter 15 Describing Relationships: Regression, Prediction, and Causation Chapter 151.
Chapter 15
Describing Relationships: Regression, Prediction, and Causation
Chapter 15 1
Regression Line
A regression line is a straight line that describe how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
Chapter 15 2
Linear Regression• Objective: To quantify the linear relationship between
an explanatory variable and a response variable.
We can then predict the average response for all subjects with a given value of the explanatory variable.
• Regression equation: y = a + bx– x is the value of the explanatory variable.– y is the average value of the response variable.
Chapter 15 3
Figure 15.1 Using a straight-line pattern for prediction, for Example 1. The data are the lengths of two bones in 5 fossils of the extinct beast Archaeopteryx.
Figure 15.2 A weaker straight-line pattern. The data are the percentage in each state who voted Democratic in the two Reagan presidential elections.
Least Squares
• Used to determine the “best” line.
• We want the line to be as close as possible to the data points in the vertical (y) direction (since that is what we are trying to predict).
• Least Squares: Use the line that minimizes the sum of the squares of the vertical distances of the data points from the line.
Chapter 15 6
Figure 15.3 A regression line aims to predict y from x. So a good regression line makes the vertical distances from the data points to the line small.
Prediction via Regression Line
Chapter 15 8
The regression equation is:
y = 3.6 + 0.97x
– y is the average age of all husbands who have wives of age x.
For all women aged 30, we predict the average husband age to be 32.7 years:
3.6 + (0.97)(30) = 32.7 years
Ages of Husband and Wife
Coefficient of Determination (r2)
• Measures usefulness of regression prediction.
• r2, the square of the correlation, measures the percentage of the variation in the values of the response variable (y) that is explained by the regression line. r=1: r2=1: regression line explains all (100%) of
the variation in y. r=.7: r2=.49: regression line explains almost half
(50%) of the variation in y.
Chapter 15 9
Computing r
Chapter 15 10
Beware of Extrapolation
• Sarah’s height was plotted against her age.
• Can you predict her height at age 42 months?
• Can you predict her height at age 30 years (360 months)?
80
85
90
95
100
30 35 40 45 50 55 60 65
age (months)
hei
gh
t (c
m)
Chapter 15 11
• Regression line:y = 71.95 + .383 x
• height at age 42 months? y = 88 cm.
• height at age 30 years? y = 209.8 cm.– She is predicted to be
6' 10.5" at age 30.70
90
110
130
150
170
190
210
30 90 150 210 270 330 390
age (months)
hei
gh
t (c
m)
Chapter 15 12
Beware of Extrapolation
Correlation Does Not Imply Causation
Even very strong correlations may not correspond to a real causal
relationship.
Chapter 15 13
Chapter 15 14
Evidence of Causation• A properly conducted experiment
establishes the connection.• Other considerations:
– A reasonable explanation for a cause and effect exists.
– The connection happens in repeated trials. – The connection happens under varying
conditions.– Potential confounding factors are ruled out.– The alleged cause precedes the effect in time.
Chapter 15 15
Reasons Two Variables May Be Related (Correlated)
• Explanatory variable causes change in response variable.
• Response variable causes change in explanatory variable.
• Explanatory may have some cause, but is not the sole cause of changes in the response variable.
• Confounding variables may exist.• Both variables may result from a common cause
– such as, both variables changing over time.• The correlation may be merely a coincidence.
Chapter 15 16
Explanatory is notSole Contributor
barbecued foods are known to contain carcinogens, but other lifestyle choices may also contribute
• Explanatory: Consumption of barbecued foods
• Response: Incidence of stomach cancer
Chapter 15 17
Common Response(both variables change due to
common cause)
Both may result from an unhappy marriage.
• Explanatory: Divorce among men• Response: Percent abusing alcohol
Chapter 15 18
The Relationship May Be Just a Coincidence
Some strong correlations (or apparent associations) just by chance, even when the variables are not related in the population.
Chapter 15 19
Key Concepts
• Least Squares Regression Equation• R2
• Correlation does not imply causation• Confirming causation• Reasons variables may be correlated
Continued…
Chapter 15 20
Cautions about Correlation and Regression
• Only describe linear relationships.
• Variables are both affected by outliers.
• Always plot the data before interpreting.
• Beware of extrapolation.– predicting outside of the range of x
• Beware of lurking variables.– They have important effect on the relationship among the
variables in a study, but are not included in the study.
• Association does not imply causation.