Pg 337..345: 3b, 6b (form and strength) Page 350..359: 10b, 12a, 16c, 16e.

Section 6.2: Regression, Prediction, and Causation


Page 1:

Section 6.2: Regression, Prediction, and Causation

Page 2:

Pg 337..345: 3b, 6b (form and strength) Page 350..359: 10b, 12a, 16c, 16e

Homework Turn In…

Page 3:

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.

Often used to predict the value of y for a given value of x.

Also known as a line of best fit.

Regression Lines

Page 4:

Regression Line

You can predict the humerus length when the femur length is 50 by following the dotted line to the regression line and then moving horizontally to the humerus values.

Page 5:

Since different people may draw regression lines differently, an equation may give a more accurate prediction.

Because we want to predict y from x, we want a line that is close to the points in the vertical (y) direction.

We need a way to find from the data the equation of the line that comes closest to the points in the vertical direction.

There are many ways to make the collection of vertical distances “as small as possible”…

Regression Equations

Page 6:

The Least-Squares Regression Line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Least Squares Regression Line

Page 7:

To find the proper placement of the line, look at the vertical distances between the points and the line you have drawn.

For each data point, the difference d is the observed y-value of the point minus the predicted y-value (the height of the line at that point's x). These differences can be positive, negative, or zero: positive when the point is above the line, negative when it is below, and zero when it is on the line.

You then take the sum of all the squared differences. The regression line is placed where the sum of all the squared differences is the smallest.
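The sum of squared differences can be checked by hand or with a few lines of code. The sketch below is only an illustration: the data points and the two candidate lines are made up, and the function name `sum_squared_errors` is a hypothetical helper, not something from the textbook.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical differences between the points and the line y = a + b*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = a + b * x
        d = y - predicted      # positive above the line, negative below, zero on it
        total += d ** 2
    return total

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Two candidate lines: one drawn "by eye", one that fits these points better.
sse_by_eye = sum_squared_errors(xs, ys, a=1.0, b=1.0)   # y = 1 + x
sse_better = sum_squared_errors(xs, ys, a=2.2, b=0.6)   # y = 2.2 + 0.6x

print(sse_by_eye, sse_better)
```

The second line gives the smaller sum, so by the least-squares idea it is the better placement for these points.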

Least-Squares Regression Line

Page 8:

y = a + bx
◦ b = slope of the line (the amount by which y changes when x increases by 1 unit)
◦ a = the intercept (the value of y when x = 0)

The equation can be used to predict the value of y for a given value of x.
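In class the equation comes from the calculator, but the same a and b can be computed directly. The sketch below uses the standard least-squares formulas (slope = sum of (x - mean x)(y - mean y) divided by sum of (x - mean x) squared, intercept = mean y - slope * mean x); these formulas are not given on the slides, and the data and the name `least_squares_line` are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in y per 1-unit increase in x
    a = y_bar - b * x_bar    # intercept: value of y when x = 0
    return a, b

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
predicted = a + b * 3.5      # predict y for x = 3.5
print(f"y = {a:.1f} + {b:.1f}x, prediction at x = 3.5: {predicted:.1f}")
```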

We will be doing a calculator activity on Tuesday to see how to get these equations from scatterplots.

Regression Equation of a Line

Page 9:

We often use several explanatory variables to predict a response.

The basic properties of prediction with a least-squares regression line are:
◦ Prediction is based on fitting some “model” to a set of data (here, the regression line).
◦ Prediction works best when the model fits the data closely. (Predictions are more trustworthy when the points lie close to the line; if the pattern is not strong, prediction may be very inaccurate.)
◦ Prediction outside the range of the available data is risky. (Predicting within the range is OK, but predicting beyond it may not work. For example, a child who kept growing at the same rate would be 10 feet tall at age 20.)
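The risk of predicting outside the data range can be seen with a small sketch. The ages and heights below are invented for illustration (they are not real growth data), and `least_squares_line` and `predict` are hypothetical helper names.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def predict(a, b, x, x_min, x_max):
    """Return (prediction, is_extrapolation) for the line y = a + b*x."""
    return a + b * x, not (x_min <= x <= x_max)

# Hypothetical ages (years) and heights (inches), invented for illustration.
ages = [2, 3, 4, 5, 6, 7, 8]
heights = [34, 37, 40, 43, 45, 48, 50]

a, b = least_squares_line(ages, heights)
inside, inside_flag = predict(a, b, 5, min(ages), max(ages))
outside, outside_flag = predict(a, b, 20, min(ages), max(ages))

print(f"age 5:  {inside:.1f} in, extrapolation={inside_flag}")
print(f"age 20: {outside:.1f} in, extrapolation={outside_flag}")
```

With these made-up numbers the line predicts a height of nearly seven feet at age 20, because it assumes childhood growth continues at the same rate. The flag is one simple way to mark such predictions as untrustworthy.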

Understanding Prediction

Page 10:

Correlation and regression are closely connected; however, correlation does not require you to choose an explanatory variable and regression does.

Both correlation and regression are strongly affected by outliers…

What do you think Hawaii is known for that is definitely an outlier compared to the other 49 states?

Correlation and Regression

Page 11:

Rainfall …

If Hawaii is included, r = 0.408; if Hawaii is not included, r = 0.195.

If Hawaii is included, the LSRL is the solid line; if Hawaii is not included, the LSRL is the dotted line.
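The effect of a single outlier on r can be reproduced with a short sketch. The points below are made up (they are not the state data from the slide), and `correlation` is a hypothetical helper that uses the usual formula for r.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data with a weak pattern, invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 2, 5, 3]
outlier = (15, 12)   # one point far from the rest

r_without = correlation(xs, ys)
r_with = correlation(xs + [outlier[0]], ys + [outlier[1]])

print(f"r without outlier: {r_without:.3f}")
print(f"r with outlier:    {r_with:.3f}")
```

In this made-up example one extra point raises r from about 0.38 to about 0.92, the same kind of jump the slide shows when Hawaii is included.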

Page 12:

The usefulness of the regression line for prediction depends on the strength of the correlation between the variables.

The square of the correlation, r squared, is the right measure to use.

r squared is a number between 0 and 1: it is the fraction of the variation in y that is accounted for by the least-squares regression line. The higher the number, the better the line explains the data (you want a high number). For example, r squared = 0.972 means the line accounts for 97.2% of the variation in y.
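One way to see what r squared measures is to compute it two ways: by squaring r, and as 1 minus (variation left over around the line) divided by (total variation in y). The sketch below uses made-up data and the standard least-squares formulas, which are assumptions not stated on the slides.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y

# Least-squares line y = a + b*x
b = sxy / sxx
a = y_bar - b * x_bar

# Variation left over around the line (sum of squared differences)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r = sxy / sqrt(sxx * syy)
r_squared_from_r = r ** 2
r_squared_from_variation = 1 - sse / syy

print(f"{r_squared_from_r:.3f} {r_squared_from_variation:.3f}")
```

Both calculations agree: for these points the line accounts for 60% of the variation in y.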

Correlation and Regression

Page 13:

A strong relationship between 2 variables does not always mean that changes in one variable cause changes in the other.

The relationship between two variables is often influenced by other variables lurking in the background.

The best evidence for causation comes from randomized comparative experiments.

The observed relationship between 2 variables may be due to direct causation, common response, or confounding.

An observed relationship can be used for prediction without worrying about causation as long as the patterns found in the past data continue to hold true.

Causation

Page 14:

There is a strong relationship between cigarette smoking and death rate from lung cancer. Does smoking cigarettes cause lung cancer?

There is a strong association between the availability of handguns in a nation and that nation’s homicide rate from guns. Does easy access to handguns cause more murders?

Causation

Page 15:

Does watching television extend your lifespan?

◦ Countries which are rich enough to have televisions are probably also fortunate enough to have better nutrition, clean water, better health care, etc. than poorer nations.

◦ This was called a “nonsense correlation”. The correlation is real, but the conclusion is nonsense.

Causation

Page 16:

Common Response: a lurking variable that influences both x and y creates a high correlation even though there is no direct connection between x and y. Ex., obesity in children: a lurking variable can be TV viewing time, but explanatory variables may be inheritance from parents, overeating, or lack of physical activity.
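Common response can be imitated with a small simulation. In the sketch below a lurking variable z drives both x and y, and x never affects y directly. All numbers are randomly generated for illustration, and the noise levels and the helper name `correlation` are assumptions chosen for this example.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 2000

# z is the lurking variable; x and y each depend on z plus their own noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = correlation(x, y)

# Take out the part of x and y that comes from z and correlate what is left.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = correlation(x_rest, y_rest)

print(f"r between x and y:          {r_xy:.2f}")
print(f"r after removing z's share: {r_rest:.2f}")
```

The first correlation is high even though x has no direct effect on y; once the lurking variable's contribution is removed, the correlation is close to zero.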

Causation

Page 17:

Confounding: the effects of two variables on a response cannot be separated from each other. Ex., a child may be overweight not because of their own poor eating habits but because their parents provide poor choices (the parents have bad eating habits themselves).

Causation

Page 18:

If an experiment is not possible, the following criteria must be met to establish causation:
◦ The association between the variables is strong.
◦ The association between the variables is consistent.
◦ Higher doses are associated with stronger responses.
◦ The alleged cause precedes the effect in time.
◦ The alleged cause is plausible.

Evidence for Causation

Page 19:

Pages 370-371, #26, 27; Page 384, #35, 36, 37; Page 389, #40

Homework
