Pg 337..345: 3b, 6b (form and strength) Page 350..359: 10b, 12a, 16c, 16e.

Section 6.2: Regression, Prediction, and Causation

Homework Turn In…

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.

Often used to predict the value of y for a given value of x.

Also known as a line of best fit.

Regression Lines

Regression Line

You can predict the humerus length when the femur length is 50 by following the dotted line to the regression line and then moving horizontally to the humerus values.

Since different people may draw regression lines differently, an equation may give a more accurate prediction.

Because we want to predict y from x, we want a line that is close to the points in the vertical (y) direction.

We need a way to find from the data the equation of the line that comes closest to the points in the vertical direction.

There are many ways to make the collection of vertical distances “as small as possible”…

Regression Equations

The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Least Squares Regression Line

To find the proper placement of the line, look at the vertical distances of the points near the line you have drawn.

For each data point, the difference d is the observed y-value of the point minus the predicted y-value (the height of the line at that x). These differences can be positive, negative, or zero: positive when the point is above the line, negative when it is below the line, and zero when it is on the line.

You then take the sum of all the squared differences. The regression line is placed where the sum of all the squared differences is the smallest.
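The idea of "placing the line where the sum of squared differences is smallest" can be sketched in a few lines of Python. This is only an illustration: the data points are invented, and the function name `sum_squared_differences` and the grid of candidate lines are assumptions made for this sketch, not part of the slides.

```python
# Hypothetical (x, y) data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def sum_squared_differences(a, b):
    """Sum of squared vertical distances from each point to the line y = a + b*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        d = y - (a + b * x)  # observed y minus predicted y
        total += d * d
    return total


# Try every line with a in 0.0..5.0 and b in 0.0..2.0 (steps of 0.1)
# and keep the one with the smallest sum of squared differences.
candidates = [(i / 10, j / 10) for i in range(51) for j in range(21)]
best_a, best_b = min(candidates, key=lambda line: sum_squared_differences(*line))

print("best line on the grid: y =", best_a, "+", best_b, "x")
print("sum of squared differences:",
      round(sum_squared_differences(best_a, best_b), 3))
```

Trying lines one by one is slow and only as fine as the grid; it is shown here because it mirrors the definition directly. In practice the slope and intercept are computed from a formula.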

Least-Squares Regression Line

y = a + bx

b = slope of the line (the amount by which y changes when x increases by 1 unit)

a = the intercept (the value of y when x = 0)

The equation can be used to predict the value of y for a given value of x.
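As a rough sketch of how a and b can be found from data, the Python below uses the standard least-squares formulas (slope = Sxy / Sxx, and the line passes through the point of means). The femur and humerus values are invented for illustration and are not the data from the scatterplot in the slides; the function name `least_squares_line` is also an assumption of this sketch.

```python
# Hypothetical femur and humerus lengths, invented for illustration only.
femur = [40.0, 50.0, 60.0, 70.0]     # explanatory variable x
humerus = [45.0, 55.0, 69.0, 79.0]   # response variable y


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in y per 1-unit increase in x
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares_line(femur, humerus)
predicted = a + b * 50.0     # predicted humerus length when femur length is 50

print("y =", round(a, 2), "+", round(b, 2), "x")
print("predicted humerus at femur 50:", round(predicted, 1))
```

Reading a prediction off the equation this way gives the same answer to everyone, which is the advantage over reading it off a hand-drawn line.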

We will be doing a calculator activity on Tuesday to see how to get these equations from scatterplots.

Regression Equation of a Line

We often use several explanatory variables to predict a response.

The basic properties of predicting responses with a least-squares regression line are:

◦ Prediction is based on fitting some “model” to a set of data. (Here the model is the regression line.)

◦ Prediction works best when the model fits the data closely. (Predictions are more trustworthy when the points lie close to the line; if the pattern is not strong, predictions may be very inaccurate.)

◦ Prediction outside the range of the available data is risky. (Predicting within the range is OK, but predicting outside it may not work. For example, a child’s height, if it continued at the same rate of growth, might make the child 10 feet tall at age 20.)
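The risk of predicting outside the data range can be illustrated with a small Python sketch. The ages and heights are invented for illustration, and the helper `predict`, which also reports whether the requested age lies outside the observed ages, is a hypothetical convenience rather than anything from the slides.

```python
# Hypothetical ages (years) and heights (inches) for one young child,
# invented for illustration only.
ages = [1.0, 2.0, 3.0, 4.0]
heights = [30.0, 34.0, 37.0, 40.0]


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def predict(age):
    """Return (predicted height, True if age is outside the observed ages)."""
    a, b = least_squares_line(ages, heights)
    outside = age < min(ages) or age > max(ages)
    return a + b * age, outside


for age in (3.5, 20.0):
    height, outside = predict(age)
    note = "extrapolation, treat with caution" if outside else "within the data range"
    print("age", age, "-> predicted height", round(height, 1), "inches:", note)
```

With these made-up numbers the line predicts a height of well over seven feet at age 20, because it assumes the early-childhood growth rate never slows down.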

Understanding Prediction

Correlation and regression are closely connected; however, correlation does not require you to choose an explanatory variable, and regression does.

Both correlation and regression are strongly affected by outliers…

What do you think Hawaii is known for that is definitely an outlier compared to the other 49 states?

Correlation and Regression

Rainfall …

If Hawaii is included, r = 0.408; if Hawaii is not included, r = 0.195.

If Hawaii is included, the LSRL is the solid line; if Hawaii is not included, the LSRL is the dotted line.
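How much a single outlier can move r is easy to check with a short Python sketch. The numbers below are invented and are not the state data behind the Hawaii example; they only show the same kind of effect, where one extreme point raises the correlation.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data with a weak pattern, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 1.0, 4.0, 2.0, 5.0, 3.0]

# One extreme point, far from the rest in both x and y.
outlier_x, outlier_y = 20.0, 20.0

r_without = correlation(xs, ys)
r_with = correlation(xs + [outlier_x], ys + [outlier_y])

print("r without the outlier:", round(r_without, 3))
print("r with the outlier:   ", round(r_with, 3))
```

Because the least-squares slope is built from the same sums as r, the regression line shifts along with the correlation when the outlier is added or removed.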

The usefulness of the regression line for prediction depends on the strength of the correlation between the variables.

The square of the correlation is the right measure to use…

r squared (r²) is a number between 0 and 1. It is the proportion of the variation in y that is accounted for by the least-squares regression of y on x, so a higher number is better. Example: r² = 0.972 means the regression line explains 97.2% of the variation in y.
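One way to see why r² measures "variation explained" is to compute it two ways and compare: once by squaring r, and once as 1 minus (variation left over around the line) / (total variation in y). The sketch below uses invented data, and the variable names are assumptions of this example.

```python
from math import sqrt

# Hypothetical (x, y) data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y

# Route 1: square the correlation.
r = sxy / sqrt(sxx * syy)
r_squared = r ** 2

# Route 2: fit the least-squares line and compare leftover to total variation.
b = sxy / sxx
a = y_bar - b * x_bar
leftover = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - leftover / syy

print("r squared:         ", round(r_squared, 3))
print("fraction explained:", round(explained_fraction, 3))
```

The two numbers agree for a least-squares line, which is why r² rather than r is read as a percentage of variation explained.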

Correlation and Regression

A strong relationship between 2 variables does not always mean that changes in one variable cause changes in the other.

The relationship between two variables is often influenced by other variables lurking in the background.

The best evidence for causation comes from randomized comparative experiments.

The observed relationship between 2 variables may be due to direct causation, common response, or confounding.

An observed relationship can be used for prediction without worrying about causation as long as the patterns found in the past data continue to hold true.

Causation

There is a strong relationship between cigarette smoking and death rate from lung cancer. Does smoking cigarettes cause lung cancer?

There is a strong association between the availability of handguns in a nation and that nation’s homicide rate from guns. Does easy access to handguns cause more murders?

Causation

Does watching television extend your lifespan?

◦ Countries which are rich enough to have televisions are probably also fortunate enough to have better nutrition, clean water, better health care, etc. than poorer nations.

◦ This was called a “nonsense correlation”. The correlation is real, but the conclusion is nonsense.

Causation

Common Response: a lurking variable influences both x and y, creating a high correlation even though there is no direct connection between x and y. Ex., obesity in children: a lurking variable can be TV viewing time, but explanatory variables may be inheritance from parents, overeating, or lack of physical activity.
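Common response can be mimicked with a small simulation. In the Python sketch below, x and y never influence each other; both are built from the same lurking variable z plus independent random noise. The variable names, noise sizes, sample size, and random seed are all arbitrary choices made for this illustration.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 2000

z = [rng.gauss(0, 1) for _ in range(n)]          # lurking variable
x_noise = [rng.gauss(0, 0.5) for _ in range(n)]  # part of x unrelated to z
y_noise = [rng.gauss(0, 0.5) for _ in range(n)]  # part of y unrelated to z

x = [zi + e for zi, e in zip(z, x_noise)]        # x depends on z, not on y
y = [zi + e for zi, e in zip(z, y_noise)]        # y depends on z, not on x

r_xy = correlation(x, y)                     # strong, driven entirely by z
r_leftover = correlation(x_noise, y_noise)   # what remains once z is removed

print("correlation of x and y:", round(r_xy, 2))
print("correlation after removing z:", round(r_leftover, 2))
```

The strong correlation between x and y disappears once the shared influence of z is taken out, which is the pattern common response describes.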

Causation

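Confounding: the effect of the explanatory variable on the response is mixed up with the effect of a lurking variable, so the two cannot be separated. Ex., a child may be overweight not because of their own poor eating habits but because their parents provide poor choices (the parents have bad eating habits themselves).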

Causation

If an experiment is not possible, the following criteria should be met to make a strong case for causation:

◦ The association between the variables is strong.

◦ The association between the variables is consistent.

◦ Higher doses are associated with stronger responses.

◦ The alleged cause precedes the effect in time.

◦ The alleged cause is plausible.

Evidence for Causation

Pages 370-371, #26, 27 Page 384, #35, 36, 37 Page 389, #40

Homework
