Post on 17-Mar-2020

§ 5.3 Scatter Plots and Correlation

Looking for Correlation

Example: Does the number of hours you watch TV per week impact your average grade in a class?

Hours   12   10    5    3   15   16    8
Grade   70   85   82   88   65   75   68

To see if there is a relationship, we will create a scatter plot and analyze it.

Definition: A scatter plot is a graphical representation of the relationship between two quantitative variables. The two values may come from the same individual (e.g., education v. income, height v. weight) or from paired individuals (e.g., ages of partners in a relationship).

Scatter Plots

When working with scatter plots, there are two variables. They may be of two different types.

Definition: A response variable measures the outcome of a study.

Definition: An explanatory variable may explain or influence changes in a response variable.

Explanatory variables are often called independent and are on the x-axis. Response variables are often called dependent and are on the y-axis.

Back to Our Example

In our example, which is the explanatory variable?

The hours of TV watched.

The response variable is therefore the average grade. So the question we are trying to answer is “Does watching TV influence the average grade in a class?”

Let’s plot the data and see what we have.

The Scatter Plot

[Figure: scatter plot "Grades v. Hours of TV"; x-axis: Hours of TV (ticks at 5, 10, 15); y-axis: Grade (ticks from 65 to 90).]

How Does the Relationship Look?

What do we think?

It looks like the more hours of TV that are watched, the lower the average grade. But how strong is the relationship? We can describe it in two ways: by its direction (+, −) and by its strength. Both are captured by the correlation coefficient.

Facts About Correlation Coefficients:

1. −1 ≤ r ≤ 1. The weakest correlation is r = 0 and the strongest is r = ±1. The sign of r only tells us the direction of the relationship: whether y increases as x increases (positive) or y decreases as x increases (negative). Being negative is not “bad”.

2. Correlation makes no distinction between x and y, that is, between the choice of explanatory and response variables. We still need to be careful, though, as the next part (the regression line) depends heavily on the correct choice.

3. Correlation measures only the strength of the linear relationship.

4. Correlation is not resistant: a single outlier can change r substantially.

5. Correlation has no units.
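Facts 2 and 5 can be checked numerically. The sketch below is only an illustration and not part of the original slides: the helper name `pearson_r` is our own, and it uses the algebraically equivalent "sum of deviations" form of r with the TV/grade data from the example. Swapping x and y, or converting hours to minutes, should leave r unchanged up to floating-point rounding.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient (helper name is our own, for illustration)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

hours = [12, 10, 5, 3, 15, 16, 8]
grades = [70, 85, 82, 88, 65, 75, 68]

r_xy = pearson_r(hours, grades)                          # x = hours, y = grades
r_yx = pearson_r(grades, hours)                          # roles of x and y swapped
r_minutes = pearson_r([60 * h for h in hours], grades)   # hours converted to minutes

print(r_xy, r_yx, r_minutes)
```

All three printed values should agree, which is what "no distinction between x and y" and "no units" mean in practice.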

So How Do We Find This Correlation Coefficient?

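The Correlation Coefficient

$$r = \frac{1}{n-1}\sum\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) = \frac{1}{n-1}\sum z_x z_y$$

Let’s find the correlation coefficient for our example. First, we need a few values: $\bar{x}$, $\bar{y}$, $s_x$, $s_y$.

$\bar{x} = 9.857$, $\bar{y} = 76.143$, $s_x = 4.880$, $s_y = 8.971$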

Finding the Correlation Coefficient

For each pair, find the z-score of each value, then multiply the two z-scores together. After summing the products, divide by n − 1.

i    z_x        z_y        product
1     0.4391    -0.6848    -0.3007
2     0.0293     0.9873     0.0289
3    -0.9953     0.6529    -0.6498
4    -1.4050     1.3217    -1.8570
5     1.0539    -1.2421    -1.3090
6     1.2588    -0.1274    -0.1604
7    -0.3805    -0.9077     0.3454
sum                        -3.9026

$$r = \frac{1}{6}(-3.9026) = -0.6504$$

Interpretation: Moderate negative correlation
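The z-score table can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are our own. Because it carries full precision instead of four-decimal z-scores, the final r may differ from the slide's −0.6504 in the last digit.

```python
import statistics

hours = [12, 10, 5, 3, 15, 16, 8]
grades = [70, 85, 82, 88, 65, 75, 68]
n = len(hours)

mean_x, mean_y = statistics.mean(hours), statistics.mean(grades)
# statistics.stdev is the sample standard deviation (divides by n - 1).
s_x, s_y = statistics.stdev(hours), statistics.stdev(grades)

# z-score of every value, then the product for each pair.
z_x = [(x - mean_x) / s_x for x in hours]
z_y = [(y - mean_y) / s_y for y in grades]
products = [zx * zy for zx, zy in zip(z_x, z_y)]

for i, (zx, zy, p) in enumerate(zip(z_x, z_y, products), start=1):
    print(f"{i}  {zx:8.4f}  {zy:8.4f}  {p:8.4f}")

# Sum the products and divide by n - 1.
r = sum(products) / (n - 1)
print(f"r = {r:.4f}")
```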

So Can We Say There Is A Relationship?

So, can we say that there is a direct relationship between the number of hours of TV watched and the average grade? Not so fast ...

Correlation does not necessarily imply causation.

Just because the data look the part does not mean we have evidence that one variable causes the other. We have to consider a couple of other things. One is lurking variables: variables that may affect the relationship but are not recorded in the data.

Can you think of any lurking variables that would impact our example?

Significance

We also need to test for significance to see what is going on.

If $|r|\sqrt{n} > 3$, the correlation is significant. Otherwise, it is not significant.

The smaller the value of $|r|\sqrt{n}$, the less likely the correlation is to be significant.

Reasons why a correlation may not be significant:
1. Genuine lack of correlation
2. Not enough data

For our example, $|r|\sqrt{n} = 0.6504\sqrt{7} \approx 1.72 < 3$. Our example is not significant because of the small amount of data (n = 7). So we cannot conclude that watching TV has a direct impact on grades.
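The rule of thumb is easy to check by machine. The sketch below is illustrative only: the function name `is_significant` and its `threshold` parameter are our own, and the cutoff of 3 is the one stated in this section. It also asks how many data pairs would be needed for the same r to pass, which makes the "not enough data" reason concrete.

```python
import math

def is_significant(r, n, threshold=3.0):
    """Rule of thumb from this section: significant when |r| * sqrt(n) > 3."""
    return abs(r) * math.sqrt(n) > threshold

r, n = -0.6504, 7  # TV hours v. grade example
statistic = abs(r) * math.sqrt(n)
print(round(statistic, 2), is_significant(r, n))

# Smallest sample size at which the same r would pass the rule of thumb:
# we need n > (3 / |r|)^2, so take the next integer above that bound.
n_needed = math.floor((3 / abs(r)) ** 2) + 1
print(n_needed)
```

With only 7 pairs the statistic falls well short of 3; the same correlation observed on a few dozen pairs would clear it.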

Assumptions and Conditions for Correlation

Quantitative Variables Condition: Don’t make the common error of calling an association involving a categorical variable a correlation. Correlation is only about quantitative variables.

Straight Enough Condition: The best check for the assumption that the variables are truly linearly related is to look at the scatter plot to see whether it looks reasonably straight. That’s a judgment call, but not a difficult one.

No Outliers Condition: Outliers can distort the correlation dramatically, making a weak association look strong or a strong one look weak. Outliers can even change the sign of the correlation. But it’s easy to see an outlier in the scatter plot, so to check this condition, just look.
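To see how much a single outlier can matter, the sketch below adds one made-up point to the TV/grade data. The extra point (40 hours of TV, grade 100) is hypothetical and not part of the original data, and the helper name `pearson_r` is our own. With it, the sign of r flips from negative to positive.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient (helper name is our own, for illustration)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

hours = [12, 10, 5, 3, 15, 16, 8]
grades = [70, 85, 82, 88, 65, 75, 68]

r_original = pearson_r(hours, grades)

# Hypothetical outlier (NOT in the original data): 40 hours of TV, grade 100.
r_with_outlier = pearson_r(hours + [40], grades + [100])

print(f"without outlier: {r_original:.3f}")
print(f"with outlier:    {r_with_outlier:.3f}")
```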

Another Example

Example: The following gives the power numbers for the starting 9 of the 2007 Boston Red Sox. Is there a relationship between the number of home runs and the number of RBIs? Does the number of home runs affect the number of RBIs? Produce a scatter plot and discuss the correlation.

Player      Home Runs   RBIs
Varitek     17          68
Youkilis    16          83
Pedroia     8           50
Lowell      21          120
Lugo        8           73
Ramirez     20          88
Crisp       6           60
Drew        11          64
Ortiz       35          117

Red Sox Example

Which is the explanatory variable? Which is the response variable?

Since we are asking whether HR affects RBIs, HR is the explanatory variable and therefore x. So RBIs is the y variable.

[Figure: scatter plot "2007 Red Sox Power Numbers"; x-axis: Home Runs (ticks at 10, 20, 30); y-axis: RBIs (ticks from 20 to 120).]

Before We Go On

Something to notice: we have two values with the same x-coordinate (Pedroia and Lugo each hit 8 home runs).

Finding the Correlation Coefficient

What is our guess as to the correlation?

Now let’s find the correlation coefficient. But there must be an easier way ... and that way would be technology.

1. Input the data in the usual way, with the explanatory variable under L1 and the response variable under L2
2. Press STAT and scroll to TESTS
3. Select LinRegTTest
4. Make sure the XList and YList are the lists where the data for the explanatory and response variables are located, respectively
5. Press Calculate and scroll to find r and r²

Using Technology

For our example, we have

r = .8463

So the correlation coefficient tells us that there is a strong positive correlation. So, we should find the regression line.
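If a calculator is not at hand, the same value can be obtained in Python. This is a sketch under our own naming (`pearson_r` is a helper we define here, not a library function), using the Red Sox data from the table; it reports both r and r², the two values read off the calculator.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient (helper name is our own, for illustration)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# 2007 Red Sox: Varitek, Youkilis, Pedroia, Lowell, Lugo, Ramirez, Crisp, Drew, Ortiz
home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]   # explanatory variable (x)
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]   # response variable (y)

r = pearson_r(home_runs, rbis)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```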

Technology and Scatter Plots

We can also create a scatter plot on the calculator.

1. Make sure there are no functions in the grapher (press Y= to check)
2. Input the data in the usual way (we already have it there for this example)
3. Press 2nd and Y= to get into the STAT PLOT menu
4. Make sure only the plot we want is turned on
5. Select the first graph in the first row and then make sure the XList and YList are correct
6. Press ZOOM 9

One More Example

Example: There is some evidence that drinking moderate amounts of wine helps prevent heart attacks. The accompanying table gives data on yearly wine consumption (in liters of alcohol from drinking wine per person) and yearly deaths from heart disease (per 100,000 people) in 19 developed nations. Construct a scatter plot and describe what you see.

Country          Alcohol   Deaths
Australia        2.5       211
Austria          3.9       167
Belgium          2.9       131
Canada           2.4       191
Denmark          2.9       220
Finland          0.8       297
France           9.1       71
Iceland          0.8       211
Ireland          0.7       300
Italy            7.9       107
Netherlands      1.8       167
New Zealand      1.9       266
Norway           0.8       227
Spain            6.5       86
Sweden           1.6       207
Switzerland      5.8       115
United Kingdom   1.3       285
United States    1.2       199
West Germany     2.7       172

The Scatter Plot

[Figure: scatter plot "Heart Disease v. Alcohol from Wine"; x-axis: Alcohol from Wine (in liters), ticks at 2, 4, 6, 8; y-axis: Deaths (per 100,000), ticks from 50 to 300.]

r = −.8428, strong negative correlation
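The reported correlation can be checked against the table. The sketch below keys the data by country so each pair stays together; the dictionary layout and the helper name `pearson_r` are our own choices, not something from the slides.

```python
import math

# country: (liters of alcohol from wine per person, heart disease deaths per 100,000)
wine_data = {
    "Australia": (2.5, 211), "Austria": (3.9, 167), "Belgium": (2.9, 131),
    "Canada": (2.4, 191), "Denmark": (2.9, 220), "Finland": (0.8, 297),
    "France": (9.1, 71), "Iceland": (0.8, 211), "Ireland": (0.7, 300),
    "Italy": (7.9, 107), "Netherlands": (1.8, 167), "New Zealand": (1.9, 266),
    "Norway": (0.8, 227), "Spain": (6.5, 86), "Sweden": (1.6, 207),
    "Switzerland": (5.8, 115), "United Kingdom": (1.3, 285),
    "United States": (1.2, 199), "West Germany": (2.7, 172),
}

def pearson_r(xs, ys):
    """Sample correlation coefficient (helper name is our own, for illustration)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

alcohol = [a for a, _ in wine_data.values()]
deaths = [d for _, d in wine_data.values()]

r = pearson_r(alcohol, deaths)
print(f"n = {len(wine_data)}, r = {r:.4f}")
```

A strong negative r here describes an association across countries; as discussed earlier, it does not by itself show that wine prevents heart disease.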

The Linear Regression Line

Definition: A linear regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Linear functions are of the form y = mx + b, but we will write them as $\hat{y} = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope.

The calculator actually uses the form $\hat{y} = a + bx$, so be careful.

Formulas

What we will be finding is the least squares regression line of y on x. This is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

$$b_1 = r\,\frac{s_y}{s_x}$$

$$b_0 = \bar{y} - b_1\bar{x}$$

If the correlation coefficient is too small, there is no point in finding $\hat{y}$, since $b_0$ and $b_1$ both depend on r.

Example Using Given Values

Example: The following list gives the power numbers for the starting 9 Red Sox players for the 2007 season.

Name             Home Runs   RBIs
Jason Varitek    17          68
Kevin Youkilis   16          83
Dustin Pedroia   8           50
Mike Lowell      21          120
Julio Lugo       8           73
Manny Ramirez    20          88
Coco Crisp       6           60
J.D. Drew        11          64
David Ortiz      35          117

We want to know if the number of home runs affects the number of RBIs.

The Needed Values

We can find the mean and standard deviation of both sets of data quickly using our technology.

Variable   Mean    Standard Deviation
x          15.78   9.05
y          80.33   24.47

And, since the data is already in the calculator, we can obtain the value of r.

r = 0.8463

The Correlation Coefficient

We can use these values to find the equation of the regression line.

$$b_1 = r\,\frac{s_y}{s_x} = 0.8463\left(\frac{24.47}{9.05}\right) = 2.29$$

$$b_0 = \bar{y} - b_1\bar{x} = 80.33 - 2.29(15.78) = 44.19$$

So, the regression line is

$$\hat{y} = 44.19 + 2.29x$$
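The same two formulas can be applied directly to the raw data. The following is a minimal sketch with our own variable names, using only the standard library. Because it does not round the slope to 2.29 before computing the intercept, the intercept comes out near 44.24 rather than the 44.19 obtained above from rounded values; the slope still rounds to 2.29.

```python
import statistics

home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]   # x
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]   # y
n = len(home_runs)

mean_x, mean_y = statistics.mean(home_runs), statistics.mean(rbis)
s_x, s_y = statistics.stdev(home_runs), statistics.stdev(rbis)  # sample std devs

# r as the sum of z-score products divided by n - 1.
r = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
        for x, y in zip(home_runs, rbis)) / (n - 1)

b1 = r * s_y / s_x          # slope
b0 = mean_y - b1 * mean_x   # y-intercept

print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

By construction the line passes through the point of means, so plugging in the mean number of home runs returns the mean number of RBIs.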

Practical Interpretation

What do these coefficients mean in practical terms?

The slope tells us that a change in the explanatory variable by one unit is associated with a change in the response variable by the amount and direction of the slope. In our example, the slope is $b_1 = 2.29$, which tells us that for every additional home run hit, we’d expect about 2.29 additional RBIs.

The y-intercept tells us the predicted value of the response variable when the explanatory variable is 0. In our example $b_0 = 44.19$, which means that we expect a player who hits no home runs to have 44.19 RBIs.

Note: In context, we may need to round to whole numbers for the answers to make any sense.
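As a small illustration of these interpretations, the sketch below wraps the fitted line in a function. The function name `predict_rbis` and the sample inputs (0, 10, and 25 home runs) are hypothetical choices of ours; the coefficients are the rounded ones from the slides.

```python
B0 = 44.19  # y-intercept from the slides
B1 = 2.29   # slope from the slides

def predict_rbis(home_runs):
    """Predicted RBIs from the line y-hat = 44.19 + 2.29x (function name is our own)."""
    return B0 + B1 * home_runs

# Hypothetical inputs chosen for illustration; all lie within or at the edge
# of the range of the original data except 0, which is the intercept case.
for hr in (0, 10, 25):
    prediction = predict_rbis(hr)
    # Round to a whole number so the answer makes sense in context.
    print(hr, round(prediction, 2), round(prediction))

# One extra home run changes the prediction by exactly the slope.
slope_check = predict_rbis(11) - predict_rbis(10)
print(round(slope_check, 2))
```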

The Scatter Plot

Let’s see how good the regression line is by plotting it over the scatterplot.

[Figure: scatter plot "2007 Red Sox Power Numbers"; x-axis: Home Runs, y-axis: RBIs]

To do so, we press Y= and put the line under Y1, then select GRAPH.

Plot and Line

And now with the regression line

$$\hat{y} = 44.19 + 2.29x$$

[Figure: scatter plot "2007 Red Sox Power Numbers" (x-axis: Home Runs, y-axis: RBIs) with the regression line drawn over the points]

Predictions

One use of the regression line is making predictions. Suppose we wanted to know about how many RBIs we could expect a player to have if they hit 60 home runs. We are looking to predict the value of y (so we want $\hat{y}$) and we are given a value of x = 60.

Our prediction would be

$$\hat{y} = 44.19 + 2.29(60) = 181.59$$

So, our prediction is 182 RBIs.
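As a quick check of that arithmetic, here is a minimal sketch. The helper name `predict_rbis` is made up for illustration; the coefficients are the ones from the slides. Note that the home-run axis on the plot only runs to about 30, so x = 60 is an extrapolation and the result should be read cautiously.

```python
def predict_rbis(home_runs, b0=44.19, b1=2.29):
    """Predicted RBIs from y-hat = b0 + b1 * x, using the slide's coefficients."""
    return b0 + b1 * home_runs


y_hat = predict_rbis(60)
print(f"{y_hat:.2f} -> about {round(y_hat)} RBIs")
```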

Facts about Regression Lines

1 The distinction between explanatory and response variables is essential - remember the formulas ...

2 There is a close connection between slope and correlation.

3 $(\bar{x}, \bar{y})$ is always on the line.

4 This only shows us the linear model; it is possible that there is little linear correlation but that the data has a strong relationship if we were using some other type of model.

5 We will not always get perfect correlation (probably never), but we need the line to be “straight enough” for the data to make sense. What that means is variable. Depending on the situation, r = .3 could be good enough; other times r = .8 would be a minimum.
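Facts 2, 3, and 4 can be checked numerically. The sketch below uses the TV-hours and grade data from the opening example of this section, plus a small made-up parabola for fact 4; the function name `fit` and its return layout are assumptions for illustration only.

```python
from math import sqrt, isclose


def fit(xs, ys):
    """Least-squares intercept and slope, r, sample SDs, and means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # least-squares slope
    b0 = my - b1 * mx                   # least-squares intercept
    r = sxy / sqrt(sxx * syy)           # correlation coefficient
    sx, sy = sqrt(sxx / (n - 1)), sqrt(syy / (n - 1))
    return b0, b1, r, sx, sy, mx, my


# TV hours per week and average grade, from the opening example.
hours = [12, 10, 5, 3, 15, 16, 8]
grade = [70, 85, 82, 88, 65, 75, 68]
b0, b1, r, sx, sy, mx, my = fit(hours, grade)

# Fact 2: the slope equals r rescaled by the two standard deviations.
slope_matches = isclose(b1, r * sy / sx)
# Fact 3: the point of means lies on the line.
mean_on_line = isclose(b0 + b1 * mx, my)

# Fact 4: a perfect parabola on symmetric x values (hypothetical data)
# has no linear correlation at all.
xs = [-2, -1, 0, 1, 2]
r_parabola = fit(xs, [x ** 2 for x in xs])[2]

print(slope_matches, mean_on_line, r_parabola)
```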

Another Sox Example

Suppose we wanted to know if a player was expected to score more runs if he got more hits. To answer this question, we will use the roster of the 2011 Boston Red Sox.

Name                    Runs  Hits
Jarrod Saltalamacchia     52    84
Adrian Gonzalez          108   213
Dustin Pedroia           102   195
Marco Scutaro             59   118
Kevin Youkilis            68   111
Carl Crawford             65   129
Jacoby Ellsbury          119   212
J.D. Drew                 23    55
David Ortiz               84   162
Jed Lowrie                40    78
Josh Reddick              41    71
Jason Varitek             32    49
Darnell McDonald          26    37
Mike Aviles               17    32
Mike Cameron               9    14
Drew Sutton               11    17
Ryan Lavarnway             5     9
Yamaico Navarro            6     8
Conor Jackson              2     3
Jose Iglesias              3     2
Lars Anderson              2     0
Joey Gathright             1     0

The Correlation Coefficient

The first thing we will do is find the correlation coefficient.

When we plug all of the data into our technology, we get r = .9942.

Interpretation?

There is a strong, positive correlation between hits and runs scored.
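For readers without a graphing calculator, the same coefficient can be computed directly. This is a sketch only: the `correlation` helper is written out by hand here rather than taken from any particular package, and the data pairs are transcribed from the roster table.

```python
from math import sqrt

# (runs, hits) pairs transcribed from the 2011 roster table, in table order.
roster = [(52, 84), (108, 213), (102, 195), (59, 118), (68, 111), (65, 129),
          (119, 212), (23, 55), (84, 162), (40, 78), (41, 71), (32, 49),
          (26, 37), (17, 32), (9, 14), (11, 17), (5, 9), (6, 8),
          (2, 3), (3, 2), (2, 0), (1, 0)]
runs = [run for run, _ in roster]
hits = [hit for _, hit in roster]


def correlation(xs, ys):
    """Pearson correlation coefficient r for two paired lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = correlation(hits, runs)
print(round(r, 4))
```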

Producing the Scatter Plot

Now, let’s produce a scatter plot for the data.

[Figure: scatter plot "2011 Red Sox"; x-axis: Hits, y-axis: Runs]

The Assumptions

The points give us a pretty good indication that there is a very strong positive correlation here. Before we go on, we want to make sure all of the assumptions about regression lines are met.

Quantitative Variable Condition
If either y or x is categorical, you cannot make a scatter plot and you cannot perform a regression.

Straight Enough Condition
Does the data look straight enough that we can see a linear relationship in the data set?

Outlier Condition
Are there any outliers that dramatically influence the fit of the regression line?

Does the Plot Thicken Condition
Does the spread of the data around the generally straight relationship seem to be consistent for all values of x?

Linear Regression Line

Since all of these are satisfied, we will continue on to find the formula of the regression line. Using our technology, we have

$$\hat{y} = b_0 + b_1 x = 1.92 + 0.52x$$

What is the practical interpretation of the slope $b_1$?
For each additional hit, we expect a player to score 0.52 additional runs.

What is the practical interpretation of the y-intercept $b_0$?
If a player has no hits, we expect that player to score 1.92 runs.
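The coefficients reported by the technology can be reproduced with the usual least-squares formulas. The sketch below is one way to do it by hand; `least_squares` is an illustrative name, and the data pairs are transcribed from the 2011 roster table.

```python
# (runs, hits) pairs transcribed from the 2011 roster table, in table order.
roster = [(52, 84), (108, 213), (102, 195), (59, 118), (68, 111), (65, 129),
          (119, 212), (23, 55), (84, 162), (40, 78), (41, 71), (32, 49),
          (26, 37), (17, 32), (9, 14), (11, 17), (5, 9), (6, 8),
          (2, 3), (3, 2), (2, 0), (1, 0)]
runs = [run for run, _ in roster]   # response variable y
hits = [hit for _, hit in roster]   # explanatory variable x


def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared vertical errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


b0, b1 = least_squares(hits, runs)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```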

Scatter Plot With Regression Line

[Figure: scatter plot "2011 Red Sox" (x-axis: Hits, y-axis: Runs) with the regression line drawn over the points]

So, when we plot the regression line over the scatter plot, we see that the line is a good fit.

Predictions

1 If a player got 200 hits, how many runs would we expect them to have?
Here, we are given the x value and, using our regression line, we find the predicted value.

$$\hat{y} = 1.92 + 0.52(200) = 105.92$$

So, we’d expect about 106 runs for a player with 200 hits.

2 What if we wanted to know how many hits a player had if they scored 120 runs?
We are given the value of $\hat{y}$ and want to find the value of x. So, we use our algebra skills ...

$$\hat{y} = 1.92 + 0.52x$$

$$120 = 1.92 + 0.52x$$

$$118.08 = 0.52x$$

$$227.08 \approx x$$

We expect about 227 hits.
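Both directions of this calculation can be sketched as two small functions. The names `predict_runs` and `hits_for_runs` are hypothetical; the coefficients are the rounded ones from the slides.

```python
B0, B1 = 1.92, 0.52  # coefficients of y-hat = 1.92 + 0.52x from the slides


def predict_runs(hits):
    """Forward use of the line: hits (x) in, predicted runs (y-hat) out."""
    return B0 + B1 * hits


def hits_for_runs(runs):
    """Solve runs = B0 + B1 * x for x, as in the algebra above."""
    return (runs - B0) / B1


print(round(predict_runs(200), 2))
print(round(hits_for_runs(120), 2))
```

Strictly speaking, predicting hits from runs would call for a separate regression with the two variables' roles swapped; the inversion here only mirrors the slide's algebra.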

Important Points

A few important points to keep in mind:

1 An observation is influential for a statistical calculation if removing it would markedly change the results of the calculation. Points that are outliers in either the x or y direction are often influential points.

2 Correlation and least squares regression lines are not resistant.

3 They only describe linear relationships.

4 There could be lurking variables. Those are ones that are not among the explanatory or response variables but may influence the interpretation of the relationship.

5 An association between an explanatory variable x and a response variable y, even if r is very strong, is not itself good evidence that changes in x actually cause changes in y. The phrase to remember is that correlation does not necessarily imply causation.
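Points 1 and 2 are easy to see with a toy data set. The numbers below are hypothetical, chosen only so the effect is obvious: five points lying exactly on y = 2x, and one outlier far out in the x direction.

```python
def slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical data: five points exactly on y = 2x, plus one outlier.
on_line = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
outlier = (10, 0)

print(slope(on_line))               # slope of the clean data
print(slope(on_line + [outlier]))   # slope once the outlier is included
```

A single added observation is enough to flip the sign of the slope here, which is what "not resistant" means in practice.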

Example Where You Are Doing The Work

Example
We want to know if there is a relationship between the score on the math portion of the SAT exam and the number of hours studying for the test. The question is, “Does studying more increase the score on the exam?” The following data was taken from a study of 20 students as they prepared for and took the SAT exam.

Hours    4    9   10   14    4    7   12   22    1    3
Score  390  580  650  730  410  530  600  790  350  400

Hours    8   11    5    6   10   11   16   13   13   10
Score  590  640  450  520  690  690  770  700  730  640

Variable Types

What is the response variable?

Math SAT score

What is the explanatory variable?

Hours of study

Correlation Coefficient

So let’s first find the correlation coefficient to see what we are dealing with.

r = .9336

Our interpretation?

This tells us there is a strong positive correlation.

Is The Data Significant?

What is the inequality we are using?

$$r\sqrt{n} > 3$$

Is this data significant?

$$r\sqrt{n} = 0.9336\sqrt{20} \approx 4.18 > 3$$

So, the data is significant based on this criterion.
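The rule-of-thumb check above is a one-line computation. In this sketch the function name `passes_rule_of_thumb` and the `cutoff` parameter are illustrative; the cutoff of 3 and the inputs r = .9336 and n = 20 come from the slides.

```python
from math import sqrt


def passes_rule_of_thumb(r, n, cutoff=3):
    """Apply the slide's check r * sqrt(n) > cutoff; return (value, verdict)."""
    value = r * sqrt(n)
    return value, value > cutoff


value, significant = passes_rule_of_thumb(0.9336, 20)
print(f"{value:.2f}", significant)
```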

Visual Representation

Next, let’s produce our scatter plot so we can see what we are dealing with.

[Figure: scatter plot "Math SAT Score v. Hours of Study"; x-axis: Hours of Study, y-axis: SAT Score]

Checking Conditions/Assumptions

We are feeling pretty good about this - it seems to have a strong, positive correlation. When we consider the conditions, are we still happy with this?

Quantitative Variable Condition
Both variables are quantitative.

Straight Enough Condition
Data looks reasonably straight.

Outlier Condition
There do not seem to be any outliers.

Does the Plot Thicken Condition
Pretty much - other than the one person who studied for 22 hours, the relationship seems very strong.

$\hat{y} = b_0 + b_1 x$

Next, we find the equation of the linear regression line.

$$\hat{y} = 353.16 + 25.33x$$

What is the practical interpretation of the slope $b_1$?
For each additional hour of study, we expect the person to get an additional 25.33 points on their score.

What is the label for the slope?
Points per hour of study

What is the practical interpretation of the y-intercept $b_0$?
If a person does not study, we expect their score on the Math portion of the SAT exam to be 353.16.
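To check these numbers without a calculator, the fit can be reproduced from the 20 data pairs in the example. This is a sketch under the assumption that the data table has been transcribed correctly; `fit_line` is an illustrative helper, not a library function.

```python
from math import sqrt

# Hours of study and Math SAT scores for the 20 students in the example.
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3,
         8, 11, 5, 6, 10, 11, 16, 13, 13, 10]
score = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400,
         590, 640, 450, 520, 690, 690, 770, 700, 730, 640]


def fit_line(xs, ys):
    """Return (b0, b1, r) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    r = sxy / sqrt(sxx * syy)
    return b0, b1, r


b0, b1, r = fit_line(hours, score)
print(f"y-hat = {b0:.2f} + {b1:.2f}x, r = {r:.4f}")
```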


Scatter Plot With Regression Line

[Figure: Math SAT Score v. Hours of Study, with the regression line. x-axis: Hours of Study (5 to 20); y-axis: SAT Score (300 to 800).]

That data point where the person studied for 22 hours does look a little sketchy, but it is not so far from the rest of the data that it seems to be an outlier.


Predictions

So what score would we expect for a person who studied for 10 hours?

$\hat{y} = 353.16 + 25.33(10) = 606.46$

Since SAT scores are rounded to the nearest 10, we would expect a score of about 610.

If someone scored a 720, how many hours would we guess they studied?

$720 = 353.16 + 25.33x \Rightarrow x \approx 14.48$ hours
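Both calculations can be reproduced with a short Python sketch. The coefficients come from the regression line above; the function names `predict_score` and `hours_for_score` are hypothetical helpers introduced only for this illustration.

```python
B0 = 353.16  # y-intercept from the regression line
B1 = 25.33   # slope, in points per hour of study


def predict_score(hours):
    """Predicted Math SAT score for a given number of study hours."""
    return B0 + B1 * hours


def hours_for_score(score):
    """Solve score = B0 + B1 * x for x, as done on the slide."""
    return (score - B0) / B1


predicted = predict_score(10)
rounded = round(predicted / 10) * 10  # SAT scores are reported in tens
hours_needed = hours_for_score(720)

print(f"Predicted score for 10 hours: {predicted:.2f} (about {rounded})")
print(f"Estimated hours for a 720: {hours_needed:.2f}")
```

The second function simply inverts the fitted line, mirroring the algebra above; it is a rough guess rather than a separate regression of hours on score.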
