§ 5.3 Scatter Plots and Correlation

Source: 5.3 Scatter Plots and Correlation, btravers.weebly.com/uploads/6/7/2/9/6729909/scatterplots_and_correlation_slides.pdf

Looking for Correlation

Example: Does the number of hours you watch TV per week affect your average grade in a class?

Hours   12   10    5    3   15   16    8
Grade   70   85   82   88   65   75   68

To see if there is a relationship, we will create a scatter plot and analyze it.

Definition: A scatter plot is a graphical representation of the relationship between two quantitative variables. The two values may come from the same individual (e.g., education v. income, height v. weight) or from paired individuals (e.g., ages of partners in a relationship).

Scatter Plots

When working with scatter plots, there are two variables. They may be of two different types.

Definition: A response variable measures the outcome of a study.

Definition: An explanatory variable may explain or influence changes in a response variable.

Explanatory variables are often called independent variables and are plotted on the x-axis. Response variables are often called dependent variables and are plotted on the y-axis.

Back to Our Example

In our example, which is the explanatory variable?

The hours of TV watched.

The response variable is therefore the average grade. So the question we are trying to answer is “Does watching TV influence the average grade in a class?”

Let’s plot the data and see what we have.

The Scatter Plot

[Figure: scatter plot "Grades v. Hours of TV". x-axis: Hours of TV (ticks at 5, 10, 15); y-axis: Grade (ticks from 65 to 90).]
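
Since the plotted points did not survive in this transcript, the following is a minimal sketch of how the scatter plot could be redrawn as plain text. It uses only the data table from the example; the function name `text_scatter` and the grid size are assumptions made for illustration, not something from the slides.

```python
# Data from the TV example in these slides.
hours = [12, 10, 5, 3, 15, 16, 8]      # explanatory variable (x-axis)
grades = [70, 85, 82, 88, 65, 75, 68]  # response variable (y-axis)


def text_scatter(xs, ys, width=34, height=13):
    """Return a rough character-grid scatter plot as a list of strings.

    Hypothetical helper for illustration; assumes xs and ys each contain
    at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]


plot_lines = text_scatter(hours, grades)
print("Grade")
print("\n".join(plot_lines))
print(" " * 12 + "Hours of TV")
```

The points should drift from the upper left toward the lower right, which is the downward trend discussed on the next slide.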

How Does the Relationship Look?

What do we think?

It looks like the more hours of TV that are watched, the lower the average grade. But how good is the relationship? We can measure this in different ways. One is direction (+, −) and another is strength. Both are captured by the correlation coefficient.

Facts About Correlation Coefficients:

1. −1 ≤ r ≤ 1. The weakest correlation is r = 0 and the strongest is r = ±1. The sign of r only tells us which direction the relationship goes: whether y increases as x increases (positive) or y decreases as x increases (negative). Being negative is not “bad”.

2. Correlation makes no distinction between x and y, that is, between the choice of explanatory and response variables. We still need to be careful, though, as the next part (the regression line) depends heavily on the correct choice.

3. Correlation measures only the linear relationship.

4. Correlation is not resistant, that is, outliers can change it substantially.

5. Correlation has no units.
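
Facts 2 and 5 can be checked numerically. The sketch below is an illustration rather than part of the slides: `pearson_r` is a hypothetical helper that computes r in the algebraically equivalent form Sxy / sqrt(Sxx · Syy), and the conversion of hours to minutes is an assumed change of units chosen for the demonstration.

```python
from math import sqrt

hours = [12, 10, 5, 3, 15, 16, 8]
grades = [70, 85, 82, 88, 65, 75, 68]


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_xy = pearson_r(hours, grades)
r_yx = pearson_r(grades, hours)        # fact 2: swap x and y
minutes = [60 * h for h in hours]      # fact 5: change units (hours -> minutes)
r_minutes = pearson_r(minutes, grades)

print(round(r_xy, 4), round(r_yx, 4), round(r_minutes, 4))
```

All three values should agree up to floating-point rounding: swapping the roles of the variables or rescaling one of them leaves r unchanged.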

So How Do We Find This Correlation Coefficient?

The Correlation Coefficient

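$$ r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{S_x} \right) \left( \frac{y_i - \bar{y}}{S_y} \right) = \frac{1}{n-1} \sum z_x z_y $$

Let’s find the correlation coefficient for our example. First, we need a few values: $\bar{x}$, $\bar{y}$, $S_x$, $S_y$.

$\bar{x} = 9.857$, $\bar{y} = 76.143$, $S_x = 4.880$, $S_y = 8.971$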
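
These four summary values can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the arithmetic; the variable names are chosen here for illustration. Note that `statistics.stdev` is the sample standard deviation (it divides by n − 1), which matches the n − 1 in the formula above.

```python
from statistics import mean, stdev

hours = [12, 10, 5, 3, 15, 16, 8]      # x values
grades = [70, 85, 82, 88, 65, 75, 68]  # y values

x_bar, y_bar = mean(hours), mean(grades)
s_x, s_y = stdev(hours), stdev(grades)  # sample standard deviations (n - 1)

print(f"x_bar = {x_bar:.3f}, y_bar = {y_bar:.3f}")
print(f"S_x = {s_x:.3f}, S_y = {s_y:.3f}")
```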

Finding the Correlation Coefficient

For each pair, find the z-score of each value, then multiply the two z-scores together. After summing the products, divide by n − 1.

i     z_x       z_y       product
1     0.4391   -0.6848   -0.3007
2     0.0293    0.9873    0.0289
3    -0.9953    0.6529   -0.6498
4    -1.4050    1.3217   -1.8570
5     1.0539   -1.2421   -1.3090
6     1.2588   -0.1274   -0.1604
7    -0.3805   -0.9077    0.3454
sum                      -3.9026

$r = \frac{1}{6}(-3.9026) = -0.6504$

Interpretation: Moderate negative correlation
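
The same table can be generated programmatically. The sketch below follows the procedure described above (z-scores, products, sum, divide by n − 1) using only the standard library. Because it keeps full precision instead of rounding each z-score to four places, its sum and r may differ from the slide's values in the last digit (roughly −0.6505 rather than −0.6504).

```python
from statistics import mean, stdev

hours = [12, 10, 5, 3, 15, 16, 8]
grades = [70, 85, 82, 88, 65, 75, 68]

x_bar, s_x = mean(hours), stdev(hours)
y_bar, s_y = mean(grades), stdev(grades)

products = []
print(" i      z_x      z_y  product")
for i, (x, y) in enumerate(zip(hours, grades), start=1):
    z_x = (x - x_bar) / s_x          # z-score of the x value
    z_y = (y - y_bar) / s_y          # z-score of the y value
    products.append(z_x * z_y)
    print(f"{i:2d} {z_x:8.4f} {z_y:8.4f} {products[-1]:8.4f}")

n = len(hours)
r = sum(products) / (n - 1)          # divide the sum of products by n - 1
print(f"sum = {sum(products):.4f}, r = {r:.4f}")
```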

So Can We Say There Is A Relationship?

So, can we say that there is a direct relationship between the number of hours of TV watched and the average grade? Not so fast ...

Correlation does not necessarily imply causation.

Just because it looks the part does not mean we have evidence that there is a direct relationship. We have to consider a couple of other things. One is lurking variables: variables that may be present but that we are not actually considering within the data.

Can you think of any lurking variables that would impact our example?

Significance

We also need to test for significance to see what is going on.

If $|r|\sqrt{n} > 3$, the correlation is significant.

Otherwise, it is not significant.

The smaller this value, the less likely the correlation is to be significant.

Reasons why data may not be significant:

1. Genuine lack of correlation
2. Not enough data

Our example is not significant because of quantity: with n = 7 and r = −0.6504, $|r|\sqrt{n} \approx 1.72$, which is less than 3. So we cannot conclude that watching TV has a direct impact on grades.
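
The rule of thumb above is easy to express in code. The sketch below is illustrative only: `is_significant` is a hypothetical helper name, and the final calculation assumes, purely for the sake of argument, that r would stay at −0.6504 if more data were collected.

```python
from math import sqrt


def is_significant(r, n, threshold=3):
    """Rule of thumb from the slide: significant when |r| * sqrt(n) > 3."""
    return abs(r) * sqrt(n) > threshold


r, n = -0.6504, 7                    # values from the TV example
value = abs(r) * sqrt(n)
print(round(value, 2))               # about 1.72
print(is_significant(r, n))          # False

# Assumption for illustration: if r stayed the same, the smallest n that
# satisfies |r| * sqrt(n) > 3 is the first integer above (3 / |r|) ** 2.
needed = int((3 / abs(r)) ** 2) + 1
print(needed)                        # 22
```

This makes the "not enough data" reason concrete: a correlation of this size would pass the rule only with about three times as many observations.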

Assumptions and Conditions for Correlation

Quantitative Variables Condition: Don’t make the common error of calling an association involving a categorical variable a correlation. Correlation is only about quantitative variables.

Straight Enough Condition: The best check for the assumption that the variables are truly linearly related is to look at the scatter plot to see whether it looks reasonably straight. That’s a judgment call, but not a difficult one.

No Outliers Condition: Outliers can distort the correlation dramatically, making a weak association look strong or a strong one look weak. Outliers can even change the sign of the correlation. But it’s easy to see outliers in the scatter plot, so to check this condition, just look.
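
To see how much a single outlier can matter, the sketch below adds one made-up point to the TV data from the first example. The extra student (40 hours of TV, average grade 100) is invented purely for illustration and does not appear in the slides; `pearson_r` is a hypothetical helper.

```python
from math import sqrt

hours = [12, 10, 5, 3, 15, 16, 8]
grades = [70, 85, 82, 88, 65, 75, 68]


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_original = pearson_r(hours, grades)

# Hypothetical outlier, invented for illustration only:
# one extra student who watches 40 hours of TV and still averages 100.
r_with_outlier = pearson_r(hours + [40], grades + [100])

print(f"without outlier: r = {r_original:.2f}")
print(f"with outlier:    r = {r_with_outlier:.2f}")
```

With this one extra point the correlation switches from negative to positive, which is the sign change the No Outliers Condition warns about.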

Another Example

Example: The following gives the power numbers for the starting 9 for the 2007 Boston Red Sox. Is there a relationship between the number of home runs and the number of RBIs? Does the number of home runs affect the number of RBIs? Produce a scatter plot and discuss the correlation.

Player      Home Runs   RBIs
Varitek        17         68
Youkilis       16         83
Pedroia         8         50
Lowell         21        120
Lugo            8         73
Ramirez        20         88
Crisp           6         60
Drew           11         64
Ortiz          35        117

Red Sox Example

Which is the explanatory variable? Which is the response variable?

Since we are asking if HR affects RBIs, HR would be the explanatory variable and therefore x. So RBIs is the response variable, y.

[Figure: scatter plot "2007 Red Sox Power Numbers". x-axis: Home Runs (ticks at 10, 20, 30); y-axis: RBIs (ticks from 20 to 120).]

Before We Go On

Something to notice: we have two points with the same x-coordinate (Pedroia and Lugo each hit 8 home runs).

[Figure: scatter plot "2007 Red Sox Power Numbers". x-axis: Home Runs; y-axis: RBIs.]

Finding the Correlation Coefficient

What is our guess as to the correlation?

Now let’s find the correlation coefficient. But there must be an easier way ... and that way would be technology.

- Input the data in the usual way, with the explanatory variable under L1 and the response variable under L2
- Press STAT and scroll to TESTS
- Select LinRegTTest
- Make sure the XList and YList are the lists where the data for the explanatory and response variables are located, respectively
- Press Calculate and scroll to find r and r²

Using Technology

For our example, we have

r = .8463

So the correlation coefficient tells us that there is a strong positive correlation. So, we should find the regression line.

[Figure: scatter plot "2007 Red Sox Power Numbers". x-axis: Home Runs; y-axis: RBIs.]
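
For readers without the calculator, the same r and r² can be obtained with a few lines of Python. This is a sketch using only the standard library and the table from the example; `pearson_r` is a hypothetical helper, not a calculator or library function.

```python
from math import sqrt

# (player, home runs, RBIs) from the 2007 Red Sox table in these slides
red_sox_2007 = [
    ("Varitek", 17, 68), ("Youkilis", 16, 83), ("Pedroia", 8, 50),
    ("Lowell", 21, 120), ("Lugo", 8, 73), ("Ramirez", 20, 88),
    ("Crisp", 6, 60), ("Drew", 11, 64), ("Ortiz", 35, 117),
]
home_runs = [hr for _, hr, _ in red_sox_2007]   # explanatory variable (x)
rbis = [rbi for _, _, rbi in red_sox_2007]      # response variable (y)


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(home_runs, rbis)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The printed r should match the calculator value of .8463 given above.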

Technology and Scatter Plots

We can also create a scatter plot on the calculator.

Make sure there are no functions in the grapher (press Y= to check)

Input the data in the usual way (we already have it there for this example)

Press 2nd and Y= to get into the STAT PLOT menu

Make sure only the plot we want is turned on

Select the first graph in the first row and then make sure the XList and YList are correct

Press ZOOM 9
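Away from the calculator, the same idea can be sketched in plain Python. The example below is only a rough text-mode stand-in for a real plotting tool: the function name `ascii_scatter` and the grid size are made up for illustration, and the data are assumed to be the 2007 Red Sox home run and RBI figures given later in this section. Like ZOOM 9, it scales the window to fit the data.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a list of text rows with '*' marking each (x, y) point.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]


home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]       # x: explanatory variable
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]       # y: response variable

rows = ascii_scatter(home_runs, rbis)
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
```

Points that fall in the same grid cell would share one marker, so a coarse grid can hide data; a real plotting tool avoids this.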

One More Example

Example: There is some evidence that drinking moderate amounts of wine helps prevent heart attacks. The accompanying table gives data on yearly wine consumption (in liters of alcohol from drinking wine per person) and yearly deaths from heart disease (per 100,000 people) in 19 developed nations. Construct a scatter plot and describe what you see.

Country          Alcohol (liters)   Deaths (per 100,000)
Australia        2.5                211
Austria          3.9                167
Belgium          2.9                131
Canada           2.4                191
Denmark          2.9                220
Finland          0.8                297
France           9.1                71
Iceland          0.8                211
Ireland          0.7                300
Italy            7.9                107
Netherlands      1.8                167
New Zealand      1.9                266
Norway           0.8                227
Spain            6.5                86
Sweden           1.6                207
Switzerland      5.8                115
United Kingdom   1.3                285
United States    1.2                199
West Germany     2.7                172

The Scatter Plot

[Figure: scatter plot "Heart Disease v. Alcohol from Wine"; x-axis: Alcohol from Wine (in liters), y-axis: Deaths (per 100,000)]

r = −.8428, strong negative correlation
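As a cross-check on the value of r reported above, the correlation can be recomputed from the table in the example. This is a sketch using only Python's standard library; the variable names are illustrative. It uses the standardized-score form of the formula, r = (1/(n − 1)) Σ z_x z_y, which may or may not be the form used elsewhere in these slides.

```python
from statistics import mean, stdev

# Country: (alcohol from wine in liters per person, heart disease deaths per 100,000)
wine = {
    "Australia": (2.5, 211), "Austria": (3.9, 167), "Belgium": (2.9, 131),
    "Canada": (2.4, 191), "Denmark": (2.9, 220), "Finland": (0.8, 297),
    "France": (9.1, 71), "Iceland": (0.8, 211), "Ireland": (0.7, 300),
    "Italy": (7.9, 107), "Netherlands": (1.8, 167), "New Zealand": (1.9, 266),
    "Norway": (0.8, 227), "Spain": (6.5, 86), "Sweden": (1.6, 207),
    "Switzerland": (5.8, 115), "United Kingdom": (1.3, 285),
    "United States": (1.2, 199), "West Germany": (2.7, 172),
}

alcohol = [a for a, _ in wine.values()]  # explanatory variable
deaths = [d for _, d in wine.values()]   # response variable

n = len(alcohol)
x_bar, s_x = mean(alcohol), stdev(alcohol)
y_bar, s_y = mean(deaths), stdev(deaths)

# Average product of standardized scores, dividing by n - 1
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(alcohol, deaths)) / (n - 1)

print(f"n = {n}, r = {r:.4f}")
```

A negative result means that countries with higher wine consumption tend to have lower heart disease death rates in this data set; it does not by itself show that one causes the other.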

The Linear Regression Line

Definition: A linear regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Linear functions are of the form y = mx + b, but we will write them as

$\hat{y} = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope.

The calculator actually uses the form $\hat{y} = a + bx$, so be careful.

Formulas

What we will be finding is the least squares regression line of y on x. This is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

$b_1 = r \dfrac{s_y}{s_x}$

$b_0 = \bar{y} - b_1 \bar{x}$

If the correlation coefficient is too small, there is no point in finding $\hat{y}$, since $b_0$ and $b_1$ both depend on $r$.

Example Using Given Values

Example: The following list gives the power numbers for the 9 starting Red Sox players for the 2007 season.

Name             Home Runs   RBIs
Jason Varitek    17          68
Kevin Youkilis   16          83
Dustin Pedroia   8           50
Mike Lowell      21          120
Julio Lugo       8           73
Manny Ramirez    20          88
Coco Crisp       6           60
J.D. Drew        11          64
David Ortiz      35          117

We want to know whether the number of home runs affects the number of RBIs.

The Needed Values

We can find the mean and standard deviation of both sets of data quickly using our technology.

Variable   Mean    Standard Deviation
x          15.78   9.05
y          80.33   24.47

And, since the data is already in the calculator, we can obtain the value of r.

r = .8463
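The summary values in the table can be reproduced with Python's `statistics` module. This is a sketch that assumes the nine home run and RBI pairs from the example above; `stdev` is the sample standard deviation (dividing by n − 1), which is what matches the values shown.

```python
from statistics import mean, stdev

home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]   # x: explanatory variable
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]   # y: response variable

for name, values in (("x", home_runs), ("y", rbis)):
    print(f"{name}: mean = {mean(values):.2f}, "
          f"standard deviation = {stdev(values):.2f}")
```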

The Correlation Coefficient

We can use these values to find the equation of the regression line.

$b_1 = r \dfrac{s_y}{s_x} = .8463 \left( \dfrac{24.47}{9.05} \right) = 2.29$

$b_0 = \bar{y} - b_1 \bar{x} = 80.33 - 2.29(15.78) = 44.19$

So, the regression line is

$\hat{y} = 44.19 + 2.29x$
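The hand calculation can be checked against the raw data. The sketch below (names are illustrative) applies the same two formulas but without rounding the intermediate values, so its results differ slightly from the figures above: the slope still rounds to 2.29, while the intercept comes out near 44.24 rather than 44.19, because the hand calculation works from the rounded slope and means.

```python
import math
from statistics import mean, stdev

home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]   # x: explanatory variable
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]   # y: response variable

x_bar, y_bar = mean(home_runs), mean(rbis)
s_x, s_y = stdev(home_runs), stdev(rbis)

# Correlation coefficient from the sums of squares and cross products
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(home_runs, rbis))
sxx = sum((x - x_bar) ** 2 for x in home_runs)
syy = sum((y - y_bar) ** 2 for y in rbis)
r = sxy / math.sqrt(sxx * syy)

b1 = r * s_y / s_x       # slope
b0 = y_bar - b1 * x_bar  # y-intercept

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
```

Carrying full precision through to the end and rounding only the final answer is generally the safer habit; the small gap in the intercept shows how early rounding accumulates.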

Practical Interpretation

What do these coefficients mean in practical terms?

The slope tells us that a change in the explanatory variable by one unit corresponds to a predicted change in the response variable by the amount and direction of the slope.
In our example, the slope is $b_1 = 2.29$, which tells us that for every home run hit, you’d expect to get an additional 2.29 RBIs.

The y-intercept tells us the predicted value of the response variable when the explanatory variable is 0.
In our example $b_0 = 44.19$, which means that we expect a player who hits no home runs to have 44.19 RBIs.

Note: In context, we may need to round to whole numbers for the answers to make any sense.

The Scatter Plot

Let’s see how good the regression line is by plotting it over the scatter plot.

[Figure: scatter plot "2007 Red Sox Power Numbers"; x-axis: Home Runs, y-axis: RBIs]

To do so, we press Y= and put the line under Y1, then select GRAPH

Plot and Line

And now with the regression line

$\hat{y} = 44.19 + 2.29x$

[Figure: scatter plot "2007 Red Sox Power Numbers" with the regression line overlaid; x-axis: Home Runs, y-axis: RBIs]

Predictions

One use of the regression line is making predictions. Suppose we wanted to know about how many RBIs we could expect a player to have if they hit 60 home runs. We are looking to predict the value of y (so we want $\hat{y}$) and we are given a value of x = 60.

Our prediction would be

$\hat{y} = 44.19 + 2.29(60) = 181.59$

So, our prediction is 182 RBIs.
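The prediction step is a single substitution, sketched below as a small function. The function name and the range check are additions for illustration: 60 home runs lies well outside the observed data (6 to 35 home runs), so the sketch flags the result as an extrapolation.

```python
B0 = 44.19  # y-intercept of the regression line found earlier
B1 = 2.29   # slope of the regression line found earlier
OBSERVED_HOME_RUNS = (6, 35)  # smallest and largest x in the 2007 data


def predict_rbis(home_runs):
    """Return (predicted RBIs, whether x lies outside the observed range)."""
    y_hat = B0 + B1 * home_runs
    low, high = OBSERVED_HOME_RUNS
    extrapolated = not (low <= home_runs <= high)
    return y_hat, extrapolated


y_hat, extrapolated = predict_rbis(60)
print(f"predicted RBIs: {y_hat:.2f} (rounds to {round(y_hat)})")
if extrapolated:
    print("note: 60 is outside the observed range, so treat this with caution")
```

Returning the flag alongside the prediction, rather than refusing to predict, keeps the arithmetic identical to the hand calculation while making the caveat visible.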

Facts about Regression Lines

1. The distinction between explanatory and response variables is essential - remember the formulas ...

2. There is a close connection between slope and correlation.

3. $(\bar{x}, \bar{y})$ is always on the line.

4. This only shows us the linear model; it is possible that there is little linear correlation but that the data has a strong relationship under some other type of model.

5. We will not always get perfect correlation (probably never), but we need the line to be “straight enough” for the data to make sense. What that means varies. Depending on the situation, r = .3 could be good enough; other times r = .8 would be a minimum.
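The first three facts can be checked numerically. The sketch below assumes the 2007 Red Sox home run and RBI data from earlier in the section; the helper name `least_squares` is made up for the example. It checks that the line passes through the point of means, that the slope equals r·s_y/s_x, and that swapping the roles of the two variables gives a different line (the product of the two slopes is r², not 1).

```python
import math
from statistics import mean, stdev

home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]


def least_squares(xs, ys):
    """Return (intercept, slope, r) for the least squares line of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / math.sqrt(sxx * syy)


b0, b1, r = least_squares(home_runs, rbis)  # RBIs on home runs
a0, a1, _ = least_squares(rbis, home_runs)  # home runs on RBIs (roles swapped)

# Fact 3: the point of means lies on the line.
on_line = math.isclose(b0 + b1 * mean(home_runs), mean(rbis))
# Fact 2: the slope is r scaled by the ratio of standard deviations.
slope_matches = math.isclose(b1, r * stdev(rbis) / stdev(home_runs))
# Fact 1: swapping the variables does not simply invert the line.
same_line = math.isclose(a1, 1 / b1)

print(on_line, slope_matches, same_line)
```

Only when r is exactly 1 or −1 would the two lines coincide, which is why it matters which variable is treated as explanatory.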

Another Sox Example

Suppose we wanted to know if a player was expected to score more runs if he got more hits. To answer this question, we will use the roster of the 2011 Boston Red Sox.

Name                    Runs   Hits
Jarrod Saltalamacchia   52     84
Adrian Gonzalez         108    213
Dustin Pedroia          102    195
Marco Scutaro           59     118
Kevin Youkilis          68     111
Carl Crawford           65     129
Jacoby Ellsbury         119    212
J.D. Drew               23     55
David Ortiz             84     162
Jed Lowrie              40     78
Josh Reddick            41     71
Jason Varitek           32     49
Darnell McDonald        26     37
Mike Aviles             17     32
Mike Cameron            9      14
Drew Sutton             11     17
Ryan Lavarnway          5      9
Yamaico Navarro         6      8
Conor Jackson           2      3
Jose Iglesias           3      2
Lars Anderson           2      0
Joey Gathright          1      0

The Correlation Coefficient

The first thing we will do is find the correlation coefficient.

When we plug all of the data into our technology, we get r = .9942.

Interpretation?

There is a strong, positive correlation between hits and runs scored.

Producing the Scatter Plot

Now, let’s produce a scatter plot for the data.

[Figure: scatter plot "2011 Red Sox"; x-axis: Hits, y-axis: Runs]

The Assumptions

The points give us a pretty good indication that there is a very strong positive correlation here. Before we go on, we want to make sure all of the assumptions about regression lines are met.

Quantitative Variable Condition: If either y or x is categorical, you cannot make a scatter plot and you cannot perform a regression.

Straight Enough Condition: Does the data look straight enough that we can see a linear relationship in the data set?

Outlier Condition: Are there any outliers that dramatically influence the fit of the regression line?

Does the Plot Thicken Condition: Does the spread of the data around the generally straight relationship seem to be consistent for all values of x?
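The last two conditions are usually judged by eye, but a rough numeric look at the residuals can help. The sketch below is an illustration under stated assumptions, not a formal test: it fits the least squares line of runs on hits for the 2011 roster above, then compares the spread of the residuals for the players with the fewest hits against those with the most. Splitting the data in half is an arbitrary choice made for this example.

```python
from statistics import mean, pstdev

# 2011 Red Sox roster: (runs, hits) pairs from the table above
runs_hits = [
    (52, 84), (108, 213), (102, 195), (59, 118), (68, 111), (65, 129),
    (119, 212), (23, 55), (84, 162), (40, 78), (41, 71), (32, 49),
    (26, 37), (17, 32), (9, 14), (11, 17), (5, 9), (6, 8),
    (2, 3), (3, 2), (2, 0), (1, 0),
]

hits = [h for _, h in runs_hits]  # explanatory variable
runs = [r for r, _ in runs_hits]  # response variable

x_bar, y_bar = mean(hits), mean(runs)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hits, runs))
      / sum((x - x_bar) ** 2 for x in hits))
b0 = y_bar - b1 * x_bar

# Residual = observed runs minus runs predicted by the line, ordered by hits
residuals = [y - (b0 + b1 * x) for x, y in sorted(zip(hits, runs))]

half = len(residuals) // 2
low_spread = pstdev(residuals[:half])   # players with fewer hits
high_spread = pstdev(residuals[half:])  # players with more hits

print(f"line: runs = {b0:.2f} + {b1:.3f} * hits")
print(f"residual spread, fewer hits: {low_spread:.2f}")
print(f"residual spread, more hits:  {high_spread:.2f}")
```

If the two spreads differ a lot, the plot "thickens" and the consistency condition deserves a closer look; a single residual that dwarfs the others would similarly point to a possible outlier.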

Linear Regression Line

Since all of these are satisfied, we will continue on to find the formula of the regression line. Using our technology, we have

$\hat{y} = b_0 + b_1 x = 1.92 + 0.52x$

What is the practical interpretation of the slope $b_1$?
For each additional hit, we expect a player to score 0.52 additional runs.

What is the practical interpretation of the y-intercept $b_0$?
If a player has no hits, we expect 1.92 runs to be scored.
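
A short worked step, using only the coefficients reported for this line, shows where both readings come from: raising x by one hit changes the prediction by exactly the slope, and setting x = 0 leaves only the intercept.

```latex
\begin{align*}
\hat{y}(x+1) - \hat{y}(x) &= \bigl(1.92 + 0.52(x+1)\bigr) - \bigl(1.92 + 0.52x\bigr) = 0.52, \\
\hat{y}(0) &= 1.92 + 0.52(0) = 1.92.
\end{align*}
```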

Scatter Plot With Regression Line

[Figure: 2011 Red Sox. Scatter plot of Runs (vertical axis, 20 to 120) against Hits (horizontal axis, 50 to 200) with the regression line drawn through the points.]

So, when we plot the regression line over the scatter plot, we see that the line is a good fit.

Predictions

1. If a player got 200 hits, how many runs would we expect them to have?

Here, we are given the x value and, using our regression line, we find the predicted value.

$\hat{y} = 1.92 + 0.52(200) = 105.92$

So, we’d expect about 106 runs for a player with 200 hits.

2. What if we wanted to know how many hits a player had if they scored 120 runs?

We are given the value of $\hat{y}$ and want to find the value of x. So, we use our algebra skills ...

$\hat{y} = 1.92 + 0.52x$

$120 = 1.92 + 0.52x$

$118.08 = 0.52x$

$227.08 \approx x$

We expect about 227 hits.
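
The two predictions can also be sketched in Python. This is a minimal illustration, not part of the original slides: the function names `predict_runs` and `hits_for_runs` are made up for the example, and the coefficients are the rounded values 1.92 and 0.52 from the regression line.

```python
# Coefficients taken from the fitted line in the text: y-hat = 1.92 + 0.52x
B0 = 1.92  # y-intercept: runs expected at 0 hits
B1 = 0.52  # slope: additional runs expected per additional hit


def predict_runs(hits: float) -> float:
    """Plug an x value (hits) into the regression line."""
    return B0 + B1 * hits


def hits_for_runs(runs: float) -> float:
    """Solve y-hat = B0 + B1 * x for x, given a target y-hat (runs)."""
    return (runs - B0) / B1


runs_at_200 = predict_runs(200)
hits_at_120 = hits_for_runs(120)
print(round(runs_at_200, 2))  # 105.92
print(round(hits_at_120, 2))  # 227.08
```

Solving for x this way simply inverts the fitted line, which is the same step the algebra carries out by hand.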

Important Points

A few important points to keep in mind:

1. An observation is influential for a statistical calculation if removing it would markedly change the results of the calculation. Points that are outliers in either the x or y direction are often influential points.

2. Correlation and least squares regression lines are not resistant.

3. They only describe linear relationships.

4. There could be lurking variables. Those are ones that are not among the explanatory or response variables but may influence the interpretation of the relationship.

5. An association between an explanatory variable x and a response variable y, even if r is very strong, is not itself good evidence that changes in x actually cause changes in y. The phrase to remember is that correlation does not necessarily imply causation.
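
The first two points can be seen numerically. The sketch below uses a small made-up data set (not from the slides) and hypothetical helper names `pearson_r` and `slope`; it compares r and the least squares slope before and after adding a single outlier.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def slope(xs, ys):
    """Least squares slope b1 = Sxy / Sxx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: five points lying exactly on y = 2x ...
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
# ... plus one made-up outlier far from that line.
xs_out = xs + [10]
ys_out = ys + [0]

r_clean, r_out = pearson_r(xs, ys), pearson_r(xs_out, ys_out)
b1_clean, b1_out = slope(xs, ys), slope(xs_out, ys_out)
print(r_clean, b1_clean)
print(r_out, b1_out)
```

With these made-up numbers, the five original points give r = 1 and a slope of 2, and the one added point is enough to make both r and the slope negative.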

Example Where You Are Doing The Work

Example: We want to know if there is a relationship between the score on the math portion of the SAT exam and the number of hours spent studying for the test. The question is, “Does studying more increase the score on the exam?” The following data was taken from a study of 20 students as they prepared for and took the SAT exam.

Hours    4    9   10   14    4    7   12   22    1    3
Score  390  580  650  730  410  530  600  790  350  400

Hours    8   11    5    6   10   11   16   13   13   10
Score  590  640  450  520  690  690  770  700  730  640

Variable Types

What is the response variable?
Math SAT score

What is the explanatory variable?
Hours of study

Correlation Coefficient

So let’s first find the correlation coefficient to see what we are dealing with.

$r = 0.9336$

Our interpretation?

This tells us there is a strong positive correlation.
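
The reported value can be reproduced from the data table. This is a minimal sketch in plain Python, assuming the standard sample correlation formula r = Sxy / sqrt(Sxx · Syy); the helper name `pearson_r` is hypothetical, and the slides do not say which tool produced their value.

```python
from math import sqrt

# Data from the example: hours of study and Math SAT score for 20 students.
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3,
         8, 11, 5, 6, 10, 11, 16, 13, 13, 10]
scores = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400,
          590, 640, 450, 520, 690, 690, 770, 700, 730, 640]


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(hours, scores)
print(round(r, 4))  # 0.9336
```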

Is The Data Significant?

What is the inequality we are using?

$r\sqrt{n} > 3$

Is this data significant?

$r\sqrt{n} = 0.9336\sqrt{20} \approx 4.18 > 3$

So, the data is significant based on this criterion.
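
The check is a single line of arithmetic. A minimal sketch, taking r and n from the example and the cutoff of 3 from the inequality; the variable names are chosen only for illustration:

```python
from math import sqrt

r = 0.9336     # correlation coefficient reported in the text
n = 20         # number of students in the study
THRESHOLD = 3  # cutoff from the inequality r * sqrt(n) > 3

statistic = r * sqrt(n)          # roughly 4.175
is_significant = statistic > THRESHOLD
print(statistic, is_significant)
```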

Visual Representation

Next, let’s produce our scatter plot so we can see what we are dealing with.

[Figure: Math SAT Score v. Hours of Study. Scatter plot of SAT Score (vertical axis, 300 to 800) against Hours of Study (horizontal axis, 5 to 20).]

Checking Conditions/Assumptions

We are feeling pretty good about this; it seems to have a strong, positive correlation. When we consider the conditions, are we still happy with this?

Quantitative Variable Condition: Both variables are quantitative.

Straight Enough Condition: Data looks reasonably straight.

Outlier Condition: There do not seem to be any outliers.

Does the Plot Thicken Condition: Pretty much. Other than the one person who studied for 22 hours, the relationship seems very strong.

$\hat{y} = b_0 + b_1 x$

Next, we find the equation of the linear regression line.

$\hat{y} = 353.16 + 25.33x$

What is the practical interpretation of the slope $b_1$?
For each additional hour of study, we expect the person to get an additional 25.33 points on their score.

What is the label for the slope?
Points per hour of study

What is the practical interpretation of the y-intercept $b_0$?
If a person does not study, we expect their score on the Math portion of the SAT exam to be 353.16.
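
The slides do not say how the line was computed. The sketch below assumes the usual least squares formulas, b1 = Sxy / Sxx and b0 = (mean of y) − b1 · (mean of x), and applies them to the data table; the function name `least_squares_line` is hypothetical.

```python
# Data from the example: hours of study and Math SAT score for 20 students.
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3,
         8, 11, 5, 6, 10, 11, 16, 13, 13, 10]
scores = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400,
          590, 640, 450, 520, 690, 690, 770, 700, 730, 640]


def least_squares_line(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope: points per hour of study
    b0 = mean_y - b1 * mean_x    # intercept: predicted score at 0 hours
    return b0, b1


b0, b1 = least_squares_line(hours, scores)
print(round(b0, 2), round(b1, 2))  # 353.16 25.33
```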

Scatter Plot With Regression Line

[Figure: Math SAT Score v. Hours of Study, scatter plot with the regression line. Horizontal axis: Hours of Study (ticks at 5, 10, 15, 20). Vertical axis: SAT Score (ticks from 300 to 800).]

The data point where the person studied for 22 hours does look a little sketchy, but it is not so far out of whack that it seems to be an outlier.


Predictions

So what score would we expect for a person who studied for 10 hours?

$\hat{y} = 353.16 + 25.33(10) = 606.46$

So, since SAT scores are rounded to the nearest 10, we would expect about a 610.

If someone scored a 720, how many hours would we guess they studied?

$720 = 353.16 + 25.33x \Rightarrow x \approx 14.48$ hours
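Both predictions can be reproduced in a few lines. This is a minimal sketch, assuming the fitted coefficients 353.16 and 25.33 from the regression line. The helper names (`predict_score`, `hours_for_score`, `round_to_nearest_10`) are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the fitted line: y-hat = 353.16 + 25.33x.
B0 = 353.16
B1 = 25.33


def predict_score(hours):
    """Expected Math SAT score for the given hours of study."""
    return B0 + B1 * hours


def hours_for_score(score):
    """Solve score = B0 + B1 * hours for hours (same algebra as the slide)."""
    return (score - B0) / B1


def round_to_nearest_10(score):
    """Round a score to the nearest multiple of 10."""
    return int(round(score / 10.0)) * 10


raw_score = predict_score(10)                     # prediction for 10 hours
reported_score = round_to_nearest_10(raw_score)   # rounded to the nearest 10
guessed_hours = hours_for_score(720)              # hours implied by a 720

print(f"Predicted score for 10 hours: {raw_score:.2f}")
print(f"Rounded to the nearest 10: {reported_score}")
print(f"Estimated hours for a 720: {guessed_hours:.2f}")
```

The second calculation simply inverts the fitted line, mirroring the algebra on the slide.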
