Post on 17-Mar-2020
§ 5.3 Scatter Plots and Correlation
Looking for Correlation
Example
Does the number of hours you watch TV per week impact your average grade in a class?
Hours  12  10   5   3  15  16   8
Grade  70  85  82  88  65  75  68
To see if there is a relationship, we will create a scatter plot and analyze it.
Definition
A scatter plot is a graphical representation of the relationship between two quantitative variables. The values may come from the same individual (e.g., education v. income, height v. weight) or from paired individuals (e.g., ages of partners in a relationship).
Scatter Plots
When working with scatter plots, there are two variables. They may be of two different types.
Definition
A response variable measures the outcome of a study.
Definition
An explanatory variable may explain or influence changes in a response variable.
Explanatory variables are often called independent and are on the x-axis. Response variables are often called dependent and are on the y-axis.
Back to Our Example
In our example, which is the explanatory variable?
The hours of TV watched.
The response variable is therefore the average grade. So the question we are trying to answer is “Does watching TV influence the average grade in a class?”
Let’s plot the data and see what we have.
The Scatter Plot
[Scatter plot: Grades v. Hours of TV. x-axis: Hours of TV (ticks at 5, 10, 15); y-axis: Grade (ticks from 65 to 90); seven points.]
How Does the Relationship Look?
What do we think?
It looks like the more hours of TV that are watched, the lower the average grade. But how good is the relationship? We can measure this in different ways. One is direction (+, −) and another is by ranking the strength. These are both accomplished by looking at the correlation coefficient.
Facts About Correlation Coefficients:
1 −1 ≤ r ≤ 1. The weakest correlation is r = 0 and the strongest is r = ±1. Whether r is positive or negative only tells us which direction the relationship goes: whether y increases as x increases or y decreases as x increases. Being negative is not “bad”.
2 Correlation makes no distinction between x and y, that is, between the choice of explanatory and response variables. We need to be careful, though, as the next part (the regression line) depends heavily on the correct choice.
3 Correlation measures only the linear relationship.
4 Correlation is not resistant (it is strongly affected by outliers).
5 Correlation has no units.
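Facts 2, 3, and 5 can be checked numerically. The following is a minimal sketch in plain Python, not a calculator procedure from these slides; the helper name `pearson_r`, the conversion of hours to minutes, and the small parabola data set are assumptions introduced only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r (hypothetical helper for illustration)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [12, 10, 5, 3, 15, 16, 8]
grade = [70, 85, 82, 88, 65, 75, 68]

r_xy = pearson_r(hours, grade)
r_yx = pearson_r(grade, hours)           # fact 2: swapping x and y changes nothing

minutes = [60 * h for h in hours]        # same data, different units
r_minutes = pearson_r(minutes, grade)    # fact 5: r has no units

# Fact 3: y = x^2 is a perfect relationship, but it is not linear.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_curve = pearson_r(xs, ys)

print(r_xy, r_yx, r_minutes, r_curve)
```

The first three values agree with each other, while the parabola gives r = 0 even though y is completely determined by x.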
So How Do We Find This Correlation Coefficient?
The Correlation Coefficient
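$r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{S_x} \right) \left( \frac{y_i - \bar{y}}{S_y} \right) = \frac{1}{n-1} \sum z_x z_y$
Let’s find the correlation coefficient for our example. First, we need a few values: $\bar{x}$, $\bar{y}$, $S_x$, $S_y$.
$\bar{x} = 9.857$, $\bar{y} = 76.143$, $S_x = 4.880$, $S_y = 8.971$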
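The four values above can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the arithmetic; the variable names are assumptions, and `statistics.stdev` is used because it is the sample standard deviation (it divides by n − 1), which matches the formula for r.

```python
import statistics

hours = [12, 10, 5, 3, 15, 16, 8]
grade = [70, 85, 82, 88, 65, 75, 68]

x_bar = statistics.mean(hours)
y_bar = statistics.mean(grade)
s_x = statistics.stdev(hours)   # sample standard deviation (n - 1 in the denominator)
s_y = statistics.stdev(grade)

print(f"x_bar = {x_bar:.3f}, y_bar = {y_bar:.3f}")
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}")
```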
Finding the Correlation Coefficient
For each pair, find the z-score of each value, then multiply the two z-scores together. After summing the products, divide by n − 1.
i    z_x       z_y       product
1    0.4391   -0.6848   -0.3007
2    0.0293    0.9873    0.0289
3   -0.9953    0.6529   -0.6498
4   -1.4050    1.3217   -1.8570
5    1.0539   -1.2421   -1.3090
6    1.2588   -0.1274   -0.1604
7   -0.3805   -0.9077    0.3454
Sum                     -3.9026
$r = \frac{1}{6}(-3.9026) = -0.6504$
Interpretation: Moderate negative correlation
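The z-score table can be rebuilt step by step in Python. This is a sketch for checking the hand computation, with variable names chosen here for illustration. Because it carries full precision instead of rounding each z-score to four places, the sum of products may differ from −3.9026 in the last digit or two.

```python
import statistics

hours = [12, 10, 5, 3, 15, 16, 8]
grade = [70, 85, 82, 88, 65, 75, 68]
n = len(hours)

x_bar, y_bar = statistics.mean(hours), statistics.mean(grade)
s_x, s_y = statistics.stdev(hours), statistics.stdev(grade)

products = []
for i, (x, y) in enumerate(zip(hours, grade), start=1):
    z_x = (x - x_bar) / s_x          # z-score of the x value
    z_y = (y - y_bar) / s_y          # z-score of the y value
    products.append(z_x * z_y)
    print(f"{i}  {z_x:8.4f}  {z_y:8.4f}  {z_x * z_y:8.4f}")

total = sum(products)
r = total / (n - 1)                  # divide the sum of products by n - 1
print(f"sum = {total:.4f}, r = {r:.4f}")
```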
So Can We Say There Is A Relationship?
So, can we say that there is a direct relationship between the number of hours of TV watched and the average grade? Not so fast ...
Correlation does not necessarily imply causation.
Just because the data look the part does not mean we have evidence of a causal relationship. We have to consider a couple of other things. One is lurking variables. These are variables that may be present but that we are not actually considering within the data.
Can you think of any lurking variables that would impact our example?
Significance
We also need to test for significance to see what is going on.
If $|r|\sqrt{n} > 3$, the correlation is significant.
Otherwise, it is not significant.
The smaller the value of $|r|\sqrt{n}$, the less likely it is that the correlation is significant.
Reasons why data may not be significant:
1 Genuine lack of correlation
2 Not enough data
Our example is not significant: $|r|\sqrt{n} = 0.6504\sqrt{7} \approx 1.72 < 3$. With only 7 data points, the problem is quantity. So we cannot conclude that watching TV has a direct impact on grades.
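The rule of thumb above is easy to apply in code. The sketch below uses a hypothetical helper, `is_significant`, and plugs in the values from the TV example. The final line asks how many data points would be needed if r stayed at −0.6504, which is an assumption made purely for illustration, since r would normally change as data are added.

```python
import math

def is_significant(r, n, threshold=3.0):
    """Rule of thumb from the slides: significant when |r| * sqrt(n) > 3."""
    return abs(r) * math.sqrt(n) > threshold

r, n = -0.6504, 7                      # values from the TV example
statistic = abs(r) * math.sqrt(n)
print(f"|r| * sqrt(n) = {statistic:.2f}, significant: {is_significant(r, n)}")

# Smallest n for which the same r would pass the rule (illustrative assumption:
# r is held fixed while n grows).
n_needed = math.floor((3 / abs(r)) ** 2) + 1
print(f"n needed at this r: {n_needed}")
```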
Assumptions and Conditions for Correlation
Quantitative Variables Condition
Don’t make the common error of calling an association involving a categorical variable a correlation. Correlation is only about quantitative variables.
Straight Enough Condition
The best check for the assumption that the variables are truly linearly related is to look at the scatter plot to see whether it looks reasonably straight. That’s a judgment call, but not a difficult one.
No Outliers Condition
Outliers can distort the correlation dramatically, making a weak association look strong or a strong one look weak. Outliers can even change the sign of the correlation. But it’s easy to see an outlier in the scatter plot, so to check this condition, just look.
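To see how strongly a single outlier can act, the sketch below recomputes r for the TV data after adding one made-up point. The point (40 hours, grade 100) is hypothetical and is not part of the original data; `pearson_r` is likewise a helper name assumed here.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r (hypothetical helper for illustration)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [12, 10, 5, 3, 15, 16, 8]
grade = [70, 85, 82, 88, 65, 75, 68]

r_original = pearson_r(hours, grade)

# One hypothetical outlier (NOT in the slides' data): 40 hours of TV, grade 100.
r_with_outlier = pearson_r(hours + [40], grade + [100])

print(f"without outlier: r = {r_original:.3f}")
print(f"with outlier:    r = {r_with_outlier:.3f}")
```

With this one extra point the sign of r flips from negative to positive, which is the situation the No Outliers Condition warns about.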
Another Example
Example
The following gives the power numbers for the starting 9 for the 2007 Boston Red Sox. Is there a relationship between the number of home runs and the number of RBIs? Does the number of home runs affect the number of RBIs? Produce a scatter plot and discuss the correlation.
Player     Home Runs   RBIs
Varitek       17         68
Youkilis      16         83
Pedroia        8         50
Lowell        21        120
Lugo           8         73
Ramirez       20         88
Crisp          6         60
Drew          11         64
Ortiz         35        117
Red Sox Example
Which is the explanatory variable? Which is the response variable?
Since we are asking if HR affects RBIs, HR would be the explanatory variable and therefore x. So RBIs is the y variable.
[Scatter plot: 2007 Red Sox Power Numbers. x-axis: Home Runs (ticks at 10, 20, 30); y-axis: RBIs (ticks from 20 to 120); nine points.]
Before We Go On
Something to notice: two of the points share the same x-coordinate (Pedroia and Lugo each hit 8 home runs).
Finding the Correlation Coefficient
What is our guess as to the correlation?
Now let’s find the correlation coefficient. But there must be an easier way ... and that way would be technology.
Input the data in the usual way, with the explanatory variable under L1 and the response variable under L2
Press STAT and scroll to TESTS
Select LinRegTTest
Make sure the XList and YList are the lists where the data for the explanatory and response variables are located, respectively
Press Calculate and scroll to find r and r²
Using Technology
For our example, we have
r = .8463
So the correlation coefficient tells us that there is a strong positive correlation. So, we should find the regression line.
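For readers without the calculator, the same r can be reproduced in Python. This is a sketch, not a replacement for the LinRegTTest steps; the variable names are assumptions, and the z-score formula from earlier in the section is applied directly.

```python
import statistics

home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]      # explanatory variable (x)
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]      # response variable (y)
n = len(home_runs)

x_bar, y_bar = statistics.mean(home_runs), statistics.mean(rbis)
s_x, s_y = statistics.stdev(home_runs), statistics.stdev(rbis)

# r = (1 / (n - 1)) * sum of z_x * z_y
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(home_runs, rbis)) / (n - 1)

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")
```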
Technology and Scatter Plots
We can also create a scatter plot on the calculator.
Make sure there are no functions in the grapher (press Y= to check)
Input the data in the usual way (we already have it there for this example)
Press 2nd and Y= to get into the STAT PLOT menu
Make sure only the plot we want is turned on
Select the first graph in the first row and then make sure the XList and YList are correct
Press ZOOM 9
One More Example
Example
There is some evidence that drinking moderate amounts of wine helps prevent heart attacks. The accompanying table gives data on yearly wine consumption (in liters of alcohol from drinking wine per person) and yearly deaths from heart disease (per 100,000 people) in 19 developed nations. Construct a scatter plot and describe what you see.
Country           Alcohol   Deaths
Australia           2.5       211
Austria             3.9       167
Belgium             2.9       131
Canada              2.4       191
Denmark             2.9       220
Finland             0.8       297
France              9.1        71
Iceland             0.8       211
Ireland             0.7       300
Italy               7.9       107
Netherlands         1.8       167
New Zealand         1.9       266
Norway              0.8       227
Spain               6.5        86
Sweden              1.6       207
Switzerland         5.8       115
United Kingdom      1.3       285
United States       1.2       199
West Germany        2.7       172
The Scatter Plot
[Scatter plot: Heart Disease v. Alcohol from Wine. x-axis: Alcohol from Wine (in liters; ticks at 2, 4, 6, 8); y-axis: Deaths (per 100,000; ticks from 50 to 300); nineteen points.]
r = −.8428, strong negative correlation
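The value of r for the wine data can be checked the same way, and the rule of thumb from the Significance slide can be applied to it. The sketch below is an illustration only; the tuple layout and variable names are assumptions.

```python
import math

# (country, liters of alcohol from wine per person, heart disease deaths per 100,000)
data = [
    ("Australia", 2.5, 211), ("Austria", 3.9, 167), ("Belgium", 2.9, 131),
    ("Canada", 2.4, 191), ("Denmark", 2.9, 220), ("Finland", 0.8, 297),
    ("France", 9.1, 71), ("Iceland", 0.8, 211), ("Ireland", 0.7, 300),
    ("Italy", 7.9, 107), ("Netherlands", 1.8, 167), ("New Zealand", 1.9, 266),
    ("Norway", 0.8, 227), ("Spain", 6.5, 86), ("Sweden", 1.6, 207),
    ("Switzerland", 5.8, 115), ("United Kingdom", 1.3, 285),
    ("United States", 1.2, 199), ("West Germany", 2.7, 172),
]
alcohol = [row[1] for row in data]
deaths = [row[2] for row in data]
n = len(data)

mx, my = sum(alcohol) / n, sum(deaths) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(alcohol, deaths))
sxx = sum((x - mx) ** 2 for x in alcohol)
syy = sum((y - my) ** 2 for y in deaths)
r = sxy / math.sqrt(sxx * syy)

statistic = abs(r) * math.sqrt(n)   # rule of thumb: significant if this exceeds 3
print(f"r = {r:.4f}, |r| * sqrt(n) = {statistic:.2f}")
```

Unlike the TV example, this data set passes the |r|√n > 3 rule, because it has both a larger |r| and more data points.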
The Linear Regression Line
Definition
A linear regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
Linear functions are of the form y = mx + b, but we will consider them as $\hat{y} = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope.
The calculator actually uses the form $\hat{y} = a + bx$, so be careful.
Formulas
What we will be finding is the least squares regression line of y on x. This is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
$b_1 = r \frac{s_y}{s_x}$
$b_0 = \bar{y} - b_1 \bar{x}$
If the correlation coefficient is too small, there is no point in finding $\hat{y}$, since $b_0$ and $b_1$ are both dependent on r.
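The two formulas translate directly into a small function. The sketch below is one possible way to write it; the name `least_squares_line` and the four-point toy data set (which lies exactly on y = 1 + 2x) are assumptions chosen so the result is easy to check by eye.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Toy data lying exactly on y = 1 + 2x (hypothetical, for checking only).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = least_squares_line(xs, ys)
print(f"y_hat = {b0:.2f} + {b1:.2f}x")

# Because b0 = y_bar - b1 * x_bar, the line always passes through (x_bar, y_bar).
y_at_mean = b0 + b1 * statistics.mean(xs)
```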
Example Using Given Values
Example
The following list gives the power numbers for the starting 9 Red Sox players for the 2007 season.
Name              Homeruns   RBIs
Jason Varitek        17        68
Kevin Youkilis       16        83
Dustin Pedroia        8        50
Mike Lowell          21       120
Julio Lugo            8        73
Manny Ramirez        20        88
Coco Crisp            6        60
J.D. Drew            11        64
David Ortiz          35       117
We want to know if the number of homeruns affects the number of RBIs.
The Needed Values
We can find the mean and standard deviation of both sets of data quickly using our technology.
Variable   Mean    Standard Deviation
x          15.78    9.05
y          80.33   24.47
And, since the data is already in the calculator, we can obtain the value of r.
r = .8463
The Correlation Coefficient
We can use these values to find the equation of the regression line.
$b_1 = r \frac{s_y}{s_x} = 0.8463 \left( \frac{24.47}{9.05} \right) = 2.29$
$b_0 = \bar{y} - b_1 \bar{x} = 80.33 - 2.29(15.78) = 44.19$
So, the regression line is
$\hat{y} = 44.19 + 2.29x$
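The same line can be computed from the raw data in Python. This sketch (variable names are assumptions) first repeats the arithmetic above with the rounded values, then redoes it at full precision. Carrying full precision gives an intercept near 44.24 instead of 44.19; the gap comes only from rounding the slope to 2.29 before computing the intercept.

```python
import statistics

home_runs = [17, 16, 8, 21, 8, 20, 6, 11, 35]
rbis = [68, 83, 50, 120, 73, 88, 60, 64, 117]
n = len(home_runs)

# Rounded values, as used in the hand computation above.
b1_rounded = round(0.8463 * (24.47 / 9.05), 2)
b0_rounded = round(80.33 - b1_rounded * 15.78, 2)

# Full-precision version computed from the raw data.
x_bar, y_bar = statistics.mean(home_runs), statistics.mean(rbis)
s_x, s_y = statistics.stdev(home_runs), statistics.stdev(rbis)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(home_runs, rbis)) / (n - 1)
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

print(f"rounded:        y_hat = {b0_rounded} + {b1_rounded}x")
print(f"full precision: y_hat = {b0:.2f} + {b1:.2f}x")
```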
Practical Interpretation
What do these coefficients mean in practical terms?
The slope tells us that a change in the explanatory variable by one unit will result in a change in the response variable by the amount and direction of the slope.
In our example, the slope is b1 = 2.29, which tells us that for every homerun hit, you’d expect to get an additional 2.29 RBIs.
The y-intercept tells us the value of the response variable when the explanatory variable is 0.
In our example b0 = 44.19, which means that we expect a player who hits no homeruns to have 44.19 RBIs.
Note: In context, we may need to round to whole numbers for the answers to make any sense.
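The interpretation of the two coefficients can be seen by plugging values into the regression line. The sketch below uses a hypothetical function name, `predict_rbis`, with the coefficients from the slides; the choice of 20 homeruns as a test input is arbitrary.

```python
B0 = 44.19   # y-intercept from the slides
B1 = 2.29    # slope from the slides

def predict_rbis(home_runs):
    """Predicted RBIs from the regression line y_hat = 44.19 + 2.29x."""
    return B0 + B1 * home_runs

at_zero = predict_rbis(0)                          # equals the y-intercept
one_more = predict_rbis(21) - predict_rbis(20)     # equals the slope
at_twenty = predict_rbis(20)

print(f"0 homeruns:  {at_zero:.2f} RBIs (about {round(at_zero)})")
print(f"20 homeruns: {at_twenty:.2f} RBIs (about {round(at_twenty)})")
print(f"each additional homerun adds {one_more:.2f} predicted RBIs")
```

Rounding the predictions to whole numbers, as the note suggests, gives about 44 RBIs for a player with no homeruns and about 90 for a player with 20.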
The Scatter Plot
Let’s see how good the regression line is by plotting it over the scatter plot.
[Figure: scatter plot "2007 Red Sox Power Numbers"; x-axis: Home Runs, y-axis: RBIs]
To do so, we press Y=, enter the regression line under Y1, and then select GRAPH.
Plot and Line
And now with the regression line
$\hat{y} = 44.19 + 2.29x$
[Figure: scatter plot "2007 Red Sox Power Numbers" with the regression line drawn over it; x-axis: Home Runs, y-axis: RBIs]
Predictions
One use of the regression line is making predictions. Suppose we wanted to know about how many RBIs we could expect a player to have if they hit 60 home runs. We are looking to predict the value of y (so we want $\hat{y}$), and we are given x = 60.
Our prediction would be
$\hat{y} = 44.19 + 2.29(60) = 181.59$
So, our prediction is 182 RBIs.
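As a quick check, the prediction can be written as a small Python function. The function name `predict_rbi` is hypothetical; the intercept and slope are the ones from the regression line above.

```python
def predict_rbi(home_runs, b0=44.19, b1=2.29):
    """Predicted RBIs from the fitted line y-hat = b0 + b1 * x."""
    return b0 + b1 * home_runs

prediction = predict_rbi(60)
print(round(prediction, 2))   # 181.59
print(round(prediction))      # 182 (rounded to whole RBIs)
```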
Facts about Regression Lines
1 The distinction between explanatory and response variables is essential - remember the formulas ...
2 There is a close connection between slope and correlation.
3 $(\bar{x}, \bar{y})$ is always on the line.
4 This only shows us the linear model; it is possible that there is little linear correlation but that the data has a strong relationship under some other type of model.
5 We will not always get perfect correlation (probably never), but we need the data to be “straight enough” for the line to make sense. What that means varies. Depending on the situation, r = .3 could be good enough; other times r = .8 would be a minimum.
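Facts 2 and 3 both follow from the formulas used earlier for the slope and intercept. A short derivation, assuming the same definitions $b_1 = r \, s_y / s_x$ and $b_0 = \bar{y} - b_1 \bar{x}$:

```latex
% Fact 2: s_x and s_y are positive, so the slope always has the same sign as r.
b_1 = r \, \frac{s_y}{s_x}

% Fact 3: substitute x = \bar{x} into the regression line.
\begin{aligned}
\hat{y}(\bar{x}) &= b_0 + b_1 \bar{x} \\
                 &= (\bar{y} - b_1 \bar{x}) + b_1 \bar{x} \\
                 &= \bar{y}
\end{aligned}
```

So the point of means always lies on the least squares line, whatever the data.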
Another Sox Example
Suppose we wanted to know if a player was expected to score more runs if he got more hits. To answer this question, we will use the roster of the 2011 Boston Red Sox.
Name                    Runs  Hits
Jarred Saltalamacchia     52    84
Adrian Gonzalez          108   213
Dustin Pedroia           102   195
Marco Scutaro             59   118
Kevin Youkilis            68   111
Carl Crawford             65   129
Jacoby Ellsbury          119   212
J.D. Drew                 23    55
David Ortiz               84   162
Jed Lowrie                40    78
Josh Reddick              41    71
Jason Varitek             32    49
Darnell McDonald          26    37
Mike Aviles               17    32
Mike Cameron               9    14
Drew Sutton               11    17
Ryan Lavarnway             5     9
Yamaico Navarro            6     8
Conor Jackson              2     3
Jose Iglesias              3     2
Lars Anderson              2     0
Joey Gathright             1     0
The Correlation Coefficient
The first thing we will do is find the correlation coefficient.
When we plug all of the data into our technology, we get r = .9942.
Interpretation?
There is a strong, positive correlation between hits and runs scored.
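For readers without a graphing calculator, here is a minimal Python sketch of what "our technology" is doing. The helper name `correlation` is ours, the data are the (runs, hits) pairs from the roster table above, and the formula is the standard Pearson correlation coefficient.

```python
from math import sqrt

# (runs, hits) for each player, copied from the roster table above.
roster = [
    (52, 84), (108, 213), (102, 195), (59, 118), (68, 111), (65, 129),
    (119, 212), (23, 55), (84, 162), (40, 78), (41, 71), (32, 49),
    (26, 37), (17, 32), (9, 14), (11, 17), (5, 9), (6, 8),
    (2, 3), (3, 2), (2, 0), (1, 0),
]
runs = [run for run, _ in roster]
hits = [hit for _, hit in roster]

def correlation(x, y):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

r = correlation(hits, runs)
print(f"r = {r:.4f}")
```

The printed value should agree with the r reported above up to rounding.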
Producing the Scatter Plot
Now, let’s produce a scatter plot for the data.
[Figure: scatter plot "2011 Red Sox"; x-axis: Hits, y-axis: Runs]
The Assumptions
The points give us a pretty good indication that there is a very strong positive correlation here. Before we go on, we want to make sure all of the assumptions about regression lines are met.
Quantitative Variable Condition: If either y or x is categorical, you cannot make a scatter plot and you cannot perform a regression.
Straight Enough Condition: Does the data look straight enough that we can see a linear relationship in the data set?
Outlier Condition: Are there any outliers that dramatically influence the fit of the regression line?
Does the Plot Thicken Condition: Does the spread of the data around the generally straight relationship seem to be consistent for all values of x?
Linear Regression Line
Since all of these are satisfied, we will continue on to find the formula of the regression line. Using our technology, we have
$\hat{y} = b_0 + b_1 x = 1.92 + .52x$
What is the practical interpretation of the slope b1?
For each additional hit, we expect a player to score .52 additional runs.
What is the practical interpretation of the y-intercept b0?
If a player has no hits, we expect 1.92 runs to be scored.
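The fit itself can be reproduced with a short least squares sketch in Python. The function name `least_squares` is hypothetical, and the data are again the (runs, hits) pairs from the 2011 roster table, with hits as the explanatory variable and runs as the response.

```python
# (runs, hits) for each player, copied from the 2011 roster table.
roster = [
    (52, 84), (108, 213), (102, 195), (59, 118), (68, 111), (65, 129),
    (119, 212), (23, 55), (84, 162), (40, 78), (41, 71), (32, 49),
    (26, 37), (17, 32), (9, 14), (11, 17), (5, 9), (6, 8),
    (2, 3), (3, 2), (2, 0), (1, 0),
]
runs = [run for run, _ in roster]
hits = [hit for _, hit in roster]

def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through the point of means
    return b0, b1

b0, b1 = least_squares(hits, runs)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

Swapping the roles of `hits` and `runs` in the call would give a different line, which is why the choice of explanatory and response variable matters.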
Scatter Plot With Regression Line
[Figure: scatter plot "2011 Red Sox" with the regression line drawn over it; x-axis: Hits, y-axis: Runs]
So, when we plot the regression line over the scatter plot, we see that the line is a good fit.
Predictions
1 If a player got 200 hits, how many runs would we expect them to have?
Here, we are given the x value and, using our regression line, we find the predicted value.
$\hat{y} = 1.92 + .52(200) = 105.92$
So, we’d expect about 106 runs for a player with 200 hits.
2 What if we wanted to know how many hits a player had if they scored 120 runs?
We are given the value of $\hat{y}$ and want to find the value of x. So, we use our algebra skills ...
$\hat{y} = 1.92 + .52x$
$120 = 1.92 + .52x$
$118.08 = .52x$
$227.08 \approx x$
We expect about 227 hits.
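Both calculations can be sketched in Python. The function names are hypothetical; the intercept and slope are the rounded values from the regression line above.

```python
B0, B1 = 1.92, 0.52   # intercept and slope of the fitted line above

def predict_runs(hits):
    """Predicted runs for a given number of hits: y-hat = B0 + B1 * x."""
    return B0 + B1 * hits

def hits_for_runs(runs):
    """Solve runs = B0 + B1 * hits for hits."""
    return (runs - B0) / B1

print(round(predict_runs(200), 2))    # 105.92
print(round(hits_for_runs(120), 2))   # 227.08
```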
Important Points
A few important points to keep in mind:
1 An observation is influential for a statistical calculation if removing it would markedly change the results of the calculation. Points that are outliers in either the x or y direction are often influential points.
2 Correlation and least squares regression lines are not resistant.
3 They only describe linear relationships.
4 There could be lurking variables. Those are ones that are not among the explanatory or response variables but may influence the interpretation of the relationship.
5 An association between an explanatory variable x and a response variable y, even if it is very strong, is not itself good evidence that changes in x actually cause changes in y. The phrase to remember is that correlation does not necessarily imply causation.
Example Where You Are Doing The Work
Example: We want to know if there is a relationship between the score on the math portion of the SAT exam and the number of hours spent studying for the test. The question is, “Does studying more increase the score on the exam?” The following data was taken from a study of 20 students as they prepared for and took the SAT exam.
Hours    4    9   10   14    4    7   12   22    1    3
Score  390  580  650  730  410  530  600  790  350  400
Hours    8   11    5    6   10   11   16   13   13   10
Score  590  640  450  520  690  690  770  700  730  640
Variable Types
What is the response variable?
Math SAT score
What is the explanatory variable?
Hours of study
Correlation Coefficient
First, let’s find the correlation coefficient to see what we are dealing with.
r = .9336
Our interpretation?
This tells us there is a strong positive correlation.
Is The Data Significant?
What is the inequality we are using?
$r\sqrt{n} > 3$
Is this data significant?
$r\sqrt{n} = .9336\sqrt{20} \approx 4.18 > 3$
So, the data is significant based on this criterion.
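The check is easy to automate. Below is a minimal Python sketch of the rule of thumb used above; the function names and the `threshold` parameter are ours, and the rule is applied exactly as stated (r times the square root of n, compared with 3).

```python
from math import sqrt

def significance_statistic(r, n):
    """Return r * sqrt(n), the quantity compared against 3 above."""
    return r * sqrt(n)

def is_significant(r, n, threshold=3):
    """Apply the rule of thumb r * sqrt(n) > threshold."""
    return significance_statistic(r, n) > threshold

stat = significance_statistic(0.9336, 20)
print(f"{stat:.3f}")                 # 4.175
print(is_significant(0.9336, 20))    # True
```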
Visual Representation
Next, let’s produce our scatter plot so we can see what we are dealing with.
[Figure: scatter plot "Math SAT Score v. Hours of Study"; x-axis: Hours of Study, y-axis: SAT Score]
Checking Conditions/Assumptions
We are feeling pretty good about this - it seems to have a strong, positive correlation. When we consider the conditions, are we still happy with this?
Quantitative Variable Condition: Both variables are quantitative.
Straight Enough Condition: Data looks reasonably straight.
Outlier Condition: There do not seem to be any outliers.
Does the Plot Thicken Condition: Pretty much - other than the one person who studied for 22 hours, the relationship seems very strong.
$\hat{y} = b_0 + b_1 x$
Next, we find the equation of the linear regression line.
$\hat{y} = 353.16 + 25.33x$
What is the practical interpretation of the slope b1?
For each additional hour of study, we expect the person to get an additional 25.33 points on their score.
What is the label for the slope?
Points per hour of study
What is the practical interpretation of the y-intercept b0?
If a person does not study, we expect their score on the Math portion of the SAT exam to be 353.16.
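To connect this back to the formulas from the first example, here is a minimal Python sketch that builds the line from r, the standard deviations, and the means of the SAT data. The variable names are ours; `mean` and `stdev` come from Python's standard `statistics` module, and the data are the 20 (hours, score) pairs from the example.

```python
from math import sqrt
from statistics import mean, stdev

# Hours of study and Math SAT scores for the 20 students in the example.
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3,
         8, 11, 5, 6, 10, 11, 16, 13, 13, 10]
scores = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400,
          590, 640, 450, 520, 690, 690, 770, 700, 730, 640]

x_bar, y_bar = mean(hours), mean(scores)
s_x, s_y = stdev(hours), stdev(scores)   # sample standard deviations

# Pearson correlation coefficient r.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in scores)
r = s_xy / sqrt(s_xx * s_yy)

# Slope and intercept from the formulas b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar.
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

print(f"r = {r:.4f}")
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

Up to rounding, the output should match the correlation coefficient and regression line found with the calculator.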
Scatter Plot With Regression Line
[Figure: Math SAT Score v. Hours of Study. x-axis: Hours of Study; y-axis: SAT Score.]
The data point where the person studied for 22 hours does look a little suspicious, but it is not far enough from the rest of the data to be treated as an outlier.
Predictions
So what score would we expect for a person who studied for 10 hours?
$\hat{y} = 353.16 + 25.33(10) = 606.46$
So, since SAT scores are rounded to the nearest 10, we would expect about a 610.
If someone scored a 720, how many hours would we guess they studied?
$720 = 353.16 + 25.33x \Rightarrow x \approx 14.48$ hours
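Both predictions can be reproduced with a few lines of Python. This is a minimal sketch using the coefficients from the text. The helper names `predict_score`, `hours_for_score`, and `round_to_nearest_10` are illustrative assumptions, not part of the original material.

```python
# Coefficients copied from the fitted line in the text.
B0 = 353.16  # y-intercept
B1 = 25.33   # slope, in points per hour of study


def predict_score(hours: float) -> float:
    """Predicted Math SAT score for a given number of study hours."""
    return B0 + B1 * hours


def hours_for_score(score: float) -> float:
    """Solve score = B0 + B1 * x for x (the hours of study)."""
    return (score - B0) / B1


def round_to_nearest_10(score: float) -> int:
    """Round a score to the nearest multiple of 10."""
    return int(round(score / 10.0) * 10)


raw = predict_score(10)
print(round(raw, 2))                   # 606.46
print(round_to_nearest_10(raw))        # 610
print(round(hours_for_score(720), 2))  # 14.48
```

The second calculation simply solves the fitted line for x. It is an algebraic inversion of the same equation, not a separate regression of hours on score.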