CHAPTER 7 RELATIONSHIPS BETWEEN VARIABLES

The number of houses being built varies with time.

Copyright © 2011, K12 Inc. All rights reserved. This material may not be reproduced in whole or in part, including illustrations, without the express prior written consent of K12 Inc.


In This Chapter

In this chapter, you will learn how to identify and describe the correlation between two variables. You will also learn two methods for finding linear models for two-dimensional data and will see the usefulness and limitations of any model.

Topic List

► Chapter 7 Introduction

► Scatter Plots

► Association

► The Correlation Coefficient

► Fitting a Line to Data

► Least Squares Regression

► Regression Analysis

► Cautions in Statistics

► Chapter 7 Wrap-Up

The number of houses that are under construction in any town varies over time. Sometimes there are more homes being built, and at other times there are fewer. Statistical tools can help builders see trends and patterns so they can make good decisions.


Chapter 7 Introduction

Graphing two data sets can show trends and enable us to make predictions.

Determining the Association Between Two Data Sets

Sometimes two data sets can be paired to determine whether there is a relationship between two variables. Consider these two variables: single family housing starts and time.

A housing start is the beginning of construction on a new house on privately owned property. According to the National Association of Home Builders, construction began on about 1,499,000 single family homes during 2003. Data from 2003 to 2010 are shown in the table.

Year    Housing Starts (thousands)
2003    1499
2004    1611
2005    1716
2006    1465
2007    1046
2008    622
2009    445
2010    471

[Figure: U.S. Housing Starts. Horizontal axis: Year (2003 to 2011); vertical axis: Housing starts (thousands), 200 to 2000.]

We can look at each pair of data as an ordered pair. So the ordered pair (2003, 1499) represents the number of housing starts (in thousands) for 2003 and can be plotted on the coordinate plane.

A scatter plot results when all pairs of data are plotted in a graph.
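As a minimal sketch of the pairing step, the table can be turned into ordered pairs in Python. The variable names (`years`, `starts`, `points`) are illustrative choices, not part of the chapter.

```python
# Housing starts data from the table (starts are in thousands).
years = [2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010]
starts = [1499, 1611, 1716, 1465, 1046, 622, 445, 471]

# Pair each year with its housing starts to get the ordered pairs
# that would be plotted as points on a scatter plot.
points = list(zip(years, starts))

# The first ordered pair is the one discussed in the text.
first_point = points[0]

# The point with the greatest y-value marks the peak year in the table.
peak_year, peak_starts = max(points, key=lambda p: p[1])

print(first_point)
print(peak_year, peak_starts)
```

Each tuple in `points` corresponds to one dot on the scatter plot, with the year as the horizontal coordinate and the housing starts as the vertical coordinate.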

[Figure: U.S. Housing Starts. Horizontal axis: Year (2003 to 2011); vertical axis: Housing starts (thousands), 200 to 2000.]

The points on the graph tell us that although the number of housing starts increased for the first couple of years, housing starts generally decreased during this time period. There could be a relationship between the year and the number of new housing starts for this time period.

Using a Line to Describe Data

When the pattern of the points on a graph appears to follow a line, a line (called a model) can be drawn to represent the pattern. Though there are many different models that can be used, a straight line has been used to represent the data here.

A line is useful for making predictions and estimates. For example, someone in 2010 might want to predict the number of housing starts for the next year based on the line. The line indicates that there should be about 200,000 housing starts in 2011, but this prediction could be inaccurate due to unforeseen circumstances.

Keep in mind that estimates based on data and statistics are subject to error. For example, no one can be 100% certain about how many housing starts there will be from one year to the next.

Nevertheless the use of models to represent data continues to play a significant role in fields such as economics and the social sciences.
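The chapter does not say how the line in the graph was chosen. As a hedged illustration, the sketch below assumes a least squares line (a method listed among this chapter's topics) fitted to the housing starts table, and uses it to estimate 2011. The names `slope`, `intercept`, and `predict` are illustrative.

```python
# Housing starts data from the table (starts are in thousands).
years = [2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010]
starts = [1499, 1611, 1716, 1465, 1046, 622, 445, 471]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(starts) / n

# Least squares slope: sum of products of deviations divided by
# the sum of squared deviations in x. (Assumed method, see lead-in.)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, starts))
sxx = sum((x - mean_x) ** 2 for x in years)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(year):
    """Estimate housing starts (in thousands) from the fitted line."""
    return slope * year + intercept


print(round(slope, 1))        # change in thousands of starts per year
print(round(predict(2011)))   # estimate for 2011, in thousands
```

Under this assumption the line falls by roughly 199 thousand starts per year and gives an estimate of about 213 thousand for 2011, which is in line with the "about 200,000" read from the graph. Like any model-based estimate, it is subject to error.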

Applying It

In this chapter, you will learn how to graph two sets of data on a scatter plot and determine the level of association between the data sets. You will learn how to fit a line to data and determine the linear regression line. You will then learn how to determine whether a line is the appropriate way to model the data. And you will learn how to decide whether a graph or statistical claim is incorrect or misleading.


Preparing for the Chapter

Review the following skills to prepare for the concepts in Chapter 7.

► Determine the slope of a line from its graph.

► Find the equation of a line through two points.

► Find the vertical distance between two points on a coordinate plane.

► Find the slope of a line, given the equation of the line.

► Given the equation of a line, find the value of one variable when the other is known.

Problem Set

Match the line with its slope, m.

[Figure: coordinate plane showing four lines labeled a, b, c, and d; both axes run from −4 to 6.]

A. m = −1/3    B. m = 1/3    C. m = −3    D. m = 3

1. line a

2. line b

3. line c

4. line d

Find the equation of the line that contains the two points.

5. (4, 3) and (0, 7)

6. (14, 5) and (8, 11)

7. (1956, 39) and (1968, 81)

8. (0, 11) and (15, 25)

9. (51, 11) and (60, 38)

10. (2004, 315) and (1997, 322)


Find the distance between the two points.

11. (6, 18) and (6, 7)

12. (19.6, 5.7) and (19.6, 8.3)

13. (2003, 47) and (2003, 25)

14. (1948, 102.06) and (1948, 98.85)

Find the slope of the line with the given equation.

15. y = 2x + 3

16. y + 0.55x = 6

17. y = 15 − 2.3x

18. y − 2.3x = 44

Find the value of y for the given value of x.

19. y = 11.5x + 30; x = 4

20. y = 11 − 6.3x; x = 0.12

21. y = 11.5x + 30; x = 0.6

22. y = 11 − 6.3x; x = 71.4

Find the value of x for the given value of y.

23. y = 4.2x − 7; y = 9

24. y = −0.12x + 3; y = 3.25

25. y = 4.2x − 7; y = 31

26. y = −0.12x + 3; y = 2.3

The graphs show the results of two different surveys in which people were asked in what type of setting they prefer to live.

[Figure: Survey A, a circle graph: Urban 40%, Rural 35%, Suburban 25%.]

[Figure: Survey B, a bar graph. Horizontal axis: Setting (Urban, Suburban, Rural); vertical axis: Percent, 0 to 60.]

27. According to Survey A, what is the most popular setting? According to Survey B, what is the most popular setting?

28. If 200 people participated in Survey A and 230 in Survey B, how many actual individuals in each survey chose a rural setting?

29. Create a circle graph that shows the combined results of both surveys.
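The skills reviewed in Preparing for the Chapter (slope through two points, the equation of a line, vertical distance, and solving a line's equation for one variable) can be sketched as small Python helpers. The function names and the sample points below are hypothetical and are deliberately not taken from the problem set.

```python
def slope(p, q):
    """Slope of the line through points p and q (x-values must differ)."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)


def line_through(p, q):
    """Return (m, b) so that y = m*x + b passes through p and q."""
    m = slope(p, q)
    b = p[1] - m * p[0]
    return m, b


def vertical_distance(p, q):
    """Distance between two points that share the same x-value."""
    return abs(p[1] - q[1])


def solve_for_x(m, b, y):
    """Given y = m*x + b and a value of y, find x (m must be nonzero)."""
    return (y - b) / m


# Hypothetical sample points, chosen only for illustration.
m, b = line_through((1, 2), (3, 8))      # line y = 3x - 1
y_at_5 = m * 5 + b                       # value of y when x = 5
x_at_11 = solve_for_x(m, b, 11)          # value of x when y = 11
gap = vertical_distance((2, 9), (2, 4))  # points on the same vertical line

print(m, b, y_at_5, x_at_11, gap)
```

Vertical distance matters later in the chapter because the gap between a data point and a line is measured straight up or down, at the same x-value.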


Scatter Plots

A scatter plot reveals patterns in a set of bivariate data.

Bivariate Data

States collect income for services through taxes. Two kinds of taxes that states use are sales tax and income tax. Many states have both, while others have one but not the other.

The table represents a set of bivariate data consisting of 2 variables and represented by a set of 11 ordered pairs. These ordered pairs show sales tax rates and highest income tax rates (in 2010) for a sample of populous states (more than 5 million residents).

For example, Georgia is represented by the ordered pair (6, 4), which means that in 2010, the highest income tax rate in that state was 6% while the sales tax rate was 4%.

Though a table is an effective way to organize a data set, there are also graphical ways to display data.

THINK ABOUT IT: Bivariate data is sometimes called paired data.

Tax Rates (2010)

State   Income tax (%)   Sales tax (%)
CA      10.3             8.25
TX      0                6.25
IN      3.4              7
MA      5.3              6.25
IL      5                6.25
PA      3.07             6
OH      5.925            5.5
MD      5.5              6
GA      6                4
VA      5.75             5
FL      0                6


Displaying Bivariate Data

A scatter plot is a graph that displays a set of bivariate data. Look at this scatter plot that represents the data set in the table.

The scatter plot makes it easier to identify patterns that the table alone can’t reveal. For example, we can see that Texas and Florida have close to the same sales tax rate, but no income tax (as of the tax year 2010). We can also see that California is an outlier, with greater sales tax and highest income tax rates than any other state in the sample for 2010.

In the scatter plot, there is a grouping of six states (Illinois, Massachusetts, Maryland, Ohio, Virginia, and Georgia) where sales tax rates and highest income tax rates are similar. Such a grouping is called a cluster and is revealed clearly in the scatter plot.
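The observations above can be checked directly against the Tax Rates (2010) table. This is a rough sketch: the dictionary layout is an illustrative choice, and the 5% to 6% income tax window used to pick out the cluster is an assumption made for this example, not a definition from the chapter.

```python
# (highest income tax %, sales tax %) for each populous state in the table.
tax_rates = {
    "CA": (10.3, 8.25), "TX": (0, 6.25), "IN": (3.4, 7),
    "MA": (5.3, 6.25), "IL": (5, 6.25), "PA": (3.07, 6),
    "OH": (5.925, 5.5), "MD": (5.5, 6), "GA": (6, 4),
    "VA": (5.75, 5), "FL": (0, 6),
}

# States with no income tax.
no_income_tax = sorted(s for s, (income, _) in tax_rates.items() if income == 0)

# State with the greatest value of each variable.
highest_income = max(tax_rates, key=lambda s: tax_rates[s][0])
highest_sales = max(tax_rates, key=lambda s: tax_rates[s][1])

# Assumed window for the cluster: highest income tax rate from 5% to 6%.
cluster = sorted(s for s, (income, _) in tax_rates.items() if 5 <= income <= 6)

print(no_income_tax)
print(highest_income, highest_sales)
print(cluster)
```

With this window, the filter returns the same six states named in the text, and California leads on both variables, which is why it stands apart as an outlier.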

Creating a Scatter Plot

Creating a scatter plot is really just a matter of plotting points on the coordinate plane. However, it is important to consider appropriate scales for the horizontal and vertical axes. Let’s create a scatter plot for a similar set of bivariate data.

How do populous states compare to lower population states in terms of sales and income tax rates? A scatter plot is a good way to compare.

The table shows sales tax rates and highest income tax rates (in 2010) for a sample of low population states (fewer than 1 million residents).

Highest income tax rates range from 0% to 9.9%, while sales tax rates range from 0% to 7%. So, it makes sense to have a range of 0 to 8 for the vertical axis (sales tax rate), and 0 to 11 for the horizontal axis (highest income tax rate).

Tax Rates (2010)

State   Income tax (%)   Sales tax (%)
SD      0                4
VT      9.5              6
ND      5.54             5
AK      0                0
WY      0                4
DE      5.5              0
MT      6.9              0
RI      9.9              7
NH      5                0
ME      8.5              5
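Choosing axis scales starts with finding the smallest and largest value of each variable. The sketch below does this for the low population table, taking the values as listed: income tax on the horizontal axis and sales tax on the vertical axis. The variable names are illustrative.

```python
# (highest income tax %, sales tax %) for each low population state.
low_pop = {
    "SD": (0, 4), "VT": (9.5, 6), "ND": (5.54, 5), "AK": (0, 0),
    "WY": (0, 4), "DE": (5.5, 0), "MT": (6.9, 0), "RI": (9.9, 7),
    "NH": (5, 0), "ME": (8.5, 5),
}

income = [pair[0] for pair in low_pop.values()]  # horizontal axis values
sales = [pair[1] for pair in low_pop.values()]   # vertical axis values

income_range = (min(income), max(income))
sales_range = (min(sales), max(sales))

# Check that every point fits inside the axis ranges used in the text.
fits_horizontal = 0 <= income_range[0] and income_range[1] <= 11
fits_vertical = 0 <= sales_range[0] and sales_range[1] <= 8

# Two states with identical pairs plot as a single point.
same_point = low_pop["WY"] == low_pop["SD"]

print(income_range, sales_range, fits_horizontal, fits_vertical, same_point)
```

The last check shows why the scatter plot labels one point "WY and SD": both states have the ordered pair (0, 4).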

THINK ABOUT IT: Clusters are where data points have many neighbors. Outliers are where data points have no neighbors.

The scatter plot reveals that this sample of lower population states has more states with either no sales tax, no income tax, or both. Also there are no clusters of data.

[Figure: 2010 Tax Rates for Low Population States. Horizontal axis: Highest income tax rate (%), 0 to 11; vertical axis: Sales tax rate (%), 0 to 8. Points are labeled by state; WY and SD share one point.]

[Figure: 2010 Tax Rates for High Population States. Horizontal axis: Highest income tax rate (%), 0 to 11; vertical axis: Sales tax rate (%), 0 to 9. Points are labeled by state.]


Problem Set

The scatter plot shows the number of chin-ups and push-ups completed by a sample of fourth grade students.

1. How many push-ups did Rishab do?

2. How many chin-ups did Samuel, Gao, Mike, and Alex do together?

3. What is the difference between the number of push-ups Amira and Jared did?

4. Which students did more push-ups than chin-ups?

5. What is the range of the number of push-ups for all students?

6. What is the median number of chin-ups for all students?

7. What is the mean number of push-ups for all students?

Label   Student Name
A       Jared
B       Tonya
C       Alex
D       Rishab
E       Gao
F       Riki
G       Anne
H       Amira
I       Samuel
J       Lily
K       Abdu
L       Mike

[Figure: Number of Chin-ups and Push-ups. Horizontal axis: Chin-ups, 1 to 10; vertical axis: Push-ups, 1 to 10. Points are labeled A through L.]


The scatter plot shows temperatures, taken at the same time but at different elevations on a mountain. Temperatures are measured to the nearest degree Fahrenheit, and elevation is measured to the nearest 10 meters.

[Figure: Mountain Temperatures. Horizontal axis: Elevation (m), 50 to 400; vertical axis: Temperature (°F), 10 to 70.]

8. What temperature was measured at 160 meters?

9. At what elevations was a temperature of 20°F taken?

10. What is the mean temperature at 30 meters?

11. What is the mean temperature at elevations above 300 meters?

12. What is the median temperature for all temperatures?

13. At what elevation do you find a possible outlier? What is the temperature at this elevation? Why is it an outlier?

14. At what range of elevations do you find clusters? What is the approximate range of temperatures at these elevations?

Create a scatter plot for the set of data. Determine whether there are any clusters or outliers.

15. protein and fat grams for nuts (Use the Nutrition data set on p. A-11.)

16. number of mobile phones per 100 people in Guyana and Ecuador (Use the Mobile Phone data set on p. A-9.)


Association

The points on a scatter plot often display noticeable patterns.

Direction of Association

The price of gasoline is subject to many fluctuations due to many factors ranging from the time of year to political developments around the world.

The top scatter plot shows the average price for a gallon of gasoline in nine different states in mid-May 2010 and mid-May 2011.

Looking at the scatter plot, you can see a trend between these two variables: The greater the average price of gas in mid-May 2010, the greater the average price of gas in mid-May 2011. This is an example of a positive association, where the points in a scatter plot increase from left to right.

The bottom scatter plot shows average gas price per gallon and average number of miles driven per day.

This scatter plot shows another trend: As average gas prices increase, the average number of miles driven per day decreases. This is an example of a negative association, where points in a scatter plot decrease from left to right.
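One way to make "increase from left to right" concrete is to look at the sign of the summed products of each point's deviations from the mean of x and the mean of y. This is an assumed formalization for illustration, not a method stated in this section. The sketch applies it to the housing starts data from the chapter introduction; the function name `direction` is hypothetical.

```python
def direction(xs, ys):
    """Classify the direction of association from the sign of the
    summed products of deviations from the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no linear trend"


# Housing starts data from the chapter introduction (thousands).
years = [2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010]
starts = [1499, 1611, 1716, 1465, 1046, 622, 445, 471]

overall = direction(years, starts)               # whole period
early = direction(years[:3], starts[:3])         # 2003 through 2005 only

print(overall, early)
```

The whole period comes out negative, while the first three years alone come out positive, matching the earlier description of housing starts rising briefly before generally falling.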

THINK ABOUT IT: If the points in a scatter plot go down to the right, then there is a negative association between the variables.

THINK ABOUT IT: If the points in a scatter plot go up to the right, then there is a positive association between the variables.

[Figure (top): Average Gas Prices per Gallon. Horizontal axis: May 2010 (dollars), 2.7 to 3.2; vertical axis: May 2011 (dollars), 3.8 to 4.3.]

[Figure (bottom): Average Gas Price and Driving Distance. Horizontal axis: Price per gallon (dollars), 2.7 to 3.2; vertical axis: Daily driving distance (miles), 10 to 60.]


Strength of Association

Some scatter plots show a stronger association between two variables than others. In general, the more closely the data points fit a straight line pattern, the stronger the association between the two variables.

Determining the strength of association is often a matter of judgment. For example, the variables in Figure 1 are positively associated but do not show as strong an association as do the variables in Figure 2. For this reason, Figure 1 shows a moderate positive association, and Figure 2 shows a strong negative association.

Because there is no apparent pattern in Figure 3, the variables have no association. Because all of the data points in Figure 4 fall on the same line, there is a perfect positive association between the variables.

The following phrases, along with the direction (positive or negative), are typically used to describe the strength of association between two variables in a scatter plot:

• No association

• Weak association

• Moderate association

• Strong association

• Perfect association

REMEMBER: Scatter plots make it easy to describe how two variables are associated.

[Figure 1: Moderate Positive Association]

[Figure 2: Strong Negative Association]

[Figure 3: Weak or No Association]

[Figure 4: Perfect Positive Association]

Each figure is a scatter plot with Variable X on the horizontal axis and Variable Y on the vertical axis.


Problem Set

Determine whether there is a positive association, negative association, or no association between variable X and variable Y. If there is an association, then describe it as weak, moderate, strong, or perfect.

1–8. [Figures: eight scatter plots, one per problem, each plotting Variable Y (vertical axis) against Variable X (horizontal axis, 1 to 10).]


Determine whether there is a positive association or negative association between variable X and variable Y.

9. Variable X: minutes of exercise; Variable Y: heart rate

10. Variable X: hours worked; Variable Y: dollars earned

11. Variable X: family size; Variable Y: amount of recycling produced

12. Variable X: age of car; Variable Y: resale value

13. Variable X: number of sleep hours; Variable Y: number of awake hours

14. Variable X: time studying; Variable Y: time watching TV

Create a scatter plot for the set of data. Determine whether there is a positive association, negative association, or no association between the variables. If there is an association, then describe it as weak, moderate, or strong.

15. number of calories and grams of fat for cereals (Use the Nutrition data set on p. A-11.)

16. sugar consumption from 1968 to 2004 in Turkey and the United Kingdom (Use the Sugar Consumption data set on p. A-10.)

The table shows scores that eight students earned on four different quizzes. Use scatter plots to solve.

17. Describe the association between scores on Quiz 1 and scores on Quiz 2.

18. Describe the association between scores on Quiz 1 and scores on Quiz 4.

19. On which quizzes are scores negatively associated with scores on Quiz 3?

20. On which two quizzes are scores perfectly associated?

          Daniel   Tori   Ela   Jordan   Eun Mi   Kenneth   Barry   Amelia
Quiz 1    10       7      6     4        6        7         6       9
Quiz 2    9        8      7     5        7        7         6       8
Quiz 3    4        3      4     8        5        4         6       6
Quiz 4    5        4      5     9        6        5         7       7


The Correlation Coefficient

A single number is used to describe the association between two variables.

Understanding the Correlation Coefficient

The direction and strength of the association between two variables can be represented by a single number r called the correlation coefficient.

THE CORRELATION COEFFICIENT

The correlation coefficient, written as r, describes the strength and direction of the association between two variables. Values of r range from 1 (perfect positive correlation) to 0 (no correlation) to −1 (perfect negative correlation).

Calculating r can take a long time; however, most graphing calculators and spreadsheets can be used for quick calculations of r.
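This section does not write out a formula for r. For reference, one standard definition is the Pearson correlation coefficient, shown here on the assumption that it is the r intended by the chapter. The symbols are introduced for this example: the data are the n ordered pairs (x_i, y_i), and x̄ and ȳ are the means of the x-values and y-values.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when points above the mean of x tend to be above the mean of y, and negative when they tend to be below it. The denominator scales the result so that r always falls between −1 and 1.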

Interpreting the Correlation Coefficient

The scatter plot shows the relationship in Argentina between the number of personal computers owned per 100 residents and the number of mobile phones owned per 100 residents from 1998 to 2005.

[Figure: Mobile Phones and PCs in Argentina (1998–2005). Horizontal axis: Number of mobile phones (per 100), 5 to 65; vertical axis: Number of PCs (per 100), 2 to 10.]

Mobile Phones   PCs
7               5
11              6
18              7
18              8
17              8
21              8
35              8
57              9

REMEMBER: When data points in a scatter plot increase from left to right, this trend indicates a positive association. When the data points decrease from left to right, this trend indicates a negative association.

REMEMBER: The more closely the data points fit a straight line pattern, the stronger the association is between the two variables.


The correlation coefficient is about r = 0.76, which can be verified using technology. The correlation coefficient has a value close to 1 and is positive, indicating a strong positive association. The scatter plot also confirms a strong positive association between the variables.

This scatter plot shows the average number of grams of sugar consumed per person each day in Finland and Portugal from 1968 to 2004.

[Figure: Average Sugar Consumption (per person). Horizontal axis: Finland (grams per day), 85 to 140; vertical axis: Portugal (grams per day), 60 to 100.]

Finland   Portugal
126       66
134       74
121       68
104       74
96        66
104       77
118       85
101       88
104       93
93        93

The correlation coefficient is about r = −0.44, which can be verified using technology. The correlation coefficient is negative and has a value between −1 and 0, indicating a moderate negative association, which can also be confirmed by looking at the scatter plot.
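Both quoted values can be verified with a few lines of Python. This sketch assumes r is the Pearson correlation coefficient and takes the data pairs from the two tables as listed in this section; the function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# (mobile phones per 100, PCs per 100) in Argentina, 1998 to 2005.
argentina = [(7, 5), (11, 6), (18, 7), (18, 8),
             (17, 8), (21, 8), (35, 8), (57, 9)]

# (Finland, Portugal) average sugar consumption in grams per day.
sugar = [(126, 66), (134, 74), (121, 68), (104, 74), (96, 66),
         (104, 77), (118, 85), (101, 88), (104, 93), (93, 93)]

r_argentina = round(pearson_r(argentina), 2)
r_sugar = round(pearson_r(sugar), 2)

print(r_argentina, r_sugar)
```

Rounded to two decimal places, the results are 0.76 for the Argentina data and −0.44 for the sugar consumption data, agreeing with the values stated in the text.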

The value of r alone does not always give an accurate description of the true strength of association between two variables. For example, consider a correlation coefficient of r = 0.65. This value might reflect a moderate association for one set of paired data but could reflect a strong association for another set of paired data. It depends on the situation under study.

However, a rule of thumb can be used to describe the strength of association between two variables, based on the value of r. The association between two variables is

• Weak if 0 < |r| < 0.3

• Moderate if 0.3 ≤ |r| < 0.7

• Strong if 0.7 ≤ |r| < 1

For example, consider a scatter plot that has a correlation coefficient of −0.25. Because |−0.25| is between 0 and 0.3, the association between the two variables in the scatter plot would be considered weak.
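The rule of thumb translates directly into a small function. This is a sketch: the name `strength` is hypothetical, and the labels for r = 0 and |r| = 1 follow the definition of the correlation coefficient given earlier in this section rather than the rule of thumb itself.

```python
def strength(r):
    """Describe the strength of association for a correlation coefficient r,
    using the rule-of-thumb cutoffs 0.3 and 0.7 on |r|."""
    size = abs(r)
    if size > 1:
        raise ValueError("r must be between -1 and 1")
    if size == 0:
        return "none"
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    if size < 1:
        return "strong"
    return "perfect"


# Values of r that appear in this section.
for r in (-0.25, 0.76, -0.44, 0.5, -0.5):
    print(r, strength(r))
```

Because the function works with |r|, a coefficient and its opposite always receive the same label. The direction comes from the sign of r and is reported separately.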

THINK ABOUT IT: A correlation coefficient of −0.5 indicates the same strength of association between two variables as a coefficient of 0.5.


Problem Set

Use estimation to match the correlation coefficient with the scatter plot.

[Figures A–F: six scatter plots, each plotting Variable Y (vertical axis) against Variable X (horizontal axis, 1 to 10).]

1. r = 0

2. r = 0.8

3. r = −0.7

4. r = −0.5

5. r = −0.9

6. r = 1

Draw a scatter plot (with 10 data points) that approximates the correlation coefficient. Describe the strength and direction of the association between the variables.

7. r = 0.95

8. r = −0.2

9. r = 0.6

10. r = −1

In a certain region, the amount of recycling produced by a household and the number of members in the household are correlated with r = 0.85.

11. According to the correlation coefficient, how is the amount of recycling expected to change as the number of members in the household increases?

12. Is it correct to say that the amount of recycling produced by a household is caused by the number of members in the household? Explain.

13. Suppose r = −0.85. Describe how the amount of recycling is expected to change as the number of members in the household increases.

[Scatter plot labels: A, B, C, D, E, F]


Use technology to determine the correlation coefficient between the variables. Use the correlation coefficient to describe the strength of the relationship between the variables.

14. grams of protein and fat for cheeses (Use the Nutrition data set on p. A-11.)

15. grams of carbohydrates and fat for cheeses (Use the Nutrition data set on p. A-11.)

16. sugar consumption from 1968 to 2004 in Turkey and the United Kingdom (Use the Sugar Consumption data set on p. A-10.)

17. sugar consumption from 1968 to 2004 in Spain and Belgium (Use the Sugar Consumption data set on p. A-10.)

The data show the number of home runs and walks for players on the Oakland Athletics who had either more than 250 at bats or fewer than 150 at bats during the 2010 baseball season. The data are presented as ordered pairs: (home runs, walks).

More than 250 at bats: (13, 33), (10, 110), (5, 40), (6, 50), (16, 24), (1, 17), (5, 26), (1, 24), (13, 68), (8, 30), (7, 19)

Fewer than 150 at bats: (2, 15), (1, 8), (4, 7), (2, 5), (4, 2), (3, 7), (1, 11), (1, 4), (2, 7), (1, 4), (1, 4), (0, 5), (1, 2), (1, 3), (0, 2)

18. Use technology to determine the correlation coefficient of home runs and walks for players who had more than 250 at bats. Use the correlation coefficient to describe the relationship between the number of home runs and walks for players who had more than 250 at bats.

19. Use technology to determine the correlation coefficient of home runs and walks for players who had fewer than 150 at bats. Use the correlation coefficient to describe the relationship between the number of home runs and walks for players who had fewer than 150 at bats.

20. Challenge Are home runs and walks more strongly correlated for players with more at bats? Why would this be so?


Fitting a Line to Data

A line can be used to summarize a linear trend in a scatter plot.

Drawing a Regression Line

When it appears that the points in a scatter plot reasonably fit a straight line pattern, then a regression line can be drawn through the points to summarize the pattern.

The scatter plot shows the 2010 median weekly earnings of U.S. citizens with various levels of education, as reported by the Bureau of Labor Statistics. Since the points appear to follow a straight line pattern, a regression line has been drawn through the points, summarizing the pattern.

Using a line to represent data in this way can be useful for estimating values of the response variable based on values of the explanatory variable. Using the regression line, the response variable (median weekly income) can be estimated based on values of the explanatory variable (years of education).

Finding the Equation of a Regression Line

Once a regression line has been drawn, we can find the equation of the line using algebra. Values of the explanatory variable can then be substituted into the equation to make predictions.

[Figure: 2010 Education and Median Weekly Income. Scatter plot with a regression line; horizontal axis: Education (years), 9 to 21; vertical axis: Median weekly income (dollars), 200 to 1800.]

THINK ABOUT IT

According to the regression line drawn, what is the approximate median weekly income for a person with 17 years of education?


Let’s find the equation of the regression line for the scatter plot. You can choose the points (12, 600) and (20, 1600), since they appear to be closest to the line.

The slope of the line is m = (1600 − 600)/(20 − 12) = 1000/8 = 125.

y − y₁ = m(x − x₁)    Point-slope formula

y − 1600 = 125(x − 20)    Substitute the slope and one point on the line

y = 125x − 900    Equation in y = ax + b form

We will now use the notation ŷ (read “y hat”) in the equation of the regression line to remind us that ŷ is a predicted value of y, based on a certain value of x. So the equation of the regression line would be written as ŷ = 125x − 900.
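The algebra for the line through (12, 600) and (20, 1600) can be checked with a short sketch. The function name `line_through` is made up for illustration; the calculation is the slope and point-slope formulas used in the text.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two points (x, y)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no y = ax + b form")
    slope = (y2 - y1) / (x2 - x1)
    # Point-slope form y - y1 = m(x - x1), rearranged to y = mx + (y1 - m*x1).
    intercept = y1 - slope * x1
    return slope, intercept


a, b = line_through((12, 600), (20, 1600))
print(f"y-hat = {a:g}x + ({b:g})")
```

Either of the two chosen points can be used in the point-slope step; both give the same intercept.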

Making Predictions

The equation of the regression line can be used to estimate, or predict, values of the response variable based on values of the explanatory variable. Suppose we want to estimate the median weekly income for a person with 17 years of education.

For a person with 17 years of education, the value of the explanatory variable is x = 17. The predicted value is ŷ = 125(17) − 900 = 1225. So a person with 17 years of education would have a predicted median weekly income of about $1225.

The Slope of a Regression Line

The equation of the regression line, ŷ = 125x − 900, describes the trend of the data in the scatter plot. The slope gives the change in the response variable for each unit increase in the explanatory variable. The slope of 125 means that the predicted median weekly income rises by $125 for each additional year of education achieved.

In general, the slope of a regression line specifies the amount of change in the response variable that accompanies one unit of change in the explanatory variable.
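As a minimal sketch, the model ŷ = 125x − 900 can be turned into a function to make predictions and to confirm what the slope means. The names `SLOPE`, `INTERCEPT`, and `predicted_income` are illustrative only.

```python
SLOPE = 125        # dollars per week for each additional year of education
INTERCEPT = -900   # dollars per week


def predicted_income(years_of_education):
    """Predicted median weekly income (dollars) from y-hat = 125x - 900."""
    return SLOPE * years_of_education + INTERCEPT


print(predicted_income(17))                         # 1225
print(predicted_income(18) - predicted_income(17))  # 125, the slope
```

The difference between predictions one year apart is always the slope, whichever pair of consecutive years is chosen.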

THINK ABOUT IT

The model can only predict for a certain range of values of the explanatory variable. For example, why doesn’t the predicted median weekly income for someone with 6 years of education make sense?

TIP

The equation of the regression line is sometimes called a model. The model tells us how the response variable changes as the explanatory variable changes.

FINDING THE EQUATION OF A REGRESSION LINE

When a regression line has been drawn to summarize data in a scatter plot, the following steps can be used to find its equation.

Step 1 Choose two points that appear to be closest to the line drawn.

Step 2 Determine the equation of the line through the two points using algebra.


[Figure: Carbon Dioxide Emissions. Scatter plot with a regression line; horizontal axis: Miles per gallon (city), 15 to 43; vertical axis: Carbon dioxide emissions (tons), 5 to 11.]

mpg (city): 41, 34, 30, 23, 22, 23, 20, 20, 19, 18, 16

CO2 emissions (tons): 4.8, 5.8, 6.2, 7.2, 7.5, 7.5, 8.1, 8.5, 8.9, 9.3, 9.8

Problem Set

Determine whether Scatter Plot A or Scatter Plot B is described.

[Figure: Scatter Plot A and Scatter Plot B]

1. The line of best fit has a negative slope.

2. The line of best fit has a positive y-intercept.

3. The line of best fit has a negative y-intercept.

4. The line of best fit has a positive slope.

The data in the scatter plot represent carbon dioxide emissions from a sample of vehicles built in 2011.

5. What are the explanatory and response variables shown in the scatter plot? Explain.

6. Use the two data points closest to the line to write an equation for the regression line. Write the equation in the form ŷ = ax + b. Give the slope and y-intercept.

7. For each additional mile per gallon in the city that a car gets, what, according to the equation obtained in Problem 6, would be the effect on carbon dioxide emissions in a year?

8. If a car tested at 25 city miles per gallon, what, according to the equation obtained in Problem 6, would be the estimated number of tons of carbon dioxide emissions in a year?


[Figure: Sugar Consumption in Portugal. Scatter plot; horizontal axis: Year, 1968 to 2008; vertical axis: Grams of sugar (day).]

Explain why a line would not be used to represent the data shown in the scatter plot.

The scatter plot represents the Sugar Consumption data set on p. A-10.

11. What are the explanatory and response variables shown in the scatter plot? Explain.

12. Suppose a regression line is drawn through the data points. The data points for 1980 and 2004 appear to be closest to the line. Write an equation for the regression line. Write the equation in the form ŷ = ax + b. Give the values of a and b to two decimal places.

13. For each additional year, what is the change in the number of grams of sugar consumed? Use the equation obtained in Problem 12.

14. In 2025, what would be the average daily number of grams of sugar consumed per person in Portugal? Use the equation obtained in Problem 12.

15. Suppose another regression line is drawn through the data points. The data for 1968 and 2004 appear to be closest to this line. Write an equation for the regression line. What would be the average daily number of grams of sugar consumed per person in Portugal in 2025, according to the equation? Compare this to the estimate obtained in Problem 14.

16. Challenge For each additional month, how does the number of grams of sugar consumed change? Use the equation obtained in Problem 15.

9. [Scatter plot]

10. [Scatter plot]


Least Squares Regression

The least squares regression equation is called the line of best fit.

Least Squares Regression Line

Drawing a regression line to fit a scatter plot does not always give consistent results. In fact, it is possible that two slightly different regression lines could be drawn on the same scatter plot yielding vastly different predictions for the same value. A method for finding a regression line that does not depend on guessing is called least squares regression.

The least squares regression line is the line that makes the sum of the squares of the vertical distances from each data point to the line as small as possible.

For this scatter plot with six data points, the goal would be to find the line that minimizes S—the sum of the squared distances to the line. Once this is achieved, a least squares regression line has been found and represents what statisticians call the line of best fit.
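The idea of minimizing S can be sketched in code. The six points below are made up purely for illustration (they are not the points in the figure), and the slope and intercept come from the standard closed-form least squares formulas, which this lesson does not derive.

```python
# Six made-up data points (x, y), used only to illustrate the idea.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 7)]


def sum_of_squares(a, b, data):
    """S: sum of squared vertical distances from each point to y = ax + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in data)


def least_squares_line(data):
    """Slope and intercept that minimize S (standard closed-form formulas)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    a = sxy / sxx
    return a, mean_y - a * mean_x


a, b = least_squares_line(points)
best = sum_of_squares(a, b, points)
guess = sum_of_squares(1.0, 1.0, points)  # a line drawn "by eye": y = x + 1
print(f"least squares line: S = {best:.3f}")
print(f"guessed line:       S = {guess:.3f}")
```

Any other line, such as the guessed line y = x + 1, gives a value of S at least as large as the least squares line does.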

Finding the Least Squares Regression Equation

Without the aid of technology, finding the least squares regression equation for a data set can be time consuming. Fortunately, most graphing calculators and spreadsheets can calculate the least squares regression equation once the paired data have been entered. Here is an example.

As digital music became more popular after 2000, sales of music CDs in the United States declined. The table represents annual CD sales (in millions) from 2000 to 2009.

Year 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009

CD Sales (millions) 942.5 881.9 803.3 746 767 705.4 619.7 511.1 368.4 292.9

[Figure: Least Squares Regression Line. Six data points with vertical distances d₁ through d₆ to the line; horizontal axis: Explanatory variable; vertical axis: Response variable.]

S = d₁² + d₂² + d₃² + d₄² + d₅² + d₆²


To find the equation of the least squares regression line using technology, data from each variable should be entered as lists into a spreadsheet or graphing calculator. Here is what typical input/output will look like when using a graphing calculator.

Input: Enter the data into two lists.

L1: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009

L2: 942.5, 881.9, 803.3, 746, 767, 705.4, 619.7, 511.1, 368.4, 292.9

L3: (empty)

Output: Calculate the linear regression equation.

LinReg

y = ax + b

a = −68.74181818

b = 138456.7945

r² = .9418242215

r = −.9704762859

[Figure: U.S. Sales of CDs. Scatter plot with the least squares regression line ŷ = −68.7x + 138457; horizontal axis: Year, 1998 to 2010; vertical axis: Sales (millions), 200 to 1200.]

Using the values of a and b from the output, the least squares regression equation for these data is ŷ = −68.7x + 138457 (rounded). The correlation coefficient of r = −0.97 (rounded) indicates a strong negative association between the variables, which is consistent with the scatter plot.
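The calculator output can be cross-checked without a calculator. The sketch below uses only the Python standard library and the standard closed-form formulas for a, b, and r; the function name `linear_regression` is an illustrative choice, not a reference to any particular calculator or library routine.

```python
from math import sqrt

years = list(range(2000, 2010))
# Annual CD sales in millions, from the table in the text.
sales = [942.5, 881.9, 803.3, 746, 767, 705.4, 619.7, 511.1, 368.4, 292.9]


def linear_regression(xs, ys):
    """Return (a, b, r) for the least squares line y-hat = ax + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept: the line passes through the means
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r


a, b, r = linear_regression(years, sales)
print(f"a = {a:.4f}, b = {b:.4f}, r = {r:.4f}")
```

The printed values should agree with the LinReg output to the digits shown there.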

Coefficient of Determination

The coefficient of determination is the square of the correlation coefficient, and is written r². When r² is written as a percent, it represents the percent of variance (or change) of the response variable that is associated with changes in the explanatory variable. In general, variance is a measure of variability of a data set relative to its mean.

For the U.S. Sales of CDs scatter plot, the coefficient of determination is r² = 0.94 (rounded). This means that about 94% of the variance in CD sales is associated with changes in the year. The other 6% of variability is due to other factors.
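One way to see why r² is read as a percent of variance is to compare the variation left over around the regression line with the total variation around the mean. The sketch below assumes the unrounded calculator coefficients quoted in the text; the variable names are illustrative, and the identity r² = 1 − (unexplained variation)/(total variation) is a standard fact about least squares lines rather than something stated in this lesson.

```python
years = list(range(2000, 2010))
sales = [942.5, 881.9, 803.3, 746, 767, 705.4, 619.7, 511.1, 368.4, 292.9]

# Unrounded calculator output quoted in the text.
a, b = -68.74181818, 138456.7945
r = -0.9704762859

mean_sales = sum(sales) / len(sales)
# Total variation of sales about their mean.
total = sum((y - mean_sales) ** 2 for y in sales)
# Variation left over after the line is used to predict sales.
unexplained = sum((y - (a * x + b)) ** 2 for x, y in zip(years, sales))

r_squared = r ** 2
print(f"r squared             = {r_squared:.4f}")
print(f"1 - unexplained/total = {1 - unexplained / total:.4f}")
```

Both lines print the same value to four decimal places, about 0.94, which is the 94% quoted for the CD sales data.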

Q & A

Q What would be the model estimate for CD sales in 1999?

A The model predicts approximately 1,125,500,000 CD sales in 1999.
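The 1999 estimate is sensitive to rounding because the year is a large number: a small change in the slope is multiplied by about 2000. The sketch below compares the rounded equation with the unrounded calculator output; the function name `predict` is illustrative.

```python
def predict(a, b, year):
    """CD sales (millions) predicted by y-hat = a * year + b."""
    return a * year + b


rounded = predict(-68.7, 138457, 1999)
unrounded = predict(-68.74181818, 138456.7945, 1999)
print(f"rounded coefficients:   {rounded:.1f} million")
print(f"unrounded coefficients: {unrounded:.1f} million")
```

The rounded equation gives about 1125.7 million, close to the answer in the Q & A, while the unrounded coefficients give about 1041.9 million. The gap of roughly 84 million comes almost entirely from rounding the slope by about 0.04, so it is safer to keep extra digits until the final step.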

REMEMBER

The correlation coefficient describes the strength and direction of association of two data sets. The coefficient of determination describes the percent of variation in y associated with changes in x.


Problem Set

Identify the equation of the least squares regression line that most closely matches the data set.

Problems 1 through 4 each refer to one of the four x-y data tables in this problem set.

A. y = 0.17x − 13.2

B. y = 5.5x + 52

C. y = 3.5x − 6921.2

D. y = −2.2x + 13.8

Use the correlation coefficient to find the coefficient of determination. What percent of variation in the response variable can be explained by changes in the explanatory variable? Explain.

5. r = 0.5

6. r = −0.5

7. r = 1

8. r = −0.35

The data in the table represent the fuel economy (miles per gallon) for a car at different speeds (miles per hour).

9. Make a scatter plot of the data and determine the equation of the least squares regression line using technology. Draw the graph of the least squares regression line on the scatter plot.

10. According to the least squares regression equation found in Problem 9, what would be the effect on fuel economy for each additional mile per hour?

11. Find the value of the coefficient of determination. What percent of the variation in speed is associated with differences in fuel economy?

12. According to the least squares regression equation, what is the fuel economy for this vehicle at 65 miles per hour?

x-y data tables for Problems 1 through 4

x: 1, 2, 3, 4, 5
y: 11, 9, 9, 5, 2

x: 2, 4, 6, 8, 10
y: 65, 74, 81, 96, 109

x: 100, 120, 140, 160, 180
y: 4, 7, 11, 13, 18

x: 1990, 1992, 1994, 1996, 1998
y: 45, 51, 57, 61, 75

Fuel Economy

mph: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55
mpg: 12, 17, 21, 22, 27, 26, 28, 27, 28, 31, 32


The winning times for the Olympic 100-meter women’s butterfly are given in the table.

13. Make a scatter plot of the data and determine the equation of the least squares regression line using technology. Draw the graph of the least squares regression line on the scatter plot.

14. According to the least squares regression equation, what would be the effect on record times for this event for each additional year?

15. Find the value of the coefficient of determination. What percent of the variation in winning times is associated with differences in the year?

16. Use the least squares regression equation to predict the winning time for the Olympic 100-meter women’s butterfly event in the year 2024.

17. According to the least squares regression equation, during which Olympic year will the winning time break 50 seconds?

18. According to the least squares regression equation, predict the winning time for the Olympic 100-meter women’s butterfly event in the year 2264. Is this prediction reasonable? Explain.

Use the Sugar Consumption data set on p. A-10.

19. Find the equation of the least squares regression line for each: Albania, France, Italy, and the United Kingdom. Use the year as the explanatory variable.

20. Find the coefficient of determination for each equation. Order the four countries of Albania, France, Italy, and the United Kingdom from the least coefficient of determination to the greatest.

21. Use the coefficient of determination to determine whether a linear regression model is appropriate for describing sugar consumption for Albania, France, Italy, or the United Kingdom from 1968 through 2004. Explain your answer.

22. Which country (Albania, France, Italy, or the United Kingdom) would you predict to have the highest sugar consumption in 2012? Explain.

Olympic Year: 1956, 1960, 1964, 1968, 1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004, 2008

Winning Time (seconds): 71.0, 69.5, 64.7, 65.5, 63.34, 60.13, 60.42, 59.26, 59.0, 58.62, 59.13, 56.61, 57.72, 56.73


Regression Analysis

Residuals show how well a line summarizes data.

Finding Residuals

The points on a scatter plot usually do not fall on the regression line, which creates an error between observed and predicted values. This error, called a residual, is the vertical distance between a point on the scatter plot and the point on the regression line directly above or below the point.

RESIDUALS

A residual is the difference between an observed value and the predicted value from the regression line: residual (e) = observed (y) − predicted (ŷ).

In the top scatter plot, when x = 5, the observed value is y = 10, the predicted value according to the regression line is ŷ = 6, and the residual is e = 10 − 6 = 4. When a point is below the regression line, the residual is negative. For example, when x = 2, the residual is e = −5.

A residual plot is a graph that shows the residual for each value of the explanatory variable. The bottom scatter plot shows a residual plot for the data.

THINK ABOUT IT

For a least squares regression line, the sum of all of the residuals will always equal 0.

REMEMBER

e = y − ŷ
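A short sketch shows the residual formula at work. The first call reproduces the example at x = 5; the rest reuses the CD sales data and the unrounded least squares coefficients from the Least Squares Regression lesson to check that the residuals add up to essentially 0. The function name `residual` is illustrative.

```python
def residual(observed, predicted):
    """e = y - y-hat: positive above the line, negative below it."""
    return observed - predicted


print(residual(10, 6))  # 4, the example at x = 5

# CD sales (millions) and the unrounded least squares coefficients
# quoted in the Least Squares Regression lesson.
years = list(range(2000, 2010))
sales = [942.5, 881.9, 803.3, 746, 767, 705.4, 619.7, 511.1, 368.4, 292.9]
a, b = -68.74181818, 138456.7945

residuals = [residual(y, a * x + b) for x, y in zip(years, sales)]
for x, e in zip(years, residuals):
    print(x, round(e, 1))

# The sum is not exactly 0 only because a and b were rounded by the calculator.
print(abs(sum(residuals)) < 0.01)
```

Plotting each year against its residual would give the residual plot for these data.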

[Figure: top, a scatter plot of y against x with a regression line, with x from 1 to 7 and y from 2 to 16; bottom, the Residual Plot, showing Residual (e) from −5 to 5 against x from 1 to 7.]


Interpreting Residual Plots

If the points in a residual plot show a pattern, then a straight line may not be an appropriate way to summarize the data. However, if the points have a random pattern that is scattered above and below the horizontal axis, then a line is probably an appropriate model.

For example, looking at the scatter plot showing the amount of public debt in the United States from 2000 to 2010, it appears that a straight line might be a good way to summarize the data.

The residual plot shown below, however, tells a different story. The curved pattern indicates that a straight line is not the best fit for these data.

When a linear model is not the best fit, quadratic models, exponential models, and others can be used. In this lesson, however, we will focus only on whether a linear model is a good fit.

[Figure: two example residual plots of Residual (e) against x. Curved pattern: A straight line is not a good fit. Random pattern: A straight line is a good fit.]

[Figure: U.S. Debt Scatter Plot, with the regression line ŷ = 0.7382x − 1471.6; horizontal axis: Year, 2000 to 2012; vertical axis: Debt (trillions of dollars), 2 to 16.]

[Figure: U.S. Debt Residual Plot; horizontal axis: Year, 2000 to 2012; vertical axis: Residual (e), −1.5 to 2.]


[Figure: scatter plot with a least squares regression line; horizontal axis: Explanatory variable, 1 to 8; vertical axis: Response variable, 1 to 12.]

x: 1, 2, 3, 4, 5, 6
y: 22, 18, 15, 14, 13, 9
e: 1.047, −0.638, −1.305, −0.009, 1.306, −0.401

Problem Set

Using the scatter plot with the least squares regression line, estimate the value to the nearest half unit.

1. the observed value at x = 5

2. the predicted value at x = 5

3. the residual at x = 5

4. the observed value at x = 3

5. the predicted value at x = 3

6. the residual at x = 3

7. Show that the sum of the estimated residuals is equal to 0.

8. Create a residual plot for the data set.

The table shows values of the explanatory variable (x) and corresponding observed values of the response variable (y). Residuals (e) for the x-value are also given.

9. What is the observed value at x = 2?

10. What is the residual at x = 2?

11. What is the predicted value at x = 2?

12. How many observed values are below the regression line?

13. Show that the sum of the residuals is equal to 0.

14. Create a residual plot for the data set.


[Figure: two residual plots of e against x, with x from 5 to 65.]

Year 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007

Income (thousands of dollars) 31.5 34.7 32.2 36.2 39.8 37 41.6 41 45.3 43

Student loan balance (thousands of dollars) 11.5 9.5 6.7 5.2 3.7 3.1 2.1 1.5 1.3 0.9

For the given residual plot, determine whether a linear model is appropriate for the data set. Explain.

Alicia started her own business after graduating from college in 1998. She immediately started paying off her student loan when she started her business. The spreadsheet shows her annual reported income from her business as well as her student loan balance at the end of the year.

17. Create a scatter plot that displays Alicia’s income for each year, and find the correlation coefficient.

18. Find the least squares regression equation. What is Alicia’s predicted income for 2002? How does it compare to the observed value during this year?

19. According to the least squares regression equation, what is the effect on Alicia’s annual income for each additional year?

20. Find the value of the coefficient of determination. What percent of the variation in Alicia’s income is associated with the year?

21. Create a residual plot to determine whether a linear regression model is a good model to use to represent Alicia’s annual income from 1998 to 2007.

22. Create a scatter plot that displays Alicia’s student loan balance for each year.

23. Find the least squares regression equation. What is Alicia’s predicted student loan balance for 2002? How does it compare to the observed value during this year?

24. According to the least squares regression equation, what is the effect on Alicia’s student loan balance for each additional year?

25. Find the value of the coefficient of determination. What percent of the variation in Alicia’s student loan balance is associated with the year?

26. Create a residual plot to determine whether a linear regression model is a good model to use to represent her student loan balance from 1998 to 2007.

15. [Residual plot]

16. [Residual plot]


Cautions in Statistics

Graphs and statements using statistics can sometimes be false or misleading.

Analyzing Misleading or Incorrect Graphs

This graph shows the Consumer Price Index (CPI) for entertainment from 2007 to 2009.

[Figure: Consumer Price Index for Entertainment. Bar graph; horizontal axis: Year (2007, 2008, 2009); vertical axis: CPI, numbered from 110 to 115.]

You can see that the CPI for 2007 is just more than 111, and the CPI for 2008 is just more than 113. The graph is misleading because the height of the bar for 2008 is almost twice as high as the bar for 2007. Someone looking at this graph might think that entertainment costs the consumer twice as much in 2008 as in 2007 when in fact, entertainment costs from 2007 to 2008 only rose 1.7%. This kind of graph is fairly common and results when the numbering on the vertical axis does not start at 0.
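The effect of a vertical axis that does not start at 0 can be sketched numerically. The two CPI readings below are assumed values, chosen only so that the rise is about 1.7%; they are not read from the graph, and the function names are illustrative.

```python
def percent_change(old, new):
    """Actual percent change from old to new."""
    return (new - old) / old * 100


def bar_height_ratio(old, new, axis_start):
    """How many times taller the new bar is drawn when the axis starts at axis_start."""
    return (new - axis_start) / (old - axis_start)


# Assumed readings for illustration; the text gives only approximate values.
cpi_2007, cpi_2008 = 111.5, 113.4

print(f"actual change:            {percent_change(cpi_2007, cpi_2008):.1f}%")
print(f"bar ratio, axis from 110: {bar_height_ratio(cpi_2007, cpi_2008, 110):.2f}")
print(f"bar ratio, axis from 0:   {bar_height_ratio(cpi_2007, cpi_2008, 0):.2f}")
```

With the axis starting at 110, the second bar is drawn more than twice as tall as the first even though the values differ by less than 2%. With the axis starting at 0, the bars would be nearly the same height.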

This is one example of how a statistical graph can be misleading, whether by accident or by design. Use caution when reading such graphs so that you can make sound judgments about the information they contain.

WHY GRAPHS COULD BE INCORRECT OR MISLEADING

Statistical graphs can lead to false or misleading conclusions when:

• Numbering on the vertical axis does not start at 0.

• Axis scales are not evenly spaced.

• Numbering on one or both of the axes is in reverse order.

• The graph does not make sense or is difficult to read.

BY THE WAY

The Consumer Price Index (CPI) is a measure used to monitor costs for consumers over time. The CPI covers many different household costs including food, housing, and other categories.


Flawed Statistical Claims and Lurking Variables

Sometimes we read or hear statistical claims that are flawed, and it is important to be able to identify such flaws when they exist.

Consider the following statistical claim:

A survey about street parking indicates baseball fans who attended games regularly during the season are in favor of street parking.

Though it is probably true that regular game attendees are in favor of street parking, local residents who live near the stadium may not be in favor of street parking. This is a situation where the sample is biased. In such a study, everyone who might be affected by allowing street parking should be included in the survey in order to make it valid.

Consider another statistical claim:

The decrease in attendance at the baseball games from 2007 to 2008 was due to an increase in the price of admission during that time.

Though it may be true that an increase in ticket prices had an effect on game attendance during this time period, there could be other causes. A lurking variable is a hidden variable that was not considered during the study, or has been left out intentionally so that certain outcomes are favored. In this particular situation, possible lurking variables that could have also contributed to the decrease in attendance include

• The win-loss record of the team

• The availability of HDTV and Internet broadcasts

• The safety of the game location

WHY STATISTICAL CLAIMS COULD BE INCORRECT OR MISLEADING

Statistical claims can be false or misleading when:

• Lurking variables may also contribute to the outcomes.

• The sample is biased toward a particular outcome.

• The claim does not make sense or contains errors.

REMEMBER

A high correlation between two variables does not necessarily mean that there is a causal relationship between them.
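A small simulation can show how a lurking variable produces a high correlation between two variables that do not affect each other. The sketch below uses made-up data: the hidden variable z, the coefficients 2 and 3, and the noise levels are all assumptions chosen for illustration, and pearson_r is a hypothetical helper written out from the usual correlation formula.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical lurking variable z that drives both observed variables.
z = [random.uniform(0, 10) for _ in range(200)]

# x and y each depend on z plus independent noise; neither depends on the other.
x = [2 * zi + random.gauss(0, 1) for zi in z]
y = [3 * zi + random.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)

# Remove the part of each variable explained by z and correlate what is left.
x_left = [xi - 2 * zi for xi, zi in zip(x, z)]
y_left = [yi - 3 * zi for yi, zi in zip(y, z)]
r_left = pearson_r(x_left, y_left)

print(f"Correlation between x and y: {r_xy:.2f}")
print(f"Correlation after removing the effect of z: {r_left:.2f}")
```

In this made-up setting, x and y are strongly correlated only because both follow z. Once the contribution of z is removed, the remaining correlation is close to 0.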


Problem Set

Explain why the graph is misleading or incorrect.

1. [Figure: circle graph "Candidate Preferences"; sectors labeled Male 22%, No 33%, Yes 17%, Female 28%]

2. [Figure: bar graph "Public Funding for the Arts"; vertical axis: Number of respondents, numbered from 764 to 767; categories: Should increase, Should stay the same, Should decrease]

3. [Figure: graph "Game Attendance"; vertical axis: Year, 2008 to 2010; horizontal axis: Attendance, numbered from 22,000 to 23,400]

4. [Figure: graph "Team Record"; vertical axis: Number of wins, numbered from 5 to 25; horizontal axis: Year, listed as 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002]

Explain why the statistical claim is flawed.

5. A poll found that 65% of the voters were in favor of the proposed amendment, 25% were not in favor, and the remaining 20% were undecided.

6. A study of hotel guests, conducted by a consortium of hotel owners, determined that travel is beneficial for marriages.

7. A study determined that watching too much television causes poor performance on math tests.

8. A survey of employed people showed they did not favor the tax increase proposal.

9. In a traffic flow study that took place at the corner of Pine and Main streets on the weekend of December 1 and 2, it was determined that a traffic signal is not needed at this corner.

10. In a study of a group of 16-year-olds, it was determined that a new medicine reduced acne by 230%.

11. It was determined that lack of regular sleep is the reason for the common cold.

12. More people are unemployed today than 100 years ago.


For each research conclusion, identify the explanatory and response variables. Identify a lurking variable and how it could affect the research conclusion.

13. A vehicle’s fuel economy improves as the speed of the vehicle increases.

14. Sunscreen sales increase as ice cream sales increase.

15. People who make their own salad dressing with olive oil have lower cholesterol than most people.

16. Higher usage of a cell phone leads to less sleep at night.

A study of 50 apple trees in Orchard X and Orchard Y showed that 16% more trees in Orchard X contained undersized apples than in Orchard Y. The researcher claimed that this is because Orchard X gets less sunlight per day than Orchard Y.

17. What reason might justify the claim made by the researcher?

18. How many more trees in Orchard X contained undersized apples than in Orchard Y?

19. Which of the following could be lurking variables for this situation: age of trees, variety of apples, money spent on the trees, or use of fertilizer? Explain.

Eighty adult learners were given a memory task and then separated into two equal groups. In Group A, the learners were given the memory technique of creating pictures. In Group B, the learners were given the memory technique of recite and repeat. When the memory task was repeated, 45% of the learners in Group A made improvements, while only 37.5% of the learners in Group B made improvements. The researcher claimed that creating pictures was a better memory technique than recite and repeat.

20. What reason might justify the claim made by the researcher?

21. How many actual learners in Group A improved their score? How many in Group B improved their score?

22. Which of the following could be lurking variables for this situation: age of learners, time of day for the experiment, content of what was memorized, head size of the learners, or how the techniques were taught? Explain.

23. Challenge Name another lurking variable not included in Problem 22.


Chapter 7 Wrap-Up

Determining the Association Between Two Data Sets

Data from the National Association of Home Builders show an overall drop in the number of housing starts from 2003 to 2010.

[Figure: graph "U.S. Housing Starts"; vertical axis: Housing Starts (thousands), numbered from 200 to 2000; horizontal axis: Year, 2003 to 2010]

There is a strong negative association between the year and the number of single family housing starts, with a correlation coefficient of approximately −0.92. The least squares regression line, y ≈ −199.1x + 400,665.6, can be used to represent the data set and to make predictions and estimations.
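As a quick illustration of making a prediction, the regression equation can be evaluated for a given year. This is a minimal sketch: the function name is hypothetical, the choice of years is arbitrary, and the units are assumed to be thousands of housing starts, as labeled on the graph's vertical axis.

```python
def predict_housing_starts(year):
    """Predicted single family housing starts (thousands) from y = -199.1x + 400,665.6."""
    slope = -199.1         # change in housing starts (thousands) per additional year
    intercept = 400_665.6  # value of y when x = 0; it has no practical meaning here
    return slope * year + intercept


estimate_2006 = predict_housing_starts(2006)  # a year inside the 2003 to 2010 data
estimate_2013 = predict_housing_starts(2013)  # a year outside the data (extrapolation)

print(f"Predicted housing starts for 2006: {estimate_2006:.1f} thousand")
print(f"Predicted housing starts for 2013: {estimate_2013:.1f} thousand")
```

The estimate for 2006 falls within the range of the data. The estimate for 2013 is negative, which is impossible for a count of housing starts and shows why predictions far outside the observed years should be treated with caution.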

The coefficient of determination is r² ≈ 0.85, meaning that about 85% of the variation in the number of single family housing starts is associated with the year.
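For a linear regression with one explanatory variable, the coefficient of determination is the square of the correlation coefficient, so the two reported values can be checked against each other. The short sketch below uses the rounded value of r given in the text.

```python
r = -0.92  # correlation coefficient reported in the text (rounded)

r_squared = r ** 2  # squaring removes the sign, so direction is not shown by r squared

print(f"r squared is about {r_squared:.2f}")
```

Because the sign is lost when squaring, the direction of the association must be read from r or from the slope of the regression line.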

These data were collected during a time of economic decline. Data from a different 8-year period might show a very different pattern. In addition, many other variables could help explain the decrease in the number of single family housing starts during this time period. Possible lurking variables include a rise in the cost of housing materials or a lack of available labor.


Regression Analysis

Is a straight line a good fit for these data? A residual plot is a good way to find out.

[Figure: residual plot; vertical axis: Residual (e), numbered from −400 to 400; horizontal axis: Year, 2003 to 2010]

The points in the residual plot show a curved pattern, which indicates that a line may not be the best fit.
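To see how a curved pattern arises, the residuals can be computed directly as observed value minus predicted value. The sketch below does not use the housing data, which are not listed here. Instead it fits a least squares line to a small made-up data set that follows a curve (y = x²). The data and the function names are assumptions for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Residual e = observed y minus predicted y for each data point."""
    slope, intercept = least_squares_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical curved data: y = x squared.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

res = residuals(xs, ys)
for x, e in zip(xs, res):
    print(f"x = {x}, residual = {e:.1f}")
```

For this made-up data set the fitted line is y = 6x − 5, and the residuals are 5, 0, −3, −4, −3, 0, 5. They are positive at both ends and negative in the middle, a U-shaped pattern that signals a line is not the best model. Residuals from a suitable linear model would instead be scattered above and below 0 with no clear pattern.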

In Summary

Two sets of data can be graphed on a scatter plot. The direction and strength of association between the data sets can be determined by the correlation coefficient, r. A least squares regression line is a line of best fit that minimizes the sum of the squares of the vertical distances from the points on the scatter plot to the line. A residual plot helps determine whether a line is the appropriate way to model the data. Finally, all your statistical knowledge about the situation being studied can help you judge whether a graph or statistical claim is incorrect or misleading.


Relationships Between Variables

The scatter plot shows the instrumentation of small musical bands at a band competition.

Practice Problems

1. How many woodwinds were in the band that had the fewest brass instruments? How many brass instruments were in the band that had the fewest woodwinds?

2. What is the mean number of woodwind instruments at the competition?

3. What is the median number of brass instruments for bands that have 8 woodwind instruments?

4. Describe the strength and direction of the association between these variables.

5. Suppose a regression line is drawn, and it is determined that the points (2, 5) and (9, 10) are closest to the line. What is the equation of this line?

6. Use the equation of the regression line found in Problem 5 to predict the number of woodwinds in a band with 15 brass instruments.

7. Use the equation of the regression line found in Problem 5 to predict the number of brass instruments in a band with 8 woodwinds.

[Figure: scatter plot "Band Instrumentation"; vertical axis: Woodwind, numbered from 1 to 13; horizontal axis: Brass, numbered from 1 to 12]

Year    Average Price (dollars)
1945    0.42
1950    0.46
1955    0.58
1960    0.76
1965    1.01
1970    1.55
1975    2.05
1980    2.69
1985    3.55
1990    4.23
1995    4.35
2000    5.39
2005    6.41
2010    7.50

The table shows the average ticket prices for movies from 1945 to 2010.

8. Create a scatter plot that displays the average price for movies each year. Describe the strength and direction of the association between these variables, and then determine the correlation coefficient.

9. Determine the least squares regression equation.

10. Use the least squares regression equation to predict the average price for a movie in 1965. How does it compare to the observed value during this year? What is the value of the residual for this year?

11. Use the least squares regression equation to predict the year that average ticket prices will exceed $12.00.

12. According to the least squares regression equation, what would be the effect on the average price of movie tickets for each additional year?

13. Find the value of the coefficient of determination. What percent of the variation in average ticket prices is associated with the year?

14. Create a residual plot to determine whether a linear regression model is a good model to use to represent average movie ticket prices from 1945 to 2010.


Year of Olympic Games    Winning Time (seconds)
1956                     221
1960                     216
1964                     218
1968                     215
1972                     216
1976                     219
1980                     218
1984                     213
1988                     216
1992                     220
1996                     216
2000                     212
2004                     214
2008                     213

The table shows the winning times for the Olympic men’s 1500-meter run.

15. Create a scatter plot that displays the winning time for each Olympics. Describe the strength and direction of the association between these variables, and then determine the correlation coefficient.

16. Determine the least squares regression equation.

17. Use the least squares regression equation to predict the winning time for the 1992 Olympic Games. How does it compare to the observed value during this year? What is the value of the residual for this year?

18. Use the least squares regression equation to estimate the Olympic year in which the winning time will be less than 210 seconds.

19. According to the least squares regression equation, what is the effect on winning times for each additional year?

20. Find the value of the coefficient of determination. What percent of the variation in winning times is associated with the year?

21. Create a residual plot to determine whether a linear regression model is a good model to use to represent winning times from 1956 to 2008.

A study of 80 franchise restaurants in each of two Metropolitan Areas, A and B, showed that sales were, on average, 12% higher in Metropolitan Area A. The researcher claimed that this was due to Metropolitan Area A having less crime than Metropolitan Area B.

22. What reason might justify the claim made by the researcher?

23. If sales in Metropolitan Area A were, on average, $2.34 million, what were sales in Metropolitan Area B?

24. List two lurking variables that could also explain the difference in sales between the metropolitan areas.

The costs of a 2-star, 3-star, and 4-star hotel (per night) in a selected foreign city are $195, $202, and $218, respectively. When these figures were typed into a spreadsheet program, the display shown here was created.

[Figure: display titled "Foreign City Hotel Costs"; vertical axis: U.S. Dollars, numbered from 185 to 220; horizontal axis: 2-star, 3-star, 4-star]

25. What information is being conveyed by this display?

26. Why would this display be considered misleading?

27. According to the data, how much more does a 4-star hotel cost compared to a 2-star hotel? What is the percentage increase in cost from a 2-star hotel to a 4-star hotel?

28. Create a bar graph that is more appropriate for these data.
