Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education,...

71
Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 1 Chapter Correlation and Regression 9

Transcript of Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education,...

Page 1: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 1

Chapter

Correlation and

Regression

9

Page 2: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 2

Correlation

Correlation

• A relationship between two variables.

• The data can be represented by ordered pairs (x, y)

x is the independent (or explanatory) variable

y is the dependent (or response) variable

.

Page 3: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 3

Correlation

x 1 2 3 4 5

y – 4 – 2 – 1 0 2

A scatter plot can be used to determine whether a

linear (straight line) correlation exists between two

variables.

x

2 4

–2

– 4

y

2

6

Example:

.

Page 4: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 4

Types of Correlation

x

y

Negative Linear Correlation

x

y

No Correlation

x

y

Positive Linear Correlation

x

y

Nonlinear Correlation

As x increases, y

tends to decrease.

As x increases, y

tends to increase.

.

Page 5: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5

Example: Constructing a Scatter Plot

An economist wants to determine

whether there is a linear relationship

between a country’s gross domestic

product (GDP)

and carbon dioxide (CO2)

emissions. The data are shown in

the table. Display the data in a

scatter plot and determine whether

there appears to be a positive or

negative linear correlation or no

linear correlation. (Source: World

Bank and U.S. Energy Information

Administration)

GDP

(trillions of

$), x

CO2

emission

(millions of

metric tons),

y

1.6 428.2

3.6 828.8

4.9 1214.2

1.1 444.6

0.9 264.0

2.9 415.3

2.7 571.8

2.3 454.9

1.6 358.7

1.5 573.5

.

Page 6: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 6

Solution: Constructing a Scatter Plot

Appears to be a positive linear correlation. As the

gross domestic products increase, the carbon dioxide

emissions tend to increase..

Page 7: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 7

Example: Constructing a Scatter Plot

Using Technology

Old Faithful, located in

Yellowstone National Park, is the

world’s most famous geyser. The

duration (in minutes) of several of

Old Faithful’s eruptions and the

times (in minutes) until the next

eruption are shown in the table.

Using a TI-83/84, display the data

in a scatter plot. Determine the

type of correlation.

Duration

x

Time,

y

Duration

x

Time,

y

1.8 56 3.78 79

1.82 58 3.83 85

1.9 62 3.88 80

1.93 56 4.1 89

1.98 57 4.27 90

2.05 57 4.3 89

2.13 60 4.43 89

2.3 57 4.47 86

2.37 61 4.53 89

2.82 73 4.55 86

3.13 76 4.6 92

3.27 77 4.63 91

3.65 77

.

Page 8: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 8

Solution: Constructing a Scatter Plot

Using Technology

From the scatter plot, it appears that the variables have a

positive linear correlation, as the durations of the

eruptions increase, the times until the next eruption tend

to increase.

.

Page 9: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 9

Correlation Coefficient

Correlation coefficient

• A measure of the strength and the direction of a linear

relationship between two variables.

• The symbol r represents the sample correlation

coefficient.

• A formula for r is

• The population correlation coefficient is represented

by ρ (rho).

2 22 2

n xy x yr

n x x n y y

n is the number

of data pairs

.

Page 10: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 10

Correlation Coefficient

• The range of the correlation coefficient is -1 to 1.

-1 0 1

If r = -1 there is a

perfect negative

correlation

If r = 1 there is a

perfect positive

correlation

If r is close to 0

there is no linear

correlation

.

Page 11: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 11

Linear Correlation

Strong negative correlation

Weak positive correlation

Strong positive correlation

No Correlation

x

y

x

y

x

y

x

y

r = 0.91 r = 0.88

r = 0.42 r = 0.07

.

Page 12: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 12

Calculating a Correlation Coefficient

1. Find the sum of the x-

values.

2. Find the sum of the y-

values.

3. Multiply each x-value by

its corresponding y-value

and find the sum.

x

y

xy

In Words In Symbols

.

Page 13: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 13

Calculating a Correlation Coefficient

4. Square each x-value

and find the sum.

5. Square each y-value

and find the sum.

6. Use these five sums to

calculate the

correlation coefficient.

2x

2y

2 22 2

n xy x yr

n x x n y y

In Words In Symbols

.

Page 14: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 14

Example: Calculating the Correlation

Coefficient

Calculate the correlation

coefficient for the gross

domestic products and carbon

dioxide emissions data. What

can you conclude?

GDP

(trillions of $),

x

CO2 emission

(millions of

metric tons), y

1.6 428.2

3.6 828.8

4.9 1214.2

1.1 444.6

0.9 264.0

2.9 415.3

2.7 571.8

2.3 454.9

1.6 358.7

1.5 573.5

.

Page 15: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 15

Solution: Calculating the Correlation

Coefficient

x y xy x2 y2

1.6 428.2 685.12 2.56 183,355.24

3.6 828.8 2983.68 12.96 686,909.44

4.9 1214.2 5949.58 24.01 1,474,281.64

1.1 444.6 489.06 1.21 197,669.16

0.9 264.0 237.6 0.81 69,696

2.9 415.3 1204.37 8.41 172,474.09

2.7 571.8 1543.86 7.29 326,955.24

2.3 454.9 1046.27 5.29 206,934.01

1.6 358.7 573.92 2.56 128,665.69

1.5 573.5 860.25 2.25 328,902.25

Σx = 23.1 Σy = 5554 Σxy = 15,573.71 Σx2 = 67.35 Σy2 = 3,775,842.76.

Page 16: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 16

Solution: Calculating the Correlation

Coefficient

2 22 2

n xy x yr

n x x n y y

2 2

10(15,573.71) 23.1 5554

10(67.35) 23.1 10(3,775,842.76) 5554

27,439.70.882

139.89 6,911,511.6

Σx = 23.1 Σy = 5554 Σxy = 15,573.71 Σx2 = 32.44

Σy2 = 3,775,842.76

r ≈ 0.882 suggests a strong positive linear correlation. As

the gross domestic product increases, the carbon dioxide

emissions also increase..

Page 17: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 17

Example: Using Technology to Find a

Correlation Coefficient

Use a technology tool to calculate

the correlation coefficient for the

Old Faithful data. What can you

conclude?

Duration

x

Time,

y

Duration

x

Time,

y

1.8 56 3.78 79

1.82 58 3.83 85

1.9 62 3.88 80

1.93 56 4.1 89

1.98 57 4.27 90

2.05 57 4.3 89

2.13 60 4.43 89

2.3 57 4.47 86

2.37 61 4.53 89

2.82 73 4.55 86

3.13 76 4.6 92

3.27 77 4.63 91

3.65 77

.

Page 18: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 18

Solution: Using Technology to Find a

Correlation Coefficient

STAT > CalcTo calculate r, you must first enter the

DiagnosticOn command found in the Catalog menu

r ≈ 0.979 suggests a strong positive correlation.

.

Page 19: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 19

Using a Table to Test a Population

Correlation Coefficient ρ

• Once the sample correlation coefficient r has been

calculated, we need to determine whether there is

enough evidence to decide that the population

correlation coefficient ρ is significant at a specified

level of significance.

• Use Table 11 in Appendix B.

• If |r| is greater than the critical value, there is enough

evidence to decide that the correlation coefficient ρ is

significant.

.

Page 20: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 20

Using a Table to Test a Population

Correlation Coefficient ρ

• Determine whether ρ is significant for five pairs of

data (n = 5) at a level of significance of α = 0.01.

• If |r| > 0.959, the correlation is significant. Otherwise,

there is not enough evidence to conclude that the

correlation is significant.

Number of

pairs of data

in sample

level of significance

.

Page 21: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 21

Using a Table to Test a Population

Correlation Coefficient ρ

1. Determine the number

of pairs of data in the

sample.

2. Specify the level of

significance.

3. Find the critical value.

Determine n.

Identify .

Use Table 11 in

Appendix B.

In Words In Symbols

.

Page 22: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 22

Using a Table to Test a Population

Correlation Coefficient ρ

In Words In Symbols

4. Decide if the

correlation is

significant.

5. Interpret the decision

in the context of the

original claim.

If |r| > critical value, the

correlation is significant.

Otherwise, there is not

enough evidence to

support that the

correlation is significant.

.

Page 23: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 23

Example: Using a Table to Test a

Population Correlation Coefficient ρ

Using the Old Faithful data, you

used 25 pairs of data to find

r ≈ 0.979. Is the correlation

coefficient significant? Use

α = 0.05.

Duration

x

Time,

y

Duration

x

Time,

y

1.8 56 3.78 79

1.82 58 3.83 85

1.9 62 3.88 80

1.93 56 4.1 89

1.98 57 4.27 90

2.05 57 4.3 89

2.13 60 4.43 89

2.3 57 4.47 86

2.37 61 4.53 89

2.82 73 4.55 86

3.13 76 4.6 92

3.27 77 4.63 91

3.65 77

.

Page 24: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 24

Solution: Using a Table to Test a

Population Correlation Coefficient ρ

• n = 25, α = 0.05

• |r| ≈ 0.979 > 0.396

• There is enough evidence

at the 5% level of

significance to conclude

that there is a significant

linear correlation between

the duration of Old

Faithful’s eruptions and the

time between eruptions.

.

Page 25: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 25

Hypothesis Testing for a Population

Correlation Coefficient ρ

• A hypothesis test can also be used to determine

whether the sample correlation coefficient r provides

enough evidence to conclude that the population

correlation coefficient ρ is significant at a specified

level of significance.

• A hypothesis test can be one-tailed or two-tailed.

.

Page 26: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 26

Hypothesis Testing for a Population

Correlation Coefficient ρ

• Left-tailed test

• Right-tailed test

• Two-tailed test

H0: ρ 0 (no significant negative correlation)

Ha: ρ < 0 (significant negative correlation)

H0: ρ 0 (no significant positive correlation)

Ha: ρ > 0 (significant positive correlation)

H0: ρ = 0 (no significant correlation)

Ha: ρ 0 (significant correlation)

.

Page 27: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 27

The t-Test for the Correlation Coefficient

• Can be used to test whether the correlation between

two variables is significant.

• The test statistic is r

• The standardized test statistic

follows a t-distribution with d.f. = n – 2.

• In this text, only two-tailed hypothesis tests for ρ are

considered.

212

r

r rt

rn

.

Page 28: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 28

Using the t-Test for ρ

1. State the null and alternative

hypothesis.

2. Specify the level of

significance.

3. Identify the degrees of

freedom.

4. Determine the critical

value(s) and rejection

region(s).

State H0 and Ha.

Identify .

d.f. = n – 2.

Use Table 5 in

Appendix B.

In Words In Symbols

.

Page 29: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 29

Using the t-Test for ρ

5. Find the standardized test

statistic.

6. Make a decision to reject or

fail to reject the null

hypothesis.

7. Interpret the decision in the

context of the original claim.

In Words In Symbols

If t is in the rejection

region, reject H0.

Otherwise fail to reject

H0.

212

rt

rn

.

Page 30: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 30

Example: t-Test for a Correlation

Coefficient

Previously you calculated

r ≈ 0.882. Test the significance

of this correlation coefficient.

Use α = 0.05.

GDP

(trillions of $),

x

CO2 emission

(millions of

metric tons), y

1.6 428.2

3.6 828.8

4.9 1214.2

1.1 444.6

0.9 264.0

2.9 415.3

2.7 571.8

2.3 454.9

1.6 358.7

1.5 573.5

.

Page 31: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 31

Solution: t-Test for a Correlation

Coefficient

• H0:

• Ha:

• d.f. =

• Rejection Region:

• Test Statistic:

0.05

10 – 2 = 8

2

0.8825.294

1 (0.882)

10 2

t

ρ = 0

ρ ≠ 0

• Decision:

At the 5% level of significance,

there is enough evidence to

conclude that there is a

significant linear correlation

between gross domestic products

and carbon dioxide emissions.

Reject H0

.

Page 32: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 32

Correlation and Causation

• The fact that two variables are strongly correlated

does not in itself imply a cause-and-effect

relationship between the variables.

• If there is a significant correlation between two

variables, you should consider the following

possibilities.

1. Is there a direct cause-and-effect relationship

between the variables?

• Does x cause y?

.

Page 33: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 33

Correlation and Causation

2. Is there a reverse cause-and-effect relationship

between the variables?

• Does y cause x?

3. Is it possible that the relationship between the

variables can be caused by a third variable or by a

combination of several other variables?

4. Is it possible that the relationship between two

variables may be a coincidence?

.

Page 34: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 34

Regression lines

• After verifying that the linear correlation between

two variables is significant, next we determine the

equation of the line that best models the data

(regression line).

• Can be used to predict the value of y for a given value

of x.

x

y

.

Page 35: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 35

Residuals

Residual

• The difference between the observed y-value and the

predicted y-value for a given x-value on the line.

For a given x-value,

di = (observed y-value) – (predicted y-value)

x

y

}d1

}d2

d3{d4{

}d5

d6{

Predicted

y-value

Observed

y-value

.

Page 36: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 36

Regression line (line of best fit)

• The line for which the sum of the squares of the

residuals is a minimum.

• The equation of a regression line for an independent

variable x and a dependent variable y is

ŷ = mx + b

Regression Line

Predicted

y-value for

a given x-

value

Slope

y-intercept

.

Page 37: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 37

The Equation of a Regression Line

• ŷ = mx + b where

• is the mean of the y-values in the data

• is the mean of the x-values in the data

• The regression line always passes through the point

22

n xy x ym

n x x

y xb y mx m

n n

y

x

,x y

.

Page 38: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 38

Example: Finding the Equation of a

Regression Line

Find the equation of the

regression line for the gross

domestic products and

carbon dioxide emissions

data.

GDP

(trillions of $), x

CO2 emission

(millions of

metric tons), y

1.6 428.2

3.6 828.8

4.9 1214.2

1.1 444.6

0.9 264.0

2.9 415.3

2.7 571.8

2.3 454.9

1.6 358.7

1.5 573.5

.

Page 39: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 39

Solution: Finding the Equation of a

Regression Line

x y xy x2 y2

1.6 428.2 685.12 2.56 183,355.24

3.6 828.8 2983.68 12.96 686,909.44

4.9 1214.2 5949.58 24.01 1,474,281.64

1.1 444.6 489.06 1.21 197,669.16

0.9 264.0 237.6 0.81 69,696

2.9 415.3 1204.37 8.41 172,474.09

2.7 571.8 1543.86 7.29 326,955.24

2.3 454.9 1046.27 5.29 206,934.01

1.6 358.7 573.92 2.56 128,665.69

1.5 573.5 860.25 2.25 328,902.25

Σx = 23.1 Σy = 5554 Σxy = 15,573.71 Σx2 = 67.35

Σy2 =

3,775,842.76

Recall from section 9.1:

.

Page 40: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 40

Solution: Finding the Equation of a

Regression Line

Σx = 23.1 Σy = 5554 Σxy = 15,573.71 Σx2 = 67.35

22

n xy x ym

n x x

b y mx

2

10(15,573.71) (23.1)(5554)

10(67.35) 23.1

27,439.7196.151977

139.89

5554 23.1(196.151977)

10 10

555.4 (196.151977)(2.31) 102.2889

ˆ 196.152 102.289y x Equation of the regression line

.

Page 41: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 41

Solution: Finding the Equation of a

Regression Line

• To sketch the regression line, use any two x-values within

the range of the data and calculate the corresponding y-

values from the regression line.

.

Page 42: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 42

Example: Using Technology to Find a

Regression Equation

Use a technology tool to find the

equation of the regression line for

the Old Faithful data.

Duration

x

Time,

y

Duration

x

Time,

y

1.8 56 3.78 79

1.82 58 3.83 85

1.9 62 3.88 80

1.93 56 4.1 89

1.98 57 4.27 90

2.05 57 4.3 89

2.13 60 4.43 89

2.3 57 4.47 86

2.37 61 4.53 89

2.82 73 4.55 86

3.13 76 4.6 92

3.27 77 4.63 91

3.65 77

.

Page 43: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 43

Solution: Using Technology to Find a

Regression Equation

550

100

1

.

Page 44: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 44

Example: Predicting y-Values Using

Regression Equations

The regression equation for the gross domestic products

(in trillions of dollars) and carbon dioxide emissions (in

millions of metric tons) data is ŷ = 196.152x + 102.289.

Use this equation to predict the expected carbon dioxide

emissions for the following gross domestic products.

(Recall from section 9.1 that x and y have a significant

linear correlation.)

1. 1.2 trillion dollars

2. 2.0 trillion dollars

3. 2.5 trillion dollars

.

Page 45: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 45

Solution: Predicting y-Values Using

Regression Equations

ŷ = 196.152x + 102.289

1. 1.2 trillion dollars

When the gross domestic product is $1.2 trillion, the

CO2 emissions are about 337.671 million metric tons.

ŷ =196.152(1.2) + 102.289 ≈ 337.671

2. 2.0 trillion dollars

When the gross domestic product is $2.0 trillion, the

CO2 emissions are 494.595 million metric tons.

ŷ =196.152(2.0) + 102.289 ≈ 494.593

.

Page 46: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 46

Solution: Predicting y-Values Using

Regression Equations

3. 2.5 trillion dollars

When the gross domestic product is $2.5 trillion, the

CO2 emissions are 592.669 million metric tons.

ŷ =196.152(2.5) + 102.289 ≈ 592.669

Prediction values are meaningful only for x-values in

(or close to) the range of the data. The x-values in the

original data set range from 0.9 to 4.9. So, it would

not be appropriate to use the regression line to predict

carbon dioxide emissions for gross domestic products

such as $0.2 or $14.5 trillion dollars.

.

Page 47: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 47

Variation About a Regression Line

• Three types of variation about a regression line

Total variation

Explained variation

Unexplained variation

• To find the total variation, you must first calculate

The total deviation

The explained deviation

The unexplained deviation

.

Page 48: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 48

Variation About a Regression Line

iy y

ˆiy y

ˆi iy y

(xi, ŷi)

x

y (xi, yi)

(xi, yi)

Unexplained

deviationˆi iy y

Total

deviationiy y Explained

deviationˆiy y

y

x

Total Deviation =

Explained Deviation =

Unexplained Deviation =

.

Page 49: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 49

Total variation

• The sum of the squares of the differences between the

y-value of each ordered pair and the mean of y.

Explained variation

• The sum of the squares of the differences between

each predicted y-value and the mean of y.

Variation About a Regression Line

2

iy y Total variation =

Explained variation = 2

ˆiy y

.

Page 50: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 50

Unexplained variation

• The sum of the squares of the differences between the

y-value of each ordered pair and each corresponding

predicted y-value.

Variation About a Regression Line

2

ˆi iy y Unexplained variation =

The sum of the explained and unexplained variation is

equal to the total variation.

Total variation = Explained variation + Unexplained variation

.

Page 51: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 51

Coefficient of Determination

Coefficient of determination

• The ratio of the explained variation to the total

variation.

• Denoted by r2

2 Explained variationTotal variation

r

.

Page 52: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 52

Example: Coefficient of Determination

r2 » (0.883)2

» 0.780

About 78% of the variation in the carbon emissions can be

explained by the variation in the gross domestic products.

About 22% of the variation is unexplained.

The correlation coefficient for the gross domestic

products and carbon dioxide emissions data is r ≈ 0.883.

Find the coefficient of determination. What does this tell

you about the explained variation of the data about the

regression line? About the unexplained variation?

Solution:

.

Page 53: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 53

The Standard Error of Estimate

Standard error of estimate

• The standard deviation of the observed yi -values

about the predicted ŷ-value for a given xi -value.

• Denoted by se.

• The closer the observed y-values are to the predicted

y-values, the smaller the standard error of estimate

will be.

2( )ˆ2

i ie

y ys

n

n is the number of ordered pairs

in the data set

.

Page 54: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 54

The Standard Error of Estimate

2

, , , ( ), ˆ ˆ

( )ˆi i i i i

i i

x y y y y

y y

1. Make a table that includes the

column heading shown.

2. Use the regression equation to

calculate the predicted y-values.

3. Calculate the sum of the squares

of the differences between each

observed y-value and the

corresponding predicted y-value.

4. Find the standard error of

estimate.

ˆ iy mx b

2 ( )ˆi iy y

2( )ˆ2

i ie

y ys

n

In Words In Symbols

.

Page 55: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 55

Example: Standard Error of Estimate

The regression equation for the gross domestic products

and carbon dioxide emissions data as calculated in

section 9.2 is

ŷ = 196.152x + 102.289

Find the standard error of estimate.

Solution:

Use a table to calculate the sum of the squared

differences of each observed y-value and the

corresponding predicted y-value.

.

Page 56: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 56

Solution: Standard Error of Estimate

x y ŷ i (yi – ŷ i)2

1.6 428.2 416.1322 145.63179684

3.6 828.8 808.4362 414.68435044

4.9 1214.2 1063.4338 22,730.44706244

1.1 444.6 318.0562 16,013.33331844

0.9 264.0 278.8258 219.80434564

2.9 415.3 671.1298 65,448.88656804

2.7 571.8 631.8994 3611.93788036

2.3 454.9 553.4386 9709.85568996

1.6 358.7 416.1322 3298.45759684

1.5 573.5 396.517 31,322.982289

Σ = 152,916.020898

unexplained variation.

Page 57: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 57

Solution: Standard Error of Estimate

• n = 10, Σ(yi – ŷ i)2 = 152,916.020898

2( )ˆ2

i ie

y ys

n

The standard error of estimate of the carbon dioxide

emissions for a specific gross domestic product is

about 138.255 million metric tons.

152,916.020898138.255

10 2

.

Page 58: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 58

Prediction Intervals

• Two variables have a bivariate normal distribution

if for any fixed value of x, the corresponding values

of y are normally distributed and for any fixed values

of y, the corresponding x-values are normally

distributed.

.

Page 59: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 59

Prediction Intervals

• A prediction interval can be constructed for the true

value of y.

• Given a linear regression equation ŷ = mx + b and x0,

a specific value of x, a c-prediction interval for y is

ŷ – E < y < ŷ + E where

• The point estimate is ŷ and the margin of error is E.

The probability that the prediction interval contains y

is c.

202 2

( )11

( )c e

n x xE t s

n n x x

.

Page 60: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 60

Constructing a Prediction Interval for y

for a Specific Value of x

1. Identify the number of ordered

pairs in the data set n and the

degrees of freedom.

2. Use the regression equation and

the given x-value to find the

point estimate ŷ.

3. Find the critical value tc that

corresponds to the given level of

confidence c.

ˆi iy mx b

Use Table 5 in

Appendix B.

In Words In Symbols

d.f. = n – 2

.

Page 61: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 61

Constructing a Prediction Interval for y

for a Specific Value of x

4. Find the standard error of

estimate se.

5. Find the margin of error E.

6. Find the left and right

endpoints and form the

prediction interval.

In Words In Symbols

2( )ˆ2

i ie

y ys

n

202 2

( )11

( )c e

n x xE t s

n n x x

Left endpoint: ŷ – E

Right endpoint: ŷ + E

Interval: ŷ – E < y < ŷ + E

.

Page 62: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 62

Example: Constructing a Prediction

Interval

Construct a 95% prediction interval for the carbon

dioxide emission when the gross domestic product is

$3.5 trillion. What can you conclude?

Recall, n = 10, ŷ = 196.152x + 102.289, se = 138.255

Solution:

Point estimate:

ŷ = 196.152(3.5) + 102.289 ≈ 788.821

Critical value:

d.f. = n –2 = 10 – 2 = 8 tc = 2.306

215.8, 32.44, 1.975x x x

.

Page 63: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 63

Solution: Constructing a Prediction

Interval

0

2

2

2

2 2

1 10(3.5 2.31)(2.447)(10.290)

(

1 349.42410 10(67.35) (

)11

2 .1

( )

3 )

c e

n x xE t s

n n x x

Left Endpoint: ŷ – E Right Endpoint: ŷ + E

439.397 < y < 1138.245

788.821 – 349.424

≈ 439.397

788.821 + 349.424

≈ 1138.245

You can be 95% confident that when the gross domestic

product is $3.5 trillion, the carbon dioxide emissions will be

between 439.397 and 1138.245 million metric tons..

Page 64: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 64

Multiple Regression Equation

• In many instances, a better prediction can be found

for a dependent (response) variable by using more

than one independent (explanatory) variable.

• For example, a more accurate prediction for the

company sales discussed in previous sections might

be made by considering the number of employees on

the sales staff as well as the advertising expenses.

.

Page 65: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 65

Multiple Regression Equation

Multiple regression equation

• ŷ = b + m1x1 + m2x2 + m3x3 + … + mkxk

• x1, x2, x3,…, xk are independent variables

• b is the y-intercept

• y is the dependent variable

* Because the mathematics associated with this concept is

complicated, technology is generally used to calculate the

multiple regression equation.

.

Page 66: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 66

Example: Finding a Multiple Regression

Equation

A researcher wants to determine how employee salaries

at a certain company are related to the length of

employment, previous experience, and education. The

researcher selects eight employees from the company

and obtains the data shown on the next slide. Use

Minitab to find a multiple regression equation that

models the data.

.

Page 67: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 67

Example: Finding a Multiple Regression

Equation

Employee Salary, y

Employment

(yrs), x1

Experience

(yrs), x2

Education

(yrs), x3

A 57,310 10 2 16

B 57,380 5 6 16

C 54,135 3 1 12

D 56,985 6 5 14

E 58,715 8 8 16

F 60,620 20 0 12

G 59,200 8 4 18

H 60,320 14 6 17

.

Page 68: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 68

Solution: Finding a Multiple Regression

Equation

• Enter the y-values in C1 and the x1-, x2-, and x3-

values in C2, C3 and C4 respectively.

• Select “Regression > Regression…” from the Stat

menu.

• Use the salaries as the response variable and the

remaining data as the predictors.

.

Page 69: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 69

Solution: Finding a Multiple Regression

Equation

The regression equation is

ŷ = 49,764 + 364x1 + 228x2 + 267x3

.

Page 70: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 70

Predicting y-Values

• After finding the equation of the multiple regression

line, you can use the equation to predict y-values over

the range of the data.

• To predict y-values, substitute the given value for

each independent variable into the equation, then

calculate ŷ.

.

Page 71: Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 5Example: Constructing a Scatter Plot An economist wants to determine whether

Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 71

Example: Predicting y-Values

Use the regression equation

ŷ = 49,764 + 364x1 + 228x2 + 267x3

to predict an employee’s salary given 12 years of

current employment, 5 years of experience, and 16

years of education.

Solution:

ŷ = 49,764 + 364(12) + 228(5) + 267(16)

= 59,544

The employee’s predicted salary is $59,544.

.