Chapter 9 - National Paralegal College for...Copyright © 2015, 2012, and 2009 Pearson Education,...
Copyright © 2015, 2012, and 2009 Pearson Education, Inc. 1
Chapter 9: Correlation and Regression
Correlation
Correlation
• A relationship between two variables.
• The data can be represented by ordered pairs (x, y)
x is the independent (or explanatory) variable
y is the dependent (or response) variable
Correlation
A scatter plot can be used to determine whether a linear (straight line) correlation exists between two variables.
Example:
x    1    2    3    4    5
y   –4   –2   –1    0    2
[Figure: scatter plot of the five ordered pairs]
Types of Correlation
• Negative Linear Correlation: As x increases, y tends to decrease.
• Positive Linear Correlation: As x increases, y tends to increase.
• No Correlation
• Nonlinear Correlation
Example: Constructing a Scatter Plot
An economist wants to determine
whether there is a linear relationship
between a country’s gross domestic
product (GDP)
and carbon dioxide (CO2)
emissions. The data are shown in
the table. Display the data in a
scatter plot and determine whether
there appears to be a positive or
negative linear correlation or no
linear correlation. (Source: World
Bank and U.S. Energy Information
Administration)
GDP (trillions of $), x    CO2 emissions (millions of metric tons), y
1.6 428.2
3.6 828.8
4.9 1214.2
1.1 444.6
0.9 264.0
2.9 415.3
2.7 571.8
2.3 454.9
1.6 358.7
1.5 573.5
Solution: Constructing a Scatter Plot
There appears to be a positive linear correlation. As the gross domestic products increase, the carbon dioxide emissions tend to increase.
Example: Constructing a Scatter Plot
Using Technology
Old Faithful, located in
Yellowstone National Park, is the
world’s most famous geyser. The
duration (in minutes) of several of
Old Faithful’s eruptions and the
times (in minutes) until the next
eruption are shown in the table.
Using a TI-83/84, display the data
in a scatter plot. Determine the
type of correlation.
Duration, x    Time, y    Duration, x    Time, y
1.8 56 3.78 79
1.82 58 3.83 85
1.9 62 3.88 80
1.93 56 4.1 89
1.98 57 4.27 90
2.05 57 4.3 89
2.13 60 4.43 89
2.3 57 4.47 86
2.37 61 4.53 89
2.82 73 4.55 86
3.13 76 4.6 92
3.27 77 4.63 91
3.65 77
Solution: Constructing a Scatter Plot
Using Technology
From the scatter plot, it appears that the variables have a positive linear correlation. As the durations of the eruptions increase, the times until the next eruption tend to increase.
Correlation Coefficient
Correlation coefficient
• A measure of the strength and the direction of a linear
relationship between two variables.
• The symbol r represents the sample correlation
coefficient.
• A formula for r is
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$
where n is the number of data pairs.
• The population correlation coefficient is represented by ρ (rho).
Correlation Coefficient
• The range of the correlation coefficient is –1 to 1.
• If r = –1, there is a perfect negative correlation.
• If r is close to 0, there is no linear correlation.
• If r = 1, there is a perfect positive correlation.
Linear Correlation
• Strong negative correlation: r = –0.91
• Strong positive correlation: r = 0.88
• Weak positive correlation: r = 0.42
• No correlation: r = 0.07
Calculating a Correlation Coefficient
In Words | In Symbols
1. Find the sum of the x-values. | Σx
2. Find the sum of the y-values. | Σy
3. Multiply each x-value by its corresponding y-value and find the sum. | Σxy
Calculating a Correlation Coefficient
In Words | In Symbols
4. Square each x-value and find the sum. | Σx²
5. Square each y-value and find the sum. | Σy²
6. Use these five sums to calculate the correlation coefficient. | $r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$
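The six steps above can be sketched in Python. This is a minimal illustration and not part of the original slides: the function name `correlation_coefficient` is an assumption, and the sample data are the five ordered pairs from the scatter plot example earlier in the chapter.

```python
import math

def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r, computed from the five sums."""
    n = len(xs)
    sum_x = sum(xs)                                # step 1
    sum_y = sum(ys)                                # step 2
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # step 3
    sum_x2 = sum(x * x for x in xs)                # step 4
    sum_y2 = sum(y * y for y in ys)                # step 5
    # Step 6: combine the five sums into r.
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4, 5]
ys = [-4, -2, -1, 0, 2]
r = correlation_coefficient(xs, ys)
print(round(r, 3))
```

For these five pairs r is close to 1, which matches the strong positive linear pattern in the scatter plot.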
Example: Calculating the Correlation
Coefficient
Calculate the correlation
coefficient for the gross
domestic products and carbon
dioxide emissions data. What
can you conclude?
GDP (trillions of $), x    CO2 emissions (millions of metric tons), y
1.6 428.2
3.6 828.8
4.9 1214.2
1.1 444.6
0.9 264.0
2.9 415.3
2.7 571.8
2.3 454.9
1.6 358.7
1.5 573.5
Solution: Calculating the Correlation
Coefficient
x    y    xy    x²    y²
1.6 428.2 685.12 2.56 183,355.24
3.6 828.8 2983.68 12.96 686,909.44
4.9 1214.2 5949.58 24.01 1,474,281.64
1.1 444.6 489.06 1.21 197,669.16
0.9 264.0 237.6 0.81 69,696
2.9 415.3 1204.37 8.41 172,474.09
2.7 571.8 1543.86 7.29 326,955.24
2.3 454.9 1046.27 5.29 206,934.01
1.6 358.7 573.92 2.56 128,665.69
1.5 573.5 860.25 2.25 328,902.25
Σx = 23.1    Σy = 5554    Σxy = 15,573.71    Σx² = 67.35    Σy² = 3,775,842.76
Solution: Calculating the Correlation
Coefficient
Σx = 23.1    Σy = 5554    Σxy = 15,573.71    Σx² = 67.35    Σy² = 3,775,842.76
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{10(15{,}573.71) - (23.1)(5554)}{\sqrt{10(67.35) - 23.1^2}\,\sqrt{10(3{,}775{,}842.76) - 5554^2}} = \frac{27{,}439.7}{\sqrt{139.89}\,\sqrt{6{,}911{,}511.6}} \approx 0.882 $$
r ≈ 0.882 suggests a strong positive linear correlation. As the gross domestic product increases, the carbon dioxide emissions tend to increase.
Example: Using Technology to Find a
Correlation Coefficient
Use a technology tool to calculate
the correlation coefficient for the
Old Faithful data. What can you
conclude?
Duration, x    Time, y    Duration, x    Time, y
1.8 56 3.78 79
1.82 58 3.83 85
1.9 62 3.88 80
1.93 56 4.1 89
1.98 57 4.27 90
2.05 57 4.3 89
2.13 60 4.43 89
2.3 57 4.47 86
2.37 61 4.53 89
2.82 73 4.55 86
3.13 76 4.6 92
3.27 77 4.63 91
3.65 77
Solution: Using Technology to Find a
Correlation Coefficient
Use the STAT > Calc menu. To calculate r, you must first enter the DiagnosticOn command found in the Catalog menu.
r ≈ 0.979 suggests a strong positive correlation.
Using a Table to Test a Population
Correlation Coefficient ρ
• Once the sample correlation coefficient r has been
calculated, we need to determine whether there is
enough evidence to decide that the population
correlation coefficient ρ is significant at a specified
level of significance.
• Use Table 11 in Appendix B.
• If |r| is greater than the critical value, there is enough
evidence to decide that the correlation coefficient ρ is
significant.
Using a Table to Test a Population
Correlation Coefficient ρ
• Determine whether ρ is significant for five pairs of data (n = 5) at a level of significance of α = 0.01.
• In the table, find the row for the number of pairs of data in the sample (n = 5) and the column for the level of significance (α = 0.01). The critical value is 0.959.
• If |r| > 0.959, the correlation is significant. Otherwise, there is not enough evidence to conclude that the correlation is significant.
Using a Table to Test a Population
Correlation Coefficient ρ
In Words | In Symbols
1. Determine the number of pairs of data in the sample. | Determine n.
2. Specify the level of significance. | Identify α.
3. Find the critical value. | Use Table 11 in Appendix B.
Using a Table to Test a Population
Correlation Coefficient ρ
In Words | In Symbols
4. Decide if the correlation is significant. | If |r| > critical value, the correlation is significant. Otherwise, there is not enough evidence to support that the correlation is significant.
5. Interpret the decision in the context of the original claim.
Example: Using a Table to Test a
Population Correlation Coefficient ρ
Using the Old Faithful data, you
used 25 pairs of data to find
r ≈ 0.979. Is the correlation
coefficient significant? Use
α = 0.05.
Duration, x    Time, y    Duration, x    Time, y
1.8 56 3.78 79
1.82 58 3.83 85
1.9 62 3.88 80
1.93 56 4.1 89
1.98 57 4.27 90
2.05 57 4.3 89
2.13 60 4.43 89
2.3 57 4.47 86
2.37 61 4.53 89
2.82 73 4.55 86
3.13 76 4.6 92
3.27 77 4.63 91
3.65 77
Solution: Using a Table to Test a
Population Correlation Coefficient ρ
• n = 25, α = 0.05
• |r| ≈ 0.979 > 0.396
• There is enough evidence
at the 5% level of
significance to conclude
that there is a significant
linear correlation between
the duration of Old
Faithful’s eruptions and the
time between eruptions.
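The table test reduces to one comparison, sketched below. The lookup holds only the two critical values quoted in these slides; the names `CRITICAL_VALUES` and `is_significant` are illustrative assumptions, and a fuller version would need the rest of Table 11.

```python
# Critical values quoted in the slides, keyed by (n, alpha).
# Only the two entries that appear in this chapter are included.
CRITICAL_VALUES = {(5, 0.01): 0.959, (25, 0.05): 0.396}

def is_significant(r, n, alpha):
    """Return True when |r| exceeds the tabulated critical value."""
    critical_value = CRITICAL_VALUES[(n, alpha)]
    return abs(r) > critical_value

# Old Faithful example: r ≈ 0.979, n = 25, alpha = 0.05.
old_faithful_significant = is_significant(0.979, 25, 0.05)
print(old_faithful_significant)
```

Taking the absolute value of r means that a strong negative correlation is treated the same way as a strong positive one.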
Hypothesis Testing for a Population
Correlation Coefficient ρ
• A hypothesis test can also be used to determine
whether the sample correlation coefficient r provides
enough evidence to conclude that the population
correlation coefficient ρ is significant at a specified
level of significance.
• A hypothesis test can be one-tailed or two-tailed.
Hypothesis Testing for a Population
Correlation Coefficient ρ
• Left-tailed test
H0: ρ ≥ 0 (no significant negative correlation)
Ha: ρ < 0 (significant negative correlation)
• Right-tailed test
H0: ρ ≤ 0 (no significant positive correlation)
Ha: ρ > 0 (significant positive correlation)
• Two-tailed test
H0: ρ = 0 (no significant correlation)
Ha: ρ ≠ 0 (significant correlation)
The t-Test for the Correlation Coefficient
• Can be used to test whether the correlation between two variables is significant.
• The test statistic is r.
• The standardized test statistic
$$ t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$
follows a t-distribution with d.f. = n – 2.
• In this text, only two-tailed hypothesis tests for ρ are considered.
Using the t-Test for ρ
In Words | In Symbols
1. State the null and alternative hypotheses. | State H0 and Ha.
2. Specify the level of significance. | Identify α.
3. Identify the degrees of freedom. | d.f. = n – 2
4. Determine the critical value(s) and rejection region(s). | Use Table 5 in Appendix B.
Using the t-Test for ρ
In Words | In Symbols
5. Find the standardized test statistic. | $t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
6. Make a decision to reject or fail to reject the null hypothesis. | If t is in the rejection region, reject H0. Otherwise, fail to reject H0.
7. Interpret the decision in the context of the original claim.
Example: t-Test for a Correlation
Coefficient
Previously you calculated
r ≈ 0.882. Test the significance
of this correlation coefficient.
Use α = 0.05.
GDP (trillions of $), x    CO2 emissions (millions of metric tons), y
1.6 428.2
3.6 828.8
4.9 1214.2
1.1 444.6
0.9 264.0
2.9 415.3
2.7 571.8
2.3 454.9
1.6 358.7
1.5 573.5
Solution: t-Test for a Correlation
Coefficient
• H0: ρ = 0
• Ha: ρ ≠ 0
• α = 0.05
• d.f. = 10 – 2 = 8
• Rejection region: t < –2.306 or t > 2.306
• Test statistic:
$$ t = \frac{0.882}{\sqrt{\dfrac{1 - (0.882)^2}{10 - 2}}} \approx 5.294 $$
• Decision: Reject H0. At the 5% level of significance, there is enough evidence to conclude that there is a significant linear correlation between gross domestic products and carbon dioxide emissions.
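The standardized test statistic for this example can be checked with a short sketch. The function name `t_statistic` is an assumption, and the critical value 2.306 is hard-coded as a t-table value for d.f. = 8 and a two-tailed test at α = 0.05 instead of being computed.

```python
import math

def t_statistic(r, n):
    """Standardized test statistic t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r, n = 0.882, 10
t = t_statistic(r, n)

# Two-tailed critical value for d.f. = 8 and alpha = 0.05 (from a t-table).
critical_t = 2.306
reject_h0 = abs(t) > critical_t
print(round(t, 3), reject_h0)
```

Because the test is two-tailed, the decision compares |t| with the critical value, so the same check covers both rejection regions.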
Correlation and Causation
• The fact that two variables are strongly correlated
does not in itself imply a cause-and-effect
relationship between the variables.
• If there is a significant correlation between two
variables, you should consider the following
possibilities.
1. Is there a direct cause-and-effect relationship
between the variables?
• Does x cause y?
2. Is there a reverse cause-and-effect relationship
between the variables?
• Does y cause x?
3. Is it possible that the relationship between the
variables can be caused by a third variable or by a
combination of several other variables?
4. Is it possible that the relationship between two
variables may be a coincidence?
Regression lines
• After verifying that the linear correlation between
two variables is significant, next we determine the
equation of the line that best models the data
(regression line).
• The regression line can be used to predict the value of y for a given value of x.
Residuals
Residual
• The difference between the observed y-value and the
predicted y-value for a given x-value on the line.
For a given x-value,
di = (observed y-value) – (predicted y-value)
[Figure: scatter plot with a regression line; the residuals d1 through d6 are the vertical distances between each observed y-value and the predicted y-value on the line]
Regression line (line of best fit)
• The line for which the sum of the squares of the
residuals is a minimum.
• The equation of a regression line for an independent variable x and a dependent variable y is
ŷ = mx + b
where ŷ is the predicted y-value for a given x-value, m is the slope, and b is the y-intercept.
The Equation of a Regression Line
• ŷ = mx + b, where
$$ m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$
$$ b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n} $$
• ȳ is the mean of the y-values in the data
• x̄ is the mean of the x-values in the data
• The regression line always passes through the point (x̄, ȳ).
Example: Finding the Equation of a
Regression Line
Find the equation of the
regression line for the gross
domestic products and
carbon dioxide emissions
data.
GDP (trillions of $), x    CO2 emissions (millions of metric tons), y
1.6 428.2
3.6 828.8
4.9 1214.2
1.1 444.6
0.9 264.0
2.9 415.3
2.7 571.8
2.3 454.9
1.6 358.7
1.5 573.5
Solution: Finding the Equation of a
Regression Line
Recall from section 9.1:
x    y    xy    x²    y²
1.6 428.2 685.12 2.56 183,355.24
3.6 828.8 2983.68 12.96 686,909.44
4.9 1214.2 5949.58 24.01 1,474,281.64
1.1 444.6 489.06 1.21 197,669.16
0.9 264.0 237.6 0.81 69,696
2.9 415.3 1204.37 8.41 172,474.09
2.7 571.8 1543.86 7.29 326,955.24
2.3 454.9 1046.27 5.29 206,934.01
1.6 358.7 573.92 2.56 128,665.69
1.5 573.5 860.25 2.25 328,902.25
Σx = 23.1    Σy = 5554    Σxy = 15,573.71    Σx² = 67.35    Σy² = 3,775,842.76
Solution: Finding the Equation of a
Regression Line
Σx = 23.1    Σy = 5554    Σxy = 15,573.71    Σx² = 67.35
$$ m = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{10(15{,}573.71) - (23.1)(5554)}{10(67.35) - 23.1^2} = \frac{27{,}439.7}{139.89} \approx 196.151977 $$
$$ b = \bar{y} - m\bar{x} \approx \frac{5554}{10} - (196.151977)\frac{23.1}{10} = 555.4 - (196.151977)(2.31) \approx 102.2889 $$
Equation of the regression line: ŷ = 196.152x + 102.289
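The slope and intercept can be reproduced from the raw data with a few lines of Python. This is a sketch under the assumption that the sum-based formulas for m and b given earlier are applied directly; the function name `regression_line` is illustrative.

```python
def regression_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n   # b = y-bar - m * x-bar
    return m, b

# GDP (trillions of $) and CO2 emissions (millions of metric tons)
gdp = [1.6, 3.6, 4.9, 1.1, 0.9, 2.9, 2.7, 2.3, 1.6, 1.5]
co2 = [428.2, 828.8, 1214.2, 444.6, 264.0, 415.3, 571.8, 454.9, 358.7, 573.5]

m, b = regression_line(gdp, co2)
print(round(m, 3), round(b, 3))
```

Rounded to three decimal places, the results agree with the slope 196.152 and intercept 102.289 found by hand.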
Solution: Finding the Equation of a
Regression Line
• To sketch the regression line, use any two x-values within
the range of the data and calculate the corresponding y-
values from the regression line.
Example: Using Technology to Find a
Regression Equation
Use a technology tool to find the
equation of the regression line for
the Old Faithful data.
Duration, x    Time, y    Duration, x    Time, y
1.8 56 3.78 79
1.82 58 3.83 85
1.9 62 3.88 80
1.93 56 4.1 89
1.98 57 4.27 90
2.05 57 4.3 89
2.13 60 4.43 89
2.3 57 4.47 86
2.37 61 4.53 89
2.82 73 4.55 86
3.13 76 4.6 92
3.27 77 4.63 91
3.65 77
Solution: Using Technology to Find a
Regression Equation
[Figure: TI-83/84 screens showing the regression output and the scatter plot with the regression line]
Example: Predicting y-Values Using
Regression Equations
The regression equation for the gross domestic products
(in trillions of dollars) and carbon dioxide emissions (in
millions of metric tons) data is ŷ = 196.152x + 102.289.
Use this equation to predict the expected carbon dioxide
emissions for the following gross domestic products.
(Recall from section 9.1 that x and y have a significant
linear correlation.)
1. 1.2 trillion dollars
2. 2.0 trillion dollars
3. 2.5 trillion dollars
Solution: Predicting y-Values Using
Regression Equations
ŷ = 196.152x + 102.289
1. 1.2 trillion dollars
ŷ = 196.152(1.2) + 102.289 ≈ 337.671
When the gross domestic product is $1.2 trillion, the CO2 emissions are about 337.671 million metric tons.
2. 2.0 trillion dollars
ŷ = 196.152(2.0) + 102.289 ≈ 494.593
When the gross domestic product is $2.0 trillion, the CO2 emissions are about 494.593 million metric tons.
3. 2.5 trillion dollars
ŷ = 196.152(2.5) + 102.289 ≈ 592.669
When the gross domestic product is $2.5 trillion, the CO2 emissions are about 592.669 million metric tons.
Prediction values are meaningful only for x-values in
(or close to) the range of the data. The x-values in the
original data set range from 0.9 to 4.9. So, it would
not be appropriate to use the regression line to predict
carbon dioxide emissions for gross domestic products
such as $0.2 trillion or $14.5 trillion.
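The three predictions and the range caveat can be combined in one small function. This is a sketch, not the slides' method: the name `predict_co2` is an assumption, and the guard rejects any x-value strictly outside 0.9 to 4.9, which is stricter than the "in (or close to) the range" wording above.

```python
def predict_co2(gdp, m=196.152, b=102.289, x_min=0.9, x_max=4.9):
    """Predict CO2 emissions (millions of metric tons) from GDP (trillions of $).

    Raises ValueError for x-values outside the observed data range,
    a simplifying assumption that forbids all extrapolation.
    """
    if not (x_min <= gdp <= x_max):
        raise ValueError(f"GDP {gdp} is outside the data range {x_min} to {x_max}")
    return m * gdp + b

predictions = [predict_co2(x) for x in (1.2, 2.0, 2.5)]
print([round(p, 3) for p in predictions])
```

Calling `predict_co2(14.5)` raises an error instead of returning a number, which makes the extrapolation warning hard to overlook.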
Variation About a Regression Line
• Three types of variation about a regression line
Total variation
Explained variation
Unexplained variation
• To find the total variation, you must first calculate
The total deviation
The explained deviation
The unexplained deviation
Variation About a Regression Line
For an ordered pair (xi, yi) with predicted value ŷi on the regression line:
Total deviation = yi – ȳ
Explained deviation = ŷi – ȳ
Unexplained deviation = yi – ŷi
[Figure: the total, explained, and unexplained deviations shown as vertical distances for a point (xi, yi) and the point (xi, ŷi) on the regression line]
Total variation
• The sum of the squares of the differences between the y-value of each ordered pair and the mean of y.
$$ \text{Total variation} = \sum (y_i - \bar{y})^2 $$
Explained variation
• The sum of the squares of the differences between each predicted y-value and the mean of y.
$$ \text{Explained variation} = \sum (\hat{y}_i - \bar{y})^2 $$
Unexplained variation
• The sum of the squares of the differences between the y-value of each ordered pair and each corresponding predicted y-value.
$$ \text{Unexplained variation} = \sum (y_i - \hat{y}_i)^2 $$
The sum of the explained and unexplained variation is equal to the total variation.
Total variation = Explained variation + Unexplained variation
Coefficient of Determination
Coefficient of determination
• The ratio of the explained variation to the total
variation.
• Denoted by r²
$$ r^2 = \frac{\text{Explained variation}}{\text{Total variation}} $$
Example: Coefficient of Determination
The correlation coefficient for the gross domestic products and carbon dioxide emissions data is r ≈ 0.882. Find the coefficient of determination. What does this tell you about the explained variation of the data about the regression line? About the unexplained variation?
Solution:
r² ≈ (0.882)² ≈ 0.778
About 78% of the variation in the carbon dioxide emissions can be explained by the variation in the gross domestic products. About 22% of the variation is unexplained.
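The variation identity and the coefficient of determination can be checked numerically. The sketch below fits the least-squares line using deviations from the means, which is algebraically equivalent to the sum-based formulas, and all variable names are illustrative assumptions. Because it uses unrounded values, the ratio comes out near 0.779, slightly different from squaring the rounded r.

```python
# GDP (trillions of $) and CO2 emissions (millions of metric tons)
gdp = [1.6, 3.6, 4.9, 1.1, 0.9, 2.9, 2.7, 2.3, 1.6, 1.5]
co2 = [428.2, 828.8, 1214.2, 444.6, 264.0, 415.3, 571.8, 454.9, 358.7, 573.5]

n = len(gdp)
mean_x = sum(gdp) / n
mean_y = sum(co2) / n

# Least-squares slope and intercept (deviation form).
m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(gdp, co2))
     / sum((x - mean_x) ** 2 for x in gdp))
b = mean_y - m * mean_x
predicted = [m * x + b for x in gdp]

total = sum((y - mean_y) ** 2 for y in co2)
explained = sum((p - mean_y) ** 2 for p in predicted)
unexplained = sum((y - p) ** 2 for y, p in zip(co2, predicted))

r_squared = explained / total
print(round(r_squared, 3))
```

The identity Total = Explained + Unexplained holds here only because the line is the least-squares fit; an arbitrary line would not satisfy it.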
The Standard Error of Estimate
Standard error of estimate
• The standard deviation of the observed yi -values
about the predicted ŷ-value for a given xi -value.
• Denoted by se.
$$ s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} $$
where n is the number of ordered pairs in the data set.
• The closer the observed y-values are to the predicted y-values, the smaller the standard error of estimate will be.
The Standard Error of Estimate
In Words | In Symbols
1. Make a table that includes the column headings shown. | xi, yi, ŷi, (yi – ŷi), (yi – ŷi)²
2. Use the regression equation to calculate the predicted y-values. | ŷi = mxi + b
3. Calculate the sum of the squares of the differences between each observed y-value and the corresponding predicted y-value. | Σ(yi – ŷi)²
4. Find the standard error of estimate. | $s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
Example: Standard Error of Estimate
The regression equation for the gross domestic products
and carbon dioxide emissions data as calculated in
section 9.2 is
ŷ = 196.152x + 102.289
Find the standard error of estimate.
Solution:
Use a table to calculate the sum of the squared
differences of each observed y-value and the
corresponding predicted y-value.
Solution: Standard Error of Estimate
x    y    ŷi    (yi – ŷi)²
1.6 428.2 416.1322 145.63179684
3.6 828.8 808.4362 414.68435044
4.9 1214.2 1063.4338 22,730.44706244
1.1 444.6 318.0562 16,013.33331844
0.9 264.0 278.8258 219.80434564
2.9 415.3 671.1298 65,448.88656804
2.7 571.8 631.8994 3611.93788036
2.3 454.9 553.4386 9709.85568996
1.6 358.7 416.1322 3298.45759684
1.5 573.5 396.517 31,322.982289
Σ(yi – ŷi)² = 152,916.020898 (unexplained variation)
Solution: Standard Error of Estimate
• n = 10, Σ(yi – ŷi)² = 152,916.020898
$$ s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{152{,}916.020898}{10 - 2}} \approx 138.255 $$
The standard error of estimate of the carbon dioxide emissions for a specific gross domestic product is about 138.255 million metric tons.
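The table-based calculation can be condensed into a short function. This is a sketch using the rounded regression coefficients from the slides; the function name `standard_error_of_estimate` is an assumption.

```python
import math

def standard_error_of_estimate(xs, ys, m, b):
    """s_e = sqrt( sum((y_i - y_hat_i)^2) / (n - 2) ) for the line y-hat = m*x + b."""
    n = len(xs)
    squared_residuals = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared_residuals / (n - 2))

# GDP (trillions of $) and CO2 emissions (millions of metric tons)
gdp = [1.6, 3.6, 4.9, 1.1, 0.9, 2.9, 2.7, 2.3, 1.6, 1.5]
co2 = [428.2, 828.8, 1214.2, 444.6, 264.0, 415.3, 571.8, 454.9, 358.7, 573.5]

s_e = standard_error_of_estimate(gdp, co2, m=196.152, b=102.289)
print(round(s_e, 3))
```

The divisor is n – 2 instead of n because two parameters, m and b, were estimated from the same data.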
Prediction Intervals
• Two variables have a bivariate normal distribution
if for any fixed value of x, the corresponding values
of y are normally distributed and for any fixed values
of y, the corresponding x-values are normally
distributed.
Prediction Intervals
• A prediction interval can be constructed for the true
value of y.
• Given a linear regression equation ŷ = mx + b and x0, a specific value of x, a c-prediction interval for y is
ŷ – E < y < ŷ + E
where
$$ E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} $$
• The point estimate is ŷ and the margin of error is E. The probability that the prediction interval contains y is c.
Constructing a Prediction Interval for y
for a Specific Value of x
In Words | In Symbols
1. Identify the number of ordered pairs in the data set n and the degrees of freedom. | d.f. = n – 2
2. Use the regression equation and the given x-value to find the point estimate ŷ. | ŷi = mxi + b
3. Find the critical value tc that corresponds to the given level of confidence c. | Use Table 5 in Appendix B.
Constructing a Prediction Interval for y
for a Specific Value of x
In Words | In Symbols
4. Find the standard error of estimate se. | $s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
5. Find the margin of error E. | $E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}}$
6. Find the left and right endpoints and form the prediction interval. | Left endpoint: ŷ – E; Right endpoint: ŷ + E; Interval: ŷ – E < y < ŷ + E
Example: Constructing a Prediction
Interval
Construct a 95% prediction interval for the carbon
dioxide emission when the gross domestic product is
$3.5 trillion. What can you conclude?
Recall, n = 10, ŷ = 196.152x + 102.289, se = 138.255
Solution:
Point estimate:
ŷ = 196.152(3.5) + 102.289 ≈ 788.821
Critical value:
d.f. = n – 2 = 10 – 2 = 8, tc = 2.306
Σx = 23.1, Σx² = 67.35, x̄ = 2.31
Solution: Constructing a Prediction
Interval
$$ E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - (\sum x)^2}} = (2.306)(138.255)\sqrt{1 + \frac{1}{10} + \frac{10(3.5 - 2.31)^2}{10(67.35) - (23.1)^2}} \approx 349.424 $$
Left endpoint: ŷ – E = 788.821 – 349.424 ≈ 439.397
Right endpoint: ŷ + E = 788.821 + 349.424 ≈ 1138.245
439.397 < y < 1138.245
You can be 95% confident that when the gross domestic product is $3.5 trillion, the carbon dioxide emissions will be between 439.397 and 1138.245 million metric tons.
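The margin of error and the interval endpoints can be reproduced with the sketch below. It takes the critical value tc = 2.306 and the standard error se = 138.255 as given inputs instead of deriving them, and the function name `margin_of_error` is an assumption.

```python
import math

def margin_of_error(x0, n, sum_x, sum_x2, t_c, s_e):
    """E = t_c * s_e * sqrt(1 + 1/n + n*(x0 - x_bar)^2 / (n*sum_x2 - sum_x^2))."""
    mean_x = sum_x / n
    spread = n * sum_x2 - sum_x ** 2
    return t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - mean_x) ** 2 / spread)

x0 = 3.5                              # GDP of $3.5 trillion
y_hat = 196.152 * x0 + 102.289        # point estimate
E = margin_of_error(x0, n=10, sum_x=23.1, sum_x2=67.35, t_c=2.306, s_e=138.255)
interval = (y_hat - E, y_hat + E)
print(round(E, 3), tuple(round(v, 3) for v in interval))
```

The term n(x0 – x̄)² in the formula makes the interval widen as x0 moves away from the mean of the x-values.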
Multiple Regression Equation
• In many instances, a better prediction can be found
for a dependent (response) variable by using more
than one independent (explanatory) variable.
• For example, a more accurate prediction for the
company sales discussed in previous sections might
be made by considering the number of employees on
the sales staff as well as the advertising expenses.
Multiple Regression Equation
Multiple regression equation
• ŷ = b + m1x1 + m2x2 + m3x3 + … + mkxk
• x1, x2, x3,…, xk are independent variables
• b is the y-intercept
• y is the dependent variable
* Because the mathematics associated with this concept is
complicated, technology is generally used to calculate the
multiple regression equation.
Example: Finding a Multiple Regression
Equation
A researcher wants to determine how employee salaries
at a certain company are related to the length of
employment, previous experience, and education. The
researcher selects eight employees from the company
and obtains the data shown on the next slide. Use
Minitab to find a multiple regression equation that
models the data.
Employee    Salary, y    Employment (yrs), x1    Experience (yrs), x2    Education (yrs), x3
A 57,310 10 2 16
B 57,380 5 6 16
C 54,135 3 1 12
D 56,985 6 5 14
E 58,715 8 8 16
F 60,620 20 0 12
G 59,200 8 4 18
H 60,320 14 6 17
Solution: Finding a Multiple Regression
Equation
• Enter the y-values in C1 and the x1-, x2-, and x3-
values in C2, C3 and C4 respectively.
• Select “Regression > Regression…” from the Stat
menu.
• Use the salaries as the response variable and the
remaining data as the predictors.
Solution: Finding a Multiple Regression
Equation
The regression equation is
ŷ = 49,764 + 364x1 + 228x2 + 267x3
Predicting y-Values
• After finding the equation of the multiple regression
line, you can use the equation to predict y-values over
the range of the data.
• To predict y-values, substitute the given value for
each independent variable into the equation, then
calculate ŷ.
Example: Predicting y-Values
Use the regression equation
ŷ = 49,764 + 364x1 + 228x2 + 267x3
to predict an employee’s salary given 12 years of
current employment, 5 years of experience, and 16
years of education.
Solution:
ŷ = 49,764 + 364(12) + 228(5) + 267(16)
= 59,544
The employee’s predicted salary is $59,544.
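The substitution can be written as a small function. This is a sketch: the coefficients come from the regression equation above, while the function and parameter names are illustrative assumptions.

```python
def predict_salary(employment_years, experience_years, education_years):
    """Predicted salary from y-hat = 49,764 + 364*x1 + 228*x2 + 267*x3."""
    return (49764
            + 364 * employment_years
            + 228 * experience_years
            + 267 * education_years)

salary = predict_salary(employment_years=12, experience_years=5, education_years=16)
print(salary)  # 59544
```

As with simple regression, the prediction is meaningful only when each input lies within the range of the corresponding data.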