Lecture 4: Inference in SLR (continued) Diagnostic approaches in SLR
Ch 15 - SLR - Part 2 - Rev_11_22_13
Simple Linear Regression, Part 2
(Selected material from Chapter 15)
to accompany
Managerial Statistics, 7th edition, by Ronald M. Weiers
Prepared by Professor John Knox
For TOM 302
Cal Poly, Pomona
Simple Linear Regression – Part 2
Chapter 15
• Standard Error of Estimate
• Coefficient of Determination
• Correlation
• Test of Significance for Slope
• Confidence & Prediction Intervals
• Statistix 9
McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.
Simple Linear Regression
Example Problem (Restaurant Sales):
Prior to opening a new restaurant, the management of a chain of restaurants requires an estimate of the quarterly sales revenue. The management believes that the size of the student population at the nearby college campus is related to the quarterly sales revenue. To evaluate the relationship between student population (x) and quarterly sales (y), data are collected from a sample of ten restaurants located near college campuses.
Calculation of sample regression equation:

     i     x_i      y_i    x_i*y_i     x_i^2      y_i^2
     1       2       58        116         4      3,364
     2       6      105        630        36     11,025
     3       8       88        704        64      7,744
     4       8      118        944        64     13,924
     5      12      117      1,404       144     13,689
     6      16      137      2,192       256     18,769
     7      20      157      3,140       400     24,649
     8      20      169      3,380       400     28,561
     9      22      149      3,278       484     22,201
    10      26      202      5,252       676     40,804
Totals:    140    1,300     21,040     2,528    184,730
\[ b_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{21{,}040 - \frac{(140)(1{,}300)}{10}}{2{,}528 - \frac{(140)^2}{10}} = \frac{2{,}840}{568} = 5.0000 \]

\[ b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum y_i}{n} - b_1 \frac{\sum x_i}{n} = \frac{1{,}300}{10} - 5\left(\frac{140}{10}\right) = 130 - 5(14) = 60.0000 \]

Sample Regression Equation: \( \hat{y}_i = 60 + 5x_i \)

where \( \hat{y}_i \) = quarterly sales in thousands of dollars
      \( x_i \) = student population in thousands of students
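The slope and intercept calculation can be checked with a short script. This is a minimal sketch in plain Python, assuming the ten (x, y) pairs from the table; the function name `fit_line` is illustrative and not part of the course material.

```python
# Restaurant data from the table: student population (thousands of students)
# and quarterly sales (thousands of dollars).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]


def fit_line(x, y):
    """Return (b0, b1) using the computational least-squares formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (sum(xy) - sum(x)sum(y)/n) / (sum(x^2) - sum(x)^2/n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: Y-bar minus slope times X-bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = fit_line(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

The printed coefficients should agree with the hand calculation (intercept 60, slope 5).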
[Figure: Statistix 10 Linear Regression Output]
Standard error of estimate (estimated standard deviation of population data around regression line):

\[ s_{y|x} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}} = \sqrt{MSE} \]

where \( y_i \) = actual value of Y (i-th value of Y in the sample)
      \( \hat{y}_i \) = predicted value of Y (calculated value of Y using the sample regression equation with the i-th value of X in the sample)
Standard error of estimate
Alternate formula (computational formula):

\[ s_{y|x} = \sqrt{\frac{\sum_{i=1}^{n} y_i^2 - b_0 \sum_{i=1}^{n} y_i - b_1 \sum_{i=1}^{n} x_i y_i}{n-2}} \]

Example Problem (Restaurant Sales):

\[ s_{y|x} = \sqrt{\frac{184{,}730 - 60(1{,}300) - 5.0(21{,}040)}{10-2}} = \sqrt{\frac{1{,}530}{8}} = \sqrt{191.25} = 13.8293 \]

Standard error of estimate value is 13.83 units of Y ($13,830).
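As a check, the definitional and computational formulas for the standard error of estimate can be compared numerically. The sketch below assumes the restaurant data and the fitted coefficients b0 = 60 and b1 = 5; the variable names are illustrative.

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60.0, 5.0  # coefficients of the sample regression equation
n = len(x)

# Definition: SSE is the sum of squared residuals (y_i - y_hat_i)^2.
sse_definition = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Computational formula: sum(y^2) - b0*sum(y) - b1*sum(x*y).
sse_computational = (
    sum(yi * yi for yi in y)
    - b0 * sum(y)
    - b1 * sum(xi * yi for xi, yi in zip(x, y))
)

# Standard error of estimate: sqrt(SSE / (n - 2)).
s_yx = math.sqrt(sse_definition / (n - 2))
print(sse_definition, sse_computational, round(s_yx, 4))
```

Both versions of SSE should come out to 1,530, giving a standard error of estimate of about 13.83.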
Standard error of estimate can be compared with sample standard deviation of Y-values (s_y).

\[ s_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{Y})^2}{n-1}} = \sqrt{\frac{SST}{n-1}} \]

\[ s_y = \sqrt{\frac{15{,}730}{10-1}} = \sqrt{\frac{15{,}730}{9}} = \sqrt{1{,}747.78} = 41.8064 \]

Standard error of estimate value is 13.83 ($13,830), which is much smaller than sample standard deviation of 41.81 ($41,810).
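The comparison between the two measures of spread can be reproduced as follows. This is a sketch only: it takes SSE = 1,530 from the earlier calculation as given, and uses the standard library's `statistics.stdev` (which divides by n − 1) as a cross-check on the hand formula.

```python
import math
import statistics

y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(y)

# Total variation around the mean of Y.
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)

# Sample standard deviation of Y (ignores x entirely).
s_y = math.sqrt(sst / (n - 1))
s_y_check = statistics.stdev(y)  # same n - 1 divisor

# Standard error of estimate (spread around the regression line).
s_yx = math.sqrt(1530 / (n - 2))

print(round(sst, 2), round(s_y, 4), round(s_yx, 4))
```

The spread around the regression line is roughly one third of the spread around the mean of Y, which is the point of the comparison.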
[Figure: Statistix 10 Linear Regression Output]
• Coefficient of Determination: The proportion of the variation in the dependent variable that is explained by the independent variable.

Total variation (SST) = Unexplained variation (SSE) + Explained variation (SSR)

Total variation = SST = \( \sum (y_i - \bar{Y})^2 \)
Unexplained variation = SSE = \( \sum (y_i - \hat{y}_i)^2 \)
Explained variation = SSR = \( \sum (\hat{y}_i - \bar{Y})^2 \)
• Coefficient of Determination:

\[ r^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{SSR}{SST} \]

\[ r^2 = 1 - \frac{\text{unexplained variation}}{\text{total variation}} = 1 - \frac{SSE}{SST} \]

\[ r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n} y_i^2 - b_0 \sum_{i=1}^{n} y_i - b_1 \sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}} \]
• Coefficient of Determination – Example Problem:

\[ r^2 = 1 - \frac{\sum y_i^2 - b_0 \sum y_i - b_1 \sum x_i y_i}{\sum y_i^2 - \frac{(\sum y_i)^2}{n}} = 1 - \frac{184{,}730 - 60(1{,}300) - 5.0(21{,}040)}{184{,}730 - \frac{(1{,}300)^2}{10}} \]

\[ r^2 = 1 - \frac{1{,}530}{15{,}730} = 1 - 0.0973 = 0.9027 \]

Approximately 90% of the variation in quarterly sales can be explained by the influence of the student population.
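The variation decomposition and both forms of r² can be verified directly from the data. The following is a minimal sketch assuming the restaurant data and the fitted line ŷ = 60 + 5x; all variable names are illustrative.

```python
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60.0, 5.0  # coefficients of the sample regression equation

y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation

r2_from_ssr = ssr / sst       # explained / total
r2_from_sse = 1 - sse / sst   # 1 - unexplained / total

print(sst, sse, ssr)
print(round(r2_from_ssr, 4), round(r2_from_sse, 4))
```

SSE and SSR should add up to SST, and both expressions for r² should give about 0.9027.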
[Figure: Statistix 10 Linear Regression Output]
Correlation Analysis – used to measure the strength of association between X and Y.
(Note: Correlation analysis does not establish a cause and effect relationship between X and Y.)
Coefficient of correlation (r) is a measure of the strength of the linear relationship between X and Y.
The sign of r matches the sign of the slope b1:
b1 > 0 and r > 0
b1 < 0 and r < 0
b1 = 0 and r = 0

\[ r = \pm\sqrt{r^2} \quad \text{where } r^2 = \text{coefficient of determination} \]

If b1 > 0, then r > 0; if b1 < 0, then r < 0.
• Coefficient of Correlation – Example Problem:

\[ r^2 = 0.9027 \text{ and } b_1 = 5.00 > 0, \text{ so } r = +\sqrt{0.9027} = 0.9501 \]

Coefficient of correlation (r) ranges from −1 to +1.
−1 indicates perfect negative correlation.
+1 indicates perfect positive correlation.
0 indicates no linear correlation.
The closer r is to −1 or +1, the stronger is the association between X and Y.
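The rule "take the square root of r² and attach the sign of the slope" can be sketched in a few lines. As a cross-check, the sketch also computes r directly from the deviation sums (the usual Pearson formula, which is not shown on the slides and is included here only as an assumption-labelled verification).

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

r_squared = 1 - 1530 / 15730  # 1 - SSE/SST from the example problem
b1 = 5.0                      # fitted slope

# r takes the magnitude of sqrt(r^2) and the sign of the slope.
r = math.copysign(math.sqrt(r_squared), b1)

# Cross-check (not from the slides): Pearson r from deviation sums.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
r_direct = sxy / math.sqrt(sxx * syy)

print(round(r, 4), round(r_direct, 4))
```

Both routes should give r of about 0.9501 for this data set.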
• Hypothesis Test of Population Slope (β1) – Example Problem:
Test the hypothesis that there is no linear relationship between student population (X) and quarterly sales (Y) using a 0.05 level of significance.
If there is no linear relationship between the student population (x) and the quarterly sales (y), then β1 = 0.

Hypotheses: \( H_0: \beta_1 = 0 \)
            \( H_1: \beta_1 \neq 0 \)
Location of rejection regions: two-tail test
Level of significance (α) = 0.05
\( df = n - 2 = 10 - 2 = 8 \)
Critical values: \( t_{cv1} = -2.306 \) and \( t_{cv2} = +2.306 \)
Decision rule: If the calculated t from the sample is less than −2.306 or greater than 2.306, then reject H0; otherwise do not reject H0.
Alternate decision rule using p-value: If the two-tail p-value is less than 0.05, then reject H0; otherwise do not reject H0.
\[ t = \frac{b_1 - \beta_1}{s_{b_1}} \]

where \( b_1 = 5.0000 \) and \( \beta_1 = 0 \) (the value hypothesized under H0)

\[ s_{b_1} = \frac{s_{y|x}}{\sqrt{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}} = \frac{13.82932}{\sqrt{2{,}528 - \frac{(140)^2}{10}}} = \frac{13.82932}{\sqrt{568}} = 0.58027 \]

\[ t = \frac{5.0000 - 0}{0.58027} = 8.6167 \]

8.6167 > 2.306, so reject H0.
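The test statistic and decision rule can be sketched as below. The critical value 2.306 is taken from the slides rather than computed, and the summary figures (SSE = 1,530, Σx = 140, Σx² = 2,528) come from the earlier calculations; variable names are illustrative.

```python
import math

n = 10
b1 = 5.0            # sample slope
beta1_null = 0.0    # slope under H0
s_yx = math.sqrt(1530 / (n - 2))   # standard error of estimate
sxx = 2528 - 140 ** 2 / n          # sum(x^2) - (sum x)^2 / n = 568

# Standard error of the slope and the t statistic.
s_b1 = s_yx / math.sqrt(sxx)
t_stat = (b1 - beta1_null) / s_b1

# Two-tail decision rule at alpha = 0.05 with df = 8 (critical value from the slides).
t_crit = 2.306
reject_h0 = abs(t_stat) > t_crit

print(round(s_b1, 5), round(t_stat, 4), reject_h0)
```

With these inputs the statistic is far beyond the critical value, so H0 is rejected.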
At the 0.05 level of significance, there is sufficient sample evidence to conclude that there is a linear relationship between the student population (x) and the quarterly sales (y).
Using computer output, the two-tail p-value is reported as 0.0000 (to four decimal places), which is less than 0.05; so reject H0 (same decision as above).
Statistix 9 linear regression output:
b1 = 5.00000    s_b1 = 0.58027    t = 8.62    p-value = 0.0000
[Figure: Statistix 10 Linear Regression Output]
• Confidence Interval for Slope (β1) – Example Problem:
Calculate the 95% confidence interval estimate for the population slope where student population (X) is the independent variable and quarterly sales (Y) is the dependent variable.
\[ b_1 \pm t\, s_{b_1} \]

Confidence level = 0.95, so the area in each tail is 0.025.
\( df = n - 2 = 10 - 2 = 8 \), so \( t = 2.306 \)
\( b_1 = 5.0 \)
\( s_{b_1} = 0.5803 \) (see H0 test of β1 for calculation)

\[ b_1 \pm t\, s_{b_1} = 5.0 \pm 2.306(0.5803) = 5.0 \pm 1.3382 = 3.6618 \text{ to } 6.3382 \]

We are 95% confident that the slope of the population regression line is within the interval 3.6618 to 6.3382. An increase in the student population of one thousand students is associated with an expected increase in quarterly sales of between $3,662 and $6,338.
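A short sketch of the slope interval follows, using the t value 2.306 from the slides. Because the script carries the standard error of the slope at full precision instead of the rounded 0.5803, the endpoints may differ from the slide values in the fourth decimal place.

```python
import math

n = 10
b1 = 5.0
s_yx = math.sqrt(1530 / (n - 2))          # standard error of estimate
s_b1 = s_yx / math.sqrt(2528 - 140 ** 2 / n)  # standard error of the slope

t_value = 2.306  # 95% confidence, df = 8 (from the slides)
margin = t_value * s_b1
lower, upper = b1 - margin, b1 + margin

# Slope is in thousands of dollars per thousand students,
# so multiply by 1,000 to express the change in dollars.
print(f"{lower:.4f} to {upper:.4f}")
print(f"${lower * 1000:,.0f} to ${upper * 1000:,.0f} per 1,000 additional students")
```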
• Confidence Interval for μy|x – Example Problem:
Calculate the 90% confidence interval estimate for mean quarterly sales of all restaurants located near college campuses with 8,000 students.
\[ \text{CI of } \mu_{y|x} = \hat{y}_i \pm t\, s_{y|x} \sqrt{\frac{1}{n} + \frac{(x_i - \bar{X})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}} \]

x = 8,000 students
Adjustment for scaling factor (x): 8,000 students ÷ 1,000 students per unit of x = 8 units of x

\[ \hat{y}_i = b_0 + b_1 x_i = 60 + 5.0(8) = 100 \]

Previously determined values: \( s_{y|x} = 13.82932 \), \( \bar{X} = 14.0 \)
Confidence level = 0.90, so the area in each tail is 0.05.
\( df = 10 - 2 = 8 \), so \( t = 1.860 \)

\[ 90\% \text{ CI of } \mu_{y|x} = 100 \pm (1.860)(13.82932) \sqrt{\frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}} \]

\[ \text{Standard error (SE)} = (13.82932) \sqrt{\frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}} = (13.82932)\sqrt{0.16338} = 5.58985 \]
[Figure: 90% limits for mean; 90% limits for individual predicted values]
Margin of error (e) = t(SE) = (1.860)(5.58985) = 10.3971
90% CI of μy|x = 100 ± 10.3971 = 89.6029 to 110.3971
Adjustment for scaling factor (y):
(89.603 units of y)($1,000 per unit of y) = $89,603
(110.397 units of y)($1,000 per unit of y) = $110,397
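The confidence interval for the mean response at 8,000 students can be reproduced with the sketch below. It assumes the summary values from the example (n = 10, Σx = 140, Σx² = 2,528, SSE = 1,530) and the t value 1.860 from the slides; names such as `se_mean` are illustrative.

```python
import math

n, sum_x, sum_x2 = 10, 140, 2528
b0, b1 = 60.0, 5.0
s_yx = math.sqrt(1530 / (n - 2))  # standard error of estimate
t_value = 1.860                   # 90% confidence, df = 8 (from the slides)

x0 = 8                            # 8,000 students in units of 1,000
x_bar = sum_x / n
sxx = sum_x2 - sum_x ** 2 / n

y_hat = b0 + b1 * x0
# Standard error of the estimated mean of y at x0.
se_mean = s_yx * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
margin = t_value * se_mean
lower, upper = y_hat - margin, y_hat + margin

# Convert from thousands of dollars to dollars.
print(f"${lower * 1000:,.0f} to ${upper * 1000:,.0f}")
```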
Interpretation of confidence interval:
We are 90 percent confident that the average quarterly sales of all restaurants that are located near college campuses with 8,000 students is within the interval of $89,603 to $110,397.
[Figure: Example Problem - Statistix 10 Confidence Interval]
• Prediction Interval for Individual Yx – Example Problem:
Calculate the 90% prediction interval estimate for the quarterly sales of a particular restaurant located near a college campus with 8,000 students.
\[ \text{PI of } y_x = \hat{y}_i \pm t\, s_{y|x} \sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{X})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}} \]

\[ \hat{y}_i = b_0 + b_1 x_i = 60 + 5.0(8) = 100 \]

Previously determined values: \( s_{y|x} = 13.82932 \), \( \bar{X} = 14.0 \), \( t = 1.860 \)
\[ 90\% \text{ PI of } y_x = 100 \pm (1.860)(13.82932) \sqrt{1 + \frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}} \]

\[ \text{Standard error (SE)} = (13.82932) \sqrt{1 + \frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}} = (13.82932)\sqrt{1.16338} = 14.91632 \]

Margin of error (e) = t(SE) = (1.860)(14.91632) = 27.74436
90% PI of y_x = 100 ± 27.74436 = 72.256 to 127.744
Adjustment for scaling factor (y):
(72.256 units of y)($1,000 per unit of y) = $72,256
(127.744 units of y)($1,000 per unit of y) = $127,744

Interpretation of prediction interval:
We are 90 percent confident that the quarterly sales of a restaurant that is located near a college campus with 8,000 students is within the interval of $72,256 to $127,744.
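The prediction interval differs from the confidence interval for the mean only by the extra "1 +" under the square root. The sketch below computes both standard errors side by side to make that visible; it assumes the same summary values and the t value 1.860 from the slides, and the variable names are illustrative.

```python
import math

n, sum_x, sum_x2 = 10, 140, 2528
b0, b1 = 60.0, 5.0
s_yx = math.sqrt(1530 / (n - 2))  # standard error of estimate
t_value = 1.860                   # 90% level, df = 8 (from the slides)

x0 = 8                            # 8,000 students in units of 1,000
x_bar = sum_x / n
sxx = sum_x2 - sum_x ** 2 / n
leverage = 1 / n + (x0 - x_bar) ** 2 / sxx

y_hat = b0 + b1 * x0
se_mean = s_yx * math.sqrt(leverage)      # for the mean of y at x0
se_pred = s_yx * math.sqrt(1 + leverage)  # for one individual y at x0

pi_margin = t_value * se_pred
pi_lower, pi_upper = y_hat - pi_margin, y_hat + pi_margin

print(round(se_mean, 5), round(se_pred, 5))
print(f"${pi_lower * 1000:,.0f} to ${pi_upper * 1000:,.0f}")
```

The extra term accounts for the scatter of an individual restaurant around the regression line, which is why the prediction interval is always the wider of the two.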
[Figure: Example Problem - Statistix 10 Prediction Interval]
Note: The 90% prediction interval for yi is wider than the 90% confidence interval for μy|x (where yi is the value of y for an individual element of the population and μy|x is the average value of y for a subset of the population having the same value of x).
[Figure: 90% limits for mean; 90% limits for individual predicted values]
In this example problem, simple linear regression produced a confidence interval that is narrower and centered on a different value than we would have gotten with a simple confidence interval based solely on a sample of y-values.

Simple 90% confidence interval of μy = 130 ± 24.23 = 105.77 to 154.23
SLR 90% confidence interval of μy|x = 100 ± 10.40 = 89.60 to 110.40
Midpoint of simple 90% confidence interval = 130
Midpoint of SLR 90% confidence interval = 100
Margin of error for simple 90% confidence interval = 24.23
Margin of error for SLR 90% confidence interval = 10.40
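The two margins of error can be recomputed side by side. In this sketch the t value 1.833 (90% confidence, df = 9) for the simple interval is an assumption taken from a standard t table, since the slides do not show it; the value 1.860 (df = 8) is from the slides.

```python
import math
import statistics

y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(y)

# Simple interval for the mean of y, ignoring x: y_bar +/- t * s_y / sqrt(n).
t_df9 = 1.833  # assumed from a standard t table (90%, df = n - 1 = 9)
y_bar = statistics.mean(y)
simple_margin = t_df9 * statistics.stdev(y) / math.sqrt(n)

# SLR interval for the mean of y at x = 8 (thousand students).
t_df8 = 1.860  # from the slides (90%, df = n - 2 = 8)
s_yx = math.sqrt(1530 / (n - 2))
slr_center = 60 + 5 * 8
slr_margin = t_df8 * s_yx * math.sqrt(1 / n + (8 - 14) ** 2 / 568)

print(f"simple: {y_bar:.0f} +/- {simple_margin:.2f}")
print(f"SLR:    {slr_center} +/- {slr_margin:.2f}")
```

Using the student population as a predictor both shifts the center of the interval (from 130 to 100) and cuts the margin of error by more than half.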
Managerial Statistics
End of Simple Linear Regression
Part 2