Ch 15 - SLR - Part 2 - Rev_11_22_13

Simple Linear Regression, Part 2 (Selected material from Chapter 15) to accompany Managerial Statistics, 7th edition, by Ronald M. Weiers. Prepared by Professor John Knox for TOM 302, Cal Poly, Pomona

description

A PowerPoint presentation about simple linear regression.

Transcript of Ch 15 - SLR - Part 2 - Rev_11_22_13

Page 1: Ch 15 - SLR - Part 2 - Rev_11_22_13


Page 2: Ch 15 - SLR - Part 2 - Rev_11_22_13

Simple Linear Regression – Part 2

Chapter 15

• Standard Error of Estimate
• Coefficient of Determination
• Correlation
• Test of Significance for Slope
• Confidence & Prediction Intervals
• Statistix 9

McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.

Page 3: Ch 15 - SLR - Part 2 - Rev_11_22_13

Simple Linear Regression

Example Problem (Restaurant Sales):

Prior to opening a new restaurant, the management of a chain of restaurants requires an estimate of the quarterly sales revenue. The management believes that the size of the student population at the nearby college campus is related to the quarterly sales revenue. To evaluate the relationship between student population (x) and quarterly sales (y), data are collected from a sample of ten restaurants located near college campuses.

Page 4: Ch 15 - SLR - Part 2 - Rev_11_22_13


Page 5: Ch 15 - SLR - Part 2 - Rev_11_22_13


Page 6: Ch 15 - SLR - Part 2 - Rev_11_22_13


Calculation of sample regression equation:

  i   x_i   y_i   x_i·y_i   x_i²     y_i²
  1     2    58       116      4    3,364
  2     6   105       630     36   11,025
  3     8    88       704     64    7,744
  4     8   118       944     64   13,924
  5    12   117     1,404    144   13,689
  6    16   137     2,192    256   18,769
  7    20   157     3,140    400   24,649
  8    20   169     3,380    400   28,561
  9    22   149     3,278    484   22,201
 10    26   202     5,252    676   40,804

Totals: Σx_i = 140; Σy_i = 1,300; Σx_i·y_i = 21,040; Σx_i² = 2,528; Σy_i² = 184,730

Page 7: Ch 15 - SLR - Part 2 - Rev_11_22_13


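$$b_1 = \frac{\sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}}{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}} = \frac{21{,}040 - \frac{(140)(1{,}300)}{10}}{2{,}528 - \frac{(140)^2}{10}} = \frac{2{,}840}{568} = 5.0000$$

$$b_0 = \bar{Y} - b_1\bar{X} = \frac{\sum y_i}{n} - b_1\frac{\sum x_i}{n} = \frac{1{,}300}{10} - 5\left(\frac{140}{10}\right) = 130 - 5(14) = 60.0000$$

Sample Regression Equation: $\hat{y}_i = 60 + 5x_i$

where $\hat{y}_i$ = quarterly sales in thousands of dollars
$x_i$ = student population in thousands of students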
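The arithmetic above can be checked with a short Python sketch. It assumes only the ten (x, y) pairs from the calculation table and recomputes the column totals and the least-squares coefficients with the same computational formulas; variable names such as `sum_xy` are illustrative, not from the slides.

```python
# Restaurant Sales example (data from the calculation table):
# x = student population (thousands), y = quarterly sales ($1,000s)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

# Column totals
sum_x = sum(x)                                 # 140
sum_y = sum(y)                                 # 1,300
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # 21,040
sum_x2 = sum(xi ** 2 for xi in x)              # 2,528

# Least-squares slope and intercept (computational formulas)
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

print(f"y_hat = {b0:.4f} + {b1:.4f} x")  # y_hat = 60.0000 + 5.0000 x
```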

Page 8: Ch 15 - SLR - Part 2 - Rev_11_22_13


Statistix 10 Linear Regression Output:

Page 9: Ch 15 - SLR - Part 2 - Rev_11_22_13


Standard error of estimate (estimated standard deviation of population data around regression line):

$$s_{y|x} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{n-2}} = \sqrt{\frac{\text{SSE}}{n-2}} = \sqrt{\text{MSE}}$$

where $y_i$ = actual value of Y ($i$th value of Y in the sample)
$\hat{y}_i$ = predicted value of Y (calculated value of Y using the sample regression equation with the $i$th value of X in the sample)

Page 10: Ch 15 - SLR - Part 2 - Rev_11_22_13


Standard error of estimate
Alternate formula (computational formula):

$$s_{y|x} = \sqrt{\frac{\sum_{i=1}^{n} y_i^2 - b_0\sum_{i=1}^{n} y_i - b_1\sum_{i=1}^{n} x_i y_i}{n-2}}$$

Example Problem (Restaurant Sales):

$$s_{y|x} = \sqrt{\frac{184{,}730 - 60(1{,}300) - 5.0(21{,}040)}{10-2}} = \sqrt{\frac{1{,}530}{8}} = \sqrt{191.25} = 13.8293$$

Standard error of estimate value is 13.83 units of Y ($13,830).
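As a cross-check, this Python sketch (same ten data pairs; b0 = 60 and b1 = 5 taken from the sample regression equation) computes SSE both from the residuals, as in the definitional formula, and from the column totals, as in the computational formula, then takes the square root of SSE/(n − 2). It is an illustration, not the Statistix computation.

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)
b0, b1 = 60.0, 5.0  # sample regression coefficients

# Definitional form: sum of squared residuals (y_i - y_hat_i)^2
sse_def = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Computational form: sum(y^2) - b0*sum(y) - b1*sum(xy)
sse_comp = (sum(yi ** 2 for yi in y)
            - b0 * sum(y)
            - b1 * sum(xi * yi for xi, yi in zip(x, y)))

mse = sse_comp / (n - 2)
s_yx = math.sqrt(mse)  # standard error of estimate

print(sse_def, sse_comp, round(s_yx, 4))  # 1530.0 1530.0 13.8293
```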

Page 11: Ch 15 - SLR - Part 2 - Rev_11_22_13


Standard error of estimate can be compared with sample standard deviation of Y-values ($s_y$).

$$s_y = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \bar{Y}\right)^2}{n-1}} = \sqrt{\frac{\text{SST}}{n-1}} = \sqrt{\frac{15{,}730}{10-1}} = \sqrt{\frac{15{,}730}{9}} = \sqrt{1{,}747.78} = 41.8064$$

Standard error of estimate value is 13.83 ($13,830), which is much smaller than the sample standard deviation of 41.81 ($41,810).
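The comparison can be reproduced with Python's standard `statistics` module. This sketch assumes the ten sales values from the table and reuses SSE = 1,530 from the standard-error slide; `statistics.stdev` uses the n − 1 denominator, matching the formula for $s_y$.

```python
import math
import statistics

y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(y)

y_bar = statistics.mean(y)                 # 130
sst = sum((yi - y_bar) ** 2 for yi in y)   # 15,730
s_y = statistics.stdev(y)                  # sample std. dev. (n - 1 denominator)
s_yx = math.sqrt(1530 / (n - 2))           # standard error of estimate, SSE = 1,530

print(round(s_y, 4), round(s_yx, 4))  # 41.8064 13.8293
```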

Page 12: Ch 15 - SLR - Part 2 - Rev_11_22_13


Statistix 10 Linear Regression Output:

Page 13: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Coefficient of Determination: The proportion of the variation in the dependent variable that is explained by the independent variable.

Total variation (SST) = Unexplained variation (SSE) + Explained variation (SSR)

Total variation = SST = $\sum\left(y_i - \bar{Y}\right)^2$
Unexplained variation = SSE = $\sum\left(y_i - \hat{y}_i\right)^2$
Explained variation = SSR = $\sum\left(\hat{y}_i - \bar{Y}\right)^2$
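A quick numeric check of the decomposition for the example, sketched in Python. It assumes the fitted values come from $\hat{y}_i = 60 + 5x_i$; the identity SST = SSE + SSR holds exactly only because these coefficients are the least-squares fit.

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60.0, 5.0                     # sample regression coefficients
y_bar = sum(y) / len(y)                # mean of Y = 130
y_hat = [b0 + b1 * xi for xi in x]     # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation

print(sst, sse, ssr)                   # 15730.0 1530.0 14200.0
print(math.isclose(sst, sse + ssr))    # True
```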

Page 14: Ch 15 - SLR - Part 2 - Rev_11_22_13


Page 15: Ch 15 - SLR - Part 2 - Rev_11_22_13


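• Coefficient of Determination:

$$r^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{\text{SSR}}{\text{SST}}$$

$$r^2 = 1 - \frac{\text{unexplained variation}}{\text{total variation}} = 1 - \frac{\text{SSE}}{\text{SST}}$$

$$r^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{Y}\right)^2} = 1 - \frac{\sum_{i=1}^{n} y_i^2 - b_0\sum_{i=1}^{n} y_i - b_1\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}$$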

Page 16: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Coefficient of Determination – Example Problem:

$$r^2 = 1 - \frac{\sum_{i=1}^{n} y_i^2 - b_0\sum_{i=1}^{n} y_i - b_1\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}} = 1 - \frac{184{,}730 - 60(1{,}300) - 5.0(21{,}040)}{184{,}730 - \frac{(1{,}300)^2}{10}}$$

$$r^2 = 1 - \frac{1{,}530}{15{,}730} = 1 - 0.0973 = 0.9027$$

Approximately 90% of the variation in quarterly sales can be explained by the influence of the student population.
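The same result follows from the column totals alone. This Python sketch assumes the totals from the calculation table and the coefficients b0 = 60, b1 = 5; it simply evaluates the computational form of $r^2$.

```python
# Column totals from the calculation table and the fitted coefficients
n = 10
sum_y, sum_y2, sum_xy = 1300, 184730, 21040
b0, b1 = 60.0, 5.0

sse = sum_y2 - b0 * sum_y - b1 * sum_xy   # unexplained variation: 1,530
sst = sum_y2 - sum_y ** 2 / n             # total variation: 15,730
r2 = 1 - sse / sst                        # coefficient of determination

print(f"r^2 = {r2:.4f}")  # r^2 = 0.9027
```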

Page 17: Ch 15 - SLR - Part 2 - Rev_11_22_13


Statistix 10 Linear Regression Output:

Page 18: Ch 15 - SLR - Part 2 - Rev_11_22_13


Correlation Analysis – used to measure the strength of association between X and Y.

(Note: Correlation analysis does not establish a cause and effect relationship between X and Y.)

Coefficient of correlation (r) is a measure of the strength of the linear relationship between X and Y.

Page 19: Ch 15 - SLR - Part 2 - Rev_11_22_13


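[Figure: three scatter plots labeled $b_1 > 0$, $r > 0$; $b_1 < 0$, $r < 0$; and $b_1 = 0$ and $r = 0$]

$$r = \pm\sqrt{r^2}, \quad \text{where } r^2 = \text{coefficient of determination}$$

If $b_1 > 0$, then $r > 0$; if $b_1 < 0$, then $r < 0$.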

Page 20: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Coefficient of Correlation – Example Problem:

$$r^2 = 0.9027 \text{ and } b_1 = 5.00 > 0 \quad\Rightarrow\quad r = +\sqrt{0.9027} = 0.9501$$

Coefficient of correlation (r) ranges from −1 to +1.
• −1 indicates perfect negative correlation.
• +1 indicates perfect positive correlation.
• 0 indicates no linear correlation.
• The closer r is to −1 or +1, the stronger the association between X and Y.
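The slides obtain r as the square root of $r^2$ carrying the sign of $b_1$. The Python sketch below cross-checks that with the Pearson product-moment formula $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, a standard identity that is not shown on the slides; the names `s_xy`, `s_xx`, `s_yy` are illustrative.

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

# Corrected sums of squares and cross-products
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # 2,840
s_xx = sum(a * a for a in x) - sum(x) ** 2 / n                 # 568
s_yy = sum(b * b for b in y) - sum(y) ** 2 / n                 # 15,730 (= SST)

r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson correlation coefficient
b1 = s_xy / s_xx                    # slope; always has the same sign as r

# Slide method: magnitude sqrt(r^2) with r^2 = 1 - SSE/SST, sign taken from b1
r_from_r2 = math.copysign(math.sqrt(1 - 1530 / s_yy), b1)

print(round(r, 4), round(r_from_r2, 4))  # 0.9501 0.9501
```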

Page 21: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Hypothesis Test of Population Slope ($\beta_1$) – Example Problem:

Test the hypothesis that there is no linear relationship between student population (X) and quarterly sales (Y) using a 0.05 level of significance.

Page 22: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Hypothesis Test of Population Slope ($\beta_1$) – Example Problem:

If there is no linear relationship between the student population (x) and the quarterly sales (y), then $\beta_1 = 0$.

Hypotheses:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Location of rejection regions: two-tail test

Level of significance ($\alpha$) = 0.05

Page 23: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Hypothesis Test of Population Slope ($\beta_1$) – Example Problem:

$df = n - 2 = 10 - 2 = 8$
$t_{cv1} = -2.306$, $t_{cv2} = 2.306$

Decision rule: If the calculated t from the sample is less than −2.306 or greater than 2.306, then reject $H_0$; otherwise do not reject $H_0$.

Alternate decision rule using p-value: If the two-tail p-value is less than 0.05, then reject $H_0$; otherwise do not reject $H_0$.

Page 24: Ch 15 - SLR - Part 2 - Rev_11_22_13


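• Hypothesis Test of Population Slope ($\beta_1$) – Example Problem:

$$t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad b_1 = 5.0000, \qquad \beta_1 = 0 \text{ (under } H_0\text{)}$$

$$s_{b_1} = \frac{s_{y|x}}{\sqrt{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}} = \frac{13.82932}{\sqrt{2{,}528 - \frac{(140)^2}{10}}} = \frac{13.82932}{\sqrt{568}} = 0.58027$$

$$t = \frac{5.0000 - 0}{0.58027} = 8.6167$$

$$8.6167 > 2.306 \quad\Rightarrow\quad \text{Reject } H_0$$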
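The test statistic can be reproduced in a few lines of Python. This is a sketch that assumes SSE = 1,530, the column totals from the table, and the critical value 2.306 quoted on the slides; it does not compute a p-value, which would need a t-distribution routine.

```python
import math

n = 10
sum_x, sum_x2 = 140, 2528
b1 = 5.0                          # sample slope
beta1_h0 = 0.0                    # hypothesized slope under H0
s_yx = math.sqrt(1530 / (n - 2))  # standard error of estimate, about 13.82932

s_b1 = s_yx / math.sqrt(sum_x2 - sum_x ** 2 / n)  # standard error of b1
t_calc = (b1 - beta1_h0) / s_b1

t_cv = 2.306                      # two-tail critical value, alpha = 0.05, df = 8 (from the slides)
reject_h0 = abs(t_calc) > t_cv

print(round(s_b1, 5), round(t_calc, 4), reject_h0)
```

Carried at full precision the statistic is about 8.6167, matching the hand calculation; Statistix reports it rounded to 8.62.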

Page 25: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Hypothesis Test of Population Slope ($\beta_1$) – Example Problem:

At the 0.05 level of significance, there is sufficient sample evidence to conclude that there is a linear relationship between the student population (x) and the quarterly sales (y).

Using computer output, the two-tail p-value is 0.0000, which is less than 0.05; so reject $H_0$ (same decision as above).

Statistix 9 linear regression output:

$b_1 = 5.00000$, $s_{b_1} = 0.58027$, $t = 8.62$, p-value = 0.0000

Page 26: Ch 15 - SLR - Part 2 - Rev_11_22_13


Statistix 10 Linear Regression Output:

Page 27: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Confidence Interval for Slope ($\beta_1$) – Example Problem:

Calculate the 95% confidence interval estimate for the population slope where student population (X) is the independent variable and quarterly sales (Y) is the dependent variable.

Page 28: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Confidence Interval for Slope ($\beta_1$) – Example Problem:

$$b_1 \pm t\,s_{b_1}$$

Confidence level = 0.95 (0.025 in each tail); $df = n - 2 = 10 - 2 = 8$; $t_{0.025} = 2.306$

$b_1 = 5.0$
$s_{b_1} = 0.5803$ (see the $H_0$ test of $\beta_1$ for the calculation)

$$b_1 \pm t\,s_{b_1} = 5.0 \pm 2.306(0.5803) = 5.0 \pm 1.3382 = 3.6618 \text{ to } 6.3382$$

We are 95% confident that the slope of the population regression line is within the interval 3.6618 to 6.3382. An increase in the student population of one thousand students is associated with an expected increase in quarterly sales of between $3,662 and $6,338.
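A Python sketch of the same interval, assuming SSE = 1,530, the column totals from the table, and the t value 2.306 quoted on the slides. Dollar bounds are obtained by multiplying by $1,000 per unit of y.

```python
import math

n = 10
b1 = 5.0
s_yx = math.sqrt(1530 / (n - 2))               # standard error of estimate
s_b1 = s_yx / math.sqrt(2528 - 140 ** 2 / n)   # about 0.5803
t_025 = 2.306                                  # 95% confidence, df = 8 (from the slides)

margin = t_025 * s_b1
lower, upper = b1 - margin, b1 + margin
print(f"{lower:.4f} to {upper:.4f}")
print(f"${lower * 1000:,.0f} to ${upper * 1000:,.0f} per 1,000 students")
```

Because $s_{b_1}$ is carried at full precision here rather than rounded to 0.5803, the bounds come out near 3.6619 to 6.3381; the difference from the slide is rounding only.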

Page 29: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Confidence Interval for $\mu_{y|x}$ – Example Problem:

Calculate the 90% confidence interval estimate for mean quarterly sales of all restaurants located near college campuses with 8,000 students.

Page 30: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Confidence Interval for $\mu_{y|x}$ – Example Problem:

$$\text{CI of } \mu_{y|x} = \hat{y}_i \pm t\,s_{y|x}\sqrt{\frac{1}{n} + \frac{\left(x_i - \bar{X}\right)^2}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}}$$

x = 8,000 students
Adjustment for scaling factor (x): 8 units of x (1,000 students per unit of x)

$$\hat{y}_i = b_0 + b_1 x_i = 60 + 5.0(8) = 100$$

Previously determined values: $s_{y|x} = 13.82932$, $\bar{X} = 14.0$

Page 31: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Confidence Interval for $\mu_{y|x}$ – Example Problem:

Confidence level = 0.90 (0.05 in each tail); $df = 10 - 2 = 8$; $t_{0.05} = 1.860$

$$90\%\ \text{CI of } \mu_{y|x} = 100 \pm (1.860)(13.82932)\sqrt{\frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}}$$

$$\text{Standard error (SE)} = (13.82932)\sqrt{\frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}} = (13.82932)\sqrt{0.16338} = 5.58985$$

Page 32: Ch 15 - SLR - Part 2 - Rev_11_22_13


Page 33: Ch 15 - SLR - Part 2 - Rev_11_22_13

[Figure: regression line with 90% limits for mean and 90% limits for individual predicted values]

Page 34: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Confidence Interval for $\mu_{y|x}$ – Example Problem:

Margin of error (e) = t(SE) = (1.860)(5.58985) = 10.3971

$$90\%\ \text{CI of } \mu_{y|x} = 100 \pm 10.3971 = 89.6029 \text{ to } 110.3971$$

Adjustment for scaling factor (y):
(89.603 units of y)($1,000 per unit of y) = $89,603
(110.397 units of y)($1,000 per unit of y) = $110,397
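The interval for mean sales at 8,000 students can be sketched in Python as follows. It assumes the values already derived on these slides (b0 = 60, b1 = 5, SSE = 1,530, X̄ = 14, t = 1.860) and works in units of $1,000 until the final rescaling.

```python
import math

n = 10
b0, b1 = 60.0, 5.0
x_bar = 14.0
s_yx = math.sqrt(1530 / (n - 2))   # about 13.82932
s_xx = 2528 - 140 ** 2 / n         # 568
t_05 = 1.860                       # 90% confidence, df = 8 (from the slides)

x0 = 8                             # 8,000 students, in units of 1,000
y_hat = b0 + b1 * x0               # 100 (thousands of dollars)
se_mean = s_yx * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
margin = t_05 * se_mean
lower, upper = y_hat - margin, y_hat + margin

# Rescale from units of $1,000 to dollars
print(f"${lower * 1000:,.0f} to ${upper * 1000:,.0f}")  # $89,603 to $110,397
```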

Page 35: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Confidence Interval for $\mu_{y|x}$ – Example Problem:

Interpretation of confidence interval:

We are 90 percent confident that the average quarterly sales of all restaurants that are located near college campuses with 8,000 students is within the interval of $89,603 to $110,397.

Page 36: Ch 15 - SLR - Part 2 - Rev_11_22_13


Example Problem - Statistix 10 Confidence Interval

Page 37: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Prediction Interval for Individual $y_x$ – Example Problem:

Calculate the 90% prediction interval estimate for the quarterly sales of a particular restaurant located near a college campus with 8,000 students.

Page 38: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Prediction Interval for Individual $y_x$ – Example Problem:

$$\text{PI of } y_x = \hat{y}_i \pm t\,s_{y|x}\sqrt{1 + \frac{1}{n} + \frac{\left(x_i - \bar{X}\right)^2}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}}$$

$$\hat{y}_i = b_0 + b_1 x_i = 60 + 5.0(8) = 100$$

Previously determined values: $s_{y|x} = 13.82932$, $\bar{X} = 14.0$, $t = 1.860$

Page 39: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Prediction Interval for Individual $y_x$ – Example Problem:

$$\text{Standard error (SE)} = (13.82932)\sqrt{1 + \frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}} = (13.82932)\sqrt{1.16338} = 14.91632$$

$$90\%\ \text{PI of } y_x = 100 \pm (1.860)(13.82932)\sqrt{1 + \frac{1}{10} + \frac{(8 - 14)^2}{2{,}528 - \frac{(140)^2}{10}}}$$

Margin of error (e) = t(SE) = (1.860)(14.91632) = 27.74436

$$90\%\ \text{PI of } y_x = 100 \pm 27.74436 = 72.256 \text{ to } 127.744$$

Page 40: Ch 15 - SLR - Part 2 - Rev_11_22_13


• Prediction Interval for Individual $y_x$ – Example Problem:

Adjustment for scaling factor (y):
(72.256 units of y)($1,000 per unit of y) = $72,256
(127.744 units of y)($1,000 per unit of y) = $127,744

Interpretation of prediction interval:

We are 90 percent confident that the quarterly sales of a restaurant that is located near a college campus with 8,000 students is within the interval of $72,256 to $127,744.
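The only change from the confidence interval for the mean is the extra 1 under the square root, which accounts for the scatter of an individual restaurant around the regression line. This Python sketch, assuming the same slide values (b0 = 60, b1 = 5, SSE = 1,530, X̄ = 14, t = 1.860), computes both intervals at x = 8 and confirms that the prediction interval contains the confidence interval.

```python
import math

n = 10
b0, b1 = 60.0, 5.0
x_bar = 14.0
s_yx = math.sqrt(1530 / (n - 2))   # standard error of estimate
s_xx = 2528 - 140 ** 2 / n         # 568
t_05 = 1.860                       # 90% confidence, df = 8 (from the slides)
x0 = 8                             # 8,000 students
y_hat = b0 + b1 * x0               # 100 (thousands of dollars)

h = 1 / n + (x0 - x_bar) ** 2 / s_xx   # about 0.16338
se_mean = s_yx * math.sqrt(h)          # for the mean of y at x0
se_pred = s_yx * math.sqrt(1 + h)      # for an individual y at x0

ci = (y_hat - t_05 * se_mean, y_hat + t_05 * se_mean)
pi = (y_hat - t_05 * se_pred, y_hat + t_05 * se_pred)
print(f"90% PI: ${pi[0] * 1000:,.0f} to ${pi[1] * 1000:,.0f}")  # 90% PI: $72,256 to $127,744
```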

Page 41: Ch 15 - SLR - Part 2 - Rev_11_22_13


Example Problem - Statistix 10 Prediction Interval

Page 42: Ch 15 - SLR - Part 2 - Rev_11_22_13


Note: The 90% prediction interval for $y_i$ is wider than the 90% confidence interval for $\mu_{y|x}$ (where $y_i$ is the value of y for an individual element of the population and $\mu_{y|x}$ is the average value of y for a subset of the population having the same value of x).

[Figure: regression line with 90% limits for mean and 90% limits for individual predicted values]

Page 43: Ch 15 - SLR - Part 2 - Rev_11_22_13

In this example problem, simple linear regression produced a confidence interval that is narrower and centered on a different value than we would have gotten with a simple confidence interval based solely on a sample of y-values.

Simple 90% confidence interval of the mean of y = 130 ± 24.23 = 105.77 to 154.23

SLR 90% confidence interval of $\mu_{y|x}$ (x = 8,000 students) = 100 ± 10.40 = 89.60 to 110.40

Midpoint of simple 90% confidence interval = 130

Midpoint of SLR 90% confidence interval = 100

Margin of error for simple 90% confidence interval = 24.23

Margin of error for SLR 90% confidence interval = 10.40
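The two margins of error can be compared directly in Python. This sketch assumes the simple interval is the usual one-sample t interval for the mean of y with n − 1 = 9 degrees of freedom, using the standard t-table value 1.833 for 90% confidence (an assumption, since the slides give only the resulting margin), and takes the SLR standard error 5.58985 from the confidence-interval slides.

```python
import math
import statistics

y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(y)

# Simple (one-sample) 90% interval for the mean of y, ignoring x
t_9 = 1.833   # assumed: t value for 90% confidence with n - 1 = 9 df (t table)
margin_simple = t_9 * statistics.stdev(y) / math.sqrt(n)

# SLR 90% interval for mean sales at x = 8 (8,000 students), values from the slides
margin_slr = 1.860 * 5.58985

print(round(margin_simple, 2), round(margin_slr, 2))
```

The regression interval is narrower because $s_{y|x}$ (13.83) measures scatter around the fitted line, which is much smaller than $s_y$ (41.81), the scatter around the overall mean.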

Page 44: Ch 15 - SLR - Part 2 - Rev_11_22_13

Managerial Statistics

End of Simple Linear Regression

Part 2