Chapter 13 Student Lecture Notes 13-1 · Statistics for Managers Using Microsoft Excel, 2/e © 1999...

23
Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc. Chapter 13 Student Lecture Notes 13-1 © 2004 Prentice-Hall, Inc. Chap 13-1 Basic Business Statistics (9 th Edition) Chapter 13 Simple Linear Regression © 2004 Prentice-Hall, Inc. Chap 13-2 Chapter Topics Types of Regression Models Determining the Simple Linear Regression Equation Measures of Variation Assumptions of Regression and Correlation Residual Analysis Measuring Autocorrelation Inferences about the Slope © 2004 Prentice-Hall, Inc. Chap 13-3 Chapter Topics Correlation - Measuring the Strength of the Association Estimation of Mean Values and Prediction of Individual Values Pitfalls in Regression and Ethical Issues (continued)

Transcript of Chapter 13 Student Lecture Notes 13-1 · Statistics for Managers Using Microsoft Excel, 2/e © 1999...

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-1

© 2004 Prentice-Hall, Inc. Chap 13-1

Basic Business Statistics(9th Edition)

Chapter 13Simple Linear Regression

© 2004 Prentice-Hall, Inc. Chap 13-2

Chapter Topics

Types of Regression ModelsDetermining the Simple Linear Regression EquationMeasures of VariationAssumptions of Regression and CorrelationResidual Analysis Measuring AutocorrelationInferences about the Slope

© 2004 Prentice-Hall, Inc. Chap 13-3

Chapter Topics

Correlation - Measuring the Strength of the AssociationEstimation of Mean Values and Prediction of Individual ValuesPitfalls in Regression and Ethical Issues

(continued)

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-2

© 2004 Prentice-Hall, Inc. Chap 13-4

Purpose of Regression Analysis

Regression Analysis is Used Primarily to Model Causality and Provide Prediction

Predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variableExplain the effect of the independent variables on the dependent variable

© 2004 Prentice-Hall, Inc. Chap 13-5

Types of Regression ModelsPositive Linear Relationship

Negative Linear Relationship

Relationship NOT Linear

No Relationship

© 2004 Prentice-Hall, Inc. Chap 13-6

Simple Linear Regression Model

Relationship between Variables is Described by a Linear FunctionThe Change of One Variable Causes the Other Variable to ChangeA Dependency of One Variable on the Other

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-3

© 2004 Prentice-Hall, Inc. Chap 13-7

PopulationRegressionLine (Conditional Mean)

Simple Linear Regression Model

Population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other

Population Y Intercept

Population SlopeCoefficient

Random Error

Dependent (Response) Variable

Independent (Explanatory) Variable

ii iY Xβ β ε0 1+ +=

|Y Xµ

(continued)

© 2004 Prentice-Hall, Inc. Chap 13-8

Simple Linear Regression Model(continued)

ii iY Xβ β ε0 1+ +=

= Random Error

Y

X

(Observed Value of Y) =

Observed Value of Y

|Y X iXµ β β0 1= +

β0

β1

(Conditional Mean)

© 2004 Prentice-Hall, Inc. Chap 13-9

Sample regression line provides an estimate of the population regression line as well as a predicted value of Y

Linear Regression Equation

Sample Y Intercept

SampleSlopeCoefficient

Residual0 1i iib bY X e+ +=

0 1Y b b X= + = Simple Regression Equation (Fitted Regression Line, Predicted Value)

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-4

© 2004 Prentice-Hall, Inc. Chap 13-10

Linear Regression Equation

and are obtained by finding the values of and that minimize the sum of the squared residuals

provides an estimate of provides an estimate of

0b 1b0b 1b

0b β0

1b β1

(continued)

( )2 2

1 1

ˆn n

i i ii i

Y Y e= =

− =∑ ∑

© 2004 Prentice-Hall, Inc. Chap 13-11

Linear Regression Equation(continued)

Y

XObserved Value

|Y X iXµ β β0 1= +

β0

β1

ii iY Xβ β ε0 1+ +=

0 1i iY b b X= +

ie

0 1i iib bY X e+ +=1b

0b

© 2004 Prentice-Hall, Inc. Chap 13-12

Interpretation of the Slopeand Intercept

is the average value of Y when the value of X is zero

measures the change in the average value of Y as a result of a one-

unit change in X

| 0Y Xβ µ0 ==

|1

change in change in

Y X

β =

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-5

© 2004 Prentice-Hall, Inc. Chap 13-13

Interpretation of the Slopeand Intercept

is the estimated average

value of Y when the value of X is zero

is the estimated change

in the average value of Y as a result of a one-

unit change in X

(continued)

( )ˆ 0b Y X0 = =

1

ˆchange in change in

YbX

=

© 2004 Prentice-Hall, Inc. Chap 13-14

Simple Linear Regression: Example

You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.

Annual Store Square Sales

Feet ($1000)1 1,726 3,6812 1,542 3,3953 2,816 6,6534 5,555 9,5435 1,292 3,3186 2,208 5,5637 1,313 3,760

© 2004 Prentice-Hall, Inc. Chap 13-15

Scatter Diagram: Example

0

2000

4000

6000

8000

10000

12000

0 1000 2000 3000 4000 5000 6000

Square Feet

Ann

ual S

ales

($00

0)

Excel Output

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-6

© 2004 Prentice-Hall, Inc. Chap 13-16

Simple Linear Regression Equation: Example

0 1ˆ

1636.415 1.487i i

i

Y b b XX

= += +

From Excel Printout:Coefficien ts

In te rce p t 1636.414726X V a ria b le 1.486633657

© 2004 Prentice-Hall, Inc. Chap 13-17

Graph of the Simple Linear Regression Equation: Example

0

2000

4000

6000

8000

10000

12000

0 1000 2000 3000 4000 5000 6000

Square Fe e t

Ann

ual S

ales

($00

0)

Yi = 1636.415 +1.487Xi

© 2004 Prentice-Hall, Inc. Chap 13-18

Interpretation of Results: Example

The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.

The equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1487.

ˆ 1636.415 1.487i iY X= +

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-7

© 2004 Prentice-Hall, Inc. Chap 13-19

Simple Linear Regressionin PHStat

In Excel, use PHStat | Regression | Simple Linear Regression …

Excel Spreadsheet of Regression Sales on Footage

Microsoft Excel Worksheet

© 2004 Prentice-Hall, Inc. Chap 13-20

Measures of Variation: The Sum of Squares

SST = SSR + SSE

Total Sample

Variability

= Explained Variability

+ Unexplained Variability

© 2004 Prentice-Hall, Inc. Chap 13-21

Measures of Variation: The Sum of Squares

SST = Total Sum of Squares Measures the variation of the Yi values around their mean,

SSR = Regression Sum of Squares Explained variation attributable to the relationship between X and Y

SSE = Error Sum of Squares Variation attributable to factors other than the relationship between X and Y

(continued)

Y

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-8

© 2004 Prentice-Hall, Inc. Chap 13-22

Measures of Variation: The Sum of Squares

(continued)

Xi

Y

X

Y

SST = ∑(Yi - Y)2

SSE =∑(Yi - Yi )2∧

SSR = ∑(Yi - Y)2∧

__

_

© 2004 Prentice-Hall, Inc. Chap 13-23

Venn Diagrams and Explanatory Power of Regression

Sales

SizesVariations in Sales explained by Sizes or variations in Sizes used in explaining variation in Sales

Variations in Sales explained by the error term or unexplained by Sizes

Variations in store Sizes not used in explaining variation in Sales

( )SSE

( )SSR

© 2004 Prentice-Hall, Inc. Chap 13-24

The ANOVA Table in Excel

SSTn-1Total

MSE=SSE/(n-k-1)

SSEn-k-1Error

P-value of the F Test

MSR/MSEMSR=SSR/k

SSRkRegression

Significance F

FMSSSdf

ANOVA

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-9

© 2004 Prentice-Hall, Inc. Chap 13-25

Measures of VariationThe Sum of Squares: Example

ANOVAdf SS MS F Significance F

Regression 1 30380456.12 30380456.1 81.1790902 0.000281201

Error 5 1871199.595 374239.919

Total 6 32251655.71

Excel Output for Produce Stores

SSRSSE

Regression (explained) df

Degrees of freedom

Error (unexplained) dfTotal df

SST

© 2004 Prentice-Hall, Inc. Chap 13-26

The Coefficient of Determination

Measures the proportion of variation in Y that is explained by the independent variable X in the regression model

2 Regression Sum of SquaresTotal Sum of Squares

SSRrSST

= =

© 2004 Prentice-Hall, Inc. Chap 13-27

Venn Diagrams and Explanatory Power of Regression

Sales

Sizes

2

SSRSSR S

r

SE

=

=+

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-10

© 2004 Prentice-Hall, Inc. Chap 13-28

Coefficients of Determination (r 2) and Correlation (r)

r2 = 1, r2 = 1,

r2 = .81, r2 = 0,Y

Yi = b0 + b1Xi

X^

YYi = b0 + b1Xi

X

^Y

Yi = b0 + b1XiX

^

Y

Yi = b0 + b1Xi

X^

r = +1 r = -1

r = +0.9 r = 0

© 2004 Prentice-Hall, Inc. Chap 13-29

Standard Error of Estimate

Measures the standard deviation (variation) of the Y values around the regression equation

( )2

1

ˆ

2 2

n

ii

YX

Y YSSESn n

=

−= =

− −

© 2004 Prentice-Hall, Inc. Chap 13-30

Measures of Variation: Produce Store Example

Reg ressio n S tatisticsM ult ip le R 0.9705572R S quare 0.94198129A djus ted R S quare 0.93037754S tandard E rror 611.751517O bs ervat ions 7

Excel Output for Produce Stores

r2 = .94 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.

Syxn

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-11

© 2004 Prentice-Hall, Inc. Chap 13-31

Linear Regression Assumptions

NormalityY values are normally distributed for each XProbability distribution of error is normal

Homoscedasticity (Constant Variance)Independence of Errors

© 2004 Prentice-Hall, Inc. Chap 13-32

Consequences of Violationof the Assumptions

Violation of the AssumptionsNon-normality (error not normally distributed)Heteroscedasticity (variance not constant)

Usually happens in cross-sectional data

Autocorrelation (errors are not independent)Usually happens in time-series data

Consequences of Any Violation of the Assumptions

Predictions and estimations obtained from the sample regression line will not be accurateHypothesis testing results will not be reliable

It is Important to Verify the Assumptions

© 2004 Prentice-Hall, Inc. Chap 13-33

• Y values are normally distributed around the regression line.

• For each X value, the “spread” or variance around the regression line is the same.

Variation of Errors Aroundthe Regression Line

X1

X2

X

Y

f(e)

Sample Regression Line

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-12

© 2004 Prentice-Hall, Inc. Chap 13-34

Residual Analysis

PurposesExamine linearity Evaluate violations of assumptions

Graphical Analysis of ResidualsPlot residuals vs. X and time

© 2004 Prentice-Hall, Inc. Chap 13-35

Residual Analysis for Linearity

Not Linear Linear

X

e e

X

Y

X

Y

X0 0

00

© 2004 Prentice-Hall, Inc. Chap 13-36

Residual Analysis forHomoscedasticity

Heteroscedasticity Homoscedasticity

SR

X

SR

X

Y

X X

Y

0

0

0

0

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-13

© 2004 Prentice-Hall, Inc. Chap 13-37

Residual Plot

0 1000 2000 3000 4000 5000 6000

Square Feet

Residual Analysis: Excel Output for Produce Stores Example

Excel Output

Observation Predicted Y Residuals1 4202.344417 -521.34441732 3928.803824 -533.80382453 5822.775103 830.22489714 9894.664688 -351.66468825 3557.14541 -239.14541036 4918.90184 644.09816037 3588.364717 171.6352829

© 2004 Prentice-Hall, Inc. Chap 13-38

Residual Analysis for Independence

The Durbin-Watson StatisticUsed when data are collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period)Measures violation of independence assumption

21

2

2

1

( )n

i ii

n

ii

e eD

e

−=

=

−=

Should be close to 2.

If not, examine the model for autocorrelation.

© 2004 Prentice-Hall, Inc. Chap 13-39

Durbin-Watson Statisticin PHStat

PHStat | Regression | Simple Linear Regression …

Check the box for Durbin-Watson Statistic

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-14

© 2004 Prentice-Hall, Inc. Chap 13-40

5α =.0

k=1 k=2

n dL dU dL dU

15 1.08 1.36 .95 1.54

16 1.10 1.37 .98 1.54

Obtaining the Critical Values of Durbin-Watson Statistic

Table 13.4 Finding Critical Values of Durbin-Watson Statistic

© 2004 Prentice-Hall, Inc. Chap 13-41

Do not reject H0 (no autocorrelation)

Using the Durbin-Watson Statistic

: No autocorrelation (error terms are independent): There is autocorrelation (error terms are not

independent)

0H1H

0 42dL 4-dLdU 4-dU

Reject H0(positive

autocorrelation)

Inconclusive Reject H0(negative

autocorrelation)

© 2004 Prentice-Hall, Inc. Chap 13-42

Residual Analysis for Independence

Not Independent Independent

e e

TimeTime

Residual is Plotted Against Time to Detect Any Autocorrelation

No Particular PatternCyclical Pattern

Graphical Approach

0 0

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-15

© 2004 Prentice-Hall, Inc. Chap 13-43

Inference about the Slope: t Test

t Test for a Population SlopeIs there a linear relationship between Y and X ?

Null and Alternative HypothesesH0: β1 = 0 (no linear relationship)H1: β1 ≠ 0 (linear relationship)

Test Statistic

1

1

1 1

2

1

where ( )

YXb n

bi

i

b St SS

X X

β

=

−= =

−∑. . 2d f n= −

© 2004 Prentice-Hall, Inc. Chap 13-44

Example: Produce StoreData for 7 Stores:

Estimated Regression Equation:Annual

Store Square SalesFeet ($000)

1 1,726 3,6812 1,542 3,3953 2,816 6,6534 5,555 9,5435 1,292 3,3186 2,208 5,5637 1,313 3,760

ˆ 1636.415 1.487i iY X= +

The slope of this model is 1.487.

Are square footage and annual sales linearly related?

© 2004 Prentice-Hall, Inc. Chap 13-45

Inferences about the Slope: t Test Example

H0: β1 = 0H1: β1 ≠ 0α = .05df = 7 - 2 = 5Critical Value(s):

Test Statistic:

Decision:

Conclusion:There is evidence that square footage is linearly related to annual sales.

t0 2.5706-2.5706

.025

Reject Reject

.025

From Excel Printout

Reject H0.

Coefficients Standard Error t Stat P-valueIntercept 1636.4147 451.4953 3.6244 0.01515Footage 1.4866 0.1650 9.0099 0.00028

1b 1bS t

p-value

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-16

© 2004 Prentice-Hall, Inc. Chap 13-46

Inferences about the Slope: Confidence Interval Example

Confidence Interval Estimate of the Slope:

11 2n bb t S−±Excel Printout for Produce Stores

At 95% level of confidence, the confidence interval for the slope is (1.062, 1.911). Does not include 0.

Conclusion: There is a significant linear relationship between annual sales and the size of the store.

Lower 95% Upper 95%Intercept 475.810926 2797.01853Footage 1.06249037 1.91077694

© 2004 Prentice-Hall, Inc. Chap 13-47

Inferences about the Slope: F Test

F Test for a Population SlopeIs there a linear relationship between Y and X ?

Null and Alternative HypothesesH0: β1 = 0 (no linear relationship)H1: β1 ≠ 0 (linear relationship)

Test Statistic

Numerator d.f.=1, denominator d.f.=n-2

( )

1

2

S S R

F S S En

=

© 2004 Prentice-Hall, Inc. Chap 13-48

Relationship between a t Test and an F Test

Null and Alternative HypothesesH0: β1 = 0 (no linear relationship)H1: β1 ≠ 0 (linear relationship)

The p –value of a t Test and the p –value of an F Test are Exactly the SameThe Rejection Region of an F Test is Always in the Upper Tail

( )2

2 1, 2n nt F− −=

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-17

© 2004 Prentice-Hall, Inc. Chap 13-49

ANOVAdf SS MS F Significance F

Regression 1 30380456.12 30380456.12 81.179 0.000281Residual 5 1871199.595 374239.919Total 6 32251655.71

Inferences about the Slope: F Test Example

Test Statistic:

Decision:

Conclusion:

H0: β1 = 0H1: β1 ≠ 0α = .05numerator df = 1denominator df = 7 - 2 = 5

There is evidence that square footage is linearly related to annual sales.

From Excel Printout

Reject H0.

0 6.61

Reject

α = .05

1, 2nF −

p-value

© 2004 Prentice-Hall, Inc. Chap 13-50

Purpose of Correlation Analysis

Correlation Analysis is Used to Measure Strength of Association (Linear Relationship) Between 2 Numerical Variables

Only strength of the relationship is concernedNo causal effect is implied

© 2004 Prentice-Hall, Inc. Chap 13-51

Purpose of Correlation Analysis

Population Correlation Coefficient ρ (Rho) is Used to Measure the Strength between the Variables

(continued)

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-18

© 2004 Prentice-Hall, Inc. Chap 13-52

Sample Correlation Coefficient r is an Estimate of ρ and is Used to Measure the Strength of the Linear Relationship in the Sample Observations

Purpose of Correlation Analysis(continued)

( ) ( )

( ) ( )1

2 2

1 1

n

i ii

n n

i ii i

X X Y Yr

X X Y Y

=

= =

− −=

− −

∑ ∑

© 2004 Prentice-Hall, Inc. Chap 13-53r = .6 r = 1

Sample Observations from Various r Values

Y

X

Y

X

Y

X

Y

X

Y

X

r = -1 r = -.6 r = 0

© 2004 Prentice-Hall, Inc. Chap 13-54

Features of ρ and r

Unit FreeRange between -1 and 1The Closer to -1, the Stronger the Negative Linear RelationshipThe Closer to 1, the Stronger the Positive Linear RelationshipThe Closer to 0, the Weaker the Linear Relationship

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-19

© 2004 Prentice-Hall, Inc. Chap 13-55

Hypotheses H0: ρ = 0 (no correlation) H1: ρ ≠ 0 (correlation)

Test Statistic

( )( )

( ) ( )

2

2 1

2 2

1 1

where

2n

i ii

n n

i ii i

rtr

n

X X Y Yr r

X X Y Y

ρ

=

= =

−=

1−−

− −= =

− −

∑ ∑

t Test for Correlation

© 2004 Prentice-Hall, Inc. Chap 13-56

Example: Produce Stores

Regression StatisticsMultiple R 0.9705572R Square 0.94198129Adjusted R Square 0.93037754Standard Error 611.751517Observations 7

From Excel Printout r

Is there any evidence of linear relationship between annual sales of a store and its square footage at .05 level of significance? H0: ρ = 0 (no association)

H1: ρ ≠ 0 (association)α = .05df = 7 - 2 = 5

© 2004 Prentice-Hall, Inc. Chap 13-57

Example: Produce Stores Solution

0 2.5706-2.5706

.025

Reject Reject

.025

Critical Value(s):

Conclusion:There is evidence of a linear relationship at 5% level of significance.

Decision:Reject H0.

2

.9706 9.00991 .9420

52

rtr

n

ρ−= = =

−1−−

The value of the t statistic is exactly the same as the t statistic value for test on the slope coefficient.

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-20

© 2004 Prentice-Hall, Inc. Chap 13-58

Estimation of Mean Values

Confidence Interval Estimate for :

The Mean of Y Given a Particular Xi

2

22

1

( )1ˆ( )

ii n YX n

ii

X XY t Sn X X

=

−± +

−∑t value from table with df=n-2

Standard error of the estimate

Size of interval varies according to distance away from mean, X

| iY X Xµ =

© 2004 Prentice-Hall, Inc. Chap 13-59

Prediction of Individual Values

Prediction Interval for Individual Response Yi at a Particular Xi

Addition of 1 increases width of interval from that for the mean of Y

2

22

1

( )1ˆ 1( )

ii n YX n

ii

X XY t Sn X X

=

−± + +

−∑

© 2004 Prentice-Hall, Inc. Chap 13-60

Interval Estimates for Different Values of X

Y

X

Prediction Interval for an Individual Yi

a given X

Confidence Interval for the Mean of Y

Yi = b0 + b1Xi∧

X

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-21

© 2004 Prentice-Hall, Inc. Chap 13-61

Example: Produce Stores

Yi = 1636.415 +1.487Xi

Data for 7 Stores:

Regression Model Obtained:∧

Annual Store Square Sales

Feet ($000)1 1,726 3,6812 1,542 3,3953 2,816 6,6534 5,555 9,5435 1,292 3,3186 2,208 5,5637 1,313 3,760

Consider a store with 2000 square feet.

© 2004 Prentice-Hall, Inc. Chap 13-62

Estimation of Mean Values: Example

Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.

2

22

1

( )1ˆ 4610.45 612.66( )

ii n YX n

ii

X XY t Sn X X

=

−± + = ±

−∑

Predicted Sales Yi = 1636.415 +1.487Xi = 4610.45 (in $000)∧

X = 2350.29 SYX = 611.75 tn-2 = t5 = 2.5706

Confidence Interval Estimate for | iY X Xµ =

|3997.02 5222.34iY X Xµ =< <

© 2004 Prentice-Hall, Inc. Chap 13-63

Prediction Interval for Y : Example

Find the 95% prediction interval for annual sales of one particular store of 2,000 square feet.

Predicted Sales Yi = 1636.415 +1.487Xi = 4610.45 (in $000)∧

X = 2350.29 SYX = 611.75 tn-2 = t5 = 2.57062

22

1

( )1ˆ 1 4610.45 1687.68( )

ii n YX n

ii

X XY t Sn X X

=

−± + + = ±

−∑

Prediction Interval for Individual

2922.00 6297.37iX XY =< <

iX XY =

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-22

© 2004 Prentice-Hall, Inc. Chap 13-64

Estimation of Mean Values and Prediction of Individual Values in PHStat

In Excel, use PHStat | Regression | Simple Linear Regression …

Check the “Confidence and Prediction Interval for X=” box

Excel Spreadsheet of Regression Sales on Footage

Microsoft Excel Worksheet

© 2004 Prentice-Hall, Inc. Chap 13-65

Pitfalls of Regression Analysis

Lacking an Awareness of the Assumptions Underlining Least-Squares RegressionNot Knowing How to Evaluate the AssumptionsNot Knowing What the Alternatives to Least-Squares Regression are if a Particular Assumption is ViolatedUsing a Regression Model Without Knowledge of the Subject Matter

© 2004 Prentice-Hall, Inc. Chap 13-66

Strategy for Avoiding the Pitfalls of Regression

Start with a scatter plot to observe possible relationship between X on Y Perform residual analysis to check the assumptionsUse a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality

Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 13 Student Lecture Notes 13-23

© 2004 Prentice-Hall, Inc. Chap 13-67

Strategy for Avoiding the Pitfalls of Regression

If there is violation of any assumption, use alternative methods (e.g., least absolute deviation regression or least median of squares regression) to least-squares regression or alternative least-squares models (e.g., curvilinear or multiple regression)If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals

(continued)

© 2004 Prentice-Hall, Inc. Chap 13-68

Chapter Summary

Introduced Types of Regression ModelsDiscussed Determining the Simple Linear Regression EquationDescribed Measures of VariationAddressed Assumptions of Regression and CorrelationDiscussed Residual Analysis Addressed Measuring Autocorrelation

© 2004 Prentice-Hall, Inc. Chap 13-69

Chapter Summary

Described Inference about the SlopeDiscussed Correlation - Measuring the Strength of the Association Addressed Estimation of Mean Values and Prediction of Individual ValuesDiscussed Pitfalls in Regression and Ethical Issues

(continued)