© 2003 Prentice-Hall, Inc. Chap 13-1. Basic Business Statistics (9th Edition), Chapter 13: Simple Linear Regression


  • Basic Business Statistics (9th Edition), Chapter 13: Simple Linear Regression

  • Chapter Topics: Types of Regression Models; Determining the Simple Linear Regression Equation; Measures of Variation; Assumptions of Regression and Correlation; Residual Analysis; Measuring Autocorrelation; Inferences about the Slope.

  • Chapter Topics (continued): Correlation: Measuring the Strength of the Association; Estimation of Mean Values and Prediction of Individual Values; Pitfalls in Regression and Ethical Issues.

  • Purpose of Regression Analysis. Regression analysis is used primarily to model causality and provide prediction: (1) predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable; (2) explain the effect of the independent variables on the dependent variable.

  • Types of Regression Models. [Figure: four scatter plots illustrating a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship.]

  • Simple Linear Regression Model. The relationship between the variables is described by a linear function. The change of one variable causes the other variable to change, so there is a dependency of one variable on the other.

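  • Simple Linear Regression Model (continued). The population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other: \(Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\), where \(Y_i\) is the dependent (response) variable, \(X_i\) is the independent (explanatory) variable, \(\beta_0\) is the population Y intercept, \(\beta_1\) is the population slope coefficient, and \(\varepsilon_i\) is the random error. The population regression line (conditional mean) is \(\mu_{Y|X} = \beta_0 + \beta_1 X_i\).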

  • Simple Linear Regression Model (continued). [Figure: plot of Y against X showing the population regression line (conditional mean); each observed value of Y equals its conditional mean plus a random error.]

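  • Linear Regression Equation. The sample regression line provides an estimate of the population regression line as well as a predicted value of Y: \(Y_i = b_0 + b_1 X_i + e_i\), where \(b_0\) is the sample Y intercept, \(b_1\) is the sample slope coefficient, and \(e_i\) is the residual. The simple regression equation (fitted regression line, predicted value) is \(\hat{Y}_i = b_0 + b_1 X_i\).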

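  • Linear Regression Equation (continued). \(b_0\) and \(b_1\) are obtained by finding the values of \(b_0\) and \(b_1\) that minimize the sum of the squared residuals, \(\sum (Y_i - \hat{Y}_i)^2 = \sum e_i^2\).

    \(b_0\) provides an estimate of \(\beta_0\); \(b_1\) provides an estimate of \(\beta_1\).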

  • Linear Regression Equation (continued). [Figure: plot of Y against X marking an observed value relative to the regression line.]

  • Interpretation of the Slope and Intercept. \(\beta_0\) is the average value of Y when the value of X is zero.

    \(\beta_1\) measures the change in the average value of Y as a result of a one-unit change in X.

  • Interpretation of the Slope and Intercept (continued). \(b_0\) is the estimated average value of Y when the value of X is zero.

    \(b_1\) is the estimated change in the average value of Y as a result of a one-unit change in X.

  • Simple Linear Regression: Example. You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.

    Store   Square Feet   Annual Sales ($1000)
    1       1,726         3,681
    2       1,542         3,395
    3       2,816         6,653
    4       5,555         9,543
    5       1,292         3,318
    6       2,208         5,563
    7       1,313         3,760
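The least-squares fit for this example can be checked outside Excel. The following is a minimal sketch in plain Python, assuming the usual closed-form solution (slope = SSXY / SSX, intercept = mean of Y minus slope times mean of X); the function name `fit_line` and the variable names are illustrative and do not come from the slides.

```python
# Produce-store data from the example: store size in square feet (X)
# and annual sales in $1000 (Y) for the 7 stores.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def fit_line(x, y):
    """Return (b0, b1), the intercept and slope that minimize the
    sum of squared residuals for a straight-line fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx            # sample slope coefficient
    b0 = y_bar - b1 * x_bar    # sample Y intercept
    return b0, b1


b0, b1 = fit_line(square_feet, annual_sales)
print(f"Y-hat = {b0:.3f} + {b1:.3f} X")
```

The printed coefficients should agree with the Intercept and slope rows of the Excel regression output for the produce stores.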

  • Scatter Diagram: Example (Excel output). [Figure: scatter plot of Annual Sales ($000) against Square Feet for the 7 produce stores.]

  • Simple Linear Regression Equation: Example. From the Excel printout:

                                   Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
    Intercept                      1636.4147      451.4953         3.6244   0.01515   475.8109    2797.0185
    X Variable 1 (Square Feet)     1.4866         0.1650           9.0099   0.00028   1.0625      1.9108

    The Intercept row gives \(b_0\) and the X Variable 1 row gives \(b_1\), so \(\hat{Y}_i = 1636.415 + 1.487 X_i\).

  • Graph of the Simple Linear Regression Equation: Example. [Figure: scatter plot of Annual Sales ($000) against Square Feet with the fitted line \(\hat{Y}_i = 1636.415 + 1.487 X_i\).]

  • Interpretation of Results: Example. The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units. Because annual sales are recorded in thousands of dollars, the equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales increase by $1,487.

  • Simple Linear Regression in PHStat. In Excel, use PHStat | Regression | Simple Linear Regression. The accompanying Excel spreadsheet regresses Sales on Footage.

    The regression sheet (SLR) is generated by (1) PHStat | Regression | Simple Linear Regression and (2) checking the "ANOVA and Coefficients Table" and the "Residual Plot" boxes.

    The estimate sheet is generated by (1) PHStat | Regression | Simple Linear Regression and (2) checking the "Confidence and Prediction Intervals for X=" box. For X = 2000 it reports:

    Confidence Interval Estimate
    X Value                            2000
    Confidence Level                   95%
    Sample Size                        7
    Degrees of Freedom                 5
    t Value                            2.5706
    Sample Mean                        2350.2857
    Sum of Squared Difference          13746317.43
    Standard Error of the Estimate     611.7515
    h Statistic                        0.1518
    Average Predicted Y (YHat)         4609.6820

    For Average Predicted Y (YHat)
    Interval Half Width                612.6573
    Confidence Interval Lower Limit    3997.0248
    Confidence Interval Upper Limit    5222.3393

    For Individual Response Y
    Interval Half Width                1687.6840
    Prediction Interval Lower Limit    2921.9980
    Prediction Interval Upper Limit    6297.3661

  • Measures of Variation: The Sum of Squares. SST = SSR + SSE: total sample variability = explained variability + unexplained variability.

  • Measures of Variation: The Sum of Squares (continued). SST = total sum of squares: measures the variation of the \(Y_i\) values around their mean, \(\bar{Y}\). SSR = regression sum of squares: explained variation attributable to the relationship between X and Y. SSE = error sum of squares: variation attributable to factors other than the relationship between X and Y.

  • Measures of Variation: The Sum of Squares (continued). \(SST = \sum (Y_i - \bar{Y})^2\), \(SSE = \sum (Y_i - \hat{Y}_i)^2\), \(SSR = \sum (\hat{Y}_i - \bar{Y})^2\). [Figure: plot of Y against X showing, at a given \(X_i\), the three deviations among the observed value, the fitted line, and \(\bar{Y}\).]
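The decomposition SST = SSR + SSE can be verified numerically on the produce-store data. This is a small sketch in plain Python; the variable names are illustrative, and the fit uses the standard closed-form least-squares formulas rather than anything specific to Excel or PHStat.

```python
# Produce-store data: square feet (X) and annual sales in $000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Least-squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in square_feet]

# Total, regression (explained), and error (unexplained) sums of squares.
sst = sum((y - y_bar) ** 2 for y in annual_sales)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(annual_sales, predicted))

print(f"SST = {sst:.2f}")
print(f"SSR = {ssr:.2f}")
print(f"SSE = {sse:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}")
```

The three values should line up with the SS column of the Excel ANOVA table for the produce stores (Total, Regression, and Residual rows respectively).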

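  • Venn Diagrams and Explanatory Power of Regression. [Figure: two overlapping circles labeled Sales and Sizes.] The overlap represents variations in Sales explained by Sizes, or variations in Sizes used in explaining variation in Sales. The rest of the Sales circle represents variations in Sales explained by the error term, that is, unexplained by Sizes. The rest of the Sizes circle represents variations in store Sizes not used in explaining variation in Sales.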

  • The ANOVA Table in Excel

    ANOVA        df          SS     MS                      F         Significance F
    Regression   k           SSR    MSR = SSR/k             MSR/MSE   P-value of the F test
    Residuals    n - k - 1   SSE    MSE = SSE/(n - k - 1)
    Total        n - 1       SST

  • Measures of Variation: The Sum of Squares: Example. Excel output for produce stores:

    ANOVA        df   SS             MS             F         Significance F
    Regression   1    30380456.12    30380456.12    81.1791   0.000281
    Residual     5    1871199.59     374239.92
    Total        6    32251655.71

    Reading the table: the df column gives the regression (explained) df, the error (residual) df, and the total df; the SS column gives SSR, SSE, and SST, in that order.

  • The Coefficient of Determination. \(r^2 = SSR / SST\), the regression sum of squares divided by the total sum of squares.

    Measures the proportion of variation in Y that is explained by the independent variable X in the regression model.

  • Venn Diagrams and Explanatory Power of Regression (continued). [Figure: two overlapping circles labeled Sales and Sizes.]

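  • Coefficients of Determination (\(r^2\)) and Correlation (\(r\)). [Figure: four scatter plots of Y against X, each with the fitted line \(\hat{Y}_i = b_0 + b_1 X_i\): (1) \(r^2 = 1\), \(r = +1\); (2) \(r^2 = 1\), \(r = -1\); (3) \(r^2 = .81\), \(r = +0.9\); (4) \(r^2 = 0\), \(r = 0\).]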

  • Standard Error of Estimate. \(S_{YX} = \sqrt{SSE / (n - 2)}\).

    Measures the standard deviation (variation) of the Y values around the regression equation.
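Both summary measures follow directly from the sums of squares. As a small sketch, the snippet below plugs in the SSR, SSE, and SST values reported in the Excel ANOVA output for the produce stores; the formulas r² = SSR/SST and S_YX = sqrt(SSE/(n - 2)) are the standard definitions for simple linear regression, and the variable names are illustrative.

```python
import math

# Sums of squares from the Excel ANOVA output for the produce stores.
ssr = 30380456.1195   # regression (explained) sum of squares
sse = 1871199.5948    # error (unexplained) sum of squares
sst = ssr + sse       # total sum of squares
n = 7                 # number of stores

# Coefficient of determination: share of variation in Y explained by X.
r_squared = ssr / sst

# Standard error of the estimate: typical distance of Y from the fitted line.
s_yx = math.sqrt(sse / (n - 2))

print(f"r^2  = {r_squared:.4f}")
print(f"S_YX = {s_yx:.4f}")
```

These should reproduce the "R Square" and "Standard Error" rows of the Excel Regression Statistics block.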

  • Measures of Variation: Produce Store Example. Excel output for produce stores:

    Regression Statistics
    Multiple R           0.9706
    R Square             0.9420
    Adjusted R Square    0.9304
    Standard Error       611.7515
    Observations         7

    \(r^2 = .94\): 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage. The Standard Error row is \(S_{YX}\) and the Observations row is \(n\).

  • Linear Regression Assumptions. (1) Normality: Y values are normally distributed for each X; the probability distribution of the error is normal. (2) Homoscedasticity (constant variance). (3) Independence of errors.

  • Consequences of Violation of the Assumptions. Violations of the assumptions: non-normality (error not normally distributed); heteroscedasticity (variance not constant), which usually happens in cross-sectional data; autocorrelation (errors are not independent), which usually happens in time-series data. Consequences of any violation of the assumptions: predictions and estimations obtained from the sample regression line will not be accurate, and hypothesis testing results will not be reliable. It is important to verify the assumptions.

  • Variation of Errors Around the Regression Line. Y values are normally distributed around the regression line. For each X value, the spread or variance around the regression line is the same. [Figure: sample regression line of Y against X with the error density f(e) drawn at \(X_1\) and \(X_2\).]

  • Residual Analysis. Purposes: examine linearity; evaluate violations of assumptions. Graphical analysis of residuals: plot residuals vs. X and time.

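  • Residual Analysis for Linearity. [Figure: two cases, Not Linear and Linear, each shown as a plot of Y against X and a plot of residuals e against X.]

  • Residual Analysis for Homoscedasticity. [Figure: two cases, Heteroscedasticity and Homoscedasticity, each shown as a plot of Y against X and a plot of standardized residuals (SR) against X.]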

  • Residual Analysis: Excel Output for Produce Stores Example.

    RESIDUAL OUTPUT
    Observation   Predicted Y   Residuals    Standard Residuals
    1             4202.3444     -521.3444    -0.9336
    2             3928.8038     -533.8038    -0.9559
    3             5822.7751      830.2249     1.4867
    4             9894.6647     -351.6647    -0.6297
    5             3557.1454     -239.1454    -0.4282
    6             4918.9018      644.0982     1.1534
    7             3588.3647      171.6353     0.3073

    [Figure: residual plot of Residuals against Square Feet.]
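The "Standard Residuals" column in the Excel residual output can be reproduced from the raw residuals. The sketch below assumes each residual is divided by the sample standard deviation of the residuals, sqrt(SSE / (n - 1)); that divisor is an inference from the printed numbers, not something the slides state, and the variable names are illustrative.

```python
import math

# Residuals for the 7 produce stores, from the Excel RESIDUAL OUTPUT.
residuals = [-521.3444, -533.8038, 830.2249, -351.6647,
             -239.1454, 644.0982, 171.6353]

n = len(residuals)
sse = sum(e ** 2 for e in residuals)

# Assumption: the scale is the sample standard deviation of the residuals,
# which divides by n - 1 (least-squares residuals have mean zero).
residual_sd = math.sqrt(sse / (n - 1))
standard_residuals = [e / residual_sd for e in residuals]

for obs, (e, sr) in enumerate(zip(residuals, standard_residuals), start=1):
    print(f"{obs}  {e:10.4f}  {sr:8.4f}")
```

Note that this scale differs from the standard error of the estimate, which divides SSE by n - 2.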

  • Residual Analysis for Independence: The Durbin-Watson Statistic. Used when data is collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period). Measures violation of the independence assumption. Should be close to 2. If not, examine the model for autocorrelation.
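The transcript does not preserve the formula for the statistic. A minimal sketch, assuming the standard definition D = Σ(e_i - e_{i-1})² / Σ e_i² with residuals taken in time order, is shown below. The produce-store residuals are used only to make the arithmetic concrete: those data are cross-sectional, so the store order carries no time meaning and the resulting D is illustrative rather than a real autocorrelation test. The function name is hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order:
    sum of squared successive differences over sum of squared residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Illustration only: produce-store residuals in store order (not a time series).
store_residuals = [-521.3444, -533.8038, 830.2249, -351.6647,
                   -239.1454, 644.0982, 171.6353]
d = durbin_watson(store_residuals)
print(f"D = {d:.3f}")
```

The statistic always lies between 0 and 4, with values near 2 pointing to no first-order autocorrelation.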

  • Durbin-Watson Statistic in PHStat. Use PHStat | Regression | Simple Linear Regression and check the box for Durbin-Watson Statistic.

  • Obtaining the Critical Values of Durbin-Watson Statistic. Table 13.4: Finding Critical Values of Durbin-Watson Statistic

             k = 1           k = 2
    n     dL      dU      dL      dU
    15    1.08    1.36    .95     1.54
    16    1.10    1.37    .98     1.54

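  • Using the Durbin-Watson Statistic. \(H_0\): no autocorrelation (error terms are independent). \(H_1\): there is autocorrelation (error terms are not independent). [Figure: number line from 0 to 4 marked at \(d_L\), \(d_U\), 2, \(4 - d_U\), and \(4 - d_L\).] Reading left to right: reject \(H_0\) (positive autocorrelation) below \(d_L\); inconclusive between \(d_L\) and \(d_U\); accept \(H_0\) (no autocorrelation) between \(d_U\) and \(4 - d_U\); inconclusive between \(4 - d_U\) and \(4 - d_L\); reject \(H_0\) (negative autocorrelation) above \(4 - d_L\).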

  • Residual Analysis for Independence: Graphical Approach. The residual is plotted against time to detect any autocorrelation. [Figure: two plots of residuals e against Time: Not Independent (cyclical pattern) and Independent (no particular pattern).]

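  • Inference about the Slope: t Test. t test for a population slope: is there a linear dependency of Y on X? Null and alternative hypotheses: \(H_0: \beta_1 = 0\) (no linear dependency); \(H_1: \beta_1 \neq 0\) (linear dependency). Test statistic: \(t = (b_1 - \beta_1) / S_{b_1}\), where \(S_{b_1} = S_{YX} / \sqrt{\sum (X_i - \bar{X})^2}\), with d.f. = n - 2.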
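The t test for the slope can be carried out by hand on the produce-store data. The following sketch in plain Python assumes the standard formulas S_b1 = S_YX / sqrt(SSX) and t = b1 / S_b1 under H0: β1 = 0; the critical value 2.5706 (α = .05, two tails, 5 d.f.) is copied from the slides rather than computed, and all names are illustrative.

```python
import math

# Produce-store data: square feet (X) and annual sales in $000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)
ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

b1 = ssxy / ssx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate and of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(ssx)

# Test statistic under H0: beta_1 = 0.
t_stat = (b1 - 0) / s_b1

T_CRITICAL = 2.5706  # from the slides: alpha = .05, two-tailed, d.f. = 5
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"S_b1 = {s_b1:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```

The slope standard error and t statistic should match the "Standard Error" and "t Stat" entries in the slope row of the Excel coefficients table.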

  • Example: Produce Store. Data for 7 stores:

    Store   Square Feet   Annual Sales ($000)
    1       1,726         3,681
    2       1,542         3,395
    3       2,816         6,653
    4       5,555         9,543
    5       1,292         3,318
    6       2,208         5,563
    7       1,313         3,760

    Estimated regression equation: \(\hat{Y}_i = 1636.415 + 1.487 X_i\). The slope of this model is 1.487. Does square footage affect annual sales?

  • Inferences about the Slope: t Test Example. \(H_0: \beta_1 = 0\); \(H_1: \beta_1 \neq 0\); \(\alpha = .05\); d.f. = 7 - 2 = 5. Critical values: \(\pm 2.5706\) (area .025 in each rejection tail).

    Test statistic, from the Excel printout: \(t = b_1 / S_{b_1} = 9.0099\), with p-value 0.00028.

                 Coefficients   Standard Error   t Stat   P-value
    Intercept    1636.4147      451.4953         3.6244   0.01515
    Footage      1.4866         0.1650           9.0099   0.00028

    Decision: reject \(H_0\).

    Conclusion: there is evidence that square footage affects annual sales.

  • Inferences about the Slope: Confidence Interval Example. Confidence interval estimate of the slope, from the Excel printout for produce stores:

                                   Coefficients   Lower 95%   Upper 95%
    Intercept                      1636.4147      475.8109    2797.0185
    X Variable 1 (Square Feet)     1.4866         1.0625      1.9108

    At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0. Conclusion: there is a significant linear dependency of annual sales on the size of the store.
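The interval limits can be reproduced by hand. The worked arithmetic below assumes the usual form of the interval, slope estimate plus or minus the critical t value times the standard error of the slope, and uses the rounded values from the Excel printout (slope 1.4866, standard error 0.1650, critical value 2.5706 for 5 d.f.):

```latex
\[
b_1 \pm t_{n-2}\, S_{b_1}
  = 1.4866 \pm 2.5706 \times 0.1650
  = 1.4866 \pm 0.4241
\]
\[
1.0625 \le \beta_1 \le 1.9107
\]
```

The small difference in the last digit of the upper limit relative to Excel's 1.9108 comes from rounding the inputs.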

  • Inferences about the Slope: F Test. F test for a population slope: is there a linear dependency of Y on X? Null and alternative hypotheses: \(H_0: \beta_1 = 0\) (no linear dependency); \(H_1: \beta_1 \neq 0\) (linear dependency). Test statistic: \(F = MSR / MSE\), where \(MSR = SSR / 1\) and \(MSE = SSE / (n - 2)\).

    Numerator d.f. = 1, denominator d.f. = n - 2.

  • Relationship between a t Test and an F Test. Null and alternative hypotheses: \(H_0: \beta_1 = 0\) (no linear dependency); \(H_1: \beta_1 \neq 0\) (linear dependency).

    The p-value of the t test and the p-value of the F test are exactly the same. The rejection region of an F test is always in the upper tail.
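One way to see why the two p-values coincide in simple linear regression is that the F statistic equals the square of the t statistic for the slope. The short check below uses the MSR, MSE, and t values reported in the Excel output for the produce stores; it is a numerical illustration of that identity, not a proof, and the variable names are illustrative.

```python
# Values from the Excel output for the produce stores.
msr = 30380456.1195      # mean square for regression (SSR / 1)
mse = 374239.9190        # mean square error (SSE / (n - 2))
t_stat = 9.0099439594    # t statistic for the slope

f_stat = msr / mse

print(f"F   = {f_stat:.4f}")
print(f"t^2 = {t_stat ** 2:.4f}")
```

Both printed values should agree with the F entry in the Excel ANOVA table.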

  • Inferences about the Slope: F Test Example. \(H_0: \beta_1 = 0\); \(H_1: \beta_1 \neq 0\); \(\alpha = .05\); numerator d.f. = 1; denominator d.f. = 7 - 2 = 5. Critical value: 6.61, with the rejection region (area .05) in the upper tail.

    Test statistic, from the Excel printout: F = 81.179, with p-value 0.000281.

    ANOVA        df   SS             MS             F         Significance F
    Regression   1    30380456.12    30380456.12    81.1791   0.000281
    Residual     5    1871199.59     374239.92
    Total        6    32251655.71

    Decision: reject \(H_0\).

    Conclusion: there is evidence that square footage affects annual sales.

  • Purpose of Correlation Analysis. Correlation analysis is used to measure the strength of association (linear relationship) between 2 numerical variables. Only the strength of the relationship is of concern; no causal effect is implied.

  • Purpose of Correlation Analysis (continued). The population correlation coefficient \(\rho\) (rho) is used to measure the strength of the linear relationship between the variables.

  • Purpose of Correlation Analysis (continued). The sample correlation coefficient \(r\) is an estimate of \(\rho\) and is used to measure the strength of the linear relationship in the sample observations.

  • Sample Observations from Various r Values. [Figure: five scatter plots of Y against X for \(r = -1\), \(r = -.6\), \(r = 0\), \(r = .6\), and \(r = 1\).]

  • Features of \(\rho\) and \(r\). Unit free. Range between -1 and 1. The closer to -1, the stronger the negative linear relationship. The closer to 1, the stronger the positive linear relationship. The closer to 0, the weaker the linear relationship.

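  • t Test for Correlation. Hypotheses: \(H_0: \rho = 0\) (no correlation); \(H_1: \rho \neq 0\) (correlation). Test statistic: \(t = (r - \rho) / \sqrt{(1 - r^2) / (n - 2)}\), with d.f. = n - 2.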
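As a sketch of the calculation, the snippet below computes the sample correlation coefficient for the produce-store data and the corresponding t statistic under H0: ρ = 0. It assumes the standard formula r = SSXY / sqrt(SSX · SSY); the variable names are illustrative.

```python
import math

# Produce-store data: square feet (X) and annual sales in $000 (Y).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

ssx = sum((x - x_bar) ** 2 for x in square_feet)
ssy = sum((y - y_bar) ** 2 for y in annual_sales)
ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))

# Sample correlation coefficient.
r = ssxy / math.sqrt(ssx * ssy)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = (r - 0) / math.sqrt((1 - r ** 2) / (n - 2))

print(f"r = {r:.4f}, t = {t_stat:.4f}")
```

Here r should match "Multiple R" in the Excel output, and t should equal the t statistic for the slope coefficient.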

  • Example: Produce StoresFrom Excel Printout rIs there any evidence of linear relationship between annual sales of a store and its square footage at .05 level of significance?H0: = 0 (no association)H1: 0 (association) .05df 7 - 2 = 5

    Sheet5

    SUMMARY OUTPUT

    Regression Statistics

    Multiple R0.9705572038

    R Square0.9419812858

    Adjusted R Square0.930377543

    Standard Error611.7515173314

    Observations7

    ANOVA

    dfSSMSFSignificance F

    Regression130380456.119499630380456.119499681.1790901520.0002812009

    Residual51871199.59478615374239.91895723

    Total632251655.7142857

    CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%

    Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536

    X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416

    RESIDUAL OUTPUT

    Observation    Predicted Y        Residuals
    1              4202.3444172716    -521.3444172716
    2              3928.8038244674    -533.8038244674
    3              5822.775102905     830.224897095
    4              9894.6646881801    -351.6646881801
    5              3557.1454103313    -239.1454103313
    6              4918.901839726     644.098160274
    7              3588.3647171187    171.6352828813
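The predicted values and residuals in this output can be reproduced from the seven stores' data with the least-squares formulas. This is a minimal sketch in plain Python; the variable names are illustrative assumptions rather than anything defined in the slides.

```python
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Least-squares slope and intercept.
ssx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / ssx
b0 = y_bar - b1 * x_bar

# Fitted values and residuals (observed minus predicted).
predicted = [b0 + b1 * x for x in square_feet]
residuals = [y - y_hat for y, y_hat in zip(annual_sales, predicted)]

for i, (y_hat, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i}  {y_hat:12.4f}  {e:12.4f}")
```

With an intercept in the model, the least-squares residuals sum to zero up to rounding, which is a quick sanity check on any residual table.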

    Sheet1

    Store    Square Feet    Annual Sales ($000)
    1        1726           3681
    2        1542           3395
    3        2816           6653
    4        5555           9543
    5        1292           3318
    6        2208           5563
    7        1313           3760

    [Chart: Annual Sales ($000) plotted against Square Feet]

  • Example: Produce Stores Solution: Critical values: ±2.5706 (two-tail test with .025 in each rejection region, df = 5). Decision: Reject H0. Conclusion: There is evidence of a linear relationship at the 5% level of significance. The value of the t statistic is exactly the same as the t statistic for the test on the slope coefficient (t Stat = 9.0099 in the Excel printout).
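The decision above can be checked numerically. The sketch below plugs the Multiple R from the Excel printout into the standard t statistic for a correlation coefficient under H0: ρ = 0; the critical value is copied from the slide rather than computed, and the variable names are illustrative assumptions.

```python
import math

r = 0.9705572038      # Multiple R from the Excel printout
n = 7                 # number of stores
t_critical = 2.5706   # two-tail critical value for df = 5, alpha = .05 (from the slide)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_h0 = abs(t_stat) > t_critical

print(round(t_stat, 4), reject_h0)
```

The computed statistic matches the t Stat reported for the slope (X Variable 1) in the printout, which illustrates the equivalence noted on the slide.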

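  • Estimation of Mean Values: Confidence interval estimate for μY|X=Xi, the mean of Y given a particular Xi: Ŷi ± tn-2 · SYX · √(1/n + (Xi - X̄)² / SSX), where tn-2 is the t value from the table with df = n - 2, SYX is the standard error of the estimate, and SSX = Σ(X - X̄)² summed over the sample. The size of the interval varies according to the distance of Xi from the mean, X̄.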

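  • Prediction of Individual Values: Prediction interval for an individual response Yi at a particular Xi: Ŷi ± tn-2 · SYX · √(1 + 1/n + (Xi - X̄)² / SSX), where SSX = Σ(X - X̄)² summed over the sample. The addition of 1 under the square root makes this interval wider than the confidence interval for the mean of Y.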

  • Interval Estimates for Different Values of X [Figure: the fitted line Ŷi = b0 + b1Xi plotted against X, with the confidence interval for the mean of Y and the prediction interval for an individual Yi shown at a given X]

  • Example: Produce Stores: Data for 7 stores:

    Store    Square Feet    Annual Sales ($000)
    1        1,726          3,681
    2        1,542          3,395
    3        2,816          6,653
    4        5,555          9,543
    5        1,292          3,318
    6        2,208          5,563
    7        1,313          3,760

    Regression model obtained: Ŷi = 1636.415 + 1.487Xi. Consider a store with 2000 square feet.

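  • Estimation of Mean Values: Example: Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet. Predicted sales: Ŷi = 1636.415 + 1.487(2000) = 4610.415 ($000) with the rounded coefficients; the unrounded coefficients in the PHStat worksheet give 4609.68. X̄ = 2350.29; SYX = 611.75; tn-2 = t5 = 2.5706. Confidence interval estimate for μY|X=2000: the PHStat worksheet reports a half width of 612.66, that is, 3997.02 to 5222.34 ($000).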

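  • Prediction Interval for Y: Example: Find the 95% prediction interval for the annual sales of one particular store of 2,000 square feet. Predicted sales: Ŷi = 1636.415 + 1.487(2000) = 4610.415 ($000) with the rounded coefficients; the unrounded coefficients in the PHStat worksheet give 4609.68. X̄ = 2350.29; SYX = 611.75; tn-2 = t5 = 2.5706. Prediction interval for an individual Y at X = 2000: the PHStat worksheet reports a half width of 1687.68, that is, 2922.00 to 6297.37 ($000).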
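Both intervals for a 2,000-square-foot store can be reproduced from the raw data. The sketch below fits the line by least squares and applies the usual simple-regression interval formulas; the t value is copied from the PHStat worksheet rather than computed, and the variable names are illustrative assumptions.

```python
import math

square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $000
x_new = 2000
t_value = 2.5705776352  # t for df = 5 at 95% confidence, taken from the PHStat worksheet

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / ssx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate, S_YX.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))

y_hat = b0 + b1 * x_new
h = 1 / n + (x_new - x_bar) ** 2 / ssx           # the "h statistic" in the worksheet
ci_half = t_value * s_yx * math.sqrt(h)          # half width for the mean of Y
pi_half = t_value * s_yx * math.sqrt(1 + h)      # half width for an individual Y

print(f"mean:       {y_hat - ci_half:.2f} to {y_hat + ci_half:.2f}")
print(f"individual: {y_hat - pi_half:.2f} to {y_hat + pi_half:.2f}")
```

The only difference between the two half widths is the extra 1 under the square root, so the prediction interval is always the wider of the two.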

  • Estimation of Mean Values and Prediction of Individual Values in PHStat: In Excel, use PHStat | Regression | Simple Linear Regression. Check the "Confidence and Prediction Interval for X=" box. [Embedded object: Excel spreadsheet of the regression of sales on footage]

    DataCopy

    Footage    Sales    (X-XBar)^2
    1726       3681     389732.653061224
    1542       3395     653325.795918367
    2816       6653     216889.795918367
    5555       9543     10270193.6530612
    1292       3318     1119968.65306122
    2208       5563     20245.2244897959
    1313       3760     1075961.65306122

    SLR: Regression Analysis

    Regression Statistics
    Multiple R           0.9705572038
    R Square             0.9419812858
    Adjusted R Square    0.930377543
    Standard Error       611.7515173314
    Observations         7

    ANOVA
                  df    SS                  MS                  F               Significance F
    Regression    1     30380456.1194996    30380456.1194996    81.179090152    0.0002812009
    Residual      5     1871199.59478615    374239.91895723
    Total         6     32251655.7142857

                 Coefficients       Standard Error    t Stat          P-value         Lower 95%         Upper 95%
    Intercept    1636.4147260759    451.4953308495    3.6244333313    0.0151488191    475.8109261983    2797.0185259536
    Footage      1.4866336565       0.1649992123      9.0099439594    0.0002812009    1.0624903715      1.9107769416

    Estimate: Confidence Interval Estimate

    X Value                            2000
    Confidence Level                   95%
    Sample Size                        7
    Degrees of Freedom                 5
    t Value                            2.5705776352
    Sample Mean                        2350.2857142857
    Sum of Squared Difference          13746317.43
    Standard Error of the Estimate     611.7515173314
    h Statistic                        0.1517831758
    Average Predicted Y (YHat)         4609.6820391647

    For Average Predicted Y (YHat)
    Interval Half Width                612.6572787957
    Confidence Interval Lower Limit    3997.024760369
    Confidence Interval Upper Limit    5222.3393179604

    For Individual Response Y
    Interval Half Width                1687.6840468462
    Prediction Interval Lower Limit    2921.9979923185
    Prediction Interval Upper Limit    6297.3660860109



  • Pitfalls of Regression Analysis: Lacking an awareness of the assumptions underlying least-squares regression. Not knowing how to evaluate the assumptions. Not knowing what the alternatives to least-squares regression are if a particular assumption is violated. Using a regression model without knowledge of the subject matter.

  • Strategy for Avoiding the Pitfalls of Regression: Start with a scatter plot of Y against X to observe a possible relationship. Perform residual analysis to check the assumptions. Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality.

  • Strategy for Avoiding the Pitfalls of Regression (continued): If any assumption is violated, use alternative methods to least-squares regression (e.g., least absolute deviation regression or least median of squares regression) or alternative least-squares models (e.g., curvilinear or multiple regression). If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals.

  • Chapter Summary: Introduced types of regression models. Discussed determining the simple linear regression equation. Described measures of variation. Addressed assumptions of regression and correlation. Discussed residual analysis. Addressed measuring autocorrelation.

  • Chapter Summary (continued): Described inference about the slope. Discussed correlation (measuring the strength of the association). Addressed estimation of mean values and prediction of individual values. Discussed pitfalls in regression and ethical issues.