© 2003 Prentice-Hall, Inc.Chap 13-1 Basic Business Statistics (9 th Edition) Chapter 13 Simple...
-
Upload
mervin-reynolds -
Category
Documents
-
view
221 -
download
0
Transcript of © 2003 Prentice-Hall, Inc.Chap 13-1 Basic Business Statistics (9 th Edition) Chapter 13 Simple...
-
Basic Business Statistics(9th Edition)Chapter 13Simple Linear Regression
-
Chapter TopicsTypes of Regression ModelsDetermining the Simple Linear Regression EquationMeasures of VariationAssumptions of Regression and CorrelationResidual Analysis Measuring AutocorrelationInferences about the Slope
-
Chapter TopicsCorrelation - Measuring the Strength of the AssociationEstimation of Mean Values and Prediction of Individual ValuesPitfalls in Regression and Ethical Issues(continued)
-
Purpose of Regression AnalysisRegression Analysis is Used Primarily to Model Causality and Provide PredictionPredict the values of a dependent (response) variable based on values of at least one independent (explanatory) variableExplain the effect of the independent variables on the dependent variable
-
Types of Regression ModelsPositive Linear RelationshipNegative Linear RelationshipRelationship NOT LinearNo Relationship
-
Simple Linear Regression ModelRelationship between Variables is Described by a Linear FunctionThe Change of One Variable Causes the Other Variable to ChangeA Dependency of One Variable on the Other
-
Simple Linear Regression ModelPopulation RegressionLine (Conditional Mean)Population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other Population Y Intercept Population Slope Coefficient Random ErrorDependent (Response) VariableIndependent (Explanatory) Variable(continued)
-
Simple Linear Regression Model(continued) = Random ErrorYX(Observed Value of Y) =Observed Value of Y(Conditional Mean)
-
Linear Regression EquationSample regression line provides an estimate of the population regression line as well as a predicted value of YSample Y InterceptSample Slope CoefficientResidual Simple Regression Equation (Fitted Regression Line, Predicted Value)
-
Linear Regression Equation and are obtained by finding the values of and that minimize the sum of the squared residuals
provides an estimate of provides an estimate of (continued)
-
Linear Regression Equation(continued)YXObserved Value
-
Interpretation of the Slopeand Intercept is the average value of Y when the value of X is zero
measures the change in the average value of Y as a result of a one-unit change in X
-
Interpretation of the Slopeand Intercept is the estimated average value of Y when the value of X is zero
is the estimated change in the average value of Y as a result of a one-unit change in X
(continued)
-
Simple Linear Regression: ExampleYou wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.Annual Store Square Sales Feet($1000) 1 1,726 3,681 2 1,542 3,395 3 2,816 6,653 4 5,555 9,543 5 1,292 3,318 6 2,208 5,563 7 1,313 3,760
-
Scatter Diagram: ExampleExcel Output
Chart1
3681
3395
6653
9543
3318
5563
3760
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
0
0
0
0
0
0
0
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
-
Simple Linear Regression Equation: ExampleFrom Excel Printout:
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
-
Graph of the Simple Linear Regression Equation: ExampleYi = 1636.415 +1.487Xi
Chart2
3681
3395
6653
9543
3318
5563
3760
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
0
0
0
0
0
0
0
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
-
Interpretation of Results: ExampleThe slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.The equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1487.
-
Simple Linear Regressionin PHStatIn Excel, use PHStat | Regression | Simple Linear Regression Excel Spreadsheet of Regression Sales on Footage
Scatter
3681
3395
6653
9543
3318
5563
3760
Sales
X
Y
Scatter Diagram
DataCopy
FootageSales(X-XBar)^2
17263681389732.653061224
15423395653325.795918367
28166653216889.795918367
5555954310270193.6530612
129233181119968.65306122
2208556320245.2244897959
131337601075961.65306122
SLR
Regression Analysis
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536
Footage1.48663365650.16499921239.00994395940.00028120091.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted SalesResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
SLR
1726
1542
2816
5555
1292
2208
1313
Footage
Residuals
Footage Residual Plot
Estimate
Confidence Interval Estimate
X Value2000
Confidence Level95%
Sample Size7
Degrees of Freedom5
t Value2.5705776352
Sample Mean2350.2857142857
Sum of Squared Difference13746317.43
Standard Error of the Estimate611.7515173314
h Statistic0.1517831758
Average Predicted Y (YHat)4609.6820391647
For Average Predicted Y (YHat)
Interval Half Width612.6572787957
Confidence Interval Lower Limit3997.024760369
Confidence Interval Upper Limit5222.3393179604
For Individual Response Y
Interval Half Width1687.6840468462
Prediction Interval Lower Limit2921.9979923185
Prediction Interval Upper Limit6297.3660860109
This sheet is generated by:(1) PHStat | Regression | Simple Linear Regression (2) checking the "ANOVA and Coefficients Table", and the "Residual Plot" boxes
&A
Page &P
This sheet is generated by:(1) PHStat | Regression | Simple Linear Regression (2) checking the "Confidence and Prediction Intervals for X=" box
data
StoreFootageSales
11,7263,681
21,5423,395
32,8166,653
45,5559,543
51,2923,318
62,2085,563
71,3133,760
-
Measures of Variation: The Sum of SquaresSST = SSR + SSETotal Sample Variability=Explained Variability+Unexplained Variability
-
Measures of Variation: The Sum of SquaresSST = Total Sum of Squares Measures the variation of the Yi values around their mean, SSR = Regression Sum of Squares Explained variation attributable to the relationship between X and YSSE = Error Sum of Squares Variation attributable to factors other than the relationship between X and Y(continued)
-
Measures of Variation: The Sum of Squares(continued)XiYXYSST = (Yi - Y)2SSE =(Yi - Yi )2 SSR = (Yi - Y)2 ___
-
Venn Diagrams and Explanatory Power of RegressionSalesSizesVariations in Sales explained by Sizes or variations in Sizes used in explaining variation in SalesVariations in Sales explained by the error term or unexplained by SizesVariations in store Sizes not used in explaining variation in Sales
-
The ANOVA Table in Excel
ANOVAdfSSMSFSignificance FRegressionkSSRMSR=SSR/kMSR/MSEP-value of the F TestResidualsn-k-1SSEMSE=SSE/(n-k-1)Totaln-1SST
-
Measures of VariationThe Sum of Squares: ExampleExcel Output for Produce StoresSSRSSERegression (explained) dfDegrees of freedomError (residual) dfTotal dfSST
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
0
0
0
0
0
0
0
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
-
The Coefficient of Determination
Measures the proportion of variation in Y that is explained by the independent variable X in the regression model
-
Venn Diagrams and Explanatory Power of RegressionSalesSizes
-
Coefficients of Determination (r 2) and Correlation (r)r2 = 1,r2 = 1,r2 = .81,r2 = 0,YYi = b0 + b1XiX^YYi = b0 + b1XiX^YYi = b0 + b1XiX^YYi = b0 + b1XiX^r = +1r = -1r = +0.9r = 0
-
Standard Error of Estimate
Measures the standard deviation (variation) of the Y values around the regression equation
-
Measures of Variation: Produce Store ExampleExcel Output for Produce Stores r2 = .94 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.Syxn
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
0
0
0
0
0
0
0
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
-
Linear Regression AssumptionsNormalityY values are normally distributed for each XProbability distribution of error is normalHomoscedasticity (Constant Variance)Independence of Errors
-
Consequences of Violationof the AssumptionsViolation of the AssumptionsNon-normality (error not normally distributed)Heteroscedasticity (variance not constant)Usually happens in cross-sectional dataAutocorrelation (errors are not independent)Usually happens in time-series dataConsequences of Any Violation of the AssumptionsPredictions and estimations obtained from the sample regression line will not be accurateHypothesis testing results will not be reliableIt is Important to Verify the Assumptions
-
Variation of Errors Aroundthe Regression Line Y values are normally distributed around the regression line. For each X value, the spread or variance around the regression line is the same.X1X2XY f(e)Sample Regression Line
-
Residual AnalysisPurposesExamine linearity Evaluate violations of assumptionsGraphical Analysis of ResidualsPlot residuals vs. X and time
-
Residual Analysis for LinearityNot LinearLinearX e eXYXYX
-
Residual Analysis for Homoscedasticity HeteroscedasticityHomoscedasticitySRXSRXYXXY
-
Residual Analysis: Excel Output for Produce Stores ExampleExcel Output
Chart3
-521.3444172716
-533.8038244674
830.224897095
-351.6646881801
-239.1454103313
644.098160274
171.6352828813
Square Feet
Residual Plot
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet6
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResidualsStandard Residuals
14202.3444172716-521.3444172716-0.9335558294
23928.8038244674-533.8038244674-0.9558665166
35822.775102905830.2248970951.486658851
49894.6646881801-351.6646881801-0.6297154218
53557.1454103313-239.1454103313-0.4282305219
64918.901839726644.0981602741.1533672795
73588.3647171187171.63528288130.3073421591
Sheet6
0
0
0
0
0
0
0
Square Feet
Residuals
Residual Plot
Sheet1
36810
33950
66530
95430
33180
55630
37600
Y
Predicted Y
X Variable 1
Y
X Variable 1 Line Fit Plot
Sheet2
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet2
0
0
0
0
0
0
0
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet3
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
0
0
0
0
0
0
0
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
-
Residual Analysis for IndependenceThe Durbin-Watson StatisticUsed when data is collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period)Measures violation of independence assumptionShould be close to 2. If not, examine the model for autocorrelation.
-
Durbin-Watson Statisticin PHStatPHStat | Regression | Simple Linear Regression Check the box for Durbin-Watson Statistic
-
Obtaining the Critical Values of Durbin-Watson StatisticTable 13.4 Finding Critical Values of Durbin-Watson Statistic
k=1
k=2
n
dL
dU
dL
dU
15
1.08
1.36
.95
1.54
16
1.10
1.37
.98
1.54
_1013327115.unknown
_1044645607.unknown
_1013326860.unknown
-
Using the Durbin-Watson StatisticAccept H0 (no autocorrelation) : No autocorrelation (error terms are independent) : There is autocorrelation (error terms are not independent)042dL4-dLdU4-dUReject H0 (positive autocorrelation)InconclusiveReject H0 (negative autocorrelation)
-
Residual Analysis for IndependenceNot IndependentIndependenteeTimeTimeResidual is Plotted Against Time to Detect Any AutocorrelationNo Particular PatternCyclical PatternGraphical Approach
-
Inference about the Slope: t Testt Test for a Population SlopeIs there a linear dependency of Y on X ?Null and Alternative HypothesesH0: 1 = 0(no linear dependency)H1: 1 0(linear dependency)Test Statistic
-
Example: Produce StoreData for 7 Stores:Estimated Regression Equation:Annual Store Square Sales Feet($000) 1 1,726 3,681 2 1,542 3,395 3 2,816 6,653 4 5,555 9,543 5 1,292 3,318 6 2,208 5,563 7 1,313 3,760The slope of this model is 1.487. Does square footage affect annual sales?
-
Inferences about the Slope: t Test ExampleH0: 1 = 0H1: 1 0 .05df 7 - 2 = 5Critical Value(s):Test Statistic: Decision:
Conclusion:
There is evidence that square footage affects annual sales.t02.5706-2.5706.025RejectReject.025From Excel Printout Reject H0.p-value
SLR
Regression Analysis
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%
Intercept1636.4147451.49533.62440.01515475.81092619832797.0185259536
Footage1.48660.16509.00990.000281.06249037151.9107769416
Sheet1
FootageAnnual Sales
11,7263,681
21,5423,395
32,8166,653
45,5559,543
51,2923,318
62,2085,563
71,3133,760
Sheet2
Sheet3
-
Inferences about the Slope: Confidence Interval ExampleConfidence Interval Estimate of the Slope: Excel Printout for Produce StoresAt 95% level of confidence, the confidence interval for the slope is (1.062, 1.911). Does not include 0.Conclusion: There is a significant linear dependency of annual sales on the size of the store.
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet6
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
Footage1.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResidualsStandard Residuals
14202.3444172716-521.3444172716-0.9335558294
23928.8038244674-533.8038244674-0.9558665166
35822.775102905830.2248970951.486658851
49894.6646881801-351.6646881801-0.6297154218
53557.1454103313-239.1454103313-0.4282305219
64918.901839726644.0981602741.1533672795
73588.3647171187171.63528288130.3073421591
Sheet6
Residual Plot
Sheet1
36811726
33951542
66532816
95435555
33181292
55632208
37601313
Y
Predicted Y
X Variable 1
Y
X Variable 1 Line Fit Plot
Sheet2
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet2
0
0
0
0
0
0
0
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet3
-
Inferences about the Slope: F TestF Test for a Population SlopeIs there a linear dependency of Y on X ?Null and Alternative HypothesesH0: 1 = 0(no linear dependency)H1: 1 0(linear dependency)Test Statistic
Numerator d.f.=1, denominator d.f.=n-2
-
Relationship between a t Test and an F TestNull and Alternative HypothesesH0: 1 = 0(no linear dependency)H1: 1 0(linear dependency)
The p value of a t Test and the p value of an F Test are Exactly the SameThe Rejection Region of an F Test is Always in the Upper Tail
-
Inferences about the Slope: F Test ExampleTest Statistic: Decision:Conclusion:
H0: 1 = 0H1: 1 0 .05numerator df = 1denominator df 7 - 2 = 5
There is evidence that square footage affects annual sales.From Excel Printout Reject H0.06.61Reject = .05p-value
Scatter
3681
3395
6653
9543
3318
5563
3760
Sales
X
Y
Scatter Diagram
DataCopy
FootageSales(X-XBar)^2
17263681389732.653061224
15423395653325.795918367
28166653216889.795918367
5555954310270193.6530612
129233181119968.65306122
2208556320245.2244897959
131337601075961.65306122
SLR
Regression Analysis
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.000281
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536
Footage1.48663365650.16499921239.00994395940.00028120091.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted SalesResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
SLR
1726
1542
2816
5555
1292
2208
1313
Footage
Residuals
Footage Residual Plot
Estimate
Confidence Interval Estimate
X Value2000
Confidence Level95%
Sample Size7
Degrees of Freedom5
t Value2.5705776352
Sample Mean2350.2857142857
Sum of Squared Difference13746317.43
Standard Error of the Estimate611.7515173314
h Statistic0.1517831758
Average Predicted Y (YHat)4609.6820391647
For Average Predicted Y (YHat)
Interval Half Width612.6572787957
Confidence Interval Lower Limit3997.024760369
Confidence Interval Upper Limit5222.3393179604
For Individual Response Y
Interval Half Width1687.6840468462
Prediction Interval Lower Limit2921.9979923185
Prediction Interval Upper Limit6297.3660860109
&A
Page &P
data
StoreFootageSales
11,7263,681
21,5423,395
32,8166,653
45,5559,543
51,2923,318
62,2085,563
71,3133,760
-
Purpose of Correlation Analysis Correlation Analysis is Used to Measure Strength of Association (Linear Relationship) Between 2 Numerical VariablesOnly strength of the relationship is concernedNo causal effect is implied
-
Purpose of Correlation AnalysisPopulation Correlation Coefficient (Rho) is Used to Measure the Strength between the Variables
(continued)
-
Purpose of Correlation AnalysisSample Correlation Coefficient r is an Estimate of and is Used to Measure the Strength of the Linear Relationship in the Sample Observations
(continued)
-
Sample Observations from Various r Valuesr = .6r = 1YXYXYXYXYXr = -1r = -.6r = 0
-
Features of r and rUnit FreeRange between -1 and 1The Closer to -1, the Stronger the Negative Linear RelationshipThe Closer to 1, the Stronger the Positive Linear RelationshipThe Closer to 0, the Weaker the Linear Relationship
-
t Test for CorrelationHypotheses H0: = 0 (no correlation) H1: 0 (correlation)Test Statistic
-
Example: Produce StoresFrom Excel Printout rIs there any evidence of linear relationship between annual sales of a store and its square footage at .05 level of significance?H0: = 0 (no association)H1: 0 (association) .05df 7 - 2 = 5
Sheet5
SUMMARY OUTPUT
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%Lower 95.0%Upper 95.0%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536475.81092619832797.0185259536
X Variable 11.48663365650.16499921239.00994395940.00028120091.06249037151.91077694161.06249037151.9107769416
RESIDUAL OUTPUT
ObservationPredicted YResiduals
14202.3444172716-521.3444172716
23928.8038244674-533.8038244674
35822.775102905830.224897095
49894.6646881801-351.6646881801
53557.1454103313-239.1454103313
64918.901839726644.098160274
73588.3647171187171.6352828813
Sheet1
StoreSquare FeetAnnual Sales ($000)
117263681
215423395
328166653
455559543
512923318
622085563
713133760
Sheet1
0
0
0
0
0
0
0
Annual Sales ($000)
Square Feet
Annual Sales ($000)
Sheet2
Sheet3
-
Example: Produce Stores Solution02.5706-2.5706.025RejectReject.025Critical Value(s):Conclusion: There is evidence of a linear relationship at 5% level of significance.Decision: Reject H0.The value of the t statistic is exactly the same as the t statistic value for test on the slope coefficient.
-
Estimation of Mean ValuesConfidence Interval Estimate for :The Mean of Y Given a Particular Xit value from table with df=n-2Standard error of the estimateSize of interval varies according to distance away from mean,
-
Prediction of Individual ValuesPrediction Interval for Individual Response Yi at a Particular XiAddition of 1 increases width of interval from that for the mean of Y
-
Interval Estimates for Different Values of XYXPrediction Interval for a Individual Yia given XConfidence Interval for the Mean of YYi = b0 + b1Xi
-
Example: Produce StoresYi = 1636.415 +1.487XiData for 7 Stores:Regression Model Obtained:Annual Store Square Sales Feet($000) 1 1,726 3,681 2 1,542 3,395 3 2,816 6,653 4 5,555 9,543 5 1,292 3,318 6 2,208 5,563 7 1,313 3,760Consider a store with 2000 square feet.
-
Estimation of Mean Values: ExampleFind the 95% confidence interval for the average annual sales for stores of 2,000 square feet.Predicted Sales Yi = 1636.415 +1.487Xi = 4610.45 ($000)X = 2350.29SYX = 611.75 tn-2 = t5 = 2.5706Confidence Interval Estimate for
-
Prediction Interval for Y : ExampleFind the 95% prediction interval for annual sales of one particular store of 2,000 square feet.Predicted Sales Yi = 1636.415 +1.487Xi = 4610.45 ($000)X = 2350.29SYX = 611.75 tn-2 = t5 = 2.5706Prediction Interval for Individual
-
Estimation of Mean Values and Prediction of Individual Values in PHStatIn Excel, use PHStat | Regression | Simple Linear Regression Check the Confidence and Prediction Interval for X= boxExcel Spreadsheet of Regression Sales on Footage
DataCopy
FootageSales(X-XBar)^2
17263681389732.653061224
15423395653325.795918367
28166653216889.795918367
5555954310270193.6530612
129233181119968.65306122
2208556320245.2244897959
131337601075961.65306122
SLR
Regression Analysis
Regression Statistics
Multiple R0.9705572038
R Square0.9419812858
Adjusted R Square0.930377543
Standard Error611.7515173314
Observations7
ANOVA
dfSSMSFSignificance F
Regression130380456.119499630380456.119499681.1790901520.0002812009
Residual51871199.59478615374239.91895723
Total632251655.7142857
CoefficientsStandard Errort StatP-valueLower 95%Upper 95%
Intercept1636.4147260759451.49533084953.62443333130.0151488191475.81092619832797.0185259536
Footage1.48663365650.16499921239.00994395940.00028120091.06249037151.9107769416
Estimate
Confidence Interval Estimate
X Value2000
Confidence Level95%
Sample Size7
Degrees of Freedom5
t Value2.5705776352
Sample Mean2350.2857142857
Sum of Squared Difference13746317.43
Standard Error of the Estimate611.7515173314
h Statistic0.1517831758
Average Predicted Y (YHat)4609.6820391647
For Average Predicted Y (YHat)
Interval Half Width612.6572787957
Confidence Interval Lower Limit3997.024760369
Confidence Interval Upper Limit5222.3393179604
For Individual Response Y
Interval Half Width1687.6840468462
Prediction Interval Lower Limit2921.9979923185
Prediction Interval Upper Limit6297.3660860109
&A
Page &P
data
StoreFootageSales
11,7263,681
21,5423,395
32,8166,653
45,5559,543
51,2923,318
62,2085,563
71,3133,760
-
Pitfalls of Regression AnalysisLacking an Awareness of the Assumptions Underlining Least-Squares RegressionNot Knowing How to Evaluate the AssumptionsNot Knowing What the Alternatives to Least-Squares Regression are if a Particular Assumption is ViolatedUsing a Regression Model Without Knowledge of the Subject Matter
-
Strategy for Avoiding the Pitfalls of RegressionStart with a scatter plot of X on Y to observe possible relationshipPerform residual analysis to check the assumptionsUse a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality
-
Strategy for Avoiding the Pitfalls of RegressionIf there is violation of any assumption, use alternative methods (e.g., least absolute deviation regression or least median of squares regression) to least-squares regression or alternative least-squares models (e.g., curvilinear or multiple regression)If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals(continued)
-
Chapter SummaryIntroduced Types of Regression ModelsDiscussed Determining the Simple Linear Regression EquationDescribed Measures of VariationAddressed Assumptions of Regression and CorrelationDiscussed Residual Analysis Addressed Measuring Autocorrelation
-
Chapter SummaryDescribed Inference about the SlopeDiscussed Correlation - Measuring the Strength of the Association Addressed Estimation of Mean Values and Prediction of Individual ValuesDiscussed Pitfalls in Regression and Ethical Issues
(continued)