Chapter 13 Simple Linear Regression - casrilanka.com eight - linear... · 2 Statistics for Managers...
Transcript of Chapter 13 Simple Linear Regression - casrilanka.com eight - linear... · 2 Statistics for Managers...
-
1
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-1
Statistics for ManagersUsing Microsoft® Excel
5th Edition
Chapter 13
Simple Linear Regression
Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc. Chap 2-2
Scatter Plots
Scatter plots are used for numerical data consistingof paired observations taken from two numericalvariables
One variable is measured on the vertical axis and theother variable is measured on the horizontal axis
-
2
Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc. Chap 2-3
Scatter Plot Example
Volumeper day
Cost perday
23 125
26 140
29 146
33 160
38 167
42 170
50 188
55 195
60 200
Cost per Day vs. Production Volume
0
50
100
150
200
250
20 30 40 50 60 70
Volume per Day
Co
st
per
Day
Class Exercise
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-4
-
3
Class Exercise
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-5
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-6
Correlation vs. Regression
A scatter plot (or scatter diagram) can beused to show the relationship between twonumerical variables
Correlation analysis is used to measurestrength of the association (linearrelationship) between two variables Correlation is only concerned with strength of
the relationship No causal effect is implied with correlation
-
4
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-7
Regression Analysis
Regression analysis is used to:
Predict the value of a dependent variable basedon the value of at least one independent variable
Explain the impact of changes in an independentvariable on the dependent variable
Dependent variable: the variable you wish to explain
Independent variable: the variable used to explainthe dependent variable
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-8
Simple Linear RegressionModel
Only one independent variable, X Relationship between X and Y is described
by a linear function
Changes in Y are related to changes in X
-
5
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-9
Types of Relationships
Y
X
Y
X
Y
Y
X
X
Linear relationships Curvilinear relationships
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-10
Types of Relationships
Y
X
Y
X
Y
Y
X
X
Strong relationships Weak relationships
-
6
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-11
Types of Relationships
Y
X
Y
XNo relationship
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-12
The Linear Regression Model
ii10i εXββY Linear component
The population regression model:
PopulationY intercept
PopulationSlopeCoefficient
RandomErrorterm
DependentVariable
IndependentVariable
Random Errorcomponent
-
7
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-13
The Linear Regression Model
Random Errorfor this Xi value
Y
X
Observed Valueof Y for Xi
Predicted Valueof Y for Xi
ii10i εXββY
Xi
Slope = β1
Intercept = β0
εi
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-14
Linear Regression Equation
i10i XbaŶ
The simple linear regression equation provides anestimate of the population regression line
Estimate of theregressionintercept
Estimate of theregression slope
Estimated (orpredicted) Yvalue forobservation i
Value of X forobservation i
-
8
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-15
The Least Squares Method
b0 and b1 are obtained by finding the values of b0and b1 that minimize the sum of the squared
differences between Y and :
2i10i
2ii ))Xb(b(Ymin)Ŷ(Ymin
Ŷ
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-16
Finding the Least SquaresEquation
The coefficients a0 and b1 , and other regressionresults in this chapter, will be found using thefollowing formulas
= Σ − . Σ= Σ − Σ . ΣΣ − (Σ )
i10i XbaŶ
-
9
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-17
Interpretation of theIntercept and the Slope
a0 is the estimated mean value of Y when thevalue of X is zero
b1 is the estimated change in the mean value ofY for every one-unit change in X
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-18
Linear Regression Example
A real estate agent wishes to examine therelationship between the selling price of ahome and its size (measured in square feet)
A random sample of 10 houses is selected Dependent variable (Y) = house price in
$1000s
Independent variable (X) = square feet
-
10
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-19
Linear Regression ExampleData
House Price in $1000s(Y)
Square Feet(X)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-20
Linear Regression ExampleData
Square Feet(X)
House Price in$1000s
(Y)XY X2
1400 245
1600 312
1700 279
1875 308
1100 199
1550 219
2350 405
2450 324
1425 319
1700 255
-
11
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-21
Linear Regression ExampleScatterplot
050
100150200250300350400450
0 500 1000 1500 2000 2500 3000
Square Feet
Hou
se P
rice
($10
00s)
House price model: scatter plot
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-22
Linear Regression ExampleExcel Output
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVAdf SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
The regression equation is:feet)(square0.1097798.24833pricehouse
-
12
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-23
Linear Regression ExampleGraphical Representation
050
100150200250300350400450
0 500 1000 1500 2000 2500 3000
Square Feet
Hou
se P
rice
($10
00s)
House price model: scatter plot and regression line
feet)(square0.1097798.24833pricehouse
Slope= 0.10977
Intercept= 98.248
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-24
Linear Regression ExampleInterpretation of b0
b0 is the estimated mean value of Y when the valueof X is zero (if X = 0 is in the range of observed Xvalues)
Because the square footage of the house cannot be 0,the Y intercept has no practical application.
feet)(square0.1097798.24833pricehouse
-
13
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-25
Linear Regression ExampleInterpretation of b1
b1 measures the mean change in the average value ofY as a result of a one-unit change in X
Here, b1 = .10977 tells us that the mean value of ahouse increases by .10977($1000) = $109.77, onaverage, for each additional one square foot of size
feet)(square0.1097798.24833pricehouse
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-26
Linear Regression ExampleMaking Predictions
317.85
0)0.1098(20098.25
(sq.ft.)0.109898.25pricehouse
Predict the price for a house with 2000 square feet:
The predicted price for a house with 2000 square feetis 317.85($1,000s) = $317,850
-
14
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-27
Linear Regression ExampleMaking Predictions
050
100150200250300350400450
0 500 1000 1500 2000 2500 3000
Square Feet
Hou
se P
rice
($10
00s)
When using a regression model for prediction, onlypredict within the relevant range of data
Relevant range forinterpolation
Do not try toextrapolate beyond
the range ofobserved X’s
Class Exercises
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-28
-
15
Class Exercises
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-29
a. Construct Scatter Plotb. Interpret the slope, b1 in this
problemc. Predict the mean audited
newsstand sales for a magazinethat reports newsstand sales of400 000.
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-30
Measures of VariationTotal variation is made up of two parts:
SSESSRSST Total Sum ofSquares
Regression Sum ofSquares
Error Sum ofSquares
2i )YY(SST 2ii )ŶY(SSE 2i )YŶ(SSRwhere:
= Mean value of the dependent variable
Yi = Observed values of the dependent variable
i = Predicted value of Y for the given Xi valueŶ
Y
-
16
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-31
Measures of Variation
SST = total sum of squares Measures the variation of the Yi values around
their mean Y
SSR = regression sum of squares Explained variation attributable to the
relationship between X and Y
SSE = error sum of squares Variation attributable to factors other than the
relationship between X and Y
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-32
Measures of Variation
Xi
Y
X
Yi
SST = (Yi - Y)2
SSE = (Yi - Yi )2
SSR = (Yi - Y)2
__
_Y
Y
Y_Y
-
17
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-33
Linear Regression ExampleDataSquare Feet
(X)
HousePrice
in$1000s
(Y)
Yi
SSR SSE
SST
(Yi - Y)2 (Yi - Y)2
1400 245
1600 312
1700 279
1875 308
1100 199
1550 219
2350 405
2450 324
1425 319
1700 255
-
Class Exercises
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-34
Calculate the total variation in both of the exercises done above.
-
18
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-35
Coefficient of Determination, r2
The coefficient of determination is the portion ofthe total variation in the dependent variable thatis explained by variation in the independentvariable
The coefficient of determination is also called r-squared and is denoted as r2
1r0 2
squaresofsumtotalsquaresofsumregression
SSTSSRr2
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-36
Coefficient of Determination, r2
r2 = 1
Y
X
Y
X
r2 = -1
r2 = 1
Perfect linear relationship betweenX and Y:
100% of the variation in Y isexplained by variation in X
-
19
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-37
Coefficient of Determination, r2
Y
X
Y
X
0 < r2 < 1
Weaker linear relationshipsbetween X and Y:
Some but not all of the variationin Y is explained by variation inX
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-38
Coefficient of Determination, r2
r2 = 0
No linear relationship between Xand Y:
The value of Y is not related to X.(None of the variation in Y isexplained by variation in X)
Y
Xr2 = 0
-
20
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-39
Linear Regression ExampleCoefficient of Determination, r2
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVAdf SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
58.08% of the variation in houseprices is explained by variation in
square feet
0.5808232600.5000
18934.9348
SST
SSRr2
Class Exercises
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-40
Calculate the coefficient of determination in both of the exercises done above.
-
21
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-41
Assumptions of RegressionL.I.N.E
Linearity
The relationship between X and Y is linear
Independence of Errors
Error values are statistically independent
Normality of Error
Error values are normally distributed for any givenvalue of X
Equal Variance (also called homoscedasticity)
The probability distribution of the errors has constantvariance