OPIM 303 - Lecture #8: Jose M. Cruz, Assistant Professor
Session 8 - Overview
• Simple Regression Model
• Determining the best fit
• "Goodness of Fit"
  – R2
  – Confidence intervals
  – Hypothesis tests
  – Residual analysis
Purpose of Regression Analysis
• Regression analysis is used primarily to model causality and provide prediction
  – Predicts the value of a dependent (response) variable based on the value of at least one independent (explanatory) variable
  – Explains the effect of the independent variables on the dependent variable
Types of Regression Models
• Positive linear relationship
• Negative linear relationship
• Relationship not linear
• No relationship
Simple Linear Regression Model
• Relationship between variables is described by a linear function
• A change in one variable causes a change in the other variable
• One variable depends on the other
Population Linear Regression

The population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other:

  Yi = β0 + β1Xi + εi

where
  Yi = dependent (response) variable
  Xi = independent (explanatory) variable
  β0 = population Y intercept
  β1 = population slope coefficient
  εi = random error

The population regression line (conditional mean) is μY|X = β0 + β1X.
Population Linear Regression (continued)

  Yi = β0 + β1Xi + εi

(Observed value of Yi) = (conditional mean μY|X = β0 + β1Xi) + (random error εi)

[Diagram: at X = Xi, the observed value of Y lies a vertical distance εi above or below the population regression line.]
Sample Linear Regression

The sample regression line provides an estimate of the population regression line, as well as a predicted value of Y:

  Yi = b0 + b1Xi + ei

where b0 = sample Y intercept, b1 = sample slope coefficient, and ei = residual.

  Ŷ = b0 + b1X   (sample regression line, also called the fitted regression line or predicted value)
Sample Linear Regression (continued)

• b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals:

  Σi=1..n (Yi − Ŷi)² = Σi=1..n ei²

• b0 provides an estimate of β0
• b1 provides an estimate of β1
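The minimization above has a closed-form solution: b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and b0 = Ȳ − b1X̄. A minimal Python sketch of these formulas (the data points here are hypothetical, chosen to lie exactly on a line so the fit is exact):

```python
# Closed-form least-squares fit (sketch; hypothetical data, not from the lecture)
def least_squares(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # sample slope coefficient
    b0 = y_bar - b1 * x_bar   # sample Y intercept
    return b0, b1

# Points on the line Y = 2 + 3X fit exactly, so every residual is zero
b0, b1 = least_squares([1, 2, 3, 4], [5, 8, 11, 14])
```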
Sample Linear Regression (continued)

[Diagram: an observed value Yi = β0 + β1Xi + εi and the fitted line Ŷi = b0 + b1Xi. The residual ei in Yi = b0 + b1Xi + ei is the vertical distance between the observed value and the fitted line; b0 is the intercept and b1 the slope of the fitted line.]
Interpretation of the Slope and the Intercept
• β0 = E(Y | X = 0) is the average value of Y when the value of X is zero.

• β1 = ΔE(Y | X) / ΔX measures the change in the average value of Y as a result of a one-unit change in X.
Interpretation of the Slope and the Intercept (continued)

• b0 = estimated E(Y | X = 0) is the estimated average value of Y when the value of X is zero.

• b1 = ΔŶ / ΔX is the estimated change in the average value of Y as a result of a one-unit change in X.
Simple Linear Regression: Example
You want to examine the linear dependency of the annual sales of produce stores on their size in square footage. Sample data for seven stores were obtained. Find the equation of the straight line that fits the data best.
  Store  Square Feet  Annual Sales ($000)
  1      1,726        3,681
  2      1,542        3,395
  3      2,816        6,653
  4      5,555        9,543
  5      1,292        3,318
  6      2,208        5,563
  7      1,313        3,760
Scatter Diagram: Example

[Scatterplot of annual sales ($000), 0 to 12,000, versus square feet, 0 to 6,000, for the seven stores.]
Equation for the Sample Regression Line: Example
From the Excel printout:
  Intercept: 1636.414726
  X Variable 1: 1.486633657

  Ŷi = b0 + b1Xi = 1636.415 + 1.487Xi
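As a check, the Excel coefficients can be reproduced directly from the seven-store data with the closed-form least-squares formulas (a Python sketch; the variable names are ours, not Excel's):

```python
# Reproducing the Excel regression coefficients from the seven-store sample
sqft  = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # annual sales, $000

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(sales) / n

# b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2);  b0 = Ybar - b1 * Xbar
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales)) \
     / sum((x - x_bar) ** 2 for x in sqft)
b0 = y_bar - b1 * x_bar

print(round(b0, 3), round(b1, 3))   # close to Excel's 1636.415 and 1.487
```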
Excel Output

Regression Statistics
  Multiple R          0.970557
  R Square            0.941981
  Adjusted R Square   0.930378
  Standard Error      611.7515
  Observations        7

ANOVA
  Source      df  SS        MS        F         Significance F
  Regression   1  30380456  30380456  81.17909  0.000281
  Residual     5  1871200   374239.9
  Total        6  32251656

  Source         Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
  Intercept      1636.415      451.4953        3.624433  0.015149  475.8109   2797.019
  X Variable 1   1.486634      0.164999        9.009944  0.000281  1.06249    1.910777
Graph of the Sample Regression Line: Example

[Scatterplot of annual sales ($000) versus square feet with the fitted line Ŷi = 1636.415 + 1.487Xi drawn through the seven data points.]
Interpretation of Results: Example
The slope of 1.487 means that for each one-unit increase in X, the average value of Y is predicted to increase by an estimated 1.487 units.

In other words, for each additional square foot of store size, expected annual sales are estimated to increase by $1,487 (sales are measured in $000).
  Ŷi = 1636.415 + 1.487Xi
How Good is the regression?
• R2
• Residual plots
• Analysis of variance
• Confidence intervals
• Hypothesis (t) tests
Coefficient of Correlation
• Measures the strength of the linear relationship between two quantitative variables
  r = Σi=1..n (Xi − X̄)(Yi − Ȳ) / √[ Σi=1..n (Xi − X̄)² · Σi=1..n (Yi − Ȳ)² ]
The Coefficient of Determination
• Denoted by R2
• Measures the proportion of variation in Y that is explained by the independent variable X in the regression model
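For the produce-store data, the correlation formula above can be evaluated directly, and squaring it gives the coefficient of determination (a Python sketch; the results should match Excel's Multiple R and R Square):

```python
from math import sqrt

# Coefficient of correlation r and coefficient of determination r^2
# for the seven-store sample (data from the example slide)
sqft  = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(sales) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, sales))
sxx = sum((x - x_bar) ** 2 for x in sqft)
syy = sum((y - y_bar) ** 2 for y in sales)

r = sxy / sqrt(sxx * syy)   # strength of the linear relationship
r_squared = r ** 2          # proportion of variation in Y explained by X
```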
Coefficients of Determination (r²) and Correlation (r)

[Four panels, each plotting Y against X with fitted line Ŷi = b0 + b1Xi:
  r² = 1, r = +1: all points on an upward-sloping line
  r² = 1, r = −1: all points on a downward-sloping line
  r² = .8, r = +0.9: points scattered closely around an upward-sloping line
  r² = 0, r = 0: no linear relationship]
Linear Regression Assumptions
1. Linearity
2. Normality
   – Y values are normally distributed for each X
   – Probability distribution of error is normal
3. Homoscedasticity (constant variance)
4. Independence of errors
Residual Analysis
• Purposes
  – Examine linearity
  – Evaluate violations of assumptions
• Graphical analysis of residuals
  – Plot residuals vs. Xi, Ŷi, and time
Residual Analysis for Linearity
[Two cases, each showing a scatterplot of Y vs. X and the residual plot of e vs. X: when the relationship is not linear, the residuals show a curved pattern; when it is linear, the residuals scatter randomly around zero.]
Variation of Errors around the Regression Line

• Y values are normally distributed around the regression line.
• For each X value, the "spread" or variance around the regression line is the same.

[Figure: identical normal error distributions f(e) centered on the sample regression line at X1 and X2.]
Residual Analysis for Homoscedasticity
[Plots of standardized residuals (SR) versus X: under heteroscedasticity the spread of the residuals changes as X increases; under homoscedasticity the spread is constant.]
Residual Analysis: Excel Output for Produce Stores Example

[Residual plot: residuals versus square feet (0 to 6,000) show no obvious pattern.]

  Observation  Predicted Y   Residuals
  1            4202.344417   -521.3444173
  2            3928.803824   -533.8038245
  3            5822.775103    830.2248971
  4            9894.664688   -351.6646882
  5            3557.14541    -239.1454103
  6            4918.90184     644.0981603
  7            3588.364717    171.6352829
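The predicted values and residuals in this printout follow directly from the fitted coefficients (a Python sketch using the coefficients from the earlier Excel output):

```python
# Recomputing Excel's predicted values and residuals for the produce stores
b0, b1 = 1636.414726, 1.486633657   # coefficients from the Excel printout
sqft  = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

predicted = [b0 + b1 * x for x in sqft]
residuals = [y - y_hat for y, y_hat in zip(sales, predicted)]

# Least-squares residuals always sum to (essentially) zero
print(round(sum(residuals), 6))
```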
Residual Analysis for Independence
Graphical approach: the residual is plotted against time to detect any autocorrelation.

[Two plots of e versus time: a cyclical pattern indicates the residuals are not independent; no particular pattern indicates independence.]
The ANOVA Table in Excel
ANOVA
  Source      df     SS   MS                  F        Significance F
  Regression  p      SSR  MSR = SSR/p         MSR/MSE  P-value of the F test
  Residual    n−p−1  SSE  MSE = SSE/(n−p−1)
  Total       n−1    SST
Measures of Variation - The Sum of Squares: Example

Excel Output for Produce Stores

ANOVA
  Source      df  SS           MS         F         Significance F
  Regression   1  30380456.12  30380456   81.17909  0.000281201
  Residual     5  1871199.595  374239.92
  Total        6  32251655.71
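These sums of squares can be recomputed from the data and the fitted coefficients (a Python sketch; the predicted values use the Excel coefficients listed earlier):

```python
# Rebuilding the ANOVA table entries for the produce-store example
b0, b1 = 1636.414726, 1.486633657   # coefficients from the Excel printout
sqft  = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]
n, p = len(sales), 1                # n observations, p = 1 independent variable

y_bar = sum(sales) / n
predicted = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in sales)                         # Total
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)             # Regression
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, predicted))  # Residual

msr = ssr / p                # mean square regression
mse = sse / (n - p - 1)      # mean square error
f_stat = msr / mse           # F = MSR / MSE
```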
Measures of Variation: Produce Store Example
Excel Output for Produce Stores

Regression Statistics
  Multiple R          0.9705572
  R Square            0.94198129
  Adjusted R Square   0.93037754
  Standard Error      611.751517
  Observations        7

r² = .94: 94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
Inference about the Slope: t Test
• t test for a population slope
  – Is there a linear dependency of Y on X?
• Null and alternative hypotheses
  – H0: β1 = 0 (no linear dependency)
  – H1: β1 ≠ 0 (linear dependency)
• Test statistic:

  t = (b1 − β1) / Sb1,  where  Sb1 = SYX / √( Σi=1..n (Xi − X̄)² )

  d.f. = n − 2
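For the produce-store example, the pieces of this test statistic can be computed from quantities already shown (a Python sketch; SSE is taken from the ANOVA table, and SYX = √(SSE / (n − 2)) is the standard error of the estimate):

```python
from math import sqrt

# t statistic for H0: beta1 = 0 in the produce-store example
b1 = 1.486633657               # slope from the Excel printout
sse = 1871199.595              # residual sum of squares from the ANOVA table
sqft = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
n = len(sqft)

x_bar = sum(sqft) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)   # sum of squared X deviations

s_yx = sqrt(sse / (n - 2))     # standard error of the estimate (about 611.75)
s_b1 = s_yx / sqrt(sxx)        # standard error of the slope (about 0.165)
t_stat = b1 / s_b1             # test statistic under H0: beta1 = 0
```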
Example: Produce Store
Data for seven stores:

  Store  Square Feet  Annual Sales ($000)
  1      1,726        3,681
  2      1,542        3,395
  3      2,816        6,653
  4      5,555        9,543
  5      1,292        3,318
  6      2,208        5,563
  7      1,313        3,760

Estimated regression equation:

  Ŷi = 1636.415 + 1.487Xi

The slope of this model is 1.487. Is the square footage of the store affecting its annual sales?
Inferences about the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
α = .05
d.f. = 7 − 2 = 5
Critical values: t = ±2.5706 (rejection regions of .025 in each tail)

Test statistic, from the Excel printout:

              Coefficients  Standard Error  t Stat  P-value
  Intercept   1636.4147     451.4953        3.6244  0.01515
  Footage     1.4866        0.1650          9.0099  0.00028

  t = b1 / Sb1 = 9.0099 > 2.5706

Decision: reject H0.
Conclusion: there is evidence that square footage affects annual sales.
Inferences about the Slope: Confidence Interval Example
Confidence interval estimate of the slope:

  b1 ± t(n−2) · Sb1

Excel printout for produce stores:

                Lower 95%   Upper 95%
  Intercept     475.810926  2797.01853
  X Variable 1  1.06249037  1.91077694

At the 95% level of confidence, the confidence interval for the slope is (1.062, 1.911), which does not include 0.

Conclusion: there is a significant linear dependency of annual sales on the size of the store.
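The interval can be reproduced from the printout values (a Python sketch; the critical value 2.5706 is the t-table entry for 5 degrees of freedom used on the previous slide):

```python
# 95% confidence interval for the slope: b1 +/- t(n-2) * S_b1
b1, s_b1 = 1.486634, 0.164999   # slope and its standard error, from Excel
t_crit = 2.5706                 # upper .025 critical value, 5 d.f. (t table)

lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
# The interval excludes 0, so the slope differs significantly from zero
```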
Confidence Intervals for Estimators
Regression Statistics
  Multiple R          0.970557
  R Square            0.941981
  Adjusted R Square   0.930378
  Standard Error      611.7515
  Observations        7

ANOVA
  Source      df  SS        MS        F         Significance F
  Regression   1  30380456  30380456  81.17909  0.000281
  Residual     5  1871200   374239.9
  Total        6  32251656

  Source         Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
  Intercept      1636.415      451.4953        3.624433  0.015149  475.8109   2797.019
  X Variable 1   1.486634      0.164999        9.009944  0.000281  1.06249    1.910777
Error t Stat P-value Lower 95% Upper 95%Intercept 1636.415 451.4953 3.624433 0.015149 475.8109 2797.019X Variable 1 1.486634 0.164999 9.009944 0.000281 1.06249 1.910777