QTM Regression Analysis Ch4 RSH
7/31/2019 QTM Regression Analysis Ch4 RSH
1/40
How to Perform Regression Analysis
Nadia Z Khan
NUST Business School
Friday, May 25, 12
-
Regression analysis
A very valuable tool for today's manager. Regression analysis is used to:
Understand the relationship between variables.
Predict the value of one variable based on another variable.
A regression model has:
a dependent, or response, variable (Y axis)
an independent, or predictor, variable (X axis)
-
Regression analysis
Triple A Construction Company renovates old homes in Albany. The company has found that its dollar volume of renovation work depends on the Albany area payroll.

Local Payroll ($100,000,000's)    Triple A Sales ($100,000's)
3                                 6
4                                 8
6                                 9
4                                 5
2                                 4.5
5                                 9.5
-
Scatter plot
[Figure: scatter plot of Triple A sales ($100,000's) against local payroll ($100,000,000's).]
-
Regression analysis model
Create a scatter plot.
Perform regression analysis.
$$Y = \beta_0 + \beta_1 X + \epsilon$$
where
Y = dependent variable (response)
X = independent variable (predictor)
$\beta_0$ = intercept (value of Y when X = 0)
$\beta_1$ = slope
$\epsilon$ = some random error that cannot be predicted
Regression: Understand & Predict
-
Regression analysis model
Sample data are used to estimate the true values for the intercept and slope.
$$\hat{Y} = b_0 + b_1 X$$
where $\hat{Y}$ = predicted value of Y.
The difference between the actual value of Y and the predicted value (using sample data) is known as the error.
Error = (actual value) - (predicted value)
$$e = Y - \hat{Y}$$
-
Regression analysis model

Sales (Y)    Payroll (X)    $(X - \bar{X})^2$    $(X - \bar{X})(Y - \bar{Y})$
6            3              1                    1
8            4              0                    0
9            6              4                    4
5            4              0                    0
4.5          2              4                    5
9.5          5              1                    2.5
Summations for each column:
42           24             10                   12.5

$$\bar{Y} = 42/6 = 7, \qquad \bar{X} = 24/6 = 4$$
Calculating the required parameters:
$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{12.5}{10} = 1.25$$
$$b_0 = \bar{Y} - b_1 \bar{X} = 7 - (1.25)(4) = 2$$
So,
$$\hat{Y} = 2 + 1.25X$$
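The slope and intercept calculation above can be reproduced with a short Python sketch. The function name `fit_simple_regression` is an illustrative choice, not something from the slides; the data are the six Triple A observations.

```python
# Triple A Construction data from the slides.
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y, in $100,000's
payroll = [3, 4, 6, 4, 2, 5]     # X, in $100,000,000's

def fit_simple_regression(x, y):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared X deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_simple_regression(payroll, sales)
print(b0, b1)  # 2.0 1.25
```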
-
Measuring the Fit of the Linear Regression Model
-
Measuring the Fit of the Linear Regression Model
To understand how well X predicts Y, we evaluate:

Variability in the Y variable
SSR: regression variability that is explained by the relationship between X and Y
+
SSE: unexplained variability, due to factors other than the regression
------------------------------------
SST: total variability about the mean

Coefficient of determination (R Sq): proportion of explained variation
Correlation coefficient (r): strength of the relationship between the Y and X variables
Standard error: standard deviation of the error around the regression line
Residual analysis: validation of the model
Test for linearity: significance of the regression model, i.e. the linear regression model
-
Variability
[Figure: scatter plot of Triple A sales against local payroll ($100,000,000's) with the regression line y = 1.25x + 2 (R² = 0.6944) and the mean of Y marked.]
-
Variability
Sum of Squares Total (SST) measures the total variability in Y.
Sum of the Squared Error (SSE) is less than the SST because the regression line reduces the variability.
Sum of Squares due to Regression (SSR) indicates how much of the total variability is explained by the regression model.
Errors (deviations) may be positive or negative. Summing the errors would be misleading, thus we square the terms prior to summing.
$$SST = \sum (Y - \bar{Y})^2$$
$$SSE = \sum e^2 = \sum (Y - \hat{Y})^2$$
$$SSR = \sum (\hat{Y} - \bar{Y})^2$$
For Triple A Construction:
SST = 22.5
SSE = 6.875
SSR = 15.625
Note: SST = SSR + SSE, where SSR is the explained variability and SSE is the unexplained variability.
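As a check on the three sums of squares, here is a minimal Python sketch using the Triple A data and the fitted line Y-hat = 2 + 1.25 X. Variable names are illustrative.

```python
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y, in $100,000's
payroll = [3, 4, 6, 4, 2, 5]     # X, in $100,000,000's
b0, b1 = 2.0, 1.25               # fitted intercept and slope from the slides

y_bar = sum(sales) / len(sales)
predicted = [b0 + b1 * x for x in payroll]

# Total, unexplained, and explained variability.
sst = sum((y - y_bar) ** 2 for y in sales)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, predicted))
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)

print(sst, sse, ssr)  # 22.5 6.875 15.625
```

The printed values also confirm the identity SST = SSR + SSE for this data set.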
-
Coefficient of Determination
The coefficient of determination ($r^2$) is the proportion of the variability in Y that is explained by the regression equation.
$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
For Triple A Construction:
$$r^2 = \frac{15.625}{22.5} = 0.6944$$
About 69% of the variability in sales is explained by the regression based on payroll.
Note: $0 \le r^2 \le 1$
SST, SSR and SSE by themselves provide little direct interpretation. The coefficient of determination measures the usefulness of the regression.
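Both forms of the coefficient of determination give the same number, which a few lines of Python can confirm. The sums of squares are the Triple A values quoted in the slides.

```python
sst, sse, ssr = 22.5, 6.875, 15.625   # Triple A values from the slides

r_squared = ssr / sst           # explained share of total variability
r_squared_alt = 1 - sse / sst   # one minus the unexplained share

print(round(r_squared, 4), round(r_squared_alt, 4))  # 0.6944 0.6944
```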
-
Correlation Coefficient
The correlation coefficient (r) measures the strength of the linear relationship.
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
For Triple A Construction, r = 0.8333
Note: $-1 \le r \le 1$
[Figure: possible scatter diagrams for values of r.]
Shown as Multiple R in the output of the Excel file.
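A small Python sketch of the computational formula for r, applied to the Triple A data. The function name `correlation` is an illustrative choice.

```python
import math

sales = [6, 8, 9, 5, 4.5, 9.5]   # Y
payroll = [3, 4, 6, 4, 2, 5]     # X

def correlation(x, y):
    """Pearson correlation using the sums-based computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = correlation(payroll, sales)
print(round(r, 4), round(r ** 2, 4))  # 0.8333 0.6944
```

With one independent variable, squaring r gives the coefficient of determination, and r takes the sign of the slope.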
-
Correlation Coefficient
-
Standard error
The mean squared error (MSE) is the estimate of the error variance of the regression equation.
$$s^2 = MSE = \frac{SSE}{n - k - 1}$$
where
n = number of observations in the sample
k = number of independent variables
The standard error of the estimate is $s = \sqrt{MSE}$.
For Triple A Construction, $s = \sqrt{6.875/4} = \sqrt{1.7188} \approx 1.311$
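The MSE and the standard error for Triple A Construction can be checked as follows. This is a minimal sketch that takes SSE = 6.875, n = 6 and k = 1 from the slides.

```python
import math

sse = 6.875   # unexplained variability for Triple A
n = 6         # observations in the sample
k = 1         # independent variables

mse = sse / (n - k - 1)   # estimate of the error variance
s = math.sqrt(mse)        # standard error of the estimate

print(mse, round(s, 2))  # 1.71875 1.31
```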
-
Test for linearity
An F-test is used to statistically test the null hypothesis that there is no linear relationship between the X and Y variables (i.e. $\beta_1 = 0$).
If the significance level for the F test is low, we reject $H_0$ and conclude there is a linear relationship.
$$F = \frac{MSR}{MSE}, \qquad \text{where } MSR = \frac{SSR}{k}$$
For Triple A Construction:
$$MSR = \frac{15.625}{1} = 15.625$$
$$F = \frac{15.625}{1.7188} = 9.0909$$
The significance level for F = 9.0909 is 0.0394, indicating we reject $H_0$ and conclude a linear relationship exists between sales and payroll.
The p-value is the significance level reported for the test.
alpha = level of significance, or 1 - confidence level
If p
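The F calculation for Triple A Construction can be checked with a short Python sketch. The helper `p_value_f_1_4` is an assumption added for illustration: it uses a closed form that holds only for 1 and 4 degrees of freedom (the case here, since F with 1 and 4 degrees of freedom is the square of a t variable with 4 degrees of freedom). For other sample sizes, the F distribution from a statistics package or from Excel would be used instead.

```python
import math

ssr, sse = 15.625, 6.875   # Triple A sums of squares from the slides
n, k = 6, 1

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

def p_value_f_1_4(f):
    """Upper-tail probability of F with (1, 4) degrees of freedom.

    Special case only: valid for k = 1 and n - k - 1 = 4.
    """
    sin_theta = math.sqrt(f / (f + 4))
    cos2_theta = 4 / (f + 4)
    return 1 - sin_theta * (1 + cos2_theta / 2)

p_value = p_value_f_1_4(f_stat)
print(round(f_stat, 4), round(p_value, 4))  # 9.0909 0.0394
```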
-
Computer Software for Regression
In Excel, use Tools/Data Analysis. This is an add-in option.
-
Computer Software for Regression
-
Computer Software for Regression
-
ANOVA table
-
Residual Analysis: to verify regression assumptions are correct
-
Assumptions of the Regression Model
We make certain assumptions about the errors in a regression model which allow for statistical testing.
Assumptions:
Errors are independent.
Errors are normally distributed.
Errors have a mean of zero.
Errors have a constant variance.
A plot of the errors (actual value minus predicted value of Y), also called residuals in Excel, may highlight problems with the model.
PITFALLS:
Prediction beyond the range of X values in the sample can be misleading, including interpretation of the intercept (X = 0).
A linear regression model may not be the best model, even in the presence of a significant F test.
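The residuals that such a plot would show can be computed directly. A minimal Python sketch for the Triple A data, assuming the fitted line Y-hat = 2 + 1.25 X from the earlier slides:

```python
sales = [6, 8, 9, 5, 4.5, 9.5]   # actual Y
payroll = [3, 4, 6, 4, 2, 5]     # X
b0, b1 = 2.0, 1.25               # fitted intercept and slope

# Residual = actual value minus predicted value of Y.
residuals = [y - (b0 + b1 * x) for x, y in zip(payroll, sales)]
mean_residual = sum(residuals) / len(residuals)

print(residuals)      # [0.25, 1.0, -0.5, -2.0, 0.0, 1.25]
print(mean_residual)  # 0.0
```

For a least-squares line with an intercept, the residuals always average to zero, so the zero-mean check is most useful for spotting a pattern across the range of X rather than the overall average.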
-
Constant variance
Errors have constant variance assumption
Plot the residuals against the X values. The pattern should be random!
[Figure: residual plot for Triple A Construction, residuals against X.]
[Figure: residual plot showing non-constant variation in error, a violation.]
-
Normal distribution
Histogram of residuals: should look like a bell curve.
Triple A Construction: it is not possible to see the bell curve with just 6 observations. More samples are needed.
-
Zero mean
Errors have zero mean
[Figure: residual plot for Triple A Construction, residuals against X.]
-
Independent errors
If samples are collected over a period of time and not at the same time, then plot the residuals against time to see if any pattern (autocorrelation) exists.
If there is substantial autocorrelation, the validity of the regression model becomes doubtful.
Autocorrelation can also be checked using the Durbin-Watson statistic.
Example: The manager of a package delivery store wants to predict weekly sales based on the number of customers making purchases for a period of 100 days. The data are collected over a period of time, so check for an autocorrelation (pattern) effect.
[Figure: residuals plotted against time showing a cyclical pattern, a violation.]
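The slides name the Durbin-Watson statistic without giving its formula. A minimal Python sketch, assuming the usual definition d = sum of squared successive differences of the residuals divided by the sum of squared residuals. Both residual series below are made-up illustrations, not data from the slides.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals with a slow cyclical pattern (positive autocorrelation).
cyclical = [1, 2, 1, -1, -2, -1, 1, 2, 1, -1, -2, -1]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = [1, -1] * 6

print(round(durbin_watson(cyclical), 3))     # 0.833
print(round(durbin_watson(alternating), 3))  # 3.667
```

The statistic lies between 0 and 4. Values near 2 suggest little first-order autocorrelation, values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation.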
-
Residual analysis for validating assumptions
[Figure: nonlinear residual plot, a violation.]
-
Multiple regression
-
Multiple regression
Multiple regression models are similar to simple linear regression models except they include more than one X variable.
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$$
where $X_1, \dots, X_n$ are the independent variables and $b_1, \dots, b_n$ are the slopes.
Wilson Realty wants to develop a model to determine the suggested listing price for a house based on size and age.

Price    Sq. Feet    Age    Condition
35000    1926        30     Good
47000    2069        40     Excellent
49900    1720        30     Excellent
55000    1396        15     Good
58900    1706        32     Mint
60000    1847        38     Mint
67000    1950        27     Mint
70000    2323        30     Excellent
78500    2285        26     Mint
79000    3752        35     Good
87500    2300        18     Good
93000    2525        17     Good
95000    3800        40     Excellent
97000    1740        12     Mint
-
Multiple regression
67% of the variation in sales price is explained by size and age.
$H_0$: No linear relationship is rejected.
$H_0$: $\beta_1 = 0$ is rejected.
$H_0$: $\beta_2 = 0$ is rejected.
$$\hat{Y} = 60815.45 + 21.91(\text{size}) - 1449.34(\text{age})$$
Wilson Realty has found a linear relationship between price and size and age. The coefficient for size indicates each additional square foot increases the value by $21.91, while each additional year in age decreases the value by $1449.34.
For a 1900 square foot house that is 10 years old, the following prediction can be made:
$$\hat{Y} = 60815.45 + 21.91(1900) - 1449.34(10) = 87951.05 \approx \$87{,}951$$
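The prediction for the 1900 square foot, 10-year-old house can be reproduced in Python. This sketch assumes the age coefficient enters with a minus sign, consistent with the statement that each additional year of age decreases the value; the function name `predict_price` is illustrative.

```python
def predict_price(size_sqft, age_years):
    """Suggested listing price from the Wilson Realty size-and-age model."""
    return 60815.45 + 21.91 * size_sqft - 1449.34 * age_years

price = predict_price(1900, 10)
print(round(price))  # 87951
```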
-
Binary or dummy variables
-
Dummy variables
Binary (or dummy) variables are special variables that are created for qualitative data.
A dummy variable is assigned a value of 1 if a particular condition is met and a value of 0 otherwise.
The number of dummy variables must equal one less than the number of categories of the qualitative variable.
Return to Wilson Realty, and let's evaluate how to use property condition in the regression model. There are three categories: Mint, Excellent, and Good.
$X_3$ = 1 if the house is in excellent condition
      = 0 otherwise
$X_4$ = 1 if the house is in mint condition
      = 0 otherwise
Note: If both $X_3$ and $X_4$ = 0 then the house is in good condition.
-
Dummy variables
$$\hat{Y} = 48329.23 + 28.21(\text{size}) - 1981.41(\text{age}) + 23684.62(\text{if mint}) + 16581.32(\text{if excellent})$$
As more variables are added to the model, the $r^2$ usually increases.
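The dummy coding and the fitted equation can be combined in a short Python sketch. It assumes the age coefficient is negative, as in the size-and-age model, and it codes one dummy for excellent condition and one for mint condition, with good condition as the baseline. The function names and the example house (1900 square feet, 10 years old) are illustrative choices.

```python
def encode_condition(condition):
    """Return (is_excellent, is_mint); both are 0 for a house in good condition."""
    label = condition.strip().lower()
    if label not in {"good", "excellent", "mint"}:
        raise ValueError(f"unknown condition: {condition!r}")
    return (1 if label == "excellent" else 0, 1 if label == "mint" else 0)

def predict_price(size_sqft, age_years, condition):
    """Suggested price from the Wilson Realty model with condition dummies."""
    is_excellent, is_mint = encode_condition(condition)
    return (48329.23 + 28.21 * size_sqft - 1981.41 * age_years
            + 16581.32 * is_excellent + 23684.62 * is_mint)

for condition in ("Good", "Excellent", "Mint"):
    print(condition, round(predict_price(1900, 10, condition), 2))
# Good 82114.13
# Excellent 98695.45
# Mint 105798.75
```

Two dummies are enough for three categories: the coefficients 16581.32 and 23684.62 are the price differences relative to an otherwise identical house in good condition.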
-
Model building
-
Adjusted r-square
As more variables are added to the model, the $r^2$ usually increases.
The adjusted $r^2$ takes into account the number of independent variables in the model.
The best model is a statistically significant model with a high $r^2$ and few variables.
Note: When variables are added to the model, the value of $r^2$ can never decrease; however, the adjusted $r^2$ may decrease.
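The slides do not give the formula for the adjusted r-square. A minimal Python sketch, assuming the standard definition 1 - (1 - r^2)(n - 1)/(n - k - 1), applied to the Triple A model (r^2 = 15.625/22.5, n = 6, k = 1):

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted r-square for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

r_squared = 15.625 / 22.5   # Triple A: SSR / SST
adj = adjusted_r_squared(r_squared, n=6, k=1)
print(round(r_squared, 4), round(adj, 4))  # 0.6944 0.6181
```

Because the penalty factor (n - 1)/(n - k - 1) grows with k, an added variable raises the adjusted value only if it reduces the unexplained variability enough to offset the lost degree of freedom.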
-
Non-linear regression
-
Non-linear regression
Engineers at Colonel Motors want to use regression analysis to improve fuel efficiency. They are studying the impact of weight on miles per gallon (MPG).
Linear regression model:
$$MPG = 47.8 - 8.2(\text{weight})$$
F significance = 0.0003
$r^2$ = 0.7446
-
Non-linear regression
Nonlinear (transformed variable) regression model:
$$MPG = 79.8 - 30.2(\text{weight}) + 3.4(\text{weight})^2$$
F significance = 0.0002
$R^2$ = 0.8478
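To see how the two Colonel Motors models differ, they can be evaluated side by side. This Python sketch assumes minus signs on the 8.2 and 30.2 coefficients (a heavier car gets fewer miles per gallon). The weight values are hypothetical inputs, since the slides do not state the unit of weight.

```python
def mpg_linear(weight):
    """Linear model: MPG = 47.8 - 8.2 * weight."""
    return 47.8 - 8.2 * weight

def mpg_quadratic(weight):
    """Transformed-variable model: MPG = 79.8 - 30.2 * weight + 3.4 * weight^2."""
    return 79.8 - 30.2 * weight + 3.4 * weight ** 2

# Hypothetical weights, in whatever unit the original data used.
for weight in (2.0, 3.0, 4.0):
    print(weight, round(mpg_linear(weight), 1), round(mpg_quadratic(weight), 1))
# 2.0 31.4 33.0
# 3.0 23.2 19.8
# 4.0 15.0 13.4
```

The quadratic model is still fitted with ordinary multiple regression: weight squared is simply treated as a second independent variable.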
-
Non-linear regression
We should not try to interpret the coefficients of the variables due to the correlation between (weight) and (weight squared).
Normally we would interpret the coefficient for $X_1$ as the change in Y that results from a 1-unit change in $X_1$, while holding all other variables constant.
Obviously holding one variable constant while changing the other is impossible in this example, since if weight ($X_1$) changes, then weight squared ($X_2$) must change also.
This is an example of a problem that exists when multicollinearity is present.
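The strength of the correlation between weight and weight squared is easy to demonstrate. The weights in this Python sketch are hypothetical values chosen for illustration, not the Colonel Motors data.

```python
import math

def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical weights; the second predictor is an exact function of the first.
weights = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
weights_squared = [w ** 2 for w in weights]

r = correlation(weights, weights_squared)
print(round(r, 2))  # 0.99
```

With the two predictors this highly correlated, the model can still be useful for prediction, but the individual coefficients cannot be read as separate effects.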