OM Forecasting GLM(en)
Transcript of OM Forecasting GLM(en)
-
8/3/2019 OM Forecasting GLM(en)
Forecasting Methods
  Quantitative
    Causal
    Time Series
      Smoothing
      Trend Projection
      Trend Projection Adjusted for Seasonal Influence
  Qualitative
-
© 2006 by Thomson Learning, a division of Thomson Asia Pte Ltd.
General Linear Model
A general linear model involving p independent variables is
$$y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_p z_p + \epsilon$$
Each of the independent variables zj is a function of x1, x2, ..., xk (the variables for which data have been collected).
Models in which the parameters (β0, β1, ..., βp) all have exponents of one are called linear models. This does not imply that the relationship between y and the xi is linear.
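A minimal Python sketch of this idea, assuming two collected variables x1 and x2 and a hypothetical, illustrative set of transformations (z1 = x1, z2 = x2, z3 = x1², z4 = x1·x2). The prediction is a weighted sum of the z values, so the model is linear in the β's even though it is curved in x1.

```python
# Hypothetical transformations z_j = f_j(x1, x2). Any function of the collected
# variables is allowed, because "linear" refers to the parameters, not the x's.
TRANSFORMS = [
    lambda x1, x2: x1,        # z1 = x1
    lambda x1, x2: x2,        # z2 = x2
    lambda x1, x2: x1 ** 2,   # z3 = x1^2
    lambda x1, x2: x1 * x2,   # z4 = x1 * x2
]


def design_row(x1: float, x2: float) -> list[float]:
    """Return [z1, ..., zp] for one observation."""
    return [f(x1, x2) for f in TRANSFORMS]


def predict(betas: list[float], x1: float, x2: float) -> float:
    """E(y) = beta0 + sum(beta_j * z_j): each beta enters with exponent one."""
    beta0, *slopes = betas
    return beta0 + sum(b * z for b, z in zip(slopes, design_row(x1, x2)))


print(design_row(2.0, 4.0))                          # [2.0, 4.0, 4.0, 8.0]
print(predict([1.0, 2.0, 0.5, 3.0, -1.0], 2.0, 4.0))  # 11.0
```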
-
General Linear Model
The simplest case is when we have collected data for just one variable x1 and want to estimate y by using a straight-line relationship. In this case z1 = x1 and the model is
$$y = \beta_0 + \beta_1 x_1 + \epsilon$$
This model is called a simple first-order model with one predictor variable.
-
Estimation Process
Multiple Regression Model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$$
Multiple Regression Equation
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
Unknown parameters are β0, β1, β2, ..., βp.
Sample data: observations of x1, x2, ..., xp, y.
Estimated Multiple Regression Equation
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$
Sample statistics are b0, b1, b2, ..., bp.
b0, b1, b2, ..., bp provide estimates of β0, β1, β2, ..., βp.
-
Least Squares Method
Least Squares Criterion
$$\min \sum (y_i - \hat{y}_i)^2$$
Computation of Coefficient Values
The formulas for the regression coefficients b0, b1, b2, ..., bp involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
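The criterion itself is easy to evaluate. This minimal sketch computes the sum of squared residuals for candidate coefficients on a small made-up data set (lying exactly on y = 1 + 2x1 + 3x2, for illustration only); least squares chooses the b's that make this quantity smallest.

```python
def sse(b: list[float], X: list[list[float]], y: list[float]) -> float:
    """Least squares criterion: sum of (y_i - yhat_i)^2.

    b = [b0, b1, ..., bp]; each row of X holds [x1, ..., xp] for one observation.
    """
    total = 0.0
    for row, yi in zip(X, y):
        y_hat = b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))
        total += (yi - y_hat) ** 2
    return total


# Made-up data lying exactly on y = 1 + 2*x1 + 3*x2 (illustrative only).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]

print(sse([1, 2, 3], X, y))  # 0.0: these coefficients attain the minimum
print(sse([0, 2, 3], X, y))  # 5.0: any other choice gives a larger value
```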
-
Multiple Regression Equation
Example: Butler Trucking Company
To develop better work schedules, the managers want to estimate the total daily travel time for their drivers.
[Figure: data table]
-
Multiple Regression Equation
[Figure: MINITAB output]
-
Example: Programmer Salary Survey
Multiple Regression Model
A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's programmer aptitude test.
The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.
-
Multiple Regression Model

Exper.  Score  Salary
   4      78    24.0
   7     100    43.0
   1      86    23.7
   5      82    34.3
   8      86    35.8
  10      84    38.0
   0      75    22.2
   1      80    23.1
   6      83    30.0
   6      91    33.0
   9      88    38.0
   2      73    26.6
  10      75    36.2
   5      81    31.6
   6      74    29.0
   8      87    34.0
   4      79    30.1
   6      94    33.9
   3      70    28.2
   3      89    30.0
-
Multiple Regression Model
Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the programmer aptitude test (x2) by the following regression model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$
where
y = annual salary ($1000s)
x1 = years of experience
x2 = score on programmer aptitude test
-
Solving for the Estimates of β0, β1, β2
Input Data:
x1    x2    y
 4    78   24
 7   100   43
 .     .    .
 3    89   30
The input data go into a computer package for solving multiple regression problems, which returns the least squares output: b0 = , b1 = , b2 = , R² = , etc.
-
Solving for the Estimates of β0, β1, β2
[Figure: Excel's Regression dialog box]
-
Solving for the Estimates of β0, β1, β2
Excel's Regression Equation Output

              Coeffic.   Std. Err.   t Stat   P-value
Intercept     3.17394    6.15607     0.5156   0.61279
Experience    1.4039     0.19857     7.0702   1.9E-06
Test Score    0.25089    0.07735     3.2433   0.00478

Note: Columns F-I are not shown.
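What the computer package does can be sketched in plain Python. This is a minimal sketch, not Excel's own algorithm: it solves the normal equations (X'X)b = X'y by Gaussian elimination on the 20 observations in the survey table, and should reproduce the coefficients above up to rounding. The helper names `solve` and `least_squares` are illustrative. Production packages generally prefer more numerically stable methods (such as a QR decomposition), but the normal equations are adequate for a small, well-conditioned example like this one.

```python
# Programmer salary data from the survey table: (experience, test score, salary $1000s)
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve(A: list[list[float]], v: list[float]) -> list[float]:
    """Solve A b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        b[r] = (M[r][n] - sum(M[r][c] * b[c] for c in range(r + 1, n))) / M[r][r]
    return b


def least_squares(rows):
    """Fit y = b0 + b1*x1 + b2*x2 by solving the normal equations (X'X) b = X'y."""
    X = [[1.0, x1, x2] for x1, x2, _ in rows]  # leading 1 for the intercept
    y = [yi for *_, yi in rows]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    return solve(XtX, Xty)


b0, b1, b2 = least_squares(DATA)
print(f"b0={b0:.5f}  b1={b1:.5f}  b2={b2:.5f}")  # b0=3.17394  b1=1.40390  b2=0.25089
```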
-
Estimated Regression Equation
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
Note: Predicted salary will be in thousands of dollars.
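As a quick illustration, the estimated equation can be evaluated for the first programmer in the sample (4 years of experience, test score 78). The function name is illustrative; the coefficients are the rounded values above.

```python
def predicted_salary(exper: float, score: float) -> float:
    """Estimated regression equation from the slide; result is in $1000s."""
    return 3.174 + 1.404 * exper + 0.251 * score


print(round(predicted_salary(4, 78), 3))  # 28.368, i.e. about $28,368
```

The observed salary for this programmer was 24.0 ($24,000), so the residual for this observation is about -4.4 ($1000s).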
-
Interpreting the Coefficients
In multiple regression analysis, we interpret each regression coefficient as follows: bi represents an estimate of the change in y corresponding to a 1-unit increase in xi when all other independent variables are held constant.
-
Interpreting the Coefficients
b1 = 1.404
Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on programmer aptitude test is held constant).
-
Interpreting the Coefficients
b2 = 0.251
Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).
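The "held constant" interpretation can be checked numerically: raise one variable by one unit, keep the other fixed, and compare predictions. The starting values (5 years, score 80) are arbitrary; any values give the same differences because the model has no interaction term.

```python
B0, B1, B2 = 3.174, 1.404, 0.251  # coefficients of the estimated equation


def y_hat(x1: float, x2: float) -> float:
    """Predicted salary in $1000s."""
    return B0 + B1 * x1 + B2 * x2


# Raise one variable by 1 unit while holding the other fixed.
d_exper = y_hat(6, 80) - y_hat(5, 80)  # one extra year of experience
d_score = y_hat(5, 81) - y_hat(5, 80)  # one extra test point

print(f"${d_exper * 1000:,.0f} per year, ${d_score * 1000:,.0f} per point")
# $1,404 per year, $251 per point
```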
-
Multiple Coefficient of Determination
Relationship Among SST, SSR, SSE
SST = SSR + SSE
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
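A small numeric check of the identity, using made-up data and, for brevity, a one-predictor least squares line; the same decomposition holds for any least squares fit that includes an intercept, including multiple regression. Function names are illustrative.

```python
def fit_line(x, y):
    """Simple least squares line; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed y and fitted y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sst, ssr, sse


x = [1, 2, 3, 4]  # made-up data, for illustration only
y = [1, 3, 2, 5]
b0, b1 = fit_line(x, y)
sst, ssr, sse = sums_of_squares(y, [b0 + b1 * xi for xi in x])
print(round(sst, 4), round(ssr, 4), round(sse, 4))  # 8.75 6.05 2.7
```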
-
Multiple Coefficient of Determination
Excel's ANOVA Output

ANOVA
             df   SS          MS          F          Significance F
Regression    2   500.3285    250.1643    42.76013   2.32774E-07
Residual     17    99.45697     5.85041
Total        19   599.7855

SSR is the Regression SS (500.3285) and SST is the Total SS (599.7855).
-
Multiple Coefficient of Determination
R² = SSR/SST
R² = 500.3285/599.7855 = .83418
In general, R² never decreases (and usually increases) as independent variables are added to the model.
To avoid overestimating the impact of adding an independent variable, R² is adjusted for the number of independent variables p and the sample size n:
$$R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$$
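Plugging the ANOVA values (n = 20 programmers, p = 2 independent variables) into both formulas reproduces the R Square and Adjusted R Square figures that Excel reports. A minimal sketch:

```python
def r_squared(ssr: float, sst: float) -> float:
    return ssr / sst


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """R_a^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


r2 = r_squared(500.3285, 599.7855)
r2_adj = adjusted_r_squared(r2, n=20, p=2)
print(f"R^2 = {r2:.5f}, adjusted R^2 = {r2_adj:.5f}")
# R^2 = 0.83418, adjusted R^2 = 0.81467
```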
-
Adjusted Multiple Coefficient of Determination
Excel's Regression Statistics

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.913334059
R Square            0.834179103
Adjusted R Square   0.814670762
Standard Error      2.418762076
Observations        20
-
Assumptions About the Error Term
The error ε is a random variable with mean of zero.
The variance of ε, denoted by σ², is the same for all values of the independent variables.
The values of ε are independent.
The error ε is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by β0 + β1x1 + β2x2 + ... + βpxp.
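One way to make these assumptions concrete is to simulate data that satisfy them. The sketch below uses hypothetical "true" parameters and a hypothetical σ = 2; every error is drawn independently from a normal distribution with mean 0 and the same variance, whatever the x values. The sample mean and variance of the errors should come out close to 0 and σ² = 4.

```python
import random

random.seed(1)  # reproducible illustration

BETA = (3.0, 1.5, 0.25)  # hypothetical true parameters beta0, beta1, beta2
SIGMA = 2.0              # hypothetical error standard deviation, same for all x


def simulate(n: int):
    """Draw n observations y = E(y) + eps with independent eps ~ N(0, SIGMA**2)."""
    rows = []
    for _ in range(n):
        x1, x2 = random.uniform(0, 10), random.uniform(70, 100)
        expected = BETA[0] + BETA[1] * x1 + BETA[2] * x2  # E(y)
        eps = random.gauss(0.0, SIGMA)                      # the error term
        rows.append((x1, x2, expected + eps, eps))
    return rows


errors = [eps for *_, eps in simulate(10_000)]
mean = sum(errors) / len(errors)
var = sum((e - mean) ** 2 for e in errors) / (len(errors) - 1)
print(f"mean of errors = {mean:.3f}, variance = {var:.2f}")  # close to 0 and 4
```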
-
Testing for Significance
In simple linear regression, the F and t tests provide the same conclusion.
In multiple regression, the F and t tests have different purposes.
-
Testing for Significance: F Test
The F test is referred to as the test for overall significance.
The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.
-
Testing for Significance: F Test
Hypotheses
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero.
Test Statistic
F = MSR/MSE
Rejection Rule
Reject H0 if p-value < α or if F > Fα, where Fα is based on an F distribution with p d.f. in the numerator and n - p - 1 d.f. in the denominator.
-
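Testing for Significance: F Test
ANOVA Table for a Multiple Regression Model with p Independent Variables

Source       Degrees of Freedom   Sum of Squares   Mean Square              F
Regression   p                    SSR              MSR = SSR/p              F = MSR/MSE
Error        n - p - 1            SSE              MSE = SSE/(n - p - 1)
Total        n - 1                SST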
-
F Test for Overall Significance
Hypotheses
H0: β1 = β2 = 0
Ha: One or both of the parameters is not equal to zero.
Rejection Rule
For α = .05 and d.f. = 2, 17: F.05 = 3.59
Reject H0 if p-value < .05 or F > 3.59
-
F Test for Overall Significance
Excel's ANOVA Output

ANOVA
             df   SS          MS          F          Significance F
Regression    2   500.3285    250.1643    42.76013   2.32774E-07
Residual     17    99.45697     5.85041
Total        19   599.7855

The Significance F value (2.32774E-07) is the p-value used to test for overall significance.
-
F Test for Overall Significance
Test Statistic
F = MSR/MSE = 250.16/5.85 = 42.76
Conclusion
p-value < .05, so we can reject H0. (Also, F = 42.76 > 3.59.)
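The same computation in a few lines of Python, starting from the sums of squares in the ANOVA output. This sketch compares F with the critical value 3.59 quoted on the slide; it does not compute the p-value, which requires the F distribution (a library such as SciPy provides it via `scipy.stats.f.sf`).

```python
def f_statistic(ssr: float, sse: float, n: int, p: int) -> float:
    """F = MSR / MSE with MSR = SSR/p and MSE = SSE/(n - p - 1)."""
    msr = ssr / p
    mse = sse / (n - p - 1)
    return msr / mse


F = f_statistic(ssr=500.3285, sse=99.45697, n=20, p=2)
F_CRIT = 3.59  # F_.05 with 2 and 17 d.f., taken from the slide
print(f"F = {F:.2f}; reject H0: {F > F_CRIT}")  # F = 42.76; reject H0: True
```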
-
Testing for Significance: t Test
Hypotheses
H0: βi = 0
Ha: βi ≠ 0
Test Statistic
$$t = \frac{b_i}{s_{b_i}}$$
Rejection Rule
Reject H0 if p-value < α or if
$$t < -t_{\alpha/2} \quad \text{or} \quad t > t_{\alpha/2}$$
where t_{α/2} is based on a t distribution with n - p - 1 degrees of freedom.
-
t Test for Significance of Individual Parameters
Hypotheses
H0: βi = 0
Ha: βi ≠ 0
Rejection Rule
For α = .05 and d.f. = 17, t.025 = 2.11
Reject H0 if p-value < .05, or if t < -2.11 or t > 2.11
-
t Test for Significance of Individual Parameters
Excel's Regression Equation Output (Experience row; columns F-I are not shown)

              Coeffic.   Std. Err.   t Stat   P-value
Experience    1.4039     0.19857     7.0702   1.9E-06

The t statistic and p-value in this row are used to test for the individual significance of Experience.
-
t Test for Significance of Individual Parameters
Excel's Regression Equation Output (Test Score row; columns F-I are not shown)

              Coeffic.   Std. Err.   t Stat   P-value
Test Score    0.25089    0.07735     3.2433   0.00478

The t statistic and p-value in this row are used to test for the individual significance of Test Score.
-
t Test for Significance of Individual Parameters
Test Statistics
$$\frac{b_1}{s_{b_1}} = \frac{1.4039}{.1986} = 7.07 \qquad \frac{b_2}{s_{b_2}} = \frac{.25089}{.07735} = 3.24$$
Conclusions
Reject both H0: β1 = 0 and H0: β2 = 0. Both independent variables are significant.
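The two t statistics and the decision rule, as a minimal Python sketch using the coefficients and standard errors from Excel's output and the critical value t.025 = 2.11 from the slide. Comparing |t| with the critical value applies both halves of the two-tailed rule at once.

```python
# (coefficient, standard error) pairs from Excel's regression output
COEFS = {"Experience": (1.4039, 0.19857), "Test Score": (0.25089, 0.07735)}
T_CRIT = 2.11  # t_.025 with n - p - 1 = 17 d.f., taken from the slide


def t_stat(b: float, s_b: float) -> float:
    """t = b_i / s_{b_i}"""
    return b / s_b


for name, (b, s_b) in COEFS.items():
    t = t_stat(b, s_b)
    print(f"{name}: t = {t:.2f}, significant: {abs(t) > T_CRIT}")
# Experience: t = 7.07, significant: True
# Test Score: t = 3.24, significant: True
```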
-
Testing for Significance: Multicollinearity
The term multicollinearity refers to the correlation among the independent variables.
When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.
-
Testing for Significance: Multicollinearity
Every attempt should be made to avoid including independent variables that are highly correlated.
If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.
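A simple screening check applies the |r| > .7 rule of thumb to every pair of independent variables. This sketch uses the experience and test-score columns from the salary survey, plus a hypothetical redundant variable (experience in months) added only to show what a flagged pair looks like. Experience and score turn out to be only weakly correlated (r ≈ 0.34).

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - a_bar) ** 2 for x in a) * sum((y - b_bar) ** 2 for y in b))


def flag_collinear(columns: dict, threshold: float = 0.7):
    """Return (name, name, r) for each pair of variables with |r| > threshold."""
    flagged = []
    for u, v in combinations(columns, 2):
        r = pearson_r(columns[u], columns[v])
        if abs(r) > threshold:
            flagged.append((u, v, round(r, 3)))
    return flagged


exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91, 88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
cols = {"Experience": exper, "Score": score,
        "ExperMonths": [12 * e for e in exper]}  # hypothetical redundant variable
print(flag_collinear(cols))  # [('Experience', 'ExperMonths', 1.0)]
```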
-
Modeling Curvilinear Relationships
To account for a curvilinear relationship, we might set z1 = x1 and z2 = x1²:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \epsilon$$
This model is called a second-order model with one predictor variable.
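Because the second-order model is still a general linear model, it can be fitted by ordinary least squares once the z columns are built. This minimal sketch uses made-up data lying exactly on y = 2 + 3x - 0.5x² and recovers those coefficients. It solves the normal equations with a small Gaussian-elimination helper (`solve`, illustrative, not a library function).

```python
def solve(A, v):
    """Solve A b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (M[r][n] - sum(M[r][c] * b[c] for c in range(r + 1, n))) / M[r][r]
    return b


def fit_second_order(x, y):
    """Least squares fit of y = b0 + b1*z1 + b2*z2 with z1 = x and z2 = x**2."""
    Z = [[1.0, xi, xi ** 2] for xi in x]  # the z's of the general linear model
    ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(3)] for i in range(3)]
    Zty = [sum(r[i] * yi for r, yi in zip(Z, y)) for i in range(3)]
    return solve(ZtZ, Zty)


x = [0, 1, 2, 3, 4, 5]
y = [2.0, 4.5, 6.0, 6.5, 6.0, 4.5]  # made-up data lying on 2 + 3x - 0.5x^2
print([round(b, 4) for b in fit_second_order(x, y)])  # [2.0, 3.0, -0.5]
```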
-
Modeling Curvilinear Relationships
Example: Reynolds, Inc.
Managers at Reynolds want to investigate the relationship between length of employment of their salespeople and the number of electronic laboratory scales sold.
[Figure: data table]
-
Modeling Curvilinear Relationships
[Figure: scatter diagram for the Reynolds example]
-
Modeling Curvilinear Relationships
Let us consider a simple first-order model. The estimated regression equation is
Sales = 111 + 2.38 Months
where:
Sales = number of electronic laboratory scales sold
Months = the number of months the salesperson has been employed
-
Modeling Curvilinear Relationships
[Figure: MINITAB output, first-order model]
-
Modeling Curvilinear Relationships
[Figure: standardized residual plot, first-order model]
The standardized residual plot suggests that a curvilinear relationship is needed.
-
Modeling Curvilinear Relationships
Reynolds Example: The second-order model
The estimated regression equation is
Sales = 45.3 + 6.34 Months - .0345 MonthsSq
where:
Sales = number of electronic laboratory scales sold
MonthsSq = the square of the number of months the salesperson has been employed
-
Modeling Curvilinear Relationships
[Figure: MINITAB output, second-order model]
-
Modeling Curvilinear Relationships
[Figure: standardized residual plot, second-order model]
-
Variable Selection Procedures
Stepwise Regression
Forward Selection
Backward Elimination
These procedures are iterative: one independent variable at a time is added or deleted based on the F statistic.
-
Variable Selection: Stepwise Regression
1. Start with no independent variables in the model.
2. Compute the F statistic and p-value for each independent variable in the model.
3. Is any p-value > alpha to remove? If yes, the independent variable with the largest p-value is removed from the model; go to the next iteration (step 2).
4. If no, compute the F statistic and p-value for each independent variable not in the model.
5. Is any p-value < alpha to enter? If yes, the independent variable with the smallest p-value is entered into the model; go to the next iteration (step 2).
6. If no, stop.
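The control flow of these steps can be sketched in Python. Computing each p-value requires refitting a regression, so this sketch assumes two hypothetical helpers passed in by the caller: `p_in_model(v, model)` gives the p-value of variable v within the current model, and `p_to_enter(v, model)` gives its p-value if it were added. The `max_iter` guard is an added safety net against cycling between entering and removing the same variable.

```python
from typing import Callable, FrozenSet, Iterable

PValueFn = Callable[[str, FrozenSet[str]], float]  # hypothetical helper signature


def stepwise(candidates: Iterable[str], p_in_model: PValueFn, p_to_enter: PValueFn,
             alpha_remove: float = 0.05, alpha_enter: float = 0.05,
             max_iter: int = 100) -> set:
    """Stepwise regression control flow (removal check first, then entry)."""
    candidates = list(candidates)
    model: set = set()  # start with no independent variables
    for _ in range(max_iter):
        frozen = frozenset(model)
        # Any in-model p-value > alpha to remove? Drop the largest.
        in_p = {v: p_in_model(v, frozen) for v in model}
        if in_p and max(in_p.values()) > alpha_remove:
            model.remove(max(in_p, key=in_p.get))
            continue
        # Any outside p-value < alpha to enter? Enter the smallest.
        out_p = {v: p_to_enter(v, frozen) for v in candidates if v not in model}
        if out_p and min(out_p.values()) < alpha_enter:
            model.add(min(out_p, key=out_p.get))
            continue
        break  # nothing to remove or enter: stop
    return model


# Toy p-values that do not depend on the model (made up for illustration).
TOY = {"x1": 0.001, "x2": 0.03, "x3": 0.40}
selected = stepwise(TOY, lambda v, m: TOY[v], lambda v, m: TOY[v])
print(sorted(selected))  # ['x1', 'x2']
```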
-
Variable Selection: Backward Elimination
1. Start with all independent variables in the model.
2. Compute the F statistic and p-value for each independent variable in the model.
3. Is any p-value > alpha to remove? If yes, the independent variable with the largest p-value is removed from the model; return to step 2.
4. If no, stop.
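Backward elimination is the same removal loop without an entry step. As in any such sketch, `p_in_model(v, model)` is a hypothetical caller-supplied helper that refits the model and returns the p-value of variable v; the toy p-values below are made up for illustration.

```python
from typing import Callable, FrozenSet, Iterable


def backward_elimination(candidates: Iterable[str],
                         p_in_model: Callable[[str, FrozenSet[str]], float],
                         alpha_remove: float = 0.05) -> set:
    """Start with all variables; while the largest p-value exceeds alpha_remove,
    remove that variable and recompute the p-values."""
    model = set(candidates)
    while model:
        frozen = frozenset(model)
        p = {v: p_in_model(v, frozen) for v in model}
        worst = max(p, key=p.get)
        if p[worst] <= alpha_remove:
            break  # every remaining variable is significant: stop
        model.remove(worst)
    return model


TOY = {"x1": 0.001, "x2": 0.03, "x3": 0.40}
kept = backward_elimination(TOY, lambda v, m: TOY[v])
print(sorted(kept))  # ['x1', 'x2']
```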
Qualitative Independent Variables
-
Qualitative Independent Variables
In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.
For example, x2 might represent gender where x2 = 0 indicates male and x2 = 1 indicates female.
In this case, x2 is called a dummy or indicator variable.
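Coding the example as a dummy variable is a simple lookup. A minimal sketch (the function name is illustrative):

```python
def gender_dummy(gender: str) -> int:
    """x2 = 0 for male, 1 for female (coding from the slide)."""
    codes = {"male": 0, "female": 1}
    return codes[gender.strip().lower()]


print([gender_dummy(g) for g in ["Male", "female", "male"]])  # [0, 1, 0]
```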
-
Qualitative Independent Variables

Exper.  Score  Degr.  Salary
   4      78    No     24.0
   7     100    Yes    43.0
   1      86    No     23.7
   5      82    Yes    34.3
   8      86    Yes    35.8
  10      84    Yes    38.0
   0      75    No     22.2
   1      80    No     23.1
   6      83    No     30.0
   6      91    Yes    33.0
   9      88    Yes    38.0
   2      73    No     26.6
  10      75    Yes    36.2
   5      81    No     31.6
   6      74    No     29.0
   8      87    Yes    34.0
   4      79    No     30.1
   6      94    Yes    33.9
   3      70    No     28.2
   3      89    No     30.0
-
Qualitative Independent Variables
Estimated Regression Equation
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$
where:
ŷ = annual salary ($1000s)
x1 = years of experience
x2 = score on programmer aptitude test
x3 = 0 if individual does not have a graduate degree
     1 if individual does have a graduate degree
x3 is a dummy variable.
-
Qualitative Independent Variables
Excel's Regression Statistics

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.920215239
R Square            0.846796085
Adjusted R Square   0.818070351
Standard Error      2.396475101
Observations        20
-
Qualitative Independent Variables
Excel's Regression Equation Output

               Coeffic.   Std. Err.   t Stat   P-value
Intercept      7.94485    7.3808      1.0764   0.2977
Experience     1.14758    0.2976      3.8561   0.0014
Test Score     0.19694    0.0899      2.1905   0.04364
Grad. Degr.    2.28042    1.98661     1.1479   0.26789

Note: Columns F-I are not shown.
The Grad. Degr. coefficient is not significant (p-value = 0.26789 > .05).
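With the dummy variable in the model, b3 shifts the prediction by a fixed amount for anyone with a graduate degree, whatever their experience and test score. A minimal sketch using the coefficients from Excel's output (experience 5, score 80 are arbitrary). Since b3 is not significant, this estimated gap of about $2,280 is not statistically distinguishable from zero at α = .05.

```python
# Coefficients from Excel's output for the model with the graduate-degree dummy
B0, B1, B2, B3 = 7.94485, 1.14758, 0.19694, 2.28042


def salary_hat(exper: float, score: float, grad_degree: bool) -> float:
    """Predicted annual salary ($1000s); x3 = 1 if graduate degree, else 0."""
    x3 = 1 if grad_degree else 0
    return B0 + B1 * exper + B2 * score + B3 * x3


gap = salary_hat(5, 80, True) - salary_hat(5, 80, False)
print(f"{gap:.5f}")  # 2.28042, i.e. about $2,280 for any fixed experience and score
```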
-
More Complex Qualitative Variables
If a qualitative variable has k levels, k - 1 dummy variables are required, with each dummy variable being coded as 0 or 1.
For example, a variable with levels A, B, and C could be represented by x1 and x2 values of (0, 0) for A, (1, 0) for B, and (0, 1) for C.
Care must be taken in defining and interpreting the dummy variables.
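The coding scheme generalizes to any number of levels: the first level is the baseline (all zeros), and each other level switches on its own dummy. A minimal sketch that reproduces the A, B, C example (the function name is illustrative):

```python
def dummy_code(value, levels):
    """Encode a qualitative value with len(levels) - 1 dummy variables.

    The first level is the baseline (all zeros); level i (i >= 1) sets dummy i-1 to 1.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return tuple(1 if value == level else 0 for level in levels[1:])


for level in ["A", "B", "C"]:
    print(level, dummy_code(level, ["A", "B", "C"]))
# A (0, 0)
# B (1, 0)
# C (0, 1)
```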
-
Interaction
If the original data set consists of observations for y and two independent variables x1 and x2, we might develop a second-order model with two predictor variables:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \epsilon$$
In this model, the variable z5 = x1x2 is added to account for the potential effects of the two variables acting together. This type of effect is called interaction.
-
Interaction
Example: Tyler Personal Care
For a new shampoo product, the two factors believed to have the most influence on sales are unit selling price and advertising expenditure.
[Figure: data table]
-
Interaction
To account for the effect of interaction, use the following regression model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon$$
where: y = unit sales (1000s), x1 = price ($), x2 = advertising expenditure ($1000s).
-
Interaction
General linear model involving three independent variables (z1, z2, and z3):
$$y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \beta_3 z_3 + \epsilon$$
where:
y = Sales = unit sales (1000s)
z1 = x1 (Price) = price of the product ($)
z2 = x2 (AdvExp) = advertising expenditure ($1000s)
z3 = x1x2 (PriceAdv) = interaction term (Price times AdvExp)
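The practical consequence of the interaction term is that the effect of a price change depends on the advertising level: a $1 price increase changes predicted sales by b1 + b3·x2. The coefficients below are hypothetical, chosen only for illustration; they are not the MINITAB estimates for the Tyler example.

```python
# Hypothetical coefficients, for illustration only (not the Tyler estimates).
B0, B1, B2, B3 = 500.0, -30.0, 2.0, 0.25


def sales_hat(price: float, adv: float) -> float:
    """y = b0 + b1*z1 + b2*z2 + b3*z3 with z1 = price, z2 = adv, z3 = price*adv."""
    z1, z2, z3 = price, adv, price * adv
    return B0 + B1 * z1 + B2 * z2 + B3 * z3


def price_effect(adv: float) -> float:
    """Change in predicted sales (1000s) for a $1 price increase, from $2 to $3,
    at a fixed advertising level; equals b1 + b3 * adv."""
    return sales_hat(3.0, adv) - sales_hat(2.0, adv)


print(price_effect(40.0), price_effect(80.0))  # -20.0 -10.0
```

With these hypothetical values, a higher advertising budget softens the drop in sales from a price increase; without the interaction term the price effect would be the same constant b1 at every advertising level.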
-
Interaction
[Figure: MINITAB output for the Tyler Personal Care example]