OM Forecasting GLM(en)


  • 8/3/2019 OM Forecasting GLM(en)


    Forecasting Methods

        Quantitative
            Causal
            Time Series
                Smoothing
                Trend Projection
                Trend Projection Adjusted for Seasonal Influence
        Qualitative

    2006 by Thomson Learning, a division of Thomson Asia Pte Ltd.

    General Linear Model

    Models in which the parameters ($\beta_0, \beta_1, \ldots, \beta_p$) all have exponents of one are called linear models. This does not imply that the relationship between y and the x_i's is linear.

    A general linear model involving p independent variables is

    $y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_p z_p + \epsilon$

    Each of the independent variables z_j is a function of x1, x2, ..., xk (the variables for which data have been collected).


    General Linear Model

    The simplest case is when we have collected data for just one variable x1 and want to estimate y by using a straight-line relationship. In this case z1 = x1.

    $y = \beta_0 + \beta_1 x_1 + \epsilon$

    This model is called a simple first-order model with one predictor variable.


    Estimation Process

    Multiple Regression Model
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$

    Multiple Regression Equation
    $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$

    Unknown parameters are $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

    Sample Data: one row per observation, with columns x1, x2, ..., xp, y.

    Estimated Multiple Regression Equation
    $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$

    Sample statistics are b0, b1, b2, ..., bp.

    b0, b1, b2, ..., bp provide estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.


    Least Squares Method

    Least Squares Criterion

    $\min \sum (y_i - \hat{y}_i)^2$

    Computation of Coefficient Values

    The formulas for the regression coefficients b0, b1, b2, ..., bp involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
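    For reference, the matrix-algebra form that the slide alludes to is usually written as the normal-equations solution below. The symbols X (the n x (p+1) data matrix whose first column is all ones), y (the vector of observed responses), and b (the vector of coefficients) are introduced here for illustration and do not appear on the slide; the formula assumes that X'X is invertible.

```latex
\mathbf{b} = \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y},
\qquad
\mathbf{b} = (b_0, b_1, \ldots, b_p)^{\top}
```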


    Multiple Regression Equation

    Example: Butler Trucking Company

    To develop better work schedules, the managers want to estimate the total daily travel time for their drivers.

    Data (table not reproduced in this transcript)


    Multiple Regression Equation

    MINITAB Output (output not reproduced in this transcript)


    Multiple Regression Model

    Example: Programmer Salary Survey

    A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's programmer aptitude test.

    The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.


    Multiple Regression Model

    Exper.   Score   Salary
       4       78     24
       7      100     43
       1       86     23.7
       5       82     34.3
       8       86     35.8
      10       84     38
       0       75     22.2
       1       80     23.1
       6       83     30
       6       91     33
       9       88     38
       2       73     26.6
      10       75     36.2
       5       81     31.6
       6       74     29
       8       87     34
       4       79     30.1
       6       94     33.9
       3       70     28.2
       3       89     30


    Multiple Regression Model

    Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the programmer aptitude test (x2) by the following regression model:

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

    where
        y = annual salary ($1000s)
        x1 = years of experience
        x2 = score on programmer aptitude test


    Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$

    Input Data
        x1    x2    y
         4    78   24
         7   100   43
         .     .    .
         3    89   30

    -> Computer Package for Solving Multiple Regression Problems ->

    Least Squares Output
        b0 =
        b1 =
        b2 =
        R2 =
        etc.


    Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$

    Excel's Regression Dialog Box (screenshot not reproduced in this transcript)


    Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$

    Excel's Regression Equation Output

                   Coeffic.   Std. Err.   t Stat    P-value
    Intercept      3.17394    6.15607     0.5156    0.61279
    Experience     1.4039     0.19857     7.0702    1.9E-06
    Test Score     0.25089    0.07735     3.2433    0.00478

    Note: Columns F-I are not shown.


    Estimated Regression Equation

    SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)

    Note: Predicted salary will be in thousands of dollars.
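    As a cross-check on the software output, the fit can be reproduced with a few lines of code. The sketch below is only an illustration, not the method Excel or MINITAB uses internally: it solves the normal equations by Gauss-Jordan elimination in plain Python. The function names `fit_least_squares` and `predict` are made up for this example, and the data are the 20 observations as read from the programmer salary table in this deck.

```python
# Programmer salary data as read from the slide's table:
# (years of experience, aptitude test score, annual salary in $1000s)
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def fit_least_squares(xs, ys):
    """Return [b0, b1, ..., bp] by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in xs]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def predict(coeffs, x):
    """Evaluate b0 + b1*x1 + ... + bp*xp for one observation."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))


b = fit_least_squares([(e, s) for e, s, _ in DATA], [y for _, _, y in DATA])
print([round(v, 3) for v in b])          # compare with 3.174, 1.404, 0.251 on the slide
print(round(predict(b, (4, 78)), 2))     # predicted salary ($1000s) for 4 years, score 78
```

    For production work a numerical library would normally be preferred over hand-rolled elimination; the plain-Python version is used here only so the example has no dependencies.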


    Interpreting the Coefficients

    In multiple regression analysis, we interpret each regression coefficient as follows:

    b_i represents an estimate of the change in y corresponding to a 1-unit increase in x_i when all other independent variables are held constant.


    Interpreting the Coefficients

    b1 = 1.404

    Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on programmer aptitude test is held constant).


    Interpreting the Coefficients

    b2 = 0.251

    Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).


    Multiple Coefficient of Determination

    Relationship Among SST, SSR, SSE

    SST = SSR + SSE

    $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

    where:
        SST = total sum of squares
        SSR = sum of squares due to regression
        SSE = sum of squares due to error


    Multiple Coefficient of Determination

    Excel's ANOVA Output

    ANOVA
                  df    SS          MS          F           Significance F
    Regression     2    500.3285    250.1643    42.76013    2.32774E-07
    Residual      17    99.45697    5.85041
    Total         19    599.7855

    SSR = 500.3285 (Regression SS); SST = 599.7855 (Total SS)


    Multiple Coefficient of Determination

    R2 = SSR/SST

    R2 = 500.3285/599.7855 = .83418

    R2 never decreases, and in general increases, as independent variables are added to the model.

    The adjusted R2 corrects R2 for the number of independent variables, to avoid overestimating the impact of adding an independent variable.


    Adjusted Multiple Coefficient of Determination

    Excel's Regression Statistics

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.913334059
    R Square             0.834179103
    Adjusted R Square    0.814670762
    Standard Error       2.418762076
    Observations         20
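    The slide that carried the adjusted R2 formula did not survive in this transcript, so the sketch below assumes the standard textbook definition, 1 - (1 - R2)(n - 1)/(n - p - 1), and checks it against the Excel figures. The function names are illustrative; SSR, SST, n, and p are taken from the ANOVA output for the programmer salary example.

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination: share of total variation explained."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Standard adjustment for n observations and p independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


SSR, SST = 500.3285, 599.7855  # from the ANOVA output
N, P = 20, 2                   # 20 programmers, 2 independent variables

r2 = r_squared(SSR, SST)
adj = adjusted_r_squared(r2, N, P)
print(round(r2, 5), round(adj, 5))  # compare with R Square and Adjusted R Square
```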


    Assumptions About the Error Term $\epsilon$

    The error $\epsilon$ is a random variable with mean of zero.

    The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.

    The values of $\epsilon$ are independent.

    The error $\epsilon$ is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.


    Testing for Significance

    In simple linear regression, the F and t tests provide the same conclusion.

    In multiple regression, the F and t tests have different purposes.


    Testing for Significance: F Test

    The F test is referred to as the test for overall significance.

    The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.


    Testing for Significance: F Test

    Hypotheses
        $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
        $H_a$: One or more of the parameters is not equal to zero.

    Test Statistic
        F = MSR/MSE

    Rejection Rule
        Reject $H_0$ if p-value < $\alpha$ or if $F > F_\alpha$, where $F_\alpha$ is based on an F distribution with p d.f. in the numerator and n - p - 1 d.f. in the denominator.


    Testing for Significance: F Test

    ANOVA Table for a Multiple Regression Model with p Independent Variables (table not reproduced in this transcript)


    F Test for Overall Significance

    Hypotheses
        $H_0: \beta_1 = \beta_2 = 0$
        $H_a$: One or both of the parameters is not equal to zero.

    Rejection Rule
        For $\alpha$ = .05 and d.f. = 2, 17: $F_{.05}$ = 3.59
        Reject $H_0$ if p-value < .05 or F > 3.59


    F Test for Overall Significance

    Excel's ANOVA Output

    ANOVA
                  df    SS          MS          F           Significance F
    Regression     2    500.3285    250.1643    42.76013    2.32774E-07
    Residual      17    99.45697    5.85041
    Total         19    599.7855

    Significance F (2.32774E-07) is the p-value used to test for overall significance.


    F Test for Overall Significance

    Test Statistic
        F = MSR/MSE = 250.16/5.85 = 42.76

    Conclusion
        p-value < .05, so we can reject $H_0$. (Also, F = 42.76 > 3.59.)
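    The arithmetic behind the F statistic can be sketched as follows. This is an illustration only: the function name `f_statistic` is made up, the sums of squares come from the ANOVA output, and the critical value 3.59 is copied from the slide rather than computed from the F distribution.

```python
def f_statistic(ssr, sse, n, p):
    """Return (MSR, MSE, F) for a model with p independent variables and n observations."""
    msr = ssr / p              # mean square due to regression
    mse = sse / (n - p - 1)    # mean square due to error
    return msr, mse, msr / mse


msr, mse, f = f_statistic(ssr=500.3285, sse=99.45697, n=20, p=2)
F_CRITICAL = 3.59  # F.05 with (2, 17) d.f., as given on the slide

print(round(msr, 2), round(mse, 2), round(f, 2))
print("reject H0" if f > F_CRITICAL else "do not reject H0")
```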


    Testing for Significance: t Test

    Hypotheses
        $H_0: \beta_i = 0$
        $H_a: \beta_i \neq 0$

    Test Statistic
        $t = \dfrac{b_i}{s_{b_i}}$

    Rejection Rule
        Reject $H_0$ if p-value < $\alpha$ or if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with n - p - 1 degrees of freedom.


    t Test for Significance of Individual Parameters

    Hypotheses
        $H_0: \beta_i = 0$
        $H_a: \beta_i \neq 0$

    Rejection Rule
        For $\alpha$ = .05 and d.f. = 17, $t_{.025}$ = 2.11
        Reject $H_0$ if p-value < .05 or if |t| > 2.11


    t Test for Significance of Individual Parameters

    Excel's Regression Equation Output

                   Coeffic.   Std. Err.   t Stat    P-value
    Intercept      3.17394    6.15607     0.5156    0.61279
    Experience     1.4039     0.19857     7.0702    1.9E-06
    Test Score     0.25089    0.07735     3.2433    0.00478

    Note: Columns F-I are not shown.

    The t statistic (7.0702) and p-value (1.9E-06) in the Experience row are used to test for the individual significance of Experience.


    t Test for Significance of Individual Parameters

    Excel's Regression Equation Output (same output as on the previous slide)

                   Coeffic.   Std. Err.   t Stat    P-value
    Test Score     0.25089    0.07735     3.2433    0.00478

    The t statistic (3.2433) and p-value (0.00478) in the Test Score row are used to test for the individual significance of Test Score.


    t Test for Significance of Individual Parameters

    Test Statistics

        $t = \dfrac{b_1}{s_{b_1}} = \dfrac{1.4039}{.1986} = 7.07$

        $t = \dfrac{b_2}{s_{b_2}} = \dfrac{.25089}{.07735} = 3.24$

    Conclusions
        Reject both $H_0: \beta_1 = 0$ and $H_0: \beta_2 = 0$. Both independent variables are significant.
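    The two t statistics can be recomputed directly from the coefficient table. In this small sketch the helper name `t_statistic` is made up, the coefficients and standard errors are the Excel values, and the critical value 2.11 is the one quoted on the slide for 17 degrees of freedom.

```python
def t_statistic(b, se):
    """t statistic for H0: beta_i = 0, i.e. the estimate divided by its standard error."""
    return b / se


T_CRITICAL = 2.11  # t.025 with 17 d.f., as given on the slide

# (coefficient, standard error) pairs from Excel's regression output
estimates = {"Experience": (1.4039, 0.19857), "Test Score": (0.25089, 0.07735)}

for name, (b, se) in estimates.items():
    t = t_statistic(b, se)
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict})")
```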


    Testing for Significance: Multicollinearity

    The term multicollinearity refers to the correlation among the independent variables.

    When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.
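    One way to apply the |r| > .7 rule of thumb is to compute the sample correlation between each pair of independent variables. The sketch below does this for years of experience and test score from the programmer salary example; the function name `correlation` is illustrative, and the two lists are the columns as read from the data table in this deck.

```python
from math import sqrt


def correlation(u, v):
    """Sample (Pearson) correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)


EXPER = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
SCORE = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]

r = correlation(EXPER, SCORE)
print(round(r, 3), "high" if abs(r) > 0.7 else "below the .7 rule of thumb")
```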


    Testing for Significance: Multicollinearity

    Every attempt should be made to avoid including independent variables that are highly correlated.

    If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.


    Modeling Curvilinear Relationships

    To account for a curvilinear relationship, we might set z1 = x1 and z2 = $x_1^2$.

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \epsilon$

    This model is called a second-order model with one predictor variable.
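    The point of the z-substitution is that a curve in x is still linear in the parameters, so ordinary least squares applies unchanged once the squared column is added. The sketch below illustrates this on made-up, noise-free data generated from y = 2 + 3x - 0.5x^2 (these numbers are hypothetical and are not the Reynolds data); the solver `fit_least_squares` is an illustrative plain-Python normal-equations routine.

```python
def fit_least_squares(rows, ys):
    """Solve the normal equations (Z'Z) b = Z'y; rows must not include the intercept."""
    z = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(z[0])
    a = [[sum(r[i] * r[j] for r in z) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(z, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


# Hypothetical data following y = 2 + 3*x - 0.5*x**2 exactly
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2 + 3 * x - 0.5 * x ** 2 for x in xs]

# z1 = x and z2 = x**2: the model is linear in b0, b1, b2 even though it is curved in x
z_rows = [(x, x ** 2) for x in xs]
b0, b1, b2 = fit_least_squares(z_rows, ys)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```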


    Modeling Curvilinear Relationships

    Example: Reynolds, Inc.

    Managers at Reynolds want to investigate the relationship between length of employment of their salespeople and the number of electronic laboratory scales sold.

    Data (table not reproduced in this transcript)


    Modeling Curvilinear Relationships

    Scatter Diagram for the Reynolds Example (figure not reproduced in this transcript)


    Modeling Curvilinear Relationships

    Let us consider a simple first-order model. The estimated regression equation is

    Sales = 111 + 2.38 Months,

    where:
        Sales = number of electronic laboratory scales sold,
        Months = the number of months the salesperson has been employed


    Modeling Curvilinear Relationships

    MINITAB output: first-order model (output not reproduced in this transcript)


    Modeling Curvilinear Relationships

    Standardized residual plot: first-order model (figure not reproduced in this transcript)

    The standardized residual plot suggests that a curvilinear relationship is needed.


    Modeling Curvilinear Relationships

    Reynolds Example: The second-order model

    The estimated regression equation is

    Sales = 45.3 + 6.34 Months - .0345 MonthsSq

    where:
        Sales = number of electronic laboratory scales sold,
        Months = the number of months the salesperson has been employed,
        MonthsSq = the square of the number of months the salesperson has been employed


    Modeling Curvilinear Relationships

    MINITAB output: second-order model (output not reproduced in this transcript)


    Modeling Curvilinear Relationships

    Standardized residual plot: second-order model (figure not reproduced in this transcript)


    Variable Selection Procedures

    Stepwise Regression
    Forward Selection
    Backward Elimination

    These procedures are iterative; one independent variable at a time is added or deleted based on the F statistic.


    Variable Selection: Stepwise Regression

    1. Start with no independent variables in the model.
    2. Compute the F statistic and p-value for each independent variable in the model.
    3. Any p-value > alpha to remove? If yes, the independent variable with the largest p-value is removed from the model, and the next iteration begins.
    4. If no, compute the F statistic and p-value for each independent variable not in the model.
    5. Any p-value < alpha to enter? If yes, the independent variable with the smallest p-value is entered into the model, and the next iteration begins.
    6. If no, stop.


    Variable Selection: Backward Elimination

    1. Start with all independent variables in the model.
    2. Compute the F statistic and p-value for each independent variable in the model.
    3. Any p-value > alpha to remove? If yes, the independent variable with the largest p-value is removed from the model; return to step 2.
    4. If no, stop.
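    The backward elimination loop can be sketched in code. Note one deliberate substitution: the slide's rule compares p-values with alpha, which needs the F distribution, so this dependency-free sketch instead compares each variable's partial F statistic with a fixed threshold `f_to_remove` (a hypothetical default of 4.0). It is an illustration of the loop, not the MINITAB procedure. The data are the programmer salary observations, including the graduate-degree indicator (1 = Yes, 0 = No) that appears later in this deck.

```python
NAMES = ["exper", "score", "degree"]
# (experience, test score, graduate degree 1=Yes/0=No, salary in $1000s)
DATA = [
    (4, 78, 0, 24.0), (7, 100, 1, 43.0), (1, 86, 0, 23.7), (5, 82, 1, 34.3),
    (8, 86, 1, 35.8), (10, 84, 1, 38.0), (0, 75, 0, 22.2), (1, 80, 0, 23.1),
    (6, 83, 0, 30.0), (6, 91, 1, 33.0), (9, 88, 1, 38.0), (2, 73, 0, 26.6),
    (10, 75, 1, 36.2), (5, 81, 0, 31.6), (6, 74, 0, 29.0), (8, 87, 1, 34.0),
    (4, 79, 0, 30.1), (6, 94, 1, 33.9), (3, 70, 0, 28.2), (3, 89, 0, 30.0),
]


def sse(cols):
    """Sum of squared errors of the least squares fit that uses the given columns."""
    rows = [[1.0] + [d[c] for c in cols] for d in DATA]
    ys = [d[-1] for d in DATA]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    b = [a[i][k] for i in range(k)]
    return sum((y - sum(bj * rj for bj, rj in zip(b, r))) ** 2
               for r, y in zip(rows, ys))


def backward_elimination(f_to_remove=4.0):
    """Drop the weakest variable until every partial F is at least f_to_remove."""
    cols = list(range(len(NAMES)))  # start with all variables in the model
    while cols:
        full = sse(cols)
        mse = full / (len(DATA) - len(cols) - 1)
        # Partial F: increase in SSE when one variable is dropped, divided by MSE
        partial_f = {c: (sse([j for j in cols if j != c]) - full) / mse for c in cols}
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_to_remove:
            break
        cols.remove(weakest)
    return [NAMES[c] for c in cols]


print(backward_elimination())
```

    With a single variable under test, the partial F statistic equals the square of that variable's t statistic, so ranking variables by smallest F matches ranking them by largest p-value within one model.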


    Qualitative Independent Variables

    In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.

    For example, x2 might represent gender, where x2 = 0 indicates male and x2 = 1 indicates female.

    In this case, x2 is called a dummy or indicator variable.
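    In code, a two-level qualitative variable is turned into a dummy variable with a simple mapping. The sketch below uses a made-up helper `encode_dummy` and a short illustrative list of labels; which level is coded 1 is a modeling choice, here following the slide's convention (female = 1).

```python
def encode_dummy(values, one_level):
    """Map a two-level qualitative variable to 0/1, with `one_level` coded as 1."""
    return [1 if v == one_level else 0 for v in values]


genders = ["male", "female", "female", "male"]  # illustrative labels, not survey data
x2 = encode_dummy(genders, one_level="female")
print(x2)  # [0, 1, 1, 0]
```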


    Qualitative Independent Variables

    Exper.   Score   Degr.   Salary
       4       78     No      24
       7      100     Yes     43
       1       86     No      23.7
       5       82     Yes     34.3
       8       86     Yes     35.8
      10       84     Yes     38
       0       75     No      22.2
       1       80     No      23.1
       6       83     No      30
       6       91     Yes     33
       9       88     Yes     38
       2       73     No      26.6
      10       75     Yes     36.2
       5       81     No      31.6
       6       74     No      29
       8       87     Yes     34
       4       79     No      30.1
       6       94     Yes     33.9
       3       70     No      28.2
       3       89     No      30


    Qualitative Independent Variables

    Estimated Regression Equation

    $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$

    where:
        $\hat{y}$ = annual salary ($1000s)
        x1 = years of experience
        x2 = score on programmer aptitude test
        x3 = 0 if individual does not have a graduate degree
             1 if individual does have a graduate degree

    x3 is a dummy variable.


    Qualitative Independent Variables

    Excel's Regression Statistics

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.920215239
    R Square             0.846796085
    Adjusted R Square    0.818070351
    Standard Error       2.396475101
    Observations         20


    Qualitative Independent Variables

    Excel's Regression Equation Output

                   Coeffic.   Std. Err.   t Stat    P-value
    Intercept      7.94485    7.3808      1.0764    0.2977
    Experience     1.14758    0.2976      3.8561    0.0014
    Test Score     0.19694    0.0899      2.1905    0.04364
    Grad. Degr.    2.28042    1.98661     1.1479    0.26789

    Note: Columns F-I are not shown.

    Grad. Degr. is not significant (p-value = 0.26789).


    More Complex Qualitative Variables

    If a qualitative variable has k levels, k - 1 dummy variables are required, with each dummy variable being coded as 0 or 1.

    For example, a variable with levels A, B, and C could be represented by x1 and x2 values of (0, 0) for A, (1, 0) for B, and (0, 1) for C.

    Care must be taken in defining and interpreting the dummy variables.
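    The k - 1 coding scheme above can be sketched as a small function. The name `dummy_code` and the convention that the first listed level is the baseline (all zeros) are assumptions made for this illustration; they reproduce the A, B, C coding given on the slide.

```python
def dummy_code(value, levels):
    """Return k-1 indicator values for `value`; levels[0] is the all-zero baseline."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return tuple(1 if value == level else 0 for level in levels[1:])


LEVELS = ["A", "B", "C"]
for level in LEVELS:
    print(level, dummy_code(level, LEVELS))
```

    Each coefficient on a dummy variable is then interpreted relative to the baseline level, which is why the choice of baseline matters when reading the output.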


    Interaction

    If the original data set consists of observations for y and two independent variables x1 and x2, we might develop a second-order model with two predictor variables.

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \epsilon$

    In this model, the variable z5 = x1 x2 is added to account for the potential effects of the two variables acting together.

    This type of effect is called interaction.
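    A short sketch may help show what the interaction term does. The helper names and the coefficient values below are hypothetical, chosen only to make the arithmetic easy to follow: with a nonzero coefficient on x1*x2, the change in y from a one-unit increase in x1 depends on the level of x2.

```python
def second_order_terms(x1, x2):
    """Return (z1, ..., z5) = (x1, x2, x1^2, x2^2, x1*x2)."""
    return (x1, x2, x1 ** 2, x2 ** 2, x1 * x2)


def predict(betas, x1, x2):
    """Evaluate beta0 + beta1*z1 + ... + beta5*z5 for one observation."""
    z = second_order_terms(x1, x2)
    return betas[0] + sum(b * zj for b, zj in zip(betas[1:], z))


# Hypothetical coefficients: only x1 and the interaction x1*x2 matter here
betas = (0.0, 1.0, 0.0, 0.0, 0.0, 0.5)

effect_at_low_x2 = predict(betas, 2, 0) - predict(betas, 1, 0)
effect_at_high_x2 = predict(betas, 2, 10) - predict(betas, 1, 10)
print(effect_at_low_x2, effect_at_high_x2)  # effect of x1 differs with x2
```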


    Interaction

    Example: Tyler Personal Care

    For new shampoo products, the two factors believed to have the most influence on sales are unit selling price and advertising expenditure.

    Data (table not reproduced in this transcript)


    Interaction

    To account for the effect of interaction, use the following regression model:

    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon$

    where:
        y = unit sales (1000s),
        x1 = price ($),
        x2 = advertising expenditure ($1000s).


    Interaction

    General Linear Model involving three independent variables (z1, z2, and z3):

    $y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \beta_3 z_3 + \epsilon$

    where:
        y = Sales = unit sales (1000s)
        z1 = x1 (Price) = price of the product ($)
        z2 = x2 (AdvExp) = advertising expenditure ($1000s)
        z3 = x1 x2 (PriceAdv) = interaction term (Price times AdvExp)


    Interaction

    MINITAB Output for the Tyler Personal Care Example (output not reproduced in this transcript)