11 SSGB Amity BSI Regression


  • 8/9/2019 11 SSGB Amity BSI Regression

    1/43

    Module-11

    Linear Regression


    Linear & Polynomial Regression

    Learning Objectives

    At the end of this section delegates will be able to:

    Understand the role of regression analysis within the Transactional DMAIC Improvement Process

    Understand that regression can be used to explore the relationships between inputs (xs) and outputs (ys)


    Linear & Polynomial Regression - Agenda

    Regression within DMAIC

    Review of Scatter Diagrams

    Introduction to Regression

    Linear Regression

    Polynomial Regression

    Summary


    Six Sigma Transactional Improvement Process

    Define: Select Project; Define Project Objective; Form the Team; Map the Process; Identify Customer Requirements; Identify Priorities; Update Project File

    Measure: Define Measures (ys); Check Data Integrity; Determine Process Stability; Determine Process Capability; Set Targets for Measures

    Analyse: Develop Detailed Process Maps; Identify Critical Process Steps (xs) by looking for Process Bottlenecks, Rework / Repetition, Non-value Added Steps and Sources of Error / Mistake; Map the Ideal Process; Identify gaps between current and ideal

    Improve: Brainstorm Potential Improvement Strategies; Select Improvement Strategy; Plan and Implement Pilot; Verify Improvement; Implement Countermeasures

    Control: Control Critical xs; Monitor ys; Validate Control Plan; Identify further opportunities; Close Project

    Each phase includes a Phase Review.

    Example option-selection matrix (Improve):

    Criteria   A   B   C   D
    Time       +   s   -   +
    Cost       +   -   +   s
    Service    -   +   -   +
    Etc        s   s   -   +

    [Figure: DMAIC roadmap with capability histograms (LSL, USL), a control chart (Upper and Lower Control Limits) and process-map symbols (Start, Process Steps, Decision, Stop)]


    Scatter Diagram Revisited

    Purpose. To show:

    How one variable changes in response to changes in another.

    The nature of the relationship between two variables.

    The strength of relationship between two variables.


    Scatter Diagram

    Example data (Amount Overweight is the suspected cause; Health Index is the suspected effect):

    Amount Overweight   Health Index
    50                  .53
    81                  .32
    117                 .10
    100                 .13
    68                  .59
    77                  .40
    112                 .28
    49                  .45
    70                  .50
    89                  .25
    70                  .34
    115                 .18
    52                  .60
    90                  .42
    70                  .43
    121                 .15
    80                  .49
    40                  .65
    75                  .22
    35                  .58
    100                 .35

    [Figure: scatter-diagram axes, Suspected Cause (x, Low to High) against Suspected Effect (y, Low to High)]
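A scatter diagram shows the strength of a relationship only visually. As a hedged illustration, the minimal Python sketch below computes the Pearson correlation coefficient for the Amount Overweight / Health Index pairs. The pairing of values is a reconstruction of the slide's flattened data table, so treat the numbers as approximate; `pearson_r` is a hypothetical helper name.

```python
import math

# Pairs as reconstructed from the slide's data table (assumption: pairing is approximate).
overweight = [50, 81, 117, 100, 68, 77, 112, 49, 70, 89, 70,
              115, 52, 90, 70, 121, 80, 40, 75, 35, 100]
health = [0.53, 0.32, 0.10, 0.13, 0.59, 0.40, 0.28, 0.45, 0.50, 0.25, 0.34,
          0.18, 0.60, 0.42, 0.43, 0.15, 0.49, 0.65, 0.22, 0.58, 0.35]


def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy), ranging from -1 to +1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(overweight, health)
print(f"r = {r:.2f}")
```

A strongly negative r matches the downward pattern on the scatter diagram. It measures association only; as the Risks & Limitations slide warns, it does not prove cause and effect.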


    Scatter Diagram

    [Figure: scatter diagram of Health Index (y axis, .10 to .60) against Amount Overweight (x axis, 45 to 125)]


    Scatter Diagram

    Positive Correlation: An increase in Y may depend on increases in X. If X is controlled, Y could be controlled.

    Possible Positive Correlation: If X is increased, Y may increase somewhat, but Y seems to have causes other than X.

    No Correlation: There is no correlation.

    Possible Negative Correlation: If X is increased, Y may decrease somewhat, but Y seems to have causes other than X.

    Negative Correlation: An increase in X may cause a decrease in Y. Therefore, if X is controlled, Y could be controlled.


    Scatter Diagram: Risks & Limitations

    Does not prove cause and effect; it only shows association

    Both axes should be of equal length

    Conclusions must not be drawn outside the experimental range

    The experimental range should be wide enough to draw useful conclusions


    Regression

    At the heart of Six Sigma activities is identifying which inputs or process steps cause unwanted variation in process outputs:

    y = f(x)

    Regression analysis will allow us to determine which inputs (xs) influence our output or outputs (ys).

    We can sometimes use regression analysis to build a mathematical model which can be used to predict the value of our outputs.


    Why do we use Regression?

    Regression analysis is used when we wish to determine the relationship between two or more continuous variables.

    In Six Sigma activities we often need to understand the relationship between our output (y) and our critical xs.

    In Problem Solving activities this may help us to discover root causes.

    In Process Improvement activities, regression analysis will allow us to find optimal settings for our critical xs.


    Regression Exercise

    Exercise: consider the following pairs of measures. Could we draw a line which might summarise the relationship (regression) between them?

    Volume of Speech and Alcohol Volume Consumed
    Site Location and Quantity of Defects
    Shipping Defects and Customer Distance from Distribution Depot
    WIP and Yield
    Education and Salary
    Age and Beauty
    Sales and Advertising
    Pick Errors and Cycle time
    Sales Representative and Sales value
    Goals Scored per Season and Purchase Price of the Player
    Quantity Sold and Selling Price
    Speed of Query Resolution and Experience of the Operator


    Linear Regression

    The simplest form of regression is single variable linear regression.

    y is the dependent variable; x is the independent variable.

    The equation for linear regression is:

    $y = \beta_0 + \beta_1 x + \varepsilon$

    where $\beta_0$ is the intercept, $\beta_1$ is the slope and $\varepsilon$ is the error term.


    Linear Regression - Example

    A finance department is carrying out an investigation into the number of errors that are generated on customer invoices.

    They suspect that the number of errors may be affected by the volume of invoicing on any particular day.

    The following slide shows the data for the past 50 working days.


    Linear Regression - Example

    Volume  Errors    Volume  Errors    Volume  Errors
    175     3         178     6         155     2
    173     3         155     3         165     5
    201     7         186     5         170     3
    297     28        201     7         198     5
    165     5         241     8         276     26
    193     2         174     8         209     3
    162     0         163     4         186     4
    271     17        207     10        288     23
    179     5         188     6         176     3
    197     8         154     1         208     4
    162     2         163     3         163     5
    265     14        178     5         173     3
    221     9         210     9         174     1
    165     6         263     13        223     11
    154     5         162     3         196     7
    199     5         165     3         241     10
                      224     6         283     26


    Scatter Diagram

    Open Worksheet: Invoicing Errors

    Enter Errors in Y and Volume in X and click OK


    Scatter Diagram

    A scatter diagram reveals that there may be a relationship between the number of errors and the volume of invoices. A regression analysis will test whether the relationship exists and show how strong it is.

    [Figure: Scatterplot of Errors (0 to 30) vs Volume (150 to 300)]


    Linear Regression Least Squares Method

    We first need to establish the equation for the best-fitting line: the line that minimises the sum of the squared differences between the observed y values and the predicted y values. This is known as the least squares method.

    [Figure: Scatterplot of Errors vs Volume]
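For a single x, the least squares line has a closed form: slope $b_1 = S_{xy} / S_{xx}$ and intercept $b_0 = \bar{y} - b_1 \bar{x}$, which minimise $\sum (y_i - b_0 - b_1 x_i)^2$. A minimal Python sketch of that calculation, assuming the 50 Volume/Errors pairs transcribed from the example data table; `least_squares` is a hypothetical helper, not Minitab's implementation.

```python
# Invoicing Errors data (50 working days), transcribed from the example table.
VOLUME = [175, 173, 201, 297, 165, 193, 162, 271, 179, 197, 162, 265, 221, 165, 154, 199,
          178, 155, 186, 201, 241, 174, 163, 207, 188, 154, 163, 178, 210, 263, 162, 165, 224,
          155, 165, 170, 198, 276, 209, 186, 288, 176, 208, 163, 173, 174, 223, 196, 241, 283]
ERRORS = [3, 3, 7, 28, 5, 2, 0, 17, 5, 8, 2, 14, 9, 6, 5, 5,
          6, 3, 5, 7, 8, 8, 4, 10, 6, 1, 3, 5, 9, 13, 3, 3, 6,
          2, 5, 3, 5, 26, 3, 4, 23, 3, 4, 5, 3, 1, 11, 7, 10, 26]


def least_squares(xs, ys):
    """Closed-form least squares line: b1 = Sxy / Sxx, b0 = mean(y) - b1 * mean(x)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(VOLUME, ERRORS)
print(f"Errors = {b0:.2f} + {b1:.4f} Volume")  # Errors = -21.74 + 0.1465 Volume
```

Rounded to the precision Minitab displays, the coefficients agree with the fitted line reported in the Minitab output.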


    Regression - Minitab

    Open Worksheet: Invoicing Errors


    Regression - Minitab

    1. Enter Errors and Volume

    2. Check Linear


    Minitab Regression Plot

    [Figure: Fitted Line Plot, Errors = -21.74 + 0.1465 Volume; S = 2.98583, R-Sq = 79.3%, R-Sq(adj) = 78.9%]

    This is the equation for the best-fit line. We can use it to predict: e.g. if we have 200 invoices we would predict:

    -21.74 + 0.1465 (200) = 7.56, or about 7.6 errors
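The same prediction as a small Python function. This is a sketch, assuming the rounded coefficients from the plot; the range check is an illustrative addition reflecting the rule that conclusions must not be drawn outside the experimental range (observed volumes run from 154 to 297). `predict_errors` and `OBSERVED_RANGE` are hypothetical names.

```python
# Rounded coefficients from the Minitab fitted line plot.
B0 = -21.74
B1 = 0.1465
# Smallest and largest Volume in the 50-day sample; anything outside is extrapolation.
OBSERVED_RANGE = (154, 297)


def predict_errors(volume):
    """Point prediction of daily invoice errors from Errors = B0 + B1 * Volume."""
    low, high = OBSERVED_RANGE
    if not low <= volume <= high:
        raise ValueError(f"Volume {volume} is outside the observed range {low}-{high}")
    return B0 + B1 * volume


print(round(predict_errors(200), 2))  # 7.56
```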


    Minitab Regression Plot

    [Figure: Fitted Line Plot, Errors = -21.74 + 0.1465 Volume, highlighting R-Sq = 79.3% and R-Sq(adj) = 78.9%]

    The R-Squared and R-Squared (adjusted) values tell us how much of the variation in Errors can be explained by the changes in Volume. Here it is around 79%.


    Minitab Regression Plot

    [Figure: Fitted Line Plot, Errors = -21.74 + 0.1465 Volume, highlighting S = 2.98583]

    The S value is the standard error of the y values about the best-fit line. It is the standard deviation of the residuals (the difference between the actual and best-fit y values for each x), calculated with n - 2 degrees of freedom.
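The S value can be reproduced from its definition: the square root of the residual sum of squares divided by n - 2. A minimal sketch, assuming the 50 Volume/Errors pairs transcribed from the example data table; `fit_line` is a hypothetical helper.

```python
import math

# Invoicing Errors data (50 working days), transcribed from the example table.
VOLUME = [175, 173, 201, 297, 165, 193, 162, 271, 179, 197, 162, 265, 221, 165, 154, 199,
          178, 155, 186, 201, 241, 174, 163, 207, 188, 154, 163, 178, 210, 263, 162, 165, 224,
          155, 165, 170, 198, 276, 209, 186, 288, 176, 208, 163, 173, 174, 223, 196, 241, 283]
ERRORS = [3, 3, 7, 28, 5, 2, 0, 17, 5, 8, 2, 14, 9, 6, 5, 5,
          6, 3, 5, 7, 8, 8, 4, 10, 6, 1, 3, 5, 9, 13, 3, 3, 6,
          2, 5, 3, 5, 26, 3, 4, 23, 3, 4, 5, 3, 1, 11, 7, 10, 26]


def fit_line(xs, ys):
    """Least squares intercept and slope for a single x (hypothetical helper)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(VOLUME, ERRORS)
# Residual = actual y minus best-fit y for each x.
residuals = [y - (b0 + b1 * x) for x, y in zip(VOLUME, ERRORS)]
sse = sum(e ** 2 for e in residuals)
n = len(residuals)
# Divide by n - 2 because two parameters (b0 and b1) were estimated from the data.
s = math.sqrt(sse / (n - 2))
print(f"S = {s:.5f}")
```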


    Linear Regression Minitab Output

    Regression Analysis: Errors versus Volume

    The regression equation is

    Errors = - 21.74 + 0.1465 Volume

    S = 2.98583 R-Sq = 79.3% R-Sq(adj) = 78.9%

    Analysis of Variance

    Source DF SS MS F P

    Regression 1 1642.07 1642.07 184.19 0.000

    Error 48 427.93 8.92

    Total 49 2070.00

    A p value of


    Analysis of Variance (ANOVA) for Linear Regression

    The analysis of variance divides up the total variation in y (errors) into its constituent parts.

    We can learn a lot from this table:

    1. What is the overall variation in y?
    2. Is there a significant relationship between y and x?
    3. How much of the variation in y is due to changes in x?
    4. How much variation in y is still unexplained?
    5. How accurate is my prediction of y for a given value of x?

    Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square   F-Ratio
    Regression            1                    1642.07          1642.07       184.19
    Residual (Error)      48                   427.93           8.92
    Total                 49                   2070.00


    What is the overall variation in y?

    The total variation in y is given by the Total Sum of Squares = 2070.00

    $SS_{Total} = \sum (y - \bar{y})^2 = 2070.00$

    The total mean square = Total Sum of Squares / Total Degrees of Freedom:

    $\frac{\sum (y - \bar{y})^2}{n - 1} = \frac{2070.00}{49} = 42.245$

    $s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}} = \sqrt{42.245} = 6.5$

    Check this out by calculating the standard deviation of the 50 error results.

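The suggested check can be done in a few lines of Python. A sketch, assuming the 50 Errors values transcribed from the example data table; it computes the total sum of squares and total mean square by hand, then compares the result with the standard library's sample standard deviation.

```python
import math
import statistics

# Daily invoice errors (50 working days), transcribed from the example table.
ERRORS = [3, 3, 7, 28, 5, 2, 0, 17, 5, 8, 2, 14, 9, 6, 5, 5,
          6, 3, 5, 7, 8, 8, 4, 10, 6, 1, 3, 5, 9, 13, 3, 3, 6,
          2, 5, 3, 5, 26, 3, 4, 23, 3, 4, 5, 3, 1, 11, 7, 10, 26]

n = len(ERRORS)
y_bar = statistics.mean(ERRORS)
ss_total = sum((y - y_bar) ** 2 for y in ERRORS)  # Total Sum of Squares
ms_total = ss_total / (n - 1)                     # Total Mean Square (49 df)
s_y = math.sqrt(ms_total)                         # sample standard deviation of y

print(f"SS_Total = {ss_total:.2f}, MS_Total = {ms_total:.3f}, s = {s_y:.2f}")
print(f"statistics.stdev: {statistics.stdev(ERRORS):.2f}")
```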



    Is there a significant relationship between y and x?

    We can test the significance of the relationship between y and x by examining the F-Ratio. The F-Ratio is named after Sir Ronald Fisher, who devised this test for comparing variances.

    $F = \frac{MS_{Regression}}{MS_{Residual}} = \frac{1642.07}{427.93 / 48} = \frac{1642.07}{8.915} = 184.19$

    Examining the F tables for $F_{0.05,1,48}$ gives a critical value of approximately 4.04.

    Our value of 184.19 is far greater than 4.04, so we can conclude that there is a statistically significant relationship between y and x.
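The F-Ratio can be rebuilt from the raw data. A minimal sketch, assuming the 50 Volume/Errors pairs transcribed from the example data table; for a single x the regression sum of squares equals $S_{xy}^2 / S_{xx}$. The critical value `F_CRIT` is hard-coded from F tables (an assumption, since Python's standard library has no F distribution), so this sketch does not reproduce Minitab's P value.

```python
# Invoicing Errors data (50 working days), transcribed from the example table.
VOLUME = [175, 173, 201, 297, 165, 193, 162, 271, 179, 197, 162, 265, 221, 165, 154, 199,
          178, 155, 186, 201, 241, 174, 163, 207, 188, 154, 163, 178, 210, 263, 162, 165, 224,
          155, 165, 170, 198, 276, 209, 186, 288, 176, 208, 163, 173, 174, 223, 196, 241, 283]
ERRORS = [3, 3, 7, 28, 5, 2, 0, 17, 5, 8, 2, 14, 9, 6, 5, 5,
          6, 3, 5, 7, 8, 8, 4, 10, 6, 1, 3, 5, 9, 13, 3, 3, 6,
          2, 5, 3, 5, 26, 3, 4, 23, 3, 4, 5, 3, 1, 11, 7, 10, 26]

n = len(VOLUME)
x_bar = sum(VOLUME) / n
y_bar = sum(ERRORS) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(VOLUME, ERRORS))
s_xx = sum((x - x_bar) ** 2 for x in VOLUME)

ss_total = sum((y - y_bar) ** 2 for y in ERRORS)
ss_reg = s_xy ** 2 / s_xx       # regression sum of squares for a single x
ss_res = ss_total - ss_reg      # residual (error) sum of squares

ms_reg = ss_reg / 1             # 1 degree of freedom for the slope
ms_res = ss_res / (n - 2)       # 48 degrees of freedom
f_ratio = ms_reg / ms_res

F_CRIT = 4.04  # assumption: F(0.05; 1, 48) read from F tables, not computed here
print(f"F = {f_ratio:.2f}; significant at 5%: {f_ratio > F_CRIT}")
```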


    Is there a significant relationship between y and x?

    Analysis of Variance

    Source DF SS MS F P

    Regression 1 1642.07 1642.07 184.19 0.000

    Error 48 427.93 8.92

    Total 49 2070.00

    Minitab gives a P value as the outcome of a hypothesis test:

    H0: The regression is not significant (i.e. variation in x is not significant in explaining the variation in y)

    H1: The regression is significant

    Minitab's P value is the probability of obtaining an F value at least this large if the null hypothesis were true.

    Since it is below 0.05, we reject H0 at the 5% significance level and conclude that the number of errors is influenced by the volume of invoices processed.



    How much variation in y is explained by changes in x?

    $SS_{Total} = SS_{Regression} + SS_{Residual}$; here 2070.00 = 1642.07 + 427.93

    $SS_{Total}$ = Total Sum of Squares = the total variability in the y values.

    $SS_{Regression}$ = Regression Sum of Squares = the amount of variability in the y values explained by the regression relationship.

    $SS_{Residual}$ = Residual Sum of Squares (or Error Sum of Squares) = the amount of variability in the y values not accounted for by the regression relationship.
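The decomposition can be checked numerically from the fitted values. A minimal sketch, assuming the 50 Volume/Errors pairs transcribed from the example data table: it computes each sum of squares from its own definition rather than by subtraction, so the identity is a genuine check.

```python
# Invoicing Errors data (50 working days), transcribed from the example table.
VOLUME = [175, 173, 201, 297, 165, 193, 162, 271, 179, 197, 162, 265, 221, 165, 154, 199,
          178, 155, 186, 201, 241, 174, 163, 207, 188, 154, 163, 178, 210, 263, 162, 165, 224,
          155, 165, 170, 198, 276, 209, 186, 288, 176, 208, 163, 173, 174, 223, 196, 241, 283]
ERRORS = [3, 3, 7, 28, 5, 2, 0, 17, 5, 8, 2, 14, 9, 6, 5, 5,
          6, 3, 5, 7, 8, 8, 4, 10, 6, 1, 3, 5, 9, 13, 3, 3, 6,
          2, 5, 3, 5, 26, 3, 4, 23, 3, 4, 5, 3, 1, 11, 7, 10, 26]

n = len(VOLUME)
x_bar = sum(VOLUME) / n
y_bar = sum(ERRORS) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(VOLUME, ERRORS))
      / sum((x - x_bar) ** 2 for x in VOLUME))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in VOLUME]

ss_total = sum((y - y_bar) ** 2 for y in ERRORS)            # total variability
ss_reg = sum((f - y_bar) ** 2 for f in fitted)              # explained by the line
ss_res = sum((y - f) ** 2 for y, f in zip(ERRORS, fitted))  # left unexplained

print(f"SS_Total = {ss_total:.2f}")
print(f"SS_Regression + SS_Residual = {ss_reg:.2f} + {ss_res:.2f} = {ss_reg + ss_res:.2f}")
```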


    How much variation in y is explained by changes in x?

    The Coefficient of Determination $R^2$

    The coefficient of determination is normally expressed as a percentage. It represents the percentage of the total variability accounted for by the regression relationship. It can also be used to test whether the regression accounts for a statistically significant amount of the total variability.

    $R^2 = \frac{SS_{Regression}}{SS_{Total}} = \frac{1642.07}{2070.00} = 0.79$

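Both R-Sq and R-Sq(adj) can be computed straight from the ANOVA table. A sketch under the usual definitions: $R^2 = SS_{Regression} / SS_{Total}$ and adjusted $R^2 = 1 - MS_{Residual} / MS_{Total}$, with p = 1 predictor. The function names are hypothetical.

```python
def r_squared(ss_reg, ss_total):
    """Fraction of total variability accounted for by the regression."""
    return ss_reg / ss_total


def adjusted_r_squared(ss_res, ss_total, n, p):
    """1 - MS_residual / MS_total, where p is the number of predictors."""
    ms_res = ss_res / (n - p - 1)
    ms_total = ss_total / (n - 1)
    return 1 - ms_res / ms_total


# Values taken from the ANOVA table for Errors versus Volume.
r2 = r_squared(1642.07, 2070.00)
r2_adj = adjusted_r_squared(427.93, 2070.00, n=50, p=1)
print(f"R-Sq = {r2:.1%}, R-Sq(adj) = {r2_adj:.1%}")  # R-Sq = 79.3%, R-Sq(adj) = 78.9%
```

The adjusted value penalises each extra predictor, which matters more once several xs are in the model.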


    How much variation in y is still unexplained?

    The Residual (Error) term provides us with information concerning the amount of variation in y which is not accounted for by the regression.

    The square root of the residual mean square is the standard error of y about the regression equation:

    $\sqrt{MS_{Error}} = \sqrt{8.92} = 2.99$ = standard error of y about x (Minitab's S = 2.98583)

    We can use the standard error to calculate confidence intervals for y values for any given value of x.



    Residuals

    Residuals are the difference between the observed values of y and the predicted values based on the regression model.

    $\sqrt{MS_{Error}}$ = standard error of y about x

    [Figure: Scatterplot of Errors vs Volume with the fitted line, marking an actual value, its predicted value and the residual between them]


    Examination of Residuals

    x      Observed y   Predicted ŷ = -21.74 + 0.1465x   Residual (y - ŷ)   (y - ŷ)²
    155    2            0.9675                           1.0325             1.066
    165    5            2.4325                           2.5675             6.592
    170    3            3.1650                           -0.1650            0.027
    ...    ...          ...                              ...                ...
    199    5            7.4135                           -2.4135            5.825
    Sum of (y - ŷ)² over all 50 days = 427.93 = SS_Residual

    Residuals are the differences between the observed values of y and the predicted values based on the regression model. If there were no difference between these two, we would have a perfect model. In reality, this is unlikely to occur.
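Each row of the residual table follows mechanically from the fitted equation. A short sketch, assuming the rounded coefficients -21.74 and 0.1465; `residual_row` is a hypothetical helper that returns x, y, ŷ, the residual y - ŷ and its square.

```python
# Rounded coefficients from the fitted equation Errors = -21.74 + 0.1465 Volume.
B0, B1 = -21.74, 0.1465


def residual_row(x, y):
    """Return (x, observed y, predicted y, residual y - y_hat, squared residual)."""
    y_hat = B0 + B1 * x
    resid = y - y_hat
    return x, y, y_hat, resid, resid ** 2


# The (Volume, Errors) pairs shown in the residual table.
for volume, errors in [(155, 2), (165, 5), (170, 3), (199, 5)]:
    x, y, y_hat, e, e2 = residual_row(volume, errors)
    print(f"{x:>4} {y:>3} {y_hat:>8.4f} {e:>8.4f} {e2:>7.3f}")
```

Summing the squared residuals over all 50 days gives the residual sum of squares in the ANOVA table (the small difference from 427.93 comes from using rounded coefficients).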


    By examining the residual plot we can check for:

    Lack of fit (model inadequacy)

    Non-constant variability

    When we have sufficient data points, a normality test can also be carried out. If the model is a good fit to the data, the residuals should be approximately normally distributed.

    [Figure: Residuals Versus the Fitted Values (response is Errors); residuals (-5.0 to 7.5) against fitted values (0 to 20)]


    Normality of Residuals

    In this case a normality test of the residuals gives no evidence that they depart from a normal distribution (p value = 0.479 > 0.05).

    [Figure: Probability Plot of RESI1, Normal; Mean 3.055334E-15 (effectively 0), StDev 2.955, N 50, AD 0.342, P-Value 0.479]
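The summary statistics on the probability plot can be reproduced from the fitted model. A minimal sketch, assuming the 50 Volume/Errors pairs transcribed from the example data table. It reproduces only the mean and StDev of the residuals; the Anderson-Darling statistic and its P value are not computed here.

```python
import statistics

# Invoicing Errors data (50 working days), transcribed from the example table.
VOLUME = [175, 173, 201, 297, 165, 193, 162, 271, 179, 197, 162, 265, 221, 165, 154, 199,
          178, 155, 186, 201, 241, 174, 163, 207, 188, 154, 163, 178, 210, 263, 162, 165, 224,
          155, 165, 170, 198, 276, 209, 186, 288, 176, 208, 163, 173, 174, 223, 196, 241, 283]
ERRORS = [3, 3, 7, 28, 5, 2, 0, 17, 5, 8, 2, 14, 9, 6, 5, 5,
          6, 3, 5, 7, 8, 8, 4, 10, 6, 1, 3, 5, 9, 13, 3, 3, 6,
          2, 5, 3, 5, 26, 3, 4, 23, 3, 4, 5, 3, 1, 11, 7, 10, 26]

n = len(VOLUME)
x_bar = sum(VOLUME) / n
y_bar = sum(ERRORS) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(VOLUME, ERRORS))
      / sum((x - x_bar) ** 2 for x in VOLUME))
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(VOLUME, ERRORS)]
resid_mean = statistics.mean(residuals)  # zero apart from floating-point rounding
resid_sd = statistics.stdev(residuals)   # sample StDev (n - 1), as on the plot
print(f"Mean {resid_mean:.2e}, StDev {resid_sd:.3f}, N {len(residuals)}")
```

Note that this StDev (n - 1 divisor) differs slightly from S = 2.98583, which divides by n - 2.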


    How accurate is my prediction of y?

    $\sqrt{MS_{Error}}$ = standard error of y about x

    Statistical software programs use the error mean square to calculate intervals when predicting y for a given value of x. We can obtain confidence intervals for the predicted mean value and prediction intervals for individual values.



    How accurate is my prediction of y?

    Open Worksheet: Invoicing Errors



    How accurate is my prediction of y?

    1. Enter Errors and Volume

    2. Check Linear

    3. Click on Options



    How accurate is my prediction of y?

    Under Display Options, tick both the confidence interval and the prediction interval boxes



    How accurate is my prediction of y?

    [Figure: Fitted Line Plot, Errors = -21.74 + 0.1465 Volume, showing the regression line with 95% CI and 95% PI bands; S = 2.98583, R-Sq = 79.3%, R-Sq(adj) = 78.9%]

    95% Confidence Intervals show the range of values we expect for the average number of errors for any particular volume of invoices being processed.

    95% Prediction Intervals show the range within which we expect 95% of individual error values to fall when we use the regression equation to predict them.

    Precise values can be obtained within the Stat > Regression > Regression menu.

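The difference between the two bands can be seen in the standard simple-regression formulas: the confidence interval half-width is $t \cdot s \sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$, and the prediction interval adds 1 under the square root for the scatter of an individual day. A minimal sketch, assuming the 50 Volume/Errors pairs transcribed from the example data table; `T_CRIT` is hard-coded from t tables (an assumption), so the limits may differ slightly from Minitab's in the last decimal place.

```python
import math

# Invoicing Errors data (50 working days), transcribed from the example table.
VOLUME = [175, 173, 201, 297, 165, 193, 162, 271, 179, 197, 162, 265, 221, 165, 154, 199,
          178, 155, 186, 201, 241, 174, 163, 207, 188, 154, 163, 178, 210, 263, 162, 165, 224,
          155, 165, 170, 198, 276, 209, 186, 288, 176, 208, 163, 173, 174, 223, 196, 241, 283]
ERRORS = [3, 3, 7, 28, 5, 2, 0, 17, 5, 8, 2, 14, 9, 6, 5, 5,
          6, 3, 5, 7, 8, 8, 4, 10, 6, 1, 3, 5, 9, 13, 3, 3, 6,
          2, 5, 3, 5, 26, 3, 4, 23, 3, 4, 5, 3, 1, 11, 7, 10, 26]
T_CRIT = 2.0106  # assumption: two-sided 95% t value for n - 2 = 48 df, from t tables

n = len(VOLUME)
x_bar = sum(VOLUME) / n
y_bar = sum(ERRORS) / n
s_xx = sum((x - x_bar) ** 2 for x in VOLUME)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(VOLUME, ERRORS)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(VOLUME, ERRORS))
s = math.sqrt(sse / (n - 2))  # standard error of y about the line


def intervals(x0):
    """Return (fit, 95% CI for the mean, 95% PI for one new value) at Volume x0."""
    fit = b0 + b1 * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx
    ci_half = T_CRIT * s * math.sqrt(leverage)      # uncertainty in the mean only
    pi_half = T_CRIT * s * math.sqrt(1 + leverage)  # adds scatter of an individual day
    return fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)


fit, ci, pi = intervals(200)
print(f"fit {fit:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, 95% PI {pi[0]:.2f} to {pi[1]:.2f}")
```

Both bands are narrowest at the mean volume and widen towards the edges of the data, which is why predictions near the extremes of the observed range are less precise.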


    Regression Exercises

    Question 1: A company developing healthcare software solutions is bidding for a new contract and has historical data on similar previous contracts. It wants to minimise the risk of failing to deliver the solution on time, so it wants a good estimate of the man-years of effort needed (the output measure, or y).

    The variables previously recorded are the number of application sub-programs written (x1) and the number of software configuration change proposals implemented (x2).

    Use regression to:

    1. Investigate the relationship between x1 and the man-years required
    2. Investigate the relationship between x2 and the man-years required
    3. If the company estimates that 150 application sub-programs will be required, and that there are likely to be 100 software configuration change proposals implemented, what would be your recommendation for the number of man-years they should estimate?

    Data is in Minitab Worksheet: Transactional Regression Exercises.mtw


    Regression Exercises

    Question 2:

    The team investigating the Expense Claims process has identified a potential input variable (x) that they believe could affect the amount of time taken to pay the claims. The potential variable is the amount of money claimed, and they have gathered data on the amounts claimed for the 100 payment times they already had. Use Regression Analysis to investigate the relationship, and be prepared to advise the team on your conclusions.

    Data is in Minitab Worksheet: PAYMENT TIMES.mtw



    Summary - Linear & Polynomial Regression

    Regression Analysis can be used to identify xs that are affecting the ys

    A linear or polynomial regression model of y = f(x) can be developed for individual xs

    The model can be tested to see if it is significant and how well it fits the data

    The model can be used to make predictions of y for given values of x

    Regression is used much more extensively in operational and DFSS activities