Blended Lean Six Sigma Black Belt Training –
ABInBev
Correlation and Regression
©2010 ASQ. All Rights Reserved.
Module Objectives
Learn and apply some key Black Belt tools used to analyze your data:
• How to develop and interpret the correlation between variables
• Develop a mathematical model expressing the relationship (regression):
  o Simple Linear Regression
  o Multiple Linear Regression
  o Logistic Regression
This review module is aligned with your MoreSteam Web Training, Session 7: Identifying Root Cause
So Where Are We Now?
• We have understood our process using process maps and FMEA.
• Created graphs and charts to visualize what is happening in our process—seven basic tools.
• Validated our measurement system to ensure our data is both precise and accurate—Gage R&R.
• Collected data to establish our process performance using process capability analysis.
• Now we are going to use the statistical tools to infer cause and effect/uncover underlying relationships.
Terms
Correlation
• Used when both Y and X are continuous
• Measures the strength of the linear relationship between Y and X
• Metric: Pearson correlation coefficient, r (r varies between -1 and +1)
  o Perfect positive relationship: r = +1
  o No linear relationship: r = 0
  o Perfect negative relationship: r = -1
Regression
• Simple linear regression is used when both Y and X are continuous
• Quantifies the relationship between Y and X (Y = b0 + b1X)
• Metric: Coefficient of determination, R-Sq (varies from 0.0 to 1.0, or 0% to 100%)
  o None of the variation in Y is explained by X: R-Sq = 0.0
  o All of the variation in Y is explained by X: R-Sq = 1.0
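A minimal Python sketch of the Pearson correlation coefficient, computed from its definition r = Sxy / sqrt(Sxx · Syy). The function name `pearson_r` and the small data sets are illustrative assumptions, not part of the course datafiles. The third data set shows that r can be close to 0 even when Y depends exactly on X, because r only measures linear association.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [98, 99, 100, 101, 102, 103]
print(pearson_r(x, x))                              # perfect positive: 1.0
print(pearson_r(x, [-v for v in x]))                # perfect negative: -1.0
print(pearson_r(x, [(v - 100.5) ** 2 for v in x]))  # curved pattern, r close to 0
```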
Correlation Coefficients: Illustration
Figure: Three scatterplots of Y versus X, labeled r = -1.0, r = 0.0, and r = +1.0
Correlation: Minitab Example
• Voltage for the same power supply is measured at Station 1 and Station 2.
• Determine the correlation for voltage between the two stations.
Approach:
• Open datafile: CORRELAT.mtw (the data are displayed in the Data Window)
• Go to Stat > Basic Statistics > Correlation…
Correlation: Minitab Example (Continued)
1. Select C1 Station 1 and C2 Station 2
2. Select Display p-values
To plot the data, go to Graph > Scatterplot… > Simple
Correlation: Minitab Example (Continued)
From Minitab Session Window
Null hypothesis: there is no correlation between Station 1 and Station 2. Because p is less than 0.05, reject H0 and conclude that the Station 1 and Station 2 voltages are correlated.
Figure: Scatterplot of Station 1 vs Station 2
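Minitab reports the p-value for you; the sketch below shows how such a p-value can be computed with the standard t test for a Pearson correlation, t = r·sqrt((n - 2)/(1 - r²)) with n - 2 degrees of freedom. We assume this matches Minitab's calculation. To stay dependency-free it integrates the Student t density with Simpson's rule (an illustrative choice; a statistics library would normally be used). The helper names and the r = 0.5, n = 30 inputs are hypothetical.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Student's t probability density, computed via log-gamma for stability."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), using Simpson's rule on the density from 0 to |t|."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def correlation_test(r: float, n: int):
    """t statistic and two-sided p-value for H0: population correlation = 0."""
    df = n - 2
    if abs(r) >= 1:
        return math.inf, 0.0
    t = r * math.sqrt(df / (1 - r * r))
    return t, two_sided_p(t, df)


# Hypothetical values: r = 0.5 from n = 30 paired observations
t_stat, p_value = correlation_test(0.5, 30)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

If p is below alpha = 0.05, reject the null hypothesis of no correlation, exactly as in the Minitab reading above.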
ABI Example 1 – Correlation
This project was about measuring client satisfaction in the BSC.
• Client satisfaction was measured by a monthly survey of five general questions.
• Four answers could be given for each question: “very dissatisfied”, “dissatisfied”, “satisfied”, and “very satisfied”.
• The questions were about response time, language knowledge, helpfulness, quality of solution, and knowledge.
• A correlation test was run to determine whether there is a relationship between the questions, that is, whether a low score in one area might mean a low score in another.
Isabelle Verdoodt and Matthias Pindur Belt Project, Zone WE
© Anheuser-Busch InBev. All Rights Reserved.
ABI Example 1 – Correlation (Continued)
Is there a correlation between customer satisfaction questions?
There is a correlation only between the questions Helpfulness and Knowledge.
ABI Example 2 – Correlation
• POC buyout is a type of trade investment in a POC, made in return for an agreement on volume commitment, loyalty, or other conditions.
• POC buyout is a key driver of core+ and premium business in the restaurant channel and the nightlife channel.
• It is the single biggest investment in China, accounting for 45% of total China commercial investments (2.4 billion RMB in 2011).
• A correlation test was run to determine if there was any correlation between the volume sold and investment made for four different brands.
Luke Zhou Belt Project, Zone APAC
ABI Example 2 – Correlation (Continued)
Volume vs. Investment/case, by brand:
Brand | Pearson Correlation | P-Value
Bud SBT | 0.819 | 0.000
HICE 500 | 0.594 | 0.000
Bud BBT | 0.890 | 0.000
HICE 600 | 0.139 | 0.312
Is there a correlation? What is the strength of the relationship?
Testing Method Selection Matrix
Variable Type | Attribute Y | Count Y | Continuous Y
Discrete X, 1 or 2 treatments | Proportions | Poisson | t tests
Discrete X, 3+ treatments | Chi-Square | Chi-Square | ANOVA
Continuous X | Logistic Regression | Logistic Regression | Least Squares Regression
Simple Linear Regression Analysis
• Used to fit lines and curves to data when the model is linear in its parameters (the b's)
• The fitted lines:
  o Quantify the relationship between the predictor (input) variable (X) and the response (output) variable (Y)
  o Help to identify the vital few Xs
  o Enable predictions of the response Y to be made from knowledge of the predictor X
  o Identify the impact of controlling a process input variable (X) on a process output variable (Y)
• Produces an equation of the form:
$\hat{Y} = b_0 + b_1 X$
where $\hat{Y}$ is an estimate ("fitted value") of the population value of Y.
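The least-squares estimates behind the fitted line have a closed form: b1 = Sxy/Sxx and b0 = Ȳ - b1·X̄. The sketch below computes them, plus R-Sq = 1 - SS_Error/SS_Total. The function names and the four-point data set are illustrative assumptions, not course material.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for Y-hat = b0 + b1*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def r_squared(x, y, b0, b1):
    """R-Sq = 1 - SS_Error / SS_Total for the fitted line."""
    mean_y = sum(y) / len(y)
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_error = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_error / ss_total


# Small illustrative data set (not from the course datafiles)
x = [1, 2, 3, 4]
y = [2, 4, 5, 4]
b0, b1 = fit_line(x, y)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X, R-Sq = {r_squared(x, y, b0, b1):.3f}")
# Y-hat = 2.00 + 0.70 X, R-Sq = 0.516
```

With a single X, this R-Sq equals the square of Pearson's r for the same data, which is why the correlation and regression slides tell a consistent story.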
Regression: Minitab Example 1
• A Black Belt in the Supply department is tracking the output of voltage at two different stations. Voltage is measured at Station 1 and Station 2.
• A Black Belt is given the task of predicting the voltage at Station 2 from the voltage measured at Station 1.
Approach:
• Open datafile: CORRELAT.mtw (the data are displayed in the Data Window)
• Go to Stat > Regression > Fitted Line Plot…
Regression: Minitab Example 1 (Continued)
Regression: Minitab Example 1 (Continued)
Prediction equation
Coefficient of Determination: use R-Sq for simple linear regression (one X)
Fitted line: obeys the prediction equation
Regression: Minitab Example 1 (Continued)
• From the Session Window, the regression equation is:
Station 2 = -0.3402 + 1.054 Station 1
  o The intercept, b0, is the value of Y where the fitted line (regression line) crosses the Y-axis, that is, when X = 0.
  o The slope, b1, is “rise over run”, or ΔY/ΔX.
• The coefficients b0 and b1 are estimates of the population parameters β0 and β1; the model is linear in these coefficients.
Intercept, b0 Slope, b1
Practically, what does this mean?
• You can measure the voltage only at Station 1 and plug it into the equation.
• You can then predict the voltage at Station 2.
As a result of the regression equation, you may no longer need to measure the voltage at Station 2, provided the prediction error is acceptable for your purpose.
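As a quick illustration, the fitted equation can be wrapped in a small function. The coefficients are the ones from the session window above; the function name and the 9.0 V input are hypothetical, and predictions are only trustworthy within the range of Station 1 voltages used to fit the model.

```python
B0 = -0.3402  # intercept from the Minitab session window
B1 = 1.054    # slope from the Minitab session window


def predict_station2(station1_voltage: float) -> float:
    """Predicted Station 2 voltage from a measured Station 1 voltage."""
    return B0 + B1 * station1_voltage


print(round(predict_station2(9.0), 4))  # 9.1458
```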
Statistical Significance – Minitab Example 2
• An analysis of variance (ANOVA) table informs us about the statistical significance of the regression analysis.
• Hypothesis for Regression:
– H0: The regression results from common cause variation. When H0 is true, there is no statistically significant regression, and the best prediction of Y is the mean of Y.
– Ha: The regression is statistically significant.
– Look at the p-value used to evaluate the null hypothesis; in this case, alpha = 0.05. If p is less than alpha, reject the null hypothesis and conclude that the regression is statistically significant.
Approach:
• Use datafile: REGRESSANOVA.mtw
• Go to Stat > Regression > Regression
ANOVA for Simple Linear Regression – Minitab Example 2 (Continued)
REGRESSANOVA.mtw: Stat > Regression > Regression
ANOVA for Simple Linear Regression – Minitab Example 2 (Continued)
Regression is significant: p < 0.05
What is the R-Sq value telling us?
Analysis of Residuals – Minitab Example 2 (Continued)
• Residuals are used to test the adequacy of the prediction equation (model)
• In residual plots, three patterns indicate model inadequacy:
  1. Fans
  2. Bands sloping up or down
  3. Curved bands
• These patterns will be dramatic, not subtle!
Analysis of Residuals – Minitab Example 2 (Continued)
Do you see any patterns in the residuals that might indicate model inadequacy?
Regression: Minitab Example 3
Illustrating the analysis of residuals
Use datafile: RESIDUALS.mtw
Go to Stat > Regression > Fitted Line Plot > Linear
Regression: Minitab Example 3 (Continued)
• R-Sq is 89.7%.
• The regression is significant.
• Can we do better?
• How do the residuals look?
Regression: Minitab Example 3 (Continued)
Not quite random!
What do the Residuals look like? Is the straight line a best fit? What do you suggest?
Regression: Minitab Example 3 (Continued)
Continuing with the same example…
Use datafile: RESIDUALS.mtw
Go to Stat > Regression > Fitted Line Plot > Quadratic
Illustrating the analysis of residuals
Regression: Minitab Example 3 (Continued)
Improving the model adequacy increased R-Sq from 89.7% to 95.0%
How do the residuals look?
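The switch from a linear to a quadratic fitted line can be reproduced outside Minitab with ordinary least squares on the design columns 1, X, X². This sketch solves the normal equations with Gaussian elimination to stay dependency-free. The data are synthetic (a hypothetical curved relationship, not RESIDUALS.mtw), chosen so that the straight-line residuals form a curved band: positive at both ends and negative in the middle.

```python
def solve(a, b):
    """Solve the square system a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        coef[r] = (m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))) / m[r][r]
    return coef


def poly_fit(x, y, degree):
    """Least-squares polynomial coefficients [b0, b1, ...] via the normal equations."""
    rows = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_sq_and_residuals(x, y, coef):
    """Return R-Sq = 1 - SS_Error/SS_Total and the list of residuals."""
    fitted = [sum(c * xi ** p for p, c in enumerate(coef)) for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    mean_y = sum(y) / len(y)
    ss_error = sum(e * e for e in residuals)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_error / ss_total, residuals


# Synthetic, curved data: y = 2 + 0.5x + 0.3x^2 plus small fixed "noise"
x = list(range(1, 11))
noise = [0.2, -0.1, 0.15, -0.2, 0.1, -0.15, 0.2, -0.1, 0.05, -0.1]
y = [2 + 0.5 * xi + 0.3 * xi ** 2 + e for xi, e in zip(x, noise)]

r2_lin, res_lin = r_sq_and_residuals(x, y, poly_fit(x, y, 1))
r2_quad, res_quad = r_sq_and_residuals(x, y, poly_fit(x, y, 2))
print(f"Linear R-Sq = {r2_lin:.3f}, Quadratic R-Sq = {r2_quad:.3f}")
print("Linear residual signs:", "".join("+" if e > 0 else "-" for e in res_lin))
```

Because the quadratic model contains the linear one, its R-Sq can never be lower; the real question, as on the slides, is whether the residual pattern disappears.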
ABI Example 1: Correlation and Regression
Is there a relationship between Customer Delivery Performance (CDP) and Forecast Accuracy (FA)?
What is the Regression Equation?
UKI Forecast Accuracy (FA)
Gustavo Burger Belt Project – Zone WE
ABI Example 2: Correlation and Regression
Luke Zhou Belt Project, Zone APAC
Figure: Fitted line plot of Inv/case versus Vol, Bud SBT
Inv/case = 8.123 + 0.000809 Vol - 0.000000 Vol**2 (the Vol**2 coefficient is displayed as 0.000000 because of rounding)
S = 1.58346   R-Sq = 82.1%   R-Sq(adj) = 81.4%
Bud SBT: Pearson correlation = 0.819, p-value = 0.00
Legend:
• Volume (Vol) is units sold
• Investment per case (Inv/case) is how much money is paid to the POC
ABI Example 3: Correlation and Regression – Spare Parts Inventory
Determine whether there is a correlation between the inventory value of spare parts and the volume packaged at each brewery.
Regression Analysis: Gross Inv Val vs. Volume packaged
The regression equation is: Gross Inv Val = 3,531,232 + 854,979 Vol pack (MM bbl)
Predictor | Coef | SE Coef | T | P
Constant | 3531232 | 1751989 | 2.02 | 0.072
Vol pack (MM bbl) | 854979 | 191217 | 4.47 | 0.001
S = 2079206   R-Sq = 66.7%   R-Sq(adj) = 63.3%
Analysis of Variance
Source | DF | SS | MS | F | P
Regression | 1 | 8.64274E+13 | 8.64274E+13 | 19.99 | 0.001
Residual Error | 10 | 4.32310E+13 | 4.32310E+12 | |
Total | 11 | 1.29658E+14 | | |
Katie Shiro Belt Project, Zone NA
What is the regression equation? What is the R-Sq(adj) figure telling you?
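The numbers in this output are internally consistent, which is a useful way to learn to read an ANOVA table. The sketch below recomputes T = Coef/SE Coef, F = MS Regression/MS Residual Error, R-Sq = SS Regression/SS Total, and R-Sq(adj) = 1 - MS Error/(SS Total/DF Total) from the printed values. For a single X, F equals T² for the slope. The variable names are illustrative.

```python
# Values copied from the Minitab output for Gross Inv Val vs. Volume packaged
coef_slope, se_slope = 854979, 191217
ss_regression, df_regression = 8.64274e13, 1
ss_error, df_error = 4.32310e13, 10
ss_total, df_total = 1.29658e14, 11

t_slope = coef_slope / se_slope                    # T for the slope
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error                  # F in the ANOVA table
r_sq = ss_regression / ss_total
r_sq_adj = 1 - ms_error / (ss_total / df_total)

print(f"T = {t_slope:.2f}, T^2 = {t_slope ** 2:.2f}, F = {f_stat:.2f}")
# T = 4.47, T^2 = 19.99, F = 19.99
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
# R-Sq = 66.7%, R-Sq(adj) = 63.3%
```

R-Sq(adj) is lower than R-Sq because it charges the model for each term relative to the 12 observations available.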
Multiple Linear Regression – Exercise 1 (Continued)
Our goal is to fit a multiple regression of the following form:
$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4 + b_5 X_5$
This example will illustrate the following additional aspects of multiple regression:
1. Elimination of X-variables that have no explanatory power
2. Residual analysis
Multiple Factor Correlation and Regression
Data on water usage has been collected along with data on factors that may be used to predict water usage. The factors were average temperature, production volume, number of associates, number of days of plant operation, and number of visitors.
Data is in Water Usage.mtw
Multiple Factor Regression
Stat > Regression > General Regression
We recommend that you always turn this option on.
Session Window
Regression Equation
Water Usage = 6805.38 + 17.2286 Average Temp + 0.221781 Production - 138.578 Operating Days - 26.4302 Associates - 1.59134 Visitors
Coefficients
Term | Coef | SE Coef | T | P | VIF
Constant | 6805.38 | 1461.69 | 4.65583 | 0.001 |
Average Temp | 17.23 | 6.64 | 2.59330 | 0.025 | 1.26281
Production | 0.22 | 0.05 | 4.41450 | 0.001 | 6.74070
Operating Days | -138.58 | 55.09 | -2.51543 | 0.029 | 1.27287
Associates | -26.43 | 9.27 | -2.85192 | 0.016 | 6.77552
Visitors | -1.59 | 3.19 | -0.49900 | 0.628 | 1.03867
The Visitors term is not significant (p = 0.628) and should be removed from the model.
Reduced Model
Coefficients
Term | Coef | SE Coef | T | P | VIF
Constant | 6687.10 | 1396.48 | 4.78854 | 0.000 |
Average Temp | 16.96 | 6.41 | 2.64580 | 0.021 | 1.25479
Production | 0.22 | 0.05 | 4.53564 | 0.001 | 6.71249
Operating Days | -138.35 | 53.34 | -2.59393 | 0.023 | 1.27278
Associates | -25.89 | 8.91 | -2.90510 | 0.013 | 6.68148
The Variance Inflation Factor (VIF) checks for factors that are collinear. Collinear factors inflate the standard errors of the coefficients, can make the model misleading, and should be avoided. Rule of thumb: VIFs < 8 are not a problem. If factors are highly correlated, try removing one from the model or using Partial Least Squares Regression.
Regression Equation
Water Usage = 6687.1 + 16.9643 Average Temp + 0.220159 Production - 138.354 Operating Days - 25.8854 Associates
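The VIF for predictor j is 1/(1 - R_j²), where R_j² is the R-Sq from regressing X_j on the other predictors. Inverting that relationship shows what the reported VIFs mean. The helper names are hypothetical; the cutoff of 8 is the rule of thumb stated on this slide.

```python
def vif_from_r2(r2_j: float) -> float:
    """VIF for predictor j given R_j^2 from regressing X_j on the other Xs."""
    return 1.0 / (1.0 - r2_j)


def r2_from_vif(vif: float) -> float:
    """Invert the VIF definition to recover R_j^2."""
    return 1.0 - 1.0 / vif


# VIFs from the reduced model above
for name, vif in [("Average Temp", 1.25479), ("Production", 6.71249),
                  ("Operating Days", 1.27278), ("Associates", 6.68148)]:
    print(f"{name}: VIF = {vif:.2f}, R_j^2 = {r2_from_vif(vif):.3f}")

# The slide's rule of thumb (VIF < 8) corresponds to R_j^2 < 0.875
print(r2_from_vif(8.0))
```

For example, a VIF of about 6.7 means roughly 85% of the variation in that predictor is explained by the other predictors.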
The Rest of the Session Window
Summary of Model
S = 276.626   R-Sq = 76.10%   R-Sq(adj) = 68.14%
PRESS = 1588865   R-Sq(pred) = 58.65%
Analysis of Variance
Source | DF | Seq SS | Adj SS | Adj MS | F | P
Regression | 4 | 2924259 | 2924259 | 731065 | 9.5537 | 0.0010367
  Average Temp | 1 | 315092 | 535674 | 535674 | 7.0003 | 0.0213440
  Production | 1 | 1562688 | 1574213 | 1574213 | 20.5720 | 0.0006830
  Operating Days | 1 | 400666 | 514877 | 514877 | 6.7285 | 0.0234871
  Associates | 1 | 645813 | 645813 | 645813 | 8.4396 | 0.0132008
Error | 12 | 918264 | 918264 | 76522 | |
Total | 16 | 3842523 | | | |
• S: standard deviation of the error term
• $r^2 = 1 - \frac{SS_{Error}}{SS_{Total}}$
• $r^2_{Adjusted} = 1 - \frac{MS_{Error}}{MS_{Total}}$, where $MS_{Total} = \frac{SS_{Total}}{n-1}$ and $MS_{Error} = \frac{SS_{Error}}{n - \text{number of factors} - 1}$
• R-Sq(pred): how well the model is expected to predict new observations
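Plugging the session-window values into these formulas reproduces the Summary of Model line, which is a good check on reading the table correctly. The sketch uses n = 17 (Total DF + 1) and 4 factors. It also uses R-Sq(pred) = 1 - PRESS/SS_Total, which we assume is how Minitab defines predicted R-Sq; it matches the printed 58.65%.

```python
import math

# Values from the Minitab session window (water usage, reduced model)
n_obs = 17            # Total DF (16) + 1
n_factors = 4
ss_error = 918264
ss_regression = 2924259
ss_total = 3842523
press = 1588865

ms_error = ss_error / (n_obs - n_factors - 1)   # 12 error DF
ms_total = ss_total / (n_obs - 1)

s = math.sqrt(ms_error)                 # standard deviation of the error term
r_sq = 1 - ss_error / ss_total
r_sq_adj = 1 - ms_error / ms_total
r_sq_pred = 1 - press / ss_total        # PRESS-based predicted R-Sq

assert ss_regression + ss_error == ss_total  # ANOVA decomposition
print(f"S = {s:.3f}")
print(f"R-Sq = {100 * r_sq:.2f}%  R-Sq(adj) = {100 * r_sq_adj:.2f}%  "
      f"R-Sq(pred) = {100 * r_sq_pred:.2f}%")
```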
Residual Analysis
Figure: Residual Plots for Water Usage (Worksheet: Water Usage.MTW), four panels: Normal Probability Plot (N = 17, AD = 0.159, P-Value = 0.938), Versus Fits, Histogram, and Versus Order
The residuals appear normally distributed (Anderson-Darling p-value = 0.938), centered on zero, with constant variance. There is no reason to reject the model.
Let’s Use the Model to Predict Usage
You have been asked to predict the amount of water usage for a month with an average temperature of 68, production of 1400, 20 days of operation, and 175 associates.
Press Ctrl+E to bring back the previous dialog box.
The Prediction
Predicted Values for New Observations
New Obs | Fit | SE Fit | 95% CI | 95% PI
1 | 851.869 | 666.567 | (-600.455, 2304.19) | (-720.553, 2424.29)
The predicted value (Fit) is 851.869. However, because of the low R-Sq(pred), the prediction intervals are very wide.
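The Fit is just the reduced-model regression equation evaluated at the new settings, and the intervals follow from S, SE Fit, and a t value with 12 error DF. This sketch reproduces Minitab's numbers to within rounding. SE Fit is taken from the output (computing it needs the full data set), and t(0.975, 12) = 2.178813 is a table value we supply. In this example SE Fit (666.567) is much larger than S (276.626), so it accounts for most of the interval width.

```python
import math

# Reduced-model coefficients from the session window
coefficients = {
    "Average Temp": 16.9643,
    "Production": 0.220159,
    "Operating Days": -138.354,
    "Associates": -25.8854,
}
intercept = 6687.1
new_obs = {"Average Temp": 68, "Production": 1400, "Operating Days": 20, "Associates": 175}

fit = intercept + sum(coefficients[k] * v for k, v in new_obs.items())

s = 276.626        # S from the model summary
se_fit = 666.567   # SE Fit reported by Minitab for this new observation
t_crit = 2.178813  # t(0.975) with 12 error DF, from a t table

ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)   # mean response
se_pred = math.sqrt(s ** 2 + se_fit ** 2)             # single new observation
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)

print(f"Fit = {fit:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% PI = ({pi[0]:.3f}, {pi[1]:.3f})")
```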
Multiple Regression: ABI Example 1 – Brand Health
Pedro Lozada Belt Project – Zone GHQ
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.82
R Square | 0.68
Adjusted R Square | 0.65
Standard Error | 0.17
Observations | 48
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 4 | 2.46 | 0.61 | 22.50 | 4.45E-10
Residual | 43 | 1.17 | 0.03 | |
Total | 47 | 3.63 | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | -3.27 | 2.08 | -1.58 | 0.12 | -7.46 | 0.92
ln(Becks Price) | -1.76 | 0.44 | -3.98 | 0.00 | -2.65 | -0.87
ln(Competitor Price) | 1.74 | 0.20 | 8.57 | 0.00 | 1.33 | 2.15
ln(Becks Media) | 0.02 | 0.01 | 2.48 | 0.02 | 0.00 | 0.04
ln(Competitor Media) | -0.03 | 0.04 | -0.63 | 0.53 | -0.11 | 0.06
This output is from Excel. What is the significance telling us? How do you interpret the R-Sq?
Coefficient of -1.76 on ln(Becks Price): a 10% increase in price will decrease the share by approximately 17.6%.
Is there a relationship between price and market share?
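The Excel output does not show the response variable. If, as the slide's "10% price increase, 17.6% share decrease" reading implies, the response is ln(share) (a log-log model), then -1.76 is a price elasticity and 17.6% is the small-change approximation β × 10%. The sketch below rests entirely on that assumption and compares the approximation with the exact change implied by the model, (1.10)^β - 1.

```python
beta_price = -1.76     # coefficient on ln(Becks Price) from the Excel output
price_increase = 0.10  # +10% price change

# Linear (elasticity) approximation used on the slide
approx_change = beta_price * price_increase

# Exact change implied by a log-log model: share ratio = (1 + 0.10) ** beta
exact_change = (1 + price_increase) ** beta_price - 1

print(f"approximate: {approx_change:.1%}, exact: {exact_change:.1%}")
```

For small price changes the two agree closely; for larger changes the exact form is the safer one to quote.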
ABI Example 2 – UK CDP Performance (Multiple Regression Analysis)
What is the prediction model between Customer Delivery Performance in the UK and line efficiency (LEF)?
Gustavo Burger Belt Project – Zone WE
ABI Example 3 – Multiple Regression
Price Change vs. Ad Feature
Use multi-variable regression to separate the impact of a price decrease vs. placing the product in the ad feature.
Source: NC Food Lion Natural Light 24pks
Mike Zacharias Belt Project – Zone NA
ABI Example 3 (Continued)
What is the regression equation?
Practically, what does this mean?
From the regression equation: a $1 price decrease is worth 1.8 share points, and an ad feature is worth 6.0 share points.
Logistic Regression
• Logistic regression is a variation of ordinary regression that is used when:
  o The dependent (response) variable is dichotomous (i.e., it takes only two values, which usually represent the occurrence or non-occurrence of some outcome event, usually coded as 0 or 1).
  o The independent (input) variables are continuous, categorical, or both.
Logistic Regression
• Logistic regression evaluates the occurrence of the event in terms of its probability.
  o If an event happens (success), the probability is p.
  o The probability of the event not happening is (1 - p).
• The odds of success relative to failure is the ratio p/(1 - p).
• The logistic regression model is fitted to the natural logarithm of the odds, ln[p/(1 - p)].
• The statistical model for logistic regression is:
$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$
  o where p is a binomial proportion and x is the input factor.
  o The parameters of the logistic model are β0 and β1.
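Solving the model above for p gives the logistic function p = 1/(1 + e^-(β0 + β1x)), which is what turns a linear predictor into a probability. A minimal sketch; the parameter values β0 = -5 and β1 = 1 are hypothetical and chosen so the curve crosses p = 0.5 at x = 5.

```python
import math


def logistic(z: float) -> float:
    """Inverse of the logit: maps beta0 + beta1*x to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)           # avoids overflow for large negative z
    return e / (1.0 + e)


def logit(p: float) -> float:
    """Natural log of the odds, ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def event_probability(beta0: float, beta1: float, x: float) -> float:
    """Event probability implied by the logistic regression model."""
    return logistic(beta0 + beta1 * x)


# Hypothetical parameters: p = 0.5 where beta0 + beta1*x = 0, i.e. x = 5
for x in range(0, 11, 2):
    print(x, round(event_probability(-5.0, 1.0, x), 3))
```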
The Logistic Function
Figure: Probability of event versus input, showing the S-shaped logistic curve (Datafile/EXHREG.XLS)
The logistic function starts very close to 0, then rises rapidly (most steeply at p = 0.5, where β0 + β1x = 0), then asymptotically approaches 1.
An Example
A cereal company wants to determine the factors that increase the probability that a consumer will purchase its product. Data was collected on 71 consumers to determine the effect of whether they had seen an advertisement, whether they had children, and their income on whether they purchased the cereal. Data is in Logistic Regression Cereal Ad.mtw.
Set Up the Analysis
Stat > Regression > Binary Logistic Regression
Discrete factors that are included in the model are entered in the Factors box.
Options and Graphs
Logistic Regression Output
Variable | Value | Count
Bought | Yes | 34 (Event)
 | No | 37
 | Total | 71
Logistic Regression Table
Predictor | Coef | SE Coef | Z | P | Odds Ratio | 95% CI Lower | 95% CI Upper
Constant | -5.21059 | 1.31033 | -3.98 | 0.000 | | |
Income | 0.0563140 | 0.0230953 | 2.44 | 0.015 | 1.06 | 1.01 | 1.11
Children (Yes) | 2.69208 | 1.13832 | 2.36 | 0.018 | 14.76 | 1.59 | 137.43
ViewAd (Yes) | 1.76941 | 0.658335 | 2.69 | 0.007 | 5.87 | 1.61 | 21.32
Log-Likelihood = -30.480
Test that all slopes are zero: G = 37.341, DF = 3, P-Value = 0.000
The null hypothesis is that the factor has no effect on the event probability.
All three factors are statistically significant
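Each odds ratio in the Logistic Regression Table is exp(Coef), and its 95% interval is exp(Coef ± 1.959964 × SE Coef); we assume Minitab uses this normal-based (Wald) interval, and it reproduces the printed values. The same coefficients give a predicted purchase probability for any customer profile. The income value of 40 is hypothetical and is in whatever units the worksheet uses; the helper names are illustrative.

```python
import math

# Coefficients and standard errors from the Minitab Logistic Regression Table
COEF = {"Constant": -5.21059, "Income": 0.0563140, "Children": 2.69208, "ViewAd": 1.76941}
SE = {"Income": 0.0230953, "Children": 1.13832, "ViewAd": 0.658335}
Z_975 = 1.959964  # standard normal 97.5th percentile


def odds_ratio_ci(term: str):
    """Odds ratio exp(coef) and its 95% Wald confidence interval."""
    b, se = COEF[term], SE[term]
    return math.exp(b), math.exp(b - Z_975 * se), math.exp(b + Z_975 * se)


def purchase_probability(income: float, children: bool, view_ad: bool) -> float:
    """Predicted P(Bought = Yes) from the fitted logit."""
    logit = (COEF["Constant"] + COEF["Income"] * income
             + COEF["Children"] * children + COEF["ViewAd"] * view_ad)
    return 1.0 / (1.0 + math.exp(-logit))


for term in SE:
    ratio, lower, upper = odds_ratio_ci(term)
    print(f"{term}: odds ratio {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Hypothetical customer: income 40 (worksheet units), has children, saw the ad
print(round(purchase_probability(40, True, True), 3))
```

An odds ratio of 5.87 for ViewAd means that, holding income and children fixed, seeing the ad multiplies the odds of purchase by about 5.9.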
Model Integrity
Goodness-of-Fit Tests
Method | Chi-Square | DF | P
Pearson | 45.1757 | 49 | 0.629
Deviance | 44.8648 | 49 | 0.641
Hosmer-Lemeshow | 6.4373 | 8 | 0.598
Measures of Association (between the response variable and predicted probabilities)
Pairs | Number | Percent
Concordant | 1105 | 87.8
Discordant | 145 | 11.5
Ties | 8 | 0.6
Total | 1258 | 100.0
Summary Measures
Somers' D | 0.76
Goodman-Kruskal Gamma | 0.77
Kendall's Tau-a | 0.39
The null hypothesis for goodness of fit is that the model fits. All three p-values are well above 0.05, so do not reject the null hypothesis; there is no evidence that the model fits poorly.
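The summary measures are simple functions of the pair counts. Each pair is one buyer and one non-buyer (34 × 37 = 1258 pairs). Somers' D divides (concordant - discordant) by all such pairs, Goodman-Kruskal gamma by concordant + discordant pairs only, and Kendall's tau-a by all N(N - 1)/2 pairs of observations. The function name is illustrative.

```python
def association_measures(concordant: int, discordant: int, ties: int, n_obs: int):
    """Somers' D, Goodman-Kruskal gamma, and Kendall's tau-a from pair counts."""
    pairs = concordant + discordant + ties      # event/non-event pairs
    diff = concordant - discordant
    somers_d = diff / pairs
    gamma = diff / (concordant + discordant)
    tau_a = diff / (n_obs * (n_obs - 1) / 2)    # divided by all possible pairs
    return somers_d, gamma, tau_a


# Counts from the Measures of Association table
d, g, tau = association_measures(1105, 145, 8, n_obs=71)
print(f"Somers' D = {d:.2f}, Gamma = {g:.2f}, Tau-a = {tau:.2f}")  # 0.76, 0.77, 0.39
```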
The Chi-Square vs. Probability Graph
Figure: Delta Chi-Square versus Probability (Worksheet: Logistic Regression Cereal Ad.MTW)
Right-click on the graph and brush the outliers.
Note them in the data sheet.
Prepare a Graph of the Results
Press Ctrl+E to bring back the previous dialog box.
Storing the Data
Preparing the Graph
Graph>Scatterplot
Presenting the Results
Figure: Probability of Purchase vs Income, one curve for each combination of Children (No/Yes) and ViewAd (No/Yes) (Worksheet: Logistic Regression Cereal Ad.MTW)
Exercise – Your Turn
Data was collected for the outcome of emergency room admissions. A hospital administrator would like help determining if any of the factors collected could be used to predict the probability of dying in the hospital.
The data is in Datafile/Emergency.MTW.
A definition of the terms is given in Datafile/EmergencyFileTerms.DOC.
What Have We Covered?
Learned and applied key tools to analyze your data:
• How to develop and interpret the correlation between variables
• Develop a mathematical model expressing the relationship (regression):
  o Simple Linear Regression
  o Multiple Linear Regression
  o Logistic Regression
In the Next Module . . .
• We will learn how to determine the proper sample size and the power of the test
• We will use Minitab to determine:
  – Sample size
  – Delta
  – Power
Supplemental Material
Exercise Solution – Emergency Room
Exercise Solution – Emergency Room (Continued)
Predictor | Coef | SE Coef | Z | P | Odds Ratio | 95% CI Lower | 95% CI Upper
Constant | -5.74590 | 1.27590 | -4.50 | 0.000 | | |
Age | 0.0342199 | 0.0117207 | 2.92 | 0.004 | 1.03 | 1.01 | 1.06
Sex 1 | -0.374718 | 0.411645 | -0.91 | 0.363 | 0.69 | 0.31 | 1.54
Race 2 | -1.16640 | 1.09116 | -1.07 | 0.285 | 0.31 | 0.04 | 2.64
Race 3 | 0.269519 | 0.907951 | 0.30 | 0.767 | 1.31 | 0.22 | 7.76
Ser 1 | -0.394346 | 0.432915 | -0.91 | 0.362 | 0.67 | 0.29 | 1.57
Can 1 | 1.83110 | 0.849745 | 2.15 | 0.031 | 6.24 | 1.18 | 33.00
PRE 1 | 0.571998 | 0.546810 | 1.05 | 0.296 | 1.77 | 0.61 | 5.17
TYP 1 | 2.87674 | 0.918809 | 3.13 | 0.002 | 17.76 | 2.93 | 107.51
Age, TYP, and Can are significant
Exercise Solution – Emergency Room (Continued)
Exercise Solution – Emergency Room (Continued)
Logistic Regression Table
Predictor | Coef | SE Coef | Z | P | Odds Ratio | 95% CI Lower | 95% CI Upper
Constant | -6.20134 | 1.17173 | -5.29 | 0.000 | | |
Age | 0.0352979 | 0.0109595 | 3.22 | 0.001 | 1.04 | 1.01 | 1.06
Can 1 | 1.57914 | 0.808289 | 1.95 | 0.051 | 4.85 | 0.99 | 23.65
TYP 1 | 3.02273 | 0.873298 | 3.46 | 0.001 | 20.55 | 3.71 | 113.79
Even though the p-value for Can (0.051) is slightly over 0.05, let’s keep it in the model.
Exercise Solution – Emergency Room (Continued)
Exercise Solution – Emergency Room (Continued)
Figure: Probability of Dying in Hospital by Type of Admission vs Age, one curve for each combination of Can (0/1) and TYP (0/1) (Worksheet: Emergency.MTW)