Discrete Data Analysis



Page 1: Discrete Data Analysis

Discrete Data Analysis

The table below summarizes the methods used to analyze discrete data. Z is a value from the normal distribution for the required level of confidence.

Z       2-Sided Confidence Level   1-Sided Confidence Level
1.282   80.00%                     90.00%
1.645   90.00%                     95.00%
1.960   95.00%                     97.50%
2.326   98.00%                     99.00%
2.576   99.00%                     99.50%

Choosing a method:

One Proportion
• Normal Approximation - use when n (sample size) is large, p is not too close to 0 or 1, np > 10 and n(1 - p) > 10:
    p̂ ± Z √( p̂(1 - p̂) / n )
• Poisson Approximation - use when n is large and the proportion defective is small (p < 0.10): Poisson confidence interval.
• Otherwise: Exact Binomial Test.

Comparing 2 Proportions
• Normal Approximation:
    (p̂1 - p̂2) ± Z √( p̂(1 - p̂)(1/n1 + 1/n2) )

More than 2 Proportions (and 2-way tables)
• χ² (Chi-square)

Normal Approximation - One Proportion

Use this approximation when the sample size is large, the number of defects in the sample is greater than 10 (np > 10), and the number of good parts in the sample is greater than 10 (n(1 - p) > 10).

p̂ = (# Defects) / (Sample Size)

A two-sided confidence interval for the proportion (p) that are defective in the population is given by:

p̂ ± Z √( p̂(1 - p̂) / n )

The result gives the lower and upper limits of the range of plausible values for the proportion defective in that population.
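For readers working outside Minitab, the following is a minimal Python sketch of the normal-approximation interval (not part of the original handbook); the defect counts used are invented for illustration.

```python
import math
from scipy.stats import norm

def proportion_ci(defects, n, confidence=0.95):
    """Two-sided normal-approximation CI: p_hat +/- Z*sqrt(p_hat*(1-p_hat)/n)."""
    p_hat = defects / n
    z = norm.ppf(1 - (1 - confidence) / 2)        # e.g. 1.960 for a 95% two-sided interval
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical example: 30 defects found in a sample of 400 parts
low, high = proportion_ci(30, 400)
print(f"95% CI for p: ({low:.4f}, {high:.4f})")
```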

Normal Approximation - Comparing Two Proportions

If evaluating two different sample sets with proportion-defective data, the confidence interval for the difference in proportion defective between the two sample sets is given by:

(p̂1 - p̂2) ± Z √( p̂(1 - p̂)(1/n1 + 1/n2) )

Where, if k_i = # of defects in the i-th sample and n_i = sample size of the i-th sample:

p̂1 = k1/n1        p̂2 = k2/n2        p̂ = (k1 + k2) / (n1 + n2)

The result gives the lower and upper limits of the range of plausible values for the difference between the proportions defective in the two populations. If "0" is included within the range of plausible values, then there is not strong evidence that the proportions of defects in the two populations are different.
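A minimal Python sketch of the same two-proportion interval (not from the handbook); the sample counts are invented.

```python
import math
from scipy.stats import norm

def two_proportion_ci(k1, n1, k2, n2, confidence=0.95):
    """CI for p1 - p2 using the pooled estimate p_hat = (k1 + k2)/(n1 + n2)."""
    p1, p2 = k1 / n1, k2 / n2
    p_pool = (k1 + k2) / (n1 + n2)
    z = norm.ppf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    diff = p1 - p2
    return diff - half_width, diff + half_width

# Hypothetical example: 18/300 defective from line A, 31/350 from line B
low, high = two_proportion_ci(18, 300, 31, 350)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")  # interval includes 0 -> no strong evidence of a difference
```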

Page 2: Discrete Data Analysis


Confidence Intervals

A confidence interval is a range of plausible values for a population parameter, such as the mean of the population, µ.

For example, a test of 8 units might give an average efficiency of 86.2%. This is the most likely estimate of the efficiency of the entire population. However, observations vary, so the true population efficiency might be somewhat higher or lower than 86.2%. A 95% confidence interval for the efficiency might be (81.2%, 91.2%). 95% of the intervals constructed in this manner will contain the true population parameter.

The confidence interval for the mean of one sample is:

X̄ ± t × σ̂ / √n

"t" comes from the t tables (Page 65) with n - 1 degrees of freedom and with the desired level of confidence.

The confidence interval for the difference in the means of 2 samples, if the variances of the 2 samples are assumed to be equal, is:

(X̄1 - X̄2) ± t × s_p × √(1/n1 + 1/n2)

"t" comes from the t tables (Page 65) with n1 + n2 - 2 degrees of freedom, and with the desired level of confidence. s_p is the pooled standard deviation:

s_p = √[ ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) ]

In MINITAB, confidence intervals are calculated using the "1 Sample" and "2 Sample" t methods, above. In the text output shown below, the 95% confidence interval for the difference between the mean of Manu_a and the mean of Manu_b is 6.65 to 8.31. This statement points to accepting Ha, that the means are different.

95% CI for mu Manu_a - mu Manu_b: ( 6.65, 8.31)
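A small Python sketch of the same two interval formulas, computed from summary statistics (not from the handbook; the summary values are invented).

```python
import math
from scipy.stats import t

def mean_ci(xbar, s, n, confidence=0.95):
    """One-sample CI for the mean: xbar +/- t * s / sqrt(n)."""
    t_val = t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    half_width = t_val * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def diff_mean_ci(x1, s1, n1, x2, s2, n2, confidence=0.95):
    """Two-sample CI for mu1 - mu2 with a pooled standard deviation (equal variances assumed)."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t_val = t.ppf(1 - (1 - confidence) / 2, df=n1 + n2 - 2)
    half_width = t_val * sp * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) - half_width, (x1 - x2) + half_width

# Hypothetical summary data
print(mean_ci(86.2, 5.1, 8))
print(diff_mean_ci(52.3, 1.8, 20, 44.8, 2.1, 22))
```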

Using the 1 Sample t test
• Run Stat > Basic Statistics > 1-Sample t. In the dialog box, identify the variable or variables to be tested.
• Select the test to be performed, "Confidence Interval" or "Test Mean".
• If the "Confidence Interval" is to be calculated at other than a value of 95%, change to the appropriate number.
• If "Test Mean" is selected, identify the desired mean to be tested (the mean of the null hypothesis) and, in the alternative box, select the alternative hypothesis which is appropriate for the analysis. This will determine the test used for the analysis (one tailed or two tailed).
• If graphic output is needed, select the graphs button and choose among "Histogram", "Dotplot" and "Boxplot" output.
• Click OK to run the analysis.

Analyzing the test results
• If running the "Confidence Interval" option, Minitab will calculate the "t" statistic and a confidence interval for the data.
• If using the "Test Mean" option, Minitab will provide descriptive statistics for the tested distribution(s), the "t" statistic and a "p" value.
• The graphic outputs will all include a graphical representation of the confidence interval of the mean, shown by a red line with a dot at the mean of the sample population.
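As an illustration only (not the handbook's Minitab procedure), a minimal Python sketch of the same "Test Mean" idea; the data values and hypothesized mean are invented.

```python
from scipy.stats import ttest_1samp

# Hypothetical efficiency measurements (%) from 8 units
data = [84.1, 87.3, 85.9, 88.0, 83.6, 86.8, 87.5, 85.2]

# Ho: population mean = 85.0, Ha: population mean != 85.0 (two-tailed)
t_stat, p_value = ttest_1samp(data, popmean=85.0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

if p_value < 0.05:
    print("Reject Ho: the mean differs from 85.0")
else:
    print("Fail to reject Ho")
```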

χ² Test for Independence

• The χ² test for an independent relationship tests the null hypothesis (Ho) that two discrete variables are independent.
• Data "relating" two discrete variables are used to create a contingency table. For each of the cells in the contingency table, the "observed frequency" is compared to the "expected frequency" in order to test for independence.
• The expected frequency in each cell must be at least five (5) for the χ² test to be valid.
• For continuous data, it is best to test for dependency, or correlation, by using scatter plots and regression analysis.

Example of χ² Analysis
1. There are 2 variables to be studied, height and weight. The null hypothesis Ho is that "weight" is independent of "height."
2. For each variable 2 conditions (categories) are defined.
   Weight: < 140 lbs, > 140 lbs
   Height: < 5'6", > 5'6"
3. The data have been accumulated as shown below:

                        Height below 5'6"   Height above 5'6"   Row Totals
Weight below 140 lbs           20                  13               33
Weight above 140 lbs           11                  22               33
Column Totals                  31                  35             N = 66

Manual χ²
1. Compute f_exp for each cell ij: f_exp ij = (Row Total)_i × (Column Total)_j / N.
2. N is the total of all f_obs for all 4 cells. (For our example, N = 66 and f_exp 1,2 = (33 × 35)/66 = 1155/66 = 17.5.)
3. Calculate χ²calc, where χ²calc = Σ[(f_obs - f_exp)² / f_exp] = 4.927.
4. Calculate the degrees of freedom df = (Number of Rows - 1)(Number of Columns - 1). For our example, df = (2 - 1) × (2 - 1) = 1.
5. Determine χ²crit from the χ² table for the degrees of freedom and confidence level desired (usually 5% risk). For 1 df and 5% α risk, χ²crit = 3.841.
6. If χ²calc > χ²crit, then reject Ho and accept Ha, i.e. that weight depends on height. In this example, we reject Ho.
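A minimal Python sketch that reproduces the hand calculation above with scipy (not part of the handbook); `correction=False` turns off the Yates continuity correction so the result matches the manual χ² of 4.927.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the height/weight example
observed = np.array([[20, 13],
                     [11, 22]])

# correction=False matches the hand calculation (no Yates continuity correction)
chi2, p, df, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.3f}")   # chi2 ~ 4.927, p ~ 0.026
print("expected counts:\n", expected)
```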

Using Minitab to perform χ² Analysis
Minitab can be used to analyze data using χ² with two different procedures: Stat>Tables>Chi Square Test and Stat>Tables>Cross Tabulation. Chi Square Test analyzes data which is in a table. Cross Tabulation analyzes data which is in columns with subscripted categories. Since Minitab commonly needs data in columns to graph, Cross Tabulation is the preferred method for most analysis.

Page 3: Discrete Data Analysis


"t" Test

A "t" test tests the hypothesis that the means of two distributions are equal. It can be used to demonstrate a shift of the mean after a process change. If there has been a change to a process, and it must be determined whether or not the mean of the output was changed, compare samples before and after the change using the "t" test.

• Your ability to detect a shift (or change) is improved by increasing the size of your samples, by increasing the size of the shift (or change) that you are trying to detect, or by decreasing the variation (see Sample Size, pages 27-28).
• There are two tests for means, a One Sample t test and a Two Sample t test.
• The "one sample t test" (Stat > Basic Statistics > 1-Sample t) compares a single distribution average to a target or hypothesized value.
• The "two sample t test" (Stat > Basic Statistics > 2-Sample t) compares the means of two separate distributions.

Using the 2 Sample t test
1. Pull samples in a random manner from the distributions whose means are being evaluated. In Minitab, the data can be in separate columns or in a single column with a subscript column.
2. Determine the Null Hypothesis Ho and Alternative Hypothesis Ha (Less than, Equal to, or Greater than).
3. Confirm that the variances are similar using the "F" test or Homogeneity of Variance (page 30).
4. Run Stat > Basic Statistics > 2-Sample t. In the dialog box, select "Samples in One Column" and identify the data column and subscript column, or "Samples in Different Columns" and identify both columns.
5. In the alternative box, select the alternative hypothesis which is appropriate for the analysis. This will determine the test used for the analysis (one tailed or two tailed).
6. If the variances are similar, check the "Assume Equal Variances" box.
7. If graphic output is needed, select the graphs button and choose between "Dotplot" and "Boxplot" output.
8. Click OK to run the analysis.

Analyzing the test results
Minitab will provide a calculation of descriptive statistics for each distribution, a confidence interval statement (page 32) and a statement of the t test as a test of the difference between two means. The output will provide a "t" statistic, a "p" value and the degrees of freedom. To use the "t" distribution table on page 65, the "t" statistic and the degrees of freedom are required. Analysis can be made using that table or the "p" value.
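A minimal Python sketch of the two-sample t test (not the handbook's Minitab procedure); the supplier data are invented, and `equal_var=True` corresponds to checking "Assume Equal Variances" after confirming the variances are similar.

```python
from scipy.stats import ttest_ind

# Hypothetical flatness measurements from two suppliers
supplier_1 = [2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.2, 2.4]
supplier_2 = [2.8, 2.7, 2.9, 2.6, 3.0, 2.8, 2.7, 2.9]

t_stat, p_value = ttest_ind(supplier_1, supplier_2, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Reject Ho: the supplier means differ")
else:
    print("Fail to reject Ho")
```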

Minitab: Stat>Tables>Chi Square
1. Create the table shown in the example in Minitab.
2. Run Stat>Tables>Chi Square Test. In the dialog box, select the columns containing the tabular data, in this case "C2" and "C3". Click "OK" to run.
3. In the Session window, the table that is created will show the expected value for each of the data cells under the actual data for the cell, plus the χ² calculation, χ²calc = Σ[(f_obs - f_exp)² / f_exp].
4. The Chi Square calculation is χ²calc = 4.927. The p value for this test is 0.026.
5. The degrees of freedom df = (Number of Rows - 1)(Number of Columns - 1) is shown, df = 1.
6. Determine χ²crit from the χ² table for the degrees of freedom and confidence level desired (usually 5% risk). χ²crit = 3.841.
7. Since χ²calc > χ²crit, reject Ho.

Because the data is in tabular form in Minitab, no other analysis can be done.

Minitab: Stat>Tables>Cross Tabulation
If additional analysis of data is desired, including any graphical analysis, the Stat>Tables>Cross Tabulation procedure is preferred. This procedure uses data in the common Minitab column format. Note that the data is in a single column and the factors or variables being considered are shown as subscripted values. In this graphic, the data is in column C6 and the appropriate subscripts are in columns C4 and C5.
1. Run Stat>Tables>Cross Tabulation. In the dialog box, select the columns identifying the factors or variables in the "Classification Variables" box.
2. Click Chi Square Analysis and select "Above and expected count".
3. Select the column containing the response data in the "Frequencies in" box, in this case "data".
4. Click Run.
5. The output in the session window is very similar to the output for Stat>Tables>Chi Square Test, except that it does not show the Chi Square calculation for each cell.
6. Analysis of the test results is done as before, either by using the generated p value or by using the calculated χ² and degrees of freedom and entering the tables with that information to find χ²crit.
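A comparable workflow in Python (not from the handbook) builds the contingency table from raw subscripted columns with pandas before testing it; the column names and categories below are invented, and with real data the expected count in each cell should still be at least 5.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical raw data: one row per unit, with two categorical columns
df = pd.DataFrame({
    "height": ["short", "tall", "tall", "short", "tall", "short", "short", "tall"],
    "weight": ["light", "heavy", "heavy", "light", "light", "heavy", "light", "heavy"],
})

# Build the contingency table from the raw columns (analogous to Cross Tabulation)
table = pd.crosstab(df["weight"], df["height"])
print(table)

# With real data, check that every expected count is at least 5 before trusting the test
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.3f}, df = {dof}, p = {p:.3f}")
```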

Page 4: Discrete Data Analysis


Testing Equality of Variances

The "F" test is used to compare the variances of two distributions. It tests the hypothesis, Ho, that the variances of two distributions are equal. It is performed by forming a ratio of two variances from two samples and comparing the ratio with a value in the "F" distribution table. The "F" test can be used to demonstrate that the variance has been increased or decreased after a process change. Since "t" tests and ANOVA need to know whether population variances are the same or different, this test is also a prerequisite for doing other types of hypothesis testing. In Minitab, this test is done as "Homogeneity of Variance".

The "F" test is also used during the ANOVA process to confirm or reject hypotheses about the equality of averages of several populations.

Performing an F Test
1. Pull samples in a random manner from the two distributions for which you are comparing the variances. Prior to running the test, confirm sample distribution normality for each sample (page 17).
2. Compute the "F" statistic, Fcalc = s1²/s2². The "F" statistic should always be calculated so that the larger variance is in the numerator.
3. Calculate the degrees of freedom for each sample. Degrees of freedom = n_i - 1, where n_i is the sample size for the i-th sample, i.e., n1 - 1 and n2 - 1.
4. Specify the risk level that you can tolerate for making an error in your decision (usually set at 5%).
5. Use the "F" distribution table (p 59-60) to determine Fcrit for the degrees of freedom in your samples and for the risk level you have chosen.
6. Compare Fcalc to Fcrit. If Fcalc < Fcrit, the null hypothesis, Ho, which implies that the variances from both distributions are equal, cannot be rejected. If Fcalc > Fcrit, reject the null hypothesis and conclude that the samples have different variances.
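A minimal Python sketch of the F test steps above (not from the handbook); the data are invented, and splitting the 5% risk across both tails (0.975 quantile) is an assumption about how the table lookup is done.

```python
import numpy as np
from scipy.stats import f

# Hypothetical samples from two processes
before = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3])
after  = np.array([10.1, 10.9, 9.4, 10.6, 9.2, 11.0, 9.6, 10.8])

var1, var2 = np.var(before, ddof=1), np.var(after, ddof=1)
# Larger variance goes in the numerator
f_calc = max(var1, var2) / min(var1, var2)
df1 = df2 = len(before) - 1

f_crit = f.ppf(0.975, df1, df2)   # two-sided 5% risk split across both tails (assumed convention)
print(f"Fcalc = {f_calc:.2f}, Fcrit = {f_crit:.2f}")
print("Reject Ho (variances differ)" if f_calc > f_crit else "Fail to reject Ho")
```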

Using Homogeneity of Variance (for MINITAB analysis)
1. Homogeneity of Variance allows analysis of multiple population variances simultaneously. It also allows analysis of "non-normal" distributions. Data from all sample groups must be "stacked" in a single column, with the samples identified by a separate "subscript" or "factor" column.
2. In Minitab, use STAT>ANOVA>HOMOGENEITY OF VARIANCE. In the dialog box, identify the single "Response" column and a separate "Factors" column or columns.
3. Analysis of the test is done using the p value. If the data is normal (see Normality, page 15), use Bartlett's Test. Use Levene's Test when the data come from continuous, but not necessarily normal, distributions.
4. The computations for the homogeneity of variance test require that at least one cell contains a non-zero standard deviation. Normally, it is possible to compute a standard deviation for a factor level if it contains at least two observations.
5. Two standard deviations are necessary to calculate Bartlett's and Levene's test statistics.
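Both tests are also available in Python; a minimal sketch (not from the handbook) with invented data for three factor levels:

```python
from scipy.stats import bartlett, levene

# Hypothetical stacked data, split out by factor level
line_a = [4.1, 4.3, 4.0, 4.4, 4.2]
line_b = [4.6, 4.9, 4.5, 4.8, 4.7]
line_c = [4.2, 4.1, 4.5, 4.3, 4.0]

stat_b, p_b = bartlett(line_a, line_b, line_c)   # assumes normal data
stat_l, p_l = levene(line_a, line_b, line_c)     # robust to non-normal data
print(f"Bartlett: p = {p_b:.3f}   Levene: p = {p_l:.3f}")
# A small p value (< 0.05) indicates at least one group variance differs
```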


Poisson Approximation

Use this approximation when the sample size is large and the probability of defects (p) in the sample is less than 0.10. In such a situation:

p̂ = k/n     where k = number of defects, n = number of sample parts

The confidence interval for this proportion defective can be found using the Poisson distribution.

1. Determine the desired confidence level (80%, 90% or 95%).
2. Find the lower and upper confidence interval factors for that level of confidence for the number of failures found in the sample.
3. Divide these factors by the actual sample size used.
4. The result of the two calculations gives the range of plausible values for the proportion of the population that is defective.

Example: k = 2, n = 200 (2 defects in 200 sampled parts or CTQ outputs)

Then p̂ = 2/200 = 0.0100, and the 95% two-sided confidence interval is:

Where: lower confidence factor = 0.619, upper confidence factor = 7.225

CI = (0.619/200, 7.225/200) = (0.0031, 0.0361)
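As a rough cross-check in Python (not the handbook's factor table), Poisson confidence factors can be obtained from the chi-square distribution; the exact convention varies between references, so the factors below may differ somewhat from the 0.619 / 7.225 used in the example above.

```python
from scipy.stats import chi2

def poisson_ci_for_proportion(k, n, confidence=0.95):
    """Approximate CI for a small proportion via Poisson confidence factors.

    Uses a common chi-square relationship for a Poisson count; published factor
    tables may follow slightly different conventions.
    """
    alpha = 1 - confidence
    lower_factor = chi2.ppf(alpha / 2, 2 * k) / 2 if k > 0 else 0.0
    upper_factor = chi2.ppf(1 - alpha / 2, 2 * (k + 1)) / 2
    return lower_factor / n, upper_factor / n

print(poisson_ci_for_proportion(2, 200))   # roughly (0.0012, 0.0361)
```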

Page 5: Discrete Data Analysis


Hypothesis Statements (cont.)

F Test - Compares the variances of two distributions
  Ho - The sample variances tested are statistically the same: σ²0 = σ²1
  Ha - The sample variances tested are not equal: σ²0 ≠ σ²1

Homogeneity of Variance - Compares the variances of multiple distributions
  Ho - The sample variances tested are statistically the same: σ²0 = σ²1 = σ²2 = ... = σ²k
  Ha - At least one of the sample variances tested is not equal.

  Bartlett's Test - tests normal distributions
  Levene's Test - tests non-normal distributions

χ² - Tests the hypothesis that two discretely measured variables operate independently of each other
  Ho: Independent (there is no relationship between the populations); Ho: p1 = p2 = p3 = ... = pn
  Ha: Dependent (there is a relationship between the populations); Ha: At least one of the equalities does not hold

What is a p-value?

Statistical definitions of p-value:
• The observed level of significance.
• The chance of claiming a difference if there is no difference.
• The smallest value of alpha that will result in rejecting the null hypothesis.

How do I use it?
• If p < alpha, then the difference is statistically significant. Reject the null hypothesis and declare that there is a difference.
• Think of (1 - p) as the degree of confidence that there is a difference. Example: p = .001, so (1 - p) = .999, or 99.9%. You can think of this as 99.9% confidence that there is a difference.

The Transfer Function: Y = f(X)
The output (effect, "Y") is a function of the inputs (root causes, "X's"). The question to answer is: what is the mathematical relationship between the "Y" and the "X's"?

If the Output (Y) is Continuous:
• Inputs (X's) Continuous: Regression; Analysis of Covariance
• Inputs (X's) Discrete: ANOVA; t Tests; F Tests; Confidence Intervals; DOE

If the Output (Y) is Discrete:
• Inputs (X's) Continuous: Logistic Regression
• Inputs (X's) Discrete: Logistic Regression; χ²; Confidence Intervals - Proportions; DOE

Copyright 1995 Six Sigma Academy, Inc.

Page 6: Discrete Data Analysis


Stating the Hypothesis Ho and Ha

Ho: The starting point for a hypothesis test is the "null" hypothesis, Ho. Ho is the hypothesis of sameness, or no difference.
  Example: The population mean equals the test mean.

Ha: The second hypothesis is Ha, the "alternative" hypothesis. It represents the hypothesis of difference.
  Example: The population mean does not equal the test mean.

• You usually want to show that there is a difference (Ha).
• Start by assuming equality (Ho).
• If the data show they are not equal, then they must be different (Ha).

Hypothesis Statements

1 Sample t - Compares a single distribution to a target or hypothesized value.
  Ho - The sample tested equals the target: µ0 = Target
  Ha - The sample tested is not equal to the target, or is greater than/less than the target: µ0 ≠ Target, µ0 > Target, µ0 < Target

2 Sample t - Compares the means of two separate distributions.
  Ho - The samples tested are statistically the same: µ0 = µ1
  Ha - The sample means are not equal, or one is greater than/less than the other: µ0 ≠ µ1, µ0 > µ1, µ0 < µ1

ANOVA - One Way

ANOVA, ANalysis Of VAriance, is a technique used to determine the statistical significance of the relationship between a dependent variable ("Y") and a single or multiple independent variable(s) or factors ("X's").

ANOVA should be used when the independent variables (X's) are categorical (not continuous). Regression Analysis (pages 43-45) is a technique for performing a similar analysis with continuous independent variables.

ANOVA determines if the differences between the averages of the levels are greater than the expected variation. It answers the question: "Is the signal between levels greater than the noise within levels?"

ANOVA allows the investigator to compare several means simultaneously with the correct overall level of risk.

Basic assumptions for using ANOVA
• Equal variances (or close to the same) for each subgroup.
• Independent and normally distributed observations.
• Data must represent the population variation.
• Acceptable Gage R&R.
• The ANOVA test for equality of means is fairly robust to the assumption of normality for moderately large sample sizes, so normality is often not a major concern.

The One Way ANOVA enables the investigation of a single factor at multiple levels with a continuous dependent variable. The primary investigation question is: "Do any of the populations of 'Y' stemming from the levels of 'X' have different means?" MINITAB will do this analysis either with the data in table form, with the data for each level of X in separate columns (STAT>ANOVA>ONE WAY (UNSTACKED)), or with all the data in a single column and the factor levels identified by a separate subscript column (STAT>ANOVA>ONE WAY). For the data below, use "One-Way (Unstacked)" for data in columns c1-c3 and "One-Way" for data in columns c4-c5.

• In the dialog box for "One-Way (Unstacked)", identify each of the columns containing the data.
• In the dialog box for "One-Way", identify the column containing the Response (Y) and the Factor (X) as appropriate.
• For both analyses, if graphic analysis is desired, select the "Graphs" button and choose between "Dotplots" and "Boxplots".
• Click OK to run. For analysis, see page 41.
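A minimal one-way ANOVA sketch in Python (not the handbook's Minitab procedure); the three levels of data are invented.

```python
from scipy.stats import f_oneway

# Hypothetical response data at three levels of a single factor
level_1 = [12.1, 11.8, 12.4, 12.0, 11.9]
level_2 = [13.0, 12.7, 13.2, 12.9, 13.1]
level_3 = [12.2, 12.5, 12.1, 12.4, 12.3]

f_stat, p_value = f_oneway(level_1, level_2, level_3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# p < 0.05 -> at least one level mean differs from the others
```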

Page 7: Discrete Data Analysis


Hypothesis Testing

Since all data are variable, an observed change could be due to chance and may not be repeatable. Hypothesis testing determines if the change could be due to chance alone, or if there is strong evidence that the change is real and repeatable.

In order to show that a change is real and not due to chance alone, first assume there is no change (Null Hypothesis, Ho). If the observed change is larger than the change expected by chance, then the data are inconsistent with the null hypothesis of no change. We then "reject" the null hypothesis of no change and accept the alternative hypothesis, Ha.

The null hypothesis might be that two suppliers provide parts with the same average flatness (Ho: µ1 = µ2, the mean for supplier 1 is the same as the mean for supplier 2). In this case, the alternative hypothesis is that average flatness is not equal (Ha: µ1 ≠ µ2).

Steps in Hypothesis Testing
1. Define the problem; state the objective of the test.
2. Define the Null and Alternative Hypotheses.
3. Decide on the appropriate statistical hypothesis test: Variance (page 30); Mean (t test, pages 31-32); Frequency of Occurrence (discrete, χ², pages 35-36).
4. Define the acceptable α and β risk.
5. Define the sample size required (pages 27-28).
6. Develop the sampling plan and collect samples.
7. Calculate the test statistic from the data.
8. Compare the calculated test statistic to a predicted test statistic for the risk levels defined. If the calculated value is larger than the predicted test statistic, the statistic indicates a difference.

If the means are equal and your decision is that they are equal (top left box), then you made the correct decision. If the means are not equal and your decision is that they are not equal (bottom right box), then you also made the right decision.

If the means are equal but your decision is that they are not equal (bottom left box), then you made a Type 1 error. The probability of this error is alpha (α). If the means are not equal but your decision is that they are equal (top right box), then you made a Type 2 error. The probability of this error is beta (β).

                          Real World
Decision            µ1 = µ2               µ1 ≠ µ2
µ1 = µ2             Correct Decision      Type 2 Error (β)
µ1 ≠ µ2             Type 1 Error (α)      Correct Decision

ANOVA - Two Way

Two way ANOVA evaluates the effect of two separate factors on a single response. Each cell (combination of independent variables) must contain an equal number of observations (the design must be balanced). See General Linear Model (page 42) for unbalanced data sets. In the data set on the right, Strength is the response (Y) and Chem and Fabric are the separate factors (X1 and X2). To analyze the significance of these factors on Y, run STAT>ANOVA>TWO WAY. In the dialog box, identify the Response (Y), "Strength." In the "Row Factor" box, identify the first of the two factors (X) for analysis. In the "Column Factor" box, identify the second "vital X". Select the "Display Means" box for each factor to obtain confidence interval and means analysis.

Select "Store Residuals" and then "Store Fits".

If graphical analysis of the ANOVA data is desired, select the "Graphs" button and choose one, or all, of the four diagnostic graphs available.

This analysis does not produce F and p-values, since you cannot specify whether the effects are fixed or random. Use Balanced ANOVA (page 36) to perform a two-way analysis of variance, specify fixed or random effects, and display the F and p-values when you have balanced data. If you have unbalanced data and random effects, use General Linear Model (page 42) with Options to display the appropriate test results.

It can be seen from the SS column that the "error SS" is very small relative to the other terms. In the graphic confidence interval analysis it is clear that both factors are statistically significant, since some of the confidence intervals do not overlap.
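An alternative two-factor analysis in Python using statsmodels (not the handbook's Minitab procedure); the column names Strength, Chem and Fabric follow the example above, but the data values are invented.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical balanced data: 2 observations per Chem x Fabric cell
df = pd.DataFrame({
    "Strength": [1.9, 2.1, 2.4, 2.6, 3.0, 3.2, 2.2, 2.0, 2.7, 2.5, 3.3, 3.5],
    "Chem":     ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"],
    "Fabric":   ["F1", "F1", "F2", "F2", "F3", "F3", "F1", "F1", "F2", "F2", "F3", "F3"],
})

model = ols("Strength ~ C(Chem) + C(Fabric)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # SS, df, F and p for each factor
```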

Page 8: Discrete Data Analysis


Calculating Sample Size

To calculate the actual sample size without the chart, or to program a spreadsheet to calculate sample size, use this equation:

n = 2 × [ (Z_α/2 + Z_β) / (δ/σ) ]²

α     α/2     Z_α/2     β      Z_β
.20   .10     1.282     .20    0.842
.10   .05     1.645     .10    1.282
.05   .025    1.960     .05    1.645
.01   .005    2.576     .01    2.326

Example: α = .10, β = .01, δ/σ = .3

n = 2 × [ (1.645 + 2.326) / .3 ]² ≈ 350
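The same calculation programmed in Python rather than a spreadsheet (a sketch, not from the handbook); it reproduces the example above.

```python
from scipy.stats import norm

def sample_size(alpha, beta, delta_over_sigma):
    """Samples per group to detect a shift of delta (in units of sigma), two-sided test.

    n = 2 * ((Z_alpha/2 + Z_beta) / (delta/sigma))^2
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(1 - beta)
    n = 2 * ((z_alpha + z_beta) / delta_over_sigma) ** 2
    return int(n) + 1   # round up to the next whole sample

print(sample_size(alpha=0.10, beta=0.01, delta_over_sigma=0.3))   # ~351, consistent with the example above
```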

ANOVA - Balanced

The Balanced ANOVA allows the analysis of process data with two or more factors. As with the Two Way ANOVA, Balanced ANOVA allows analysis of the effect of multiple factors, at multiple levels, simultaneously. A factor (B) is "nested" within another factor (A) if each level of B appears with only a single level of A. Two factors are "crossed" if every level of one factor appears with every level of the other factor. The data for individual levels of factors must be balanced: each combination of independent variables (cell) must have an equal number of observations. See General Linear Model (page 38) for analysis of unbalanced designs. Guidelines for normality and variance remain the same as shown on page 38.

Figure 1 shows how some of the factors and data might look in the MINITAB worksheet. Note there are five (5) data points for each combination of the three factors.

To analyze for significance of these factors (X's) on the response variable (Y), run STAT>ANOVA>BALANCED ANOVA. In the dialog box (Figure 2), identify the "Y" variable in the Response box and identify the factors in the "Model" box. Note that the pipes ("Shift \", i.e. "|") indicate the model analyzed is to include factor interactions. Select "Storage" to store "residuals" and "fits" for later analysis. Select "Options" and select "Display means..." to display information about the data means for each factor and level.

Figure 3 is the primary output of this analysis. There is no significant graphic analysis for the Balanced ANOVA. See page 41 for analysis of this output.

Page 9: Discrete Data Analysis


Sample Size Determination

When using sampling to analyze processes, sample size must be consciously selected based on the allowable α and β risk, the smallest amount of true difference (δ) that you need to observe for the change to be of practical significance, and the variation of the characteristic being measured (σ). As variation decreases or sample size increases, it is easier to detect a difference.

Steps to defining sample size
1. Determine the smallest true difference to be detected, the gap (δ).
2. Confirm the process variation (σ) of the processes to be evaluated.
3. Calculate δ/σ.
4. Determine the acceptable α and β risk.
5. Use the chart on page 58 to read the sample size required for each level of the factor tested.

For example, assume the direction of the effect is unknown, but you need to see a delta/sigma (δ/σ) of 1.0 in order to say the change is important. For an α risk of 5% and a β risk of 10%, we would need to use 21 samples. Remember that we would need 21 at each level of the factor tested. If, for the same δ, σ were reduced so that δ/σ were 2, only 5 samples would be required. In general, the smaller the shift (δ/σ) you are trying to detect, and/or the lower the tolerable risk, the greater the number of samples required.

Sample size sensitivity is a function of the standard error of the mean (σ/√n). Smaller samples are less sensitive than larger samples.

[Diagram: the "Today" and "Desired" distributions, separated by the gap delta (δ), each with variation (σ); the ratio δ/σ (≈ 1 in the first panel, ≈ 2 in the second) drives the required sample size.]

Continuous Data Analysis

Interpreting the ANOVA Output

The first table lists the factors and levels. In the table shown there are three factors: "Region", "Shift" and "WorkerEx". There are three levels each for "Region" and "Shift". The values assigned for the Region and Shift levels are 1, 2 and 3. "WorkerEx" is a two-level factor and has level values of 1 and 2.

The second table is the ANOVA output. The columns are as defined below.

Source - The source shows the identified factors from the model, showing both the single-factor information (i.e., Region) and the interaction information (i.e., Region*Shift).
DF - Degrees of freedom for the particular factor. Region and Shift have 3 levels and 3 - 1 = 2 df, and WorkerEx has 2 levels and 2 - 1 = 1 df.
SS - Factor "Sum of Squares" is a measure of the variation of the sample means of that factor.
MS - Factor "Mean Square" is the SS divided by the DF.
F - The Fcalc value is the MS of the factor divided by the MS of the Error term. In the case of Region, F = 90.577 ÷ 3.325 = 27.24. If using Fcrit to analyze for significance, enter the table with the DF degrees of freedom and α = .05. Compare Fcalc to Fcrit. If Fcalc is greater than Fcrit, the factor is significant.
P - The calculated P value, the observed level of significance. If P < .05, the factor is statistically significant at the 95% level of confidence.

Note: The relative size of the error SS to the total SS indicates the percent of variation left unexplained by the model. In this case, the unexplained variation is 39.16% of the total variation in this model. The "s" of this unexplained variation is the square root of the MS of the Error term (3.325). In this case the "within" group variation has a sigma of 1.82. If this remaining variation does not enable the process to achieve the desired performance state, look for additional factors.

Page 10: Discrete Data Analysis


Analysis and Improve Tools

Tool selection by data type of the output (Y) and inputs (X's):

Y Continuous, X Discrete: Confidence intervals; t test; ANOVA; Homogeneity of Variance; GLM; DOE (factorial fit)
Y Continuous, X Continuous: Linear regression; Multiple regression; Stepwise Regression; DOE response surface
Y Discrete, X Discrete: Tables (Crosstab); Chi Square; Confidence intervals for proportions; Pareto
Y Discrete, X Continuous: Logistic regression; Discriminant Analysis; CART (Classification and Regression Trees)

Logistic Regression, Discriminant Analysis and CART (Classification and Regression Trees) are advanced topics not taught in Six Sigma Training. The following references may be helpful.

Breiman, Friedman, Olshen and Stone; Classification and Regression Trees; Chapman and Hall, 1984
Hosmer and Lemeshow; Applied Logistic Regression; Wiley, 1989
Minitab Help - additional information about Discriminant Analysis

General Linear Model

The General Linear Model (GLM) can handle "unbalanced" data, such as data sets with missing observations. Where the Balanced ANOVA required the number of observations to be equal in each "factor/level" grouping, GLM can work around this limitation.

The data must be "full rank" (enough data to estimate the terms in the model). But you don't have to worry about this, because Minitab will tell you if your data isn't full rank.

In the data set shown in Figure 1, note that there is only one data point in "Rot1", the response column, for "Temp1" level 10 / "Oxygen1" level 10 (rows 8 & 9), and only two data points for "Temp1" level 16 / "Oxygen1" level 6 (row 14). In such a case, Balanced ANOVA would not run because the requirement of equal observations would require three data points in each cell (factor and level combination).

Run STAT>ANOVA>GENERAL LINEAR MODEL. In the dialog box, identify the response variable in the "Response" box and the factors in the "Model" box. Use the pipe (shifted "\") to include interactions in the analysis.

Figure 2 is the primary output of this analysis. There is no graphic analysis of this output.

Interpretation:
Temp1 is a significant X variable, because it explains 62% of the total variation (528.04/850.4). (Temp1 also has a p-value < 0.05, indicating that it is statistically significant.)

Neither Oxygen1 nor the interaction between Oxygen and Temperature appears significant.

The unexplained variation represents 30.95% ((263.17 ÷ 850.4) × 100) and the estimate of the within-subgroup variation is 5.4 (square root of 29.24).
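A comparable unbalanced two-factor analysis with interaction can be run in Python with statsmodels (a sketch, not the handbook's procedure); the column names Rot1, Temp1 and Oxygen1 follow the example, but the values below are invented and deliberately unbalanced.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical unbalanced data: cells do not all have the same number of observations
df = pd.DataFrame({
    "Rot1":    [3.1, 2.9, 4.2, 4.5, 4.4, 6.8, 7.1, 5.9, 6.2, 6.0, 8.3],
    "Temp1":   [10, 10, 10, 10, 10, 16, 16, 16, 16, 16, 16],
    "Oxygen1": [6, 6, 10, 10, 10, 6, 6, 10, 10, 10, 10],
})

# C(...) treats the numeric columns as categorical factors; '*' adds the interaction term
model = ols("Rot1 ~ C(Temp1) * C(Oxygen1)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # adjusted (Type II) sums of squares suit unbalanced data
```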

Page 11: Discrete Data Analysis

Pareto Diagrams
Stat>Quality Tools>Pareto Chart

When analyzing categorical defect data, it is useful to use the Pareto chart to visualize the relative defect frequency. A Pareto Chart is a frequency-ordered column chart. The analysis can either analyze raw defect data, such as "scratch, dent, etc.", or it can analyze count data such as is made available from Assembly Line Defects reports. The example below is from count data. Set up the worksheet with two columns, the first with the defect cause descriptor and the second with the count or frequency of occurrences. In the "PARETO CHART" dialog box, select "Chart Defects Table". Link the cause descriptor to the "Labels in" box and the counts to the "Frequency" box. Click OK. For more information, see the Minitab context-sensitive help in the Pareto dialog box.

[Pareto Chart for Defects (example data):
  Missing Screws      274   64.8%   cum 64.8%
  Missing Clips        59   13.9%   cum 78.7%
  Leaky Gasket         43   10.2%   cum 88.9%
  Defective Housing    19    4.5%   cum 93.4%
  Incomplete Part      10    2.4%   cum 95.7%
  Others               18    4.3%   cum 100.0%]

To interpret the Pareto, look for a sharp gradient to the categories, with 80% of counted defects attributable to 20-30% of the identified categories. If the Pareto is flat, with all categories linked to approximately the same number of defects, try to restate the question to redefine the categorical splits.
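A pandas/matplotlib sketch of the same chart logic using the example counts above (not from the handbook). Note that this simple version sorts purely by count, whereas Minitab always places an "Others" bar last.

```python
import pandas as pd
import matplotlib.pyplot as plt

counts = pd.Series(
    {"Missing Screws": 274, "Missing Clips": 59, "Leaky Gasket": 43,
     "Defective Housing": 19, "Incomplete Part": 10, "Others": 18}
).sort_values(ascending=False)

cum_pct = counts.cumsum() / counts.sum() * 100

fig, ax1 = plt.subplots()
ax1.bar(counts.index, counts.values)              # frequency-ordered columns
ax1.set_ylabel("Count")
ax1.tick_params(axis="x", rotation=45)

ax2 = ax1.twinx()                                 # cumulative-percent line on a second axis
ax2.plot(counts.index, cum_pct.values, marker="o", color="red")
ax2.set_ylabel("Cumulative %")
ax2.set_ylim(0, 105)

plt.title("Pareto Chart for Defects")
plt.tight_layout()
plt.show()
```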

Cause and Effect Diagrams
Fishbone Diagrams: Stat>Quality Tools>Cause & Effect

[Example Cause-and-Effect Diagram: main branches for the 5M's (Men, Machines, Materials, Methods, Measurements) and Environment, populated with brainstormed factors such as Operators, Training, Supervisors, Shifts, Lathes, Bits, Sockets, Speed, Alloys, Lubricants, Suppliers, Angle, Engager, Brake, Micrometers, Microscopes, Inspectors, Condensation, Moisture %, and Exhaust Quality.]

When working with the Advocacy team to define the potential factors (X's), it is often helpful to use a "Cause and Effect Diagram" or "Fishbone" to display the factors. The arrangement helps in the discovery of potential interactions between factors (X's). Use Minitab worksheet columns to record descriptors for the factors identified during the team brainstorming session. Group the descriptors in columns by categories such as the 5M's. Once the factors are all recorded, open the Minitab Stat>Quality Tools>Cause and Effect dialog box.

The dialog box will have the 5M's and Environment shown as default categories of factors. If using these categories, link the worksheet columns of categorized descriptors to the dialog box categories. If the team has elected to use other category names, replace the default names and link the appropriate columns. Click OK.

To interpret the Cause and Effect Diagram, look for places where a factor in one category could also be included in another category. Question the Advocacy team about the priority or significance of the factors in each category. Then prioritize the factors as a whole. For the most significant factors, ask the team where there is potential for changes in one factor to influence the actions of another factor. Use this information to plan analysis work.


Regression Analysis

Regression can be used to describe the mathematical relationship between the response variable and the vital few X's, if you have continuous data for your X's. Also, after the "vital few variables" have been isolated, solving a regression equation can be used to determine what tolerances are needed on the "vital few variables" in order to assure that the response variable is within a desired tolerance.

Regression analysis can find a linear fit between the response variable Y and the vital few input variables X1 and X2. (Start with a scatter diagram to examine the data.)

Y = B0 + B1·X1 + B2·X2 + error

This linear equation can be used to decide what tolerances must be maintained on X1 and X2 in order to hold a desired tolerance on the variable Y:

ΔY = B1·ΔX1 + B2·ΔX2

Regression analysis can be done using several of the MINITAB tools. Stat>Regression>Fitted Line Plot is explained on page 20. This section will discuss Stat>Regression>Regression.

Data must be paired in the MINITAB worksheet. That is, one measurement from each input factor (X) is paired with the response data (Y) for that particular measurement point.

Plot the data first using Minitab Stat>Plot. Analyze the data using Stat>Regression>Regression. In the dialog box, indicate the Response (Y) in the Response box and the expected factors (X's) in the Predictors box. Select the Storage button and in that dialog box select Fits and Residuals. Click OK twice to run the analysis. The output will appear as shown in the figure to the right.

The full regression equation is shown at the top of the output. Predictor influence can be evaluated using the p column in the first table. Analysis of the second table is done in similar fashion to the ANOVA analysis on page 41. Note that R² (adj) is similar to R² but is modified to reflect the number of terms in the regression. If there are many terms in the model, and the sample size is small, then R² (adj) can be much lower than R², and you may be over-fitting. In this example, the total sample size is large (n = 560), so R² and R² (adj) are similar.
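A minimal multiple regression sketch in Python (not the handbook's Minitab procedure); the paired Y, X1, X2 values are invented.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical paired data: response Y with two continuous inputs X1, X2
df = pd.DataFrame({
    "Y":  [23.1, 25.4, 27.9, 30.2, 31.8, 34.5, 36.1, 38.7, 40.2, 42.9],
    "X1": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5],
    "X2": [10, 11, 13, 12, 15, 14, 16, 17, 18, 19],
})

model = ols("Y ~ X1 + X2", data=df).fit()
print(model.summary())        # coefficients with p-values, R-squared and adjusted R-squared
print(model.params)           # B0, B1, B2 for Y = B0 + B1*X1 + B2*X2
```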

Page 12: Discrete Data Analysis


Time Series Plot
Graph>Time Series Plot

The time series plot is useful as a diagnostic tool. Use it to analyze data collection processes, non-normal data sets, etc. In GRAPH VARIABLES, identify any number of variables (Y) from the worksheet you wish to look at over time. Minitab assumes the values are entered in the order they occurred. Enter one column at a time; Minitab will automatically sequence to the next graph for each column. The X axis is the time axis and is set by selecting the appropriate setting in TIME SCALE. Each time series plot will display on a separate graph. In FRAME, ANNOTATE and OPTIONS, you can change chart axes, display multiple charts, etc. In analyzing the time series plot, look for a story: look for trends, sudden shifts, a regular cycle, extreme values, etc. If any of these exist, they can be used as a lead into problem solving.

Box-Cox Transformation
Stat>Control Charts>Box-Cox Transformation

The Box-Cox transformation is a useful tool for finding a transformation that will make a data set closer to a normal distribution. Once it is confirmed that the distribution is non-normal, use Box-Cox to find an appropriate transformation. Box-Cox provides an exponent used in the transformation called lambda, "λ". The transformed data is the original data raised to the power of λ.

[Example Box-Cox plot for a skewed variable: estimated λ ≈ 0.113, with a 95% confidence interval of roughly 0.056 to 0.170.]

Subgroup data can be in columns or across rows. In the dialog box, indicate how the data are arranged and where they are located. If data is subgrouped and subgroups are in rows, identify the configuration. To store transformed data, select STORE TRANSFORMED DATA IN and indicate the new location.

The Box-Cox transformation can be useful for correcting non-normality in process data, and for correcting problems due to unstable process variation. Under most conditions, it is not necessary to correct for non-normality unless the data are highly skewed. It may not be necessary to transform data which are used in control charts, because control charts work well in situations where data are not normally distributed.

Note: You can only use this procedure with positive data.
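A minimal Box-Cox sketch in Python (not from the handbook); the skewed data set is generated for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical positively skewed, strictly positive data
rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=1.0, sigma=0.8, size=200)

transformed, lam = stats.boxcox(skewed)        # estimates lambda by maximum likelihood
print(f"estimated lambda = {lam:.3f}")

# Check normality before and after the transformation
print("p before:", stats.normaltest(skewed).pvalue)
print("p after: ", stats.normaltest(transformed).pvalue)
```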

Stepwise Regression

Stepwise Regression is useful to search for leverage factors in a data set with many factors (X's) and a response variable (Y). The tool can analyze up to 100 factors. But, while this enables the analysis of baseline data for potential vital X's, be careful not to draw conclusions about the significance of X's without first confirming with a DOE.

To use stepwise regression, the data need to be entered in Minitab with each variable in a separate column and each row representing a single data point. Next select Stat>Regression>Stepwise.

In the dialog box, identify the column containing the response (Y) data in the Response box. In the Predictors box, identify the columns containing the factors (X's) you want Minitab to use. If a factor's F-statistic falls below the value in the "F to remove" text box under Options (default = 4), Minitab removes it. By selecting the Options button, you can change the Fcrit value for adding and removing factors from the selection and also reduce the number of steps of analysis the tool goes through before asking for your input.

Minitab will prioritize the leverage "X" variables and run the first regression step on the factor with the greatest influence. It continues to add variables as long as the "t" value is greater than the square root of the identified F statistic limit (default = 4). The Minitab output includes:
1) the constant and the factor coefficients for the significant terms;
2) the "t" value for the factors included;
3) the "s" for the unexplained variation based on the current model;
4) the R² for the current model.

If you have chosen "1 step between pauses", Minitab will then ask if you wish to run more. Type "yes" and "enter". Continue this procedure until MINITAB won't calculate any more. At that point, you will have identified your potential "leverage X's".

Output
In this output, there are five potential predictors identified by stepwise regression. The steps are shown by the numbered columns and include the regression information for the included factors. The information in column 1 represents the regression equation information if only "Form" is used. In column 5, the regression equation information includes five factors, but the s is .050 and the R² is only 25%. In all probability, the analyst will choose to gather information including additional factors during the next runs.
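A simplified analogue in Python (not Minitab's stepwise algorithm): a greedy forward-selection loop that adds predictors by p-value rather than by F-to-enter. The file name and column names in the commented usage are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm

def forward_select(df, response, alpha_enter=0.05):
    """Greedy forward selection: add the predictor with the smallest p-value each step."""
    remaining = [c for c in df.columns if c != response]
    selected = []
    while remaining:
        pvals = {}
        for candidate in remaining:
            X = sm.add_constant(df[selected + [candidate]])
            pvals[candidate] = sm.OLS(df[response], X).fit().pvalues[candidate]
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha_enter:          # nothing left meets the entry criterion
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical usage with baseline data containing many candidate X's:
# df = pd.read_csv("baseline.csv")
# print(forward_select(df, response="Y"))
```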

Page 13: Discrete Data Analysis


Regression with Curves (Quadratic) & Interactions

When analyzing multiple-factor relationships, it is important to consider whether there is potential for quadratic (curved) relationships and interactions. Normal graphic analysis techniques and regression do not allow analysis of the effects of interrelated factors. To accomplish this, the data must be analyzed in an orthogonal array (see page 49). In order to create an orthogonal array with continuous data, the factor (X) data must be centered. Do this as follows:

1. The data to be analyzed need to be in columns, with the response in one column and the values of the factors paired with the response and recorded in separate columns.
2. Use Stat>DOE>Define Custom RS Design. In the dialog box, identify the columns containing the factor settings.
3. Next, analyze the model using Stat>DOE>Analyze RS Design. Identify the column containing the response data. Check: Analyze Data Using Coded Units.
4. Click on Storage and select Fits and Residuals for later regression diagnostics. Click OK. Click on Graphs and select the desired graphs for analysis diagnostics. The initial analysis will include all terms in the potential equation, including full quadratic. Analysis of the output will be similar to that for Regression>Regression (page 43).
5. Where elements are insignificant, revert to the Stat>DOE>Analyze RS Design>Terms dialog box to eliminate them. In the case of this example, the equation can be analyzed as a linear relationship, so select "Linear" in the "Include the following terms" box. Note that this removes all the interaction and quadratic terms. Re-run the regression. Once an appropriate regression analysis, including leverage factors, has been obtained, validate the adequacy of the model by using the regression diagnostic plots, Stat>Regression>Residual Plots (page 22).

Once an appropriate regression equation has been determined, remember this analysis was done with centered data for the factors. The centering will have to be reversed in order to make the equation useful from a practical standpoint. To create a graphic of the model, use Stat>DOE>RS Plots (page 52). From this dialog box a contour plot of the results can be created.
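A sketch of the same idea in Python (not the handbook's RS Design workflow): center the continuous factors, then fit a full quadratic model with an interaction term and drop insignificant terms. The variable names and data are invented.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical paired data: response y with two continuous factors x1, x2
df = pd.DataFrame({
    "y":  [5.2, 6.1, 7.4, 8.9, 7.8, 9.5, 11.2, 10.4, 12.8, 13.1],
    "x1": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5],
    "x2": [20, 22, 21, 25, 24, 27, 26, 29, 28, 30],
})

# Center the factors so the linear, quadratic and interaction terms are less correlated
df["x1c"] = df["x1"] - df["x1"].mean()
df["x2c"] = df["x2"] - df["x2"].mean()

# Full quadratic model: linear terms, interaction, and squared terms
model = ols("y ~ x1c * x2c + I(x1c**2) + I(x2c**2)", data=df).fit()
print(model.summary())   # drop terms with large p-values and re-fit, as described above
```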

Box Plot
GRAPH>BOXPLOT

The boxplot is useful for comparing multiple distributions (continuous Y and discrete X). In the GRAPH section of the dialog box, fill in the column(s) you want to show for Y and, if a column is used to identify various categories of X (subgroup coding, etc.), identify it as well. Click the FRAME button for the options of setting common axes or multiple graphs on the same page. To generate multiple plots on a single page, select FRAME>MULTIPLE GRAPHS>OVERLAY GRAPHS. Click ATTRIBUTES to change individual box colors. Click OK.

The box represents the middle 50% of the distribution. The horizontal line is the median (the middlemost value). The whiskers each represent a region sized at 1.5 × (Q3 − Q1), the region shown by the box. Interpretation can be that the box represents the hump of the distribution and the whiskers represent the tails. Asterisks represent points which fall outside the lower or upper limits of expected values.

Interval Plot
GRAPH>INTERVAL PLOT

Useful for comparison of multiple distributions. Shows the spread of data around the mean by plotting standard error bars or confidence intervals. The default form of the plot provides error bars extending one standard error (standard deviation / square root of n) above and below a symbol at the mean of the data.

Y variable: Select the column to be plotted on the y-axis.
Group variable: Select the column containing the groups (or categories). This variable is plotted on the x-axis.

Type of interval plot
Standard error: Choose to display standard error bars, where the error bars extend one standard error away from the mean of each subgroup.
Multiple: Enter a positive number to be used as the multiplier for standard errors (1 is the default).
Confidence interval: Choose to display confidence intervals instead of standard error bars. The confidence intervals assume a normal distribution for the data and use t-distribution critical values.
Level: Enter the level of confidence for the intervals. The default confidence coefficient is 95%.

Page 14: Discrete Data Analysis

One-Variable Regression
STAT>REGRESSION>FITTED LINE PLOT

In the STAT>REGRESSION>FITTED LINE PLOT dialog box, identify the Response Variable (Y). Identify one (1) Predictor (X). Select TYPE OF MODEL (Linear, Quadratic or Cubic). Click on STORAGE and select RESIDUALS and FITS. If you need to transform data, use OPTIONS and select Transformation. In OPTIONS, select DISPLAY CONFIDENCE BANDS and DISPLAY PREDICTION BANDS. Click OK.

The output from the fitted line plot contains an equation which relates your predictor (input variable) to your response (output variable). A plot of the data will indicate whether or not a linear relationship between X and Y is a sensible approximation. These observations are modeled by the equation:

Y = b + mx + error

Confidence Bands are 95% confidence limits for data means. Prediction Bands are limits for 95% of individual data points.

The R-sq is the square of the correlation coefficient. It is also the fraction of the variation in the output (response) variable that is explained by the equation. What is a good value? It depends: chemists may require an R-sq of .99; we may be satisfied with an R² of .80.

Use Residual Plots (below) to plot the residuals vs. predicted values (fits) and determine if there are additional patterns in the data.

[Regression Plot example: Abrasion vs. Hardness; fitted line Y = 2692.80 − 3.16067X, R-Sq = 0.784, shown with 95% CI and 95% PI bands.]
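A minimal one-variable regression sketch in Python (not from the handbook); the hardness/abrasion values are invented and do not reproduce the plot above.

```python
from scipy.stats import linregress

# Hypothetical hardness (X) and abrasion resistance (Y) measurements
hardness = [60, 62, 65, 68, 70, 72, 75, 78, 80, 85]
abrasion = [305, 292, 280, 268, 262, 250, 240, 229, 222, 205]

fit = linregress(hardness, abrasion)
print(f"Y = {fit.intercept:.1f} + ({fit.slope:.2f})X")
print(f"R-sq = {fit.rvalue**2:.3f}, p = {fit.pvalue:.4g}")
```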

Residual Plots
Stat>Regression>Residual Plots

Any time a model has been created for an X/Y relationship, through ANOVA, DOE or Regression, the quality of that model can be evaluated by analysis of the error in the equation. When doing the REGRESSION (pages 37-38) or the FITTED LINE PLOT (above), be sure to select store "FITS" and "RESIDUALS" in the "STORAGE" dialog box. If the fit is good, the error should be normally distributed with an average of zero and there should be no pattern to the error over the range. Then, in the "RESIDUAL PLOTS" dialog box, identify the column where the residuals are stored in the "Residuals" box and the fits storage column in the "Fits" box.

The output includes a Normal Plot of Residuals, a Histogram of Residuals, an Individuals (I) Chart of Residuals, and a scatter plot of Residuals versus Fits.

Analysis of the normal plot should show a relatively straight line if the residuals are normally distributed. The I chart should be analyzed as a control chart. The histogram should be a bell-shaped distribution. The residuals vs. fits scatter plot should show no pattern, with a constant spread over the range.
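A Python sketch of comparable residual diagnostics (not the handbook's Minitab output); the data and model are generated for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

# Hypothetical data and a simple linear fit
x = np.linspace(0, 10, 40)
y = 3.0 + 2.0 * x + np.random.default_rng(0).normal(scale=1.0, size=x.size)

model = sm.OLS(y, sm.add_constant(x)).fit()
fits, residuals = model.fittedvalues, model.resid

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
stats.probplot(residuals, plot=axes[0])          # normal plot: points should hug the line
axes[1].hist(residuals)                          # histogram: should be roughly bell-shaped
axes[1].set_title("Histogram of Residuals")
axes[2].scatter(fits, residuals)                 # residuals vs fits: no pattern, constant spread
axes[2].axhline(0, color="red")
axes[2].set_title("Residuals vs. Fits")
plt.tight_layout()
plt.show()
```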


Binary Logistic Regression

In binary logistic regression the predicted value (Y) will be the probability p(d) of an event, such as success or failure, occurring. The predicted values will be bounded between zero and one (because they are probabilities).

Example: Predict the success or failure of winning a contract based on the response cycle time to a request for proposal and the proposal team leader.

The probability of an event, π(x) or Y, is not linear with respect to the X's. The change in π(x) for a unit change becomes progressively smaller as π(x) approaches zero or one. Logistic regression develops a function to model this:

π(x) = e^(β0 + β1x) / (1 + e^(β0 + β1x))

π(x) / (1 − π(x)) is the odds. The Logit is the log of the odds. Ultimately the transfer function being developed will solve for π(x).

To analyze the binary logistic problem, use STAT>REGRESSION>BINARY LOGISTIC REGRESSION. The data set used for "Response" will be discrete and binary (Yes/No, Success/Failure). In the "Model" dialog box, enter all factors to be analyzed. In the "Factors" dialog box, enter those factors which are discrete. Use the "Storage" button and select "Event probability". This will store the calculated event probability for each unique value of the function.

Analyze the Session Window output:
1. Analyze the hypothesis test for the model as a whole. Check for a p value indicating model significance.
2. Check for statistical significance of the individual factors separately, using the P value.
3. Check the odds ratios for the individual predictor levels.
4. Use the confidence interval to confirm significance. Where the confidence interval includes 1.0, the odds are not significant.
5. Evaluate the model for goodness of fit. Use Hosmer-Lemeshow if there is a "continuous" X in the model.
6. Assess the measures of association. Note that "% Concordant" is a measure similar to R²; a higher value here indicates a better predictive model.

Binary Logistic Regression

Link Function: Logit

Response Information
Variable   Value   Count
Bid        Yes     113  (Event)
           No      110
           Total   223

Logistic Regression Table
                                              Odds     95% CI
Predictor    Coef      StDev    Z       P     Ratio   Lower   Upper
Constant     7.410     1.670    4.44    0.000
Index       -8.530     1.799   -4.74    0.000   0.00    0.00    0.01
Brand Sp
  Yes        1.2109    0.3005   4.03    0.000   3.36    1.86    6.05

Log-Likelihood = -134.795
Test that all slopes are zero: G = 39.513, DF = 2, P-Value = 0.000

Goodness-of-Fit Tests
Method             Chi-Square    DF    P
Pearson              187.820    116    0.000
Deviance             224.278    116    0.000
Hosmer-Lemeshow        7.138      7    0.415

Table of Observed and Expected Frequencies:
(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)

Measures of Association
(Between the Response Variable and Predicted Probabilities)
Pairs        Number   Percent    Summary Measures
Concordant     9060    72.9%     Somers' D               0.47
Discordant     3260    26.2%     Goodman-Kruskal Gamma   0.47
Ties            110     0.9%     Kendall's Tau-a         0.23
Total         12430   100.0%
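A minimal binary logistic regression sketch in Python with statsmodels (not the handbook's Minitab output); the variable names Bid, Index and BrandSp echo the example above, but the data below are randomly generated for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: win/lose a bid vs. a cycle-time index and a discrete factor
rng = np.random.default_rng(7)
n = 200
index = rng.uniform(0, 1, n)                        # continuous predictor
brand = rng.choice(["Yes", "No"], size=n)           # discrete predictor
logit_p = 2.0 - 5.0 * index + 1.0 * (brand == "Yes")
p = 1 / (1 + np.exp(-logit_p))
bid = rng.binomial(1, p)                            # binary response

df = pd.DataFrame({"Bid": bid, "Index": index, "BrandSp": brand})
model = smf.logit("Bid ~ Index + C(BrandSp)", data=df).fit()
print(model.summary())            # coefficients, z statistics, p-values
print(np.exp(model.params))       # odds ratios for each predictor
```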

Page 15: Discrete Data Analysis


Normal Plot
STAT>BASIC STATISTICS>NORMALITY TEST

• Identify the variable you will be testing in the Variable box.
• Click OK (use the default Anderson-Darling test).

A normal probability plot is a graphical method to help you determine whether your data is normally distributed. To graphically analyze your data, look at the plotted points relative to the sloped line. A normal distribution will yield plotted points which closely hug the line. Non-normal data will generally show points which significantly stray from the line.

The test statistics displayed on the plot are A-squared and the p-value. The A-squared value is an output of a test for normality. Focus your analysis on the p value. The p value is "the probability of claiming the data are not normal if the data are truly from a normal distribution", a type I error. A high p-value would therefore be consistent with a normal distribution. A low p-value would indicate non-normality. Use the appropriate type I error probability for judging this result.

[Normal Probability Plot example: N = 500, Average = 70.0, StDev = 10.0; Anderson-Darling A-Squared = 0.418, P-Value = 0.328.]

Descriptive Statistics
Stat>Basic Statistics>Descriptive Statistics

[Graphical Summary, Variable: C1 - Anderson-Darling Normality Test: A-Squared = 0.235, P-Value = 0.790; Mean = 69.3824, StDev = 9.8612, Variance = 97.2442, Skewness = -5.0E-03, Kurtosis = 3.89E-02, N = 500; Minimum = 38.9111, 1st Quartile = 62.5858, Median = 69.6848, 3rd Quartile = 75.8697, Maximum = 99.6154; 95% Confidence Interval for Mu: 68.5160 to 70.2489; 95% Confidence Interval for Sigma: 9.2856 to 10.5136; 95% Confidence Interval for Median: 68.6347 to 70.8408.]

The Descriptive Statistics>Graphs>Graphical Summary graphic provides a histogram of the data with a superimposed normal curve, a normality check, a table of descriptive statistics, a box plot of the data and confidence interval plots for the mean and median. In the Descriptive Statistics dialog box select the variables for which you wish to create the descriptive statistics. If choosing a stacked variable with a category column, check the BY VARIABLE box and indicate the location of the category identifier. Use the Graphs button to open the graphs dialog box. In the graphs dialog box, select "Graphical Summary". When using this tool to interpret normality, confirm the p value and evaluate the shape of the histogram.

Remember that the p value is "the probability of claiming the data are not normal if the data are truly from a normal distribution", a type I error. A high p-value would therefore be consistent with a normal distribution. A low p-value would indicate non-normality. When evaluating the shape of the histogram graphically, determine: Is it bimodal? Is it skewed? If yes, investigate potential causes for the non-normality. Improve if possible or analyze groups separately. If no special cause is found for the non-normality, the distribution may be non-normal naturally and you may need to transform the data (page 22) prior to calculating your Z.


Design for Six Sigma - Tolerance Analysis

[Tolerance loop diagram: dimension A runs from the Datum across parts B1, B2, B3 and B4, closing at the Gap, with +/- vector directions marked.]

Tolerance Analysis is a design method used to determine the impact that individual parts of a system have on the overall requirement for that system.

Most often, Tolerance Analysis is applied to dimensional characteristics in order to see the impact the dimensions have on the final assembly in terms of a gap or interference. In this application, a tolerance loop may be used to illustrate the relationship.

Purpose
To graphically show the relationships of multiple parts in a system which result in a desired technical requirement in terms of a gap or interference.

Process
1. Generate a layout drawing of your assembly. A hand sketch is all that is required.
2. Clearly identify the gap in the most severe condition.
3. Select a DATUM or point from which to start your loop. (It is easier to start the loop at one of the interfaces of the Gap.)
4. Use drawing dimensions as vectors to connect the two sides of the gap.
5. Assign a sign convention (+/-) to the vectors.

In the diagram above, the relationship can be explained as:

GAP = A - B1 - B2 - B3 - B4

Because the relationship can be explained using only + & - signs, the equation is considered LINEAR, and can be analyzed using a method known as Root Sum of Squares (RSS) analysis.

Vector Assignment
Assign a positive (+) vector when:
• An increase in the dimension increases the gap.
• An increase in the dimension reduces the interference.
Assign a negative (-) vector when:
• An increase in the dimension reduces the gap.
• An increase in the dimension increases the interference.


[Example scatter plot: NEW versus EXISTING measurements.]

Scatter Plot
GRAPH>PLOT

The scatter plot is a useful tool for understanding the relationship between two variables.

In the GRAPH VARIABLES box select each X and Y variable you wish to plot. MINITAB will create individual plots for each pair of variables selected. In the Six Sigma method, the selected Y should be the dependent variable and the selected X the independent variable. Select as many combinations as you wish.

Click the OPTIONS button to add jitter to the graph. Where there are multiple data points with the same value, this will allow each data point to be seen.

Click the FRAME button for options for setting common axes or multiple graphs. Click the ATTRIBUTES button to access options for changing graphic colors or fill type. Click the ANNOTATION button to access options for changing the appearance of the data points or to add titles, data labels or text.

Results: If Y changes as X changes, there is a potential relationship. Use the graph to check visually for linearity or non-linearity.

Histogram
GRAPH>HISTOGRAM

The histogram is useful to look graphically at the distribution of data.

In the GRAPH VARIABLES box select each variable you wish to graph individually.

Click the OPTIONS button to change the histogram displayed:
• Type of Histogram - Frequency (default); Percent; Density
• Type of Intervals - Midpoint (default) or Cutpoint
• Definition of intervals - Automatic (default) or manual definition

See MINITAB Help for an explanation of how to use these options. Click HELP in the HISTOGRAM>OPTIONS dialog box.

Click the FRAME button for options for setting common axes or multiple graphs.

Click the ATTRIBUTES button to access options for changing graphic colors or fill type.

Design for Six Sigma - Tolerance Analysis (continued)

Gather and prepare the required data
• gap nominals
• gap specification limits
• process data for each component
  - process mean
  - process s.st
  - process s.lt
• process data assumptions, if data not available: try 'expert data sources'
  - or... process s.lt estimates when data is not available
  - s.st, s.lt from capability data
  - Z-shift assumptions, long term-to-short term, when one is known
    - multiply s.st by 1.6 for a variance inflation factor
    - multiply s.st by 1.6+ for a process that has less control long term
    - (divide s.lt by the above factor if long-term historical data is known)

The Tolanal.xls spreadsheet performs its analysis using the Root Sum of Squares method, and should only be applied to LINEAR relationships (i.e. Y = X1 + X2 - X3). Non-linear relationships require more detailed analysis using advanced DFSS tools such as Monte Carlo or the ANALYSIS.XLS spreadsheet. Contact a Master Blackbelt for support.

Using RSS, the statistics for the GAP can be explained as follows. A bar over a term designates a mean value and "S" designates a standard deviation.

    GAP(mean) = A(mean) - B1(mean) - B2(mean) - B3(mean) - B4(mean)

    S(gap) = sqrt( S(A)² + S(B1)² + S(B2)² + S(B3)² + S(B4)² )

Given these equations, the impact that each individual part has on the entire system can be analyzed. In order to perform this analysis, follow these steps:

Linear Tolerance Spreadsheet
Once the data has been collected, it can be analyzed using the Tolanal.xls spreadsheet described on the next page. The Tolanal.xls spreadsheet can be found on the GEA website, under Six Sigma, Forms & Tools.

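A minimal sketch of the same RSS arithmetic in Python is shown below. All nominals, sigmas and gap limits are made-up illustration values, not data from the guide or from the Tolanal.xls spreadsheet.

    import math

    nominals = {"A": 50.00, "B1": 12.00, "B2": 12.00, "B3": 12.00, "B4": 12.00}
    sigmas   = {"A": 0.030, "B1": 0.020, "B2": 0.020, "B3": 0.020, "B4": 0.020}
    vectors  = {"A": +1,    "B1": -1,    "B2": -1,    "B3": -1,    "B4": -1}

    gap_mean  = sum(vectors[p] * nominals[p] for p in nominals)
    gap_sigma = math.sqrt(sum(s ** 2 for s in sigmas.values()))   # root sum of squares

    lsl, usl = 1.90, 2.10                        # hypothetical gap requirement
    print(f"gap mean = {gap_mean:.3f}, gap sigma = {gap_sigma:.4f}")
    print(f"Z.lower = {(gap_mean - lsl) / gap_sigma:.2f}, "
          f"Z.upper = {(usl - gap_mean) / gap_sigma:.2f}")

    # %RSS contribution of each part to the gap variance (target the largest first)
    for part, s in sigmas.items():
        print(part, f"{100 * s**2 / gap_sigma**2:.1f}% of gap variance")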


Every problem solving task is focused on finding out something. The investigation will be more effective if it is planned. Planning is appropriate for Gage R&R, characterizing the process, analyzing the process for difference (hypothesis testing), design of experiments or confirmation run analysis. In short it is appropriate in every phase of the MAIC or MAD process.

This investigative process is called "reverse loading" because the process begins with a question focusing on what is desired at the end of the process.


The Planning Questions

1) What do you want to know?
2) How do you want to see what it is that you need to know?
3) What type of tool will generate what it is that you need to see?
4) What type of data is required of the selected tool?
5) Where can you get the required type of data?

[The Principle of Reverse Loading diagram: the critical questions (Know, See, Tool, Data, Where) feed the Plan, which is then Executed. Copyright 1995 Six Sigma Academy, Inc.]

Tolerancing Analysis - Linear Spreadsheet

Input your process data into the spreadsheet
· input gap technical requirements
· input interface dimension 'nominals' as the baseline case
· input design vectors from loop diagram
· input process σ's (long and short term from available data)

Analyze the initial output
· Gap: Min/Max of constraints (info only) vs. Z.upper and Z.lower for 'Gap' (CTQ)
· Parts: RSS-σ.st contribution by each part: target the high RSS% first

Modify the input variables to optimize the Z-gap. Change:
· Part nominals vs. means (implies process shift... tooling sensitive?)
· Component σ.st (caution here... if you reduce, you are making 'big' assumptions)
· Z-shift factor (change the σ.lt, using actual data or assumptions)
· Target CTQ specifications (if not constrained... negotiate with the Customer)

Review your output for the optimized Z-gap condition
· If the initial Z-gap is very high, you can move part nominals or 'open' the variation (don't penalize yourself by constraining processes too much)
· If you cannot get Z-gap to your goals, re-design should be considered
· Understand ALL implications of your changes to any of the input variables

Establish your tolerances based on %RSS contribution, sensitivity to Z.gap and the desired sigma level of the particular contributing dimension; know the effect on your NOMINAL design
· highest %RSS contributors will have the most impact. Iterate by moving the nominal by 1.0 σ.st in both directions... continue iterating to 2*s.st, 4*s.st, 5*s.st, etc.
· understand and weigh your risks of increasing tolerance (effect on nominals, subsequent operations, etc)... how is the process managed and what are the controls? Who supplies?

[Spreadsheet screen callouts: 1. Input the technical requirements; 2. Input target dimensions and vector direction; 3. Input short and long term σ's of part dimensions. The spreadsheet identifies major contributors to system variation.]


Six Sigma Process Report - Analysis of Continuous Data

[Report 1: Executive Summary - example output. The Process Performance graphic shows the LSL, USL and the Actual (LT) and Potential (ST) distributions. The Process Benchmarks box reports Actual (LT): Z.Bench = 2.17, PPM = 15034.0 and Potential (ST): Z.Bench = 3.52, PPM = 215.402. Process Demographics appear at the right of the report.]

Example demographics column (entered in exactly the order shown):
Date            06/31/96
Reported By     Duke Brewster
Project         Shoe Cast
Department      Brake Division
Process         Casting
Characteristic  Hardness
Units           Brinell
Upper Spec      42
Lower Spec      38
Nominal         40
Opportunity
Data Source
Time Span       01/01/96 - 06/31/96
Data Trace      Bin # 1057a-9942

The Six Sigma Process Report, "Six Sigma>Process Report", displays data to enable the analysis of continuous process data. The default reports are the Executive Summary (Report 1) and the Process Capability Report (Report 2).

To use this tool effectively, the response data (Y) must be collected in rational subgroups of two (2) or more data points. In addition to the Y data, a demographics column may be added to provide the demographic information on the right side of Report 1. The demographics column must be entered in the exact order shown if used. See figure.

Once the data is entered, create the report by calling "Six Sigma>Process Report".

1. Identify the configuration of the Y data, and the location of the useful data (columns or rows).
2. Identify the CTQ specifications or the location of the demographic information.
3. If detailed demographic information is to be used, select the Demographics button. Either enter the data for the information (shown at the left) in the dialog box or reference a spreadsheet column with this information listed as shown.
4. When the report is generated with only this information, the default reports will be shown. If additional reports are desired, they can be accessed through the "Reports" button.

[Report 2: Process Capability - example output. Xbar and S chart of the subgrouped data (Xbar chart: X = 12.44, 3.0SL = 15.74, -3.0SL = 9.133; S chart: S = 1.691, 3.0SL = 4.342, -3.0SL = 0.000), short term and long term capability bars against the specification of 8 to 22, and a Capability Indices table: LT - Mean 12.4368, StDev 2.0454, Z.USL 4.6756, Z.LSL 2.1692, Z.Bench 2.1692, Z.Shift 1.3513, P.Total 0.015034, Yield 98.4966, PPM 15034.0; ST - Mean 15.0000, StDev 1.8917, Z.USL 3.7003, Z.LSL 3.7003, Z.Bench 3.5205, Z.Shift 1.3513, P.Total 0.000215, Yield 99.9785, PPM 215.402. Cp, Cpk, Pp and Ppk are also reported.]

Executive Summary - The top left graphic displays the predicted distribution based on data. MINITAB assumes normal data and will display a normal curve whether the data is normal or not. The lower left hand graphic displays the expected PPM defect rates as subgroups are added to the prediction. When this curve stabilizes (levels off), enough data has been taken. The Process Benchmarks show the reported Z Benchmark scores and PPM (defects in both tails are combined) (Page 8).

Capability Study - The control charts provide an excellent means for diagnosing the rational subgrouping process. Use normal techniques for analysis of this chart (Page 54). The capability indices on the right provide tabular results of the study. The bar diagrams at the bottom of the report show comparative graphics of the short term and long term process predictions.

Design Of Experiments
Baselining data collection is considered passive observation. The process is monitored and recorded without intentional changes or tweaking. In Designed Experiments, independent variables (Factors) are actively manipulated and recorded and the effect on the dependent variable (Response) is observed. Designed experiments are used to:

• Determine which factors (X's) have the greatest impact on the response (Y).
• Quantify the effects of the factors (X's) on the response (Y).
• Prove the factors (X's) you think are important really do affect the process.

Orthogonality
Since our goal in experimentation is to determine the effect each factor has on the response independent of the effects of other factors, experiments must be designed so as to be horizontally and vertically balanced. An experimental array is vertically balanced if there are an equal number of high and low values in each column. The array is horizontally balanced if for each level within each factor we are testing an equal number of high and low values from each of the other factors. If we have a balanced design in this manner, it is Orthogonal. Standard generated designs are orthogonal. When modifying or fractionating standard designs be alert to assure maintenance of orthogonality.
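The balance conditions are easy to verify numerically. The sketch below (illustrative Python, not the guide's MINITAB workflow) builds a 2-level full factorial in coded units and checks that every column, and every pair of columns, is balanced.

    from itertools import product
    import numpy as np

    factors = ["A", "B", "C"]
    design = np.array(list(product([-1, 1], repeat=len(factors))))   # 2^3 = 8 runs

    # Vertical balance: each column has an equal number of highs and lows
    print("column sums:", design.sum(axis=0))                        # all zero

    # Horizontal balance / orthogonality: every pair of columns is uncorrelated
    for i in range(len(factors)):
        for j in range(i + 1, len(factors)):
            print(factors[i], "x", factors[j], "dot product:",
                  int(design[:, i] @ design[:, j]))                  # all zero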

Repetition
Completing a run more than once without resetting the independent variables is called repetition. It is commonly used to minimize the effect of measurement and to analyze factors affecting short-term variation in the response.

Replication
Completing experimental runs more than once after resetting the independent variables is called replication. It is commonly used to assure generalization of results over longer term conditions. When using MINITAB for experimental designs, Replications can be programmed during the design creation.

Randomization
Running experimental trials in a random sequence is a common, recommended practice that assures that variables that change over time have an equal opportunity to affect all the runs. When possible, randomizing should be used for designed experimental plans. It is the default setting when MINITAB generates the design, but can be deselected using the OPTIONS button.

Blocking
A block is a group of "homogeneous units". It may be a group of units made at "the same time", such as a block by shift or lot, or it may be a group of units made from "the same material" such as raw material lot or manufacturer. When blocking an experiment, you are adding a factor to the design; i.e., in a full factorial 2^4 experiment with blocking, the actual design will analyze as a 2^(5-1) experiment. When analyzing processes subject to multiple shift or multiple raw material flow environments, etc., blocking by those conditions is recommended.


Normality of Data

[Fig 1 - Normal Probability Plot of the variable Normal: Anderson-Darling Normality Test, A-Squared = 0.418, P-Value = 0.328; N = 500, Average = 70.0000, StDev = 10.0000.]

[Fig 2 - Descriptive Statistics graphical summary, Variable: Normal. A-Squared = 0.418, P-Value = 0.328; Mean = 70.0000, StDev = 10.0000, Variance = 100.000, Skewness = -5.0E-02, Kurtosis = 0.393445, N = 500; Minimum = 29.824, 1st Quartile = 63.412, Median = 69.977, 3rd Quartile = 76.653, Maximum = 103.301; 95% CI for Mu: 69.121 to 70.879; 95% CI for Sigma: 9.416 to 10.662; 95% CI for Median: 69.021 to 70.737.]

[Fig 3 - Xbar/R Chart for Mystery, showing many out-of-control points: Xbar chart (X = 100.…, 3.0SL = 113.…) and R chart (R = 22.83, 3.0SL = 48.2).]

Data from many processes can be approximated by a normal distribution. Additionally, the Central Limit Theorem states that characteristics which are the average of individual values are likely to have an approximately normal distribution. Prior to characterizing your project Y, it is valuable to analyze the data for normality to confirm whether the data does follow a normal distribution. If there is strong evidence that the data do not follow a normal distribution, then predictions of future performance should not be made using the normal distribution.

Use "Stat>Basic Stats>Normality Test" (Fig 1) (Page 21) or "Stat>Basic Stat>Descriptive Statistics" (Fig 2) (Page 21) with "Graphs>Graphical Summary" checked. If using "Normality Test", the default is "Anderson-Darling". Use that test for most investigations. Use other tests with caution. For example, Kolmogorov-Smirnov is actually a less sensitive test.

The test statistic for primary use in analyzing the test results is the P value. The null hypothesis, Ho, states that the process is normal, so if the p value < .05, then there is evidence that the data do not follow a normal distribution. If the process shows non-normality, either there are special causes of variation that cause the non-normality, or the common cause variation is not normal. Analyze first for special cause.

Use Stat>Control Charts (Fig 3) (Page 49) or Plot>Time Series Plot (Page 24) to look for "out of control" points or drifts of the process over time. Try to determine the cause of those points and separate, or stratify, the data using that knowledge. If the levels of X's have been captured, use graphics to aid in visualizing the process stratified by the X's. If the data can be stratified and within the strata the data is normal, the process can be characterized at the individual levels and perhaps characterized using the Product Report (page 15). The discovery of a special cause contributing to non-normality may lead to improving the process. If the common cause variation is non-normal, it may be possible to transform the data to an approximately normal distribution. MINITAB provides such a tool in "Stat>Control Charts>Box-Cox Transformation" (Page 24). Additional notes on data transformation can be found in the Quality Handbook; Juran Chap 22.


Factorial Designs
Factorial Designs are primarily used to analyze the effects of two or more factors and their interactions. Based on the level of risk acceptable, experiments may be either full factorial, looking at each factor combination, or fractional factorial, looking at a fraction of the factor combinations. Fractional factorial experiments are an economical way to screen for vital X's. They only look at a fraction of the factor combinations. Their results may be misleading because of confounding, the mixing of the effect of one factor with the effect of a second factor or interaction. In planning a fractional factorial experiment, it is important to know the confounding patterns, and confirm that they will not prevent achievement of the goals of the DOE.

To create a Factorial Experiment using MINITAB, select STAT>DOE>CREATE FACTORIAL DESIGN. In the dialog box (Fig 1) select the Number of Factors and then the Designs button. If the number of factors allows both a fractional and full factorial design, the Designs dialog box (Fig 2) will show the available selections including both full and fractional designs. Resolution, which is a measure of confounding, is shown by each displayed design. While in this dialog box identify the number of replicates and blocks to be used in the design. Select OK to return to the initial dialog box. Select Options. In the Options dialog box select Randomize Run if planned. Finally, select the Factors button and in that dialog box, name the factors being studied and the factor experimental levels. Click OK twice to generate the completed design. The design will be generated on the MINITAB worksheet as shown in Fig 3. An analysis of the design, including the design Resolution and confounding, will be generated in the MINITAB Session Window.

Now run the experiment and collect the data. Record run data in a new column in the same row as the run factor settings.


Characterizing the Process - Rational Subgrouping

To separate the measurement of Z.ST and Z.LT and understand fully how the process operates, capture data in such a way as to see both short term variation, inherent to the technology being used, and long term variation, which reflects the variation induced by outside influences. The process of collecting data in such a manner is called "Rational Subgrouping". Analyzing Rational Subgroups allows analysis of "centering vs. spread" and "control vs. technology."

Steps to characterize a process using Rational Subgroups
1. Work with the operational advocacy team to define the factors (X's) suspected as influential in causing output variation (Y). Confirm which of these factors are operationally controllable and which are environmental. Prioritize and understand the cycle time for sensing the identified factors. Be sure to question the effect of elements of all the 5M's of process variation:

   Machine - Technology; Maintenance; Setup
   Materials - Batch/Lot/Coil Differences
   Method - MTS; Workstation layout; Operator method
   Manpower - Station Rotation; Shift Changes; Skill levels
   Measurement - R&R; Calibration effects
   Environment - Weather; Job Site or shop

2. Define a data collection plan over time that captures data within each subgroup taken over a period of time short enough that only the variation inherent to the technology occurs. Subgroup size can be anything greater than two (2). Two measured data points are necessary to see subgroup variation. Larger subgroups provide greater sensitivity to the process changes, so the choice of subgroup size must be made to balance the needs of the business and the need for process understanding. This variation is called "common cause" and represents the best the process can achieve. In planning data collection, use of The Planning Questions (Page 19) is helpful.

3. Define the plan to allow for collection of the subgroups over a long period of time which allows the elements of long term variation and systematic effects of potentially important variables to influence the subgroup results. Do not tweak, or purposely adjust the process, but rather recognize that the process will drift over time and plan the data collection accordingly.

4. Capture data and analyze data using Control Charts (Page 53 - 55) and the 6 Sigma Process Report (Page 18) during the data collection period. Stay close to the process and watch for data shifts and causes for the shifts. Capture data documenting the levels of the identified vital X's. This data may be helpful in analyzing the causes of process variation. During data collection it may be helpful to maintain a control chart or some other visual means of sensing process shift.

5. Capture sufficient subgroups of data to allow for multiple changes in all the identified vital X's and also to allow for a stable estimate of the mean and variation in the output variable (Y). See the 6 Sigma Process Report (Page 18) for an explanation of the graphic indicator of estimation stability.
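The idea behind the short term / long term split can be seen in a few lines of code. This is an illustrative sketch on simulated data (not part of the guide): the within-subgroup standard deviation estimates the short term, technology-driven variation, while the overall standard deviation also picks up the drift between subgroups.

    import numpy as np

    rng = np.random.default_rng(2)
    k, n = 25, 5                                         # 25 subgroups of size 5
    drift = rng.normal(0, 1.5, size=k)                   # between-subgroup influences
    data = np.array([rng.normal(70 + d, 2.0, size=n) for d in drift])

    s_st = np.sqrt(np.mean(data.var(axis=1, ddof=1)))    # pooled within-subgroup s (short term)
    s_lt = data.flatten().std(ddof=1)                    # overall s (long term)

    print(f"s.st (within subgroups) = {s_st:.2f}")
    print(f"s.lt (overall)          = {s_lt:.2f}")       # larger whenever the process drifts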


DOE Analysis
Analysis of DOE's includes both graphical and tabular information. Once the data for the experimental runs has been collected and entered in the MINITAB worksheet, analyze with STAT>DOE>ANALYZE FACTORIAL DESIGN. In the ANALYZE FACTORIAL DESIGN dialog box, identify the column(s) with the response data in the Responses box. Select the GRAPHS button. In the GRAPHS dialog box, select PARETO for the effects plots and change ALPHA (α level of significance) to .05. Click OK twice. Note that we have not used the other options buttons at this time. Leave Randomize at default settings. The initial analysis provides a session window output and a Pareto graph.

[Pareto Chart of the Effects (response is PCReact, Alpha = .05); factors A: Feedrate, B: Catalyst, C: Agitate, D: Temperature, E: Concentration. The largest effects are B, D and E and the BD and DE interactions.]

Analysis of the DOE requires both graphic and model analysis; however, the model should be generated and analyzed before full graphic analysis can be completed. An analysis of the Fit Model in the MINITAB Session window shows the amount of effect and the model coefficients. Most important though is the ANOVA table. This table may show the significant factors or interactions (See Balanced ANOVA Page 40). In this case, the F score is shown as "**" and there are no "p" values. This indicates that the model as defined is too complex to be analyzed with the amount of data points taken. The model needs to be simplified. The Pareto graphic is a helpful tool for that. Note that effects B, D and E and interactions BD and DE show as significant effects. Remaining non-significant effects can be eliminated.

Rerun STAT>DOE>ANALYZE FACTORIAL DESIGN. This time select the TERMS option button. In the dialog box, deselect the terms not shown as significant in the Pareto. Click OK. Select STORAGE and select RESIDUALS and FITS. Click OK twice. The resulting ANOVA table shows the significance of the factors and the model coefficients are provided. Next, run STAT>DOE>FACTORIAL PLOTS. Select and set up each of the plots, MAIN EFFECTS, INTERACTIONS and CUBE, as follows. Identify the response column in the RESPONSES box. Select only the significant factors to be included in the plot. Click OK twice to generate the plots. Confirm the significance of effects and interactions graphically using the MAIN EFFECTS and INTERACTIONS plots. Use the CUBE PLOT to identify the select factor levels for achieving the most desirable response.
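For reference, the effect estimates behind the Pareto chart are simple contrasts: each effect is the mean response at the factor's high level minus the mean at its low level. The sketch below (illustrative Python with a made-up response, not the PCReact data) computes and ranks them for a 2^3 design.

    from itertools import product
    import numpy as np

    runs = np.array(list(product([-1, 1], repeat=3)))            # full 2^3 design: A, B, C
    y = np.array([55, 61, 63, 77, 54, 60, 66, 80], dtype=float)  # hypothetical response

    labels = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
    columns = [runs[:, 0], runs[:, 1], runs[:, 2],
               runs[:, 0] * runs[:, 1], runs[:, 0] * runs[:, 2],
               runs[:, 1] * runs[:, 2], runs[:, 0] * runs[:, 1] * runs[:, 2]]

    # effect = mean(y at +1) - mean(y at -1) = (column . y) / (N/2)
    effects = {lab: float(col @ y) / (len(y) / 2) for lab, col in zip(labels, columns)}
    for lab, eff in sorted(effects.items(), key=lambda item: -abs(item[1])):
        print(f"{lab:>3}: {eff:+.2f}")                           # ranked, as the Pareto does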

[Interaction Plot for PCReact (Catalyst, Temperature, Concentration), Main Effects Plot for PCReact, and Cube Plot of means for PCReact over the significant factors.]


Response Surface - Central Composite Design (CCD)

Response Surface analysis is a type of Designed Experiment that allows investigation of non-linear relationships. It is a tool for fine tuning process optimization once the region of optimal process conditions is known. Using the CCD type RS Design, you will be designing an experiment that tests each factor at five levels, and an experiment which can be used to augment a factorial experiment that has been completed. The CCD design will include FACTORIAL points, STAR points and CENTER points.

Start by running STAT>DOE>CREATE RS DESIGN. Select CENTRAL COMPOSITE from the design type choices in the dialog box. Identify the number of factors to be studied and click the DESIGN button. In the DESIGN dialog box, select the experiment design desired, including the blocks. Click OK and then select the FACTORS button. In that dialog box identify the factors and their high and low factorial settings and click OK. Randomize runs is found in the OPTIONS dialog box. Click OK to generate the design. The design will be placed on a new worksheet. Collect data for each of the scheduled trials defined by the design. Note that there will be multiple points run at the centerpoint of each factor and there will be star points for each factor beyond the factor ranges identified in the design.

Analyze the data using STAT>DOE>ANALYZE RS DESIGN. In the dialog box identify the response column. Leave Use Coded Units selected and choose the appropriate setting for the USE BLOCKS box, depending on plan. Click OK and run. The resulting output is a combination of the Regression Output (Page 43) and the ANOVA output (Page 41). The regression output analyzes how the individual factors and interactions fit the model. The ANOVA table will analyze the type of relationship and also the total fit of the model. If "Lack of Fit" error is significant, another model may be appropriate. Simplify the model for terms and regression complexity as appropriate. See DOE Analysis (Page 51). Rerun STAT>DOE>ANALYZE RS DESIGN and select the TERMS button. Before rerunning the simplified analysis, select STORAGE and select FITS and RESIDUALS.

Continue simplification and tabular analysis to attempt to find a simple model that explains a large portion of the variation. Confirm regression fit quality using Residual Plots (Page 22). The terms in the ANOVA matrix should show significance, except that the "Lack of Fit" term should become insignificant (p > .05). Next run STAT>DOE>RS PLOTS. Select either CONTOUR or SURFACE plot and SETUP for the selection. In the SETUP dialog box, confirm that the appropriate factors are included for the plot, noting that each plot will have only the factor pair shown. Check that the plot is displayed using UNCODED units and run. Use the graphic generated to visually analyze for optimal factor settings or use the model coefficients and solve for the optimal settings mathematically.

[Contour Plot of strength and Surface Plot of strength versus Volume and Composition.]

Six Sigma Product Report
The Six Sigma Product Report, "Six Sigma>Product Report", is used to calculate and aggregate Z values from discrete data and data from multiple normal processes. Enter "# defects", "# units" and "# opportunities" data in separate columns in MINITAB. When Z shift is included in the calculation (1.5 default) the reported Z.Bench is short term. If zero is entered, the reported Z.Bench is long term.

Defect count - Enter the actual defects recorded in the sample population. If using defect data from a Continuous Process Study, use PPM for long term. If this report is a rollup of subordinate processes, use the defect count from the subordinate process totals.

Units - Enter the actual number of parts included in the sample population evaluated. If using data from a Continuous Process Study, use 1,000,000. If this report is a rollup of subordinate processes, use the actual number of parts included in the sample population evaluated.

Opportunities - At the lowest level, use one (1) for the number of opportunities. One (1) is the number of CTQ's characterized at the lowest level of analysis. If this report is a rollup of subordinate processes, use the total number of opportunities accounted for in the subordinate process.

Characteristics (Optional) - Enter the test name for the Characteristic, CTQ or subprocess.

Shift - Process Z.SHIFT can be entered three ways. If the report is an aggregate of a number of continuous data based studies, for example, a part with multiple CTQ's, the Z.SHIFT data can be entered in the worksheet as a separate column and referred to in the Product Report dialog box. A fixed Z.SHIFT of 1.5 is the default and will be used if nothing is specified. A Z.SHIFT of zero (0) will produce a report that shows only the long-term results.

As the levels are rolled up, the data from the totals in the subordinate processes will become line items in the higher level breakdown. In the report below, the process reported includes data from 12 subprocesses. Door Assy South includes a process that included six (6) CTQ's characterized.

Analyzing the report
The far right hand column of the report shows the Z.Bench for the individual processes and for the cumulative Z.Bench. The number at the bottom of the DPO column, in this case 0.081917, reports the P(d), probability of a defect at the end of the line.

Report 7: Product Performance

Characteristic        Defs    Units   Opps  TotOpps    DPU      DPO       PPM     ZShift  ZBench
CG Case               46332   66636    3    199908    0.695    0.231767  231767   1.500   2.233
C83                    2174   66636    1     66636    0.033    0.032627   32627   1.500   3.344
Sealed System High      554   66636    2    133272    0.008    0.004157    4157   1.500   4.139
Sealed System Low      3540   66636    3    199908    0.053    0.017708   17708   1.500   3.604
C84                    3643   66636    1     66636    0.055    0.054667   54667   1.500   3.101
C85                    1947   66636    1     66636    0.029    0.029223   29223   1.500   3.392
Door Assy South       37052   66636    6    399816    0.556    0.092673   92673   1.500   2.824
C86                     811   66636    1     66636    0.012    0.012174   12174   1.500   3.752
Plastics              14869   66636    1     66636    0.223    0.223144  223144   1.500   2.262
C87                    2901   66636    1     66636    0.044    0.043534   43534   1.500   3.211
C90                    1544   66636    1     66636    0.023    0.023166   23166   1.500   3.492
C91                    4721   66636    1     66636    0.071    0.070852   70852   1.500   2.969
Total                120089          1465992                   0.081917   81917   1.500   2.892
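The arithmetic behind each line of the report can be reproduced directly; the sketch below (illustrative Python, using scipy's normal table) recomputes two of the rows above from their defect counts, units and opportunities.

    from scipy.stats import norm

    def product_line(defects, units, opps_per_unit, z_shift=1.5):
        tot_opps = units * opps_per_unit
        dpu = defects / units
        dpo = defects / tot_opps
        ppm = dpo * 1_000_000
        z_lt = norm.isf(dpo)               # long-term Z from the defect probability
        z_bench = z_lt + z_shift           # reported Z.Bench is short term when a shift is applied
        return dpu, dpo, ppm, z_bench

    for name, d, u, o in [("CG Case", 46332, 66636, 3),
                          ("Door Assy South", 37052, 66636, 6)]:
        dpu, dpo, ppm, z = product_line(d, u, o)
        print(f"{name}: DPU={dpu:.3f} DPO={dpo:.6f} PPM={ppm:.0f} Z.Bench={z:.3f}")

With the default 1.5 shift these reproduce the Z.Bench values of 2.233 and 2.824 shown in the table.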


Control Charts
Control charts are a practical tool for detecting product and/or process performance changes in X-bar and R over time in relation to historical performance. Since they are a rigorous maintenance tool, control charts should be used as an alternative to closed loop process control, such as mechanical sensing and process adjustment.

Common and special-cause variation can be seen in rationally subgrouped samples:

• Common-cause variation is characterized by steady state stable process variation (captured by the within-subgroup variation).
• Special-cause variation is characterized by outside assignable causes on the process variation (captured by the between-subgroup variation).
• Control Chart Analysis signals when the steady state process variation has been influenced by outside assignable causes.

Variables Control Charts
Variable Control Charts are used in pairs. One chart characterizes the variation of subgroup averages, and the other chart characterizes the variation of the spread of the subgroups.

Individual Charts (X/Moving Range): These charts are excellent for tracking long term variation changes. Because they use a single measurement for each data point, they are not a tool of choice where measurement variation is involved, such as with part dimensions. They work well with temperatures, pressures, concentration, etc.

Subgroup Charts (Xbar R or Xbar S): These charts are excellent for tracking changes in short term variation as well as variation over time. They require multiple measurements (two or more) in each subgroup. Using rational subgroup techniques with this chart enables graphic analysis of both short term variation changes (Range or S) and long term variation (X Bar chart). This chart is the chart of choice where measurement variation is involved. It is also an excellent tool for tracking processes during baselining or rebaselining, since it assists in pointing to special cause influence on results. Because there is usually no change in temperature, pressures or concentration in the short term, they are not used for that type of measurement.

Attribute Charts
Attribute Control Charts are a single chart. The common difference between these charts is whether they track proportion, a ratio, or defects, a count.

Proportion Defective Charts (P charts): This chart tracks proportion. The data point plotted is the ratio Number of Defects / Number of Pieces Inspected. In using proportion defective charts, the number of pieces in a sample can vary, and the control limit for the chart will vary based on that sample size.

Number Defective Charts (nP Charts): This chart tracks defect count. The data point plotted is the number of defects in a sample. Because the data point is a number relative to a sample size, it is important that the sample size be relatively constant between samples. The sample size should be defined so that the average number of defects is at least five in order for this chart to be effective.

In setting up Control Charts, use the Planning Questions (Page 19) first. Those questions along with these notes will help define the type of chart needed. Use SETCIM, MINITAB, SPQ (Supplier Process Quality) or other electronic methods for long term charting.

Normalized Average Yield
Normalized Average Yield (Y.NA) is the average yield of one opportunity. It answers the question "What is the probability that the output of this process will meet the output requirements?" The Y.NA of a process is an average defect rate, and can be used for comparing processes with differing levels of complexity.

    Y.NA = (Y.RT)^(1 / # Opportunities)

Normalized Average Yield (Y.NA) is the probability of good product, so if we calculate 1 - Y.NA, we can find the probability of a defect, P(d). With this we can find the Z.LT score for a process:

    P(d) = 1 - Y.NA

Rolled Throughput Yield

Rolled Throughput Yield (Y.RT) is the probability of completing all the opportunities in a process without a defect. As such, it is a tool which can focus the investigation when narrowing down the problem from a larger business problem.

The straight multiplication method of calculating Y.RT is

    Y.RT = Y1 × Y2 × Y3 × ... × YN

where Y1, Y2, Y3, ..., YN are the yields of the individual stations or operations in a process.

In a process which has 18 stations, each with 5 opportunities, and DPO = 0.001, the Y.RT is .9139, calculated as follows:

    Y.RT = (Yield)^(# Opportunities per Station × # Stations) = (.999)^(5 × 18) = (.995)^18 = .91389

In addition to the straight multiplication method, Y.RT can also be estimated using the Poisson approximation

    Y.RT = e^(-DPU)

and conversely

    DPU ≅ -Ln(Y.RT)
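A minimal sketch of the same arithmetic for the 18-station example, in Python:

    import math

    stations, opps_per_station, dpo = 18, 5, 0.001
    total_opps = stations * opps_per_station

    y_rt = (1 - dpo) ** total_opps                 # straight multiplication
    dpu = total_opps * dpo
    y_rt_poisson = math.exp(-dpu)                  # Poisson approximation
    y_na = y_rt ** (1 / total_opps)                # normalized average yield
    p_d = 1 - y_na                                 # probability of a defect per opportunity

    print(f"Y.RT = {y_rt:.5f}  (Poisson approximation {y_rt_poisson:.5f})")
    print(f"Y.NA = {y_na:.5f},  P(d) = {p_d:.5f}")
    print(f"DPU recovered from Y.RT: {-math.log(y_rt):.4f}")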


Interpreting Variables Control Charts

A lack of control ("out of control") is indicated when one or more of the following rules apply to your chart data:

1. A single point above or below a control limit.
2. Two out of three consecutive points are on the same side of the mean, in Zone A or beyond.
   10 / 11 points above the mean
   12 / 14 points above the mean
3. Four out of five consecutive points are on the same side of the mean, in Zone B or beyond.
4. At least eight consecutive points are on the same side of the mean, in Zone C.
5. 7 points in a row trending up or 7 points in a row trending down.
6. 14 points sequentially alternating up then down then up, etc.
7. 14 points in a row in Zone C on both sides of the mean.
8. 8 points in a row alternating in Zone B or beyond.

[Control chart zone diagram illustrating Rules 1 through 5, with UCL, LCL and the X-bar centerline marked.]

Note: A, B, and C represent plus and minus one, two and three sigma zones from the overall process average.


DPU / DPO

DPU
DPU is the number of defects per unit produced. It's an average. This means that on the average, each unit produced will have so many defects.

DPU gives us an index of quality generated by the effects of process, material, design, environmental and human factors. Keep in mind that DPU measures symptoms, not problems. (It's the Y, not the X's.)

    DPU = (# Defects) / (# Units)
    [DPU is the average number of defects in a unit]

DPU forms the foundation for Six Sigma. From DPU and a knowledge of the opportunities, we can calculate the long term capability of the process.

Opportunity
An opportunity is anything you measure, test or inspect. It may be a part, product or service CTQ. It can be each of the elements of an assembly or subassembly.

DPO
DPO is the number of defects per opportunity. It is a probability.
[DPO is the probability of a defect on any one CTQ or step of a process]

    Yield = 1 - DPO

DPO is the foundation for determining the Z value when using discrete data. To find Z, given DPU, convert DPU to DPO. Then look up the P(d) for DPO in the body of the Z table. Convert to Z score (page 7).

    Total Opportunities = # Units × # Opportunities per Unit

    Defects per Opportunity (DPO) = # Defects / (# Units × # Opportunities per Unit) = DPU / # Opportunities per Unit
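A minimal sketch of the conversion in Python, with made-up counts, using scipy's normal table in place of the printed Z table:

    from scipy.stats import norm

    defects, units, opps_per_unit = 120, 500, 4        # hypothetical counts
    dpu = defects / units
    dpo = dpu / opps_per_unit                          # = defects / (units * opps per unit)
    z_lt = norm.isf(dpo)                               # Z whose upper-tail probability is P(d)

    print(f"DPU = {dpu:.3f}, DPO = {dpo:.4f}, Yield = {1 - dpo:.4f}, Z.lt = {z_lt:.2f}")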


Analysis Criteria
• A desirable system will have a Gage R&R < 10% and Categories of Discrimination > 5.
• The system is acceptable if Gage R&R > 10% but < 20% and discrimination categories = 5.
• If Gage R&R is > 20% but < 30% and Categories of Discrimination = 4, the decision about acceptability will be based on the importance of measuring the characteristic and business cost.
• If Gage R&R is > 30%, or the Categories of Discrimination < 4, the measurement system is not considered acceptable and needs to be improved.

MINITAB Analysis Outputs
MINITAB provides a tabular and graphical output. The tabular output has three tables; the first an ANOVA table (see ANOVA Interpretation; Page 37). The second table provides raw calculated results of the study and the third table provides the percent contribution results. Interpretation of Gage R&R results is focused on the third table. The third table displays the "% Contribution" and "% Study Variation". "% Contribution" and "% Study Variation" figures are interpreted as Gage R&R. If you have included a tolerance range with the "Options" button, this table will also report a "% Tolerance" result. The Number of Distinct Categories is also provided. This number indicates how many classifications can be reliably distinguished given the observed process variation.

The graphical analysis provides several important graphic tools.
• The control chart should appear out of control. Operator to operator variation defines the control limits. If the gage has adequate sensitivity beyond its own noise, more than 50% of the points will be outside the control limits. If this is not the case, the system is inadequate to detect part-to-part variations.
• The range chart should be in control, showing consistency between the operators. If there are only two or three distinct ranges recorded, it may indicate lack of gage resolution.
• The column chart shows the graphic picture of data provided in table three of the tabular report. The graphics on the right show various interaction patterns that may be helpful in troubleshooting a problem measurement system.

(1) Measurement Systems Analysis Reference Manual; ©AIAG 1994


Control Chart Constants

Variables Control Chart Control Limit Constants

Average/Range (Xbar-R) Chart:
    Xbar = (X1 + X2 + ... + Xn) / n   for each subgroup of size n
    Xbarbar = (Xbar1 + Xbar2 + ... + Xbark) / k
    Rbar = (R1 + R2 + ... + Rk) / k
    UCLx = Xbarbar + A2·Rbar        LCLx = Xbarbar - A2·Rbar
    UCLR = D4·Rbar                  LCLR = D3·Rbar

Individual X / Moving Range Chart:
    Xbar = (X1 + X2 + ... + Xk) / k
    Rm(i) = |Xi - Xi-1|
    Rmbar = (Rm2 + Rm3 + ... + Rmk) / (k - 1)
    CLx = Xbar ± E2·Rmbar   (E2 = 2.660, the n = 1 row of the constants table)
    UCLR = D4·Rmbar                 LCLR = D3·Rmbar

np Charts (number defective):
    np = number defective for each subgroup
    npbar = Σ np / k,  for all k subgroups
    UCLnp = npbar + 3·sqrt( npbar·(1 - pbar) )
    LCLnp = npbar - 3·sqrt( npbar·(1 - pbar) )

p Charts (proportion defective):
    p = np / n   and   pbar = Σ np / N
    UCLp = pbar + 3·sqrt( pbar·(1 - pbar) / n )
    LCLp = pbar - 3·sqrt( pbar·(1 - pbar) / n )
    where np = number of defectives, n = subgroup size, N = total number inspected for all subgroups

 n     A2     A3     D3     D4     B3     B4     d2      c4
 1   2.660  3.760    -      -      -      -      -       -
 2   1.880  2.659    0    3.267    0    3.267  1.128   0.7979
 3   1.023  1.954    0    2.575    0    2.568  1.693   0.8862
 4   0.729  1.628    0    2.282    0    2.266  2.059   0.9213
 5   0.577  1.427    0    2.115    0    2.089  2.326   0.9400
 6   0.483  1.287    0    2.004   0.03  1.970  2.534   0.9515
 7   0.419  1.182  0.076  1.924  0.118  1.882  2.704   0.9594
 8   0.373  1.099  0.136  1.864  0.185  1.815  2.847   0.9650
 9   0.337  1.032  0.184  1.816  0.239  1.761  2.970   0.9693
10   0.308  0.975  0.223  1.777  0.284  1.716  3.078   0.9727
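A minimal sketch of the Xbar-R limit calculations in Python, using the n = 5 constants from the table above (the subgroup data is simulated for illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    subgroups = rng.normal(50, 2, size=(20, 5))        # 20 subgroups of size 5

    xbar = subgroups.mean(axis=1)
    r = subgroups.max(axis=1) - subgroups.min(axis=1)
    xbarbar, rbar = xbar.mean(), r.mean()

    A2, D3, D4 = 0.577, 0.0, 2.115                     # constants for n = 5
    print(f"Xbar chart: CL={xbarbar:.2f}  UCL={xbarbar + A2 * rbar:.2f}  LCL={xbarbar - A2 * rbar:.2f}")
    print(f"R chart:    CL={rbar:.2f}   UCL={D4 * rbar:.2f}  LCL={D3 * rbar:.2f}")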



Gage R&R (1)

What it is:
Gage R&R is a means for checking the measurement system (gage plus operator) to gain a better understanding of the variation and sources from the measurement system.

    Gage R&R = 5.15 × σm   or   5.15 × sqrt( EV² + AV² )
    where σm = Measurement System standard deviation

Components of Measurement Error
• Repeatability = Equipment Variation (EV): The variation in measurements attributable to one measurement instrument when used several times by one appraiser to measure the identical characteristic on the same part.
• Reproducibility = Appraisal Variation (AV): The variation in measurements attributable to different appraisers using the same measurement instrument to measure the same characteristic on the same part.

How to do the Gage R&R study
1. Determine how the gage is going to be used; i.e., Product Acceptance or Process Control. The gage must have resolution 10X finer than the process variation it is intended to measure (i.e., measurement of parts with process variation of .001 requires a gage with .0001 resolution).
2. Select approximately ten parts which represent the entire expected range of the process variation, including several beyond the normally acceptable range. Code (blind) the parts.
3. Identify two or three Gage R&R participants from the people who actually do the measurement. Have them each measure each part two or three times. The measurements should be done with samples randomized and blinded.
4. Record results on a MINITAB worksheet as follows:
   a) One Column - Coded Part Numbers (PARTS)
   b) One Column - Appraiser number or name (OPER)
   c) One Column - Recorded Measurement (RESP)
5. Analyze using MINITAB by running "Stat>Quality Tools>Gage R&R":
   a) In the initial dialog box choose the ANOVA method.
   b) Identify the appropriate columns for "PARTS", "OPERATOR", and "MEASUREMENT Data".
   c) If you wish to include the analysis for process tolerance, select the "OPTIONS" button. This is only to be used if the gage is for pass/fail decisions only, not for process control.
   d) If you wish to show demographic information on the graphic output, including gage number, etc., select the "Gage Information" button.


Precontrol

Why use it?
Provide an ongoing visual means of on-the-floor process control.

What does it do?
• Gives operators decision rules for continuing or stopping production.
• Rules are based on the probability that the population mean has shifted.

How do I do it?
1. Establish control zones.
2. When five parts in a row are "green", the process is qualified.
3. Sample two consecutive parts on a periodic basis.
4. Decision rules for operators:
   A. If the first part is green, no action needed, continue to run.
   B. If the first part is yellow, then check a second part.
      » If the second part is green, no action needed.
      » If the second part is yellow on the same side, then adjust.
      » If the second part is yellow on the opposite side, stop, call the support engineer.
   C. If any part is red, stop, call the support engineer.
5. After correcting and restarting a process, you must achieve 5 consecutive "green" samples to re-qualify.

[Precontrol zone diagram: Red below -3.0s and above +3.0s, Yellow between -3.0s and -1.5s and between +1.5s and +3.0s, Green between -1.5s and +1.5s around µ; for a centered process about .86 of output falls in the green zone and about .07 in each yellow zone.]
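The decision rules can be written down directly. The sketch below is an illustrative Python version under the assumption that the green zone is the middle half of the tolerance (the ±1.5s band in the diagram when the tolerance spans ±3s); it is not part of the guide.

    def zone(x, lsl, usl):
        mid, quarter = (usl + lsl) / 2, (usl - lsl) / 4
        if x < lsl or x > usl:
            return "red"                                   # outside the specification
        return "green" if abs(x - mid) <= quarter else "yellow"

    def precontrol_decision(first, second, lsl, usl):
        z1 = zone(first, lsl, usl)
        if z1 == "red":
            return "stop, call support engineer"
        if z1 == "green":
            return "continue to run"                       # rule A
        z2 = zone(second, lsl, usl)                        # rule B: first part was yellow
        if z2 == "red":
            return "stop, call support engineer"
        if z2 == "green":
            return "no action needed"
        mid = (usl + lsl) / 2
        same_side = (first - mid) * (second - mid) > 0
        return "adjust" if same_side else "stop, call support engineer"

    print(precontrol_decision(9.85, 9.90, lsl=9.0, usl=10.0))   # two yellows, same side -> adjust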


Data Validity Studies
Non-measurement data is that which is not the result of a measurement using a gage.
Examples:
• Finance data (T&L cost; Cost & Benefits; Utility Costs; Sales, etc.)
• Sales Data (Units sold; Items purchased, etc.)
• HR data (Employee Information; medical service provider information)
• Customer Invoice Data

Samples of data should be selected to assure they represent the population. A minimum of 100 data points is desirable. The data is then analyzed for agreement by comparing each data point (as reported by the standard reporting mechanism) to its true observed value.

The validity of the data is reported as % Agreement.

    % Agreement = (Number of Agreements / Number of Observations) × 100

% Agreement should be very good. Typically this measure is much greater than 95%.

% Agreement for Binary (Pass / Fail) Data

Calculate % Agreement in a similar manner to non-measurement data, except using the following equation:

    % Agreement = (Number of Agreements / Number of Opportunities) × 100

where the number of opportunities is found by the following equations:

    n = total number of assessments per sample
    s = number of samples

    If n is odd, then:   # Opportunities = s × (n² - 1) / 4
    If n is even, then:  # Opportunities = s × n² / 4

• Overall % Agreement = Agreement rate for all opportunities.
• Repeatability % Agreement = Compare the assessments for one operator over multiple assessment opportunities. (Fix this problem first.)
• Reproducibility % Agreement = Compare assessments of the same part from operator to operator.
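A minimal sketch of the overall % Agreement calculation (the assessment lists and sample counts are made up for illustration):

    reported = ["pass", "fail", "pass", "pass", "fail", "pass", "pass", "fail"]
    observed = ["pass", "fail", "pass", "fail", "fail", "pass", "pass", "pass"]

    agreements = sum(r == o for r, o in zip(reported, observed))
    print(f"% Agreement = {100 * agreements / len(reported):.1f}%")   # 6 of 8 agree -> 75.0%

    def n_opportunities(n, s):
        """Opportunities for s samples, each assessed n times (odd/even rule above)."""
        return s * (n * n - (1 if n % 2 else 0)) // 4

    print(n_opportunities(n=3, s=10))   # odd n:  10 * (9 - 1) / 4 = 20
    print(n_opportunities(n=4, s=10))   # even n: 10 * 16 / 4     = 40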

Project Closure
At closure, the project must be positioned so that the changes made to the process are sustainable over time. Doing so requires the completion of a number of tasks.

1. The improvement must be fully implemented, with leverage factors identified and controlled. The process must have been re-baselined to confirm the degree of improvement.

2. Process owners must be fully trained and running the process, controlling the leverage factors and monitoring the Response (Y).

3. Required Quality Plan and Control Procedures, drawings, documents, policies, generated reports or institutionalized rigor must be completed.
• Workstation Instructions
• Job Descriptions
• Preventive Maintenance Plan
• Written Policy or controlled ISO documents
• Documented training procedures
• Periodic internal audits or review meetings

4. The project History Binder must be completed, which records key information about the project work in hard copy. Where MINITAB has been used for analysis, hard copies of the generated graphics and tables should be included.
• Initial baseline data
• Gage R&R calculations
• Statistical characterization of the process
• DOE (Design of Experiments)
• Hypothesis testing
• Any data from Design Change Process activities (described on the next page), Failure Modes and Effects Analysis (FMEA), Design for Six Sigma (DFSS), etc.
• Copies of engineering part and tooling drawing changes showing "Z" score values on the drawings
• Confirmation run data
• Financial data (costs and benefits)
• Final decision on improvement and conclusions
• All related quality system documents
• A scorecard (with frequency of reporting)
• Documented control plan

5. All data entries must be complete in PROTRAK
• Response Variable Z scores at initial Baselining
• Response Variable Z scores at Re-baselining
• Project Definition
• Improvements Made
• Accomplishments, Barriers and Milestones for all project phases
• Tools used for all project phases

6. Costs and Benefits for the project must be reconfirmed with site finance.

7. Investigate potential transfer opportunities where project lessons learned can be applied to other business processes.

8. Submit the closure package for signoff through the site approval channels.


Sample Size

                 α = 20%               α = 10%               α = 5%                α = 1%
 δ/σ   β =   20%  10%   5%   1%    20%  10%   5%   1%    20%  10%   5%   1%    20%  10%   5%    1%
 0.2         225  328  428  651    309  428  541  789    392  525  650  919    584  744  891  1202
 0.3         100  146  190  289    137  190  241  350    174  234  289  408    260  331  396   534
 0.4          56   82  107  163     77  107  135  197     98  131  162  230    146  186  223   300
 0.5          36   53   69  104     49   69   87  126     63   84  104  147     93  119  143   192
 0.6          25   36   48   72     34   48   60   88     44   58   72  102     65   83   99   134
 0.7          18   27   35   53     25   35   44   64     32   43   53   75     48   61   73    98
 0.8          14   21   27   41     19   27   34   49     25   33   41   57     36   46   56    75
 0.9          11   16   21   32     15   21   27   39     19   26   32   45     29   37   44    59
 1.0           9   13   17   26     12   17   22   32     16   21   26   37     23   30   36    48
 1.1           7   11   14   22     10   14   18   26     13   17   21   30     19   25   29    40
 1.2           6    9   12   18      9   12   15   22     11   15   18   26     16   21   25    33
 1.3           5    8   10   15      7   10   13   19      9   12   15   22     14   18   21    28
 1.4           5    7    9   13      6    9   11   16      8   11   13   19     12   15   18    25
 1.5           4    6    8   12      5    8   10   14      7    9   12   16     10   13   16    21
 1.6           4    5    7   10      5    7    8   12      6    8   10   14      9   12   14    19
 1.7           3    5    6    9      4    6    7   11      5    7    9   13      8   10   12    17
 1.8           3    4    5    8      4    5    7   10      5    6    8   11      7    9   11    15
 1.9           2    4    5    7      3    5    6    9      4    6    7   10      6    8   10    13
 2.0           2    3    4    7      3    4    5    8      4    5    6    9      6    7    9    12
 2.1           2    3    4    6      3    4    5    7      4    5    6    8      5    7    8    11
 2.2           2    3    4    5      3    4    4    7      3    4    5    8      5    6    7    10
 2.3           2    2    3    5      2    3    4    6      3    4    5    7      4    6    7     9
 2.4           2    2    3    5      2    3    4    5      3    4    5    6      4    5    6     8
 2.5           1    2    3    4      2    3    3    5      3    3    4    6      4    5    6     8
 2.6           1    2    3    4      2    3    3    5      2    3    4    5      3    4    5     7
 2.7           1    2    2    4      2    2    3    4      2    3    4    5      3    4    5     7
 2.8           1    2    2    3      2    2    3    4      2    3    3    5      3    4    5     6
 2.9           1    2    2    3      1    2    3    4      2    2    3    4      3    4    4     6
 3.0           1    1    2    3      1    2    2    4      2    2    3    4      3    3    4     5
 3.1           1    1    2    3      1    2    2    3      2    2    3    4      2    3    4     5
 3.2           1    1    2    3      1    2    2    3      2    2    3    4      2    3    3     5
 3.3           1    1    2    2      1    2    2    3      1    2    2    3      2    3    3     4
 3.4           1    1    1    2      1    1    2    3      1    2    2    3      2    3    3     4
 3.5           1    1    1    2      1    1    2    3      1    2    2    3      2    2    3     4
 3.6           1    1    1    2      1    1    2    2      1    2    2    3      2    2    3     4
 3.7           1    1    1    2      1    1    2    2      1    2    2    3      2    2    3     4
 3.8           1    1    1    2      1    1    1    2      1    1    2    3      2    2    2     3
 3.9           1    1    1    2      1    1    1    2      1    1    2    2      2    2    2     3
 4.0           1    1    1    2      1    1    1    2      1    1    2    2      1    2    2     3


Fulfillment & Span
Fulfillment - Providing what the customer wants when the customer wants it.
Fulfillment is a highly segmented metric and typically does not follow a normal distribution. Because the data is non-normal, some of the traditional 6 Sigma tools should not be used (such as the 6 Sigma Process Report). Therefore, Median and Span will be used to measure Fulfillment.

Median - the middle value in a data set.
Span - the difference between two values in the data set (e.g. 1/99 Span = the difference between the 99th percentile and the 1st percentile).

Example
• A sample of 100 delivery times has a high value of 40 days.
• If that one value had instead been 30 days, the 1/99 span would change by 10 days.
• The 10/90 span is not affected by what happens to that highest point.

We don't want our decision to be influenced by a single data point. Therefore, the Span calculation is dependent on the sample size. Larger data sets will have a wider span. Following are corporate guidelines on the Span calculation:

Sample Size    Span
100-500        10/90 Span
500-5000       5/95 Span
>5000          1/99 Span

In order to analyze a fulfillment process, the data should be segmented by the variables that may affect the process. Each segment of data should be compared to identify if the segmenting factor had an influence on the Median and the Span. Mood's Median test is a tool that can be used to identify significant differences in Median. Factors that are identified as having an influence on Span and Median should be evaluated further through designed experimentation.
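A minimal sketch of the Median and Span calculation on simulated (skewed) delivery-time data:

    import numpy as np

    rng = np.random.default_rng(4)
    days = rng.lognormal(mean=1.5, sigma=0.5, size=300)     # non-normal delivery times

    median = np.median(days)
    p10, p90 = np.percentile(days, [10, 90])
    span_10_90 = p90 - p10                                  # 10/90 span per the guideline for n = 300

    print(f"n = {days.size}, median = {median:.1f} days, 10/90 span = {span_10_90:.1f} days")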


F Distribution (α = .05; columns are the Numerator Degrees of Freedom)
Denom DF       1        2        3        4        5        6        7        8        9       10

1 161.40 199.50 215.70 224.60 230.20 234.00 236.80 238.90 240.50 241.902 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.403 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.794 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.965 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.746 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.067 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.648 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.359 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.1410 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.9811 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.8512 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.7513 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.6714 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.6015 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.5416 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.4917 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.4518 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.4119 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.3820 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.3521 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.3222 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.3023 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.2724 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.2525 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.2426 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.2227 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.2028 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.1929 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.1830 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.1640 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.0860 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99120 3.92 3.07 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.91∞∞∞∞ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83


Z - An Important Measure

Z Short Term
Z.ST describes how the process performs at any given moment in time. It is referred to as “instantaneous capability,” “short-term capability” or “process entitlement”. It is used when referring to the “SIGMA” of a process. It is the process capability if everything is controlled so that only background noise (common cause variation) is present. This metric assumes the process is centered and the data were gathered in accordance with the principles and spirit of a rational subgrouping plan (p. 14). The “Target” assumes that each subgroup average is aligned to this number, so that all subgroup means are artificially centered on this number. The s.ST used in this equation can be estimated by the square root of the Mean Square Error term in the ANOVA table. Since the data are centered, Z.ST can be calculated from either one of the Specification Limits (SL):

Z.ST = |SL − Target| / s.ST

Z Long Term
Z.LT describes the sustained reproducibility of a process. It is also called “long-term capability.” It reflects all of the sources of operational variation, the influence of common cause variation, dynamic nonrandom process centering error, and any static off-set present in the process mean. This metric assumes the data were gathered in accordance with the principles and spirit of a “rational sampling” plan (p. 14). This equation is applicable to all types of tolerances. It is used to estimate the long-term process “PPM.”

Z.LT = minimum of (USL − µ̂) / σ̂.LT  or  (µ̂ − LSL) / σ̂.LT

Z Shift
Z.SHIFT describes how well the process being measured is controlled over time. It reflects the difference between the short-term and long-term capability. It focuses on the dynamic nonrandom process centering error, and any static off-set present in the process mean. Interpretation of Z.SHIFT is only valid when following the principles of rational subgrouping (p. 14).

Z.SHIFT = Z.ST − Z.LT

Z Benchmark
While the Z values above are all calculated in reference to a single spec limit, Z.Benchmark is the Z score of the sum of the probabilities of defects in both tails of the distribution. To find it, sum the probability of a defect at the Lower Spec Limit (P.LSL) and the probability of a defect at the Upper Spec Limit (P.USL). Look up the combined probability in a normal table to find the corresponding Z value.

Z.Benchmark = Z score of (P.USL + P.LSL)
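The four Z metrics above can be computed directly from rationally subgrouped data. The sketch below assumes Python with numpy and scipy (not part of the original toolkit); every name in it is illustrative, and the short-term sigma shown is only the square root of the ANOVA MSE when the subgroups are of equal size:

```python
import numpy as np
from scipy.stats import norm

def z_metrics(subgroups, lsl, usl, target):
    """Sketch of the Z.ST / Z.LT / Z.SHIFT / Z.Benchmark calculations above."""
    groups = [np.asarray(g, dtype=float) for g in subgroups]
    all_data = np.concatenate(groups)
    mu_hat = all_data.mean()
    s_lt = all_data.std(ddof=1)          # long-term (overall) standard deviation

    # Short-term sigma: within-subgroup variation
    # (square root of the ANOVA MSE when subgroup sizes are equal)
    s_st = np.sqrt(np.mean([g.var(ddof=1) for g in groups]))

    z_st = abs(usl - target) / s_st      # centered data: either spec limit works
    z_lt = min((usl - mu_hat) / s_lt, (mu_hat - lsl) / s_lt)
    z_shift = z_st - z_lt

    # Z.Benchmark: combine the defect probabilities from both tails
    p_usl = norm.sf((usl - mu_hat) / s_lt)
    p_lsl = norm.sf((mu_hat - lsl) / s_lt)
    z_bench = norm.isf(p_usl + p_lsl)

    return {"Z.st": z_st, "Z.lt": z_lt, "Z.shift": z_shift, "Z.bench": z_bench}
```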



F Distribution (continued)


α = .05; columns are the Numerator Degrees of Freedom.

The Standard Normal Curve

** Area under the curve = 1, the center is 0 **

Z      Area           Z      Area           Z      Area           Z      Area
0.00   .500000000     1.51   .065521615     3.02   .001263795     4.53   .000002999
0.05   .480061306     1.56   .059379869     3.07   .001070234     4.58   .000002369
0.10   .460172290     1.61   .053698886     3.12   .000904215     4.63   .000001867
0.15   .440382395     1.66   .048457216     3.17   .000762175     4.68   .000001469
0.20   .420740315     1.71   .043632958     3.22   .000640954     4.73   .000001153
0.25   .401293634     1.76   .039203955     3.27   .000537758     4.78   .000000903
0.30   .382088486     1.81   .035147973     3.32   .000450127     4.83   .000000705
0.35   .363169226     1.86   .031442864     3.37   .000375899     4.88   .000000550
0.40   .344578129     1.91   .028066724     3.42   .000313179     4.93   .000000428
0.45   .326355105     1.96   .024998022     3.47   .000260317     4.98   .000000332
0.50   .308537454     2.01   .022215724     3.52   .000215873     5.03   .000000258
0.55   .291159644     2.06   .019699396     3.57   .000178601     5.08   .000000199
0.60   .274253121     2.11   .017429293     3.62   .000147419     5.13   .000000154
0.65   .257846158     2.16   .015386434     3.67   .000121399     5.18   .000000118
0.70   .241963737     2.21   .013552660     3.72   .000099739     5.23   .000000091
0.75   .226627465     2.26   .011910681     3.77   .000081753     5.28   .000000070
0.80   .211855526     2.31   .010444106     3.82   .000066855     5.33   .000000053
0.85   .197662672     2.36   .009137469     3.87   .000054545     5.38   .000000041
0.90   .184060243     2.41   .007976235     3.92   .000044399     5.43   .000000031
0.95   .171056222     2.46   .006946800     3.97   .000036057     5.48   .000000024
1.00   .158655319     2.51   .006036485     4.02   .000029215     5.53   .000000018
1.05   .146859086     2.56   .005233515     4.07   .000023617     5.58   .000000014
1.10   .135666053     2.61   .004527002     4.12   .000019047     5.63   .000000010
1.15   .125071891     2.66   .003906912     4.17   .000015327     5.68   .000000008
1.20   .115069593     2.71   .003364033     4.22   .000012305     5.73   .000000006
1.25   .105649671     2.76   .002889938     4.27   .000009857     5.78   .000000004
1.30   .096800364     2.81   .002476947     4.32   .000007878     5.83   .000000003
1.35   .088507862     2.86   .002118083     4.37   .000006282     5.88   .000000003
1.40   .080756531     2.91   .001807032     4.42   .000004998     5.93   .000000002
1.45   .073529141     2.96   .001538097     4.47   .000003968     5.98   .000000001
1.50   .066807100     3.01   .001306156     4.52   .000003143     6.03   .000000001

Table of Area Under the Normal Curve: the table lists the tail area to the right of Z (the shaded example in the figure uses Z = 2.76).

Copyright 1995 Six Sigma Academy, Inc.


The Z value is a measure of process capability and is often referred to as the “sigma of the process.” A Z = 1 indicates a process for which the performance limit falls one standard deviation from the mean. If we calculate the standard normal deviate for a given performance limit and discover that Z = 2.76, the probability of a defect (P(d)) is the probability of a point lying beyond the Z value of 2.76.
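The same tail probability can be read from software rather than the table. A quick sketch (assumes Python with scipy, which is not part of the original toolkit):

```python
from scipy.stats import norm

# Probability of a defect beyond a performance limit at Z = 2.76
p_defect = norm.sf(2.76)   # upper-tail area
print(round(p_defect, 6))  # about .00289, matching the table entry for Z = 2.76
```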

[Figure: the standard normal curve. The mean µ sits at Z = 0, the points of inflection at Z = ±1 (one σ from the mean), and the total area under the curve is 1. The shaded tail beyond the performance limit is the probability of a defect.]

F Distribution (α = .05), continued

Denom DF     12      15      20      24      30      40      60     120       ∞
1        243.90  245.90  248.00  249.10  250.10  251.10  252.20  253.30  254.30
2         19.41   19.43   19.45   19.45   19.46   19.47   19.48   19.49   19.50
3          8.74    8.70    8.66    8.64    8.62    8.59    8.57    8.55    8.53
4          5.91    5.86    5.80    5.77    5.75    5.72    5.69    5.66    5.63
5          4.68    4.62    4.56    4.53    4.50    4.46    4.43    4.40    4.36
6          4.00    3.94    3.87    3.84    3.81    3.77    3.74    3.70    3.67
7          3.57    3.51    3.44    3.41    3.38    3.34    3.30    3.27    3.23
8          3.28    3.22    3.15    3.12    3.08    3.04    3.01    2.97    2.93
9          3.07    3.01    2.94    2.90    2.86    2.83    2.79    2.75    2.71
10         2.91    2.85    2.77    2.74    2.70    2.66    2.62    2.58    2.54
11         2.79    2.72    2.65    2.61    2.57    2.53    2.49    2.45    2.40
12         2.69    2.62    2.54    2.51    2.47    2.43    2.38    2.34    2.30
13         2.60    2.53    2.46    2.42    2.38    2.34    2.30    2.25    2.21
14         2.53    2.46    2.39    2.35    2.31    2.27    2.22    2.18    2.13
15         2.48    2.40    2.33    2.29    2.25    2.20    2.16    2.11    2.07
16         2.42    2.35    2.28    2.24    2.19    2.15    2.11    2.06    2.01
17         2.38    2.31    2.23    2.19    2.15    2.10    2.06    2.01    1.96
18         2.34    2.27    2.19    2.15    2.11    2.06    2.02    1.97    1.92
19         2.31    2.23    2.16    2.11    2.07    2.03    1.98    1.93    1.88
20         2.28    2.20    2.12    2.08    2.04    1.99    1.95    1.90    1.84
21         2.25    2.18    2.10    2.05    2.01    1.96    1.92    1.87    1.81
22         2.23    2.15    2.07    2.03    1.98    1.94    1.89    1.84    1.78
23         2.20    2.13    2.05    2.01    1.96    1.91    1.86    1.81    1.76
24         2.18    2.11    2.03    1.98    1.94    1.89    1.84    1.79    1.73
25         2.16    2.09    2.01    1.96    1.92    1.87    1.82    1.77    1.71
26         2.15    2.07    1.99    1.95    1.90    1.85    1.80    1.75    1.69
27         2.13    2.06    1.97    1.93    1.88    1.84    1.79    1.73    1.67
28         2.12    2.04    1.96    1.91    1.87    1.82    1.77    1.71    1.65
29         2.10    2.03    1.94    1.90    1.85    1.81    1.75    1.70    1.64
30         2.09    2.01    1.93    1.89    1.84    1.79    1.74    1.68    1.62
40         2.00    1.92    1.84    1.79    1.74    1.69    1.64    1.58    1.51
60         1.92    1.84    1.75    1.70    1.65    1.59    1.53    1.47    1.39
120        1.83    1.75    1.66    1.61    1.55    1.50    1.43    1.35    1.25
∞          1.75    1.67    1.57    1.52    1.46    1.39    1.32    1.22    1.00

Probability of a defect: for the example Z = 2.76, P(d) = .00289.


Chi-Square Distribution
(Table entries are χ2 values; column headings are the upper-tail probabilities.)

df      .995      .990      .975      .950      .900      .750      .500
1    .000039   .000160   .000980   .003930   .015800   .101500   .455000
2      0.010     0.020     0.051     0.103     0.211     0.575     1.386
3      0.072     0.115     0.216     0.352     0.584     1.213     2.366
4      0.207     0.297     0.484     0.711     1.064     1.923     3.357
5      0.412     0.554     0.831     1.145     1.610     2.675     4.351
6      0.676     0.872     1.237     1.635     2.204     3.455     5.348
7      0.989     1.239     1.690     2.167     2.833     4.255     6.346
8      1.344     1.646     2.180     2.733     3.490     5.071     7.344
9      1.735     2.088     2.700     3.325     4.168     5.899     8.343
10     2.156     2.558     3.247     3.940     4.865     6.737     9.342
11     2.603     3.053     3.816     4.575     5.578     7.584    10.341
12     3.074     3.571     4.404     5.226     6.304     8.438    11.340
13     3.565     4.107     5.009     5.892     7.042     9.299    12.340
14     4.075     4.660     5.629     6.571     7.790    10.165    13.339
15     4.601     5.229     6.262     7.261     8.547    11.036    14.339
16     5.142     5.812     6.908     7.962     9.312    11.912    15.338
17     5.697     6.408     7.564     8.672    10.085    12.792    16.338
18     6.265     7.015     8.231     9.390    10.865    13.675    17.338
19     6.844     7.633     8.907    10.117    11.651    14.562    18.338
20     7.434     8.260     9.591    10.851    12.443    15.452    19.337
21     8.034     8.897    10.283    11.591    13.240    16.344    20.337
22     8.643     9.542    10.982    12.338    14.041    17.240    21.337
23     9.260    10.196    11.688    13.091    14.848    18.137    22.337
24     9.886    10.856    12.401    13.848    15.659    19.037    23.337
25    10.520    11.524    13.120    14.611    16.473    19.939    24.337
26    11.160    12.198    13.844    15.379    17.292    20.843    25.336
27    11.808    12.879    14.573    16.151    18.114    21.749    26.336
28    12.461    13.565    15.308    16.928    18.939    22.657    27.336
29    13.121    14.256    16.047    17.708    19.768    23.567    28.336
30    13.787    14.953    16.791    18.493    20.599    24.478    29.336

7 Basic QC Tools - Ishikawa
The seven basic QC tools are the simplest, quickest tools for structured problem solving. In many cases these tools will define the appropriate area in which to focus to solve quality problems. They are an integral part of the Six Sigma DMAIC process toolkit.


• Brainstorming: Allows generation of a high volume of ideas quickly. Generally used integrally with the advocacy team when identifying the potential X’s.

• Pareto: Helps to define the potential vital few X’s. The Pareto links data to problem causes and aids in making data-based decisions (Page 23; a code sketch follows this list).

• Histogram: Displays frequency of occurrence of various categories in chart form; can be used as a first cut at the mean, variation, and distribution of data. An important part of process data analysis (Page 18).

• Cause & Effect / Fishbone Diagram: Helps identify potential problem causes and focus brainstorming (Page 23).

• Flowcharting / Process Mapping: Displays actual steps of a process. Provides a basis for examining potential areas of improvement.

• Scatter Charts: Shows the relationship between two variables (Page 18).

• Check Sheets: Capture data in a format that facilitates interpretation.
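As referenced in the Pareto bullet above, here is a minimal sketch of building a Pareto chart from check-sheet tallies (assumes Python with pandas and matplotlib, which are not part of the original toolkit; the defect categories are made up for illustration):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical check-sheet tallies of defect categories
counts = pd.Series({"Scratch": 42, "Dent": 17, "Missing screw": 11,
                    "Wrong label": 6, "Other": 4}).sort_values(ascending=False)
cum_pct = counts.cumsum() / counts.sum() * 100

fig, ax1 = plt.subplots()
ax1.bar(counts.index, counts.values)          # bars: defect counts, largest first
ax1.set_ylabel("Count")
ax2 = ax1.twinx()
ax2.plot(counts.index, cum_pct.values, marker="o", color="black")  # cumulative line
ax2.set_ylabel("Cumulative %")
ax2.set_ylim(0, 110)
plt.title("Pareto Chart of Defects")
plt.tight_layout()
plt.show()
```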

[Figures: example Pareto Chart and Fishbone (Ishikawa) Diagram.]


Chi-Square Distribution (continued)
(Table entries are χ2 values; column headings are the upper-tail probabilities.)

df      .250      .100      .050      .025      .010      .005      .001
1      1.323     2.706     3.841     5.024     6.635     7.879    10.828
2      2.773     4.605     5.991     7.378     9.210    10.597    13.816
3      4.108     6.251     7.815     9.348    11.345    12.838    16.266
4      5.385     7.779     9.488    11.143    13.277    14.860    18.467
5      6.626     9.236    11.070    12.832    15.086    16.750    20.515
6      7.841    10.645    12.592    14.449    16.812    18.548    22.458
7      9.037    12.017    14.067    16.013    18.475    20.278    24.322
8     10.219    13.362    15.507    17.535    20.090    21.955    26.125
9     11.389    14.684    16.919    19.023    21.666    23.589    27.877
10    12.549    15.987    18.307    20.483    23.209    25.188    29.588
11    13.701    17.275    19.675    21.920    24.725    26.757    31.264
12    14.845    18.549    21.026    23.337    26.217    28.300    32.909
13    15.984    19.812    22.362    24.736    27.688    29.819    34.528
14    17.117    21.064    23.685    26.119    29.141    31.319    36.123
15    18.245    22.307    24.996    27.488    30.578    32.801    37.697
16    19.369    23.542    26.296    28.845    32.000    34.267    39.252
17    20.489    24.769    27.587    30.191    33.409    35.718    40.790
18    21.605    25.989    28.869    31.526    34.805    37.156    42.312
19    22.718    27.204    30.144    32.852    36.191    38.582    43.820
20    23.828    28.412    31.410    34.170    37.566    39.997    45.315
21    24.935    29.615    32.671    35.479    38.932    41.401    46.797
22    26.039    30.813    33.924    36.781    40.289    42.796    48.268
23    27.141    32.007    35.172    38.076    41.638    44.181    49.728
24    28.241    33.196    36.415    39.364    42.980    45.558    51.179
25    29.339    34.382    37.652    40.646    44.314    46.928    52.620
26    30.434    35.563    38.885    41.923    45.642    48.290    54.052
27    31.528    36.741    40.113    43.194    46.963    49.645    55.476
28    32.620    37.916    41.337    44.461    48.278    50.993    56.892
29    33.711    39.087    42.557    45.722    49.588    52.336    58.302
30    34.800    40.256    43.773    46.979    50.892    53.672    59.703


Practical Problem Statement
A major cause of futile attempts to solve a problem is a poor up-front statement of the problem. Define the problem using available facts and the planned improvement.

1. Write an initial “as is” problem statement condition. This statement describes the problem as it exists now. It is a statement of what “hurts” or what “bugs” you. The statement should contain data-based measures of the hurt. For example:

As Is: “The response time for 15% of our service calls is more than 24 hours.”

2. Be sure the problem statement meets the following criteria:
   • Is as specific as possible
   • Contains no potential causes
   • Contains no conclusions or potential solutions
   • Is sufficiently narrow in scope

The most common mistake in developing a Problem Statement is that the problem is stated at too high a level or is too broad for effective investigation. Use the Structure Tree (Page 25), Pareto (Page 25) or Rolled Throughput Yield analysis (Page 14) to break the problem down further.

3. Avoid the following in wording problem statements:

4. Determine if you have identified the correct level to address the problem.

Ask: “Is my ‘Y’ response variable (output) defined at a level at which it can be solved by direct interaction with its independent variables (X’s, the inputs)?”

5. Determine if correcting the “Y” response variable will result in the desired improvement in the problem as stated.

6. Describe the “desired state”, a description of what you want to achieve by solving the problem, as objectively as possible. As with the “as is” statement, be sure the “desired state” is in measurable, observable terms. For example:

Desired State: “The response time for all our service calls is less than 24 hours.”

Avoid | Ineffective Problem Statement | Effective Problem Statement

Questions
  Ineffective: “How can we reduce the downtime on the Assembly Line?”
  Effective:   “Assembly Line downtime currently runs 15% of operating hours.”

The word “lack”
  Ineffective: “We lack word processing software.”
  Effective:   “Material to be typed is backlogged by five days.”

Solution masquerading as a problem
  Ineffective: “We need to hire another warehouse shipping clerk.”
  Effective:   “50% of the scheduled day’s shipments are not being pulled on time.”

Blaming people instead of processes
  Ineffective: “File Clerks aren’t doing their jobs.”
  Effective:   “Files cannot be located within the allowed 5 minutes after requested.”



Defining a Six Sigma Project

A well-defined problem is the first step in a successful project!


Normal Distribution
(Table entries are the tail area to the right of Z; the column adds the second decimal place to the Z value in the row.)

Z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090.0 5.00E-01 4.96E-01 4.92E-01 4.88E-01 4.84E-01 4.80E-01 4.76E-01 4.72E-01 4.68E-01 4.64E-010.1 4.60E-01 4.56E-01 4.52E-01 4.48E-01 4.44E-01 4.40E-01 4.36E-01 4.33E-01 4.29E-01 4.25E-010.2 4.21E-01 4.17E-01 4.13E-01 4.09E-01 4.05E-01 4.01E-01 3.97E-01 3.94E-01 3.90E-01 3.86E-010.3 3.82E-01 3.78E-01 3.75E-01 3.71E-01 3.67E-01 3.63E-01 3.59E-01 3.56E-01 3.52E-01 3.48E-010.4 3.45E-01 3.41E-01 3.37E-01 3.34E-01 3.30E-01 3.26E-01 3.23E-01 3.19E-01 3.16E-01 3.12E-010.5 3.09E-01 3.05E-01 3.02E-01 2.98E-01 2.95E-01 2.91E-01 2.88E-01 2.84E-01 2.81E-01 2.78E-010.6 2.74E-01 2.71E-01 2.68E-01 2.64E-01 2.61E-01 2.58E-01 2.55E-01 2.51E-01 2.48E-01 2.45E-010.7 2.42E-01 2.39E-01 2.36E-01 2.33E-01 2.30E-01 2.27E-01 2.24E-01 2.21E-01 2.18E-01 2.15E-010.8 2.12E-01 2.09E-01 2.06E-01 2.03E-01 2.01E-01 1.98E-01 1.95E-01 1.92E-01 1.89E-01 1.87E-010.9 1.84E-01 1.81E-01 1.79E-01 1.76E-01 1.74E-01 1.71E-01 1.69E-01 1.66E-01 1.64E-01 1.61E-01

1.0  1.59E-01  1.56E-01  1.54E-01  1.52E-01  1.49E-01  1.47E-01  1.45E-01  1.42E-01  1.40E-01  1.38E-01
1.1  1.36E-01  1.34E-01  1.31E-01  1.29E-01  1.27E-01  1.25E-01  1.23E-01  1.21E-01  1.19E-01  1.17E-01
1.2  1.15E-01  1.13E-01  1.11E-01  1.09E-01  1.08E-01  1.06E-01  1.04E-01  1.02E-01  1.00E-01  9.85E-02
1.3  9.68E-02  9.51E-02  9.34E-02  9.18E-02  9.01E-02  8.85E-02  8.69E-02  8.53E-02  8.38E-02  8.23E-02
1.4  8.08E-02  7.93E-02  7.78E-02  7.64E-02  7.49E-02  7.35E-02  7.21E-02  7.08E-02  6.94E-02  6.81E-02
1.5  6.68E-02  6.55E-02  6.43E-02  6.30E-02  6.18E-02  6.06E-02  5.94E-02  5.82E-02  5.71E-02  5.59E-02
1.6  5.48E-02  5.37E-02  5.26E-02  5.16E-02  5.05E-02  4.95E-02  4.85E-02  4.75E-02  4.65E-02  4.55E-02
1.7  4.46E-02  4.36E-02  4.27E-02  4.18E-02  4.09E-02  4.01E-02  3.92E-02  3.84E-02  3.75E-02  3.67E-02
1.8  3.59E-02  3.52E-02  3.44E-02  3.36E-02  3.29E-02  3.22E-02  3.14E-02  3.07E-02  3.01E-02  2.94E-02
1.9  2.87E-02  2.81E-02  2.74E-02  2.68E-02  2.62E-02  2.56E-02  2.50E-02  2.44E-02  2.39E-02  2.33E-02

2.0 2.28E-02 2.22E-02 2.17E-02 2.12E-02 2.07E-02 2.02E-02 1.97E-02 1.92E-02 1.88E-02 1.83E-022.1 1.79E-02 1.74E-02 1.70E-02 1.66E-02 1.62E-02 1.58E-02 1.54E-02 1.50E-02 1.46E-02 1.43E-022.2 1.39E-02 1.36E-02 1.32E-02 1.29E-02 1.26E-02 1.22E-02 1.19E-02 1.16E-02 1.13E-02 1.10E-022.3 1.07E-02 1.04E-02 1.02E-02 9.90E-03 9.64E-03 9.39E-03 9.14E-03 8.89E-03 8.66E-03 8.42E-032.4 8.20E-03 7.98E-03 7.76E-03 7.55E-03 7.34E-03 7.14E-03 6.95E-03 6.76E-03 6.57E-03 6.39E-032.5 6.21E-03 6.04E-03 5.87E-03 5.70E-03 5.54E-03 5.39E-03 5.23E-03 5.09E-03 4.94E-03 4.80E-032.6 4.66E-03 4.53E-03 4.40E-03 4.27E-03 4.15E-03 4.02E-03 3.91E-03 3.79E-03 3.68E-03 3.57E-032.7 3.47E-03 3.36E-03 3.26E-03 3.17E-03 3.07E-03 2.98E-03 2.89E-03 2.80E-03 2.72E-03 2.64E-032.8 2.56E-03 2.48E-03 2.40E-03 2.33E-03 2.26E-03 2.19E-03 2.12E-03 2.05E-03 1.99E-03 1.93E-032.9 1.87E-03 1.81E-03 1.75E-03 1.70E-03 1.64E-03 1.59E-03 1.54E-03 1.49E-03 1.44E-03 1.40E-03

3.0 1.35E-03 1.31E-03 1.26E-03 1.22E-03 1.18E-03 1.14E-03 1.11E-03 1.07E-03 1.04E-03 1.00E-033.1 9.68E-04 9.35E-04 9.04E-04 8.74E-04 8.45E-04 8.16E-04 7.89E-04 7.62E-04 7.36E-04 7.11E-043.2 6.87E-04 6.64E-04 6.41E-04 6.19E-04 5.98E-04 5.77E-04 5.57E-04 5.38E-04 5.19E-04 5.01E-043.3 4.84E-04 4.67E-04 4.50E-04 4.34E-04 4.19E-04 4.04E-04 3.90E-04 3.76E-04 3.63E-04 3.50E-043.4 3.37E-04 3.25E-04 3.13E-04 3.02E-04 2.91E-04 2.80E-04 2.70E-04 2.60E-04 2.51E-04 2.42E-043.5 2.33E-04 2.24E-04 2.16E-04 2.08E-04 2.00E-04 1.93E-04 1.86E-04 1.79E-04 1.72E-04 1.66E-043.6 1.59E-04 1.53E-04 1.47E-04 1.42E-04 1.36E-04 1.31E-04 1.26E-04 1.21E-04 1.17E-04 1.12E-043.7 1.08E-04 1.04E-04 9.97E-05 9.59E-05 9.21E-05 8.86E-05 8.51E-05 8.18E-05 7.85E-05 7.55E-053.8 7.25E-05 6.96E-05 6.69E-05 6.42E-05 6.17E-05 5.92E-05 5.68E-05 5.46E-05 5.24E-05 5.03E-053.9 4.82E-05 4.63E-05 4.44E-05 4.26E-05 4.09E-05 3.92E-05 3.76E-05 3.61E-05 3.46E-05 3.32E-05

4.0 3.18E-05 3.05E-05 2.92E-05 2.80E-05 2.68E-05 2.57E-05 2.47E-05 2.36E-05 2.26E-05 2.17E-054.1 2.08E-05 1.99E-05 1.91E-05 1.82E-05 1.75E-05 1.67E-05 1.60E-05 1.53E-05 1.47E-05 1.40E-054.2 1.34E-05 1.29E-05 1.23E-05 1.18E-05 1.13E-05 1.08E-05 1.03E-05 9.86E-06 9.43E-06 9.01E-064.3 8.62E-06 8.24E-06 7.88E-06 7.53E-06 7.20E-06 6.88E-06 6.57E-06 6.28E-06 6.00E-06 5.73E-064.4 5.48E-06 5.23E-06 5.00E-06 4.77E-06 4.56E-06 4.35E-06 4.16E-06 3.97E-06 3.79E-06 3.62E-064.5 3.45E-06 3.29E-06 3.14E-06 3.00E-06 2.86E-06 2.73E-06 2.60E-06 2.48E-06 2.37E-06 2.26E-064.6 2.15E-06 2.05E-06 1.96E-06 1.87E-06 1.78E-06 1.70E-06 1.62E-06 1.54E-06 1.47E-06 1.40E-064.7 1.33E-06 1.27E-06 1.21E-06 1.15E-06 1.10E-06 1.05E-06 9.96E-07 9.48E-07 9.03E-07 8.59E-074.8 8.18E-07 7.79E-07 7.41E-07 7.05E-07 6.71E-07 6.39E-07 6.08E-07 5.78E-07 5.50E-07 5.23E-074.9 4.98E-07 4.73E-07 4.50E-07 4.28E-07 4.07E-07 3.87E-07 3.68E-07 3.50E-07 3.32E-07 3.16E-07

SPECIFIC: The issue is clearly defined to the lowest level of cause and effect. The project should have a ‘response variable’ (Y) with specifications and constraints (i.e. cycle time for returned parts, washer base width). It should be bound by clearly defined goals. If it looks big, it is. A poorly defined project will require greater scoping time and will have a longer completion time than one that is clearly defined.

VALUE-ADDED: Financially justifiable - directly impacts a business metric that returns value: PPM, reliability, yield, pricing errors, field returns, factory yield, overtime, transportation, warehousing, availability, SCR, rework, under billing and scrap.

MEASURABLE: The ‘response variable’ (Y) must have reasonable historical DATA, or you must have the ability to capture a reliable data stream. Having a method for measuring vital X’s is also essential for in-depth process analysis with data. Discrete data can be effectively used for problem investigation, but ‘variable’ (continuous) data is better. Projects based on unreliable data have unreliable results.

LOCALLY ACTIONABLE: The selected project should be one which can be addressed by the accepted “local” organization. Adequate support is needed to ensure successful project completion and permanent change to the process. It is difficult to “manage improvements in Louisville from the field.”

CUSTOMER FOCUSED: The Project Y should be clearly linked to a specific customer want or need, and can result in improved customer perception or consumer satisfaction (Customer WOW): on-time delivery, billing accuracy, call answer rate.



Normal Distribution (continued)

Z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.095.0 3.00E-07 2.85E-07 2.71E-07 2.58E-07 2.45E-07 2.32E-07 2.21E-07 2.10E-07 1.99E-07 1.89E-075.1 1.80E-07 1.71E-07 1.62E-07 1.54E-07 1.46E-07 1.39E-07 1.31E-07 1.25E-07 1.18E-07 1.12E-075.2 1.07E-07 1.01E-07 9.59E-08 9.10E-08 8.63E-08 8.18E-08 7.76E-08 7.36E-08 6.98E-08 6.62E-085.3 6.27E-08 5.95E-08 5.64E-08 5.34E-08 5.06E-08 4.80E-08 4.55E-08 4.31E-08 4.08E-08 3.87E-085.4 3.66E-08 3.47E-08 3.29E-08 3.11E-08 2.95E-08 2.79E-08 2.64E-08 2.50E-08 2.37E-08 2.24E-085.5 2.12E-08 2.01E-08 1.90E-08 1.80E-08 1.70E-08 1.61E-08 1.53E-08 1.44E-08 1.37E-08 1.29E-085.6 1.22E-08 1.16E-08 1.09E-08 1.03E-08 9.78E-09 9.24E-09 8.74E-09 8.26E-09 7.81E-09 7.39E-095.7 6.98E-09 6.60E-09 6.24E-09 5.89E-09 5.57E-09 5.26E-09 4.97E-09 4.70E-09 4.44E-09 4.19E-095.8 3.96E-09 3.74E-09 3.53E-09 3.34E-09 3.15E-09 2.97E-09 2.81E-09 2.65E-09 2.50E-09 2.36E-095.9 2.23E-09 2.11E-09 1.99E-09 1.88E-09 1.77E-09 1.67E-09 1.58E-09 1.49E-09 1.40E-09 1.32E-09

6.0 1.25E-09 1.18E-09 1.11E-09 1.05E-09 9.88E-10 9.31E-10 8.78E-10 8.28E-10 7.81E-10 7.36E-106.1 6.94E-10 6.54E-10 6.17E-10 5.81E-10 5.48E-10 5.16E-10 4.87E-10 4.59E-10 4.32E-10 4.07E-106.2 3.84E-10 3.61E-10 3.40E-10 3.21E-10 3.02E-10 2.84E-10 2.68E-10 2.52E-10 2.38E-10 2.24E-106.3 2.11E-10 1.98E-10 1.87E-10 1.76E-10 1.66E-10 1.56E-10 1.47E-10 1.38E-10 1.30E-10 1.22E-106.4 1.15E-10 1.08E-10 1.02E-10 9.59E-11 9.02E-11 8.49E-11 7.98E-11 7.51E-11 7.06E-11 6.65E-116.5 6.25E-11 5.88E-11 5.53E-11 5.20E-11 4.89E-11 4.60E-11 4.32E-11 4.07E-11 3.82E-11 3.59E-116.6 3.38E-11 3.18E-11 2.98E-11 2.81E-11 2.64E-11 2.48E-11 2.33E-11 2.19E-11 2.06E-11 1.93E-116.7 1.82E-11 1.71E-11 1.60E-11 1.51E-11 1.42E-11 1.33E-11 1.25E-11 1.17E-11 1.10E-11 1.04E-116.8 9.72E-12 9.13E-12 8.57E-12 8.05E-12 7.56E-12 7.10E-12 6.66E-12 6.26E-12 5.87E-12 5.52E-126.9 5.18E-12 4.86E-12 4.56E-12 4.28E-12 4.02E-12 3.77E-12 3.54E-12 3.32E-12 3.12E-12 2.93E-12

7.0 2.75E-12 2.58E-12 2.42E-12 2.27E-12 2.13E-12 2.00E-12 1.87E-12 1.76E-12 1.65E-12 1.55E-127.1 1.45E-12 1.36E-12 1.28E-12 1.20E-12 1.12E-12 1.05E-12 9.88E-13 9.26E-13 8.69E-13 8.15E-137.2 7.64E-13 7.16E-13 6.72E-13 6.30E-13 5.90E-13 5.54E-13 5.19E-13 4.86E-13 4.56E-13 4.28E-137.3 4.01E-13 3.76E-13 3.52E-13 3.30E-13 3.09E-13 2.90E-13 2.72E-13 2.55E-13 2.39E-13 2.24E-137.4 2.10E-13 1.96E-13 1.84E-13 1.72E-13 1.62E-13 1.51E-13 1.42E-13 1.33E-13 1.24E-13 1.17E-137.5 1.09E-13 1.02E-13 9.58E-14 8.98E-14 8.41E-14 7.87E-14 7.38E-14 6.91E-14 6.47E-14 6.06E-147.6 5.68E-14 5.32E-14 4.98E-14 4.66E-14 4.37E-14 4.09E-14 3.83E-14 3.58E-14 3.36E-14 3.14E-147.7 2.94E-14 2.76E-14 2.58E-14 2.42E-14 2.26E-14 2.12E-14 1.98E-14 1.86E-14 1.74E-14 1.63E-147.8 1.52E-14 1.42E-14 1.33E-14 1.25E-14 1.17E-14 1.09E-14 1.02E-14 9.58E-15 8.97E-15 8.39E-157.9 7.85E-15 7.35E-15 6.88E-15 6.44E-15 6.02E-15 5.64E-15 5.28E-15 4.94E-15 4.62E-15 4.32E-15

8.0  4.05E-15  3.79E-15  3.54E-15  3.31E-15  3.10E-15  2.90E-15  2.72E-15  2.54E-15  2.38E-15  2.22E-15
8.1  2.08E-15  1.95E-15  1.82E-15  1.70E-15  1.59E-15  1.49E-15  1.40E-15  1.31E-15  1.22E-15  1.14E-15
8.2  1.07E-15  9.99E-16  9.35E-16  8.74E-16  8.18E-16  7.65E-16  7.16E-16  6.69E-16  6.26E-16  5.86E-16
8.3  5.48E-16  5.12E-16  4.79E-16  4.48E-16  4.19E-16  3.92E-16  3.67E-16  3.43E-16  3.21E-16  3.00E-16
8.4  2.81E-16  2.62E-16  2.45E-16  2.30E-16  2.15E-16  2.01E-16  1.88E-16  1.76E-16  1.64E-16  1.54E-16
8.5  1.44E-16  1.34E-16  1.26E-16  1.17E-16  1.10E-16  1.03E-16  9.60E-17  8.98E-17  8.40E-17  7.85E-17
8.6  7.34E-17  6.87E-17  6.42E-17  6.00E-17  5.61E-17  5.25E-17  4.91E-17  4.59E-17  4.29E-17  4.01E-17
8.7  3.75E-17  3.51E-17  3.28E-17  3.07E-17  2.87E-17  2.68E-17  2.51E-17  2.35E-17  2.19E-17  2.05E-17
8.8  1.92E-17  1.79E-17  1.68E-17  1.57E-17  1.47E-17  1.37E-17  1.28E-17  1.20E-17  1.12E-17  1.05E-17
8.9  9.79E-18  9.16E-18  8.56E-18  8.00E-18  7.48E-18  7.00E-18  6.54E-18  6.12E-18  5.72E-18  5.35E-18

9.0 5.00E-18 4.68E-18 4.37E-18 4.09E-18 3.82E-18 3.57E-18 3.34E-18 3.13E-18 2.92E-18 2.73E-189.1 2.56E-18 2.39E-18 2.23E-18 2.09E-18 1.95E-18 1.83E-18 1.71E-18 1.60E-18 1.49E-18 1.40E-189.2 1.31E-18 1.22E-18 1.14E-18 1.07E-18 9.98E-19 9.33E-19 8.73E-19 8.16E-19 7.63E-19 7.14E-199.3 6.67E-19 6.24E-19 5.83E-19 5.46E-19 5.10E-19 4.77E-19 4.46E-19 4.17E-19 3.90E-19 3.65E-199.4 3.41E-19 3.19E-19 2.98E-19 2.79E-19 2.61E-19 2.44E-19 2.28E-19 2.14E-19 2.00E-19 1.87E-199.5 1.75E-19 1.63E-19 1.53E-19 1.43E-19 1.34E-19 1.25E-19 1.17E-19 1.09E-19 1.02E-19 9.56E-209.6 8.94E-20 8.37E-20 7.82E-20 7.32E-20 6.85E-20 6.40E-20 5.99E-20 5.60E-20 5.24E-20 4.90E-209.7 4.58E-20 4.29E-20 4.01E-20 3.75E-20 3.51E-20 3.28E-20 3.07E-20 2.87E-20 2.69E-20 2.52E-209.8 2.35E-20 2.20E-20 2.06E-20 1.93E-20 1.80E-20 1.69E-20 1.58E-20 1.48E-20 1.38E-20 1.29E-209.9 1.21E-20 1.13E-20 1.06E-20 9.90E-21 9.26E-21 8.67E-21 8.11E-21 7.59E-21 7.10E-21 6.64E-21

10.0 6.22E-21 5.82E-21 5.44E-21 5.09E-21 4.77E-21 4.46E-21 4.17E-21 3.91E-21 3.66E-21 3.42E-21

Six Sigma Problem Solving Processes

Step | Description | Focus | Tools | Deliverables

Define
A | Identify Project CTQs | Y | VOC; Process Map; CAP | Project CTQs (1)
B | Develop Team Charter | Project | CAP | Approved Charter (2)
C | Define Process Map | Y=f(x) | Process Map | High Level Process Map (3)

Measure
1 | Select CTQ Characteristics | Y | VOC; QFD; FMEA | Project Y (4)
2 | Define Performance Standards | Y | VOC, Blueprints | Performance Standard for Project Y (5)
3 | Measurement System Analysis | Y & X | Continuous Gage R&R; Test/Retest, Attribute R&R | Data Collection Plan & MSA (6), Data for Project Y (7)

Analyze
4 | Establish Process Capability | Y | Capability Indices | Process Capability for Project Y (8)
5 | Define Performance Objectives | Y | Team, Benchmarking | Improvement Goal for Project Y (9)
6 | Identify Variation Sources | X | Process Analysis, Graphical Analysis, Hypothesis Tests | Prioritized List of all Xs (10)

Improve
7 | Screen Potential Causes | X | DOE-Screening | List of Vital Few Xs (11)
8 | Discover Variable Relationships | X | Factorial Designs | Proposed Solution (13)
9 | Establish Operating Tolerances | X | Simulation | Piloted Solution (14)

Control
10 | Define & Validate Measurement System on X’s in Actual Application | X, Y | Continuous Gage R&R, Test/Retest, Attribute R&R | MSA
11 | Determine New Process Capability | X, Y | Capability Indices | Process Capability Y, X
12 | Implement Process Control | X | Control Charts; Mistake Proofing; FMEA | Sustained Solution (15), Project Documentation (16)


t-Distribution
(Table entries are t values for cumulative probability 1-α.)

df     .600    .700    .800    .900    .950    .975    .990    .995
1     0.325   0.727   1.376   3.078   6.314  12.706  31.821  63.657
2     0.289   0.617   1.061   1.886   2.920   4.303   6.965   9.925
3     0.277   0.584   0.978   1.638   2.353   3.182   4.541   5.841
4     0.271   0.569   0.941   1.533   2.132   2.776   3.747   4.604
5     0.267   0.559   0.920   1.476   2.015   2.571   3.365   4.032
6     0.265   0.553   0.906   1.440   1.943   2.447   3.143   3.707
7     0.263   0.549   0.896   1.415   1.895   2.365   2.998   3.499
8     0.262   0.546   0.889   1.397   1.860   2.306   2.896   3.355
9     0.261   0.543   0.883   1.383   1.833   2.262   2.821   3.250
10    0.260   0.542   0.879   1.372   1.812   2.228   2.764   3.169
11    0.260   0.540   0.876   1.363   1.796   2.201   2.718   3.106
12    0.259   0.539   0.873   1.356   1.782   2.179   2.681   3.055
13    0.259   0.538   0.870   1.350   1.771   2.160   2.650   3.012
14    0.258   0.537   0.868   1.345   1.761   2.145   2.624   2.977
15    0.258   0.536   0.866   1.341   1.753   2.131   2.602   2.947
16    0.258   0.535   0.865   1.337   1.746   2.120   2.583   2.921
17    0.257   0.534   0.863   1.333   1.740   2.110   2.567   2.898
18    0.257   0.534   0.862   1.330   1.734   2.101   2.552   2.878
19    0.257   0.533   0.861   1.328   1.729   2.093   2.539   2.861
20    0.257   0.533   0.860   1.325   1.725   2.086   2.528   2.845
21    0.257   0.532   0.859   1.323   1.721   2.080   2.518   2.831
22    0.256   0.532   0.858   1.321   1.717   2.074   2.508   2.819
23    0.256   0.532   0.858   1.319   1.714   2.069   2.500   2.807
24    0.256   0.531   0.857   1.318   1.711   2.064   2.492   2.797
25    0.256   0.531   0.856   1.316   1.708   2.060   2.485   2.787
26    0.256   0.531   0.856   1.315   1.706   2.056   2.479   2.779
27    0.256   0.531   0.855   1.314   1.703   2.052   2.473   2.771
28    0.256   0.530   0.855   1.313   1.701   2.048   2.467   2.763
29    0.256   0.530   0.854   1.311   1.699   2.045   2.462   2.756
30    0.256   0.530   0.854   1.310   1.697   2.042   2.457   2.750
40    0.255   0.529   0.851   1.303   1.684   2.021   2.423   2.704
60    0.254   0.527   0.848   1.296   1.671   2.000   2.390   2.660
120   0.254   0.526   0.845   1.289   1.658   1.980   2.358   2.617
∞     0.253   0.524   0.842   1.282   1.645   1.960   2.326   2.576
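As with the F table, these t and χ2 critical values can be reproduced programmatically. A minimal sketch (assumes Python with scipy, which is not part of the original toolkit):

```python
from scipy.stats import t, chi2

# t value for cumulative probability 0.975 (two-sided 95% CI) with 10 df
print(round(t.ppf(0.975, df=10), 3))    # 2.228, matching the t table

# Chi-square critical value with upper-tail area 0.05 and 10 df
print(round(chi2.isf(0.05, df=10), 3))  # 18.307, matching the chi-square table
```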


Index

Analysis and Improve Tools Selection Matrix · · · 26
ANOVA
  ANOVA / ANOVA One Way · · · 41
  ANOVA Two Way · · · 42
  ANOVA - Balanced · · · 43
  Interpreting the ANOVA Output · · · 44
Calculating Sample Size (Equation for manual Calculation) · · · 28
Characterizing the Process - Rational Subgrouping · · · 16
Control Chart Constants · · · 59
Control Charts · · · 57-58
Data Validity Studies / % Agreement on Binary (Pass / Fail) Data · · · 10
Defining a Six Sigma Project · · · 4
Definition of Z · · · 8
Design for Six Sigma
  Loop Diagrams · · · 50
  Tolerancing Analysis · · · 51-52
Discrete Data Analysis · · · 35
DOE
  Design of Experiments · · · 53
  Factorial Designs · · · 54
  DOE Analysis · · · 55
DPU / DPO · · · 13
Gage R & R · · · 11-12
General Linear Model · · · 45
Hypothesis Statements · · · 30-31
Hypothesis Testing · · · 29
Minitab Graphics
  Histogram / Scatter Plot · · · 20
  Descriptive Statistics / Normal Plot · · · 21
  One Variable Regression / Residual Plots · · · 22
  Boxplot / Interval Plot · · · 23
  Time Series Plot / Box-Cox Transformation · · · 24
  Pareto Diagrams / Cause & Effect Diagrams · · · 25
Normal Approximation · · · 36
χ2 Test (Test for Independence) · · · 37-38
Poisson Approximation · · · 39
Normality of Data · · · 17
Planning Questions · · · 19
Practical Problem Statement · · · 5
Precontrol · · · 60
Project Closure · · · 61
Regression Analysis
  Regression · · · 46
  Stepwise Regression · · · 47
  Regression with Curves (Quadratic) and Interactions · · · 48
  Binary Logistic Regression · · · 49
Response Surface - CCD · · · 56
Rolled Throughput Yield · · · 14
Sample Size Determination · · · 27
Seven Basic Tools · · · 6
Six Sigma Problem Solving Processes · · · 3
Six Sigma Process Report · · · 18
Six Sigma Product Report · · · 15
Stable Ops and 6 Sigma · · · 9
t Test (Testing Means) (1 Sample t; 2 Sample t; Confidence Intervals) · · · 33-34
Tables
  Determining Sample Size · · · 62
  F Test · · · 63-64
  χ2 Test · · · 65-66
  Normal Distribution · · · 67-68
  t Test · · · 69
Testing Equality of Variance (F test; Homogeneity of Variance) · · · 32
The Normal Curve · · · 7
The Transfer Function · · · 40


The material in this Toolkit is a combination of material developed by the GEA Master Black Belts and Dr. Mikel Harry (The Six Sigma Academy, Inc.). Worksheets, statistical tables and graphics are outputs of MINITAB for Windows Version 12.2, Copyright 1998, Minitab, Inc. It is intended for use as a quick reference for trained Black Belts and Green Belts.

More detailed information is available from the Quality Coach Website, SSQC.ge.com.

If you need more GEA Six Sigma information, visit the GE Appliances Six Sigma Website at

http://genet.appl.ge.com/sixsigma

For information on GE Corporate Certification Testing, go to the Green Belt Training Site via the GE Appliances Six Sigma Website.

For information about other GE Appliances Six Sigma Training, contact a member of the GEA Six Sigma Training Team:

• Jeff Keller - Ext 7649, Email: [email protected]
• Irene Ligon - Ext 4562, Email: [email protected]
• Broadcast Group eMail: [email protected]

The Toolkit - A Six Sigma Resource

GLOSSARY OF SIX SIGMA TERMS
1. α - alpha risk - Probability of falsely accepting the alternative (HA) of difference
2. ANOVA - Analysis of Variance
3. β - Beta risk - Probability of falsely accepting the null hypothesis (H0) of no difference
4. χ2 - Tests for independent relationship between two discrete variables
5. δ - Difference between two means
6. DOE - Design of Experiments
7. DPU - Defects per unit
8. e^(-DPU) - Rolled throughput yield
9. F-Test - Used to compare the variances of two distributions
10. g - number of subgroups
11. FIT - The point estimate of the mean response for each level of the independent variable
12. H0 - Null hypothesis
13. HA - Alternative hypothesis
14. LSL - Lower spec limit
15. µ - Population mean
16. µ̂ - Sample mean
17. n - number of samples in a subgroup
18. N - Number in the total population
19. P Value - If the calculated value of p is lower than the alpha (α) risk, then reject the null hypothesis and conclude that there is a difference. Often referred to as the “observed level of significance”.
20. Residual - The difference between the observed values and the Fit; the error in the model
21. σ - Population standard deviation
22. Σ - Summation
23. s (σ̂) - Sample standard deviation
24. Stratify - Divide or arrange data in organized classes or segments, based on known characteristics or factors
25. SS - Sum of squares
26. t-Test - Used to compare the means of two distributions
27. Transfer Function - Prediction Equation - Y=f(x)
28. USL - Upper spec limit
29. X̄ (X-bar) - mean
30. X̿ (X-double-bar) - mean of the means
31. Z - Transforms a set of data such that µ=0 and σ=1
32. ZLT - Z Long term
33. ZST - Z short term
34. ZSHIFT - ZST − ZLT

Revision 4.5 - September 2001. GE Appliances Copyright 2001.


GE Appliances

Six Sigma Toolkit


Rev 4.5 9/2001 GE Appliances Proprietary