Proportion Testing October 2, 2009 Statistics Symposium.
Proportion Testing
October 2, 2009
Statistics Symposium
Outline
1. What we are covering and what we are not covering today
2. Virtual Scavenger Hunt
3. Statistical Decisions and Risk
4. Six Sigma DMAIC application
5. The Business Approach
6. Hypothesis Test Approach
7. Understanding Distributions
8. Sample Size
9. Test of Independence
10. Example 1: Regulatory Compliance Documentation
11. Example 2: Workload Balance (Productivity)
12. References and Web Sites
13. Q&A
Hypothesis Tests: What we are covering?

Continuous Data
- 1 sample t-test: Δ mean from known test mean
- 2 sample t-test: Δ mean between 2 independent sample means
- Paired t-test: Δ mean between 2 dependent sample means
- One Way ANOVA: At least 1 sample mean Δ between 3 or more samples
- Kruskal Wallis & Mood’s Median: At least 1 sample median Δ between 3 or more samples
- F-test, Levene’s test, & Bartlett’s test: At least 1 sample standard deviation Δ between 3 or more samples
- Correlation/Regression/DOE: 2 or more factors are correlated / Predictor affects the sampled process

Attribute Data
- 1 proportion test: A sample proportion Δ against a known value
- 2 proportion test: Proportions from the two samples are different
- Chi Square test: At least one sample proportion Δ from others
Scavenger Hunt
Find another person who can sign off on these statements. Each person can only sign once.
1. Has used Chi-Square or Proportion Test
2. Has more than $50 on them
3. Used Minitab to determine sample size
4. Worked on a project with a value proposition >$1 million
5. Knows Chris Connors' middle name
6. Has more than three children
7. Has met a movie star or celebrity (and was not arrested)
8. Knows the difference between a confidence interval and a confidence level
9. Knows what a quark or a fantod is
10. Has more than one academic degree, license or certification
Statistical Decision: Setting up your risk level
Type I and II errors
There are two kinds of errors that can be made in significance testing:
(1) a true null hypothesis can be incorrectly rejected and
(2) a false null hypothesis can fail to be rejected.
The former error is called a Type I error and the latter error is called a Type II error. These two types of errors are defined in the table.
The probability of a Type I error is designated by the Greek letter alpha (α) and is called the Type I error rate; the probability of a Type II error (the Type II error rate) is designated by the Greek letter beta (β). A Type II error is only an error in the sense that an opportunity to reject the null hypothesis correctly was lost. It is not an error in the sense that an incorrect conclusion was drawn, since no conclusion is drawn when the null hypothesis is not rejected.

                        True State of the Null Hypothesis
Statistical Decision    H0 True         H0 False
Reject H0               Type I error    Correct
Do not Reject H0        Correct         Type II error
Six Sigma DMAIC method: Hypothesis Tests
Six Sigma DMAIC method has 5 phases:
1. Define Opportunity/Problem
2. Measure Performance
3. Analyze Process and Performance
4. Improve Process and Performance
5. Control Process and Performance
I typically use this diagram to depict the continuous focus of measurement in the Six Sigma method by placing Measure in the center of the DMAIC method.
[Diagram: DMAIC cycle with Measure at the center, surrounded by Define, Analyze, Improve, and Control]
6S Black Belt Level of Cognition for Hypothesis Testing

Topic                                          Level of Cognition   My Development
Introduction to Statistical Comparisons        2*
Normality and Transformation                   2
Correlation Analysis                           3
Regression Analysis                            3
Introduction to Multiple Linear Regression     1
t-tests                                        3
ANOVA                                          3
1 and 2 proportion test                        3
Chi-Square Analysis                            3
Binary Logistic Regression                     1

*1 = learned, 2 = know, 3 = used, 4 = taught
6S BB Level of Cognition for Hypothesis Testing

Topic                                          Level of Cognition   My Development
Introduction Experimental Design               2*
Background on Experimental design              2
DOE Designs and terminology                    2
Full Factorial design                          2
Half factorial designs                         2
Robust Designs                                 2
Checklists for designing and conducting DOE
BBC exercise: DOE Simulation                   2
Results of DOE Simulation                      2

*1 = learned, 2 = know, 3 = used, 4 = taught
The Business Approach
Proportion Tests: When we want to make a statistical comparison of a discrete variable with a target, or between two discrete variables, Proportion Tests should be used.
[Diagram: Business Problem, Statistical Problem, Statistical Solution, Business Solution; annotations: Potential Root Causes Identified, Root Causes Verified]
Selecting the Right Statistical Tool

                 X Discrete            X Continuous
Y Discrete       Proportion Tests      Logistic Regression
Y Continuous     t test, ANOVA, DOE    Correlation, Regression
Tests of Proportion
Determine if a statistically significant difference of proportion exists between:
- A sample and a target
- Two independent samples
- More than two samples

Comparing Proportions
- 1 Sample: 1 Proportion Test
- 2 Samples: 2 Proportion Test
- More Than 2 Samples: Chi-Square Test

Use samples to make inferences about population proportions
Proportion Test Approach
1. State the null and alternative hypotheses

   Null (H0)       Alternative (Ha)   Number of tails
   P1 - P2 = 0     P1 - P2 ≠ 0        2
   P1 - P2 ≥ 0     P1 - P2 < 0        1 (left)
   P1 - P2 ≤ 0     P1 - P2 > 0        1 (right)

2. Formulate an analysis plan: 1 Proportion test against a known value (z) or 2 Proportions test
3. Analyze sample data
   a. Independence Test: Fisher’s, Barnard’s, G-Test
   b. Pooled sample proportion to compute standard error
   c. P value for test statistic
4. Interpret results for a statistical decision (hopefully a business decision, but not always)
If P is low, H0 must be no go
One Tail or Two Tails: Placing the Alpha Risk
Useful Discrete Distributions
Binomial distribution for:
The number X of successes (or failures!) in n trials when p is the chance of success (or failure!) on each trial.
Examples:
• number X of faulty expense reports out of n=100 submitted in a particular month, when the faulty expense report rate typically runs at p=0.03 (i.e., 3%)
• number of voters out of a random sample of n=800 expressing approval of the President’s performance, when the approval rating in the entire population of voters is p=0.42 (i.e., 42%)
X is discrete: it must be one of 0, 1, 2, … , n
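As a rough illustration of the binomial model described above, the sketch below computes P(X = k) and P(X ≤ k) with the Python standard library. The function names `binom_pmf` and `binom_cdf` are illustrative and do not come from the slides; the numbers reuse the expense-report example (n = 100, p = 0.03).

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k), obtained by summing the pmf from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Expense-report example from the slide: n = 100 reports, p = 0.03 faulty rate
n, p = 100, 0.03
print(f"P(X = 0)  = {binom_pmf(0, n, p):.4f}")
print(f"P(X = 3)  = {binom_pmf(3, n, p):.4f}")
print(f"P(X <= 5) = {binom_cdf(5, n, p):.4f}")
```

Because X can only take the values 0, 1, ..., n, the pmf values over that range sum to 1.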
Binomial - key facts
Applications: attribute data
$\hat{p} = X/n$ estimates $p$
Means: $\mu_X = np$ and $\mu_{\hat{p}} = p$
SDs: $\sigma_X = \sqrt{np(1-p)}$ and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$
Useful fact: $\hat{p}$ (like $X$) has approximately a normal distribution when n is large (more than 25 or 30) and np and n(1-p) are not too small (say >5).
Binomial - Normal Approximation
X would be approximately normal, with mean and sd given by:
$\mu_X = np = 750 \times 0.03 = 22.5$
$\sigma_X = \sqrt{np(1-p)} = \sqrt{750 \times 0.03 \times 0.97} = 4.6717$
Approximate area under curve left of X = 12 can be found from the z-score:
$z = (12 - 22.5)/4.6717 = -2.248$
From tables (or Minitab), the area under the normal curve to the left of -2.248 is 0.0123. The exact probability that X ≤ 12 can also be found in Minitab to be 0.0109, so the normal approximation gives a very close answer. What are the chances the site would be so low in “defects” if it were following the defect rate? Pretty low! What would you conclude?
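The arithmetic on this slide can be checked with a short script. This is a minimal sketch using only the Python standard library; the slide itself uses tables or Minitab, and the variable names here are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

# Values taken from the slide: n = 750 trials, defect rate p = 0.03, observed X = 12
n, p, x = 750, 0.03, 12

mean = n * p                      # mu_X = np
sd = sqrt(n * p * (1 - p))        # sigma_X = sqrt(np(1-p))
z = (x - mean) / sd               # z-score of the observed count

# Normal approximation to P(X <= 12): area to the left of z
approx = NormalDist().cdf(z)

# Exact binomial P(X <= 12), summed term by term
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

print(f"mean = {mean:.1f}, sd = {sd:.4f}, z = {z:.3f}")
print(f"normal approximation = {approx:.4f}, exact binomial = {exact:.4f}")
```

No continuity correction is applied, to mirror the hand calculation on the slide.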
Histogram: n=20
[Figure: Histogram of 20 with fitted Normal curve; x-axis from 85 to 115, y-axis: Frequency; Mean 101.1, StDev 7.423, N 20]
Histogram: n=100
[Figure: Histogram of 100 with fitted Normal curve; x-axis from 82.5 to 120.0, y-axis: Frequency; Mean 100.6, StDev 6.913, N 100]
Sample Size
General Guidelines (if not followed, test may not run):
• Each sample includes at least 10 failures and 10 successes (some texts say 5)
• The population is at least 10 times the size of the sample
• Use the Minitab sample size calculator
• Use a TI 83 or TI 84 Graphing Calculator (see web)
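The first two guidelines above are easy to check before running a test. The helper below is a hypothetical sketch (the name `meets_guidelines` and its parameters are not from the slides); it only encodes the rules of thumb as stated and is not a substitute for a proper power calculation.

```python
def meets_guidelines(successes, n, population=None, min_count=10):
    """Check the rule-of-thumb conditions for one sample.

    successes  : number of successes observed in the sample
    n          : sample size
    population : population size, if known (None skips that check)
    min_count  : minimum successes and failures (10, or 5 in some texts)
    """
    failures = n - successes
    enough_counts = successes >= min_count and failures >= min_count
    big_population = population is None or population >= 10 * n
    return enough_counts and big_population

# Example: 74 on-time records out of a sample of 130
print(meets_guidelines(74, 130))                     # counts check only
print(meets_guidelines(74, 130, population=100000))  # counts and population check
```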
Hypothesis testing - terms
Null hypothesis (H0) – e.g., µ1 = µ2 - this is the hypothesis to be tested and should be in the form of a true/false statement . This hypothesis states that there is NO DIFFERENCE between the data sets or samples or populations. Null hypotheses are never accepted – we either reject them or fail to reject them. The null hypothesis has PRIORITY and should not be rejected unless there is strong statistical evidence to do so.
Alternate hypothesis (H1, HA) – e.g., µ1 ≠ µ2 - the alternative to the null hypothesis – states that there IS A DIFFERENCE between the data sets or populations.
Type 1 error – rejecting the null hypothesis when it is really true – e.g., “convicting the innocent”
Type 2 error – failing to reject the null hypothesis when it really is false – e.g., “letting the guilty go free”
Level (or size) of a test = Alpha (α) – is the probability of a type 1 error – default = 5%
Beta (β) – is the probability of a type 2 error – default = 10%
Power of a test or power – is the probability of correctly rejecting a false null hypothesis. Since β is the probability of a type 2 error, power is calculated by the formula (1 - β) when the null hypothesis is false. The default value for power is 90%. This means that you have a 90% chance of finding a difference when one really exists.
Critical region (rejection region) – set of values of the test statistic that cause the null hypothesis to be rejected. If the test statistic falls into the rejection region, the null hypothesis is rejected.
Hypothesis testing steps
• State the null hypothesis H0 and the alternate hypothesis HA (e.g., the mean income of college graduates does not equal that of other people)
• Choose the level of significance, alpha (α default = 0.05) and the sample size (default n = 25)
• Choose the appropriate statistical technique (t test, Chi-square, etc.) and test statistic (e.g., mean)
• Collect the data and calculate the sample value of the test statistic
• Calculate the p value based on the test statistic and compare it with alpha (α = 0.05)
• Make a statistical decision: if the p value is greater than or equal to alpha, fail to reject the null hypothesis. If the p value is less than alpha, reject the null hypothesis.
Hypothesis tests are either one-tailed or two-tailed tests
One-tailed test: answers only ONE question: is the test statistic less than or greater than the known distribution
[Diagram: one-tailed test; a "Fail to Reject H0" region and a single "Reject H0" tail at the 1% or 5% significance level]
Two-tailed test: only asks if the test statistic is different from the known distribution; HA usually has “not equal to” in the wording
[Diagram: two-tailed test; a "Fail to Reject H0" region between two "Reject H0" tails, each at the 2.5% significance level]
Clinical Testing One-tailed example by hand
The “Feel Good” Drug company has discovered a new drug which prevents acne. Since the market for skin care products is larger for women than men, the company would like to be able to show a treatment advantage for women vs men. The company statistician chooses a simple random sample of 110 women and 207 men from a population of 100,000 healthy volunteers. After 6 months, 48% of women had developed acne, vs 61% of men. Can the company claim a benefit for women vs men at the 0.01 level of significance?
1) What are the hypotheses?
2) Calculate the pooled sample proportion and the Standard Error and consult the z-score statistic
3) What do the results tell us?
Clinical Testing One-tailed example by hand
1) What are the hypotheses?
H0: P1 ≥ P2    Ha: P1 < P2
The null hypothesis will be rejected if the proportion of women developing acne (p1) is substantially smaller than the proportion of men developing acne (p2).
2) Calculate the pooled sample proportion and the Standard Error and consult the z-score statistic:
P = (p1 * n1 + p2 * n2)/(n1 + n2) = [(0.48 * 110) + (0.61 * 207)]/(110 + 207) = (52.8 + 126.27)/317 = 0.565
SE = sqrt{ p * (1 - p) * [(1/n1) + (1/n2)] } = sqrt[ 0.565 * 0.435 * (1/110 + 1/207) ] = sqrt[ 0.2458 * 0.01392 ] = 0.0585
Z = (p1 - p2)/SE = (0.48 - 0.61)/0.0585 = -2.22
3) What do the results tell us?
Since this is a one-tailed test, the P value is the probability that the z-score is less than -2.22. From the Normal distribution calculator, P(z < -2.22) = 0.013, so the P value is 0.013. Since 0.013 is greater than the chosen significance level (0.01), WE FAIL TO REJECT THE NULL HYPOTHESIS: THE DATA DO NOT SHOW A STATISTICALLY SIGNIFICANT ADVANTAGE FOR WOMEN AT THE 0.01 LEVEL.
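The hand calculation above can be reproduced with a few lines of Python. This is a minimal sketch using the pooled standard error described on the slide; the function name `two_prop_z_pooled` is illustrative. It carries full precision through every step, so the z-score can differ in the second decimal from a calculation that rounds intermediate values.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_pooled(p1, n1, p2, n2):
    """Pooled two-proportion z statistic; returns (pooled p, SE, z)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return pooled, se, z

# Slide values: 110 women (p1 = 0.48) and 207 men (p2 = 0.61), alpha = 0.01
alpha = 0.01
pooled, se, z = two_prop_z_pooled(0.48, 110, 0.61, 207)

# Left-tailed test (Ha: P1 < P2): p-value is the area to the left of z
p_value = NormalDist().cdf(z)

print(f"pooled p = {pooled:.3f}, SE = {se:.4f}, z = {z:.2f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```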
Test of Independence
Fisher’s Exact Test is most commonly used for 2 x 2 tables to determine if there is a nonrandom relationship between two categorical variables. Fisher’s calculates conditional probability for the observed row and column matrix.
Fisher’s exact test in Minitab:

Trials   Events   %
200      120      60.0%
300      210      70.0%

counts   adverse   drug
120      y         old
80       n         old
210      y         new
90       n         new

Rows: adverse   Columns: drug

        new   old   All
n        90    80   170
y       210   120   330
All     300   200   500

Cell Contents: Count

Fisher's exact test: P-Value = 0.0265193
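For readers without Minitab, a two-sided Fisher's exact test on a 2 x 2 table can be sketched with the standard library. The function below is illustrative (its name and argument order are assumptions); it sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is one common two-sided convention and is assumed, not confirmed, to be the one behind the Minitab output above.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than the observed one;
    # the small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Table from the slide: rows = adverse (n, y), columns = drug (new, old)
p_value = fisher_exact_two_sided(90, 80, 210, 120)
print(f"Fisher's exact test p-value = {p_value:.4f}")
```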
Regulatory Compliance Documentation Sample Size: Minitab
1-Proportion Test
Regulatory Compliance Documentation Example
A Black Belt is studying the company’s ability to get regulatory compliance documentation to the record center within 5 days of project completion.
What is the binomial characteristic?
A random sample of 130 project documentation records showed that 74 of them met the 5 day deadline.
The business was heard saying “at least we’re over the halfway mark!”
Test the hypothesis at 95% confidence that more than 50% of engagements met the deadline.
What is the Null Hypothesis?
Regulatory Compliance Documentation Example - Hypothesis
Ho : The proportion of compliance documentation filed at the record center on time is 50% (interim target value).
Ha : The proportion of compliance documentation filed at the record center on time is greater than 50%.
Note: Typically the alternative is stated as “there is a difference.”
Why does this example state “greater than”?
Compliance Documentation Example – Minitab Commands
Tool Bar Menu > Stat > Basic Statistics > 1 Proportion
[Screenshot: Minitab 1 Proportion analysis dialog; callout: target]
Compliance Documentation Example – Minitab Results
What’s our interpretation?
Test and CI for One Proportion
Test of p = 0.5 vs p > 0.5

                               95% Lower     Exact
Sample    X     N   Sample p       Bound   P-Value
1        74   130   0.569231    0.493309     0.068
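The exact P-value in this output is an upper-tail binomial probability, which can be sketched as follows. The function name `binom_upper_tail` is illustrative; the script computes only the exact one-sided P-value for Ha: p > p0 and does not attempt the exact lower confidence bound.

```python
from math import comb

def binom_upper_tail(x, n, p0):
    """P(X >= x) for X ~ Binomial(n, p0): exact one-sided p-value for Ha: p > p0."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

# Slide values: 74 of 130 records met the deadline; target p0 = 0.5; alpha = 0.05
x, n, p0, alpha = 74, 130, 0.5, 0.05
sample_p = x / n
p_value = binom_upper_tail(x, n, p0)

print(f"sample p = {sample_p:.6f}, exact p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With a P-value above 0.05, the sample does not support the claim that more than half of the records meet the deadline, even though the sample proportion is above 50%.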
Regulatory Compliance Documentation Sample Size
Power and Sample Size
Test for Two Proportions
Testing proportion 1 = proportion 2 (versus <)
Calculating power for proportion 2 = 0.7
Alpha = 0.05

                Sample   Target
Proportion 1      Size    Power   Actual Power
         0.6       388      0.9       0.900148
         0.6       281      0.8       0.800923

The sample size is for each group.
Is the sample size a concern?
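A per-group sample size of this kind can be approximated with a common normal-approximation formula for a one-sided two-proportion test. The sketch below is an assumption about the method, not a statement of what Minitab computes internally; the function name `n_per_group` is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.9):
    """Approximate per-group sample size for a one-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)        # quantile for the target power
    p_bar = (p1 + p2) / 2                       # average proportion under H0
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((numerator / (p1 - p2)) ** 2)

# Slide inputs: proportion 1 = 0.6, proportion 2 = 0.7, alpha = 0.05
for target_power in (0.9, 0.8):
    print(target_power, n_per_group(0.6, 0.7, alpha=0.05, power=target_power))
```

For these inputs the formula gives 388 and 281 per group, in line with the table above.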
2-Proportion Test
Analysis of Proportions for Workload Balance
Jack Lairdieson, MBB, Vanguard
[Figure: August WLB In-Range Proportions with 95% Confidence Bands; y-axis: Proportion, from 0.40 to 0.54; x-axis categories: SPA, West, Southeast, Northeast, NY, Central, Total; second label row: Total, Region 5, Region 6, Region 3, Region 1, Region 3, Region 2]
Interpret as an Interval Plot for Multiple Proportions
Workload Balance Example
The Workload Balance (WLB) metrics were being discussed at a regional meeting. The Region 1 representative scoffed that Region 2’s “In-range” WLB performance metrics were at the “bottom of the barrel”. The Region 2 representative quickly responded, “Really, Region 1 is no better than Region 2.”
Once back at the office, the concerned Region 1 representative gave the following Workload Balance data to a Black Belt.

WLB Stats   In-Range   Staff
Region 1         663    1411
Region 2         141     353

Should the Region 1 representative be concerned about his conclusion? What is the null hypothesis?
Workload Balance Example - Hypothesis
Ho : The proportion of Region 1 “In-Range” staff is
equal to the proportion of Region 2 “In-Range” staff.
Ha : The proportion of Region 1 “In-Range” staff is not
equal to the proportion of Region 2 “In-Range” staff.
or
Ha : The proportion of Region 1 “In-Range” staff is
greater than the proportion of Region 2 “In-Range” staff.
Workload Balance Example – Minitab Commands
Tool Bar Menu > Stat > Basic Statistics > 2 Proportion
Analysis through MINITAB™
Workload Balance Example – Minitab Results
Session Window Output
What’s our interpretation? What Hypothesis did we choose to test?
Is the sample size a concern?
Test and CI for Two Proportions

Sample     X      N   Sample p
1        663   1411   0.469880
2        141    353   0.399433

Difference = p (1) - p (2)
Estimate for difference: 0.0704461
95% lower bound for difference: 0.0223190
Test for difference = 0 (vs > 0): Z = 2.41  P-Value = 0.008
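The figures in this output can be approximated with a short script. This sketch uses the unpooled (separate-variance) standard error, which is an assumption chosen because it is consistent with the Z value and lower bound shown above; a pooled standard error would give a slightly smaller Z. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Slide data: In-Range counts and staff totals for Region 1 and Region 2
x1, n1 = 663, 1411
x2, n2 = 141, 353

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Unpooled standard error of the difference in proportions
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = diff / se

# One-sided test (Ha: p1 > p2): p-value is the area to the right of z
p_value = 1 - NormalDist().cdf(z)

# One-sided 95% lower confidence bound for the difference
lower_bound = diff - NormalDist().inv_cdf(0.95) * se

print(f"difference = {diff:.7f}, z = {z:.2f}, p-value = {p_value:.3f}")
print(f"95% lower bound = {lower_bound:.7f}")
```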
Sample Size: Minitab
Testing proportion 1 = proportion 2 (versus >)
Calculating power for proportion 2 = 0.399
Alpha = 0.05

                Sample   Target
Proportion 1      Size    Power   Actual Power
       0.469       857      0.9       0.900072
       0.469       619      0.8       0.800094

The sample size is for each group.
References
Fisher RA (1925). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh.
Barnard GA (1945). A new test for 2 x 2 tables. Nature 156:177.
Chan I (1998). Exact tests of equivalence and efficacy with non-zero lower bound for comparative studies. Statistics in Medicine 17, 1403-1413.
Mehta CR and Senchaudhuri P (2003). Conditional versus unconditional tests for comparing two binomials. Cytel Software.
Web Sites:
http://www.minitab.com/support/documentation/answers/SampleSize2p.pdf
www.statsoft.com/textbook/stathome
http://sofia.fhda.edu/gallery/statistics/lessons/lesson10-2
Six Sigma Links
Six Sigma
- Motorola, Inc. - Motorola University
- Six Sigma - What is Six Sigma?
- i Six Sigma - Six Sigma Quality Resources for Achieving Six Sigma Results
- General Electric : Our Company : What is Six Sigma?
Quality
- American Society for Quality - ASQ
- TQM Virtual CoursePack
- SPC Press - Home
Statistics
- http://www.statsoft.com/textbook/stathome.html
- Penn State Statistical Education Resource Kit--Overview of Statistics Data
- Statistics Video Course
- The Sofia Open Content Initiative - Elementary Statistics
- Resource: Learning Math: Data Analysis, Statistics, and Probability
Lean Six Sigma
- Kaizen and Lean Manufacturing Consulting: Gemba Research - | Kaizen Products
- Conquering Complexity, Fast Innovation, Lean Six Sigma Quality. George Group Consulting Six Sigma Training Book
- LEAN.org - Lean Enterprise Institute| Lean Production| Lean Manufacturing| LEI| Lean Services| Lean Enterprise Training Course| Lean Consumption| Lean Resources| Lean Experts| Lean Healthcare| Lean in Healthcare| Training on Lean Manufacturing| Lean Business
Excel Statistics Add on
- http://www.qimacros.com/