Green Belt Introduction to Hypothesis Testing - Lean...
SIMPLER. FASTER. BETTER. LESS COSTLY.
Transforming the Public Sector
Green Belt
Introduction to Hypothesis Testing
DMAIC Flow
Analyze:
– ID Potential Sources of Variation
– Characterize the X’s
– Determine Significant X’s
Analyze Purpose: To determine the root causes, estimate population parameters with confidence intervals, and construct hypotheses about the data and test them to determine significance.
Hypothesis Test
Objectives
• Define hypothesis and hypothesis testing
• Understand the role hypothesis testing and inferential statistics play in a Six Sigma project
• Learn how to formally state a hypothesis
• Decide the appropriate settings for your test based on practical considerations
• Learn how to interpret the outcome of a statistical hypothesis test
• Set the stage for later sections on specific types of statistical tests
Quick Review
Basic Stats
• Which characteristics help us to determine the centering of our data?
– Mean, median and mode
• Which characteristics help us to determine the spread of our data?
– Range, variance and standard deviation
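As a quick illustration of these measures, here is a minimal sketch using Python's standard `statistics` module. The data values are made up for the example and are not from the course material.

```python
import statistics

# Hypothetical processing times in minutes (illustrative values only)
times = [4, 6, 6, 7, 9, 10]

# Centering
mean = statistics.mean(times)      # arithmetic average
median = statistics.median(times)  # middle value of the sorted data
mode = statistics.mode(times)      # most frequent value

# Spread
data_range = max(times) - min(times)   # largest minus smallest
variance = statistics.variance(times)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(times)      # sample standard deviation

print(mean, median, mode, data_range, variance, std_dev)
```

Note that `variance` and `stdev` treat the data as a sample; `pvariance` and `pstdev` would be the choice if the list were the entire population.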
Population vs. Sample
• Population: described by a parameter
• Sample: described by a statistic
Sampling Terminology
Sample
Sample Size: n = 12
Number of Samples = 16
Central Limit Theorem
One of the most important foundational concepts in inferential statistics
The Central Limit Theorem states that, as sample size (n) gets larger:
• The sample means tend to follow a normal distribution
• The sample means tend to cluster more tightly around the true population mean
This holds true regardless of the shape of the original population’s distribution (provided its variance is finite)
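The theorem can be seen in a small simulation. The sketch below is only an illustration: the exponential population (true mean 1.0), the sample sizes and the number of samples are all assumptions chosen for the demo.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of one sample of size n from a skewed (exponential) population."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

centers, spreads = {}, {}
for n in (2, 12, 50):
    # Draw 2000 independent samples of size n and keep each sample's mean
    means = [sample_mean(n) for _ in range(2000)]
    centers[n] = statistics.mean(means)   # where the sample means cluster
    spreads[n] = statistics.stdev(means)  # how tightly they cluster
    print(f"n={n}: center={centers[n]:.3f}, spread={spreads[n]:.3f}")
```

Even though the population is far from normal, the sample means stay centered near the population mean, and their spread shrinks as n grows.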
What is Hypothesis Testing?
What Is A Hypothesis?
• We pose informal hypotheses on a daily basis:
– I bet that 71 will have less traffic than 315 today
– If I push the crosswalk button a bunch of times, it will make the light change sooner
– I bet that more people prefer the taste of Coke over Pepsi
– If I offer my kids merit-based allowance, they will be more likely to help with chores
– I bet that OSU will beat UM again this year
– If we start holding quarterly staff potlucks, it will solve our morale problem
Hypothesis Testing: Let’s Get NUTS!
• State my hypothesis
– All squirrels are eastern gray squirrels
• Collect data
– Record each observation
• Analyze data
– Statistical tools
• Decide: What does my data analysis tell me? Do squirrels really only come in one species/color?
Hypothesis Testing
• A component of inferential statistics used to test a claim (hypothesis) about a population under study, based on evidence from samples
• Investigates characteristics or ‘parameters’ of the population:
– Centering (mean)
– Spread (variance)
– Percentage of occurrence (proportion)
• Compares the characteristic to that of:
– An established standard or target
– One or more other samples
Most common
Why Hypothesis Test?
• Allows us to make inferences on our population based on samples
• Helps us move beyond reliance on anecdotal evidence or assumptions
• Enables data-driven decision making
• Can help us evaluate critical factors (Xs) and their effect on our output (Y)
• Can help us determine whether observed differences are due to common cause or special cause
Visualizing Hypothesis Testing
[Figure: population mean μ with samples x1, x2 and x3]
More Practical Examples:
• If we re-arranged our employees into work cells, would it improve our application processing time?
• Is the percentage of no-show appointments this year significantly different from last year’s percentage?
• If we made specific improvements to our customer forms, would it reduce errors and incomplete fields?
• Is there a significant difference in performance between employees (or teams, or regions, etc.)?
• Would changing the wording of our survey participation request increase our response rate?
Hypothesis testing can help us answer questions like these!
Stating Your Hypothesis
Hypothesis Statements
A hypothesis test involves two distinct and opposing hypothesis statements:
• Null Hypothesis (H0)
• Alternative Hypothesis (Ha)
The Null Hypothesis (H0)
• Baseline (default) argument
• Asserts status quo and equality:
– No difference between two means
– No significant change compared to baseline
– One mean equals another
• Assume to be true, but aim to disprove
• Similar to a defendant on trial: Innocent until proven guilty
The Alternative Hypothesis (Ha)
• Claim we believe or wish to accept as true
• Carries the burden of proof
• We accept the alternative hypothesis only if there is sufficient evidence to reject the null hypothesis (i.e. to render a guilty verdict on our defendant)
• What constitutes sufficient evidence? More on that later…
Hypothesis Fundamentals
• Taken together, the null and alternative hypotheses cover the entire domain of possible outcomes
• The H0 always contains some form of equality symbol (=, ≤ or ≥)
• The Ha always contains an inequality symbol (≠, < or >)
Hypothesis Notation Example
A common example of a hypothesis test is to determine whether or not a treated sample group (1) has a mean which is statistically different from the non-treated (control) group (2):
• Null Hypothesis: There is no difference between groups 1 and 2.
– H0: μ1 = μ2
• Alternative Hypothesis: There is a difference between groups 1 and 2.
– Ha: μ1 ≠ μ2
Two-Tailed Test
• With a ‘not equal’ (≠) in the Ha, this implies that we’re looking for change in either direction (positive or negative)
• This is referred to as a two-tailed test, meaning both tails of the curve make up the rejection region (the area in which we reject the H0 and accept the Ha)
[Figure: curve centered at μ1; H0 region in the middle, Ha rejection regions in both tails]
H0: μ1 = μ2
Ha: μ1 ≠ μ2
Another Example
• We would like to get out of class early today (as opposed to ending on time or later than usual).
• The null hypothesis is:
– Today’s end time ≥ Standard end time
– H0: T_today ≥ T_std
– Status quo & Equality
• The alternative hypothesis is:
– Today’s end time < Standard end time
– Ha: T_today < T_std
– Claim we believe or wish to accept as true
One-Tailed Test
• Here, our rejection region occupies only one tail, on one side of the mean (in this case, it’s in the negative or left direction)
• This is referred to as a one-tailed test
– More specifically, this example could be referred to as a one-sided, left-tailed test
[Figure: curve centered at T_std; the Ha rejection region is the left tail, and the rest of the curve is the H0 region]
H0: T_today ≥ T_std
Ha: T_today < T_std
Tailedness of Tests
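[Figure: one-tailed test, curve centered at μ1 with the Ha rejection region in one tail]
H0: μ1 ≥ μ2
Ha: μ1 < μ2
[Figure: two-tailed test, curve centered at μ1 with Ha rejection regions in both tails]
H0: μ1 = μ2
Ha: μ1 ≠ μ2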
• Whether you have a One- or Two-Tailed Test depends on the equality/inequality symbols used in your hypothesis statements
• The symbols you use depend on practical considerations of what you’re testing for
DOP A&A Section: Hypothesis Statement Practice
Write a Hypothesis Test for the Scenarios:
The DOP section receives two types of Slow Forms: Renewal and Initials. We want to see if there is a difference in the number of errors between these two forms.
• H0: Renewal Form errors = Initial Form errors
• Ha: Renewal Form errors ≠ Initial Form errors
• Is this a one- or two-tailed test?
– Two-tailed
DOP A&A Section: Hypothesis Statement Practice
Write a Hypothesis Test for the Scenarios:
The DOP section performs a Poka Yoke activity on their forms. They want to see if the changes they made reduced the time it takes to process the forms.
• H0: Poka Yoked process time ≥ Original process time
• Ha: Poka Yoked process time < Original process time
• Is this a one- or two-tailed test?
– One-tailed
To Reject or Not To Reject
Scientific Inquiry & Falsifiability
• We either Reject the H0 or Fail to Reject the H0.
– Accepting the H0 is technically not a thing.
• Hypothesis testing is a form of scientific inquiry
• A scientific claim needs to be falsifiable (having the potential to be proven wrong by evidence)
• Sufficient evidence from a sample can be used to prove a claim wrong
• In contrast, to prove a claim definitively correct would require testing the entire population (which defeats the purpose of the test)
Scientific Inquiry & Falsifiability
I claim that all squirrels are eastern gray squirrels
• Is the claim falsifiable?
– Yes!
• What evidence could I provide to prove my claim true?
– Can’t realistically be proven true, only shown to be more plausible
• But if it can’t be proven true, does that make it a bad claim?
– No! The claim stands as long as the evidence fails to reject it
Scientific Inquiry & Falsifiability
I claim that all squirrels are eastern gray squirrels
• What would constitute sufficient evidence to prove my claim false?
– A single squirrel of a different species
Scientific Inquiry & Falsifiability
In quantifiable terms, where n_s = number of squirrel species:
• What might my null hypothesis be?
– H0: n_s = 1
– Status quo and equality
– Assume true, aim to disprove
• What might my alternative hypothesis be?
– Ha: n_s > 1
– Accept only if we reject the null
• Is this a one- or two-tailed test?
– One-tailed
– (pun not intended, but I’ll take the credit anyway)
Three Steps
Hypothesis Testing Steps
1: Write your Hypothesis (H0 and Ha)
2: Collect Data (a sample of reality)
3: Run the Test and Decide: What does the evidence suggest? Reject H0? Or Fail to Reject H0?
Step One: Write the Hypothesis
BMV uses a form for processing registration renewals. They decide to poka-yoke the form and conduct a pilot to see if the changes had an impact (positive or negative) on their processing time.
• H0: Time to process original form = Time to process poka-yoked form
• Ha: Time to process original form ≠ Time to process poka-yoked form
Step Two: Collect Data
What data do you need to collect?
• Samples of original (baseline) processing time
• Samples of poka-yoked (pilot) processing time
Step Three: Decide
• What does the evidence suggest?
• H0: Time to process original form (6 min) = Time to process poka-yoked form (2 min)
• Ha: Time to process original form (6 min) ≠ Time to process poka-yoked form (2 min)
Likely outcome: Reject the null
What About Now?
• What does the evidence suggest?
• H0: Time to process original form (6 min) = Time to process poka-yoked form (5.7 min)
• Ha: Time to process original form (6 min) ≠ Time to process poka-yoked form (5.7 min)
Do we have “sufficient evidence?”
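One way to see why the answer is not obvious: whether a 0.3-minute gap counts as evidence depends on how much the processing times vary and how many forms were timed. The sketch below is an illustration only. The standard deviations and sample sizes are hypothetical, and it assumes both groups share the same spread and size.

```python
import math

def standard_error_of_difference(sd, n):
    """Standard error of the difference between two sample means,
    assuming both groups share standard deviation `sd` and size `n`."""
    return sd * math.sqrt(2 / n)

difference = 6.0 - 5.7  # minutes, the two means from the slide

# Hypothetical (sd in minutes, forms per group) combinations, not course data
scenarios = [(2.0, 10), (2.0, 200), (0.2, 10)]
ratios = []
for sd, n in scenarios:
    se = standard_error_of_difference(sd, n)
    ratios.append(difference / se)  # size of the gap in standard-error units
    print(f"sd={sd}, n={n}: gap is {difference / se:.1f} standard errors")
```

The same 0.3-minute gap looks small against noisy data from a handful of forms, and much larger when the data are consistent or the sample is big.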
Significance, Confidence & Risk
Significance Level
• Before testing, we need to define what “sufficient evidence” would look like and how certain we need to be
• This is alpha (α), or Significance Level
– Our specified level of acceptable risk of rejecting the null hypothesis when it is actually true
• We set alpha based on practical considerations in order to strike a balance between test sensitivity, risk and costs
Significance Level
How high or low should I set my alpha?
• Higher alpha values (typically up to 0.2):
– When there’s less risk involved
– Actively seeking potential effects for further exploration
• Mid-range alpha value (0.05):
– Typical default for transactional and service-oriented processes such as in state government
• Lower alpha values (as low as 0.01):
– Best for risk-averse scenarios
– When costs of taking action are very high
Significance Level
In our squirrel example, how might I want to set my alpha if:
• It was extremely important to me that I protect my wager and only pay up if I’m very certain I’m wrong?
– Lower alpha value (0.01)
• I’m interested in being able to detect the possibility of new species?
– Higher alpha value (0.1 – 0.2)
• I want to strike a balance between the two extremes?
– Standard 0.05 alpha value
Visualizing Significance Level
Consider an alpha of 0.05:
One-sided, right-tailed test
Assuming normal distribution
• We accept a 5% risk of rejecting H0 when it’s actually true
• An alpha of 0.05 gives a cut-off point 1.645 standard deviations above the mean (95th percentile)
• If our test result lands in the rejection region (to the right of the cut-off), then we reject the H0, knowing we carry that 5% alpha risk
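The 1.645 cut-off can be checked with the inverse normal distribution. The sketch below uses `NormalDist` from Python's standard library; the two-tailed line is added for comparison and is an assumption about how alpha would be split, not something shown on the slide.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# One-sided, right-tailed test: all of alpha sits in the right tail
right_tail_cutoff = std_normal.inv_cdf(1 - alpha)

# Two-tailed test: alpha is split evenly between both tails
two_tail_cutoff = std_normal.inv_cdf(1 - alpha / 2)

print(round(right_tail_cutoff, 3), round(two_tail_cutoff, 3))
```

Splitting alpha across two tails pushes each cut-off further from the mean, so a two-tailed test needs a more extreme result to reject the H0 in a given direction.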
Confidence Level
The complement of alpha (1 - α) is known as the Confidence Level
• Confidence level determines our confidence interval, which coincides with our non-rejection region
• If our alpha is 0.05, we have a 0.95 confidence level
• If the null hypothesis is true, results from random samples will fall within our non-rejection region (the confidence interval) 95% of the time
Confidence/Significance Trade-Offs
Why not just set our confidence level very high all the time? Surely that can’t be a bad thing, right?
• As confidence level goes up, precision goes down
– The confidence interval gets wider: we’re more confident that it captures our true population mean, but it pins down the value of the mean less precisely
• How might we be able to raise confidence level without sacrificing precision?
– Increase the sample size (Central Limit Theorem!)
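The trade-off can be put in numbers with the margin of error (half the width of the confidence interval). This is a minimal sketch assuming a z-based interval with a known standard deviation; the standard deviation and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(sd, n, confidence):
    """Half-width of a z-based confidence interval for a mean,
    assuming a known standard deviation `sd` and sample size `n`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / math.sqrt(n)

sd = 2.0  # hypothetical standard deviation, in minutes

m95 = margin_of_error(sd, 25, 0.95)       # baseline
m99 = margin_of_error(sd, 25, 0.99)       # higher confidence, same n: wider
m99_big = margin_of_error(sd, 100, 0.99)  # higher confidence, larger n: narrower

print(round(m95, 3), round(m99, 3), round(m99_big, 3))
```

Raising the confidence level alone widens the interval, while quadrupling the sample size halves the margin of error, which is how precision can be recovered.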
Alpha Risk (α) and Beta Risk (β)
• Alpha Risk or Type I error: rejecting the null hypothesis when it is true
– Generally considered a more serious error
– Can lead to taking action when none was needed (conviction of an innocent person)
– Reducing alpha level will reduce your alpha risk (but increase your beta risk)
• Beta Risk or Type II error: failing to reject the null hypothesis when the alternative is true
– Considered a less serious error
– It could mean that we took no action when we probably should have; however, status quo is maintained (acquittal of a guilty person)
– Increasing sample size is one way to reduce beta risk
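Both relationships (lower alpha raises beta, larger samples lower beta) can be checked numerically. The sketch below assumes a one-sided, right-tailed z-test with a known standard deviation; the effect size, standard deviation and sample sizes are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def beta_risk(alpha, effect, sd, n):
    """Type II error rate of a one-sided, right-tailed z-test.

    effect: true shift in the mean, sd: known standard deviation,
    n: sample size (all hypothetical inputs).
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1 - alpha)  # rejection cut-off in z units
    shift = effect * math.sqrt(n) / sd      # true mean's position in z units
    return std_normal.cdf(cutoff - shift)   # chance of falling short of the cut-off

beta_a05 = beta_risk(0.05, effect=0.5, sd=1.0, n=16)
beta_a01 = beta_risk(0.01, effect=0.5, sd=1.0, n=16)     # stricter alpha
beta_big_n = beta_risk(0.05, effect=0.5, sd=1.0, n=64)   # larger sample

print(round(beta_a05, 3), round(beta_a01, 3), round(beta_big_n, 3))
```

With the same effect and sample size, tightening alpha from 0.05 to 0.01 raises the chance of missing a real effect, and a larger sample brings it back down.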
Decision Errors
                                   The Truth: H0 True       The Truth: H0 False
Your Decision: Reject H0           Type I Error (α-Risk)    Correct
Your Decision: Fail to Reject H0   Correct                  Type II Error (β-Risk)
Alpha Risk - Type I Error
                            The Truth: Not Guilty    The Truth: Guilty
Your Decision: Guilty       Type I Error (α-Risk)    Correct
Your Decision: Not Guilty   Correct                  Type II Error (β-Risk)
Beta Risk - Type II Error
                            The Truth: Not Guilty    The Truth: Guilty
Your Decision: Guilty       Type I Error (α-Risk)    Correct
Your Decision: Not Guilty   Correct                  Type II Error (β-Risk)
The Almighty P-Value
P-Value
• P-Value: the probability of getting a test result at least as extreme as the one observed if the null hypothesis were true (that is, by random chance alone rather than special cause)
• A primary result of running a statistical hypothesis test
• In terms of “sufficient evidence”, it’s a bit like the smoking gun (or lack thereof)
P-Values Are Everywhere!
[Figure: Normal Probability Plot of Mach 1 with Anderson-Darling Normality Test results. Average: 10.0799, StDev: 0.943184, N: 25, A-Squared: 0.889, P-Value: 0.020]
P-Values are everywhere, even in Excel!
t-Test: Paired Two Sample for Means
                               Variable 1     Variable 2
Mean                           169.583333     75.33333333
Variance                       18970.44697    852.969697
Observations                   12             12
Pearson Correlation            0.1014423
Hypothesized Mean Difference   0
df                             11
t Stat                         2.368164241
P(T<=t) one-tail               0.018636275
t Critical one-tail            1.795884819
P(T<=t) two-tail               0.037272551
t Critical two-tail            2.20098516
What are P-values Used For?
P-Value ≤ α:
• Ample evidence
• H0 is Rejected
• Support Ha
P-Value > α:
• Little evidence
• H0 is Not Rejected
• H0 is allowed to stand (not proven true)
How Low Must P be?
• P-Value ≤ α: Reject the H0
• P-Value > α: Fail to reject the H0
A NEW MANTRA: “If P is low, the H0 must go”
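The mantra boils down to a single comparison. A minimal sketch follows; the function name `decide` and the default alpha of 0.05 are choices made for this illustration.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p <= alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"

# 0.020 is the p-value from the Anderson-Darling output shown earlier
print(decide(0.020))              # compared against the default alpha of 0.05
print(decide(0.020, alpha=0.01))  # same p-value, stricter alpha
```

The same p-value can lead to different decisions under different alpha values, which is why alpha should be fixed before the test is run.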
What would we Conclude?
t-Test: Paired Two Sample for Means (the same Excel output as on the previous slide): t Stat 2.368164241, P(T<=t) one-tail 0.018636275, P(T<=t) two-tail 0.037272551
The p is low, so… the H0 must go!
Does the conclusion seem logical given the visual representation of the two groups?
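As a sanity check, the t Stat in the Excel output can be reproduced from the summary values alone. The sketch below is not how Excel computes it internally; it relies on the standard identity that the variance of paired differences equals var1 + var2 - 2·r·sd1·sd2, and it uses the two-tail critical value from the output in place of a t-distribution lookup.

```python
import math

# Summary values copied from the Excel output above
mean_1, mean_2 = 169.583333, 75.33333333
var_1, var_2 = 18970.44697, 852.969697
n = 12
r = 0.1014423  # Pearson correlation between the paired observations
t_critical_two_tail = 2.20098516

# Variance of the paired differences: var1 + var2 - 2 * r * sd1 * sd2
var_diff = var_1 + var_2 - 2 * r * math.sqrt(var_1) * math.sqrt(var_2)

# Paired t statistic: mean difference divided by its standard error
t_stat = (mean_1 - mean_2) / math.sqrt(var_diff / n)

# Two-tailed decision: reject H0 if |t| exceeds the critical value
reject_h0 = abs(t_stat) > t_critical_two_tail
print(round(t_stat, 3), reject_h0)
```

Comparing the t statistic to the critical value gives the same decision as comparing the two-tail p-value to alpha; they are two views of the same test.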
Hypothesis Testing: What’s Next?
• Your data is run through a specific hypothesis test, which generates a series of test statistics including a p-value
• The type of test you’ll use will depend on a handful of factors:
– Is your data normally distributed or not?
– What are you evaluating (mean, variance, proportion)?
– What are you comparing your sample to (a standard/goal or other samples)?
• You’ll learn more about these specific tests in upcoming sessions on 1- and 2-variable testing
Hypothesis Testing: In Review
• A hypothesis statement is created for each statistical test performed
• Assume the null hypothesis is true, but look to disprove it
• Choose a significance level (α) based on desired test sensitivity and acceptable risk level
• Choose a sample size small enough to be practical and cost efficient, but big enough to increase precision and reduce risk
• We subject our data to the appropriate statistical test and compare the resulting p-value to our chosen alpha value (remember: If the P is low, the H0 must go!)
• We either reject the null hypothesis (accepting the alternative hypothesis) or fail to reject the null hypothesis (allowing it to stand)
Hypothesis Testing Form
What is the Y? What Type of Data?
What is the X? What Type of Data?
How many “levels” does X have?
Is my data Stable?
What type of tool would you use?
Is my data Normal? (Outliers?)
Comparing Median or Means?
H0: (=)
Ha:
P value: (0.05)
Interpret results:
Questions?