Green Belt Introduction to Hypothesis Testing - Lean...

SIMPLER. FASTER. BETTER. LESS COSTLY.

Transforming the Public Sector

Green Belt

Introduction to Hypothesis Testing


DMAIC Flow

Analyze:

– ID Potential Sources of Variation
– Characterize the X’s
– Determine Significant X’s

Analyze Purpose: To determine the root causes, estimate population parameters with confidence intervals, and construct hypotheses about the data and test them to determine significance.

Hypothesis Test


Objectives
• Define hypothesis and hypothesis testing
• Understand the role hypothesis testing and inferential statistics play in a Six Sigma project
• Learn how to formally state a hypothesis
• Decide the appropriate settings for your test based on practical considerations
• Learn how to interpret the outcome of a statistical hypothesis test
• Set the stage for later sections on specific types of statistical tests


Quick Review


Basic Stats
• Which characteristics help us to determine the centering of our data?
– Mean, median and mode

• Which characteristics help us to determine the spread of our data?
– Range, variance and standard deviation
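
The centering and spread measures above can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module; the processing times are hypothetical values made up for illustration.

```python
import statistics

# Hypothetical processing times in minutes (illustrative only).
times = [4, 8, 6, 5, 3, 8, 9, 5, 8]

# Centering
mean = statistics.mean(times)
median = statistics.median(times)
mode = statistics.mode(times)

# Spread (sample variance and standard deviation divide by n - 1)
data_range = max(times) - min(times)
variance = statistics.variance(times)
std_dev = statistics.stdev(times)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} variance={variance:.2f} std dev={std_dev:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `pvariance` and `pstdev` are the population versions.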


Population vs. Sample

• Population: described by a Parameter
• Sample: described by a Statistic


Sampling Terminology

Sample

Sample Size: n = 12

Number of Samples = 16


Central Limit Theorem

One of the most important foundational concepts in inferential statistics

The Central Limit Theorem states that, as sample size (n) gets larger:

• The sample means tend to follow a normal distribution
• The sample means tend to cluster around the true population mean

This holds true regardless of the distribution of the original population
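
A quick simulation can make the theorem concrete. The sketch below assumes a deliberately skewed (exponential) population with a true mean of 1.0, which is an illustrative choice and not from the course material; `sample_means` is a hypothetical helper name.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, number_of_samples=2000):
    """Draw repeated samples from a skewed (exponential) population
    with true mean 1.0 and return the mean of each sample."""
    return [
        statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(number_of_samples)
    ]

for n in (2, 12, 50):
    means = sample_means(n)
    print(
        f"n={n:>2}: average of sample means={statistics.mean(means):.3f}, "
        f"spread (std dev) of sample means={statistics.stdev(means):.3f}"
    )
```

As n grows, the average of the sample means should stay near 1.0 while their spread shrinks, even though the individual values are far from normally distributed.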


What is Hypothesis Testing?


What Is A Hypothesis?
• We pose informal hypotheses on a daily basis:
– I bet that 71 will have less traffic than 315 today
– If I push the crosswalk button a bunch of times, it will make the light change sooner
– I bet that more people prefer the taste of Coke over Pepsi
– If I offer my kids merit-based allowance, they will be more likely to help with chores
– I bet that OSU will beat UM again this year
– If we start holding quarterly staff potlucks, it will solve our morale problem


Hypothesis Testing: Let’s Get NUTS!

• State my hypothesis
– All squirrels are eastern gray squirrels

• Collect data
– Record each observation

• Analyze data
– Statistical tools

• Decide: What does my data analysis tell me? Do squirrels really only come in one species/color?


Hypothesis Testing
• A component of inferential statistics used to test a claim (hypothesis) about a population under study, using evidence from a sample

• Investigates characteristics or ‘parameters’ of the population:
– Centering (mean)
– Spread (variance)
– Percentage of occurrence (proportion)

• Compares the characteristic to that of:
– An established standard or target
– One or more other samples

Most common


Why Hypothesis Test?
• Allows us to make inferences on our population based on samples
• Helps us move beyond reliance on anecdotal evidence or assumptions
• Enables data-driven decision making
• Can help us evaluate critical factors (Xs) and their effect on our output (Y)
• Can help us determine whether observed differences are due to common cause or special cause


Visualizing Hypothesis Testing

[Figure: μ with x1, x2 and x3 marked]


More Practical Examples:

• If we re-arranged our employees into work cells, would it improve our application processing time?

• Is the percentage of no-show appointments this year significantly different than last year’s percentage?

• If we made specific improvements to our customer forms, would it reduce errors and incomplete fields?

• Is there a significant difference in performance between employees (or teams, or regions, etc.)?

• Would changing the wording of our survey participation request increase our response rate?

Hypothesis testing can help us answer questions like these!


Stating Your Hypothesis


A hypothesis test involves two distinct and opposing hypothesis statements:

• Null Hypothesis (H0)

• Alternative Hypothesis (Ha)

Hypothesis Statements


• Baseline (default) argument

• Asserts status quo and equality:
– No difference between two means
– No significant change compared to baseline
– One mean equals another

• Assume to be true, but aim to disprove

• Similar to a defendant on trial: Innocent until proven guilty

The Null Hypothesis (H0)


• Claim we believe or wish to accept as true

• Carries the burden of proof

• We accept the alternative hypothesis only if there is sufficient evidence to reject the null hypothesis (i.e. to render a guilty verdict on our defendant)

• What constitutes sufficient evidence? More on that later…

The Alternative Hypothesis (Ha)


• Taken together, the null and alternative hypotheses cover the entire domain of possible outcomes

• The H0 always contains some form of equality symbol (=, ≤ or ≥)

• The Ha always contains an inequality symbol (≠, < or >)

Hypothesis Fundamentals


Hypothesis Notation Example

A common example of a hypothesis test is to determine whether or not a treated sample group (1) has a mean which is statistically different from the non-treated (control) group (2):

• Null Hypothesis: There is no difference between groups 1 and 2.
– H0: μ1 = μ2

• Alternative Hypothesis: There is a difference between groups 1 and 2.
– Ha: μ1 ≠ μ2


Two-Tailed Test
• With a ‘not equal’ (≠) in the Ha, this implies that we’re looking for change in either direction (positive or negative)

• This is referred to as a two-tailed test, meaning both tails of the curve make up the rejection region (the area in which we reject the H0 and accept the Ha)

[Figure: curve centered at μ1 with the H0 and Ha regions marked]

H0: μ1 = μ2
Ha: μ1 ≠ μ2


Another Example
• We would like to get out of class early today (as opposed to ending on time or later than usual).

• The null hypothesis is:
– Today’s end time ≥ Standard end time
– H0: T_today ≥ T_std
– Status quo & Equality

• The alternative hypothesis is:
– Today’s end time < Standard end time
– Ha: T_today < T_std
– Claim we believe or wish to accept as true


One-Tailed Test
• Here, our rejection region occupies only one tail, on one side of the mean (in this case, it’s in the negative or left direction)

• This is referred to as a one-tailed test
– More specifically, this example could be referred to as a one-sided, left-tailed test

[Figure: curve centered at T_std with the H0 and Ha regions marked]

H0: T_today ≥ T_std
Ha: T_today < T_std


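Tailedness of Tests

[Figure: one-tailed curve centered at μ1 with the H0 and Ha regions marked]

H0: μ1 ≥ μ2
Ha: μ1 < μ2

[Figure: two-tailed curve centered at μ1 with the H0 and Ha regions marked]

H0: μ1 = μ2
Ha: μ1 ≠ μ2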

• Whether you have a One- or Two-Tailed Test depends on the equality/inequality symbols used in your hypothesis statements

• The symbols you use depend on practical considerations of what you’re testing for


DOP A&A Section: Hypothesis Statement Practice

Write a Hypothesis Test for the Scenarios:

The DOP section receives two types of Slow Forms: Renewal and Initials. We want to see if there is a difference in the number of errors between these two forms.

• H0: Renewal Form errors = Initial Form errors
• Ha: Renewal Form errors ≠ Initial Form errors
• Is this a one- or two-tailed test?
– Two-tailed


DOP A&A Section: Hypothesis Statement Practice

Write a Hypothesis Test for the Scenarios:

The DOP section performs a Poka Yoke activity on their forms. They want to see if the changes they made reduced the time it takes to process the forms.

• H0: Poka Yoked process time ≥ Original process time
• Ha: Poka Yoked process time < Original process time
• Is this a one- or two-tailed test?
– One-tailed


To Reject Or Not To Reject


Scientific Inquiry & Falsifiability
• We either Reject the H0 or Fail to Reject the H0.
– Accepting the H0 is technically not a thing.

• Hypothesis testing is a form of scientific inquiry
• A scientific claim needs to be falsifiable (having the potential to be proven wrong by evidence)
• Sufficient evidence from a sample can be used to prove a claim wrong
• In contrast, to prove a claim definitively correct would require testing the entire population (which defeats the purpose of the test)


Scientific Inquiry & Falsifiability

I claim that all squirrels are eastern gray squirrels

• Is the claim falsifiable?
– Yes!

• What evidence could I provide to prove my claim true?
– Can’t realistically be proven true, only shown to be more plausible

• But if it can’t be proven true, does that make it a bad claim?
– No! The claim stands as long as the evidence fails to reject it


Scientific Inquiry & Falsifiability

• What would constitute sufficient evidence to prove my claim false?
– A single squirrel of a different species

I claim that all squirrels are eastern gray squirrels


Scientific Inquiry & Falsifiability

In quantifiable terms, where n_s = number of squirrel species:

• What might my null hypothesis be?
– H0: n_s = 1
– Status quo and equality
– Assume true, aim to disprove

• What might my alternative hypothesis be?
– Ha: n_s > 1
– Accept only if we reject the null

• Is this a one- or two-tailed test?
– One-tailed
– (pun not intended, but I’ll take the credit anyway)


Three Steps


1: Write your Hypothesis: (H0 and Ha)

2: Collect Data (a sample of reality)

3: Run the Test and Decide: What does the evidence suggest?

Reject H0? or Fail to Reject H0?

Hypothesis Testing Steps


Step One: Write the Hypothesis

BMV uses a form for processing registration renewals. They decide to poka-yoke the form and conduct a pilot to see if the changes had an impact (positive or negative) on their processing time.

• H0: Time to process original form = Time to process poka-yoked form

• Ha: Time to process original form ≠ Time to process poka-yoked form


Step Two: Collect Data

What data do you need to collect?

• Samples of original (baseline) processing time

• Samples of poka-yoked (pilot) processing time


Step Three: Decide

• What does the evidence suggest?

• H0: Time to process original form (6 min) = Time to process poka-yoked form (2 min)

• Ha: Time to process original form (6 min) ≠ Time to process poka-yoked form (2 min)

Likely outcome: Reject the null


What About Now?

• What does the evidence suggest?

• H0: Time to process original form (6 min) = Time to process poka-yoked form (5.7 min)

• Ha: Time to process original form (6 min) ≠ Time to process poka-yoked form (5.7 min)

Do we have “sufficient evidence?”
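
One way to put a number on "sufficient evidence" is sketched below. It uses a permutation test, swapped in here only because it needs nothing beyond the Python standard library; it is not necessarily the test this course uses. The processing times are hypothetical values chosen so the group means match the 6, 2 and 5.7 minute figures above, and `permutation_p_value` is an illustrative helper name. The function returns the share of random regroupings that produce a difference at least as large as the one observed, so a small share suggests the difference is unlikely to be chance alone.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, shuffles=5000, seed=1):
    """Two-tailed permutation test for a difference in means.

    Returns the share of random regroupings whose difference in means
    is at least as large (in either direction) as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[:size_a]) - statistics.mean(pooled[size_a:])
        )
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (at_least_as_extreme + 1) / (shuffles + 1)

# Hypothetical processing times in minutes (illustrative only).
original = [5.1, 6.8, 5.9, 6.4, 5.5, 6.6, 6.2, 5.5]     # mean 6.0
pilot_big = [1.6, 2.4, 2.0, 2.3, 1.7, 2.2, 2.1, 1.7]    # mean 2.0
pilot_small = [5.0, 6.3, 5.6, 6.1, 5.2, 6.4, 5.8, 5.2]  # mean 5.7

print("6 min vs 2 min:  ", permutation_p_value(original, pilot_big))
print("6 min vs 5.7 min:", permutation_p_value(original, pilot_small))
```

With these made-up samples, the 6 vs 2 minute comparison should give a very small share, while the 6 vs 5.7 minute comparison should not, because the groups overlap heavily.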


Significance, Confidence & Risk


Significance Level
• Before testing, we need to define what “sufficient evidence” would look like and how certain we need to be
• This is alpha (α), or Significance Level
– Our specified level of acceptable risk of rejecting the null hypothesis when it is actually true
• We set alpha based on practical considerations in order to strike a balance between test sensitivity, risk and costs


Significance Level

How high or low should I set my alpha?

• Higher alpha values (typically up to 0.2):
– When there’s less risk involved
– Actively seeking potential effects for further exploration
• Mid-range alpha value (0.05):
– Typical default for transactional and service-oriented processes such as in state government
• Lower alpha values (as low as 0.01):
– Best for risk-averse scenarios
– When costs of taking action are very high


Significance Level

In our squirrel example, how might I want to set my alpha if:

• It was extremely important to me that I protect my wager and only pay up if I’m very certain I’m wrong?
– Lower alpha value (0.01)

• I’m interested in being able to detect the possibility of new species?
– Higher alpha value (0.1 – 0.2)

• I want to strike a balance between the two extremes?
– Standard 0.05 alpha value


Visualizing Significance Level

Consider an alpha of 0.05:
One-sided, right-tailed test
Assuming normal distribution

• We accept a 5% risk of rejecting H0 when it’s actually true

• An alpha of 0.05 gives a cut-off point at 1.645 standard deviations from the mean (95th percentile)

• If our test results land in the rejection region (to the right of the cut-off), then we can safely reject the H0
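
The 1.645 cut-off can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; `critical_z` is a hypothetical helper name, and the normal-distribution assumption is the same one stated above.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def critical_z(alpha, tails=1):
    """Cut-off in standard deviations from the mean for a z-based test.

    tails=1 puts all of alpha in one tail; tails=2 splits it evenly.
    """
    return standard_normal.inv_cdf(1 - alpha / tails)

for alpha in (0.20, 0.10, 0.05, 0.01):
    print(
        f"alpha={alpha:.2f}: one-tailed cut-off={critical_z(alpha):.3f}, "
        f"two-tailed cut-offs=±{critical_z(alpha, tails=2):.3f}"
    )
```

Lower alpha values push the cut-off further from the mean, which is why a lower alpha demands stronger evidence before H0 is rejected.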


Confidence Level

The complement of alpha (1 - α) is known as the Confidence Level

• Confidence level determines our confidence interval, which coincides with our non-rejection region

• If our alpha is 0.05, we have a 0.95 confidence level

• If the null hypothesis is true, the means of random samples will fall within our non-rejection region 95% of the time

[Figure: Confidence Interval]


Confidence/Significance Trade-Offs

Why not just set our confidence level very high all the time? Surely that can’t be a bad thing, right?

• As confidence level goes up, precision goes down
• We’ll have greater confidence that our interval captures the true population mean, but the interval is wider, so we’re less certain about the value of the mean
• How might we be able to raise confidence level without sacrificing precision?
– Increase the sample size (Central Limit Theorem!)
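
The trade-off can be seen by computing how wide the interval is at different confidence levels and sample sizes. This is a rough sketch assuming a z-based interval with a known standard deviation; the 2.0 minute standard deviation and the helper name `interval_half_width` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def interval_half_width(confidence_level, std_dev, sample_size):
    """Half-width of a z-based confidence interval for a mean,
    assuming the population standard deviation is known."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * std_dev / sqrt(sample_size)

std_dev = 2.0  # hypothetical standard deviation, in minutes

for confidence_level in (0.80, 0.95, 0.99):
    for n in (12, 48):
        half_width = interval_half_width(confidence_level, std_dev, n)
        print(
            f"confidence={confidence_level:.2f} n={n:>2}: "
            f"mean ± {half_width:.2f} min"
        )
```

Raising the confidence level widens the interval, while quadrupling the sample size halves it.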


Alpha Risk (α) and Beta Risk (β)
• Alpha Risk or Type I error: rejecting the null hypothesis when it is true
– Generally considered a more serious error
– Can lead to taking action when none was needed (conviction of an innocent person)
– Reducing alpha level will reduce your alpha risk (but increases your beta risk)

• Beta Risk or Type II error: failing to reject the null hypothesis when the alternative is true
– Considered a less serious error
– It could mean that we took no action when we probably should have; however, status quo is maintained (acquittal of a guilty person)
– Increasing sample size is one way to reduce beta risk
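
The pull between alpha and beta can be sketched numerically. The example below assumes a one-sided, right-tailed z-test with a known standard deviation; the shift of 1 minute, the standard deviation of 2 minutes and the helper name `beta_risk` are hypothetical values chosen only to illustrate the pattern.

```python
from math import sqrt
from statistics import NormalDist

def beta_risk(alpha, true_shift, std_dev, sample_size):
    """Beta risk of a one-sided, right-tailed z-test.

    true_shift is how far the real mean sits above the H0 mean.
    Assumes normally distributed data with a known standard deviation.
    """
    z = NormalDist()
    cutoff = z.inv_cdf(1 - alpha)  # rejection cut-off under H0
    shift_in_std_errors = true_shift * sqrt(sample_size) / std_dev
    return z.cdf(cutoff - shift_in_std_errors)  # chance of missing the shift

# Hypothetical scenario: the real mean is 1 minute above the H0 mean.
for alpha in (0.10, 0.05, 0.01):
    for n in (12, 48):
        risk = beta_risk(alpha, true_shift=1.0, std_dev=2.0, sample_size=n)
        print(f"alpha={alpha:.2f} n={n:>2}: beta risk={risk:.3f}")
```

Under these assumptions, lowering alpha raises beta at a fixed sample size, and a larger sample lowers beta at a fixed alpha.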


Decision Errors

                        The Truth
Your Decision           H0 True                 H0 False
Fail to Reject H0       Correct                 Type II Error (β-Risk)
Reject H0               Type I Error (α-Risk)   Correct


Alpha Risk - Type I Error

                        The Truth
Your Decision           Not Guilty              Guilty
Not Guilty              Correct                 Type II Error (β-Risk)
Guilty                  Type I Error (α-Risk)   Correct


Beta Risk - Type II Error

                        The Truth
Your Decision           Not Guilty              Guilty
Not Guilty              Correct                 Type II Error (β-Risk)
Guilty                  Type I Error (α-Risk)   Correct


The Almighty P-Value


P-Value
• P-Value: the probability of seeing a test result at least as extreme as ours through random chance alone, assuming the null hypothesis is true

• A primary result of running a statistical hypothesis test

• In terms of “sufficient evidence”, it’s a bit like the smoking gun (or lack thereof)


P-Values Are Everywhere!

[Figure: Normal Probability Plot with Anderson-Darling Normality Test; x-axis: Mach 1, y-axis: Probability]

Average: 10.0799
StDev: 0.943184
N: 25
A-Squared: 0.889
P-Value: 0.020


P-Values are everywhere, even in Excel!

t-Test: Paired Two Sample for Means

                                Variable 1     Variable 2
Mean                            169.583333     75.33333333
Variance                        18970.44697    852.969697
Observations                    12             12
Pearson Correlation             0.1014423
Hypothesized Mean Difference    0
df                              11
t Stat                          2.368164241
P(T<=t) one-tail                0.018636275
t Critical one-tail             1.795884819
P(T<=t) two-tail                0.037272551
t Critical two-tail             2.20098516


What are P-values Used For?

P-Value ≤ α:
• Ample Evidence
• H0 is Rejected
• Support Ha

P-Value > α:
• Little evidence
• H0 is Not Rejected
• Assume H0 is true


How Low Must P be?
• P-Value ≤ α: Reject the H0

• P-Value > α: Fail to reject the H0

A NEW MANTRA: “If P is low the H0 must go”


What would we Conclude?

t-Test: Paired Two Sample for Means

                                Variable 1     Variable 2
Mean                            169.583333     75.33333333
Variance                        18970.44697    852.969697
Observations                    12             12
Pearson Correlation             0.1014423
Hypothesized Mean Difference    0
df                              11
t Stat                          2.368164241
P(T<=t) one-tail                0.018636275
t Critical one-tail             1.795884819
P(T<=t) two-tail                0.037272551
t Critical two-tail             2.20098516

The p is low, so… The H0 must go!

Does the conclusion seem logical given the visual representation of the two groups?
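
The t Stat and p-values in the Excel output can be reproduced from its summary rows, which helps show where the numbers come from. The sketch below is one possible reconstruction, not Excel's own implementation: it rebuilds the variance of the paired differences from the two variances and the Pearson correlation, then integrates the t distribution numerically with Simpson's rule so that only the Python standard library is needed. The helper names `t_pdf` and `upper_tail_probability` and the alpha of 0.05 are assumptions.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coefficient = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_probability(t_stat, df, steps=2000):
    """P(T >= t_stat) for t_stat >= 0, using Simpson's rule on [0, t_stat]."""
    width = t_stat / steps
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    return 0.5 - total * width / 3

# Summary values copied from the Excel output.
mean_1, mean_2 = 169.583333, 75.33333333
var_1, var_2 = 18970.44697, 852.969697
n = 12
correlation = 0.1014423

# Variance of the paired differences, rebuilt from the summary values.
var_diff = var_1 + var_2 - 2 * correlation * sqrt(var_1 * var_2)
t_stat = (mean_1 - mean_2) / sqrt(var_diff / n)
df = n - 1

p_one_tail = upper_tail_probability(t_stat, df)
p_two_tail = 2 * p_one_tail
alpha = 0.05  # assumed significance level

print(f"t Stat = {t_stat:.4f}, df = {df}")
print(f"one-tail p = {p_one_tail:.4f}, two-tail p = {p_two_tail:.4f}")
print("Reject H0" if p_two_tail <= alpha else "Fail to reject H0")
```

The two-tail p-value is simply twice the one-tail value, and both fall below an alpha of 0.05, which matches the "H0 must go" conclusion.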


Hypothesis Testing: What’s Next?
• Your data is run through a specific hypothesis test, which generates a series of test statistics including a p-value

• The type of test you’ll use will depend on a handful of factors:
– Is your data normally distributed or not?
– What are you evaluating (mean, variance, proportion)?
– What are you comparing your sample to (a standard/goal or other samples)?
• You’ll learn more about these specific tests in upcoming sessions on 1- and 2-variable testing


Hypothesis Testing: In Review
• A hypothesis statement is created for each statistical test performed
• Assume the null hypothesis is true, but look to disprove it
• Choose a significance level (α) based on desired test sensitivity and acceptable risk level
• Choose a sample size small enough to be practical and cost efficient, but big enough to increase precision and reduce risk

• We subject our data to the appropriate statistical test and compare the resulting p-value to our chosen alpha value (remember: If the P is low, the H0 must go!)

• We either reject the null hypothesis (accepting the alternative hypothesis) or fail to reject the null hypothesis (allowing it to stand)


Hypothesis Testing Form

What is the Y? What Type of Data?

What is the X? What Type of Data?

How many “levels” does X have?

Is my data Stable?

What type of tool would you use?

Is my data Normal? (Outliers?)

Comparing Median or Means?

Ho: (=)

Ha:

P value: (0.05)

Interpret results:


Questions?
