Principles of Hypothesis Testing for Public HealthTesting ... · Principles of Hypothesis Testing...

Principles of Hypothesis Testing for Public HealthTesting for Public Health

Laura Lee Johnson Ph DLaura Lee Johnson, Ph.D.Statistician

National Center for ComplementaryNational Center for Complementary and Alternative Medicine

johnslau@mail.nih.govFall 2011Fall 2011

Answers to Questions I U ll G A d NI Usually Get Around Now

• ITT is like generalizing to real lifeg g• I am not a fan of stratification

Except by clinic/siteN t ithNot everyone agrees with me

• OK to adjust for (some) variablesBaseline covariatesBaseline covariates

Cannot stratify a continuous variableAt least rarely can you do it wellt east a e y ca you do t e

Some variables are not ok, or you just upgraded to a fancy model!

ObjectivesObjectives

• Formulate questions for statisticiansFormulate questions for statisticians and epidemiologists using

P valueP-valuePowerType I and Type II errors

• Identity a few commonly used y ystatistical tests for comparing two groupsg p

OutlineOutlineEstimation and HypothesesEstimation and Hypotheses

• How to Test Hypotheses• Confidence Intervals• Confidence Intervals • Regression

E• Error• Diagnostic Testing• Misconceptions• Appendix

Estimation and HypothesesEstimation and Hypotheses

InferenceInferenceHow we use Hypothesis Testing

• Estimation• Distributions• Distributions• Hypothesis testing• Sides and Tails

Statistical InferenceStatistical Inference

• Inferences about a population are• Inferences about a population are made on the basis of results

bt i d f l dobtained from a sample drawn from that population

• Want to talk about the larger population from which the subjectspopulation from which the subjects are drawn, not the particular subjects!subjects!

Use Hypothesis TestingUse Hypothesis Testing• Designing a studyDesigning a study• Reviewing the design of other studies

Grant or application review (e g NIHGrant or application review (e.g. NIH study section, IRB)

• Interpreting study results• Interpreting study results• Interpreting other’s study results

R i i i t j lReviewing a manuscript or journalInterpreting the news

I Use Hypothesis TestingI Use Hypothesis Testing

• Do everything on previous slide• Do everything on previous slide• Analyze the data to find the results

Program formulas not presented here in detailhere in detail

• Anyone can analyze the data, too, y y , ,but be careful

Analysis Follows DesignAnalysis Follows Design

Questions → Hypotheses →Questions → Hypotheses → Experimental Design → Samples →Data → Analyses →Conclusions

What Do We TestWhat Do We Test• Effect or Difference we are interested in

Difference in Means or ProportionsOdds Ratio (OR)Relative Risk (RR)Correlation Coefficient

Cli i ll i t t diff• Clinically important differenceSmallest difference considered biologically or clinically relevantbiologically or clinically relevant

• Medicine: usually 2 group comparison of population means

InferenceInferenceHow we use Hypothesis TestingEstimation

• Distributions• Distributions• Hypothesis testing• Sides and Tails

Estimation: From the SampleEstimation: From the Sample

• Point estimation• Point estimationMeanMedianChange in mean/medianChange in mean/median

• Interval estimationVariation (e.g. range, σ2, σ, σ/√n)95% Confidence interval95% Confidence interval

Pictures, Not NumbersPictures, Not Numbers

• Scatter plotsScatter plots• Bar plots (use a table)

Hi t• Histograms• Box plots

• Not EstimationNot EstimationSee the data and check assumptions

Graphs and TablesGraphs and Tables

• A picture is worth a thousand t-testsA picture is worth a thousand t-tests• Vertical (Y) axis can be misleading

Like the Washington Post W h Th hWeather, Though

InferenceInferenceHow we use Hypothesis TestingEstimationDistributionsDistributions

• Hypothesis testing• Sides and Tails

DistributionsDistributions

• Parametric tests are based on• Parametric tests are based on distributions

Normal Distribution (standard normal, bell curve, Z distribution), , )

• Non-parametric tests still have assumptions but not based onassumptions, but not based on distributions

2 of the Continuous Distributions2 of the Continuous Distributions

• Normal distribution: N( μ, σ2)Normal distribution: N( μ, σ )μ = mean, σ2 = varianceZ or standard normal = N(0 1)Z or standard normal = N(0,1)

• t distribution: tω

d f f d (df)ω = degrees of freedom (df)Usually a function of sample size

XMean = (sample mean)Variance = s2 (sample variance)

Binary DistributionBinary Distribution

• Binomial distribution: B (n p)• Binomial distribution: B (n, p)Sample size = nProportion ‘yes’ = pMean = npMean = npVariance = np(1-p)

• Can do exact or use Normal

Many More DistributionsMany More Distributions

• Not going to coverNot going to cover• Poisson

L l• Log normal• Gamma• Beta• Weibull• Weibull • Many more

InferenceInferenceHow we use Hypothesis TestingEstimationDistributionsDistributionsHypothesis TestingSides and Tails

Hypothesis TestingHypothesis Testing

• Null hypothesis (H )• Null hypothesis (H0)• Alternative hypothesis (H1 or Ha)

Null HypothesisNull Hypothesis• For superiority studies we think for examplep y p

Average systolic blood pressure (SBP) on Drug A is different than average SBP on Drug B g

• Null of that? Usually that there is no effectMean = 0OR = 1OR 1RR = 1Correlation Coefficient = 0

• Sometimes compare to a fixed value so Null• Sometimes compare to a fixed value so NullMean = 120

• If an equivalence trial, look at NEJM paper or other specific resourcesother specific resources

Alternative HypothesisAlternative Hypothesis

• Contradicts the null• Contradicts the null• There is an effect• What you want to prove• If equivalence trial special way to• If equivalence trial, special way to

do this

Example HypothesesExample Hypotheses

• H : μ = μ• H0: μ1 = μ2

• HA: μ1 ≠ μ2

Two-sided test• H : μ > μ• HA: μ1 > μ2

One-sided test

1 vs. 2 Sided Tests1 vs. 2 Sided Tests

• Two-sided testTwo sided testNo a priori reason 1 group should have stronger effecthave stronger effectUsed for most tests

One sided test• One-sided testSpecific interest in only one directionNot scientifically relevant/interesting if reverse situation true

Use a 2-Sided TestUse a 2 Sided Test

• Almost always• Almost always• If you use a one-sided test

Explain yourselfPenalize yourself on the alphaPenalize yourself on the alpha

0.05 2-sided test becomes a 0.025 1-sided test

Never “Accept” AnythingNever Accept Anything• Reject the null hypothesisj yp• Fail to reject the null hypothesis

• Failing to reject the null hypothesis does NOT mean the null (H0) is true

• Failing to reject the null means Not enough evidence in your sample to reject the null hypothesisreject the null hypothesisIn one sample saw what you saw

OutlineOutline

Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t l• Confidence Intervals

• Regressiong• Error• Diagnostic Testing• Diagnostic Testing• Misconceptions

ExperimentExperiment

Develop hypothesesDevelop hypothesesCollect sample/Conduct experimentexperiment

• Calculate test statistic• Compare test statistic with what is

expected when H0 is truep 0

Information at HandInformation at Hand

• 1 or 2 sample test?• 1 or 2 sample test?• Outcome variable

Binary, Categorical, Ordered, Continuous, SurvivalContinuous, Survival

• Population• Numbers (e.g. mean, standard

deviation))

Example: H i /Ch l lHypertension/Cholesterol

• Mean cholesterol hypertensive menMean cholesterol hypertensive men• Mean cholesterol in male general

(normotensive) population (20 74(normotensive) population (20-74 years old)I th 20 74 ld l• In the 20-74 year old male population the mean serum cholesterol is 211 mg/ml with acholesterol is 211 mg/ml with a standard deviation of 46 mg/ml

One Sample:Ch l l S l DCholesterol Sample Data

• Have data on 25 hypertensive men• Have data on 25 hypertensive men• Mean serum cholesterol level is

220mg/ml ( = 220 mg/ml)Point estimate of the mean

• Sample standard deviation: s = 38 6 mg/ml38.6 mg/ml

Point estimate of the variance = s2

Compare Sample to PopulationCompare Sample to Population

• Is 25 enough?Is 25 enough?Next lecture we will discuss

Wh t diff i h l t l i• What difference in cholesterol is clinically or biologically

i f l?meaningful?• Have an available sample and want p

to know if hypertensives are different than general populationg p p

SituationSituation

• May be you are reading another• May be you are reading another person’s work

• May be already collected data

• If you were designing up front you ld l l t th l iwould calculate the sample size

But for now, we have 25 people, p p

Cholesterol HypothesesCholesterol Hypotheses

• H0: μ1 = μ2H0: μ1 μ2• H0: μ = 211 mg/ml

μ = POPULATION mean serumμ = POPULATION mean serum cholesterol for male hypertensivesMean cholesterol for hypertensiveMean cholesterol for hypertensive men = mean for general male populationpopulation

• HA: μ1 ≠ μ2• H : μ ≠ 211 mg/ml• HA: μ ≠ 211 mg/ml

Cholesterol Sample DataCholesterol Sample Data

• Population information (general)Population information (general)μ = 211 mg/ml

/ (σ = 46 mg/ml (σ2 = 2116)• Sample information (hypertensives)p ( yp )

= 220 mg/mls = 38 6 mg/ml (s2 = 1489 96)Xs = 38.6 mg/ml (s2 = 1489.96) N = 25

ExperimentExperimentDevelop hypothesesDevelop hypothesesCollect sample/Conduct experimentCalculate test statisticCalculate test statistic

• Compare test statistic with what is expected when H is trueexpected when H0 is true

Test StatisticTest Statistic• Basic test statistic for a meanBasic test statistic for a mean

point estimate of - target value of test statistic = μ μ

• σ = standard deviation (sometimes use σ/√n)

point estimate of

test statistic μσ

• σ = standard deviation (sometimes use σ/√n)• For 2-sided test: Reject H0 when the test

statistic is in the upper or lower 100*α/2% ofstatistic is in the upper or lower 100 α/2% of the reference distribution

• What is α?What is α?

VocabularyVocabulary

• Types of errors• Types of errorsType I (α) (false positives)Type II (β) (false negatives)

• Related words• Related wordsSignificance Level: α levelPower: 1- β

Unknown Truth and the DataUnknown Truth and the DataTruth H0 Correct HA CorrectTruth

DataH0 Correct HA Correct

Decide H 1- α βDecide H0

“fail to reject H0”

1- αTrue Negative

βFalse Negative

Decide HA

“reject H ”α

False Positive1- β

True Positiveα = significance level

1- β = power

reject H0 False Positive True Positive

1 β power

Type I ErrorType I Error

• α = P( reject H0 | H0 true)α = P( reject H0 | H0 true)• Probability reject the null hypothesis

given the null is truegiven the null is true• False positive• Probability reject that hypertensives’

µ=211mg/ml when in truth the mean cholesterol for hypertensives is 211

Type II Error (or, 1- Power)Type II Error (or, 1 Power)

• β = P( do not reject H0 | H1 true )β P( do not reject H0 | H1 true )• False Negative

P b bilit NOT j t th t l• Probability we NOT reject that male hypertensives’ cholesterol is that

f th l l ti h iof the general population when in truth the mean cholesterol for h i i diff h hhypertensives is different than the general male population

PowerPower

• Power = 1-β = P( reject H | H true )• Power = 1-β = P( reject H0 | H1 true )• Everyone wants high power, and

therefore low Type II error

Cholesterol Sample DataCholesterol Sample Data

• N = 25N 25• = 220 mg/ml

211 / lX

• μ = 211 mg/ml• s = 38.6 mg/ml (s2 = 1489.96)g ( )• σ = 46 mg/ml (σ2 = 2116)• α = 0 05• α = 0.05• Power? Next lecture!

Z Test Statistic and N(0,1)Z Test Statistic and N(0,1)• Want to test continuous outcomeWant to test continuous outcome• Known variance• Under H X μ−• Under H0

0 ~ (0,1)/

• Therefore,0

0Reject H if 1.96 (gives a 2-sided =0.05 test)/

X μ ασ

Reject H if X > 1.96 or X < 1.96

nσσ σμ μ+ −0 0 0eject .96 o .96n n

ExperimentExperimentDevelop hypothesesDevelop hypothesesCollect sample/Conduct experimentCalculate test statisticCalculate test statisticCompare test statistic with what is expected when H is trueexpected when H0 is true

Reference distributionA ti b t di t ib ti fAssumptions about distribution of outcome variable

Z or Standard Normal Di ib iDistribution

How to test?How to test?Rejection intervalj

Like a confidence interval but centered on the null mean

• Z test or Critical ValueN(0,1) distribution and alpha

• t test or Critical Valuet distribution and alpha

• P-value• Confidence interval

General Formula (1-α)%Rejection Region for Mean PointRejection Region for Mean Point

EstimateZ Z⎛ ⎞1 / 2 1 / 2,Z Zα ασ σμ μ− −⎛ ⎞− +⎜ ⎟

⎝ ⎠n n⎜ ⎟⎝ ⎠

• Note that +Z(α/2) = - Z(1-α/2)• 90% CI : Z = 1 64590% CI : Z = 1.645• 95% CI : Z = 1.96• 99% CI : Z = 2 58• 99% CI : Z = 2.58

Cholesterol Rejection IntervalU i H P l i I f iUsing H0 Population Information

Reject H if

N(211 462)

Reject Ho if 220 is outside of (193 229)N(211, 462) of (193,229)

211193 229

Normal DistributionNormal Distribution

Cholesterol Rejection IntervalU i H S l I f iUsing H0 Sample Information

Reject H ift (df=24, 211, 38.62)

Reject Ho if 220 is outside of (195 227)of (195,227)

211195 227

t Distribution (df = 24)t Distribution (df = 24)

Side Note on t vs. ZSide Note on t vs. Z

• If s = σ then the t value will beIf s σ then the t value will be larger than the Z value

• BUT here our sample standard• BUT, here our sample standard deviation (38.6) was quite a bit smaller than the population sd (46)smaller than the population sd (46)

HERE intervals using t look ll h Z i l BUTsmaller than Z intervals BUT

Because of sd, not distribution,

Z test or Critical ValueN(0,1) distribution and alpha

t test or Critical Valuet distribution and alpha

• P-value• Confidence interval

Z-test: Do Not Reject H0Z test: Do Not Reject H0

0 220 211 0.98 1.96/ 46 / 25

− −= = = <

/ 46 / 25nσ

Z or Standard Normal Di ib iDistribution

Determining Statistical Significance: Critical Value Method

• Compute the test statistic Z (0 98)• Compute the test statistic Z (0.98)• Compare to the critical value

Standard Normal value at α-level (1 96)Standard Normal value at α-level (1.96)• If |test statistic| > critical value

Reject H0Reject H0Results are statistically significant

• If |test statistic| < critical valueIf |test statistic| critical valueDo not reject H0Results are not statistically significanty g

T-Test StatisticT Test Statistic

• Want to test continuous outcomeWant to test continuous outcome• Unknown variance (s, not σ)

U d H X• Under H0 0( 1)~

/ nX ts n

• Critical values: statistics books or computerp

• t-distribution approximately normal for degrees of freedom (df) >30degrees of freedom (df) >30

Cholesterol: t-statisticCholesterol: t statistic

• Using data 0 220 211 1 17XT μ− −= = =Using data

F 0 05 t id d t t f t(24)

1.17/ 38.6 / 25

• For α = 0.05, two-sided test from t(24) distribution the critical value = 2.064

• | T | = 1.17 < 2.064• The difference is not statistically y

significant at the α = 0.05 level• Fail to reject H0Fail to reject H0

Almost all ‘Critical Value’ Tests: E S IdExact Same Idea

• Paired tests• Paired tests• 2-sample tests• Continuous data• Binary data• Binary data

• See appendix at end of slides

P-value• Confidence interval

P-valueP value

• Smallest α the observed sample wouldSmallest α the observed sample would reject H0

• Given H is true probability of• Given H0 is true, probability of obtaining a result as extreme or more extreme than the actual sampleextreme than the actual sample

• MUST be based on a modelNormal, t, binomial, etc.

Cholesterol ExampleCholesterol Example

• P-value for two sided test• P-value for two sided test• = 220 mg/ml, σ = 46 mg/mlX• n = 25• H : μ = 211 mg/ml• H0: μ = 211 mg/ml• HA: μ ≠ 211 mg/ml

2* [ 220] 0.33P X > =

Determining Statistical Si ifi P V l M h dSignificance: P-Value Method

• Compute the exact p-value (0.33)Compute the exact p value (0.33)• Compare to the predetermined α-level

(0.05)(0.05)• If p-value < predetermined α-level

Reject H0Reject H0Results are statistically significant

• If p-value > predetermined α-levelIf p-value > predetermined α-levelDo not reject H0Results are not statistically significantResults are not statistically significant

P-value Interpretation RemindersP value Interpretation Reminders

• Measure of the strength of• Measure of the strength of evidence in the data that the null is

t tnot true

A random variable whose value• A random variable whose value lies between 0 and 1

• NOT the probability that the null hypothesis is true.hypothesis is true.

P-value• Confidence interval

OutlineOutline

Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t lConfidence Intervals

• Regressiong• Error• Diagnostic Testing• Diagnostic Testing• Misconceptions

General Formula (1-α)% CI for μGeneral Formula (1 α)% CI for μ

Z Z⎛ ⎞1 / 2 1 / 2,Z ZX Xα ασ σ− −⎛ ⎞− +⎜ ⎟⎝ ⎠n n⎜ ⎟⎝ ⎠

• Construct an interval around the point estimate

• Look to see if the population/null mean is insideis inside

Cholesterol Confidence IntervalU i P l i V i ( Z )Using Population Variance ( Z )

N(220, 462)

220211202196 229 238 244

CI for the Mean, Unknown V iVariance

• Pretty commonPretty common• Uses the t distribution

D f f d• Degrees of freedom1,1 / 2 1,1 / 2n nt s t s

X Xα α− − − −⎛ ⎞+⎜ ⎟,

2.064*38.6 2.064*38.6220 220

X Xn n

− +⎜ ⎟⎝ ⎠

⎛ ⎞+⎜ ⎟220 , 22025 25

(204.06, 235.93)

= − +⎜ ⎟⎝ ⎠

Cholesterol Confidence IntervalU i S l D ( )Using Sample Data ( t )

t (df=24, 220, 38.62)

220212204198 228 236 242

But I Have All Zeros! C l l 9 % b dCalculate 95% upper bound

• Known # of trials without an event (2.11 (van Belle 2002, Louis 1981)

• Given no observed events in n trials, 95% b d t f95% upper bound on rate of occurrence is 3 / (n + 1)

No fatal outcomes in 20 operationsNo fatal outcomes in 20 operations95% upper bound on rate of occurrence = 3 / (20 + 1) = 0 143 sooccurrence 3 / (20 + 1) 0.143, so the rate of occurrence of fatalities could be as high as 14.3%

Hypothesis Testing d C fid I land Confidence Intervals

• Hypothesis testing focuses on whereHypothesis testing focuses on where the sample mean is located

• Confidence intervals focus on plausible• Confidence intervals focus on plausible values for the population meanI l th b t t ti t• In general, the best way to estimate a confidence interval is to bootstrap (d t il t ti ti i )(details: see a statistician)

CI InterpretationCI Interpretation• Cannot determine if a particular interval p

does/does not contain true mean• Can say in the long run

Take many samples Same sample sizeFrom the same population95% of similarly constructed confidence intervals will contain true meanintervals will contain true mean

• Think about meta analyses

Interpret a 95% Confidence Interval (CI) for the populationInterval (CI) for the population

mean, μ• “If we were to find many such

intervals, each from a differentintervals, each from a different random sample but in exactly the same fashion then in the long runsame fashion, then, in the long run, about 95% of our intervals would incl de the pop lation meaninclude the population mean, μ, and 5% would not.”

Do NOT interpret a 95% CI…Do NOT interpret a 95% CI…• “There is a 95% probability that the trueThere is a 95% probability that the true

mean lies between the two confidence values we obtained from a particular psample”

• “We can say that we are 95% confident ythat the true mean does lie between these two values.”

• Overlapping CIs do NOT imply non-significance

Take Home: Hypothesis TestingTake Home: Hypothesis Testing

• Many ways to testMany ways to testRejection intervalZ test t test or Critical ValueZ test, t test, or Critical ValueP-valueC fid i t lConfidence interval

• For this, all ways will agreeIf not: math wrong, rounding errors

• Make sure interpret correctly

Take Home Hypothesis TestingTake Home Hypothesis Testing

• How to turn questions into hypothesesHow to turn questions into hypotheses• Failing to reject the null hypothesis

DOES NOT mean that the null is trueDOES NOT mean that the null is true• Every test has assumptions

A statistician can check all the assumptionsIf the data does not meet the assumptions there are non-parametric versions of tests (see text)(see text)

Take Home: CITake Home: CI

• Meaning/interpretation of the CIMeaning/interpretation of the CI• How to compute a CI for the true mean

when variance is known (normal model)when variance is known (normal model)• How to compute a CI for the true mean

h th i i NOT k (twhen the variance is NOT known (t distribution)

• In practice use Bootstrap

Take Home: VocabularyTake Home: Vocabulary• Null Hypothesis: H0Null Hypothesis: H0• Alternative Hypothesis: H1 or Ha or HA• Significance Level: α level• Significance Level: α level• Acceptance/Rejection Region

St ti ti ll Si ifi t• Statistically Significant• Test Statistic• Critical Value• P-value, Confidence Interval

OutlineOutline

Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t lConfidence Intervals Regressiong

• Error• Diagnostic Testing• Diagnostic Testing• Misconceptions

RegressionRegression

• Continuous outcome• Continuous outcomeLinear

• Binary outcomeLogisticLogistic

• Many other types

Linear regressionLinear regression• Model for simple linear regression

Y β + β +Yi = β0 + β1x1i + εiβ0 = interceptβ = slopeβ1 = slope

• AssumptionsOb ti i d d tObservations are independentNormally distributed with constant variancevariance

• Hypothesis testingH : β = 0 vs H : β ≠ 0H0: β1 = 0 vs. HA: β1 ≠ 0

In Order of ImportanceIn Order of Importance

1. Independence2. Equal variance3 Normality3. Normality

(for ANOVA and linear regression)

More Than One CovariateMore Than One Covariate

• Yi = β0 + β1x1i + β2x2i + β3x3i + εiYi = β0 + β1x1i + β2x2i + β3x3i + εi

• SBP = β β D β M l β Aβ0 + β1 Drug + β2 Male + β3 Age

• β 1

Association between Drug and SBPAverage difference in SBP betweenAverage difference in SBP between the Drug and Control groups, given sex and agesex and age

Testing?Testing?

• Each β has a p-value associated with itEach β has a p-value associated with it• Each model will have an F-test

Oth th d t d t i fit• Other methods to determine fitResiduals

• See a statistician and/or take aSee a statistician and/or take a biostatistics class. Or 3.

Repeated Measures (3 i i )(3 or more time points)

• Do NOT use repeated measures• Do NOT use repeated measures AN(C)OVA

Assumptions quite stringent• Talk to a statisticianTalk to a statistician

Mixed modelGeneralized estimating equationsOtherOther

An Aside: CorrelationAn Aside: Correlation

• Range: -1 to 1Range: -1 to 1• Test is correlation is ≠ 0

With N 1000 t h hi hl• With N=1000, easy to have highly significant (p<0.001) correlation = 0.05

Statistically significant that isNo where CLOSE to meaningfully g ydifferent from 0

• Partial Correlation CoefficientPartial Correlation Coefficient

Do Not Use Correlation.U R iUse Regression

• Some fields: Correlation still popularSome fields: Correlation still popularPartial regression coefficients

Hi h l ti i 0 8 (i b l t• High correlation is > 0.8 (in absolute value). Maybe 0.7

• Never believe a p-value from a correlation test

• Regression coefficients are more meaningfulg

OutlineOutline

Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t lConfidence Intervals RegressiongError

• Diagnostic Testing• Diagnostic Testing• Misconceptions

Is α or β more important ?Is α or β more important ?

• Depends on the question• Depends on the question• Most will say protect against Type I

errorMultiple comparisionsMultiple comparisions

• Need to think about individual and l ti h lth i li ti dpopulation health implications and

OmicsOmics

• False negative (Type II error)• False negative (Type II error)Miss what could be importantAre these samples going to be looked at again?looked at again?

• False positive (Type I error)Waste resources following dead ends

HIV ScreeningHIV Screening

• False positive• False positiveNeedless worryStigma

• False negative• False negativeThinks everything is okContinues to spread disease

• For cholesterol example?• For cholesterol example?

What do you need to think b ?about?

• Is it worse to treat those who truly• Is it worse to treat those who truly are not ill or to not treat those who

ill?are ill?• That answer will help guide you as p g y

to what amount of error you are willing to tolerate in your trialwilling to tolerate in your trial design

OutlineOutline

Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t lConfidence Intervals RegressiongErrorDiagnostic TestingDiagnostic Testing

• Misconceptions

Little Diagnostic Testing Lingo

• False Positive/False Negative (α, β)False Positive/False Negative (α, β) • Positive Predictive Value (PPV)

Probability diseased given POSITIVEProbability diseased given POSITIVE test result

• Negative Predictive Value (NPV)• Negative Predictive Value (NPV)Probability NOT diseased given NEGATIVE test resultNEGATIVE test result

• Predictive values depend on disease prevalenceprevalence

Sensitivity, SpecificitySensitivity, Specificity

• Sensitivity: how good is a test atSensitivity: how good is a test at correctly IDing people who have diseasedisease

Can be 100% if you say everyone is ill (all have positive result)is ill (all have positive result)Useless test with bad Specificity

• Specificity: how good is the test at correctly IDing people who are welly g p p

Example: Western vs. ELISAExample: Western vs. ELISA• 1 million people1 million people• ELISA Sensitivity = 99.9%• ELISA Specificity = 99 9%• ELISA Specificity = 99.9%• 1% prevalence of infection

10 000 iti b W t ( ld10,000 positive by Western (gold standard) 9990 t iti (TP) b ELISA9990 true positives (TP) by ELISA10 false negatives (FN) by ELISA

1% Prevalence1% Prevalence

• 990 000 not infected990,000 not infected989,010 True Negatives (TN)990 F l P iti (FP)990 False Positives (FP)

• Without confirmatory testTell 990 or ~0.1% of the population they are infected when in reality they y y yare notPPV = 91% NPV = 99 999%PPV 91%, NPV 99.999%

1% Prevalence1% Prevalence• 10980 total test positive by ELISAp y

9990 true positive990 false positivep

• 9990/10980 = probability diseased GIVEN positive by ELISA = PPV = 0.91 = 91%

• 989,020 total test negatives by ELISA989,010 true negatives10 f l i10 false negatives

• 989010/989020 = NPV = 99.999%

0.1% Prevalence0.1% Prevalence

• 1 000 infected – ELISA picks up 9991,000 infected ELISA picks up 9991 FN

999 000 t i f t d• 999,000 not infected989,001 True Negatives (TN)999 False Positives (FP)

• Positive predictive value = 50%Positive predictive value 50%• Negative predictive value = 99.999%

10% Prevalence10% Prevalence

• 99% PPV• 99% PPV• 99.99% NPV

Prevalence Matters(Population You Sample to Estimate(Population You Sample to Estimate

Prevalence, too)• Numbers look “good” with high g g

prevalenceTesting at STD clinic in high risk

l tipopulations• Low prevalence means even very high

sensitivity and specificity will result insensitivity and specificity will result in middling PPV

• Calculate PPV and NPV for 0 01%Calculate PPV and NPV for 0.01% prevalence found in female blood donors

Prevalence MattersPrevalence Matters

• PPV and NPV tend to come from• PPV and NPV tend to come from good cohort data

• Can estimate PPV/NPV from case control studies but the formulas are hard and you need to be REALLY sure about the prevalenceREALLY sure about the prevalence

Triple sure

High OR D N G d T M kDoes Not a Good Test Make

• Diagnostic tests need separation• Diagnostic tests need separation• ROC curves

Not logistic regression with high OROR

• Strong association between 2 i bl d NOT dvariables does NOT mean good

prediction of separation

What do you need to think b ?about?

• How good does the test need to be?How good does the test need to be?96% sensitivity and 10% specificity?66% AUC? (Wh t i th t?)66% AUC? (What is that?)

• Guide you as to what amount of differentiation, levels of sensitivity, specificity, PPV and NPV you are willing to tolerate in your trial design

OutlineOutline

Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t lConfidence Intervals RegressiongErrorDiagnostic TestingDiagnostic TestingMistakes & Misconceptions

Avoid Common Mistakes: H h i T iHypothesis Testing

• Mistake: Have paired data and do notMistake: Have paired data and do not do a paired test OR do not have paired data and do a paired testp

• If you have paired data, use a paired test

If you don’t then you can lose power• If you do NOT have paired data, do NOT y p ,

use a paired testYou can have the wrong inference

• Mistake: assume independent measurementsp• Tests have assumptions of independence

Taking multiple samples per subject ? g p p p jStatistician MUST knowDifferent statistical analyses MUST be

d d th b diffi lt!used and they can be difficult!• Mistake: ignore distribution of observations

Histogram of the observationsHistogram of the observationsHighly skewed data - t test and even non-parametric tests can have incorrect resultsparametric tests can have incorrect results

• Mistake: Assume equal variancesMistake: Assume equal variances (and the variances are not equal)

Did not show variance testDid not show variance testNot that good of a testALWAYS graph your data first to assess symmetry and variancey y

• Mistake: Not talking to a statisticianstatistician

Estimates and P-ValuesEstimates and P Values

• Study 1: 25±9• Study 1: 25±9Stat sig at the 1% level

• Study 2: 10±9Not statistically significant (ns)Not statistically significant (ns)

• 25 vs. 10 wow a big difference between these studies!

Um, no. 15±12.7Um, no. 15±12.7

Comparing A to BComparing A to B

• AppropriateAppropriateStatistical properties of A-BSt ti ti l ti f A/BStatistical properties of A/B

• NOT Appropriatepp pStatistical properties of AStatistical properties of BStatistical properties of BLook they are different!

Not a big difference? 15?!?Not a big difference? 15?!?

• Distribution of the difference• Distribution of the difference15±12.7Not statistically significantStandard deviations! ImportantStandard deviations! Important.

• Study 3 has much larger sample size!

2.5±0.92.5±0.9

3 Studies. 3 Answers, Maybe3 Studies. 3 Answers, Maybe

• Study # 3 is statistically significant• Study # 3 is statistically significant• Difference between study 3 and the

other studiesStatisticalStatisticalDifferent magnitudes

• Does study 3 replicate study 1?• Is it all sample size?Is it all sample size?

(Mis)conceptions(Mis)conceptions

• P-value = inferential tool ? Yes• P-value = inferential tool ? YesHelps demonstrate that population

i t t lmeans in two groups are not equal• Smaller p-value → larger effect ? g

NoEffect size is determined by theEffect size is determined by the difference in the sample mean or proportion between 2 groupsproportion between 2 groups

(Mis)conceptions(Mis)conceptions• A small p-value means the difference is p

statistically significant, not that the difference is clinically significant. YES

A l l i h l tA large sample size can help get a small p-value. YES, so do not be trickedtricked.

• Failing to reject H0 means what?There is not enough evidence to reject H0There is not enough evidence to reject H0 YESH0 is true! NO NO NO NO NO!

Questions?Questions?

AppendixAppendix

• Formulas for Critical Values• Formulas for Critical Values• Layouts for how to choose a test

Do Not Reject H0Do Not Reject H0

0 220 211 0.98 1.96/ /

XZ μ− −= = = <

/ 46 / 25nσ

046220=X > 1.96 =211+1.96* 228.03 NO!25n

σμ + =

046220=X < 1.96 211-1.96* 192.97 NO!25n

σμ − = =25n

Paired Tests: Difference T C i OTwo Continuous Outcomes

• Exact same idea• Exact same idea• Known variance: Z test statistic• Unknown variance: t test statistic• H : μ = 0 vs H : μ ≠ 0• H0: μd = 0 vs. HA: μd ≠ 0• Paired Z-test or Paired t-test

/ /d dZ or T= =/ /n s nσ

2 Samples: Same Variance + Sample Size Calculation Basis

• Unpaired - Same idea as pairedUnpaired - Same idea as paired• Known variance: Z test statistic

U k i t t t t ti ti• Unknown variance: t test statistic• H0: μ1 = μ2 vs. HA: μ1 ≠ μ2

H0: μ1 - μ2 = 0 vs. HA: μ1 - μ2 ≠ 0• Assume common varianceAssume common variance

1/ 1/ 1/ 1/x y x yZ or T

n m s n mσ− −

= =+ +1/ 1/ 1/ 1/n m s n mσ + +

2 Sample Unpaired Tests: 2 Diff V i2 Different Variances

• Same idea• Same idea• Known variance: Z test statistic• Unknown variance: t test statistic• H : μ = μ vs H : μ ≠ μ• H0: μ1 = μ2 vs. HA: μ1 ≠ μ2

• H0: μ1 - μ2 = 0 vs. HA: μ1 - μ2 ≠ 0

2 2 2 2/ / / /x y x yZ or Tn m s n s mσ σ− −

= =+ +1 2 1 2/ / / /n m s n s mσ σ+ +

One Sample Binary OutcomesOne Sample Binary Outcomes

• Exact same idea• Exact same idea• For large samples

Use Z test statisticSet up in terms of proportionsSet up in terms of proportions, not means

ˆ 0ˆ(1 ) /p pZ

p p n−

=−0 0(1 ) /p p n

Two Population ProportionsTwo Population Proportions

• Exact same idea• Exact same idea• For large samples use Z test

statisticˆ ˆp p1 2

1 1 2 2ˆ ˆ ˆ ˆ(1 ) (1 )p pZ

p p p p−

=− −1 1 2 2( ) ( )p p p pn m

Normal/Large Sample Data?

Binomial?No

Independent? Nonparametric test

p Nonparametric testNo

Expected ≥5 McNemar’s testNoYes

2 sample Z test for ti

Fisher’s Exact t t

proportions or contingency table

Normal/Large Sample Data?

YesInference on means?

Independent? Inference on variance?Yes No

Variance known?

Paired t F test for

Yes YesNo

known?

Variances equal?

variances

Z test

Variances equal?

T test w/ T test w/Yes No

T test w/ pooled

variance

T test w/ unequal variance

Principles of Hypothesis Testing for Public HealthTesting ... · Principles of Hypothesis Testing...

Documents

Transcript of Principles of Hypothesis Testing for Public HealthTesting ... · Principles of Hypothesis Testing...

The so-called Extraterrestrial Hypothesis · Extraterrestrial hypothesis 1 Extraterrestrial hypothesis The extraterrestrial hypothesis (ETH) is the hypothesis that some unidentified

Babush, Neiman, Kornman & Johnson, LLP Certified Public Accountants & Consultants

Null hypothesis AND ALTERNAT HYPOTHESIS

HYPOTHESIS TESTING Null Hypothesis and Research Hypothesis ?

1 The People of the San Joaquin Valley Hans Johnson Public Policy Institute of California johnson@ppic.org.

PUBLIC NOTICE – OCTOBER 14, 2021 REGARDING JOHNSON …

1 Population Growth and Housing in California Hans Johnson Public Policy Institute of California johnson@ppic.org.

PUBLIC SAFETY 1. 2. 3. Jordan Johnson)

Resume of Aubrey A. Johnson - Public Service Company of ...

Richard Johnson Portfolio Small Public

Sharon Stewart Johnson - College of Public Health and Human

Johnson County Public Health

1 Census 2000: Growing Together or Apart? California’s Regions Hans Johnson Public Policy Institute of California johnson@ppic.org.

JOHNSON WING RENOVATION & BREEZEWAY Manassas City Public Schools.

Public Engagement - Institute for Public RelationsEdelman 2008 Trust Barometer Survey . Employees are the New Credible Source Twice as Believable as CEO ... Johnson & Johnson Family

Dr. G. Johnson, Writing Survey Questions Research Methods for Public Administrators Dr. Gail Johnson.

Dr. G. Johnson, Basic Concepts Research Methods for Public Administrators Dr. Gail Johnson.

Religion and Public Schools: The meaning of the Johnson ...

A Brief Demography of California Hans Johnson Public Policy Institute of California johnson@ppic.org November 30, 2010.

East 67th Street / Billy Johnson Playground Renovationassets.centralparknyc.org/pdfs/pdc/Billy-Johnson... · East 67th Street / Billy Johnson Playground Renovation Public Review |