Post on 15-Jun-2018
Principles of Hypothesis Testing for Public HealthTesting for Public Health
Laura Lee Johnson Ph DLaura Lee Johnson, Ph.D.Statistician
National Center for ComplementaryNational Center for Complementary and Alternative Medicine
johnslau@mail.nih.govFall 2011Fall 2011
Answers to Questions I U ll G A d NI Usually Get Around Now
• ITT is like generalizing to real lifeg g• I am not a fan of stratification
Except by clinic/siteN t ithNot everyone agrees with me
• OK to adjust for (some) variablesBaseline covariatesBaseline covariates
Cannot stratify a continuous variableAt least rarely can you do it wellt east a e y ca you do t e
Some variables are not ok, or you just upgraded to a fancy model!
ObjectivesObjectives
• Formulate questions for statisticiansFormulate questions for statisticians and epidemiologists using
P valueP-valuePowerType I and Type II errors
• Identity a few commonly used y ystatistical tests for comparing two groupsg p
OutlineOutlineEstimation and HypothesesEstimation and Hypotheses
• How to Test Hypotheses• Confidence Intervals• Confidence Intervals • Regression
E• Error• Diagnostic Testing• Misconceptions• Appendix
Estimation and HypothesesEstimation and Hypotheses
InferenceInferenceHow we use Hypothesis Testing
• Estimation• Distributions• Distributions• Hypothesis testing• Sides and Tails
Statistical InferenceStatistical Inference
• Inferences about a population are• Inferences about a population are made on the basis of results
bt i d f l dobtained from a sample drawn from that population
• Want to talk about the larger population from which the subjectspopulation from which the subjects are drawn, not the particular subjects!subjects!
Use Hypothesis TestingUse Hypothesis Testing• Designing a studyDesigning a study• Reviewing the design of other studies
Grant or application review (e g NIHGrant or application review (e.g. NIH study section, IRB)
• Interpreting study results• Interpreting study results• Interpreting other’s study results
R i i i t j lReviewing a manuscript or journalInterpreting the news
I Use Hypothesis TestingI Use Hypothesis Testing
• Do everything on previous slide• Do everything on previous slide• Analyze the data to find the results
Program formulas not presented here in detailhere in detail
• Anyone can analyze the data, too, y y , ,but be careful
Analysis Follows DesignAnalysis Follows Design
Questions → Hypotheses →Questions → Hypotheses → Experimental Design → Samples →Data → Analyses →Conclusions
What Do We TestWhat Do We Test• Effect or Difference we are interested in
Difference in Means or ProportionsOdds Ratio (OR)Relative Risk (RR)Correlation Coefficient
Cli i ll i t t diff• Clinically important differenceSmallest difference considered biologically or clinically relevantbiologically or clinically relevant
• Medicine: usually 2 group comparison of population means
Estimation and HypothesesEstimation and Hypotheses
InferenceInferenceHow we use Hypothesis TestingEstimation
• Distributions• Distributions• Hypothesis testing• Sides and Tails
Estimation: From the SampleEstimation: From the Sample
• Point estimation• Point estimationMeanMedianChange in mean/medianChange in mean/median
• Interval estimationVariation (e.g. range, σ2, σ, σ/√n)95% Confidence interval95% Confidence interval
Pictures, Not NumbersPictures, Not Numbers
• Scatter plotsScatter plots• Bar plots (use a table)
Hi t• Histograms• Box plots
• Not EstimationNot EstimationSee the data and check assumptions
Graphs and TablesGraphs and Tables
• A picture is worth a thousand t-testsA picture is worth a thousand t-tests• Vertical (Y) axis can be misleading
Like the Washington Post W h Th hWeather, Though
Estimation and HypothesesEstimation and Hypotheses
InferenceInferenceHow we use Hypothesis TestingEstimationDistributionsDistributions
• Hypothesis testing• Sides and Tails
DistributionsDistributions
• Parametric tests are based on• Parametric tests are based on distributions
Normal Distribution (standard normal, bell curve, Z distribution), , )
• Non-parametric tests still have assumptions but not based onassumptions, but not based on distributions
2 of the Continuous Distributions2 of the Continuous Distributions
• Normal distribution: N( μ, σ2)Normal distribution: N( μ, σ )μ = mean, σ2 = varianceZ or standard normal = N(0 1)Z or standard normal = N(0,1)
• t distribution: tω
d f f d (df)ω = degrees of freedom (df)Usually a function of sample size
XMean = (sample mean)Variance = s2 (sample variance)
X
Binary DistributionBinary Distribution
• Binomial distribution: B (n p)• Binomial distribution: B (n, p)Sample size = nProportion ‘yes’ = pMean = npMean = npVariance = np(1-p)
• Can do exact or use Normal
Many More DistributionsMany More Distributions
• Not going to coverNot going to cover• Poisson
L l• Log normal• Gamma• Beta• Weibull• Weibull • Many more
Estimation and HypothesesEstimation and Hypotheses
InferenceInferenceHow we use Hypothesis TestingEstimationDistributionsDistributionsHypothesis TestingSides and Tails
Hypothesis TestingHypothesis Testing
• Null hypothesis (H )• Null hypothesis (H0)• Alternative hypothesis (H1 or Ha)
Null HypothesisNull Hypothesis• For superiority studies we think for examplep y p
Average systolic blood pressure (SBP) on Drug A is different than average SBP on Drug B g
• Null of that? Usually that there is no effectMean = 0OR = 1OR 1RR = 1Correlation Coefficient = 0
• Sometimes compare to a fixed value so Null• Sometimes compare to a fixed value so NullMean = 120
• If an equivalence trial, look at NEJM paper or other specific resourcesother specific resources
Alternative HypothesisAlternative Hypothesis
• Contradicts the null• Contradicts the null• There is an effect• What you want to prove• If equivalence trial special way to• If equivalence trial, special way to
do this
Example HypothesesExample Hypotheses
• H : μ = μ• H0: μ1 = μ2
• HA: μ1 ≠ μ2
Two-sided test• H : μ > μ• HA: μ1 > μ2
One-sided test
1 vs. 2 Sided Tests1 vs. 2 Sided Tests
• Two-sided testTwo sided testNo a priori reason 1 group should have stronger effecthave stronger effectUsed for most tests
One sided test• One-sided testSpecific interest in only one directionNot scientifically relevant/interesting if reverse situation true
Use a 2-Sided TestUse a 2 Sided Test
• Almost always• Almost always• If you use a one-sided test
Explain yourselfPenalize yourself on the alphaPenalize yourself on the alpha
0.05 2-sided test becomes a 0.025 1-sided test
Never “Accept” AnythingNever Accept Anything• Reject the null hypothesisj yp• Fail to reject the null hypothesis
• Failing to reject the null hypothesis does NOT mean the null (H0) is true
• Failing to reject the null means Not enough evidence in your sample to reject the null hypothesisreject the null hypothesisIn one sample saw what you saw
OutlineOutline
Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t l• Confidence Intervals
• Regressiong• Error• Diagnostic Testing• Diagnostic Testing• Misconceptions
ExperimentExperiment
Develop hypothesesDevelop hypothesesCollect sample/Conduct experimentexperiment
• Calculate test statistic• Compare test statistic with what is
expected when H0 is truep 0
Information at HandInformation at Hand
• 1 or 2 sample test?• 1 or 2 sample test?• Outcome variable
Binary, Categorical, Ordered, Continuous, SurvivalContinuous, Survival
• Population• Numbers (e.g. mean, standard
deviation))
Example: H i /Ch l lHypertension/Cholesterol
• Mean cholesterol hypertensive menMean cholesterol hypertensive men• Mean cholesterol in male general
(normotensive) population (20 74(normotensive) population (20-74 years old)I th 20 74 ld l• In the 20-74 year old male population the mean serum cholesterol is 211 mg/ml with acholesterol is 211 mg/ml with a standard deviation of 46 mg/ml
One Sample:Ch l l S l DCholesterol Sample Data
• Have data on 25 hypertensive men• Have data on 25 hypertensive men• Mean serum cholesterol level is
220mg/ml ( = 220 mg/ml)Point estimate of the mean
X
• Sample standard deviation: s = 38 6 mg/ml38.6 mg/ml
Point estimate of the variance = s2
Compare Sample to PopulationCompare Sample to Population
• Is 25 enough?Is 25 enough?Next lecture we will discuss
Wh t diff i h l t l i• What difference in cholesterol is clinically or biologically
i f l?meaningful?• Have an available sample and want p
to know if hypertensives are different than general populationg p p
SituationSituation
• May be you are reading another• May be you are reading another person’s work
• May be already collected data
• If you were designing up front you ld l l t th l iwould calculate the sample size
But for now, we have 25 people, p p
Cholesterol HypothesesCholesterol Hypotheses
• H0: μ1 = μ2H0: μ1 μ2• H0: μ = 211 mg/ml
μ = POPULATION mean serumμ = POPULATION mean serum cholesterol for male hypertensivesMean cholesterol for hypertensiveMean cholesterol for hypertensive men = mean for general male populationpopulation
• HA: μ1 ≠ μ2• H : μ ≠ 211 mg/ml• HA: μ ≠ 211 mg/ml
Cholesterol Sample DataCholesterol Sample Data
• Population information (general)Population information (general)μ = 211 mg/ml
/ (σ = 46 mg/ml (σ2 = 2116)• Sample information (hypertensives)p ( yp )
= 220 mg/mls = 38 6 mg/ml (s2 = 1489 96)Xs = 38.6 mg/ml (s2 = 1489.96) N = 25
ExperimentExperimentDevelop hypothesesDevelop hypothesesCollect sample/Conduct experimentCalculate test statisticCalculate test statistic
• Compare test statistic with what is expected when H is trueexpected when H0 is true
Test StatisticTest Statistic• Basic test statistic for a meanBasic test statistic for a mean
point estimate of - target value of test statistic = μ μ
• σ = standard deviation (sometimes use σ/√n)
point estimate of
test statistic μσ
• σ = standard deviation (sometimes use σ/√n)• For 2-sided test: Reject H0 when the test
statistic is in the upper or lower 100*α/2% ofstatistic is in the upper or lower 100 α/2% of the reference distribution
• What is α?What is α?
VocabularyVocabulary
• Types of errors• Types of errorsType I (α) (false positives)Type II (β) (false negatives)
• Related words• Related wordsSignificance Level: α levelPower: 1- β
Unknown Truth and the DataUnknown Truth and the DataTruth H0 Correct HA CorrectTruth
DataH0 Correct HA Correct
Decide H 1- α βDecide H0
“fail to reject H0”
1- αTrue Negative
βFalse Negative
H0
Decide HA
“reject H ”α
False Positive1- β
True Positiveα = significance level
1- β = power
reject H0 False Positive True Positive
1 β power
Type I ErrorType I Error
• α = P( reject H0 | H0 true)α = P( reject H0 | H0 true)• Probability reject the null hypothesis
given the null is truegiven the null is true• False positive• Probability reject that hypertensives’
µ=211mg/ml when in truth the mean cholesterol for hypertensives is 211
Type II Error (or, 1- Power)Type II Error (or, 1 Power)
• β = P( do not reject H0 | H1 true )β P( do not reject H0 | H1 true )• False Negative
P b bilit NOT j t th t l• Probability we NOT reject that male hypertensives’ cholesterol is that
f th l l ti h iof the general population when in truth the mean cholesterol for h i i diff h hhypertensives is different than the general male population
PowerPower
• Power = 1-β = P( reject H | H true )• Power = 1-β = P( reject H0 | H1 true )• Everyone wants high power, and
therefore low Type II error
Cholesterol Sample DataCholesterol Sample Data
• N = 25N 25• = 220 mg/ml
211 / lX
• μ = 211 mg/ml• s = 38.6 mg/ml (s2 = 1489.96)g ( )• σ = 46 mg/ml (σ2 = 2116)• α = 0 05• α = 0.05• Power? Next lecture!
Z Test Statistic and N(0,1)Z Test Statistic and N(0,1)• Want to test continuous outcomeWant to test continuous outcome• Known variance• Under H X μ−• Under H0
Th f
0 ~ (0,1)/
X Nn
μσ
• Therefore,0
0Reject H if 1.96 (gives a 2-sided =0.05 test)/
X μ ασ
−>
0 0 0
/
Reject H if X > 1.96 or X < 1.96
nσσ σμ μ+ −0 0 0eject .96 o .96n n
μ μ
ExperimentExperimentDevelop hypothesesDevelop hypothesesCollect sample/Conduct experimentCalculate test statisticCalculate test statisticCompare test statistic with what is expected when H is trueexpected when H0 is true
Reference distributionA ti b t di t ib ti fAssumptions about distribution of outcome variable
Z or Standard Normal Di ib iDistribution
Z or Standard Normal Di ib iDistribution
Z or Standard Normal Di ib iDistribution
How to test?How to test?Rejection intervalj
Like a confidence interval but centered on the null mean
• Z test or Critical ValueN(0,1) distribution and alpha
• t test or Critical Valuet distribution and alpha
• P-value• Confidence interval
General Formula (1-α)%Rejection Region for Mean PointRejection Region for Mean Point
EstimateZ Z⎛ ⎞1 / 2 1 / 2,Z Zα ασ σμ μ− −⎛ ⎞− +⎜ ⎟
⎝ ⎠n n⎜ ⎟⎝ ⎠
• Note that +Z(α/2) = - Z(1-α/2)• 90% CI : Z = 1 64590% CI : Z = 1.645• 95% CI : Z = 1.96• 99% CI : Z = 2 58• 99% CI : Z = 2.58
Cholesterol Rejection IntervalU i H P l i I f iUsing H0 Population Information
Reject H if
N(211 462)
Reject Ho if 220 is outside of (193 229)N(211, 462) of (193,229)
211193 229
Normal DistributionNormal Distribution
Cholesterol Rejection IntervalU i H S l I f iUsing H0 Sample Information
Reject H ift (df=24, 211, 38.62)
Reject Ho if 220 is outside of (195 227)of (195,227)
211195 227
t Distribution (df = 24)t Distribution (df = 24)
Side Note on t vs. ZSide Note on t vs. Z
• If s = σ then the t value will beIf s σ then the t value will be larger than the Z value
• BUT here our sample standard• BUT, here our sample standard deviation (38.6) was quite a bit smaller than the population sd (46)smaller than the population sd (46)
HERE intervals using t look ll h Z i l BUTsmaller than Z intervals BUT
Because of sd, not distribution,
How to test?How to test?Rejection intervalj
Like a confidence interval but centered on the null mean
Z test or Critical ValueN(0,1) distribution and alpha
t test or Critical Valuet distribution and alpha
• P-value• Confidence interval
Z-test: Do Not Reject H0Z test: Do Not Reject H0
0 220 211 0.98 1.96/ 46 / 25
XZn
μσ
− −= = = <
/ 46 / 25nσ
Z or Standard Normal Di ib iDistribution
Determining Statistical Significance: Critical Value Method
• Compute the test statistic Z (0 98)• Compute the test statistic Z (0.98)• Compare to the critical value
Standard Normal value at α-level (1 96)Standard Normal value at α-level (1.96)• If |test statistic| > critical value
Reject H0Reject H0Results are statistically significant
• If |test statistic| < critical valueIf |test statistic| critical valueDo not reject H0Results are not statistically significanty g
T-Test StatisticT Test Statistic
• Want to test continuous outcomeWant to test continuous outcome• Unknown variance (s, not σ)
U d H X• Under H0 0( 1)~
/ nX ts n
μ−
−
• Critical values: statistics books or computerp
• t-distribution approximately normal for degrees of freedom (df) >30degrees of freedom (df) >30
Cholesterol: t-statisticCholesterol: t statistic
• Using data 0 220 211 1 17XT μ− −= = =Using data
F 0 05 t id d t t f t(24)
1.17/ 38.6 / 25
Ts n
= = =
• For α = 0.05, two-sided test from t(24) distribution the critical value = 2.064
• | T | = 1.17 < 2.064• The difference is not statistically y
significant at the α = 0.05 level• Fail to reject H0Fail to reject H0
Almost all ‘Critical Value’ Tests: E S IdExact Same Idea
• Paired tests• Paired tests• 2-sample tests• Continuous data• Binary data• Binary data
• See appendix at end of slides
How to test?How to test?Rejection intervalj
Like a confidence interval but centered on the null mean
Z test or Critical ValueN(0,1) distribution and alpha
t test or Critical Valuet distribution and alpha
P-value• Confidence interval
P-valueP value
• Smallest α the observed sample wouldSmallest α the observed sample would reject H0
• Given H is true probability of• Given H0 is true, probability of obtaining a result as extreme or more extreme than the actual sampleextreme than the actual sample
• MUST be based on a modelNormal, t, binomial, etc.
Cholesterol ExampleCholesterol Example
• P-value for two sided test• P-value for two sided test• = 220 mg/ml, σ = 46 mg/mlX• n = 25• H : μ = 211 mg/ml• H0: μ = 211 mg/ml• HA: μ ≠ 211 mg/ml
2* [ 220] 0.33P X > =
Determining Statistical Si ifi P V l M h dSignificance: P-Value Method
• Compute the exact p-value (0.33)Compute the exact p value (0.33)• Compare to the predetermined α-level
(0.05)(0.05)• If p-value < predetermined α-level
Reject H0Reject H0Results are statistically significant
• If p-value > predetermined α-levelIf p-value > predetermined α-levelDo not reject H0Results are not statistically significantResults are not statistically significant
P-value Interpretation RemindersP value Interpretation Reminders
• Measure of the strength of• Measure of the strength of evidence in the data that the null is
t tnot true
A random variable whose value• A random variable whose value lies between 0 and 1
• NOT the probability that the null hypothesis is true.hypothesis is true.
How to test?How to test?Rejection intervalj
Like a confidence interval but centered on the null mean
Z test or Critical ValueN(0,1) distribution and alpha
t test or Critical Valuet distribution and alpha
P-value• Confidence interval
OutlineOutline
Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t lConfidence Intervals
• Regressiong• Error• Diagnostic Testing• Diagnostic Testing• Misconceptions
General Formula (1-α)% CI for μGeneral Formula (1 α)% CI for μ
Z Z⎛ ⎞1 / 2 1 / 2,Z ZX Xα ασ σ− −⎛ ⎞− +⎜ ⎟⎝ ⎠n n⎜ ⎟⎝ ⎠
• Construct an interval around the point estimate
• Look to see if the population/null mean is insideis inside
Cholesterol Confidence IntervalU i P l i V i ( Z )Using Population Variance ( Z )
N(220, 462)
220211202196 229 238 244
CI for the Mean, Unknown V iVariance
• Pretty commonPretty common• Uses the t distribution
D f f d• Degrees of freedom1,1 / 2 1,1 / 2n nt s t s
X Xα α− − − −⎛ ⎞+⎜ ⎟,
2.064*38.6 2.064*38.6220 220
X Xn n
− +⎜ ⎟⎝ ⎠
⎛ ⎞+⎜ ⎟220 , 22025 25
(204.06, 235.93)
= − +⎜ ⎟⎝ ⎠
= ( )
Cholesterol Confidence IntervalU i S l D ( )Using Sample Data ( t )
t (df=24, 220, 38.62)
220212204198 228 236 242
But I Have All Zeros! C l l 9 % b dCalculate 95% upper bound
• Known # of trials without an event (2.11 (van Belle 2002, Louis 1981)
• Given no observed events in n trials, 95% b d t f95% upper bound on rate of occurrence is 3 / (n + 1)
No fatal outcomes in 20 operationsNo fatal outcomes in 20 operations95% upper bound on rate of occurrence = 3 / (20 + 1) = 0 143 sooccurrence 3 / (20 + 1) 0.143, so the rate of occurrence of fatalities could be as high as 14.3%
Hypothesis Testing d C fid I land Confidence Intervals
• Hypothesis testing focuses on whereHypothesis testing focuses on where the sample mean is located
• Confidence intervals focus on plausible• Confidence intervals focus on plausible values for the population meanI l th b t t ti t• In general, the best way to estimate a confidence interval is to bootstrap (d t il t ti ti i )(details: see a statistician)
CI InterpretationCI Interpretation• Cannot determine if a particular interval p
does/does not contain true mean• Can say in the long run
Take many samples Same sample sizeFrom the same population95% of similarly constructed confidence intervals will contain true meanintervals will contain true mean
• Think about meta analyses
Interpret a 95% Confidence Interval (CI) for the populationInterval (CI) for the population
mean, μ• “If we were to find many such
intervals, each from a differentintervals, each from a different random sample but in exactly the same fashion then in the long runsame fashion, then, in the long run, about 95% of our intervals would incl de the pop lation meaninclude the population mean, μ, and 5% would not.”
Do NOT interpret a 95% CI…Do NOT interpret a 95% CI…• “There is a 95% probability that the trueThere is a 95% probability that the true
mean lies between the two confidence values we obtained from a particular psample”
• “We can say that we are 95% confident ythat the true mean does lie between these two values.”
• Overlapping CIs do NOT imply non-significance
Take Home: Hypothesis TestingTake Home: Hypothesis Testing
• Many ways to testMany ways to testRejection intervalZ test t test or Critical ValueZ test, t test, or Critical ValueP-valueC fid i t lConfidence interval
• For this, all ways will agreeIf not: math wrong, rounding errors
• Make sure interpret correctly
Take Home Hypothesis TestingTake Home Hypothesis Testing
• How to turn questions into hypothesesHow to turn questions into hypotheses• Failing to reject the null hypothesis
DOES NOT mean that the null is trueDOES NOT mean that the null is true• Every test has assumptions
A statistician can check all the assumptionsIf the data does not meet the assumptions there are non-parametric versions of tests (see text)(see text)
Take Home: CITake Home: CI
• Meaning/interpretation of the CIMeaning/interpretation of the CI• How to compute a CI for the true mean
when variance is known (normal model)when variance is known (normal model)• How to compute a CI for the true mean
h th i i NOT k (twhen the variance is NOT known (t distribution)
• In practice use Bootstrap
Take Home: VocabularyTake Home: Vocabulary• Null Hypothesis: H0Null Hypothesis: H0• Alternative Hypothesis: H1 or Ha or HA• Significance Level: α level• Significance Level: α level• Acceptance/Rejection Region
St ti ti ll Si ifi t• Statistically Significant• Test Statistic• Critical Value• P-value, Confidence Interval
OutlineOutline
Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t lConfidence Intervals Regressiong
• Error• Diagnostic Testing• Diagnostic Testing• Misconceptions
RegressionRegression
• Continuous outcome• Continuous outcomeLinear
• Binary outcomeLogisticLogistic
• Many other types
Linear regressionLinear regression• Model for simple linear regression
Y β + β +Yi = β0 + β1x1i + εiβ0 = interceptβ = slopeβ1 = slope
• AssumptionsOb ti i d d tObservations are independentNormally distributed with constant variancevariance
• Hypothesis testingH : β = 0 vs H : β ≠ 0H0: β1 = 0 vs. HA: β1 ≠ 0
In Order of ImportanceIn Order of Importance
1. Independence2. Equal variance3 Normality3. Normality
(for ANOVA and linear regression)
More Than One CovariateMore Than One Covariate
• Yi = β0 + β1x1i + β2x2i + β3x3i + εiYi = β0 + β1x1i + β2x2i + β3x3i + εi
• SBP = β β D β M l β Aβ0 + β1 Drug + β2 Male + β3 Age
• β 1
Association between Drug and SBPAverage difference in SBP betweenAverage difference in SBP between the Drug and Control groups, given sex and agesex and age
Testing?Testing?
• Each β has a p-value associated with itEach β has a p-value associated with it• Each model will have an F-test
Oth th d t d t i fit• Other methods to determine fitResiduals
• See a statistician and/or take aSee a statistician and/or take a biostatistics class. Or 3.
Repeated Measures (3 i i )(3 or more time points)
• Do NOT use repeated measures• Do NOT use repeated measures AN(C)OVA
Assumptions quite stringent• Talk to a statisticianTalk to a statistician
Mixed modelGeneralized estimating equationsOtherOther
An Aside: CorrelationAn Aside: Correlation
• Range: -1 to 1Range: -1 to 1• Test is correlation is ≠ 0
With N 1000 t h hi hl• With N=1000, easy to have highly significant (p<0.001) correlation = 0.05
Statistically significant that isNo where CLOSE to meaningfully g ydifferent from 0
• Partial Correlation CoefficientPartial Correlation Coefficient
Do Not Use Correlation.U R iUse Regression
• Some fields: Correlation still popularSome fields: Correlation still popularPartial regression coefficients
Hi h l ti i 0 8 (i b l t• High correlation is > 0.8 (in absolute value). Maybe 0.7
• Never believe a p-value from a correlation test
• Regression coefficients are more meaningfulg
Analysis Follows DesignAnalysis Follows Design
Questions → Hypotheses →Questions → Hypotheses → Experimental Design → Samples →Data → Analyses →Conclusions
OutlineOutline
Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t lConfidence Intervals RegressiongError
• Diagnostic Testing• Diagnostic Testing• Misconceptions
Is α or β more important ?Is α or β more important ?
• Depends on the question• Depends on the question• Most will say protect against Type I
errorMultiple comparisionsMultiple comparisions
• Need to think about individual and l ti h lth i li ti dpopulation health implications and
costs
OmicsOmics
• False negative (Type II error)• False negative (Type II error)Miss what could be importantAre these samples going to be looked at again?looked at again?
• False positive (Type I error)Waste resources following dead ends
HIV ScreeningHIV Screening
• False positive• False positiveNeedless worryStigma
• False negative• False negativeThinks everything is okContinues to spread disease
• For cholesterol example?• For cholesterol example?
What do you need to think b ?about?
• Is it worse to treat those who truly• Is it worse to treat those who truly are not ill or to not treat those who
ill?are ill?• That answer will help guide you as p g y
to what amount of error you are willing to tolerate in your trialwilling to tolerate in your trial design
OutlineOutline
Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t lConfidence Intervals RegressiongErrorDiagnostic TestingDiagnostic Testing
• Misconceptions
Little Diagnostic Testing Lingo
• False Positive/False Negative (α, β)False Positive/False Negative (α, β) • Positive Predictive Value (PPV)
Probability diseased given POSITIVEProbability diseased given POSITIVE test result
• Negative Predictive Value (NPV)• Negative Predictive Value (NPV)Probability NOT diseased given NEGATIVE test resultNEGATIVE test result
• Predictive values depend on disease prevalenceprevalence
Sensitivity, SpecificitySensitivity, Specificity
• Sensitivity: how good is a test atSensitivity: how good is a test at correctly IDing people who have diseasedisease
Can be 100% if you say everyone is ill (all have positive result)is ill (all have positive result)Useless test with bad Specificity
• Specificity: how good is the test at correctly IDing people who are welly g p p
Example: Western vs. ELISAExample: Western vs. ELISA• 1 million people1 million people• ELISA Sensitivity = 99.9%• ELISA Specificity = 99 9%• ELISA Specificity = 99.9%• 1% prevalence of infection
10 000 iti b W t ( ld10,000 positive by Western (gold standard) 9990 t iti (TP) b ELISA9990 true positives (TP) by ELISA10 false negatives (FN) by ELISA
1% Prevalence1% Prevalence
• 990 000 not infected990,000 not infected989,010 True Negatives (TN)990 F l P iti (FP)990 False Positives (FP)
• Without confirmatory testTell 990 or ~0.1% of the population they are infected when in reality they y y yare notPPV = 91% NPV = 99 999%PPV 91%, NPV 99.999%
1% Prevalence1% Prevalence• 10980 total test positive by ELISAp y
9990 true positive990 false positivep
• 9990/10980 = probability diseased GIVEN positive by ELISA = PPV = 0.91 = 91%
• 989,020 total test negatives by ELISA989,010 true negatives10 f l i10 false negatives
• 989010/989020 = NPV = 99.999%
0.1% Prevalence0.1% Prevalence
• 1 000 infected – ELISA picks up 9991,000 infected ELISA picks up 9991 FN
999 000 t i f t d• 999,000 not infected989,001 True Negatives (TN)999 False Positives (FP)
• Positive predictive value = 50%Positive predictive value 50%• Negative predictive value = 99.999%
10% Prevalence10% Prevalence
• 99% PPV• 99% PPV• 99.99% NPV
Prevalence Matters(Population You Sample to Estimate(Population You Sample to Estimate
Prevalence, too)• Numbers look “good” with high g g
prevalenceTesting at STD clinic in high risk
l tipopulations• Low prevalence means even very high
sensitivity and specificity will result insensitivity and specificity will result in middling PPV
• Calculate PPV and NPV for 0 01%Calculate PPV and NPV for 0.01% prevalence found in female blood donors
Prevalence MattersPrevalence Matters
• PPV and NPV tend to come from• PPV and NPV tend to come from good cohort data
• Can estimate PPV/NPV from case control studies but the formulas are hard and you need to be REALLY sure about the prevalenceREALLY sure about the prevalence
Triple sure
High OR D N G d T M kDoes Not a Good Test Make
• Diagnostic tests need separation• Diagnostic tests need separation• ROC curves
Not logistic regression with high OROR
• Strong association between 2 i bl d NOT dvariables does NOT mean good
prediction of separation
What do you need to think b ?about?
• How good does the test need to be?How good does the test need to be?96% sensitivity and 10% specificity?66% AUC? (Wh t i th t?)66% AUC? (What is that?)
• Guide you as to what amount of differentiation, levels of sensitivity, specificity, PPV and NPV you are willing to tolerate in your trial design
OutlineOutline
Estimation and HypothesesEstimation and HypothesesHow to Test HypothesesC fid I t lConfidence Intervals RegressiongErrorDiagnostic TestingDiagnostic TestingMistakes & Misconceptions
Avoid Common Mistakes: H h i T iHypothesis Testing
• Mistake: Have paired data and do notMistake: Have paired data and do not do a paired test OR do not have paired data and do a paired testp
• If you have paired data, use a paired test
If you don’t then you can lose power• If you do NOT have paired data, do NOT y p ,
use a paired testYou can have the wrong inference
Avoid Common Mistakes: H h i T iHypothesis Testing
• Mistake: assume independent measurementsp• Tests have assumptions of independence
Taking multiple samples per subject ? g p p p jStatistician MUST knowDifferent statistical analyses MUST be
d d th b diffi lt!used and they can be difficult!• Mistake: ignore distribution of observations
Histogram of the observationsHistogram of the observationsHighly skewed data - t test and even non-parametric tests can have incorrect resultsparametric tests can have incorrect results
Avoid Common Mistakes: H h i T iHypothesis Testing
• Mistake: Assume equal variancesMistake: Assume equal variances (and the variances are not equal)
Did not show variance testDid not show variance testNot that good of a testALWAYS graph your data first to assess symmetry and variancey y
• Mistake: Not talking to a statisticianstatistician
Estimates and P-ValuesEstimates and P Values
• Study 1: 25±9• Study 1: 25±9Stat sig at the 1% level
• Study 2: 10±9Not statistically significant (ns)Not statistically significant (ns)
• 25 vs. 10 wow a big difference between these studies!
Um, no. 15±12.7Um, no. 15±12.7
Comparing A to BComparing A to B
• AppropriateAppropriateStatistical properties of A-BSt ti ti l ti f A/BStatistical properties of A/B
• NOT Appropriatepp pStatistical properties of AStatistical properties of BStatistical properties of BLook they are different!
Not a big difference? 15?!?Not a big difference? 15?!?
• Distribution of the difference• Distribution of the difference15±12.7Not statistically significantStandard deviations! ImportantStandard deviations! Important.
• Study 3 has much larger sample size!
2.5±0.92.5±0.9
3 Studies. 3 Answers, Maybe3 Studies. 3 Answers, Maybe
• Study # 3 is statistically significant• Study # 3 is statistically significant• Difference between study 3 and the
other studiesStatisticalStatisticalDifferent magnitudes
• Does study 3 replicate study 1?• Is it all sample size?Is it all sample size?
(Mis)conceptions(Mis)conceptions
• P-value = inferential tool ? Yes• P-value = inferential tool ? YesHelps demonstrate that population
i t t lmeans in two groups are not equal• Smaller p-value → larger effect ? g
NoEffect size is determined by theEffect size is determined by the difference in the sample mean or proportion between 2 groupsproportion between 2 groups
(Mis)conceptions(Mis)conceptions• A small p-value means the difference is p
statistically significant, not that the difference is clinically significant. YES
A l l i h l tA large sample size can help get a small p-value. YES, so do not be trickedtricked.
• Failing to reject H0 means what?There is not enough evidence to reject H0There is not enough evidence to reject H0 YESH0 is true! NO NO NO NO NO!
Analysis Follows DesignAnalysis Follows Design
Questions → Hypotheses →Questions → Hypotheses → Experimental Design → Samples →Data → Analyses →Conclusions
Questions?Questions?
AppendixAppendix
• Formulas for Critical Values• Formulas for Critical Values• Layouts for how to choose a test
Do Not Reject H0Do Not Reject H0
0 220 211 0.98 1.96/ /
XZ μ− −= = = <
/ 46 / 25nσ
046220=X > 1.96 =211+1.96* 228.03 NO!25n
σμ + =
046220=X < 1.96 211-1.96* 192.97 NO!25n
σμ − = =25n
Paired Tests: Difference T C i OTwo Continuous Outcomes
• Exact same idea• Exact same idea• Known variance: Z test statistic• Unknown variance: t test statistic• H : μ = 0 vs H : μ ≠ 0• H0: μd = 0 vs. HA: μd ≠ 0• Paired Z-test or Paired t-test
/ /d dZ or T= =/ /n s nσ
2 Samples: Same Variance + Sample Size Calculation Basis
• Unpaired - Same idea as pairedUnpaired - Same idea as paired• Known variance: Z test statistic
U k i t t t t ti ti• Unknown variance: t test statistic• H0: μ1 = μ2 vs. HA: μ1 ≠ μ2
H0: μ1 - μ2 = 0 vs. HA: μ1 - μ2 ≠ 0• Assume common varianceAssume common variance
1/ 1/ 1/ 1/x y x yZ or T
n m s n mσ− −
= =+ +1/ 1/ 1/ 1/n m s n mσ + +
2 Sample Unpaired Tests: 2 Diff V i2 Different Variances
• Same idea• Same idea• Known variance: Z test statistic• Unknown variance: t test statistic• H : μ = μ vs H : μ ≠ μ• H0: μ1 = μ2 vs. HA: μ1 ≠ μ2
• H0: μ1 - μ2 = 0 vs. HA: μ1 - μ2 ≠ 0
2 2 2 2/ / / /x y x yZ or Tn m s n s mσ σ− −
= =+ +1 2 1 2/ / / /n m s n s mσ σ+ +
One Sample Binary OutcomesOne Sample Binary Outcomes
• Exact same idea• Exact same idea• For large samples
Use Z test statisticSet up in terms of proportionsSet up in terms of proportions, not means
ˆ 0ˆ(1 ) /p pZ
p p n−
=−0 0(1 ) /p p n
Two Population ProportionsTwo Population Proportions
• Exact same idea• Exact same idea• For large samples use Z test
statisticˆ ˆp p1 2
1 1 2 2ˆ ˆ ˆ ˆ(1 ) (1 )p pZ
p p p p−
=− −1 1 2 2( ) ( )p p p pn m
+
Normal/Large Sample Data?
Binomial?No
NoYes
Independent? Nonparametric test
NoYes
p Nonparametric testNo
Yes
Expected ≥5 McNemar’s testNoYes
2 sample Z test for ti
Fisher’s Exact t t
Yes
proportions or contingency table
test
Normal/Large Sample Data?
YesInference on means?
Yes
Independent? Inference on variance?Yes No
No
Variance known?
Paired t F test for
Yes YesNo
known?
Variances equal?
variances
YesNo
Z test
Variances equal?
T test w/ T test w/Yes No
T test w/ pooled
variance
T test w/ unequal variance