Learning and Applying Biostatistics: How the Guinness Brewery Changed History


Learning and Applying Biostatistics: How the Guinness Brewery Changed History. Katheryne Downes, M.P.H., Statistical Data Analyst, Tampa General/USF College of Medicine. Lecture Outline. Part I: The Literature Review. Part II: Statistics. Part III: Sample Size Calculations.

Transcript of Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Page 1: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Katheryne Downes, M.P.H.
Statistical Data Analyst
Tampa General/USF College of Medicine

Page 2: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Lecture Outline

Part I: The Literature Review

Part II: Statistics

Part III: Sample Size Calculations

Page 3: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Part I: The Literature Review

Page 4: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Who’s done what?

Literature Review
– Don’t want to duplicate efforts (or maybe you should?)
– Can give ideas about how to (or how not to) conduct the study
– Required for sample size calculations

Page 5: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Critical Review of Literature

How were patients selected/recruited?
What population are they attempting to generalize to?
Definition of intervention?
Definition of outcomes?
What was the sample size?
Sample size calculations vs. power analysis
What are the possible confounding variables? What was done to control for these variables?
Statistics?
Interpretation of findings and conclusions?

Page 6: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Kat’s Notes: The Lit Review

How big is the sample size?
– Sample size or power calculations?

Randomization? (if applicable)
– If you’re dealing with a clinical trial, randomization helps you get rid of many potential sources of bias

Description of Design, Groups, Treatments?
– You need DETAILED descriptions of the design of the study, how the study groups were defined and details of the treatment (dosage, machines, devices, etc)

Page 7: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Kat’s Notes: The Lit Review

Confounding variables?
– Does the author discuss/address possible confounding variables? (i.e. variables that might be distorting the relationship between the two variables of interest) Does the author control for (statistically) the possible confounding variables?

Statistical significance ≠ Clinical Significance
– Read carefully and critically!

Page 8: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Be Careful…

REMEMBER: Just because it’s published does not necessarily mean that it’s a good study or that it’s without flaw. Also, remember publication bias: studies that show non-significant findings are often NOT published (despite the fact that they are equally important).

Page 9: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

BREAK

Page 10: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Part II: Statistics

Page 11: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Statistics in Literature: The Basics

Statistic

Confidence Intervals
– (mean +/- 2 SD)

Significance Values
– (P-values)

Page 12: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Statistics in Literature: The Confidence Interval

Confidence Intervals

– Estimation (Avg IQ = 100; 95% CI = 70-130)

– Hypothesis Testing (Sample Avg IQ = 136, normal 95% CI = 70-130)
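A minimal Python sketch of the two intervals that tend to get mixed up here. It assumes normally distributed scores, an IQ standard deviation of 15, and a made-up sample size of 25 (that sample size is not on the slide), and the function names are illustrative only. The 70-130 range is what covers about 95% of individual scores (mean +/- about 2 SD); a 95% confidence interval for the mean itself is built from the standard error and is much narrower.

```python
import math
from statistics import NormalDist


def reference_range(mean, sd, coverage=0.95):
    """Range expected to hold `coverage` of individual values (normal data)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    return mean - z * sd, mean + z * sd


def ci_for_mean(mean, sd, n, coverage=0.95):
    """Confidence interval for the mean itself, based on the standard error."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    se = sd / math.sqrt(n)  # standard error shrinks as the sample grows
    return mean - z * se, mean + z * se


# Assumed values: SD = 15 is the usual IQ convention, n = 25 is hypothetical.
low, high = reference_range(100, 15)
ci_low, ci_high = ci_for_mean(100, 15, n=25)
print(round(low, 1), round(high, 1))
print(round(ci_low, 1), round(ci_high, 1))
```

With these assumed numbers the first range comes out near 70.6 to 129.4 and the second near 94.1 to 105.9, so a sample average of 136 falls outside both.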

Page 13: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

One Tail or Two?

One-tail:
– We hypothesize Drug A is worse than Drug B
– We hypothesize Drug A is better than Drug B

Two-Tailed:
– We hypothesize Drug A performs differently than Drug B (direction isn’t specified, more conservative test)
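To see why the two-tailed test is the more conservative one, here is a small sketch that turns a z statistic into both kinds of p-value. The z value of 1.8 is made up for illustration, and the one-tailed value assumes the effect landed in the direction you hypothesized.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (one-tailed, two-tailed) p-values for a z statistic."""
    std_normal = NormalDist()
    one_tailed = 1 - std_normal.cdf(abs(z))  # area in one tail beyond |z|
    two_tailed = 2 * one_tailed              # same area counted in both tails
    return one_tailed, two_tailed


# Hypothetical test statistic comparing Drug A with Drug B.
one, two = p_values_from_z(1.8)
print(round(one, 3), round(two, 3))
```

For this made-up statistic the one-tailed p-value is below .05 while the two-tailed one is not, which is exactly the sense in which the two-tailed test asks for stronger evidence.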

Page 14: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Confidence Intervals: FAQs

Q: My Standard Deviation is larger than my mean- what did I do wrong?!?!

A: Most likely, you didn’t do anything wrong. An SD that’s larger than the mean indicates one of two things: 1) a lot of variation in the dataset 2) a non-normal distribution

Q: Why is my confidence interval SO wide (or narrow)?

A: The width of the confidence interval is a reflection of its precision. If there’s a lot of variation in the dataset or if there’s a great deal of uncertainty in the estimate, your interval will be quite wide. The opposite is also true.

Page 15: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Statistics in Literature: Significance Values

P-value: the probability of observing a finding at least as extreme as yours by chance alone (that is, if no real effect exists).

A p-value = .001 means that the probability of observing an event that extreme by chance alone would only be about 1/1000. Translation? You can be fairly certain that your observation did NOT occur by chance alone- something intervened.
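One way to make "by chance alone" concrete is a permutation test: shuffle the group labels many times and count how often chance produces a difference at least as large as the one observed. This is only a sketch with made-up measurements and a hypothetical function name, not the method used in the lecture.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels mean nothing
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
    return as_extreme / n_shuffles


# Made-up measurements for two groups.
treated = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
control = [10.2, 10.9, 9.8, 10.5, 11.0, 10.1]
p = permutation_p_value(treated, control)
print(p)
```

A small result here means that random relabeling almost never reproduces a gap this large, which is the idea behind "something intervened".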

Page 16: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Quiz Time!

Q: What is the 95% CI for the following data: mean=30, SD=5?

A: 95% CI = 20 – 40 (mean +/- 2 SD)

Q: For the previous question, if you obtained a sample mean = 10, what would you conclude?

A: Since 10 lies outside of the 95% CI, this event is unlikely to have occurred by chance alone. In fact, the chances of observing this event by chance would most likely be less than 5%.

Q: How do you interpret a p-value = .05?

A: If chance alone were at work, a result at least this extreme would be expected about 5% of the time.

Page 17: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

How the Guinness Brewery Changed History…

“Student’s” t-test
William Gosset (left)
R.A. Fisher (right)

Page 18: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Understanding Statistics in Literature

Are the statistics appropriate?

What, exactly, does this really mean?
– What does an odds ratio of 1.5 really mean?
– Why am I looking for a “1” or a “0” in this confidence interval?
– What does a significant ANOVA tell you? (for that matter, what’s an ANOVA!?!?!)

Page 19: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

T-test/Z-test

What type of data? (2) Group Means (continuous)

Reported as? t-statistic/z-score & p-value

What does it REALLY test?

The difference in group distributions- in particular- the difference in group means.

Page 20: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

T-test/Z-test Continued…

T-tests are used when the sample size for each group is very small

Z-tests utilize the normal distribution and can be used when the sample size is adequately large

Not appropriate for categorical data
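Both tests boil down to the same ratio: the difference in group means divided by its standard error. The sketch below computes that ratio (in the unequal-variance "Welch" form, which is my choice, not something the slides specify) and a normal-based p-value. The data and function names are made up, and for small groups the p-value should come from the t-distribution instead, which Python's standard library does not provide.

```python
import math
from statistics import NormalDist, mean, variance


def two_group_statistic(group_a, group_b):
    """Difference in means divided by its standard error (Welch form)."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def z_test_p_value(stat):
    """Two-tailed p-value from the normal distribution (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(stat)))


# Hypothetical continuous measurements for two groups.
group_a = [5.1, 4.8, 5.6, 5.0, 5.3]
group_b = [4.2, 4.5, 4.0, 4.4, 4.1]
stat = two_group_statistic(group_a, group_b)
p = z_test_p_value(stat)
print(round(stat, 2), p)
```

With only five values per group, as here, the normal-based p-value is too optimistic; it is shown only to illustrate how the statistic turns into a p-value.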

Page 21: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

ANOVA: Analysis of Variance

What type of data? (3+) Means (continuous)

Reported as? F-Statistic, p-value

What does it REALLY test?
It compares the distributions of several groups simultaneously- it examines whether the amount of variation between groups is greater than that of within groups. A significant F-statistic tells you that the groups are not all equal, but it does NOT tell you which groups are different.

Page 22: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

ANOVA

Once a significant F-statistic is obtained, your next step would be to conduct a post-hoc test to determine which groups are different (Tukey).

Again, cannot be used for categorical data.
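The "variation between groups vs. within groups" idea can be written out directly. The sketch below computes a one-way ANOVA F-statistic by hand for three tiny made-up groups; the group names and values are hypothetical, and turning F into a p-value (or running a Tukey post-hoc test) needs a statistics package and is not shown.

```python
from statistics import mean


def f_statistic(*groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Made-up surgery times (hours) for three groups.
interns = [4.0, 5.0, 6.0]
residents = [3.0, 4.0, 5.0]
fellows = [1.0, 2.0, 3.0]
f_value = f_statistic(interns, residents, fellows)
print(f_value)
```

An F near 1 means the groups differ about as much as individuals within a group do; the larger F gets, the harder it is to explain the group differences by within-group noise.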

Page 23: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Chi-Square

What type of data? Categorical/dichotomous

Reported as? χ², p-value

What does it REALLY test?

A chi-square tests whether the observed frequency of an event is different than the expected frequency of the event (that which would occur by chance).

***Chi-Square tests can ONLY be used when each expected cell count is greater than or equal to “5”
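Here is a sketch of the observed-vs-expected comparison for a 2x2 table. The counts are invented, the function name is illustrative, and no continuity correction is applied. For a 2x2 table (1 degree of freedom) the p-value can be obtained from the normal distribution via `math.erfc`, so no extra library is needed.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of counts, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected count in each cell if rows and columns were unrelated.
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    chi2 = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # exact for 1 degree of freedom
    return chi2, p_value, expected


# Hypothetical counts: rows = two treatments, columns = complication yes/no.
table = [[20, 30], [30, 20]]
chi2, p_value, expected = chi_square_2x2(table)
print(chi2, round(p_value, 4), expected)
```

Every expected count in this example is 25, so the "5 or more" rule of thumb is comfortably met.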

Page 24: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Fisher Exact Test

Works in basically the same manner as a chi-square, but it’s used when you have cell counts below “5”

An “exact” test CAN be used when cell counts are “5” or higher, but it becomes difficult to calculate with large sample sizes
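The "exact" part means the p-value is summed directly from the probabilities of every possible table with the same row and column totals. A minimal sketch for a 2x2 table follows; the counts are made up, the function name is hypothetical, and the two-sided rule used (add up all tables no more likely than the observed one) is one common convention rather than the only one.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the one observed.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical counts: rows = plate type, columns = complication yes/no.
table = [[6, 4], [3, 11]]
p = fisher_exact_2x2(table)
print(round(p, 4))
```

The number of possible tables grows with the sample size, which is the calculation burden the slide refers to.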

Page 25: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

OR, RR, HR

OR: Odds Ratio
RR: Relative Risk or Risk Ratio
HR: Hazard Ratio

All three are ratios of risk- one test group is reflected in the numerator, the other in the denominator- therefore, if you get a ratio = “1” that means there’s NO DIFFERENCE between groups. Keep this in mind while we look at them individually.

Page 26: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Odds Ratios

What type of data? Case/Control Studies

Reported as? OR, CI, p-value

What does it REALLY test?
The amount of risk associated with a particular exposure.

***An Odds Ratio must be used in case-control studies as the measure of risk because we have incomplete information about the prevalence/incidence of the disease in the calculations

Page 27: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

OR: Interpretation

OR* < 1: Exposure is Protective
OR* = 1: No Difference
OR* > 1: Exposure is Risk Factor

Example

OR, CI, and p-value
– OR = 1 = NO DIFFERENCE
– What would a CI containing “1” mean?

(OR*: The same thing applies to RR and HR)
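A worked sketch of an odds ratio with its confidence interval, using invented case-control counts. The interval is computed on the log scale (a standard large-sample approach, assumed here rather than taken from the slides), and the function and variable names are illustrative.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, coverage=0.95):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Hypothetical study: 30 of 100 cases exposed, 20 of 100 controls exposed.
or_value, low, high = odds_ratio_ci(30, 70, 20, 80)
print(round(or_value, 2), round(low, 2), round(high, 2))
```

In this made-up example the OR is about 1.7 but the interval runs from below 1 to above 1, so "no difference" cannot be ruled out at the 5% level. That is what a CI containing "1" means.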

Page 28: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Relative Risk

What type of data? Cohort Studies

Reported as? RR, CI, p-value

What does it REALLY test?
The amount of risk associated with a particular exposure.

***Relative Risk can be safely used in cohort studies because we have incidence rates available.
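Because a cohort study follows exposed and unexposed people forward, the risk in each group can be computed directly and then divided. The sketch below uses invented counts and a log-scale confidence interval (an assumed, commonly used large-sample method); the names are hypothetical.

```python
import math
from statistics import NormalDist


def relative_risk_ci(exposed_events, exposed_total,
                     unexposed_events, unexposed_total, coverage=0.95):
    """Risk ratio (exposed vs. unexposed) with a log-scale confidence interval."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    se_log = math.sqrt(1 / exposed_events - 1 / exposed_total
                       + 1 / unexposed_events - 1 / unexposed_total)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    low = math.exp(math.log(rr) - z * se_log)
    high = math.exp(math.log(rr) + z * se_log)
    return rr, low, high


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed got sick.
rr, rr_low, rr_high = relative_risk_ci(30, 100, 10, 100)
print(round(rr, 2), round(rr_low, 2), round(rr_high, 2))
```

Here the whole interval sits above 1, so in this made-up cohort the exposure looks like a risk factor.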

Page 29: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Quiz Time!

Q: You’re conducting a study examining the complication rate (yes/no) in relationship to type of plate utilized in surgery (titanium/stainless steel).
– What type of data is this? Categorical or Continuous?
– Let’s say that there are 4 people with titanium plates that didn’t have complications- which test would you have to use?

A: Categorical data, Fisher Exact Test

Page 30: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Quiz Time!

Q: You’re conducting a study on the average number of hours a surgery takes to complete. You have 3 groups (70 people in each): interns, residents, and fellows. What’s the appropriate statistic to use to determine whether a difference exists between these groups?

A. Chi-Square

B. Fisher Exact Test

C. T-test/Z-test

D. ANOVA

E. Odds Ratio

Page 31: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Quiz Time!

Q: The t-distribution/test was created to test the brew quality of which of the following beers:

A. Budweiser
B. Coors
C. Presidente
D. Guinness
E. Samuel Adams
F. Miller

*Bonus Point: Name the country of origin of Presidente

Page 32: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Kat’s Notes: Statistics

Confidence Intervals
– Mean +/- 2 SD
– Estimation
– Hypothesis Testing

P-value
– Probability of observing a phenomenon (at least as extreme as yours) by chance alone

Page 33: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Kat’s Notes: Statistics

T-Test/Z-Test
– Used for testing 2 group means.

ANOVA
– Used for testing 3+ group means. Tells you that a difference exists, but doesn’t tell you which groups are different.

Chi-Square
– Used for categorical data (yes/no; male/female). Tells you whether observed matches expected outcomes. Every expected cell count MUST be “5” or greater.

Page 34: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Kat’s Notes: Statistics

Fisher Exact
– Also used for categorical data. Necessary when any cell count is below “5”

Odds Ratio
– Used for comparing categorical data again: the odds of exposure in cases vs. controls. Needed to approximate RR in case-control studies

Relative Risk
– Used for comparing risk in two groups with categorical data (sick/not sick; male/female). Can be used in cohort studies where incidence/prevalence data are available.

Page 35: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

BREAK

Page 36: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Part III: Sample Size

Page 37: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Why does it matter?

Why are sample size calculations so important?

*A sample size calculation allows us to determine how many people we need to detect a difference if one exists…

Page 38: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Why does it matter?

So, what happens if you don’t do sample size calculations?

1) Significant difference.
- You might have been able to use a smaller sample size…

2) Not Significant.
- You don’t know whether your lack of significance was due to low power or the fact that no difference really exists…

Page 39: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Sample Size Calculations vs. Power Analysis

Sample Size Calculations:
– Completed prior to gathering data
– Tells you how many people you need to investigate your phenomenon of interest

Power Analysis
– Completed after all data has been collected and analyzed
– Determines whether you had adequate power to find a significant difference
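As a rough illustration of what a power figure is, the sketch below estimates the power of a two-group comparison of means from the group size, the SD, and the difference you care about. It uses the normal approximation and ignores the negligible chance of a significant result in the wrong direction; all the numbers and the function name are assumptions for the example.

```python
import math
from statistics import NormalDist


def power_two_means(difference, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    se = sd * math.sqrt(2 / n_per_group)  # SE of the difference in means
    return std_normal.cdf(abs(difference) / se - z_alpha)


# Hypothetical study: true difference of 5 units, SD of 10, 64 people per group.
power = power_two_means(difference=5, sd=10, n_per_group=64)
print(round(power, 2))
```

With these made-up inputs the power is roughly 0.8, meaning a real difference of that size would be missed about one time in five.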

Page 40: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Sample Size Calculations

Depends on what test you’re planning on conducting, but, in general…

– Expected value in your control (mean, proportion, etc)
– Expected differences: large or small?
– Amount of variation known to exist (SDs, etc): heterogeneous vs. homogeneous

Page 41: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Sample Size Calculations: t-tests

From Literature/pilot study
– Standard deviation
– Expected difference (based off experience, previous research or other evidence)

Remember: select your numbers from a well-designed study. Be Careful!!
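Those two inputs, the SD and the expected difference, are all the standard formula needs. Below is a sketch using the normal approximation for a two-sided test; the 5% significance level and 80% power are conventional defaults that I am assuming, and the example numbers are invented. A t-based calculation would give a slightly larger answer.

```python
import math
from statistics import NormalDist


def n_per_group_two_means(difference, sd, alpha=0.05, power=0.80):
    """People needed per group to detect `difference` between two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)  # round up: you cannot recruit part of a person


# Hypothetical inputs from the literature: SD = 10, expected difference = 5.
n_needed = n_per_group_two_means(difference=5, sd=10)
print(n_needed)
```

Halving the expected difference roughly quadruples the required sample, which is why a sloppy estimate from a bad study does so much damage.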

Page 42: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Sample Size Calculations: Proportions Test

From the literature/pilot study:
– Proportion of observed events in the control group
– Anticipated proportion of observed events in the active group (based off previous trends)
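The same idea works for proportions: plug in the control-group proportion and the proportion you anticipate in the active group. This sketch uses a simple normal-approximation formula with assumed defaults of 5% significance (two-sided) and 80% power; the proportions are invented and the function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group_two_proportions(p_control, p_active, alpha=0.05, power=0.80):
    """People needed per group to detect a difference between two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_active * (1 - p_active)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p_control - p_active) ** 2
    return math.ceil(n)


# Hypothetical: 30% complication rate in controls, 15% anticipated with treatment.
n_needed = n_per_group_two_proportions(0.30, 0.15)
print(n_needed)
```

Other published formulas (for example with a continuity correction) give somewhat larger numbers, so treat this as a ballpark figure.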

Page 43: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Kat’s Notes: Sample Size Calculations

Sample Size Calculations are much more desirable than power analysis

Obtain information from well-conducted studies- Remember: GIGO (garbage in, garbage out). Don’t pick out your numbers from a bad study!

You generally need the 1) average value and 2) amount of variation in your control (comparison group)

Page 44: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

REMEMBER!

No matter what- if you find a significant result, there’s still a small possibility that you’re WRONG. This is inherent in probability- we don’t have 100% certainty. We can only attempt to minimize the possible problems.

If you fail to find a significant result- it doesn’t necessarily mean that there isn’t a relationship there. The study might have been structured incorrectly, used the wrong statistics, the wrong model, the relationship might not be the form that you think it is (linear regression on curvilinear data), or there might be another variable interfering that you don’t know about…

Page 45: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

QUESTIONS?

Page 46: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

On-Site Biostatistics: The Take-Home Menu

Clinical Trial Design
Database Design
Sample Size Calculations
Randomization Schemes
Data Analysis
Instruction
IRB Statistical Review
Publication consultation

Page 47: Learning and Applying Biostatistics: How the Guinness Brewery Changed History

Thank you!