Statistics pres 3.31.2014
ADLT 673: TEACHING AS SCHOLARSHIP IN MEDICAL EDUCATION
MONDAY, MARCH 31, 2014
An Overview of Quantitative Data Analysis
Outline of Today’s Class
Analytic Methods: Summary Measures, Hypothesis Testing, Statistical Methodologies, Group Discussion
Sample Size Determination, Group Discussion
Additional Resources
Analytic Methods: Summary Measures
Representative Measures
Reflect the most “typical” or “average” data value.
Continuous Measurements: Mean (Average), Median and Mode
Categorical Measurements: Frequencies and Proportions
Analytic Methods: Summary Measures
Measures of Variability
Reflect how much the values differ from one another.
Continuous Measurements: Standard deviation, range, interquartile range
Categorical Measurements: None that are meaningful (sorry!)
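The representative and variability measures listed above can be computed with Python's standard library. This is only a sketch: the exam scores and pass/fail outcomes are invented for illustration and are not data from the presentation.

```python
import statistics
from collections import Counter

# Hypothetical exam scores (continuous measurement), invented for illustration.
scores = [62, 70, 70, 74, 78, 81, 85, 90, 98]

# Representative measures
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of variability
sd = statistics.stdev(scores)                    # sample standard deviation (n - 1)
data_range = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)   # quartile cut points
iqr = q3 - q1                                    # interquartile range

# Hypothetical pass/fail outcomes (categorical measurement).
outcomes = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "pass"]
frequencies = Counter(outcomes)
proportions = {k: v / len(outcomes) for k, v in frequencies.items()}

print(f"mean={mean:.1f} median={median} mode={mode}")
print(f"sd={sd:.1f} range={data_range} IQR={iqr:.1f}")
print(frequencies, proportions)
```

For skewed data the median and interquartile range are usually the safer pair to report, since the mean and standard deviation are pulled toward the long tail.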
Analytic Methods: Summary Measures
[Figure: distributions of “Normally” Distributed Data and “Skewed” Data]
Analytic Methods: Summary Measures
[Figure: Normal Quantile Plots for “Normally” Distributed Data and “Skewed” Data]
Analytic Methods: Summary Measures
Measures of Association
Continuous Measures: Correlation Coefficient (ρ): -1 ≤ ρ ≤ 1
Correlations close to 1 indicate two measurements are highly predictive and “track” with one another.
Correlations close to -1 indicate two measurements are highly predictive and have an inverse relationship.
Correlations close to 0 indicate little linear association.
Categorical Measures: Odds Ratio (OR): 0 < OR < ∞
OR greater than 1 indicates the outcome (e.g., passed test) is more likely in the test group than in the control group.
OR less than 1 indicates the outcome is less likely in the test group than in the control group.
OR ≈ 1 indicates little difference in outcomes between groups.
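A minimal sketch of both association measures follows. The function names `pearson_r` and `odds_ratio` and all of the numbers are assumptions made for illustration; the sample correlation r is used as the estimate of ρ.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two continuous measurements."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def odds_ratio(test_yes, test_no, control_yes, control_no):
    """Odds of the outcome in the test group divided by odds in the control group."""
    return (test_yes / test_no) / (control_yes / control_no)

# Hypothetical data: hours of study versus exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 80, 88]
r = pearson_r(hours, scores)

# Hypothetical 2x2 table: 40 of 50 passed in the test group, 30 of 50 in control.
or_estimate = odds_ratio(40, 10, 30, 20)

print(f"r = {r:.3f}, OR = {or_estimate:.2f}")
```

Here r is close to 1 because the scores rise steadily with hours, and the odds ratio is above 1 because passing is more common in the test group.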
Analytic Methods: Hypothesis Testing
Most commonly accepted format for providing quantitative evidence.
Consists of 5 Steps:
1. Translate research question into a set of testable hypotheses.
2. Select the most appropriate statistical test for your hypotheses.
3. Collect your data.
4. Calculate test statistic and/or p-value.
5. Make decision.
Analytic Methods: Hypothesis Testing
Translating Research Question into Testable Hypotheses
Identify parameter: population mean (μ), proportion (p) or difference (e.g., μ1-μ2).
Identify statements made about that parameter. Should be in the form of: <, ≤, >, ≥, = or ≠
Write research question in symbolic form, and find its opposite.
Opposite of “<” is “≥”
Opposite of “>” is “≤”
Opposite of “=” is “≠”
Analytic Methods: Hypothesis Testing
Example: Does an active learning curriculum improve the proportion of students passing their board examinations compared to students receiving the standard curriculum?
Parameter: proportion passing board exams (p)
Statement: p_active is greater than p_standard
Symbolic Form: p_active > p_standard or p_active – p_standard > 0
Opposite of Symbolic Form: p_active ≤ p_standard or p_active – p_standard ≤ 0
Analytic Methods: Hypothesis Testing
Testable Hypotheses:
Null Hypothesis: Statement that parameter (or difference) is equal to zero.
Any statement in symbolic form with a ≤, ≥ or = is automatically the null (note: we replace ≤ or ≥ with =).
Alternative Hypothesis: Statement that parameter (or difference) is somehow different from zero.
Any statement in symbolic form with a <, > or ≠ is automatically the alternative.
Example:
p_active – p_standard > 0 becomes the alternative (HA)
p_active – p_standard ≤ 0 becomes the null (H0)
Analytic Methods: Hypothesis Testing
Make Decision
Based on the statistical methodology you use, you get a p-value.
p-value: probability of observing outcomes at least as extreme as the data you actually observed, given the null hypothesis is true.
Plain English: If your intervention was ineffective, the p-value is the probability of observing results at least as extreme as what you observed.
If this probability is high, then your results match with the null hypothesis, and you fail to reject the null (no evidence that the intervention worked).
If this probability is low, then your results do not seem to match the null hypothesis, and you reject the null (intervention likely worked).
In practice: we compare the p-value to the significance level (α = 0.05).
If p-value ≥ 0.05, we fail to reject the null.
If p-value < 0.05, we reject the null.
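The decision rule can be sketched for the active learning example using a one-sided two-proportion z-test. The pass counts below are invented, the function name `two_proportion_z_test` is hypothetical, and the normal approximation assumes reasonably large groups.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(pass_a, n_a, pass_b, n_b):
    """One-sided test of H0: p_a - p_b <= 0 against HA: p_a - p_b > 0."""
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    # Pooled proportion: best estimate of the common pass rate if H0 is true.
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Probability of a z statistic at least this large when H0 is true.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical data: 88 of 100 pass with active learning, 75 of 100 with standard.
z, p_value = two_proportion_z_test(88, 100, 75, 100)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
```

With these made-up counts the p-value falls below 0.05, so the null would be rejected; with equal pass rates in both groups the same function returns a p-value of 0.5.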
Analytic Methods: Continuous Data
Rows: # of Samples. Columns: # of Measurements.
# of Samples | Single | Pre/Post | Repeated Measures
1 Sample | t-test | Paired t-test | Repeated Measures ANOVA (RMA) / Linear Mixed Model (LMM)*
2 Samples | Two-sample t-test | RMA / LMM* | RMA / LMM*
“k” Samples | Analysis of Variance (ANOVA) | RMA / LMM* | RMA / LMM*
Adjusting for Covariates: Multiple Linear Regression*, Analysis of Covariance (ANCOVA)*, Linear Mixed Models*
*Will likely require statistical assistance
Analytic Methods: Categorical Data
Rows: # of Samples. Columns: # of Measurements.
# of Samples | Single | Pre/Post | Repeated Measures
1 Sample | z-test | McNemar’s Test | Generalized Linear Mixed Models (GLMM)*
2 Samples | Chi-square Test | GLMM* | GLMM*
“k” Samples | Chi-square Test | GLMM* | GLMM*
Adjusting for Covariates: Multiple Logistic Regression*, Generalized Linear Mixed Models*
*Will likely require statistical assistance
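The continuous and categorical selection tables can be read as a simple lookup keyed on outcome type, number of samples, and number of measurements. The sketch below only encodes those two tables; the function name `suggest_method` and the key spellings are assumptions, and it is no substitute for statistical advice.

```python
# (number of samples, number of measurements) -> method, per outcome type.
# An asterisk marks methods flagged as likely requiring statistical assistance.
CONTINUOUS = {
    ("1", "single"): "t-test",
    ("1", "pre/post"): "Paired t-test",
    ("1", "repeated"): "RMA / LMM*",
    ("2", "single"): "Two-sample t-test",
    ("2", "pre/post"): "RMA / LMM*",
    ("2", "repeated"): "RMA / LMM*",
    ("k", "single"): "ANOVA",
    ("k", "pre/post"): "RMA / LMM*",
    ("k", "repeated"): "RMA / LMM*",
}

CATEGORICAL = {
    ("1", "single"): "z-test",
    ("1", "pre/post"): "McNemar's Test",
    ("1", "repeated"): "GLMM*",
    ("2", "single"): "Chi-square Test",
    ("2", "pre/post"): "GLMM*",
    ("2", "repeated"): "GLMM*",
    ("k", "single"): "Chi-square Test",
    ("k", "pre/post"): "GLMM*",
    ("k", "repeated"): "GLMM*",
}

def suggest_method(outcome, n_samples, measurements):
    """Look up a method for outcome 'continuous' or 'categorical'."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Three or more groups fall in the "k" Samples row.
    samples_key = "k" if n_samples >= 3 else str(n_samples)
    table = {"continuous": CONTINUOUS, "categorical": CATEGORICAL}[outcome]
    return table[(samples_key, measurements)]

print(suggest_method("continuous", 2, "single"))
print(suggest_method("categorical", 1, "pre/post"))
```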
Analytic Methods: Group Discussion
Please break into groups by table
For the next 10-15 minutes, take turns discussing what analytic approaches are appropriate for your proposed study.
What are your null and alternative hypotheses?
Is your outcome continuous or categorical?
How many groups and measurements?
If your study is qualitative, discuss how statistical methodologies could be used (e.g. data summary, association).
Sample Size Determination
As a general rule, larger sample sizes:
Lead to more representative samples
Lead to better estimation of parameters (e.g., representative measures)
Provide estimators with lower variability
[Figure: panels for N=9, N=36 and N=100]
Sample Size Determination
Averages over 10,000 Simulations
Sample Size | Sample Mean | Sample Std. Dev. | Standard Error*
9 | 204.4 | 36.5 | 12.3
16 | 204.3 | 37.1 | 9.5
25 | 204.2 | 37.2 | 7.8
36 | 204.1 | 37.5 | 6.5
49 | 204.1 | 37.6 | 5.5
64 | 204.2 | 37.7 | 4.9
81 | 204.1 | 37.7 | 4.2
100 | 204.1 | 37.7 | 3.9
1000 | 204.1 | 37.7 | 1.2
*SE: explains variability in estimator; not the sample data
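The pattern in the simulation table (stable sample mean and standard deviation, shrinking standard error) can be illustrated with a small simulation. This is a sketch under assumptions: the population is taken to be normal with mean 204 and standard deviation 37.7, values chosen only to resemble the table, and fewer replications are used than the 10,000 in the slide. It is not a reproduction of the original simulation.

```python
import math
import random
import statistics

def simulate(n, mu=204.0, sigma=37.7, reps=1000, seed=0):
    """Draw `reps` samples of size n and summarize the sample means and SDs."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        sds.append(statistics.stdev(sample))
    return {
        "mean": statistics.fmean(means),       # average sample mean
        "sd": statistics.fmean(sds),           # average sample standard deviation
        "se": statistics.stdev(means),         # empirical SE: spread of the sample means
        "se_formula": sigma / math.sqrt(n),    # theoretical SE of the mean
    }

results = {n: simulate(n) for n in (9, 36, 100)}
for n, r in results.items():
    print(n, {k: round(v, 1) for k, v in r.items()})
```

The standard deviation of the data stays near the assumed population value at every sample size, while the standard error of the mean falls roughly in proportion to 1/√n.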
Sample Size Determination
Possible Decisions
Decision | True State: H0 is “True” | True State: HA is True
Reject H0 | Type I Error (α) | Correct Decision (Power = 1 - β)
Fail to Reject H0 | Correct Decision | Type II Error (β)
Sample Size Determination
Determinants of Required Sample Size
Significance Level (α): probability of rejecting H0 when it is true.
Power (1-β): probability of rejecting H0 when it is false.
These values are selected during the design phase: typically α = 5% and 1-β = 80% (sometimes 90%).
Sample Size Determination
Determinants of Required Sample Size
Measure of variability (usually standard deviation) inherent in study population.
As data become more variable… standard error of test statistic increases… p-value increases… ability to reject H0 decreases… power decreases.
Controlling variability:
Better measurement methodology
Homogeneous samples
Sample Size Determination
Determinants of Required Sample Size
Effect Size: smallest difference or change in outcome that you are hoping to find.
As difference you want to observe decreases… test statistic decreases… p-value increases… ability to reject H0 decreases… power decreases.
Considerations:
Clinical significance
Clinical possibility (larger differences are easier to detect and harder to find)
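How variability and effect size drive power can be sketched with a normal approximation. The function below assumes a two-sided comparison of two group means with equal group sizes and a known common standard deviation; the name `approx_power` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(n_per_group, sigma, delta, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes."""
    z = NormalDist()
    se = sigma * math.sqrt(2 / n_per_group)   # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)         # critical value for significance level alpha
    shift = abs(delta) / se                   # true difference in standard-error units
    # Probability the test statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

baseline = approx_power(50, sigma=30, delta=20)
more_variable = approx_power(50, sigma=40, delta=20)
smaller_effect = approx_power(50, sigma=30, delta=10)
larger_sample = approx_power(100, sigma=30, delta=20)

print(f"baseline={baseline:.2f} more_variable={more_variable:.2f}")
print(f"smaller_effect={smaller_effect:.2f} larger_sample={larger_sample:.2f}")
```

Raising the standard deviation or shrinking the difference lowers power, while a larger sample raises it. When the true difference is zero the function returns α, the Type I error rate.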
Sample Size Determination
Calculating Required Sample Size
Equations exist (involving α, β, variability and effect size) for simple analytic methods (t-test, chi-square, etc.).
Advanced methods require professional assistance.
Where do you find variability and effect size?
Previous literature of similar populations
Pilot study
Guess-timates
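One such equation, for comparing two group means with a two-sided test, is n per group = 2(z_{1-α/2} + z_{1-β})² σ² / δ², where σ is the standard deviation and δ is the effect size. The sketch below uses this normal-approximation formula; the slides do not specify which equation they have in mind, so treat it as one common textbook choice. The standard deviation of 37.7 echoes the simulation table, and the effect size of 20 is invented.

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # quantile tied to the significance level
    z_beta = z.inv_cdf(power)            # quantile tied to the desired power (1 - beta)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                  # round up to a whole participant

n_80 = n_per_group(sigma=37.7, delta=20)
n_90 = n_per_group(sigma=37.7, delta=20, power=0.90)
n_small_effect = n_per_group(sigma=37.7, delta=10)

print(n_80, n_90, n_small_effect)
```

Halving the effect size roughly quadruples the required sample, which is why the choice of a clinically meaningful difference matters so much at the design stage.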
Sample Size Determination
What if required sample size is too large?
Consider a different outcome
Continuous measures generally require smaller sample sizes than categorical measures
Consider multiple sections or sites
Will require more sophisticated analytic methods
Reconfigure study as a “pilot”
Emphasis switches from “hypothesis testing” to “estimation” and “data summary”
Goal is to provide data summaries and estimate confidence intervals
Summaries can be used to power a larger study
Sample Size Determination: Group Discussions
Please break into groups by table.
For the next 10-15 minutes, take turns discussing:
Whether you will be able to power your study.
Where to find information to perform power analysis.
Your options if you are unable to adequately power your study.
Additional Resources
VCU Department of Biostatistics
18 full-time faculty
Can assist with: study design, sample size determination, interim and final analyses, dissemination
Grant funding (or prospects of funding) usually required.
BIOS 516 Biostatistical Consulting: graduate students available for FREE consultations
Contact Russ Boyle ([email protected]) and provide a protocol.
Additional Resources
VCU Center for Clinical and Translational Research
Research Incubator: study design, sample size determination, and other resources (e.g. grant writing)
Contact: Pam Dillon ([email protected])
Biomedical Informatics: data management and storage (e.g. REDCap)
Support requested online: (http://www.cctr.vcu.edu/informatics/index.html)
Additional Resources
Textbooks (i.e., shameless plug):
Statistical Research Methods: A Guide for Non-Statisticians
Sabo and Boone, Springer, 2013
Available on the web ($45-$65):
http://www.springer.com/statistics//life+sciences,+medicine+%26+health/book/978-1-4614-8707-4
http://www.amazon.ca/Statistical-Research-Methods-Guide-Non-Statisticians/dp/1461487072