www.CDNetwork.org www.NACHC.com
www.icommunityhealth.org www.aapcho.org www.SCPHCA.org www.accesscommunityhealth.net
Enhancing Community Health Center PCORI Engagement (EnCoRE)
This work was partially supported through a Patient-Centered Outcomes Research Institute (PCORI) Program Award
(NCHR 1000-30-10-10 EA-0001).
With support from: N2 PBRN
funded by:
Project Partners
Clinical Directors Network (CDN), New York, NY
National Association of Community Health Centers (NACHC), Washington, D.C.
The Association of Asian Pacific Community Health Organizations (AAPCHO), Oakland, CA
Access Community Health Network, Chicago, IL
Institute for Community Health (ICH), a Harvard Affiliated Institute, Cambridge, MA
The South Carolina Primary Health Care Association (SCPHCA), Columbia, South Carolina
Jonathan N. Tobin, PhD [email protected]
Michelle Proser, MPP [email protected]
Jester, MA [email protected]
Rosy Chang Weir, PhD [email protected]
Danielle Lazar, [email protected]
Shalini A. Tendulkar, ScM, ScD [email protected]
Zallman [email protected]
Vicki Young, PhD [email protected]
EnCoRE Partners’ Geography, 2014-2015
EnCoRE: Enhancing Community Health Center PCORI Engagement
AIM: To build health center capacity to engage in patient-centered outcomes research through an interactive 12-month training curriculum that walks health centers through the steps and skills needed to develop a patient-centered research proposal.
EnCoRE
Goal: To adapt, enhance, and implement an existing year-long training curriculum designed to educate and engage Health Center teams, including patients and clinical and administrative staff, in Patient-Centered Outcomes Research (PCOR).
Objectives:
• Build infrastructure to strengthen the patient-centered comparative effectiveness research (CER) capacity of Health Centers as they develop or expand their own research infrastructure
• Develop, implement, and disseminate an innovative online training that is targeted to, and accessible at no cost to, all Health Centers and other primary care practices
• Prepare Health Center patients, staff, and researchers to conduct community-led PCOR
Chat
During this live training, you may ask questions at any time in the Chat Window.
This area is located in the lower left hand corner of your screen.
These questions will be answered at the end of the presentation
This program has been reviewed and approved for up to 1.5 Prescribed CME credits by the American
Academy of Family Physicians (AAFP).
Please complete the CE Evaluation launched at the end of the presentation and email
[email protected] with a request for credits.
Presenters
Mary Ann McBurnie, PhD
Senior Investigator, Kaiser Permanente Center for Health Research
Steering Committee Chair, Community Health Applied Research Network (CHARN)
Session 7: Basic Concepts in Biostatistics
May 21, 2015
Mary Ann McBurnie, PhD
Senior Investigator, Kaiser Permanente Center for Health Research
Steering Committee Chair, Community Health Applied Research Network (CHARN)
Acknowledgment
Material in this presentation was developed as part of
the curriculum for the international Methods in
Epidemiologic, Clinical and Operations Research
(MECOR) program sponsored by the American
Thoracic Society (ATS)
Designed for physicians and health care professionals
Intended to strengthen capacity and leadership in research
related to respiratory conditions, critical care and sleep
medicine in middle and low income countries
Objectives
To be able to identify different data types
and appropriate statistical tests
To understand when non-parametric
methods are preferred over parametric
methods
To be able to interpret results (summary
measures) of statistical tests.
Entire Population → Study Design → Study Sample → Data Collection & Analysis → Results (e.g. Mean FEV1 in pa)
But how good is our estimate from the sample?
Data Analysis
• Using data to answer questions
• Evaluate the association between exposure
measure(s) and outcome measure(s)
• i.e., we have a hypothesis we want to test
• Use data to estimate measures of interest and
make inferences (test our hypothesis) about
these measures
• E.g., does prevalence of COPD vary by geographic
region?
Data Analysis
• Descriptive statistics
• Characterize the distributions of variables of
interest
• Inferential statistics
• Formally evaluate the role of chance in explaining
the findings of a study – is the observed difference
due to chance alone?
Types of Data
• Continuous measures
• E.g., age, weight, blood pressure, FEV1
• Discrete measures
• Binary
• Yes/no, present/not present, high/low
• Categorical
• Ordered: education level, income level
• Unordered: race/ethnicity categories, blood type
Getting Started - Descriptive Statistics
• What do the data look like?
• Understand/confirm types of data
• Assess quality and completeness of the data
• Describe the distributions of variables of interest
Getting Started - Descriptive Statistics
• Categorical variables
• simple frequencies, minimum and maximum values, proportions/rates, crosstabs - esp. for nested questions (i.e., if “yes” to Q1, then ask Q2) and recoded variables
• Continuous variables
• mean, SD, median/percentiles, minimum and maximum values, range
• Listings of selected variables
• Graphics – histograms, bar charts, scatter plots
=> Essential to describe/understand data before testing/modeling
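A minimal Python sketch of these descriptive summaries, using only the standard library. The values and the variable names (fev1, smoker) are hypothetical, chosen for illustration; they are not the study data.

```python
import statistics
from collections import Counter

# Hypothetical values for illustration only (not the study data).
fev1 = [1.2, 1.5, 1.6, 1.7, 1.7, 1.9, 2.1, 2.4, 3.8]              # continuous
smoker = ["no", "yes", "no", "no", "yes", "no", "no", "yes", "no"]  # categorical

# Continuous variable: mean, SD, median, minimum and maximum.
summary = {
    "n": len(fev1),
    "mean": statistics.mean(fev1),
    "sd": statistics.stdev(fev1),   # sample SD (divides by n - 1)
    "median": statistics.median(fev1),
    "min": min(fev1),
    "max": max(fev1),
}

# Categorical variable: simple frequencies and proportions.
freq = Counter(smoker)
proportions = {level: count / len(smoker) for level, count in freq.items()}

print(summary)
print(freq, proportions)
```

Scanning the minimum and maximum is often the quickest way to spot impossible values or missing-value codes before any testing or modeling.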
Variable Obs Mean Std. Dev. Min Max
ID 10712 137334.6 91395.18 1 452001
age 10712 56.45622 11.5369 40 98
female 10712 0.521378 0.499566 0 1
weight_kg 10712 74.15711 19.05139 0 181
height_cm 10712 166.1451 10.47751 115 203
bmi 10578 27.10881 5.388997 12.5 70.9968
gold_nhanes 10001 0.317268 0.72707 0 4
stage_nhanes 0
smokestat 10711 2.186724 0.8042 1 3
cursmok 10711 0.247409 0.431527 0 1
eversmoking 10711 0.565867 0.495666 0 1
packyrs 10712 13.54834 23.05431 -9 614.25
asthma 10712 0.120426 0.325474 0 1
country 10712 1471.688 874.0461 101 2901
Frequency listing to check format, content of specific variables
Stata command: tab gold_nhanes
gold_nhanes Freq. Percent Cum.
0 8,111 81.1 81.1
1 858 8.58 89.68
2 807 8.07 97.75
3 199 1.99 99.74
4 26 0.26 100
Total 10,001 100
Crosstabs to check coding of new variables
Stata command: tab gold_nhanes Stage3plus
Stage3plus
gold_nhanes 0 1 Total
0 8,111 0 8,111
1 858 0 858
2 807 0 807
3 0 199 199
4 0 26 26
Total 9,776 225 10,001
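The same check can be sketched in Python. The stage counts are the ones in the frequency listing; the recoding rule (stage 3 or higher maps to 1) is an assumption inferred from the crosstab, and the variable names are illustrative.

```python
from collections import Counter

# Stage counts as shown in the frequency listing of gold_nhanes.
stage_counts = {0: 8111, 1: 858, 2: 807, 3: 199, 4: 26}
gold_nhanes = [stage for stage, n in stage_counts.items() for _ in range(n)]

# Assumed recoding rule: derived indicator is 1 if stage >= 3, else 0.
stage3plus = [1 if stage >= 3 else 0 for stage in gold_nhanes]

# Crosstab of the original variable against the recoded one.
crosstab = Counter(zip(gold_nhanes, stage3plus))
for stage in sorted(stage_counts):
    print(stage, crosstab[(stage, 0)], crosstab[(stage, 1)])
```

Every original category should land in exactly one column of the new variable; a non-zero count in an unexpected cell signals a coding error.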
[Figures: density histograms of bmi (0 to 80), f_100_derived: number of cigarette packs smoked per year (0 to 600), and weight_kg (0 to 200)]
Normal Distribution – has some nice properties
Mean = 30, SD = 4
Mean = 30, SD = 7
• Symmetric
• Bell-shaped
• Mean = median
Interval Estimate
±1 SD, coverage = 68.27%
±1.96 SDs, coverage = 95%
±2 SDs, coverage = 95.45%
±3 SDs, coverage = 99.73%
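These coverage figures can be reproduced from the standard normal distribution. A short sketch using Python's statistics.NormalDist; the helper name coverage is illustrative.

```python
from statistics import NormalDist

def coverage(k):
    """Probability that a normally distributed value lies within k SDs of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"±{k} SD: {coverage(k):.2%}")
```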
Non-Normal Distribution
Mean = 7.5
Median = 5.8
Skewed Right
Non-Normal Data
• Extreme values in skewed distributions have bigger
effect on means than on medians.
• Mean gets “pulled” out toward the tail
• Median more robust to skewing
• Mean and median can be very different for a very skewed
distribution
• Important to look at both measures when data are
skewed
• Median may be more informative/appropriate
• Normality (or lack thereof) is important – impacts
analytic approach
• “parametric” tests assume an underlying normal
distribution
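A small illustration of how one extreme value pulls the mean but not the median. The numbers are hypothetical, not study data.

```python
import statistics

# Hypothetical right-skewed values (one extreme observation in the tail).
values = [2, 3, 4, 5, 5, 6, 7, 8, 10, 60]

mean = statistics.mean(values)      # pulled out toward the long right tail
median = statistics.median(values)  # barely affected by the extreme value

print(mean, median)
```

Dropping the value 60 would change the mean substantially while moving the median only slightly, which is why both measures are worth reporting for skewed data.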
Inferential Statistics: Summary Measures
• Point estimate
• Estimate of a population parameter (e.g., mean FEV1, prevalence of TB)
• P-value
• Probability (under the assumptions of the test, including the null hypothesis) of obtaining a result (i.e., the point estimate) equal to or more extreme than the one we observed
• Is our observed value (point estimate) consistent with the expected value?
• Interval estimate (confidence interval)
• Range of values that covers the true population value with a stated level of confidence (95% confidence intervals are very common)
A simple example
Population A
N 100
Mean FEV1 1.70
SD (FEV1) .4
SE(mean) = SD/Sqrt(N) .04
Does Population A have a different mean FEV1 (1.70 L) than that of the reference population (1.60 L, say)? That is, is the sample mean FEV1 truly different from 1.60 L, or is the difference due to chance?
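Compare sample mean FEV1 (= 1.70) to population mean (= 1.60)
Observed mean: 1.70
Expected mean: 1.60
Distance in SEs: how many SEs between 1.70 and 1.60?
(1.70 - 1.60)/.04 = 2.5 SEs
The area between the expected mean and 2.5 SEs above it covers ~49.4% of the distribution, leaving a 0.6% chance of getting a mean of at least 1.70 L.
P-value = 0.006
[Figure: normal curve with the 49.4% area shaded; x-axis FEV1, L, from 1.40 to 1.80]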
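The arithmetic behind this slide can be checked in a few lines. This sketch uses the normal curve shown in the figure (an exact one-sample t-test would use the t distribution with N - 1 degrees of freedom, which gives nearly the same answer at N = 100); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sd = 100, 1.70, 0.40   # Population A
reference_mean = 1.60                  # expected mean under the null hypothesis

se = sd / sqrt(n)                         # standard error of the mean
z = (sample_mean - reference_mean) / se   # distance between the means, in SEs
p_one_sided = 1 - NormalDist().cdf(z)     # chance of a mean at least this large

print(se, z, p_one_sided)
```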
P-Value
• A probability between 0 and 1
• Interpretation: probability of observing a difference
that is at least as extreme as the one we observed
when there is really no difference.
• Smaller the p-value => stronger the evidence for a
difference
• Commonly use a significance level of 0.05 (5%)
P-Value = 0.006
• Assuming the mean FEV1 is not different from
1.60 L:
• If random samples (of 100) are taken repeatedly,
mean FEV1 will be at least as great as 1.70 L only
0.6% of the time.
• i.e., it is very unlikely we would observe this value
(1.70L) if the true mean FEV1 for this population
were 1.60L
=> We conclude that mean FEV1 in this
population is different from 1.60L.
Confidence Interval for the Mean
Standard Error of the mean = SD/Sqrt(N), where SD is the standard deviation and N is the sample size
68.3% CI: Mean ± 1.0 x SD/Sqrt(N)
95% CI: Mean ± 1.96 x SD/Sqrt(N)
99.7% CI: Mean ± 3.0 x SD/Sqrt(N)
One Sample T-test
Population A
N 100
Mean FEV1 1.70
SD (FEV1) .4
SE(mean) = SD/Sqrt(N) .04
95% CI: Mean ± 1.96 x SD/Sqrt(N)
95% CI for mean FEV1: 1.70 ± 1.96 x .04 = (1.62, 1.78)
Impact of Sample Size: As sample size increases, confidence intervals get tighter
                                 “Small” sample        “Larger” sample
N                                100                   400
Mean FEV1                        1.70                  1.70
SD (FEV1)                        .40                   .40
SE(mean) = SD/Sqrt(N)            .04                   .02
95% CI for mean FEV1             1.70 ± 1.96 x .04     1.70 ± 1.96 x .02
Lower & Upper Confidence Limits  (1.62, 1.78)          (1.66, 1.74)
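A short sketch of the 95% confidence interval calculation for the two sample sizes. The function name ci95 is illustrative; the multiplier 1.96 is the normal-approximation value used on the slides.

```python
from math import sqrt

def ci95(mean, sd, n):
    """Approximate 95% confidence interval for a mean: mean ± 1.96 x SD/sqrt(N)."""
    half_width = 1.96 * sd / sqrt(n)
    return mean - half_width, mean + half_width

for n in (100, 400):
    low, high = ci95(1.70, 0.40, n)
    print(n, round(low, 2), round(high, 2))
```

Quadrupling the sample size halves the standard error, so the interval is half as wide.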
Hypothesis Testing
• Test H0: mean FEV1 = 1.60 vs Ha: mean FEV1 > 1.60
Test statistic is:
T = (mean FEV1 difference) / se(mean FEV1) = (1.70 - 1.60) / .04 = 2.5
• The properties of the statistic, T, are known if assumptions hold, and a p-value can be calculated
Comparing Two Means – Two Sample T-test:
Do males have a different mean FEV1 than females?
Females, mean FEV1 (sd) 1.61 (.44)
Males, mean FEV1 (sd) 2.28 (.66)
Difference, mean FEV1, (se) .67 (.04)
Test H0: FEV1(males) – FEV1(females) = FEV1 diff = 0, vs
Ha: FEV1 diff ≠ 0
T = Mean FEV1 Difference / se(Mean FEV1 Difference)
• Properties of T are known if assumptions hold
=> Can compute p-value. (In this case, T = 16.14 and p<.001)
Key Assumptions for the Two-Sample T-test
• Each sampled observation is random and independent of all other observations
• If pairs of observations are correlated (e.g., blood pressure before and after a challenge test) a paired t-test can be used
• Data randomly sampled from normally distributed populations
• If distribution is not normal, one can use a non-parametric test – e.g., the Wilcoxon Rank Sum (still requires independent obs.)
• The larger the sample size, the less serious the departure from normality
• The two populations have “equal” variances
• Test for “homogeneity” of variance (F-test)
• There is an “adjusted” version of the two-sample t-test (Welch’s) if variances aren’t homogeneous
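A sketch of the Welch version computed from summary statistics. The means and SDs are the female and male FEV1 values from the two-sample example; the group sizes (300 each) are hypothetical, since the slides do not give them, so the resulting statistic is illustrative only.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # squared standard errors per group
    t = (mean1 - mean2) / sqrt(v1 + v2)     # no pooled variance is assumed
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Males vs females; n = 300 per group is an assumption for illustration.
t, df = welch_t(2.28, 0.66, 300, 1.61, 0.44, 300)
print(round(t, 2), round(df, 1))
```

The Welch degrees of freedom always fall between the smaller group's n - 1 and the pooled n1 + n2 - 2, shrinking as the variances become more unequal.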
Paired T test
Based on the differences between the values of pairs of observations
Patient Tech 1 Tech 2 Difference
1 3.96 3.88 0.08
2 2.80 2.85 -0.05
3 3.91 3.86 0.05
4 3.17 3.14 0.03
5 2.95 2.90 0.05
6 2.55 2.63 -0.08
7 3.29 3.22 0.07
8 4.30 4.23 0.07
Mean 3.366 3.339 0.0275
SD 0.624 0.579 0.060
T = mean(diff) / (sd/sqrt(n)) = .0275 / (.06/sqrt(8)) = 1.30
p-value = .234
Under the null hypothesis (of no difference), the probability that we would observe a difference at least as large as .0275 L by chance is .23
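The paired t statistic can be recomputed directly from the eight pairs in the table. A minimal sketch with the standard library; the p-value is left out because it needs the t distribution with n - 1 = 7 degrees of freedom, which the standard library does not provide.

```python
import statistics
from math import sqrt

# Measurements from the table: same 8 patients, two technicians.
tech1 = [3.96, 2.80, 3.91, 3.17, 2.95, 2.55, 3.29, 4.30]
tech2 = [3.88, 2.85, 3.86, 3.14, 2.90, 2.63, 3.22, 4.23]

# The paired test works on the within-patient differences only.
diffs = [a - b for a, b in zip(tech1, tech2)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)                    # sample SD of the differences
t_stat = mean_diff / (sd_diff / sqrt(len(diffs)))    # T = mean(diff) / (sd/sqrt(n))

print(round(mean_diff, 4), round(sd_diff, 3), round(t_stat, 2))
```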
Analysis of Variance (ANOVA)
• Extension of T-test for evaluating differences in means. Generalizes T-test to > 2 groups, e.g.,
H0: μ1 = μ2 = μ3, vs
Ha: at least one μ differs from one of the others
• Assumptions
• Independent observations
• Normality
• Homogeneity of variances
Non-Parametric Tests
• Alternatives to parametric tests when assumptions don’t hold
• Don’t require assumptions about shape of distributions or variances
• Do require that observations/pairs be randomly and independently chosen.
Non-Parametric Tests
• Wilcoxon Rank Sum (WRS)
• Alternative to the two-sample t-test
• Based on the ranks – the order in which the observations (of both groups combined) fall
• WRS test statistic is the sum of the ranks for observations from one of the samples. “Large” or “small” rank sums constitute evidence against the null hypothesis.
• For smaller sample sizes, tables for WRS exist to look up p-values. For larger sample sizes a normal approximation can be used.
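A sketch of the rank sum calculation with the normal approximation. The data are hypothetical FEV1 values, and the function names are illustrative. The approximation here omits the tie correction, and with samples this small a lookup table or exact method would normally be preferred, so the p-value is for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Wilcoxon rank sum statistic with a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))     # rank both groups combined
    w = sum(ranks[:n1])                          # rank sum for the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2              # expected rank sum under H0
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # no tie correction applied
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p

# Hypothetical FEV1 values (litres) for two independent groups.
group_a = [1.2, 1.4, 1.5, 1.7, 1.9, 2.0]
group_b = [1.8, 2.1, 2.3, 2.4, 2.6, 3.0]
w, z, p = rank_sum_test(group_a, group_b)
print(w, round(z, 3), round(p, 4))
```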
Non-Parametric Tests
• Wilcoxon Signed Rank test (WSR)
• Alternative to the paired t-test
• Paired differences are ranked
• Kruskal-Wallis
• Alternative to ANOVA
• Also based on ranks
Chi-square test for categorical data
Is smoking (y/n) associated with CVD (y/n)?
            CVD No     CVD Yes    Total
Smoker No   140        50         190
Smoker Yes  60         50         110
Total       200        100        300
Chi-square test for categorical data
Is smoking (y/n) associated with CVD (y/n)?
            CVD No     CVD Yes    Total
Smoker No   140        50         190
Smoker Yes  60 (30%)   50 (50%)   110
Total       200        100        300
Compare proportions of smokers in each CVD group
Test H0: p1 = p2, or p1 - p2 = 0, vs
Ha: p1 - p2 ≠ 0
Computation of Chi-square statistic
Compute the expected value for each cell from the marginal totals
            CVD No                   CVD Yes                  Total
Smoker No   190*200/300 = 126.7      190*100/300 = 63.3       190
Smoker Yes  110*200/300 = 73.3       110*100/300 = 36.7       110
Total       200                      100                      300
Computation of Chi-square statistic
            CVD No         CVD Yes       Total
Smoker No   140 - 126.7    50 - 63.3     190
Smoker Yes  60 - 73.3      50 - 36.7     110
Total       200            100           300
• Compute the differences between Observed and Expected values
• Square the difference and divide by the expected value: (O-E)^2 / E
• Add these up to compute the chi-square statistic: Σ {(O-E)^2 / E}
• Our test statistic, X = 11.48
• Properties of X are known if assumptions hold
• => Can compute p-value
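The steps above can be sketched in Python for the smoking by CVD table. Expected counts come from the marginal totals, and the p-value uses the closed form for a chi-square distribution with 1 degree of freedom, which applies to a 2x2 table only; no continuity correction is applied. Variable names are illustrative.

```python
from math import erfc, sqrt

# Observed counts; rows: smoker no/yes, columns: CVD no/yes.
observed = [[140, 50],
            [60, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count for each cell = row total x column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Upper-tail probability for 1 degree of freedom: erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))

print([[round(e, 1) for e in row] for row in expected], round(chi2, 2), p_value)
```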
Chi-square test for categorical data
            CVD No     CVD Yes    Total
Smoker No   140        50         190
Smoker Yes  60 (30%)   50 (50%)   110
Total       200        100        300
50% of CVD patients vs 30% of non-CVD patients are smokers
Chi-Square Test gives a P-Value <0.001.
=> Very unlikely that we would see this difference (30% vs 50%) if there really were no difference between the 2 groups.
=> Very strong evidence for a real difference.
Note: This test can be used for exposure variables with >2 categories
Key Assumptions for the Chi-Square Test
• Each observation is sampled randomly and independently of all other observations
• McNemar’s test assesses paired observations
• “Sufficient” sample size
• E.g., all cells have expected counts > 5 for 2x2 tables, or 80% of cells have expected counts > 5 for larger tables and no cells have zeros
• Apply “Yates’ Correction” if this assumption isn’t met
Recap: Parametric Analyses
Type of Predictor Variable | Outcome: Binary | Outcome: Continuous
Binary | Pearson’s χ2 test, McNemar’s χ2 test | Two-sample t-test, Paired t-test
Categorical (K-levels) | Pearson’s χ2 test | ANOVA
Continuous | |
Multivariate | |
Recap: Non-Parametric Analyses
Type of Outcome Variable: Binary, Continuous
Type of Predictor Variable:
Binary: Wilcoxon Signed Rank test, Wilcoxon Rank Sum test
Categorical (K-levels): Kruskal-Wallis
Continuous:
Multivariate:
Simple Linear Regression
• E{Y|X=x} = b0 + b1 * x
• E{Y| X} = mean response
• X = predictor
• b0 = ??
• b1 = ??
• What do b0 and b1 mean?
Interpretation of Coefficients - Example
• E{FEV1| Gender}= b0 + b1 * Gender
• Y = response = FEV1
• X = predictor = Gender (dichotomous)
= 0 if female
= 1 if male
Interpretation of Coefficients
Example 2: E{FEV1|Gender}= b0 + b1 * gender
[Figure: scatter plot of FEV1 by gender using “jittered” data, gender = gender + N(0, 0.1), with the fitted line E{FEV1|Gender} = b0 + b1 * Gender]
Mean (sd) FEV1
Females 1.61 (.44)
Males 2.28 (.66)
E{FEV1|Gender} = b0 + b1 * Gender
b0 (se): 1.61 (.024)
b1 (se): .67 (.038)
Interpretation of Coefficients
Example 2: E{FEV1|Gender} = b0 + b1 * gender
E{FEV1 | Gender} = 1.61 + .67 * Gender
1. X = 0 (Females)
• E{FEV1| Gender = 0} = 1.61 + .67 * 0 = 1.61
2. X = 1 (Males)
• E{FEV1| Gender = 1} = 1.61 + .67 * 1 = 2.28
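A sketch showing why the coefficients have this interpretation: with a 0/1 predictor, the least-squares intercept equals the mean of the group coded 0 and the slope equals the difference in group means. The data are hypothetical and the helper ols is illustrative.

```python
import statistics

def ols(x, y):
    """Least-squares intercept b0 and slope b1 for a single predictor."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: gender coded 0 = female, 1 = male.
gender = [0, 0, 0, 0, 1, 1, 1, 1]
fev1 = [1.4, 1.6, 1.7, 1.8, 2.0, 2.2, 2.4, 2.6]

b0, b1 = ols(gender, fev1)
female_mean = statistics.mean(fev1[:4])
male_mean = statistics.mean(fev1[4:])

print(b0, b1, female_mean, male_mean)
```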
Regression and ANOVA: The t-test as a regression model
FEV1 = b0 + b1*gender
[Figure: FEV1 (0 to 6) plotted by gender (0=F, 1=M); the fitted line runs from b0 at gender = 0 to b0 + b1 at gender = 1]
b0 = mean FEV1 in women
b1 = mean difference
test of b1 = 0 is equivalent to the t-test
Relationship between T-test and Regression (2-sample problem)
T-Test
Females (x=0) Mean FEV1 (sd): 1.61 (.44)
Males (x=1) Mean FEV1 (sd): 2.28 (.66)
Mean Difference (se): 0.67 (.04)
T-stat = 16.140*, p<.001
Linear Regression
b0 (se): 1.61 (.024)
b1 (se): 0.67 (.04)
b0 + b1: 2.28
T-stat for b1 = 17.526*, p<.001
*T-statistics from the T-test and the regression coefficient are exactly equal when the variances for both groups (males and females) are equal
Hypothesis Testing for Regression Coefficients
• Test H0: b1 = 0 vs Ha: b1 ≠ 0
• Test statistic for coefficient estimate is:
T = b1 / se(b1)
• Properties of T are known if assumptions hold
• Linear regression intercept and slope estimates, b0 and b1, are asymptotically normally distributed
=> Can compute p-value
Interpretation of Regression Coefficients
What if we don’t reject the hypothesis that b1 = 0?
• There may, in fact, be no association
• Zero slope doesn’t prove there is no association
• May be an association but not in the parameter we looked at (multiplicative model?)
• May be an association but it may not be linear (curvilinear assoc.)
• May be a linear trend but we lack statistical precision to be confident that it truly exists (type II error: we didn’t have a big enough sample or we were unlucky)
Interpretation of Regression Coefficients
What if we do reject the hypothesis that b1 = 0?
• Non-zero slope suggests an association is present between the mean response and the predictor
• Reject the hypothesis that there is no linear trend in the average response (e.g., FEV1) across predictor groups (e.g., age)
• Does NOT imply causality
Simple Linear Regression Model Assumptions
• Linearity of the regression function, i.e., relationship is linear in the modeled predictors
• Independence of observations
• Equal variance across predictor groups
• Normality of error terms
Consider “robust” regression methods if assumptions don’t hold.
Classical regression is more precise than robust methods if assumptions hold.
Robust Regression Model Assumptions
Robust methods require fewer assumptions than classical methods:
• Allow correlated observations within identified clusters
• Allow unequal variances across groups
• Still correct if classical assumptions hold but may be less precise
• Avoid need to check model assumptions
Parametric analyses
Type of Predictor Variable | Outcome: Binary | Outcome: Continuous
Binary | χ2 test, logistic regression | T-test, ANOVA, linear regression
K-levels | chi-square, logistic regression | ANOVA, linear regression
Continuous | logistic regression | correlation; linear, non-linear regression
Multivariate | logistic regression | linear/non-linear regression
Non-parametric analyses
Type of Predictor Variable | Outcome: Binary | Outcome: Continuous
Binary | sign test | Mann-Whitney, Kruskal-Wallis
K-levels | Robust logistic regression | Kruskal-Wallis
Continuous | Robust logistic regression | Robust regression
Multivariate | Robust logistic regression | Robust regression
Correlation
• Measures how closely the largest values of one variable are associated with the largest values of a second variable and vice versa.
• Sample correlation coefficient, R, is an estimate of the population correlation ρ.
• Ranges from –1 to +1
• –1 (perfect negative correlation)
• +1 (perfect positive correlation)
• R = 0 indicates no linear association
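A sketch of the sample (Pearson) correlation coefficient. The age and FEV1 values are hypothetical, and pearson_r is an illustrative helper. The last example shows a perfectly curved relationship with R = 0: no linear association is not the same as no association.

```python
import statistics
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient R."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: FEV1 tends to fall as age rises.
age = [40, 45, 50, 55, 60, 65, 70]
fev1 = [2.6, 2.4, 2.5, 2.1, 2.2, 1.9, 2.0]
r = pearson_r(age, fev1)

# y = x^2 is a perfect (curved) relationship, yet R is 0.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(round(r, 3), r_curved)
```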
Interpretation of Coefficients
• E{FEV1| Age}= b0 + b1 * Age
• Y = response = FEV1
• X = predictor = Age (continuous)
We’ve estimated that
• b0 = 3.12
• b1 = -.016
Interpretation of Coefficients for E{FEV1|Age} = b0 + b1 * Age
1. Age = 0
• E{Y|X=0} = b0 + b1*0 = b0
• E{FEV1|Age = 0} = 3.12 - .016*0 = 3.12
2. Age = x
• E{Y|X=x} = b0 + b1*x
• E{FEV1|Age = x} = 3.12 - .016*x
3. Age = x+1
• E{Y|X=x+1} = b0 + b1*(x+1) = b0 + b1*x + b1
• E{FEV1|Age = x+1} = 3.12 - .016*(x+1) = 3.12 - .016*x - .016
Interpretation of Coefficients for E{FEV1|Age} = b0 + b1 * Age
Mean FEV1 at age = x+1: (b0 + b1*x + b1) = (3.12 - .016*x - .016)
Mean FEV1 at age = x: (b0 + b1*x) = (3.12 - .016*x)
Difference: b1 = -.016
b1 (= -.016) is the average difference in FEV1 when age increases from x to x+1
OR
On average, a one-year difference in age is associated with a difference of b1 (= -.016) in FEV1
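The subtraction above can be checked numerically with the estimated coefficients. The ages 50 and 51 are arbitrary choices for illustration; any pair of ages one year apart gives the same difference.

```python
b0, b1 = 3.12, -0.016   # estimated intercept and slope from the slides

def mean_fev1(age):
    """Fitted mean FEV1 at a given age: b0 + b1 * age."""
    return b0 + b1 * age

# Difference in fitted mean FEV1 for two ages one year apart.
difference = mean_fev1(51) - mean_fev1(50)
print(mean_fev1(50), difference)
```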
Additional Questions?
And Discussion
Upcoming Webinars
Webinar Date | Content | Activities | Presenters
May 19, 2:00 – 3:30 pm EST | Bioinformatics | Data management and EHR data collection methods | Michelle Proser and Mickey Eder
June 16, 2:00 – 3:30 pm EST | Research Ethics, IRB, and Good Clinical Practices | Creation of informed consent forms, plans for fair compensation of patient participants | Leah Zallman and Rosy Chang Weir
Available Resources
• EnCoRE Website for Past Webinars and Materials
• https://cdnencore.wordpress.com/live-session-library/
• Additional resources to build research capacity at health centers
• www.CDNetwork.org/NACHC
Future Funding Opportunities from PCORI
• Visit http://www.pcori.org/funding/opportunities for more information
Opportunity | Letter of Intent Due | Application Due
Addressing Disparities | March 3, 2015 | May 5, 2015
Improving Healthcare Systems | March 3, 2015 | May 5, 2015
Assessment of Prevention, Diagnosis, and Treatment Options | March 3, 2015 | May 5, 2015
Communication and Dissemination Research | March 3, 2015 | May 5, 2015
Clinical Management of Hepatitis C Infection | March 3, 2015 | May 5, 2015
Improving Methods for Conducting PCOR | March 3, 2015 | May 5, 2015
Engagement Award: Knowledge, Training and Development, and Dissemination Awards | April 1, 2015 | April 1, 2015
Engagement Award: Research Meeting and Conference Support | April 1, 2015