7 Case-control Analysis Chihaya Hundout
7/28/2019 7 Case-control Analysis Chihaya Hundout
1/47
Analysis and presentation of
Case-control study data
Chihaya Koriyama
February 14 (Lecture 1)
-
Study design in epidemiology
Observational study
  Individual level: case-control study, cohort study
  Population level: ecological study
Intervention study
-
Why case-control study?
In a cohort study, you need a large number
of subjects to obtain a sufficient number
of cases, especially if you are interested in a
rare disease.
Gastric cancer incidence in Japanese males:
128.5 per 100,000 person-years
A case-control study is more efficient in
terms of study operation, time, and cost.
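The efficiency argument can be made concrete with a little arithmetic. The sketch below estimates how much cohort follow-up would be needed to expect a given number of cases at the incidence quoted on this slide. The target of 200 cases and the function name `person_years_needed` are hypothetical, chosen only for illustration.

```python
import math

def person_years_needed(target_cases: int, rate_per_100k: float) -> int:
    """Person-years of follow-up needed to expect `target_cases` new cases."""
    rate_per_person_year = rate_per_100k / 100_000
    return math.ceil(target_cases / rate_per_person_year)

# 128.5 per 100,000 person-years is the incidence quoted on the slide;
# 200 cases is a hypothetical target, not a figure from the lecture.
print(person_years_needed(200, 128.5))
```

Under these assumptions a cohort would need well over 150,000 person-years of follow-up, whereas a case-control study starts from cases that have already occurred.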
-
Comparison of study designs
                      Case-control    Cohort
Rare diseases         suitable        not suitable
Number of diseases    1               1
-
Case-control study: sequence of determining exposure and outcome status
Step 1: Determine and select cases of your research interest
Step 2: Select appropriate controls
Step 3: Determine exposure status in both cases and controls
-
Case ascertainment
What is the definition of a case?
Cancer (diagnosed clinically? pathologically?)
Virus carriers (asymptomatic patients): you need to screen for the antibody
Are deceased cases included?
You have to describe the following points:
the definition of a case
when, where, and how cases were selected
-
Who will be controls?
Control ≠ non-case
Controls are also at risk of developing the disease in the future.
Controls are expected to be a representative sample of the
catchment population from which the cases arise.
In a case-control study of gastric cancer, a person who has
undergone gastrectomy cannot be a control, since he or she
can never develop gastric cancer.
-
Various types of case-control studies
A population-based case-control study
Both cases and controls are recruited from the population.
A case-control study nested in a cohort
Both cases and controls are members of the cohort.
A hospital-based case-control study
Both cases and controls are patients who are hospitalized or outpatients.
Controls with diseases associated with the exposure of interest should be avoided.
-
The following points should be recorded (and described in your paper):
The list (number) of eligible cases whose medical records were unavailable
The list (number) of subjects who refused, if possible with the reasons for refusal
The length of the interview
The list (number) of subjects lacking measurement data, with the reasons
-
Exploratory or Analytic
Exploratory case-control studies
There is no specific a priori
hypothesis about the relationship
between exposure and outcome.
Analytic case-control studies
Analytic studies are designed to test
specific a priori hypotheses about exposure and outcome.
-
Case-control study - information
Sources of the information of exposure and
potential confounding factors
Existing records
Questionnaires
Face-to-face / telephone interviews
Biological specimens
Tissue banks
Databases on biochemical and environmental measurements
-
Temporality is essential in Hill's criteria
Timeline: disease onset -> initial symptoms -> clinical diagnosis
After disease onset, before symptoms appear: the study exposure is unlikely to be
altered at this stage because of the disease.
Once symptoms have appeared: the study exposure is more likely to be
altered at this stage because of the symptoms.
Essential Epidemiology (WA Oleckno)
-
Bias should be minimized
Bias & Confounding
Selection bias
Detection bias
Information bias (recall bias)
Confounding
Confounding can be controlled by statistical analyses, but we
can do nothing about bias after data collection.
-
Case-control studies
are potential sources of many biases
and should be carefully designed, analyzed, and interpreted.
-
How can we solve the problem of
confounding in a case-control study?
Prevention at study design
Limitation
Matching in a cohort study, but
not in a case-control study
-
Matching in a case-control study
Matched by confounding
factor(s) to increase the
efficiency of statistical analysis
Cannot control confounding
A conditional logistic analysis is
required.
-
Overmatching
Matching on factor(s) strongly related to the exposure
that is your main interest
You CANNOT see the difference in exposure status between
cases and controls
-
How can we solve the problem of
confounding?
Treatment at statistical analysis
Stratification by a confounder
Multivariate analysis
-
What you should describe in the materials and methods
1. Study design
2. Definition of eligible cases and controls
   Inclusion / exclusion criteria of cases and controls
3. Number of respondents and response rate
4. Main exposure and other factors, including potential confounding factors
-
What you should describe in the materials and methods (continued)
5. Sources of information on exposure and other factors
6. Matched factors, if any
7. The number of subjects used in statistical analyses
8. Statistical test(s) and model(s)
9. Name and version of the statistical software
-
Assuring adequate study power
The following information is necessary:
The confidence level desired (usually 95%, corresponding to a significance level of 0.05)
The level of power desired (80-95%)
The ratio of controls to cases
The expected frequency of the exposure in the control group
The smallest odds ratio one would like to be able to detect (based on practical significance)
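To show how these five inputs combine, here is a minimal sketch using one commonly used normal-approximation formula for comparing two proportions in an unmatched design, without continuity correction. The slide does not say which formula or software the lecture had in mind, so the formula choice, the function name `cases_needed`, and the example inputs are all assumptions.

```python
import math
from statistics import NormalDist

def cases_needed(p0, odds_ratio, ratio=1.0, alpha=0.05, power=0.80):
    """Approximate number of cases for an unmatched case-control study.

    p0         : expected exposure frequency in the control group
    odds_ratio : smallest odds ratio one would like to detect
    ratio      : number of controls per case
    """
    # Exposure frequency among cases implied by p0 and the odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    # Exposure frequency pooled over cases and controls
    p_bar = (p1 + ratio * p0) / (1 + ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided confidence level
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt((1 + 1 / ratio) * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0) / ratio)
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)

# Hypothetical inputs: 30% of controls exposed, smallest OR of interest = 2
print(cases_needed(0.30, 2.0, ratio=1))
print(cases_needed(0.30, 2.0, ratio=2))
```

Raising the control-to-case ratio lowers the number of cases required, which is one reason a ratio above 1 is attractive when cases are scarce.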
-
Statistical analysis
Matched vs. unmatched studies
The procedures for analyzing the results of case-control studies
differ depending on whether the cases and controls are matched or unmatched.

Matched                                     Unmatched
McNemar's test                              Chi-square test
Conditional logistic regression analysis    Unconditional logistic regression analysis
-
Advantages of pair matching in case-control studies
Assures comparability between cases and controls on the selected variables
May simplify the selection of controls by eliminating the need to identify a random sample
Useful in small studies where obtaining cases and controls that are similar on potentially confounding factors may otherwise be difficult
Can assure adequate numbers of subjects with specified characteristics so as to permit statistical comparisons
Essential Epidemiology (WA Oleckno)
-
Disadvantages of pair matching in case-control studies
May be difficult or costly to find a sufficient number of controls
Eliminates the possibility of examining the effects of the matched variables on the outcome
Can increase the difficulty or complexity of controlling for confounding by the remaining unmatched variables
Overmatching
Can result in a greater loss of data, since a pair of subjects has to be eliminated even if one subject is not responsive
Essential Epidemiology (WA Oleckno)
-
An example of an unmatched case-control study
Lung cancer cases (N=100): 70 smokers (NOT recently started)
Controls (N=100): 40 smokers (NOT recently started)

              Cases    Controls
Smoker        70       40
Non-smoker    30       60

Odds ratio =
-
Risk measure in a case-control study
Odds = prevalence / (1 − prevalence)
Odds ratio = odds in cases / odds in controls

                 Disease
                 + (case)    − (control)
Exposure  +      a           c
          −      b           d

Exposure odds in cases = a / b
Exposure odds in controls = c / d
$\text{Odds ratio} = \frac{a/b}{c/d} = \frac{a \times d}{b \times c}$
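The definitions on this slide translate directly into a few lines of code. As a small sketch, the function below computes the exposure odds in cases and controls and their ratio, and applies it to the counts of the unmatched smoking example (70 and 30 among cases, 40 and 60 among controls). The function name is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a, b: exposed and non-exposed cases
    c, d: exposed and non-exposed controls
    """
    odds_in_cases = a / b
    odds_in_controls = c / d
    return odds_in_cases / odds_in_controls

# Smokers / non-smokers among cases (70 / 30) and controls (40 / 60)
print(odds_ratio(70, 30, 40, 60))   # approximately 3.5
```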
-
An example of a matched case-control study
Lung cancer cases (N=100): 70 smokers (NOT recently started)
Controls matched by sex and age (N=100): 40 smokers (NOT recently started)

                        Case
                        Smoker    Non-smoker
Control   Smoker        30        10
          Non-smoker    40        20

Notice that this is the distribution of 100 matched pairs.
-
McNemar's test

                        Case
                        Smoker    Non-smoker
Control   Smoker        30        10
          Non-smoker    40        20

Chi-square (test) statistic:
$\chi^2 = \frac{(40 - 10)^2}{40 + 10} = 18$
where the degree of freedom is 1.
Odds ratio = 40 / 10 = 4
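The calculation on this slide uses only the two discordant cells of the matched-pair table. A minimal sketch of it follows. The p-value line relies on the fact that the upper tail of a chi-square distribution with 1 degree of freedom equals `erfc(sqrt(x / 2))`. The function name and the decision to omit a continuity correction (as the slide does) are assumptions of this sketch.

```python
import math

def mcnemar(case_only_exposed, control_only_exposed):
    """Uncorrected McNemar test from the two discordant-pair counts.

    Returns (chi-square statistic, matched odds ratio, p-value).
    """
    b, c = case_only_exposed, control_only_exposed
    chi2 = (b - c) ** 2 / (b + c)
    matched_or = b / c
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square upper tail, 1 df
    return chi2, matched_or, p_value

# 40 pairs: case smokes, control does not; 10 pairs: control smokes, case does not
chi2, matched_or, p = mcnemar(40, 10)
print(chi2, matched_or, p)
```

The 30 and 20 concordant pairs do not enter the calculation at all, which is why the matched odds ratio (4) differs from the one obtained by ignoring the pairing.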
-
Logistic regression analysis
Logistic regression is used to
model the probability of a
binary response as a function
of a set of variables thought to possibly affect the response
(called covariates).
Y = 1: case (with the disease)
Y = 0: control (no disease)
-
One could imagine trying to fit a linear model
(since this is the simplest model !) for the
probabilities, but often this leads to problems:
In a linear model, fitted probabilities can fall
outside of 0 to 1. Because of this, linear models
are seldom used to fit probabilities.
-
In a logistic regression analysis, the
logit of the probability is modelled,
rather than the probability itself.
p = probability of getting the disease
$\operatorname{logit}(p) = \log\frac{p}{1-p}$
As always, we use the natural log. The logit
is therefore the log odds,
since odds = p / (1 − p)
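A short sketch may help show why modelling the logit avoids the out-of-range problem of a linear model: any real number on the logit scale maps back to a probability strictly between 0 and 1. The helper names `logit` and `inv_logit` are illustrative and not taken from the lecture.

```python
import math

def logit(p):
    """Log odds of a probability p (natural log)."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Map a value on the log-odds scale back to a probability."""
    return 1 / (1 + math.exp(-z))

# Whatever value the linear predictor takes, the probability stays inside (0, 1)
for z in (-10, -1, 0, 1, 10):
    print(z, round(inv_logit(z), 5))
```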
-
$p_x = \frac{e^{a+bx}}{1 + e^{a+bx}}$
The values of a and b determine whether and how steeply
the dose-response curve rises (or falls) and where it is centered.
If b = 0, p_x is constant over x
If b > 0, p_x increases with x
If b < 0, p_x decreases with x
H0: b = 0 is the null hypothesis in a test of trend
when x is a continuous variable. Knowledge of b
gives us insight into the direction and degree
of the association between outcome and exposure.
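The three cases for b can be checked numerically. The sketch below evaluates the logistic model of this slide on a small grid of exposure levels; the intercept a = -2.0, the slopes, and the grid itself are arbitrary values picked for illustration.

```python
import math

def p_x(x, a, b):
    """Probability of disease at exposure level x under the logistic model."""
    return math.exp(a + b * x) / (1 + math.exp(a + b * x))

xs = [0, 1, 2, 3, 4]
for b in (-1.0, 0.0, 1.0):          # hypothetical slopes
    curve = [round(p_x(x, -2.0, b), 3) for x in xs]
    print(f"b = {b:+.1f}: {curve}")
```

With b = 1 and a = -2 the curve passes through 0.5 at x = 2, that is at x = -a / b, which is what "where it is centered" refers to.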
-
Simple logistic regression (with a dichotomous covariate)
Suppose we are considering a case-control study
where the response variable is disease (case) /
non-disease (control) and the predictor variable is
exposed / non-exposed, which we code as an
indicator variable, or dummy variable.
Y = 1 (D1: disease) or 0 (D0: non-disease)
x = 1 (E1: exposed) or 0 (E0: non-exposed)
And p_x = Prob(disease given exposure x) = P(Y = 1 | x), x = 0, 1
Thus, p_1 = probability of disease among exposed
p_0 = probability of disease among non-exposed
-
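Simple logistic regression
In case of exposure (X = 1): $\operatorname{logit}(P_{E1}) = \text{intercept} + b$
In case of non-exposure (X = 0): $\operatorname{logit}(P_{E0}) = \text{intercept}$
If you want to obtain the odds ratio of the exposed group, start from the definition of the odds ratio:
$\text{OR} = \frac{P_{E1}/(1-P_{E1})}{P_{E0}/(1-P_{E0})}$
$\log(\text{OR}) = \log\left\{\frac{P_{E1}/(1-P_{E1})}{P_{E0}/(1-P_{E0})}\right\}$
$= \log\frac{P_{E1}}{1-P_{E1}} - \log\frac{P_{E0}}{1-P_{E0}}$
$= \operatorname{logit}(P_{E1}) - \operatorname{logit}(P_{E0})$
$= (\text{intercept} + b) - \text{intercept}$
$= b$
Therefore $\text{OR} = e^{b}$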
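The derivation can be verified numerically with the counts from the unmatched smoking example earlier in this handout (70 smokers among 100 cases, 40 among 100 controls). With a single dichotomous covariate, the fitted probabilities of the logistic model equal the observed proportions in each exposure group, so the intercept and b can be written down without an iterative fit. This shortcut is an assumption that holds only for this simple model, and the variable names are illustrative.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

# (cases, controls) in each exposure group of the unmatched smoking example
exposed = (70, 40)
unexposed = (30, 60)

p_e1 = exposed[0] / sum(exposed)        # proportion of cases among the exposed
p_e0 = unexposed[0] / sum(unexposed)    # proportion of cases among the non-exposed

intercept = logit(p_e0)
b = logit(p_e1) - intercept

print(math.exp(b))                      # approximately 3.5, the cross-product odds ratio
```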
-
Simple logistic regression
(with a covariate having more than two categories)
Suppose we are considering a case-control study
where the predictor variable is current smoker / ex-smoker / non-smoker,
which we code as dummy variables.

Original data               Dummy variables
Case    Smoking status      SMK1 (X1)    SMK2 (X2)
1       Current             1            0
0       Ex-smoker           0            1
1       Non-smoker          0            0
1       Ex-smoker           0            1
0       Non-smoker          0            0
0       Non-smoker          0            0
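The coding in the table can be expressed as a tiny function. This is only a sketch of the idea: the function name `dummy_code` and the string labels are illustrative, and statistical packages normally generate such indicator variables automatically.

```python
def dummy_code(status):
    """Return (SMK1, SMK2), with non-smokers as the reference category."""
    smk1 = int(status == "Current")      # 1 for current smokers, else 0
    smk2 = int(status == "Ex-smoker")    # 1 for ex-smokers, else 0
    return smk1, smk2

statuses = ["Current", "Ex-smoker", "Non-smoker", "Ex-smoker", "Non-smoker", "Non-smoker"]
for status in statuses:
    print(status, dummy_code(status))
```

A covariate with three categories needs only two dummy variables, because the reference category is identified by both being 0.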
-
Logistic regression model of the previous example
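$\operatorname{logit}(P) = a + b_1 X_1 + b_2 X_2$
In case of current smoker ($X_1 = 1$, $X_2 = 0$):
$\operatorname{logit}(P_{\text{current}}) = a + b_1$
In case of ex-smoker ($X_1 = 0$, $X_2 = 1$):
$\operatorname{logit}(P_{\text{ex}}) = a + b_2$
In case of non-smoker ($X_1 = 0$, $X_2 = 0$):
$\operatorname{logit}(P_{\text{non}}) = a$
$\text{OR}_{\text{current}} = e^{b_1}$
$\text{OR}_{\text{ex}} = e^{b_2}$
$\text{OR}_{\text{non}} = 1$ (referent)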
-
Wald's test for no association
The null hypothesis of no association between outcome and exposure corresponds to
H0: OR = 1 or H0: b = log OR = 0
Using logistic regression results, we can test this hypothesis using standard coefficients or
Wald's test.
Note: Stata and SAS present two-sided Wald's test p-values.
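As a sketch of what the two-sided Wald p-value is, the code below recomputes z and p for the `_Ifumar_1` row of the Stata logistic regression output shown later in this handout (odds ratio 1.479399, standard error 0.2817549). It assumes that the reported standard error of the odds ratio equals the odds ratio times the standard error of b (the delta-method convention), so dividing by the odds ratio recovers SE(b). The function name is hypothetical.

```python
import math

def wald_test(b, se):
    """Two-sided Wald test of H0: b = 0, using z = b / SE(b)."""
    z = b / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided standard normal tail area
    return z, p_value

# Values reported for _Ifumar_1 in the Stata output of this handout
reported_or, reported_se_or = 1.479399, 0.2817549

b = math.log(reported_or)
se_b = reported_se_or / reported_or   # assumption: SE(OR) = OR * SE(b)
z, p = wald_test(b, se_b)
print(round(z, 2), round(p, 3))
```

The result can be compared with the z and P>|z| columns of that output (2.06 and 0.040).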
-
Likelihood Ratio Test (LRT)
An alternative way of testing hypotheses in a
logistic regression model is with the use of a
likelihood ratio test. The likelihood ratio test
is specifically designed to test between
nested hypotheses.
$H_0: \log\frac{P_x}{1-P_x} = a$
$H_A: \log\frac{P_x}{1-P_x} = a + bx$
and we say that H0 is nested in HA.
-
Likelihood Ratio Test (LRT)
In order to test H0 vs. HA, we compute the likelihood
ratio test statistic:
$G = -2\log\frac{L_{H_0}}{L_{H_A}} = 2\,(\log L_{H_A} - \log L_{H_0}) = (-2\log L_{H_0}) - (-2\log L_{H_A})$
where
$L_{H_A}$ is the maximized likelihood under the
alternative hypothesis HA and
$L_{H_0}$ is the maximized likelihood under the null
hypothesis H0.
If the null hypothesis H0 were true, we would expect
the likelihood ratio test statistic to be close to zero.
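To make G concrete, the sketch below computes it for the unmatched smoking example from earlier in this handout (70 of 100 cases and 40 of 100 controls are smokers). It assumes the simplest setting, a single dichotomous exposure, where the maximized likelihood under each hypothesis can be obtained from observed proportions without iterative fitting. The helper name `log_lik` is illustrative.

```python
import math

def log_lik(groups):
    """Maximized binomial log-likelihood, one fitted probability per (cases, controls) group."""
    total = 0.0
    for cases, controls in groups:
        p = cases / (cases + controls)
        total += cases * math.log(p) + controls * math.log(1 - p)
    return total

exposed, unexposed = (70, 40), (30, 60)    # (cases, controls)

# H0: logit = a, a single probability for everyone
ll_h0 = log_lik([(exposed[0] + unexposed[0], exposed[1] + unexposed[1])])
# HA: logit = a + b*x, one probability per exposure group
ll_ha = log_lik([exposed, unexposed])

G = 2 * (ll_ha - ll_h0)
p_value = math.erfc(math.sqrt(G / 2))      # chi-square upper tail, 1 degree of freedom
print(round(G, 2), p_value)
```

Here G is far from zero, so the data are hard to reconcile with H0: b = 0.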
-
Wald's test vs. LRT
In general, the LRT often works a little better than
the Wald test, in that the test statistic more closely
follows a χ² distribution under H0. But the Wald test
often works very well and usually gives similar
results.
More importantly, the LRT can more easily be
extended to multivariate hypothesis tests, e.g.,
H0: b1 = b2 = 0 vs. HA: b1 ≠ 0 or b2 ≠ 0
-
World J. Gastroenterology 2006
-
Recruitment of cases
Patients newly diagnosed as G.C., Sep. 2000 - Dec. 2002: 395
Reasons for exclusion (steps 1-4 on the slide):
  Refused to participate in the study
  Lived in Valle del Cauca less than 5 years
  Recurrent cases
  Could not contact
Numbers excluded, as listed on the slide: 7, 65, 91, 16
Cases: 216
Formalin-fixed paraffin-embedded blocks: 173
81 cases were excluded
We could not obtain the information
on tumor location for 23 cases, and
those cases were excluded from the
tumor-location-specific analysis.
-
Recruitment of controls
Potential controls: 528
Reasons for exclusion (steps 1-3 on the slide):
  Lived in Valle del Cauca less than 5 years
  Refused to participate in the study
  History of G.C.
Numbers excluded, as listed on the slide: 67, 1, 29
Controls: 431
Matched by sex, age (5-year), hospital, and date of administration
Case : control = 1 : 2
Major diseases of controls:
cardiovascular diseases (208), trauma (117), infectious diseases (38), urological disorders (21)
-
xi: logistic casocon i.fumar
i.fumar           _Ifumar_0-2         (naturally coded; _Ifumar_0 omitted)

Logistic regression                               Number of obs   =        647
                                                  LR chi2(2)      =       4.24
                                                  Prob > chi2     =     0.1198
Log likelihood = -409.93333                       Pseudo R2       =     0.0051

------------------------------------------------------------------------------
     casocon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Ifumar_1 |   1.479399   .2817549     2.06   0.040     1.018526    2.148813
   _Ifumar_2 |   1.205128   .2660901     0.85   0.398     .7817889    1.857706
------------------------------------------------------------------------------
Wald's test p-values: the P>|z| column

           |    gastric cancer
   Smoking |         0          1 |     Total
-----------+----------------------+----------
   Never 0 |       188         78 |       266
     Ex- 1 |       145         89 |       234
 Current 2 |        98         49 |       147
-----------+----------------------+----------
     Total |       431        216 |       647
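The odds ratios and the likelihood ratio statistic in this unconditional analysis can be recomputed by hand from the cross-tabulation of smoking by gastric cancer on this slide. The sketch below does so in plain Python. It relies on the fact that a model with one indicator per smoking category reproduces the observed proportions, and that the upper tail of a chi-square distribution with 2 degrees of freedom is exp(-x / 2). The function and variable names are illustrative, and this is not how Stata itself performs the fit.

```python
import math

# (cases, controls) by smoking status, copied from the cross-tabulation
counts = {"never": (78, 188), "ex": (89, 145), "current": (49, 98)}

def odds_ratio(level, reference="never"):
    """Odds of being a case at `level` divided by the odds at `reference`."""
    cases, controls = counts[level]
    ref_cases, ref_controls = counts[reference]
    return (cases / controls) / (ref_cases / ref_controls)

def log_lik(groups):
    """Maximized binomial log-likelihood, one fitted probability per (cases, controls) group."""
    total = 0.0
    for cases, controls in groups:
        p = cases / (cases + controls)
        total += cases * math.log(p) + controls * math.log(1 - p)
    return total

all_cases = sum(cases for cases, _ in counts.values())
all_controls = sum(controls for _, controls in counts.values())

ll_null = log_lik([(all_cases, all_controls)])   # intercept-only model
ll_full = log_lik(counts.values())               # one parameter per smoking category
G = 2 * (ll_full - ll_null)
p_value = math.exp(-G / 2)                       # chi-square upper tail, 2 degrees of freedom

print(round(odds_ratio("ex"), 6), round(odds_ratio("current"), 6))
print(round(ll_full, 5), round(G, 2), round(p_value, 4))
```

The printed values can be set against the Odds Ratio column, the log likelihood, LR chi2(2), and Prob > chi2 of the output.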
-
Results of conditional logistic regression analysis using the same data
Stata command:
xi: clogit casocon i.fumar, group(identi) or

Conditional (fixed-effects) logistic regression   Number of obs   =        647
                                                  LR chi2(2)      =       4.64
                                                  Prob > chi2     =     0.0982
Log likelihood = -234.5745                        Pseudo R2       =     0.0098

------------------------------------------------------------------------------
     casocon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Ifumar_1 |   1.535023   .3061998     2.15   0.032     1.038295    2.269389
   _Ifumar_2 |   1.219851   .2784042     0.87   0.384        .7799    1.907985
------------------------------------------------------------------------------
Wald's test p-values: the P>|z| column
Summary table on the slide: rows Fumar=0, Fumar=1, Fumar=2; columns Case, Control, OR (95% CI)
-
GC risk by smoking in Cali, Colombia
Results of tumor-location-specific analysis

OR (95% CI)
                       Lower (N=116)*     Middle (N=52)*     Upper (N=24)*
Cigarette smoking
  never                1.0 (referent)     1.0 (referent)     1.0 (referent)
  ex-smoker            1.9 (1.1 - 3.4)    1.2 (0.6 - 2.5)    3.7 (1.1 - 12.5)
  current              1.3 (0.7 - 2.3)    1.3 (0.5 - 3.4)    3.0 (0.6 - 13.9)
P for trend            0.257              0.597              0.083
P for heterogeneity    0.059              0.859              0.070

P = 0.51 (P value by LRT)
This test examines the difference in the magnitude of the
association between smoking and GC risk among 3 tumor sites.