RACE 615 Introduction to Medical Statistics
Analysis for categorical data
Assist.Prof.Dr.Sasivimol Rattanasiri
Doctor of Philosophy Program in Clinical Epidemiology Section for Clinical Epidemiology & Biostatistics Faculty of Medicine Ramathibodi Hospital Mahidol University
Semester I Academic year 2016 www.ceb-rama.org
CONTENTS
1. Hypothesis testing of two categorical variables
1.1 Independent samples
1.2 Paired samples
2. Estimation of the strength of association
2.1 Risk ratio (RR)
2.2 Odds ratio (OR)
OBJECTIVES
This module should help you to:
- Understand the concept of testing for independence
- Perform analysis for 2x2, 3x2, and 3x3 contingency tables
- Estimate the strength of association using measures such as the risk ratio and odds ratio
REFERENCES
1. Agresti A. Categorical data analysis. 2nd edition. New York: John Wiley & Sons; 2002.
2. Schlesselman JJ. Case-control studies: design, conduct, analysis. Oxford: Oxford University Press; 1982.
3. Altman DG. Practical statistics for medical research. London: Chapman & Hall; 1991.
ASSIGNMENT II (10%)
See the end of this module (Due on: September 6, 2016)
1. HYPOTHESIS TESTING OF TWO CATEGORICAL VARIABLES
This module deals with hypothesis testing of two categorical variables. Two situations must be distinguished, according to whether the samples are independent or paired. Independent samples are presented in Section 1.1 and paired samples in Section 1.2.

An analysis of two categorical variables is based on a contingency table, which is a cross-tabulation of the two variables. The table can be of any size, for example 2x2, 3x2, or 3x3; the size depends on the number of categories of each variable. In a 2x2 contingency table each variable has 2 categories, whereas in a 3x2 contingency table the first variable has 3 categories and the second variable has 2. The first digit refers to the number of rows in the table, and the second digit refers to the number of columns. Layouts of the 2x2 and 3x2 tables are presented in Tables 1-1 and 1-2, respectively. The entries in the table are the frequencies that correspond to a particular combination of row and column categories. These combinations are known as cells. For example, n11 in Table 1-1 is the number of subjects in the cell formed by the first row and the first column.
Table 1-1 A layout of the 2x2 contingency table
                     Variable II
Variable I      Category I   Category II   Total
Category I      n11          n12           n1+
Category II     n21          n22           n2+
Total           n+1          n+2           n++

Table 1-2 A layout of the 3x2 contingency table
                     Variable II
Variable I      Category I   Category II   Total
Category I      n11          n12           n1+
Category II     n21          n22           n2+
Category III    n31          n32           n3+
Total           n+1          n+2           n++
1.1 Independent samples
In a cohort study of oral contraceptive (OC) use, women who were disease-free at baseline were classified into two groups: those who had ever used OC and those who had never used OC. They were followed up for 1 year, with cervical cancer as the outcome of interest. At the end of the study, each woman was classified as having or not having cervical cancer. The researchers wanted to compare the proportion with cervical cancer between the ever-used and never-used groups. In other words, they wanted to assess the association between OC use and the risk of cervical cancer at 1 year of follow-up.
In this example, both the outcome of interest (cervical cancer) and the risk factor (OC use) are binary variables, so a 2x2 contingency table is used to assess the association. The observed frequencies are presented in Table 1-3.
Table 1-3 The 2x2 contingency table for assessing the association between OC use and the risk of cervical cancer.
                          OC use
Cervical cancer     Ever used   Never used   Total
Yes                 68          254          322
No                  150         875          1025
Total               218         1129         1347
1.1.1 Chi-square test
The Chi-square test is used to examine the association between two categorical variables measured on independent samples. It is based on the null hypothesis that the two categorical variables are independent or, equivalently, that the proportions in the independent groups are identical. The principle of the Chi-square test rests on the probability structure of contingency tables.
Let X and Y be categorical variables with I and J levels, respectively. The joint probability is the probability that a subject falls in row i and column j, which can be estimated by:
$$p_{ij} = \frac{n_{ij}}{n_{++}} \tag{1}$$
The marginal probability for row i is the probability that a subject falls in row i, which can be estimated by:
$$p_{i+} = \frac{n_{i+}}{n_{++}} \tag{2}$$
The marginal probability for column j is the probability that a subject falls in column j, which can be estimated by:
$$p_{+j} = \frac{n_{+j}}{n_{++}} \tag{3}$$
If the null hypothesis is true and the two categorical variables are independent, then the joint probability for each cell is equal to the product of the marginal probabilities of row i and column j for that cell. The expected frequency for each cell can be calculated by:
$$e_{ij} = n_{++} \times p_{i+} \times p_{+j} = n_{++} \times \frac{n_{i+}}{n_{++}} \times \frac{n_{+j}}{n_{++}} = \frac{n_{i+} \times n_{+j}}{n_{++}} \tag{4}$$
Therefore, the expected frequency for any cell is the product of the relevant row and column
totals divided by the total sample size.
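The rule in equation (4) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the STATA workflow used in this module; the function name `expected_frequencies` is an assumption made for this sketch, and the counts are the observed frequencies from Table 1-3.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence for an r x c table (equation 4)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each expected count is (row total x column total) / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts from Table 1-3
# (rows: cervical cancer yes/no; columns: ever/never used OC).
table_1_3 = [[68, 254], [150, 875]]
expected_1_3 = expected_frequencies(table_1_3)
for row in expected_1_3:
    print([round(e, 1) for e in row])
```

Because every expected count is built from the margins, the expected table always has the same row and column totals as the observed table.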
The Chi-square statistic is obtained by comparing the observed frequency in each cell with the expected frequency as follows:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} \tag{5}$$
where r is the number of rows, and c is the number of columns. The further the observed frequencies are from the expected frequencies, the less likely it is that the null hypothesis is true. Thus, a large value of the Chi-square statistic is evidence against the null hypothesis, that is, evidence that the two categorical variables are not independent. Under the null hypothesis, the statistic is distributed approximately as a Chi-square distribution with (r-1)x(c-1) degrees of freedom.
Example 1-1: A case-control study was conducted to look at the effects of traditional
medicine usage on the risk of osteoporotic hip fracture in women. The Chi-square test was
used to assess the association between traditional medicine usage and the risk of osteoporotic
hip fracture in women. The 2x2 contingency table for assessment is given as follows:
Table 1-4 The observed frequencies for assessment of the association between traditional medicine use and the risk of osteoporotic hip fracture in women
                  Traditional medicine use
Hip fracture      Yes     No      Total
Yes               20      208     228
No                8       216     224
Total             28      424     452
H0: Traditional medicine use is not associated with the risk of osteoporotic hip fracture or
H0: The proportion of traditional medicine use in women with osteoporotic hip fracture is
equal to the proportion of traditional medicine use in women without osteoporotic hip
fracture.
HA: Traditional medicine use is associated with the risk of osteoporotic hip fracture or
HA: The proportions of traditional medicine use in women with osteoporotic hip fracture and
women without osteoporotic hip fracture are different.
Step I: Calculate the expected frequency for each cell:
$$e_{11} = \frac{228 \times 28}{452} = 14.1$$
$$e_{12} = \frac{228 \times 424}{452} = 213.9$$
$$e_{21} = \frac{224 \times 28}{452} = 13.9$$
$$e_{22} = \frac{224 \times 424}{452} = 210.1$$
Step II: Calculate the Chi-square statistic:
$$\chi^2 = \frac{(20-14.1)^2}{14.1} + \frac{(208-213.9)^2}{213.9} + \frac{(8-13.9)^2}{13.9} + \frac{(216-210.1)^2}{210.1} = 2.47 + 0.16 + 2.50 + 0.17 = 5.30$$
Step III: Calculate the associated p value by the STATA program:
. disp chi2tail(1, 5.30)
.02132542
For individual data, to perform the Chi-square test using the menu in the STATA program, we would select:
Statistics menu --> Summaries, tables, and tests --> Frequency tables --> Two-way table with measures of association, and then specify Pearson’s chi-squared under test statistics. The equivalent command and its output are:

. tab hip tredmed, col exp chi2

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
| column percentage  |
+--------------------+

       hip | traditional medicine
  fracture |       yes         no |     Total
-----------+----------------------+----------
       yes |        20        208 |       228
           |      14.1      213.9 |     228.0
           |     71.43      49.06 |     50.44
-----------+----------------------+----------
        no |         8        216 |       224
           |      13.9      210.1 |     224.0
           |     28.57      50.94 |     49.56
-----------+----------------------+----------
     Total |        28        424 |       452
           |      28.0      424.0 |     452.0
           |    100.00     100.00 |    100.00

          Pearson chi2(1) =   5.2588   Pr = 0.022

STATA reports a Chi-square value of 5.2588 rather than 5.30 because it does not round the expected frequencies before computing the statistic.
For summary data in Table 1-4, to perform the Chi-square test using the menu in the STATA program, we would select:
Statistics menu --> Summaries, tables, and tests --> Frequency tables --> Table calculator, and then specify Pearson’s chi-squared under test statistics.
. tabi 20 208 \ 8 216, col exp chi2

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
| column percentage  |
+--------------------+

       hip | traditional medicine
  fracture |       yes         no |     Total
-----------+----------------------+----------
       yes |        20        208 |       228
           |      14.1      213.9 |     228.0
           |     71.43      49.06 |     50.44
-----------+----------------------+----------
        no |         8        216 |       224
           |      13.9      210.1 |     224.0
           |     28.57      50.94 |     49.56
-----------+----------------------+----------
     Total |        28        424 |       452
           |      28.0      424.0 |     452.0
           |    100.00     100.00 |    100.00

          Pearson chi2(1) =   5.2588   Pr = 0.022
Step IV: Compare the p value with the level of significance
The p value from Step III is equal to 0.02, which is less than the level of significance. As a result, we reject the null hypothesis and conclude that traditional medicine use is significantly associated with osteoporotic hip fracture in women. In other words, the proportion of traditional medicine use in women with osteoporotic hip fracture is different from the proportion in women without osteoporotic hip fracture: 8.77% (20/228) of women with hip fracture used traditional medicine, compared with 3.57% (8/224) of women without hip fracture.
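As a cross-check on Steps I to III, the whole calculation can be sketched in Python using only the standard library. The function names below are assumptions made for this sketch and are not STATA commands. The p value helper uses the identity that a Chi-square variable with 1 degree of freedom is the square of a standard normal variable, so it is valid only for 2x2 tables.

```python
import math


def pearson_chi_square(observed):
    """Pearson Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # equation (4), unrounded
            statistic += (n_ij - e_ij) ** 2 / e_ij     # equation (5)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def chi2_tail_1df(x):
    """Upper-tail probability of a Chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Observed counts from Table 1-4
# (rows: hip fracture yes/no; columns: traditional medicine use yes/no).
table_1_4 = [[20, 208], [8, 216]]
chi2, df = pearson_chi_square(table_1_4)
p_value = chi2_tail_1df(chi2)  # valid only because df == 1 for a 2x2 table
print(f"chi2({df}) = {chi2:.4f}, p = {p_value:.3f}")
```

Because the expected frequencies are not rounded here, the result should agree with the STATA output (5.2588) rather than with the hand calculation (5.30).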
1.1.2 Fisher’s exact test
The Chi-square test is not an appropriate test of independence if the expected frequency is less than 5 in more than 20% of the cells. Fisher’s exact test is an alternative when this requirement of the Chi-square test is not met.
The principle of Fisher’s exact test is to evaluate the probability associated with all possible 2×2 contingency tables which have the same row and column totals as the observed data. The probability of obtaining the cell frequencies a, b, c, and d when the null hypothesis is true and the row and column totals are fixed can be calculated by:
$$P = \frac{(a+b)!\,(a+c)!\,(b+d)!\,(c+d)!}{N!\,a!\,b!\,c!\,d!} \tag{6}$$
where N = a+b+c+d is the total sample size and the symbol x!, called “x factorial”, means that all the integers from 1 up to x are multiplied together (0! is defined as 1).
A two-sided p value is the sum of probabilities of all possible tables that have probabilities
less than or equal to the probability of the observed table. The details of Fisher’s exact test
can be found in Altman, Section 10.7.2, pages 253-257.
Example 1-2: A case-control study was conducted to look at the effects of receiving
hormonal replacement therapy (HRT) on the risk of hip fracture. The Fisher’s exact test was
used to assess the association between receiving hormonal replacement therapy (HRT) and
the risk of hip fracture. The Chi-square test is not appropriate in this case because the expected frequency is less than 5 in 50% of the cells (2 of the 4 cells). The 2x2 contingency table of observed frequencies and expected frequencies is given as follows.
Table 1-5 The observed frequencies (the expected frequencies) for assessment of the effects of HRT on hip fracture.
                        HRT
Hip fracture      Yes         No              Total
Yes               1 (1.5)     213 (212.5)     214
No                2 (1.5)     214 (214.5)     216
Total             3           427             430
H0: HRT is not associated with the risk of hip fracture or
H0: The proportion of patients receiving HRT among those with hip fracture is equal to the proportion receiving HRT among those without hip fracture.
HA: HRT is associated with the risk of hip fracture or
HA: The proportion of patients receiving HRT among those with hip fracture is different from the proportion receiving HRT among those without hip fracture.
Step I: Calculate the probability associated with all possible 2×2 contingency tables which
have the same row and column totals as the observed data.
All possible tables of frequencies which have the same row and column totals as the observed data in Table 1-5 are presented in Table 1-6.
Table 1-6 All possible tables of frequencies which have the same row and column totals as Table 1-5.
(i)      0     3        (ii)     1     2
       214   213               213   214
(iii)    2     1        (iv)     3     0
       212   215               211   216
The probability of obtaining the cell frequencies in table (i) can be calculated as:
1) Probability of table (i)
         0     3
       214   213
$$P_{(i)} = \frac{3!\,427!\,214!\,216!}{430!\,0!\,3!\,214!\,213!} = \frac{427!\,216!}{430!\,213!} = \frac{216 \times 215 \times 214}{430 \times 429 \times 428} = 0.126$$
Table 1-7 Summary of probability associated with each set of frequencies of all possible tables
Table    Probability
i        0.126
ii       0.378
iii      0.374
iv       0.122
Step II: Calculate p value
A two-sided p value is the sum of probabilities of all possible tables that have probabilities
less than or equal to the probability of the observed table. The probability of the observed
value for this study is 0.378. Therefore, the two-sided p value for this study is equal to
0.126+0.378+0.374+0.122=1.000.
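The enumeration in Steps I and II can be sketched in Python as follows. This is an illustrative sketch only: the function name `fisher_exact_two_sided` is an assumption, and the table probability is written in its hypergeometric form, which is algebraically the same as equation (6).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_probability(x):
        # Probability of the table whose top-left cell is x, with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probabilities = {x: table_probability(x) for x in range(lowest, highest + 1)}
    observed = probabilities[a]
    # Sum every table that is no more probable than the observed table;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probabilities.values() if p <= observed * (1 + 1e-9))
    return probabilities, min(p_value, 1.0)


# Observed counts from Table 1-5 (a=1, b=213, c=2, d=214).
probabilities, p_value = fisher_exact_two_sided(1, 213, 2, 214)
for x, p in probabilities.items():
    print(f"top-left cell = {x}: probability = {p:.3f}")
print(f"two-sided p value = {p_value:.3f}")
```

The keys 0 to 3 of `probabilities` correspond to tables (i) to (iv) in Table 1-6, so the printed values can be compared directly with Table 1-7.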
For individual data, to perform Fisher’s exact test using the STATA program, we would select:
Statistics menu --> Summaries, tables, and tests --> Frequency tables --> Two-way table with measures of association, and then specify Fisher’s exact test under test statistics

. tab hip hrt, col exp exact

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
| column percentage  |
+--------------------+

           |          hrt
       hip |       yes         no |     Total
-----------+----------------------+----------
       yes |         1        213 |       214
           |       1.5      212.5 |     214.0
           |     33.33      49.88 |     49.77
-----------+----------------------+----------
        no |         2        214 |       216
           |       1.5      214.5 |     216.0
           |     66.67      50.12 |     50.23
-----------+----------------------+----------
     Total |         3        427 |       430
           |       3.0      427.0 |     430.0
           |    100.00     100.00 |    100.00

           Fisher's exact =                 1.000
   1-sided Fisher's exact =                 0.503
Step III: Compare the p value with the level of significance
The two-sided p value from Step II is equal to 1.00, which is greater than the level of significance. As a result, we fail to reject the null hypothesis and conclude that there is no evidence of an association between HRT and osteoporotic hip fracture. In other words, the data do not show a difference between the proportion receiving HRT among patients with hip fracture, 0.47% (1/214), and the proportion receiving HRT among patients without hip fracture, 0.93% (2/216).
1.1.3 Dummy tables
An example of a dummy table for assessing the association between patient characteristics and death is presented in Table 1-8.
Table 1-8 Description of patient’s characteristics between dying and living
Characteristics                       Death        Survival      P value
                                      N=40 (%)     N=160 (%)
Sex
  Male
  Female
Race
  Light
  Dark
  Other
Service at ICU admission
  Medical
  Surgical
Cancer part of present problem
  Yes
  No
pH from initial blood gases
  >=7.25
  <7.25
1.2 Paired samples
There are several circumstances in which researchers want to investigate the relationship between two categorical variables for paired samples. For example, a researcher may want to compare the pain relief provided by two different analgesics in the same subjects. Another example is a matched case-control study in which investigators match each case to a control on body mass index (BMI) and assess the association between HRT and hip fracture. The simplest case is when subjects are classified into two groups. A 2x2 contingency table for paired samples is presented in Table 1-9.
Table 1-9 A 2x2 contingency table for paired samples
                    Controls
Cases         Exposed   Unexposed   Total
Exposed       a         b           a+b
Unexposed     c         d           c+d
Total         a+c       b+d         n
where a, b, c, and d are the frequencies of pairs for each combination and b+c is the number of discordant pairs (see footnote 1). An appropriate statistical test of the relationship between two categorical variables in paired samples presented as a 2x2 table is McNemar’s test, which can be calculated by:
$$\chi^2 = \frac{(|b - c|)^2}{b + c} \tag{7}$$
The unit of analysis for this test is the matched pair, whereas for the Chi-square test the unit of analysis is the individual. If the number of discordant pairs is less than 20, the exact McNemar’s test is more appropriate. McNemar’s test is a test of paired proportions for data presented as a 2x2 table. For a k x k contingency table, an extension of McNemar’s test known as the Stuart-Maxwell test is required.
Example 1-3: For a matched case-control study, researchers wanted to assess the association
between HRT and hip fracture. The patients were assigned to pairs matched on BMI. The 2x2
contingency table for assessment has been given in Table 1-10.
Footnote 1: Discordant pairs are pairs whose two members differ in exposure; for example, b is the number of pairs in which the case is exposed and the matched control is not.
Table 1-10 A 2x2 contingency table of 372 matched pairs for assessment of the association between HRT and hip fracture
                 Controls
Cases       HRT+    HRT-    Total
HRT+        102     50      152
HRT-        100     120     220
Total       202     170     372
H0: HRT is not associated with hip fracture or
H0: The proportion receiving HRT among cases (patients with hip fracture) is equal to the proportion receiving HRT among their matched controls (patients without hip fracture).
HA: HRT is associated with hip fracture or
HA: The proportion receiving HRT among cases is different from the proportion receiving HRT among their matched controls.
Step I Calculate the McNemar’s test
$$\chi^2 = \frac{(|50 - 100|)^2}{50 + 100} = 16.67$$
Step II Calculate the associated p value by the STATA program
. disp chi2tail(1, 16.67)
.00004448
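Steps I and II can be sketched in Python as a check on the arithmetic. The function name `mcnemar_test` is an assumption made for this sketch. The p value uses the same 1-degree-of-freedom Chi-square tail as the `chi2tail(1, ...)` call above, written with the complementary error function from the standard library.

```python
import math


def mcnemar_test(b, c):
    """McNemar Chi-square (equation 7) and its p value on 1 degree of freedom."""
    statistic = abs(b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Discordant pairs from Table 1-10: b = 50, c = 100.
mcnemar_chi2, mcnemar_p = mcnemar_test(50, 100)
print(f"McNemar chi2(1) = {mcnemar_chi2:.2f}, p = {mcnemar_p:.8f}")
```

Only the discordant pairs enter the calculation; the concordant cells (102 and 120) do not affect the statistic.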
For individual data, to perform McNemar’s test using the STATA program, we would select:
Statistics menu --> Epidemiology and related --> Tables for epidemiologists --> Matched case-control studies
For summary data, to perform McNemar’s test using the STATA program, we would select:
Statistics menu --> Epidemiology and related --> Tables for epidemiologists --> Matched case-control calculator

. mcc case control

                 | Controls               |
Cases            |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
         Exposed |       102          50  |        152
       Unexposed |       100         120  |        220
-----------------+------------------------+------------
           Total |       202         170  |        372

McNemar's chi2(1) =     16.67    Prob > chi2 = 0.0000
Exact McNemar significance probability       = 0.0001

Proportion with factor
        Cases       .4086022
        Controls    .5430108     [95% Conf. Interval]
                   ---------     --------------------
        difference -.1344086     -.2001631  -.0686541
        ratio       .7524752      .656141    .8629532
        rel. diff. -.2941176     -.4547495  -.1334858

        odds ratio        .5      .3487202   .7089431   (exact)
Step III Compare the p value with the level of significance
The p value for a two-tailed test from Step II is less than 0.001, which is less than the level of significance. As a result, we reject the null hypothesis and conclude that HRT is significantly associated with osteoporotic hip fracture. In other words, the proportion receiving HRT differs between patients with hip fracture (cases, 40.9%) and their matched controls without hip fracture (54.3%).
2. ESTIMATION OF THE STRENGTH OF ASSOCIATION
Estimating the strength of association is another way of describing the relationship between two categorical variables. In this section, we consider only the case where there are two groups of subjects and two categories of outcome. The analysis is based on a 2x2 contingency table such as Table 2-1.
Table 2-1 A 2x2 contingency table for estimation of the strength of association.
            Exposed   Non-exposed   Total
Case        a         b             a+b
Control     c         d             c+d
Total       a+c       b+d           a+b+c+d
2.1 Risk ratio (RR)
The risk ratio (RR) is the ratio of the proportion with disease (or event) in the exposed (high-risk) group to that in the non-exposed (low-risk) group. It is used to measure association in a cohort study or randomized controlled trial (RCT). It can be estimated as:
$$RR = \frac{I_{E+}}{I_{E-}} = \frac{a/(a+c)}{b/(b+d)} \tag{8}$$
where I_{E+} and I_{E-} are the incidences in the exposed and non-exposed groups. The 95% CI can be estimated as:
$$95\%\,CI = \exp\{\ln(RR) \pm 1.96 \times SE(\ln(RR))\} \tag{9}$$
where
$$SE(\ln(RR)) = \sqrt{\frac{1}{a} - \frac{1}{a+c} + \frac{1}{b} - \frac{1}{b+d}} \tag{10}$$
Example 2-1: For a cohort study of kidney transplantations, researchers wanted to estimate
the strength of association of the graft failure rate in kidney transplantation between cadaveric
kidney transplantation (CDKT) and living related kidney transplantation (LRKT). It was
found that 61 patients had graft failure among 134 patients who were transplanted by CDKT,
whereas 35 patients had graft failure among 220 patients who were transplanted by LRKT.
The 2x2 contingency table for estimation has been presented in Table 2-2.
Table 2-2 The 2x2 contingency table for estimation of the risk ratio of the graft failure and types of donors (CDKT or LRKT)
                   Types of donors
Graft failure      CDKT    LRKT    Total
Yes                61      35      96
No                 73      185     258
Total              134     220     354
The risk ratio can be estimated as:
$$I_{CDKT} = 61/134 = 45.52\%, \qquad I_{LRKT} = 35/220 = 15.91\%, \qquad RR = 45.52/15.91 = 2.861$$
The 95% CI can be estimated as:
$$SE(\ln RR) = \sqrt{\frac{1}{61} - \frac{1}{134} + \frac{1}{35} - \frac{1}{220}} = 0.18$$
$$95\%\,CI = \exp\{\ln(2.86) \pm (1.96 \times 0.18)\}$$
95%CI = 2.00, 4.08
For individual data, to estimate the risk ratio using the STATA program, we would select:
Statistics menu --> Epidemiology and related --> Tables for epidemiologists --> Cohort study risk-ratio etc.
For summary data, to estimate the risk ratio using the STATA program, we would select:
Statistics menu --> Epidemiology and related --> Tables for epidemiologists --> Cohort study risk-ratio etc. calculator
. csi 61 35 73 185

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        61          35  |         96
        Noncases |        73         185  |        258
-----------------+------------------------+------------
           Total |       134         220  |        354
                 |                        |
            Risk |  .4552239    .1590909  |   .2711864
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |          .296133       |    .1989455    .3933204
      Risk ratio |         2.861407       |    2.004715    4.084197
 Attr. frac. ex. |         .6505216       |     .501176    .7551538
 Attr. frac. pop |         .4133523       |
                 +-------------------------------------------------
                               chi2(1) =    36.95  Pr>chi2 = 0.0000

The RR is equal to 2.86 with 95% CI (2.00 to 4.08). This means that patients who received CDKT had 2.86 times the risk of graft failure of patients who received LRKT, and we are 95% confident that the true risk ratio lies between 2.00 and 4.08.
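The hand calculation of the risk ratio and its confidence interval (equations 8 to 10) can be sketched in Python as follows. The function name `risk_ratio` and the argument `z` are assumptions made for this sketch; the cell labels a, b, c, and d follow Table 2-1.

```python
import math


def risk_ratio(a, b, c, d, z=1.96):
    """Risk ratio and confidence limits using the cell labels of Table 2-1.

    a, b: events in the exposed and non-exposed groups
    c, d: non-events in the exposed and non-exposed groups
    """
    rr = (a / (a + c)) / (b / (b + d))                                # equation (8)
    se_log_rr = math.sqrt(1 / a - 1 / (a + c) + 1 / b - 1 / (b + d))  # equation (10)
    lower = math.exp(math.log(rr) - z * se_log_rr)                    # equation (9)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Counts from Table 2-2 (CDKT treated as the exposed group).
rr, rr_lower, rr_upper = risk_ratio(61, 35, 73, 185)
print(f"RR = {rr:.2f}, 95% CI {rr_lower:.2f} to {rr_upper:.2f}")
```

The interval is calculated on the log scale and then back-transformed, which is why it is not symmetric around the point estimate.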
2.2 Odds ratio (OR)
The odds ratio (OR) is the ratio of the odds of having the disease (or event) in the exposed (high-risk) group to the odds in the non-exposed (low-risk) group. It is used in case-control or cross-sectional studies. There are two different conditions for estimation of the OR: 1) independent samples, 2) matched paired samples.
2.2.1 Independent samples
The odds ratio for independent samples can be estimated as:
$$OR = \frac{odds_{E+}}{odds_{E-}} = \frac{a/c}{b/d} = \frac{ad}{bc} \tag{11}$$
and the 95% CI can be estimated as:
$$95\%\,CI = \exp\{\ln(OR) \pm 1.96 \times SE(\ln(OR))\} \tag{12}$$
where
$$SE(\ln(OR)) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \tag{13}$$
Example 2-2: For a case-control study, researchers wanted to estimate the strength of
association of traditional medicine usage and hip fracture. It was found that among 228
patients who had hip fracture, 20 used traditional medicine, whereas among 224 patients who
did not have hip fracture, 8 used traditional medicine. The 2x2 contingency table for
estimation has been presented in Table 2-3.
Table 2-3 The 2x2 contingency table for estimation of the OR of the hip fracture and traditional medicine usage
                  Traditional medicine use
Hip fracture      Yes     No      Total
Yes               20      208     228
No                8       216     224
Total             28      424     452
The odds ratio can be estimated as:
$$odds_{E+} = 20/8, \qquad odds_{E-} = 208/216, \qquad OR = \frac{20 \times 216}{208 \times 8} = 2.596$$
The 95% CI can be estimated as:
$$SE(\ln(OR)) = \sqrt{\frac{1}{20} + \frac{1}{208} + \frac{1}{8} + \frac{1}{216}} = 0.429$$
$$95\%\,CI = \exp\{\ln(2.596) \pm (1.96 \times 0.429)\}$$
95%CI=1.120, 6.019
To estimate the odds ratio using the STATA program, we would type:

. cci 20 208 8 216

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |        20         208  |        228       0.0877
        Controls |         8         216  |        224       0.0357
-----------------+------------------------+------------------------
           Total |        28         424  |        452       0.0619
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         2.596154       |    1.064509    6.956335 (exact)
 Attr. frac. ex. |         .6148148       |    .0606001    .8562461 (exact)
 Attr. frac. pop |         .0539311       |
                 +-------------------------------------------------
                               chi2(1) =     5.26  Pr>chi2 = 0.0218

The odds ratio is equal to 2.60 with 95% CI (1.12 to 6.02) from the hand calculation; STATA reports an exact interval (1.06 to 6.96), which is wider. This means that the odds of hip fracture among patients who used traditional medicine were 2.60 times the odds among patients who did not use traditional medicine.
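The hand calculation for independent samples (equations 11 to 13) can be sketched in Python as follows. The function name `odds_ratio` and the argument `z` are assumptions made for this sketch. It reproduces the log-based interval used in the hand calculation, not the exact interval that STATA prints.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and log-based confidence limits using the cell labels of Table 2-1.

    a, b: exposed and non-exposed cases
    c, d: exposed and non-exposed controls
    """
    estimate = (a * d) / (b * c)                            # equation (11)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)    # equation (13)
    lower = math.exp(math.log(estimate) - z * se_log_or)    # equation (12)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper


# Counts from Table 2-3.
or_estimate, or_lower, or_upper = odds_ratio(20, 208, 8, 216)
print(f"OR = {or_estimate:.2f}, 95% CI {or_lower:.2f} to {or_upper:.2f}")
```

Small differences from the hand-calculated limits (1.120 and 6.019) are expected, because the sketch does not round the OR or the standard error before exponentiating.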
2.2.2 Matched paired samples
When the design is matched, or the samples are otherwise related, the data layout for testing the hypothesis of independence and for estimation of the OR is the one presented in Table 1-9. The estimation of the OR for a matched design is based on the numbers of discordant pairs, b and c. The maximum likelihood estimate of the OR, which is conditional on the number of discordant pairs, is given by:
$$OR = \frac{b}{c} \tag{14}$$
and the 95% CI can be estimated as:
$$95\%\,CI = \exp\{\ln(OR) \pm 1.96 \times SE(\ln(OR))\} \tag{15}$$
where
$$SE(\ln(OR)) = \sqrt{\frac{1}{b} + \frac{1}{c}} \tag{16}$$
Example 2-3: For a matched case-control study in which researchers matched on BMI, they wanted to estimate the strength of association between HRT and hip fracture. The 2x2 contingency table for assessment is given in Table 2-4.
Table 2-4 The 2x2 contingency table for estimation of the OR of hip fracture and HRT
                 Control
Case        HRT+    HRT-    Total
HRT+        102     50      152
HRT-        100     120     220
Total       202     170     372
The odds ratio can be estimated as:
$$OR = \frac{50}{100} = 0.5$$
The 95% CI can be estimated as:
$$SE(\ln(OR)) = \sqrt{\frac{1}{50} + \frac{1}{100}} = 0.17$$
$$95\%\,CI = \exp\{\ln(0.5) \pm (1.96 \times 0.17)\}$$
95%CI=0.36, 0.70
To estimate the odds ratio using the STATA program, we would type the command below. STATA reports an exact interval (0.35 to 0.71) rather than the log-based interval calculated above.

. mcci 102 50 100 120

                 | Controls               |
Cases            |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
         Exposed |       102          50  |        152
       Unexposed |       100         120  |        220
-----------------+------------------------+------------
           Total |       202         170  |        372

McNemar's chi2(1) =     16.67    Prob > chi2 = 0.0000
Exact McNemar significance probability       = 0.0001

Proportion with factor
        Cases       .4086022
        Controls    .5430108     [95% Conf. Interval]
                   ---------     --------------------
        difference -.1344086     -.2001631  -.0686541
        ratio       .7524752      .656141    .8629532
        rel. diff. -.2941176     -.4547495  -.1334858

        odds ratio        .5      .3487202   .7089431   (exact)
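The odds ratio is equal to 0.5, with an exact 95% CI of 0.35 to 0.71 as reported by STATA. That is, in this study matched on BMI, the odds of hip fracture in patients receiving HRT were half the odds in patients not receiving HRT, a reduction of 50%; we are 95% confident that the reduction in odds lies between 29% and 65%.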
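The matched-pairs calculation (equations 14 to 16) can be sketched in Python as follows. The function name `matched_odds_ratio` and the argument `z` are assumptions made for this sketch. It gives the log-based interval from the formulas, which differs slightly from the exact interval printed by STATA.

```python
import math


def matched_odds_ratio(b, c, z=1.96):
    """Conditional odds ratio for matched pairs and its log-based confidence limits.

    b: pairs in which only the case is exposed
    c: pairs in which only the control is exposed
    """
    estimate = b / c                                        # equation (14)
    se_log_or = math.sqrt(1 / b + 1 / c)                    # equation (16)
    lower = math.exp(math.log(estimate) - z * se_log_or)    # equation (15)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper


# Discordant pairs from Table 2-4: b = 50, c = 100.
matched_or, matched_lower, matched_upper = matched_odds_ratio(50, 100)
print(f"OR = {matched_or:.2f}, 95% CI {matched_lower:.2f} to {matched_upper:.2f}")
```

As with McNemar's test, only the discordant pairs contribute to the estimate.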
2.2.3 Dummy tables
An example of a dummy table for reporting the OR from a study of the association between patient characteristics and risk of death is presented in Table 2-5.
Table 2-5 Association between patient’s characteristics and risk of death
Characteristics OR 95% CI P value
Sex
Male
Female 1*
Race
Light
Dark
Other 1
Service at ICU admission
Medical
Surgical 1
Cancer part of present problem
Yes
No 1
pH from initial blood gases
>=7.25
<7.25 1
*The odds ratio is equal to 1 for the reference group
Assignment II: Hypothesis testing for categorical data (10%)
Due date : Sep 6, 2016
1) In an RCT of calcium supplements in Thai osteoporotic women, subjects will be randomly allocated to receive either HRT or a calcium supplement. The primary objective is to compare bone mineral density (BMD) after treatment between the HRT and calcium supplement groups. The secondary objective is to compare the rates of fracture and adverse events between the 2 groups.
To check that randomization has balanced the groups with respect to known and unknown confounding factors, researchers have to compare the baseline characteristics of subjects between the 2 groups, such as age, BMI, risk behaviors, and underlying disease; see the table below.
Variable                 Variable name   Label value
Treatment                treat           1=calcium supplement, 0=HRT
Age                      age             Classify age into 3 groups: age < 65 years; age 65-70 years; age > 70 years
Age at menopause         agemenop        Classify age at menopause into 2 groups: <=35 years and > 35 years
Body mass index (BMI)                    - Calculate from weight and height
                                         - Classify BMI <18.5 as underweight
                                           BMI 18.5-24.9 as normal
                                           BMI 25.0-29.9 as overweight
                                           BMI >=30.0 as obesity
Smoking behavior         smoking         1=smoke, 2=ex-smoke, 3=non-smoke, 9=missing
Alcohol consumption      alcohol         1=yes, 2=sometime, 3=no, 9=missing
Sun exposure             sunexp          1=<0.5 hr/day, 2=0.5-2 hr/day, 3=>2 hr/day, 9=missing
Exercise                 exercise        1=regular, 2=sometime, 3=no, 9=missing
Diabetes mellitus                        Classify by glucose>=126 mg/dl
Hypertension                             Classify SBP>=140 mmHg or DBP>=90 mmHg as hypertension, otherwise, normal
High cholesterol                         Classify cholesterol >=240 mg/dl as high cholesterol, otherwise, normal
The data are given in the data set cross-sectional_BMD_&_risk_factor.dta
The aim of this assignment is to:
a) Categorize continuous data according to the definition in the third column.
b) Compare baseline characteristics, risk behaviors, and underlying disease between
subjects who receive HRT and calcium supplement.
c) Write up the details of the statistical analysis, including which statistical test you apply for each variable.
d) Create a dummy table and present the results according to the dummy table.
e) Interpret and write up the results according to the table.
2) For the secondary objective, researchers would like to assess the association between treatment and the outcomes of interest (fracture and adverse events). The details of the related variables are presented in the following table.
Variable Variable name Label value
Treatment treat 1=calcium supplement, 0=HRT
Fracture fracture 1=yes, 0=no
Adverse events adveffect 1=yes, 0=no
The aim of this assignment is to:
a) Assess the association between treatment and outcomes, and explain which statistical test you apply.
b) Estimate the strength of association between treatment and outcomes.
c) Interpret and write up the results.