
RACE 615 Introduction to Medical Statistics

Analysis for categorical data

Assist.Prof.Dr.Sasivimol Rattanasiri

Doctor of Philosophy Program in Clinical Epidemiology Section for Clinical Epidemiology & Biostatistics Faculty of Medicine Ramathibodi Hospital Mahidol University

Semester I Academic year 2016 www.ceb-rama.org


CONTENTS

1. Hypothesis testing of two categorical variables
   1.1 Independent samples
   1.2 Paired samples
2. Estimation of the strength of association
   2.1 Risk ratio (RR)
   2.2 Odds ratio (OR)


OBJECTIVES

This module should help you to

- Understand the concept of testing for independence

- Perform analysis for 2x2, 3x2, and 3x3 contingency tables

- Estimate the strength of association using measures such as the risk ratio and the odds ratio

REFERENCES

1. Agresti A. Categorical data analysis. 2nd edition. New York: John Wiley & Sons, Inc; 2002.

2. Schlesselman JJ. Case-control studies: Design, conduct, analysis. Oxford: Oxford University Press; 1982.

3. Altman DG. Practical statistics for medical research. London: Chapman & Hall; 1991.

ASSIGNMENT II (10%)

See the end of this module (Due on: September 6, 2016)


1. HYPOTHESIS TESTING OF TWO CATEGORICAL VARIABLES

This module deals with hypothesis testing of two categorical variables. Two situations must be distinguished, according to whether the samples are independent or paired. Independent samples are covered in section 1.1 and paired samples in section 1.2.

An analysis of two categorical variables is based on a contingency table, which is a cross-tabulation of the two variables. The table can be of any size, for example 2x2, 3x2, or 3x3; the size depends on the number of categories of each variable. In a 2x2 contingency table each variable has 2 categories, whereas in a 3x2 contingency table the first variable has 3 categories and the second has 2. Layouts of the 2x2 and 3x2 tables are presented in Tables 1-1 and 1-2, respectively. The first digit refers to the number of rows in the table and the second digit to the number of columns. The entries in the table are the frequencies that correspond to a particular combination of row and column categories; these combinations are known as cells. For example, n11 in Table 1-1 is the number of subjects in the first row and the first column.

Table 1-1 A layout of the 2x2 contingency table

                        Variable II
Variable I      Category I    Category II    Total
Category I      n11           n12            n1+
Category II     n21           n22            n2+
Total           n+1           n+2            n++

Table 1-2 A layout of the 3x2 contingency table

                        Variable II
Variable I      Category I    Category II    Total
Category I      n11           n12            n1+
Category II     n21           n22            n2+
Category III    n31           n32            n3+
Total           n+1           n+2            n++
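The cross-tabulation described above can be sketched in a few lines of Python. This is only an illustration: the function name `cross_tabulate` and the six-subject data set are hypothetical and do not come from this module.

```python
from collections import Counter

def cross_tabulate(rows, cols):
    """Count subjects in each (row category, column category) cell.

    Returns the cell counts n_ij, the row totals n_i+, the column
    totals n_+j, and the grand total n_++.
    """
    cells = Counter(zip(rows, cols))
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    return cells, row_totals, col_totals, len(rows)

# Hypothetical raw data: one entry per subject.
exposure = ["yes", "yes", "no", "no", "no", "yes"]
outcome = ["case", "control", "case", "control", "control", "case"]

cells, row_totals, col_totals, n = cross_tabulate(exposure, outcome)
print(cells[("yes", "case")], row_totals["yes"], col_totals["case"], n)
```

Each key of `cells` is one cell of the contingency table, so the same function works for 2x2, 3x2, or larger tables.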


1.1 Independent samples

In a cohort study of oral contraceptive (OC) use, women who were disease-free at baseline were classified into two groups: those who had ever used OC and those who had never used OC. They were followed up for 1 year, with cervical cancer as the outcome of interest. At the end of the study, they were classified as having or not having cervical cancer. The researchers wanted to compare the proportions of cervical cancer between the ever-used and never-used groups. In other words, they wanted to assess the association between OC use and the risk of cervical cancer at 1 year of follow-up.

In this example, both the outcome of interest (cervical cancer) and the risk factor (OC use) are binary variables, so a 2x2 contingency table is used to assess the association. The observed frequencies are presented in Table 1-3.

Table 1-3 The 2x2 contingency table for assessing the association between OC use and the risk of cervical cancer.

                          OC use
Cervical cancer    Ever used    Never used    Total
Yes                68           254           322
No                 150          875           1025
Total              218          1129          1347

1.1.1 Chi-square test

The Chi-square test is used to examine the association between two categorical variables measured on independent samples. It is based on the null hypothesis that the two categorical variables are independent or, equivalently, that the proportions in the independent groups are identical. The principle of the Chi-square test rests on the probability structure of contingency tables.

Let X and Y be categorical variables with I and J levels, respectively. The joint probability is the probability that a subject falls in row i and column j, which can be estimated by:

$$p_{ij} = \frac{n_{ij}}{n_{++}} \qquad (1)$$


The marginal probability in row i is the probability that a subject falls in row i, which can be estimated by:

$$p_{i+} = \frac{n_{i+}}{n_{++}} \qquad (2)$$

The marginal probability in column j is the probability that a subject falls in column j, which can be estimated by:

$$p_{+j} = \frac{n_{+j}}{n_{++}} \qquad (3)$$

If the null hypothesis is true and the two categorical variables are independent, then the joint probability for each cell is equal to the product of the marginal probabilities of its row i and its column j. The expected frequency for each cell can be calculated by:

$$e_{ij} = n_{++} \times p_{i+} \times p_{+j} = n_{++} \times \frac{n_{i+}}{n_{++}} \times \frac{n_{+j}}{n_{++}} = \frac{n_{i+} \times n_{+j}}{n_{++}} \qquad (4)$$

Therefore, the expected frequency for any cell is the product of the relevant row and column

totals divided by the total sample size.
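As a minimal sketch of equation (4), the following Python function computes the expected frequency of every cell from the row and column totals. The function name `expected_frequencies` is an illustrative choice; the counts are those of Table 1-3.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: e_ij = n_i+ * n_+j / n_++."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Observed frequencies from Table 1-3 (rows: cervical cancer yes/no,
# columns: ever used OC / never used OC).
table_1_3 = [[68, 254], [150, 875]]
expected = expected_frequencies(table_1_3)

for row in expected:
    print([round(e, 1) for e in row])
```

The expected frequencies keep the same row and column totals as the observed table, which is a quick way to check a hand calculation.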

The Chi-square statistic is obtained by comparing the observed frequency in each cell with the expected frequency as follows:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} \qquad (5)$$

where r is the number of rows and c is the number of columns. The further the observed frequencies are from the expected frequencies, the less likely it is that the null hypothesis is true. Thus, a large value of the Chi-square statistic is evidence that the two categorical variables are not independent. Under the null hypothesis, the statistic follows approximately a Chi-square distribution with (r-1)x(c-1) degrees of freedom.
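Equation (5) and the degrees of freedom can be sketched in Python as below. The function names are illustrative. The p value helper is an assumption-laden shortcut: it is valid only for 1 degree of freedom (a 2x2 table), where the upper tail of the Chi-square distribution equals `erfc(sqrt(chi2 / 2))`; larger tables need a general Chi-square tail function such as STATA's `chi2tail`.

```python
import math

def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # equation (4)
            chi2 += (n_ij - e_ij) ** 2 / e_ij         # equation (5)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

def chi_square_tail_1df(chi2):
    """Upper-tail probability of a Chi-square distribution with 1 df only."""
    return math.erfc(math.sqrt(chi2 / 2))

# Observed frequencies from Table 1-3.
chi2, df = pearson_chi_square([[68, 254], [150, 875]])
p_value = chi_square_tail_1df(chi2)
print(round(chi2, 2), df, round(p_value, 4))
```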


Example 1-1: A case-control study was conducted to look at the effects of traditional

medicine usage on the risk of osteoporotic hip fracture in women. The Chi-square test was

used to assess the association between traditional medicine usage and the risk of osteoporotic

hip fracture in women. The 2x2 contingency table for assessment is given as follows:

Table 1-4 The observed frequencies for assessment of the association between traditional medicine use and the risk of osteoporotic hip fracture in women

                  Traditional medicine use
Hip fracture      Yes      No       Total
Yes               20       208      228
No                8        216      224
Total             28       424      452

H0: Traditional medicine use is not associated with the risk of osteoporotic hip fracture or

H0: The proportion of traditional medicine use in women with osteoporotic hip fracture is

equal to the proportion of traditional medicine use in women without osteoporotic hip

fracture.

HA: Traditional medicine use is associated with the risk of osteoporotic hip fracture or

HA: The proportions of traditional medicine use in women with osteoporotic hip fracture and

women without osteoporotic hip fracture are different.

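Step I: Calculate the expected frequency for each cell:

$$e_{11} = \frac{228 \times 28}{452} = 14.1$$

$$e_{12} = \frac{228 \times 424}{452} = 213.9$$

$$e_{21} = \frac{224 \times 28}{452} = 13.9$$

$$e_{22} = \frac{224 \times 424}{452} = 210.1$$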


Step II: Calculate the chi-square test:

$$\chi^2 = \frac{(20-14.1)^2}{14.1} + \frac{(208-213.9)^2}{213.9} + \frac{(8-13.9)^2}{13.9} + \frac{(216-210.1)^2}{210.1} = 2.47 + 0.16 + 2.50 + 0.17 = 5.30$$

Step III: Calculate the associated p value by the STATA program:

. disp chi2tail(1, 5.30)

.02132542

The hand calculation uses expected frequencies rounded to one decimal place, which is why it gives 5.30 rather than the 5.2588 reported by STATA.

For individual data, to perform the Chi-square test from the menus in the STATA program, we would select:

Statistics menu --> Summaries, tables, and tests --> Frequency Tables --> Two-way tables with measures of association, and then specify Pearson’s chi-squared in option test statistics

. tab hip tredmed,col exp chi2

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
| column percentage  |
+--------------------+

       hip | traditional medicine
  fracture |       yes         no |     Total
-----------+----------------------+----------
       yes |        20        208 |       228
           |      14.1      213.9 |     228.0
           |     71.43      49.06 |     50.44
-----------+----------------------+----------
        no |         8        216 |       224
           |      13.9      210.1 |     224.0
           |     28.57      50.94 |     49.56
-----------+----------------------+----------
     Total |        28        424 |       452
           |      28.0      424.0 |     452.0
           |    100.00     100.00 |    100.00

          Pearson chi2(1) =   5.2588   Pr = 0.022

For summary data in Table 1-4, to perform the Chi-square test from the menus in the STATA program, we would select:

Statistics menu --> Summaries, tables, and tests --> Frequency Tables --> Table calculator, and then specify Pearson’s chi-squared in option test statistics


. tabi 20 208 \ 8 216,col exp chi2

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
| column percentage  |
+--------------------+

       hip | traditional medicine
  fracture |       yes         no |     Total
-----------+----------------------+----------
       yes |        20        208 |       228
           |      14.1      213.9 |     228.0
           |     71.43      49.06 |     50.44
-----------+----------------------+----------
        no |         8        216 |       224
           |      13.9      210.1 |     224.0
           |     28.57      50.94 |     49.56
-----------+----------------------+----------
     Total |        28        424 |       452
           |      28.0      424.0 |     452.0
           |    100.00     100.00 |    100.00

          Pearson chi2(1) =   5.2588   Pr = 0.022

Step IV: Compare the p value with the level of significance

The p value from Step III is equal to 0.02, which is less than the level of significance (conventionally 0.05). As a result, we reject the null hypothesis and conclude that traditional medicine use is significantly associated with osteoporotic hip fracture in women. In other words, the proportion of traditional medicine use in women with osteoporotic hip fracture is different from the proportion in women without osteoporotic hip fracture. The proportion of traditional medicine use is 8.77% (20/228) in women with hip fracture and 3.57% (8/224) in women without hip fracture.

1.1.2 Fisher’s exact test

However, the Chi-square test is not an appropriate test of independence if the expected frequency is less than 5 in more than 20% of the cells. Fisher’s exact test is an alternative method when this requirement of the Chi-square test is not met.
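The rule above can be checked mechanically. The sketch below is illustrative: the function name `needs_exact_test` and its default arguments (threshold 5, maximum share 20%) simply encode the rule as stated in this section, and the two tables are the observed counts of Table 1-5 and Table 1-4.

```python
def needs_exact_test(observed, threshold=5.0, max_share=0.20):
    """True if more than `max_share` of cells have expected count < threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]
    small_cells = sum(1 for e in expected if e < threshold)
    return small_cells / len(expected) > max_share

hrt_table = [[1, 213], [2, 214]]           # Table 1-5
trad_medicine_table = [[20, 208], [8, 216]]  # Table 1-4

print(needs_exact_test(hrt_table))            # 2 of 4 expected counts are below 5
print(needs_exact_test(trad_medicine_table))  # all expected counts are 13.9 or more
```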

The principle of Fisher’s exact test is to evaluate the probability associated with all possible

2×2 contingency tables which have the same row and column totals as the observed data. The

probability of obtaining the cell frequencies a, b, c, and d when the null hypothesis is true and

the row and column totals are fixed can be calculated by:

$$P = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{N!\,a!\,b!\,c!\,d!} \qquad (6)$$


where N = a+b+c+d is the total sample size and the symbol x!, called “x factorial”, means that all the integers from 1 up to x are multiplied together (with 0! = 1).

A two-sided p value is the sum of probabilities of all possible tables that have probabilities

less than or equal to the probability of the observed table. The details of Fisher’s exact test

can be found in Altman, Section 10.7.2, pages 253-257.

Example 1-2: A case-control study was conducted to look at the effects of receiving

hormonal replacement therapy (HRT) on the risk of hip fracture. The Fisher’s exact test was

used to assess the association between receiving hormonal replacement therapy (HRT) and

the risk of hip fracture. Performing the Chi-square test is not appropriate in this case because the expected frequency is less than 5 in 50% of the cells (2 of the 4 cells have expected frequencies less than 5). The 2x2 contingency table of observed frequencies and expected frequencies for assessment is given as follows.

Table 1-5 The observed frequencies (the expected frequencies) for assessment of the effects of HRT on hip fracture.

                          HRT
Hip fracture    Yes        No             Total
Yes             1 (1.5)    213 (212.5)    214
No              2 (1.5)    214 (214.5)    216
Total           3          427            430

H0: HRT is not associated with the risk of hip fracture or

H0: The proportion of patients who receive HRT and have hip fracture is equal to the

proportion of patients who receive HRT and do not have hip fracture.

HA: HRT is associated with the risk of hip fracture or

HA: The proportion of patients who receive HRT and have hip fracture is different from the

proportion of patients who receive HRT and do not have hip fracture

Step I: Calculate the probability associated with all possible 2×2 contingency tables which

have the same row and column totals as the observed data.


All possible tables of frequencies which have the same row and column totals as the observed

data in Table 1-5 are presented in Table 1-6.

Table 1-6 All possible tables of frequencies which have the same row and column totals as Table 1-5.

(i)      0     3        (ii)     1     2
       214   213               213   214

(iii)    2     1        (iv)     3     0
       212   215               211   216

The probability of obtaining the cell frequencies in table (i) can be calculated as:

1) Probability of table (i)

    0     3
  214   213

$$P = \frac{3!\,427!\,214!\,216!}{430!\,0!\,3!\,214!\,213!} = \frac{216!\,427!}{430!\,213!} = \frac{216 \times 215 \times 214}{430 \times 429 \times 428} = 0.126$$

Table 1-7 Summary of probability associated with each set of frequencies of all possible tables

Table    Probability
i        0.126
ii       0.378
iii      0.374
iv       0.122


Step II: Calculate p value

A two-sided p value is the sum of probabilities of all possible tables that have probabilities less than or equal to the probability of the observed table. The observed table in this study is table (ii), whose probability is 0.378. Because every possible table has a probability less than or equal to 0.378, the two-sided p value for this study is equal to 0.126+0.378+0.374+0.122=1.000.
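The two steps above can be reproduced with a short Python sketch. The function name `fisher_exact_2x2` is illustrative. It writes equation (6) in its equivalent binomial-coefficient (hypergeometric) form, which avoids computing very large factorials directly, and the small relative tolerance used when comparing probabilities is an implementation assumption to guard against floating-point ties.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Probabilities of all tables with the observed margins, and the
    two-sided p value (sum of probabilities <= that of the observed table)."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Probability of the table whose first cell equals x (equation 6).
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lowest, highest + 1)}
    p_observed = probs[a]
    p_two_sided = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))
    return probs, min(1.0, p_two_sided)

# Observed frequencies from Table 1-5.
probs, p_two_sided = fisher_exact_2x2(1, 213, 2, 214)
for x, p in probs.items():
    print(x, round(p, 3))
print(round(p_two_sided, 3))
```

Here the key `x` is the number of patients with both hip fracture and HRT, so `x = 0, 1, 2, 3` correspond to tables (i) to (iv) of Table 1-6.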

For individual data, to perform Fisher’s exact test from the menus in the STATA program, we would select:

Statistics menu --> Summaries, tables, and tests --> Frequency Tables --> Two-way tables with measures of association, and then specify Fisher’s exact test in option test statistics

. tab hip hrt,col exp exact

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
| column percentage  |
+--------------------+

           |          hrt
       hip |       yes         no |     Total
-----------+----------------------+----------
       yes |         1        213 |       214
           |       1.5      212.5 |     214.0
           |     33.33      49.88 |     49.77
-----------+----------------------+----------
        no |         2        214 |       216
           |       1.5      214.5 |     216.0
           |     66.67      50.12 |     50.23
-----------+----------------------+----------
     Total |         3        427 |       430
           |       3.0      427.0 |     430.0
           |    100.00     100.00 |    100.00

           Fisher's exact =                 1.000
   1-sided Fisher's exact =                 0.503

Step III: Compare the p value with the level of significance

The p value for the two-tailed test from Step II is equal to 1.00, which is greater than the level of significance. As a result, we fail to reject the null hypothesis and conclude that HRT is not significantly associated with osteoporotic hip fracture. In other words, there is no evidence that the proportion of patients who received HRT differs between those with and without hip fracture. The proportion who received HRT is 0.47% (1/214) among patients with hip fracture and 0.93% (2/216) among patients without hip fracture.


1.1.3 Dummy tables

An example of a dummy table for assessing the association between patients’ characteristics and death is presented in Table 1-8.

Table 1-8 Description of patients’ characteristics between dying and living

Characteristics                     Death       Survival     P value
                                    N=40 (%)    N=160 (%)
Sex
  Male
  Female
Race
  Light
  Dark
  Other
Service at ICU admission
  Medical
  Surgical
Cancer part of present problem
  Yes
  No
pH from initial blood gases
  >=7.25
  <7.25

1.2 Paired samples

There are several circumstances in which researchers want to investigate the relationship between two categorical variables for paired samples. For example, a researcher may want to compare the pain relief given by two different analgesics in the same subject. Another example is a matched case-control study in which investigators match cases to controls on body mass index (BMI) and assess the association between HRT and hip fracture. The simplest case is when subjects are classified into two groups. A 2x2 contingency table for paired samples is presented in Table 1-9.


Table 1-9 A 2x2 contingency table for paired samples

                     Controls
Cases        Exposed    Unexposed    Total
Exposed      a          b            a+b
Unexposed    c          d            c+d
Total        a+c        b+d          n

where a, b, c, and d are the frequencies of pairs for each combination and b+c is the number of discordant pairs¹. An appropriate statistical test for the relationship between two categorical variables in paired samples presented as a 2x2 table is McNemar’s test, which can be calculated by:

$$\chi^2 = \frac{(|b - c|)^2}{b + c} \qquad (7)$$

The unit of analysis of this test is the matched pair, which is different from the Chi-square test, in which the unit of analysis is the person. If the number of discordant pairs is less than 20, the exact McNemar’s test is more appropriate. McNemar’s test is a test of paired proportions for data presented as a 2x2 table. For a k x k contingency table, an extension of McNemar’s test known as the Stuart-Maxwell test is required.

Example 1-3: For a matched case-control study, researchers wanted to assess the association

between HRT and hip fracture. The patients were assigned to pairs matched on BMI. The 2x2

contingency table for assessment has been given in Table 1-10.

¹ Discordant pairs are pairs whose two members differ in exposure; for example, b is the number of pairs in which the case is exposed and the matched control is not.


Table 1-10 A 2x2 contingency table of 372 matched pairs for assessment of the association between HRT and hip fracture

                 Controls
Cases      HRT+     HRT-     Total
HRT+       102      50       152
HRT-       100      120      220
Total      202      170      372

H0: HRT is not associated with hip fracture or

H0: The paired proportion of patients who receive HRT and have hip fracture is equal to the

paired proportion of patients who receive HRT and do not have hip fracture.

HA: HRT is associated with hip fracture or

HA: The paired proportion of patients who receive HRT and have hip fracture is different

from the paired proportion of patients who receive HRT and do not have hip fracture

Step I Calculate the McNemar’s test

$$\chi^2 = \frac{(|50 - 100|)^2}{50 + 100} = 16.67$$

Step II Calculate the associated p value by the STATA program

. disp chi2tail(1, 16.67)

.00004448
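Steps I and II can be sketched in Python as follows. The function name `mcnemar` is illustrative. Two assumptions are made: the approximate p value uses the 1-degree-of-freedom identity `erfc(sqrt(chi2 / 2))`, and the exact p value is computed as a two-sided binomial probability with success probability 0.5 on the b+c discordant pairs, which is the usual definition of the exact McNemar test.

```python
import math
from math import comb

def mcnemar(b, c):
    """McNemar's chi-square (equation 7) with approximate and exact p values."""
    n_discordant = b + c
    chi2 = abs(b - c) ** 2 / n_discordant
    p_approx = math.erfc(math.sqrt(chi2 / 2))  # Chi-square tail, 1 df
    # Exact test: two-sided binomial probability with p = 0.5.
    k = min(b, c)
    lower_tail = sum(comb(n_discordant, x) for x in range(k + 1)) / 2 ** n_discordant
    p_exact = min(1.0, 2 * lower_tail)
    return chi2, p_approx, p_exact

# Discordant pairs from Table 1-10: b = 50, c = 100.
chi2, p_approx, p_exact = mcnemar(50, 100)
print(round(chi2, 2), p_approx, p_exact)
```

Only the discordant cells b and c enter the calculation; the concordant pairs (102 and 120) carry no information about the association.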

For individual data, to perform McNemar’s test from the menus in the STATA program, we would select:

Statistics menu --> Epidemiology and related --> Tables for epidemiologists --> Matched case-control studies

For summary data, to perform McNemar’s test from the menus in the STATA program, we would select:

Statistics menu --> Epidemiology and related --> Tables for epidemiologists --> Matched case-control calculator


. mcc case control

                 | Controls               |
Cases            |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
         Exposed |       102          50  |        152
       Unexposed |       100         120  |        220
-----------------+------------------------+------------
           Total |       202         170  |        372

McNemar's chi2(1) =     16.67    Prob > chi2 = 0.0000
Exact McNemar significance probability       = 0.0001

Proportion with factor
        Cases       .4086022
        Controls    .5430108     [95% Conf. Interval]
                   ---------     --------------------
        difference -.1344086     -.2001631  -.0686541
        ratio       .7524752      .656141    .8629532
        rel. diff. -.2941176     -.4547495  -.1334858

        odds ratio        .5      .3487202   .7089431   (exact)

Step III Compare the p value with the level of significance

The p value for a two-tailed test from Step II is less than 0.001 which is less than the level of

significance. As a result, we reject the null hypothesis and conclude that HRT is significantly

associated with osteoporotic hip fracture. In other words, the paired proportion of patients

who receive HRT and have hip fracture is different from the paired proportion of patients who

receive HRT and do not have hip fracture.


2. ESTIMATION OF THE STRENGTH OF ASSOCIATION

Estimating the strength of association is another way of describing the relationship between two categorical variables. In this section, we consider only the case where there are two groups of subjects and two outcome categories. The analysis is based on a 2x2 contingency table as in Table 2-1.

Table 2-1 A 2x2 contingency table for estimation of the strength of association.

           Exposed    Non-exposed    Total
Case       a          b              a+b
Control    c          d              c+d
Total      a+c        b+d            a+b+c+d

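2.1 Risk ratio (RR)

The risk ratio (RR) is the ratio of the proportion with the disease (or event) in the exposed (high-risk) group to that in the non-exposed (low-risk) group. It is used to measure association in a cohort study or randomized controlled trial (RCT). With the notation of Table 2-1, it can be estimated as:

$$RR = \frac{I_{E+}}{I_{E-}} = \frac{a/(a+c)}{b/(b+d)} \qquad (8)$$

where I_{E+} and I_{E-} are the incidences in the exposed and non-exposed groups, and the 95% CI can be estimated as:

$$95\%\,CI = \exp\{\ln(RR) \pm 1.96 \times SE(\ln(RR))\} \qquad (9)$$

where

$$SE(\ln(RR)) = \sqrt{\frac{1}{a} - \frac{1}{a+c} + \frac{1}{b} - \frac{1}{b+d}} \qquad (10)$$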

Example 2-1: For a cohort study of kidney transplantations, researchers wanted to estimate

the strength of association of the graft failure rate in kidney transplantation between cadaveric

kidney transplantation (CDKT) and living related kidney transplantation (LRKT). It was

found that 61 patients had graft failure among 134 patients who were transplanted by CDKT,

whereas 35 patients had graft failure among 220 patients who were transplanted by LRKT.

The 2x2 contingency table for estimation has been presented in Table 2-2.


Table 2-2 The 2x2 contingency table for estimation of the risk ratio of the graft failure and types of donors (CDKT or LRKT)

                   Types of donors
Graft failure      CDKT     LRKT     Total
Yes                61       35       96
No                 73       185      258
Total              134      220      354

The risk ratio can be estimated as:

$$I_{CDKT} = 61/134 = 45.52\%, \qquad I_{LRKT} = 35/220 = 15.91\%, \qquad RR = 45.52/15.91 = 2.861$$

The 95% CI can be estimated as:

$$SE(\ln RR) = \sqrt{\frac{1}{61} - \frac{1}{134} + \frac{1}{35} - \frac{1}{220}} = 0.18$$

$$95\%\,CI = \exp\{\ln(2.86) \pm (1.96 \times 0.18)\}$$

95%CI = 2.00, 4.08

For individual data, to estimate the risk ratio from the menus in the STATA program, we would select:

Statistics menu --> Epidemiology and related --> Tables for epidemiologists --> Cohort study risk-ratio etc.

For summary data, to estimate the risk ratio from the menus in the STATA program, we would select:

Statistics menu --> Epidemiology and related --> Tables for epidemiologists --> Cohort study risk-ratio etc. calculator


. csi 61 35 73 185

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        61          35  |         96
        Noncases |        73         185  |        258
-----------------+------------------------+------------
           Total |       134         220  |        354
                 |                        |
            Risk |  .4552239    .1590909  |   .2711864
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |          .296133       |    .1989455    .3933204
      Risk ratio |         2.861407       |    2.004715    4.084197
 Attr. frac. ex. |         .6505216       |     .501176    .7551538
 Attr. frac. pop |         .4133523       |
                 +-------------------------------------------------
                               chi2(1) =    36.95  Pr>chi2 = 0.0000

The RR is equal to 2.86 with 95% CI (2.00 to 4.08). It means that the risk of graft failure in patients who were transplanted by CDKT was 2.86 times the risk in patients who were transplanted by LRKT, and the 95% confidence interval for this ratio runs from 2.00 to 4.08.
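Equations (8) to (10) can be checked with the short Python sketch below. The function name `risk_ratio` and its argument order (a, b, c, d as laid out in Table 2-1) are illustrative choices. It uses z = 1.96 as in equation (9), so the last digits may differ slightly from STATA, which uses a more precise normal quantile.

```python
import math

def risk_ratio(a, b, c, d, z=1.96):
    """Risk ratio and CI from a 2x2 table laid out as in Table 2-1.

    a, b: subjects with the event in the exposed / non-exposed group
    c, d: subjects without the event in the exposed / non-exposed group
    """
    rr = (a / (a + c)) / (b / (b + d))                           # equation (8)
    se = math.sqrt(1 / a - 1 / (a + c) + 1 / b - 1 / (b + d))     # equation (10)
    lower = math.exp(math.log(rr) - z * se)                       # equation (9)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Table 2-2: CDKT is treated as the exposed group.
rr, lower, upper = risk_ratio(61, 35, 73, 185)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```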

2.2 Odds ratio (OR)

The odds ratio (OR) is the ratio of the odds of having the disease (or event) in the exposed (high-risk) group to the odds in the non-exposed (low-risk) group. It is used in case-control or cross-sectional studies. There are two different conditions for estimation of the OR: 1) independent samples, 2) matched paired samples.

2.2.1 Independent samples

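The odds ratio for independent samples can be estimated as:

$$OR = \frac{\text{odds}_{E+}}{\text{odds}_{E-}} = \frac{a/c}{b/d} = \frac{ad}{bc} \qquad (11)$$

and the 95% CI can be estimated as:

$$95\%\,CI = \exp\{\ln(OR) \pm 1.96 \times SE(\ln(OR))\} \qquad (12)$$

where

$$SE(\ln(OR)) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \qquad (13)$$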


Example 2-2: For a case-control study, researchers wanted to estimate the strength of

association of traditional medicine usage and hip fracture. It was found that among 228

patients who had hip fracture, 20 used traditional medicine, whereas among 224 patients who

did not have hip fracture, 8 used traditional medicine. The 2x2 contingency table for

estimation has been presented in Table 2-3.

Table 2-3 The 2x2 contingency table for estimation of the OR of the hip fracture and traditional medicine usage

                  Traditional medicine used
Hip fracture      Yes      No       Total
Yes               20       208      228
No                8        216      224
Total             28       424      452

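The odds ratio can be estimated as:

$$\text{odds}_{E+} = 20/8, \qquad \text{odds}_{E-} = 208/216$$

$$OR = \frac{20 \times 216}{8 \times 208} = 2.596$$

The 95% CI can be estimated as:

$$SE(\ln(OR)) = \sqrt{\frac{1}{20} + \frac{1}{208} + \frac{1}{8} + \frac{1}{216}} = 0.429$$

$$95\%\,CI = \exp\{\ln(2.596) \pm (1.96 \times 0.429)\}$$

95%CI=1.120, 6.019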


To estimate the odds ratio using the STATA program, we would type:

. cci 20 208 8 216

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |        20         208  |        228       0.0877
        Controls |         8         216  |        224       0.0357
-----------------+------------------------+------------------------
           Total |        28         424  |        452       0.0619
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         2.596154       |    1.064509    6.956335 (exact)
 Attr. frac. ex. |         .6148148       |    .0606001    .8562461 (exact)
 Attr. frac. pop |         .0539311       |
                 +-------------------------------------------------
                               chi2(1) =     5.26  Pr>chi2 = 0.0218

The odds ratio is equal to 2.60 with 95% CI (1.12 to 6.02) from the hand calculation; the exact interval reported by STATA is slightly wider (1.06 to 6.96). It means that the odds of hip fracture in patients who used traditional medicine were 2.60 times the odds in patients who did not use traditional medicine.
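Equations (11) to (13) can be sketched in Python as below. The function name `odds_ratio` is illustrative, and the arguments follow the a, b, c, d layout of Table 2-1. The interval is the approximate one from equation (12), not STATA's exact interval, and because the standard error is not rounded to 0.429 the limits can differ from the hand calculation in the third decimal place.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with the approximate CI of equations (12) and (13)."""
    or_hat = (a * d) / (b * c)                          # equation (11)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)       # equation (13)
    lower = math.exp(math.log(or_hat) - z * se)         # equation (12)
    upper = math.exp(math.log(or_hat) + z * se)
    return or_hat, lower, upper

# Cell counts in the order given to `cci` in the STATA session: 20 208 8 216.
or_hat, lower, upper = odds_ratio(20, 208, 8, 216)
print(round(or_hat, 2), round(lower, 2), round(upper, 2))
```

For the odds ratio, swapping b and c leaves the estimate and its standard error unchanged, so the same result is obtained whether the table is read by rows or by columns.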

2.2.2 Matched paired samples

When the design is a matched design, or the samples are related, the data layout for testing the hypothesis of independence and estimating the OR is the one presented in Table 1-9. The estimation of the OR for a matched design is based on the numbers of discordant pairs b and c. The maximum likelihood estimate of the OR, which is conditional on the number of discordant pairs, is given by:

$$OR = b/c \qquad (14)$$

and the 95% CI can be estimated as:

$$95\%\,CI = \exp\{\ln(OR) \pm 1.96 \times SE(\ln(OR))\} \qquad (15)$$

where

$$SE(\ln(OR)) = \sqrt{\frac{1}{b} + \frac{1}{c}} \qquad (16)$$

Example 2-3: For a matched case-control study in which researchers matched by age, they

wanted to estimate the strength of association between HRT and hip fracture. The 2x2

contingency table for assessment has been given in Table 2-4.


Table 2-4 The 2x2 contingency table for estimation of the OR of hip fracture and HRT

                 Control
Case       HRT+     HRT-     Total
HRT+       102      50       152
HRT-       100      120      220
Total      202      170      372

The odds ratio can be estimated as:

$$OR = \frac{50}{100} = 0.5$$

The 95% CI can be estimated as:

$$SE(\ln(OR)) = \sqrt{\frac{1}{50} + \frac{1}{100}} = 0.17$$

$$95\%\,CI = \exp\{\ln(0.5) \pm (1.96 \times 0.17)\}$$

95%CI=0.36, 0.70 by this approximation; the exact interval reported by STATA is 0.35 to 0.71.

To estimate the odds ratio using the STATA program, we would type:

. mcci 102 50 100 120

                 | Controls               |
Cases            |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
         Exposed |       102          50  |        152
       Unexposed |       100         120  |        220
-----------------+------------------------+------------
           Total |       202         170  |        372

McNemar's chi2(1) =     16.67    Prob > chi2 = 0.0000
Exact McNemar significance probability       = 0.0001

Proportion with factor
        Cases       .4086022
        Controls    .5430108     [95% Conf. Interval]
                   ---------     --------------------
        difference -.1344086     -.2001631  -.0686541
        ratio       .7524752      .656141    .8629532
        rel. diff. -.2941176     -.4547495  -.1334858

        odds ratio        .5      .3487202   .7089431   (exact)


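The odds ratio is equal to 0.5, with an exact 95% CI of 0.35 to 0.71. That is, after matching on age, the odds of hip fracture among subjects receiving HRT were half the odds among subjects not receiving HRT. This corresponds to a 50% reduction in the odds of hip fracture, with 95% confidence that the true reduction lies between 29% and 65%.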
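The McNemar statistic and the exact interval in the STATA output of Example 2-3 can be approximately reproduced with the following sketch. It assumes that the exact interval comes from transforming a Clopper-Pearson (exact binomial) interval for the proportion b/(b+c) into an odds; that is an assumption about the method, not something stated in these notes. All function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_increasing(g, lo=0.0, hi=1.0, iters=60):
    """Root of an increasing function g on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence limits for the proportion k/n."""
    lower = 0.0 if k == 0 else bisect_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    upper = 1.0 if k == n else bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

def exact_matched_or_ci(b, c, alpha=0.05):
    """Exact CI for the matched OR: transform limits for b/(b+c) into odds."""
    p_lo, p_hi = clopper_pearson(b, b + c, alpha)
    or_lo = p_lo / (1 - p_lo)
    or_hi = math.inf if p_hi == 1.0 else p_hi / (1 - p_hi)
    return or_lo, or_hi

def mcnemar(b, c):
    """McNemar chi-square (1 df, no continuity correction) and exact p-value."""
    chi2 = (b - c) ** 2 / (b + c)
    exact_p = min(1.0, 2 * binom_cdf(min(b, c), b + c, 0.5))
    return chi2, exact_p

# Discordant pairs from Table 2-4: b = 50, c = 100.
or_lo, or_hi = exact_matched_or_ci(50, 100)
chi2, exact_p = mcnemar(50, 100)
print(f"exact 95% CI {or_lo:.4f} to {or_hi:.4f}; McNemar chi2 = {chi2:.2f}")
```

The binomial sum is evaluated directly, which is adequate for a few hundred discordant pairs; for much larger counts a statistical library would be the safer choice.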

2.2.3 Dummy tables

An example of a dummy table for reporting ORs, taken from a study of the association between patients' characteristics and the risk of death, is presented in Table 2-5.

Table 2-5 Association between patients' characteristics and risk of death

Characteristics                       OR      95% CI      P value
Sex
    Male
    Female                            1*
Race
    Light
    Dark
    Other                             1
Service at ICU admission
    Medical
    Surgical                          1
Cancer part of present problem
    Yes
    No                                1
pH from initial blood gases
    >=7.25
    <7.25                             1

*The odds ratio is equal to 1 for the reference group.


Assignment II: Hypothesis testing for categorical data (10%)

Due date : Sep 6, 2016

1) In an RCT of calcium supplements in Thai osteoporotic women, subjects will be randomly allocated to receive either HRT or a calcium supplement. The primary objective is to compare bone mineral density (BMD) after treatment between the HRT and calcium supplement groups. The secondary objective is to compare the rates of fracture and adverse events between the two groups.

Randomization is expected to balance the groups with respect to many known and unknown confounding factors. To check this, researchers have to compare the baseline characteristics of subjects between the two groups, such as age, BMI, risk behaviors, and underlying disease; see the table below.

Variable                 Variable name   Label value
Treatment                treat           1=calcium supplement, 0=HRT
Age                      age             Classify age into 3 groups: age <65 years; age 65-70 years; age >70 years
Age at menopause         agemenop        Classify age at menopause into 2 groups: <=35 years and >35 years
Body mass index (BMI)                    - Calculate from weight and height
                                         - Classify BMI <18.5 as underweight,
                                           BMI 18.5-24.9 as normal,
                                           BMI 25.0-29.9 as overweight,
                                           BMI >=30.0 as obesity
Smoking behavior         smoking         1=smoke, 2=ex-smoke, 3=non-smoke, 9=missing
Alcohol consumption      alcohol         1=yes, 2=sometime, 3=no, 9=missing
Sun exposure             sunexp          1=<0.5 hr/day, 2=0.5-2 hr/day, 3=>2 hr/day, 9=missing
Exercise                 exercise        1=regular, 2=sometime, 3=no, 9=missing
Diabetes mellitus                        Classify by glucose >=126 mg/dl
Hypertension                             Classify SBP >=140 mmHg or DBP >=90 mmHg as hypertension; otherwise normal
High cholesterol                         Classify cholesterol >=240 mg/dl as high cholesterol; otherwise normal


The data are given in the data set cross-sectional_BMD_&_risk_factor.dta

The aim of this assignment is to:

a) Categorize the continuous data according to the definitions in the third column.

b) Compare baseline characteristics, risk behaviors, and underlying disease between subjects who receive HRT and subjects who receive the calcium supplement.

c) Write up the details of the statistical analysis, stating which statistical test you apply to each variable.

d) Create a dummy table and present the results according to the dummy table.

e) Interpret the results and write them up according to the table.
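Step (a) amounts to applying the cut-points from the variable table. The sketch below shows one way to express them in Python; it is an illustration, not a required solution, and in STATA the same logic would be written with generate and recode statements. The function names are hypothetical, and the units (weight in kg, height in cm) are assumptions, since the notes do not state how weight and height are recorded in the data set.

```python
def bmi(weight_kg, height_cm):
    """BMI in kg/m^2; assumes weight in kg and height in cm."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2

def bmi_group(value):
    """BMI category; values between listed ranges (e.g. 24.95) go to the lower group."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obesity"

def age_group(age):
    """Three age groups: <65, 65-70, >70 years."""
    if age < 65:
        return "<65"
    if age <= 70:
        return "65-70"
    return ">70"

def menopause_group(age_at_menopause):
    """Two groups: <=35 and >35 years."""
    return "<=35" if age_at_menopause <= 35 else ">35"

def has_diabetes(glucose_mg_dl):
    return glucose_mg_dl >= 126

def has_hypertension(sbp_mmhg, dbp_mmhg):
    return sbp_mmhg >= 140 or dbp_mmhg >= 90

def has_high_cholesterol(cholesterol_mg_dl):
    return cholesterol_mg_dl >= 240

# One hypothetical subject, for illustration only.
print(bmi_group(bmi(60, 160)), age_group(68), has_hypertension(135, 92))
```

Values coded 9 in the behavioral variables denote missing data and should be set to missing before any comparison rather than treated as a fourth category.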

2) For the secondary objective, researchers would like to assess the association between treatment and the outcomes of interest (fracture and adverse events). The details of the related variables are presented in the following table.

Variable                 Variable name   Label value
Treatment                treat           1=calcium supplement, 0=HRT
Fracture                 fracture        1=yes, 0=no
Adverse events           adveffect       1=yes, 0=no

The aim of this assignment is to:

a) Assess the association between treatment and outcomes, and explain which statistical test you apply.

b) Estimate the strength of association between treatment and outcomes.

c) Interpret the results and write them up.
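For an independent-samples 2x2 table such as treatment by fracture, steps (a) and (b) can be sketched as follows. The sketch uses the standard large-sample formulas for the Pearson chi-square test, the RR, and the OR, which are assumed to agree with the formulas given earlier in these notes. The function name and the counts are hypothetical and are not taken from the assignment data set.

```python
import math

def two_by_two(a, b, c, d, z=1.96):
    """Chi-square test, RR and OR for a 2x2 table of independent samples.

    a, b = subjects with and without the outcome in group 1
    c, d = subjects with and without the outcome in group 2
    """
    n = a + b + c + d
    # Pearson chi-square with 1 df, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df

    rr = (a / (a + b)) / (c / (c + d))
    se_ln_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    rr_ci = (math.exp(math.log(rr) - z * se_ln_rr),
             math.exp(math.log(rr) + z * se_ln_rr))

    odds_ratio = (a * d) / (b * c)
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    or_ci = (math.exp(math.log(odds_ratio) - z * se_ln_or),
             math.exp(math.log(odds_ratio) + z * se_ln_or))

    return {"chi2": chi2, "p": p_value, "rr": rr, "rr_ci": rr_ci,
            "or": odds_ratio, "or_ci": or_ci}

# Hypothetical counts: 10 of 100 with the outcome in group 1, 20 of 100 in group 2.
result = two_by_two(a=10, b=90, c=20, d=80)
print(f"chi2 = {result['chi2']:.2f}, p = {result['p']:.3f}, "
      f"RR = {result['rr']:.2f}, OR = {result['or']:.2f}")
```

In an RCT the risk in each group can be estimated directly, so the RR is usually the more natural measure to report. If any expected cell count is small, an exact test is generally preferred to the chi-square test.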