Missing Values and Logical Operators: http://www.stata.com/support/faqs/data-management/logical-expressions-and-missing-values/

Basics of Biostatistics for Health Research
Session 4 – February 28, 2013

Dr. Scott Patten, Professor of Epidemiology
Department of Community Health Sciences & Department of Psychiatry

[email protected]

Generate Commands Using Logic

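generate obese2 = .
recode obese2 .=0 if bmi <= 30
recode obese2 .=1 if bmi > 30
tab obese obese2
prtest obese2, by(sex)

This codes respondents with a missing BMI as obese, which is strange. Stata treats a missing value as larger than any number, so bmi > 30 is true when bmi is missing.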
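A quick way to see the problem is to try the comparison on a few made-up values. The sketch below is illustrative only: the three BMI values are invented and are not from the course dataset.

```stata
* Toy data (hypothetical values): one normal BMI, one high BMI, one missing
clear
input bmi
25
35
.
end

* Stata orders missing above every number, so this displays 1
display (. > 30)

* Counts 2: the value 35 and the missing value
count if bmi > 30

* Counts 1: only the value 35
count if bmi > 30 & bmi != .
```

The difference between the two counts is exactly the set of observations that the first version of the code would wrongly label as obese.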

Generate Commands Using Logic

generate obese2 = .
recode obese2 .=0 if bmi <= 30
recode obese2 .=1 if bmi > 30 & bmi != .
tab obese obese2, missing
prtest obese2, by(sex)

This code works: the added condition bmi != . leaves obese2 missing when BMI is missing.
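The same coding can also be written in one step. This is an alternative idiom rather than the approach used on the slide; the variable name obese3 and the toy BMI values are assumptions made for illustration.

```stata
* Toy data (hypothetical values)
clear
input bmi
22
30
31
.
end

* The logical expression evaluates to 0 or 1; the if qualifier
* leaves obese3 missing wherever bmi is missing
generate obese3 = (bmi > 30) if !missing(bmi)
tab obese3, missing
```

Putting the missing-value check in the if qualifier keeps the rule for missing data in one visible place, which makes it harder to forget.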

Statistical Errors

Sample Size Simulation

[Interactive power simulation, shown as a figure on the slide. Settings: P (non-exposed) = 0.1, P (exposed) = 0.3, alternative hypothesis = 0.2 (difference between two proportions), N (exposed) = 30, N (non-exposed) = 30 (set equal to exposed), alpha = 0.05. Resulting power = 0.5095. The plot shows the sampling distributions under the null and alternative hypotheses with a reject indicator, over differences from -0.5 to 0.5. Controls: Increase Sample Size, Increase Effect Size, Increase Alpha, Reset.]

Sample Size Calculation in STATA

Sample Size Dialogue Boxes

Let’s do a calculation!

• You are planning a parallel group RCT – with treatment and control groups.

• Normally, 20% of people die with disease X, but you expect to cut this in half with a new treatment.

• How many do you need in each group to achieve 95% power at alpha = 5%?

Output (sampsi)

. sampsi .2 .1, alpha(0.05) power(.95)

Estimated sample size for two-sample comparison of proportions

Test Ho: p1 = p2, where p1 is the proportion in population 1
                    and p2 is the proportion in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.9500
            p1 =   0.2000
            p2 =   0.1000
         n2/n1 =   1.00

Estimated required sample sizes:

            n1 =      349
            n2 =      349
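The figure of 349 per group can be checked by hand. The sketch below uses the usual normal-approximation formula for two proportions followed by a continuity correction, which sampsi applies unless the nocontinuity option is given. The scalar names are illustrative choices, not anything sampsi defines.

```stata
* Inputs from the example: 20% mortality vs. 10%, alpha 5%, power 95%
scalar p1   = 0.2
scalar p2   = 0.1
scalar pbar = (p1 + p2)/2
scalar za   = invnormal(1 - 0.05/2)
scalar zb   = invnormal(0.95)

* Uncorrected sample size per group
scalar n0 = (za*sqrt(2*pbar*(1 - pbar)) + zb*sqrt(p1*(1 - p1) + p2*(1 - p2)))^2 / (p1 - p2)^2

* Continuity-corrected sample size per group
scalar n = (n0/4)*(1 + sqrt(1 + 4/(n0*abs(p1 - p2))))^2

* Round up to whole participants: 329 uncorrected, 349 corrected
display "uncorrected n per group: " ceil(scalar(n0))
display "corrected n per group:   " ceil(scalar(n))
```

The corrected value matches the sampsi output; the uncorrected value shows how much the continuity correction adds.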

Another Calculation

• A QoL scale in a particular disease has a mean score of 20 and a standard deviation of 5.

• You are conducting a placebo controlled trial to evaluate a treatment that is expected to improve the QoL by 2 points on this scale.

• You recruit n=50 into each group – what power will you achieve?

Output (sampsi)

Estimated power for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
            m1 =       20
            m2 =       22
           sd1 =        5
           sd2 =        5
sample size n1 =       50
            n2 =       50
         n2/n1 =     1.00

Estimated power:

         power =   0.5160
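The slide shows the output but not the command. A command of the following form should produce it (an assumption, since the original call is not shown), and the power can also be checked by hand with the normal approximation. The scalar names are illustrative.

```stata
* Probable command: giving n1() and n2() makes sampsi report power
sampsi 20 22, sd1(5) sd2(5) n1(50) n2(50)

* Hand check: standard error of the difference in means
scalar se = sqrt(5^2/50 + 5^2/50)

* Power = P(Z < |difference|/se - critical value), two-sided alpha of 5%
scalar pw = normal(abs(22 - 20)/scalar(se) - invnormal(1 - 0.05/2))
display %6.4f scalar(pw)
```

With a standard error of 1, the expected difference of 2 points sits almost exactly on the critical value of 1.96, which is why the power comes out close to 50%.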

• Go to www.ucalgary.ca/~patten
• Scroll to the bottom.
• Right click to download the files described as being “for PGME Students”
  – One is a dataset
  – One is a data dictionary
• Save them on your desktop

Review: Comparing Proportions

• We’ve looked at several procedures for comparing proportions (e.g. for obesity in men vs. women):

generate obese = .
recode obese .=0 if bmi <= 30
recode obese .=1 if bmi > 30 & bmi != .
tab obese, missing
prtest obese, by(sex)

Epitab Commands

Review: Comparing Proportions

• We’ve looked at several procedures for comparing proportions (e.g. for obesity in men vs. women):

recode sex (2=1) (1=0)
cs obese sex

The output…

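. cs obese sex

                 | sex                    |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |       961         599  |       1560
        Noncases |      5610        4405  |      10015
-----------------+------------------------+------------
           Total |      6571        5004  |      11575
                 |                        |
            Risk |  .1462487    .1197042  |   .1347732
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0265444       |    .0141393    .0389496
      Risk ratio |          1.22175       |    1.110833    1.343743
 Attr. frac. ex. |          .181502       |    .0997744      .25581
 Attr. frac. pop |         .1118099       |
                 +-------------------------------------------------
                               chi2(1) =    17.16  Pr>chi2 = 0.0000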
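The point estimates in this table follow directly from the four cell counts. As a sketch, the immediate command csi takes the counts in the order exposed cases, unexposed cases, exposed noncases, unexposed noncases, and the same numbers can be worked out with display. The scalar names are illustrative.

```stata
* Immediate form of cs, using the cell counts from the table
csi 961 599 5610 4405

* Risk in each group = cases / group total
scalar r1 = 961/6571
scalar r0 = 599/5004

* Risk difference and risk ratio by hand
scalar rd = scalar(r1) - scalar(r0)
scalar rr = scalar(r1)/scalar(r0)
display "risk difference = " %9.7f scalar(rd)
display "risk ratio      = " %9.6f scalar(rr)
```

Here sex was recoded so that women are the "exposed" group, so a risk ratio above 1 means a higher proportion classified as obese among women than among men.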

A “non-significant” association

generate highgluc = .
recode highgluc .=0 if glucose <= 140
recode highgluc .=1 if glucose > 140 & glucose != .
generate female = sex
recode female (1=0) (2=1)
tab highgluc female, exact

How does this look with cs?

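. cs highgluc female

                 | female                 |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |       108         110  |        218
        Noncases |      5574        4395  |       9969
-----------------+------------------------+------------
           Total |      5682        4505  |      10187
                 |                        |
            Risk |  .0190074    .0244173  |   .0213998
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.0054099       |   -.0111474    .0003276
      Risk ratio |         .7784391       |    .5986537    1.012217
 Prev. frac. ex. |         .2215609       |   -.0122169    .4013463
 Prev. frac. pop |           .12358       |
                 +-------------------------------------------------
                               chi2(1) =     3.51  Pr>chi2 = 0.0609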

Review: Try the cci command to obtain the OR

Check your work with the cc command.
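One possible way through this exercise, offered as a sketch: cci takes the four cell counts in the order exposed cases, unexposed cases, exposed noncases, unexposed noncases, here taken from the cs highgluc female table. The scalar name or_hand is an illustrative choice.

```stata
* Odds ratio from the four cell counts of the highgluc-by-female table
cci 108 110 5574 4395

* The same odds ratio by hand: (a*d)/(b*c)
scalar or_hand = (108*4395)/(110*5574)
display %8.6f scalar(or_hand)

* With the course dataset in memory, the check would be:
* cc highgluc female
```

Because high glucose is uncommon in both groups, the odds ratio (about 0.774) is close to the risk ratio of 0.778 reported by cs.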

Comparing Proportions?

Yes: Fisher’s Exact Test
No: Parametric Assumptions?
    Yes: Multiple Groups?
        Yes: ANOVA
        No: t-test
    No: Multiple Groups?
        Yes: Kruskal-Wallis
        No: Wilcoxon’s Rank-Sum
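Each endpoint of this decision tree corresponds to a Stata command. The sketch below runs them on the auto dataset that ships with Stata, not on the course dataset; the variable heavy and its cut-off of 3000 are invented purely to have a two-level variable to tabulate.

```stata
sysuse auto, clear

* Comparing proportions: Fisher's exact test on a 2x2 table
generate heavy = (weight > 3000) if !missing(weight)
tabulate heavy foreign, exact

* Parametric, two groups: t-test
ttest mpg, by(foreign)

* Parametric, more than two groups: one-way ANOVA
oneway mpg rep78

* Non-parametric, two groups: Wilcoxon rank-sum
ranksum mpg, by(foreign)

* Non-parametric, more than two groups: Kruskal-Wallis
kwallis mpg, by(rep78)
```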

Two situations we haven’t covered…

• Severely skewed distributions
• Two continuous variables

Severely Skewed Variables

Solution: Make Some Categories

• For example (cigarettes per day):
  – Non-smokers
  – Light smokers (< 20)
  – Moderate (20-40)
  – Heavy (> 40)

• Your task: Make a variable with these categories and do a statistical test to compare men to women.

E.g. for the recoding…

generate smoke = .
recode smoke .=1 if cigpday == 0
recode smoke .=2 if cigpday > 0 & cigpday < 20
recode smoke .=3 if cigpday >= 20 & cigpday <= 40
recode smoke .=4 if cigpday > 40 & cigpday != .
tab smoke, missing
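The same categories can be built in a single recode call. This is an alternative sketch, not the method on the slide: it assumes cigpday holds whole numbers, and the name smoke2, the value labels, and the toy values are all invented for illustration.

```stata
* Toy data (hypothetical values)
clear
input cigpday
0
5
20
40
60
.
end

* Ranges are inclusive; max excludes missing, so missing stays missing
recode cigpday (0 = 1 "Non-smoker") (1/19 = 2 "Light") ///
    (20/40 = 3 "Moderate") (41/max = 4 "Heavy"), generate(smoke2)
tab smoke2, missing
```

Attaching labels at the point of recoding makes later tables easier to read than the bare codes 1 to 4. The /// continuation works in a do-file; in the Command window the recode call would go on one line.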

Some output…

. tab smoke sex, exact

Enumerating sample-space combinations:
stage 4:  enumerations = 1
stage 3:  enumerations = 146
stage 2:  enumerations = 142603
stage 1:  enumerations = 0

           |          sex
     smoke |         1          2 |     Total
-----------+----------------------+----------
         1 |     2,428      4,170 |     6,598
         2 |       686      1,292 |     1,978
         3 |     1,754      1,073 |     2,827
         4 |       122         23 |       145
-----------+----------------------+----------
     Total |     4,990      6,558 |    11,548

           Fisher's exact =                 0.000

Two continuous variables

• E.g. diastolic blood pressure and BMI
• The place to start is always a scatter plot
• STATA calls this a “twoway” graph

Start with Create

Select the two variables

Submit

The command produced…

• Produced by our dialogue box…
twoway (scatter diabp sysbp)

• The same dialogue box can fit a line… (this time select “line”)
twoway (lfit diabp sysbp)

You can combine the two…

• Try it!
twoway (scatter diabp sysbp) (lfit diabp sysbp)

• To assess significance, use the regress command (can you find the menu option?)
regress diabp sysbp

Note: the linear output

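• Line: y = mx + b
• diabp = 33.42 + 0.364(sysbp)

. regress diabp sysbp

      Source |       SS       df       MS              Number of obs =   11627
-------------+------------------------------           F(  1, 11625) =11928.05
       Model |  800498.474     1  800498.474           Prob > F      =  0.0000
    Residual |  780160.451 11625  67.1105764           R-squared     =  0.5064
-------------+------------------------------           Adj R-squared =  0.5064
       Total |  1580658.92 11626  135.958965           Root MSE      =  8.1921

------------------------------------------------------------------------------
       diabp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sysbp |   .3639623   .0033325   109.22   0.000     .3574301    .3704946
       _cons |   33.42091   .4606105    72.56   0.000     32.51804    34.32379
------------------------------------------------------------------------------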
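Two numbers in this output are easy to reproduce by hand, which helps in reading the table. The sketch below uses only figures printed in the output; the choice of sysbp = 140 as an example value and the scalar names are illustrative.

```stata
* Predicted diastolic pressure at sysbp = 140: intercept + slope * 140
scalar yhat = 33.42091 + 0.3639623*140
display %6.2f scalar(yhat)

* R-squared = model sum of squares / total sum of squares
scalar r2 = 800498.474/1580658.92
display %6.4f scalar(r2)
```

The first line gives a predicted diastolic pressure of about 84.4, and the second reproduces the reported R-squared of 0.5064.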

(In Class) Assignment for Today

• Assess whether there is an association between systolic blood pressure and death (you need to decide how)
• We’ll define elevated systolic blood pressure as being > 140 mm of Hg.
  – What is the risk ratio for death for people with elevated systolic blood pressure?
  – Is the risk ratio statistically significant?