Basics of Biostatistics for Health Research Session 2 – February 14th, 2013
Dr. Scott Patten, Professor of Epidemiology
Department of Community Health Sciences & Department of Psychiatry
• Go to www.ucalgary.ca/~patten
• Scroll to the bottom.
• Right-click to download the files described as being “for PGME Students”
– One is a dataset
– One is a data dictionary
• Save them on your desktop
Open the Datafile
The task from last week…
• Create a 95% exact binomial confidence interval for the proportion of people in the Framingham data with more than a high school (> H.S.) education
Review of Last Week’s Task
• “use”
• “generate”
• “recode”
• “tabulate”
• “ci”
The actual commands…
generate highschool = educ
recode highschool 1/2=0 3/4=1
tabulate highschool
ci highschool, binomial
Creating a “do” file…
The “do file” editor
Executing a “do” file
What is a “do” file?
• It is a text file – you can copy and paste from the output window in Stata, or from a word processor
• It is a computer program made up of ordinary Stata commands, which Stata runs directly, so it doesn’t need a compiler
• Others would call it a “macro”
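As a minimal sketch, last week's commands could be saved as a do file like the one below. The file names `session2_example.do` and `framingham.dta` are assumptions for illustration; substitute the names of the files you actually saved.

```stata
* session2_example.do  (hypothetical file name)
* Load the course dataset; "framingham.dta" is an assumed file name.
use "framingham.dta", clear

* Copy educ, then collapse it to 0 = H.S. or less, 1 = more than H.S.
generate highschool = educ
recode highschool 1/2=0 3/4=1

* Frequency table, then an exact binomial 95% confidence interval
tabulate highschool
ci highschool, binomial
```

In Stata 14 and later the documented form of the last command is `ci proportions highschool`; the syntax on these slides is the earlier one.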
Different Types of Data
• One type of distinction
– Nominal (e.g. sex, race)
– Ordinal (e.g. rating scales)
– Cardinal (e.g. physical measures)
• Another type of distinction
– Categorical (e.g. # of pregnancies)
– Continuous (e.g. height, weight)
Body Mass Index (BMI)
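BMI is weight in kilograms divided by the square of height in metres. A quick check in Stata, using made-up values of 70 kg and 1.75 m:

```stata
* BMI = weight (kg) / height (m)^2; the values are illustrative only
display 70 / (1.75^2)
```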
The BMI in our Data Set
This is an example of a continuous variable
Changing Data Types in Stata (e.g. continuous to categorical)
• recode bmi x/y=z
• This will recode all values of the variable bmi having values from x to y to a single value equal to z.
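A self-contained illustration of the x/y=z pattern on a toy variable. The four values are made up; the rule is the one used last week for educ.

```stata
clear
input educ
1
2
3
4
end

* Keep the original variable and recode a copy
generate highschool = educ
* Values 1 through 2 become 0; values 3 through 4 become 1
recode highschool 1/2=0 3/4=1
list
```

Working on a copy means the original variable is still available if the recode rule turns out to be wrong.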
Interpretation of BMI
• Underweight: < 18.5
• Normal weight: 18.5 to 25
• Overweight: >25 to 30
• Obese: 30+
• Your task: Make a “do file” that calculates a 95% confidence interval for the proportion of the population that are overweight or obese.
Example of Code for this…
generate owo = bmi
recode owo 0/25 = 0 25.01/100 = 1
tab owo, missing
ci owo, binomial
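One caveat with these ranges, offered as a sketch rather than a correction of the course code: a BMI strictly between 25 and 25.01 matches neither rule and keeps its original value. If BMI is stored with more than two decimals, a logical expression avoids the gap. The toy values and the variable name `owo2` are made up for illustration.

```stata
clear
input bmi
24.9
25
25.005
31.4
.
end

* Ranges as on the slide: 25.005 matches neither rule and stays 25.005
generate owo = bmi
recode owo 0/25 = 0 25.01/100 = 1

* Alternative: 1 if BMI > 25, 0 otherwise, missing if BMI is missing
generate byte owo2 = (bmi > 25) if !missing(bmi)
list
```

The `if !missing(bmi)` qualifier matters because Stata treats missing values as larger than any number, so `bmi > 25` alone would be true for them.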
Another Task…
• Add a use command to your do file
• Save your “do file” on the desktop using a descriptive file name of your choice
• Exit Stata
• Open Stata again
• Open the “do file” editor and select your do file
• Execute your “do file”
The Power of “do files”
• Task: Calculate an exact 95% CI for the proportion of the population that are obese (BMI > 30)
• IMPORTANT: do NOT start from scratch as we did before – try to do this by editing your do file.
For Example…
Original do file:
generate owo = bmi
recode owo 0/25 = 0 25.01/100 = 1
tab owo, missing
ci owo, binomial
Edited do file:
generate owo = bmi
generate obese = bmi
recode owo 0/25 = 0 25.01/100 = 1
recode obese 0/30 = 0 30.01/100=1
tab owo, missing
tab obese, missing
ci owo obese, binomial
Starting a Log File
Closing a Log File
Another Task…
• Start a log file
• Run your “do file”
• Close and save the resulting log file on your desktop
• Open your log file
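The same steps can also be typed as commands instead of using the menus. This is a sketch only: `session2.log` and `bmi_ci.do` are hypothetical file names standing in for your own.

```stata
* "session2.log" and "bmi_ci.do" are hypothetical file names
* Start a plain-text log, overwriting any earlier file of the same name
log using "session2.log", text replace

* Run the do file; commands and output are copied into the log
do "bmi_ci.do"

* Close the log, then show its contents in the Results window
log close
type "session2.log"
```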
“do file” Etiquette
• When you put an * at the start of a line in a “do file”, Stata will ignore that line
• Use this to…
– Add descriptive comments to your code
– Remove commands that you don’t want now, but might want later
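For instance, the overweight/obese do file might look like this with a comment added and the table switched off. It assumes the course dataset, with its bmi variable, is already in memory.

```stata
* Proportion overweight or obese (this line is a comment)
generate owo = bmi
recode owo 0/25 = 0 25.01/100 = 1
* The next command is disabled for now, but easy to restore:
* tab owo, missing
ci owo, binomial
```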
E.g. Without the Tables…
Review…
• Make a value label for obesity
• Attach this value label to the variable representing obesity
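A sketch of the two steps, assuming the variable obese (coded 0/1) from the earlier do file is in memory. The label name `obeselbl` is hypothetical; any valid name works.

```stata
* Step 1: define a value label that maps codes to text
label define obeselbl 0 "Not Obese" 1 "Obese"

* Step 2: attach the label to the variable
label values obese obeselbl

* Tables now show the text instead of 0 and 1
tab obese, missing
```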
Making a Graphic
The Pie Chart Dialogue Box
Find the Variable that you made
Unedited Output
The Graph Editor
Here is a good place to start
See if you can do these things…
• Change the color of the pie
• Add a title
• Add a comment
• Change the background
• Create a work of art
Save in a Standard Format
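The pie chart, its title and the export also have command equivalents, sketched here as one possibility. The code assumes the obese variable (0/1) is in memory; the title text and the file name `obese_pie.png` are made up.

```stata
* Pie chart with one slice per category of obese
graph pie, over(obese) title("Obesity in the sample")

* Save the current graph in a standard format (type is taken from the suffix)
graph export "obese_pie.png", replace
```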
Back to BMI
• May not wish to categorize variables like this
• Measures of central tendency
– Mode
– Median
– Mean
• Different types of graphs are useful for examining continuous variables
– Box plots
– Histograms
Box Plots
Terminology
• Median: the value with 50% of observations above and 50% below.
• Interquartile range (IQR): the range from the 25th to the 75th percentile, which contains the middle 50% of observations (the box)
• Adjacent values (whiskers): the most extreme observations that lie within 1.5 x the IQR of the box
• Outliers: anything outside of the adjacent values
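These quantities can be computed directly from the results that `summarize` stores. A sketch, assuming the course dataset with bmi is in memory; the scalar name `iqr_bmi` is made up.

```stata
* Percentiles are stored in r() after summarize, detail
quietly summarize bmi, detail
scalar iqr_bmi = r(p75) - r(p25)

display "Median:      " r(p50)
display "IQR:         " iqr_bmi
* Whiskers end at the most extreme observations inside these limits
display "Upper limit: " r(p75) + 1.5*iqr_bmi
display "Lower limit: " r(p25) - 1.5*iqr_bmi

* Draw the corresponding box plot
graph box bmi
```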
Calculating Summary Stats
Calculate summary stats for BMI
Calculating Summary Stats
Calculate the mean BMI
Calculating Summary Stats
Calculate median BMI
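Command equivalents for the three menu steps, as a sketch that assumes the course dataset with bmi is in memory:

```stata
* Basic summary statistics: n, mean, SD, minimum, maximum
summarize bmi

* Mean with its standard error and 95% confidence interval
mean bmi

* Detailed summary; the 50th percentile is the median
summarize bmi, detail
```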
Make a Box (and whisker) Plot
The Boxplot Dialogue Box
Select BMI from the dropdown list
Introducing Histograms
The Histogram Dialogue Box
Select the variable here
Select the bin # here
A Task for You to Do…
• Make 3 histograms of BMI
– In one use the default number of bins
– In one use a larger number
– In one, use a smaller number
• Save your favorite histogram
• Open it in the graph editor, give it a title and improve its appearance
• Save it in a standard form (e.g. png, jpg, tif)
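A command-line sketch of this task, assuming bmi is in memory. The bin counts, the title and the file name `bmi_histogram.png` are illustrative choices, not values from the course.

```stata
* Default number of bins
histogram bmi

* More bins, then fewer bins
histogram bmi, bin(50)
histogram bmi, bin(10) title("Distribution of BMI")

* Export the graph currently displayed
graph export "bmi_histogram.png", replace
```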
Assessing Normality with a Histogram
The distribution is not quite normal, but close
Is BMI Higher in Men or Women?
• We could use confidence intervals to assess this…
• E.g. the mean BMI and its 95% confidence interval in each sex
Here is the dialogue box…
Once you’ve selected BMI, click this
The dialogue box, continued..
Enter sex as a group variable
The output
. mean bmi, over(sex)

Mean estimation                   Number of obs = 11575

            1: sex = 1
            2: sex = 2

        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
bmi          |
           1 |   26.20382   .0484566      26.10883     26.2988
           2 |   25.62873   .0559382      25.51909    25.73838
It looks better with value labels
. mean bmi, over(sex)

Mean estimation                   Number of obs = 11575

          Men: sex = Men
        Women: sex = Women

        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
bmi          |
         Men |   26.20382   .0484566      26.10883     26.2988
       Women |   25.62873   .0559382      25.51909    25.73838
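The labelled output could be produced along these lines. The mapping 1 = Men, 2 = Women is read off the two versions of the output in this session; the label name `sexlbl` is hypothetical.

```stata
* Define and attach a value label for sex
label define sexlbl 1 "Men" 2 "Women"
label values sex sexlbl

* Rerun the estimation; groups are now shown by label
mean bmi, over(sex)
```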
Statistical Tests
• Start with a hypothesis that an “effect” exists
– In this case, that there is an effect of sex on BMI
• Assume that the effect DOES NOT exist
– This is the null hypothesis
• Find the probability of the observed results, or results more extreme, given the null hypothesis
– This is what the “test” calculates for you
• If that probability is below the chosen alpha level (e.g. 0.05), reject the null hypothesis
The t-test (assumptions)
• The variables are approximately normally distributed
• The standard deviations of the two groups are approximately equal
• The two samples are independent
Using summarize similarly
• Use summarize with “by” in the dialogue box
• Use histograms with a normal density plot and the “by” tab in the dialogue box
Your task: use these two techniques to assess the t-test assumptions.
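Command equivalents for the two dialogue-box techniques, as a sketch assuming the course dataset is in memory:

```stata
* Summary statistics for each sex: compare the standard deviations
bysort sex: summarize bmi

* One histogram per sex, each with a normal density curve overlaid
histogram bmi, normal by(sex)
```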
Variance Comparisons
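With the data in memory, the command form of this test would be `sdtest bmi, by(sex)`. The immediate form below needs no dataset; it is a sketch that uses the group sizes and standard deviations reported for men and women in this session's t-test output.

```stata
* Arguments: n1, mean1 (. = not needed), sd1, n2, mean2, sd2
sdtesti 5004 . 3.427767 6571 . 4.534443
```

A small p-value here suggests the standard deviations differ, which is the situation the `unequal` option of `ttest` is meant for.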
The t-test
The t-test dialogue box
(Step 3 is optional)
The output
. ttest bmi, by(sex) unequal

Two-sample t test with unequal variances

   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     Men |    5004    26.20382    .0484566    3.427767    26.10882    26.29881
   Women |    6571    25.62873    .0559382    4.534443    25.51908    25.73839
---------+--------------------------------------------------------------------
combined |   11575    25.87735    .0381332     4.10264     25.8026     25.9521
---------+--------------------------------------------------------------------
    diff |            .5750831    .0740075                .4300158    .7201504

    diff = mean(Men) - mean(Women)                                t =   7.7706
Ho: diff = 0                    Satterthwaite's degrees of freedom =  11572.4

    Ha: diff < 0               Ha: diff != 0               Ha: diff > 0
 Pr(T < t) = 1.0000       Pr(|T| > |t|) = 0.0000        Pr(T > t) = 0.0000
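The same test can be reproduced without the dataset by giving the immediate command `ttesti` the group sizes, means and standard deviations from the output. This is offered as a cross-check sketch, not as part of the original exercise.

```stata
* Arguments: n1, mean1, sd1, n2, mean2, sd2 (Men first, then Women)
ttesti 5004 26.20382 3.427767 6571 25.62873 4.534443, unequal
```

Reading the result: men's mean BMI exceeds women's by about 0.58, the 95% confidence interval for the difference (0.43 to 0.72) excludes zero, and the two-sided p-value prints as 0.0000.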
Two group tests for proportions..
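With the data in memory, the command form would be `prtest obese, by(sex)`. The immediate version below is a sketch that takes its counts from the obesity-by-sex table in this session (599 of 5,004 men and 961 of 6,571 women classed as obese).

```stata
* Arguments with the count option: n1, events1, n2, events2
prtesti 5004 599 6571 961, count
```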
You can also do this with tab
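tab obese sex, exact

           |          sex
     obese |       Men      Women |     Total
-----------+----------------------+----------
 Not Obese |     4,405      5,610 |    10,015
     Obese |       599        961 |     1,560
-----------+----------------------+----------
     Total |     5,004      6,571 |    11,575

           Fisher's exact =                 0.000
   1-sided Fisher's exact =                 0.000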
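The table can also be entered by hand with the immediate command `tabi`, which is handy when only the cell counts are available. A sketch using the four cell counts from the table:

```stata
* Rows are separated by a backslash: Not Obese row, then Obese row
tabi 4405 5610 \ 599 961, exact
```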
Your Final Task for Today
• Create a “do file” that …
– Reads in the data
– Recodes BMI to a categorical variable for obesity
– Tests whether obesity differs between men and women
• Create a log file to store the results
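One possible shape for such a do file, given as a sketch rather than the expected answer. The file names `final_task.log` and `framingham.dta` and the label name `obeselbl` are assumptions; obesity is defined as BMI > 30, as in the earlier task.

```stata
* final_task.do  (hypothetical file name)
* Store everything that follows in a plain-text log
log using "final_task.log", text replace

* Read in the data; "framingham.dta" is an assumed file name
use "framingham.dta", clear

* 1 = obese (BMI > 30), 0 = not obese, missing BMI stays missing
generate byte obese = (bmi > 30) if !missing(bmi)
label define obeselbl 0 "Not Obese" 1 "Obese"
label values obese obeselbl

* Fisher's exact test of obesity by sex
tab obese sex, exact

log close
```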