Biostatistics in Practice
Youngju Pak, Biostatistician
Peter D. Christenson
http://research.LABioMed.org/Biostat
Session 1: Quantitative and Inferential Issues I
Why Statistics?
• "For Today’s Graduate, Just One Word: Statistics," NY Times, Aug 5, 2009: "I keep saying that the sexy job in the next 10 years will be statisticians," said Hal Varian, chief economist at Google.
• "I am not much given to regret, so I puzzled over this one a while. Should have taken much more statistics in college, I think. :)" —Max Levchin, Paypal Co-founder, Slide founder
Who am I?
• Dr. Youngju Pak
• Originally from South Korea
• PhD-Biostatistics, MS-Stat., BA-Stat.
• Assistant Professor of Biostatistics at MU until 2012
• Joined LA BioMed in March 2013
• Practicing Biostatistics since 2000
Who are you?
• Name
• Career Aspirations
Class webpage & Session Schedule
• Class Webpage: Select "Courses" at http://research.LABioMed.org/biostat (use Internet Explorer; Chrome does not work well with this website)
• All class materials are posted and will be updated on the class webpage
• There will be some pop quizzes
• There will be some HW assignments
• The TOP THREE will be announced and rewarded at the last session.
Session 1 Objectives
• General quantitative needs in biological research
• Overview of statistical issues using a published paper
• How to run the statistical software MYSTAT
General Quantitative Needs
Descriptive: Appropriate summarization to meet scientific questions: e.g.,
• changes, or % changes, or reaching threshold?
• mean, or minimum, or range of response?
• average time to death, or chances of dying by a fixed time?
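The choice among these summaries can be made concrete with a small sketch. The measurements and the threshold below are invented for illustration and do not come from the slides; the point is only that the same data give three different descriptive answers.

```python
# Hypothetical measurements for three subjects (illustration only).
baseline = [120.0, 150.0, 100.0]
follow_up = [108.0, 150.0, 80.0]
threshold = 100.0  # assumed target value, not taken from the slides

# Absolute change: follow-up minus baseline, in the original units.
changes = [f - b for b, f in zip(baseline, follow_up)]

# Percent change: change relative to each subject's own baseline.
pct_changes = [100.0 * (f - b) / b for b, f in zip(baseline, follow_up)]

# Threshold: did the subject end at or below the target?
reached = [f <= threshold for f in follow_up]

for i, (c, p, r) in enumerate(zip(changes, pct_changes, reached), start=1):
    print(f"subject {i}: change = {c:+.1f}, % change = {p:+.1f}%, reached = {r}")
```

Subject 1 has the larger baseline, so a 12-unit drop is only a 10% change, while subject 3 is the only one who reaches the threshold. Which summary is appropriate depends on the scientific question.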
General Quantitative Needs, Cont’d
• Inferential: Could results be spurious, a fluke, due to “natural” variations or chance?
• Inferential statistics: 95% confidence intervals, p-values, etc.
• Sensitivity/Power: How many subjects are needed?
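As a rough illustration of the two inferential ideas above, the sketch below computes an approximate 95% confidence interval for a mean and a per-group study size for comparing two means. Both use the normal approximation (a t-based calculation would give slightly wider intervals and slightly larger sizes for small samples). The function names and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci_normal(values, level=0.95):
    """Approximate confidence interval for a mean (normal approximation)."""
    n = len(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * stdev(values) / math.sqrt(n)
    m = mean(values)
    return m - half_width, m + half_width


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Subjects per group to detect a mean difference delta between two
    groups with common SD sigma, two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical scores, invented for illustration only.
scores = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6, 4.9, 5.2, 4.4]
low, high = mean_ci_normal(scores)
print(f"mean = {mean(scores):.2f}, approx. 95% CI = ({low:.2f}, {high:.2f})")

# Detecting a difference of half a standard deviation with 80% power.
print("subjects per group:", n_per_group(delta=0.5, sigma=1.0))
```

Halving the detectable difference quadruples the required study size, which is why the smallest difference of scientific interest should be stated before the study starts.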
Professional Statistics Software Package
Output
Enter code; syntax.
Stored data; accessible.
Microsoft Excel for Statistics
• Primarily for descriptive statistics.
• Limited output.
• No analyses for %s.
Free Statistics Software: Mystat
www.systat.com
Free Study Size Software
www.stat.uiowa.edu/~rlenth/Power
Session 1 Objectives
• General quantitative needs in biological research
• Overview of statistical issues using a published paper
• How to run the statistical software MYSTAT
Statistical Issues
• Subject selection
• Randomization
• Efficiency from study design
• Summarizing study results
Paper with Common Statistical Issues
Case Study:
McCann, et al., Lancet 2007 Nov 3;370(9598):1560-7
• Food additives and hyperactive behaviour in 3-year-old and 8/9-year-old children in the community: a randomised, double-blinded, placebo-controlled trial.
• Objective: test whether intake of artificial food colours and additives (AFCA) affects childhood behavior
• Target population: children aged 3-4 and 8-9 years
• Study design: randomized, double-blinded, controlled, crossover trial
• Sample size: 153 (3 years), 144 (8-9 years) in Southampton, UK
• Sampling: Stratified sampling based on SES
• Baseline measure: 24h recall by the parent of the child’s pretrial diet
• Group: three groups (mix A, mix B, placebo)
• Outcomes: ADHD rating scale IV by teachers, WWP hyperactivity score by parents, classroom observation code, Conners continuous performance test II (CPTII) GHA score
Statistical Issues
• Subject selection
• Randomization
• Efficiency from study design
• Summarizing study results
Representative or Random Samples
How were the children to be studied selected (second column on the first page)? The authors purposely selected "representative" social classes.
Is this better than a "randomly" chosen sample that ignores social class?
Often hear: Non-random = Non-scientific.
Case Study: Participant Selection
No mention of random samples.
Case Study: Participant Selection
It may be that only a few schools are needed to get sufficient individuals. If, among all possible schools, there are few that are lower SES, none of these schools may be chosen.
So, a random sample of schools is chosen from the lower SES schools, and another random sample from the higher SES schools.
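The stratified selection described above can be sketched as follows. The school lists, stratum sizes, and the function name are hypothetical; the point is that a separate random sample is drawn within each SES stratum, so the lower SES schools cannot be missed by chance.

```python
import random


def stratified_sample(units_by_stratum, n_per_stratum, seed=0):
    """Draw a simple random sample, without replacement, inside each stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return {
        stratum: rng.sample(units, n_per_stratum[stratum])
        for stratum, units in units_by_stratum.items()
    }


# Hypothetical sampling frame: few lower SES schools, many higher SES schools.
schools = {
    "lower SES": [f"L{i}" for i in range(1, 6)],
    "higher SES": [f"H{i}" for i in range(1, 21)],
}

chosen = stratified_sample(schools, {"lower SES": 2, "higher SES": 2})
for stratum, picked in chosen.items():
    print(stratum, "->", picked)
```

A single random sample of four schools from all 25 would contain no lower SES school about 38% of the time; stratifying guarantees two.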
Selection by Over-Sampling
It is not necessary that the % lower SES in the study is the same as in the population. There may still be too few subjects in a rare subgroup to get reliable data.
Can “over-sample” a rare subgroup, and then weight overall results by proportions of subgroups in the population. The CDC NHANES(http://www.cdc.gov/nchs/nhanes.htm) studies do this.
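A minimal sketch of the weighting step, with invented numbers: a rare subgroup that is 10% of the population is over-sampled to make up half of the study, and the overall estimate is then weighted back to the population proportions. This is a simplified illustration and is not the NHANES weighting procedure.

```python
def weighted_mean(stratum_means, population_shares):
    """Combine subgroup means using population proportions as weights."""
    total = sum(population_shares.values())
    return sum(stratum_means[s] * population_shares[s] for s in stratum_means) / total


# Hypothetical study: equal numbers sampled from each subgroup.
sample_means = {"rare": 10.0, "common": 20.0}
sample_sizes = {"rare": 50, "common": 50}
population_shares = {"rare": 0.10, "common": 0.90}

# Naive overall mean treats the study sample as if it mirrored the population.
unweighted = sum(sample_means[s] * sample_sizes[s] for s in sample_means) / sum(
    sample_sizes.values()
)

# Weighted mean restores each subgroup to its share of the population.
weighted = weighted_mean(sample_means, population_shares)

print(f"unweighted = {unweighted:.1f}, weighted = {weighted:.1f}")
```

The unweighted mean is pulled toward the over-sampled rare subgroup; the weighted mean is the estimate for the population as a whole.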
Statistical Issues
• Subject selection
• Randomization
• Efficiency from study design
• Summarizing study results
Basic Study Designs
1. Prospective (longitudinal): Risk Factor (2014) → Disease status (2020)
2. Retrospective (Case-Control): Disease status (2014) → Risk Factor (2000)
3. Cross sectional: Disease status (2014) and Risk Factor (2014)
4. Experimental or Randomized-Control: Risk Factor (2014) → Disease status (2020), with assignment of Risk Factor
Random Samples vs. Randomization
We have been discussing the selection of subjects to study, often a random sample.
An observational study would, well, just observe them. An interventional study assigns each subject to one or more treatments in order to compare treatments. Randomization refers to making these assignments in a random way.
Why Randomize?
So that groups will be similar except for the intervention.
So that, when enrolling, we will not unconsciously choose an “appropriate” treatment for a particular subject.
Minimizes the chances of introducing bias when attempting to systematically remove it, as in the plant yield example.
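One simple way to make the assignments in a random way is sketched below: build a balanced list of treatment labels and shuffle it, so that no one chooses a treatment for a particular subject. The subject IDs, seed, and function name are hypothetical, and real trials often use more elaborate schemes (for example, randomization within blocks).

```python
import random


def randomize(subject_ids, treatments, seed=None):
    """Assign treatments in equal numbers, in random order."""
    rng = random.Random(seed)
    # Cycle through the treatment labels to cover every subject, then shuffle.
    labels = [treatments[i % len(treatments)] for i in range(len(subject_ids))]
    rng.shuffle(labels)
    return dict(zip(subject_ids, labels))


subjects = [f"S{i:02d}" for i in range(1, 13)]
allocation = randomize(subjects, ["A", "B"], seed=2014)

for subject, treatment in allocation.items():
    print(subject, treatment)
```

Recording the seed makes the allocation reproducible for an audit, while the enrolling investigator still cannot predict or influence who gets which treatment.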
Case Study: Crossover Design
Each child is studied on 3 occasions under different diets.
Is this better than three separate groups of children?
Why, intuitively?
How could you scientifically prove your intuition?
Statistical Issues
• Subject selection
• Randomization
• Efficiency from study design
• Summarizing study results
Blocked vs. Unblocked Studies
AKA matched vs. unmatched.
AKA paired vs. unpaired.
Block = Pair = Set receiving all treatments. Set could be an individual at multiple times (pre and post), or left and right arms for sunscreen comparison; twins or family; centers in multi-center study, etc. Block ↔ Homogeneous.
Blocking is efficient because treatment differences are usually more consistent among subjects than each separate treatment is.
Potential Efficiency Due to Pairing
[Figure: dot plots of responses under treatments A and B. Unpaired: A and B are separate groups. Paired: A and B are in a paired set, and the differences Δ = B - A are plotted.]
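The efficiency gain from pairing can be checked numerically. In the invented data below, subjects differ a lot from each other, but each subject's B value is consistently about 2 units above their A value. The estimated treatment difference is the same either way; only its standard error changes.

```python
import math
from statistics import mean, stdev

# Hypothetical paired data: position i in both lists is the same subject.
a = [10.0, 14.0, 18.0, 22.0, 26.0]
b = [12.0, 15.0, 21.0, 24.0, 28.0]
n = len(a)

# Paired analysis: work with within-subject differences, delta = B - A.
diffs = [y - x for x, y in zip(a, b)]
se_paired = stdev(diffs) / math.sqrt(n)

# Unpaired analysis: treat A and B as two separate groups of n subjects.
se_unpaired = math.sqrt(stdev(a) ** 2 / n + stdev(b) ** 2 / n)

print(f"estimated difference = {mean(diffs):.2f}")
print(f"SE if paired   = {se_paired:.2f}")
print(f"SE if unpaired = {se_unpaired:.2f}")
```

Between-subject variation cancels out of the differences, so the paired standard error is much smaller. With separate groups, that variation stays in the comparison and many more subjects would be needed for the same precision.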
Statistical Issues
• Subject selection
• Randomization
• Efficiency from study design
• Summarizing study results
Outcome Measures
Generally, how were the outcome measures defined (third page)?
They are more complicated here than for most studies.
What are the units (e.g., kg, mmol, $, years)?
Outcome measures are specific and pre-defined. Aims and goals may be more general.
Summarization of Data with Descriptive Statistics
What is the difference between Table 1 and Table 2 in terms of methods used to summarize the data?
Types of Data
Variable
• Categorical (Qualitative)
  – Nominal: Categories are mutually exclusive and unordered. Examples: Gender, Blood group, Eye colour, Marital status
  – Ordinal: Categories are mutually exclusive and ordered. Examples: Disease stage, Education level, 5 point Likert scale
• Numerical (Quantitative)
  – Counts: Integer values. Examples: Days sick per year, Number of pregnancies, Number of hospital visits
  – Measured (continuous): Takes any value in a range of values. Examples: weight in kg, height in feet, age (in years)
It is critical to identify the type of data since the choice of an appropriate statistical test as well as how to summarize the data depend on the type of the data.
Describing categorical & quantitative data
• Categorical Data
  – Binary, Nominal, or Ordinal data
    • Disease status (yes, no)
    • Education level
    • The assignment of the treatment
    • Cancer stage
    • Marital Status
  – Frequency tables (one, two, or multi way tables) are usually used
• Quantitative Data
  – Counts or Continuous Data
    • Weight
    • Blood pressure
    • Age
    • Length of hospital stay in days
    • The total number of ER visits per year
  – Means or Medians are used for the measure of the central tendency.
  – Standard deviations or percentiles are used for the measure of variability.
  – When data are skewed, Medians & percentiles are better summary statistics
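To see why medians and percentiles suit skewed data, consider the sketch below. The lengths of stay are invented for illustration: most patients stay a few days and one stays 40.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical lengths of hospital stay in days (right-skewed by one long stay).
stays = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

print(f"mean   = {mean(stays):.1f} days")
print(f"median = {median(stays):.1f} days")
print(f"SD     = {stdev(stays):.1f} days")

# Quartiles (25th, 50th, 75th percentiles) describe the spread of the bulk
# of the data without being dominated by the single extreme value.
q1, q2, q3 = quantiles(stays, n=4)
print(f"quartiles = {q1:.2f}, {q2:.2f}, {q3:.2f}")
```

The mean of 6.7 days is larger than nine of the ten observed stays, while the median of 3 days describes a typical patient.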
How to display Data
• A picture is worth a thousand words!
• To get a ‘feel’ for the data.
• Categorical data
  – Frequency tables, Contingency tables (cross tables), Bar charts, Pie-charts
• Quantitative data
  – Dot plots, Histograms, Box-Whisker plots*, Scatter plots
Frequency Tables
How Arrived
                Frequency   Percent   Valid Percent   Cumulative Percent
Valid   BUS         11        13.9        13.9              13.9
        CAR         66        83.5        83.5              97.5
        WALK         2         2.5         2.5             100.0
        Total       79       100.0       100.0

Gender
                Frequency   Percent   Valid Percent   Cumulative Percent
Valid   F           46        58.2        58.2              58.2
        M           33        41.8        41.8             100.0
        Total       79       100.0       100.0
Contingency Tables (Crosstabulations)
How Arrived * Gender Crosstabulation
Count
                        Gender
                        F      M      Total
How Arrived   BUS        6      5       11
              CAR       38     28       66
              WALK       2      0        2
Total                   46     33       79
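The frequency and contingency tables above can be reproduced from raw records with a few lines of code. The sketch below rebuilds one record per person from the crosstab counts (the original Survey.sav file is not used here) and then tabulates them; the variable names are illustrative.

```python
from collections import Counter

# Cell counts copied from the How Arrived * Gender crosstabulation.
cell_counts = {
    ("BUS", "F"): 6, ("BUS", "M"): 5,
    ("CAR", "F"): 38, ("CAR", "M"): 28,
    ("WALK", "F"): 2, ("WALK", "M"): 0,
}

# One (how_arrived, gender) record per person.
records = [pair for pair, k in cell_counts.items() for _ in range(k)]

# One-way frequency table with percent and cumulative percent.
how = Counter(mode for mode, _ in records)
total = sum(how.values())
cumulative = 0
for mode in sorted(how):
    cumulative += how[mode]
    print(f"{mode:<5} {how[mode]:>3} {100 * how[mode] / total:6.1f} "
          f"{100 * cumulative / total:6.1f}")

# Two-way table: count each (row, column) combination.
crosstab = Counter(records)
for mode in sorted(how):
    f_count, m_count = crosstab[(mode, "F")], crosstab[(mode, "M")]
    print(f"{mode:<5} F={f_count:>2} M={m_count:>2} total={f_count + m_count:>2}")
```

Each percent is the category count divided by the total of 79, and the cumulative percent is the running sum of counts divided by the same total.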
Bar Charts
Pie Charts
Histograms
• To catch the patterns of the data
• Divide up the data points into several mutually exclusive intervals: categorize the data points.
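The binning step behind a histogram can be sketched in a few lines. The ages, bin width, and function name below are hypothetical; each value falls into exactly one interval of the form [lower, upper).

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:  # values outside the covered range are ignored
            counts[idx] += 1
    return counts


# Hypothetical ages in years (illustration only).
ages = [3, 3, 4, 8, 8, 9, 9, 9, 12, 15]
counts = histogram_counts(ages, start=0, width=5, n_bins=4)

# Text version of the histogram: one star per data point in the interval.
for i, c in enumerate(counts):
    print(f"[{i * 5:>2}, {i * 5 + 5:>2}) " + "*" * c)
```

Because the intervals are half-open, a value on a boundary such as 15 is counted once, in the interval that starts at 15.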
Scatter plots
• Usually used to illustrate a relationship between two variables.
Box-Whisker Plots
What have we learned today?
Assignments
• HW #1 is posted on the course website
• Pre-Step for HW #1
  – Install MYSTAT on your laptop or on a computer in your school computer lab with permission from your school (Ask Ms. Aberle for help)
  – Download Survey.sav (SPSS data file) from the course website (under Session 1)
• Submit the hard copy of the completed HW in the next session.
• Read the article focusing on contents in Tables 3 & 4 and Figure 4.