Biostatistics in Practice Youngju Pak Biostatistician Peter D. Christenson http://research.LABioMed.org/Biostat Session 1: Quantitative and Inferential Issues I

Page 1: Biostatistics in Practice Youngju Pak Biostatistician Peter D. Christenson  Session 1: Quantitative and Inferential.

Biostatistics in Practice

Youngju Pak Biostatistician

Peter D. Christenson

http://research.LABioMed.org/Biostat

Session 1: Quantitative and Inferential Issues I

Page 2:

Why Statistics?

• "For Today’s Graduate, Just One Word: Statistics," NY Times, Aug 5, 2009: "I keep saying that the sexy job in the next 10 years will be statisticians," said Hal Varian, chief economist at Google.

• "I am not much given to regret, so I puzzled over this one a while. Should have taken much more statistics in college, I think. :)" (Max Levchin, PayPal co-founder, Slide founder)

Page 3:

Who am I?

• Dr. Youngju Pak
• Originally from South Korea
• PhD in Biostatistics; MS and BA in Statistics
• Assistant Professor of Biostatistics at MU until 2012
• Joined LA BioMed in March 2013
• Practicing biostatistics since 2000

Page 4:

Who are you?

• Name
• Career aspirations

Page 5:

Class Webpage & Session Schedule

• Class webpage: select "Courses" at http://research.LABioMed.org/biostat (use Internet Explorer; Chrome does not work well with this website)
• All class materials are posted and will be updated on the class webpage
• There will be some pop quizzes
• There will be some HW assignments
• The TOP THREE will be announced and rewarded at the last session

Page 6:

Session 1 Objectives

• General quantitative needs in biological research

• Overview of statistical issues using a published paper

• How to run the statistical software MYSTAT

Page 7:

General Quantitative Needs

Descriptive: Appropriate summarization to meet scientific questions: e.g.,

• changes, or % changes, or reaching threshold?

• mean, or minimum, or range of response?

• average time to death, or chances of dying by a fixed time?
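As a rough illustration of how the choice of summary changes what is reported, the Python sketch below computes an absolute change, a % change, and a "reached threshold" indicator from the same before/after values. The measurements and the threshold of 140 are made up for illustration; they do not come from any study discussed here.

```python
from statistics import mean

# Hypothetical before/after measurements for five subjects (illustration only).
before = [150, 160, 145, 170, 155]
after = [140, 150, 144, 150, 135]
threshold = 140  # hypothetical target: "after" value at or below 140

# Three different summaries of the same data.
changes = [a - b for a, b in zip(after, before)]
pct_changes = [100 * (a - b) / b for a, b in zip(after, before)]
reached = [a <= threshold for a in after]

print("mean change:", mean(changes))
print("mean % change:", round(mean(pct_changes), 1))
print("proportion reaching threshold:", sum(reached) / len(reached))
print("minimum:", min(after), "range:", max(after) - min(after))
```

Which of these summaries is appropriate depends on the scientific question, not on the data alone.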

Page 8:

General Quantitative Needs, Cont’d

• Inferential: Could results be spurious, a fluke, due to “natural” variations or chance?

• Inferential statistics: 95% confidence intervals, p-values, etc.

• Sensitivity/Power: How many subjects are needed?
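To make the "how many subjects" question concrete, here is a minimal sketch of a common textbook approximation for comparing two group means, n per group = 2 (z_{1-α/2} + z_{power})² σ² / δ². The function name `n_per_group` and the example inputs are assumptions for illustration; dedicated study size software typically uses a slightly more exact t-based calculation.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a mean difference `delta`
    between two groups with common SD `sigma` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# Hypothetical inputs: detect a difference of half a standard deviation.
print(n_per_group(delta=0.5, sigma=1.0))
```

Halving the detectable difference roughly quadruples the required number of subjects, which is why the target effect size should be fixed before the study starts.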

Page 9:

Professional Statistics Software Package

Output

Enter code; syntax.

Stored data; accessible.

Page 10:

Microsoft Excel for Statistics

• Primarily for descriptive statistics.

• Limited output.

• No analyses for %s.

Page 11:

Free Statistics Software: Mystat

www.systat.com

Page 12:

Free Study Size Software

www.stat.uiowa.edu/~rlenth/Power

Page 13:

Session 1 Objectives

• General quantitative needs in biological research

• Overview of statistical issues using a published paper

• How to run the statistical software MYSTAT

Page 14:

Statistical Issues

•Subject selection

•Randomization

•Efficiency from study design

•Summarizing study results

Page 15:

Paper with Common Statistical Issues

Case Study:

Page 16:

McCann, et al., Lancet 2007 Nov 3;370(9598):1560-7

• Food additives and hyperactive behaviour in 3-year-old and 8/9-year-old children in the community: a randomised, double-blinded, placebo-controlled trial.

• Objective: test whether intake of artificial food colors and additives (AFCA) affects childhood behavior
• Target population: children aged 3-4 and 8-9 years
• Study design: randomized, double-blinded, controlled, crossover trial
• Sample size: 153 (3 years), 144 (8-9 years) in Southampton, UK
• Sampling: stratified sampling based on SES
• Baseline measure: 24h recall by the parent of the child’s pretrial diet
• Group: three groups (mix A, mix B, placebo)
• Outcomes: ADHD rating scale IV by teachers, WWP hyperactivity score by parents, classroom observation code, Conners continuous performance test II (CPTII) GHA score

Page 17:

Statistical Issues

•Subject selection

•Randomization

•Efficiency from study design

•Summarizing study results

Page 18:

Representative or Random Samples

How were the children in the study selected (second column on the first page)? The authors purposely selected "representative" social classes.

Is this better than a "randomly" chosen sample that ignores social class?

We often hear: non-random = non-scientific.

Page 19:

Case Study: Participant Selection

No mention of random samples.

Page 20:

Case Study: Participant Selection

It may be that only a few schools are needed to get sufficient individuals. If, among all possible schools, only a few are lower SES, a simple random sample of schools may by chance include none of them.

So, a random sample of schools is chosen from the lower SES schools, and another random sample from the higher SES schools.
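The two-stage idea above can be sketched in a few lines of Python: draw a separate random sample within each SES stratum instead of one random sample from all schools. The school labels, stratum sizes, and the helper name `stratified_sample` are hypothetical and are not taken from the paper.

```python
import random

def stratified_sample(units_by_stratum, n_per_stratum, seed=None):
    """Draw a simple random sample separately within each stratum."""
    rng = random.Random(seed)
    return {
        stratum: rng.sample(units, n_per_stratum[stratum])
        for stratum, units in units_by_stratum.items()
    }

# Hypothetical sampling frame: few lower-SES schools, many higher-SES schools.
schools = {
    "lower SES": ["L1", "L2", "L3"],
    "higher SES": ["H%d" % i for i in range(1, 13)],
}
chosen = stratified_sample(schools, {"lower SES": 2, "higher SES": 2}, seed=1)
print(chosen)
```

Every draw is still random, but the design guarantees that both strata are represented.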

Page 21:

Selection by Over-Sampling

It is not necessary that the % lower SES in the study is the same as in the population. With proportional sampling, there may still be too few subjects in a rare subgroup to get reliable data.

We can “over-sample” a rare subgroup, and then weight overall results by the proportions of the subgroups in the population. The CDC NHANES (http://www.cdc.gov/nchs/nhanes.htm) studies do this.
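A minimal sketch of the weighting step, with invented numbers: the rare subgroup makes up half of the sample but only 20% of the population, so the overall estimate weights each subgroup mean by its population share instead of its sample share. All values below are hypothetical.

```python
# Hypothetical subgroup results from an over-sampled study (illustration only).
sample_mean = {"lower SES": 10.0, "higher SES": 6.0}
sample_size = {"lower SES": 50, "higher SES": 50}          # lower SES is over-sampled
population_share = {"lower SES": 0.2, "higher SES": 0.8}   # assumed known

# Naive estimate: weights each subgroup by its share of the sample.
n_total = sum(sample_size.values())
unweighted = sum(sample_mean[g] * sample_size[g] for g in sample_mean) / n_total

# Weighted estimate: weights each subgroup by its share of the population.
weighted = sum(sample_mean[g] * population_share[g] for g in sample_mean)

print("unweighted:", unweighted)
print("weighted:", round(weighted, 2))
```

The unweighted mean is pulled toward the over-sampled subgroup; the weighted mean restores the population mix while keeping the extra precision within the rare subgroup.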

Page 22:

Statistical Issues

•Subject selection

•Randomization

•Efficiency from study design

•Summarizing study results

Page 23:

Basic Study Designs

1. Prospective (longitudinal): Risk factor (2014), then disease status (2020)

2. Retrospective (case-control): Disease status (2014), then look back at risk factor (2000)

3. Cross-sectional: Disease status (2014) and risk factor (2014) at the same time

4. Experimental or randomized controlled: Risk factor (2014), then disease status (2020), with assignment of the risk factor

Page 24:

Random Samples vs. Randomization

We have been discussing the selection of subjects to study, often a random sample.

An observational study would, well, just observe them. An interventional study assigns each subject to one or more treatments in order to compare treatments. Randomization refers to making these assignments in a random way.
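As a sketch of what "making these assignments in a random way" can look like in practice, the Python function below gives each subject a treatment by shuffling a balanced list of labels. The function name, subject IDs, and seed are hypothetical; real trials usually rely on a pre-generated, concealed allocation list.

```python
import random

def randomize(subject_ids, treatments, seed=None):
    """Randomly assign subjects to treatments in (near-)equal numbers."""
    rng = random.Random(seed)
    # Build a balanced list of labels, one per subject, then shuffle it.
    labels = [treatments[i % len(treatments)] for i in range(len(subject_ids))]
    rng.shuffle(labels)
    return dict(zip(subject_ids, labels))

# Hypothetical example: nine subjects, three treatment arms.
assignment = randomize(list(range(1, 10)), ["mix A", "mix B", "placebo"], seed=2024)
for subject, treatment in assignment.items():
    print(subject, treatment)
```

Because the shuffle, not the investigator, decides who gets what, the assignment cannot depend on any characteristic of the subject.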

Page 25:

Why Randomize?

So that groups will be similar except for the intervention.

So that, when enrolling, we will not unconsciously choose an “appropriate” treatment for a particular subject.

Randomization minimizes the chances of introducing bias when attempting to systematically remove it, as in the plant yield example.

Page 26:

Case Study: Crossover Design

Each child is studied on 3 occasions under different diets.

Is this better than three separate groups of children?

Why, intuitively?

How could you scientifically prove your intuition?

Page 27:

Statistical Issues

• Subject selection

• Randomization

• Efficiency from study design

• Summarizing study results

Page 28:

Blocked vs. Unblocked Studies

AKA matched vs. unmatched. AKA paired vs. unpaired.

Block = Pair = Set receiving all treatments. Set could be an individual at multiple times (pre and post), or left and right arms for sunscreen comparison; twins or family; centers in multi-center study, etc. Block ↔ Homogeneous.

Blocking is efficient because treatment differences are usually more consistent among subjects than each separate treatment is.
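The efficiency claim can be checked numerically. In the sketch below, made-up subjects differ a lot from one another, but each subject's B value is consistently a little higher than its A value; the standard error of the mean difference is then far smaller when the pairing is used than when A and B are treated as separate groups. The data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical paired data: large subject-to-subject spread, small consistent shift.
a = [10, 20, 30, 40, 50]
b = [12, 23, 31, 44, 52]
n = len(a)

# Paired analysis: work with within-subject differences.
diffs = [y - x for x, y in zip(a, b)]
se_paired = stdev(diffs) / sqrt(n)

# Unpaired analysis: treat A and B as two independent groups.
se_unpaired = sqrt(variance(a) / n + variance(b) / n)

print("mean difference:", mean(diffs))
print("SE using pairs:", round(se_paired, 2))
print("SE ignoring pairs:", round(se_unpaired, 2))
```

The mean difference is the same in both analyses; only its precision changes, because pairing removes the between-subject variation from the comparison.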

Page 29:

Potential Efficiency Due to Pairing

[Figure: dot plots of responses under treatments A and B. Left panel, "Unpaired": A and B as separate groups. Right panel, "Paired": A and B in a paired set, with the within-pair differences Δ = B - A plotted alongside.]

Page 30:

Statistical Issues

• Subject selection

• Randomization

• Efficiency from study design

• Summarizing study results

Page 31:

Outcome Measures

Generally, how were the outcome measures defined (third page)?

They are more complicated here than for most studies.

What are the units (e.g., kg, mmol, $, years)?

Outcome measures are specific and pre-defined. Aims and goals may be more general.

Page 32:

Summarization of Data with Descriptive Statistics

Page 33:

Summarization of Data with Descriptive Statistics

What is the difference between Table 1 and Table 2 in terms of methods used to summarize the data?

Page 34:

Types of Data

Variable

• Categorical (qualitative)
  – Nominal: categories are mutually exclusive and unordered. Examples: gender, blood group, eye colour, marital status
  – Ordinal: categories are mutually exclusive and ordered. Examples: disease stage, education level, 5-point Likert scale

• Numerical (quantitative)
  – Counts: integer values. Examples: days sick per year, number of pregnancies, number of hospital visits
  – Measured (continuous): takes any value in a range of values. Examples: weight in kg, height in feet, age in years

Page 35:

It is critical to identify the type of data, since both the choice of an appropriate statistical test and the way the data are summarized depend on the type of data.

Page 36:

Describing categorical & quantitative data

• Categorical Data
  – Binary, nominal, or ordinal data
    • Disease status (yes, no)
    • Education level
    • The assignment of the treatment
    • Cancer stage
    • Marital status
  – Frequency tables (one-, two-, or multi-way tables) are usually used

• Quantitative Data
  – Counts or continuous data
    • Weight
    • Blood pressure
    • Age
    • Length of hospital stay in days
    • The total number of ER visits per year
  – Means or medians are used to measure central tendency.
  – Standard deviations or percentiles are used to measure variability.
  – When data are skewed, medians and percentiles are better summary statistics.
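A small sketch of why medians and percentiles are preferred for skewed data: in the made-up lengths of hospital stay below, a single long stay pulls the mean and standard deviation well away from what is typical, while the median and quartiles barely notice it. The values are hypothetical.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical lengths of hospital stay in days; one very long stay skews the data.
stay_days = [1, 2, 2, 3, 3, 4, 30]

print("mean:", round(mean(stay_days), 2))
print("standard deviation:", round(stdev(stay_days), 2))
print("median:", median(stay_days))
# Quartiles (25th, 50th, 75th percentiles), using the module's default method.
print("quartiles:", quantiles(stay_days, n=4))
```

Dropping the 30-day stay would change the mean substantially but leave the median almost unchanged, which is the sense in which the median is the more robust summary.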

Page 37:

How to Display Data

• A picture is worth a thousand words!

• Plots help in getting a ‘feel’ for the data.

• Categorical data
  – Frequency tables, contingency tables (cross tables), bar charts, pie charts

• Quantitative data
  – Dot plots, histograms, box-whisker plots, scatter plots

Page 38:

Frequency Tables

How Arrived

                Frequency  Percent  Valid Percent  Cumulative Percent
Valid  BUS             11     13.9           13.9                13.9
       CAR             66     83.5           83.5                97.5
       WALK             2      2.5            2.5               100.0
       Total           79    100.0          100.0

Gender

                Frequency  Percent  Valid Percent  Cumulative Percent
Valid  F               46     58.2           58.2                58.2
       M               33     41.8           41.8               100.0
       Total           79    100.0          100.0

Page 39:

Contingency Tables (Crosstabulations)

How Arrived * Gender Crosstabulation (Count)

                  Gender
How Arrived      F     M   Total
BUS              6     5      11
CAR             38    28      66
WALK             2     0       2
Total           46    33      79
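The frequency tables and the crosstabulation are two views of the same counts. As a sketch, the Python below starts from the cell counts in the How Arrived by Gender crosstabulation and recovers the one-way frequencies, percents, and cumulative percents. The helper name `percent_table` is made up for this example; statistical packages produce these tables directly.

```python
# Cell counts copied from the How Arrived * Gender crosstabulation.
crosstab = {
    "BUS": {"F": 6, "M": 5},
    "CAR": {"F": 38, "M": 28},
    "WALK": {"F": 2, "M": 0},
}

# Margins of the crosstabulation give the one-way frequency tables.
total = sum(sum(row.values()) for row in crosstab.values())
how_arrived = {mode: sum(row.values()) for mode, row in crosstab.items()}
gender = {g: sum(row[g] for row in crosstab.values()) for g in ("F", "M")}

def percent_table(counts, total):
    """Return (category, frequency, percent, cumulative percent) rows."""
    rows, cumulative = [], 0
    for category, count in counts.items():
        cumulative += count
        rows.append((category, count,
                     round(100 * count / total, 1),
                     round(100 * cumulative / total, 1)))
    return rows

for row in percent_table(how_arrived, total) + percent_table(gender, total):
    print(row)
```

The printed rows can be compared against the frequency tables for How Arrived and Gender.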

Page 40:

Bar Charts

Page 41:

Pie Charts

Page 42:

Histograms

• To catch the patterns of the data
• Divide up the data points into several mutually exclusive intervals, i.e., categorize the data points.
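The binning step behind a histogram can be sketched as follows: each value is placed in exactly one interval [low + i*width, low + (i+1)*width), and the histogram is simply the count per interval. The function name `histogram_counts`, the ages, and the bin settings are hypothetical.

```python
def histogram_counts(values, low, width, n_bins):
    """Count values falling in each of n_bins equal-width, non-overlapping intervals."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) // width)  # which interval the value belongs to
        if 0 <= index < n_bins:          # values outside the range are ignored
            counts[index] += 1
    return counts

# Hypothetical ages, binned into decades starting at 20.
ages = [21, 25, 34, 35, 39, 42, 58]
counts = histogram_counts(ages, low=20, width=10, n_bins=4)

# Crude text histogram: one asterisk per observation.
for i, c in enumerate(counts):
    print("%d-%d: %s" % (20 + 10 * i, 30 + 10 * i, "*" * c))
```

The shape of a histogram depends on the chosen bin width, so it is worth trying a few widths before drawing conclusions about the pattern.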

Page 43:

Scatter plots

• Usually used to illustrate a relationship between two variables.

Page 44:

Box-Whisker Plots

Page 45:

What have we learned today?

Page 46:

Assignments

• HW #1 is posted on the course website

• Pre-step for HW #1
  – Install MYSTAT on your laptop, or on a computer in your school computer lab with permission from your school (ask Ms. Aberle for help)
  – Download Survey.sav (SPSS data file) from the course website (under Session 1)

• Submit a hard copy of the completed HW at the next session.

• Read the article, focusing on the contents of Tables 3 and 4 and Figure 4.