Statistics and Epidemiology Robert F. Waters, Ph.D.
Statistics
– “status” (manner of standing)
– In medicine
• Biostatistics
• Biometrics
Epidemiology
– Epi (upon) demos (people)
– Study of health and illness in human populations
– Pattern recognition
Reasons to use Biostatistics
Evaluation of medical research
Applying study results to patient care
Understanding epidemiological problems
Interpreting information about drugs
Evaluating study protocols
Participating in and directing research projects
Elementary Probability Theory
Probability of success (p)
– p = Pr{E} = h/n
• h = number of ways the event E can occur
• n = total number of equally likely outcomes
Probability of failure (q)
– q = Pr{not E} = (n – h)/n = 1 – h/n = 1 – p
p + q = 1.00
Probability Example:
– 1000 tosses of a fair coin give 529 heads
– Another 1000 tosses give 493 heads
– As the tosses are repeated, the proportion of heads should approach p = .5
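The coin example can be tried as a small simulation. This is only a sketch: the seed and the toss counts are arbitrary choices, and the helper name `heads_fraction` is made up for illustration.

```python
import random

def heads_fraction(n_tosses, rng):
    """Return the fraction of heads in n_tosses of a simulated fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
for n in (1_000, 10_000, 100_000):
    # The fraction wanders around 0.5 and tends to settle closer as n grows.
    print(n, heads_fraction(n, rng))
```

Individual runs will differ from the 529 and 493 quoted above; only the long-run tendency toward .5 is the point.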
Cards
– Mutually exclusive events: add the probabilities
• What is the probability of drawing an ace?
– 4/52
• What is the probability of drawing a king?
– 4/52
• How about an ace or a king (one card drawn)?
– 4/52 + 4/52 = 8/52 = 2/13
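The card arithmetic can be checked with exact fractions. The sketch below assumes a standard 52-card deck and contrasts two readings of the question: "ace or king" on one draw (mutually exclusive, so probabilities add) and "ace, then king" on two draws with replacement (independent, so probabilities multiply).

```python
from fractions import Fraction

DECK_SIZE = 52
p_ace = Fraction(4, DECK_SIZE)
p_king = Fraction(4, DECK_SIZE)

# One card cannot be both an ace and a king, so the events are
# mutually exclusive and the probabilities add.
p_ace_or_king = p_ace + p_king
print(p_ace_or_king)  # 2/13

# Two draws with replacement are independent, so the probabilities multiply.
p_ace_then_king = p_ace * p_king
print(p_ace_then_king)  # 1/169
```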
Probability
How about dice?
– Throwing two fair dice
• Probability of each total: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
• Let’s play “craps”
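The totals for two fair dice can be enumerated directly from the 36 equally likely outcomes. As a hedged sketch, the come-out probabilities below use the usual craps convention (7 or 11 wins at once; 2, 3, or 12 loses at once); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) pairs give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {total: Fraction(c, 36) for total, c in sorted(counts.items())}

for total, p in dist.items():
    print(total, p)

p_win_first_roll = dist[7] + dist[11]             # 2/9
p_lose_first_roll = dist[2] + dist[3] + dist[12]  # 1/9
print(p_win_first_roll, p_lose_first_roll)
```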
Combinatorial Analysis
What is a factorial?
– n! = n(n – 1)(n – 2) ... 1
– What is 5!?
Permutation
– nPr = n(n – 1)(n – 2) ... (n – r + 1), or:
• n!/(n – r)!
– Problem: How many permutations of the letters a, b, c can be taken 2 at a time?
• 3!/(3 – 2)! = 3 x 2 = 6
• List the permutations.
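A short check of the factorial and permutation problems, using only the Python standard library (`math.perm` requires Python 3.8 or later):

```python
from itertools import permutations
from math import factorial, perm

print(factorial(5))  # 120

n, r = 3, 2
print(factorial(n) // factorial(n - r))  # 6, from n!/(n - r)!
print(perm(n, r))                        # 6, the same count from the library

# Order matters, so "ab" and "ba" are listed separately.
listed = ["".join(p) for p in permutations("abc", r)]
print(listed)  # ['ab', 'ac', 'ba', 'bc', 'ca', 'cb']
```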
Combinatorial Analysis
Combinations (order does not matter)
– nCr = n!/[r!(n – r)!], or nPr/r!
– Number of combinations of a, b, c taken two at a time.
• 3C2 = 3!/[2!(3 – 2)!] = ?
• List the combinations.
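The combination problem can be checked the same way. Here order is ignored, so "ab" and "ba" count once (`math.comb` requires Python 3.8 or later):

```python
from itertools import combinations
from math import comb, factorial

n, r = 3, 2
print(factorial(n) // (factorial(r) * factorial(n - r)))  # 3
print(comb(n, r))                                         # 3

chosen = ["".join(c) for c in combinations("abc", r)]
print(chosen)  # ['ab', 'ac', 'bc']
```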
Data
Continuous variable
– Temperature
Discrete variable
– Number of children in a family
• Can’t have 2.3 children
Nominal data
– Pretty, tall, etc.
Ordinal data
– 0, 3, 5 (order from worst to best)
What is a Population?
Infinite?
Finite?
We have a sample!
Sometimes we need to sample from a large population.
Therefore statistics is called:
– Statistical inference
– Inductive statistics
• Trying to characterize an infinite population
Measures of Central Tendency
Mean
– Arithmetic mean
– Harmonic mean
– Root mean square (RMS)
– Geometric mean
Median
Mode
When would the mean, median, and mode be the same?
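The measures of central tendency can be compared on small data sets. Both lists below are hypothetical and chosen only for illustration; the symmetric one shows one case in which mean, median, and mode coincide.

```python
import statistics
from math import sqrt

skewed = [1, 2, 2, 4, 8]            # hypothetical, right-skewed
symmetric = [1, 2, 3, 3, 3, 4, 5]   # hypothetical, symmetric with one peak

arithmetic = statistics.mean(skewed)
harmonic = statistics.harmonic_mean(skewed)
geometric = statistics.geometric_mean(skewed)
rms = sqrt(sum(x * x for x in skewed) / len(skewed))  # root mean square
print(arithmetic, harmonic, geometric, rms)
print(statistics.median(skewed), statistics.mode(skewed))

# For the symmetric, single-peaked list all three measures agree.
print(statistics.mean(symmetric),
      statistics.median(symmetric),
      statistics.mode(symmetric))
```

For positive data that are not all equal, the ordering harmonic < geometric < arithmetic < RMS always holds, which is one way to sanity-check the output.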
What is a variate?
Measures of Dispersion
Old story of two surgical students!
Variance
Standard Deviation
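The formulas for variance and standard deviation are not reproduced in this transcript, so the sketch below relies on the standard definitions as implemented in Python's `statistics` module. The two score lists are hypothetical; they are not the "two surgical students" story, only an illustration that equal means can hide very different spread.

```python
import statistics

# Hypothetical scores: both groups average 70, but the spread differs.
steady = [70, 70, 70, 70, 70]
erratic = [40, 55, 70, 85, 100]

for name, scores in (("steady", steady), ("erratic", erratic)):
    mean = statistics.mean(scores)
    pop_var = statistics.pvariance(scores)  # population variance, divides by n
    samp_var = statistics.variance(scores)  # sample variance, divides by n - 1
    samp_sd = statistics.stdev(scores)      # square root of the sample variance
    print(name, mean, pop_var, samp_var, samp_sd)
```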
Properties of Standard Deviation (normal distribution)
±1s from the mean
– 68.27% of values
±2s from the mean
– 95.45% of values
±3s from the mean
– 99.73% of values
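These percentages hold for a normal distribution and can be reproduced from the error function: the fraction within k standard deviations of the mean is erf(k/√2). The function name `normal_coverage` is illustrative.

```python
from math import erf, sqrt

def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * normal_coverage(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```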
Problem: Heights in Class
Moments, Skewness, Kurtosis
Major problem in biometrics
Moments
– First moment (about the origin)
• Arithmetic mean
– Second moment (about the mean)
• Variance
– Skewness
• Degree of asymmetry
– Kurtosis
• Leptokurtic (narrow)
• Platykurtic (flattened)
• Mesokurtic (normal)
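Skewness and kurtosis can be computed from moments about the mean. The sketch below uses the common moment-ratio definitions (third moment over the 1.5 power of the second, fourth moment over the square of the second), which is an assumption since the slides give no formula. Under this definition a normal distribution has kurtosis 3. Both data lists are hypothetical.

```python
def central_moment(data, k):
    """k-th moment about the mean, dividing by n."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]      # hypothetical, symmetric
right_skewed = [1, 1, 1, 2, 10]  # hypothetical, long right tail

print(skewness(symmetric), kurtosis(symmetric))        # skewness is 0
print(skewness(right_skewed), kurtosis(right_skewed))  # skewness is positive
```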
Elementary Sampling Theory
Many problems in biometrics
– Random samples
– Without bias
– Error evenly distributed
– Level of significance (usually 0.05 in science)
– Degrees of freedom
• Orthogonality (comparisons)
– Example:
• Ways to account for sources of variation
• Patients with different doctors in different clinics
• Patients with same doctors in different clinics
• Patients with same doctors in same clinic
Application to Epidemiology
Binomial Distribution
– p(X) = NCX p^X q^(N-X) = [N!/(X!(N – X)!)] p^X q^(N-X)
– Problem:
• What is the probability that in a family of four children there will be at least 1 boy?
– 1 boy: 4C1 (1/2)^1 * (1/2)^3
• = 4!/[1!(4 – 1)!] * ½ * 1/8 = ¼
– 2 boys = 3/8
– 3 boys = ¼
– 4 boys = 1/16
• 4C4 (1/2)^4 * (1/2)^0
• What is the probability of at least one boy?
– Pr(1 boy) + Pr(2 boys) + Pr(3 boys) + Pr(4 boys)
– ¼ + 3/8 + ¼ + 1/16 = 15/16
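The family-of-four problem can be verified with exact fractions. The sketch assumes, as the slide does, that each child is a boy with probability 1/2 independently of the others; `binom_pmf` is an illustrative helper name.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 4, Fraction(1, 2)
pmf = {x: binom_pmf(x, n, p) for x in range(n + 1)}
for x, prob in pmf.items():
    print(x, "boys:", prob)

# Summing the four cases and taking the complement of "no boys" agree.
at_least_one = sum(pmf[x] for x in range(1, n + 1))
print(at_least_one, 1 - pmf[0])  # 15/16 15/16
```

The complement, 1 – Pr(0 boys) = 1 – 1/16, is a shortcut to the same answer.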
Application of Binomial
Out of 2000 families with four children, how many are expected to have at least one boy?
– 2000 x 15/16 = 1875
Out of 2000 families with four children, how many are expected to have exactly two boys?
– 2000 x 3/8 = 750
How can we tell if something is wrong?
Chi-square
– Compares observed with expected
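A minimal sketch of the comparison, using the 2000 four-child families from the binomial example. The expected counts come from the binomial distribution with p = 1/2; the observed counts are invented purely for illustration and are not data from the lecture. The critical value 9.488 is the tabulated chi-square value for 4 degrees of freedom at the 0.05 level.

```python
from math import comb

N_FAMILIES = 2000

# Expected number of families with 0, 1, 2, 3, 4 boys under p = 1/2.
expected = [N_FAMILIES * comb(4, x) * 0.5**4 for x in range(5)]
print(expected)  # [125.0, 500.0, 750.0, 500.0, 125.0]

# HYPOTHETICAL observed counts, made up for this example only.
observed = [120, 510, 740, 505, 125]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CRITICAL_0_05_DF4 = 9.488  # 5 categories - 1 = 4 degrees of freedom
print(chi_square, chi_square > CRITICAL_0_05_DF4)
```

A statistic below the critical value means the observed counts are compatible with the expected ones at the 0.05 level; a larger value would suggest that something is wrong with the assumed model.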
Statistical Decision Theory
P value
Statistical significance
One-tailed vs. two-tailed test
Confidence intervals
Standard error
– Standard deviation
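The link between standard deviation, standard error, and a confidence interval can be sketched as follows. The heights are hypothetical, and the interval uses the normal approximation (z of about 1.96); for a sample this small, a Student's t interval would normally be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

heights_cm = [160, 165, 170, 175, 180]  # hypothetical sample

n = len(heights_cm)
sample_mean = mean(heights_cm)
standard_error = stdev(heights_cm) / sqrt(n)  # SE = s / sqrt(n)

z = NormalDist().inv_cdf(0.975)  # two-tailed 95% interval, about 1.96
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error
print(sample_mean, standard_error, (ci_low, ci_high))
```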
The Correlation
Two independent variables
Ice cream in Georgia story
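A correlation coefficient measures how closely two variables move together, without saying that one causes the other. The sketch below computes Pearson's r from its definition; the function name and the data are illustrative and are not taken from the story mentioned in the slide.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))
```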
The Regression
Dependent with independent variable
Least squares analysis
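A least squares line for one dependent variable (y) and one independent variable (x) can be sketched from the textbook formulas: slope = Sxy/Sxx and intercept = mean(y) minus slope times mean(x). The function name and the data are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing squared vertical errors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```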
Multiple Linear Regression Analysis (MLRA)
One dependent and multiple independent variables
Predictive?
Problems
– Variables normally distributed
– Equal variances
– True independence between independent variables
Hardy Weinberg Equilibrium
Alleles in populations tend towards H-W equilibrium
Answers the questions:
– How can O be the most common of the blood types if it is a recessive trait?
– If Huntington's disease is a dominant trait, shouldn't three-fourths of the population have Huntington's while one-fourth have the normal phenotype?
– Shouldn't recessive traits gradually be swamped out so they disappear from the population?
Hardy Weinberg Cont:
Hardy Weinberg equilibrium is achieved if:
– There is a large population
– There is random mating
– No selection for a particular allele
– No mutations
– No migration or isolation
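Under these conditions, genotype frequencies follow p² + 2pq + q² = 1 and allele frequencies stay the same from one generation to the next. The sketch below assumes one gene with two alleles; the allele frequency 1/10 is hypothetical, and the function names are illustrative.

```python
from fractions import Fraction

def genotype_frequencies(p):
    """Return (AA, Aa, aa) frequencies for dominant-allele frequency p."""
    q = 1 - p
    return p * p, 2 * p * q, q * q

def next_generation_p(p):
    """Frequency of allele A among offspring under random mating."""
    dom_hom, het, _ = genotype_frequencies(p)
    return dom_hom + het / 2  # each heterozygote passes on A half the time

p = Fraction(1, 10)  # HYPOTHETICAL frequency of a dominant allele
dom_hom, het, rec_hom = genotype_frequencies(p)
print(dom_hom, het, rec_hom)    # 1/100 9/50 81/100
print(dom_hom + het)            # 19/100 show the dominant phenotype, not 3/4
print(next_generation_p(p))     # 1/10, unchanged: the allele is not swamped out
```

The 3:1 ratio applies only to a cross between two heterozygotes; in a population, the share of each phenotype depends on the allele frequencies, which is why a recessive trait can be the most common one.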
Trend Analysis
Autocorrelation
Predictive with assumptions
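Autocorrelation measures how strongly a series resembles a copy of itself shifted by some lag. The sketch below uses one common sample estimator (lagged cross-products about the overall mean, divided by the total sum of squares); the slides do not specify a formula, so this choice is an assumption, and both series are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (0 <= lag < len)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t + lag] - mean)
                for t in range(n - lag))
    return numer / denom

trend = [1, 2, 3, 4, 5, 6, 7, 8]            # hypothetical steady upward trend
alternating = [1, -1, 1, -1, 1, -1, 1, -1]  # hypothetical zig-zag

print(autocorrelation(trend, 1))        # positive: neighbors are alike
print(autocorrelation(alternating, 1))  # negative: neighbors are opposite
```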
Discussion Questions
What should you expect in a paper (epidemiology) that uses statistics?
Why not just compare means of samples?
Can we always assume statistical assumptions to be correct?
When should a correlation be used?
How about a linear regression? MLRA? Binomial? Hardy Weinberg? Chi-Square?