Biostatistics in biology. Why we use biostatistics in biology.
Advanced Biostatistics - Simplified
1
Advanced Biostatistics - Simplified
PREPARED & PRESENTED BY:
DR. M. ALHEFZI
DR. B. ALHEJAILI
DR. N. ALOTAIBI
DR. M. ALGOTHAMI
DR. A. KHALAWI
DR. S. ALGHAMDI
SBCM | R1 | Taif
2
WHY BIOSTAT ?!
Collection
Summarization
Analysis – inference.
Interpretation of the results
Abhaya Indrayan (2012). Medical Biostatistics. CRC Press. ISBN 978-1-4398-8414-0. (QR-code above).
3
Philosophy behind Hypothesis
What is a hypothesis?
CHANCE?!
Mill’s Canons / Methods – Agreement, Difference, Concomitant Variation, Residues
4
Am I right or wrong?!
Is it the truth?!
5
SIGNIFICANCE
• BIAS?
• CONFOUNDING?
• CHANCE?
• CAUSE / EFFECT?
• GENERALIZABILITY!
6
My Hypothesis: Ha
TEST!
7
8
In other words …
Hypothesis → Test Hypothesis → Measure → Assoc. → Sig. → Reject or FTR (fail to reject).
9
So, what language do we speak in biostat?
MATH?
MEAN, MEDIAN, MODE, RANGE …
AREA UNDER THE CURVE, VARIANCE, SD …
MEDICINE?
EXPOSURE, DISEASE, OUTCOME, EFFECTIVENESS, PREVENTION
RELATIVE RISK, ABSOLUTE RISK
10
Biostatisticians’ language
MEAN (μ).
MEDIAN.
MODE.
AREA UNDER THE CURVE: Variance.
SD (σ).
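The measures listed on this slide can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up data: the blood-pressure values are hypothetical, and the sample (n - 1) formulas are used for variance and SD; `statistics.pvariance` and `statistics.pstdev` would give the population versions (σ², σ).

```python
import statistics

# Hypothetical sample: systolic blood pressure (mmHg) of nine patients.
sbp = [118, 120, 120, 124, 126, 130, 132, 138, 144]

mean = statistics.mean(sbp)          # arithmetic average
median = statistics.median(sbp)      # middle value of the sorted data
mode = statistics.mode(sbp)          # most frequent value
value_range = max(sbp) - min(sbp)    # distance between the extremes
variance = statistics.variance(sbp)  # sample variance (n - 1 denominator)
sd = statistics.stdev(sbp)           # sample SD = square root of the variance

print("mean:", mean, "median:", median, "mode:", mode, "range:", value_range)
print("variance:", variance, "SD:", round(sd, 2))
```

The SD is reported in the same units as the data (mmHg here), which is why it is usually quoted instead of the variance.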
11
Biostatisticians’ language
Standard Deviation (SD)
12
Photo courtesy of Judy Davidson, DNP, RN
13
WE MAKE MISTAKES!
IN ORDER TO AVOID THEM, WE NEED TO SET RANGES FOR CHANCE, ALSO SET OUR CRITICAL LIMITS. TO END UP WITH A MASTERPIECE OF EVIDENCE!
H0
p-value vs. α level
CI *
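How H0, the p-value, the α level and the CI fit together can be sketched with a one-sample z-test. All numbers are hypothetical, and the population SD is assumed known so that the normal distribution applies.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setting: H0 says the true mean is mu0; sigma is assumed known.
mu0, sigma = 120.0, 15.0
sample_mean, n = 126.0, 36
alpha = 0.05

se = sigma / sqrt(n)                          # standard error of the mean
z = (sample_mean - mu0) / se                  # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

# 95% confidence interval around the sample mean.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)

reject_h0 = p_value < alpha
print("z =", round(z, 2), "p =", round(p_value, 4), "reject H0:", reject_h0)
print("95% CI:", tuple(round(b, 1) for b in ci))
```

The two views agree: H0 is rejected at α = .05 exactly when mu0 falls outside the 95% CI.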
14
15
Test Hypothesis
16
Test Hypothesis
ASSUMPTIONS.
STEPS.
TESTS.
17
Test Hypothesis
ASSUMPTIONS – differ for each test.
LARGE SAMPLE SIZE.
NORMAL (GAUSSIAN) DISTRIBUTION.
HOMOGENEITY.
NO MULTICOLLINEARITY.
KNOWN ( μ & σ ).
INDEPENDENCE.
18
Test Hypothesis
STEPS – 7 steps of hypothesis testing.
1) RQ ?
2) H0 & H1
3) TEST & ASSUMPTIONS.
4) α LEVEL, P-VALUE.
5) TEST STATISTIC (DF).
6) DECISION.
7) CONCLUSION (YES/NO).
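The seven steps can be walked through in code. This sketch uses made-up data and an independent-samples Student's t-test. Because the Python standard library has no t distribution, the decision compares the statistic with a critical value copied from a standard t table instead of computing a p-value.

```python
from math import sqrt
from statistics import mean, variance

# Step 1, research question (hypothetical): does the drug change systolic BP
# compared with placebo?
drug = [122, 118, 125, 120, 117, 124]
placebo = [130, 128, 135, 126, 132, 129]

# Step 2: H0: mean(drug) == mean(placebo); H1: the means differ (two-sided).
# Step 3: independent-samples t-test; assumes independent groups, roughly
#         normal data and similar variances in the two groups.
# Step 4: alpha level.
alpha = 0.05

# Step 5: test statistic and degrees of freedom (pooled variance).
n1, n2 = len(drug), len(placebo)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * variance(drug) + (n2 - 1) * variance(placebo)) / df
t = (mean(drug) - mean(placebo)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Step 6: decision. 2.228 is the two-sided critical t for df = 10 at
#         alpha = 0.05, taken from a t table.
t_crit = 2.228
reject_h0 = abs(t) >= t_crit

# Step 7: conclusion.
print("t =", round(t, 2), "df =", df)
print("Reject H0" if reject_h0 else "Fail to reject H0 (FTR)")
```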
19
Test Hypothesis
TEST STATISTICS
20
Dependency Concept
Input: Indep. VA (Exposure)
Output: Dep. VA (Outcome, Disease)
Each member in this group is exclusively linked to it.
Output changes whenever input does so.
21
Data Analysis
• Univariate: summarizing percentages, averages…
• Bivariate: 2 VA
• Multivariate: control confounding
• Randomization.
• Restriction.
• Matching.
• Stratification.
22
Statistical Tests
Parametric Tests
Student’s t-test
Paired Samples t-test
ANOVA
Correlation
Regression
23
Statistical Tests
Non-Parametric Tests
Chi-Square (χ²)
Wilcoxon
Mann-Whitney (U Test)
Kruskal Wallis
Logistic Regression
24
Choosing a Bivariate test
                               Dependent VA (outcome, output)
Indep. VA (input, exposure)    2 Cat.     >2 Cat.    Continuous
2 Cat.                         χ²         χ²         t-test
>2 Cat.                        χ²         χ²         ANOVA
Continuous                     t-test     ANOVA      Correlation, Linear Regression
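The table on this slide can be read as a simple lookup from the pair (independent VA type, dependent VA type) to a test. A minimal sketch follows; the key strings and the function name `choose_bivariate_test` are made up for illustration.

```python
# Lookup built from the slide's table.
# Key: (independent VA type, dependent VA type).
BIVARIATE_TEST = {
    ("2 cat", "2 cat"): "chi-square",
    ("2 cat", ">2 cat"): "chi-square",
    ("2 cat", "continuous"): "t-test",
    (">2 cat", "2 cat"): "chi-square",
    (">2 cat", ">2 cat"): "chi-square",
    (">2 cat", "continuous"): "ANOVA",
    ("continuous", "2 cat"): "t-test",
    ("continuous", ">2 cat"): "ANOVA",
    ("continuous", "continuous"): "correlation / linear regression",
}


def choose_bivariate_test(independent: str, dependent: str) -> str:
    """Return the test the table suggests for a pair of variable types."""
    try:
        return BIVARIATE_TEST[(independent, dependent)]
    except KeyError:
        raise ValueError(
            f"unknown variable types: {independent!r}, {dependent!r}"
        ) from None


# Example: exposure with two categories (smoker / non-smoker), continuous outcome.
print(choose_bivariate_test("2 cat", "continuous"))
```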
25
Continuous Data
Comparing 2 Gps: t-test
Comparing >2 Gps: ANOVA
Assoc. 2 Gps: Pearson Correlation
Prediction: Regression
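For the "comparing >2 groups" case, the one-way ANOVA F statistic is the ratio of between-group to within-group variability. The sketch below computes it by hand on hypothetical data; the critical value is copied from an F table because the standard library has no F distribution.

```python
from statistics import mean

# Hypothetical data: fasting glucose (mg/dL) under three diets.
groups = [
    [90, 92, 94, 96],
    [95, 97, 99, 101],
    [100, 102, 104, 106],
]

all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)
k, n_total = len(groups), len(all_values)

# Between-group sum of squares: group means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: observations around their own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between, df_within = k - 1, n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

# 4.26 is the critical F for (2, 9) degrees of freedom at alpha = 0.05,
# taken from an F table.
f_crit = 4.26
print("F =", round(f_stat, 2), "significant:", f_stat >= f_crit)
```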
26
Ordinal Data
Comparing 2 Gps: Mann-Whitney (U) test; Wilcoxon (Pre-Post).
Comparing >2 Gps: Kruskal Wallis
Assoc. 2 Gps: Spearman’s ρ
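Rank-based tests replace the raw scores with their ranks. As a hedged illustration, the Mann-Whitney U statistic can be computed by hand as below. The data are hypothetical, the helper names are made up, and judging significance would still require a U table or a statistics package.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return the smaller of the two U statistics for samples a and b."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


# Hypothetical pain scores (0-10 ordinal scale) in two independent groups.
treated = [2, 3, 3, 4, 5]
control = [4, 6, 6, 7, 8]
u = mann_whitney_u(treated, control)
print("U =", u)
```

A small U means the two groups' ranks barely overlap.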
27
Categorical Data
Test of frequency (χ²)
How often something is observed (AKA: Goodness of Fit Test, Test of Homogeneity)
Examples:
- Do negative ads change how people vote?
- Is there a relationship between marital status and health insurance coverage?
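The second example can be sketched as a χ² test on a 2×2 table. The counts are invented, and no continuity (Yates) correction is applied. For df = 1 the upper-tail probability of χ² has a closed form, so no statistics package is needed.

```python
from math import erfc, sqrt

# Hypothetical 2x2 table.
# Rows: married / not married. Columns: insured / uninsured.
observed = [[60, 40],
            [40, 60]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the two variables were unrelated.
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
# For df = 1, P(chi-square > x) equals erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))

print("chi2 =", round(chi2, 2), "df =", df, "p =", round(p_value, 4))
```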
28
Choosing the Best Statistical Test (by @alhefzi)
PMT = parametric test; NPMT = non-parametric test

Variables                        PMT                            NPMT
Comparing the difference between groups
Cat. VA (2), Cont. VA            Independent sample (t-test)    Mann-Whitney (U test)
Cont. Dep. VA, same group        Paired Sample (t-test)         Wilcoxon
Cat. VA (>2), Cont. VA           One Way ANOVA                  Kruskal Wallis
Association / Strength of Relationship
Cont. VA, Cont. VA               Pearson (r)                    Spearman’s ρ
Cat. VA, Cat. VA                 Chi-Square (χ²), McNemar
Prediction
Cont. VA, Cont. or Cat.          SLR (Bivariate)
Cont. VA, Cont. + Other VAs      MLR
Cat. VA, >1 Other VAs            Logistic Regression
29
30
31
Considerations
Normal Distribution & Sample Size. Large sample size ().
Shape by inspection.
Otherwise, run a Kolmogorov-Smirnov test to check normality.
If NPMT with Large sample size () less powerful than a PMT.
Gaussian Distribution ().
NPMT with Gaussian distribution, “small” sample size (). (small, Non-Gaussian) ( p-value).
PMT with Non-Gaussian distribution () CLT.
PMT with Non-Gaussian distribution, “small” sample size () CLT won’t work, inaccurate p-value.
32
Considerations
1 or 2 sided p-value
H0 ().
Based on: equal population means. Otherwise, any discrepancy is due to chance!!
Question: WHICH p-value is larger and why? (1 or 2 sided)?
i.e. when formulating your Ha; consider “larger” critical p-value accordingly!
Go for 1 sided (if)
You have formulated a “directional” hypothesis.
Set it BEFORE data collection. Otherwise, you will have to attribute the difference to chance.
Go for 2 sided (if)
Unsure or in doubt of your hypothesis direction.
Set it BEFORE data collection. Otherwise, you will have to attribute the difference to chance.
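On the question of which p-value is larger: for a symmetric test statistic that falls in the hypothesized direction, the 2-sided p-value is twice the 1-sided one, because it counts both tails. A minimal sketch with a hypothetical z statistic:

```python
from statistics import NormalDist

z = 1.80  # hypothetical test statistic, in the direction predicted by Ha
alpha = 0.05

p_one_sided = 1 - NormalDist().cdf(z)             # area in one tail
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails

print("1-sided p =", round(p_one_sided, 4), "significant:", p_one_sided < alpha)
print("2-sided p =", round(p_two_sided, 4), "significant:", p_two_sided < alpha)
```

With these numbers the same data are significant 1-sided but not 2-sided, which is why the direction has to be fixed before data collection.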
33
2-tailed test
Biostatisticians’ language
The critical value is the number that separates the “blue zone” from the middle (± 1.96 in this example).
In a t-test, in order to be statistically significant the t score needs to be in the “dark-blue zone”.
If α = .05, then 2.5% of the area is in each tail
34
1-tailed test
Biostatisticians’ language
The critical value is either + or -, but not both.
e.g. in a t-test, you would have statistical significance (p < .05) if t ≥ 1.645.
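The critical values 1.96 (2-tailed) and 1.645 (1-tailed) are quantiles of the standard normal distribution. A t distribution approaches them as the degrees of freedom grow, and for small samples the critical t is larger. A short sketch recovering both values for α = .05:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, SD 1

# 2-tailed: alpha is split, 2.5% in each tail.
two_tailed = std_normal.inv_cdf(1 - alpha / 2)
# 1-tailed: the whole 5% sits in one tail.
one_tailed = std_normal.inv_cdf(1 - alpha)

print("2-tailed critical value: ±", round(two_tailed, 3))
print("1-tailed critical value:", round(one_tailed, 3))
```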
35
Chi-Square (χ²) – as an example
Biostatisticians’ language
Any number squared is a positive number.
Therefore, area under the curve starts at 0 and goes to infinity (∞).
To be statistically significant, the χ² statistic needs to be in the upper 5% of the area (α = .05).
Compares observed frequency to what we expected.
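Comparing observed with expected frequencies can be sketched as a goodness-of-fit calculation. The counts are hypothetical. With three categories df = 2, and for df = 2 the upper-tail area of χ² has the closed form exp(-x / 2).

```python
from math import exp

# Hypothetical counts in three categories: what was seen vs. what H0 predicts.
observed = [50, 30, 20]
expected = [40, 40, 20]

# Each term is squared, so the statistic can never be negative.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# For df = 2, P(chi-square > x) equals exp(-x / 2).
p_value = exp(-chi2 / 2)
significant = p_value < 0.05  # is the statistic in the upper 5% of the area?

print("chi2 =", chi2, "df =", df, "p =", round(p_value, 4), "significant:", significant)
```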
Published on STAT 100 - Statistical Concepts and Reasoning (QR-code above)
36
Considerations
Regression or Correlation
                             Correlation       Regression
Cause-effect relationship                      X & Y are important to be set. Swapping X & Y in the curve gives different results.
In Gaussian distribution     Pearson           SLR, MLR
NPMT                         Spearman’s rho    Logistic Regression
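The point about swapping X & Y can be checked numerically: Pearson's r is the same either way, while the regression slope changes. A minimal sketch on hypothetical data, with made-up helper names:

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient; symmetric in x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def slope(x, y):
    """Least-squares slope of y regressed on x (x is the predictor)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: age (years) and systolic blood pressure (mmHg).
age = [30, 40, 50, 60, 70]
sbp = [118, 125, 128, 138, 141]

print("r(age, sbp) =", round(pearson_r(age, sbp), 4))
print("r(sbp, age) =", round(pearson_r(sbp, age), 4))
print("slope of sbp on age =", round(slope(age, sbp), 3))
print("slope of age on sbp =", round(slope(sbp, age), 3))
```

The two slopes differ, and their product equals r², so regression needs a decision about which variable is the predictor while correlation does not.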
37
End of Part I
Thank you… QUESTIONS?
@alhefzi