Statistical Testing I - uni-kiel.de
![Page 1: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/1.jpg)
De gustibus non est disputandum
Statistical Testing I
![Page 2: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/2.jpg)
The Pepsi Challenge
"Take the Pepsi Challenge" was the motto of a marketing campaign by the Pepsi-Cola Company in the 1980s. A total of 100 Coca-Cola drinkers were asked to blindly taste unmarked cups of Diet Pepsi and Diet Coke, and to select their favorite. A subsequent Pepsi TV commercial stated:
"... in recent blind taste tests, more than half of all Diet Coke drinkers surveyed said they preferred the taste of Diet Pepsi".
Assume that, out of the 100 Diet Coke drinkers, 56 preferred Diet Pepsi. Would this result support the claim that more than half of all Diet Coke drinkers prefer Diet Pepsi to Diet Coke?
![Page 3: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/3.jpg)
"Scientific Method"
"The validity of knowledge is tied to the probability of falsification."
"Scientific propositions can be falsified empirically. On the other hand, unscientific claims are always 'right' and cannot be falsified at all."
Karl Popper (1902-1994)
![Page 4: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/4.jpg)
Statistical Testing: New Knowledge Through Falsification
[Diagram: current knowledge (H0) leads, through falsification, to new knowledge (HA).]
![Page 5: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/5.jpg)
Decision Making
- Scientific questions are often formulated in the form of mutually exclusive hypotheses (i.e. H0 versus HA) about one or more population parameters.
- A statistical test is a decision rule that allows a researcher to either reject H0 ("statistically significant result") or maintain H0 on the basis of sample data.
![Page 6: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/6.jpg)
Statistical Testing: Null Hypothesis
The null hypothesis usually implies the opposite of what a researcher expects (or wishes) to be true. It often represents conservatism or common opinion.
H0: The expected diastolic blood pressure of patients with a particular disease equals that of control individuals.
![Page 7: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/7.jpg)
Statistical Testing: Alternative Hypothesis
The alternative hypothesis usually implies what a researcher expects (or wishes) to be true.
The alternative hypothesis is regarded as established when the null hypothesis is rejected.
HA: The expected diastolic blood pressure of patients with a particular disease differs from that of control individuals.
![Page 8: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/8.jpg)
Blood Pressure and Myocardial Infarction
A study was carried out to assess whether the expected diastolic blood pressure (DBP) of patients with myocardial infarction (MI) differs from the expected DBP of control individuals, namely 80 mmHg.
H0: µ=µ0 HA: µ≠µ0
![Page 9: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/9.jpg)
Statistical Testing: Procedure
- All information from the sample data is collapsed in a single numerical quantity, called the test statistic (T).
- The maintenance region of the test comprises all values of T for which H0 is maintained.
- The rejection region comprises all values of T for which H0 is rejected.
- The maintenance and rejection regions are demarcated by the critical values.
![Page 10: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/10.jpg)
Statistical Testing: Procedure
[Figure: distribution of T under H0, with a central maintenance region flanked by two rejection regions that are separated from it by the critical values. If T lies in the maintenance region, maintain H0; if T lies in a rejection region, reject H0.]
![Page 11: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/11.jpg)
Statistical Testing: Possible Errors

| decision | truth: H0 | truth: HA |
|---|---|---|
| maintain H0 | correct | type II error |
| reject H0 | type I error | correct |

A type I error is made when H0 is rejected although it is true.
A type II error is made when H0 is maintained although it is wrong.
![Page 12: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/12.jpg)
Statistical Testing: Significance Level
- A statistical test has significance level α if the probability of making a type I error is at most α.
- Before data collection, the critical values of a test are chosen such that the test has a pre-specified significance level (e.g. 0.05).
- The choice of critical values depends upon the pre-specified significance level and the nature of H0, but not the nature of HA.
![Page 13: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/13.jpg)
Blood Pressure and Myocardial Infarction
H0: µ=µ0   HA: µ≠µ0
The significance level of a test of H0 versus HA limits the probability of erroneously claiming a difference between the expected DBP of MI patients and a reference value.
![Page 14: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/14.jpg)
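Statistical Testing: Critical Values
[Figure: distribution of T under H0 with critical values $c_{\alpha/2}$ and $c_{1-\alpha/2}$; each tail beyond a critical value has area α/2.]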
![Page 15: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/15.jpg)
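One-sample t-Test: Procedure
- Random Variable: X∼N(µ,σ²), both parameters unknown
- Hypotheses: H0: µ=µ0   HA: µ≠µ0
- Test Statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
- Rejection Region: $T \le t_{\alpha/2,\,n-1}$ or $T \ge t_{1-\alpha/2,\,n-1} = -t_{\alpha/2,\,n-1}$, where ν = n-1 is the number of 'degrees of freedom'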
![Page 16: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/16.jpg)
Blood Pressure and Myocardial Infarction
A study was carried out to assess whether the expected diastolic blood pressure (DBP) of patients with myocardial infarction (MI) differs from the expected DBP of control individuals, namely 80 mmHg. The following DBP values were observed in 9 patients with MI:
92, 87, 79, 87, 99, 82, 74, 83, 103
$\bar{x} = 87.33$ mmHg, $s = 9.34$ mmHg
$t = 2.354 \ge t_{0.975,\,8} = 2.306$, so H0 is rejected at α=0.05.
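The worked example above can be retraced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_sample_t` is an illustrative choice, and the critical value 2.306 is copied from the slide rather than computed.

```python
import math

# DBP values (mmHg) of the 9 MI patients, as listed on the slide
dbp = [92, 87, 79, 87, 99, 82, 74, 83, 103]
mu0 = 80.0      # reference DBP of control individuals (mmHg)
t_crit = 2.306  # t_{0.975, 8}, copied from the slide (alpha = 0.05, two-sided)

def one_sample_t(values, mu0):
    """Return (sample mean, sample standard deviation, t statistic)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / (n - 1)  # unbiased variance
    sd = math.sqrt(var)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return mean, sd, t

mean, sd, t = one_sample_t(dbp, mu0)
reject_h0 = abs(t) >= t_crit  # two-sided rejection region
print(f"mean = {mean:.2f} mmHg, s = {sd:.2f} mmHg, T = {t:.3f}, reject H0: {reject_h0}")
```

With unrounded intermediate values T comes out near 2.355; the 2.354 on the slide appears to result from inserting the rounded mean and standard deviation. The decision to reject H0 is the same either way.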
![Page 17: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/17.jpg)
t-Distribution: Quantiles
[Table of t-distribution quantiles; values not preserved in this transcript.]
![Page 18: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/18.jpg)
Statistical Testing: Power
- The probability of making a type II error (i.e. to adhere to H0 if, in fact, HA is true) is designated as β.
- The converse probability 1-β, i.e. the probability of avoiding a type II error, is called the power of a test.
- The power of a statistical test depends upon the nature of HA, but not the nature of H0.
![Page 19: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/19.jpg)
Statistical Testing: Error Probabilities

| decision | truth: H0 | truth: HA |
|---|---|---|
| maintain H0 | ≥1-α | β |
| reject H0 | ≤α | 1-β |
![Page 20: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/20.jpg)
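Statistical Testing: Critical Values
[Figure: distributions of T under H0 and under HA, with critical values $c_{\alpha/2}$ and $c_{1-\alpha/2}$. Each tail of the H0 distribution beyond a critical value has area α/2; β is the area of the HA distribution between the critical values.]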
![Page 21: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/21.jpg)
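Blood Pressure and Myocardial Infarction
H0: µ=80   HA: µ≠80   (σ=10 mmHg, α=0.05)

| µ | Pµ(T≤-2.306 or T≥2.306) | interpretation |
|---|---|---|
| 80 | 0.050 | α (H0 true) |
| 81 (79) | 0.058 | 1-β (HA true) |
| 85 (75) | 0.262 | 1-β (HA true) |
| 90 (70) | 0.748 | 1-β (HA true) |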
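The rejection probabilities listed for this example can be approximated by simulation. The sketch below assumes n = 9 observations, as in the earlier worked example (the slide itself does not restate n), and draws normally distributed samples with σ = 10 mmHg; the function name `simulated_power`, the number of repetitions and the seed are arbitrary illustrative choices.

```python
import math
import random

def simulated_power(mu, mu0=80.0, sigma=10.0, n=9, t_crit=2.306,
                    reps=20000, seed=1):
    """Estimate P_mu(|T| >= t_crit) for the one-sample t-test by simulation."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t = (mean - mu0) / (sd / math.sqrt(n))
        if abs(t) >= t_crit:  # T falls into the two-sided rejection region
            rejections += 1
    return rejections / reps

for mu in (80, 81, 85, 90):
    print(mu, round(simulated_power(mu), 3))
```

Because the estimates are based on random draws, they should only agree with the tabulated values up to simulation error (roughly ±0.01 with 20000 repetitions).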
![Page 22: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/22.jpg)
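Statistical Testing: Effect Size and Power
[Figure: distributions of T under H0 and under HA, with critical values $c_{\alpha/2}$ and $c_{1-\alpha/2}$, tail areas α/2 and type II error probability β.]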
![Page 23: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/23.jpg)
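Statistical Testing: Significance and Power
[Figure: distributions of T under H0 and under HA for a changed significance level α', with critical values $c_{\alpha'/2}$ and $c_{1-\alpha'/2}$, tail areas α'/2 and the resulting type II error probability β'.]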
![Page 24: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/24.jpg)
t-Distribution: Quantiles
[Table of t-distribution quantiles; values not preserved in this transcript.]
![Page 25: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/25.jpg)
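Blood Pressure and Myocardial Infarction
H0: µ=80   HA: µ≠80   (σ=10 mmHg)

| µ | α=0.05: Pµ(T≤-2.306 or T≥2.306) | α=0.02: Pµ(T≤-2.896 or T≥2.896) | interpretation |
|---|---|---|---|
| 80 | 0.050 | 0.020 | α (H0 true) |
| 81 (79) | 0.058 | 0.024 | 1-β (HA true) |
| 85 (75) | 0.262 | 0.143 | 1-β (HA true) |
| 90 (70) | 0.748 | 0.566 | 1-β (HA true) |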
![Page 26: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/26.jpg)
Alternative Hypotheses: Two-Sided
A two-sided alternative hypothesis does not specify a direction of the expected findings and usually
- reflects a lack of prior knowledge about realistic alternatives to the null hypothesis
- reads "is different from" or "deviates from"
HA: The expected diastolic blood pressure of patients with a particular disease differs from that of control individuals.
![Page 27: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/27.jpg)
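Alternative Hypotheses: Two-Sided
[Figure: distribution of T under H0 with critical values $c_{\alpha/2}$ and $c_{1-\alpha/2}$ and tail areas α/2 on both sides. The alternative may lie on either side of H0 (one HA distribution is drawn, a second is marked "HA (?)"); β is the type II error probability.]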
![Page 28: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/28.jpg)
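Alternative Hypotheses: One-Sided
[Figure: distributions of T under H0 and under HA with a single critical value $c_{1-\alpha}$; the upper tail of the H0 distribution has area α, and β is the area of the HA distribution below the critical value.]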
![Page 29: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/29.jpg)
Alternative Hypotheses: One-Sided
A one-sided alternative hypothesis specifies the direction of the expected findings and usually
- reflects common sense or suitable knowledge from previous scientific experiments
- reads "is larger than", "is heavier than" or "is longer than"
HA: The expected diastolic blood pressure of patients with a particular disease exceeds that of control individuals.
![Page 30: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/30.jpg)
Clinical Studies
In a clinical study, researchers often wish to compare the respective probability of therapeutic success between a new medication (πM) and placebo (πP).
H0: πM≤πP   HA: πM>πP
- significance level: upper limit for the probability of declaring a useless medication effective
- power: probability of recognising an effective medication as effective
![Page 31: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/31.jpg)
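One-sample t-Test: One-Sided
- Random Variable: X∼N(µ,σ²), both parameters unknown
- Hypotheses: H0: µ≥µ0, HA: µ<µ0   or   H0: µ≤µ0, HA: µ>µ0
- Test Statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
- Rejection Region: $T \le t_{\alpha,\,n-1}$ (for HA: µ<µ0)   or   $T \ge t_{1-\alpha,\,n-1}$ (for HA: µ>µ0)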
![Page 32: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/32.jpg)
t-Distribution: Quantiles
[Table of t-distribution quantiles; values not preserved in this transcript.]
![Page 33: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/33.jpg)
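Blood Pressure and Myocardial Infarction
H0: µ≤80   HA: µ>80   (σ=10 mmHg, α=0.05)

| µ | one-sided: Pµ(T≥1.860) | two-sided: Pµ(T≤-2.306 or T≥2.306) | interpretation |
|---|---|---|---|
| 80 | 0.050 | | α (H0 true) |
| 75 | 0.005 | | H0 true |
| 85 | 0.392 | 0.262 | 1-β (HA true) |
| 90 | 0.862 | 0.748 | 1-β (HA true) |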
![Page 34: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/34.jpg)
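One-sample t-Test: Sample Size
Which sample size, n, is required to detect, at significance level α, a given effect µ-µ0 with power 1-β?
- one-sided: $n \ge \left( \sigma \cdot \dfrac{z_{1-\alpha} + z_{1-\beta}}{\mu - \mu_0} \right)^2$
- two-sided: $n \ge \left( \sigma \cdot \dfrac{z_{1-\alpha/2} + z_{1-\beta}}{\mu - \mu_0} \right)^2$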
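The sample size bound can be evaluated directly, reading $z_q$ as the q-quantile of the standard normal distribution. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the function name `required_n` and the example inputs (σ = 10, α = 0.05, power 0.80, effect µ-µ0 = 5) are illustrative choices.

```python
import math
from statistics import NormalDist

def required_n(delta, sigma, alpha=0.05, power=0.80, two_sided=False):
    """Smallest integer n with n >= (sigma * (z_alpha + z_beta) / delta)^2."""
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)         # z_{1-beta}, with power = 1 - beta
    return math.ceil((sigma * (z_alpha + z_beta) / delta) ** 2)

n_one = required_n(delta=5, sigma=10)
n_two = required_n(delta=5, sigma=10, two_sided=True)
print(n_one, n_two)
```

For these inputs the bound gives 25 observations in the one-sided and 32 in the two-sided case. Since the formula is based on normal quantiles, it should be read as an approximation for the t-test, particularly when n is small.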
![Page 35: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/35.jpg)
One-sample t-Test: Sample Size (one-sided)
[Figure: required sample size n (logarithmic axis, 10 to 1000) against the effect µ-µ0 (1 to 5) for σ=10, α=0.05 and power 1-β = 0.90, 0.80, 0.70.]
![Page 36: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/36.jpg)
One-sample t-Test: Sample Size (two-sided)
[Figure: required sample size n (logarithmic axis, 10 to 1000) against the effect µ-µ0 (1 to 5) for σ=10, α=0.05 and power 1-β = 0.90, 0.80, 0.70.]
![Page 37: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/37.jpg)
The Pepsi Challenge
H0: Pepsi does not taste better than Coke (π≤0.5).
HA: Pepsi tastes better than Coke (π>0.5).
$P(T \ge 59) = \sum_{i=59}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.044$
$P(T \ge 58) = \sum_{i=58}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.067$
$c_{0.05} = 59$
Conclusion: The number of Diet Coke drinkers who preferred Diet Pepsi (i.e. 56) was not significantly higher than the number who preferred Diet Coke (i.e. 44).
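The critical value for the Pepsi Challenge can be found by summing binomial probabilities under H0 (π = 0.5, n = 100). The sketch below uses `math.comb` from the standard library; the helper names `upper_tail` and `critical_value` are illustrative.

```python
from math import comb

def upper_tail(k, n=100, p=0.5):
    """P(T >= k) for T ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(alpha=0.05, n=100, p=0.5):
    """Smallest c with P(T >= c) <= alpha under H0."""
    for c in range(n + 1):
        if upper_tail(c, n, p) <= alpha:
            return c
    return n + 1  # no value of T would lead to rejection at this level

c = critical_value()
print(c, round(upper_tail(59), 3), round(upper_tail(58), 3))
```

Since 56 lies below the critical value, the observed count falls into the maintenance region, which matches the conclusion on the slide.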
![Page 38: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/38.jpg)
Statistics and Truth
"No test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis."
Neyman J, Pearson E (1933) Phil Trans R Soc A, 231:289-337
Jerzy Neyman (1894-1981)
Egon Pearson (1895-1980)
![Page 39: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/39.jpg)
Statistics and Truth
"It would, therefore, add greatly to the clarity with which the tests of significance are regarded if it were generally understood that the tests of significance, when used accurately, are capable of rejecting or invalidating hypotheses, in so far as they are contradicted by the data: but that they are never capable of establishing them as certainly true."
Ronald A. Fisher (1890-1962)
![Page 40: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/40.jpg)
p Value
[Figure: distribution of T under H0; the p value is the tail area beyond the observed value tobs.]
The p value is the probability, when the null hypothesis is correct, of obtaining the observed value tobs of T or an even less probable one.
![Page 41: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/41.jpg)
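p Value: Evidence Against H0
[Figure: scale of p values (1.0, 0.1, 0.01, 0.001, 0.0001) set against the strength of evidence against H0, labelled none, "moderate", "strong" and "very strong" as the p value decreases.]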
![Page 42: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/42.jpg)
Blood Pressure and Myocardial Infarction
- H0: µ=80   HA: µ≠80:   p = P(|T|>2.354) = 0.046
- H0: µ≤80   HA: µ>80:   p = P(T>2.354) = 0.023
The Pepsi Challenge
- H0: π≤0.5   HA: π>0.5:   $p = P(X \ge 56) = \sum_{i=56}^{100} \binom{100}{i} \cdot 0.5^{i} \cdot 0.5^{100-i} = 0.1356$
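The p values of both examples can be checked numerically. The sketch below integrates the density of the t-distribution with 8 degrees of freedom by Simpson's rule, which is a stand-in for a statistics library and not necessarily how the slide values were obtained, and sums binomial probabilities for the Pepsi Challenge. The helper names `t_pdf` and `t_upper_tail` are illustrative.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t_obs, df, steps=2000):
    """P(T > t_obs) for t_obs >= 0, via Simpson's rule on [0, t_obs]."""
    h = t_obs / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t_obs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # the density is symmetric about 0

# Blood pressure example: t_obs = 2.354 with n - 1 = 8 degrees of freedom
p_one_sided = t_upper_tail(2.354, df=8)
p_two_sided = 2 * p_one_sided

# Pepsi Challenge: P(X >= 56) for X ~ Binomial(100, 0.5)
p_pepsi = sum(math.comb(100, i) for i in range(56, 101)) / 2**100

print(round(p_one_sided, 3), round(p_two_sided, 3), round(p_pepsi, 4))
```

The two-sided p value is twice the one-sided one because the t-distribution is symmetric about zero.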
![Page 43: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/43.jpg)
Pravastatin and Cardiovascular Disease

| major cardiovascular outcome | placebo (n=2078) | Pravastatin (n=2081) | p |
|---|---|---|---|
| non-fatal MI or death from CHD | 0.132 | 0.102 | 0.003 |
| CABG or PTCA | 0.188 | 0.141 | <0.001 |
| Stroke | 0.038 | 0.026 | 0.030 |

CABG: coronary artery bypass grafting, PTCA: percutaneous transluminal coronary angioplasty
Sacks FM et al. (1996) N Engl J Med 335: 1001–1009
![Page 44: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/44.jpg)
Negative Findings
Negative findings are as important as positive findings because they reduce ignorance and may suggest interesting new hypotheses and lines of investigation. They are also necessary to guide future research in the field of interest (publication bias).
![Page 45: Statistical Testing I - uni-kiel.de](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a279511a7b741a34f6541/html5/thumbnails/45.jpg)
Summary
- Statistical problems are usually defined as mutually exclusive hypotheses about population parameters.
- Statistical tests are decision rules to either maintain or reject a given null hypothesis on the basis of sample data.
- When performing a statistical test, two types of error can occur through falsely rejecting either the null hypothesis or the alternative hypothesis.
- The probability of making a type I error is limited by the significance level of the test; the probability of avoiding a type II error is called the power of the test.
- The p value is a measure of the discrepancy between the data and the null hypothesis.