intro2010.files.wordpress.com · Created Date: 3/18/2010 10:17:00 AM
University of Copenhagen, Department of Biostatistics
Faculty of Health Sciences
Introduction: Review of basic ideas
Analysis of Variance and Regression, 29th May 2010
Julie Lyng Forman
Department of Biostatistics, University of Copenhagen

Outline
About the course
Two-sample t-test
The paired t-test
When model assumptions fail
(Sample size calculations)
Aim of the course
To make the participants able to:
- understand and interpret statistical analyses
- judge the assumptions behind the use of various methods of analysis
- perform their own analyses using SAS
- understand output from a statistical program package in general, i.e. other than SAS
- present results from a statistical analysis, numerically and graphically

To create a better platform for communication between 'users' of statistics and statisticians, to benefit subsequent collaboration

We expect students to . . .
Be interested
Be motivated
- ideally from your own (future) research project
Have basic knowledge of statistical concepts such as:
- mean, average
- variance, standard deviation, standard error of the mean
- distribution
- correlation, regression
- t-test, χ²-test

Topics for the course
Quantitative data (normal distribution):
- Analysis of variance
- Variance component models
- Regression analysis
- General linear models
- Non-linear models
- Repeated measurements over time
Non-normal outcome:
- Binary data: logistic regression
- Ordinal data: ordinal regression
Not covered:
- Counts (Poisson regression)
- Censored data (survival analysis)

Recommended reading
In English:
- D.G. Altman: Practical statistics for medical research. Chapman and Hall.
- P. Armitage, G. Berry & J.N.S. Matthews: Statistical methods in medical research. Blackwell.
- R.P. Cody and J.K. Smith: Applied statistics and the SAS programming language. Prentice Hall.
In Danish:
- D. Kronborg and L.T. Skovgaard: Regressionsanalyse med anvendelser i lægevidenskabelig forskning. FADL.
- Aa. T. Andersen, T.V. Bedsted, M. Feilberg, R.B. Jakobsen and A. Milhøj: Elementær indføring i SAS + Statistik med SAS. Akademisk Forlag.

Teaching activities
Lectures:
- Tuesday and Thursday mornings (9.15–12.00)
- Copies of overheads have to be downloaded in advance
- Coffee break around 10.30
Computer labs:
- In the afternoon (13.00–15.45) following each lecture
- Coffee, tea, and fruit will be served
- Exercises will be handed out
- We use SAS programming
- Solutions can be downloaded after classes

Course diploma
To pass the course 80% attendance is required.
- It is your responsibility to sign the list each morning and afternoon
- Note: 8 × 2 = 16 lists, 80% equals 13 half days
There is no compulsory homework ...
- but to benefit from the course you need to work with the material at home
- We expect you to do so!
Example: Calcium supplement to adolescent girls
Study: 112 11-year-old girls were randomised to get either calcium supplement or placebo.
Outcome: BMD = bone mineral density (g/cm²) was measured 5 times over 2 years at 6-month intervals.

Increase in BMD by treatment
Increase in BMD after 2 years:

            Calcium   Placebo
N           44        47
Mean        0.1069    0.0879
Std.Dev     0.0321    0.0294
Std.Error   0.0048    0.0043

[Figure: boxplot of the increase in BMD by treatment group]

Summary statistics
Numerical description of quantitative variables:
Location, center
- average (mean value) $\bar{x} = (x_1 + \cdots + x_n)/n$
- median (middle observation, 50% above and 50% below)
Variation
- variance, $s^2 = \sum (x_i - \bar{x})^2/(n-1)$ (quadratic units)
- standard deviation, $s = \sqrt{\text{variance}}$ (units as outcome)
- quantiles, e.g. interquartile range (25% to 75% quantile)
- standard error, $SE = s/\sqrt{n}$ (uncertainty of mean estimate)

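The summary statistics above can be sketched in a few lines of Python (standard library only). The five values are hypothetical and serve only to illustrate the formulas; they are not the course data.

```python
import math
import statistics

# Hypothetical measurements, used only to illustrate the formulas
values = [47, 66, 68, 69, 70]
n = len(values)

mean = statistics.mean(values)          # (x_1 + ... + x_n) / n
median = statistics.median(values)      # middle observation
variance = statistics.variance(values)  # sum of squared deviations / (n - 1)
sd = math.sqrt(variance)                # same units as the outcome
se = sd / math.sqrt(n)                  # uncertainty of the mean estimate
quartiles = statistics.quantiles(values, n=4)  # 25%, 50% and 75% quantiles

print(f"mean={mean}, median={median}, variance={variance:.2f}")
print(f"sd={sd:.3f}, se={se:.3f}, quartiles={quartiles}")
```

Note that the standard error shrinks with growing n while the standard deviation does not.
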
Two-sample comparison
Test: Is mean increase in BMD the same in the two groups?
The two treatments were applied to separate groups of subjects: we have two independent samples
Traditional model assumptions:
$x_{11}, \ldots, x_{1n_1} \sim N(\mu_1, \sigma^2)$
$x_{21}, \ldots, x_{2n_2} \sim N(\mu_2, \sigma^2)$
- all observations are independent
- observations follow a normal distribution within each group
- both groups have the same variance, $\sigma^2$
- the mean values, $\mu_1$ and $\mu_2$, may differ

The normal distribution
[Figure: density curves (x-axis: x, y-axis: density) of two normal distributions, $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, with $\mu_i - \sigma_i$, $\mu_i$ and $\mu_i + \sigma_i$ marked on the x-axis]
$N(\mu, \sigma^2)$
The mean is often denoted µ, α etc.
The standard deviation is often denoted σ

Two sample t-test in SAS
The analysis can be carried out in SAS by:

PROC TTEST DATA=bmd;
  CLASS grp;
  VAR increase;
RUN;

Conclusions:
- Clear difference in means in favour of calcium: estimated difference 0.019 (P=0.0041), CI: (0.006, 0.032)
- No detectable difference in variances: (0.0321 vs. 0.0294, P=0.55)

Output from PROC TTEST
                                 Lower CL            Upper CL   Lower CL
Variable   grp          N        Mean       Mean     Mean       Std Dev    Std Dev
increase   C            44       0.0971     0.1069   0.1167     0.0265     0.0321
increase   P            47       0.0793     0.0879   0.0965     0.0244     0.0294
increase   Diff (1-2)            0.0062     0.019    0.0318     0.0268     0.0307

                        Upper CL
Variable   grp          Std Dev    Std Err    Minimum    Maximum
increase   C            0.0407     0.0048     0.055      0.181
increase   P            0.0369     0.0043     0.018      0.138
increase   Diff (1-2)   0.036      0.0064

T-Tests

Variable   Method          Variances   DF      t Value   Pr > |t|
increase   Pooled          Equal       89      2.95      0.0041
increase   Satterthwaite   Unequal     86.9    2.94      0.0042

Equality of Variances

Variable   Method     Num DF   Den DF   F Value   Pr > F
increase   Folded F   43       46       1.20      0.5513

About the two sample t-test
The hypothesis is $H_0: \mu_1 = \mu_2$ (no difference)

$t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)} = \frac{\bar{x}_1 - \bar{x}_2}{s \cdot \sqrt{1/n_1 + 1/n_2}} = \frac{0.019}{0.0064} = 2.95$

hence P = 0.0041 in a t distribution with 89 degrees of freedom.

The reasoning behind the test statistic:
- If $\bar{x}_1$ is normally distributed $N(\mu_1, \sigma^2/n_1)$
- and $\bar{x}_2$ is normally distributed $N(\mu_2, \sigma^2/n_2)$
- then $\bar{x}_1 - \bar{x}_2$ is normal $N(\mu_1 - \mu_2, (1/n_1 + 1/n_2) \cdot \sigma^2)$

$\sigma^2$ is estimated by $s^2 = ((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2)/(n_1 + n_2 - 2)$, the pooled variance estimate. The degrees of freedom are $df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$

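As a check on the arithmetic, the pooled t-statistic can be recomputed from the summary statistics in the BMD table. This is a minimal Python sketch (standard library only); the variable names are illustrative. It does not compute the P-value, which needs the t-distribution and is left to SAS or R.

```python
import math

# Summary statistics from the 'Increase in BMD by treatment' table
n1, mean1, sd1 = 44, 0.1069, 0.0321  # calcium
n2, mean2, sd2 = 47, 0.0879, 0.0294  # placebo

# Pooled variance estimate: weighted average of the two group variances
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
pooled_sd = math.sqrt(pooled_var)

# Standard error of the difference in means, and the t-statistic
se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
t_pooled = (mean1 - mean2) / se_diff
df_pooled = n1 + n2 - 2

print(f"pooled SD = {pooled_sd:.4f}, SE = {se_diff:.4f}")
print(f"t = {t_pooled:.2f} on {df_pooled} degrees of freedom")
```

The results agree with the 'Diff (1-2)' row and the 'Pooled' line of the PROC TTEST output up to rounding of the inputs.
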
Model assumption: Equal variances?
The assumption of the classical t-test is that $\sigma_1^2 = \sigma_2^2$.
- We can test this assumption (hypothesis $H_0: \sigma_1^2 = \sigma_2^2$) by

  $F = \frac{s_1^2}{s_2^2} = \frac{0.0321^2}{0.0294^2} = 1.20$

  P=0.55 in an F-distribution with (43,46) degrees of freedom. Hence we cannot reject the equality of the variances.
- The assumption is not necessary; use a more general t-test:

  $t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = 2.94 \sim t(86.9)$

  With P = 0.0042 the conclusion is the same as before.
Note: The formula for the degrees of freedom is complicated.

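The variance ratio and the unequal-variance t-test can also be recomputed from the summary statistics. The sketch below (Python, standard library only) uses the Satterthwaite approximation for the degrees of freedom, which is the method named in the PROC TTEST output; the variable names are illustrative.

```python
import math

# Summary statistics from the 'Increase in BMD by treatment' table
n1, mean1, sd1 = 44, 0.1069, 0.0321  # calcium
n2, mean2, sd2 = 47, 0.0879, 0.0294  # placebo

# Folded F: larger sample variance over the smaller one
f_ratio = max(sd1, sd2) ** 2 / min(sd1, sd2) ** 2

# Unequal-variance t-test: each group contributes its own variance term
v1 = sd1**2 / n1
v2 = sd2**2 / n2
t_welch = (mean1 - mean2) / math.sqrt(v1 + v2)

# Satterthwaite approximation to the degrees of freedom
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"F = {f_ratio:.2f}")
print(f"t = {t_welch:.2f} on {df_welch:.1f} degrees of freedom")
```

The F ratio comes out as 1.19 rather than 1.20 because the standard deviations are rounded to four decimals here.
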
Model assumption: Normality?
The assumption: data follows a normal distribution within groups.
We can check the assumption by e.g. looking at the histograms:
[Figure: histograms of the increase in BMD for the calcium and placebo groups (x-axis: BMD, y-axis: density)]
But with large samples the assumption is not necessary:
The validity of the t-test and the confidence intervals depends only on the distributions of the averages $\bar{x}_1$ and $\bar{x}_2$ ...
and averages tend to be normal due to the central limit theorem.

The central limit theorem (CLT)
Averages of dice rolls are 'much more normal' than the single rolls
[Figure: four histograms of the distribution of the average of 1, 2, 10 and 50 dice rolls (x-axis: average)]

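The dice illustration can be reproduced exactly rather than by simulation. The Python sketch below (standard library only, function names are illustrative) builds the distribution of the sum of n fair dice by repeated convolution and reports the mean and standard deviation of the average. The standard deviation shrinks like $1/\sqrt{n}$, which is why the histograms become narrower.

```python
from fractions import Fraction


def dice_sum_distribution(n_dice):
    """Exact distribution of the sum of n fair dice as {sum: probability}."""
    dist = {0: Fraction(1)}
    for _ in range(n_dice):
        new = {}
        for total, p in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, Fraction(0)) + p / 6
        dist = new
    return dist


def average_mean_and_sd(n_dice):
    """Mean and standard deviation of the average of n fair dice."""
    dist = dice_sum_distribution(n_dice)
    mean = sum(Fraction(total, n_dice) * p for total, p in dist.items())
    var = sum((Fraction(total, n_dice) - mean) ** 2 * p for total, p in dist.items())
    return float(mean), float(var) ** 0.5


results = {n: average_mean_and_sd(n) for n in (1, 2, 10, 50)}
for n, (mean, sd) in results.items():
    print(f"{n:2d} dice: mean of average = {mean:.2f}, SD of average = {sd:.3f}")
```
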
Example: MF vs SV
Two measurement methods, expected to give the same result:
MF: Transmitral volumetric flow, determined by Doppler echocardiography
SV: Left ventricular stroke volume, determined by cross-sectional echocardiography

subject   MF    SV
1         47    43
2         66    70
3         68    72
4         69    81
5         70    60
...       ...   ...
18        105   98
19        112   108
20        120   131
21        132   131

Comparison of measurement methods
Usually a comparison of a new experimental method with an established method (the reference)
- How well do the two measurements agree?
- Is the new method biased compared to the reference?
The data is paired
- The subjects act as their own controls
- Hence we look at differences within subjects
Set up a statistical model to:
- Describe the typical size of the differences
- Test if the bias (i.e. the mean difference) is zero

Description of the data
Graphical description
- Scatterplot
- Sample paths
- Bland-Altman plot
- Histogram
Numerical description

Variable    Mean    Std.Dev
---------------------------
MF          86.05   20.32
SV          85.81   21.19
DIF          0.24    6.96
AVERAGE     85.93   20.46
---------------------------

Statistical model for paired data
$x_i$: MF-measurement for the i'th subject
$y_i$: SV-measurement for the i'th subject
Look at the differences:
$d_i = x_i - y_i$, for $i = 1, \ldots, 21$
The model assumes that the differences are:
- independent
- normally distributed, $d_i \sim N(\delta, \sigma_d^2)$
Note: No assumptions about the distribution of the basic flow measurements

Paired t-test in SAS
Can be performed in two different ways:
1. as a paired two-sample test

PROC TTEST;
  PAIRED mf*sv;
RUN;

The TTEST Procedure
Statistics

                    Lower CL             Upper CL   Lower CL             Upper CL
Difference   N      Mean        Mean     Mean       Std Dev    Std Dev   Std Dev
mf - sv      21     -2.932      0.2381   3.4078     5.3275     6.9635    10.056

Difference   Std Err   Minimum   Maximum
mf - sv      1.5196    -13       10

T-Tests
Difference   DF   t Value   Pr > |t|
mf - sv      20   0.16      0.8771

One-sample tests in SAS
2. as a one-sample test on the differences:

PROC UNIVARIATE NORMAL;
  VAR dif;
RUN;

The UNIVARIATE Procedure
Variable: dif

Tests for Location: Mu0=0

Test          -Statistic-      -----p Value------
Student's t   t   0.156687     Pr > |t|    0.8771
Sign          M   2.5          Pr >= |M|   0.3593
Signed Rank   S   8            Pr >= |S|   0.7603

Moments

N               21           Sum Weights        21
Mean            0.23809524   Sum Observations   5
Std Deviation   6.96351034   Variance           48.4904762
...             ...          ...                ...

About the paired t-test
Test of the null hypothesis $H_0: \delta = 0$ (no bias)
The t-statistic is given by:
$t = \frac{\bar{d} - 0}{SEM} = \frac{0.24 - 0}{6.96/\sqrt{21}} = 0.158 \sim t(20)$
which gives P = 0.88, i.e. no significant bias.
Does this mean that the measurement methods are equally good?

Estimation of bias
The estimated mean difference is given by
$\bar{d} = 0.24$ cm³
The estimate is our best guess, but repeating the experiment would give us a somewhat different result
The estimate has a distribution, with an uncertainty called the standard error of the estimate.
- The standard error of the mean is given by
  $SEM = \frac{s_d}{\sqrt{n}} = \frac{6.96}{\sqrt{21}} = 1.52$ cm³

General confidence intervals
Confidence intervals tell us what the parameter is likely to be
- An interval that 'catches' the true mean with 95% probability is called a 95% confidence interval
- 95% is called the coverage
The usual construction is:
- Average $\pm\ t_{97.5\%}(n-1) \cdot SEM$
- Often a good approximation, even if data are not normally distributed (due to the central limit theorem)
The t-quantile $t_{97.5\%}$ may be looked up in a table or computed by a program (e.g. R, see http://mirrors.dotsrc.org/cran/).

Confidence limits for the bias
For the differences mf-sv, we get the confidence interval:
$\bar{d} \pm t_{97.5\%}(20) \cdot SEM = 0.24 \pm 2.086 \cdot 6.96/\sqrt{21} = (-2.93;\ 3.41)$
If there is a bias, it is likely (i.e. with 95% certainty) within the limits (−2.93 cm³, 3.41 cm³)
Conclusion:
We cannot rule out a bias of approx. 3 cm³ in either direction

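The standard error, the paired t-statistic and the confidence interval for the bias follow from three numbers: the number of pairs, the mean difference and the standard deviation of the differences. Below is a minimal Python sketch (standard library only). The t-quantile 2.086 is taken from the slide as a constant because the standard library has no t-distribution.

```python
import math

# Summary statistics for the differences mf - sv
n = 21
mean_diff = 0.24
sd_diff = 6.96
t_quantile = 2.086  # t_97.5%(20), as quoted on the slide

sem = sd_diff / math.sqrt(n)  # standard error of the mean difference
t_stat = mean_diff / sem      # test of H0: delta = 0
ci = (mean_diff - t_quantile * sem, mean_diff + t_quantile * sem)

print(f"SEM = {sem:.2f}, t = {t_stat:.3f}")
print(f"95% CI for the bias: ({ci[0]:.2f}, {ci[1]:.2f})")
```
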
P-values and confidence intervals
Tests and confidence intervals are equivalent in a certain sense
- They agree on 'reasonable' values for the mean
- The confidence interval contains the values $\delta_0$ for which $H_0: \delta = \delta_0$ would be accepted
But the P-value is less informative than the confidence interval
- If the study is large a tiny bias may be significant
- If the study is small a large bias may be insignificant
- Better use the confidence interval to judge the clinical implications of the bias!

Note the difference
Standard error (of the mean), SE(M)
- tells us something about the uncertainty of the estimate of the mean
- $SEM = SD/\sqrt{n}$ is the standard deviation of the distribution of the estimate
- is used for comparisons, relations etc.
Standard deviation, SD
- tells us something about the variation in our sample,
- and presumably in the population
- is used when describing the data

Normal regions
The normal region is an interval containing 95% of the 'typical' observations, i.e. the midrange of the population:
2.5%-quantile to 97.5%-quantile
If the distribution is normal $N(\mu, \sigma^2)$, then
- The 2.5%-quantile is $\mu - 1.96\sigma$
- The 97.5%-quantile is $\mu + 1.96\sigma$
An estimated normal region is given by:
Average ± 2 × SD
But this does not account for parameter uncertainty!

Prediction intervals
A prediction interval has to 'catch' future observations with high probability, say 95%.
The estimated normal region $\bar{x} \pm 2s$ is only a good prediction interval if the sample is large. If the sample is small the coverage will be too low.
The right coverage is attained by the prediction interval:
$(\bar{x} - s \cdot \sqrt{1 + 1/n} \cdot t_{97.5\%},\ \bar{x} + s \cdot \sqrt{1 + 1/n} \cdot t_{97.5\%})$
where $t_{97.5\%}$ is the 97.5% quantile of the t-distribution with n − 1 degrees of freedom.
I.e. the probability that a randomly chosen subject from the population has a value in this interval is 95% if the data is normal

Limits of agreement
When comparing measuring methods, the prediction interval is called limits of agreement
- These limits are important for deciding whether or not two measurement methods may replace each other.
For the differences mf-sv limits of agreement are given by:
$0.24 \pm 2.086 \cdot \sqrt{1 + 1/21} \cdot 6.96 = (-14.62, 15.10)$
Compared to this the estimated normal region is too narrow:
$\bar{d} \pm 2 \cdot s_d = 0.24 \pm 2 \cdot 6.96 = (-13.68, 14.16)$

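The two intervals can be compared numerically with a short Python sketch (standard library only). The quantile 2.086 is the slide's value for $t_{97.5\%}(20)$. Evaluating the prediction-interval formula with the square-root factor $\sqrt{1 + 1/21}$ gives roughly (−14.62, 15.10).

```python
import math

# Summary statistics for the differences mf - sv
n = 21
mean_diff = 0.24
sd_diff = 6.96
t_quantile = 2.086  # t_97.5%(20), as quoted on the slides

# Prediction interval (limits of agreement): accounts for parameter uncertainty
half_width = t_quantile * math.sqrt(1 + 1 / n) * sd_diff
limits_of_agreement = (mean_diff - half_width, mean_diff + half_width)

# Estimated normal region: mean +/- 2 SD, ignores parameter uncertainty
normal_region = (mean_diff - 2 * sd_diff, mean_diff + 2 * sd_diff)

print(f"limits of agreement: ({limits_of_agreement[0]:.2f}, {limits_of_agreement[1]:.2f})")
print(f"normal region:       ({normal_region[0]:.2f}, {normal_region[1]:.2f})")
```

The prediction interval is wider for two reasons: the t-quantile exceeds 2 for small samples, and the factor $\sqrt{1 + 1/n}$ adds the uncertainty of the estimated mean.
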
Derivation of the prediction interval
Assume that $d_{new}$ is a new observation, then

$d_{new} - \bar{d} \sim N\left(0,\ \sigma_d^2 \cdot \left(1 + \frac{1}{n}\right)\right)$

$\frac{d_{new} - \bar{d}}{s_d \cdot \sqrt{1 + 1/n}} \sim t(n-1)$

implying that with 95% probability:

$t_{2.5\%} < \frac{d_{new} - \bar{d}}{s_d \cdot \sqrt{1 + 1/n}} < t_{97.5\%}$

$\bar{d} + s_d \cdot \sqrt{1 + 1/n} \cdot t_{2.5\%} < d_{new} < \bar{d} + s_d \cdot \sqrt{1 + 1/n} \cdot t_{97.5\%}$

$\bar{d} - s_d \cdot \sqrt{1 + 1/n} \cdot t_{97.5\%} < d_{new} < \bar{d} + s_d \cdot \sqrt{1 + 1/n} \cdot t_{97.5\%}$

since $t_{2.5\%} = -t_{97.5\%}$ by symmetry of the t-distribution.

Assumptions for the paired comparison
The differences:
- are independent: the subjects are unrelated
- are normally distributed: judge graphically or numerically
  - by inspection of histograms or QQ-plots
  - by formal tests (e.g. PROC UNIVARIATE NORMAL in SAS)
- have identical variances: judged using the 'Bland-Altman plot' of differences vs. averages
Sometimes it is necessary to transform the data in order to fulfil the assumptions

Checking normality: the QQ-plot
Aka the probability plot
Observed quantiles against theoretical normal quantiles
If the data is normal, the points will be close to the line

Paired or unpaired comparison?
Note the consequences for the difference between MF and SV:
- Mean difference: 0.24, CI: (-2.93, 3.41) according to the paired t-test
- Mean difference: 0.24, CI: (-12.71, 13.19) according to the unpaired t-test, i.e. same estimate of bias, but a much wider confidence interval
- The latter is wrong!
You have to respect your design.
- Do not forget to take advantage of a subject serving as its own control (higher power with fewer individuals)
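
Both confidence intervals can be reconstructed from the numerical description of the MF and SV data. In the Python sketch below (standard library only), 2.086 is the slide's $t_{97.5\%}(20)$, while 2.021 is the standard table value of $t_{97.5\%}(40)$ and is an assumption not stated in the slides. The correlation implied by the three standard deviations is also shown, since it explains why pairing helps so much here.

```python
import math

# Summary statistics from the 'Numerical description' table
n = 21
sd_mf, sd_sv, sd_dif = 20.32, 21.19, 6.96
mean_dif = 0.24

T_20 = 2.086  # t_97.5%(20), from the slides
T_40 = 2.021  # t_97.5%(40), standard table value (assumption)

# Paired analysis: based on the within-subject differences
se_paired = sd_dif / math.sqrt(n)
ci_paired = (mean_dif - T_20 * se_paired, mean_dif + T_20 * se_paired)

# Unpaired analysis: treats MF and SV as independent samples
se_unpaired = math.sqrt(sd_mf**2 / n + sd_sv**2 / n)
ci_unpaired = (mean_dif - T_40 * se_unpaired, mean_dif + T_40 * se_unpaired)

# Correlation implied by var(d) = var(x) + var(y) - 2 * rho * sd(x) * sd(y)
rho = (sd_mf**2 + sd_sv**2 - sd_dif**2) / (2 * sd_mf * sd_sv)

print(f"paired:   SE = {se_paired:.2f}, CI = ({ci_paired[0]:.2f}, {ci_paired[1]:.2f})")
print(f"unpaired: SE = {se_unpaired:.2f}, CI = ({ci_unpaired[0]:.2f}, {ci_unpaired[1]:.2f})")
print(f"implied correlation between MF and SV: {rho:.2f}")
```
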
Comparing measurement methods
When comparing two measurement methods:
- We have to determine the proper scale ... before carrying out the statistical analysis
Is the precision of the measurements approximately the same over the entire range?
- In that case look at differences on an absolute scale
- Use the differences between the raw measurements
Or does the precision depend on the size of the quantity being measured?
- In that case look at differences on a relative scale
- Do a logarithmic transformation

Another comparison: REFE vs TEST
Two methods for determining concentration of glucose:
- REFE: Colour test, may be 'polluted' by uric acid
- TEST: Enzymatic test, more specific for glucose
Ref: R.G. Miller et al. (eds): Biostatistics Casebook. Wiley, 1980.

nr.       REFE    TEST
1         155     150
2         160     155
3         180     169
...       ...     ...
44        94      88
45        111     102
46        210     188

average   144.1   134.2
SD        91.0    83.2

The usual analysis
Do we see a systematic difference?
Test 'δ=0' for $d_i = \mathrm{REFE}_i - \mathrm{TEST}_i \sim N(\delta, \sigma_d^2)$
$\bar{d} = 9.89$, $s_d = 9.70 \Rightarrow t = \frac{\bar{d}}{SEM} = \frac{\bar{d}}{s_d/\sqrt{n}} = 6.92 \sim t(45)$
hence P < 0.0001, i.e. strong indication of bias.
Limits of agreement tell us that the typical differences are to be found in the interval
$9.89 \pm t_{97.5\%}(45) \cdot \sqrt{1 + 1/46} \cdot 9.70 = (-9.85, 29.64)$
Is this a valid analysis?!?

Plots of the raw data
Scatter plot and Bland-Altman plot:
- the variance of the differences increases with the level; i.e. the model assumptions of the usual analysis are violated!

Plots of the log-transformed data
Precision seems to be relative, hence we do a log-transformation
- The plots look better, apart from the outlier

Close up
Following a logarithmic transformation the Bland-Altman plot looks OK (when omitting the outlier, the smallest observation)

Notes on the log-transformation
- It is the original measurements that have to be transformed with the logarithm, not the differences!
- Never make a logarithmic transformation on data that might be negative!
- It does not matter which logarithm you choose (i.e. which base of the logarithm) since they are all proportional
- The procedure with construction of limits of agreement is now repeated for the transformed observations
- The result can be transformed back to the original scale with the anti-logarithm (exp for the natural logarithm)

The correct analysis
Do we see a systematic difference?
Test 'δ=0' for $d_i = \log(\mathrm{REFE}_i) - \log(\mathrm{TEST}_i) \sim N(\delta, \sigma_d^2)$
$\bar{d} = 0.066$, $s_d = 0.042 \Rightarrow t = \frac{\bar{d}}{SEM} = \frac{\bar{d}}{s_d/\sqrt{n}} = 10.66 \sim t(45)$
P < 0.0001, i.e. strong indication of bias.
Limits of agreement tell us that the typical differences are to be found in the interval
$0.066 \pm t_{97.5\%}(45) \cdot \sqrt{1 + 1/46} \cdot 0.042 = (-0.020, 0.152)$
... on log-scale!

Back transformation
Limits of agreement on log-scale are given by (−0.020, 0.152), which means that for 95% of the subjects we will have:

$-0.020 < \log(\mathrm{REFE}) - \log(\mathrm{TEST}) < 0.152$

$-0.020 < \log\left(\frac{\mathrm{REFE}}{\mathrm{TEST}}\right) < 0.152$

Back transforming (using the exponential function):

$0.980 = \exp(-0.020) < \frac{\mathrm{REFE}}{\mathrm{TEST}} < \exp(0.152) = 1.164$

reversed: $0.859 = \frac{1}{1.164} < \frac{\mathrm{TEST}}{\mathrm{REFE}} < \frac{1}{0.980} = 1.02$

TEST will typically be between 14% below and 2% above REFE.

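The back transformation is a two-step calculation: exponentiate the limits, then take reciprocals to express TEST relative to REFE. A minimal Python sketch (standard library only), assuming natural logarithms as the use of exp on the slide suggests:

```python
import math

# Limits of agreement for log(REFE) - log(TEST), from the slide
lower_log, upper_log = -0.020, 0.152

# Exponentiating turns differences of logs into ratios REFE / TEST
ratio_lower = math.exp(lower_log)
ratio_upper = math.exp(upper_log)

# Taking reciprocals swaps the limits and gives TEST / REFE
test_over_refe = (1 / ratio_upper, 1 / ratio_lower)

print(f"REFE/TEST between {ratio_lower:.3f} and {ratio_upper:.3f}")
print(f"TEST/REFE between {test_over_refe[0]:.3f} and {test_over_refe[1]:.3f}")
print(f"i.e. TEST is between {100 * (1 - test_over_refe[0]):.0f}% below "
      f"and {100 * (test_over_refe[1] - 1):.0f}% above REFE")
```
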
Limits of agreement on the original scale
Non-normal data
If the normal distribution is not a good description:
- Tests and confidence intervals are ok if the sample is sufficiently large (due to the central limit theorem)
- To judge the reliability for a given sample:
  - Use resampling techniques
  - Or check with a statistician
- Normal regions and limits of agreement become untrustworthy!

Nonparametric models
Alternative statistical models that do not assume a normal distribution (note: not assumption free).
Drawbacks:
- only apply to simple problems, unless you have plenty of computing power and an advanced computer package
- results are hard to interpret as no specific assumptions are made about the distribution of the data
- no estimates and confidence intervals are output by SAS or similar statistical software
- loss of efficiency (typically small)
- are of no use at all for tiny data sets

Nonparametric two-sample tests
Wilcoxon's rank sum test:
- tests if two distributions are the same
- uses the ranks of the joint set of observations
- can be carried out by e.g. PROC NPAR1WAY in SAS
The Mann-Whitney test:
- is equivalent to the Wilcoxon test ...
Different names, same tests!

Nonparametric one-sample or paired tests
The sign test
- tests if the median (of the differences) equals zero
- uses only the sign of the observations/differences, not their size (not very powerful)
- can be carried out by e.g. PROC UNIVARIATE in SAS
Wilcoxon's signed rank test
- tests if the distribution is symmetric around zero
- uses the sign of the observations combined with the rank of the numerical values (more powerful than the sign test)
- can be carried out by e.g. PROC UNIVARIATE in SAS

Nonparametric comparison of MF and SV
Output from PROC UNIVARIATE:

Tests for Location: Mu0=0

Test          -Statistic-      -----p Value------
Student's t   t   0.156687     Pr > |t|    0.8771
Sign          M   2.5          Pr >= |M|   0.3593
Signed Rank   S   8            Pr >= |S|   0.7603

The conclusion remains the same... No significant bias.

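The sign test is simple enough to compute by hand. The Python sketch below (standard library only) uses the exact binomial distribution. The counts of 12 positive and 7 negative differences are an assumption: the slides do not list them, but they reproduce the reported statistic M = 2.5 and p-value 0.3593, taking M as half the difference between the two counts and dropping zero differences.

```python
import math


def sign_test(n_positive, n_negative):
    """Two-sided exact sign test; zero differences are assumed to be dropped."""
    n = n_positive + n_negative
    m_stat = (n_positive - n_negative) / 2
    k = max(n_positive, n_negative)
    # P(X >= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n
    return m_stat, min(1.0, 2 * upper_tail)


# Assumed counts, chosen to match the reported M and p-value
m_stat, p_value = sign_test(n_positive=12, n_negative=7)
print(f"M = {m_stat}, two-sided p = {p_value:.4f}")
```
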
Theory of statistical testing
Making a decision based on a statistical test always implies a risk of making a wrong decision.

            accept                   reject
H0 true     1−α                      α (error of type I)
H0 false    β (error of type II)     1−β

Significance level: α (usually 5%) denotes the risk that we are willing to take of rejecting a true hypothesis.
Power: 1−β denotes the probability of rejecting a false hypothesis.
- But what does 'H0 false' mean? How false is H0?

The power depends on the true difference
If the true difference in means is xx, what is our chance of detecting it with a two-sample t-test at the 5% level?
[Figure: power curves for 10, 16 and 25 subjects in each group (x-axis: size of difference, y-axis: power)]
Power is calculated in order to choose the sample size in the planning of an investigation.
When the data has already been gathered, confidence intervals should be presented.

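The shape of such power curves can be sketched with a normal approximation to the two-sample test. This is not the exact t-based power that SAS or R would report, and it is slightly optimistic for small groups. The within-group standard deviation of 3 is a hypothetical value; the group sizes 10, 16 and 25 are those named in the figure.

```python
from statistics import NormalDist


def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Normal approximation to the power of a two-sided two-sample test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # True difference measured in standard errors of the difference in means
    shift = abs(delta) / (sd * (2 / n_per_group) ** 0.5)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


SD = 3.0  # hypothetical within-group standard deviation
for n in (10, 16, 25):
    row = ", ".join(f"{d}: {approx_power(d, SD, n):.2f}" for d in (0, 1, 2, 3, 4))
    print(f"n = {n:2d} per group -> power by true difference: {row}")
```

At a true difference of zero the 'power' equals the significance level, and it grows with both the true difference and the group size.
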
Sample size determination
The required sample size depends on the nature of the data, and on the conclusion we want to make.
Which magnitude of difference are we interested in detecting?
- requires knowledge of the problem at hand, especially biological variation
- qualified guesses e.g. from previous investigation or pilot study
What should the chances be of detecting this difference?
- the power should be large, at least 80%
- we usually test on significance level 5%, or maybe 1%

Example: planning a study
New drug in anaesthesia: XX, given in the dose 0.1 mg/kg.
- Aim: to establish a difference between two groups (but not if it is uninterestingly small).
- Outcome: Time until some event, e.g. 'head lift'.
- Sample size: How many patients do we need?
Use the information from a study on a similar drug:

group   N    time to first response (mean ± SD)
1       4    16.3 ± 2.6
2       10   10.1 ± 3.0

Expect: Difference in means about 6 minutes with SD within groups about 2.5-3 minutes

Statistical and clinical significance
Statistical significance depends upon:
- the true difference
- the sample size
- the random variation, i.e. the biological variation
- the significance level
Clinical significance depends upon:
- the size of the detected difference

Computing sample size
The required sample size for a two-sample t-test can be:
- read off from a nomogram or table
- computed by use of a software package (e.g. the power.t.test function in R, or in SAS ANALYST: STATISTICS→SAMPLE SIZE→TWO SAMPLE T TEST).
In order to compute the sample size you need to input:
- δ: the clinically relevant difference, often called the minimal relevant difference (MIREDIF)
- s: the standard deviation
- or δ/s: the standardised difference
- 1 − β: the desired power
- α: the significance level
Sample size: N is the total number, assuming equal-sized groups.

Altman’s nomogram
To find the required sample size connect δ/s and 1 − β, then read off for relevant α.
Example:
- δ = 3 and s = 3 implies δ/s = 1.
- Choose α = 0.05 and 1 − β = 0.80.
- Then N = 32.

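The nomogram reading can be checked with the usual normal-approximation formula, $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2/(\delta/s)^2$ per group. This is a sketch of that approximation in Python (standard library only), not of the nomogram itself. An exact t-based calculation such as R's power.t.test typically gives a slightly larger number.

```python
import math
from statistics import NormalDist


def approx_total_sample_size(std_diff, power=0.80, alpha=0.05):
    """Total N for two equal groups, using the normal approximation."""
    z = NormalDist()
    n_per_group = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / std_diff**2
    return 2 * math.ceil(n_per_group)


# The example on the slide: delta = 3, s = 3, so delta / s = 1
total_n = approx_total_sample_size(std_diff=3 / 3)
print(f"standardised difference 1.0 -> N = {total_n} ({total_n // 2} per group)")
```
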
What if we cannot get that many patients?
Include more centres in a multi-centre study
Take fewer from the one group, more from the other
- How many in each group?
Perform a paired comparison
- Use the patients as their own controls: how many are needed?
Be content to take less than needed
- What is the power for the attainable sample size?
Give up on the investigation
- instead of wasting time and money!

Different group sizes?
Equal group sizes are optimal.
If group sizes are unequal, say $n_1$ in group 1 and $n_2$ in group 2 with $n_1 = k \cdot n_2$, then the required total gets bigger.
To compute the sample sizes:
- compute N as before
- actual number needed is $N' = N \cdot (1 + k)^2/(4k)$
- numbers needed in each group are
  $n_1 = \frac{N' k}{1 + k} = \frac{N(1 + k)}{4}$ and $n_2 = \frac{N'}{1 + k} = \frac{N(1 + k)}{4k}$

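The adjustment for unequal groups is easy to tabulate. The following Python sketch applies the formulas above; the function name is illustrative, and N = 32 is the total from the nomogram example.

```python
def unequal_group_sizes(n_total_equal, k):
    """Sizes needed when group 1 is k times as large as group 2.

    n_total_equal is the total N required with two equal groups.
    Returns (new total N', n1, n2).
    """
    n_total = n_total_equal * (1 + k) ** 2 / (4 * k)
    n1 = n_total * k / (1 + k)
    n2 = n_total / (1 + k)
    return n_total, n1, n2


for k in (1, 2, 3, 5):
    n_total, n1, n2 = unequal_group_sizes(32, k)
    print(f"k = {k}: N' = {n_total:.1f}, n1 = {n1:.1f}, n2 = {n2:.1f}")
```

As k grows, $n_2 = N(1+k)/(4k)$ approaches N/4, so the smaller group can never go below a quarter of the equal-groups total.
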
Implications for different group sizes
[Figure: required number in second group plotted against number in first group (both axes 0 to 60)]
Example: If k = 2 and N = 32, then N' = 36, n1 = 24 and n2 = 12
Least possible total: 32 = 16 + 16
Need at least 8 = N/4 in each group

Sample size for the paired design
Standardized difference should be replaced by

$\sqrt{2} \times \frac{\text{clinically relevant difference}}{s_d}$

where $s_d$ denotes the standard deviation for the differences.
It is useful to note that $s_d = \sqrt{2(1 - \rho)} \cdot s \le \sqrt{2} \cdot s$ (when ρ ≥ 0), where s denotes the standard deviation of the original measurements and ρ denotes the correlation between paired observations.
Note: N denotes total number of observations, so the required number of patients is N/2.

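To see how much a paired design can save, the standardised difference can be recomputed and fed into a sample-size formula. The Python sketch below uses the same normal approximation as a stand-in for the nomogram. The inputs (δ = 3, s = 3, ρ = 0.5) are hypothetical, and the approximation becomes rough when the resulting numbers are this small.

```python
import math
from statistics import NormalDist


def approx_total_sample_size(std_diff, power=0.80, alpha=0.05):
    """Total N (two equal groups) from the normal approximation."""
    z = NormalDist()
    n_per_group = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / std_diff**2
    return 2 * math.ceil(n_per_group)


# Hypothetical planning values
delta = 3.0  # clinically relevant difference
s = 3.0      # SD of the original measurements
rho = 0.5    # correlation between paired observations

s_d = math.sqrt(2 * (1 - rho)) * s          # SD of the within-pair differences
std_diff_paired = math.sqrt(2) * delta / s_d

n_observations = approx_total_sample_size(std_diff_paired)
n_patients = n_observations // 2            # each patient contributes two observations
n_unpaired = approx_total_sample_size(delta / s)

print(f"s_d = {s_d:.2f}, standardised difference = {std_diff_paired:.3f}")
print(f"paired design: {n_patients} patients; unpaired design: {n_unpaired} patients")
```
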
The summary statistics for 'MF vs SV' are made using the code:
Note: the data is read in from the file 'mf_sv.txt' (text file with two columns and 21 observations)

DATA mydata;
  INFILE 'mf_sv.txt' FIRSTOBS=2;   /* skip the header line */
  INPUT mf sv;
  dif = mf - sv;                   /* within-subject difference */
  average = (mf + sv) / 2;         /* within-subject average */
RUN;

PROC MEANS DATA=mydata MEAN STD;
RUN;

The pictures for 'MF vs SV' are made using the code:

/* scatter plot of MF against SV */
proc gplot;
  plot mf*sv / haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2);
  axis2 value=(H=2) minor=NONE label=(A=90 R=0 H=2);
  symbol1 v=circle i=none c=BLACK l=1 w=2;
run;

/* sample paths: one joined line per subject (variables flow, method, subject) */
proc gplot;
  plot flow*method=subject / nolegend haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2);
  axis2 value=(H=2) minor=NONE label=(A=90 R=0 H=2);
  symbol1 v=circle i=join l=1 w=2 r=21;
run;

/* Bland-Altman plot with reference lines at zero, the bias and the limits of agreement */
proc gplot;
  plot dif*average / vref=0 lv=1 vref=0.24 15.1 -14.6 lv=2
       haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2 'average');
  axis2 order=(-16 to 16 by 4) value=(H=2) minor=NONE
        label=(A=90 R=0 H=2 'difference MF-SV');
  symbol1 v=circle i=none l=1 w=2;
  title h=3 'Bland Altman plot';
run;

/* histogram of the differences */
title;
proc gchart;
  vbar dif;
run;