About the course Introduction: Review of basic...



Page 1: About the course Introduction: Review of basic ideasstaff.pubhealth.ku.dk/~jufo/courses/vr/oldlectures/intro2010-2x2.pdf · Course diploma To pass the course 80% attendance is required.

University of Copenhagen, Department of Biostatistics

Faculty of Health Sciences

Introduction: Review of basic ideas
Analysis of Variance and Regression, 29th May 2010

Julie Lyng Forman
Department of Biostatistics, University of Copenhagen


Outline

About the course

Two-sample t-test

The paired t-test

When model assumptions fail

(Sample size calculations)


Aim of the course

To make the participants able to:
- understand and interpret statistical analyses
- judge the assumptions behind the use of various methods of analysis
- perform their own analyses using SAS
- understand output from a statistical program package in general, i.e. other than SAS
- present results from a statistical analysis, numerically and graphically

To create a better platform for communication between 'users' of statistics and statisticians, to benefit subsequent collaboration.

We expect students to . . .

Be interested

Be motivated
- ideally by your own (future) research project

Have basic knowledge of statistical concepts such as:
- mean, average
- variance, standard deviation, standard error of the mean
- distribution
- correlation, regression
- t-test, χ²-test

Topics for the course

Quantitative data (normal distribution):
- Analysis of variance
- Variance component models
- Regression analysis
- General linear models
- Non-linear models
- Repeated measurements over time

Non-normal outcome:
- Binary data: logistic regression
- Ordinal data: ordinal regression

Not covered:
- Counts (Poisson regression)
- Censored data (survival analysis)

Recommended reading

In English:
- D.G. Altman: Practical statistics for medical research. Chapman and Hall.
- P. Armitage, G. Berry & J.N.S. Matthews: Statistical methods in medical research. Blackwell.
- R.P. Cody and J.K. Smith: Applied statistics and the SAS programming language. Prentice Hall.

In Danish:
- D. Kronborg and L.T. Skovgaard: Regressionsanalyse med anvendelser i lægevidenskabelig forskning. FADL.
- Aa. T. Andersen, T.V. Bedsted, M. Feilberg, R.B. Jakobsen and A. Milhøj: Elementær indføring i SAS + Statistik med SAS. Akademisk Forlag.

Teaching activities

Lectures:
- Tuesday and Thursday mornings (9.15-12.00)
- Copies of overheads have to be downloaded in advance
- Coffee break around 10.30

Computer labs:
- In the afternoon (13.00-15.45) following each lecture
- Coffee, tea, and fruit will be served
- Exercises will be handed out
- We use SAS programming
- Solutions can be downloaded after classes

Course diploma

To pass the course 80% attendance is required.
- It is your responsibility to sign the list each morning and afternoon
- Note: 8 × 2 = 16 lists, 80% equals 13 half days

There is no compulsory homework . . .
- but to benefit from the course you need to work with the material at home
- We expect you to do so!


Example: Calcium supplement to adolescent girls

Study: 112 11-year-old girls were randomised to receive either calcium supplement or placebo.

Outcome: BMD = bone mineral density (g/cm²) was measured 5 times over 2 years at 6-month intervals.

Increase in BMD by treatment

Increase in BMD after 2 years:

                 Calcium   Placebo
    N                 44        47
    Mean          0.1069    0.0879
    Std.Dev       0.0321    0.0294
    Std.Error     0.0048    0.0043

[Figure: boxplot of the increase in BMD by treatment group]

Summary statistics

Numerical description of quantitative variables:

Location, center
- average (mean value): $\bar{x} = (x_1 + \cdots + x_n)/n$
- median (middle observation, 50% above and 50% below)

Variation
- variance: $s^2 = \sum_i (x_i - \bar{x})^2/(n - 1)$ (quadratic units)
- standard deviation: $s = \sqrt{\text{variance}}$ (same units as the outcome)
- quantiles, e.g. the interquartile range (25% to 75% quantile)
- standard error: $\mathrm{SE} = s/\sqrt{n}$ (uncertainty of the mean estimate)
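The summary statistics above can be sketched in a few lines of Python (the course itself uses SAS; Python is used here only for illustration). The sample values and the function name `summarize` are hypothetical; the last line recomputes the standard error of the calcium group from the standard deviation and group size reported on the previous slide.

```python
import math

def summarize(values):
    """Return mean, sample variance, standard deviation and standard error."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance divides by n - 1, as in the formula on the slide.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)
    return {"n": n, "mean": mean, "variance": variance, "sd": sd, "se": se}

# Hypothetical sample, for illustration only (not course data).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
stats = summarize(sample)
print(stats)

# Standard error recomputed from the calcium-group summary (SD 0.0321, N 44).
se_calcium = 0.0321 / math.sqrt(44)
print(round(se_calcium, 4))
```

Note that the standard deviation describes the spread of the observations, while the standard error shrinks as the sample grows.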


Two-sample comparison

Test: Is the mean increase in BMD the same in the two groups?

The two treatments were applied to separate groups of subjects, so we have two independent samples.

Traditional model assumptions:

$$x_{11}, \ldots, x_{1 n_1} \sim N(\mu_1, \sigma^2)$$

$$x_{21}, \ldots, x_{2 n_2} \sim N(\mu_2, \sigma^2)$$

- all observations are independent
- observations follow a normal distribution within each group
- both groups have the same variance, $\sigma^2$
- the mean values, $\mu_1$ and $\mu_2$, may differ

The normal distribution

[Figure: density curves of two normal distributions, $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, plotted against x, with $\mu_1 \pm \sigma_1$ and $\mu_2 \pm \sigma_2$ marked on the axis]

$N(\mu, \sigma^2)$

The mean is often denoted µ, α etc.

The standard deviation is often denoted σ.

Two sample t-test in SAS

The analysis can be carried out in SAS by:

    PROC TTEST DATA=bmd;
      CLASS grp;
      VAR increase;
    RUN;

Conclusions:
- Clear difference in means in favour of calcium: estimated difference 0.019 (P=0.0041), CI: (0.006, 0.032)
- No detectable difference in variances: standard deviations 0.0321 vs. 0.0294, P=0.55

Output from PROC TTEST

                                     Lower CL              Upper CL  Lower CL
    Variable   grp            N         Mean      Mean       Mean   Std Dev   Std Dev

    increase   C             44       0.0971    0.1069     0.1167    0.0265    0.0321
    increase   P             47       0.0793    0.0879     0.0965    0.0244    0.0294
    increase   Diff (1-2)             0.0062     0.019     0.0318    0.0268    0.0307

                           Upper CL
    Variable   grp          Std Dev   Std Err   Minimum   Maximum

    increase   C             0.0407    0.0048     0.055     0.181
    increase   P             0.0369    0.0043     0.018     0.138
    increase   Diff (1-2)     0.036    0.0064

    T-Tests

    Variable   Method          Variances     DF   t Value   Pr > |t|

    increase   Pooled          Equal         89      2.95     0.0041
    increase   Satterthwaite   Unequal     86.9      2.94     0.0042

    Equality of Variances

    Variable   Method     Num DF   Den DF   F Value   Pr > F

    increase   Folded F       43       46      1.20   0.5513

About the two sample t-test

The hypothesis is $H_0: \mu_1 = \mu_2$ (no difference)

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}(\bar{x}_1 - \bar{x}_2)} = \frac{\bar{x}_1 - \bar{x}_2}{s \cdot \sqrt{1/n_1 + 1/n_2}} = \frac{0.019}{0.0064} = 2.95$$

hence P = 0.0041 in a t distribution with 89 degrees of freedom.

The reasoning behind the test statistic:
- If $\bar{x}_1$ is normally distributed $N(\mu_1, \sigma^2/n_1)$
- and $\bar{x}_2$ is normally distributed $N(\mu_2, \sigma^2/n_2)$
- then $\bar{x}_1 - \bar{x}_2$ is normal $N(\mu_1 - \mu_2, (1/n_1 + 1/n_2) \cdot \sigma^2)$

$\sigma^2$ is estimated by $s^2 = ((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2)/(n_1 + n_2 - 2)$, the pooled variance estimate. The degrees of freedom are $df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$.
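As a rough check of the arithmetic, the pooled test statistic can be recomputed from the group summaries alone. The sketch below is in Python rather than SAS, the function name `pooled_t` is made up for this illustration, and the inputs are the rounded means and standard deviations from the PROC TTEST output, so the last digits may differ slightly from SAS.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled variance estimate."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se_diff = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t_stat = (mean1 - mean2) / se_diff
    return t_stat, df, se_diff

# Calcium (group 1) versus placebo (group 2), rounded summary values.
t_stat, df, se_diff = pooled_t(0.1069, 0.0321, 44, 0.0879, 0.0294, 47)
print(round(t_stat, 2), df, round(se_diff, 4))
```

The P-value would still have to be looked up in a t distribution with `df` degrees of freedom, which the Python standard library does not provide.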


Model assumption: Equal variances?

The assumption of the classical t-test is that $\sigma_1^2 = \sigma_2^2$.

- We can test this assumption (hypothesis $H_0: \sigma_1^2 = \sigma_2^2$) by

$$F = \frac{s_1^2}{s_2^2} = \frac{0.0321^2}{0.0294^2} = 1.20$$

  P=0.55 in an F-distribution with (43, 46) degrees of freedom. Hence we cannot reject the equality of the variances.

- The assumption is not necessary; we can use a more general t-test:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}(\bar{x}_1 - \bar{x}_2)} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = 2.94 \sim t(86.9)$$

  With P = 0.0042 the conclusion is the same as before.

Note: The formula for the degrees of freedom is complicated.
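The degrees of freedom of the more general test are usually obtained with the Satterthwaite approximation, which is the method named in the PROC TTEST output. A minimal Python sketch, assuming the rounded group summaries as inputs (the function name `welch_t` is hypothetical):

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic without the equal-variance assumption, with Satterthwaite df."""
    v1 = sd1**2 / n1  # squared standard error of the first mean
    v2 = sd2**2 / n2  # squared standard error of the second mean
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, df

t_stat, df = welch_t(0.1069, 0.0321, 44, 0.0879, 0.0294, 47)
print(round(t_stat, 2), round(df, 1))
```

When the two sample variances are similar, as here, the result is close to the pooled test with 89 degrees of freedom.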


Model assumption: Normality?

The assumption: data follow a normal distribution within groups.

We can check the assumption by e.g. looking at the histograms:

[Figure: histograms of BMD for the calcium and placebo groups; x-axis: BMD (0.00 to 0.20), y-axis: density]

But with large samples the assumption is not necessary:

The validity of the t-test and the confidence intervals only depends on the distributions of the averages $\bar{x}_1$ and $\bar{x}_2$ . . .

and averages tend to be normal due to the central limit theorem.

The central limit theorem (CLT)

Averages of dice rolls are 'much more normal' than the single rolls.

[Figure: four panels showing the distribution of the average of one dice roll, 2 dice rolls, 10 dice rolls and 50 dice rolls; x-axis: average]
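The behaviour of dice averages can be reproduced with a small simulation. The sketch below is one possible illustration in Python, not the code behind the original figure; the number of repetitions and the seed are arbitrary choices.

```python
import random
import statistics

def average_of_rolls(n_rolls, rng):
    """Average of n_rolls fair six-sided dice."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

def simulate_averages(n_rolls, n_repeats=10000, seed=1):
    """Repeat the experiment n_repeats times and collect the averages."""
    rng = random.Random(seed)
    return [average_of_rolls(n_rolls, rng) for _ in range(n_repeats)]

for n_rolls in (1, 2, 10, 50):
    averages = simulate_averages(n_rolls)
    # The mean stays near 3.5 while the spread shrinks roughly like 1/sqrt(n_rolls).
    print(n_rolls, round(statistics.mean(averages), 2),
          round(statistics.stdev(averages), 2))
```

A histogram of `simulate_averages(50)` should look close to a bell curve, whereas a single roll is uniform on 1 to 6.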



Example: MF vs SV

Two measurement methods, expected to give the same result:

MF: Transmitral volumetric flow, determined by Doppler echocardiography

SV: Left ventricular stroke volume, determined by cross-sectional echocardiography

    subject    MF    SV
          1    47    43
          2    66    70
          3    68    72
          4    69    81
          5    70    60
        ...   ...   ...
         18   105    98
         19   112   108
         20   120   131
         21   132   131

Comparison of measurement methods

Usually a comparison of a new experimental method with an established method (the reference)
- How well do the two measurements agree?
- Is the new method biased compared to the reference?

The data are paired
- The subjects act as their own controls
- Hence we look at differences within subjects

Set up a statistical model to:
- Describe the typical size of the differences
- Test if the bias (i.e. the mean difference) is zero

Description of the data

Graphical description
- Scatterplot
- Sample paths
- Bland-Altman plot
- Histogram

Numerical description

    Variable      Mean   Std.Dev
    ----------------------------
    MF           86.05     20.32
    SV           85.81     21.19
    DIF           0.24      6.96
    AVERAGE      85.93     20.46
    ----------------------------

Statistical model for paired data

$x_i$: MF-measurement for the i'th subject
$y_i$: SV-measurement for the i'th subject

Look at the differences:

$$d_i = x_i - y_i, \quad \text{for } i = 1, \ldots, 21$$

The model assumes that the differences are:
- independent
- normally distributed, $d_i \sim N(\delta, \sigma_d^2)$

Note: No assumptions are made about the distribution of the basic flow measurements.

Paired t-test in SAS

Can be performed in two different ways:

1. as a paired two-sample test

    PROC TTEST;
      PAIRED mf*sv;
    RUN;

    The TTEST Procedure

    Statistics

                        Lower CL             Upper CL  Lower CL             Upper CL
    Difference     N        Mean      Mean       Mean   Std Dev   Std Dev    Std Dev
    mf - sv       21      -2.932    0.2381     3.4078    5.3275    6.9635     10.056

    Difference    Std Err   Minimum   Maximum
    mf - sv        1.5196       -13        10

    T-Tests

    Difference    DF   t Value   Pr > |t|
    mf - sv       20      0.16     0.8771

One-sample tests in SAS

2. as a one-sample test on the differences:

    PROC UNIVARIATE NORMAL;
      VAR dif;
    RUN;

    The UNIVARIATE Procedure
    Variable: dif

    Tests for Location: Mu0=0

    Test           -Statistic-     -----p Value------

    Student's t    t   0.156687    Pr > |t|     0.8771
    Sign           M        2.5    Pr >= |M|    0.3593
    Signed Rank    S          8    Pr >= |S|    0.7603

    Moments

    N                         21    Sum Weights                 21
    Mean              0.23809524    Sum Observations             5
    Std Deviation     6.96351034    Variance            48.4904762
    ...                      ...    ...                        ...

About the paired t-test

Test of the null hypothesis $H_0: \delta = 0$ (no bias)

The t-statistic is given by:

$$t = \frac{\bar{d} - 0}{\mathrm{SEM}} = \frac{0.24 - 0}{6.96/\sqrt{21}} = 0.158 \sim t(20)$$

which gives P = 0.88, i.e. no significant bias.

Does this mean that the measurement methods are equally good?

Estimation of bias

The estimated mean difference is given by

$$\bar{d} = 0.24\ \text{cm}^3$$

The estimate is our best guess, but repeating the experiment would give us a somewhat different result.

The estimate has a distribution, with an uncertainty called the standard error of the estimate.

- The standard error of the mean is given by

$$\mathrm{SEM} = \frac{s_d}{\sqrt{n}} = \frac{6.96}{\sqrt{21}} = 1.52\ \text{cm}^3$$

General confidence intervals

Confidence intervals tell us what the parameter is likely to be
- An interval that 'catches' the true mean with 95% probability is called a 95% confidence interval
- 95% is called the coverage

The usual construction is:
- Average $\pm\ t_{97.5\%}(n - 1) \cdot \mathrm{SEM}$
- Often a good approximation, even if data are not normally distributed (due to the central limit theorem)

The t-quantile $t_{97.5\%}$ may be looked up in a table or computed by a program (e.g. R, see http://mirrors.dotsrc.org/cran/).

Confidence limits for the bias

For the differences mf-sv, we get the confidence interval:

$$\bar{d} \pm t_{97.5\%}(20) \cdot \mathrm{SEM} = 0.24 \pm 2.086 \cdot 6.96/\sqrt{21} = (-2.93,\ 3.41)$$

If there is a bias, it is likely (i.e. with 95% certainty) within the limits (−2.93 cm³, 3.41 cm³).

Conclusion: We cannot rule out a bias of approx. 3 cm³ in either direction.
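The standard error, the paired t statistic and the confidence limits all follow from the mean and standard deviation of the differences. The following Python sketch recomputes them from the rounded summary values; the function name `paired_summary` is hypothetical, and the t-quantile 2.086 is taken from the slide rather than computed.

```python
import math

T_975_DF20 = 2.086  # 97.5% quantile of t(20), as quoted on the slide

def paired_summary(mean_diff, sd_diff, n, t_quantile):
    """Standard error, t statistic and confidence interval for a mean difference."""
    sem = sd_diff / math.sqrt(n)
    t_stat = mean_diff / sem  # test of 'mean difference = 0'
    half_width = t_quantile * sem
    return sem, t_stat, (mean_diff - half_width, mean_diff + half_width)

sem, t_stat, ci = paired_summary(0.24, 6.96, 21, T_975_DF20)
print(round(sem, 2), round(t_stat, 3), tuple(round(x, 2) for x in ci))
```

Because the interval contains zero, the t-test and the confidence interval lead to the same conclusion about the bias.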


P-values and confidence intervals

Tests and confidence intervals are equivalent in a certain sense
- They agree on 'reasonable' values for the mean
- The confidence interval contains the values $\delta_0$ for which $H_0: \delta = \delta_0$ would not be rejected

But the P-value is less informative than the confidence interval
- If the study is large a tiny bias may be significant
- If the study is small a large bias may be non-significant
- It is better to use the confidence interval to judge the clinical implications of the bias!

Note the difference

Standard error (of the mean), SE(M)
- tells us something about the uncertainty of the estimate of the mean
- $\mathrm{SEM} = \mathrm{SD}/\sqrt{n}$ is the standard deviation of the distribution of the estimate
- is used for comparisons, relations etc.

Standard deviation, SD
- tells us something about the variation in our sample,
- and presumably in the population
- is used when describing the data

Normal regions

The normal region is an interval containing 95% of the 'typical' observations, i.e. the midrange of the population:

2.5%-quantile to 97.5%-quantile

If the distribution is normal $N(\mu, \sigma^2)$, then
- The 2.5%-quantile is $\mu - 1.96\sigma$
- The 97.5%-quantile is $\mu + 1.96\sigma$

An estimated normal region is given by:

$$\text{Average} \pm 2 \times \mathrm{SD}$$

But this does not account for parameter uncertainty!

Prediction intervals

A prediction interval has to 'catch' future observations with high probability, say 95%.

The estimated normal region $\bar{x} \pm 2s$ is only a good prediction interval if the sample is large. If the sample is small the coverage will be too low.

The right coverage is attained by the prediction interval:

$$\left(\bar{x} - s \cdot \sqrt{1 + 1/n} \cdot t_{97.5\%},\ \ \bar{x} + s \cdot \sqrt{1 + 1/n} \cdot t_{97.5\%}\right)$$

where $t_{97.5\%}$ is the 97.5% quantile of the t-distribution with n − 1 degrees of freedom. I.e. the probability that a randomly chosen subject from the population has a value in this interval is 95% if the data are normal.

Limits of agreement

When comparing measuring methods, the prediction interval is called limits-of-agreement
- These limits are important for deciding whether or not two measurement methods may replace each other.

For the differences mf-sv limits-of-agreement are given by:

$$0.24 \pm 2.086 \cdot \sqrt{1 + 1/21} \cdot 6.96 = (-14.62,\ 15.10)$$

Compared to this the estimated normal region is too narrow:

$$\bar{d} \pm 2 \cdot s_d = 0.24 \pm 2 \cdot 6.96 = (-13.68,\ 14.16)$$
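To make the comparison concrete, both intervals can be evaluated directly from the formulas. This is a minimal Python sketch using the rounded summary values for the differences; the function names are hypothetical and the t-quantile 2.086 for 20 degrees of freedom is the value quoted in the slides.

```python
import math

def prediction_interval(mean, sd, n, t_quantile):
    """Limits that should catch a new observation, allowing for estimation error."""
    half_width = t_quantile * sd * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width

def normal_region(mean, sd):
    """Estimated normal region: average plus/minus 2 SD."""
    return mean - 2 * sd, mean + 2 * sd

loa = prediction_interval(0.24, 6.96, 21, 2.086)
region = normal_region(0.24, 6.96)
print(tuple(round(x, 2) for x in loa))
print(tuple(round(x, 2) for x in region))
```

The prediction interval is wider for two reasons: the t-quantile exceeds 2 for small samples, and the factor under the square root accounts for the uncertainty in the estimated mean.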


Derivation of the prediction interval

Assume that $d_{new}$ is a new observation, then

$$d_{new} - \bar{d} \sim N\left(0,\ \sigma_d^2 \cdot \left(1 + \tfrac{1}{n}\right)\right)$$

$$\frac{d_{new} - \bar{d}}{s_d \cdot \sqrt{1 + 1/n}} \sim t(n - 1)$$

implying that with 95% probability:

$$t_{2.5\%} < \frac{d_{new} - \bar{d}}{s_d \cdot \sqrt{1 + 1/n}} < t_{97.5\%}$$

$$\bar{d} + s_d \cdot \sqrt{1 + 1/n} \cdot t_{2.5\%} < d_{new} < \bar{d} + s_d \cdot \sqrt{1 + 1/n} \cdot t_{97.5\%}$$

$$\bar{d} - s_d \cdot \sqrt{1 + 1/n} \cdot t_{97.5\%} < d_{new} < \bar{d} + s_d \cdot \sqrt{1 + 1/n} \cdot t_{97.5\%}$$

since $t_{2.5\%} = -t_{97.5\%}$ by symmetry of the t-distribution.

Assumptions for the paired comparison

The differences:
- are independent: the subjects are unrelated
- are normally distributed: judge graphically or numerically
  - by inspection of histograms or QQ-plots
  - by formal tests (e.g. PROC UNIVARIATE NORMAL in SAS)
- have identical variances: this is judged using the 'Bland-Altman plot' of differences vs. averages

Sometimes it is necessary to transform the data in order to fulfill the assumptions.

Checking normality: the QQ-plot

Aka the probability plot

Observed quantiles against theoretical normal quantiles

If the data are normal, the points will be close to the line

Paired or unpaired comparison?

Note the consequences for the difference between MF and SV:
- Mean difference: 0.24, CI: (-2.93, 3.41) according to the paired t-test
- Mean difference: 0.24, CI: (-12.71, 13.19) according to the unpaired t-test, i.e. same estimate of bias, but a much wider confidence interval
- The latter is wrong!

You have to respect your design.
- Do not forget to take advantage of a subject serving as its own control (higher power with fewer individuals)
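The much wider unpaired interval can be reproduced from the summary table: an unpaired analysis uses the between-subject standard deviations of MF and SV (about 20) instead of the standard deviation of the within-subject differences (about 7). The Python sketch below is illustrative only; the function name is hypothetical and the t-quantile 2.021 for 40 degrees of freedom is an assumed table value that does not appear in the slides.

```python
import math

T_975_DF40 = 2.021  # assumed table value for the 97.5% quantile of t(40)

def unpaired_ci(mean_diff, sd1, sd2, n_per_group, t_quantile):
    """Pooled-variance confidence interval, treating the samples as independent."""
    pooled_sd = math.sqrt((sd1**2 + sd2**2) / 2)  # equal group sizes
    half_width = t_quantile * pooled_sd * math.sqrt(2 / n_per_group)
    return mean_diff - half_width, mean_diff + half_width

# MF: SD 20.32, SV: SD 21.19, 21 subjects measured with both methods.
wrong_ci = unpaired_ci(0.24, 20.32, 21.19, 21, T_975_DF40)
print(tuple(round(x, 2) for x in wrong_ci))
```

The paired analysis removes the large variation between subjects, which is why its interval is roughly four times narrower.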



Comparing measurement methods

When comparing two measurement methods:
- We have to determine the proper scale

. . . before carrying out the statistical analysis

Is the precision of the measurements approximately the same over the entire range?
- In that case look at differences on an absolute scale
- Use the differences between the raw measurements

Or does the precision depend on the size of the quantity being measured?
- In that case look at differences on a relative scale
- Do a logarithmic transformation

Another comparison: REFE vs TEST

Two methods for determining concentration of glucose:
- REFE: Colour test, may be 'polluted' by uric acid
- TEST: Enzymatic test, more specific for glucose

Ref: R.G. Miller et al. (eds): Biostatistics Casebook. Wiley, 1980.

        nr.    REFE    TEST
          1     155     150
          2     160     155
          3     180     169
        ...     ...     ...
         44      94      88
         45     111     102
         46     210     188

    average   144.1   134.2
    SD         91.0    83.2

The usual analysis

Do we see a systematic difference?

Test 'δ = 0' for $d_i = \mathrm{REFE}_i - \mathrm{TEST}_i \sim N(\delta, \sigma_d^2)$

$$\bar{d} = 9.89,\quad s_d = 9.70 \ \Rightarrow\ t = \frac{\bar{d}}{\mathrm{SEM}} = \frac{\bar{d}}{s_d/\sqrt{n}} = 6.92 \sim t(45)$$

hence P < 0.0001, i.e. strong indication of bias.

Limits of agreement tell us that the typical differences are to be found in the interval

$$9.89 \pm t_{97.5\%}(45) \cdot \sqrt{1 + 1/46} \cdot 9.70 = (-9.85,\ 29.64)$$

Is this a valid analysis?!?

Plots of the raw data

Scatter plot and Bland-Altman plot:
- the variance of the differences increases with the level, i.e. the model assumptions of the usual analysis are violated!

Plots of the log-transformed data

Precision seems to be relative, hence we do a log-transformation.
- The plots look better, apart from the outlier

Close up

Following a logarithmic transformation the Bland-Altman plot looks OK (when omitting the outlier, the smallest observation).

Notes on the log-transformation

- It is the original measurements that have to be transformed with the logarithm, not the differences!
- Never make a logarithmic transformation on data that might be negative!
- It does not matter which logarithm you choose (i.e. which base of the logarithm) since they are all proportional
- The procedure with construction of limits of agreement is now repeated for the transformed observations
- The result can be transformed back to the original scale with the anti-logarithm (exp for the natural logarithm)

The correct analysis

Do we see a systematic difference?

Test 'δ = 0' for $d_i = \log(\mathrm{REFE}_i) - \log(\mathrm{TEST}_i) \sim N(\delta, \sigma_d^2)$

$$\bar{d} = 0.066,\quad s_d = 0.042 \ \Rightarrow\ t = \frac{\bar{d}}{\mathrm{SEM}} = \frac{\bar{d}}{s_d/\sqrt{n}} = 10.66 \sim t(45)$$

P < 0.0001, i.e. strong indication of bias.

Limits of agreement tell us that the typical differences are to be found in the interval

$$0.066 \pm t_{97.5\%}(45) \cdot \sqrt{1 + 1/46} \cdot 0.042 = (-0.020,\ 0.152)$$

. . . on log-scale!

Back transformation

Limits of agreement on log-scale are given by (−0.020, 0.152), which means that for 95% of the subjects we will have:

$$-0.020 < \log(\mathrm{REFE}) - \log(\mathrm{TEST}) < 0.152$$

$$-0.020 < \log\left(\frac{\mathrm{REFE}}{\mathrm{TEST}}\right) < 0.152$$

Back transforming (using the exponential function):

$$0.980 = \exp(-0.020) < \frac{\mathrm{REFE}}{\mathrm{TEST}} < \exp(0.152) = 1.164$$

reversed:

$$0.859 = \frac{1}{1.164} < \frac{\mathrm{TEST}}{\mathrm{REFE}} < \frac{1}{0.980} = 1.02$$

TEST will typically be between 14% below and 2% above REFE.
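The back transformation is a one-line computation once the limits on the log scale are known. A minimal Python sketch, assuming natural logarithms and the rounded limits quoted above (the function name `back_transform` is hypothetical):

```python
import math

def back_transform(lower_log, upper_log):
    """Map limits for log(REFE) - log(TEST) to limits for the ratio REFE/TEST."""
    return math.exp(lower_log), math.exp(upper_log)

ratio_low, ratio_high = back_transform(-0.020, 0.152)

# Inverting the ratio swaps the limits: TEST/REFE lies between 1/high and 1/low.
inverse_low, inverse_high = 1 / ratio_high, 1 / ratio_low

print(round(ratio_low, 3), round(ratio_high, 3))
print(round(inverse_low, 3), round(inverse_high, 3))
```

If a base-10 logarithm had been used for the transformation, `10 ** x` would take the place of `math.exp(x)`.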


Limits of agreement on the original scale


Non-normal data

If the normal distribution is not a good description:
- Tests and confidence intervals are ok if the sample is sufficiently large (due to the central limit theorem)
- To judge the reliability for a given sample:
  - Use resampling techniques
  - Or check with a statistician
- Normal regions and limits of agreement become untrustworthy!

Nonparametric models

Alternative statistical models that do not assume a normal distribution (note: not assumption free).

Drawbacks:
- they only apply to simple problems, unless you have plenty of computer power and an advanced computer package
- results are hard to interpret as no specific assumptions are made about the distribution of the data
- no estimates and confidence intervals are output by SAS or similar statistical software
- loss of efficiency (typically small)
- they are of no use at all for tiny data sets

Nonparametric two-sample tests

Wilcoxon's rank sum test:
- tests if two distributions are the same
- uses the ranks of the joint set of observations
- can be carried out by e.g. PROC NPAR1WAY in SAS

The Mann-Whitney test:
- is equivalent to the Wilcoxon test . . .

Different names, same tests!

Nonparametric one-sample or paired tests

The sign test
- tests if the median (of the differences) equals zero
- uses only the sign of the observations/differences, not their size (not very powerful)
- can be carried out by e.g. PROC UNIVARIATE in SAS

Wilcoxon's signed rank test
- tests if the distribution is symmetric around zero
- uses the sign of the observations combined with the rank of the numerical values (more powerful than the sign test)
- can be carried out by e.g. PROC UNIVARIATE in SAS

Nonparametric comparison of MF and SV

Output from PROC UNIVARIATE:

    Tests for Location: Mu0=0

    Test           -Statistic-     -----p Value------

    Student's t    t   0.156687    Pr > |t|     0.8771
    Sign           M        2.5    Pr >= |M|    0.3593
    Signed Rank    S          8    Pr >= |S|    0.7603

The conclusion remains the same... No significant bias.
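The sign test line of the output can be illustrated with an exact binomial calculation. In PROC UNIVARIATE the statistic is M = (n⁺ − n⁻)/2, where n⁺ and n⁻ count the positive and negative differences and zero differences are dropped. The counts used in this Python sketch (12 positive, 7 negative) are an assumption: the raw differences are not listed in full, and these counts were chosen only because they are consistent with the reported M = 2.5 and P = 0.3593.

```python
from math import comb

def sign_test(n_pos, n_neg):
    """Sign statistic M and exact two-sided P-value (zero differences excluded)."""
    n = n_pos + n_neg
    m_stat = (n_pos - n_neg) / 2
    k = min(n_pos, n_neg)
    # Under H0 each sign is equally likely, so the count is Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return m_stat, min(1.0, 2 * tail)

# Assumed counts, not taken from the slides (see the note above).
m_stat, p_value = sign_test(12, 7)
print(m_stat, round(p_value, 4))
```

Only the signs enter the calculation, which is why the test has little power compared with the signed rank test or the t-test.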



Theory of statistical testing

Making a decision based on a statistical test always implies a risk of making a wrong decision.

                accept                    reject
    H0 true     1-α                       α (error of type I)
    H0 false    β (error of type II)      1-β

Significance level: α (usually 5%) denotes the risk that we are willing to take of rejecting a true hypothesis.

Power: 1-β denotes the probability of rejecting a false hypothesis.
- But what does 'H0 false' mean? How false is H0?

The power depends on the true difference

If the true difference in means is xx, what is our chance of detecting it with a two-sample t-test on 5% level?

[Figure: power curves for 10, 16 and 25 subjects in each group; x-axis: size of difference (−4 to 4), y-axis: power (0.0 to 1.0)]

Power is calculated in order to choose the sample size in the planning of an investigation.

When the data have already been gathered, confidence intervals should be presented.
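Power curves of this kind can be approximated with the normal distribution. The Python sketch below is not the computation behind the original figure: it replaces the t distribution by a normal approximation, so it is slightly optimistic for small groups, and the standardised difference of 1 in the example is an arbitrary illustration.

```python
import math
from statistics import NormalDist

def approx_power(difference, sd, n_per_group, alpha=0.05):
    """Normal approximation to the power of a two-sided two-sample test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # True difference measured in standard errors of the estimated difference.
    shift = abs(difference) / (sd * math.sqrt(2 / n_per_group))
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (10, 16, 25):
    print(n, round(approx_power(1.0, 1.0, n), 2))
```

With no true difference the function returns the significance level, and the power increases with both the size of the difference and the group size.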


Sample size determination

The required sample size depends on the nature of the data, and on the conclusion we want to make.

Which magnitude of difference are we interested in detecting?
- requires knowledge of the problem at hand, especially biological variation
- qualified guesses e.g. from a previous investigation or pilot study

What should the chances be of detecting this difference?
- the power should be large, at least 80%
- we usually test on significance level 5%, or maybe 1%

Example: planning a study

New drug in anaesthesia: XX, given in the dose 0.1 mg/kg.

  - Aim: to establish a difference between two groups (but not if it is uninterestingly small).
  - Outcome: time until some event, e.g. ’head lift’.
  - Sample size: how many patients do we need?

Use the information from a study on a similar drug:

group   N    time to first response in minutes (mean ± SD)
1       4    16.3 ± 2.6
2       10   10.1 ± 3.0

Expect: a difference in means of about 6 minutes, with SD within groups about 2.5-3 minutes.
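The expected difference and within-group SD can be checked against the summary table. This is a small sketch using the usual pooled-variance formula. Pooling the two SDs is an assumption made here. The slide only gives the rounded guesses.

```python
from math import sqrt

# Summary data from the study on a similar drug (time to first response, minutes)
n1, mean1, sd1 = 4, 16.3, 2.6
n2, mean2, sd2 = 10, 10.1, 3.0

diff = mean1 - mean2

# Pooled variance: within-group variances weighted by their degrees of freedom
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
pooled_sd = sqrt(pooled_var)

print(f"difference in means: {diff:.1f} minutes")
print(f"pooled SD: {pooled_sd:.2f} minutes")
```

The results, roughly 6.2 and 2.9 minutes, agree with the planning values of about 6 minutes and 2.5-3 minutes.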


Statistical and clinical significance

Statistical significance depends upon:
  - the true difference
  - the sample size
  - the random variation, i.e. the biological variation
  - the significance level

Clinical significance depends upon:
  - the size of the detected difference


Computing sample size

The required sample size for a two-sample t-test can be:
  - read off from a nomogram or table
  - computed by use of a software package (e.g. the power.t.test function in R, or in SAS ANALYST: STATISTICS → SAMPLE SIZE → TWO SAMPLE T TEST).

In order to compute the sample size you need to input:
  - δ: the clinically relevant difference, often called the minimal relevant difference (MIREDIF)
  - s: the standard deviation
  - or δ/s: the standardised difference
  - 1−β: the desired power
  - α: the significance level

Sample size: N is the total number, assuming equal-sized groups.


Altman’s nomogram

To find the required sample size, connect δ/s and 1−β, then read off for the relevant α.

Example:
  - δ = 3 and s = 3 implies δ/s = 1.
  - Choose α = 0.05 and 1−β = 0.80.
  - Then N = 32.
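The nomogram reading can be cross-checked numerically. The sketch below uses the common normal-approximation formula, n per group = 2(z(1−α/2) + z(1−β))² / (δ/s)², which is an assumption here and not necessarily how the nomogram was constructed. The function name `total_sample_size` is hypothetical. Exact t-based routines such as R's power.t.test tend to give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def total_sample_size(std_diff, power=0.80, alpha=0.05):
    """Total N for two equal-sized groups, by the normal approximation."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = norm.inv_cdf(power)
    n_per_group = ceil(2 * (z_alpha + z_beta) ** 2 / std_diff ** 2)
    return 2 * n_per_group

N = total_sample_size(std_diff=3 / 3)  # delta = 3, s = 3
print(N)
```

For δ/s = 1, α = 0.05 and power 0.80 the approximation gives 16 per group, so N = 32, in agreement with the nomogram example.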


What if we cannot get that many patients?

Include more centres in a multi-centre study

Take fewer from the one group, more from the other
  - How many in each group?

Perform a paired comparison
  - Use the patients as their own controls - how many are needed?

Be content to take less than needed
  - What is the power for the attainable sample size?

Give up on the investigation
  - instead of wasting time and money!


Different group sizes?

Equal group sizes are optimal.

If group sizes are unequal, say $n_1$ in group 1 and $n_2$ in group 2 with $n_1 = k \cdot n_2$, then the required total gets bigger.

To compute the sample sizes:
  - compute $N$ as before
  - the actual number needed is $N' = N \cdot (1+k)^2 / (4k)$
  - the numbers needed in each group are

$$n_1 = \frac{N' k}{1+k} = \frac{N(1+k)}{4} \quad \text{and} \quad n_2 = \frac{N'}{1+k} = \frac{N(1+k)}{4k}$$


Implications for different group sizes

[Figure: number in second group plotted against number in first group, both axes from 0 to 60]

Example: If k = 2 and N = 32, then N′ = 36, n1 = 24 and n2 = 12.

Least possible total: 32 = 16 + 16.

Need at least 8 = N/4 in each group.
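The adjustment for unequal group sizes is simple arithmetic and can be sketched as follows. The function name `unequal_group_sizes` is hypothetical. In practice the results would be rounded up to whole patients.

```python
def unequal_group_sizes(N, k):
    """Return (N', n1, n2) when n1 = k * n2, given the total N for equal groups."""
    n_total = N * (1 + k) ** 2 / (4 * k)  # N' = N (1 + k)^2 / (4k)
    n1 = N * (1 + k) / 4                  # larger group when k > 1
    n2 = N * (1 + k) / (4 * k)            # smaller group when k > 1
    return n_total, n1, n2

for k in (1, 2, 3, 10):
    n_total, n1, n2 = unequal_group_sizes(32, k)
    print(f"k = {k:2d}: N' = {n_total:.1f}, n1 = {n1:.1f}, n2 = {n2:.1f}")
```

With N = 32 and k = 2 this reproduces N′ = 36, n1 = 24 and n2 = 12. As k grows, n2 approaches N/4 = 8 but never falls below it.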


Sample size for the paired design

The standardized difference should be replaced by

$$\frac{\sqrt{2} \times \text{clinically relevant difference}}{s_d}$$

where $s_d$ denotes the standard deviation of the differences.

It is useful to note that $s_d = \sqrt{2(1-\rho)} \cdot s \le \sqrt{2} \cdot s$ when $\rho \ge 0$, where $s$ denotes the standard deviation of the original measurements and $\rho$ denotes the correlation between paired observations.

Note: N denotes the total number of observations, so the required number of patients is N/2.
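The paired calculation can be sketched by combining the formulas above with the normal approximation for N. That approximation is an assumption here and is optimistic for small samples. The function name `paired_patients` and the example values δ = 3, s = 3 and ρ = 0.5 are hypothetical and do not come from the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist

def paired_patients(delta, s, rho, power=0.80, alpha=0.05):
    """Approximate number of patients needed in a paired design (requires rho < 1)."""
    s_d = sqrt(2 * (1 - rho)) * s       # SD of the within-patient differences
    std_diff = sqrt(2) * delta / s_d    # standardized difference for the paired design
    norm = NormalDist()
    z_sum = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    N = 4 * z_sum ** 2 / std_diff ** 2  # total number of observations
    return ceil(N / 2)                  # each patient contributes two observations

print(paired_patients(delta=3, s=3, rho=0.5))
print(paired_patients(delta=3, s=3, rho=0.0))
```

Under these assumptions a correlation of 0.5 gives about 8 patients and a correlation of 0 gives about 16. The two-group design with the same δ and s needs 32.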


The summary statistics for ’MF vs SV’ are made using the code:

Note: the data is read in from the file ’mf_sv.txt’ (text file with two columns and 21 observations)

DATA mydata;
  INFILE 'mf_sv.txt' FIRSTOBS=2;  /* start reading at line 2 */
  INPUT mf sv;
  dif = mf - sv;                  /* difference between the two methods */
  average = (mf + sv)/2;          /* average of the two methods */
RUN;

PROC MEANS DATA=mydata MEAN STD;
RUN;


The pictures for ’MF vs SV’ are made using the code:

/* scatter plot of MF against SV */
proc gplot;
  plot mf*sv / haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2);
  axis2 value=(H=2) minor=NONE label=(A=90 R=0 H=2);
  symbol1 v=circle i=none c=BLACK l=1 w=2;
run;

/* one joined line per subject; uses the variables flow, method and subject */
proc gplot;
  plot flow*method=subject / nolegend haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2);
  axis2 value=(H=2) minor=NONE label=(A=90 R=0 H=2);
  symbol1 v=circle i=join l=1 w=2 r=21;
run;


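/* Bland-Altman plot: difference against average, with horizontal reference lines */
proc gplot;
  plot dif*average / vref=0 lv=1 vref=0.24 15.5 -15.0 lv=2
       haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2 'average');
  axis2 order=(-16 to 16 by 4) value=(H=2) minor=NONE
        label=(A=90 R=0 H=2 'difference MF-SV');
  symbol1 v=circle i=none l=1 w=2;
  title h=3 'Bland Altman plot';
run;

title;  /* clear the title before the next plot */

/* bar chart of the differences */
proc gchart;
  vbar dif;
run;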
