
Post on 23-Jan-2021


Use, misuse, and abuse of statistics

The European Academy of Nursing Science

Year 2, Friday

Geert Verbeke

Interuniversity Institute for Biostatistics and statistical Bioinformatics

geert.verbeke@kuleuven.be

http://perswww.kuleuven.be/geert_verbeke

Contents

1 Introduction

2 Errors in statistics: Basic concepts

3 Errors in statistics: Practical implications

4 Quiz

5 Clustered data

6 Missing observations

Bibliography

EANS: Use, misuse, and abuse of statistics i

Chapter 1

Introduction

. Focus of the course

. Course materials


1.1 I will NOT talk about . . .

• Mathematics

• Technical details

• Software

• Algorithms

• . . .


1.2 I will focus on . . .

• Use and misuse of statistics

• Frequently observed errors

• Some misconceptions

• Applications

• Publications

• Intuition

• . . .


1.3 Course material

• Course notes, also available from:

http://perswww.kuleuven.be/geert_verbeke/courses

• Online voting tool ‘Poll Everywhere’:

http://pollev.com/geertverbeke


Chapter 2

Errors in statistics: Basic concepts

. Introduction

. Two types of errors


2.1 Introduction

• Consider the comparison of weight gains in rats with high (group 1) or low (group 2) protein level diets.

• On average, there is an observed difference of 19g between both groups.

• Formal comparison can be based on the unpaired t-test in which one tests

H0 : µ1 = µ2 versus HA : µ1 ≠ µ2,

where µ1 and µ2 are the means of large populations of rats fed with high or low protein level diets, respectively.


[Figure: a SAMPLE is drawn at RANDOM from the POPULATION; INFERENCE AND ESTIMATION lead from the sample back to the population. Hypotheses to test: H0 : µ1 = µ2 versus HA : µ1 ≠ µ2. Estimates for µ1 and µ2: 120 and 101, respectively.]


• Result:

There is no significant difference (p = 0.0757) in weight gain between rats on a high protein level diet and rats on a low protein level diet.

• The result of any statistical test should be interpreted as evidence in favour of or against the null hypothesis, and should not be interpreted as formal proof.

• In our example, maybe a true difference was too small to be detected based on such a small experiment.

• Alternatively, p = 0.001 would only indicate that the observed difference of 19g is unlikely to occur by pure chance, but maybe our sample was indeed the extreme one that happens once every 1000 experiments.


• Hence, whenever statistical tests are used, errors in the conclusions can occur.

• It is therefore important to quantify the errors, and to keep them under control

• This is the case for our t-test but also for any other test, i.e., each time a p-value is calculated and interpreted:

. (un-)paired t-test

. chi-squared test

. linear regression

. ANOVA

. logistic regression

. . . .


2.2 Two types of errors

                        Reality
Test result      H0 correct      H0 not correct
Accept H0        No error        Type II error
Reject H0        Type I error    No error

• Type I error: H0 is incorrectly rejected

• Type II error: H0 is incorrectly accepted


• The probability of a type I error can easily be controlled by choosing the level of significance α sufficiently small:

P(Type I error) = α, typically 1%, 5%, or 10%

• The probability of a type II error can only be controlled by conducting sufficiently large experiments:

P(Type II error) = 1 − power =⇒ power calculation, sample size calculation
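The link between power and sample size can be made concrete with a small calculation. The sketch below replaces the t-test by a normal approximation, and the standard deviation of 20g is a purely hypothetical value (the notes do not report one); the function names are illustrative. Only the difference of 19g comes from the rat example.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test (normal approximation)."""
    se = sigma * sqrt(2.0 / n_per_group)      # standard error of the mean difference
    z_crit = Z.inv_cdf(1.0 - alpha / 2.0)     # two-sided critical value
    shift = delta / se
    # Probability of rejecting H0 in either tail when the true difference is delta
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to reach the requested power."""
    z_alpha = Z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = Z.inv_cdf(power)
    return ceil(2.0 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical SD of 20g; true difference of 19g as in the rat example
n = n_per_group(delta=19.0, sigma=20.0)
print(n, round(power_two_sample(19.0, 20.0, n), 3))
```

With a true difference of zero, the same power function returns α, which is the type I error rate: both error types are read off one curve.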


Chapter 3

Errors in statistics: Practical implications

. Multiple testing

. Bonferroni correction

. Tests for baseline differences

. Equivalence tests

. Significance versus relevance

. Examples from biomedical literature


3.1 Multiple testing

• Each time a test is performed, there is probability α of making a type I error

• For example, if α = 0.05, we can expect to incorrectly reject a true null hypothesis in 5 out of 100 tests.

• Implication:

“The more tests one performs, the higher the probability that something is detected by pure chance”

• This problem of multiple testing occurs very frequently in bio-medical sciences, in various settings


3.1.1 Example: A classroom experiment

• On entry into the classroom, assign each student at random a seat at the left or at the right side of the classroom

• Compare both sides with respect to 100 aspects including weight, height, age, gender, color of hair, color of eyes, . . .

• It is to be expected that for about 5 of these outcomes, a significant difference is obtained at the 5% level of significance, by pure chance.


3.1.2 Example: Testing many relations

• Amin et al. [1], Table 2:

. 18 tests performed

. only 2 significant results


3.1.3 Example: Subgroup analyses

• Kaplan et al. [2], Table 5:

. Tests based on C.I.’s for odds ratios

. A C.I. containing 1 is equivalent to a non-significant test result

. 21 × 3 = 63 tests performed

. only 5 significant results
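To judge how surprising these counts of significant results are, one can compute what pure chance alone would produce. The sketch below assumes that all null hypotheses are true and that the tests are independent; the latter is an assumption made here for simplicity and will not hold exactly in the published examples. The function names are illustrative.

```python
def familywise_error(k, alpha=0.05):
    """P(at least one type I error) in k independent tests, all null hypotheses true."""
    return 1.0 - (1.0 - alpha) ** k


def expected_false_positives(k, alpha=0.05):
    """Expected number of significant results arising by pure chance."""
    return k * alpha


# Test counts taken from the three examples above
for label, k in [("Amin et al.", 18), ("Kaplan et al.", 63), ("classroom", 100)]:
    print(f"{label:14s} k={k:3d}  expected={expected_false_positives(k):.2f}  "
          f"P(at least one)={familywise_error(k):.3f}")
```

Under these assumptions, 2 significant results out of 18 tests or 5 out of 63 are not far from what chance alone is expected to deliver.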


3.1.4 Example: Searching for the most significant results

• This ‘scientific finding’ was printed in the Belgian newspapers:

• It was even stated that those who wake up before 7.21am have a statistically significantly higher stress level during the day than those who wake up after 7.21am.


3.1.5 Conclusion

• Significant results obtained by multiple testing are often overinterpreted

• If the number of tests is reported, the reader knows that such results need to be interpreted with extreme care

• The problem arises when only the significant results are reported, and one does not know how many tests were performed in total

• This leads to reporting results which turn out to be not reproducible

• For example, a new study would not find that students seated on the left are taller than those on the right. Instead, they might weigh more.

• For example, a new experiment might show no difference in stress levels between subjects waking up early and those waking up late. Or maybe the critical wake up time would be 8.12am.


3.2 Bonferroni correction

• Suppose two tests are performed, both at the 5% level of significance.

• The probability that at least one type I error will be made can be shown not to exceed 2 × 0.05 = 0.10:

P (at least 1 type I error) ≤ 2 × 5% = 10%

• In general, if k tests are performed, all at the 5% level of significance, the probability of making at least one type I error can only be shown not to exceed k × 5%

• Obviously, controlling the overall type I error rate at level α can be done by performing each separate test at the α/k level of significance.


• For example, performing 2 tests at the 2.5% level of significance each implies that the probability of making at least one type I error will not exceed 5%.

• In general, when k tests are performed at the α/k level of significance, one is sure that the overall probability of making at least one type I error will not exceed α.

• This correction of the significance level is called the Bonferroni correction.

• Note that, strictly speaking, the Bonferroni correction is an overcorrection, since the overall type I error rate can only be shown not to exceed 5%, and usually will be smaller than the required 5%.

• In some specific testing situations (e.g., ANOVA), more accurate corrections are available (e.g., Tukey test)
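The correction is simple enough to apply by hand, as the following sketch shows. The p-values are hypothetical and serve only as illustration; the second function computes the exact overall error rate under the additional assumption that the tests are independent, which shows in what sense Bonferroni is an overcorrection.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag which tests stay significant when each is run at level alpha / k."""
    k = len(p_values)
    threshold = alpha / k
    return [p <= threshold for p in p_values]


def familywise_error_independent(k, per_test_alpha):
    """Exact P(at least one type I error) for k independent tests, all H0 true."""
    return 1.0 - (1.0 - per_test_alpha) ** k


# Hypothetical p-values from k = 6 tests (illustration only)
p_values = [0.001, 0.020, 0.040, 0.300, 0.500, 0.800]
print(bonferroni_reject(p_values))

# Two tests at 2.5% each: the exact rate sits just below the 5% bound
print(familywise_error_independent(2, 0.05 / 2))
```

Note that three of the hypothetical p-values are below 0.05, but only one survives the corrected threshold of 0.05/6.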


3.3 Examples from the biomedical literature

• Baba et al. [3], p.1202 and p.1203:


• Kellett et al. [4], Table 2 (for example):


In the discussion, R.Roy writes:

Note that the reader cannot perform the Bonferroni correction as the exact p-values have not been reported.


3.4 Tests for baseline differences

• In order to show causal effects, patients are often randomized into 2 or more groups

• This ensures (at least in large studies) that all treatment groups are identical, except for the treatment the patients receive

• In (relatively) small studies, imbalances can still occur by pure chance

• Therefore, one often compares the various groups with respect to important factors which are believed to be strongly related to the outcome of interest.

• This is called testing for baseline differences, as one compares the characteristics of the patients at the start of the study.


• As an example, suppose interest is in comparing two oral treatments, A and B, for the treatment of hypertension.

• Suppose the change in diastolic BP is the outcome of interest

• Age is one of the factors believed to be strongly related to BP. Therefore, it is important that both treatment groups have the same age distribution

• Therefore, one often tests for age differences between A and B, e.g., based on the two-sample t-test.

• The hypothesis tested is

H0 : µA = µB versus HA : µA ≠ µB

• Note that H0 and HA express properties of the populations, not the samples


• In the populations, we know that, due to the randomization, µA and µB are identical

• Conclusion:

It makes no sense at all to perform baseline tests in randomized studies

• No matter how small the resulting p-value would be (e.g., < 10⁻⁸) we know that the observed difference in age between groups A and B has occurred purely by chance.

• Note also that testing for baseline differences cannot be used to check whether the randomization was done properly.
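A small simulation illustrates the point. In the sketch below every trial is randomized perfectly, so H0 is true by construction, and yet about 5% of the baseline tests come out significant. The age distribution (mean 60, SD 10), the trial size, and the use of a large-sample normal approximation instead of the t-test are all assumptions made for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def baseline_p_value(ages_a, ages_b):
    """Two-sided p-value comparing mean age (large-sample normal approximation)."""
    se = sqrt(variance(ages_a) / len(ages_a) + variance(ages_b) / len(ages_b))
    z = (mean(ages_a) - mean(ages_b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_baseline_tests(n_trials=1000, n_patients=200, alpha=0.05, seed=1):
    """Fraction of perfectly randomized trials with a 'significant' age difference."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_trials):
        # One population of patients; ages are hypothetical (mean 60, SD 10)
        ages = [rng.gauss(60.0, 10.0) for _ in range(n_patients)]
        rng.shuffle(ages)                      # randomization into two arms
        half = n_patients // 2
        if baseline_p_value(ages[:half], ages[half:]) < alpha:
            significant += 1
    return significant / n_trials


print(simulate_baseline_tests())
```

Each of these significant results is a type I error by design, which is why a small baseline p-value carries no information about the quality of the randomization.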


3.5 Example from the biomedical literature

Nissen et al. [5], abstract and table 1:

A two-arm randomized study


formal tests at baseline


3.6 Equivalence tests

• Suppose two groups A and B are to be compared, and an unpaired t-test is used to test

H0 : µA = µB versus HA : µA ≠ µB

• In case of a non-significant test result, one often concludes that both groups are identical or equivalent

• An alternative interpretation is that the experiment did not have sufficient power to show an effect which is present.

• Conclusion:

Non-significance should not be interpreted as equivalence


• This can also be seen from the fact that, if the t-test could be used to show equivalence, it would be best to collect data on (extremely) small samples, as this would increase the chance of obtaining a non-significant result, due to lack of power.

• Instead, one should reverse H0 and HA:

H0 : |µA − µB| > ∆ versus HA : |µA − µB| ≤ ∆

where ∆ is a pre-specified constant, defining ‘equivalence’

• The result of the equivalence test entirely depends on the choice of ∆

• Therefore, ∆ needs to be specified prior to the data collection
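One common way to test the reversed hypotheses is the 'two one-sided tests' procedure; the notes do not name a specific procedure, so its use here is an assumption. The sketch below relies on a normal approximation, and the observed difference, the standard errors, and ∆ = 5 are hypothetical numbers.

```python
from statistics import NormalDist

Z = NormalDist()


def two_sided_p(diff, se):
    """p-value of the usual test of H0: muA = muB (normal approximation)."""
    return 2.0 * (1.0 - Z.cdf(abs(diff) / se))


def equivalence_p(diff, se, delta):
    """p-value for H0: |muA - muB| > delta, via two one-sided tests."""
    p_lower = 1.0 - Z.cdf((diff + delta) / se)   # tests muA - muB <= -delta
    p_upper = Z.cdf((diff - delta) / se)         # tests muA - muB >= +delta
    return max(p_lower, p_upper)


# Hypothetical numbers: observed difference 1, equivalence margin delta = 5
for se in (1.0, 4.0):   # precise study versus imprecise (small) study
    print(se, round(two_sided_p(1.0, se), 3), round(equivalence_p(1.0, se, 5.0), 5))
```

Both hypothetical studies give a non-significant result in the usual test, but only the precise one (standard error 1) allows equivalence to be concluded. The small study is non-significant on both counts: it simply carries too little information.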


3.7 Example from the biomedical literature

Shatari et al. [6]:

• Title:


• Table 1:

No significant differences !


• Results and conclusions (abstract):


3.8 Significance versus relevance

• The power to detect some effect increases with the sample size

• This implies that any effect, no matter how small, will, sooner or later, be detected, if the sample is sufficiently large.

• For example, consider an experiment in which hypertensive patients receive some treatment.

• The outcome of interest is change in BP:

BPbefore − BPafter

• Suppose that the observed difference would be 0.1 mmHg.


• A p-value as small as 0.001 would be likely to be obtained, provided that the sample would be sufficiently large.

• Obviously, an average change in BP as small as 0.1 mmHg is not relevant from a clinical point of view.

• Conclusion:

Statistical significance ≠ Clinical relevance
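How large is 'sufficiently large'? The sketch below computes the p-value for the same observed mean change of 0.1 mmHg at increasing sample sizes. The standard deviation of 10 mmHg for the individual changes is a hypothetical value, and a normal approximation is used instead of the paired t-test.

```python
from math import erfc, sqrt


def p_value_mean_change(mean_change, sd, n):
    """Two-sided p-value for H0: mean change = 0 (normal approximation)."""
    z = abs(mean_change) / (sd / sqrt(n))
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) and stays accurate for large z
    return erfc(z / sqrt(2.0))


# Observed mean change of 0.1 mmHg (from the text); SD of 10 mmHg is hypothetical
for n in (100, 10_000, 200_000, 1_000_000):
    print(f"n = {n:>9,d}   p = {p_value_mean_change(0.1, 10.0, n):.3g}")
```

The estimated effect is identical in every row; only the sample size changes, and with it the p-value.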


• A highly significant effect can be a large effect:

[Figure: 95% C.I. for µ, located far from 0; p = 0.0001]

• A highly significant effect can also be a very small effect, but estimated with high precision, due to a large sample size:

[Figure: narrow 95% C.I. for µ, located close to 0 but not containing it; p = 0.0001]

• The p-value cannot distinguish between both situations

• It is therefore important not to blindly overinterpret significant results without knowing the size of the effect

• This is another reason why confidence intervals are to be preferred over significance testing


Chapter 4

Quiz

• Online voting tool ‘Poll Everywhere’: http://pollev.com/geertverbeke


4.1 Question 1

A group of women is subdivided at delivery into ‘Intrathecal analgesia’ or ‘Systemic analgesia’. The table reports means and standard deviations for both groups. Which statement is correct ?

ANSWER: (http://pollev.com/geertverbeke)

A. Correction for multiple testing is needed because three outcomes are tested

B. Correction only needed in case more than one sign. effect is observed

C. Correction only needed in case at least one sign. effect is observed

D. None of the above

https://www.polleverywhere.com/multiple_choice_polls/jxsMdbtAvUHGk8G?preview=true


4.2 Question 2

A publication contains the following table with results from 6 different hypothesis tests. Which statement is correct ?

ANSWER: (http://pollev.com/geertverbeke)

A. Since only 2 tests are sign. no Bonferroni correction is needed

B. Bonferroni correction is needed since at least one test is sign.

C. No Bonferroni correction needed because no test is sign. at 0.05/6 significance level

D. None of the above

https://www.polleverywhere.com/multiple_choice_polls/yhANjCO9z4IuhP0?preview=true


4.3 Question 3

Two treatments A and B are compared to placebo with the following results. Which treatment is the most promising ? Why ?

ANSWER: (http://pollev.com/geertverbeke)

A. A because most significant

B. A because smallest confidence interval

C. B because estimated treatment effect larger than for A

D. None of the above

https://www.polleverywhere.com/multiple_choice_polls/duBlG7zaaKV4oHq?preview=true


4.4 Question 4

An experiment is set up to compare two treatments, A and B. A difference is observed but is not significant. Which of the following statements is correct with absolute certainty ?

ANSWER: (http://pollev.com/geertverbeke)

A. The study does not have sufficient power

B. The experiment was too small

C. The variability in the data was too large

D. Maybe there is no difference between A and B in the population

https://www.polleverywhere.com/multiple_choice_polls/wP6yN6F101WdFOP?preview=true


4.5 Question 5

What is a potential danger in a (very) large implementation study ?

ANSWER: (http://pollev.com/geertverbeke)

A. The null hypothesis will too often be incorrectly rejected

B. Effects too small to be relevant can be highly significant

C. The null hypothesis will too often be correctly accepted

D. None of the above

https://www.polleverywhere.com/multiple_choice_polls/YNjQJeqOIs90aKA?preview=true


Chapter 5

Clustered data

. Data set

. Naive analysis

. Correct analysis

. Other examples


5.1 Data set: Washing without water

• Schoonhoven et al. [7]

• Comparison of traditional washing (soap & water) with the use of disposable wash gloves, made of non-woven material, saturated with quickly vaporizing cleaning & caring lotions

• Nursing home residents requiring bathing by nurses

• 56 nursing home wards (±500 residents) randomized:

. Usual Care (UC: traditional bathing)

. Washing without water (WWW)


• Exclusion: In bath or shower > 1 day/week

• Outcome of interest is ‘Completeness of assisted bathing (1/0)’ 4 weeks after randomization

• Correction for dementia (1/0)

• Other covariates (age, gender, Barthel index, BMI, skin damage, . . . ) explored as well


5.2 Naive analysis

• Logistic regression with factors ‘intervention’ and ‘dementia’

• Results:

Effect                OR      95% C.I.          p-value
Intervention: WWW     4.739   [3.155; 7.143]    <0.0001
              UC      (reference)
Dementia:     NO      1.508   [1.005; 2.268]    0.0475
              YES     (reference)

• Bathing completeness more likely . . .

. . . . in WWW intervention group

. . . . in non-demented residents


5.3 Correct analysis

• The naive analysis did not account for the variability between wards w.r.t. the proportion of residents with complete bathing

• This variability implies that residents from the same ward are more alike than residents from different wards

=⇒ Correlated data

• This correlation should be accounted for in the statistical analysis

=⇒ Mixed models
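Fitting a mixed model is beyond the scope of a short example, but the loss of information caused by within-ward correlation can be illustrated with the classical design effect, 1 + (m − 1) × ICC, for wards of average size m. This is a different and much simpler tool than the mixed model used for the corrected analysis, and the intraclass correlations (ICC) below are hypothetical; only the 56 wards and the roughly 500 residents come from the data set.

```python
def design_effect(mean_cluster_size, icc):
    """Variance inflation caused by clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_total, n_clusters, icc):
    """Number of independent observations the clustered sample is worth."""
    m = n_total / n_clusters
    return n_total / design_effect(m, icc)


# 56 wards and roughly 500 residents as in the data set;
# the intraclass correlations are hypothetical values.
for icc in (0.0, 0.1, 0.3):
    print(icc, round(effective_sample_size(500, 56, icc), 1))
```

An analysis that treats the residents as independent acts as if the ICC were 0, and therefore tends to report confidence intervals that are too narrow and p-values that are too small.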


• Corrected results:

                      Naive                                Correct
Effect                OR      95% C.I.          p-value    OR       95% C.I.           p-value
Intervention: WWW     4.739   [3.155; 7.143]    <0.0001    12.821   [4.566; 35.714]    <0.0001
              UC      (reference)
Dementia:     NO      1.508   [1.005; 2.268]    0.0475     1.271    [0.883; 1.828]     0.1962
              YES     (reference)

• Conclusion:

Effects of covariates highly affected by correlation within clusters


5.4 Other examples

Clustering =⇒ Correlation

• Residents clustered within wards

• Patients clustered within hospitals

• Ophthalmology studies: Eyes within patients (−→ paired t-test)

• Longitudinal data: Repeated measurements within subjects

• . . .


Chapter 6

Missing observations

. Introduction

. Examples

. How to handle missing data ?


6.1 Introduction

• Complete data sets are rare in practice

• This implies loss of power, but more importantly may also imply biased results

• Problematic case:

Probability for an observation to be missing is related to the observation itself

• How to handle missingness in a data set ?


6.2 Examples

• Consider data from a longitudinal study with 20 subjects, measured at baseline and followed by 6 weekly visits.

• Due to dropout, not all subjects have been followed up to week 6.

• Let us compare various common approaches to handle missingness, when interest is in estimation of the average trend

• Averaging the observed values at each visit:

=⇒ Correct at visits without missing observations

=⇒ Biased at visits with missing observations

• Averaging the values of the complete cases only:

=⇒ Biased at visits without missing observations

=⇒ Biased at visits with missing observations

• Averaging after last observation carried forward (LOCF):

=⇒ Biased at visits with missing observations

=⇒ Distorted association structure (→ p-values)

• Averaging after mean imputation:

=⇒ Biased at visits with missing observations

=⇒ Distorted association & variance structure (→ p-values)
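The four approaches can be compared on a toy data set in which the true visit means are known. The sketch below is not the 20-subject data set shown in the course; the values, the dropout rule (a subject leaves the study once the value falls below 7), and the function names are all hypothetical and chosen only to make the biases visible.

```python
from statistics import mean

# Hypothetical complete data: 4 subjects, 3 visits (true visit means 10, 9.5, 9)
complete = [[12, 13, 14], [10, 11, 12], [10, 8, 6], [8, 6, 4]]

# Dropout depends on the outcome itself: a subject leaves once the value is below 7
observed = []
for row in complete:
    kept, dropped = [], False
    for y in row:
        dropped = dropped or y < 7
        kept.append(None if dropped else y)
    observed.append(kept)


def visit_means(rows):
    """Mean per visit, ignoring missing values (None)."""
    return [mean(y for y in col if y is not None) for col in zip(*rows)]


def locf(row):
    """Replace each missing value by the last observed value of that subject."""
    filled, last = [], None
    for y in row:
        last = y if y is not None else last
        filled.append(last)
    return filled


def mean_impute(rows):
    """Replace each missing value by the observed mean at that visit."""
    means = visit_means(rows)
    return [[m if y is None else y for y, m in zip(row, means)] for row in rows]


available = visit_means(observed)
complete_cases = visit_means([r for r in observed if None not in r])
after_locf = visit_means([locf(r) for r in observed])
after_mean_imputation = visit_means(mean_impute(observed))
print(visit_means(complete), available, complete_cases, after_locf, after_mean_imputation)
```

In this toy example, only the first visit has no missing values, and it is estimated correctly by every approach except the complete-case analysis. Mean imputation reproduces the means of the observed values, but the imputed values are all identical, which artificially reduces the variability.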


6.3 How to handle missing data ?

• No uniformly best answer:

. Depends on nature of missingness

. Depends on outcome type

. Depends on research question

. Depends on model considered

. . . .

• All methods rely on assumptions about the relation between the probability for an observation to be missing and the observation itself

=⇒ Untestable assumptions


• Multiple imputation (M = 5 imputations):

[Figure: Observed data −→ Imputation −→ Imputed 1, . . . , Imputed 5 −→ Analysis −→ Results 1, . . . , Results 5 −→ Combination −→ Final results]

• Advantages:

. Correctly accounts for uncertainty about imputed values

. Imputation can be based on observed information (covariates, outcomes)

. Imputation can also be based on expert opinion

. Various imputation models can be explored (−→ sensitivity analyses)

. Relatively straightforward to implement

• Often, a small number M of imputations is sufficient (M = 3, 5)

• Alternative approaches possible, but less generally applicable and/or more difficult to implement
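The combination step is usually carried out with what are known as Rubin's rules; the notes do not specify the rule, so this is an assumption. The final estimate is the average of the M estimates, and its variance adds the between-imputation variability to the average within-imputation variance. The five results below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_imputations(estimates, variances):
    """Combine M analyses of imputed data sets (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)                 # final point estimate
    within = mean(variances)                 # average sampling variance
    between = variance(estimates)            # variability between imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical results from M = 5 imputed data sets (estimate, squared std. error)
estimates = [9.0, 9.4, 8.8, 9.2, 9.1]
variances = [0.25, 0.25, 0.25, 0.25, 0.25]
pooled, total = pool_imputations(estimates, variances)
print(round(pooled, 3), round(sqrt(total), 3))
```

The total variance exceeds the variance of any single analysis, which is how the method accounts for the uncertainty about the imputed values.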


The End !


Bibliography

[1] A.I. Amin, O. Hallbook, A.J. Lee, R. Sexton, B.J. Moran, and R.J. Heald. A 5-cm colonic J pouch colo-anal reconstruction following anterior resection for low rectal cancer results in acceptable evacuation and continence in the long term. Colorectal Disease, 5:33–37, 2003.

[2] S. Kaplan, S. Etlin, I. Novikov, and B. Modan. Occupational risks for the development of brain tumours. American Journal of Industrial Medicine, 31:15–20, 1997.

[3] Y. Baba, J.D. Putzke, N.R. Whaley, Z.K. Wszolek, and R.J. Uitti. Gender and the Parkinson’s disease phenotype. Journal of Neurology, 252:1201–1205, 2005.

[4] K.M. Kellett, D.A. Kellett, and L.A. Nordholm. Effects of an exercise program on sick leave due to back pain. Physical Therapy, 71:283–293, 1991.

[5] S.E. Nissen, E.M. Tuzcu, P. Schoenhagen, et al. Statin therapy, LDL cholesterol, C-reactive protein, and coronary artery disease. The New England Journal of Medicine, 352:29–38, 2005.

[6] T. Shatari, M.A. Clark, T. Yamamoto, A. Menon, C. Keh, J. Alexander-Williams, and M. Keighley. Long strictureplasty is as safe and effective as short strictureplasty in small-bowel Crohn’s disease. Colorectal Disease, 6:438–441, 2004.

[7] L. Schoonhoven, B.G. van Gaal, S. Teerenstra, E. Adang, C. van der Vleuten, and T. van Achterberg. Cost-consequence analysis of “washing without water” for nursing home residents: A cluster randomized trial. International Journal of Nursing Studies, 52:112–120, 2015.