Some help understanding and interpreting

medical statistics …

John Norrie

Centre for Healthcare Randomised Trials (CHaRT)

University of Aberdeen, Scotland

Barcelona, June, 2013

www.eurordis.org

4

Acknowledgment

Thanks to Ferran Torres, Josep Torrent y Farnell and Julia Saperia for creating the previous versions of this workshop and providing the material that this version is based on.

Understanding and Interpreting medical statistics

http://ferran.torres.name/docencia/eurordis

5

Outline

• Why statistics

• Descriptive statistics

• Population and samples

• Hypothesis Tests and P-values

• Statistical “vs” clinical significance

• Statistical errors (α and β)

• Sample size

• Estimation of treatment effect: Confidence Intervals

• A worked example


6

What does statistics mean to you?

mean

information

percentage

variability

data

median

average

graphs

measurement

probability

confidence

evidence

spread


7

Why Statistics?

• Statistics is the science of uncertainty – it is all about

investigating, understanding, and allowing for VARIABILITY


8

Variability


9

Why Statistics?

• Medicine is a quantitative science but not exact

– Not like physics or chemistry

• Variation characterises much of medicine

• Statistics is about handling and quantifying variation and

uncertainty

• Humans differ in response to exposure to adverse effects

Example: not every smoker dies of lung cancer

some non-smokers die of lung cancer

• Humans differ in response to treatment

Example: penicillin does not cure all infections

• Humans differ in disease symptoms

Example: sometimes cough and sometimes wheeze are presenting features for asthma


10

Why Statistics Are Necessary

• Statistics can tell us whether events could have happened by chance, and can help us to make decisions

• We need to use Statistics because of variability in our data

•Generalize: can what we know help to predict what will

happen in new and different situations?


11

Probability

• Probability is the methodological ‘glue’ that holds statistics together

• Given uncertainty is at the heart of statistical thinking, probability is our way of quantifying how likely various things that may happen will take place

• For example, a fair coin is tossed – in the long run, we would expect to get the same number of ‘heads’ as ‘tails’

If we threw the coin 10 times, intuitively we would think it much less likely to get 10 heads in a row (or for that matter ten tails) than say 5 heads and 5 tails (in any order) …

Why? Because there are lots of ways of getting 5 heads + 5 tails, only one way of getting 10 heads and no tails ….

• Probability is challenging – consider conditional probability …
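The counting argument for the coin tosses can be checked directly. This is a minimal sketch using only the Python standard library; the function name `prob_heads` is ours and is not part of the workshop material.

```python
from math import comb

def prob_heads(k: int, n: int = 10) -> float:
    """Probability of exactly k heads in n tosses of a fair coin."""
    # comb(n, k) counts the orderings that give k heads;
    # each individual ordering has probability (1/2) ** n.
    return comb(n, k) / 2 ** n

print(comb(10, 5))     # number of ways of getting 5 heads and 5 tails
print(comb(10, 10))    # number of ways of getting 10 heads
print(prob_heads(5))   # probability of 5 heads and 5 tails, in any order
print(prob_heads(10))  # probability of 10 heads in a row
```

There are 252 orderings with 5 heads and 5 tails against a single ordering with 10 heads, so the first outcome is 252 times more likely.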


12

Population: a set of all conceivable observations of a certain phenomenon

Population → draw a sample → sample → sample data

Target of study design: a representative sample (approximately all "typical" representatives are included)

Descriptive statistics (applied to the sample data)

Inferential statistics: draw conclusions for the whole population based on information gained from a sample


13

Population and Samples

Target Population

Population of the Study

Sample


14

Descriptive statistics (1)

• For continuous variables (e.g. height, blood pressure,

body mass index, blood glucose)

•Histograms

•Measures of location

–mean, median, mode …

•Measures of spread

–variance, standard deviation, range, interquartile

range, minimum, maximum
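The measures of location and spread listed above can be computed as follows. This is only a sketch: the blood pressure readings are invented for illustration, and the `statistics` module is one of several ways to do this in Python.

```python
import statistics as st

# Hypothetical systolic blood pressure readings (mm Hg), invented for illustration
values = [118, 122, 125, 125, 130, 134, 141, 150]

summary = {
    "mean": st.mean(values),
    "median": st.median(values),
    "mode": st.mode(values),
    "variance": st.variance(values),  # sample variance (divides by n - 1)
    "sd": st.stdev(values),           # square root of the sample variance
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
}

# Quartiles: software packages differ slightly in how they compute these
q1, _, q3 = st.quantiles(values, n=4)
summary["iqr"] = q3 - q1

for name, value in summary.items():
    print(f"{name}: {value}")
```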


15

The histogram

•Taken from http://www.mathsisfun.com/data/histograms.html on 28 May 2012



16

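Picturing “normal” data

Johann Carl Friedrich Gauss 1777-1855

[Figure: bell-shaped normal curve, x-axis marked at mean - 2sd, mean - sd, mean, mean + sd, mean + 2sd]

68.3% of values lie within mean ± sd

~95.5% within mean ± 2sd

~99.7% within mean ± 3sd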
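The percentages attached to the normal curve can be reproduced numerically. A minimal sketch, assuming an exactly normal distribution; the helper name `coverage` is ours.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, sd 1

def coverage(k: float) -> float:
    """Share of a normal distribution lying within k sd of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"mean ± {k} sd: {coverage(k):.2%}")
```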


17

Descriptive statistics (2)

• For categorical variables (e.g. sex, dead/alive,

smoker/non-smoker, blood group)

•Bar charts, (pie charts)…

• Frequencies, percentages, proportions

–when using a percentage, always state the number it

is based on

–e.g. 68% (n=279)

–or perhaps 190/279 (68%)
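One way to make sure a percentage is never reported without its denominator is to format both together. This is a small illustrative helper; the name `describe_proportion` is hypothetical.

```python
def describe_proportion(count: int, total: int) -> str:
    """Report a percentage together with the numbers it is based on."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{count}/{total} ({count / total:.0%})"

print(describe_proportion(190, 279))
```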


18

Memorable quotes

• 50% of what you learn about therapy in the next 5 years is wrong.

(The trouble is we don’t know which 50%) (Anon)

• “…in this world there is nothing certain but death and taxes.” Benjamin Franklin (1706-1790). (also said by Woody Allen)

• 86% of all statistics are invented on the spot (Huff – How to Lie with Statistics)

• “There are lies, damn lies, and statistics” Benjamin Disraeli (1804-1881)

• And now we add ‘… lies, damned lies, statistics, and government

statistics’ !!


19

Error ? …

Unreliable or imprecise

Reliable (precise) but not valid (accurate)

Reliable & valid


20

Random vs Systematic error

Example: Systolic Blood Pressure (mm Hg)

[Figure: two panels, “Random” and “Systematic (Bias)”, each showing five measurements (01 to 05) against the True Value on an axis from 130 to 170 mm Hg]


21

Valid samples?

Population

Likely to occur

Unlikely to occur Invalid Sample and Conclusions


22

HYPOTHESIS TESTING

• Testing two hypotheses

H0: A=B (Null hypothesis – no difference)

H1: A≠B (Alternative hypothesis)

• Calculate test statistic based on the assumption that H0 is true (i.e. there is no real difference)

• Test will give us a p-value: how likely data at least as extreme as those collected would be if H0 is true

• If this is unlikely (small p-value), we reject H0
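One concrete way to obtain a p-value under H0 is an exact permutation test: if A=B, the group labels are arbitrary, so we can ask how often a relabelling gives a difference at least as large as the one observed. This is a sketch of that one technique, not the test the workshop prescribes, and the outcome values are invented for illustration.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)
    count = extreme = 0
    # Under H0 every way of splitting the pooled data into two groups is equally likely
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        mean_a = sum_a / len(a)
        mean_b = (total - sum_a) / len(b)
        count += 1
        # Small tolerance guards against floating-point rounding on ties
        if abs(mean_a - mean_b) >= observed - 1e-12:
            extreme += 1
    return extreme / count

# Hypothetical outcome values for treatments A and B
group_a = [12.1, 14.3, 13.8, 15.2, 14.9]
group_b = [10.4, 11.9, 12.6, 11.1, 12.0]

p = permutation_p_value(group_a, group_b)
print(f"p = {p:.3f}")
print("reject H0" if p < 0.05 else "do not reject H0")
```

With five observations per group there are only 252 possible splits, so every one can be enumerated; larger studies would normally use a standard test or a random sample of relabellings.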


23

P-value

• The p-value is a “tool” to answer the question:

Could the observed results have occurred by chance*?

Remember:

– Decision given the observed results in a SAMPLE

– Extrapolating results to POPULATION

*: accounts exclusively for the random error, not bias

p < .05

“statistically significant”


24

An intuitive definition

• The p-value is the probability of having observed our data, or data more extreme, when the null hypothesis is true

• Steps:

1) Calculate the treatment differences in the sample (A-B)

2) Assume that both treatments are equal (A=B) and then…

3) …calculate the probability of obtaining a magnitude of at least the

observed differences, given the assumption 2

4) We conclude according to the probability:

a. p<0.05: the differences are unlikely to be explained by chance; we assume that the treatment explains the differences

b. p>0.05: the differences could be explained by chance; we cannot rule out chance as the explanation
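The four steps can be sketched in code. This sketch assumes a large-sample normal approximation (a z-test); with small samples a t-test would normally be preferred. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test_p_value(a, b):
    """Return (observed difference, two-sided p-value) using a normal approximation."""
    # Step 1: treatment difference observed in the sample (A - B)
    diff = mean(a) - mean(b)
    # Step 2: assume A = B, so the true difference is 0; the standard error
    # describes how much the observed difference varies by chance alone
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    # Step 3: probability of a difference at least this large, in either direction
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p

# Hypothetical outcome values, invented for illustration
a = [5.1, 6.3, 5.8, 7.0, 6.1, 6.6, 5.9, 6.4]
b = [4.2, 5.0, 4.8, 5.5, 4.6, 5.1, 4.4, 4.9]

diff, p = z_test_p_value(a, b)
# Step 4: conclude according to the probability
verdict = "unlikely to be chance alone" if p < 0.05 else "could be explained by chance"
print(f"difference = {diff:.2f}, p = {p:.4f}: {verdict}")
```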


25

P-value

•A “very low” p-value does NOT imply:

Clinical relevance (NO!!!)

Magnitude of the treatment effect (NO!!)

With larger n or lower variability, p gets smaller

•Please never compare p-values!! (NO!!!)
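Why a p-value says nothing about the size of an effect can be seen by holding the effect fixed and changing only the sample size. A sketch under a normal approximation with a known standard deviation; all numbers are invented.

```python
from math import sqrt
from statistics import NormalDist

def p_value(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means between two equal-sized groups."""
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference
    return 2 * (1 - NormalDist().cdf(abs(diff) / se))

# Same treatment difference (2.0) and same variability (sd 10.0) each time
for n in (20, 200, 2000):
    print(f"n per group = {n}: p = {p_value(2.0, 10.0, n):.3g}")
```

The treatment effect is identical in all three rows; only the amount of data differs, which is one reason p-values from different studies should not be compared.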


26

P-value

•A “statistically significant” result

(p<.05)

tells us NOTHING about clinical or scientific

importance. Only, that the results were

unlikely to be due to chance.

A p-value does NOT account for bias

only for random error



27

Significance level

• p-values are compared to a pre-specified significance level, alpha

• alpha is usually 5% (analogous to 95% confidence intervals)

if p ≤ 0.05: reject the null hypothesis (i.e., the result is statistically significant at the 5% level); conclude that there is a difference in treatments

if p > 0.05: do not reject the null hypothesis; conclude that there is not sufficient information to reject the null hypothesis

(See Statistics notes: Absence of evidence is not evidence of absence:

BMJ 1995;311:485)


28

Type I & II Error & Power

                        Reality (Population)
Conclusion (sample)     A=B                 A≠B
“A=B” (p>0.05)          OK                  Type II error (β)
“A≠B” (p<0.05)          Type I error (α)    OK


29


• Type I Error (α)

False positive

Rejecting the null hypothesis when in fact it is true

Standard: α=0.05

In words, chance of finding statistical significance when in fact there truly was no effect

• Type II Error (β)

False negative

Failing to reject the null hypothesis when in fact the alternative is true

Standard: β=0.20 or 0.10

In words, chance of not finding statistical significance when in fact there was an effect
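The meaning of the Type I error rate can be illustrated by simulation: generate many trials in which there truly is no effect and count how often the test is "significant" anyway. A sketch only; the trial size, the number of simulated trials, the seed and the known standard deviation of 1 are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulated_type_i_error(n_trials=2000, n_per_group=30, alpha=0.05, seed=1):
    """Fraction of simulated null trials that are declared statistically significant."""
    rng = random.Random(seed)
    norm = NormalDist()
    se = sqrt(2 / n_per_group)  # standard error of the difference when sd = 1 is known
    false_positives = 0
    for _ in range(n_trials):
        # Both groups are drawn from the same distribution, so H0 is true by construction
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (sum(a) / n_per_group - sum(b) / n_per_group) / se
        p = 2 * (1 - norm.cdf(abs(z)))
        if p < alpha:
            false_positives += 1
    return false_positives / n_trials

rate = simulated_type_i_error()
print(f"false positive rate: {rate:.3f}")
```

The simulated rate should land close to the chosen significance level, with some scatter from one seed to the next.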


30


• Power

1 - Type II Error (β)

Usually in percentage: 80% or 90% (for β = 0.2 or 0.1, respectively)

In words, chance of finding statistical significance when in fact there is an effect

                        Reality (Population)
Conclusion (sample)     A=B        A≠B
“A=B” (p>0.05)                     (β)
“A≠B” (p<0.05)          (α)        POWER
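Power can be calculated once an effect size, a variability and a sample size are assumed. The sketch below uses the normal approximation for comparing two means with equal group sizes and a known standard deviation; the function name and the inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_groups(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing the means of two groups."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)             # 1.96 for alpha = 0.05
    shift = delta / (sd * sqrt(2 / n_per_group))  # true difference in standard-error units
    # Probability that the test statistic falls beyond either critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

print(f"{power_two_groups(delta=5, sd=10, n_per_group=63):.2f}")
print(f"{power_two_groups(delta=5, sd=10, n_per_group=85):.2f}")
print(f"{power_two_groups(delta=0, sd=10, n_per_group=63):.2f}")  # no effect: only alpha remains
```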


31

Statistical mistakes/errors

Which sorts of error can occur when performing statistical tests?

                                      True state of nature
Test decision (hypothesis accepted)   H0                    H1
H0                                    Correct (1-α)         False negative (β)
H1                                    False positive (α)    Correct (1-β)

Significance level α is usually predetermined in the study protocol, e.g., α=0.05 (5%, 1 in 20), α=0.01 (1%, 1 in 100), α=0.001 (0.1%, 1 in 1000)

POWER 1-β: The power of a statistical test is the probability that the test will reject a false null hypothesis.


32

Statistical tests vs court of law

Compare a statistical test to a court of justice

H0: Innocent

H1: Not innocent

                         True state of nature
Decision of the court    H0 (innocent)                                      H1 (not innocent)
Judged “innocent”        Correct (1-α)                                      False negative, i.e. guilty but not caught (β)
Judged “not innocent”    False positive, i.e. innocent but convicted (α)    Correct (1-β)

False positive: A court finds a person guilty of a crime that they did not actually commit.

False negative: A court finds a person not guilty of a crime that they did commit.


33

Minimising statistical errors

Remember:

How do I gain adequate data?

Thorough planning of studies

• Defining acceptable levels of statistical error is key to the planning

of studies

• alpha (in clinical trials) is pre-defined by regulatory guidance

(usually)

• beta is not, but deciding on the power (1-beta) of the study is

crucial to enrolling sufficient patients

• the power of a study is usually chosen to be 80% or 90%

• conducting an “underpowered” study is not ethically acceptable because you know in advance that your results are likely to be inconclusive


34

Sample Size

• The planned number of participants is calculated on the basis

of:

Expected effect of treatment(s)

Variability of the chosen endpoint

Accepted risks in conclusion

↗ effect ↘ number

↗ variability ↗ number

↗ risk ↘ number


35

Sample Size

[Figure: three histograms of height (ALTURA), frequency on the y-axis, N = 2000 each, showing increasing variability:

Mean = 165.1, Std. dev. = 25.54

Mean = 165.0, Std. dev. = 26.94

Mean = 165.1, Std. dev. = 32.27]




37

Deciding how many patients to enrol

• the sample size calculation depends on:

– the clinically relevant effect that is expected (1)

– the amount of variability in the data that is expected (2)

– the significance level at which you plan to test (3)

– the power that you hope to achieve in your study (4)

• if you knew (1) and (2), you wouldn’t need to conduct a study, so picking your sample size is a gamble

• the smaller the treatment effect, the more patients you need

• the more variable the data, the more patients you need

• the smaller the risks (or statistical errors) you’re prepared to take,

the more patients you need
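How the four ingredients combine can be sketched with the standard normal-approximation formula for comparing two means, n per group = 2 (z for significance + z for power)² × (sd / effect)². This is one common textbook formula rather than the method any particular trial must use, and the inputs below are invented.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a mean difference delta."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # (3) two-sided significance level
    z_beta = z.inv_cdf(power)           # (4) power = 1 - beta
    # (1) delta is the clinically relevant effect, (2) sd is the expected variability
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)

print(n_per_group(delta=5, sd=10))               # baseline assumptions
print(n_per_group(delta=2.5, sd=10))             # smaller effect: more patients
print(n_per_group(delta=5, sd=20))               # more variability: more patients
print(n_per_group(delta=5, sd=10, power=0.90))   # smaller beta: more patients
print(n_per_group(delta=5, sd=10, alpha=0.01))   # smaller alpha: more patients
```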


38

95% (or 99%) Confidence Intervals

• Confidence Intervals are ‘better’ than p-values…

This is statistical estimation rather than hypothesis testing

…use the data collected in the trial to give an estimate of the treatment

effect size, together with a measure of how certain we are of our

estimate

• CI is a range of values within which the “true” treatment effect is believed to be found, with a given level of confidence.

A 95% CI is constructed so that, if the study were repeated many times, about 95% of such intervals would contain the ‘true’ treatment effect

A 99% CI is wider than a 95% CI …

• Generally, 95% CI is calculated as

Sample Estimate ± 1.96 × Standard Error
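The formula can be applied directly. A minimal sketch, assuming the estimate is approximately normally distributed; the estimate of 4.0 and standard error of 1.5 are invented for illustration.

```python
from statistics import NormalDist

def confidence_interval(estimate, standard_error, level=0.95):
    """Normal-approximation confidence interval: estimate ± z × standard error."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%, 2.58 for 99%
    return estimate - z * standard_error, estimate + z * standard_error

# Hypothetical treatment difference and its standard error
lo95, hi95 = confidence_interval(4.0, 1.5)
lo99, hi99 = confidence_interval(4.0, 1.5, level=0.99)

print(f"95% CI: {lo95:.2f} to {hi95:.2f}")
print(f"99% CI: {lo99:.2f} to {hi99:.2f}")
```

The 99% interval comes out wider than the 95% interval because a larger multiplier is needed for the higher level of confidence.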


39

Interval Estimation

Confidence

interval

Sample statistic

(point estimate)

Confidence

limit (lower)

Confidence

limit (upper)

A probability that the population parameter

falls somewhere within the interval.


40

Superiority study

d > 0 + effect

95% CI

d = 0 No differences

d < 0 - effect

Test better Control better




42

Standard Deviation

• The Standard Deviation is a measure of how spread

out numbers are.

• Its symbol is σ (the Greek letter sigma)

• The formula is easy: it is the square root of the Variance

• The Variance is the average of the squared differences from the Mean.


43

Calculating a variance

To calculate the variance follow these steps:

• Work out the Mean (the simple average of the numbers)

• Then for each number: subtract the Mean

• and then square the result (the squared difference).

• Then work out the average of those squared differences.


44

Example

You and your friends have just measured the heights of your dogs (in

millimeters):

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.


45

First calculate the mean (or average) …

Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394


46

Next calculate the variance,

and then the standard deviation

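To calculate the Variance, take each difference from the Mean (206, 76, -224, 36 and -94), square it, and then average the result:

Variance = (206² + 76² + (-224)² + 36² + (-94)²) / 5 = 108,520 / 5 = 21,704

So, the Variance is 21,704.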

And the Standard Deviation is just the square root of Variance, so:

Standard Deviation: σ = √21,704 = 147.32... = 147 (to the nearest mm)
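The same calculation can be written out step by step. The heights are those from the example; the variable names are ours. Note that this divides by the number of dogs, as the example does, whereas many statistical packages divide by n - 1 by default and would report a larger variance.

```python
from math import sqrt

heights = [600, 470, 170, 430, 300]  # heights in mm, from the example

# Work out the mean
mean = sum(heights) / len(heights)
# For each number, subtract the mean and square the result
squared_diffs = [(h - mean) ** 2 for h in heights]
# Average the squared differences (dividing by n, as in the example)
variance = sum(squared_diffs) / len(heights)
# The standard deviation is the square root of the variance
sd = sqrt(variance)

print(mean, variance, round(sd, 2))
```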


47

And the good thing about the Standard Deviation is that it is useful.

Now we can show which heights are within one Standard Deviation (147mm) of the

Mean:

So, using the Standard Deviation we have a "standard" way of knowing what is

normal, and what is extra large or extra small.

Rottweilers are tall dogs. And Dachshunds are a bit short ...
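Which heights fall within one Standard Deviation of the Mean can be checked with a short sketch. It uses `statistics.pstdev`, which divides by n as the example does; the variable names are ours.

```python
from statistics import mean, pstdev

heights = [600, 470, 170, 430, 300]  # heights in mm, from the example

m = mean(heights)
sd = pstdev(heights)        # population standard deviation (divides by n)
low, high = m - sd, m + sd  # one standard deviation either side of the mean

within = [h for h in heights if low <= h <= high]
outside = [h for h in heights if not (low <= h <= high)]

print(f"within {low:.0f} to {high:.0f} mm: {within}")
print(f"outside: {outside}")
```

Three of the five heights lie inside the band; the tallest and the shortest dog lie outside it.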


48

Further reading

• www.consort-statement.org

standards for reporting clinical trials in the literature

• Statistical Principles for Clinical Trials ICH E9

useful glossary

• http://openwetware.org/wiki/BMJ_Statistics_Notes_series

coverage of a number of topics related to statistics in clinical research, mostly by Douglas Altman and Martin Bland
