02.12.2014 - Sample Size Survival Analysis Common Issues in Data Analysis

Research Methodology

Statistics Lecture 6

Sample Size, Survival Analysis and Common Issues in Data Analysis

Rifat Hamoudi

Senior Lecturer


Outline

• Importance of Sample Size calculations - Precision and Power

• Sample size calculation for the difference of 2 means

• Sample size calculation for the difference of 2 proportions

• Survival Analysis

• Common Issues to be aware of in Analysis

Intensive Rehabilitation Post Knee Replacement

• Over 79,000 knee replacement operations were undertaken in the NHS in England in 2011-12

• Following the operation, patients currently receive a short course of inpatient and community rehabilitation and advice to conduct simple exercises at home

• 20% of patients are not satisfied with the outcome of their knee replacement surgery which may in part be a result of inadequate rehabilitation

• Can the activity, independence and quality of life of patients undergoing knee replacement at high risk of a poor outcome be improved using an intensive rehabilitation program?

• Standard care versus intensive rehabilitation intervention consisting of:
- Multi-disciplinary team
- Intensive physiotherapy
- Technology-assisted information (e.g. tablet-based app, DVD)


• DESIGN: A multi-centre randomised controlled trial of intensive rehabilitation program versus usual rehabilitation following knee replacement

• PRIMARY OUTCOME : The change in the Western Ontario and McMaster Universities Arthritis Index (WOMAC) function domain score between pre and 2 years post-op (range 0-100)

• How many people do we need to recruit into this study? And why does this matter?


Why is it Important to Consider Sample Size?

• To ensure your study will provide useful information, specifically:

1. Estimates which are precise

2. Hypothesis tests that can detect important effects



• Sample size calculation ensures an estimate has adequate precision

- Prevalence of 10%, sample size of 20 → 95% CI: 1% to 31% (not very precise)
- Prevalence of 10%, sample size of 400 → 95% CI: 7% to 13% (fairly precise)

[Figure: precise versus imprecise estimates]
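A minimal sketch of how the two intervals above could be reproduced. The slides do not state which method was used; an exact (Clopper-Pearson) binomial interval is assumed here because it gives figures close to those quoted. The function names are illustrative only.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_p(k, n, target):
    """Find p with binom_cdf(k, n, p) == target by bisection.

    For fixed k the CDF decreases as p increases, so bisection is safe.
    """
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid  # CDF still too large, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k events out of n (assumed method)."""
    lower = 0.0 if k == 0 else solve_p(k - 1, n, 1 - alpha / 2)
    upper = 1.0 if k == n else solve_p(k, n, alpha / 2)
    return lower, upper

for k, n in [(2, 20), (40, 400)]:  # both samples have 10% prevalence
    lower, upper = exact_ci(k, n)
    print(f"n={n}: {lower:.1%} to {upper:.1%}")
```

With the same observed prevalence, the interval for n=20 comes out several times wider than the one for n=400.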

Precise Estimates

• Recall: a 95% confidence interval is a range of values that we are fairly sure includes the true population parameter; if the study were repeated many times, 95% of such intervals would include it

• A 95% confidence interval for a mean $\bar{x}$ is calculated as follows:

$\bar{x} \pm 1.96 \times SE(\bar{x})$, where $SE(\bar{x}) = SD / \sqrt{n}$

• Larger sample size = smaller standard error → higher precision and narrower confidence intervals

• The more people in our sample, the more precise our estimate
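The link between sample size and interval width can be sketched as follows. The mean, SD and sample sizes are hypothetical values chosen for illustration, and `mean_ci` is not a function from any library.

```python
from math import sqrt

def mean_ci(mean, sd, n, z=1.96):
    """95% confidence interval for a mean: mean +/- z * SD / sqrt(n)."""
    se = sd / sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se

# Hypothetical sample mean 10 and SD 22, at increasing sample sizes
for n in (25, 100, 400):
    lower, upper = mean_ci(10.0, 22.0, n)
    print(f"n={n}: width of 95% CI = {upper - lower:.2f}")
```

Because the standard error depends on the square root of n, the sample size has to be quadrupled to halve the width of the interval.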

• Example:

We want a precise estimate of the mean difference in the 2 year post-op change in WOMAC score between the intensive rehabilitation and control group




2. Hypothesis tests that can detect important effects

Recap of Hypothesis Tests

• Define the null and alternative hypotheses under study (H0 and HA)

• Collect relevant data from a sample of individuals

• Calculate the appropriate test statistic specific to the null hypothesis

• Compute the probability of obtaining your observed results, or something more extreme, when the null hypothesis is true, i.e. the P-value

• Interpret the P-value and present results. The smaller the P-value the stronger the evidence against the null hypothesis.

Hypothesis Tests

• Make one of two decisions:

- Reject the null hypothesis (usually if p<0.05)
- Do not reject the null hypothesis

Errors in Hypothesis Tests

• Two sources of error:

Type I, α: Incorrect rejection of the null hypothesis (rejecting the null hypothesis when it is true)

Type II, β: Incorrect non-rejection of the null hypothesis (not rejecting the null hypothesis when it is false)

• Declaring results with p<0.05 to be significant means you are willing to accept a type I error rate of 5%. This is the significance level of the test.

Power of the Test

• ‘Power’ is the probability of detecting a true effect, given that it exists

• That is, the probability of rejecting the null hypothesis when it is false

• Power = 1 - type II error rate = 1 - β = 1 - P(incorrect non-rejection of the null hypothesis)

Power of the Test

• Helpful to think of this as a contingency table:

                       Truth: Difference                Truth: No difference
Test: Difference       True positive (power)            False positive (type I error)
Test: No difference    False negative (type II error)   True negative

Power

• Ideally we would like power = 100%, i.e. the probability of rejecting the null hypothesis when it is false to be 100%

• But this is impossible! There is always a chance of making a type II error → not rejecting the null hypothesis when it is false.

Power and Sample Size

• Power increases with increasing sample size

• A larger sample has greater ability than a small sample to detect a clinically important effect if it exists

• When the sample size is very small, the test may have inadequate power to detect a particular effect = wasted resources!

• Sample size calculations ensure enough power to detect important effects as statistically significant

Example: We want a high probability of finding a meaningful difference in the change in WOMAC score between the intensive rehabilitation and control group, given that it actually exists
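The way power grows with sample size can be sketched with a normal approximation for a two-sided comparison of two group means. This approximation is an assumption of the sketch, not a method given in the slides, and the difference of 8 and SD of 22 are used purely as illustrative inputs.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group, d, sd, alpha=0.05):
    """Approximate power to detect a true difference d between two group means.

    Normal approximation; the negligible chance of rejecting in the
    wrong direction is ignored.
    """
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    return z.cdf(d / se - z.inv_cdf(1 - alpha / 2))

for n in (20, 50, 100, 159):
    print(f"n per group = {n}: power = {approx_power(n, d=8, sd=22):.2f}")
```

Under these assumptions a small study has well under a 50% chance of detecting the difference, while about 159 patients per group gives roughly 90% power.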


Sample Size

• On the other hand if the sample size is unduly large the study may be:

- Unnecessarily time consuming
- Expensive
- Unethical

• How do we ensure we have an appropriate sample size then?

• We use a sample size calculation to ensure we have enough, but not too many, patients. It gives the number of participants required.

What values are required to determine sample size?

Suppose you are planning a two group trial with a proposed hypothesis test. Sample size depends on 4 things, all of which are required for the calculation:

1. Power: assuming there is a true underlying difference, how certain do you want to be of detecting it? E.g. 90%.
2. Significance level: the cut-off below which we will reject the null hypothesis. E.g. p=0.05.
3. Clinically important effect size we wish to detect in the test
4. Variability of the outcome of interest, i.e. the standard deviation if we have a numerical outcome.

CANNOT DO A SAMPLE SIZE CALCULATION WITHOUT ALL 4!!!

Sample Size - 3. Clinically Important Effect Size

• Smallest effect which would be considered clinically or biologically important - The magnitude of the effect which we do not want to overlook.

• Most often this is the clinically important difference in means (or difference in proportions) we wish to detect

• Consider: What would the effect need to be before you and your colleagues would adopt the new treatment?

E.g. Mean pain improvement: a difference of 20 out of 100 on the VAS score between group 1 and group 2 would be considered clinically important.

Sample Size - 4. Variability

• How variable is the outcome you are investigating?

• Providing an estimate of variability (SD) before you have collected the data is usually the most difficult part.

• Use information from published studies with similar outcomes

• Use data from a pilot study

• Note: Sample size calculation is just an approximation. Use best information available to provide estimates.

Sample size for comparing two independent group means - Methodology

• Use a general formula for calculating sample size

• Using:
- Desired power, 1 - β
- Desired significance level, α
- Smallest clinically important difference you wish to detect
- Standard deviation (sd) of outcome

Sample size for comparing two independent group means

d – smallest difference we wish to detect as significant between the standard treatment and new treatment response

sd – standard deviation of response

α – significance level

β – type II error (1 - power)

patients per group = $f(\alpha,\beta) \times \dfrac{2\,sd^2}{d^2}$

Sample size for comparing two independent group means

• f(α,β) is a function of significance level (α) and power (1 - β)

• f(α,β) = 7.85 or 10.5 for 80% or 90% power respectively, with significance (risk of type I error) set at 5%

• Other often used values of f(α,β)

                     β (type II error)
α (type I error)     0.05     0.10     0.20     0.50
0.10                 10.82    8.56     6.18     2.71
0.05                 12.99    10.5     7.85     3.84
0.02                 15.77    13.02    10.04    5.41
0.01                 17.81    14.88    11.68    6.64
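The slides give f(α,β) only as a table. The tabulated values agree with the usual normal-approximation expression $f(\alpha,\beta) = (z_{1-\alpha/2} + z_{1-\beta})^2$ for a two-sided test, which is the assumption behind this short sketch; the function name `f` simply mirrors the slide notation.

```python
from statistics import NormalDist

def f(alpha, beta):
    """(z_{1-alpha/2} + z_{1-beta})^2 for a two-sided test (assumed formula)."""
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)) ** 2

# Print one row per significance level, one column per type II error rate
for alpha in (0.10, 0.05, 0.02, 0.01):
    row = [round(f(alpha, beta), 2) for beta in (0.05, 0.10, 0.20, 0.50)]
    print(alpha, row)
```

This makes it possible to obtain f(α,β) for combinations of α and β that are not listed in the table.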


To determine how many patients are required in the study we must specify:

1. Desired power = 90%
2. Desired significance level, α = 0.05 or 5%
3. Smallest clinically important difference in the 2 year change in WOMAC function score to detect
4. Standard deviation of the 2 year change in WOMAC function score

• Angst F, Aeschlimann A, Michel BA, Stucki G. Minimal clinically important rehabilitation effects in patients with osteoarthritis of the lower extremities. J Rheumatol. 2002 Jan;29(1):131-8.

• MCID = 8

• SD = 22


- 90% power
- Significance = 0.05
- Clinically meaningful difference we wish to detect is 8
- sd = 22

f(α,β) = 10.5 for 90% power and type I error of 0.05

patients per group = $f(\alpha,\beta) \times \dfrac{2\,sd^2}{d^2} = 10.5 \times \dfrac{2 \times 22^2}{8^2} = 158.8 \approx 159$

→ 318 patients overall need to be recruited (comparing 2 groups)
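The calculation above can be checked with a few lines of code. This is only a sketch of the slide formula; `patients_per_group` is an illustrative name, not a library function.

```python
from math import ceil

def patients_per_group(f_alpha_beta, sd, d):
    """Slide formula: f(alpha, beta) * 2 * sd^2 / d^2, rounded up."""
    return ceil(f_alpha_beta * 2 * sd**2 / d**2)

n = patients_per_group(10.5, sd=22, d=8)  # 90% power, 5% significance
print(n, 2 * n)  # 159 318
```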


Required number of patients increases as:

• Clinically important difference, d decreases

• Standard deviation, sd, increases

• Significance level, α, decreases

• Power increases

Further Considerations

• Inflation of sample size to account for possible losses to follow-up

• If the drop-out rate is believed to be r%, then the adjusted sample size is obtained by multiplying the unadjusted sample size, N, by 100/(100-r)

• Example: Based on a similar study which had a drop-out rate of r=15%, for our example (previously estimated N=318) the adjusted sample size is obtained by:

318 × 100/(100-15) = 374.1, i.e. 159 × 100/85 = 187.1 per group, rounded up to 188 per group (376 overall)

Note: Always round up
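A small sketch of the drop-out adjustment, applied per group so that both arms stay the same size. The function name is illustrative.

```python
from math import ceil

def inflate_for_dropout(n, dropout_percent):
    """Multiply n by 100 / (100 - r) and always round up."""
    return ceil(n * 100 / (100 - dropout_percent))

per_group = inflate_for_dropout(159, 15)
print(per_group, 2 * per_group)  # 188 376
```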

Studies Which Are Too Small

• Unlikely to produce a conclusive result

• Will not detect realistic, moderate treatment effects which would be clinically important

• Estimate treatment effect imprecisely

• Far too common - Misleading for medicine and further research

• More likely to lead to publication bias: A large trial should publish whatever the result, but a small one may only do so if the result is sensational

Power Statement

• You should include a power statement in a study protocol or in the methods section of a paper

• Shows careful thought has been given to the sample size at the design stage of the investigation and that the study has sufficient power

• Typical statement: To detect a difference of 8 in the two year change in WOMAC functional domain score (SD=22) with 90% power and 5% significance, taking into account a 15% drop out rate, 188 patients in each group are required.

Survival Analysis

• In medical research, survival analysis is concerned with the analysis of data in the form of time until some end-point or event

• Historically, the end-point was often death, but survival analysis now encompasses more general events

• Survival analysis vs. time to event analysis

Survival Analysis

Examples of survival or time to event data:

• Time to death after entry to a clinical trial

• Time to death following a myocardial infarction

• Time to diagnosis of cancer following the acquisition of a genetic mutation

• Time to relief of pain after taking an analgesic

Survival Analysis

Special Features of Survival Data

• Data often non-normal and highly skewed

• Observations may be censored
- End-point not observed for some individuals
- Actual survival time is greater than the censored survival time, leading to right censored survival times
- This is the most common form of censoring, and the lecture will focus on this

Survival Analysis

AML chemotherapy survival data

SPSS: Analyze -> Survival -> Kaplan-Meier

Survival Analysis

• Right censoring causes complications, but the empirical survivor function is easily generalised to censored data via the Kaplan-Meier (or product limit) estimate

Dong et al, 2011

Survival Analysis

Dong et al, 2011

Survival Analysis

Poulogiannis et al, 2010

Comparing different grades of colorectal cancer (Dukes' stage) with genetic markers
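The Kaplan-Meier (product limit) estimate mentioned earlier can be sketched in a few lines. The data below are hypothetical and are not the AML data or any data from the cited papers; the function is a plain illustration, not the SPSS procedure. Subjects censored at an event time are treated as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    events[i] is True if the event was observed at times[i] and
    False if the observation was right censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk  # product limit step
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve

# Hypothetical follow-up times (months); False marks a censored observation
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
for t, s in kaplan_meier(times, events):
    print(f"t={t}: S(t)={s:.2f}")
```

The estimate only steps down at observed event times; censored subjects contribute to the number at risk until they leave follow-up.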

Common Issues in Data Analysis

Missing Data

• Missing data is common to almost every dataset

• A common strategy for dealing with missing data is to ignore it, by only analysing observations with complete data (called a complete case analysis)

• This gives valid results if the data is missing completely at random (MCAR), i.e. the missingness does not depend on the missing values or on any observed values

Missing Data

• If this is not the case, a complete case analysis can give biased and incorrect results!

• Example: want to know whether a new drug reduces the odds of mortality compared to the standard treatment

• Randomised 200 patients to each arm (N = 400)

• 50 patients dropped out of the study (lost to follow-up) in the new treatment arm, 10 patients dropped out in the standard treatment arm

Missing Data

• Ignoring drop-out, our results are shown in the table (N = 340)

            Standard   New
Survived    97         85
Died        93         65

• OR for mortality, comparing the new treatment to the standard treatment: (65/85) / (93/97) = 0.80

• Conclusion: the new treatment reduces the odds of mortality by 20% compared to the standard treatment

Missing Data

• What assumptions have we made by ignoring the 60 patients who were lost to follow-up?

• MCAR - We have assumed that the patients who dropped out did so at random, and that the reason they dropped out was not related to their outcome (i.e. mortality)

• Is this likely to be true?

• No – patients often drop out of trials when they are not improving or begin to do worse

Missing Data

• What if there was 70% mortality among patients who dropped out?

• Standard treatment: 7/10 dropouts died

• New treatment: 35/50 dropouts died

            Standard   New
Survived    100        100
Died        100        100

• Result: OR = 1.00. No difference between treatment groups in terms of mortality
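The two odds ratios in this example can be reproduced directly from the counts on the slides. The sketch below uses an illustrative helper, `odds_ratio`, and expresses the OR as new treatment relative to standard treatment.

```python
def odds_ratio(died_new, survived_new, died_std, survived_std):
    """Odds of death on the new treatment divided by odds of death on standard."""
    return (died_new / survived_new) / (died_std / survived_std)

# Complete case analysis (N = 340): the 60 dropouts are ignored
or_complete = odds_ratio(65, 85, 93, 97)

# Scenario: 70% of dropouts died (35 of 50 on new, 7 of 10 on standard)
or_scenario = odds_ratio(65 + 35, 85 + 15, 93 + 7, 97 + 3)

print(round(or_complete, 2), round(or_scenario, 2))  # 0.8 1.0
```

The apparent benefit in the complete case analysis disappears once a plausible outcome is assumed for the patients who dropped out.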

Missing Data

• Ignoring missing data can lead to incorrect results!

• This is true for missing data in both outcome and explanatory variables

• If less than 5% of the data is missing, or if the data is missing completely at random, a complete case analysis (where we ignore observations with missing data) will probably be OK

• It is therefore important to ensure as little missing data as possible → Investigate and chase up missing data entries

Clustered Data

• Most statistical analyses assume independence between observations

• In many situations data is clustered

• With clustered data, the assumption of independence is violated

Clustered Data

• Examples of clustered data:
- Multiple measurements on the same people
- Patients at the same centre in a multi-centre study
- Children in the same classroom
- Children in the same school

• These data cannot be analysed using the methods we have discussed so far

Clustered Data

• Simple analysis option:

• Aggregate Level Analysis - Base analysis on a suitable summary measure for each unit at the cluster level

• Typical summary measures:
- Mean (e.g. average of left and right measures)
- Maximum value
- Minimum value

• The choice of summary measure depends on the purpose of the study

• Point to Remember: If conducting an analysis at the aggregate level, you can only draw conclusions at the aggregate level!!

Clustered Data

• Example: A trial of exercises in people with Parkinson's Disease (PD) was carried out. 70 patients were randomised to exercises and 72 to control. Quality of life was measured using the SF-36 (Short Form health survey) at 8 weeks, 16 weeks and 6 months.

The analysis cannot simply compare all the exercise SF-36 values to all the control SF-36 values: repeat measurements on the same participant (8 weeks, 16 weeks & 6 months) will be correlated.

Simple Aggregate Analysis Options:
- Compare mean SF-36 for each person: (SF-36 8 wk + SF-36 16 wk + SF-36 6 mo)/3
- Compare minimum SF-36 for each person
- Choose 6 months as the primary end point and compare SF-36 6 month measures for each person
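The aggregate options listed above amount to reducing each participant's repeated measurements to a single number before comparing groups. A minimal sketch, using made-up SF-36 scores and participant IDs (no trial data are given in the slides):

```python
from statistics import mean

# Hypothetical SF-36 scores per participant: (8 weeks, 16 weeks, 6 months)
scores = {
    "P01": (52.0, 58.0, 61.0),
    "P02": (47.0, 45.0, 50.0),
}

def summarise(measurements):
    """One summary value per person for each aggregate option."""
    return {
        "mean": mean(measurements),      # average over the three visits
        "min": min(measurements),        # worst score over follow-up
        "six_month": measurements[-1],   # 6 months as primary end point
    }

summaries = {pid: summarise(m) for pid, m in scores.items()}
for pid, s in summaries.items():
    print(pid, s)
```

Each person now contributes one independent value, so the groups can be compared with the standard methods for independent observations.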

Clustered Data

• If we ignore clustering, results will be incorrect because the independence assumption is violated:
- Our analysis thinks we have more information than we really do
- Estimates will appear too precise: confidence intervals will be too narrow, and p-values will be too small

• You may find significant associations between variables where none exists!

• Carefully consider the structure of your data

Variables on the Causal Pathway

• In many analyses we want to adjust for potential confounders

• When adjusting for explanatory variables, we need to be careful to ensure they are not on the causal pathway

Variables on the Causal Pathway

• Example: does a new drug reduce the risk of stroke in patients with high blood pressure?

• The drug works by reducing blood pressure, which in turn reduces the risk of stroke

• In this example, blood pressure is on the causal pathway between the new drug (explanatory variable) and risk of stroke (outcome variable)


Variables on the Causal Pathway

• What happens if we adjust for patients’ blood pressure in our analysis?

• Y = constant + b1*treatment + b2*blood pressure

• b1 is the effect of treatment after adjustment for blood pressure

• The interpretation of b1 is the change in risk/odds of a stroke due to treatment, comparing patients with the same blood pressure

• BUT this drug works by reducing blood pressure

Variables on the Causal Pathway

• When we fit this model, the question we are asking becomes “does our new drug reduce the risk of stroke when it doesn’t reduce blood pressure?”

• This is not the question we are interested in!

• Results may suggest the new drug doesn’t work, even if it does

• Carefully consider variables before you adjust for them. Do not adjust for a variable if it is on the causal pathway.

Assumptions

• Every method of analysis discussed so far makes assumptions
– E.g. the paired t-test assumes normally distributed differences in the outcome
– One-way ANOVA assumes normally distributed data and constant variance

When Assumptions Are not Met

• It is important to check that assumptions hold

• Always check appropriate assumptions

• Minor departures from assumptions are not a big problem but major departures will invalidate results

• Consider an alternative non-parametric test (a test that makes fewer distributional assumptions)

• Sensitivity analysis (ROC, Bland-Altman)

Key Points

• Sample size is vital to the strength of conclusions

• Ensures precision and adequate power

• To determine sample size for a two group trial you need to know 4 things:
1. Required power
2. Required significance level
3. The minimum clinically important difference you wish to detect
4. The variability of your outcome

• Real datasets come with many problems

• It is important that you’re aware of these problems → if you ignore them you may end up with incorrect results!
