Posted on 24-Jun-2020
False dichotomies and health intervention research designs: Randomized trials are not always the answer
Centre for Big Data Research in Health Seminar Series
University of New South Wales
Friday 3rd November 2017
Professor Stephen Soumerai
Harvard Medical School and Harvard Pilgrim Health Care Institute
Department of Population Medicine
Source: Soumerai SB et al. J Gen Intern Med. 2017 Feb;32(2):204-209.
“Information in administrative data sets is spurious by default.”
John Ioannidis
Source: Ioannidis, JP. JAMA. 2013;309(13):1410-1.
Background
This statement prolongs the polarizing “all or nothing” debate on “available data”
Administrative data are not always spurious
RCTs: the “gold standard”
Usually infeasible in “natural experiments”
Study endpoints can be manipulated
Patients are often not representative, so results may not generalize
• May not be blind to treatment
RCTs are not useful for most policy interventions
Most national health policies, e.g., high deductibles, copays, pay for performance, ACOs
Seat belt laws, Speed Limits
Banning medical technologies (e.g., drugs)
Opioid and antibiotic controls
Smoking regulations, etc.
What do we mean by “False Dichotomies”?
RCTs vs “everything else”
Ignores Quasi-Experiments
Campbell & Stanley (1963)
• Revised in 1979 and 2002
• Three main categories
–RCTs
–Strong quasi-experiments
–Weak “pre-experiments”
Hierarchy of Strong and Weak Designs: Capacity to Control for Biases
Strong Designs: Often Trustworthy Effects
Intermediate Designs: Sometimes Trustworthy Effects
Weak Designs: Rarely Trustworthy Effects (No Controls for Common Biases)
Strong Designs: Often Trustworthy Effects
Multiple RCTs: The “gold standard” of evidence, incorporating a systematic review of all studies.
Single RCT: A single, strong randomized experiment, but sometimes not generalizable.
Interrupted time series with control series (CITS): Baseline trends often make effects visible and control for biases. Two controls: the baseline trend and the control series.
Intermediate Designs: Sometimes Trustworthy Effects
Single ITS: Controls for trends, but has no comparison series.
Before and after with comparison group: Pre-post change based on one observation before and one after. Comparability of baseline unclear.
Weak Designs: Rarely Trustworthy Effects (No Controls)
Uncontrolled pre-post: Single observations before and after the intervention; no baseline or control group.
Cross-sectional designs: Simple correlation; no baseline, no measure of change.
Different Effects That Can Be Observed in Time Series
[Figure: several panels, each showing an outcome series before and after an intervention]
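The panels contrast two kinds of change an intervention can produce in a series: an immediate shift in level and a change in slope. One common way to estimate both is segmented regression. The sketch below is a minimal illustration only, assuming equally spaced observations, a known intervention month, and no correction for autocorrelation or seasonality; the function `segmented_regression` and the simulated series are hypothetical, not taken from the talk.

```python
import numpy as np


def segmented_regression(y, start):
    """OLS fit of y_t = b0 + b1*t + b2*post_t + b3*(t - start)*post_t.

    `start` is the index of the first post-intervention observation.
    Returns [baseline level, baseline slope, level change, slope change].
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= start).astype(float)      # 0 before, 1 from `start` on
    time_since = (t - start) * post        # 0 before; 0, 1, 2, ... after
    X = np.column_stack([np.ones_like(t), t, post, time_since])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical monthly outcome: 24 months before and 24 after an intervention
# that lowers the level by 8 and bends the trend down by 0.5 per month.
t = np.arange(48)
start = 24
post = t >= start
true_mean = 50 + 0.2 * t - 8 * post - 0.5 * (t - start) * post
rng = np.random.default_rng(0)
y = true_mean + rng.normal(0, 1, size=t.size)

b0, b1, level_change, slope_change = segmented_regression(y, start)
print(f"level change {level_change:.1f}, slope change {slope_change:.2f} per month")
```

Fitted to the noise-free `true_mean`, the same function recovers the simulated level change of -8 and slope change of -0.5 exactly; with real data, the standard errors would also need to account for autocorrelation.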
Time series effects of drug benefit limits and cost sharing on the average number of prescriptions per patient per month among noninstitutionalized, chronically ill New Hampshire patients (n=860) and other patients (n=8002). Source: Soumerai, S et al. N Engl J Med. 1987;317(15)
Single ITS: Sometimes Trustworthy
Another ITS: Sometimes Trustworthy
Rates of antidepressant use and psychotropic drug poisoning per quarter before and after the FDA antidepressant warnings among young adults (aged 18-29) enrolled in 11 health plans in the nationwide Mental Health Research Network. Source: Lu CY et al. BMJ. 2014 Jun 18;348:g3596.
ITS more than 170 years ago: monthly puerperal fever mortality rates at the Vienna Maternity Institution, 1841-1849. Rates dropped after handwashing was introduced.
Source: Semmelweis I (1861). Die Aetiologie, der Begriff und die Prophylaxis des Kindbettfiebers. [The etiology, concept, and prophylaxis of childbed fever]. Budapest and Vienna: Hartleben.
A strong interrupted time-series design debunked the Institute for Healthcare Improvement’s (IHI’s) claim of lives saved
Source: AHRQ. Statistics on hospital stays. Accessed May 26, 2015.
Without baseline data, the press hyped the findings
AP headline 2008: “Campaign against hospital mistakes says 122,000 lives saved”
“A campaign to reduce lethal errors and unnecessary deaths… has saved an estimated 122,300 lives in the last 18 months….”
“We in health care have never seen or experienced anything like this,” said Dennis O’Leary, president of JCAHO.
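The IHI claim illustrates how a pre-existing downward trend can be mistaken for a program effect when there is no baseline series. The sketch below uses made-up numbers, not the AHRQ data: deaths fall steadily and the campaign does nothing, yet an uncontrolled pre-post comparison reports a sizeable "reduction", while a segmented regression that projects the baseline trend finds no change. The campaign start month and the linear decline are assumptions for illustration.

```python
import numpy as np

# Hypothetical monthly hospital deaths that were already falling steadily
# before a campaign began; the campaign itself has no effect here.
months = np.arange(36)
start = 18                          # assumed campaign start (month index)
deaths = 1000.0 - 5.0 * months      # noise-free secular decline

# Uncontrolled pre-post: mean before minus mean after.
naive_reduction = deaths[:start].mean() - deaths[start:].mean()

# Segmented regression: the baseline trend is projected forward, so only a
# departure from that trend counts as an effect.
post = (months >= start).astype(float)
X = np.column_stack([np.ones(months.size), months, post, (months - start) * post])
coef, *_ = np.linalg.lstsq(X, deaths, rcond=None)
level_change, slope_change = coef[2], coef[3]

print(f"pre-post 'reduction': {naive_reduction:.1f} deaths/month")  # 90.0
print(f"ITS level change: {level_change:.3f}, slope change: {slope_change:.3f}")
```

The pre-post figure of 90 deaths per month is entirely the continuation of the old trend; the ITS estimates of level and slope change are zero up to rounding error.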
The upper graph shows fatal and injurious crashes on Arizona interstate highways where the maximum speed limit was raised to 65 MPH. The lower graph shows fatal and injurious crashes on Arizona interstate highways where the 55 MPH maximum speed limit was unchanged.
Source: Epperlein T. Arizona: Arizona Statistical Analysis Center; 1989.
ITS with Control Series
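A controlled ITS, such as the Arizona comparison of roads whose limit was raised with roads whose limit stayed at 55 MPH, can be estimated by pooling both series in one regression and letting the intercept, trend, level change, and slope change differ by group. This is a minimal sketch under stated assumptions: two aligned, equally long series, a shared change date, and no autocorrelation adjustment. The function `cits` and the crash counts are hypothetical.

```python
import numpy as np


def cits(y_treat, y_ctrl, start):
    """Controlled ITS via one pooled OLS model with group interactions.

    Returns the differential level change and slope change of the
    treated series relative to the control series.
    """
    n = len(y_treat)
    t = np.tile(np.arange(n, dtype=float), 2)
    g = np.repeat([1.0, 0.0], n)           # 1 = treated series, 0 = control
    post = (t >= start).astype(float)
    ts = (t - start) * post                # time since the change date
    X = np.column_stack([np.ones_like(t), t, post, ts,
                         g, g * t, g * post, g * ts])
    y = np.concatenate([y_treat, y_ctrl]).astype(float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[6], coef[7]                # g*post and g*ts coefficients


# Hypothetical monthly crash counts: both road groups share a rising trend
# and a common +5 shock at the change date; only the treated roads get an
# extra +12 level jump.
months = np.arange(40)
start = 20
common = 100 + 0.5 * months + 5 * (months >= start)
y_ctrl = common
y_treat = common + 10 + 12 * (months >= start)

level_diff, slope_diff = cits(y_treat, y_ctrl, start)
print(f"differential level change {level_diff:.2f}, slope change {slope_diff:.2f}")
```

Because the common shock appears in both series, it is absorbed by the control terms, and only the treated-specific jump of 12 is attributed to the intervention. A single ITS of the treated roads would have credited the intervention with 17.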
Objectives
Impacts of health system interventions are uncertain
Aim: Do the results of ITS differ from those of cluster RCTs?
Results
ITS and cluster RCT results were similar
• ITS with concurrent controls are important
• Baseline and follow-up trends need to be analyzed in cluster RCTs
Why RCTs should use controlled ITS
Example: dedicated chest pain units, UK
Studied whether a chest pain unit (CPU) would reduce hospital admissions
14 hospitals randomized to establish a chest pain unit, or not
90,000 visits with chest pain over 2 years
Source: Goodacre S et al. BMJ. 2007;335:659
Conventional Cluster (Diff-in-Diff) RCT Perspective
[Figure: proportion admitted (0-80%) before and after, control vs CPU hospitals]
Reanalysis of data from Goodacre S et al. BMJ. 2007;335:659
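The conventional cluster-RCT analysis reduces each arm to one before value and one after value and compares the two changes. That difference-in-differences arithmetic is sketched below with hypothetical proportions (not the Goodacre et al. values); the function name `diff_in_diff` is illustrative.

```python
def diff_in_diff(treat_before, treat_after, ctrl_before, ctrl_after):
    """Difference-in-differences from four group means or proportions."""
    return (treat_after - treat_before) - (ctrl_after - ctrl_before)


# Hypothetical proportions admitted, NOT the values from the trial.
did = diff_in_diff(treat_before=0.40, treat_after=0.45,
                   ctrl_before=0.42, ctrl_after=0.44)
print(f"DiD estimate: {did:+.2%}")  # prints: DiD estimate: +3.00%
```

Because only two values per arm enter the calculation, it cannot show whether the gap between arms was already changing before the units opened, or whether it changed gradually afterwards.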
Difference in admission rate between intervention and control groups
[Figure: monthly difference in admission rate, intervention minus control (0-25%), months 0-25]
Reanalysis of data from Goodacre S et al. BMJ. 2007;335:659
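Plotting the intervention-minus-control difference month by month turns the trial into a single interrupted time series. The sketch below, with invented numbers rather than the reanalysed trial data, shows why this matters: if the gap was already widening before the change, a before/after contrast of the gap (equivalent to the two-period difference-in-differences) folds that drift into the estimate, while a segmented regression on the gap separates the drift from the step. It assumes equally spaced months and a known change point.

```python
import numpy as np

# Hypothetical monthly gap in admission rate (intervention minus control):
# already widening by 0.2 percentage points a month, plus a 5-point step
# at month 12.
months = np.arange(24)
start = 12
post = (months >= start).astype(float)
gap = 0.02 + 0.002 * months + 0.05 * post

# Mean gap after minus mean gap before: the two-period DiD estimate.
did_like = gap[start:].mean() - gap[:start].mean()

# Segmented regression on the gap separates the pre-existing drift from
# the step at `start`.
X = np.column_stack([np.ones(months.size), months, post, (months - start) * post])
coef, *_ = np.linalg.lstsq(X, gap, rcond=None)
level_change, slope_change = coef[2], coef[3]

print(f"two-period estimate {did_like:.3f}, ITS level change {level_change:.3f}")
```

Here the two-period estimate is 0.074, overstating the true 5-point step by the 2.4 points of drift accumulated between the mid-points of the two periods.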
ITS of RCT to Increase Reporting of Clinical Decision Problems in EHR
[Figure: weekly number of clinical problems added to the medical record, intervention vs control group, weeks 1-50 (Wright et al.)]
The average effect size only tells us half the story.
Reanalysis of data from Wright A et al. J Am Med Inform Assoc. 2012;19(4):555-561
Without ITS, the decay to no effect is not observable.
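A single average effect can look convincing even when the effect fades away. The sketch below assumes, purely for illustration, a weekly intervention-minus-control effect that decays exponentially with a made-up starting value of 10 and a half-life of 5 weeks; it is not fitted to the Wright et al. data.

```python
# Hypothetical weekly effect (intervention minus control) that starts large
# and decays exponentially toward zero.
weeks = range(50)
initial_effect = 10.0   # extra problems recorded in week 0 (made up)
half_life = 5.0         # weeks for the effect to halve (made up)
effects = [initial_effect * 0.5 ** (w / half_life) for w in weeks]

average_effect = sum(effects) / len(effects)
final_effect = effects[-1]
print(f"average effect {average_effect:.2f}/week, "
      f"final-week effect {final_effect:.3f}/week")
```

The average over 50 weeks is about 1.5 per week, yet by the final week the effect is close to zero; only the week-by-week series reveals the decay.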
ITS of RCT Data: Fall Prevention in Long-Term Care (LTC)
[Figure: falls per resident-year by month, intervention vs control group, months 1-17 (Kerse 2004)]
Large baseline and follow-up differences are hidden in a conventional RCT analysis; the result is not interpretable.
Reanalysis of data from Kerse, N et al. J Am Geriatr Soc 2004;52(4)524-31
Summary of ITS of RCT Studies
Interrupted time series analysis is valuable in the evaluation of health system and policy interventions
• When RCTs are not feasible
• In the analysis of data from cluster RCTs
Important information may be lost if cluster RCTs do not consider changes over time
Conclusions
Research design is the first consideration in assessing the trustworthiness of research.
Medical and graduate schools should emphasize the weaknesses of uncontrolled and cross-sectional designs and teach stronger research designs.
Well-controlled studies can save lives, while weak designs promote wasteful programs and jeopardize public health.