FINAL RESEARCH REPORT - PCORI
PATIENT-CENTERED OUTCOMES RESEARCH INSTITUTE FINAL RESEARCH REPORT
Using a Bayesian Approach to Predict Patients’ Health and Response to Treatment
Scott L. Zeger, PhD¹; Zhenke Wu, PhD²; Yates Coley, PhD³; Anthony Todd Fojo, MD, PhD⁴; Bal Carter, MD⁴; Katherine O’Brien, MD¹; Peter Zandi, PhD¹,⁴; Mary Cooke, DHA⁵; Vince Carey, PhD⁶; Ciprian Crainiceanu, PhD¹; John Muscelli, PhD¹; Adrian Gherman, MSE¹; Jason Mekosh, MBA⁵
AFFILIATIONS:
¹Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland
²School of Public Health, University of Michigan, Ann Arbor
³Kaiser Permanente Washington Health Research Institute, Seattle
⁴The Johns Hopkins University School of Medicine, Baltimore, Maryland
⁵Johns Hopkins HealthCare, Baltimore, Maryland
⁶School of Medicine, Harvard University, Boston, Massachusetts
Original Project Title: Bayesian Hierarchical Models for the Design and Analysis of Studies to Individualize Healthcare
PCORI ID: ME-1408-20318
HSRProj ID: HSRP20153591
To cite this document, please use: Zeger SL, Wu Z, Coley Y, et al. (2020). Using a Bayesian Approach to Predict Patients’ Health and Response to Treatment. Patient-Centered Outcomes Research Institute (PCORI). https://doi.org/10.25302/09.2020.ME.140820318
TABLE OF CONTENTS
ABSTRACT
BACKGROUND
PATIENT AND STAKEHOLDER ENGAGEMENT
METHODS
Research Design
Table 1. Applications of Bayesian Hierarchical Model for Individualized Health to 3 Case Studies
Case Studies
OSLER inHealth
RESULTS
Aim 1
Figure 1. Graphical Representation of Person-Time Levels of Bayesian Hierarchical Model for Individualizing Health Showing State Trajectory (η) for a Single Individual (i)
Figure 2. Graphical Representation of a Population (n = 4) Showing the Population Level in Which the Individual Specific Parameters Are Assumed to be Independent Realizations From a Distribution (b, q, f, η | X)
Figure 3. Decomposition Results for Gel Electrophoresis Assay Data
Aim 2
Figure 4. Graphical Displays of the Estimated Population and Individual Etiologies Using Data From a Site of the PERCH Study
Figure 5. Alternative Analytic Approaches Used for Determining Pneumonia Etiology
Figure 6. Estimating the Etiologic Fraction From a Study With 2 Types of Measurements, 1 With Control Data, and Accounting for Imperfect Sensitivity of Both Measurements: The PIA Method
Figure 7. All Site Etiology Results Among HIV-Uninfected Patients by CXR Findings: CXR+ Patients vs All Patients
Figure 8. Directed Acyclic Graphs Describing the Relationships Between Latent Class and Clinical Outcomes
Table 2. Model Summary With Priors Used for Application to Johns Hopkins Active Surveillance Data
Figure 9. CONSORT Diagram for Johns Hopkins Active Surveillance Prospective Cohort Patients Included in This Analysis
Figure 10. Predictive Accuracy of Out-of-Sample Predictions of η Among Patients With η Observed
Figure 11. Screen Shot From JHM EHR Showing the Model-Estimated Risk That a Patient’s Prostate Tumor Falls Into Each of the Gleason Classes Indicated by the Colors in the Pie Chart on the Left
Figure 12. Example Predictions of PANSS Symptom Scores
Aim 3
DISCUSSION
Bayesian Hierarchical Model
Case Studies
Software
Stakeholders
Influence on Population Health and Patient Care
Future Directions
CONCLUSIONS
REFERENCES
ACKNOWLEDGMENTS
ABSTRACT
The PCORI mission is to address questions about health care from the patients’ perspective, such as “What is my health status and its trajectory?” and “What are my treatment options and the expected benefits and harms of each?” The purpose of this PCORI-funded project is to make it easier for clinicians and patients to find valid answers to these and other clinical questions by using modern digital tools that support (1) learning from the experience of prior patients, and (2) translating what is learned to inform the decision at hand, taking into account each patient’s unique circumstances. For this project, we developed and implemented statistical methods called bayesian hierarchical models that combine existing data on past clinical experience from a reference population with new measurements for the individual. Clinicians currently use such methods when screening patients for disease. Modern technologies make it possible for this proven approach to extend far beyond its current use. The recent revolution in information technology has unleashed new types of health data, from DNA sequences to functional images of the brain to patient-reported outcomes. Furthermore, the electronic health record captures every patient’s sequence of health measurements, diagnoses, and treatments. The bayesian methods developed and reported on here combine even complex data to produce predictions about an individual patient’s health status, trajectory, and likely benefits and harms of interventions.
In addition to developing novel methods, we facilitated their use by creating and locally disseminating a software package, OSLER inHealth, that will allow other researchers to apply this methodology. The software repository is open-source and includes the methodology developed as part of this research as well as other existing methods that facilitate individualized health prediction.
We have tested the proposed methods and software on 3 case studies to (1) estimate the frequency with which various pathogens cause children’s pneumonia and predict which pathogen is likely to be causing a particular child’s pneumonia given her or his clinical data, potentially reducing unnecessary use of antibiotics; (2) infer whether a prostate cancer is indolent or aggressive for a patient under active surveillance; and (3) characterize the variation in multiple, time-varying symptoms of major mental disorders, including schizophrenia and depression, and then use this knowledge to provide patient-specific estimates of past and, likely, future trajectories.
With this project, we have developed and demonstrated the value of combining even complex measurements on a population of patients, then translating this experience into more valid assessments of a new patient’s health status and trajectory. The model also supports inferences about the likely benefits and harms associated with available interventions.
BACKGROUND
The electronic health record (EHR) now captures more information about patients’
characteristics and changes in health over time.1-3 Fuller use of these data could improve
diagnostic accuracy and prediction of treatment effects, but standard approaches for analyzing
clinical data are not adequate for this purpose. To make the best use of newly available data,
statistical methods must account for longitudinal data, informative sampling and missing data,
heterogeneous treatment effects, and the need to combine expert knowledge with empirical
knowledge.
Our work focuses on developing and implementing more powerful and flexible
statistical frameworks and software that can generate more accurate information that can be
incorporated into tools to help patients and clinicians make better decisions. Specifically, we
designed and implemented bayesian hierarchical models for diverse types of emerging
longitudinal data to answer patient-centered research questions. We use hierarchical (or
“multilevel”) statistical models as a basic framework because they can address each of the
complexities that arise with the EHR data.
The main innovation of this research was the design and implementation of these models.
Hierarchical models suit this purpose because
they represent levels of variation, for example, time within an individual within a population,
and produce point and interval predictions for individuals.4-9 A bayesian approach is used
for 2 reasons: (1) to explicitly include prior medical knowledge about treatment effects or other
parameters using prior distributions, and (2) to communicate findings in terms of probabilities
that clinicians are accustomed to using. Three case studies on pneumonia etiology in children,
prostate cancer, and major mental disorders were chosen to provide different
challenges to the model development.
The specific aims of this research are as follows:
1. To develop a bayesian hierarchical model for possibly multivariate longitudinal data that predicts the health status, trajectory, and likely intervention effects for each member of a clinical population. The modeling approach comprises the following components: the effects of exogenous (eg, age, clinical history) and endogenous (eg, current treatment) variables on the individual’s health status; multivariate health measurements from which health status can reasonably be inferred; the effects of health measurements at one time on subsequent interventions; and the embedding of the individual within a reference clinical population of persons with similar values for measured covariates.
2. To build, test, and refine the model in 3 case studies representing diverse clinical challenges: (1) predicting the probable causes for childhood pneumonia; (2) diagnosing and evaluating treatments of prostate disease; and (3) quantifying the sources of variation in patient-reported measures of major mental disorders, including depression and schizophrenia; and to create preliminary designs of decision-support tools for each clinical application in collaboration with the area-specific stakeholder.
3. To implement the statistical methods in open-source R-packages in a repository named Open-Source Learning Environment for Research on Individualized Health (OSLER inHealth).
In this proof-of-concept study, we developed and implemented novel statistical models that address key PCORI questions. The next key steps are to scale these and other similar tools to larger and more diverse populations in which they can be systematically evaluated and then used to improve health outcomes at more affordable costs.
PATIENT AND STAKEHOLDER ENGAGEMENT
This project included several clinical stakeholders to ensure it was focused appropriately
on patients and clinicians. Each of the 3 case studies included 1 or more clinical stakeholders as
a member of the research team: Dr Kate O’Brien for children’s pneumonia; Drs Todd Fojo and
Peter Zandi for mental disorders; and Dr Bal Carter for prostate cancer. We also included Dr
Mary Cooke, vice president of Johns Hopkins HealthCare, to represent administrative leadership
at Johns Hopkins Medicine (JHM), an early consumer of the tools developed. Dr Cooke leads
JHM health plans with approximately 40 000 members and is charged with improving the
design of health care for all of the 500 000 covered lives.
Each stakeholder was involved in the planning and execution of this research program.
During the study, the stakeholders provided their expert knowledge about the disease and
measurement processes and also shared their clinical data. They participated in the
specification and refinement of the model. In addition, 3 patients with prostate cancer
participated in semiannual meetings to review design plans and early versions of our
decision-support tool. The clinician stakeholders met with the study team in person at least once per
month or more frequently when needed.
As part of the organizational structure of the study, an OSLER inHealth Steering
Committee provided guidance to the study team about the software issues. The committee was
chaired by Dr Vince Carey; members of the committee included Dr Martin Morgan (current
leader of the Bioconductor software project, senior staff scientist, Fred Hutchinson Cancer
Research Center); Dr Francesca Dominici (professor of biostatistics, vice senior associate dean
for Research, Harvard School of Public Health); Dr Roger Peng (associate professor of
biostatistics, Johns Hopkins School of Public Health); and Dr Patrick Heagerty (professor and
chair, Department of Biostatistics, University of Washington). The Steering Committee met at
least once per year via teleconference or internet meetings or in person.
METHODS
Research Design
Hierarchical Statistical Models
For this project, we developed and applied bayesian hierarchical statistical models with
2 levels—time within person and persons within population—to represent the key components
necessary to learn about the level and trajectory of a disease and to make well-informed health
decisions.10 The following components were included.
Health status. In most problems, an individual’s true health status (call it η) cannot be
precisely measured but is instead reflected in clinical measurements, Y. For example,
pathogens infecting a child’s lung cannot be directly observed, so their presence or absence in
samples from the nose and throat is observed instead. The conditional distribution of these
observations, given the actual health status (here, lung infection), is the first component of our
model.
Mechanistic effects of covariates and interventions on health status. The effects of
covariates (X, exogenous; R, endogenous) on health status are the second component. We use
standard generalized linear models to describe the conditional distribution of the health status
given the individual’s covariate values at each time. We allow these covariate effects to vary
across individuals to account for heterogeneous treatment effects. A special case is to assume
that more homogeneous subgroups of people exist and that interventions can be tailored to
the characteristics of the subgroup. Our combined scientific and clinical goal is to define
“clinically relevant and mechanistically anchored” subgroups.11
Treatment decisions with feedback. To learn about the efficacy and safety of an
intervention, the third component of the model is a regression of the intervention assignment
process on previous health measures. For example, in the mental disorders case study, one
might anticipate that the choice of therapy depends on previous measurements of depressive
symptoms.
Embedding the individual within a reference population. The final component of
our model is a second level that describes the variation in the individual-specific model terms
across the population.
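The four components described above can be written compactly as follows. This is a sketch in assumed notation rather than the report's own specification: the subscripts i (individual) and t (time), the symbol θ_i for the individual-specific parameters, and the generic distributions f, g, h, and G are introduced here only for illustration.

```latex
\begin{aligned}
Y_{it} \mid \eta_{it}, \theta_i &\sim f(y \mid \eta_{it}, \theta_i)
  && \text{(measurements given health status)} \\
\eta_{it} \mid X_{it}, R_{it}, \theta_i &\sim g(\eta \mid X_{it}, R_{it}, \theta_i)
  && \text{(covariate and intervention effects)} \\
R_{it} \mid Y_{i,t-1}, \theta_i &\sim h(r \mid Y_{i,t-1}, \theta_i)
  && \text{(treatment decisions with feedback)} \\
\theta_i \mid X_i &\sim G(\theta \mid X_i)
  && \text{(individual within the reference population)}
\end{aligned}
```

Read from top to bottom, the first 3 lines form the time-within-person level and the last line is the persons-within-population level.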
Case Studies Dictate Model Development
As listed in Table 1, the 3 case studies on pneumonia etiology in children, prostate
cancer, and major mental disorders were chosen to provide different challenges to the model
development. For each case study, we specified and computed an initial model and worked
with the stakeholder to refine it.
When this project began, Johns Hopkins EHR data could not be readily accessed for
research. Therefore, another criterion for the case studies was that the data were ready for
modeling.
Table 1. Applications of Bayesian Hierarchical Model for Individualized Health to 3 Case Studies
Medical challenge | Stakeholder | Target of inference | Data
I. Diagnosing viral vs bacterial childhood pneumonia | Kate O’Brien, professor of international health | Health status (aim 1A) | PERCH Study: 5000 cases; 5000 control participants
II. Evaluating efficacy and safety of active surveillance as alternative to surgery for prostate cancer | Bal Carter, professor of urology | Health status, trajectory; causal inference about treatment choice (aims 1A and 1B) | Brady Institute Active Surveillance Study: 1300 patients with prostate cancer at Johns Hopkins
III. Diagnosis and evaluation of therapies for schizophrenia or depression | Peter Zandi, associate professor of psychiatry | Health status, trajectory; causal inference: entire model (aims 1A, 1B, 1C) | Janssen Schizophrenia Trial data; NNDC depression data
Abbreviations: NNDC, National Network of Depression Centers; PERCH, Pneumonia Etiology Research for Child Health.
Model Specification
The framework in Figures 1 and 2 is general. We implement the model using the
following additional specifications.
Time. We have narrowed the focus of this project to discrete, rather than continuous,
time. By discrete time, we mean measures taken at fixed intervals (eg, daily, monthly, or annually); continuous time allows
measures at any time point. There are 2 main reasons for this decision. First, continuous-time
processes are tractable mainly for Gaussian processes, for which continuous covariance
functions are natural. Continuous-time models do not easily accommodate non-Gaussian
outcomes such as categorical, count, or repeated event times, which are common in medical
research. Second, most continuous-time models can be closely approximated by a discrete-time
model.
Health status. We allow the latent health status to be represented by multiple
variables that can be a mixture of discrete and continuous variables.12,13
Measurements. We assume that the measured health outcomes (Ys), given the true
underlying health status (η), are distributed according to the exponential family of distributions
(including Gaussian, binomial, Poisson, γ, and others). Where measurement error may depend
on external covariates, generalized linear models are used.
Missing data. This model naturally handles missing data by treating them in the same
way as other latent variables. At each iteration of the estimation algorithm, the missing values
are randomly imputed. The full data set is then used to simulate parameters and latent
variables. Conditioned on the model, the inferences are not substantially affected by missing
data unless the unobserved value is what caused the missingness (“nonignorable
missingness”).14
Plug-and-play extensibility. A feature of this hierarchical model design is the possible
use of conditional independence of components. This allows the modeler to change parts of the
model while leaving the rest intact. For example, one health status measurement model can be
substituted for another without changing the rest of the structure.
Model Computation
Given the dramatic advances in bayesian computing packages, such as JAGS
(http://mcmc-jags.sourceforge.net/) and Stan (https://mc-stan.org/), we changed our plans to
implement our longitudinal models in R using the existing computational software rather than
writing original software.
Model Deliverables
This section briefly summarizes the key outputs from the bayesian hierarchical model
we developed. The inputs are the predictor variables (X and R), health outcome measurements
(Y), a prior distribution for the unknown health status (η), and the model structure that ties
the observations to the unknowns. The major outputs are listed in the following paragraphs and
illustrated in more detail for each of the 3 case studies.
Predictions of individual’s health status. The model produces an estimate of the
posterior distribution of the health status η_it for individual i at every time t. For example, in the
pneumonia etiology case study, the model produces the probability that the lung infection is
caused by each of the candidate pathogens given the observed measurements and the
estimated population pathogen frequencies.
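The pneumonia example can be illustrated with a small Bayes' rule calculation. The sketch below is not the report's model: it assumes a single positive or negative test per candidate pathogen, conditional independence of the tests given the true cause, and made-up values for the population fractions, sensitivities, and false-positive rates. All function and variable names are hypothetical.

```python
def measurement_likelihood(cause, results, sensitivity, false_positive_rate):
    """P(observed test results | true cause), assuming conditionally independent tests."""
    likelihood = 1.0
    for pathogen, positive in results.items():
        # A test for the true cause is positive with its sensitivity;
        # a test for any other pathogen is positive at its false-positive rate.
        if pathogen == cause:
            p_positive = sensitivity[pathogen]
        else:
            p_positive = false_positive_rate[pathogen]
        likelihood *= p_positive if positive else 1.0 - p_positive
    return likelihood


def etiology_posterior(population_fraction, results, sensitivity, false_positive_rate):
    """Bayes' rule: the posterior is proportional to prior times likelihood."""
    unnormalized = {
        cause: fraction
        * measurement_likelihood(cause, results, sensitivity, false_positive_rate)
        for cause, fraction in population_fraction.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed results have zero probability under every cause")
    return {cause: weight / total for cause, weight in unnormalized.items()}


# Hypothetical inputs, for illustration only.
population_fraction = {"A": 0.5, "B": 0.3, "C": 0.2}   # prior etiologic fractions
sensitivity = {"A": 0.8, "B": 0.8, "C": 0.8}
false_positive_rate = {"A": 0.2, "B": 0.2, "C": 0.2}
results = {"A": True, "B": False, "C": False}          # one child's test results

posterior = etiology_posterior(
    population_fraction, results, sensitivity, false_positive_rate
)
```

With these assumed inputs, a positive test for pathogen A raises its probability above the population fraction, while the remaining probability is split between B and C in proportion to their prior fractions and likelihoods.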
Predictions of individual’s health trajectory. The model produces an estimate of
the posterior distribution of the trend in health status for each value of the predictor variables
that can include the treatment. For instance, in the major mental disorders example, the model
calculates the risk for a particular patient’s depression to worsen (negative slope) based on his
history of Patient Health Questionnaire-9 (PHQ-9) scores and covariates.
Estimates of treatment effects. The model produces an estimate of the marginal
distribution of the regression coefficients; each coefficient measures how the outcome, health
status, is associated with its predictor variable (R,X). For example, if health status is a binary
latent variable representing presence or absence of a disease and R is an indicator of whether
the patient has received intervention B rather than alternative A, then a model output is the
estimated posterior distribution for the relative risk of disease for a person receiving B as
compared with another person with the same X value receiving A. The assumptions required to
interpret this quantity as a causal effect are well known.15-18
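To make the idea of a posterior distribution for a relative risk concrete, the following sketch swaps in a much simpler setting than the report's regression on a latent health status: it assumes the binary disease outcome is directly observed in 2 groups with the same X value, places independent Beta(1, 1) priors on the 2 risks, and draws from the resulting Beta posteriors by Monte Carlo. The counts, function names, and prior are assumptions made for illustration.

```python
import random


def relative_risk_draws(events_a, n_a, events_b, n_b, n_draws=5000, seed=1):
    """Sorted Monte Carlo draws of risk_B / risk_A under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Conjugate update: Beta(1 + events, 1 + non-events) for each group.
        risk_a = rng.betavariate(1 + events_a, 1 + n_a - events_a)
        risk_b = rng.betavariate(1 + events_b, 1 + n_b - events_b)
        draws.append(risk_b / risk_a)
    draws.sort()
    return draws


def quantile(sorted_draws, q):
    """Empirical quantile of an already sorted list of draws."""
    index = min(int(q * len(sorted_draws)), len(sorted_draws) - 1)
    return sorted_draws[index]


# Hypothetical counts: 30 of 100 with disease under A, 15 of 100 under B.
draws = relative_risk_draws(events_a=30, n_a=100, events_b=15, n_b=100)
median = quantile(draws, 0.5)
lower, upper = quantile(draws, 0.025), quantile(draws, 0.975)
```

The median and the 2.5% and 97.5% quantiles summarize the posterior as a point estimate and a 95% credible interval. Whether the ratio can be read causally still depends on the assumptions cited above.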
Measuring heterogeneity of treatment effects and predictions of individual
treatment effects. This modeling approach naturally accommodates heterogeneity of
intervention effects through both fixed and random effects. As with any regression, the user
can include interactions between exogenous covariates (X), for example, genetic markers, and
the intervention indicator variables (R). Here, the regression coefficients for the interaction
terms estimate the differences in intervention effects across the levels of the interacting
variables. In addition, this hierarchical model includes a distribution of intervention effects
across the population. With substantial amounts of information for each individual or with prior
knowledge about the variance of the intervention effect coefficients across the population, the
model produces the posterior distribution for an individual’s treatment effect under each
intervention.
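How population-level information sharpens an individual's treatment effect can be shown with the simplest hierarchical case. The sketch assumes a normal population distribution of effects with known mean and variance, and a normally distributed individual estimate with known sampling variance. These assumptions and all names are illustrative and are not taken from the report.

```python
def individual_effect_posterior(estimate, estimate_variance,
                                population_mean, population_variance):
    """Normal-normal update: precision-weighted average of individual and population."""
    precision_data = 1.0 / estimate_variance
    precision_prior = 1.0 / population_variance
    posterior_variance = 1.0 / (precision_data + precision_prior)
    posterior_mean = posterior_variance * (
        precision_data * estimate + precision_prior * population_mean
    )
    return posterior_mean, posterior_variance


# Hypothetical values: a noisy individual estimate of -5.0 (variance 4.0)
# and a population of effects centered at -2.0 (variance 1.0).
mean, variance = individual_effect_posterior(
    estimate=-5.0, estimate_variance=4.0,
    population_mean=-2.0, population_variance=1.0,
)
```

Because the individual estimate is noisy relative to the spread of effects in the population, the posterior mean is pulled most of the way toward the population mean. With more data on the individual, the weight shifts toward the individual's own estimate.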
Model Refinement
Sensitivity analyses. For a statistical model to be relied upon by clinician scientists
and practitioners, these users must develop an understanding of its main ideas. Trust was built
by making the practitioners full partners in the design of the models. In addition, we “kicked the tires”
(ie, repeatedly tested the sensitivity of results to varying assumptions and data). Our
design of software has attempted to make this easier for statistical users.
Testing with stakeholders. In each case study, we met approximately weekly with
our clinical colleagues and their staff. Part of each meeting was dedicated to sensitivity analyses
to improve the model performance and clinical utility.
Model Evaluation
Statistical evaluation. Each model was evaluated in 2 ways. First, we used 10-fold
cross-validation to estimate the accuracy and precision of the predictors relative to
competitors. The second method was to simulate data sets from a known distribution like the
one estimated from the case study data and to calculate the statistical performance (ie, bias,
variance, mean squared error) of the model.
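The simulation-based evaluation can be sketched generically: simulate many data sets from a known distribution, apply the estimator to each, and summarize the estimates against the known truth. The code below assumes a toy setting (estimating a Gaussian mean with the sample mean) in place of the report's models. The function names and settings are hypothetical.

```python
import random
import statistics


def simulate_performance(estimator, simulate_data, truth, n_reps=2000, seed=0):
    """Bias, variance, and mean squared error of an estimator over simulated data sets."""
    rng = random.Random(seed)
    estimates = [estimator(simulate_data(rng)) for _ in range(n_reps)]
    bias = statistics.fmean(estimates) - truth
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    return {"bias": bias, "variance": variance, "mse": mse}


def draw_sample(rng, n=25, true_mean=10.0, sd=3.0):
    """One simulated data set from the known distribution (toy example)."""
    return [rng.gauss(true_mean, sd) for _ in range(n)]


performance = simulate_performance(statistics.fmean, draw_sample, truth=10.0)
```

The 3 summaries are linked by the identity mean squared error = variance + bias², which offers a quick check that a simulation harness is computing what it claims.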
Clinical evaluation. The refinement process we have described included a qualitative
user evaluation of whether a tool was ready to test clinically. It was beyond the scope of this
project to complete this clinical evaluation that, in the future, will include 2 phases. In phase 1,
we will use the tool with a representative sample of clinicians and patients, then administer a
questionnaire to measure their opinions about the value added or subtracted by its use. In
phase 2, we will randomly assign clinicians and patients to use the tool or not and measure
clinical end points. For example, in the prostate cancer trial, we would hypothesize that our
decision-support tool would reduce the number of prostatectomies that remove indolent
tumors without increasing the rate of metastases.
Case Studies
The general model was tailored to address clinical research questions within the 3
collaborations: childhood pneumonia, prostate cancer, and major mental disorders, listed in
order of their maturity (most to least) in Table 1. In the following sections, we discuss each in
more detail, illustrating the scientific problem, how the general model applies, and what clinical
outcomes are anticipated.
Childhood Pneumonia Case Study: Collaboration With Pneumonia Etiology Research for Child Health Study
The Pneumonia Etiology Research for Child Health (PERCH) study, initiated in 2009, is a
large, multinational, case-control study of severe pneumonia in hospitalized children aged <5
years. Seven PERCH sites from South Asia and sub-Saharan Africa have been selected for the
study because they represent areas where most of the severe pneumonia cases in children
occurred in 2015 and where key interventions are already in place.19 Kate O’Brien, PERCH
principal investigator (PI), was a funded stakeholder on this project. To improve on previous
laboratory techniques that have remained largely unchanged for more than a century, her team
is applying modern diagnostic tools and standardized methods in the hope of contributing to
new, precise information about the cause of each pneumonia case and, ultimately, to guide the
development of new vaccines and treatments.
Scientific background and study aims. Pneumonia is the leading cause of global
childhood deaths, accounting for almost 1 in 5 childhood deaths in 2010.20,21 Pneumonia is a
syndrome associated with infection of the lung tissue, which can be caused by microorganisms
of >30 different species, including bacteria, viruses, mycobacteria, and fungi, among which only
a few are likely to have infected each patient by the time of hospitalization.22 Knowing which
pathogen has caused a pneumonia case is crucial for choosing effective treatment. For
example, antibiotics are ineffective for treating viral infections. The strategy for direct
pneumonia treatment and prevention efforts is also complicated by various epidemiologic and
microbiologic factors.19
In the PERCH study, approximately 5000 cases and 5000 control participants were
enrolled and specimens from both groups were tested by a comprehensive array of laboratory
measurements with differing precisions.23 These specimens were collected from the lungs and
peripheral body fluids, including the blood, nasopharyngeal (NP) cavity, pleural fluids, and
induced sputum. Direct sampling from the lungs (lung aspirates) of patients serves as a “gold
standard” measurement of the pathogens in the lung; that is, the test has nearly perfect
specificity and sensitivity. Culturing bacteria from blood samples gives a “silver standard”
measurement assumed to be perfectly specific but imperfectly sensitive. Obtaining lung
aspirate samples is painful for the patient and uncommon in resource-limited settings, so only
some case patients in the PERCH study had these collected in response to clinical needs,
whereas all case patients had blood samples collected. Finally, polymerase chain reaction (PCR)
evaluation of bacteria and viruses from NP samples is a “bronze standard” because it has
imperfect sensitivity and specificity. NP samples were taken from all case and control
participants and tested by PCR.
Our research addressed 2 biomedical questions:
1. What is the frequency with which each pathogen on a prespecified list causes clinical pneumonia in the population of infected children?
2. What is the probability that a child with clinical pneumonia has ≥1 particular pathogens infecting the lung given the child’s specimen measurements and other characteristics like age and disease severity?
We developed original methods called partial latent class models (PLCMs) and nested
versions (nPLCMs) that can estimate the etiology of any disease from multiple types of
measurements, regress the etiology distribution on covariates, and produce a patient-specific
probability distribution for each potential cause.
Prostate Cancer Case Study: Collaboration With Brady Institute of Urology
Through our collaboration with stakeholder Dr H. Ballentine Carter, the director of Adult
Urology at the Johns Hopkins School of Medicine and PI of the Active Surveillance Program
within the Department of Urology, we have access to longitudinal data on 1300 men who
elected to follow active surveillance upon receiving their initial prostate cancer diagnosis.24,25
Scientific background and aims. Prostate cancer is the most commonly diagnosed
nonskin cancer in men in the United States and has a lifetime risk of diagnosis of 15% and
lifetime risk of death of 2.7%.26 Upon diagnosis, early curative treatment with surgery,
radiation, or androgen deprivation therapy is common.27 In particular, nearly half of men with
biopsy-detected localized prostate cancer receive prostatectomy, whereas only 6.8% choose
surveillance.28 Curative interventions can be physically, emotionally, and financially taxing for
patients. In particular, 1-month mortality after surgery is as high as 0.5%, and at least 20% to
30% of men experience urinary incontinence and/or erectile dysfunction after surgery or
radiotherapy.29,30
There is also evidence that treatment is not always proportionate to risk; patterns of
both overtreatment of low-risk disease and undertreatment of high-risk disease have been
identified.28 To this point, the risks and benefits of treatment vary for patients depending on
the severity of their cancer. Specifically, men whose cancer would never become symptomatic
have no potential to benefit from treatment.
Despite the risks associated with overtreatment, patients and doctors may often choose
early treatment because of uncertainty in the initial diagnosis and, more specifically, the
inability of existing biopsy techniques to distinguish with certainty between cancers that will
remain indolent and those that are, or will become, life-threatening. Prostate biopsy specimens
are only informative about the biopsied tissue; features of nonbiopsied tissues, such as regional
lymph nodes, remain unobserved. As a result, doctors and patients must make treatment
decisions in the face of this uncertainty.
Active surveillance with curative intent offers an alternative to early treatment for
individuals with lower-risk disease detected.25,31-35 Though active surveillance regimens vary,
the approach generally entails regular biopsies (eg, annually) with curative intervention
recommended on disease reclassification. Although a primary concern for active surveillance is
the potential for delaying life-saving treatment, a low risk of prostate cancer–specific mortality
has been observed in several active surveillance studies. Correctly identifying low-risk patients,
who would therefore benefit from active surveillance, could reduce overtreatment and, with it,
the risk of complications and adverse effects from treatment, as well as the financial burden on
patients.
In this context, men with a recent diagnosis of lower-risk prostate cancer want to know
whether they have a lethal cancer and, for each of their intervention options, what their
expected quality and length of life are likely to be. The treatment options are continued active
surveillance, radiation treatment, or prostatectomy. Men who choose active surveillance
also want to learn whether the frequency of (painful) biopsies could be safely reduced.
Methodologically, we developed and applied hierarchical models that incorporate
measurement error in cancer-state determinations on the basis of biopsied tissue, clinical
measurements possibly not missing at random, and informative partial observation of the true
state.
Major Mental Disorder Case Study
In this case study, access to data from the major clinical partner, the National Network
of Depression Centers (NNDC), was delayed nearly 2 years as this new organization became
established. Therefore, we developed our bayesian hierarchical models on the basis of directly
analogous schizophrenia data obtained from a Janssen Pharmaceutical clinical trial that we had
previously used in model development.36 Once JHM depression data became available, we
applied our methods to longitudinal measurements of depression, mania, and anxiety
symptoms. Dr Peter Zandi, a member of the NNDC data acquisition and analysis team, was a
funded stakeholder on this proposal.
Scientific background. Depression is a complex disease with heterogeneous etiology,
phenotypes, and treatments.37 Depression is also associated with significant psychiatric (eg,
anxiety, mania, panic) and general comorbidities, including cardiovascular disease, stroke, and
dementia. The heterogeneity of response to treatment makes the choices of first-line and later
therapies more difficult.38
Scientific aims. To support NNDC in their mission, a first key step was to build a
statistical understanding about sources of variation in the NNDC-selected, patient-reported
measure of depression, the PHQ-9, a general instrument for screening, diagnosing, monitoring,
and measuring severity of depression.39 To this end, we built a multivariate longitudinal data
model for the depression, anxiety, and mania data to estimate a patient’s level and trajectory
of symptoms and to predict them into the future, given the covariate profile.
Methodologically, we developed methods to jointly predict the trajectory of a patient’s
mental health status as measured by multiple outcomes and the effect of this trajectory on the
risk of key clinical events. Our methods accommodate substantial missing data and irregular
observation times.
OSLER inHealth
The overarching goal of OSLER inHealth (https://oslerinhealth.org/) is to provide an R-
based environment comprising software tools to support the visualization and analysis of
health data to better inform clinical decisions. For many health decisions, the intelligent
acquisition and use of data improves the chance of a successful outcome. The relevant
information is longitudinal and increasingly complex, now including digitalized images, DNA
sequences, novel biomarkers, and multivariate time series from wearable devices, in addition
to more traditional clinical indicators of phenotype. EHRs have made it possible to acquire and
manage health information more effectively. They also enable Boolean-style (ie, “if, then, else”)
decisions. For example, if a newly recorded laboratory value is above a particular level, an EHR
can automatically signal a clinician to inquire further, perhaps by scheduling a follow-up visit.
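As a minimal sketch of the Boolean-style rule described above, the following Python fragment flags a laboratory value that exceeds a threshold. The record type, the function name, the test name, and the limit of 4.0 are hypothetical and chosen only for illustration; they do not describe any particular EHR system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabResult:
    """A newly recorded laboratory value (hypothetical record layout)."""
    patient_id: str
    test_name: str
    value: float


def flag_for_follow_up(result: LabResult, upper_limit: float) -> Optional[str]:
    """Return an alert message if the value exceeds the limit, else None."""
    if result.value > upper_limit:
        return (
            f"Patient {result.patient_id}: {result.test_name} = {result.value} "
            f"exceeds {upper_limit}; consider scheduling a follow-up visit."
        )
    return None


# Illustrative values only; the threshold is an assumption, not clinical guidance.
alert = flag_for_follow_up(LabResult("A001", "PSA (ng/mL)", 6.2), upper_limit=4.0)
print(alert)
```

A rule of this kind uses a single measurement at a single time, which is the limitation that motivates the model-based approach discussed next.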
But in today’s information-rich environment, there is a heightened need to define,
measure, and track health status; integrate traditional with more complex health measures;
and develop and use appropriate tools for analysis. For an EHR to maximally benefit patients, it
must be a component in a system that integrates the relevant evidence to build, test, and
continuously refine mechanistic or empirical (statistical) models that evaluate and
communicate the evidence from the available data to the point of care where health decisions
are made.
OSLER inHealth, like Bioconductor (https://www.bioconductor.org/), must operate at
the interface of statistical and biomedical science. We intend for it to be used by professional
data scientists, by their quantitatively oriented biomedical colleagues, and by students from
both groups. However, unlike Bioconductor, the main consumers of the OSLER inHealth output
are nontechnical persons making health decisions. Hence, it must also support effective
communication of the questions, data, and findings to health experts and their clients/patients.
RESULTS
Aim 1
To develop a bayesian hierarchical model for longitudinal data that predicts the health
status, trajectory, and intervention effects for each member of a clinical population.
We have used a hierarchical statistical model with 2 levels—time within person and
persons within a population—to represent the key factors relevant to making medical
decisions.10 Details on the model formulation are provided by Ogburn and Zeger.40 Details of its
implementation are given in the papers specific to each of the case studies.41-46
The model for an individual over time is pictured in Figure 1. The health status trajectory
is represented by the temporal sequence of latent variables, here called η. For example, η can
represent the pathogenic species infecting a child’s lung or the cancer state of the prostate over
time. Specification of this part of our model has 3 major components.
Health Status
In many medical applications, the true health status cannot be directly or precisely
measured, but it can be inferred from measurements that we denote Y in Figure 1. For
example, the child’s lung cannot be directly sampled, so blood and the NP cavity are sampled
instead. The conditional distribution [Y|η,ϕ] of the observations, given the true health status,
is indexed by unknown parameters ϕ that represent errors in measurements.
Mechanistic Effects of Covariates and Interventions on Health Status
The effect of covariates (X, exogenous; R, endogenous) on health status η is represented by
the conditional distribution [η|X, R; β, δ]. The β parameters represent the intervention effects;
the unobserved δ parameters are latent indicators of possible classes of disease states or
trajectories or responses to treatment. The interaction effects of either observed (X) or latent
subgroup indicators (δ) with the treatment indicators (R) cause heterogeneity in treatment
effects across the population.
Treatment Decisions With Feedback
To estimate the efficacy or safety of a treatment, one must understand the intervention
assignment process [R|Y,X], especially its dependence on prior health measures.
The full model is completed by embedding the multiple measurements for an individual
(Figure 1) within a reference population (ie, to model the variation in the individual-specific
model terms across the population). This embedding is shown in Figure 2, where parameters
and latent variables for an individual can depend on covariates X in the distribution
[β, θ, ϕ, η|X] as discussed by Ogburn and Zeger.40
Figure 1. Graphical Representation of Person-Time Levels of Bayesian Hierarchical Model for Individualizing Health Showing State Trajectory (η)a for a Single Individual (i)
aHealth status (η) is affected by exogenous (X) and endogenous (R) covariates through person-specific regression
coefficients β_i and expressed in observed outcomes Y_it via a model with parameters ϕ_i. Trajectories can usefully
be partitioned into subgroups represented by δ_i.
Figure 2. Graphical Representation of a Population (n = 4) Showing the Population Level in Which the Individual Specific Parameters Are Assumed to be Independent Realizations From
a Distribution (β, θ, ϕ, η | X)a
aIn this model, the treatment effect or measurement or trajectory heterogeneity can vary across people in a
manner that depends on covariates X. This population-level variation also implies that the best estimates of the
health status for individuals rely not only on their data but also on data for otherwise similar individuals in their
reference population.
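The borrowing of information across similar individuals can be illustrated with the simplest hierarchical model, a normal-normal model with known variances. This is a sketch under stated assumptions, not the model used in the case studies: the function name and all numeric values are hypothetical, and the within-person variance `sigma2` and between-person variance `tau2` are treated as known.

```python
def shrunken_estimate(ybar, n, sigma2, mu, tau2):
    """Posterior mean and variance of one person's true level.

    Assumes y_ij ~ Normal(theta_i, sigma2) within person and
    theta_i ~ Normal(mu, tau2) across the reference population.
    """
    precision_data = n / sigma2      # information from the person's own data
    precision_pop = 1.0 / tau2       # information from the reference population
    weight = precision_data / (precision_data + precision_pop)
    post_mean = weight * ybar + (1.0 - weight) * mu
    post_var = 1.0 / (precision_data + precision_pop)
    return post_mean, post_var, weight


# Hypothetical numbers: a person with 2 measurements vs the same person with 20.
few = shrunken_estimate(ybar=20.0, n=2, sigma2=16.0, mu=10.0, tau2=4.0)
many = shrunken_estimate(ybar=20.0, n=20, sigma2=16.0, mu=10.0, tau2=4.0)
print(few, many)
```

With few measurements the estimate is pulled toward the population mean; as the person's own data accumulate, the weight on those data approaches 1.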
Complex Latent States
For the 3 case studies, the health status variable η is relatively low dimensional: an
indicator for 1 of 30 lung pathogens; an indolent or aggressive prostate tumor; or levels of
symptoms for depression, anxiety, and mania. However, there are applications, for example, in
image analysis, where the dimension of η is large enough to require specialized approaches.
This project investigated an example in which we observed the presence or absence of
hundreds of proteins indicative of an autoantibody “signature” in a patient with autoimmune
disease that we expect might be predictive of the disease trajectory. Wu et al44 introduce the
problem and provide methods for preprocessing the image data generated by a gel
electrophoresis assay. In another publication, Wu et al45 reported on their development of a
class of restricted latent class models (RLCMs) that favors sparse distributions for high-
dimensional latent classes.
Figure 3, taken from Wu et al,45 shows our proposed solution for the autoantibody
problem. The original data, after preprocessing, are shown in the matrix on the far left of the
figure. Each person’s signature forms a row. The color is blue if the protein is present and
yellow if not. The proposed RLCM solution is shown in the 2 matrices on the right. On the far
right, the matrix has the same number of columns (ie, proteins; here, n = 50 proteins) as the
original data (left) but only 7 rows, 1 for each estimated “machine” defined as a combination of
proteins that could be targeted by a single immune autoantibody. The middle matrix indicates
whether a person’s immune system responded to each of the machines using the same color
convention. Here, however, the degree of blue indicates the posterior probability that the
individual’s immune system targeted that machine. This method is designed to find a relatively
small number of machines, each of which has a sparse (ie, low number) set of proteins. Details
about how this variable reduction method can be used in practice are provided by Wu et al.45
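The decomposition described above can be sketched as a Boolean matrix product: a protein band is expected in a person's signature if at least 1 of the machines that person carries contains that protein. The fragment below is a noise-free idealization with hypothetical names and toy matrices. The RLCM of Wu et al additionally models measurement error and infers both matrices from data, which is not attempted here.

```python
def expected_signature(machine_membership, machine_profiles):
    """Boolean product of a person-by-machine and a machine-by-protein matrix.

    machine_membership[i][m] is 1 if person i carries machine m.
    machine_profiles[m][j] is 1 if machine m contains protein j.
    Returns a person-by-protein 0/1 matrix.
    """
    n_proteins = len(machine_profiles[0])
    signatures = []
    for person in machine_membership:
        row = [
            int(any(has_m and profile[j]
                    for has_m, profile in zip(person, machine_profiles)))
            for j in range(n_proteins)
        ]
        signatures.append(row)
    return signatures


# Toy example: 3 sparse machines over 6 proteins, 4 people.
profiles = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
membership = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
for row in expected_signature(membership, profiles):
    print(row)
```

Sparsity enters in two places in this picture: each machine row has few 1s, and only a few machines are needed to reproduce all signatures.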
Figure 3. Decomposition Results for Gel Electrophoresis Assay Data
Note: Left panel: Aligned data matrix for band presence or absence; row for 76 serum lanes, reordered into
optimal estimated clusters separated by gray horizontal lines “—–“; columns for L = 50 protein landmarks. A blue
vertical line “|” indicates a protein’s presence at that molecular weight. Middle panel: lane-machine matrix for the
probability of a lane (ie, serum sample) having a particular machine. The blue cells correspond to a high probability
of having a machine in that column. Smaller probabilities are shown in lighter blue. Right panel: Estimated machine
profiles. Here, 7 estimated machines are shown, each with component proteins shown by a blue bar “|”.
From Wu et al. bioRxiv. Posted September 21, 2020. The copyright holder for this preprint is the author/funder,
who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0
International license. https://www.biorxiv.org/content/10.1101/400192v3
Limitations
Among the many limitations of the hierarchical models discussed, we propose to focus
on 2. First, these models are entirely parametric, meaning that the entire distribution of the
observations must be specified. We tend to choose distributions that simplify computations
without increasing prediction variance or bias. Extensions of these methods to include
nonparametric or more flexible parametric models are of future interest. The second limitation
is that the image analysis example was lower dimensional (76 × 50 in Figure 3) than many
interesting problems. Our computational approaches must be improved to apply our approach
to neuroimage or genomic data.
Aim 2
To iteratively build, test, and refine the model in 3 case studies.
Diagnosing Viral vs Bacterial Childhood Pneumonia
The primary goal for this case study was estimating the rates with which 30 different
pathogens (ie, viruses, bacteria, and fungi) cause children’s pneumonia at 7 sites from Africa
and Asia. These estimates of pneumonia etiology can guide investments by governments and
nongovernmental organizations in prevention and intervention strategies. In addition, the
methods allow clinicians to predict the cause of a particular child’s pneumonia by integrating
measurements from the child’s blood, sputum, and/or NP cavities with current estimates of
population rates.
This project has funded the development, implementation, and application of this
bayesian hierarchical modeling approach and multiple extensions using the PERCH study data
and has enabled PERCH to more precisely achieve its aims. After a brief review of the method’s
products, we focus on the translation of the model into practice within the infectious disease
community. Details are available in the methods paper by Deloria Knoll et al.47 The
methodologic details are provided in 2 papers by Wu et al42,43 and in a recent technical report
also by Wu et al.45
Figure 4 shows the application of our bayesian hierarchical model to the problem of
estimating the etiology of children’s pneumonia. The left panel shows the posterior (and prior)
distributions of the fraction of pneumonia cases caused by viruses. The panels in the center and
on the right show the posterior distributions for 2 children, both with multiple detected
pathogens in the NP cavity (shown by the sequence of 0 or 1 values in black) and with no
pathogens detected in the blood assay (all 0s in the red). The posterior distribution for the
center panel identifies respiratory syncytial virus (RSV) as the cause with high certainty. This is
because RSV is rarely detected in the NP cavities of control-group children. The rightmost
posterior distribution puts the majority of probability on human metapneumovirus (HMPV) A
and B but cannot rule out 3 other causes.
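The reasoning behind these individual predictions can be sketched with Bayes rule under a single-cause assumption. In the fragment below, each pathogen is detected in the NP sample with a true-positive rate if it is the cause and with a false-positive rate (as would be estimated from controls) if it is not. This is a simplified illustration, not the full method: the rates are held fixed rather than estimated, the blood measurements are omitted, and all names and numbers are hypothetical rather than PERCH estimates.

```python
def cause_posterior(prior, tpr, fpr, measurements):
    """Posterior probability of each single cause given binary NP results.

    prior:        pathogen -> population etiologic fraction (sums to 1)
    tpr:          pathogen -> P(detected | pathogen is the cause)
    fpr:          pathogen -> P(detected | pathogen is not the cause)
    measurements: pathogen -> 1 if detected, 0 otherwise
    """
    unnormalized = {}
    for cause, fraction in prior.items():
        likelihood = 1.0
        for pathogen, detected in measurements.items():
            p_detect = tpr[pathogen] if pathogen == cause else fpr[pathogen]
            likelihood *= p_detect if detected else (1.0 - p_detect)
        unnormalized[cause] = fraction * likelihood
    total = sum(unnormalized.values())
    return {cause: value / total for cause, value in unnormalized.items()}


# Hypothetical inputs: RSV is rarely found in controls; the others are common.
prior = {"RSV": 0.3, "Rhino": 0.3, "Spn": 0.4}
tpr = {"RSV": 0.9, "Rhino": 0.9, "Spn": 0.9}
fpr = {"RSV": 0.02, "Rhino": 0.25, "Spn": 0.5}
posterior = cause_posterior(prior, tpr, fpr, {"RSV": 1, "Rhino": 1, "Spn": 1})
print(posterior)
```

Although all 3 pathogens are detected in this toy case, most of the posterior probability falls on RSV, because its detection is hard to explain as background carriage.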
To introduce the bayesian hierarchical approach to the clinical infectious disease
community, Deloria Knoll et al47 first synthesized the prior methods for estimating the etiology
of children’s pneumonia (shown in Figure 5). Only the first and last methods produce estimates
of etiology distributions. The first is not reproducible. The last ignores measurement error and
cannot easily cope with multiple measurements of each pathogen (eg, from multiple sampling
locations).
Deloria Knoll et al47 then introduce the bayesian hierarchical approach developed in this
program with the title PERCH Integrated Analysis (PIA) Method (Figure 6).
Figure 4. Graphical Displays of the Estimated Population and Individual Etiologies Using Data From a Site of the PERCH Study
Abbreviations: PCR, polymerase chain reaction; PERCH, Pneumonia Etiology Research for Child Health. Note: The left panel illustrates the prior (dashed line) and posterior (solid line) distributions of the viral etiologies that can be easily derived in our framework. The middle and right panels show the individual prediction histograms for the 2 measurement patterns in the case data.
Figure 5. Alternative Analytic Approaches Used for Determining Pneumonia Etiology
Abbreviations: BCX+, positive blood culture; Cor, coronavirus; Hinf, Haemophilus influenzae; HMPV, human metapneumovirus A/B; ID, identifier; NP/OP, nasopharyngeal/oropharyngeal; PCR, polymerase chain reaction; Rhino, rhinovirus; RSV, respiratory syncytial virus; S. aur, Staphylococcus aureus; Spn, Streptococcus pneumoniae. From Deloria Knoll et al. Clin Infect Dis 64(Suppl 3),S213-S227. Reprinted with permission from Clinical Infectious Diseases (Copyright ©2017). Oxford University Press. All Rights Reserved.
Figure 6. Estimating the Etiologic Fraction From a Study With 2 Types of Measurements, 1 With Control Data, and Accounting for Imperfect Sensitivity of Both Measurements: The PIA Methoda
Abbreviations: BCx, blood culture; NP, nasopharyngeal; OP, oropharyngeal; PCR, polymerase chain reaction; PERCH, Pneumonia Etiology Research for Child Health; PIA, PERCH integrated analysis. aThe PIA method can combine multiple specimens, such as whole-blood PCR and lung aspirate culture PCR, and adjust each measurement for pathogen-specific sensitivity to estimate the etiologic fraction. From Deloria Knoll et al. Clin Infect Dis 64(Suppl 3),S213-S227. Reprinted with permission from Clinical Infectious Diseases (Copyright ©2017). Oxford University Press. All Rights Reserved.
Having introduced the method in epidemiologic terms and having conducted simulation
studies to compare its performance to the attributable fraction method, the PERCH Study team
used it as the main method in their PERCH results paper.48 The PERCH investigators showed
that most hospital admissions for childhood pneumonia were caused by a small set of
pathogens. An example of how the results are communicated is shown in Figure 7 (and the
PERCH team’s Figure 7 in the supplemental materials accompanying their article48), which
shows the estimated etiologic fraction for each pathogen with 95% posterior probability
interval, stratified by whether the cases were chest X-ray positive. The lower box in Figure 7
shows the estimated fractions of viruses and bacteria.
Figure 7. All Site Etiology Results Among HIV-Uninfected Patients by CXR Findings: CXR+ Patients vs All Patients
Abbreviations: Adeno, adenovirus; B. pert., Bordetella pertussis; C. pneu, Chlamydia pneumoniae; Cand, Candida; CMV, cytomegalovirus; Cor, coronavirus; CXR+, chest radiograph positive; Entrb, Enterobacter; HCoV, human coronavirus; Hinf, Haemophilus influenzae; HMPV, human metapneumovirus A/B; NP/OP, nasopharyngeal/oropharyngeal; Legio, Legionella; M. pneu, Mycoplasma pneumoniae; Mtb, Mycobacterium
tuberculosis; NFGNR, nonfermenting gram-negative rods; Nmen, Neisseria meningitidis; NoS, not otherwise specified; PERCH, Pneumonia Etiology Research for Child Health; P. jirov, Pneumocystis jirovecii; PCR, polymerase chain reaction; Rhino, rhinovirus; RSV, respiratory syncytial virus; Salm, Salmonella; S. aur, Staphylococcus aureus; S. pneu, Streptococcus pneumoniae. From PERCH Study Group. Lancet. 2019;394, 757-779. Reprinted with permission from Lancet (©2019). All Rights Reserved.
Evaluating Efficacy and Safety of Active Surveillance as an Alternative to Surgery for Prostate Cancer
This case study was developed to support the critical clinical decision, made by a
clinician and his patient, to join, continue, or leave active surveillance in favor of prostatectomy
or radiation treatment. The team developed specific models, special cases of the general model
shown in Figures 1 and 2, to produce the probabilities that (1) a man’s prostate cancer is
aggressive (Gleason score >6) rather than indolent; and (2) a specimen collected via biopsy
conducted on that day would show evidence of an aggressive tumor. This section focuses on
the first question; results relevant to the second can be found in the report by Mamawala et
al.49
Model. The results presented here are summaries, with considerable direct citation in
quotation marks, from 1 of the 2 main articles that present the results in detail (Coley et al).41
The general models summarized in Figures 1 and 2 were tailored to prostate cancer
active surveillance. In this case study, we assume there is a true underlying binary state of the
tumor. Although the model allows that state to change (Figure 8, panel a), there is little
evidence in the data to learn about whether the state has changed. Therefore, as a first
approximation, we assume that the true tumor type is fixed (Figure 8, panel b). Given the latent
variable, there are 3 sources of direct evidence about the underlying unknown state that we
seek to predict: (1) the time series of prostate-specific antigen (PSA) scores that we represent
using a random-effects model with random intercept and slope, and a covariance matrix that is
the same regardless of true state; (2) the indicator of whether a biopsy was performed at a
visit; and (3) the pathology results of the biopsy specimen. In addition, we allow the decision to
conduct a prostatectomy to also depend on the underlying state as a sensitivity analysis of
whether the predictions change substantially over a range of assumptions about the degree of
representativeness of the patients undergoing prostatectomy for the entire cohort.41
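To illustrate how imperfect biopsy results bear on a fixed binary latent state, the following fragment updates the probability that a tumor is aggressive from a sequence of biopsy findings. It is a stripped-down sketch that uses only the biopsy results. The model in Coley et al also uses the PSA trajectory, the timing of biopsies, and the surgery decision, none of which appear here. The function name, the prior, and the sensitivity and specificity values are hypothetical, and biopsies are assumed conditionally independent given the true state.

```python
def posterior_aggressive(prior, sensitivity, specificity, biopsy_results):
    """P(eta = 1 | biopsy results) for a fixed binary tumor state eta.

    biopsy_results is a sequence of booleans: True if the biopsy showed
    evidence of aggressive disease, False otherwise.
    """
    like_aggressive = 1.0
    like_indolent = 1.0
    for positive in biopsy_results:
        like_aggressive *= sensitivity if positive else (1.0 - sensitivity)
        like_indolent *= (1.0 - specificity) if positive else specificity
    numerator = prior * like_aggressive
    return numerator / (numerator + (1.0 - prior) * like_indolent)


# Hypothetical values: three consecutive negative biopsies.
p_after_negatives = posterior_aggressive(0.25, 0.6, 0.95, [False, False, False])
# A single positive biopsy under the same assumptions.
p_after_positive = posterior_aggressive(0.25, 0.6, 0.95, [True])
print(p_after_negatives, p_after_positive)
```

Because sensitivity is below 1, a run of negative biopsies lowers the probability of aggressive disease without driving it to 0, which is why the remaining uncertainty matters for the surveillance decision.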
Figure 8. Directed Acyclic Graphs Describing the Relationships Between Latent Class and Clinical Outcomesa
Abbreviation: PSA, prostate-specific antigen. aLatent class is indicated by blue circle or oval; clinical outcomes are indicated by a green rectangle. From Coley et al. Biometrics 73(2), 625-634. Reprinted with permission from Biometrics (Copyright ©2017). International Biometric Society. All Rights Reserved.
Model parameters and their priors are presented in Table 2. Posterior sampling was
performed with RJags (Plummer et al).50 Analysis code, sampler settings, and diagnostic plots
are available from the supplementary material for Coley et al.41
Table 2. Model Summary With Priors Used for Application to Johns Hopkins Active Surveillance Dataa
Abbreviations: AS, active surveillance; PSA, prostate-specific antigen. aD_X is the length of vector X, and I_{D_X} is the identity matrix with dimension D_X × D_X. D_Z, D_U, D_V, and D_W and the associated identity matrices are similarly defined for covariate vectors Z, U, V, and W. From Coley et al. Biometrics 73(2), 625-634. Reprinted with permission from Biometrics (Copyright ©2017). International Biometric Society. All Rights Reserved.
The Johns Hopkins Active Surveillance Cohort comprises a “total of 874 patients who
met study criteria and had at least two PSA measurements and at least one post-diagnosis
biopsy as of October 1, 2014... Patient outcomes are given in Figure 9. Grade reclassification
was observed in 160 patients (18% of the analysis cohort). Notably, over a quarter of patients
with grade reclassification who underwent prostatectomy were downgraded after surgery
(17/65) while nearly a third of patients who underwent prostatectomy in the absence of grade
reclassification were upgraded (30/96).”41
Figure 9. CONSORT Diagram for Johns Hopkins Active Surveillance Prospective Cohort Patients Included in This Analysisa
Abbreviation: GS, Gleason score. aPostsurgery full prostate GS observations are given in circles. Six patients who underwent prostatectomy did not
have true GS observations available.
From Coley et al. Biometrics 73(2), 625-634. Reprinted with permission from Biometrics (Copyright ©2017).
International Biometric Society. All Rights Reserved.
Prediction assessment. “Predictive accuracy was assessed using the whole prostate
surgical specimen as the gold standard. To avoid bias in estimating prediction error, we never
used the same data to build the model that is used to evaluate it. So called ‘out-of-sample’
predictions of η were obtained for each patient by removing his data from the analysis and re-running
the posterior sampler. Out-of-sample predictions of η_i were then compared to known
values with receiver operating characteristic (ROC) curves and calibration plots. For the former,
the area under the curve (AUC) and associated 95% bootstrapped intervals were calculated. For
the latter, a plot comparing posterior predictions to observed rates of class membership was
constructed by performing logistic regression of the observed true state on a natural spline
representation of out-of-sample posterior predictions (degrees of freedom = 2).”41
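The AUC and its bootstrapped interval can be sketched as follows. The fragment assumes that out-of-sample posterior probabilities have already been computed for each patient with a known true state. The leave-one-out refitting of the posterior sampler is not reproduced. The function names, the toy scores, and the choice of a percentile bootstrap are assumptions for illustration, and the source does not state which bootstrap variant was used.

```python
import random


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(scores, labels, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the AUC, resampling patients."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((1.0 - level) / 2.0 * n_boot)]
    upper = stats[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lower, upper


# Toy data: hypothetical out-of-sample probabilities and true states (1 = aggressive).
scores = [0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.80, 0.40]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
point = auc(scores, labels)
lower, upper = bootstrap_auc_interval(scores, labels)
print(point, lower, upper)
```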
Prostate cancer findings. “We estimated that the prevalence of aggressive tumors in
the cohort was 0.20 (95% posterior interval: 0.14, 0.28) when we allow the decision to have
surgery to be informative, and 0.30 (0.23, 0.38) without informative sampling. Ninety-five
percent of AS [active surveillance] patients who neither reclassified nor underwent surgery
have posterior predictions that are lower than 50%; a majority have predictions below 20%.”41
The posterior predictions with vs without informative biopsy timing or decision to have a
prostatectomy are similar for most patients.
“Posterior predictions of η from our full model gave out-of-sample AUC estimates
among patients with observed true cancer state of 0.75 (95% bootstrapped interval: 0.67,
0.83).”41 See the graphs in Figure 10.
Figure 10. Predictive Accuracy of Out-of-Sample Predictions of η Among Patients With η Observed
Abbreviations: AUC, area under the curve; IOP, informative observation process; ROC, receiver operating
characteristic; η, true health status.
(a) The specificity of predictions from each model is highlighted at the sensitivity of a binary classifier defined by
final biopsy result (*).
(b) The dark line shows the empirical rate of observing a true Gleason score of ≥7 (y-axis) given an out-of-sample
posterior probability of true state (x-axis) under our model with informative biopsy and surgery components;
shading gives the 95% point-wise CI. Perfect agreement lies on the line x = y (dotted line). Hashmarks at y = 0 and y
= 1 correspond to observed cancer states (η = 0, 1, respectively) for patients with postsurgery true-state
observations. Hashmarks are located along the x-axis at each patient’s out-of-sample posterior probability of the
true state.
“Posterior predictions of η from the IOP model also appear to accurately estimate a
patient’s risk of having more aggressive cancer. The calibration plot in Figure 6b [Figure 10, panel
b here] shows that, for patients with known values of η, the average observed value of η is
close to the average posterior predicted probability of η=1, indicating that the model
reasonably reproduces the mean of observations. The risks of clinical outcomes (biopsy results)
and choices (occurrence of biopsy and surgery) for all patients appear to be accurately
estimated by the IOP model as well, as demonstrated by calibration plots in the online
supplement.”41
The results of this model have been visualized for clinician and patient use and
successfully implemented within the JHM EHR. Figure 11 shows a screen shot that summarizes
1 man’s risk of having an aggressive tumor and the implications of those risks. Clinical studies
are underway to evaluate the value of such patient information.
Figure 11. Screen Shot From JHM EHR Showing the Model-Estimated Risk That a Patient’s Prostate Tumor Falls Into Each of the Gleason Classes Indicated by the Colors in the Pie Chart on the Lefta
Abbreviations: EHR, electronic health record; JHM, Johns Hopkins Medicine; PSA, prostate-specific antigen. aBy clicking on a wedge of the pie chart, additional information appears on the right, indicating published
information for longer-term outcomes, given their true pathology.
Diagnosis and Evaluation of Therapies for Major Mental Illness
The results presented here are summaries, with considerable direct citation in quotation
marks, from Fojo et al.46
For this case study, we intended to use NNDC data to develop bayesian hierarchical
models to predict the trajectory of each patient before treatment to be compared with the
outcomes under treatment. The NNDC project was chosen to feature its time-varying
multivariate symptoms data and possibly its neuroimaging data. However, NNDC data
availability was substantially delayed and so the research team developed the methods for
similar symptoms of schizophrenia, including scores for 3 distinct scales: positive, negative, and
general symptoms measured over time for approximately 1000 patients. The details about the
data set and the bayesian hierarchical models are presented by Fojo et al.46
Figure 12 displays the model’s prediction of schizophrenia symptoms for a single
patient. “The three panels illustrate the predictions (red circles) for the individual’s future
General, Negative, and Positive PANSS [Positive and Negative Syndrome Scale] subscale scores
as well as predicted cumulative probability of treatment failure (red bars at right) calculated
from prior measurements of the subscales at (a) week 0 when treatment is begun, (b) 0 and 1
weeks, and (c) 0, 1, 2, and 4 weeks. Green squares indicate the observed measurements. The
individual’s trajectory is displayed against other participants in the trial (background blue lines).
The dark gray ribbons indicate the 50% confidence intervals, the lighter gray ribbons indicate
the 95% intervals. The vertical dashed lines indicate the time up to which observations are used
to inform the predictions. A higher PANSS score indicates more severe symptoms. The General
subscale score ranges from 16 to 112, and the Positive and Negative subscale scores range from
7 to 49.”46
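The way earlier measurements sharpen a prediction can be illustrated with a two-time-point simplification: if the current and future scores on 1 subscale are treated as bivariate normal, the future score given the current one is again normal, with a shifted mean and a reduced standard deviation. This is a sketch under that assumption only. The model of Fojo et al is a multivariate longitudinal bayesian model, and every name and number below is hypothetical.

```python
from statistics import NormalDist


def predict_future_score(observed, mean_now, mean_future, sd_now, sd_future, corr):
    """Conditional distribution of a future score given the current score.

    Assumes the (current, future) pair is bivariate normal with the given
    population means, standard deviations, and correlation.
    """
    slope = corr * sd_future / sd_now
    cond_mean = mean_future + slope * (observed - mean_now)
    cond_sd = sd_future * (1.0 - corr ** 2) ** 0.5
    return NormalDist(cond_mean, cond_sd)


def prediction_interval(dist, level):
    """Central prediction interval at the given level (for example, 0.50 or 0.95)."""
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical subscale: population mean falls from 24 to 18 under treatment.
dist = predict_future_score(observed=30.0, mean_now=24.0, mean_future=18.0,
                            sd_now=6.0, sd_future=6.0, corr=0.5)
interval_50 = prediction_interval(dist, 0.50)
interval_95 = prediction_interval(dist, 0.95)
print(dist.mean, interval_50, interval_95)
```

Conditioning on more of a patient's history plays the same role as raising the correlation here: the predicted mean moves toward that patient's own course and the intervals narrow.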
The same model has more recently been applied to the NNDC data from Baltimore for
the 3 scales of depression, anxiety, and mania. That work is still in progress and thus is excluded
from further description here.
Figure 12. Example Predictions of PANSS Symptom Scores
Abbreviation: PANSS, Positive and Negative Syndrome Scale.
Note: The 3 panels illustrate the predictions (red circles) for 1 individual’s future General, Negative, and Positive
PANSS subscale scores as well as predicted cumulative probability of treatment failure (red bars at right) based on
measurements of the subscales at (a) week 0 (initiation of treatment), (b) 0 and 1 weeks, and (c) 0, 1, 2, and 4
weeks. Green squares indicate the observed measurements. The individual’s trajectory is displayed against other
participants in the trial (background blue lines). The dark gray ribbons indicate the 50% prediction intervals, and
the lighter gray ribbons indicate the 95% prediction intervals. The vertical dashed lines indicate the time up to
which observations are used to inform the predictions. A higher PANSS score indicates more severe symptoms. The
General subscale score ranges from 16 to 112, and the Positive and Negative subscale scores range from 7 to 49.
From Fojo et al. J Psychiatr Res. 95, 147-155. Reprinted with permission from the Journal of Psychiatric Research (Copyright ©2017). Elsevier. All Rights Reserved.
Aim 3 To implement the statistical methods in an open-source, easily extensible R package:
OSLER inHealth.
The longer-term goals of OSLER inHealth are as follows:
• Disseminate software updates quickly
• Educate a diverse community of scientists, using detailed tutorials
• Ensure quality via automatic and manual quality controls
• Promote the reproducibility of personalized health care data analysis
OSLER inHealth (https://oslerinhealth.org/) is a repository of current and future R
packages that are relevant for statisticians involved in precision medicine and health care. We
have built a repository infrastructure and begun the process of making high-quality packages
centrally available. The structure includes feedback for developers. OSLER attempts to fill the gap
in repositories whereby a package may not be listed on the Comprehensive R Archive Network
(CRAN) because it may contain crucial data that make it too large to store. For example, the
rnhanesdata package contains the National Health and Nutrition Examination Survey (NHANES)
wearable activity data, organized for the user. The size of this package prohibits acceptance
into CRAN, but OSLER can host it, so users can analyze NHANES data easily
and quickly.
We have an estimated 124 monthly users. We currently have 6 packages; the
developers all have been helped by OSLER developers to improve the packages and have them
pass a set of quality checks.
We plan to publicize OSLER more in the future. In tandem with OSLER, we have been
developing Neuroconductor (https://neuroconductor.org/) with similar goals for medical image
analysis. Though much effort has focused on Neuroconductor, all improvements to
Neuroconductor have been used to improve OSLER. Thus, we have learned lessons with a larger
set of packages and developers and have been able to improve OSLER, even though, because of
its recent completion, the number of OSLER packages is much smaller. For example, we have
developed custom build scripts
(https://github.com/muschellij2/neuroc_travis/blob/master/oslerinhealth_travis.yml)
specifically designed for OSLER. These customizations allow packages to be checked with
additional software to ensure that the packages run properly for users.
Next steps. Now that we have a stable repository for packages, we plan to publicize
this to researchers, encourage more developers to submit packages, and improve current
packages to provide more tutorials for researchers entering these areas.
DISCUSSION
Bayesian Hierarchical Model
With this project, we have developed and demonstrated the utility of bayesian
hierarchical statistical models to better characterize and communicate an individual’s health
status, trajectory, and, to a more limited degree, the likely benefits of interventions. Our model
represents key elements of the process that gives rise to clinical or population health data,
including a dynamic latent health status process that can comprise discrete and/or continuous
variables; heterogeneous (across individuals) effects of treatments and covariates on that
process; nonignorable observation bias and complex outcome variables that reflect the
underlying process; and a treatment assignment process that can depend on past outcomes.
The model has 2 levels that allow the treatment effects, latent health status, and measurement
process parameters to vary among individuals.
Case Studies
We have tailored the bayesian hierarchical model to 3 specific case studies representing
diverse types of questions and data. The children’s pneumonia case is a case-control study at a
single time across 6 countries in Africa and Asia. Its latent health status is a discrete multinomial
variable that can take 1 of 30 different values, indicating which pathogen caused a child’s
infection. The outcome data are complex, comprising presence or absence indicators of each
pathogen in 3 distinct samples from the NP cavity, blood, and, rarely, from the lung. The
prostate cancer and the mental disorder case studies involve multivariate longitudinal outcome
data. In the former, the state variable is a static binary indicator of whether the tumor is
aggressive or indolent. In the mental disorder case, the latent process is multivariate and takes
continuous values. To examine the flexibility of the modeling approach for image data, we
added a project using gel electrophoresis assays to identify patient autoantibody signatures.
Here, the state space is discrete with 2^100 ≈ 1.3 × 10^30 possible outcomes.
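The size of that state space can be checked with exact integer arithmetic. The short sketch below assumes, for illustration, that the signature consists of 100 binary (present or absent) indicators; the enumeration rate is a hypothetical figure used only to convey scale.

```python
# Assumption for illustration: the signature is 100 binary (present/absent) indicators.
n_indicators = 100
n_states = 2 ** n_indicators          # exact integer arithmetic
approx = f"{n_states:.1e}"            # scientific notation with 2 significant digits

# Hypothetical enumeration rate, used only to convey scale.
states_per_second = 1e9
seconds_per_year = 365.25 * 24 * 3600
years_to_enumerate = n_states / states_per_second / seconds_per_year

print(n_states)
print(approx)
print(f"{years_to_enumerate:.1e} years at {states_per_second:.0e} states per second")
```

At any plausible rate, visiting every state is out of reach, which is consistent with the report's reliance on sampling-based posterior computation rather than exhaustive enumeration.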
The bayesian hierarchical model is adaptable to these different problems because of a
few key features. First, it is a likelihood-based approach so that, in larger samples, the
likelihood dominates the prior distribution for many key parameters such as regression
coefficients. Second, the use of priors allows us to address important substantive questions by
restricting the possible solutions through the choice of priors so that poorly identified models
can be used. For example, absent scientific constraints, the pneumonia etiology model cannot
estimate the frequency of lung infections caused by each pathogen, because the likelihood
itself does not separate these parameters from the sensitivities of the measurements. But prior
laboratory and clinical trials data provide a reasonable range for the assay sensitivities that,
once imposed through the prior assumptions, make the model identifiable. Note, in this case,
that the credible intervals account for the prior uncertainty, which does not decrease with
sample size. See Wu et al42 for the statistical details and Deloria Knoll et al47 for references to
prior selections. Third, Markov chain Monte Carlo (MCMC) is used to estimate the posterior
distributions of interest so that missing data and complex outcome measurement are not a
computational burden. The bayesian approach also makes model checking relatively
straightforward by comparing observed characteristics of the joint distribution of the observed
data with predictions of those same quantities based on the model. See, for example, Wu et
al42 and Fojo et al46 for practical examples. Finally, posterior distributions are easily understood
by clinicians and patients with a small amount of training. They are easily visualized to
communicate predictions of health status, trajectories, or likely intervention benefits.
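The identifiability point in this paragraph can be illustrated with a deliberately simplified toy model, which is not the model of Wu et al. Assume each case is truly infected with probability pi, the assay detects a true infection with sensitivity s, and specificity is perfect, so that P(positive) = pi × s. The data inform only the product, and the prior range on s determines how tightly pi can be pinned down. The function name, the uniform priors, the grid approximation (used here in place of MCMC), and all counts are assumptions made for this sketch.

```python
import math

def posterior_interval_for_fraction(positives, n, sens_low, sens_high, grid=200):
    """Approximate 95% posterior interval for pi when P(positive) = pi * sensitivity.

    Illustrative priors: pi ~ Uniform(0, 1), sensitivity ~ Uniform(sens_low, sens_high).
    The posterior is evaluated on a grid and marginalized over sensitivity.
    """
    pis = [(i + 0.5) / grid for i in range(grid)]
    sens = [sens_low + (j + 0.5) * (sens_high - sens_low) / grid for j in range(grid)]
    log_post = []
    for pi in pis:
        row = []
        for s in sens:
            p = pi * s  # only this product enters the binomial likelihood
            row.append(positives * math.log(p) + (n - positives) * math.log(1.0 - p))
        log_post.append(row)
    peak = max(max(row) for row in log_post)  # subtract the peak to avoid underflow
    marginal = [sum(math.exp(v - peak) for v in row) for row in log_post]
    total = sum(marginal)
    cumulative, lower, upper = 0.0, None, None
    for pi, mass in zip(pis, marginal):
        cumulative += mass / total
        if lower is None and cumulative >= 0.025:
            lower = pi
        if upper is None and cumulative >= 0.975:
            upper = pi
    return lower, upper

# Hypothetical data: 30% of cases test positive, at two sample sizes.
small = posterior_interval_for_fraction(60, 200, 0.5, 0.9)      # informative prior on s
large = posterior_interval_for_fraction(6000, 20000, 0.5, 0.9)  # same prior, 100x the data
vague = posterior_interval_for_fraction(60, 200, 0.01, 1.0)     # nearly unconstrained s

for label, (lo, hi) in [("n=200", small), ("n=20000", large), ("vague prior", vague)]:
    print(f"{label}: 95% interval for pi = ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```

Under these assumptions, the interval for pi stays wide even with 100 times more data, because the remaining uncertainty comes from the prior range on the sensitivity rather than from sampling error. Widening that prior range widens the interval further.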
Software
Despite our original plan to write stand-alone, new software, initial experimentation
revealed that dramatic improvements in available R-based software for MCMC (eg, rjags, Stan,
MCMCglmm) made writing new software a poor investment of resources. Hence, all our
software is written as R packages that were made publicly available as soon as completed
through GitHub, as cited in each article. We have also built an R software repository called
OSLER inHealth in which our software and similar software developed by others can be checked
for software standards and kept current as R and supporting software changes, and can be
more widely disseminated because of easy access to the programs and helpful documentation.
OSLER inHealth remains a work in progress. The repository is in place and has a few R
packages. We are now in a position to inform colleagues about its availability and to receive
other contributions. Over the longer term, however, funding for a staff person to support
contributors is needed, as was the case for its progenitors Neuroconductor and Bioconductor.
Fortunately, JHM has built a new Precision Medicine Analytics Platform (PMAP) that it internally
funds, and OSLER will become the R repository for internal and external software packages
within PMAP. This may provide the core support for an OSLER manager.
Stakeholders
This project was successful in supporting collaborations among clinical and statistical
experts to create the case study methods, software, and applications. Each of the case studies
has produced articles that both advance quantitative methods for clinical research and provide
substantive findings as reflected in the bibliography of published or submitted manuscripts.
Patient stakeholders have also played a critical role in the design and evaluation of the prostate
cancer application. Early versions of the model were critiqued by our patient advisory board
and significant changes were made as a result. Similarly, when the tool was functioning, the
board assisted us in designing the patient- and clinician-facing visualizations. Our early
questions were about whether patients wanted to see predictions directly or wanted their
physicians to be intermediaries. They were clear that they wanted full access to the model
results and to thorough documentation supporting these predictions. They also wanted to
make critical decisions in partnership with their physician. A JHM patient with depression and a
patient with prostate cancer also served on the OSLER inHealth oversight committee that met
once a year.
Influence on Population Health and Patient Care
Each case study has generated patient-oriented results that are used by their target
clinical or public health audiences. For example, the prostate cancer software has been
implemented within the JHM EHR as described previously. Clinicians can now access and
consider the model’s risk predictions when they are assisting their patients to decide whether
to continue active surveillance. Future research will determine whether clinicians and patients
derive benefits from using the tool, most critically whether fewer indolent tumors
are resected or irradiated. The main obstacle to more rapid dissemination of this tool is the
financial model for prostate cancer care. Clinicians in many American settings are rewarded for
providing treatment; there is a smaller financial reward for choosing not to treat. One
implication is that there is no health system funding to scale the prostate tool for regional or
national use. The software has been designed to scale. JHM has built a cloud-based system
whereby the tool could be used by clinicians in their private offices. But the start-up and
curation costs combined with the intervention incentives remain obstacles to the tool
becoming widely available.
Future Directions
There are several important next steps to expand the influence of this bayesian
hierarchical model approach for the benefit of patients, some of which are already underway.
First, it is important to scale a tool that addresses a particular unmet need across a larger, more
diverse population of patients and clinicians so that its utility can be scientifically measured,
curation methods can be established to keep the tool current, and a financial model can be
established to support its continuous use. We are currently pursuing this vertical scaling of both
the prostate cancer and children’s pneumonia applications. We think it is equally important to
horizontally scale the approach to address a wider set of unmet clinical needs. To this end, we
have projects underway in autoimmune diseases, sudden cardiac arrest, and diabetes. Our
longer-term goals are to (1) embed a collection of tools to acquire and use the most relevant
information, agnostic to its level of measurement, to improve population and individual health
decisions that lead to better outcomes at more affordable costs; and (2) scale the tool-creation
process so that data scientists around the world, in partnership with population- or patient-health
managers (ie, clinicians), share equal access to the best information for each decision.
CONCLUSIONS
Bayesian hierarchical models that include dynamic, latent health state; probabilities for
the selection and effects of interventions on those states; and the complex health outcomes
from which the underlying states can be inferred are useful tools to improve population health
or clinical decisions. Such a model combines diverse sources of prior knowledge and data with
evidence about the patient (population) at hand, to predict the patient’s health status,
trajectory, and/or likely benefits of interventions. Visualizations of characteristics of posterior
distributions can be immediately understood by clinicians and patients as relevant to their
decision. When tailored to the particular medical situation, the models summarize complex
information to answer questions of interest to patients, such as, “What is my health status; am I
improving; which treatment is most likely to improve these symptoms?” The next key steps are
to scale these and other similar tools to larger and more diverse populations where they can be
systematically evaluated and to increase the rate at which tools can be developed, tested,
disseminated, and then curated. The most significant obstacle is the lack of a financial model by
which the improvements in health and cost outcomes derived from using these tools are
returned to support their curation and expand their influence.
REFERENCES
1. Washington AE, Lipstein SH. The Patient-Centered Outcomes Research Institute —
promoting better information, decisions, and health. N Engl J Med. 2011;365(15):e31. doi:10.1056/NEJMp1109407
2. Blumenthal D. Stimulating the adoption of health information technology. N Engl J Med. 2009;360(15):1477-1479. doi:10.1056/nejmp0901592
3. Institute of Medicine, National Academy of Engineering. Engineering a Learning Healthcare System: A Look at the Future: Workshop Summary. National Academies Press; 2011.
4. Diggle PJ, Liang KY, Zeger SL. Analysis of Longitudinal Data. Clarendon Press; 2002.
5. Raudenbush S, Bryk A. Hierarchical Linear Models: Applications and Data Analysis Methods. Vol 1. Sage Publications; 2002.
6. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. Springer; 2009.
7. Diaz F, Yeh HW, de Leon J. Role of statistical random-effects linear models in personalized medicine. Curr Pharmacogenomics Person Med. 2012;10(1):22-32. doi:10.2174/187569212800166693
8. Goldstein H. Multilevel Statistical Models. Vol 922. John Wiley & Sons; 2011.
9. Silverman M, Murray T, Bryan C. The Quotable Osler. American College of Physicians; 2008.
10. Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press; 2006.
11. Rosen A, Zeger SL. Precision medicine: discovering clinically relevant and mechanistically anchored disease subgroups at scale. J Clin Invest. 2019;129(3):944-945. doi:10.1172/jci126120
12. Prentice RL, Zhao LP. Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics. 1991;47(3):825-839. doi:10.2307/2532642
13. Lauritzen SL. Propagation of probabilities, means, and variances in mixed graphical association models. J Am Stat Assoc. 1992;87(420):1098-1108. doi:10.1080/01621459.1992.10476265
14. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley; 2002.
15. Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945.
16. Rubin DB. Formal mode of statistical inference for causal effects. J Stat Plan Inference. 1990;25(3):279. doi:10.1016/0378-3758(90)90077-8
17. Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Stat Sci. 1999;14(1):29. doi:10.1214/ss/1009211805
18. Pearl J. Causal inference in statistics: an overview. Stat Surv. 2009;3:96. doi:10.1214/09-ss057
19. Levine OS, O’Brien KL, Deloria-Knoll M, et al. The Pneumonia Etiology Research for Child Health Project: a 21st century childhood pneumonia etiology study. Clin Infect Dis. 2012;54(Suppl 2):S93-S101. doi:10.1093/cid/cir1052
20. Black RE, Cousens S, Johnson HL, et al. Global, regional, and national causes of child mortality in 2008: a systematic analysis. Lancet. 2010;375(9730):1969-1987. doi:10.1016/s0140-6736(10)60549-1
21. Liu L, Oza S, Hogan D, et al. Global, regional, and national causes of child mortality in 2000–13, with projections to inform post-2015 priorities: an updated systematic analysis. Lancet. 2015;385(9966):430-440. doi:10.1016/s0140-6736(14)61698-6
22. Scott JAG, Brooks WA, Peiris JM, Holtzman D, Mulholland EK. Pneumonia research to reduce childhood mortality in the developing world. J Clin Invest. 2008;118(4):1291-1300.
23. Murdoch DR, O’Brien KL, Driscoll AJ, Karron RA, Bhat N. Laboratory methods for determining pneumonia etiology in children. Clin Infect Dis. 2012;54(Suppl_2):S146-S152. doi:10.1093/cid/cir1073
24. Carter HB, Walsh PC, Landis P, Epstein JI. Expectant management of nonpalpable prostate cancer with curative intent: preliminary results. J Urol. 2002;167(3):1231-1234. doi:10.1097/00005392-200203000-00006
25. Tosoian JJ, Trock BJ, Landis P, Feng Z, Epstein JI, Partin AW. Active surveillance program for prostate cancer: an update of the Johns Hopkins Experience. J Clin Oncol. 2011;29(16):2185-2190.
26. Howlader N, Noone AM, Krapcho M, et al (eds). Previous version: SEER Cancer Statistics Review, 1975-2011. National Cancer Institute. Posted April 2014. Updated December 17, 2014. Accessed September 30, 2020. https://seer.cancer.gov/archive/csr/1975_2011/
27. Welch HG, Albertsen PC. Prostate cancer diagnosis and treatment after the introduction of prostate-specific antigen screening: 1986-2005. J Natl Cancer Inst. 2009;101(19):1325-1329.
28. Cooperberg MR, Broering JM, Carroll PR. Time trends and local variation in primary treatment of localized prostate cancer. J Clin Oncol. 2010;28(7):1117-1123.
29. Chou R, Croswell JM, Bougatsos C, Blazina I. Screening for prostate cancer: a review of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med. 2011;155(11):762-771.
30. Chou R, Dana T, Bougatsos C, Fu R, Blazina I. Treatments for Localized Prostate Cancer: Systematic Review to Update the 2002 U.S. Preventive Services Task Force. Evidence Syntheses 91. Agency for Healthcare Research and Quality; 2011.
31. Carter HB, Kettermann A, Warlick C, et al. Expectant management of prostate cancer with curative intent: an update on the Johns Hopkins Experience. J Urol. 2007;178(6):2359-2364.
32. Soloway MS, Soloway CT, Williams S. Active surveillance; a reasonable management alternative for patients with prostate cancer: the Miami experience. BJU Int. 2008;101(2):165-169.
33. van As NJ, Norman AR, Thomas K. Predicting the probability of deferred radical treatment for localised prostate cancer managed by active surveillance. Eur Urol. 2008;54(6):1297-1305.
34. van den Bergh RC, Roemeling S, Roobol MJ, et al. Outcomes of men with screen-detected prostate cancer eligible for active surveillance who were managed expectantly. Eur Urol. 2009;55(1):1-8.
35. Klotz L, Zhang L, Lam A. Clinical results of long-term follow-up of a large, active surveillance cohort with localized prostate cancer. J Clin Oncol. 2010;28(1):126-131.
36. Xu J, Zeger S. Joint analysis of longitudinal data comprising repeated measures and times to events. J R Stat Soc Ser C Appl Stat. 2001;50(3):375-387.
37. Goldberg D. The heterogeneity of “major depression.” World Psychiatry. 2011;10(3):226-228.
38. Carter GC, Cantrell RA, Victoria Z, et al. Comprehensive review of factors implicated in the heterogeneity of response in depression. Depress Anxiety. 2012;29(4):340-354.
39. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606-613.
40. Ogburn EL, Zeger SL. Statistical reasoning and methods in epidemiology to promote individualized health: in celebration of the 100th anniversary of the Johns Hopkins Bloomberg School of Public Health. Am J Epidemiol. 2016;183(5):427-434.
41. Coley RY, Fisher AJ, Mamawala M, Carter HB, Pienta KJ, Zeger SL. A Bayesian hierarchical model for prediction of latent health states from multiple data sources with application to active surveillance of prostate cancer. Biometrics. 2017;73(2):625-634. doi:10.1111/biom.12577
42. Wu Z, Deloria-Knoll M, Hammitt LL, Zeger SL; Pneumonia Etiology Research for Child Health Core Team. Partially latent class models for case–control studies of childhood pneumonia aetiology. J R Stat Soc Ser C Appl Stat. 2016;65(1):97-114.
43. Wu Z, Deloria-Knoll M, Zeger SL. Nested partially latent class models for dependent binary data; estimating disease etiology. Biostatistics. 2016;18(2):200-213.
44. Wu Z, Casciola-Rosen L, Shah AA, Rosen A, Zeger SL. Estimating auto-antibody signatures to detect autoimmune disease patient subsets. Biostatistics. 2019;20(1):30-47. doi:10.1093/biostatistics/kxx061
45. Wu Z, Casciola-Rosen L, Rosen A, Zeger SL. A Bayesian approach to restricted latent class models for scientifically-structured clustering of multivariate binary outcomes. bioRxiv. Preprint posted online August 15, 2018. Accessed September 29, 2020. https://www.biorxiv.org/content/10.1101/400192v3
46. Fojo AT, Musliner KL, Zandi P, Zeger SL. A precision medicine approach for psychiatric disease based on repeated symptom scores. J Psychiatr Res. 2017;95:147-155.
47. Deloria Knoll M, Fu W, Shi Q, et al. Bayesian estimation of pneumonia etiology: epidemiologic considerations and applications to the Pneumonia Etiology Research for Child Health Study. Clin Infect Dis. 2017;64(Suppl_3):S213-S227. doi:10.1093/cid/cix144
48. PERCH Study Group. Causes of severe pneumonia requiring hospital admission in children without HIV infection from Africa and Asia: the PERCH multi-country case-control study. Lancet. 2019;394:757-779. https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)30721-4/fulltext
49. Mamawala MM, Rao K, Landis P, et al. Risk prediction tool for grade re-classification in men with favourable-risk prostate cancer on active surveillance. BJU Int. 2017;120(1):25-31.
50. Plummer M, Stukalov A, Denwood M. rjags: Bayesian graphical models using MCMC (version 3-5). 2011. https://cran.r-project.org/web/packages/rjags/index.html
ACKNOWLEDGMENTS
The authors gratefully acknowledge the support of the people of the United States
whose tax dollars sustain PCORI, which funded this work through a competitive process. We
acknowledge the supportive role that our PCORI staff, ably led by Dr Emily Evans, played in the
initiation, execution, and the reporting of this research. We especially appreciate their
dedication to clear communication and to engagement of clinical and patient stakeholders.
The authors were ably supported by numerous faculty and staff colleagues, including
Darcy Phelan, Ken Fasman, Brionna Hair, Risha Zuckerman, Debra Moffitt, Kara Schoenberg,
Tricia Landis, and Joyclyn Gilmore.
We benefited from the positive research environment in the Department of Biostatistics
chaired by Dr Karen Bandeen-Roche and in the Johns Hopkins Individualized Health Initiative
(Hopkins inHealth), led by Prof Antony Rosen. Hopkins inHealth colleagues Aalok Shah, Dwight
Raum, and members of the Technology Innovation Center were important contributors.
We thank all the co-authors of articles partially supported by this grant. Though not
themselves funded, they generously added their talents to the science presented here.
We are especially appreciative of the time and good counsel offered by our patient
stakeholders: William Wilson, William Lewis, and Peter Johnson. They educated us about what
patients need and want.
Our Scientific Advisory Board kept us moving in the right direction. These colleagues—
Patrick Heagerty, Francesca Dominici, Martin Morgan, Roger Peng, and Vince Carey—met with
us each year and enriched the work by generously sharing their intellects and vast experience.
We are most grateful.
The members of the group that worked together on this project have moved ahead in their careers. The
postdoctoral fellows Zhenke Wu, Yates Coley, and Todd Fojo are all now assistant professors or
the equivalent at top institutions. Two of our midlevel collaborators became full professors
during this project, another became a director for the World Health Organization, and yet
another was awarded a named professorship. We like to think that the collaboration
contributed useful research results as well as new insights for those who enthusiastically
invested themselves in this endeavor.
Copyright © 2020. Johns Hopkins Bloomberg School of Public Health. All Rights Reserved.
Disclaimer:
The views, statements, and opinions presented in this report are solely the responsibility of
the author(s) and do not necessarily represent the views of the Patient-Centered
Outcomes Research Institute® (PCORI®), its Board of Governors or Methodology
Committee.
Acknowledgment:
Research reported in this report was funded through a Patient-Centered Outcomes
Research Institute® (PCORI®) Award (#ME-1408-20318). Further information available
at: https://www.pcori.org/research-results/2015/using-bayesian-approach-predict-patients-health-and-response-treatment