FINAL RESEARCH REPORT - PCORI

PATIENT-CENTERED OUTCOMES RESEARCH INSTITUTE FINAL RESEARCH REPORT

Using a Bayesian Approach to Predict Patients’ Health and Response to Treatment

Scott L. Zeger, PhD1; Zhenke Wu, PhD2; Yates Coley, PhD3; Anthony Todd Fojo, MD, PhD4; Bal Carter, MD4; Katherine O’Brien, MD1; Peter Zandi, PhD1,4; Mary Cooke, DHA5; Vince Carey, PhD6; Ciprian Crainiceanu, PhD1; John Muscelli, PhD1; Adrian Gherman, MSE1; Jason Mekosh, MBA5

AFFILIATIONS:
1 Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland
2 School of Public Health, University of Michigan, Ann Arbor
3 Kaiser Permanente Washington Health Research Institute, Seattle
4 The Johns Hopkins University School of Medicine, Baltimore, Maryland
5 Johns Hopkins HealthCare, Baltimore, Maryland
6 School of Medicine, Harvard University, Boston, Massachusetts

Original Project Title: Bayesian Hierarchical Models for the Design and Analysis of Studies to Individualize Healthcare
PCORI ID: ME-1408-20318
HSRProj ID: HSRP20153591

To cite this document, please use: Zeger SL, Wu Z, Coley Y, et al. (2020). Using a Bayesian Approach to Predict Patients’ Health and Response to Treatment. Patient-Centered Outcomes Research Institute (PCORI). https://doi.org/10.25302/09.2020.ME.140820318

TABLE OF CONTENTS

ABSTRACT
BACKGROUND
PATIENT AND STAKEHOLDER ENGAGEMENT
METHODS
  Research Design
    Table 1. Applications of Bayesian Hierarchical Model for Individualized Health to 3 Case Studies
  Case Studies
  OSLER inHealth
RESULTS
  Aim 1
    Figure 1. Graphical Representation of Person-Time Levels of Bayesian Hierarchical Model for Individualizing Health Showing State Trajectory (η) for a Single Individual (i)
    Figure 2. Graphical Representation of a Population (n = 4) Showing the Population Level in Which the Individual Specific Parameters Are Assumed to be Independent Realizations From a Distribution (b, q, f, η | X)
    Figure 3. Decomposition Results for Gel Electrophoresis Assay Data
  Aim 2
    Figure 4. Graphical Displays of the Estimated Population and Individual Etiologies Using Data From a Site of the PERCH Study
    Figure 5. Alternative Analytic Approaches Used for Determining Pneumonia Etiology
    Figure 6. Estimating the Etiologic Fraction From a Study With 2 Types of Measurements, 1 With Control Data, and Accounting for Imperfect Sensitivity of Both Measurements: The PIA Method
    Figure 7. All Site Etiology Results Among HIV-Uninfected Patients by CXR Findings: CXR+ Patients vs All Patients
    Figure 8. Directed Acyclic Graphs Describing the Relationships Between Latent Class and Clinical Outcomes
    Table 2. Model Summary With Priors Used for Application to Johns Hopkins Active Surveillance Data
    Figure 9. CONSORT Diagram for Johns Hopkins Active Surveillance Prospective Cohort Patients Included in This Analysis
    Figure 10. Predictive Accuracy of Out-of-Sample Predictions of η Among Patients With η Observed
    Figure 11. Screen Shot From JHM EHR Showing the Model-Estimated Risk That a Patient’s Prostate Tumor Falls Into Each of the Gleason Classes Indicated by the Colors in the Pie Chart on the Left
    Figure 12. Example Predictions of PANSS Symptom Scores
  Aim 3
DISCUSSION
  Bayesian Hierarchical Model
  Case Studies
  Software
  Stakeholders
  Influence on Population Health and Patient Care
  Future Directions
CONCLUSIONS
REFERENCES
ACKNOWLEDGMENTS

ABSTRACT

The PCORI mission is to address questions about health care from the patients’ perspective, such as “What is my health status and its trajectory?” and “What are my treatment options and the expected benefits and harms of each?” The purpose of this PCORI-funded project is to make it easier for clinicians and patients to find valid answers to these and other clinical questions by using modern digital tools that support (1) learning from the experience of prior patients, and (2) translating what is learned to inform the decision at hand, taking into account each patient’s unique circumstances. For this project, we developed and implemented statistical methods called bayesian hierarchical models that combine existing data on past clinical experience from a reference population with new measurements for the individual. Clinicians currently use such methods when screening patients for disease. Modern technologies make it possible for this proven approach to extend far beyond its current use. The recent revolution in information technology has unleashed new types of health data, from DNA sequences to functional images of the brain to patient-reported outcomes. Furthermore, the electronic health record captures every patient’s sequence of health measurements, diagnoses, and treatments. The bayesian methods developed and reported on here combine even complex data to produce predictions about an individual patient’s health status, trajectory, and likely benefits and harms of interventions.

In addition to developing novel methods, we facilitated their use by creating and locally disseminating a software package, OSLER inHealth, that will allow other researchers to apply this methodology. The software repository is open-source and includes the methodology developed as part of this research as well as other existing methods that facilitate individualized health prediction.

We have tested the proposed methods and software on 3 case studies to (1) estimate the frequency with which various pathogens cause children’s pneumonia and predict which pathogen is likely to be causing a particular child’s pneumonia given her or his clinical data, potentially reducing unnecessary use of antibiotics; (2) infer whether a prostate cancer is indolent or aggressive for a patient under active surveillance; and (3) characterize the variation in multiple, time-varying symptoms of major mental disorders, including schizophrenia and depression, and then use this knowledge to provide patient-specific estimates of past and, likely, future trajectories.

With this project, we have developed and demonstrated the value of combining even complex measurements on a population of patients, then translating this experience into more valid assessments of a new patient’s health status and trajectory. The model also supports inferences about the likely benefits and harms associated with available interventions.

BACKGROUND

The electronic health record (EHR) now captures more information about patients’

characteristics and changes in health over time.1-3 Fuller use of these data could improve

diagnostic accuracy and prediction of treatment effects, but standard approaches for analyzing

clinical data are not adequate for this purpose. To make the best use of newly available data,

statistical methods must account for longitudinal data, informative sampling and missing data,

heterogeneous treatment effects, and the need to combine expert knowledge with empirical

knowledge.

Our work focuses on developing and implementing more powerful and flexible statistical frameworks and software that generate more accurate information for tools that help patients and clinicians make better decisions.

The main innovation of this research was to design and implement bayesian hierarchical models for diverse types of emerging longitudinal data to answer patient-centered research questions. We use hierarchical (or “multilevel”) statistical models as a basic framework because they can address each of the complexities that arise with EHR data. They represent levels of variation, for example, time within an individual within a population, and produce point and interval predictions for individuals.4-9 A bayesian approach is used

for 2 reasons: (1) to explicitly include prior medical knowledge about treatment effects or other

parameters using prior distributions, and (2) to communicate findings in terms of probabilities

that clinicians are accustomed to using. Three case studies on pneumonia etiology in children,

prostate cancer, and major mental disorders were chosen to provide different

challenges to the model development.

The specific aims of this research are as follows:

1. To develop a bayesian hierarchical model for possibly multivariate longitudinal data that predicts the health status, trajectory, and likely intervention effects for each member of a clinical population. The modeling approach comprises the following components: the effects of exogenous (eg, age, clinical history) and endogenous (eg, current treatment) variables on the individual’s health status; multivariate health measurements from which health status can reasonably be inferred; the effects of health measurements at one time on subsequent interventions; and the embedding of the individual within a reference clinical population of persons with similar values for measured covariates.

2. To build, test, and refine the model in 3 case studies representing diverse clinical challenges: (1) predicting the probable causes for childhood pneumonia; (2) diagnosing and evaluating treatments of prostate disease; and (3) quantifying the sources of variation in patient-reported measures of major mental disorders, including depression and schizophrenia; and to create preliminary designs of decision-support tools for each clinical application in collaboration with the area-specific stakeholder.

3. To implement the statistical methods in open-source R packages in a repository named Open-Source Learning Environment for Research on Individualized Health (OSLER inHealth).

In this proof-of-concept study, we developed and implemented novel statistical models that address key PCORI questions. The next key steps are to scale these and other similar tools to larger and more diverse populations in which they can be systematically evaluated and then used to improve health outcomes at more affordable costs.

PATIENT AND STAKEHOLDER ENGAGEMENT

This project included several clinical stakeholders to ensure it was focused appropriately

on patients and clinicians. Each of the 3 case studies included 1 or more clinical stakeholders as

a member of the research team: Dr Kate O’Brien for children’s pneumonia; Drs Todd Fojo and

Peter Zandi for mental disorders; and Dr Bal Carter for prostate cancer. We also included Dr

Mary Cooke, vice president of Johns Hopkins HealthCare, to represent administrative leadership

at Johns Hopkins Medicine (JHM), an early consumer of the tools developed. Dr Cooke leads

JHM health plans with approximately 40 000 members and is charged with improving the

design of health care for all of the 500 000 covered lives.

Each stakeholder was involved in the planning and execution of this research program.

During the study, the stakeholders provided their expert knowledge about the disease and

measurement processes and also shared their clinical data. They participated in the

specification and refinement of the model. In addition, 3 patients with prostate cancer

participated in semiannual meetings to review design plans and early versions for our decision-

support tool. The clinician stakeholders met with the study team in person at least once per

month or more frequently when needed.

As part of the organizational structure of the study, an OSLER inHealth Steering

Committee provided guidance to the study team about the software issues. The committee was

chaired by Dr Vince Carey; members of the committee included Dr Martin Morgan (current

leader of the Bioconductor software project, senior staff scientist, Fred Hutchinson Cancer

Research Center); Dr Francesca Dominici (professor of biostatistics, vice senior associate dean

for Research, Harvard School of Public Health); Dr Roger Peng (associate professor of

biostatistics, Johns Hopkins School of Public Health); and Dr Patrick Heagerty (professor and

chair, Department of Biostatistics, University of Washington). The Steering Committee met at

least once per year via teleconference or internet meetings or in person.

METHODS

Research Design

Hierarchical Statistical Models

For this project, we developed and applied bayesian hierarchical statistical models with

2 levels—time within person and persons within population—to represent the key components

necessary to learn about the level and trajectory of a disease and to make well-informed health

decisions.10 The following components were included.

Health status. In most problems, an individual’s true health status (call it η) cannot be

precisely measured but is instead reflected in clinical measurements, Y. For example,

pathogens infecting a child’s lung cannot be directly observed, so their presence or absence in

samples from the nose and throat is observed instead. The conditional distribution of these

observations, given the actual health status (here, lung infection), is the first component of our

model.

Mechanistic effects of covariates and interventions on health status. Effects of

covariates (X, exogenous; R, endogenous) on health status form the second component. We use

standard generalized linear models to describe the conditional distribution of the health status

given the individual’s covariate values at each time. We allow these covariate effects to vary

across individuals to account for heterogeneous treatment effects. A special case is to assume

that more homogeneous subgroups of people exist and that interventions can be tailored to

the characteristics of the subgroup. Our combined scientific and clinical goal is to define

“clinically relevant and mechanistically anchored” subgroups.11

Treatment decisions with feedback. To learn about the efficacy and safety of an intervention, we include as the third component of the model a regression of the intervention assignment process on previous health measures. For example, in the mental disorders case study, one

might anticipate that the choice of therapy depends on previous measurements of depressive

symptoms.

Embedding the individual within a reference population. The final component of

our model is a second level that describes the variation in the individual-specific model terms

across the population.
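As a compact summary, the 4 components described above can be sketched as a product of conditional distributions for individual i. The notation below is assumed for illustration only and is not taken from the report's figures: β_i denotes the individual-specific coefficients, θ the population-level parameters, and t indexes discrete time. The first-order dependence of η_it on η_i,t-1 is likewise an assumption of this sketch.

```latex
\begin{aligned}
p(Y_i, \eta_i, R_i, \beta_i \mid X_i, \theta)
  ={}& p(\beta_i \mid X_i, \theta)
       && \text{(individual within population)} \\
  &\times \prod_{t} p(\eta_{it} \mid \eta_{i,t-1}, X_i, R_{it}, \beta_i)
       && \text{(covariate and intervention effects on health status)} \\
  &\times \prod_{t} p(Y_{it} \mid \eta_{it})
       && \text{(measurements given health status)} \\
  &\times \prod_{t} p(R_{it} \mid Y_{i,1:t-1}, X_i)
       && \text{(treatment decisions with feedback)}
\end{aligned}
```

Each factor can be specified separately, which is what allows one component to be modified without rewriting the others.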

Case Studies Dictate Model Development

As listed in Table 1, the 3 case studies on pneumonia etiology in children, prostate

cancer, and major mental disorders were chosen to provide different challenges to the model

development. For each case study, we specified and computed an initial model and worked

with the stakeholder to refine it.

When this project began, Johns Hopkins EHR data could not be readily accessed for

research. Therefore, another criterion for the case studies was that the data were ready for

modeling.

Table 1. Applications of Bayesian Hierarchical Model for Individualized Health to 3 Case Studies

Case study | Medical challenge | Stakeholder | Target of inference | Data
I | Diagnosing viral vs bacterial childhood pneumonia | Kate O’Brien, professor of international health | Health status (aim 1A) | PERCH Study: 5000 cases; 5000 control participants
II | Evaluating efficacy and safety of active surveillance as alternative to surgery for prostate cancer | Bal Carter, professor of urology | Health status, trajectory; causal inference about treatment choice (aims 1A and 1B) | Brady Institute Active Surveillance Study: 1300 patients with prostate cancer at Johns Hopkins
III | Diagnosis and evaluation of therapies for schizophrenia or depression | Peter Zandi, associate professor of psychiatry | Health status, trajectory; causal inference: entire model (aims 1A, 1B, 1C) | Janssen Schizophrenia Trial data; NNDC depression data

Abbreviations: NNDC, National Network of Depression Centers; PERCH, Pneumonia Etiology Research for Child Health.

Model Specification

The framework in Figures 1 and 2 is general. We implement the model using the

following additional specifications.

Time. We have narrowed the focus of this project to discrete, rather than continuous, time. By discrete time, we mean daily, monthly, or annual measures; continuous time allows measures at any time. There are 2 main reasons for this decision. First, continuous-time processes are tractable mainly for Gaussian processes, for which continuous covariance functions are natural. Continuous-time models do not easily accommodate non-Gaussian outcomes such as categorical, count, or repeated event times, which are common in medical research. Second, most continuous-time models can be closely approximated by a discrete-time model.

Health status. We allow the latent health status to be represented by multiple

variables that can be a mixture of discrete and continuous variables.12,13

Measurements. We assume that the measured health outcomes (Ys), given the true

underlying health status (η) are distributed according to the exponential family of distributions

(including Gaussian, binomial, Poisson, γ, and others). Where measurement error may depend

on external covariates, generalized linear models are used.
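To make the measurement component concrete, the following minimal sketch simulates 3 types of measurements that share one latent health status, each linked to η through a different exponential-family distribution. All coefficients, the sample size, and the variable names are hypothetical values chosen for illustration; they do not come from any of the case studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent health status for 5 person-times.
eta = rng.normal(0.0, 1.0, size=5)

# Each measurement type ties its mean to eta through a GLM-style link.
# Gaussian measurement, identity link.
y_gauss = rng.normal(loc=2.0 + 1.5 * eta, scale=0.5)

# Binary measurement, logit link.
p_positive = 1.0 / (1.0 + np.exp(-(-0.5 + eta)))
y_binary = rng.binomial(1, p_positive)

# Count measurement, log link.
y_count = rng.poisson(np.exp(0.2 + 0.3 * eta))

print(y_gauss, y_binary, y_count)
```

In a fitted model the direction is reversed: the observed measurements are used to infer the posterior distribution of η.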

Missing data. This model naturally handles missing data by treating them in the same

way as other latent variables. At each iteration of the estimation algorithm, the missing values

are randomly imputed. The full data set is then used to simulate parameters and latent

variables. Conditioned on the model, the inferences are not substantially affected by missing

data unless the unobserved value is what caused the missingness (“nonignorable

missingness”).14
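The imputation step can be illustrated with a deliberately simple Gibbs sampler. This sketch assumes a normal model with known variance, a normal prior on the mean, and ignorable missingness; the function name, prior settings, and data are hypothetical and are far simpler than the longitudinal models in this report.

```python
import numpy as np

def gibbs_with_imputation(y, n_iter=2000, sigma=1.0,
                          prior_mean=0.0, prior_sd=10.0, seed=0):
    """Draw posterior samples of a normal mean, imputing missing y each iteration.

    y: 1-D array with np.nan marking missing values (assumed ignorable).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    n = y.size
    y_full = y.copy()
    mu = np.nanmean(y)  # starting value from the observed data
    draws = np.empty(n_iter)
    for it in range(n_iter):
        # Step 1: treat missing values as latent variables; draw them given mu.
        y_full[missing] = rng.normal(mu, sigma, size=int(missing.sum()))
        # Step 2: draw mu from its conjugate posterior given the completed data.
        post_prec = 1.0 / prior_sd**2 + n / sigma**2
        post_mean = (prior_mean / prior_sd**2 + y_full.sum() / sigma**2) / post_prec
        mu = rng.normal(post_mean, np.sqrt(1.0 / post_prec))
        draws[it] = mu
    return draws

y_obs = np.array([1.2, 0.8, np.nan, 1.5, np.nan, 0.9])
draws = gibbs_with_imputation(y_obs)
posterior_mean = draws[500:].mean()  # discard the first 500 draws as burn-in
print(posterior_mean)
```

Because the imputed values are drawn from the model itself, they add no information: the draws of the mean converge to the posterior based on the 4 observed values alone, which is why ignorable missingness does not distort the inference.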

Plug-and-play extensibility. A feature of this hierarchical model design is the possible

use of conditional independence of components. This allows the modeler to change parts of the

model while leaving the rest intact. For example, one health status measurement model can be

substituted for another without changing the rest of the structure.

Model Computation

Given the dramatic advances in bayesian computing packages, such as JAGS

(http://mcmc-jags.sourceforge.net/) and Stan (https://mc-stan.org/), we changed our plans to

implement our longitudinal models in R using the existing computational software rather than

writing original software.

Model Deliverables

This section briefly summarizes the key outputs from the bayesian hierarchical model

we developed. The inputs are the predictor variables (X and R), health outcome measurements

(Y), a prior distribution for the unknown health status (η), and the model structure that ties

the observations to the unknowns. The major outputs are listed in the following paragraphs and

illustrated in more detail for each of the 3 case studies.

Predictions of individual’s health status. The model produces an estimate of the

posterior distribution of the health status η_it for individual i at every time t. For example, in the

pneumonia etiology case study, the model produces the probability that the lung infection is

caused by each of the candidate pathogens given the observed measurements and the

estimated population pathogen frequencies.
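The pneumonia example amounts to an application of Bayes' rule, sketched below under simplifying assumptions: each case has exactly 1 causative pathogen, and test results are conditionally independent given the cause. The function name and all numeric values (population frequencies, true-positive rates, false-positive rates) are hypothetical.

```python
import numpy as np

def posterior_etiology(prior, tpr, fpr, m):
    """Posterior probability that each candidate pathogen is the cause.

    prior: population pathogen frequencies (sum to 1).
    tpr[j]: P(test j positive | pathogen j is the cause).
    fpr[j]: P(test j positive | pathogen j is not the cause).
    m: observed binary test results, one per pathogen.
    """
    prior, tpr, fpr, m = (np.asarray(a, dtype=float) for a in (prior, tpr, fpr, m))
    like = np.empty(prior.size)
    for k in range(prior.size):
        # If pathogen k is the cause, its test follows the true-positive rate;
        # every other test can only produce false positives.
        p_pos = fpr.copy()
        p_pos[k] = tpr[k]
        like[k] = np.prod(np.where(m == 1, p_pos, 1.0 - p_pos))
    post = prior * like
    return post / post.sum()

post = posterior_etiology(
    prior=[0.5, 0.3, 0.2],
    tpr=[0.8, 0.7, 0.9],
    fpr=[0.1, 0.2, 0.05],
    m=[1, 0, 0],  # only the first pathogen was detected
)
print(post.round(3))
```

With these inputs, detecting only the first pathogen raises its probability from the population frequency of 0.5 to about 0.97 for this child.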

Predictions of individual’s health trajectory. The model produces an estimate of

the posterior distribution of the trend in health status for each value of the predictor variables

that can include the treatment. For instance, in the major mental disorders example, the model

calculates the risk for a particular patient’s depression to worsen (negative slope) based on his

history of Patient Health Questionnaire-9 (PHQ-9) scores and covariates.

Estimates of treatment effects. The model produces an estimate of the marginal

distribution of the regression coefficients; each coefficient measures how the outcome, health

status, is associated with its predictor variable (R, X). For example, if health status is a binary

latent variable representing presence or absence of a disease and R is an indicator of whether

the patient has received intervention B rather than alternative A, then a model output is the

estimated posterior distribution for the relative risk of disease for a person receiving B as

compared with another person with the same X value receiving A. The assumptions required to

interpret this quantity as a causal effect are well known.15-18
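As an illustration of how such a posterior distribution could be summarized, the sketch below converts draws of logistic regression coefficients into draws of the relative risk. The draws are simulated stand-ins for MCMC output, and the means and standard deviations used to generate them are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior draws; in practice these come from the fitted model.
alpha = rng.normal(-1.0, 0.10, size=4000)   # log-odds of disease under A at a fixed X
beta_r = rng.normal(-0.5, 0.15, size=4000)  # log-odds ratio for B vs A

def expit(z):
    """Inverse logit: map log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

risk_a = expit(alpha)
risk_b = expit(alpha + beta_r)
rr = risk_b / risk_a  # one relative-risk value per posterior draw

rr_median = float(np.median(rr))
rr_low, rr_high = np.percentile(rr, [2.5, 97.5])
prob_b_lowers_risk = float(np.mean(rr < 1.0))

print(rr_median, (rr_low, rr_high), prob_b_lowers_risk)
```

Transforming each draw, rather than transforming a point estimate, carries the uncertainty in both coefficients through to the relative risk and yields probability statements of the kind clinicians are accustomed to using.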

Measuring heterogeneity of treatment effects and predictions of individual

treatment effects. This modeling approach naturally accommodates heterogeneity of

intervention effects through both fixed and random effects. As with any regression, the user

can include interactions between exogenous covariates (X), for example, genetic markers, and

the intervention indicator variables (R). Here, the regression coefficients for the interaction

terms estimate the differences in intervention effects across the levels of the interacting

variables. In addition, this hierarchical model includes a distribution of intervention effects

across the population. With substantial amounts of information for each individual or with prior

knowledge about the variance of the intervention effect coefficients across the population, the

model produces the posterior distribution for an individual’s treatment effect under each

intervention.
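The role of the population distribution can be seen in a minimal normal-normal sketch, in which an individual's own noisy estimate is combined with the population mean and SD of treatment effects. The function name and numbers are hypothetical, and the actual models are not restricted to Gaussian effects.

```python
import math

def individual_effect_posterior(b_hat, se, pop_mean, pop_sd):
    """Normal-normal posterior for one person's treatment effect.

    b_hat, se: estimate and standard error from the individual's own data.
    pop_mean, pop_sd: mean and SD of treatment effects across the population.
    Returns (posterior mean, posterior SD).
    """
    w_ind = 1.0 / se**2      # precision of the individual's own data
    w_pop = 1.0 / pop_sd**2  # precision of the population distribution
    post_var = 1.0 / (w_ind + w_pop)
    post_mean = post_var * (w_ind * b_hat + w_pop * pop_mean)
    return post_mean, math.sqrt(post_var)

# Same individual estimate, with little vs substantial individual information.
sparse = individual_effect_posterior(b_hat=-2.0, se=1.0, pop_mean=-1.0, pop_sd=0.5)
rich = individual_effect_posterior(b_hat=-2.0, se=0.1, pop_mean=-1.0, pop_sd=0.5)
print(sparse, rich)
```

With little individual information, the estimate is pulled strongly toward the population mean; with substantial information, the individual's own data dominate.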

Model Refinement

Sensitivity analyses. For a statistical model to be relied upon by clinician scientists

and practitioners, these users must develop an understanding of its main ideas. Trust was built

by making the practitioners full partners in the design of the models. In addition, we “kicked the tires” by repeatedly testing the sensitivity of results to varying assumptions and data. Our software design attempts to make this easier for statistical users.

Testing with stakeholders. In each case study, we met approximately weekly with

our clinical colleagues and their staff. Part of each meeting was dedicated to sensitivity analyses

to improve the model performance and clinical utility.

Model Evaluation

Statistical evaluation. Each model was evaluated in 2 ways. First, we used 10-fold

cross-validation to estimate the accuracy and precision of the predictors relative to

competitors. The second method was to simulate data sets from a known distribution like the

one estimated from the case study data and to calculate the statistical performance (ie, bias,

variance, mean squared error) of the model.
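Both evaluation approaches can be sketched in a few lines. The example below uses the sample mean as a stand-in for a fitted model; the function names, sample sizes, and simulated data are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def kfold_mse(y, predict_fn, k=10, seed=0):
    """Estimate out-of-sample mean squared error by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    sq_err = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        pred = predict_fn(y[train_idx])  # fit on the other k-1 folds
        sq_err.extend((y[test_idx] - pred) ** 2)
    return float(np.mean(sq_err))

def simulation_performance(estimator, true_value, n=50, n_sim=2000, seed=0):
    """Bias, variance, and mean squared error over data sets with a known truth."""
    rng = np.random.default_rng(seed)
    est = np.array([estimator(rng.normal(true_value, 1.0, size=n))
                    for _ in range(n_sim)])
    bias = float(est.mean() - true_value)
    variance = float(est.var())
    mse = float(np.mean((est - true_value) ** 2))
    return bias, variance, mse

rng = np.random.default_rng(42)
y = rng.normal(0.0, 1.0, size=200)
cv_mse = kfold_mse(y, predict_fn=np.mean)
bias, variance, mse = simulation_performance(np.mean, true_value=1.0)
print(cv_mse, bias, variance, mse)
```

The simulation output satisfies the identity MSE = variance + bias², which is a useful check on any such study.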

Clinical evaluation. The refinement process we have described included a qualitative

user evaluation of whether a tool was ready to test clinically. It was beyond the scope of this

project to complete this clinical evaluation, which will include 2 phases in the future. In phase 1,

we will use the tool with a representative sample of clinicians and patients, then administer a

questionnaire to measure their opinions about the value added or subtracted by its use. In

phase 2, we will randomly assign clinicians and patients to use the tool or not and measure

clinical end points. For example, in the prostate cancer trial, we would hypothesize that our

decision-support tool would reduce the number of prostatectomies that remove indolent

tumors without increasing the rate of metastases.

Case Studies

The general model was tailored to address clinical research questions within the 3

collaborations: childhood pneumonia, prostate cancer, and major mental disorders, listed in

order of their maturity (most to least) in Table 1. In the following sections, we discuss each in

more detail, illustrating the scientific problem, how the general model applies, and what clinical

outcomes are anticipated.

Childhood Pneumonia Case Study: Collaboration With Pneumonia Etiology Research for Child Health Study

The Pneumonia Etiology Research for Child Health (PERCH) study, initiated in 2009, is a

large, multinational, case-control study of severe pneumonia in hospitalized children aged <5

years. Seven PERCH sites from South Asia and sub-Saharan Africa have been selected for the

study because they represent areas where most of the severe pneumonia cases in children

occurred in 2015 and where key interventions are already in place.19 Kate O’Brien, PERCH

principal investigator (PI), was a funded stakeholder on this project. To improve on previous

laboratory techniques that have remained largely unchanged for more than a century, her team

is applying modern diagnostic tools and standardized methods in the hope of contributing to

new, precise information about the cause of each pneumonia case and, ultimately, to guide the

development of new vaccines and treatments.

Scientific background and study aims. Pneumonia is the leading cause of global

childhood deaths, accounting for almost 1 in 5 childhood deaths in 2010.20,21 Pneumonia is a

syndrome associated with infection of the lung tissue, which can be caused by microorganisms

of >30 different species, including bacteria, viruses, mycobacteria, and fungi, among which only

a few are likely to have infected each patient by the time of hospitalization.22 Knowing which

pathogen has caused a pneumonia case is crucial for choosing effective treatment. For

example, antibiotics are ineffective for treating viral infections. The strategy for directing

pneumonia treatment and prevention efforts is also complicated by various epidemiologic and

microbiologic factors.19

In the PERCH study, approximately 5000 cases and 5000 control participants were

enrolled and specimens from both groups were tested by a comprehensive array of laboratory

measurements with differing precisions.23 These specimens are collected from the lungs and

peripheral body fluids, including the blood, nasopharyngeal (NP) cavity, pleural fluids, and

induced sputum. Direct sampling from the lungs (lung aspirates) of patients serves as a “gold

standard” measurement of the pathogens in the lung; that is, the test has nearly perfect

specificity and sensitivity. Culturing bacteria from blood samples gives a “silver standard”

measurement assumed to be perfectly specific but imperfectly sensitive. Obtaining lung

aspirate samples is painful for the patient and uncommon in resource-limited settings, so only

some case patients in the PERCH study had these collected in response to clinical needs,

whereas all case patients had blood samples collected. Finally, polymerase chain reaction (PCR) evaluation of bacteria and viruses from NP samples is a “bronze standard” because it has imperfect sensitivity and specificity. NP samples were taken from all case and control

participants and tested by PCR.

Our research addressed 2 biomedical questions:

1. What is the frequency with which each pathogen on a prespecified list causes clinical pneumonia in the population of infected children?

2. What is the probability that a child with clinical pneumonia has ≥1 particular pathogens infecting the lung given the child’s specimen measurements and other characteristics like age and disease severity?

We developed original methods called partial latent class models (PLCMs) and nested

versions (nPLCMs) that can estimate the etiology of any disease from multiple types of

measurements, regress the etiology distribution on covariates, and produce a patient-specific

probability distribution for each potential cause.
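As a minimal sketch of how a model of this kind can be written, consider only the bronze-standard NP PCR measurements and assume that each case has a single causative pathogen and that measurements are conditionally independent given the cause. The symbols are notation assumed here for illustration: π_k is the etiologic fraction for pathogen k, θ_j the true-positive rate (sensitivity) of measurement j, ψ_j its false-positive rate (1 minus specificity), m the vector of binary results for J pathogens, and L_i the unobserved cause for case i. The full PLCM and nPLCM are more general than this sketch.

```latex
% Control participant i' (no pathogen infecting the lung): any positive is a false positive
P(M_{i'} = m \mid \text{control}) = \prod_{j=1}^{J} \psi_j^{m_j} (1 - \psi_j)^{1 - m_j}

% Case i: mixture over the unobserved cause k, weighted by the etiologic fractions
P(M_i = m \mid \text{case}) = \sum_{k=1}^{J} \pi_k \,
  \theta_k^{m_k} (1 - \theta_k)^{1 - m_k}
  \prod_{j \neq k} \psi_j^{m_j} (1 - \psi_j)^{1 - m_j}

% Patient-specific probability that pathogen k is the cause (Bayes' rule)
P(L_i = k \mid M_i = m) =
  \frac{\pi_k \, \theta_k^{m_k} (1 - \theta_k)^{1 - m_k}
        \prod_{j \neq k} \psi_j^{m_j} (1 - \psi_j)^{1 - m_j}}
       {P(M_i = m \mid \text{case})}
```

In this formulation the control data inform the false-positive rates ψ_j, which is what makes the etiologic fractions π_k estimable from imperfect case measurements.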

Prostate Cancer Case Study: Collaboration With Brady Institute of Urology

Through our collaboration with stakeholder Dr H. Ballentine Carter, the director of Adult

Urology at the Johns Hopkins School of Medicine and PI of the Active Surveillance Program

within the Department of Urology, we have access to longitudinal data on 1300 men who

elected to follow active surveillance upon receiving their initial prostate cancer diagnosis.24,25

Scientific background and aims. Prostate cancer is the most commonly diagnosed

nonskin cancer in men in the United States and has a lifetime risk of diagnosis of 15% and

lifetime risk of death of 2.7%.26 Upon diagnosis, early curative treatment with surgery,

radiation, or androgen deprivation therapy is common.27 In particular, nearly half of men with

biopsy-detected localized prostate cancer receive prostatectomy, whereas only 6.8% choose

surveillance.28 Curative interventions can be physically, emotionally, and financially taxing for

patients. In particular, 1-month mortality after surgery is as high as 0.5%, and at least 20% to

30% of men experience urinary incontinence and/or erectile dysfunction after surgery or

radiotherapy.29,30

There is also evidence that treatment is not always proportionate to risk; patterns of

both overtreatment of low-risk disease and undertreatment of high-risk disease have been

identified.28 To this point, the risks and benefits of treatment vary for patients depending on

the severity of their cancer. Specifically, men whose cancer would never become symptomatic

have no potential to benefit from treatment.

Despite the risks associated with overtreatment, patients and doctors may often choose

early treatment because of uncertainty in the initial diagnosis and, more specifically, the

inability of existing biopsy techniques to distinguish with certainty between cancers that will

remain indolent and those that are, or will become, life-threatening. Prostate biopsy specimens

are only informative about the biopsied tissue; features of nonbiopsied tissues, such as regional

lymph nodes, remain unobserved. As a result, doctors and patients must make treatment

decisions in the face of this uncertainty.

Active surveillance with curative intent offers an alternative to early treatment for

individuals in whom lower-risk disease is detected.25,31-35 Though active surveillance regimens vary,

the approach generally entails regular biopsies (eg, annually) with curative intervention

recommended on disease reclassification. Although a primary concern for active surveillance is

the potential for delaying life-saving treatment, a low risk of prostate cancer–specific mortality

has been observed in several active surveillance studies. Correctly identifying patients who are

at low risk, and who would therefore benefit from active surveillance, could reduce overtreatment

and with it the risk of complications, adverse effects of treatment, and financial

burden for patients.

In this context, men with a recent diagnosis of lower-risk prostate cancer want to know

whether they have a lethal cancer and, for each of their intervention options, what their

expected quality and length of life are likely to be. The treatment options are continued active

surveillance, radiation treatment, or prostatectomy. Men who choose active surveillance

also want to learn whether the frequency of (painful) biopsies could be safely reduced.

Methodologically, we developed and applied hierarchical models that incorporate

measurement error in cancer-state determinations on the basis of biopsied tissue, clinical

measurements possibly not missing at random, and informative partial observation of the true

state.

Major Mental Disorder Case Study

In this case study, access to data from the major clinical partner, the National Network

of Depression Centers (NNDC), was delayed nearly 2 years as this new organization became

established. Therefore, we developed our bayesian hierarchical models on the basis of directly

analogous schizophrenia data obtained from a Janssen Pharmaceutical clinical trial that we had

previously used in model development.36 Once JHM depression data became available, we

applied our methods to longitudinal measurements of depression, mania, and anxiety

symptoms. Dr Peter Zandi, a member of the NNDC data acquisition and analysis team, was a

funded stakeholder on this proposal.

Scientific background. Depression is a complex disease with heterogeneous etiology,

phenotypes, and treatments.37 Depression is also associated with significant psychiatric (eg,

anxiety, mania, panic) and general comorbidities, including cardiovascular disease, stroke, and

dementia. The heterogeneity of response to treatment makes the choices of first-line and later

therapies more difficult.38

Scientific aims. To support NNDC in their mission, a first key step was to build a

statistical understanding about sources of variation in the NNDC-selected, patient-reported

measure of depression, the PHQ-9, a general instrument for screening, diagnosing, monitoring,

and measuring severity of depression.39 To this end, we built a multivariate longitudinal data

model for the depression, anxiety, and mania data to estimate a patient’s level and trajectory

of symptoms and to predict them into the future, given the covariate profile.

Methodologically, we developed methods to jointly predict the trajectory of a patient’s

mental health status as measured by multiple outcomes and the effect of this trajectory on the

risk of key clinical events. Our methods accommodate substantial missing data and irregular

observation times.

OSLER inHealth

The overarching goal of OSLER inHealth (https://oslerinhealth.org/) is to provide an

R-based environment comprising software tools to support the visualization and analysis of

health data to better inform clinical decisions. For many health decisions, the intelligent

acquisition and use of data improves the chance of a successful outcome. The relevant

information is longitudinal and increasingly complex, now including digitalized images, DNA

sequences, novel biomarkers, and multivariate time series from wearable devices, in addition

to more traditional clinical indicators of phenotype. EHRs have made it possible to acquire and

manage health information more effectively. They also enable Boolean-style (ie, “if, then, else”)

decisions. For example, if a newly recorded laboratory value is above a particular level, an EHR

can automatically signal a clinician to inquire further, perhaps by scheduling a follow-up visit.

But in today’s information-rich environment, there is a heightened need to define,

measure, and track health status; integrate traditional with more complex health measures;

and develop and use appropriate tools for analysis. For an EHR to maximally benefit patients, it

must be a component in a system that integrates the relevant evidence to build, test, and

continuously refine mechanistic or empirical (statistical) models that evaluate and

communicate the evidence from the available data to the point of care where health decisions

are made.

OSLER inHealth, like Bioconductor (https://www.bioconductor.org/), must operate at

the interface of statistical and biomedical science. We intend for it to be used by professional

data scientists, by their quantitatively oriented biomedical colleagues, and by students from

both groups. However, unlike Bioconductor, the main consumers of the OSLER inHealth output

are nontechnical persons making health decisions. Hence, it must also support effective

communication of the questions, data, and findings to health experts and their clients/patients.

RESULTS

Aim 1

To develop a bayesian hierarchical model for longitudinal data that predicts the health

status, trajectory, and intervention effects for each member of a clinical population.

We have used a hierarchical statistical model with 2 levels—time within person and

persons within a population—to represent the key factors relevant to making medical

decisions.10 Details on the model formulation are provided by Ogburn and Zeger.40 Details of its

implementation are given in the papers specific to each of the case studies.41-46

The model for an individual over time is pictured in Figure 1. The health status trajectory

is represented by the temporal sequence of latent variables, here called η. For example, η can

represent the pathogenic species infecting a child’s lung or the cancer state of the prostate over

time. Specification of this part of our model has 3 major components.

Health Status

In many medical applications, the true health status cannot be directly or precisely

measured, but it can be inferred from measurements that we denote Y in Figure 1. For

example, the child’s lung cannot be directly sampled, so blood and the NP cavity are sampled

instead. The conditional distribution [Y|η,ϕ] of the observations, given the true health status,

is indexed by unknown parameters ϕ that represent errors in measurements.

Mechanistic Effects of Covariates and Interventions on Health Status

The effect of covariates (X, exogenous; R, endogenous) on health status η is represented by

the conditional distribution [η|X, R; β, δ]. The β parameters represent the intervention effects;

the unobserved δ parameters are latent indicators of possible classes of disease states or

trajectories or responses to treatment. The interaction effects of either observed (X) or latent

subgroup indicators (δ) with the treatment indicators (R) cause heterogeneity in treatment

effects across the population.

Treatment Decisions With Feedback

To estimate the efficacy or safety of a treatment, one must understand the intervention

assignment process [R|Y,X], especially its dependence on prior health measures.

The full model is completed by embedding the multiple measurements for an individual

(Figure 1) within a reference population (ie, to model the variation in the individual-specific

model terms across the population). This embedding is shown in Figure 2, where parameters

and latent variables for an individual can depend on covariates X in the distribution

[β, θ, ϕ, η|X] as discussed by Ogburn and Zeger.40
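The components named above can be collected in one place. The display below is a sketch that only restates the conditional distributions in the text with explicit indices; the person index i, the time index t, and the choice to condition treatment on all earlier observations are assumptions made here for concreteness and are not taken from Ogburn and Zeger.

```latex
\[
\begin{aligned}
\text{Measurement:} \quad
  & Y_{it} \mid \eta_{it}, \phi_i \;\sim\; [Y \mid \eta, \phi] \\
\text{Health status:} \quad
  & \eta_{it} \mid X_i, R_{it};\, \beta_i, \delta_i \;\sim\; [\eta \mid X, R;\, \beta, \delta] \\
\text{Treatment assignment:} \quad
  & R_{it} \mid Y_{i,1:(t-1)}, X_i \;\sim\; [R \mid Y, X] \\
\text{Population:} \quad
  & (\beta_i, \theta_i, \phi_i, \eta_i) \mid X_i \;\sim\; [\beta, \theta, \phi, \eta \mid X],
    \quad i = 1, \dots, n
\end{aligned}
\]
```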

Figure 1. Graphical Representation of Person-Time Levels of Bayesian Hierarchical Model for Individualizing Health Showing State Trajectory (η)a for a Single Individual (i)

aHealth status (η) is affected by exogenous (X) and endogenous (R) covariates through person-specific regression

coefficients β_i and expressed in observed outcomes Y_it via a model with parameters ϕ_i. Trajectories can usefully

be partitioned into subgroups represented by δ_i.

Figure 2. Graphical Representation of a Population (n = 4) Showing the Population Level in Which the Individual-Specific Parameters Are Assumed to Be Independent Realizations From a Distribution (β, θ, ϕ, η | X)a

aIn this model, the treatment effect or measurement or trajectory heterogeneity can vary across people in a

manner that depends on covariates X. This population-level variation also implies that the best estimates of the

health status for individuals rely not only on their data but also on data for otherwise similar individuals in their

reference population.
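The borrowing of information described in this footnote can be illustrated with a minimal sketch. It assumes the simplest possible case, a normal-normal model with known variances, which is far simpler than the models used in this report; the function name and all numeric values are hypothetical.

```python
def shrinkage_estimate(person_mean, n_obs, within_var, pop_mean, between_var):
    """Posterior mean and variance of one person's true level.

    Assumes a normal-normal hierarchical model with known within-person
    variance and known between-person variance (illustrative assumption).
    """
    if n_obs == 0:
        # With no personal data, the estimate is the population distribution.
        return pop_mean, between_var
    data_precision = n_obs / within_var   # information in the person's own data
    prior_precision = 1.0 / between_var   # information from the reference population
    weight = data_precision / (data_precision + prior_precision)
    post_mean = weight * person_mean + (1.0 - weight) * pop_mean
    post_var = 1.0 / (data_precision + prior_precision)
    return post_mean, post_var


# Hypothetical patient whose own measurements average 20 in a population
# centered at 10: the estimate moves toward 20 as measurements accumulate.
for n in (0, 1, 5, 50):
    mean, var = shrinkage_estimate(20.0, n, within_var=16.0,
                                   pop_mean=10.0, between_var=4.0)
    print(n, round(mean, 2), round(var, 3))
```

With few measurements the estimate stays close to the population mean; with many it approaches the person's own average, which is the qualitative behavior the footnote describes.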

Complex Latent States

For the 3 case studies, the health status variable η is relatively low dimensional: an

indicator for 1 of 30 lung pathogens; an indolent or aggressive prostate tumor; or levels of

symptoms for depression, anxiety, and mania. However, there are applications, for example, in

image analysis, where the dimension of η is large enough to require specialized approaches.

This project investigated an example in which we observed the presence or absence of

hundreds of proteins indicative of an autoantibody “signature” in a patient with autoimmune

disease that we expect might be predictive of the disease trajectory. Wu et al44 introduce the

problem and provide methods for preprocessing the image data generated by a gel

electrophoresis assay. In another publication, Wu et al45 reported on their development of a

class of restricted latent class models (RLCMs) that favors sparse distributions for

high-dimensional latent classes.

Figure 3, taken from Wu et al,45 shows our proposed solution for the autoantibody

problem. The original data, after preprocessing, are shown in the matrix on the far left of the

figure. Each person’s signature forms a row. The color is blue if the protein is present and

yellow if not. The proposed RLCM solution is shown in the 2 matrices on the right. On the far

right, the matrix has the same number of columns (ie, proteins; here, n = 50 proteins) as the

original data (left) but only 7 rows, 1 for each estimated “machine” defined as a combination of

proteins that could be targeted by a single immune autoantibody. The middle matrix indicates

whether a person’s immune system responded to each of the machines using the same color

convention. Here, however, the degree of blue indicates the posterior probability that the

individual’s immune system targeted that machine. This method is designed to find a relatively

small number of machines, each of which has a sparse (ie, low number) set of proteins. Details

about how this variable reduction method can be used in practice are provided by Wu et al.45
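The two-matrix structure on the right of Figure 3 can be sketched as follows. The sketch makes the simplifying assumption of noise-free data, in which a protein band is present whenever at least one of a person's active machines contains that protein. The matrices and the function name are hypothetical toy values; the RLCM of Wu et al also accounts for measurement error and estimates both matrices from data, which this sketch does not attempt.

```python
def expected_bands(lane_machine, machine_protein):
    """Boolean matrix product of two 0/1 matrices given as lists of lists.

    bands[i][l] is 1 if some machine m is active in lane i
    (lane_machine[i][m] == 1) and contains protein l
    (machine_protein[m][l] == 1); otherwise it is 0.
    """
    n_proteins = len(machine_protein[0])
    result = []
    for row in lane_machine:
        bands = [0] * n_proteins
        for m, active in enumerate(row):
            if active:
                for l, has_protein in enumerate(machine_protein[m]):
                    if has_protein:
                        bands[l] = 1
        result.append(bands)
    return result


# Toy example: 3 serum lanes, 2 machines, 5 protein landmarks.
machine_protein = [
    [1, 1, 0, 0, 0],  # machine 0 contains proteins 0 and 1
    [0, 0, 0, 1, 1],  # machine 1 contains proteins 3 and 4
]
lane_machine = [
    [1, 0],  # lane 0 has machine 0 only
    [0, 1],  # lane 1 has machine 1 only
    [1, 1],  # lane 2 has both machines
]
bands = expected_bands(lane_machine, machine_protein)
for row in bands:
    print(row)
```

Each row of the result is the union of the protein sets of that lane's active machines, so a few sparse machines can account for a much wider band matrix.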

Figure 3. Decomposition Results for Gel Electrophoresis Assay Data

Note: Left panel: Aligned data matrix for band presence or absence; rows for 76 serum lanes, reordered into

optimal estimated clusters separated by gray horizontal lines “—–”; columns for L = 50 protein landmarks. A blue

vertical line “|” indicates a protein’s presence at that molecular weight. Middle panel: lane-machine matrix for the

probability of a lane (ie, serum sample) having a particular machine. The blue cells correspond to a high probability

of having a machine in that column. Smaller probabilities are shown in lighter blue. Right panel: Estimated machine

profiles. Here, 7 estimated machines are shown, each with component proteins shown by a blue bar “|”.

From Wu et al. bioRxiv. Posted September 21, 2020. The copyright holder for this preprint is the author/funder,

who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0

International license. https://www.biorxiv.org/content/10.1101/400192v3

Limitations

Among the many limitations of the hierarchical models discussed, we propose to focus

on 2. First, these models are entirely parametric, meaning that the entire distribution of the

observations must be specified. We tend to choose distributions that simplify computations

without increasing prediction variance or bias. Extensions of these methods to include

nonparametric or more flexible parametric models are of future interest. The second limitation

is that the image analysis example was lower dimensional (76 × 50 in Figure 3) than many

interesting problems. Our computational approaches must be improved to apply our approach

to neuroimage or genomic data.

Aim 2

To iteratively build, test, and refine the model in 3 case studies.

Diagnosing Viral vs Bacterial Childhood Pneumonia

The primary goal for this case study was estimating the rates with which 30 different

pathogens (ie, viruses, bacteria, and fungi) cause children’s pneumonia at 7 sites from Africa

and Asia. These estimates of pneumonia etiology can guide investments by governments and

nongovernmental organizations in prevention and intervention strategies. In addition, the

methods allow clinicians to predict the cause of a particular child’s pneumonia by integrating

measurements from the child’s blood, sputum, and/or NP cavities with current estimates of

population rates.

This project has funded the development, implementation, and application of this

bayesian hierarchical modeling approach and multiple extensions using the PERCH study data

and has enabled PERCH to more precisely achieve its aims. After a brief review of the method’s

products, we focus on the translation of the model into practice within the infectious disease

community. Details are available in the methods paper by Deloria Knoll et al.47 The

methodologic details are provided in 2 papers by Wu et al42,43 and in a recent technical report

also by Wu et al.45

Figure 4 shows the application of our bayesian hierarchical model to the problem of

estimating the etiology of children’s pneumonia. The left panel shows the posterior (and prior)

distributions of the fraction of pneumonia cases caused by viruses. The panels in the center and

on the right show the posterior distributions for 2 children, both with multiple detected

pathogens in the NP cavity (shown by the sequence of 0 or 1 values in black) and with no

pathogens detected in the blood assay (all 0s in the red). The posterior distribution for the

center panel identifies respiratory syncytial virus (RSV) as the cause with high certainty. This is

because RSV is rarely detected in the NP cavities of control-group children. The rightmost

posterior distribution puts the majority of probability on human metapneumovirus (HMPV) A

and B but cannot rule out 3 other causes.
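The reasoning behind the center panel can be illustrated with a minimal Bayes-rule sketch. It assumes a single cause per child, NP PCR results that are conditionally independent given the cause, and known true-positive and false-positive rates. The pathogen list, the rates, and the prior are hypothetical and are not PERCH estimates; the models in this report estimate these quantities jointly from case and control data.

```python
def etiology_posterior(prior, tpr, fpr, np_results):
    """Posterior probability that each pathogen is the cause.

    prior:      dict pathogen -> prior etiologic fraction (sums to 1)
    tpr:        dict pathogen -> P(NP positive | pathogen is the cause)
    fpr:        dict pathogen -> P(NP positive | pathogen is not the cause),
                the quantity informed by control-group detection rates
    np_results: dict pathogen -> 0 or 1 (observed NP PCR result)
    """
    pathogens = list(prior)
    unnormalized = {}
    for cause in pathogens:
        likelihood = 1.0
        for j in pathogens:
            p_pos = tpr[j] if j == cause else fpr[j]
            likelihood *= p_pos if np_results[j] == 1 else 1.0 - p_pos
        unnormalized[cause] = prior[cause] * likelihood
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical child with three pathogens detected in the NP cavity.
prior = {"RSV": 1 / 3, "Rhino": 1 / 3, "Spn": 1 / 3}
tpr = {"RSV": 0.8, "Rhino": 0.8, "Spn": 0.8}
fpr = {"RSV": 0.02, "Rhino": 0.25, "Spn": 0.50}  # RSV is rare in controls
np_results = {"RSV": 1, "Rhino": 1, "Spn": 1}

posterior = etiology_posterior(prior, tpr, fpr, np_results)
for pathogen, prob in posterior.items():
    print(pathogen, round(prob, 3))
```

Because the assumed false-positive rate for RSV is low, a positive RSV result is hard to explain unless RSV is the cause, so RSV receives most of the posterior probability even though all three pathogens were detected.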

To introduce the bayesian hierarchical approach to the clinical infectious disease

community, Deloria Knoll et al47 first synthesized the prior methods for estimating the etiology

of children’s pneumonia (shown in Figure 5). Only the first and last methods produce estimates

of etiology distributions. The first is not reproducible. The last ignores measurement error and

cannot easily cope with multiple measurements of each pathogen (eg, from multiple sampling

locations).

Deloria Knoll et al47 then introduce the bayesian hierarchical approach developed in this

program with the title PERCH Integrated Analysis (PIA) Method (Figure 6).

Figure 4. Graphical Displays of the Estimated Population and Individual Etiologies Using Data From a Site of the PERCH Study

Abbreviations: PCR, polymerase chain reaction; PERCH, Pneumonia Etiology Research for Child Health. Note: The left panel illustrates the prior (dashed line) and posterior (solid line) distributions of the viral etiologies that can be easily derived in our framework. The middle and right panels show the individual prediction histograms for the 2 measurement patterns in the case data.

Figure 5. Alternative Analytic Approaches Used for Determining Pneumonia Etiology

Abbreviations: BCX+, positive blood culture; Cor, coronavirus; Hinf, Haemophilus influenzae; HMPV, human metapneumovirus A/B; ID, identifier; NP/OP, nasopharyngeal/oropharyngeal; PCR, polymerase chain reaction; Rhino, rhinovirus; RSV, respiratory syncytial virus; S. aur, Staphylococcus aureus; Spn, Streptococcus pneumoniae. From Deloria Knoll et al. Clin Infect Dis 64(Suppl 3),S213-S227. Reprinted with permission from Clinical Infectious Diseases (Copyright ©2017). Oxford University Press. All Rights Reserved.

Figure 6. Estimating the Etiologic Fraction From a Study With 2 Types of Measurements, 1 With Control Data, and Accounting for Imperfect Sensitivity of Both Measurements: The PIA Methoda

Abbreviations: BCx, blood culture; NP, nasopharyngeal; OP, oropharyngeal; PCR, polymerase chain reaction; PERCH, Pneumonia Etiology Research for Child Health; PIA, PERCH integrated analysis. aThe PIA method can combine multiple specimens, such as whole-blood PCR and lung aspirate culture PCR, and adjust each measurement for pathogen-specific sensitivity to estimate the etiologic fraction. From Deloria Knoll et al. Clin Infect Dis 64(Suppl 3),S213-S227. Reprinted with permission from Clinical Infectious Diseases (Copyright ©2017). Oxford University Press. All Rights Reserved.

Having introduced the method in epidemiologic terms and having conducted simulation

studies to compare its performance to the attributable fraction method, the PERCH Study team

used it as the main method in their PERCH results paper.48 The PERCH investigators showed

that most hospital admissions for childhood pneumonia were caused by a small set of

pathogens. An example of how the results are communicated is shown in Figure 7 (and the

PERCH team’s Figure 7 in the supplemental materials accompanying their article48), which

shows the estimated etiologic fraction for each pathogen with 95% posterior probability

interval, stratified by whether the cases were chest X-ray positive. The lower box in Figure 7

shows the estimated fractions of viruses and bacteria.
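Summaries of this kind can be computed directly from posterior draws. The sketch below assumes a hypothetical set of draws of the etiologic-fraction vector, each summing to 1, and reports a posterior mean and an equal-tailed 95% interval for a group of pathogens. The pathogen names, the grouping, and the simulated draws are illustrative only and do not reproduce PERCH results.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted list, with 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize(values):
    """Posterior mean and equal-tailed 95% interval from a list of draws."""
    vals = sorted(values)
    return {
        "mean": sum(vals) / len(vals),
        "lower": percentile(vals, 0.025),
        "upper": percentile(vals, 0.975),
    }


def group_fraction(draw, members):
    """Etiologic fraction of a pathogen group within one posterior draw."""
    return sum(draw[p] for p in members)


def simulated_draws(n, seed=0):
    """Stand-in for sampler output: normalized gamma variates (Dirichlet draws)."""
    rng = random.Random(seed)
    shapes = {"RSV": 30, "HMPV": 10, "Spn": 8, "Hinf": 6}  # hypothetical
    draws = []
    for _ in range(n):
        g = {p: rng.gammavariate(a, 1.0) for p, a in shapes.items()}
        total = sum(g.values())
        draws.append({p: v / total for p, v in g.items()})
    return draws


draws = simulated_draws(2000)
viral = summarize([group_fraction(d, ["RSV", "HMPV"]) for d in draws])
bacterial = summarize([group_fraction(d, ["Spn", "Hinf"]) for d in draws])
print("viral", viral)
print("bacterial", bacterial)
```

Computing the group fraction within each draw, and only then summarizing, keeps the interval for the viral or bacterial total consistent with the joint posterior of the individual pathogens.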

Figure 7. All Site Etiology Results Among HIV-Uninfected Patients by CXR Findings: CXR+ Patients vs All Patients

Abbreviations: Adeno, adenovirus; B. pert., Bordetella pertussis; C. pneu, Chlamydia pneumoniae; Cand, Candida; CMV, cytomegalovirus; Cor, coronavirus; CXR+, chest radiograph positive; Entrb, Enterobacter; HCoV, human coronavirus; Hinf, Haemophilus influenzae; HMPV, human metapneumovirus A/B; NP/OP, nasopharyngeal/oropharyngeal; Legio, Legionella; M. pneu, Mycoplasma pneumoniae; Mtb, Mycobacterium tuberculosis; NFGNR, nonfermenting gram-negative rods; Nmen, Neisseria meningitidis; NoS, not otherwise specified; PERCH, Pneumonia Etiology Research for Child Health; P. jirov, Pneumocystis jirovecii; PCR, polymerase chain reaction; Rhino, rhinovirus; RSV, respiratory syncytial virus; Salm, Salmonella; S. aur, Staphylococcus aureus; S. pneu, Streptococcus pneumoniae. From PERCH Study Group. Lancet. 2019;394, 757-779. Reprinted with permission from Lancet (©2019). All Rights Reserved.

Evaluating Efficacy and Safety of Active Surveillance as an Alternative to Surgery for Prostate Cancer

This case study was developed to support the critical clinical decision, made by a

patient and his clinician, to join, continue, or leave active surveillance in favor of prostatectomy

or radiation treatment. The team developed specific models, special cases of the general model

shown in Figures 1 and 2, to produce the probabilities that (1) a man’s prostate cancer is

aggressive (Gleason score >6) rather than indolent; and (2) a specimen collected via biopsy

conducted on that day would show evidence of an aggressive tumor. This section focuses on

the first question; results relevant to the second can be found in the report by Mamawala et

al.49

Model. The results presented here are summaries, with considerable direct citation in

quotation marks, from 1 of the 2 main articles that present the results in detail (Coley et al).41

The general models summarized in Figures 1 and 2 were tailored to prostate cancer

active surveillance. In this case study, we assume there is a true underlying binary state of the

tumor. Although the model allows that state to change (Figure 8, panel a), there is little

evidence in the data to learn about whether the state has changed. Therefore, as a first

approximation, we assume that the true tumor type is fixed (Figure 8, panel b). Given the latent

variable, there are 3 sources of direct evidence about the underlying unknown state that we

seek to predict: (1) the time series of prostate-specific antigen (PSA) scores that we represent

using a random-effects model with random intercept and slope, and a covariance matrix that is

the same regardless of true state; (2) the indicator of whether a biopsy was performed at a

visit; and (3) the pathology results of the biopsy specimen. In addition, we allow the decision to

conduct a prostatectomy to also depend on the underlying state as a sensitivity analysis of

whether the predictions change substantially over a range of assumptions about the degree of

representativeness of the patients undergoing prostatectomy for the entire cohort.41
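How biopsy results update the probability of an aggressive tumor can be sketched with Bayes' rule for a fixed binary state. This is a deliberately reduced illustration: it uses only the pathology results, assumes that biopsies are conditionally independent given the true state, and treats the prior prevalence, sensitivity, and specificity as known. All names and numeric values are hypothetical, and the PSA, biopsy-timing, and surgery components of the model of Coley et al are omitted.

```python
def posterior_aggressive(prior, sensitivity, specificity, biopsy_results):
    """P(true state is aggressive | sequence of biopsy results).

    prior:          assumed prevalence of aggressive tumors
    sensitivity:    P(biopsy shows aggressive | truly aggressive)
    specificity:    P(biopsy shows indolent | truly indolent)
    biopsy_results: iterable of 0/1, where 1 means the specimen showed
                    evidence of an aggressive tumor
    """
    like_aggressive = 1.0
    like_indolent = 1.0
    for positive in biopsy_results:
        if positive:
            like_aggressive *= sensitivity
            like_indolent *= 1.0 - specificity
        else:
            like_aggressive *= 1.0 - sensitivity
            like_indolent *= specificity
    numerator = prior * like_aggressive
    return numerator / (numerator + (1.0 - prior) * like_indolent)


# Hypothetical patient: each additional negative biopsy lowers the
# probability of an aggressive tumor, but never to zero, because a biopsy
# samples only part of the prostate.
for n_negative in range(4):
    p = posterior_aggressive(0.2, 0.6, 0.95, [0] * n_negative)
    print(n_negative, round(p, 3))
```

The same calculation shows why a single positive biopsy carries more weight than a single negative one when specificity is high and sensitivity is moderate.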

Figure 8. Directed Acyclic Graphs Describing the Relationships Between Latent Class and Clinical Outcomesa

Abbreviation: PSA, prostate-specific antigen. aLatent class is indicated by blue circle or oval; clinical outcomes are indicated by a green rectangle. From Coley et al. Biometrics 73(2), 625-634. Reprinted with permission from Biometrics (Copyright ©2017). International Biometric Society. All Rights Reserved.

Model parameters and their priors are presented in Table 2. Posterior sampling was

performed with rjags (Plummer et al).50 Analysis code, sampler settings, and diagnostic plots

are available from the supplementary material for Coley et al.41

Table 2. Model Summary With Priors Used for Application to Johns Hopkins Active Surveillance Dataa

Abbreviations: AS, active surveillance; PSA, prostate-specific antigen. aD_X is the length of vector X, and I_{D_X} is the identity matrix with dimension D_X × D_X. D_Z, D_U, D_V, and D_W and the associated identity matrices are similarly defined for covariate vectors Z, U, V, and W. From Coley et al. Biometrics 73(2), 625-634. Reprinted with permission from Biometrics (Copyright ©2017). International Biometric Society. All Rights Reserved.

The Johns Hopkins Active Surveillance Cohort comprises a “total of 874 patients who

met study criteria and had at least two PSA measurements and at least one post-diagnosis

biopsy as of October 1, 2014... Patient outcomes are given in Figure 9. Grade reclassification

was observed in 160 patients (18% of the analysis cohort). Notably, over a quarter of patients

with grade reclassification who underwent prostatectomy were downgraded after surgery

(17/65) while nearly a third of patients who underwent prostatectomy in the absence of grade

reclassification were upgraded (30/96).”41

Figure 9. CONSORT Diagram for Johns Hopkins Active Surveillance Prospective Cohort Patients Included in This Analysisa

Abbreviation: GS, Gleason score. aPostsurgery full prostate GS observations are given in circles. Six patients who underwent prostatectomy did not

have true GS observations available.

From Coley et al. Biometrics 73(2), 625-634. Reprinted with permission from Biometrics (Copyright ©2017).

International Biometric Society. All Rights Reserved.

Prediction assessment. “Predictive accuracy was assessed using the whole prostate

surgical specimen as the gold standard. To avoid bias in estimating prediction error, we never

used the same data to build the model that is used to evaluate it. So called ‘out-of-sample’

predictions of η were obtained for each patient by removing his data from the analysis and

re-running the posterior sampler. Out-of-sample predictions of η_i were then compared to known

values with receiver operating characteristic (ROC) curves and calibration plots. For the former,

the area under the curve (AUC) and associated 95% bootstrapped intervals were calculated. For

the latter, a plot comparing posterior predictions to observed rates of class membership was

constructed by performing logistic regression of the observed true state on a natural spline

representation of out-of-sample posterior predictions (degrees of freedom = 2).”41
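The ROC part of this assessment can be sketched as follows. The code computes the AUC as the probability that a randomly chosen patient with η = 1 has a higher out-of-sample prediction than a randomly chosen patient with η = 0, counting ties as one half, and forms a percentile bootstrap interval by resampling patients. The predictions and outcomes are hypothetical, and the resampling scheme is an assumption; the quoted passage does not say how the bootstrap was carried out.

```python
import random


def auc(preds, truth):
    """Area under the ROC curve via pairwise comparison of cases and noncases."""
    pos = [p for p, y in zip(preds, truth) if y == 1]
    neg = [p for p, y in zip(preds, truth) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(preds, truth, n_boot=1000, seed=0):
    """Approximate 95% percentile bootstrap interval for the AUC."""
    rng = random.Random(seed)
    n = len(preds)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_truth = [truth[i] for i in idx]
        # Skip resamples that contain only one outcome class.
        if 0 < sum(resampled_truth) < n:
            stats.append(auc([preds[i] for i in idx], resampled_truth))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Hypothetical out-of-sample predictions and postsurgery true states.
preds = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.60, 0.70, 0.85]
truth = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

point = auc(preds, truth)
lower, upper = bootstrap_auc_interval(preds, truth)
print(round(point, 3), round(lower, 3), round(upper, 3))
```

In the analysis described above, each prediction would come from a model fit that excluded the patient being predicted; the sketch takes such predictions as given.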

Prostate cancer findings. “We estimated that the prevalence of aggressive tumors in

the cohort was 0.20 (95% posterior interval: 0.14, 0.28) when we allow the decision to have

surgery to be informative, and 0.30 (0.23, 0.38) without informative sampling. Ninety-five

percent of AS [active surveillance] patients who neither reclassified nor underwent surgery

have posterior predictions that are lower than 50%; a majority have predictions below 20%.”41

The posterior predictions with vs without informative biopsy timing or decision to have a

prostatectomy are similar for most patients.

“Posterior predictions of η from our full model gave out-of-sample AUC estimates

among patients with observed true cancer state of 0.75 (95% bootstrapped interval: 0.67,

0.83).”41 See the graphs in Figure 10.

Figure 10. Predictive Accuracy of Out-of-Sample Predictions of η Among Patients With η Observed

Abbreviations: AUC, area under the curve; IOP, informative observation process; ROC, receiver operating

characteristic; η, true health status.

(a) The specificity of predictions from each model is highlighted at the sensitivity of a binary classifier defined by

final biopsy result (*).

(b) The dark line shows the empirical rate of observing a true Gleason score of ≥7 (y-axis) given an out-of-sample

posterior probability of true state (x-axis) under our model with informative biopsy and surgery components;

shading gives the 95% point-wise CI. Perfect agreement lies on the dotted line x = y. Hashmarks at y = 0 and y

= 1 correspond to observed cancer states (η = 0, 1, respectively) for patients with postsurgery true-state

observations. Hashmarks are located along the x-axis at each patient’s out-of-sample posterior probability of the

true state.

“Posterior predictions of η from the IOP model also appear to accurately estimate a

patient’s risk having more aggressive cancer. The calibration plot in Figure 6b [Figure 10, panel

b here] shows that, for patients with known values of η, the average observed value of η is

close to the average posterior predicted probability of η=1, indicating that the model

reasonably reproduces the mean of observations. The risks of clinical outcomes (biopsy results)

and choices (occurrence of biopsy and surgery) for all patients appear to be accurately

estimated by the IOP model as well, as demonstrated by calibration plots in the online

supplement.”41

The results of this model have been visualized for clinician and patient use and

successfully implemented within the JHM EHR. Figure 11 shows a screen shot that summarizes

1 man’s risk of having an aggressive tumor and the implications of those risks. Clinical studies

are underway to evaluate the value of such patient information.

Figure 11. Screen Shot From JHM EHR Showing the Model-Estimated Risk That a Patient’s Prostate Tumor Falls Into Each of the Gleason Classes Indicated by the Colors in the Pie Chart on the Lefta

Abbreviations: EHR, electronic health record; JHM, Johns Hopkins Medicine; PSA, prostate-specific antigen. aBy clicking on a wedge of the pie chart, additional information appears on the right, indicating published

information for longer-term outcomes, given their true pathology.

Diagnosis and Evaluation of Therapies for Major Mental Illness

The results presented here are summaries, with considerable direct citation in quotation

marks, from Fojo et al.46

For this case study, we intended to use NNDC data to develop bayesian hierarchical

models to predict the trajectory of each patient before treatment to be compared with the

outcomes under treatment. The NNDC project was chosen to feature its time-varying

multivariate symptoms data and possibly its neuroimaging data. However, NNDC data

availability was substantially delayed and so the research team developed the methods for

similar symptoms of schizophrenia, including scores for 3 distinct scales: positive, negative, and

general symptoms measured over time for approximately 1000 patients. The details about the

data set and the bayesian hierarchical models are presented by Fojo et al.46

Figure 12 displays the model’s prediction of schizophrenia symptoms for a single

patient. “The three panels illustrate the predictions (red circles) for the individual’s future

General, Negative, and Positive PANSS [Positive and Negative Syndrome Scale] subscale scores

as well as predicted cumulative probability of treatment failure (red bars at right) calculated

from prior measurements of the subscales at (a) week 0 when treatment is begun, (b) 0 and 1

weeks, and (c) 0, 1, 2, and 4 weeks. Green squares indicate the observed measurements. The

individual’s trajectory is displayed against other participants in the trial (background blue lines).

The dark gray ribbons indicate the 50% confidence intervals, the lighter gray ribbons indicate

the 95% intervals. The vertical dashed lines indicate the time up to which observations are used

to inform the predictions. A higher PANSS score indicates more severe symptoms. The General

subscale score ranges from 16 to 112, and the Positive and Negative subscale scores range from

7 to 49.”46

The same model has more recently been applied to the NNDC data from Baltimore for

the 3 scales of depression, anxiety, and mania. That work is still in progress and thus is excluded

from further description here.

Figure 12. Example Predictions of PANSS Symptom Scores

Abbreviation: PANSS, Positive and Negative Syndrome Scale.

Note: The 3 panels illustrate the predictions (red circles) for 1 individual’s future General, Negative, and Positive

PANSS subscale scores as well as predicted cumulative probability of treatment failure (red bars at right) based on

measurements of the subscales at (a) week 0 (initiation of treatment), (b) 0 and 1 weeks, and (c) 0, 1, 2, and 4

weeks. Green squares indicate the observed measurements. The individual’s trajectory is displayed against other


participants in the trial (background blue lines). The dark gray ribbons indicate the 50% prediction intervals, and

the lighter gray ribbons indicate the 95% prediction intervals. The vertical dashed lines indicate the time up to

which observations are used to inform the predictions. A higher PANSS score indicates more severe symptoms. The

General subscale score ranges from 16 to 112, and the Positive and Negative subscale scores range from 7 to 49.

From Fojo et al. J Psychiatr Res. 95, 147-155. Reprinted with permission from the Journal of Psychiatric Research (Copyright ©2017). Elsevier. All Rights Reserved.

Aim 3

To implement the statistical methods in an open-source, easily extensible R package:

OSLER inHealth.

The longer-term goals of OSLER inHealth are as follows:

• Disseminate software updates quickly

• Educate a diverse community of scientists, using detailed tutorials

• Ensure quality via automatic and manual quality controls

• Promote the reproducibility of personalized health care data analysis

OSLER inHealth (https://oslerinhealth.org/) is a repository of current and future R

packages that are relevant for statisticians involved in precision medicine and health care. We

have built a repository infrastructure and begun the process of making centrally available high-

quality packages. The structure includes feedback for developers. OSLER attempts to fill the gap

in repositories whereby a package may not be listed on the Comprehensive R Archive Network

(CRAN) because it may contain crucial data that make it too large to store. For example, the

rnhanesdata package contains the National Health and Nutrition Examination Survey (NHANES)

wearable activity data, organized for the user. The size of this package prohibits acceptance

into CRAN, but OSLER can host it, so users can analyze NHANES data easily

and quickly.

OSLER has an estimated 124 monthly users and currently hosts 6 packages. OSLER

developers helped the authors of each package improve it and bring it through

a set of quality checks.


We plan to publicize OSLER more in the future. In tandem with OSLER, we have been

developing Neuroconductor (https://neuroconductor.org/) with similar goals for medical image

analysis. Though much effort has focused on Neuroconductor, all improvements to

Neuroconductor have been used to improve OSLER. Thus, we have learned lessons with a larger

set of packages and developers and have been able to improve OSLER, even though OSLER, being

more recently completed, hosts far fewer packages. For example, we have

developed custom build scripts

(https://github.com/muschellij2/neuroc_travis/blob/master/oslerinhealth_travis.yml)

specifically designed for OSLER. These customizations allow packages to be checked with

additional software to ensure that the packages run properly for users.

Next steps. Now that we have a stable repository for packages, we plan to publicize

this to researchers, encourage more developers to submit packages, and improve current

packages to provide more tutorials for researchers entering these areas.


DISCUSSION

Bayesian Hierarchical Model

With this project, we have developed and demonstrated the utility of bayesian

hierarchical statistical models to better characterize and communicate an individual’s health

status, trajectory, and, to a more limited degree, the likely benefits of interventions. Our model

represents key elements of the process that gives rise to clinical or population health data,

including a dynamic latent health status process that can comprise discrete and/or continuous

variables; heterogeneous (across individuals) effects of treatments and covariates on that

process; nonignorable observation bias and complex outcome variables that reflect the

underlying process; and a treatment assignment process that can depend on past outcomes.

The model has 2 levels that allow the treatment effects, latent health status, and measurement

process parameters to vary among individuals.
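One way to write down the 2-level structure described above is sketched below. The notation is ours and purely illustrative; the symbols and the generic densities f, h, g, and π are assumptions made for exposition, not the specification used in any of the case studies.

```latex
% Illustrative notation only (assumed, not taken from the report):
%   Y_i       observed outcomes for individual i
%   \eta_i    latent health state (discrete and/or continuous)
%   \theta_i  individual-level parameters (treatment effects, measurement process)
%   \phi      population-level parameters
\begin{align*}
Y_i \mid \eta_i, \theta_i &\sim f(y \mid \eta_i, \theta_i)
  && \text{(measurement model)} \\
\eta_i \mid \theta_i &\sim h(\eta \mid \theta_i)
  && \text{(latent health state)} \\
\theta_i \mid \phi &\sim g(\theta \mid \phi), \quad \phi \sim \pi(\phi)
  && \text{(variation among individuals)} \\
p(\eta, \theta, \phi \mid Y) &\propto \pi(\phi) \prod_{i=1}^{n}
  g(\theta_i \mid \phi)\, h(\eta_i \mid \theta_i)\, f(Y_i \mid \eta_i, \theta_i)
\end{align*}
```

Under this reading, predictions for one patient come from the posterior of that patient's latent state and parameters, which borrows strength from the other individuals through the shared population-level parameters.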

Case Studies

We have tailored the bayesian hierarchical model to 3 specific case studies representing

diverse types of questions and data. The children’s pneumonia case is a case-control study at a

single time across 6 countries in Africa and Asia. Its latent health status is a discrete multinomial

variable that can take 1 of 30 different values, indicating which pathogen caused a child’s

infection. The outcome data are complex, comprising presence or absence indicators of each

pathogen in 3 distinct samples from the NP cavity, blood, and, rarely, from the lung. The

prostate cancer and the mental disorder case studies involve multivariate longitudinal outcome

data. In the former, the state variable is a static binary indicator of whether the tumor is

aggressive or indolent. In the mental disorder case, the latent process is multivariate and takes

continuous values. To examine the flexibility of the modeling approach for image data, we

added a project using gel electrophoresis assays to identify patient autoantibody signatures.

Here, the state space is discrete with 2^100 ≈ 1.3 × 10^30 possible outcomes.
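The size of the state space quoted above (2 to the power 100) can be checked with a few lines of arithmetic. The sketch assumes that the exponent counts 100 binary (present or absent) indicators; the enumeration speed is a hypothetical figure used only to convey scale.

```python
# Assumption: the exponent 100 counts binary (present/absent) indicators.
n_indicators = 100
n_states = 2 ** n_indicators

print(n_states)            # exact integer count of possible patterns
print(f"{n_states:.1e}")   # → 1.3e+30

# Hypothetical enumeration speed, used only to give a sense of scale.
states_per_second = 1e9
seconds_per_year = 365.25 * 24 * 3600
years = n_states / states_per_second / seconds_per_year
print(f"{years:.1e} years to enumerate every state")
```

A space of this size cannot be enumerated directly, which is one reason sampling-based computation is attractive for such models.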

The bayesian hierarchical model is adaptable to these different problems because of a

few key features. First, it is a likelihood-based approach so that, in larger samples, the


likelihood dominates the prior distribution for many key parameters such as regression

coefficients. Second, the use of priors allows us to address important substantive questions by

restricting the possible solutions through the choice of priors so that poorly identified models

can be used. For example, absent scientific constraints, the pneumonia etiology model cannot

estimate the frequency of lung infections caused by each pathogen, because the likelihood

itself does not separate these parameters from the sensitivities of the measurements. But prior

laboratory and clinical trials data provide a reasonable range for the assay sensitivities that,

once imposed through the prior assumptions, make the model identifiable. Note, in this case,

that the confidence intervals account for the prior uncertainty, which does not decrease with

sample size. See Wu et al42 for the statistical details and Deloria Knoll et al47 for references to

prior selections. Third, Markov chain Monte Carlo (MCMC) is used to estimate the posterior

distributions of interest so that missing data and complex outcome measurement are not a

computational burden. The bayesian approach also makes model checking relatively

straightforward by comparing observed characteristics of the joint distribution of the observed

data with predictions of those same quantities based on the model. See, for example, Wu et

al42 and Fojo et al46 for practical examples. Finally, posterior distributions are easily understood

by clinicians and patients with a small amount of training. They are easily visualized to

communicate predictions of health status, trajectories, or likely intervention benefits.
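The identifiability point made above can be illustrated with a toy calculation. This is not the pneumonia etiology model of Wu et al; it is a minimal sketch that assumes a single test with perfect specificity, so that the chance of a positive result is prevalence times sensitivity, with a uniform prior on prevalence and a uniform prior on sensitivity over a stated range. The counts, ranges, grid sizes, and function name are all hypothetical.

```python
import math


def posterior_interval_for_prevalence(k, n, sens_lo, sens_hi, level=0.95,
                                      n_prev=400, n_sens=200):
    """Grid-approximate central posterior interval for prevalence.

    Toy model: k positives out of n, P(positive) = prevalence * sensitivity,
    prevalence ~ Uniform(0, 1), sensitivity ~ Uniform(sens_lo, sens_hi).
    Setting sens_lo == sens_hi treats the sensitivity as known.
    """
    prev_grid = [(i + 0.5) / n_prev for i in range(n_prev)]
    if sens_hi == sens_lo:
        sens_grid = [sens_lo]
    else:
        step = (sens_hi - sens_lo) / n_sens
        sens_grid = [sens_lo + (j + 0.5) * step for j in range(n_sens)]

    # Binomial log-likelihood on the grid (priors are flat, so they drop out).
    log_post = []
    for prev in prev_grid:
        row = []
        for sens in sens_grid:
            p = prev * sens
            row.append(k * math.log(p) + (n - k) * math.log(1.0 - p))
        log_post.append(row)

    # Subtract the peak before exponentiating to avoid underflow,
    # then sum over sensitivity to get the marginal for prevalence.
    peak = max(max(row) for row in log_post)
    marginal = [sum(math.exp(v - peak) for v in row) for row in log_post]
    total = sum(marginal)

    tail = (1.0 - level) / 2.0
    cum, lower, upper = 0.0, None, None
    for prev, mass in zip(prev_grid, marginal):
        cum += mass / total
        if lower is None and cum >= tail:
            lower = prev
        if upper is None and cum >= 1.0 - tail:
            upper = prev
    return lower, upper


# Same observed positive rate (30%) at two hypothetical sample sizes.
for n in (100, 10_000):
    k = int(0.3 * n)
    lo_u, hi_u = posterior_interval_for_prevalence(k, n, 0.5, 0.9)
    lo_k, hi_k = posterior_interval_for_prevalence(k, n, 0.7, 0.7)
    print(f"n={n}: sensitivity in [0.5, 0.9] -> width {hi_u - lo_u:.3f}; "
          f"sensitivity known (0.7) -> width {hi_k - lo_k:.3f}")
```

When the sensitivity is treated as known, the interval for prevalence keeps narrowing as the sample grows. When the sensitivity is only known to lie in a range, the interval stops narrowing once the data have pinned down the product of the 2 quantities, because the remaining width reflects the prior uncertainty about sensitivity.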

Software

Despite our original plan to write stand-alone, new software, initial experimentation

revealed that dramatic improvements in available R-based software for MCMC (eg, rjags, Stan,

MCMCglmm) made writing new software a poor investment of resources. Hence, all our

software is written as R packages that were made publicly available as soon as completed

through GitHub, as cited in each article. We have also built an R software repository called

OSLER inHealth in which our software and similar software developed by others can be checked

for software standards and kept current as R and supporting software changes, and can be

more widely disseminated because of easy access to the programs and helpful documentation.


OSLER inHealth remains a work in progress. The repository is in place and has a few R

packages. We are now in a position to inform colleagues about its availability and to receive

other contributions. Over the longer term, however, funding for a staff person to support

contributors is needed, as was the case for its progenitors Neuroconductor and Bioconductor.

Fortunately, JHM has built a new Precision Medicine Analytics Platform (PMAP) that it internally

funds, and OSLER will become the R repository for internal and external software packages

within PMAP. This may provide the core support for an OSLER manager.

Stakeholders

This project was successful in supporting collaborations among clinical and statistical

experts to create the case study methods, software, and applications. Each of the case studies

has produced articles that both advance quantitative methods for clinical research and provide

substantive findings as reflected in the bibliography of published or submitted manuscripts.

Patient stakeholders have also played a critical role in the design and evaluation of the prostate

cancer application. Early versions of the model were critiqued by our patient advisory board

and significant changes were made as a result. Similarly, when the tool was functioning, the

board assisted us in designing the patient- and clinician-facing visualizations. Our early

questions were about whether patients wanted to see predictions directly or wanted their

physicians to be intermediaries. They were clear that they wanted full access to the model

results and to thorough documentation supporting these predictions. They also wanted to

make critical decisions in partnership with their physician. A JHM patient with depression and a

patient with prostate cancer also served on the OSLER inHealth oversight committee that met

once a year.

Influence on Population Health and Patient Care

Each case study has generated patient-oriented results that are used by their target

clinical or public health audiences. For example, the prostate cancer software has been

implemented within the JHM EHR as described previously. Clinicians can now access and

consider the model’s risk predictions when they are assisting their patients to decide whether

to continue active surveillance. Future research should determine whether clinicians and patients

benefit from using the tool, most critically whether fewer indolent tumors

are resected or irradiated. The main obstacle to more rapid dissemination of this tool is the

financial model for prostate cancer care. Clinicians in many American settings are rewarded for

providing treatment; there is a smaller financial reward for choosing not to treat. One

implication is that there is no health system funding to scale the prostate tool for regional or

national use. The software has been designed to scale. JHM has built a cloud-based system

whereby the tool could be used by clinicians in their private offices. But the start-up and

curation costs combined with the intervention incentives remain obstacles to the tool

becoming widely available.

Future Directions

There are several important next steps to expand the influence of this bayesian

hierarchical model approach for the benefit of patients, some of which are already underway.

First, it is important to scale a tool that addresses a particular unmet need across a larger, more

diverse population of patients and clinicians so that its utility can be scientifically measured,

curation methods can be established to keep the tool current, and a financial model can be

established to support its continuous use. We are currently pursuing this vertical scaling of both

the prostate cancer and children’s pneumonia applications. We think it is equally important to

horizontally scale the approach to address a wider set of unmet clinical needs. To this end, we

have projects underway in autoimmune diseases, sudden cardiac arrest, and diabetes. Our

longer-term goals are to (1) embed a collection of tools to acquire and use the most relevant

information, agnostic to its level of measurement, to improve population and individual health

decisions that lead to better outcomes at more affordable costs; and (2) scale the tool-creation

process so that data scientists around the world, in partnership with population- or patient-health

managers (ie, clinicians), share equal access to the best information for each decision.


CONCLUSIONS

Bayesian hierarchical models that include dynamic, latent health state; probabilities for

the selection and effects of interventions on those states; and the complex health outcomes

from which the underlying states can be inferred are useful tools to improve population health

or clinical decisions. Such a model combines diverse sources of prior knowledge and data with

evidence about the patient (population) at hand, to predict the patient’s health status,

trajectory, and/or likely benefits of interventions. Visualizations of characteristics of posterior

distributions can be immediately understood by clinicians and patients as relevant to their

decision. When tailored to the particular medical situation, the models summarize complex

information to answer questions of interest to patients, such as, “What is my health status; am I

improving; which treatment is most likely to improve these symptoms?” The next key steps are

to scale these and other similar tools to larger and more diverse populations where they can be

systematically evaluated and to increase the rate at which tools can be developed, tested,

disseminated, and then curated. The most significant obstacle is the lack of a financial model by

which the improvements in health and cost outcomes derived from using these tools are

returned to support their curation and expand their influence.


REFERENCES

1. Washington AE, Lipstein SH. The Patient-Centered Outcomes Research Institute: promoting better information, decisions, and health. N Engl J Med. 2011;365(15):e31. doi:10.1056/NEJMp1109407

2. Blumenthal D. Stimulating the adoption of health information technology. N Engl J Med. 2009;360(15):1477-1479. doi:10.1056/nejmp0901592

3. Institute of Medicine, National Academy of Engineering. Engineering a Learning Healthcare System: A Look at the Future: Workshop Summary. National Academies Press; 2011.

4. Diggle PJ, Liang KY, Zeger SL. Analysis of Longitudinal Data. Clarendon Press; 2002.

5. Raudenbush S, Bryk A. Hierarchical Linear Models: Applications and Data Analysis Methods. Vol 1. Sage Publications; 2002.

6. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. Springer; 2009.

7. Diaz F, Yeh HW, de Leon J. Role of statistical random-effects linear models in personalized medicine. Curr Pharmacogenomics Person Med. 2012;10(1):22-32. doi:10.2174/187569212800166693

8. Goldstein H. Multilevel Statistical Models. Vol 922. John Wiley & Sons; 2011.

9. Silverman M, Murray T, Bryan C. The Quotable Osler. American College of Physicians; 2008.

10. Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press; 2006.

11. Rosen A, Zeger SL. Precision medicine: discovering clinically relevant and mechanistically anchored disease subgroups at scale. J Clin Invest. 2019;129(3):944-945. doi:10.1172/jci126120

12. Prentice RL, Zhao LP. Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics. 1991;47(3):825-839. doi:10.2307/2532642

13. Lauritzen SL. Propagation of probabilities, means, and variances in mixed graphical association models. J Am Stat Assoc. 1992;87(420):1098-1108. doi:10.1080/01621459.1992.10476265

14. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley; 2002.


15. Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945.

16. Rubin DB. Formal mode of statistical inference for causal effects. J Stat Plan Inference. 1990;25(3):279. doi:10.1016/0378-3758(90)90077-8

17. Greenland S, Robins JM, Pearl J. Confounding and collapsibility in causal inference. Stat Sci. 1999;14(1):29. doi:10.1214/ss/1009211805

18. Pearl J. Causal Inference in statistics: an overview. Stat Surv. 2009;3:96. doi:10.1214/09-ss057

19. Levine OS, O’Brien KL, Deloria-Knoll M, et al. The Pneumonia Etiology Research for Child Health Project: a 21st century childhood pneumonia etiology study. Clin Infect Dis. 2012;54(Suppl 2):S93-S101. doi:10.1093/cid/cir1052

20. Black RE, Cousens S, Johnson HL, et al. Global, regional, and national causes of child mortality in 2008: a systematic analysis. Lancet. 2010;375(9730):1969-1987. doi:10.1016/s0140-6736(10)60549-1

21. Liu L, Oza S, Hogan D, et al. Global, regional, and national causes of child mortality in 2000–13, with projections to inform post-2015 priorities: an updated systematic analysis. Lancet. 2015;385(9966):430-440. doi:10.1016/s0140-6736(14)61698-6

22. Scott JAG, Brooks WA, Peiris JM, Holtzman D, Mulholland EK. Pneumonia research to reduce childhood mortality in the developing world. J Clin Invest. 2008;118(4):1291-1300.

23. Murdoch DR, O’Brien KL, Driscoll AJ, Karron RA, Bhat N. Laboratory methods for determining pneumonia etiology in children. Clin Infect Dis. 2012;54(Suppl_2):S146-S152. doi:10.1093/cid/cir1073

24. Carter HB, Walsh PC, Landis P, Epstein JI. Expectant management of nonpalpable prostate cancer with curative intent: preliminary results. J Urol. 2002;167(3):1231-1234. doi:10.1097/00005392-200203000-00006

25. Tosoian JJ, Trock BJ, Landis P, Feng Z, Epstein JI, Partin AW. Active surveillance program for prostate cancer: an update of the Johns Hopkins Experience. J Clin Oncol. 2011;29(16):2185-2190.

26. Howlader N, Noone AM, Krapcho M, et al (eds). Previous version: SEER Cancer Statistics Review, 1975-2011. National Cancer Institute. Posted April 2014. Updated December 17, 2014. Accessed September 30, 2020. https://seer.cancer.gov/archive/csr/1975_2011/

27. Welch HG, Albertsen PC. Prostate cancer diagnosis and treatment after the introduction of prostate-specific antigen screening: 1986-2005. J Natl Cancer Inst. 2009;101(19):1325-1329.


28. Cooperberg MR, Broering JM, Carroll PR. Time trends and local variation in primary treatment of localized prostate cancer. J Clin Oncol. 2010;28(7):1117-1123.

29. Chou R, Croswell JM, Bougatsos C, Blazina I. Screening for prostate cancer: a review of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med. 2011;155(11):762-771.

30. Chou R, Dana T, Bougatsos C, Fu R, Blazina I. Treatments for Localized Prostate Cancer: Systematic Review to Update the 2002 U.S. Preventive Services Task Force. Evidence Syntheses 91. Agency for Healthcare Research and Quality; 2011.

31. Carter HB, Kettermann A, Warlick C, et al. Expectant management of prostate cancer with curative intent: an update on the Johns Hopkins Experience. J Urol. 2007;178(6):2359-2364.

32. Soloway MS, Soloway CT, Williams S. Active surveillance; a reasonable management alternative for patients with prostate cancer: the Miami experience. BJU Int. 2008;101(2):165-169.

33. van As NJ, Norman AR, Thomas K. Predicting the probability of deferred radical treatment for localised prostate cancer managed by active surveillance. Eur Urol. 2008;54(6):1297-1305.

34. van den Bergh RC, Roemeling S, Roobol MJ, et al. Outcomes of men with screen-detected prostate cancer eligible for active surveillance who were managed expectantly. Eur Urol. 2009;55(1):1-8.

35. Klotz L, Zhang L, Lam A. Clinical results of long-term follow-up of a large, active surveillance cohort with localized prostate cancer. J Clin Oncol. 2010;28(1):126-131.

36. Xu J, Zeger S. Joint analysis of longitudinal data comprising repeated measures and times to events. J R Stat Soc Ser C Appl Stat. 2001;50(3):375-387.

37. Goldberg D. The heterogeneity of “major depression.” World Psychiatry. 2011;10(3):226-228.

38. Carter GC, Cantrell RA, Victoria Z, et al. Comprehensive review of factors implicated in the heterogeneity of response in depression. Depress Anxiety. 2012;29(4):340-354.

39. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606-613.

40. Ogburn EL, Zeger SL. Statistical reasoning and methods in epidemiology to promote individualized health: in celebration of the 100th anniversary of the Johns Hopkins Bloomberg School of Public Health. Am J Epidemiol. 2016;183(5):427-434.


41. Coley RY, Fisher AJ, Mamawala M, Carter HB, Pienta KJ, Zeger SL. A Bayesian hierarchical model for prediction of latent health states from multiple data sources with application to active surveillance of prostate cancer. Biometrics. 2017;73(2):625-634. doi:10.1111/biom.12577

42. Wu Z, Deloria-Knoll M, Hammitt LL, Zeger SL; Pneumonia Etiology Research for Child Health Core Team. Partially latent class models for case–control studies of childhood pneumonia aetiology. J R Stat Soc Ser C Appl Stat. 2016;65(1):97-114.

43. Wu Z, Deloria-Knoll M, Zeger SL. Nested partially latent class models for dependent binary data; estimating disease etiology. Biostatistics. 2016;18(2):200-213.

44. Wu Z, Casciola-Rosen L, Shah AA, Rosen A, Zeger SL. Estimating auto-antibody signatures to detect autoimmune disease patient subsets. Biostatistics. 2019;20(1):30-47. doi:10.1093/biostatistics/kxx061

45. Wu Z, Casciola-Rosen L, Rosen A, Zeger SL. A Bayesian approach to restricted latent class models for scientifically-structured clustering of multivariate binary outcomes. bioRxiv. Preprint posted online August 15, 2018. Accessed September 29, 2020. https://www.biorxiv.org/content/10.1101/400192v3

46. Fojo AT, Musliner KL, Zandi P, Zeger SL. A precision medicine approach for psychiatric disease based on repeated symptom scores. J Psychiatr Res. 2017;95:147-155.

47. Deloria Knoll M, Fu W, Shi Q, et al. Bayesian estimation of pneumonia etiology: epidemiologic considerations and applications to the Pneumonia Etiology Research for Child Health Study. Clin Infect Dis. 2017;64(Suppl_3):S213-S227. doi:10.1093/cid/cix144

48. PERCH Study Group. Causes of severe pneumonia requiring hospital admission in children without HIV infection from Africa and Asia: the PERCH multi-country case-control study. Lancet. 2019;394:757-779. https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)30721-4/fulltext

49. Mamawala MM, Rao K, Landis P, et al. Risk prediction tool for grade re-classification in men with favourable-risk prostate cancer on active surveillance. BJU Int. 2017;120(1):25-31.

50. Plummer M, Stukalov A, Denwood M. rjags: Bayesian graphical models using MCMC (version 3-5). 2011. https://cran.r-project.org/web/packages/rjags/index.html


ACKNOWLEDGMENTS

The authors gratefully acknowledge the support of the people of the United States

whose tax dollars sustain PCORI, which funded this work through a competitive process. We

acknowledge the supportive role that our PCORI staff, ably led by Dr Emily Evans, played in the

initiation, execution, and the reporting of this research. We especially appreciate their

dedication to clear communication and to engagement of clinical and patient stakeholders.

The authors were ably supported by numerous faculty and staff colleagues, including

Darcy Phelan, Ken Fasman, Brionna Hair, Risha Zuckerman, Debra Moffitt, Kara Schoenberg,

Tricia Landis, and Joyclyn Gilmore.

We benefited from the positive research environment in the Department of Biostatistics

chaired by Dr Karen Bandeen-Roche and in the Johns Hopkins Individualized Health Initiative

(Hopkins inHealth), led by Prof Antony Rosen. Hopkins inHealth colleagues Aalok Shah, Dwight

Raum, and members of the Technology Innovation Center were important contributors.

We thank all the co-authors of articles partially supported by this grant. Though not

themselves funded, they generously added their talents to the science presented here.

We are especially appreciative of the time and good counsel offered by our patient

stakeholders: William Wilson, William Lewis, and Peter Johnson. They educated us about what

patients need and want.

Our Scientific Advisory Board kept us moving in the right direction. These colleagues—

Patrick Heagerty, Francesca Dominici, Martin Morgan, Roger Peng, and Vince Carey—met with

us each year and enriched the work by generously sharing their intellects and vast experience.

We are most grateful.

The members of the group that worked together on this project have moved ahead in their careers. The

postdoctoral fellows Zhenke Wu, Yates Coley, and Todd Fojo are all now assistant professors or

the equivalent at top institutions. Two of our midlevel collaborators became full professors

during this project, another became a director for the World Health Organization, and yet


another was awarded a named professorship. We like to think that the collaboration

contributed useful research results as well as new insights for those who enthusiastically

invested themselves in this endeavor.


Copyright © 2020. Johns Hopkins Bloomberg School of Public Health. All Rights Reserved.

Disclaimer:

The [views, statements, opinions] presented in this report are solely the responsibility of

the author(s) and do not necessarily represent the views of the Patient-Centered

Outcomes Research Institute® (PCORI®), its Board of Governors or Methodology

Committee.

Acknowledgment:

Research reported in this report was funded through a Patient-Centered Outcomes

Research Institute® (PCORI®) Award (#ME-1408-20318). Further information available

at: https://www.pcori.org/research-results/2015/using-bayesian-approach-predict-patients-health-and-response-treatment
