Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2 and Gregory F. Cooper 1,2
description
Transcript of Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2 and Gregory F. Cooper 1,2
An Outbreak Detection An Outbreak Detection Algorithm that Efficiently Algorithm that Efficiently
Performs Complete Bayesian Performs Complete Bayesian Model Averaging Over All Model Averaging Over All
Possible Spatial Distributions of Possible Spatial Distributions of DiseaseDisease
Yanna ShenYanna Shen1,21,2, Weng-Keen Wong, Weng-Keen Wong33, John D. Levander, John D. Levander22 and and Gregory F. CooperGregory F. Cooper1,21,2
11Intelligent Systems Program and Intelligent Systems Program and 22Department of Biomedical Department of Biomedical Informatics, University of Pittsburgh;Informatics, University of Pittsburgh;
33School of Electrical Engineering and Computer Science, School of Electrical Engineering and Computer Science, Oregon State UniversityOregon State University
IntroductionIntroduction
Bayesian outbreak detection algorithmBayesian outbreak detection algorithm Monitors for a wide variety of spatial Monitors for a wide variety of spatial
distributions of a disease outbreakdistributions of a disease outbreak Performs Performs complete Bayesian model complete Bayesian model
averaging averaging over all spatial distributions over all spatial distributions of disease in polynomial time (relative of disease in polynomial time (relative to some finite set of spatial primitives, to some finite set of spatial primitives, such as zip codes)such as zip codes)
BackgroundBackground
Most commonly used approaches to spatial Most commonly used approaches to spatial outbreak detectionoutbreak detection Are frequentist (e.g., spatial scan algorithms)Are frequentist (e.g., spatial scan algorithms) Detect outbreaks by finding the Detect outbreaks by finding the singlesingle contiguouscontiguous
subregion where the outbreak is the most likely to subregion where the outbreak is the most likely to be occurringbe occurring
The Spatial Bayesian Model Averaging (SBMA) The Spatial Bayesian Model Averaging (SBMA) algorithm introduced herealgorithm introduced here Is BayesianIs Bayesian Detects outbreaks by considering Detects outbreaks by considering all possibleall possible
contiguous and non-contiguouscontiguous and non-contiguous subregions where subregions where the outbreak may be occurringthe outbreak may be occurring
An example of outbreak An example of outbreak states in a hypothetical states in a hypothetical
regionregion Three zip codes in a hypothetical region monitored OBi and NOBi: represent the disease outbreak states
(outbreak and non-outbreak) for zip code i. where 1 ≤ i ≤ 3.
Total number of spatial outbreak states in the region: 23 = 8 NOBiOBi
1
32
General case of outbreak General case of outbreak states in a hypothetical states in a hypothetical
regionregion Suppose that there areSuppose that there are r r zip codes in zip codes in
a region being monitored. Then a region being monitored. Then there are 2there are 2rr spatial outbreak states. spatial outbreak states. Consider each outbreak state as being a Consider each outbreak state as being a
spatial disease hypothesisspatial disease hypothesis If we perform complete Bayesian model If we perform complete Bayesian model
averaging over all of these hypotheses averaging over all of these hypotheses in a brute-force way, then the time in a brute-force way, then the time complexity is complexity is OO(2(2r r ))
NotationNotation r r – total number of zip codes in a – total number of zip codes in a
hypothetical regionhypothetical region i i – the index of a specific zip code, – the index of a specific zip code,
where where 1 ≤ 1 ≤ ii ≤ ≤ rr q q – the probability that a zip code has – the probability that a zip code has
an outbreak. an outbreak. PP((qq): our belief about ): our belief about qq
When 0 < When 0 < qq ≤ 1, we model that ≤ 1, we model that PP((qq) = ) = 1 – 1 – αα, where 0 ≤ , where 0 ≤ αα ≤ 1. ≤ 1.
DefinitionsDefinitions
OB: the state that at least one zip code has an outbreak present
NOB: the state that no zip code has a disease outbreak, where P(NOB) = α
AssumptionsAssumptions The spatial region is partitioned into a The spatial region is partitioned into a finite finite
set of spatial primitivesset of spatial primitives, such as , such as rr zip codes. zip codes. A spatial outbreak hypothesis consists of an A spatial outbreak hypothesis consists of an
assignment of an outbreak as assignment of an outbreak as presentpresent or or absentabsent to each spatial primitive. to each spatial primitive.
Each spatial primitive is modeled as an Each spatial primitive is modeled as an independent Bernoulli trialindependent Bernoulli trial with probability with probability qq of having an outbreak during the current of having an outbreak during the current period.period.
Data consists of the current period Data consists of the current period counts of counts of some binary eventsome binary event (e.g., ED respiratory (e.g., ED respiratory counts in the past 24 hours).counts in the past 24 hours).
The SBMA algorithmThe SBMA algorithm
The joint probability of the data (The joint probability of the data (DD) and ) and the outbreak state (the outbreak state (OBOB):):
qOBDP
qNOBDP
qOBDP
qNOBDP
qOBDP
qNOBDP
|,
|,
|,
|,
|,
|,
33
33
22
22
11
11
1
0
1
0
|,,
dqqP
dqqOBDPqPOBDP
1
32
P(D1 , NOB1|q) + P(D1 , OB1|q)
P(D2 , NOB2|q) + P(D2 , OB2|q)
P(D3 , NOB3|q) + P(D3 , OB3|q)
qNOBDPqNOBDPqNOBDP |,|,|, 332211
Linear time
Polynomial time qDP | qNOBDP |, qDP | qNOBDP |,
The SBMA algorithmThe SBMA algorithm
where Di represents the data for zip code i .
r
iii
r
iii NOBDPNOBDPNOBPNOBDP
11
||,
1
011
|,|,|,, dqqNOBDPqOBDPqNOBDPqPOBDPr
iii
r
iiiii
Computes the posterior probability P(OB|D)
NOBDPOBDP
OBDPDOBP
,,
,|
Polynomial time
*http://www.dbmi.pitt.edu/panda/papers/Shen/ISDS07.pdf
Linear time
Polynomial time
Preliminary experimentsPreliminary experiments
96 simulated anthrax outbreaks96 simulated anthrax outbreaks**
Each outbreak consists of a time series of ED Each outbreak consists of a time series of ED patient cases, where each case has a chief patient cases, where each case has a chief complaint and a zip code.complaint and a zip code.
The probability that a case was assigned to live in a The probability that a case was assigned to live in a zip code was proportional to the population of that zip code was proportional to the population of that zip code.zip code.
The simulated outbreak cases were overlaid onto The simulated outbreak cases were overlaid onto real ED cases to create a semi-synthetic dataset of a real ED cases to create a semi-synthetic dataset of a scattered disease distributionscattered disease distribution..
Ran PANDARan PANDA** and SBMA on each of the 96 and SBMA on each of the 96 semi-synthetic datasetssemi-synthetic datasets
* Cooper GF, Dash DH, Levander, JD, Wong WK, Hogan WR, Wagner MM. Bayesian biosurveillance of disease outbreaks. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (2004) 94-104 [We used the version of PANDA reported in Figure 7].
Preliminary resultsPreliminary results AMOC curves for PANDA and SBMA when AMOC curves for PANDA and SBMA when
the prior probability of an outbreak was 10%.the prior probability of an outbreak was 10%. At 0 false positive: SBMA detected the At 0 false positive: SBMA detected the
simulated scattered-disease outbreaks on simulated scattered-disease outbreaks on average about 12 hours before PANDAaverage about 12 hours before PANDA
20
25
30
35
40
45
50
55
60
65
70
0 2 4 6 8 10 12
False positives per year
Exp
ecte
d d
etec
tio
n t
ime
(ho
urs
) PANDA
SBMA
Preliminary resultsPreliminary results We also ran PANDA and SBMA on 96 semi-synthetic We also ran PANDA and SBMA on 96 semi-synthetic
datasets of simulated outbreaks due to outdoor datasets of simulated outbreaks due to outdoor aerosol releases of anthrax sporesaerosol releases of anthrax spores
At 0 false positives PANDA detected a simulated At 0 false positives PANDA detected a simulated anthrax outbreak on average about 5 hours before anthrax outbreak on average about 5 hours before SBMASBMA
30
32
34
36
38
40
42
44
46
48
50
0 2 4 6 8 10 12
False positives per year
Exp
ecte
d d
etec
tio
n t
ime
(ho
urs
) PANDA
SBMA
ConclusionsConclusions
An algorithm called SBMA was An algorithm called SBMA was introduced that performs exhaustive introduced that performs exhaustive Bayesian model averaging over all Bayesian model averaging over all possible spatial distributions of disease.possible spatial distributions of disease.
SBMA monitors for a wide variety of SBMA monitors for a wide variety of spatial patterns of a disease outbreak spatial patterns of a disease outbreak in polynomial time.in polynomial time.
Preliminary results support that SBMA Preliminary results support that SBMA is complementary to PANDA in disease is complementary to PANDA in disease outbreak detection.outbreak detection.
Future workFuture work
Apply the SBMA algorithm to a wide Apply the SBMA algorithm to a wide variety of semi-synthetic outbreak test variety of semi-synthetic outbreak test scenarios and compare its detection scenarios and compare its detection performance to that of other spatial performance to that of other spatial algorithmsalgorithms
Develop a multivariate version of SBMADevelop a multivariate version of SBMA Extend SBMA to model progression of Extend SBMA to model progression of
disease over time in order to create a disease over time in order to create a spatio-temporal BMA outbreak-detection spatio-temporal BMA outbreak-detection algorithmalgorithm
AcknowledgementsAcknowledgements
This research was funded by a This research was funded by a grant from the National Science grant from the National Science Foundation (NSF IIS-0325581)Foundation (NSF IIS-0325581)