Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2 and Gregory F. Cooper 1,2

16
An Outbreak Detection Algorithm An Outbreak Detection Algorithm that Efficiently Performs that Efficiently Performs Complete Bayesian Model Complete Bayesian Model Averaging Over All Possible Averaging Over All Possible Spatial Distributions of Spatial Distributions of Disease Disease Yanna Shen Yanna Shen 1,2 1,2 , Weng-Keen Wong , Weng-Keen Wong 3 , John D. Levander , John D. Levander 2 and and Gregory F. Cooper Gregory F. Cooper 1,2 1,2 1 Intelligent Systems Program and Intelligent Systems Program and 2 Department of Department of Biomedical Informatics, University of Pittsburgh; Biomedical Informatics, University of Pittsburgh; 3 School of Electrical Engineering and Computer School of Electrical Engineering and Computer Science, Science, Oregon State University Oregon State University

description

An Outbreak Detection Algorithm that Efficiently Performs Complete Bayesian Model Averaging Over All Possible Spatial Distributions of Disease. Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2 and Gregory F. Cooper 1,2 - PowerPoint PPT Presentation

Transcript of Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2 and Gregory F. Cooper 1,2

Page 1: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

An Outbreak Detection An Outbreak Detection Algorithm that Efficiently Algorithm that Efficiently

Performs Complete Bayesian Performs Complete Bayesian Model Averaging Over All Model Averaging Over All

Possible Spatial Distributions of Possible Spatial Distributions of DiseaseDisease

Yanna ShenYanna Shen1,21,2, Weng-Keen Wong, Weng-Keen Wong33, John D. Levander, John D. Levander22 and and Gregory F. CooperGregory F. Cooper1,21,2

11Intelligent Systems Program and Intelligent Systems Program and 22Department of Biomedical Department of Biomedical Informatics, University of Pittsburgh;Informatics, University of Pittsburgh;

33School of Electrical Engineering and Computer Science, School of Electrical Engineering and Computer Science, Oregon State UniversityOregon State University

Page 2: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

IntroductionIntroduction

Bayesian outbreak detection algorithmBayesian outbreak detection algorithm Monitors for a wide variety of spatial Monitors for a wide variety of spatial

distributions of a disease outbreakdistributions of a disease outbreak Performs Performs complete Bayesian model complete Bayesian model

averaging averaging over all spatial distributions over all spatial distributions of disease in polynomial time (relative of disease in polynomial time (relative to some finite set of spatial primitives, to some finite set of spatial primitives, such as zip codes)such as zip codes)

Page 3: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

BackgroundBackground

Most commonly used approaches to spatial Most commonly used approaches to spatial outbreak detectionoutbreak detection Are frequentist (e.g., spatial scan algorithms)Are frequentist (e.g., spatial scan algorithms) Detect outbreaks by finding the Detect outbreaks by finding the singlesingle contiguouscontiguous

subregion where the outbreak is the most likely to subregion where the outbreak is the most likely to be occurringbe occurring

The Spatial Bayesian Model Averaging (SBMA) The Spatial Bayesian Model Averaging (SBMA) algorithm introduced herealgorithm introduced here Is BayesianIs Bayesian Detects outbreaks by considering Detects outbreaks by considering all possibleall possible

contiguous and non-contiguouscontiguous and non-contiguous subregions where subregions where the outbreak may be occurringthe outbreak may be occurring

Page 4: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

An example of outbreak An example of outbreak states in a hypothetical states in a hypothetical

regionregion Three zip codes in a hypothetical region monitored OBi and NOBi: represent the disease outbreak states

(outbreak and non-outbreak) for zip code i. where 1 ≤ i ≤ 3.

Total number of spatial outbreak states in the region: 23 = 8 NOBiOBi

1

32

Page 5: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

General case of outbreak General case of outbreak states in a hypothetical states in a hypothetical

regionregion Suppose that there areSuppose that there are r r zip codes in zip codes in

a region being monitored. Then a region being monitored. Then there are 2there are 2rr spatial outbreak states. spatial outbreak states. Consider each outbreak state as being a Consider each outbreak state as being a

spatial disease hypothesisspatial disease hypothesis If we perform complete Bayesian model If we perform complete Bayesian model

averaging over all of these hypotheses averaging over all of these hypotheses in a brute-force way, then the time in a brute-force way, then the time complexity is complexity is OO(2(2r r ))

Page 6: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

NotationNotation r r – total number of zip codes in a – total number of zip codes in a

hypothetical regionhypothetical region i i – the index of a specific zip code, – the index of a specific zip code,

where where 1 ≤ 1 ≤ ii ≤ ≤ rr q q – the probability that a zip code has – the probability that a zip code has

an outbreak. an outbreak. PP((qq): our belief about ): our belief about qq

When 0 < When 0 < qq ≤ 1, we model that ≤ 1, we model that PP((qq) = ) = 1 – 1 – αα, where 0 ≤ , where 0 ≤ αα ≤ 1. ≤ 1.

Page 7: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

DefinitionsDefinitions

OB: the state that at least one zip code has an outbreak present

NOB: the state that no zip code has a disease outbreak, where P(NOB) = α

Page 8: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

AssumptionsAssumptions The spatial region is partitioned into a The spatial region is partitioned into a finite finite

set of spatial primitivesset of spatial primitives, such as , such as rr zip codes. zip codes. A spatial outbreak hypothesis consists of an A spatial outbreak hypothesis consists of an

assignment of an outbreak as assignment of an outbreak as presentpresent or or absentabsent to each spatial primitive. to each spatial primitive.

Each spatial primitive is modeled as an Each spatial primitive is modeled as an independent Bernoulli trialindependent Bernoulli trial with probability with probability qq of having an outbreak during the current of having an outbreak during the current period.period.

Data consists of the current period Data consists of the current period counts of counts of some binary eventsome binary event (e.g., ED respiratory (e.g., ED respiratory counts in the past 24 hours).counts in the past 24 hours).

Page 9: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

The SBMA algorithmThe SBMA algorithm

The joint probability of the data (The joint probability of the data (DD) and ) and the outbreak state (the outbreak state (OBOB):):

qOBDP

qNOBDP

qOBDP

qNOBDP

qOBDP

qNOBDP

|,

|,

|,

|,

|,

|,

33

33

22

22

11

11

1

0

1

0

|,,

dqqP

dqqOBDPqPOBDP

1

32

P(D1 , NOB1|q) + P(D1 , OB1|q)

P(D2 , NOB2|q) + P(D2 , OB2|q)

P(D3 , NOB3|q) + P(D3 , OB3|q)

qNOBDPqNOBDPqNOBDP |,|,|, 332211

Linear time

Polynomial time qDP | qNOBDP |, qDP | qNOBDP |,

Page 10: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

The SBMA algorithmThe SBMA algorithm

where Di represents the data for zip code i .

r

iii

r

iii NOBDPNOBDPNOBPNOBDP

11

||,

1

011

|,|,|,, dqqNOBDPqOBDPqNOBDPqPOBDPr

iii

r

iiiii

Computes the posterior probability P(OB|D)

NOBDPOBDP

OBDPDOBP

,,

,|

Polynomial time

*http://www.dbmi.pitt.edu/panda/papers/Shen/ISDS07.pdf

Linear time

Polynomial time

Page 11: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

Preliminary experimentsPreliminary experiments

96 simulated anthrax outbreaks96 simulated anthrax outbreaks**

Each outbreak consists of a time series of ED Each outbreak consists of a time series of ED patient cases, where each case has a chief patient cases, where each case has a chief complaint and a zip code.complaint and a zip code.

The probability that a case was assigned to live in a The probability that a case was assigned to live in a zip code was proportional to the population of that zip code was proportional to the population of that zip code.zip code.

The simulated outbreak cases were overlaid onto The simulated outbreak cases were overlaid onto real ED cases to create a semi-synthetic dataset of a real ED cases to create a semi-synthetic dataset of a scattered disease distributionscattered disease distribution..

Ran PANDARan PANDA** and SBMA on each of the 96 and SBMA on each of the 96 semi-synthetic datasetssemi-synthetic datasets

* Cooper GF, Dash DH, Levander, JD, Wong WK, Hogan WR, Wagner MM. Bayesian biosurveillance of disease outbreaks. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence (2004) 94-104 [We used the version of PANDA reported in Figure 7].

Page 12: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

Preliminary resultsPreliminary results AMOC curves for PANDA and SBMA when AMOC curves for PANDA and SBMA when

the prior probability of an outbreak was 10%.the prior probability of an outbreak was 10%. At 0 false positive: SBMA detected the At 0 false positive: SBMA detected the

simulated scattered-disease outbreaks on simulated scattered-disease outbreaks on average about 12 hours before PANDAaverage about 12 hours before PANDA

20

25

30

35

40

45

50

55

60

65

70

0 2 4 6 8 10 12

False positives per year

Exp

ecte

d d

etec

tio

n t

ime

(ho

urs

) PANDA

SBMA

Page 13: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

Preliminary resultsPreliminary results We also ran PANDA and SBMA on 96 semi-synthetic We also ran PANDA and SBMA on 96 semi-synthetic

datasets of simulated outbreaks due to outdoor datasets of simulated outbreaks due to outdoor aerosol releases of anthrax sporesaerosol releases of anthrax spores

At 0 false positives PANDA detected a simulated At 0 false positives PANDA detected a simulated anthrax outbreak on average about 5 hours before anthrax outbreak on average about 5 hours before SBMASBMA

30

32

34

36

38

40

42

44

46

48

50

0 2 4 6 8 10 12

False positives per year

Exp

ecte

d d

etec

tio

n t

ime

(ho

urs

) PANDA

SBMA

Page 14: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

ConclusionsConclusions

An algorithm called SBMA was An algorithm called SBMA was introduced that performs exhaustive introduced that performs exhaustive Bayesian model averaging over all Bayesian model averaging over all possible spatial distributions of disease.possible spatial distributions of disease.

SBMA monitors for a wide variety of SBMA monitors for a wide variety of spatial patterns of a disease outbreak spatial patterns of a disease outbreak in polynomial time.in polynomial time.

Preliminary results support that SBMA Preliminary results support that SBMA is complementary to PANDA in disease is complementary to PANDA in disease outbreak detection.outbreak detection.

Page 15: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

Future workFuture work

Apply the SBMA algorithm to a wide Apply the SBMA algorithm to a wide variety of semi-synthetic outbreak test variety of semi-synthetic outbreak test scenarios and compare its detection scenarios and compare its detection performance to that of other spatial performance to that of other spatial algorithmsalgorithms

Develop a multivariate version of SBMADevelop a multivariate version of SBMA Extend SBMA to model progression of Extend SBMA to model progression of

disease over time in order to create a disease over time in order to create a spatio-temporal BMA outbreak-detection spatio-temporal BMA outbreak-detection algorithmalgorithm

Page 16: Yanna Shen 1,2 , Weng-Keen Wong 3 , John D. Levander 2  and Gregory F. Cooper 1,2

AcknowledgementsAcknowledgements

This research was funded by a This research was funded by a grant from the National Science grant from the National Science Foundation (NSF IIS-0325581)Foundation (NSF IIS-0325581)