Statistical Issues and Challenges Associated with Rapid Detection of Bio-Terrorist Attacks SE Fienberg and G Shmueli (2005) Presented by Lisa Denogean 11/17/2005


Statistical Issues and Challenges Associated with Rapid Detection of Bio-Terrorist Attacks

SE Fienberg and G Shmueli (2005)

Presented by Lisa Denogean, 11/17/2005


Detection Problems

Traditionally used medical and public health data may take months to collect, obtain, and analyze
– Need a better system for collection, efficient detection, and privacy protection

Real-time collection often does not yield enough data, so the signal is too weak for detection
– Need to be able to collect and effectively analyze more data from different sources


Outline

System and Data Requirements for Timely Detection

Grocery Sales Data Example: Combining Data Across Sources

Advantages and Disadvantages of Different Data Sources


Detection System and Data Requirements

Types of Data Available

Traditional data
– ER visits, 911 calls, mortality records, veterinary reports, school or work absence records…

Non-traditional data
– To detect a known agent, e.g. anthrax
– OTC medication sales, grocery (e.g. OJ and soup) sales

Initial Data Requirements

Frequently collected
– Real-time, frequent non-traditional data, or improved traditional data

Fast transfer
– Electronic recording and data conversion


Essential Data Features

Early signature of the outbreak
– Data allow detection of a disease signature a day or a week before the disease is apparent
– OTC sales, website searches, bio-sensors

Sufficient amounts of data
– Lack of sufficient data leads to under-detection
– Temporal or spatial aggregation increases the amount of data, but could slow detection or dampen a signal

Local, not regional or national data
– Improves sensitivity and timeliness


Detection System Requirements

Immediate analysis of incoming data
– Resources for quick storage and efficient detection algorithms

Immediate output
– Output an operational decision-making conclusion in a user-friendly, transferable format

Flexibility
– Almost or fully automated for different outbreak types

Considerations
– Number of false alarms vs. speed of true detection
– Expense of false alarms vs. risk of not detecting a true outbreak


Advances

NYC syndromic surveillance system*
– Track 911 calls, OTC sales, ER admissions, absenteeism (weekly false alarms)

Real-time Outbreak and Disease Surveillance (RODS) system
– Real-time collection of ER visits in Western Pennsylvania (including retailer data)

National Electronic Disease Surveillance System
– CDC initiative for electronic transfer of health information

New sources (not yet available)
– Track medical web searches, body tracking devices, biosensor data


Inhalational Anthrax

First stage
– A few hours to a few days (assume within 3 days)
– Nonspecific symptoms: fever, sweat, fatigue, cough, sore throat, nausea, headache
– Similar to flu symptoms, except no runny nose
– Rapid treatment improves survival

Second stage
– Develops rapidly
– Extreme symptoms
– At least 80% fatality rate within 2 to 48 hours

Grocery Sales Example


Sales Data Features

Data electronically recorded in real-time

Large amounts of data at rich levels of detail

Processing time vs. level of detail considerations

Aggregated level of daily sales for each item and hourly basket-level data

Purchase data are localized, useful for detecting large-scale outbreaks in small areas

OTC and grocery sales can show an early signature of symptoms of an outbreak

Dependence between sales within neighboring periods of time due to fine time scale

Smaller ratio between signal and noise


Statistical Detection System Framework

Decide which items to monitor
– Epidemiological and statistical analysis of information contained in different sales

Model the “no-outbreak” sales baseline
– Account for promotions, sales, season, etc. that would add noise (clean data)

Simulate an outbreak signature
– Footprint of anthrax known in traditional data, consult with outside experts for new data

Develop a roll-forward algorithm
– Integrate previous data for detection in new data

Test system for real and false alarms
– Select threshold based on simulations


Data

Nasal symptoms are unrelated to anthrax

Focus on cough meds (daily) and tissues, OJ and soup (basket-level)


Baseline

Data indicate a seasonal effect in overall sales and include flu cases

Assume cough meds are insensitive to promotions

Smoothing methods applied

Estimate baseline variability

False alarms near holidays for all methods

Simulation

Epidemiologist opinions on how anthrax is manifested in cough medication sales

Sales increase linearly over a 3-day period
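The baseline and simulation steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the moving-average smoother, the function names `moving_average_baseline` and `inject_footprint`, the synthetic sales series, and the reading of the footprint as a multiplier that rises linearly to a peak factor are all assumptions made for the example.

```python
import numpy as np

def moving_average_baseline(sales, window=7):
    """Moving average over each full window of `window` days (a simple smoother)."""
    sales = np.asarray(sales, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window is available
    return np.convolve(sales, kernel, mode="valid")

def inject_footprint(sales, start, peak_factor=1.36, days=3):
    """Return a copy of `sales` with a linearly increasing spike applied.

    Assumption: the footprint multiplies sales, rising linearly over `days`
    days until it reaches `peak_factor` on the last day.
    """
    out = np.asarray(sales, dtype=float).copy()
    ramp = 1.0 + (peak_factor - 1.0) * np.arange(1, days + 1) / days
    out[start:start + days] *= ramp
    return out

# Synthetic "no-outbreak" series for illustration only (not real sales data)
rng = np.random.default_rng(0)
baseline_sales = 100 + 10 * rng.standard_normal(60)
with_outbreak = inject_footprint(baseline_sales, start=40)
smoothed = moving_average_baseline(baseline_sales)
```

Injecting the footprint into otherwise clean data is what lets a detector be scored, since the true outbreak days are then known.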


Detection System Formulation

Detail from reference [12]

Clean data
– Preprocess: Account for store-level sales
– Filter/De-noise: Decompose series into cosine waves, retain those with large magnitudes

Forecast via wavelet approach
– Efficient and tractable for non-stationary series
– Autoregressive moving average model not flexible to data type, user intervention required
– Decompose series into resolutions of different frequencies
– For each resolution, use an autoregressive model to forecast the next point
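As a rough sketch of the last two bullets, a series can be split into resolutions and each resolution forecast separately, with the forecasts summed. The slides do not say which wavelet or autoregressive order reference [12] uses, so the redundant Haar ("à trous") transform, the AR(1) fit, and all function names below are assumptions chosen for brevity.

```python
import numpy as np

def haar_a_trous(x, levels=3):
    """Redundant Haar decomposition that uses only current and past values.

    Returns (details, smooth) such that sum(details) + smooth equals x.
    """
    c = np.asarray(x, dtype=float)
    idx = np.arange(len(c))
    details = []
    for j in range(levels):
        lag = 2 ** j
        # Clamp indices at 0 so the earliest points reuse the first observation
        c_next = 0.5 * (c + c[np.maximum(idx - lag, 0)])
        details.append(c - c_next)  # detail at resolution j
        c = c_next
    return details, c

def ar1_next(series):
    """One-step AR(1) forecast around the series mean (least-squares fit)."""
    mu = series.mean()
    y = series - mu
    denom = np.dot(y[:-1], y[:-1])
    phi = np.dot(y[1:], y[:-1]) / denom if denom > 0 else 0.0
    return mu + phi * y[-1]

def wavelet_forecast(x, levels=3):
    """Forecast the next point as the sum of per-resolution AR(1) forecasts."""
    details, smooth = haar_a_trous(x, levels)
    return sum(ar1_next(d) for d in details) + ar1_next(smooth)
```

Because the decomposition is additive, summing the per-resolution forecasts gives a forecast for the original series.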


Detection System (cont.)

Threshold for next-day forecasts
– Control chart type argument to determine anthrax-related variability
– Alarm if true sales are more than 3 standard deviations above the de-noised series prediction

Basket-level (50 products, 200k-500k/week)
– Method of association rules: Pairs and triplets
– Threshold: Most unexpected combinations

Evaluation
– Simulate anthrax footprint as a 3-day spike with a linearly increasing pattern
– Study different configurations of the system
– If the scale of the footprint increases cough sales by a factor of 1.36 or more, 100% of footprints detected
– Outbreaks coinciding with holidays are problematic
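The 3-standard-deviation alarm rule for next-day forecasts amounts to a one-sided upper control limit. A minimal sketch, assuming the forecasts and a baseline standard deviation are already available; the function name `alarm_flags` and the numbers are illustrative, not taken from the paper:

```python
import numpy as np

def alarm_flags(actual, forecast, baseline_sd, k=3.0):
    """Flag days where actual sales exceed the forecast by more than k SDs."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return actual > forecast + k * baseline_sd

# Illustrative values: flat forecast of 100 units, baseline SD of 10 units
forecast = np.full(5, 100.0)
actual = np.array([98.0, 104.0, 112.0, 124.0, 136.0])
flags = alarm_flags(actual, forecast, baseline_sd=10.0)
```

With these numbers the limit is 130 units, so only the last day raises an alarm. Lowering `k` would catch the ramp earlier at the cost of more false alarms.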


Combining Data Sources: Benefits and Challenges

Data Linkage

Linking data from multiple sources requires system-wide unique identifiers or variables for record linkage

Linkage methods use match features or string distances
– Need extensions that link multiple lists and allow for missing identifiers
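String-distance linkage that tolerates missing identifiers might look like the sketch below. It uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in similarity measure, not the specific linkage methods the authors have in mind. The field names, the 0.85 threshold, and the records are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def link_records(list_a, list_b, fields=("name", "zip"), threshold=0.85):
    """Pair each record in list_a with its best match in list_b, if good enough.

    Fields that are missing (None) in either record are skipped, not penalized.
    """
    links = []
    for i, rec_a in enumerate(list_a):
        best_j, best_score = None, 0.0
        for j, rec_b in enumerate(list_b):
            scores = [similarity(rec_a[f], rec_b[f]) for f in fields
                      if rec_a.get(f) is not None and rec_b.get(f) is not None]
            score = sum(scores) / len(scores) if scores else 0.0
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            links.append((i, best_j, round(best_score, 3)))
    return links

# Hypothetical records from two sources
pharmacy = [{"name": "Jon Smith", "zip": "15213"}]
clinic = [{"name": "John Smith", "zip": "15213"}, {"name": "Ann Lee", "zip": None}]
links = link_records(pharmacy, clinic)
```

Here the misspelled name still links to the first clinic record, while the record with a missing zip code is compared on name alone and falls below the threshold.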


Approaches to Using Multiple Data Sources

Independently and simultaneously monitor separate sources
– Multiple testing inflates false alarm rate

Track different series intensively but sequentially
– Alarms trigger further data collection and analyses of other series (Univ. of Utah, flu)
– Hierarchical signaling

Multivariate modeling
– Use merged records for individuals or families
– Measurement error from record linkage
– Privacy and confidentiality concerns
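To see why monitoring separate sources simultaneously inflates the false alarm rate, consider the chance that at least one of several detectors fires on a day with no outbreak. The sketch below assumes the sources are independent and share the same per-source rate, which real data streams may not satisfy. The Bonferroni-style correction is a standard remedy added here for illustration and is not mentioned in the slides.

```python
def family_false_alarm_rate(alpha, n_sources):
    """P(at least one false alarm) across independent sources, each at rate alpha."""
    return 1.0 - (1.0 - alpha) ** n_sources

def bonferroni_alpha(target, n_sources):
    """Per-source rate that keeps the combined rate at or below `target`."""
    return target / n_sources

# Combined daily false alarm rate when each source alarms falsely 1% of the time
for k in (1, 5, 10):
    print(k, round(family_false_alarm_rate(0.01, k), 4))
```

With ten sources at a 1% rate each, the combined rate approaches 10%, which is roughly one false alarm every ten days.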


Privacy and Confidentiality Issues

Health Insurance Portability and Accountability Act (HIPAA) restrictions
– Permits de-identified data for research
– Medical and public health organizations may be exempt

Private commercial interests
– Concern over information in grocery and OTC sales data

Integrated data concerns
– Linking across databases may pose more risks in exposing confidential information


Summary: Questions for Consideration

What signals of an outbreak do non-traditional data carry, and how?

How can we efficiently and accurately integrate and analyze data from multiple sources?

How can we effectively aggregate data temporally or spatially?

How can we use geographic detail to control excessive false alarms?

Can merged files be useful for detection without allowing re-identification and linkage to the source?

Is a risk-utility trade-off tolerable?

Can a trusted third party update files in real-time, separately from the detection system?