Statistical Issues and Challenges Associated with Rapid Detection of Bio-Terrorist Attacks
SE Fienberg and G Shmueli (2005)
Presented by Lisa Denogean, 11/17/2005
Detection Problems
Traditionally used medical and public health data may take months to collect, obtain, and analyze
– Need a better system for collection, efficient detection, and privacy protection
Real-time collection often does not yield enough data; the signal is too weak for detection
– Need to be able to collect and effectively analyze more data from different sources
Outline
System and Data Requirements for Timely Detection
Grocery Sales Data Example: Combining Data Across Sources
Advantages and Disadvantages of Different Data Sources
Detection System and Data Requirements
Types of Data Available
Traditional data
– ER visits, 911 calls, mortality records, veterinary reports, school or work absence records…
Non-traditional data
– To detect a known agent, e.g. anthrax
– OTC medication sales, grocery (e.g. OJ and soup) sales
Initial Data Requirements
Frequently collected
– Real-time, frequent non-traditional data, or improved traditional data
Fast transfer
– Electronic recording and data conversion
Essential Data Features
Early signature of the outbreak
– Data allow detection of a disease signature a day or a week before the disease becomes apparent
– OTC sales, website searches, bio-sensors
Sufficient amounts of data
– Lack of sufficient data leads to under-detection
– Temporal or spatial aggregation can help, but could slow detection or dampen a signal
Local, not regional or national, data
– Improves sensitivity and timeliness
Detection System Requirements
Immediate analysis of incoming data
– Resources for quick storage and efficient detection algorithms
Immediate output
– Output an operational decision-making conclusion in a user-friendly, transferable format
Flexibility
– Almost or fully automated for different outbreak types
Considerations
– Number of false alarms vs. speed of true detection
– Expense of false alarms vs. risk of not detecting a true outbreak
Advances
NYC syndromic surveillance system*
– Tracks 911 calls, OTC sales, ER admissions, absenteeism (weekly false alarms)
Real-time Outbreak and Disease Surveillance (RODS) system
– Real-time collection of ER visits in Western Pennsylvania (including retailer data)
National Electronic Disease Surveillance System
– CDC initiative for electronic transfer of health information
New sources (not yet available)
– Track medical web searches, body tracking devices, biosensor data
Inhalational Anthrax
First stage
– A few hours to a few days (assume within 3 days)
– Nonspecific symptoms: fever, sweats, fatigue, cough, sore throat, nausea, headache
– Similar to flu symptoms, except no runny nose
– Rapid treatment improves survival
Second stage
– Develops rapidly
– Extreme symptoms
– At least 80% fatality rate within 2–48 hours
Grocery Sales Example
Sales Data Features
Data electronically recorded in real time
Large amounts of data at rich levels of detail
Processing time vs. level of detail considerations
Aggregated level of daily sales for each item and hourly basket-level data
Purchase data are localized, useful for detecting large-scale outbreaks in small areas
OTC and grocery sales can show an early signature of symptoms of an outbreak
Dependence between sales within neighboring periods of time due to fine time scale
Smaller ratio between signal and noise
Statistical Detection System Framework
Decide which items to monitor
– Epidemiological and statistical analysis of information contained in different sales
Model the “no-outbreak” sales baseline
– Account for promotions, sales, season, etc. that would add noise (clean data)
Simulate an outbreak signature
– Footprint of anthrax is known in traditional data; consult with outside experts for new data
Develop a roll-forward algorithm
– Integrate previous data for detection in new data
Test system for real and false alarms
– Select threshold based on simulations
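The roll-forward and threshold-selection steps of the framework can be sketched as follows. This is a minimal illustration, not the presenters' system: the trailing window of 28 days, the multiplier `k`, the Gaussian no-outbreak simulation (mean 100, SD 10), and all function names are assumptions made for the example.

```python
import random
import statistics

def roll_forward_alarms(series, window=28, k=3.0):
    """Indices t where series[t] exceeds mean + k*sd of the previous `window` values."""
    alarms = []
    for t in range(window, len(series)):
        history = series[t - window:t]          # only data seen before day t
        mean = statistics.fmean(history)
        sd = statistics.stdev(history)
        if series[t] > mean + k * sd:
            alarms.append(t)
    return alarms

def false_alarm_rate(k, n_days=2000, window=28, seed=0):
    """Fraction of monitored days that alarm on simulated no-outbreak data (assumed Gaussian)."""
    rng = random.Random(seed)
    series = [rng.gauss(100.0, 10.0) for _ in range(n_days)]
    return len(roll_forward_alarms(series, window, k)) / (n_days - window)

def pick_threshold(target_rate, candidates=(2.0, 2.5, 3.0, 3.5, 4.0)):
    """Smallest candidate multiplier whose simulated false-alarm rate meets the target."""
    for k in candidates:
        if false_alarm_rate(k) <= target_rate:
            return k
    return candidates[-1]

if __name__ == "__main__":
    k = pick_threshold(target_rate=0.01)
    print("chosen multiplier:", k)
```

A lower multiplier detects sooner but alarms more often on ordinary days, which is the false-alarm vs. detection-speed trade-off the deck raises.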
Data
Nasal symptoms are unrelated to anthrax
Focus on cough meds (daily) and tissues, OJ and soup (basket-level)
Baseline
Data indicate a seasonal effect in overall sales and include flu cases
Assume cough meds are insensitive to promotions
Smoothing methods applied
Estimate baseline variability
False alarms near holidays for all methods
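The slides do not say which smoothing methods were applied. As one hedged illustration of the baseline step, a trailing moving average can serve as the smoothed baseline, with the spread of the residuals as the baseline variability. The 7-day window and the function names are assumptions.

```python
import statistics

def moving_average(series, window=7):
    """Trailing moving average; early entries average whatever history exists."""
    out = []
    for t in range(len(series)):
        start = max(0, t - window + 1)
        out.append(statistics.fmean(series[start:t + 1]))
    return out

def baseline_variability(series, window=7):
    """Return the smoothed baseline and the sample SD of residuals around it."""
    baseline = moving_average(series, window)
    residuals = [x - b for x, b in zip(series, baseline)]
    return baseline, statistics.stdev(residuals)

if __name__ == "__main__":
    daily_sales = [120, 118, 125, 130, 122, 119, 140, 121, 117, 126]  # made-up numbers
    baseline, sd = baseline_variability(daily_sales)
    print([round(b, 1) for b in baseline], round(sd, 2))
```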
Simulation
Epidemiologist opinions on how anthrax is manifested in cough medication sales
Sales increase linearly over a 3-day period
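A linear 3-day footprint of this kind can be injected into a baseline series as sketched below. The multiplicative form and the `peak_factor` parameter (the sales multiplier reached on the last day) are assumptions for illustration; the slides state only that sales increase linearly over 3 days.

```python
def inject_footprint(baseline, start, peak_factor, days=3):
    """Return a copy of `baseline` with a linear ramp applied from index `start`.

    On day i (1..days) of the footprint, sales are multiplied by
    1 + (peak_factor - 1) * i / days, so the final day reaches peak_factor.
    """
    series = list(baseline)
    for i in range(1, days + 1):
        t = start + i - 1
        if t < len(series):                      # ignore days past the end of the data
            series[t] = series[t] * (1 + (peak_factor - 1) * i / days)
    return series

if __name__ == "__main__":
    flat = [100.0] * 6
    print(inject_footprint(flat, start=2, peak_factor=1.36))
```

On a flat baseline of 100 units, a peak factor of 1.36 ramps the three outbreak days to approximately 112, 124, and 136.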
Detection System Formulation
Detail from reference [12]
Clean data
– Preprocess: account for store-level sales
– Filter/de-noise: decompose series into cosine waves, retain those with large magnitudes
Forecast via wavelet approach
– Efficient and tractable for non-stationary series
– Autoregressive moving average model is not flexible to data type; user intervention required
– Decompose series into resolutions of different frequencies
– For each resolution, use an autoregressive model to forecast the next point
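The de-noising idea (decompose into cosine waves, keep the large ones) can be illustrated with a plain discrete Fourier transform. This is a stand-in chosen for brevity, not the transform used in reference [12], and it does not cover the wavelet forecasting step. The function names and the `keep` parameter are assumptions.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform, adequate for short series."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def denoise(x, keep):
    """Keep the `keep` largest-magnitude frequencies (plus mirror bins) and invert."""
    n = len(x)
    spectrum = dft(x)
    half = range(n // 2 + 1)                     # non-redundant bins of a real series
    strongest = sorted(half, key=lambda k: abs(spectrum[k]), reverse=True)[:keep]
    kept = set()
    for k in strongest:
        kept.add(k)
        kept.add((n - k) % n)                    # mirror bin keeps the result real
    filtered = [spectrum[k] if k in kept else 0 for k in range(n)]
    return [sum(filtered[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

if __name__ == "__main__":
    n = 16
    clean = [10 + 5 * math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
    noisy = [c + 0.5 * math.cos(2 * math.pi * 7 * t / n) for t, c in enumerate(clean)]
    smooth = denoise(noisy, keep=2)
    print(max(abs(s - c) for s, c in zip(smooth, clean)))
```

In the usage example the series is a level plus one strong cosine and one weak cosine; keeping the two largest components recovers the level and the strong cosine and drops the weak one.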
Detection System (cont.)
Threshold for next-day forecasts
– Control-chart-type argument to determine anthrax-related variability
– Alarm if true sales are more than 3 standard deviations above the de-noised series prediction
Basket-level (50 products, 200k-500k/week)
– Method of association rules: pairs and triplets
– Threshold: most unexpected combinations
Evaluation
– Simulate anthrax footprint as a 3-day spike with a linearly increasing pattern
– Study different configurations of the system
– If the scale of the footprint increases cough sales by a factor of 1.36 or more, 100% of footprints are detected
– Outbreaks coinciding with holidays are problematic
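For the basket-level step on this slide, the slides do not define how "unexpected" a combination is. One common association-rule measure is lift, the ratio of a pair's observed co-occurrence rate to the rate expected if the items were bought independently. The sketch below ranks item pairs by lift; the choice of lift and the function names are assumptions, and triplets are omitted.

```python
from collections import Counter
from itertools import combinations

def pair_lift(baskets):
    """Lift of every item pair that co-occurs in at least one basket."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))              # de-duplicate and fix pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    lift = {}
    for (a, b), count in pair_counts.items():
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        lift[(a, b)] = (count / n) / expected
    return lift

def rank_pairs(baskets, top=3):
    """Pairs sorted from most to least unexpected (highest lift first)."""
    lift = pair_lift(baskets)
    return sorted(lift.items(), key=lambda kv: kv[1], reverse=True)[:top]

if __name__ == "__main__":
    baskets = [["oj", "soup"], ["oj", "soup"], ["tissues"], ["oj"]]  # made-up baskets
    print(rank_pairs(baskets))
```

A lift above 1 means the pair appears together more often than independence would predict; in a monitoring setting one would compare current lift against a no-outbreak period rather than read it in isolation.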
Combining Data Sources: Benefits and Challenges
Data Linkage
Linking data from multiple sources requires system-wide unique identifiers or variables for record linkage
Linkage methods use match features or string distances
– Need extensions that link multiple lists and allow for missing identifiers
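String-distance linkage that tolerates missing identifiers can be sketched with the standard library's `difflib.SequenceMatcher`. The record layout (dictionaries with `name` and `zip` fields), the 0.85 threshold, and the rule of averaging only over fields present in both records are assumptions for illustration. The sketch links two lists only, not the multi-list extension the slide calls for.

```python
from difflib import SequenceMatcher

def field_similarity(a, b):
    """Similarity in [0, 1], or None when either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a, rec_b, fields):
    """Average similarity over the fields that are present in both records."""
    scores = []
    for field in fields:
        s = field_similarity(rec_a.get(field), rec_b.get(field))
        if s is not None:
            scores.append(s)
    return sum(scores) / len(scores) if scores else 0.0

def link(list_a, list_b, fields, threshold=0.85):
    """Pair each record in list_a with its best match in list_b at or above `threshold`."""
    links = []
    for i, rec_a in enumerate(list_a):
        best_j, best = None, threshold
        for j, rec_b in enumerate(list_b):
            score = match_score(rec_a, rec_b, fields)
            if score >= best:
                best_j, best = j, score
        if best_j is not None:
            links.append((i, best_j, best))
    return links

if __name__ == "__main__":
    clinic = [{"name": "Jon Smith", "zip": "15213"}, {"name": "Ann Lee", "zip": None}]
    pharmacy = [{"name": "Ann Lee", "zip": "15217"}, {"name": "Jon Smyth", "zip": "15213"}]
    print(link(clinic, pharmacy, ["name", "zip"]))
```

Skipping missing fields keeps a record linkable, but a match resting on a single field is weaker evidence, which is one source of the measurement error from record linkage noted under multivariate modeling.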
Approaches to Using Multiple Data Sources
Independently and simultaneously monitor separate sources
– Multiple testing inflates false alarm rate
Track different series intensively but sequentially
– Alarms trigger further data collection and analyses of other series (Univ. of Utah – flu)
– Hierarchical signaling
Multivariate modeling
– Use merged records for individuals or families
– Measurement error from record linkage
– Privacy and confidentiality concerns
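The inflation of the false alarm rate under the first approach (monitoring separate sources independently) follows from basic probability: if each of k independent series alarms falsely with probability alpha per period, the chance that at least one alarms is 1 - (1 - alpha)^k. The sketch below computes this and a Bonferroni-style per-series rate. Independence between series is an assumption, and sales series are likely correlated in practice.

```python
def family_false_alarm_rate(alpha, k):
    """Probability that at least one of k independent series raises a false alarm."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(target, k):
    """Per-series false-alarm rate that keeps the overall rate at or below `target`."""
    return target / k

if __name__ == "__main__":
    for k in (1, 4, 10):
        print(k, round(family_false_alarm_rate(0.05, k), 4))
    print(bonferroni_alpha(0.05, 4))
```

With a 5% per-series rate, four independent series already give an overall false alarm probability of about 18.5% per period.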
Privacy and Confidentiality Issues
Health Insurance Portability and Accountability Act (HIPAA) restrictions
– Permits de-identified data for research
– Medical and public health organizations may be exempt
Private commercial interests
– Concern over information in grocery and OTC sales data
Integrated data concerns
– Linking across databases may pose more risks in exposing confidential information
Summary: Questions for Consideration
What signals of an outbreak do non-traditional data carry, and how?
How can we efficiently and accurately integrate and analyze data from multiple sources?
How can we effectively aggregate data temporally or spatially?
How can we use geographic detail to control excessive false alarms?
Can merged files be useful for detection without allowing re-identification and linkage to source?
Is a risk-utility trade-off tolerable?
Can a trusted third party update files in real time, separately from the detection system?