Post on 28-Jun-2020
Medical Social Web and Event Detection
Dr. Kerstin Denecke
Agenda
• The Medical Social Web
• Event Detection
• Public Health Event Detection
• Evaluation of Event Detection Systems
• Overview of the papers
08/12/10 Kerstin Denecke
The Medical Social Web – What is that?
Forum
Weblogs
Open Access Media
Multi Media Content
Example tweet (translated from German): "@GoethesMatrix My university's tap water. We had the norovirus here recently, maybe that's a symptom?"
Challenges of Social Media Data
• Huge amount of data available
– Irrelevant vs. relevant information
• Use of specific language
– Medical language vs. consumer health vocabulary
– Common language
Irrelevant: "Doctors have concluded that a body temperature above 102 does indeed mean that you have Bieber Fever."
Relevant: "I got a fever sore throat and headache"
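The two example messages above show why matching the word "fever" alone is not enough. A minimal sketch of a keyword filter with an exclusion list; the term lists `SYMPTOM_TERMS` and `EXCLUDE_PHRASES` are hypothetical and hand-picked for these two examples, not a vetted vocabulary:

```python
import re

# Hypothetical term lists, chosen only to illustrate the idea.
SYMPTOM_TERMS = {"fever", "sore throat", "headache", "cough"}
EXCLUDE_PHRASES = {"bieber fever"}  # known non-medical uses of symptom words


def is_relevant(message: str) -> bool:
    """Return True if the message mentions a symptom outside an excluded phrase."""
    text = message.lower()
    # Blank out excluded phrases so the symptom words inside them no longer match.
    for phrase in EXCLUDE_PHRASES:
        text = text.replace(phrase, " ")
    # Whole-word match for each symptom term.
    return any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in SYMPTOM_TERMS)


if __name__ == "__main__":
    print(is_relevant("I got a fever sore throat and headache"))  # True
    print(is_relevant("... mean that you have Bieber Fever."))    # False
```

Exclusion lists are brittle (every new non-medical idiom needs its own entry), which is one reason learned classifiers are used for filtering.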
Challenges of Social Media Data
• Subjective content
– Information vs. opinion
• Different styles of writing, noise
– Abbreviations, writing errors, emoticons, ...
Asthma Problem Due to Allergy: A common indoor environmental asthma trigger is the mold that might be present in...
Thankful dat I got no ailments other den arthritis. Sum ppl got asthma , cancer, aids, badbreath, fatness, etc.
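Noisy spellings such as "dat", "den", "sum" and "ppl" in the example above are often mapped to standard forms before any matching. A minimal sketch, assuming a small hypothetical lookup table (`LEXICON`) rather than a real consumer-health or slang resource:

```python
import re

# Hypothetical normalization lexicon; a real system needs a large, curated list.
LEXICON = {
    "dat": "that",
    "den": "than",
    "sum": "some",
    "ppl": "people",
    "badbreath": "bad breath",
}


def normalize(text: str) -> str:
    """Lower-case the text, split off punctuation and replace known noisy tokens."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return " ".join(LEXICON.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    print(normalize("Thankful dat I got no ailments other den arthritis."))
    # thankful that i got no ailments other than arthritis .
```

Because the lookup ignores context, "sum" is rewritten even where it really means a sum; practical normalizers need context or a much richer lexicon.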
Event Detection
• Definition / Problem Statement
• Overview of approaches
• Public health event detection: challenges
Where is event detection important?
From: D.B. Neill, W.-K. Wong: Tutorial on Event Detection, KDD 2009
Video Surveillance
Events: A journalist's perspective
1. What happened?
2. When did the event occur?
3. Who was involved?
4. Where did it take place?
5. How did it happen?
6. What is its impact, significance, consequence?
What is an event?
"An event is a specific thing that happens at a specific time and place along with all necessary preconditions and unavoidable consequences."
C. Cieri et al.: Corpora for topic detection and tracking. In: Topic detection and tracking, 2002, pp. 33-66
Retrospective events: previously unidentified events from an accumulated historical collection
New events: new events detected from live feeds in real time
Rare vs. frequent events
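For new (online) event detection, a common baseline in the TDT literature compares each incoming document with all earlier ones and flags it as the first story of a new event when it is not similar enough to any of them. A minimal sketch, assuming whitespace tokenization, raw term-frequency vectors, cosine similarity and a hypothetical threshold of 0.2:

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def detect_new_events(stream, threshold=0.2):
    """Online new event detection over a stream of documents.

    Each document is compared with every earlier document; if its highest
    similarity stays below `threshold`, it is flagged as a new event.
    """
    seen = []
    flags = []
    for doc in stream:
        vec = Counter(doc.lower().split())
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags


if __name__ == "__main__":
    feed = [
        "cholera outbreak reported in haiti",
        "cholera outbreak in haiti spreads",
        "norovirus cases at university",
    ]
    print(detect_new_events(feed))  # [True, False, True]
```

Retrospective detection would instead cluster the complete historical collection at once, since all documents are available up front.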
Goals of event detection
• Identify if an event of interest has occurred
• Characterize the event
• Pinpoint the affected subgroup of the data, i.e.:
– What features describe the event (e.g., spatial area, time duration)?
– What is the severity/magnitude of the event?
• Detect as accurately as possible
• Detect as early as possible
PAHO EOC Situation Report #15 - Cholera Outbreak in Haiti
Event Detection Approaches
• Approaches from the natural language processing community
– Message Understanding Conferences (MUC)
– Topic Detection and Tracking (TDT)
– Automatic Content Extraction (ACE)
• Approaches from the data mining community
– Classification
– Clustering
Natural Language Processing Approaches
• Message Understanding Conferences (MUC)
– Information extraction, template filling
• Topic Detection and Tracking (TDT)
– Issues related to detecting and tracking events in broadcast news
– An event refers to a topic
• Automatic Content Extraction (ACE)
– An event is an n-ary relation, binding entities of a given type together by an explicit and named concept
– Defines various types of events, such as marriage, death, merger
→ Pattern- or rule-based approaches
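A pattern-based approach in the MUC tradition fills a fixed template from text matched by hand-written rules. A minimal sketch, assuming a single hypothetical regular expression (`OUTBREAK_PATTERN`) for sentences like "12 cases of cholera were reported in Haiti"; real systems combine many patterns with named-entity recognition:

```python
import re
from typing import Optional

# Hypothetical hand-written rule: "<count> [new] cases of <disease> ... in <Location>".
OUTBREAK_PATTERN = re.compile(
    r"(?P<count>\d+)\s+(?:new\s+)?cases\s+of\s+(?P<disease>[a-z]+)"
    r".*?\bin\s+(?P<location>[A-Z][a-z]+)"
)


def fill_template(sentence: str) -> Optional[dict]:
    """Fill a simple outbreak template (disease, case count, location) or return None."""
    match = OUTBREAK_PATTERN.search(sentence)
    if match is None:
        return None
    return {
        "disease": match.group("disease"),
        "cases": int(match.group("count")),
        "location": match.group("location"),
    }


if __name__ == "__main__":
    print(fill_template("12 cases of cholera were reported in Haiti"))
    # {'disease': 'cholera', 'cases': 12, 'location': 'Haiti'}
```

Such rules are precise on the phrasings they anticipate but miss everything else, which is the usual motivation for the learned approaches that follow.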
Event Detection Approaches
• Approaches from the data mining community
– Assumption: a document either is or is not part of a document set that describes an event
– Classification
– Clustering
→ Machine learning: learning characteristics from feature sets
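The classification view can be sketched with a multinomial Naive Bayes classifier that learns which words separate event-related documents from others. This is one illustrative choice of learner, not one named on the slide; the labels "event"/"other", whitespace tokenization and the toy training documents are assumptions:

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Train a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    word_counts = defaultdict(Counter)  # label -> token counts
    label_counts = Counter(labels)      # label -> number of documents
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_nb(model, doc):
    """Return the label with the highest log posterior for `doc`."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_tokens = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in doc.lower().split():
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    docs = ["i have a fever and a cough", "flu symptoms sore throat fever",
            "bieber fever concert tonight", "new song from bieber"]
    labels = ["event", "event", "other", "other"]
    model = train_nb(docs, labels)
    print(predict_nb(model, "sore throat and cough"))  # event
    print(predict_nb(model, "bieber concert"))         # other
```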
Event Detection vs. Information Retrieval
Information retrieval
- Relies on a user-defined query to specify what is "interesting"
- Finds documents that satisfy an information request
Event detection (in particular: new event detection)
- No knowledge of what events will happen → no query can be specified in advance
- Might look for clues in relevant information sources
Public Health Event Detection
• Definitions
• Overview of approaches
• Current challenges
Tens of thousands of people in Haiti are threatened by a recent Cholera outbreak despite the UN insisting that the endemic is stabilising.
Definitions
Disease Surveillance
– Epidemiological practice by which the spread of disease is monitored in order to establish patterns of progression
Epidemic Intelligence
– Complements traditional surveillance systems by incorporating new official and unofficial sources of structured and unstructured information
A public health event involves a disease occurrence or death above expected levels for the specific disease at a particular time and place.
An indicator is an epidemiological quantity that is based on clearly defined events, usually cases of a disease or an infection according to a case definition, and that is reported in a standardized way.
A signal is generated by a system based on observed data sources whenever a predefined or computed threshold for an indicator is exceeded. It can therefore be considered a hint of a possible public health event.
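As a worked illustration of a computed threshold, assume (this is not specified on the slide) that the indicator value x_t on day t is compared with the mean and standard deviation of the preceding n days, using a user-chosen multiplier k:

```latex
\mu_t = \frac{1}{n}\sum_{i=1}^{n} x_{t-i}, \qquad
\sigma_t = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{t-i}-\mu_t\right)^2}, \qquad
\mathrm{signal}_t =
\begin{cases}
1 & \text{if } x_t > \mu_t + k\,\sigma_t \\
0 & \text{otherwise}
\end{cases}
```

A larger k yields fewer signals (higher specificity) at the cost of missing smaller outbreaks (lower sensitivity).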
Motivation
From: D.B. Neill, W.-K. Wong: Tutorial on Event Detection, KDD 2009
Approaches to Disease Surveillance
Event-based Surveillance
= the organized and rapid capture of information about events that are a potential risk to public health
• Information can be rumors or ad-hoc reports transmitted through formal and informal channels
• Systems rely on the immediate reporting of events and are designed to detect:
– Rare and new events that are not specifically included in indicator-based surveillance
– Events that occur in populations which do not access health care through formal channels
Producing signals for a public health event
Norovirus example
Norovirus outbreak (real event)
Indicator (multiple instances where the term "norovirus" is mentioned)
Signal (number of mentions exceeds a specific threshold)
Event detection (public health event)
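The norovirus example can be sketched end to end: count the daily messages that mention the term (the indicator) and raise a signal when a day's count exceeds the mean plus k standard deviations of the preceding days. The input format (day index, text), the 7-day baseline and k = 2 are assumptions for illustration:

```python
import statistics


def daily_mentions(messages, n_days, term="norovirus"):
    """Indicator: number of messages per day that mention `term`.

    `messages` is an iterable of (day_index, text) pairs with 0 <= day_index < n_days.
    """
    counts = [0] * n_days
    for day, text in messages:
        if term in text.lower():
            counts[day] += 1
    return counts


def signals(series, baseline=7, k=2.0):
    """Days whose count exceeds mean + k * SD of the preceding `baseline` days.

    `baseline` must be at least 2 so that a sample standard deviation exists.
    """
    flagged = []
    for t in range(baseline, len(series)):
        window = series[t - baseline:t]
        threshold = statistics.mean(window) + k * statistics.stdev(window)
        if series[t] > threshold:
            flagged.append(t)
    return flagged


if __name__ == "__main__":
    counts = [2, 3, 2, 3, 2, 3, 2, 10]
    print(signals(counts))  # [7]
```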
Approaches to event-based EI
• Pattern matching
• Handcrafted rules
• Machine learning
• Automatically created patterns
Existing Event-based Surveillance Systems
Current Challenges in EI
• Technological challenges
– Considering many data sources
– Dealing with noise
– High specificity and sensitivity
• Epidemiological challenges
– Detection time
– Emerging and unknown diseases
Epidemiological Challenges: Detection Time
Figure: Time to event detection for different EI methods, measured from event occurrence along a time axis t (marks at 2 hours, 1 day and 3 days). Methods compared:
• Existing event-based services: MedISys, ProMED-mail, news, publications, government websites
• Established surveillance systems: SurvNet, ARS, sentinels
• Web 2.0 and user-generated sources
Evaluation of Event Detection Systems
• General Approach
– Training set annotated with events
– Test set
– 10-fold cross-validation
– Measures: precision, recall, F-score
→ Annotated corpora are only available for events in specific domains (news)
→ Creating annotated data sets is time-consuming
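The general evaluation approach above can be sketched in plain Python: precision, recall and F-score over sets of detected vs. annotated events, plus index splits for 10-fold cross-validation. Representing events as hashable identifiers and assigning items to folds round-robin without shuffling are assumptions of this sketch:

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and balanced F-score for detected vs. annotated event sets."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Items are assigned to folds round-robin (item i goes to fold i mod k).
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    print(precision_recall_f1({1, 2, 3, 4}, {2, 3, 5}))  # P = 0.5, R = 2/3, F1 = 4/7
```

In practice the items are usually shuffled before being assigned to folds so that each fold is representative of the whole corpus.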
Evaluation
• TREC Entity Track
– Goal: perform entity-related search on Web data
– http://ilps.science.uva.nl/trec-entity/
• TRECVid Surveillance Event Detection
– Goal: support the development of technologies to detect visual events (people engaged in particular activities) in a large collection of streaming video data
– Training data: 150 hours of multi-camera airport surveillance domain data
– Test corpus
– http://www.itl.nist.gov/iad/mig//tests/trecvid/2010/
Evaluation: Challenges for public health
General challenges
• Active feedback from users is necessary
• Time-consuming
• Component evaluation vs. system evaluation
Specific challenges
• No annotated data set for medical social media data
• Sensitivity vs. specificity
• Comparison of various systems
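Sensitivity and specificity trade off against each other through the alarm threshold. A minimal day-level sketch, assuming per-day boolean alarms, per-day ground-truth outbreak labels and numeric alarm scores; `sweep` is a hypothetical helper showing how raising the threshold moves a system from high sensitivity towards high specificity:

```python
def sensitivity_specificity(alarms, outbreak):
    """Day-level sensitivity and specificity.

    `alarms` and `outbreak` are equal-length lists of booleans: whether the
    system raised an alarm on a day and whether an outbreak was truly ongoing.
    """
    pairs = list(zip(alarms, outbreak))
    tp = sum(a and o for a, o in pairs)
    fn = sum((not a) and o for a, o in pairs)
    tn = sum((not a) and (not o) for a, o in pairs)
    fp = sum(a and (not o) for a, o in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def sweep(scores, outbreak, thresholds):
    """(threshold, sensitivity, specificity) for each candidate alarm threshold."""
    return [(th, *sensitivity_specificity([s > th for s in scores], outbreak))
            for th in thresholds]


if __name__ == "__main__":
    for row in sweep([0.9, 0.2, 0.6, 0.1], [True, False, True, False], [0.0, 0.5, 0.95]):
        print(row)
    # (0.0, 1.0, 0.0)
    # (0.5, 1.0, 1.0)
    # (0.95, 0.0, 1.0)
```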
Overview of the papers
1. Statistical Challenges Facing Early Outbreak Detection in Biosurveillance (G. Shmueli, H. Burkom)
2. Detecting influenza outbreaks by analysing Twitter messages (A. Culotta)
3. What's unusual in online disease outbreak news? (N. Collier)
Topics:
– Public health event detection
– Approaches for social media data
– Evaluation of detection systems
Statistical Challenges Facing Early Outbreak Detection in Biosurveillance (G. Shmueli, H. Burkom)
• Topic: Challenges for applying statistical methods from indicator-based surveillance to new data
• Key task: combine new data sources with traditional ones to classify situational awareness
• Challenges:
– Modeling of underlying background knowledge
– Nature of an outbreak
– Evaluation of performance
– Requirements and uses of biosurveillance systems
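The "nature of an outbreak" matters because a gradual rise may never cross a single-day threshold. As an illustration (a standard control-chart technique, not necessarily the method discussed in the paper), a one-sided CUSUM accumulates small excesses over the expected background; the known baseline mean and SD and the parameters k = 0.5 and h = 4 are assumptions:

```python
def cusum(series, mean, sd, k=0.5, h=4.0):
    """One-sided (upper) CUSUM on standardized counts.

    Returns the indices at which the cumulative statistic exceeds `h`;
    the statistic is reset to zero after each alarm.
    """
    s = 0.0
    alarms = []
    for t, x in enumerate(series):
        z = (x - mean) / sd      # standardized excess over the background
        s = max(0.0, s + z - k)  # accumulate only excesses larger than k
        if s > h:
            alarms.append(t)
            s = 0.0
    return alarms


if __name__ == "__main__":
    counts = [10, 10, 10, 12, 12, 12, 12, 12]
    print(cusum(counts, mean=10, sd=1))  # [5]
```

With these inputs a count of 12 never exceeds mean + 3 SD = 13 on any single day, yet the CUSUM alarms on the third elevated day. Estimating `mean` and `sd` realistically (seasonality, day-of-week effects) is the background-modeling challenge listed above.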
Detecting influenza outbreaks by analysing Twitter messages (A. Culotta)
• Topic: Use of Twitter data to forecast future influenza rates
• Approach: Simple keyword matching
• Shows correlation with U.S. national statistics on disease outbreaks
• Problems and challenges:
– False positives: "Bieber fever", "flu vaccines"
• Classification approach for filtering
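The correlation analysis can be sketched as the weekly fraction of messages containing a flu keyword, compared with an official weekly rate via the Pearson correlation coefficient. This is a simplified illustration under assumptions (substring matching, a single keyword, messages already grouped by week), not Culotta's exact setup:

```python
import math


def keyword_fraction(weekly_messages, keywords=("flu",)):
    """Fraction of each week's messages that contain at least one keyword."""
    fractions = []
    for messages in weekly_messages:
        hits = sum(any(kw in m.lower() for kw in keywords) for m in messages)
        fractions.append(hits / len(messages) if messages else 0.0)
    return fractions


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    weeks = [["I have the flu", "nice day"], ["flu again", "flu shot", "hello", "x"]]
    print(keyword_fraction(weeks))  # [0.5, 0.5]
```

Substring matching also counts words such as "influence", one more source of false positives alongside "Bieber fever".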
What's unusual in online disease outbreak news? (N. Collier)
• Topic: Use of online news to support early alerting
• BioCaster system: text mining system for monitoring global online media
• Analysis of various statistical methods for signal generation
• Challenges and problems
• Conclusion: it is non-trivial to relate news counts to the actual number of cases
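As one example of a statistical signal-generation method that can be applied to daily news counts (chosen here for illustration; the slide does not list the methods compared), an EWMA control chart smooths the counts and alarms when the smoothed value exceeds an upper control limit. The 7-day baseline, smoothing weight lam = 0.3 and limit width L = 3 are assumptions:

```python
import statistics


def ewma_alarms(counts, lam=0.3, baseline=7, L=3.0):
    """EWMA control chart on daily news counts.

    The first `baseline` days estimate the in-control mean and SD; afterwards
    an alarm is raised whenever the smoothed statistic exceeds the asymptotic
    upper limit mean + L * sd * sqrt(lam / (2 - lam)).
    """
    mean = statistics.mean(counts[:baseline])
    sd = statistics.stdev(counts[:baseline])
    limit = mean + L * sd * (lam / (2 - lam)) ** 0.5
    z = mean  # start the smoothed statistic at the baseline mean
    alarms = []
    for t in range(baseline, len(counts)):
        z = lam * counts[t] + (1 - lam) * z
        if z > limit:
            alarms.append(t)
    return alarms


if __name__ == "__main__":
    print(ewma_alarms([5, 6, 5, 6, 5, 6, 5, 5, 20, 20]))  # [8, 9]
```

Even with an alarm, the conclusion above still applies: a jump in news counts says little about the actual number of cases.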
References
• Chinchor N (ed.): Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998, http://www.aclweb.org/anthology/M/M98/
• Doddington G, Mitchell A, Przybocki M, et al.: The Automatic Content Extraction (ACE) Program: Tasks, Data, and Evaluation. LREC 2004, Lisbon, Portugal
• Fiscus J, Doddington G: Topic Detection and Tracking Evaluation Overview. In: Topic Detection and Tracking: Event-based Information Organization. 2002
• Zhao Q, Mitra P: Event Detection and Visualization for Social Text Streams. ICWSM 2007, Boulder, Colorado, USA
Thank you for your attention!
Dr. Kerstin Denecke
Forschungszentrum L3S
denecke@L3S.de