Influenza A(H1N1)
Executive Summary: Natural Language Processing of Twitter
#swineflu Posts using the Semantic MEDLINE Prototype
Dr. Alla Keselman, Dr. Thomas Rindflesch, David Hale
National Library of Medicine, National Institutes of Health,
Department of Health and Human Services
May 2009
http://twitter.com/CDCemergency
H1N1 information via Twitter: Communication issues
• Information receivers
  – Information overload
    • >12,000 #swineflu (H1N1) posts/hour @ peak
  – Signal:Noise ratio
    • Quality?
    • Authority?
      – Twitter accounts impersonating CDC
• Information providers
  – Effective information provision
  – Biosurveillance
(un)Controlled Vocabulary
• Folksonomy
• Hashtags (#)
• Grammar
• Abbreviations
  – SRSLY IMO ROI 4 RT? YMMV
• High context
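One way to prepare such posts for NLP software is to pull out hashtags and expand known abbreviations first. The following is a minimal sketch of that idea, not the method used in this work; the abbreviation table and the function names are hypothetical and exist only for illustration.

```python
import re

# Hypothetical abbreviation table; the expansions are illustrative,
# not a vetted vocabulary.
ABBREVIATIONS = {
    "srsly": "seriously",
    "imo": "in my opinion",
    "rt": "retweet",
    "ymmv": "your mileage may vary",
}

HASHTAG = re.compile(r"#(\w+)")
WORD = re.compile(r"\b\w+\b")


def extract_hashtags(text):
    """Return the hashtags in a post, lowercased and without the '#'."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def expand_abbreviations(text):
    """Replace each word found in ABBREVIATIONS with its expansion."""
    def swap(match):
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)
    return WORD.sub(swap, text)


print(extract_hashtags("RT #SwineFlu in Mexico #H1N1"))
print(expand_abbreviations("SRSLY wear a mask IMO #swineflu"))
```

Ambiguous tokens such as "4" are deliberately left out of the table, since expanding them correctly depends on context.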
#swineflu Tweets
Acquisition Challenges
• Twitter timeline
  – Storage requirements
  – Privacy
• Twitter API
  – Limited search functionality
    • Temporal and range limitations
      – Range definition limited to midnight
      – 1500 posts from limit
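To see why these limits matter at peak volume, the arithmetic can be sketched as follows. This assumes the 1500-post figure acts as a cap on the number of posts a single search can return, which is our reading of the slide rather than a documented fact; the function names are hypothetical.

```python
import math


def minutes_covered(posts_per_hour, max_posts_per_query):
    """Minutes of the stream that one capped query spans at a steady rate."""
    return max_posts_per_query / posts_per_hour * 60


def queries_needed(posts_per_hour, hours, max_posts_per_query):
    """Number of capped queries needed to cover `hours` of posts."""
    return math.ceil(posts_per_hour * hours / max_posts_per_query)


# Peak rate from the slides (>12,000 posts/hour), assumed cap of 1500 posts.
print(minutes_covered(12000, 1500))
print(queries_needed(12000, 1, 1500))
```

Under these assumptions, one query spans about 7.5 minutes of peak traffic, so roughly 8 queries per hour would be needed to avoid gaps.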
Semantic MEDLINE Prototype
• Summarizes MEDLINE citations returned by PubMed search
• Natural Language Processing (MetaMap, SemRep) used to analyze salient content in titles and abstracts
• Information presented in graph that has links to the MEDLINE text processed
• Visualize relationships, such as:
  – A is a process of B
  – X treats Y
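Relationships of this kind can be held as subject-predicate-object triples and grouped into a simple graph. The sketch below only illustrates that data structure; it is not the prototype's implementation, and the predicate labels and function names are assumptions.

```python
from collections import defaultdict

# Placeholder triples built from the two relationship patterns above.
# The predicate labels are illustrative, not verified SemRep output.
PREDICATIONS = [
    ("A", "PROCESS_OF", "B"),
    ("X", "TREATS", "Y"),
]


def build_graph(predications):
    """Group (subject, predicate, object) triples by subject."""
    graph = defaultdict(list)
    for subject, predicate, obj in predications:
        graph[subject].append((predicate, obj))
    return dict(graph)


def describe(graph):
    """Render each edge as a readable 'subject --PREDICATE--> object' string."""
    return [
        f"{subject} --{predicate}--> {obj}"
        for subject, edges in graph.items()
        for predicate, obj in edges
    ]


for line in describe(build_graph(PREDICATIONS)):
    print(line)
```

Keeping the source text alongside each triple would allow a graph edge to link back to the text it came from, as the prototype's display does.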
http://skr3.nlm.nih.gov/SemMedDemo/
Semantic processing of #swineflu Tweets
• Sample - 1267 Tweets
  – Afternoon of April 27, 2009
• No adjustments made to NLP software (MetaMap, SemRep)
  – No additional vocabulary, abbreviations, etc.
Preliminary Processing of #swineflu Tweets
Concepts in Tweets Isolated by Semantic Processing
• Disease: influenza
• Disease symptom: coughing
• Geographic area: Mexico
• Animal: family suidae
• Health care organization: Centers for Disease Control and Prevention (U.S.)
• Medical device: mask
Next Steps
• Processing of larger dataset
  – Include non-H1N1-related Tweets
• Additional vocabulary
  – Folksonomy, abbreviations, etc.
• Visualization of semantic processing results
Opportunities
• Biosurveillance
• Monitoring of widespread sentiment
• Targeted information provision
  – Respond to misinformation trends
• Evaluation of accuracy/authenticity
Links
• Semantic MEDLINE Prototype
  – http://skr3.nlm.nih.gov/SemMedDemo/
• Semantic Medline: Multi-Document Summarization and Visualization
  – http://www.nlm.nih.gov/pubs/techbull/mj07/theater_ppt/semantic.ppt
• National Library of Medicine
  – http://www.nlm.nih.gov
• National Institutes of Health
  – http://nih.gov
• Department of Health and Human Services
  – http://hhs.gov
Dr. Alla Keselman: keselmana AT mail DOT nlm DOT nih DOT gov
Dr. Thomas Rindflesch: trindflesch AT mail DOT nih DOT gov
David Hale: david DOT hale AT nih DOT gov