Executive Summary: Natural Language Processing of Twitter #swineflu (H1N1) Posts using Semantic...

Influenza A(H1N1). Executive Summary: Natural Language Processing of Twitter #swineflu Posts using the Semantic MEDLINE Prototype. Dr. Alla Keselman, Dr. Thomas Rindflesch, David Hale. National Library of Medicine, National Institutes of Health, Department of Health and Human Services. May 2009.

Natural Language Processing of Twitter #swineflu Posts using the Semantic MEDLINE Prototype at the National Library of Medicine, National Institutes of Health, U.S. Dept. of Health and Human Services

Page 1: Executive Summary: Natural Language Processing of Twitter #swineflu (H1N1) Posts using Semantic MEDLINE Prototype

Influenza A(H1N1)

Executive Summary: Natural Language Processing of Twitter #swineflu Posts using the Semantic MEDLINE Prototype

Dr. Alla Keselman, Dr. Thomas Rindflesch, David Hale

National Library of Medicine, National Institutes of Health, Department of Health and Human Services

May 2009

Page 2

http://twitter.com/CDCemergency

Page 3

H1N1 information via Twitter: Communication issues

• Information receivers
  – Information overload
    • >12,000 #swineflu (H1N1) posts/hour @ peak
  – Signal:Noise ratio
    • Quality?
    • Authority?
      – Twitter accounts impersonating CDC
• Information providers
  – Effective information provision
  – Biosurveillance

Page 4

(un)Controlled Vocabulary

• Folksonomy
• Hashtags (#)
• Grammar
• Abbreviations
  – SRSLY IMO ROI 4 RT? YMMV
• High context

Page 5

#swineflu Tweets

Page 6

Acquisition Challenges

• Twitter timeline
  – Storage requirements
  – Privacy
• Twitter API
  – Limited search functionality
    • Temporal and range limitations
      – Range definition limited to midnight
      – 1500 posts from limit
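To get a rough sense of how tight a search cap is at peak volume, the figures from the slides can be combined in a short calculation. This is only a sketch: it assumes the "1500 posts" figure is a cap on the results returned by one search query, and it reuses the peak rate of 12,000 posts/hour from the communication-issues slide. The function names and the constant-rate assumption are illustrative and do not come from the deck.

```python
import math


def window_minutes(cap_per_query: int, posts_per_hour: float) -> float:
    """Longest time window (minutes) one capped query can cover without dropping posts."""
    return 60.0 * cap_per_query / posts_per_hour


def queries_per_day(cap_per_query: int, posts_per_hour: float) -> int:
    """Number of capped queries needed to cover 24 hours at a constant posting rate."""
    return math.ceil(24 * posts_per_hour / cap_per_query)


PEAK_RATE = 12_000   # posts/hour at peak, as stated on the slides
ASSUMED_CAP = 1_500  # assumption: maximum results per search query

print(window_minutes(ASSUMED_CAP, PEAK_RATE))   # 7.5
print(queries_per_day(ASSUMED_CAP, PEAK_RATE))  # 192
```

Under these assumptions a single query covers only 7.5 minutes of peak traffic, far less than a range that can only be defined at midnight boundaries.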

Page 7

Semantic MEDLINE Prototype

• Summarizes MEDLINE citations returned by PubMed search

• Natural Language Processing (MetaMap, SemRep) used to analyze salient content in titles and abstracts

• Information presented in graph that has links to the MEDLINE text processed

• Visualize relationships, such as:
  – A is a process of B
  – X treats Y
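The relationships named above can be thought of as subject-predicate-object triples that become labeled edges in a graph. The sketch below is a minimal illustration of that idea, not the prototype's actual data model: the `Predication` type, the `build_graph` helper, and the upper-case predicate labels are all assumptions made for the example, and "A", "B", "X", "Y" are the slide's own placeholders.

```python
from collections import defaultdict
from typing import NamedTuple


class Predication(NamedTuple):
    """One semantic relationship: subject --predicate--> object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def build_graph(predications):
    """Map each subject to the list of (predicate, object) edges leaving it."""
    graph = defaultdict(list)
    for p in predications:
        graph[p.subject].append((p.predicate, p.obj))
    return dict(graph)


# Placeholder predications mirroring the two patterns on the slide.
examples = [
    Predication("A", "PROCESS_OF", "B"),
    Predication("X", "TREATS", "Y"),
]

graph = build_graph(examples)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

Keeping the source sentence alongside each triple would allow a graph edge to link back to the text it was extracted from, as the slide describes.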

Page 8

http://skr3.nlm.nih.gov/SemMedDemo/

Page 9

http://skr3.nlm.nih.gov/SemMedDemo/

Page 10

http://skr3.nlm.nih.gov/SemMedDemo/

Page 11

Semantic processing of #swineflu Tweets

• Sample - 1267 Tweets
  – Afternoon of April 27, 2009
• No adjustments made to NLP software (MetaMap, SemRep)
  – No additional vocabulary, abbreviations, etc.

Page 12

Preliminary Processing of #swineflu Tweets

Page 13

Preliminary Processing of #swineflu Tweets

Page 14

Concepts in Tweets Isolated by Semantic Processing

• Disease: influenza
• Disease symptom: coughing
• Geographic area: Mexico
• Animal: family suidae
• Health care organization: Centers for Disease Control and Prevention (U.S.)
• Medical device: mask

Page 15

Next Steps

• Processing of larger dataset
  – include non-H1N1-related Tweets
• Additional vocabulary
  – Folksonomy, abbreviations, etc.
• Visualization of semantic processing results
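One way the "additional vocabulary" step might look in practice is a small normalization pass applied to each tweet before it reaches the NLP software. The sketch below is an assumption about how such a pass could work, not something the deck describes: the lookup table, the `normalize_tweet` function, and the sample tweet are all invented for illustration, and a real vocabulary would be far larger and curated.

```python
import re

# Illustrative lookup table (assumption); covers a few abbreviations from the slides.
ABBREVIATIONS = {
    "srsly": "seriously",
    "imo": "in my opinion",
    "ymmv": "your mileage may vary",
    "rt": "retweet",
}


def normalize_tweet(text: str) -> str:
    """Strip '#' from hashtags and expand known abbreviations, case-insensitively."""
    # "#swineflu" -> "swineflu"
    text = re.sub(r"#(\w+)", r"\1", text)

    def expand(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)

    # Only whole alphabetic runs are looked up, so "art" is not mistaken for "rt".
    return re.sub(r"[A-Za-z]+", expand, text)


# Made-up sample tweet, not taken from the dataset.
print(normalize_tweet("SRSLY worried about #swineflu IMO"))
# seriously worried about swineflu in my opinion
```

A dictionary lookup like this cannot resolve ambiguous tokens such as "4" or "ROI", which depend on context; those would need more than a fixed table.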

Page 16

Opportunities

• Biosurveillance
• Monitoring of wide-spread sentiment
• Targeted information provision
  – Respond to misinformation trends
• Evaluation of accuracy/authenticity

Page 17

Links

• Semantic MEDLINE Prototype
  – http://skr3.nlm.nih.gov/SemMedDemo/
• Semantic Medline: Multi-Document Summarization and Visualization
  – http://www.nlm.nih.gov/pubs/techbull/mj07/theater_ppt/semantic.ppt
• National Library of Medicine
  – http://www.nlm.nih.gov
• National Institutes of Health
  – http://nih.gov
• Department of Health and Human Services
  – http://hhs.gov

Page 18

Dr. Alla Keselman
keselmana AT mail DOT nlm DOT nih DOT gov

Dr. Thomas Rindflesch
trindflesch AT mail DOT nih DOT gov

David Hale
david DOT hale AT nih DOT gov