Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA...

38
Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National Institute of Biomedical Innovation EMNLP2011

Transcript of Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA...

Page 1: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Twitter Catches The Flu: Detecting Influenza Epidemics

using Twitter

Eiji ARAMAKI * Sachiko MASKAWA *

Mizuki MORITA **

* The University of Tokyo** National Institute of Biomedical Innovation

EMNLP2011

Page 2: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Why we developed this system?

Let me show you several existing systems

Page 3: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Centers for Disease Control and Prevention (CDC)

Page 4: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Infection Disease Surveillance Center (IDSC)

Page 5: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

European Influenza Surveillance Network (EISN)

Page 6: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Why each country has each surveillance system?

• Influenza epidemics are a major public health concern, because it causes tens of millions of illnesses each year.

• To reduce the victims, the early detection of influenza epidemics is a national mission in every country.

• BUT: These surveillance systems basically rely on hospital reports (written manually).

Page 7: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Two Problems & Recent Approach

• (1) Small Scale – For example, IDSC gathers influenza patient data from

5,000 clinics. But It does not cover all cities (especially local cities).

• (2) Time Delay (Time lag)– For example, the data gathering process typically has a

1–2 week reporting lag

• To deal with these problems – Recently, various approaches that directly capture

people’s behavior are proposed

Page 8: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Recent Approach

• using Phone Call data– Espino et al. (2003) used data of a telephone

triage service, a public service, to give an advice to users via telephone. They reported the number of telephone calls that correlates with influenza epidemics.

• using Drug sale data– Magruder (2003) used the amount drug sales.

Among various approaches…

Page 9: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

The State-of-the-ArtWeb based Approach

• Ginsberg et al. (Nature 2009) used Google web search queries that correlate with an influenza epidemic, such as “flu”, “fever”.

• Polgreen et al. (2008) used a Yahoo! query log.• Hulth et al. (2009) used a query log of a

Switzerland web search engine.

Page 10: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

This Study

• Web search query is a extremely large scale and real-time data resource.

• BUT: the query data is closed (not freely available), which is available only for several companies, such as Google, Yahoo, or Microsoft.

→ This study examines Twitter data, which is widely available.

Page 11: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

OUTLINE• Background

• Objective

• Method

• Experiment

• Discussion

• Conclusion

Detailed Task Definition

Detailed Task Definition

Page 12: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Simple Word Frequency in Twitter“Cold”, “Fever” & “influenza”

Winter Summer

Simple Word Frequency contains various noisesBecause….

Actual influenza curve is more smooth

Page 13: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Negative Influenza TweetNegative Influenza Tweet

Positive Influenza TweetPositive Influenza Tweet

A word “influenza” does not always indicate an influenza patient

Page 14: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Two types of Influenza Tweets

• Negative influenza tweet indicates an influenza patient

• Negative influenza tweet includes mention of “influenza”, but does not

indicate that an influenza patient is present

• Not only the general news, but also various phenomena generate Negative influenza tweet…

Negative Influenza TweetNegative Influenza Tweet

Positive Influenza TweetPositive Influenza Tweet

Page 15: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Various Negative Influenza Tweet (1/2)

• Prevention – You need to get a influenza shot sometime

soon.

• Modality (just suspition)– @John might be suffering from influenza

• Question– Did you catch the influenza ?

Page 16: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Various Negative Influenza Tweet (2/2)

• Influenza of Cat or Dog– Today, I couldn't go

home late. My cat caught the influenza...

• Influenza of TV Character – In the last episode of

that TV Series, Ritsu-chan caught the flu

Page 17: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Research Questions• In total, half of Influenza related tweets are negative,

motivating an automatic filtering.

• RQ1: Could a NLP system filter out the negative influenza tweet?

• RQ2: Could this filtering contributes to the surveillance accuracy?

Page 18: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

OUTLINE• Background

• Method

• Experiment

• Discussion

• Conclusion

Page 19: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Basic Idea: Binary Classification• We regard this task as a binary classification task , such as a spam mail

filtering

PositivePositiveNegativeNegative

Training Corpus

Training Corpus

(2) What kind of Feature?

(3) What kind of Machine Learning Method?

(1) What kind of Corpus?

inputinput

Page 20: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

See proceeding for detailed Average Annotator Agreement Ratio = 0.85

Corpus (5k Sentences with Labels)

Page 21: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

What kind of Feature?

I think the influenza is going aroundR1 R2 R3L1L2L3

• Surrounding Words (BOW, no stemming, no POS)

• Among various settings, Window size = 6 achieved the highest accuracy

Twitter contains many ungrammatical

expressions

Twitter contains many ungrammatical

expressions

Page 22: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

What kind of Machine Learning Method?

Classifier F-Measure TimeAdaBoost 0.592 40.192Bagging 0.739 530.310Decision Tree 0.698 239.446Logistic Regression 0.729 696.704Nearest Neighbor 0.695 22.441Random Forest 0.729 38.683SVM (polynomial; d=2) 0.738 92.723

• Among various settings, SVM achieved the feasible accuracy

Page 23: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

OUTLINE• Background

• Method

• Experiment

• Discussion

• Objective

Page 24: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Twitter Data (2008-2010)

• First month is used for training corpus• We divides the other data into 4 seasons

– Twitter API sometimes changes the spec, leading to dropout periods.

Season ISeason I Season IISeason II Season IIISeason III Season IV

Season IV

Page 25: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Method Comparison & Evaluation• (1) TWEET-SVM (The proposed method)• (2) TWEET-RAW

– Based on simple word frequency of “influenza”• (3) GOOGLE [Ginsberg 2009]

– Based on Google web-search query– The previous estimation data is available at the Google Flu Trend

website.

• (4) DRUG-SALE [Magruder 2003]• Evaluation is based on – Average Correlation with GOLD_STANDARD DATA that

is the real number of the influenza patients reported by Infection Disease Surveillance Center (IDSC)

Page 26: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Result: Correlation Ratio

TWEET-RAW TWEET-SVM GOOGLE DRUG

Season I 0.683 0.816 0.817 -0.208

Season II -0.009 -0.018 0.232 0.406

Season III 0.382 0.474 0.881 0.684

Season IV 0.390 0.957 0.976 0.130Bold indicates the correlation > statistical significance level.

In most seasons, the proposed method achieved the higher correlation than simple word freq-based method, demonstrating the advantage of the SVM based filtering

In most seasons, the proposed method achieved the higher correlation than simple word freq-based method, demonstrating the advantage of the SVM based filtering

+SVM

Page 27: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Result: Correlation Ratio

TWEET-RAW TWEET-SVM GOOGLE DRUG

Season I 0.683 0.816 0.817 -0.208

Season II -0.009 -0.018 0.232 0.406

Season III 0.382 0.474 0.881 0.684

Season IV 0.390 0.957 0.976 0.130Bold indicates the correlation > statistical significance level.

Except for Season II, the proposed method achieved almost the same accuracy to GOOGLE.

Except for Season II, the proposed method achieved almost the same accuracy to GOOGLE.

+SVM

Page 28: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Why Twitter suffers from Season II? Because it includes Pandemic!

Suggesting Twitter might be biased by News Media

TWEET-RAW TWEET-SVM GOOGLE DRUG

Normal Season 0.831 0.890 0.847 0.308Pandemic Season 0.001 0.060 0.918 0.844

WHO says Pandemic

In 1999 Jul (Season II).

WHO says Pandemic

In 1999 Jul (Season II).

Page 29: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Season ITWEET-SVM ≒ GOOGLERelative number

Page 30: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Season IIRelative number

TWEET-SVM << GOOGLE

Page 31: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

OUTLINE• Background

• Method

• Experiment

• Discussion

• Conclusion

Extra ExperimentExtra Experiment

Page 32: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Frequent Question

• Could an Influenza Patient REALLY use a Twitter or Google Search?

• That seems to be un-natural situation!

I’d like to

sleep ...

I’d like to

sleep ...

Due to that, we modified the system assuming as follows:

People use Twitter or Google at the first sign of the influenza

People use Twitter or Google at the first sign of the influenza

Page 33: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

( Markov model)≒

Implemented by usingInfectious Model [Kermack1927]

SSSusceptibleSusceptible

II RRInfectiousInfectious RecoverRecover

Catch the flu Recover

• S-to-I transition is observed by Twitter / Google• 38% of Influenza people recover a day

0.38

0.62BEFORE FLU AFTER FLUUNDER FLU

Page 34: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

BUT: It ALSO improves Google based Approach

• This model improves correlation of BOTH Twitter & GOOGLE.

• This result suggests that there is a room of collaboration between medical study and web/NLP study

Page 35: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

OUTLINE• Background

• Method

• Experiment

• Discussion

• Conclusion

Page 36: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Answer to Research Questions• This study proposed a new influenza surveillance

system using Twitter• RQ1: Could a system filter out the negative influenza?– Yes. But NOT Perfect

• RQ2: Could this accuracy contribute to the surveillance performance?– YES. It increases the correlation (except for pandemic

period).

• We could achieve the almost same accuracy to GOOGLE using freely available data.

Page 37: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Conclusion

• Still now, more than 100 (sometime over 1,000) people die from influenza in Japan

• We hope that this study might help people

Page 38: Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji ARAMAKI * Sachiko MASKAWA * Mizuki MORITA ** * The University of Tokyo ** National.

Thank you

NLP could save a life!

Eiji ARAMAKI Ph.D.University of Tokyohttp://mednlp.jp

Eiji ARAMAKI Ph.D.University of Tokyohttp://mednlp.jp