Thesis oral defense 2015 elvis saravia

Inferring User Interests from Microblog Data through Opinion Mining. Student: Elvis Saravia. Advisor: Prof. Yi-Shin Chen. Institution: National Tsing Hua University. Program: International Master Program in Information Systems and Applications (IMPISA).

Transcript of Thesis oral defense 2015 elvis saravia

Page 1: Thesis oral defense 2015  elvis saravia

Inferring User Interests from Microblog Data through Opinion Mining

Student: Elvis Saravia Advisor: Prof. Yi-Shin Chen

Institution: National Tsing Hua University

Program: International Master Program in Information Systems and Applications (IMPISA)

Page 2: Thesis oral defense 2015  elvis saravia

Our Journey...
→ Introduction
→ Related Work
→ Objectives
→ Framework
→ Experiment & Results
→ Conclusion & Future Work
→ Q & A

Page 3: Thesis oral defense 2015  elvis saravia

Introduction
→ Rapid Growth of the Web
  ○ Web 2.0 (user-generated content)
  ○ Data generated rapidly
  ○ Social sharing platforms (Facebook & Twitter)
→ Online User-Behaviour Data
  ○ Introduced research opportunities
  ○ The most valuable asset that a company possesses

Page 4: Thesis oral defense 2015  elvis saravia

Online User Behaviors

Interests Emotions

Page 5: Thesis oral defense 2015  elvis saravia

Objectives

→ This work aims to develop a behavior-based user interests identification model.

→ The algorithms proposed combine both contextual and emotion analysis to obtain better performance on user interests extraction.

Page 6: Thesis oral defense 2015  elvis saravia

Motivation
→ Economic value
  ○ Recommendation services (dating sites & ad targeting)
  ○ Personalized systems (e-commerce & search engines)
→ Personalization
  ○ We love to be uniquely identified
  ○ Reduce extraction of ambiguous interests

“I may be exactly the same demographic as my neighbor, but that has nothing to do with what I eat.” - Lesperance VP of Digital Marketing and CRM for GrubHub

Page 7: Thesis oral defense 2015  elvis saravia

Related Work
→ Ontology [Mylonas et al. 2008] [Bakalov et al. 2009]
  ○ Search logs and contextual information to build ontology
→ Social Structure [Bao et al., WWW 2010] [Wen et al., SIGKDD 2010]
  ○ Focuses on the user social graph (friends and follows)

Page 8: Thesis oral defense 2015  elvis saravia

Related Work
→ Contextual Information [Piao et al., 2011] [Yang et al., JCIS 2012]
  ○ Natural Language Processing (NLP) and latent Dirichlet allocation (LDA)
→ Behavior-Based [Zhou et al. 2008; Xing et al., WWW 2010]
  ○ Collaborative filtering and social actions
  ○ User interactions (printing, copying and saving)

Page 9: Thesis oral defense 2015  elvis saravia

Interest Definition
→ Considerations:
  ○ Not everything we say or write interests us
  ○ Our interests shouldn’t be ambiguous
  ○ Ranking interests is challenging
→ Observations:
  ○ Interests ← Motivation ← Positive Emotions [Silvia et al., 2002]
  ○ Our personal interests are interlinked with our positive emotions

I am in New York.

I cannot wait for the Facebook Developer Conference

Page 10: Thesis oral defense 2015  elvis saravia

Framework

Contextual analysis + Emotion analysis

Page 11: Thesis oral defense 2015  elvis saravia

[Framework diagram: Twitter Corpus → Pre-processing → Interest Candidates Extraction (POS Tagging, Keyword Extraction, Rule-Based Extraction) → Emotion Analysis (Emotion Classification, Emotion Tagging & Filtering) → Interest Identification → Output file]

Page 12: Thesis oral defense 2015  elvis saravia

Pre-processing
Filter out information that doesn’t provide any knowledge or value to user interest identification

Page 13: Thesis oral defense 2015  elvis saravia

Pre-processing
→ Filter out non-English posts and retweets
→ Remove useless punctuation marks

I am loving Jeremy Lin! I am loving Jeremy Lin!

For every post (P) in a collection of Tweets (T)

Page 14: Thesis oral defense 2015  elvis saravia

Pre-processing
→ Remove tweets containing hyperlinks (no emotion)
→ Remove repeated tweets (same emotion)

Linsanity comes to LA. http://espn.com
...
Linsanity comes to LA. http://espn.com

For every post (P) in a collection of Tweets (T)

Page 15: Thesis oral defense 2015  elvis saravia

Pre-processing

→ Remove terms fewer than 3 characters long and terms containing the “@” symbol
  ○ (e.g. “to” and “@jason”)

@jason I love to go to New York @jason I love to go to New York

For every post (P) in a collection of Tweets (T)
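
The pre-processing rules on the preceding slides can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the function names are hypothetical, retweets are assumed to start with "RT ", and the non-English filter is left out because it would need an external language detector.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+")

def keep_post(post, seen):
    """Return True if the post survives the post-level filters."""
    text = post.strip()
    if text.startswith("RT "):      # assumption: retweets start with "RT "
        return False
    if URL_RE.search(text):         # tweets containing hyperlinks are dropped
        return False
    if text in seen:                # repeated tweets are dropped
        return False
    seen.add(text)
    return True

def clean_terms(post):
    """Drop @-terms, strip punctuation, drop terms shorter than 3 characters."""
    terms = []
    for term in post.split():
        if "@" in term:
            continue
        term = term.strip(string.punctuation)
        if len(term) >= 3:
            terms.append(term)
    return terms

def preprocess(tweets):
    """Apply the filters to every post P in a collection of tweets T."""
    seen = set()
    return [clean_terms(p) for p in tweets if keep_post(p, seen)]
```

Calling `preprocess` on the slide examples keeps only the terms of the posts that pass every filter, so the hyperlink tweet and the second copy of a repeated tweet disappear entirely.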

Page 16: Thesis oral defense 2015  elvis saravia

[Framework diagram, repeated: Pre-processing → Interest Candidates Extraction → Emotion Analysis → Interest Identification]

Page 17: Thesis oral defense 2015  elvis saravia

Interest Candidates Extraction
3-phase interest candidates algorithm to extract as many interest candidates as possible

Page 18: Thesis oral defense 2015  elvis saravia

Interest Candidates Extraction (1)
→ POS-tagging
  ○ Part-of-speech tagging
  ○ Nouns, Proper Nouns and Named entities
  ○ Limitation: Naïve interest candidates

I cannot wait for the Facebook Developer Conference

I cannot wait for the Facebook Developer Conference

For every post (P) in a collection of Tweets (T)
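
As a rough illustration of the POS-tagging phase, the sketch below groups consecutive noun tokens into candidate phrases. It assumes the post has already been tagged with Penn Treebank style tags by some tagger; the function name and the hand-written tags in the example are assumptions made for illustration, not the output of the tagger used in the thesis.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def noun_candidates(tagged_post):
    """Group consecutive noun tokens into candidate phrases."""
    candidates, current = [], []
    for token, tag in tagged_post:
        if tag in NOUN_TAGS:
            current.append(token)
        elif current:
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates

# Hand-tagged example (assumed tags, not produced by a real tagger).
tagged = [("I", "PRP"), ("can", "MD"), ("not", "RB"), ("wait", "VB"),
          ("for", "IN"), ("the", "DT"), ("Facebook", "NNP"),
          ("Developer", "NNP"), ("Conference", "NNP")]
print(noun_candidates(tagged))
```

Because every noun run becomes a candidate regardless of whether the user cares about it, a sketch like this also shows why the slide calls the resulting candidates naïve.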

Page 19: Thesis oral defense 2015  elvis saravia

Interest Candidates Extraction (2)
→ Keyword Extraction (RAKE) [Rose et al. 2009]
  ○ Extract keywords from posts
  ○ Limitation: phrase boundaries

I cannot wait for the Facebook Developer Conference

I cannot wait for the Facebook Developer Conference

I enjoyed watching Mr. Bean

For every post (P) in a collection of Tweets (T)
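
The core idea of RAKE can be sketched in a few lines: split the text into candidate phrases at stopwords and punctuation, score each word by degree divided by frequency, and score a phrase as the sum of its word scores. The stopword list below is a tiny illustrative assumption (RAKE is normally run with a much larger list), and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; an assumption, not the list used in the thesis.
STOPWORDS = {"i", "am", "the", "for", "a", "an", "to", "of", "and", "is", "cannot"}

def candidate_phrases(text):
    """Split text into word sequences delimited by punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,!?;:]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

def rake(text):
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    phrases = candidate_phrases(text)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)   # co-occurrences, including the word itself
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With this sketch, "I enjoyed watching Mr. Bean" is split at the period, so "Mr. Bean" never appears as a single phrase. That may be the kind of phrase-boundary problem the slide refers to.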

Page 20: Thesis oral defense 2015  elvis saravia

Interest Candidates Extraction (3)
→ Previous Phases: Unreliable and Inconsistent
→ Emerging Interest Concepts?
  ○ Previous phases cannot extract them
  ○ Provide better insights about users’ current interests

Wimbledon 2015

Page 21: Thesis oral defense 2015  elvis saravia

Interest Candidates Extraction (3)
→ Rule-Based Concept Extraction [Hsu et al., 2015]
  ○ Extract frequent emerging concepts based on “wisdom of the crowd”
  ○ 80,000,000 tweets (3,000,000 users)
  ○ 6 patterns were defined

Page 22: Thesis oral defense 2015  elvis saravia

Interest Candidates Extraction (3)

Page 23: Thesis oral defense 2015  elvis saravia

Interest Candidates Extraction (3)

Page 24: Thesis oral defense 2015  elvis saravia

Interest Candidates Extraction (3)

I am loving Wimbledon 2015 #WC2015 Wimbledon 2015

Crowd-wisdom

Page 25: Thesis oral defense 2015  elvis saravia

Interest Candidates Extraction

→ Combine the results of the 3-phase interest candidates extraction algorithm.
  ○ Repeated interest candidates are removed

For every post (P) in a collection of Tweets (T)
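
Combining the three phases amounts to a union that drops repeats. A minimal sketch, assuming that repeats are matched case-insensitively (the slides do not say how repeats are detected) and using a hypothetical function name:

```python
def combine_candidates(*phases):
    """Merge candidate lists from several phases, dropping repeats."""
    seen, merged = set(), []
    for candidates in phases:
        for cand in candidates:
            key = cand.lower()      # assumption: repeats are matched case-insensitively
            if key not in seen:
                seen.add(key)
                merged.append(cand)
    return merged
```

The first spelling encountered is the one that is kept, so the order in which the phases are passed decides which surface form survives.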

Page 26: Thesis oral defense 2015  elvis saravia

[Framework diagram, repeated: Pre-processing → Interest Candidates Extraction → Emotion Analysis → Interest Identification]

Page 27: Thesis oral defense 2015  elvis saravia

Emotion Analysis
Tagging interest candidates with their corresponding emotion

Page 28: Thesis oral defense 2015  elvis saravia

Emotion Classification
→ Pattern-based approach
  ○ Appropriate for the grammatical informality of tweets
  ○ Effective for multilingual applications
  ○ Contribution Degree
→ Why positive emotions?
  ○ Anticipation, Joy and Trust
  ○ Highly related to motivation and interests

[Argueta et al., 2015]

[Figure labels: Anticipation, Joy, Trust, Surprise, Sadness, Disgust, Anger, Fear]

Page 29: Thesis oral defense 2015  elvis saravia

Emotion Analysis
→ Tag interests with emotion
  ○ Every interest candidate is tagged with its corresponding emotion
  ○ Original post is classified (no pre-processing)
→ Only positive emotions considered:
  ○ Anticipation, Joy and Trust
  ○ Negative emotions were not considered in this work

Page 30: Thesis oral defense 2015  elvis saravia

Emotion Filtering
→ Filtering process
  ○ Posts bearing no emotion
  ○ Short posts
  ○ Posts that bear opposite emotions (ambiguous)

Page 31: Thesis oral defense 2015  elvis saravia

Emotion Classification

Posts:
I am loving Jeremy Lin right now
The traffic today is okay!
...
Feeling excited for the ASONAM Conference. #feelingblessed

Tagged interest candidates:
joy: Jeremy Lin
trust: ASONAM Conference

For every post P in a collection of Tweets T
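
The tagging and filtering steps can be sketched as below. The real emotion classifier is not reproduced here: `classify` is a stand-in callable that is assumed to return a single emotion label, or `None` for posts with no emotion or with opposite emotions. `MIN_TOKENS` and all function names are likewise assumptions, since the slides give no threshold for "short" posts.

```python
POSITIVE = {"anticipation", "joy", "trust"}
MIN_TOKENS = 4   # assumed threshold for "short" posts; the slides give no value

def tag_and_filter(posts, classify, extract):
    """Return {emotion: [interest candidates]} for posts with a positive emotion.

    classify(post) -> emotion label or None (stand-in for the real classifier)
    extract(post)  -> list of interest candidates found in that post
    """
    tagged = {emotion: [] for emotion in sorted(POSITIVE)}
    for post in posts:
        if len(post.split()) < MIN_TOKENS:
            continue                      # short post
        emotion = classify(post)          # the original post is classified
        if emotion not in POSITIVE:
            continue                      # no emotion, ambiguous, or not positive
        tagged[emotion].extend(extract(post))
    return tagged

# Toy stand-ins mirroring the slide example (hand-assigned labels and candidates).
labels = {
    "I am loving Jeremy Lin right now": "joy",
    "The traffic today is okay!": None,
    "Feeling excited for the ASONAM Conference. #feelingblessed": "trust",
}
candidates = {
    "I am loving Jeremy Lin right now": ["Jeremy Lin"],
    "The traffic today is okay!": ["traffic"],
    "Feeling excited for the ASONAM Conference. #feelingblessed": ["ASONAM Conference"],
}
result = tag_and_filter(list(labels), labels.get, candidates.get)
```

Passing the classifier and the candidate extractor in as callables keeps the sketch independent of any particular emotion model.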

Page 32: Thesis oral defense 2015  elvis saravia

[Framework diagram, repeated: Pre-processing → Interest Candidates Extraction → Emotion Analysis → Interest Identification]

Page 33: Thesis oral defense 2015  elvis saravia

Interest Identification
Ranking interest candidates in each emotion set

Page 34: Thesis oral defense 2015  elvis saravia

Interest Identification

→ Repeated Interest Candidates
  ○ Interest candidates found under several emotions are kept
→ Ambiguity
  ○ Remove interests that are ambiguous
  ○ The emotion classifier helps considerably with this

Page 35: Thesis oral defense 2015  elvis saravia

Interest Identification

→ Occurrence
  ○ Calculate frequency for each interest candidate (ws)
  ○ Frequency (f) is based on occurrence

Anticipation: ws1 (f), ws2 (f), ..., wsn (f)
Joy: Jeremy Lin (f), ws2 (f), ..., wsn (f)
Trust: ACM Conference (f), ws2 (f), ..., wsn (f)

Page 36: Thesis oral defense 2015  elvis saravia

Interest Identification

→ Ranking
  ○ Calculate weight for each interest candidate (ws)
  ○ Rank them by weight (w)

Anticipation: ws1 (w), ws2 (w), ..., wsn (w)
Joy: Jeremy Lin (w), ws2 (w), ..., wsn (w)
Trust: ACM Conference (w), ws2 (w), ..., wsn (w)
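
The occurrence and ranking steps can be sketched as follows. The weighting formula is not given in this transcript, so the sketch assumes, purely for illustration, that the weight of a candidate is its relative frequency within its emotion set. The function name is hypothetical.

```python
from collections import Counter

def rank_interests(tagged):
    """tagged: {emotion: [candidate, ...]} -> {emotion: [(candidate, weight), ...]}"""
    ranked = {}
    for emotion, candidates in tagged.items():
        counts = Counter(candidates)             # frequency f per candidate ws
        total = sum(counts.values())
        ranked[emotion] = [(ws, f / total)       # assumed weight: relative frequency
                           for ws, f in counts.most_common()]
    return ranked
```

Any other weighting could be swapped in at the marked line without changing the per-emotion ranking structure.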

Page 37: Thesis oral defense 2015  elvis saravia

Interest Identification

Page 38: Thesis oral defense 2015  elvis saravia

Experiments and Results
Two different types of experiments were conducted

Page 39: Thesis oral defense 2015  elvis saravia

Experiment (1)

→ Experimental Setup
  ○ 3 active Twitter users (A, B, C)
  ○ The latest 3000+ English posts crawled from feed
  ○ The top-15 most frequent interests per emotion
  ○ Results rated by the users

Page 40: Thesis oral defense 2015  elvis saravia

Evaluation

User A: Top 15 frequent interests per emotion
Rating scale: 0 = not related, 1~4 = related, 5~10 = highly related

Page 41: Thesis oral defense 2015  elvis saravia

Evaluation

User B: Top 15 frequent interests per emotion
Rating scale: 0 = not related, 1~4 = related, 5~10 = highly related

Page 42: Thesis oral defense 2015  elvis saravia

Evaluation

User C: Top 15 frequent interests per emotion
Rating scale: 0 = not related, 1~4 = related, 5~10 = highly related

Page 43: Thesis oral defense 2015  elvis saravia

Evaluation

[Figure: evaluation results for User A, User B and User C]

Page 44: Thesis oral defense 2015  elvis saravia

Experiment (2)

→ Experimental Setup
  ○ Online Surveys
  ○ 7 Users (A,B,C,D,E,F)
  ○ Top 5 interests (including 5 sub-category interests)
  ○ The latest 3000+ English posts crawled from feed
  ○ Interests are categorized (ConceptNet)

Page 45: Thesis oral defense 2015  elvis saravia

Categorizing Interests

→ Hierarchical Interests Extraction
  ○ Top 15 interests in the 3 emotion sets are combined and categorized
  ○ ConceptNet API
  ○ 2-level “is-a” relationship
  ○ Observation: top interest candidates were highly related
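
A two-level "is-a" lookup can be sketched without calling any external service. The dictionary below is made-up toy data standing in for ConceptNet responses, and the function name is hypothetical. The sketch only illustrates walking two levels up an is-a hierarchy.

```python
# Toy "is-a" edges standing in for ConceptNet lookups (made-up data).
IS_A = {
    "wimbledon": ["tennis tournament"],
    "tennis tournament": ["sport event"],
    "nba": ["basketball league"],
    "basketball league": ["sport organization"],
}

def categories(term, levels=2):
    """Follow is-a edges up to `levels` steps; return the parents found per level."""
    frontier, result = [term.lower()], []
    for _ in range(levels):
        parents = [p for node in frontier for p in IS_A.get(node, [])]
        if not parents:
            break
        result.append(parents)
        frontier = parents
    return result
```

Interests that share a first-level or second-level parent could then be grouped under that parent as a category.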

Page 46: Thesis oral defense 2015  elvis saravia

Evaluation

Precision of system on raw data (Twitter feed)

Page 47: Thesis oral defense 2015  elvis saravia

Evaluation

● Precision of system when including ambiguous tweets
● Ambiguous tweets bear opposite or no emotion

Page 48: Thesis oral defense 2015  elvis saravia

Evaluation

● Precision of the full system when considering positive emotions

● Average precision of approx. 81% as top performance (top-10).
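
The slides do not define how precision is computed. Assuming the standard precision-at-k measure averaged over users, a minimal sketch with hypothetical function names looks like this:

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked interests that the user marked as relevant."""
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)

def average_precision_at_k(per_user, k):
    """per_user: list of (ranked_interests, relevant_set) pairs, one per user."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in per_user]
    return sum(scores) / len(scores)
```

The sketch divides by the number of items actually returned, so a user with fewer than k ranked interests is not penalized for the missing positions. Dividing by k instead would be an equally defensible choice.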

Page 49: Thesis oral defense 2015  elvis saravia

Evaluation

Performance of components per user

Page 50: Thesis oral defense 2015  elvis saravia

Evaluation

Precision comparison of all components evaluated

Page 51: Thesis oral defense 2015  elvis saravia

Conclusion

→ Positive emotions contribute tremendously to user interest identification, as seen in the experiments section.

→ Emotion Analysis is an important component for the effective ranking of users’ interests and the removal of ambiguous information.

Page 52: Thesis oral defense 2015  elvis saravia

Future Work

→ Analyze emotion distribution to observe if there are patterns in the change of interests.

→ Adopt machine learning techniques to automate feature extraction for interest identification.

→ Improve approach by considering temporal information and negative emotions as a weighting factor.

→ Improve Interest categorization.

Page 53: Thesis oral defense 2015  elvis saravia

Thanks for listening...

Q & A