AVAYA: Sentiment Analysis in Twitter with Self-Training and Polarity Lexicon Expansion

Lee Becker, George Erhart, David Skiba, and Valentine Matula

June 16, 2013

Avaya Labs

SemEval 2013 Task 2


Participation: SemEval 2013 Task 2

Subtasks:
• A: Contextual Polarity Disambiguation
• B: Message Polarity Classification

Training Conditions:
• Constrained
• Unconstrained

Testing Conditions:
• Tweet
• SMS


Guiding Intuitions
• Boost recall of positive/negative instances (A, B)
• Don't worry about neutral instances (A, B)
• Encode polarity cues into features (A, B)
• Exploit the context (A)


System Overview: Task B Constrained

[Diagram: Sentiment Labeled Tweets -> Feature Extraction (with Polarity Lexicon) -> Constrained Model]


System Overview: Task B Unconstrained

[Diagram: Unlabeled Tweets -> Constrained Model -> Auto-Labeled Tweets -> Expanded Polarity Lexicon; Auto-Labeled Tweets + Expanded Polarity Lexicon -> Feature Extraction -> Unconstrained Model]


Overview: Task A Models

[Diagram, constrained: Sentiment Labeled Contexts -> Feature Extraction (with Polarity Lexicon) -> Constrained Model]

[Diagram, unconstrained: Sentiment Labeled Contexts -> Feature Extraction (with Expanded Polarity Lexicon) -> Unconstrained Model]


Preprocessing
• Normalization:
  o URLs
  o @Mentions
• NLP Pipeline
  o Written in the ClearTK framework
  o ClearNLP wrappers
    • Tokenization (preserves emoticons and URLs)
    • POS Tagging
    • Lemmatization
    • Dependency Parsing
  o PTB POS -> ArkTweet POS (Gimpel et al., 2011)
  o Dependencies -> Collapsed Dependencies
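The slide lists URL and @mention normalization but does not give the replacement strings. Below is a minimal Python sketch that assumes each URL and each @mention is collapsed into a single placeholder token. The placeholder names `URL` and `@USER` and both regular expressions are hypothetical choices, not the authors'.

```python
import re

# Hypothetical placeholder tokens; the slides do not give the real ones.
URL_TOKEN = "URL"
MENTION_TOKEN = "@USER"

# Assumed patterns: http(s)/www links, and @handles not preceded by a word
# character (so e-mail addresses such as me@x.com are left alone).
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def normalize(tweet: str) -> str:
    """Replace every URL and @mention with a fixed placeholder token."""
    tweet = URL_RE.sub(URL_TOKEN, tweet)
    return MENTION_RE.sub(MENTION_TOKEN, tweet)


print(normalize("@nuggets won! http://t.co/abc www.nba.com"))
# -> @USER won! URL URL
```

Collapsing every link and handle into one token keeps these high-cardinality strings from fragmenting the bag-of-words features.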


Resources
• MPQA Subjectivity Lexicon (Wilson, Wiebe and Hoffmann, 2005)
• Hand-Crafted Negation Word Dictionary
• Hand-Crafted Emoticon Polarity Dictionary

http://leebecker.com/resources/semeval-2013/


Task B Features
• Polarized Bag-of-Words
  o Easy way to double the feature space (e.g., happy & NOT_happy)

Example: "I am not too happy about this, but I'm still pumped and thrilled for tomorrow."
[Figure: the example sentence with its negation window highlighted]

Features:
• Token
• Token + PTB POS
• Token + Simplified POS
• Lemma
• Lemma + PTB POS
• Lemma + Simplified POS
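The polarized bag-of-words idea can be sketched in a few lines of Python. This sketch makes two assumptions the slides do not state. First, `NEGATION_WORDS` is a hypothetical stand-in for the unpublished hand-crafted negation dictionary. Second, a negation window is taken to run from the negation word to the next punctuation token, which is a common convention but not necessarily the authors' rule.

```python
import string

# Hypothetical stand-in for the hand-crafted negation dictionary,
# which the slides do not reproduce.
NEGATION_WORDS = {"not", "no", "never", "n't", "cannot"}

# Assumption: a negation window runs from the negation word to the next
# punctuation token.
PUNCTUATION = set(string.punctuation)


def polarized_bag_of_words(tokens):
    """Return token features, prefixing tokens inside a negation window with NOT_."""
    features = []
    in_window = False
    for tok in tokens:
        if tok in PUNCTUATION:
            in_window = False          # punctuation closes the window
            continue                   # and is not emitted as a feature here
        if tok.lower() in NEGATION_WORDS:
            in_window = True           # open a window after the negation cue
            features.append(tok)
            continue
        features.append("NOT_" + tok if in_window else tok)
    return features


sentence = "I am not too happy about this , but I'm still pumped and thrilled for tomorrow ."
print(polarized_bag_of_words(sentence.split()))
```

In this example "happy" becomes NOT_happy, while "pumped" and "thrilled" keep their plain form because the comma closes the window. Passing the result to `collections.Counter` gives bag-of-words counts. The same prefixing would apply to each token, lemma, and POS variant listed above.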


Task B Features
• Message Polarity Features
  o Word sentiment counts (pos|neg)
  o Emoticon sentiment counts (pos|neg)
  o Net word polarity
  o Net emoticon polarity
• Microblogging Features
  o ALL CAPS word counts
  o Counts of words with repeated characters (yaaaaay, booooo)
  o Emphasis (*yes*)
  o Winning sports score (Nuggets 15-0)
• PTB POS tag counts
• Collapsed Dependency Relations
  o Incorporated negation
  o Text-Text
  o Lemma+Simplified POS – Lemma+Simplified POS
  o POS - Lemma
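Most of the message polarity and microblogging features reduce to simple counts. The following is a minimal Python sketch under several assumptions:
• `WORD_POLARITY` and `EMOTICON_POLARITY` are tiny hypothetical stand-ins for MPQA and the emoticon dictionary.
• "Repeated characters" means three or more identical letters in a row.
• "Net polarity" means the positive count minus the negative count.

The sports-score feature is omitted because it needs team-name detection that the slides do not describe.

```python
import re

# Hypothetical mini-lexicons standing in for MPQA and the hand-crafted
# emoticon dictionary.
WORD_POLARITY = {"happy": "positive", "thrilled": "positive", "sad": "negative"}
EMOTICON_POLARITY = {":)": "positive", ":D": "positive", ":(": "negative"}

REPEATED_CHARS = re.compile(r"([A-Za-z])\1{2,}")  # assumption: 3+ identical letters
EMPHASIS = re.compile(r"\*\w+\*")                  # e.g. *yes*


def message_features(tokens):
    """Count-based message polarity and microblogging features for one tweet."""
    f = dict.fromkeys(["word_pos", "word_neg", "emo_pos", "emo_neg",
                       "all_caps", "repeated_chars", "emphasis"], 0)
    for tok in tokens:
        word_pol = WORD_POLARITY.get(tok.lower())
        if word_pol:
            f["word_pos" if word_pol == "positive" else "word_neg"] += 1
        emo_pol = EMOTICON_POLARITY.get(tok)
        if emo_pol:
            f["emo_pos" if emo_pol == "positive" else "emo_neg"] += 1
        if len(tok) > 1 and tok.isalpha() and tok.isupper():
            f["all_caps"] += 1
        if REPEATED_CHARS.search(tok):
            f["repeated_chars"] += 1
        if EMPHASIS.fullmatch(tok):
            f["emphasis"] += 1
    # Assumption: "net polarity" = positive count minus negative count.
    f["net_word"] = f["word_pos"] - f["word_neg"]
    f["net_emoticon"] = f["emo_pos"] - f["emo_neg"]
    return f


print(message_features(["SO", "happy", "yaaaaay", ":)", "*yes*", "but", "sad"]))
```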


Task B: Constrained Model

• LIBLINEAR with logistic regression loss function
• Heavily boosted negative-polarity instances
  o w_positive = 1
  o w_negative = 25
  o w_neutral = 1
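LIBLINEAR's class weights (its `-wi` option) rescale the cost parameter C for each class, so errors on a heavily weighted class dominate the training objective. The Python sketch below illustrates that effect with a generic class-weighted cross-entropy loss, using the weights from this slide. It is an illustration of class weighting, not LIBLINEAR's exact one-vs-rest objective.

```python
import math

# Class weights from the Task B constrained-model slide.
TASK_B_CONSTRAINED_WEIGHTS = {"positive": 1.0, "negative": 25.0, "neutral": 1.0}


def weighted_log_loss(predictions, gold_labels, class_weights):
    """Class-weighted logistic (cross-entropy) loss.

    predictions: one dict per instance mapping class -> predicted probability.
    Each instance contributes -log p(gold), scaled by the weight of its gold class.
    """
    return sum(class_weights[gold] * -math.log(probs[gold])
               for probs, gold in zip(predictions, gold_labels))


guess = {"positive": 0.25, "negative": 0.25, "neutral": 0.5}
# The same 0.25 probability on the gold class is penalized 25 times more
# when the gold class is negative than when it is positive.
print(weighted_log_loss([guess], ["positive"], TASK_B_CONSTRAINED_WEIGHTS))
print(weighted_log_loss([guess], ["negative"], TASK_B_CONSTRAINED_WEIGHTS))
```

Swapping in the other weight settings in this deck (2/5/0.1 for the Task B unconstrained model, 11/2/1 for Task A) shows how each configuration shifts the optimizer's attention between classes.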


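Polarity Lexicon Expansion: Pointwise Mutual Information
• Based on Semantic Orientation for Sentiment (Turney, 2002)
• Intuition: use co-occurrence statistics to measure whether a word's occurrences depend on, or are independent of, a polarity.

$$\mathrm{PMI}(\mathit{word}, \mathit{sentiment}) = \log_2 \frac{p(\mathit{word}, \mathit{sentiment})}{p(\mathit{word})\, p(\mathit{sentiment})}$$

$$\mathrm{polarity}(\mathit{word}) = \operatorname{sgn}\big(\mathrm{PMI}(\mathit{word}, \mathit{positive}) - \mathrm{PMI}(\mathit{word}, \mathit{negative})\big)$$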
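The two formulas above can be computed directly from counts over sentiment-labeled tweets. The following is a minimal Python sketch with three assumptions:
• Probabilities are estimated from tweet-level (document) frequencies.
• A small additive `smoothing` term on the joint counts keeps the PMI finite for words never seen with one polarity.
• At least one positive and one negative tweet are present.

The slides do not specify the estimation details.

```python
import math
from collections import Counter


def pmi_polarity(labeled_tweets, smoothing=0.5):
    """Assign each word polarity = sgn(PMI(w, positive) - PMI(w, negative)).

    labeled_tweets: list of (words, label) pairs, label in
    {"positive", "negative", "neutral"}; both polarities must occur.
    `smoothing` is added to joint counts (an assumption) so that a word
    never seen with one polarity still gets a finite PMI.
    """
    n = len(labeled_tweets)
    word_count, label_count, joint = Counter(), Counter(), Counter()
    for words, label in labeled_tweets:
        label_count[label] += 1
        for w in set(words):             # tweet-level (document) frequency
            word_count[w] += 1
            joint[w, label] += 1

    def pmi(w, s):
        p_ws = (joint[w, s] + smoothing) / n
        return math.log2(p_ws / ((word_count[w] / n) * (label_count[s] / n)))

    polarity = {}
    for w in word_count:
        diff = pmi(w, "positive") - pmi(w, "negative")
        polarity[w] = (diff > 0) - (diff < 0)   # sgn as -1, 0, or +1
    return polarity


tweets = [(["love", "game"], "positive"), (["love"], "positive"),
          (["hate", "game"], "negative"), (["hate"], "negative"),
          (["game"], "neutral")]
print(sorted(pmi_polarity(tweets).items()))
# -> [('game', 0), ('hate', -1), ('love', 1)]
```

Because p(word) appears in both PMI terms, it cancels in the difference. The sign therefore depends only on how often the word co-occurs with each polarity, relative to that polarity's overall frequency.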


Polarity Lexicon Expansion: From Tweets to Lexicon

• Differences from Turney (2002)
  o Classifier output instead of seed words
  o Words instead of word phrases
• Procedure
  o Applied to ~475k unlabeled tweets
  o Filtered and balanced the corpus via classifier confidence score thresholds
    • 50,789 positive instances (> 0.9)
    • 59,029 negative instances (> 0.7)
    • 70,601 neutral instances (> 0.8)
  o Removed:
    • words with f(word) < 10
    • neutral polarity words
    • single-character words ('a', 'j', 'I', etc.)
    • numbers (1, 20, 1000)
    • punctuation
  o Merged with the MPQA subjectivity lexicon
    Final lexicon size: 11,740 entries
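The filtering steps on this slide can be expressed as small predicates. The sketch below uses the slide's confidence thresholds and removal rules, with three assumptions:
• The constrained model's output is taken as (words, label, confidence) triples.
• The number pattern is an assumption.
• The slides do not say how conflicts with MPQA are resolved, so `merge_with_mpqa` uses a hypothetical "MPQA wins" policy.

```python
import re
import string

# Per-class confidence thresholds from the slide.
CONFIDENCE_THRESHOLDS = {"positive": 0.9, "negative": 0.7, "neutral": 0.8}
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")   # assumed pattern: 20, 1000, 1,000, 3.5


def confident_subset(predictions, thresholds=CONFIDENCE_THRESHOLDS):
    """Keep (words, label) pairs whose confidence exceeds the label's threshold.

    predictions: iterable of (words, label, confidence) from the constrained model.
    """
    return [(words, label) for words, label, conf in predictions
            if conf > thresholds[label]]


def keep_entry(word, freq, polarity, min_freq=10):
    """Apply the slide's removal rules to one candidate lexicon word."""
    if freq < min_freq or polarity == 0:     # rare or neutral-polarity words
        return False
    if len(word) == 1:                        # single-character words
        return False
    if NUMBER.fullmatch(word):                # numbers
        return False
    if all(ch in string.punctuation for ch in word):  # punctuation-only tokens
        return False
    return True


def merge_with_mpqa(expanded, mpqa):
    """Hypothetical merge policy: MPQA entries win when a word appears in both."""
    return {**expanded, **mpqa}


print(keep_entry("awesome", 25, 1), keep_entry("1,000", 50, -1))
```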


Task B: Unconstrained Model

• Self-trained model
  o ~470k instances labeled by the constrained model
  o ~10k original instances
• Expanded polarity lexicon
• Heavily discounted neutral instances
  o w_positive = 2
  o w_negative = 5
  o w_neutral = 0.1
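The self-training setup on this slide amounts to one round of train, auto-label, and retrain. Below is a minimal Python sketch of that round. The `train` argument is any function that turns (words, label) pairs into a predictor. `toy_train` is a hypothetical word-voting stand-in for LIBLINEAR, used only so the example runs. A real setup would also pass the slide's class weights (2 / 5 / 0.1) to the learner, which the toy trainer ignores.

```python
from collections import Counter, defaultdict


def self_train(labeled, unlabeled, train):
    """One round of self-training.

    labeled:   list of (words, label) pairs with gold labels
    unlabeled: list of word lists
    train:     function mapping a list of (words, label) pairs to a predictor
    """
    constrained = train(labeled)                              # gold data only
    auto_labeled = [(words, constrained(words)) for words in unlabeled]
    unconstrained = train(labeled + auto_labeled)             # gold + auto-labeled
    return constrained, unconstrained


def toy_train(examples):
    """Hypothetical stand-in for LIBLINEAR: vote with per-word label counts."""
    votes = defaultdict(Counter)
    for words, label in examples:
        for w in words:
            votes[w][label] += 1

    def predict(words):
        tally = Counter()
        for w in words:
            tally.update(votes.get(w, Counter()))
        return tally.most_common(1)[0][0] if tally else "neutral"

    return predict


labeled = [(["great"], "positive"), (["awful"], "negative")]
unlabeled = [["great", "game"], ["awful", "ref"], ["meh"]]
constrained, unconstrained = self_train(labeled, unlabeled, toy_train)
print(constrained(["game"]), unconstrained(["game"]))
```

The unseen word "game" is neutral to the constrained model. It becomes positive evidence only after the auto-labeled tweets are added, which is the coverage gain self-training is meant to provide.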


Task B Results

System                 F_pos(+)  F_neg(-)  F_neu   F_avg(+/-)  Rank

Tweet
NRC-Canada             .733      .647      .744    .690        1
Avaya-Unconstrained    .700      .582      .713    .641        5
Avaya-Constrained      .669      .548      .608    .608        12
Mean                   .626      .450      .538    .538        -

SMS
NRC-Canada             .730      .639      .799    .685        1
Avaya-Constrained      ?         .553      .778    .600        4
Avaya-Unconstrained    .633      .557      .759    .595        5
Mean                   .546      .456      .627    .501        -
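The F_avg(+/-) column is the mean of the positive-class and negative-class F-scores. For example, NRC-Canada on tweets scores (.733 + .647) / 2 = .690. The Python sketch below computes this metric from gold and predicted labels. The function names are illustrative.

```python
def f1(gold, pred, cls):
    """F1 score of class `cls`, treating it as the positive class."""
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f_avg(gold, pred):
    """Average of the positive-class and negative-class F1 scores."""
    return (f1(gold, pred, "positive") + f1(gold, pred, "negative")) / 2


gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "neutral", "negative", "negative"]
print(f_avg(gold, pred))
```

Neutral F1 is not averaged in, but neutral errors still matter. In the example, the neutral tweet predicted as negative is a negative-class false positive, which halves negative precision.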


Task A: Features
• Same as Task B
  o Polarized Bag of Words
  o Contextual Polarity Features
    • Word Sentiment Counts (pos|neg)
    • Emoticon Sentiment Counts (pos|neg)
    • Net word polarity
    • Net emoticon polarity
  o Microblogging Features
  o PTB POS tags
• Additional Features:
  o Scoped Dependencies
  o Dependency Paths


Task A Features: Scoped Dependencies

• OUT_neg_nsubj(want, you)
• OUT_neg(want, not)
• IN_xcomp(want, miss)
• IN_aux(miss, to)
• OUT_tmod(miss, tomorrow)

Example: "You do not want to miss this tomorrow night."
[Figure: dependency parse of the example sentence, with arcs labeled root, nsubj, neg, xcomp, aux, tmod]
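The slide shows scoped dependency features but does not define IN and OUT. One reading reproduces all five IN/OUT prefixes. Assume the Task A target phrase is "want to miss". Then a dependency is IN when both its head and its dependent fall inside that phrase, and OUT otherwise. The Python sketch below implements that hypothesis. Both the target span and the rule are assumptions. The extra `neg_` in OUT_neg_nsubj is not reproduced, because the slide does not show how it is derived.

```python
from typing import List, Tuple

# (relation, head_index, dependent_index), with indices into the token list.
Dependency = Tuple[str, int, int]


def scoped_dependencies(tokens: List[str], deps: List[Dependency],
                        target: range) -> List[str]:
    """Prefix each dependency with IN if head and dependent both lie inside
    the target phrase, otherwise OUT (an assumed reading of the slide)."""
    features = []
    for rel, head, dep in deps:
        scope = "IN" if head in target and dep in target else "OUT"
        features.append(f"{scope}_{rel}({tokens[head]}, {tokens[dep]})")
    return features


tokens = "you do not want to miss this tomorrow night .".split()
deps = [("nsubj", 3, 0), ("neg", 3, 2), ("xcomp", 3, 5),
        ("aux", 5, 4), ("tmod", 5, 7)]
# Hypothetical target phrase: "want to miss" (tokens 3-5).
print(scoped_dependencies(tokens, deps, range(3, 6)))
```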


Task A Features: Dependency Paths

• POS Path: {NNP} dobj < {VBD} < conj {VBD} < root
• Sentiment POS Path: {^/neutral} < {V/negative} < {V/negative} < {root}
• In Subject: False
• In Object: True

Example: "Criminals killed Sadat and in the process they killed Egypt."
[Figure: dependency parse of the example sentence, with arcs labeled dobj, conj, root]
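The POS path on this slide (NNP via dobj to VBD, via conj to VBD, then root) matches the chain from "Egypt" up to the root. A path feature can therefore be built by following head pointers from the target token. Below is a minimal Python sketch over a hand-built, hypothetical parse of the example sentence. It assumes that "In Subject" and "In Object" mean a subject or object relation lies on the path. The (POS, relation) list is an illustrative format, not the slide's exact string rendering. The sentiment variant would substitute ArkTweet tags and lexicon polarities for the PTB tags.

```python
from typing import Dict, List, Optional, Tuple


def path_to_root(index: int, heads: List[Optional[int]], rels: List[str],
                 pos: List[str]) -> List[Tuple[str, str]]:
    """Follow head pointers from `index` to the root, collecting (POS, relation) pairs."""
    path = []
    while index is not None:
        path.append((pos[index], rels[index]))
        index = heads[index]
    return path


def path_flags(path: List[Tuple[str, str]]) -> Dict[str, bool]:
    """Assumed definition: 'in' a subject/object if such a relation lies on the path."""
    rels = {rel for _, rel in path}
    return {"in_subject": bool(rels & {"nsubj", "nsubjpass"}),
            "in_object": bool(rels & {"dobj", "iobj", "pobj"})}


# Hand-built (hypothetical) parse of
# "Criminals killed Sadat and in the process they killed Egypt ."
tokens = ["Criminals", "killed", "Sadat", "and", "in", "the", "process",
          "they", "killed", "Egypt", "."]
heads = [1, None, 1, 1, 8, 6, 4, 8, 1, 8, 1]
rels = ["nsubj", "root", "dobj", "cc", "prep", "det", "pobj",
        "nsubj", "conj", "dobj", "punct"]
pos = ["NNS", "VBD", "NNP", "CC", "IN", "DT", "NN", "PRP", "VBD", "NNP", "."]

path = path_to_root(tokens.index("Egypt"), heads, rels, pos)
print(path)              # [('NNP', 'dobj'), ('VBD', 'conj'), ('VBD', 'root')]
print(path_flags(path))  # {'in_subject': False, 'in_object': True}
```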


Task A Models
• Constrained: MPQA Subjectivity Lexicon
• Unconstrained: Expanded Polarity Lexicon
• LIBLINEAR
  o w_positive = 11
  o w_negative = 2
  o w_neutral = 1


Task A Results

System                 F_pos(+)  F_neg(-)  F_neu   F_avg(+/-)  Rank

Tweet
NRC-Canada             .910      .869      .110    .889        1
Avaya-Unconstrained    .898      .849      .311    .874        2
Avaya-Constrained      ?         .843      .309    .870        3
Mean                   .773      .677      .115    .725        -

SMS
GU-MLT-LT              .865      .902      .086    .884        1
Avaya-Unconstrained    .842      .874      .138    .858        3
Avaya-Constrained      ?         .856      .125    .839        4
Mean                   .710      .698      .099    .704        -


Discussion
• Dictionary expansion via supervised sentiment models provides a relatively simple way to enlarge the feature space and increase lexicon coverage.
• Dependency-based features provide additional context and richer information.
• Future work
  o Ablation studies
  o Better tuning of self-training


Thank you!
• Task 2 organizers and participants
• SemEval 2013 organizers
• Anonymous reviewers
