"Alexandru Ioan Cuza" University of Iași, Faculty of Computer Science
SEMANTICS AND PRAGMATICS OF LANGUAGE (Semantica și pragmatica limbajului)

Post on 17-Dec-2015




Daniela GÎFU

Iași, 09 Oct. 2014


Lecture no. 2


Sentiment Analysis (SA) - one of the most current topics in NLP.

SA - offers the possibility to monitor, identify and understand, in real time, consumers' feelings and attitudes towards brands or topics in cyberspace, and to act accordingly.

SA - very popular in social media.

- Target: academia and industry.


- to create a complete state of the art (SOTA) in SA, with a focus on social media posts.
- to enhance the results of context-based SA.
- to clarify the descriptive behavior of the receptor, who is affected by the multitude of information on forums.
- to improve the performance of SA classifiers based on two approaches (machine learning & lexicon).


1. Introduction
2. A general view on the subject
3. SA levels
3.1. SA at document level
3.2. SA at clause/sentence level
3.3. Feature-based SA
3.4. Comparative sentiment analysis
3.5. Sentiment lexicon acquisition
3.6. Conclusions
4. Applications
4.1. Business and government
4.2. Review sites
4.3. Other domains: politics and sociology
4.4. Conclusions
5. Conclusions and discussions

2. A general view on the subject

SA - a module for extracting opinions, sentiments and subjectivity from text;

SA – terminology:

- subjectivity [Lyons 1981; Langacker 1985];
- evidentiality [Chafe and Nichols 1986];
- analysis of stance [Biber and Finegan 1988; Conrad and Biber 2000];
- affect [Batson, Shaw, and Oleson 1992];
- point of view [Wiebe 1994; Scheibman 2002];
- evaluation [Hunston and Thompson 2001];
- appraisal [Martin and White 2005];
- opinion mining [Pang and Lee 2008];
- politeness [Gîfu and Topor 2014].

3. Sentiment classification techniques

Fig. 1 Sentiment classification techniques

3. SA levels - document

Fig. 2 Supervised learning – for three classes (positive, negative, neutral)

a) supervised approach

3. SA levels - document

Fig. 2 Python NLTK Demos for Natural Language Text Processing

a) supervised approach
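A supervised three-class setup can be sketched as follows. This is only a minimal illustration, assuming a bag-of-words Naive Bayes classifier with add-one smoothing written in plain Python. The tiny training set, the function names `train_nb` and `classify_nb`, and the choice of Naive Bayes are assumptions made for the example, not the system shown in the figures.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Collect per-class word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify_nb(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data, two documents per class.
TRAIN = [
    ("great phone love it", "positive"),
    ("excellent battery great screen", "positive"),
    ("terrible phone hate it", "negative"),
    ("awful battery terrible screen", "negative"),
    ("the phone has a screen", "neutral"),
    ("the box contains a charger", "neutral"),
]
model = train_nb(TRAIN)
print(classify_nb(model, "great screen"))
```

A real experiment would need a labelled corpus of realistic size and a held-out test set; the toy data only shows the mechanics.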


3. SA levels - document

b) unsupervised approach

Based on determining the semantic orientation (SO) of specific words/phrases.

1. Sentiment lexicon (words/expressions) – [Taboada et al., 2011]

2. Set of predefined POS patterns – [Turney, 2002]
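The second option can be illustrated with a small sketch that extracts two-word candidate phrases from an already POS-tagged sentence. The patterns follow the general idea of tag patterns over adjectives, adverbs, nouns and verbs described in [Turney, 2002]. The exact tag sets, the Penn Treebank tag names and the function name `extract_phrases` are assumptions for this example, and the tagging itself is assumed to be done beforehand.

```python
ADJ = {"JJ"}
NOUN = {"NN", "NNS"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

# Each pattern: (tags allowed for word 1, tags allowed for word 2,
#                tags the third word must NOT have; empty set = no constraint)
PATTERNS = [
    (ADJ, NOUN, set()),
    (ADV, ADJ, NOUN),
    (ADJ, ADJ, NOUN),
    (NOUN, ADJ, NOUN),
    (ADV, VERB, set()),
]


def extract_phrases(tagged):
    """Return two-word phrases whose tags match one of the patterns.

    `tagged` is a list of (word, tag) pairs for one sentence.
    """
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        for first, second, forbidden in PATTERNS:
            if t1 in first and t2 in second and t3 not in forbidden:
                phrases.append(f"{w1} {w2}")
                break
    return phrases


# Hypothetical pre-tagged sentence.
sentence = [
    ("very", "RB"), ("nice", "JJ"), ("hotel", "NN"),
    ("with", "IN"), ("friendly", "JJ"), ("staff", "NN"),
]
print(extract_phrases(sentence))
```

The semantic orientation of each extracted phrase would then be estimated separately, for example from its association with positive and negative reference words.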

3. SA levels – clause/sentence

More complex – identifying if a sentence is opinionated and establishing the nature of opinion;

- using supervised methods;

1. classifying clauses into two classes, facts vs. opinions [Yu and Hatzivassiloglou, 2003]

2. an approach based on minimum cuts [Pang and Lee, 2004]

The problem: How can we classify questions, sarcasm, metaphor, humor, etc.?

3. SA levels – features

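- several entities for each analyzed text, or several attributes for each entity;
- extraction of the attributes of an object;

Becali a ajutat mult săracii 1/, [dar] nimeni nu a ştiut exact 2/ [cum] a făcut atâţia bani 3/. (En. - Becali helped the poor a great deal 1/, [but] nobody knew exactly 2/ [how] he made so much money 3/.)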

- extract and store all NPs;

- keep only NPs whose frequency is above an experimentally determined threshold [Hu and Liu, 2004]
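The two steps above can be sketched as a simple frequency filter. The sketch assumes the noun phrases have already been extracted per review by a chunker. The function name `frequent_features`, the sample data and the threshold value are hypothetical.

```python
from collections import Counter


def frequent_features(reviews_nps, threshold):
    """Keep NPs that occur in more than `threshold` reviews.

    `reviews_nps` holds one list of extracted NPs per review; an NP is
    counted at most once per review.
    """
    counts = Counter()
    for nps in reviews_nps:
        counts.update({np.lower() for np in nps})
    return sorted(np for np, count in counts.items() if count > threshold)


# Hypothetical NPs extracted from four product reviews.
reviews = [
    ["battery", "screen"],
    ["battery", "price"],
    ["Battery", "my cousin"],
    ["screen"],
]
print(frequent_features(reviews, threshold=1))
```

In practice the threshold would be tuned on development data, as the slide notes.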

3. SA levels – comparative

- used when a user does not offer a direct opinion about a product, but compares it with another one [Jindal and Liu, 2006]

Example: Dacia Logan arată mult mai bine decât Dacia Solenza. (En. - Dacia Logan looks much better than Dacia Solenza.)
- comparative adjectives and adverbs: mai mult, mai puţin (En. - more, less)
- superlative adjectives and adverbs: mai, cel puţin (En. - more, at least)
- additional clauses: decât, împotriva (En. - rather than, against).

These keywords cover 98% of the comparative opinions.
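A keyword filter of this kind can be sketched in a few lines. The marker list below is taken from the examples on this slide and is far from complete. The function name `find_comparative_markers` is an assumption, and a keyword hit only makes a sentence a candidate comparative, which a classifier would still have to confirm.

```python
import re

# Markers from the slide; a real list would be much longer.
COMPARATIVE_MARKERS = [
    "mai mult", "mai puţin", "mai", "cel puţin", "decât", "împotriva",
]


def find_comparative_markers(sentence):
    """Return the markers that occur as whole words/phrases in the sentence."""
    text = sentence.lower()
    found = []
    for marker in COMPARATIVE_MARKERS:
        # (?<!\w) and (?!\w) keep "mai" from matching inside longer words.
        if re.search(r"(?<!\w)" + re.escape(marker) + r"(?!\w)", text):
            found.append(marker)
    return found


print(find_comparative_markers(
    "Dacia Logan arată mult mai bine decât Dacia Solenza."
))
```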

3. SA levels – sentiment lexicon

a) manual approaches: WordNet [Fellbaum, 1998], EuroWordNet [Vossen, 1998], BalkaNet [Tufiş et al., 2004]

Our work: AnaDiP-2010 inspired by LIWC-2007 [Pennebaker et al., 2001]: 9 emotional classes.

<classes>
<class name="emotional" id="1"/>
<class name="positive" id="2" parent="1"/>
<class name="negative" id="3" parent="1"/>
<class name="anxiety" id="4" parent="3"/>
<class name="anger" id="5" parent="3"/>
<class name="sadness" id="6" parent="3"/>
<class name="spectacular" id="7" parent="2"/>
<class name="firmness" id="8" parent="2"/>
<class name="moderation" id="9" parent="2"/>
</classes>


3. SA levels – sentiment lexicon

Our software performs part-of-speech (POS) tagging and lemmatization of words.

For example:

<lexic name="Politic" lang="ro">
<word lemma="clevetitor" classes="1,3,6"/>
<word lemma="genial" classes="1,2,7"/>
</lexic>
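A lexicon in this format can be loaded with a standard XML parser. The sketch below is one possible reading of the files, not the AnaDiP implementation: it maps each lemma to the names of its classes using `xml.etree.ElementTree` from the Python standard library. The helper names `load_classes` and `load_lexicon` are hypothetical, and the class list is shortened to the classes used by the two sample words.

```python
import xml.etree.ElementTree as ET

# Subset of the class hierarchy shown earlier.
CLASSES_XML = """<classes>
<class name="emotional" id="1"/>
<class name="positive" id="2" parent="1"/>
<class name="negative" id="3" parent="1"/>
<class name="sadness" id="6" parent="3"/>
<class name="spectacular" id="7" parent="2"/>
</classes>"""

LEXICON_XML = """<lexic name="Politic" lang="ro">
<word lemma="clevetitor" classes="1,3,6"/>
<word lemma="genial" classes="1,2,7"/>
</lexic>"""


def load_classes(xml_text):
    """Map class id -> class name."""
    root = ET.fromstring(xml_text)
    return {c.get("id"): c.get("name") for c in root.iter("class")}


def load_lexicon(xml_text, class_names):
    """Map lemma -> list of class names, resolving the numeric ids."""
    root = ET.fromstring(xml_text)
    lexicon = {}
    for word in root.iter("word"):
        ids = word.get("classes").split(",")
        lexicon[word.get("lemma")] = [class_names[i] for i in ids]
    return lexicon


class_names = load_classes(CLASSES_XML)
lexicon = load_lexicon(LEXICON_XML, class_names)
print(lexicon["clevetitor"])
```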


3. SA levels – sentiment lexicon

b) corpus-based approaches – a set of words/phrases extracted from a relatively small corpus is extended by using a large corpus of documents on a single domain.

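- a classical work [Hatzivassiloglou and McKeown, 1997] uses a set of linguistic connectors: şi, sau, nici, fie (En. - and, or, nor, either).

Examples: bărbat puternic şi armonios / bărbat puternic şi armonios (En. - strong and harmonious man)

femeie senzuală sau inteligentă? / femeie sărmană sau înstărită? (En. - sensual or intelligent woman? / poor or wealthy woman?)

băiatul nu e nici prost, nici deștept... / băiatul nu e nici prost, nici urât... (En. - the boy is neither stupid nor smart... / the boy is neither stupid nor ugly...)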
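The underlying idea, that adjectives joined by a connector constrain each other's orientation, can be sketched as label propagation over a graph of adjective pairs. This is a strong simplification: in the cited work the same/opposite links are predicted by a trained model and the adjectives are then clustered, whereas here the links are assigned by hand. The seed words, the link labels and the function name `propagate_orientation` are assumptions for the example.

```python
from collections import deque


def propagate_orientation(seeds, links):
    """Spread +1/-1 orientation from seed adjectives along links.

    `links` holds (adj1, adj2, same) triples; same=True means the two
    adjectives are assumed to share orientation, False the opposite.
    """
    graph = {}
    for a, b, same in links:
        sign = 1 if same else -1
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))

    orientation = dict(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for neighbour, sign in graph.get(current, []):
            if neighbour not in orientation:
                orientation[neighbour] = orientation[current] * sign
                queue.append(neighbour)
    return orientation


# Hand-labelled links built from the example phrases (assumed labels).
links = [
    ("puternic", "armonios", True),    # "puternic şi armonios"
    ("sărmană", "înstărită", False),   # "sărmană sau înstărită"
]
seeds = {"puternic": 1, "înstărită": 1}
result = propagate_orientation(seeds, links)
print(result)
```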

4. Applications – business and government

“Why aren’t consumers buying our laptop?”, when the price is good and the weight clearly matches consumers’ wishes [Lee, 2004].

Two kinds of answers:
- subjective reasons about intangible qualities (e.g. the physical keyboard is tacky), or
- misperceptions (which matter even though they are wrong).

Solution: by tracking consumers’ opinions, one can predict trends in sales, etc. [Mishne & Glance, 2006].

4. Applications – business and government

Solution based on a dictionary plus the semantic role of negations and pragmatic connectors:
- classification of emotionally charged words into two classes, positive and negative (optionally also a neutral class);
- more classes, associating each word with a value in the range -5 to +5;
- [Gîfu and Cristea, 2012a]: a scale over the interval -3 to +3;
- [Gîfu and Scutelnicu, 2013]: a scale of values from -1 to +1.
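A dictionary-plus-negation scorer along these lines can be sketched as follows. The valence values, the negation list and the rule that a negation flips the sign of the next sentiment word are all assumptions made for illustration. The cited systems may treat negation and connectors differently. `rescale` shows how a value on the -5 to +5 scale maps linearly onto a narrower one.

```python
# Hypothetical valence dictionary on the -5..+5 scale.
VALENCE = {"bun": 4, "genial": 5, "odios": -5, "oribil": -4}
NEGATIONS = {"nu", "nici"}


def score_tokens(tokens, valence=VALENCE, negations=NEGATIONS):
    """Sum word valences; a negation flips the next sentiment word."""
    total = 0
    negate = False
    for token in tokens:
        token = token.lower()
        if token in negations:
            negate = True
            continue
        value = valence.get(token, 0)
        if value:
            total += -value if negate else value
            negate = False  # the negation is consumed by this word
    return total


def rescale(value, old_max=5, new_max=1):
    """Map a value from [-old_max, old_max] linearly to [-new_max, new_max]."""
    return value * new_max / old_max


print(score_tokens(["nu", "e", "bun"]))   # "is not good"
print(rescale(-5, old_max=5, new_max=3))
```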

4. Process phases: POS-tagger & NER & Anaphora Resolution

<DOCUMENT>
<P ID="1"><S ID="1">
<W EXTRA="NotInDict" ID="11.1" LEMMA="" MSD="Vmip3s" Mood="indicative" Number="singular" POS="VERB" Person="third" Tense="present" Type="predicative" offset="0"></W>
<NP HEADID="11.2" ID="0" ref="0">
<W Case="direct" Gender="masculine" ID="11.2" LEMMA="nimic" MSD="Pz3msr" Number="singular" POS="PRONOUN" Person="third" Type="negative" offset="1">Nimic</W>
<W ID="11.3" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="7">mai</W>
<W Case="direct" Definiteness="no" Gender="masculine" ID="11.4" LEMMA="odios" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="11">odios</W>
<W ID="11.5" LEMMA="," MSD="COMMA" POS="COMMA" offset="16">,</W>
<W ID="11.6" LEMMA="mai" MSD="Rg" POS="ADVERB" offset="18">mai</W>
<W ID="11.7" LEMMA="oribil" MSD="Rg" POS="ADVERB" offset="22">oribil</W>
<W Case="direct" Definiteness="no" EXTRA="NotInDict" Gender="masculine" ID="11.8" LEMMA="decât" MSD="Afpmsrn" Number="singular" POS="ADJECTIVE" offset="29">decât</W>
</NP>
<NP HEADID="11.9" ID="1" ref="1">
<W Case="direct" Definiteness="yes" Gender="masculine" ID="11.9" LEMMA="pantof" MSD="Ncmpry" Number="plural" POS="NOUN" Type="common" offset="35">pantofii</W>
<NP HEADID="11.10" ID="2" ref="2">
<W Case="direct" Definiteness="no" Gender="masculine" ID="11.10" LEMMA="sport" MSD="Ncmsrn" Number="singular" POS="NOUN" Type="common" offset="44">sport</W>
<W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50">cu</W>
<NP HEADID="11.12" ID="3" ref="3">
<W Case="direct" Definiteness="yes" Gender="feminine" ID="11.12" LEMMA="platformă" MSD="Ncfsry" Number="singular" POS="NOUN" Type="common" offset="53">platformă</W>
</NP>
</NP>
</NP>
</S></P>
</DOCUMENT>
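Output of this kind can be consumed by later processing steps with a standard XML parser. The sketch below assumes the element and attribute names shown above (`NP`, `W`, `HEADID`, `LEMMA`, `POS`) and lists the head lemma of every noun phrase. The shortened sample document and the function name `np_heads` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same annotation format (attributes reduced).
SAMPLE = """<DOCUMENT><P ID="1"><S ID="1">
<NP HEADID="11.9" ID="1" ref="1">
<W ID="11.9" LEMMA="pantof" MSD="Ncmpry" POS="NOUN" offset="35">pantofii</W>
<NP HEADID="11.10" ID="2" ref="2">
<W ID="11.10" LEMMA="sport" MSD="Ncmsrn" POS="NOUN" offset="44">sport</W>
<W ID="11.11" LEMMA="cu" MSD="Sp" POS="ADPOSITION" offset="50">cu</W>
<NP HEADID="11.12" ID="3" ref="3">
<W ID="11.12" LEMMA="platformă" MSD="Ncfsry" POS="NOUN" offset="53">platformă</W>
</NP></NP></NP>
</S></P></DOCUMENT>"""


def np_heads(xml_text):
    """Return (NP id, head lemma, head POS) for every NP, in document order."""
    root = ET.fromstring(xml_text)
    words = {w.get("ID"): w for w in root.iter("W")}
    heads = []
    for np in root.iter("NP"):
        head = words.get(np.get("HEADID"))
        if head is not None:
            heads.append((np.get("ID"), head.get("LEMMA"), head.get("POS")))
    return heads


print(np_heads(SAMPLE))
```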

4. Process phases: POS-tagger & NER & Anaphora Resolution

Fig. 3 The interface of the EAT system

4. Applications – business and government

- 46 rules for values.

<rule>
<word attribute="LEMMA" value="cel"/>
<word attribute="LEMMA" value="mai"/>
<word attribute="POS" value="ADJECTIVE"/>
</rule>

Ex: cel mai bun (En. - the best)

<rule>
<word attribute="LEMMA" value="cel"/>
<word attribute="LEMMA" value="mai"/>
<word attribute="POS" value="bun"/>
</rule>
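Matching such a rule against annotated text amounts to comparing one attribute per token over a sliding window. The sketch below is a guess at how this could work, not the EAT implementation. It represents a rule as a list of (attribute, value) pairs and each token as a dictionary of its annotation attributes. The function names and the sample sentence "El este cel mai bun" (En. - He is the best), including its POS labels, are assumptions.

```python
def matches(rule, tokens, start):
    """True if the rule matches the tokens beginning at index `start`."""
    if start + len(rule) > len(tokens):
        return False
    return all(
        tokens[start + i].get(attribute) == value
        for i, (attribute, value) in enumerate(rule)
    )


def find_matches(rule, tokens):
    """Return every start index at which the rule matches."""
    return [i for i in range(len(tokens)) if matches(rule, tokens, i)]


# The superlative rule from the slide: cel + mai + any adjective.
SUPERLATIVE_RULE = [("LEMMA", "cel"), ("LEMMA", "mai"), ("POS", "ADJECTIVE")]

# Hypothetical annotation of "El este cel mai bun".
tokens = [
    {"LEMMA": "el", "POS": "PRONOUN"},
    {"LEMMA": "fi", "POS": "VERB"},
    {"LEMMA": "cel", "POS": "ARTICLE"},
    {"LEMMA": "mai", "POS": "ADVERB"},
    {"LEMMA": "bun", "POS": "ADJECTIVE"},
]
print(find_matches(SUPERLATIVE_RULE, tokens))
```

Each matched span could then be given the value attached to its rule, for example a stronger score for superlatives.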


4. Applications – review sites

- to assess the reviews and ratings about your company or yourself;
- to summarize reviews.

Our work: the consumer’s behaviour, civic identity [Gîfu et al., 2013]

6 profiles: the-decent, the-porn-aggressive, the-incitator, the-affected, the-author-attacker and supporter.

- we established a number of features (lexical, syntactic, semantic): style, emotional classes, etc.

4. Applications – politics/sociology

Two dimensions in politics:
1. to know what electors are thinking about the political candidates [Efron, 2004; Goldberg et al., 2007; Laver et al., 2003; Mullen and Malouf, 2008];
2. to clarify the politicians’ positions, in order to enhance the quality of information that voters have access to [Bansal et al., 2008; Gîfu, 2013b].

In sociology:
- how ideas and innovations are propagated [Rosen, 1974]
Ex: the polls on different issues


SA - a complex task;
SA - an emerging discipline with promising academic and, most importantly, industrial applications;
... the sentiment classification problem - more challenging.

Future work...

- to develop an independent sentiment classifier using machine learning methods;
- to compare the results of machine learning on sentiment classification with those on traditional topic-based categorization;
- to analyse the sentiment lexicon of the old Romanian language in terms of diachronic semantics.

