Images from wikimedia commons

Topic Models for Sentiment: Basics, Implementation and Experiments

Aditya M Joshi

IITB-Monash Research Academy

30th January, 2014

[email protected]

[email protected]

Outline

• Motivation and Introduction (Blei (2011))

• Building blocks of LDA: Dirichlet and Multinomials (Kulis (2012))

• Estimation using LDA (Heinrich (2004))

• Evaluation of LDA (Wallach (2009))

• Plugging in sentiment (Jo & Oh (2011), Lin & He (2009))

• Experimentation


Revisiting classifiers

What did Prof. Pushpak Bhattacharyya talk about in the “Topics in NLP” lecture today?

Lecture transcript → Classifier → a label such as NLP, Databases, Compilers (or SA, MT, Wordnet)

Topic models can do much more than this, with an unlabeled corpus.

“Topic-document distribution”

[Figure: lectures from 2008 to 2013 are fed to a Topic Modeler, which discovers word clusters such as {strong AI, parser, alignment, ACL, thwarting, co-reference resolution, demo, RPC, MTP} and {Krishna, Raag, Mahabharat, Swar-sandhya}, corresponding roughly to the topics NLP, Academic, and Cultural.]

*Hypothetical example

NLP = 0.7, Academic = 0.2, Cultural = 0.1

Proportion of each topic in a document: “Multiple membership”

* And in the context of sentiment analysis?

“Word-topic distribution”

“Aditya, you are not making sense.”

“Let’s study word sense disambiguation.”

[Figure: lectures from 2008 to 2013 are fed to a Topic Modeler; the word “sense” appears under several topics, alongside words such as logic, explanation, confused, wordnet, polysemy, iterative.]

*Hypothetical example

“Relevance of each word to a topic”: words across “topics” actually indicate the different senses in which a word occurs.

Definition

• Topic models are a suite of algorithms that discover thematic structures in a data collection. (Blei (2011))

• What is a thematic structure? A topic: a collection of words

• Used for a wide variety of tasks such as: author recognition, aspect extraction, sentiment modelling

Black box

Unlabeled corpus → Topic Modeler → Document-topic distribution + Overall word-topic distribution

FAQs:

Can you predict the topics of a test document directly? Not directly.

Is there only one way to construct a topic model? No. By intelligently structuring the model, you can derive useful information.

LDA Model

• The Latent Dirichlet Allocation (LDA) model is a basic probabilistic topic model

• This presentation focuses on LDA and its adaptations with sentiment as the goal.

Plate Notation (1/2)

[Figure: plate diagrams. A corpus is drawn as an outer plate of D documents, each containing an inner plate of Nd observed words w. A labeled corpus adds a label L per document; the latent topic variable z is attached to each word.]

w: Word, z: Topic (Latent)

Plate Notation (2/2)

[Figure: three plate diagrams showing where the topic variable z can sit: word-level topics (one z per word), document-level topics (one z per document), and sentence-level topics (one z per sentence, with Ns sentences per document).]

Growing LDA further

[Figure: plate diagram with θ (the document-topic distribution) feeding z, and ϕ (the word-topic distribution, one per topic Z) feeding w.]

θ(z): NLP = 0.7, culture = 0.2, motivation = 0.1

ϕ(z, word): (NLP, sense) = 0.7, (culture, sense) = 0.1, (motivation, sense) = 0.2

Let us now focus on these two multinomial distributions.


Multinomial distribution

• Training an LDA model implies learning the parameters of its multinomial distributions

• We now focus on a multinomial distribution and the way it is modelled in the case of LDA.

Parameter estimation (Heinrich (2004))

Goal: to estimate θd and ϕwz as accurately as possible, given the data (documents). The two are categorical distributions, so we estimate the posterior P(θ|x):

P(θ|x) = P(x|θ) P(θ) / P(x)

Posterior = Likelihood × Prior / Marginal likelihood, where the marginal likelihood is

P(x) = ∫ P(x|θ) P(θ) dθ

Since P(x) does not depend on θ,

P(θ|x) ∝ P(x|θ) P(θ)

Binomial distribution & MLE

• Toss of a biased coin:

P(X=1) = q, P(X=0) = 1−q

Given observations X = {x1, x2, ..., xN}, with P(xi|q) = q^xi (1−q)^(1−xi):

MLE = argmax_q P(X|q)
    = argmax_q P(x1|q) P(x2|q) ... P(xN|q)
    = argmax_q q^(x1+x2+...+xN) (1−q)^(N−(x1+x2+...+xN))
    = argmax_q q^m (1−q)^(n−m)

Taking logs, argmax_q (m log q + (n−m) log(1−q)); equating the derivative to zero gives m/q = (n−m)/(1−q), i.e. q = m/n.

m and n are the “sufficient statistics” of a binomial distribution.
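The closed-form MLE above can be checked numerically; a minimal sketch in Python, where the bias 0.7 and the sample size are arbitrary illustrative choices:

```python
import math
import random

random.seed(0)

# simulate N tosses of a (hypothetical) biased coin with bias q_true
q_true, N = 0.7, 10000
tosses = [1 if random.random() < q_true else 0 for _ in range(N)]

m = sum(tosses)   # sufficient statistic: number of heads
n = len(tosses)   # sufficient statistic: number of tosses
q_mle = m / n     # the closed-form MLE derived above

def log_likelihood(q):
    # log of q^m (1-q)^(n-m)
    return m * math.log(q) + (n - m) * math.log(1 - q)

# q = m/n should beat nearby values of q
assert log_likelihood(q_mle) >= log_likelihood(q_mle + 0.01)
assert log_likelihood(q_mle) >= log_likelihood(q_mle - 0.01)
```

With enough tosses, q_mle lands close to the true bias, as the derivation predicts.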

MAP of Binomial distribution (1/2)

MAP = argmax_q P(q|X)
    = argmax_q P(X|q) P(q)
    = argmax_q q^m (1−q)^(n−m) P(q)

Problem! P(q) can be any distribution, strictly speaking. Computationally difficult!

Assume P(q) is a beta distribution:

P(q) = q^(α−1) (1−q)^(β−1) / B(α, β)

MAP of Binomial distribution (2/2)

MAP = argmax_q q^m (1−q)^(n−m) P(q)
    ∝ argmax_q q^m (1−q)^(n−m) q^(α−1) (1−q)^(β−1)
    = argmax_q q^(m+α−1) (1−q)^(n−m+β−1)

Taking logs, argmax_q ((m+α−1) log q + (n−m+β−1) log(1−q)); equating the derivative to zero:

(m+α−1)/q = (n−m+β−1)/(1−q)
(m+α−1) − q(m+α−1) = q(n−m+β−1)
(m+α−1) = q(n+α+β−2)
q = (m+α−1)/(n+α+β−2)
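A small numeric check of the MAP formula; the counts m, n and the prior parameters α, β below are hypothetical values chosen only for illustration:

```python
# hypothetical data: m heads out of n tosses
m, n = 7, 10
alpha, beta = 5.0, 5.0   # symmetric Beta(5, 5) prior centred on 0.5

q_mle = m / n                                      # q = m/n
q_map = (m + alpha - 1) / (n + alpha + beta - 2)   # q = (m+α−1)/(n+α+β−2)

# the prior pulls the estimate from the MLE (0.7) toward 0.5
assert 0.5 < q_map < q_mle
```

Here q_map = 11/18 ≈ 0.611: the Beta prior shrinks the raw frequency 0.7 toward the prior mean.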

Conjugate prior

The beta distribution is the conjugate prior of the binomial distribution.

• A prior distribution is conjugate to a likelihood if the resulting posterior has the same form as the prior

• “Algebraic convenience”

What is the conjugate prior of the categorical distribution?

Categorical distribution

• Roll of a die:

P(X=1) = q1, P(X=2) = q2, ..., P(X=6) = q6, i.e. P(xi = k | q) = qk

Given X = {x1, ..., xN} ~ Cat(q), with cj the number of times outcome j occurs:

P(X|q) = P(x1|q) P(x2|q) ... P(xN|q) = ∏j qj^cj

MAP = argmax_q P(X|q) P(q) = argmax_q (∏j qj^cj) P(q)

Assume a Dirichlet prior, P(q) ∝ ∏j qj^(αj−1). Then

MAP = argmax_q ∏j qj^cj qj^(αj−1) = argmax_q ∏j qj^(cj+αj−1)

The Dirichlet distribution is the conjugate prior of the categorical distribution.

Binomial & Categorical distribution

Hyper-parameters → Distribution → Random variable assignments

θ ~ Dir(α), z ~ Categorical(θ), with P(z|θ) = θz

q ~ Beta(α, β), x ~ Binomial(q)

Does the name Latent Dirichlet Allocation seem justifiable now?
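A minimal sketch of the Dirichlet–categorical pair above, using only the Python standard library (the hyper-parameter and count values are illustrative):

```python
import random

random.seed(1)

def sample_dirichlet(alpha):
    # a Dirichlet draw is a vector of independent Gamma(alpha_j, 1)
    # draws, normalized to sum to one
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

alpha = [1.0, 1.0, 1.0]          # symmetric hyper-parameter (assumed)
theta = sample_dirichlet(alpha)  # theta ~ Dir(alpha)

# conjugacy in action: after observing counts c_j of each outcome,
# the posterior over theta is again a Dirichlet, with parameters
# alpha_j + c_j
counts = [5, 2, 3]               # hypothetical observed outcomes
posterior_alpha = [a + c for a, c in zip(alpha, counts)]
```

The posterior update is just element-wise addition of counts to the hyper-parameters, which is exactly the “algebraic convenience” the slides refer to.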

Our first LDA model

[Figure: plate diagram. α → θ → z → w ← ϕ ← β, with z and w inside a plate of Nd words per document, repeated over D documents; ϕ is repeated over Z topics.]


Estimation of LDA model

P(θ, ϕ | w) = P(w | θ, ϕ) P(θ, ϕ) / P(w)

The denominator is computationally intractable. Hence, Gibbs sampling is used.

We now describe the generative story.

Every LDA paper has: a plate notation, a generative story, and Gibbs sampling formulas.


Generative story

[Plate diagram: α → θ → z → w ← ϕ ← β]

For each topic, sample ϕ ~ Dir(β)
For each document:
    Generate θ ~ Dir(α)
    For each word:
        Sample z ~ Multinomial(θ)
        Sample w ~ ϕ(z)
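The generative story can be run directly to produce a synthetic corpus; a minimal sketch, where the vocabulary size, topic count, and corpus size are toy values chosen for illustration:

```python
import random

random.seed(2)

# toy sizes: vocabulary, topics, documents, words per document
V, Z, D, N_d = 6, 2, 3, 8
alpha, beta = [0.5] * Z, [0.5] * V   # symmetric hyper-parameters

def dirichlet(a):
    # normalized Gamma draws give a Dirichlet sample
    g = [random.gammavariate(x, 1.0) for x in a]
    s = sum(g)
    return [v / s for v in g]

def categorical(p):
    # inverse-CDF sampling from a discrete distribution
    u, acc = random.random(), 0.0
    for k, pk in enumerate(p):
        acc += pk
        if u < acc:
            return k
    return len(p) - 1

phi = [dirichlet(beta) for _ in range(Z)]   # one word distribution per topic
corpus = []
for _ in range(D):
    theta = dirichlet(alpha)                # per-document topic proportions
    doc = []
    for _ in range(N_d):
        z = categorical(theta)              # pick a topic for this position
        w = categorical(phi[z])             # pick a word from that topic
        doc.append(w)
    corpus.append(doc)
```

Estimation is this story run in reverse: given only `corpus`, recover plausible `theta` and `phi`.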

Implementing topic models

Implementing the generative story above requires two sampling primitives: sampling from a Dirichlet distribution and sampling from a multinomial distribution.

Sampling from multinomial

Input: θ with P(z=0) = 0.1, P(z=1) = 0.3, P(z=2) = 0.6

Goal: sample a z given this distribution.

Split the unit interval at the cumulative probabilities: z=0 on [0, 0.1), z=1 on [0.1, 0.4), z=2 on [0.4, 1]. Draw a uniform random number in [0, 1] and return the segment it falls in.
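The cumulative-interval scheme above can be sketched in a few lines of Python, using the slide’s distribution:

```python
import random

theta = [0.1, 0.3, 0.6]   # P(z=0), P(z=1), P(z=2) from the slide

def sample_z(theta, u):
    # inverse-CDF sampling: the unit interval is split at the running
    # cumulative sums 0.1, 0.4, 1.0; return the segment u falls in
    acc = 0.0
    for k, p in enumerate(theta):
        acc += p
        if u < acc:
            return k
    return len(theta) - 1   # guard against floating-point round-off

z = sample_z(theta, random.random())
```

For example, u = 0.05 returns z = 0, u = 0.2 returns z = 1, and u = 0.9 returns z = 2, matching the boundaries 0, 0.1, 0.4, 1.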


Gibbs sampling

Initialize all word positions to random z’s; compute θ & ϕ accordingly.
For each iteration:
    For each document:
        For each word:
            Generate a z based on θ
            Generate a w based on ϕ(w|z)
    Compute θ & ϕ
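The slide gives the sampler only at a high level; below is a minimal sketch of the collapsed Gibbs update commonly used for LDA (following the formulation in Heinrich (2004)), where each word’s topic z is resampled from its full conditional given all other assignments. The toy corpus and hyper-parameter values are illustrative:

```python
import random

random.seed(3)

def gibbs_lda(docs, V, Z, alpha, beta, iters):
    # count tables: topics per document, words per topic, topic totals
    ndz = [[0] * Z for _ in docs]
    nzw = [[0] * V for _ in range(Z)]
    nz = [0] * Z
    assign = []
    for d, doc in enumerate(docs):               # random initialization
        zs = [random.randrange(Z) for _ in doc]
        assign.append(zs)
        for w, z in zip(doc, zs):
            ndz[d][z] += 1; nzw[z][w] += 1; nz[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]                 # remove this assignment
                ndz[d][z] -= 1; nzw[z][w] -= 1; nz[z] -= 1
                # full conditional p(z=k | all other assignments)
                p = [(ndz[d][k] + alpha) * (nzw[k][w] + beta)
                     / (nz[k] + V * beta) for k in range(Z)]
                u, acc, z = random.random() * sum(p), 0.0, Z - 1
                for k in range(Z):
                    acc += p[k]
                    if u < acc:
                        z = k
                        break
                assign[d][i] = z                 # record the new topic
                ndz[d][z] += 1; nzw[z][w] += 1; nz[z] += 1
    return assign, nzw

docs = [[0, 0, 1], [2, 3, 3], [0, 1, 2]]         # toy corpus, word ids 0..3
assign, nzw = gibbs_lda(docs, V=4, Z=2, alpha=0.5, beta=0.5, iters=20)
```

After the loop, θ and ϕ are read off the count tables: θ(d, k) ∝ ndz[d][k] + α and ϕ(k, w) ∝ nzw[k][w] + β.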


Evaluation

• Qualitative evaluation (Understanding topic cohesion) (Mukherjee et al (2012))

• Classification accuracy based on topics uncovered

• Held-out likelihood (Likelihood of data given parameters) (Wallach et al (2009))

A naïve addition:

• Measuring sentiment cohesion: Count of positive and negative words in each topic
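The naive sentiment-cohesion measure can be sketched directly: count lexicon hits among each topic’s top words. The word lists and topic contents below are purely illustrative:

```python
# hypothetical sentiment word lists (a real run would use a lexicon)
positive = {"amazing", "hilarious", "great", "wonderful"}
negative = {"boring", "lame", "predictable", "awful"}

# illustrative top words for two discovered topics
topics = {
    0: ["horror", "killer", "boring", "lame", "predictable"],
    1: ["fans", "live", "amazing", "hilarious", "concert"],
}

def sentiment_cohesion(words):
    # (positive hits, negative hits) among a topic's top words
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return pos, neg

scores = {t: sentiment_cohesion(ws) for t, ws in topics.items()}
```

A sentiment-coherent topic has its counts skewed toward one polarity; here topic 0 scores (0, 3) and topic 1 scores (2, 0).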


Experiments with LDA

• Goal: Understand topic models & obtain sentiment-coherent topics from an LDA model

• Implementation:

– Topic model implementation using Gibbs sampling

– Hyper-parameter estimation as given in Heinrich (2004)

– “Left to right” likelihood algorithm by Wallach (2009)

Data set

• Movie review data set from Amazon by McAuley & Leskovec (2013).

– Training data set: 11000 movie reviews

– Test data set: 2000 movie reviews

• Average length of a review: ~140 words

Effect of hyper-parameter estimation

Discovering sentiment-coherent topics

Modify basic LDA in one of the following ways:

1) Bootstrapping sentiment priors with word lists

2) Modifying the structure of the topic model

Existing topic models

• Lin & He (2009) present a Joint Sentiment-Topic Model with sentiment as a latent variable.

• Jo & Oh (2011) extract senti-aspects: (sentiment, feature) pairs.

• Titov & McDonald (2008) use a sliding window model to incorporate the discourse nature of reviews.

• Mukherjee & Liu (2012b) identify words belonging to six types of review comment expressions from an unlabeled corpus.


Discovering sentiment: Use of priors

• Induce positive words and negative words to belong to certain topics (based on Lin & He (2009))

– For negative words, set β(word, z) = 2β for topics z = 0 to Z/2, and β(word, z) = 0 for topics z = Z/2 to Z.

– Set the corresponding (mirrored) β values for positive words.
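A sketch of this prior scheme as we read it: the Z topics are split into a “negative” half and a “positive” half, and each sentiment word’s β row is biased toward one half. Z and base_beta below are illustrative values:

```python
Z, base_beta = 10, 0.01   # illustrative topic count and base prior

def beta_row(word_is_negative):
    # boost one half of the topics, forbid the other half
    half = Z // 2
    boosted = [2 * base_beta] * half
    blocked = [0.0] * (Z - half)
    return boosted + blocked if word_is_negative else blocked + boosted

neg_prior = beta_row(True)    # mass only on topics 0 .. Z/2 - 1
pos_prior = beta_row(False)   # mass only on topics Z/2 .. Z - 1
```

Zeroing a word’s β for a topic prevents the sampler from ever assigning that word there, so positive and negative seed words are forced into disjoint topic halves.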

Use of Priors: Results (1/2)

• “Basic”: imposing priors on only 12 sentiment words

• Leads to more sentiment words being identified in the correct topics

Use of Priors: Results (2/2)

Qualitative evaluation: some topics are positive while others are negative, depending on the priors.

Topic 38: 7.330 horror, 2.392 killer, 2.248 scary, 2.147 house, 2.072 gore

Topic 13: 6.931 michael, 3.929 fans, 3.423 live, 2.379 amazing, 2.354 concert


Discovering sentiment: Modifying structure

• Sentiment is explicitly modelled as a latent variable (based on the joint sentiment-topic model by Lin & He (2009))

[Figure: plate diagrams of the two variants, SLDA and SLDA-Split]

Sentiment as a Variable: Results (1/2)

Parameters: Z = 70; S = 2

Sentiment as a Variable: Results (2/2)

For S = 3:

• SLDA

Topic 13, s = 0: 9.551 show, 9.254 humor, 7.166 comedy, 4.846 watch, 4.680 hilarious

Topic 13, s = 2: 6.964 rock, 5.547 children, 5.38 school, 4.636 remember, 3.432 learn

No equivalence between topic 13 for s = 0 and s = 2.

• SLDA-Split

Topic 31, s = 0: 8.006 product, 6.277 received, 5.244 amazon, 4.119 condition, 4.043 seller

Topic 31, s = 1: 5.206 return, 4.661 problem, 4.412 disappoint, 3.654 case, 3.616 copy

Topic 31, s = 2: 10.358 amazon, 9.213 play, 7.068 player, 3.651 dvds, 3.594 purchased

A topic essentially implies “different polarities” in the same ‘context’.

Failed Experiment: Incorporating emotions

Both models repeated for eight basic emotions (joy, anger, trust, surprise, sadness, fear, disgust, anticipation)

1) LDA with priors on eight basic emotions

2) Sentiment LDA models with S = 8

Failed Experiment: Results

• LDA

• SLDA-Split

Fear: boring, lame, watching, predictable
Joy: happy, purchase, find, store
Anger: kids, children, adults, parents
Trust: final, end, planet, heroes, stay
Disgust: action, horror, scenes
Anticipation: people, life, man, samurai

Failed Experiment: Analysis

• A better way to incorporate emotion priors is needed

• The current data set consists of reviews. It is likely that users do not express diverse emotions in these reviews.

Future directions (1/2)

Immediate tasks:

Sentiment topic model to discover interests of users

Say, prove that topic model based on tweets by a political commentator reflects his/her political ideology

Future directions (2/2)

Other identified tasks:

1. Models that discover dimensions of emotion in a corpus

2. Cross-lingual topic models

3. Topic models that include eye-tracking data

4. Using sentiment annotation complexity in case of an ensemble of classifiers

The Big Question

Topic models for the 5-tuple representation

“I love Tim Tams”: Positive. Formally, a 5-tuple (holder, target, feature, sentiment, time).

“I like the taste of Tim Tams but I wish the smell was different.”
→ (I, Tim Tam, taste, positive, present), (I, Tim Tam, smell, negative, present)

“I liked Tim Tams a couple of years ago.”
→ (I, Tim Tam, -, positive, “couple of years ago”)

“I think you will like Tim Tams.”
→ (I > you, Tim Tam, -, positive, future)

Can five latent variables be structured intelligently to extract 5-tuples of this form?

Conclusion

• LDA can be used to uncover thematic structure in the form of topics.

• LDA models could be adapted effectively to extract different information.

References (1/2)

• Balamurali, A., Joshi, A., & Bhattacharyya, P. (2011). Harnessing wordnet senses for supervised sentiment classification. In Proceedings of EMNLP, (pp. 1081–1091). Association for Computational Linguistics.

• Balamurali, A., Joshi, A., & Bhattacharyya, P. (2012). Cross-lingual sentiment analysis for Indian languages using linked wordnets. In COLING (Posters), (pp. 73–82).

• Balamurali, A., Khapra, M. M., & Bhattacharyya, P. (2013). Lost in translation: viability of machine translation for cross language sentiment analysis. In Computational Linguistics and Intelligent Text Processing (pp. 38–49). Springer.

• Banea, C., Mihalcea, R., Wiebe, J., & Hassan, S. (2008). Multilingual subjectivity analysis using machine translation. In Proceedings of EMNLP, (pp. 127–135). Association for Computational Linguistics.

• Blei, D. M. (2011). Introduction to probabilistic topic models.

• Blei, D. M., Ng, A. Y., Jordan, M. I., & Lafferty, J. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3.

• Boyd-Graber, J., Chang, J., Gerrish, S., Wang, C., & Blei, D. (2009). Reading tea leaves: How humans interpret topic models. In NIPS.

• Brody, S. & Elhadad, N. (2010). An unsupervised aspect-sentiment model for online reviews. In HLT-NAACL, (pp. 804–812). Association for Computational Linguistics.

• Carl, M. (2012). Translog-II: a program for recording user activity data for empirical reading and writing research. In LREC, (pp. 4108–4112).

• Dragsted, B. (2010). Coordination of reading and writing processes in translation. Translation and Cognition, American Translators Association Scholarly Monograph Series. Amsterdam/Philadelphia: Benjamins, 41–62.

• Duh, K., Fujino, A., & Nagata, M. (2011). Is machine translation ripe for cross-lingual sentiment classification? In ACL (Short Papers), (pp. 429–433).

• Fellbaum, C. (2010). WordNet: An electronic lexical database. 1998. WordNet is available from http://www.cogsci.princeton.edu/wn.

• Jo, Y. & Oh, A. (2011). Aspect and sentiment unification model for online review analysis. In Proceedings of the fourth ACM international conference on Web search and data mining, (pp. 815–824). ACM.

References (2/2)

• Joshi, S., Kanojia, D., & Bhattacharyya, P. (2013). More than meets the eye: Study of human cognition in sense annotation. In Proceedings of NAACL-HLT, (pp. 733–738).

• Kulis, B. (2012). Conjugate priors.

• Lin, C. & He, Y. (2009). Joint sentiment/topic model for sentiment analysis. In Cheung, D. W.-L., Song, I.-Y., Chu, W. W., Hu, X., & Lin, J. J. (Eds.), CIKM, (pp. 375–384). ACM.

• Lu, B., Tan, C., Cardie, C., & Tsou, B. K. Joint bilingual sentiment classification with unlabeled parallel corpora.

• McAuley, J. J. & Leskovec, J. (2013). From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In Proceedings of the 22nd international conference on World Wide Web, (pp. 897–908). International World Wide Web Conferences Steering Committee.

• McCallum, A. (2002). MALLET: A machine learning for language toolkit.

• Meng, X., Wei, F., Liu, X., Zhou, M., Xu, G., & Wang, H. (2012). Cross-lingual mixture model for sentiment classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, (pp. 572–581). Association for Computational Linguistics.

• Mukherjee, A. & Liu, B. (2012a). Aspect extraction through semi-supervised modeling. In ACL (1), (pp. 339–348). The Association for Computer Linguistics.

• Mukherjee, A. & Liu, B. (2012b). Modeling review comments. In ACL (1), (pp. 320–329). The Association for Computer Linguistics.

• Mukherjee, A. & Liu, B. (2013). Discovering user interactions in ideological discussions. In ACL (1), (pp. 671–681). The Association for Computer Linguistics.

• Mukherjee, S. & Bhattacharyya, P. (2012). Wikisent: Weakly supervised sentiment analysis through extractive summarization with wikipedia. In Machine Learning and Knowledge Discovery in Databases (pp. 774–793). Springer.

• Nallapati, R., Ahmed, A., Xing, E. P., & Cohen, W. W. (2008). Joint latent topic models for text and citations. In KDD, (pp. 542–550). ACM.

• Pang, B. & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, (p. 271). Association for Computational Linguistics.

• Pang, B. & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1–135.

• Prettenhofer, P. & Stein, B. (2010). Cross-language text classification using structural correspondence learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, (pp. 1118–1127). Association for Computational Linguistics.

• Rosen-Zvi, M., Griffiths, T., Steyvers, M., & Smyth, P. (2004). The author-topic model for authors and documents. In 20th Conference on Uncertainty in Artificial Intelligence, volume 21, Banff Park Lodge, Banff, Canada.

• Scott, G. G., O'Donnell, P. J., & Sereno, S. C. (2012). Emotion words affect eye fixations during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(3), 783.

• Searle, J. R. (1992). The rediscovery of the mind. The MIT Press.

• Titov, I. & McDonald, R. T. (2008a). A joint model of text and aspect ratings for sentiment summarization. In McKeown, K., Moore, J. D., Teufel, S., Allan, J., & Furui, S. (Eds.), ACL, (pp. 308–316). The Association for Computer Linguistics.

• Titov, I. & McDonald, R. T. (2008b). Modeling online reviews with multi-grain topic models. CoRR, abs/0801.1063.

• Wallach, H. M., Mimno, D. M., & McCallum, A. (2009). Rethinking LDA: Why priors matter. In NIPS, volume 22, (pp. 1973–1981).

• Wallach, H. M., Murray, I., Salakhutdinov, R., & Mimno, D. (2009). Evaluation methods for topic models. In Proceedings of the 26th Annual International Conference on Machine Learning, (pp. 1105–1112). ACM.

• Wang, X., McCallum, A., & Wei, X. (2007). Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM), Nebraska, USA.

• Yin, Y., Zhou, C., & Zhu, J. (2010). A pipe route design methodology by imitating human imaginal thinking. CIRP Annals - Manufacturing Technology, 59(1), 167–170.