1
Topic-Sentiment Mixture: Modeling Facets and Opinions in Weblogs
Qiaozhu Mei†, Xu Ling†, Matthew Wondra†, Hang Su‡, and ChengXiang Zhai†
† University of Illinois at Urbana-Champaign
‡ Yahoo! Inc.
2
Why Opinion Analysis?
• Customers: need peer opinions to make purchase decisions
• Business providers:
  – need customers’ opinions to improve products
  – need to track opinions to make marketing decisions
• Social researchers: want to know people’s reactions to social events
• Government: wants to know people’s reactions to a new policy
• Psychology, education, etc.
3
An Illustrative Example
Should I buy an iPod?
• Thumbs up or thumbs down?
  Positive, negative, neutral… (Sentiments)
• Are their opinions changing?
  Negative before 2005, but positive recently… (Dynamics)
• What do people say about the iPod?
  Price, battery, warranty, nano, … (Topics)
• What aspects are good/bad?
  Sound is good, battery is bad… (Faceted opinions)
4
Why Extract Opinions from Blogs?
• Easy to collect: huge volume, clean format
• Broadly distributed: demographics
• Topically diverse: free discussion about any topic/product/event
• Opinion rich: highly personalized
5
Evidence from Blog Search
[Figure: blog search results annotated with four kinds of evidence: availability, broad distribution, topic diversity, and opinion richness]
Opinion-rich examples:
• Positive: …the trail leads to fascinating places that are richly …
• Negative: …when I first watched the big-screen version of The Da Vinci Code, I fell asleep twice. Not once. Twice! …
6
Existing Blog-opinion Analysis Work
• Opinmind: sentiment classification/search of blogs
No faceted analysis, no neutral fact description: Not informative enough to support decision making
7
Existing Blog-opinion Analysis Work (Cont.)
• Use content to predict sales
  – Blog-level topic analysis
  – Information diffusion through blogspace
  – Use topic bursts to predict sales spikes
  – E.g., [Gruhl et al. 2005]
No sentiment analysis, no faceted analysis: what if the hot discussion is “negative”?
Hot criticism may not lead to sales spikes.
[Figure from Gruhl et al. 2005]
8
What’s Missing Here?
• Discussions are faceted
  – E.g., iPod: battery? Price? Nano? …
  – Usually different opinions on different facets
• Opinions have polarities
  – Positive, negative, and neutral …
  – Non-discriminative analysis may lead to wrong decisions
• Opinions change over time …
9
Our Goal
• Model the mixture of facets and opinions (topics and sentiments)
• Generate a faceted opinion summary for an ad hoc query
• Track the change of opinions over time
[Figure: Topic-sentiment dynamics (Topic = Price): strength over time, with positive, negative, and neutral curves]

Topic-sentiment summary (Query: Dell Laptop)

Topic 1 (Price)
  Positive:
  • it is the best site and they show Dell coupon code as early as possible
  • ……
  Negative:
  • Even though Dell's price is cheaper, we still don't want it.
  • ……
  Neutral:
  • mac pro vs. dell precision: a price comparis..
  • DELL is trading at $24.66

Topic 2 (Battery)
  Positive:
  • One thing I really like about this Dell battery is the Express Charge feature.
  • ……
  Negative:
  • my Dell battery sucks
  • Stupid Dell laptop battery
  Neutral:
  • i still want a free battery from dell..
  • ……
10
Challenges in Opinion Analysis from Blogs
• Topics and sentiments are mixed together
• No existing facet structure for ad hoc topics
• Difficult to identify sentiment polarities
• Difficult to associate sentiment polarities with facets
• Difficult to segment topics and sentiments
  – Tracking sentiment dynamics
11
Our Approach: Modeling Topic-Sentiment Mixture
• Use language models to represent facets and sentiments
  – Facets represented with topic models, extracted in an unsupervised/semi-supervised way
  – Sentiment models extracted in a supervised way
• Model the mixture of topics and sentiments with a probabilistic generative model
• Segment associated topics and sentiments with a topical hidden Markov model
12
Probabilistic Model of Topic-Sentiment Mixture
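[Figure: the mixture model. Each word comes either from the background model B or from one of k facets (Facet 1, Facet 2, …, Facet k). Within a facet, the word is drawn from the facet's neutral topic model (F), the positive sentiment model (P), or the negative sentiment model (N).]
Example word distributions shown in the figure:
• Facet models: battery 0.3, life 0.2, …; nano 0.1, release 0.05, screen 0.02, …; apple 0.2, microsoft 0.1, compete 0.05, …
• Background B: is 0.05, the 0.04, a 0.03, …
• Positive P: love 0.2, awesome 0.05, good 0.01, …
• Negative N: suck 0.07, hate 0.06, stupid 0.02, …
Steps:
• Choose a facet (subtopic) i
• Draw a word from the mixture of topics and sentiments (F, P, N)
Example words drawn: battery (F), love (P), hate (N), the (B)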
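13
The “Generation” Process
[Figure: generating a word w in document d. With probability λ_B the word comes from the background model B; with probability 1 - λ_B it comes from the topics. Facet j (one of θ_1, θ_2, …, θ_k) is chosen with weight π_dj. Within facet j the word is drawn from the neutral (facts) model with weight δ_j,d,F, from the positive model θ_P with weight δ_j,d,P, or from the negative model θ_N with weight δ_j,d,N.]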
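Log-likelihood of the collection C:
\[
\log(C) = \sum_{d \in C} \sum_{w \in V} c(w,d) \log\Big[ \lambda_B\, p(w \mid B) + (1-\lambda_B) \sum_{j=1}^{k} \pi_{dj} \big( \delta_{j,d,F}\, p(w \mid \theta_j) + \delta_{j,d,P}\, p(w \mid \theta_P) + \delta_{j,d,N}\, p(w \mid \theta_N) \big) \Big]
\]
• p(w|θ_j), p(w|θ_P), and p(w|θ_N) can be estimated with the maximum likelihood estimator (MLE) through an EM algorithm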
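As a rough illustration of the mixture, the sketch below evaluates the collection log-likelihood for fixed parameters: each word probability is a background term plus a weighted sum over facets of the neutral, positive, and negative word probabilities. The function name, argument layout, and dictionary representation are assumptions made for this example, not the authors' implementation, and no EM updates are shown.

```python
import math

def log_likelihood(docs, background, topics, pos, neg, pi, delta, lambda_b):
    """Log-likelihood of a collection under the topic-sentiment mixture.

    docs:       list of {word: count} dictionaries, one per document
    background: {word: prob} background model B
    topics:     list of {word: prob} neutral facet models
    pos, neg:   {word: prob} positive / negative sentiment models
    pi[d][j]:   weight of facet j in document d
    delta[d][j]: (neutral, positive, negative) weights for facet j in d
    lambda_b:   weight of the background model
    Assumes the background model gives every observed word a nonzero
    probability, so the logarithm is always defined.
    """
    total = 0.0
    for d, counts in enumerate(docs):
        for w, c in counts.items():
            mix = 0.0
            for j, theta_j in enumerate(topics):
                d_f, d_p, d_n = delta[d][j]
                mix += pi[d][j] * (d_f * theta_j.get(w, 0.0)
                                   + d_p * pos.get(w, 0.0)
                                   + d_n * neg.get(w, 0.0))
            p_w = lambda_b * background.get(w, 0.0) + (1.0 - lambda_b) * mix
            total += c * math.log(p_w)
    return total
```

An EM procedure would alternate between computing posterior assignments of words to these components and re-estimating the parameters so that this quantity does not decrease.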
16
Learning Sentiment Models
• Problem: sentiment expressions are topic-biased
  – E.g., “fearful” is negative in general, but how about for a ghost movie?
  – E.g., “heavy” is positive for rock music, but how about for laptops?
• Impossible to create training data for every ad hoc topic
• Solution:
  – Collect sentiment-labeled data with diversified topics
  – Learn a general sentiment model from the mixed training data in training mode
  – Use this general sentiment model as a prior to get the topic-biased sentiment models in testing mode
17
Estimating Topic Models
• Problem: no existing facet structure for ad hoc topics
• Unsupervised extraction: facets might not be what you like
  – E.g., user wants “battery”, “price” and “sound quality”
  – System returns “ipod nano”, “ipod video”, “ipod shuffle”..
• Solution: incorporate user-specified interests into automatically extracted facets
  – User provides hints; add priors into the topic model
  – Use MAP estimation instead of MLE
  – See paper for technical details
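The slide defers the details to the paper, so the following is only a minimal sketch of one common way a prior enters a word distribution: treating the prior as pseudo-counts added to the observed (or expected) word counts. The pseudo-count form, the function name `map_estimate`, and the strength parameter `mu` are assumptions for illustration.

```python
def map_estimate(counts, prior, mu):
    """MAP-style estimate of a word distribution with a prior.

    counts: {word: count} observed or expected counts for one model
    prior:  {word: prob} prior word distribution (should sum to 1)
    mu:     prior strength; mu = 0 recovers the plain MLE
    Returns {word: prob} where p(w) = (c(w) + mu * prior(w)) / (sum(c) + mu).
    """
    vocab = set(counts) | set(prior)
    total = sum(counts.values()) + mu
    return {w: (counts.get(w, 0.0) + mu * prior.get(w, 0.0)) / total
            for w in vocab}
```

For example, a user hint of “battery” could be encoded as a prior concentrated on that word; a larger `mu` then pulls the extracted facet more strongly toward the hint.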
18
Sentiment Segmentation and Dynamics Tracking
• Design a topic-sentiment enhanced HMM
• Associate states with topic/sentiment models
• Learn the transition prob. and segment the text
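• Plot the sentiment dynamics by counting segments over time (tagged with each facet and sentiment)
[Figure: HMM state diagram with topic states T1, T2, T3, sentiment states P and N, background state B, and a state E with transitions from and to E]
Example text to segment: … the battery really sucks and it's really heavy in my part but where could you find laptops so affordable nowadays?...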
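To make the segmentation step concrete, here is a small Viterbi decoding sketch: given start, transition, and emission probabilities, it labels each word with its most likely state. The slides say the transition probabilities are learned; this example assumes they are already given. The state names and all numbers in the usage example are hypothetical.

```python
import math

def viterbi(words, states, start, trans, emit, floor=1e-6):
    """Most likely state sequence for `words`.

    start[s]:    probability of starting in state s
    trans[s][t]: probability of moving from state s to state t
    emit[s]:     {word: prob} unigram model attached to state s
    floor:       minimum probability, so unseen words do not give log(0)
    """
    def lg(p):
        return math.log(max(p, floor))

    score = {s: lg(start[s]) + lg(emit[s].get(words[0], 0.0)) for s in states}
    back = []  # back[i][t] = best predecessor of state t at position i + 1
    for w in words[1:]:
        prev, score, ptr = score, {}, {}
        for t in states:
            best = max(states, key=lambda s: prev[s] + lg(trans[s][t]))
            score[t] = prev[best] + lg(trans[best][t]) + lg(emit[t].get(w, 0.0))
            ptr[t] = best
        back.append(ptr)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical two-state example: one topic state "T" and the negative state "N".
states = ["T", "N"]
start = {"T": 0.5, "N": 0.5}
trans = {"T": {"T": 0.8, "N": 0.2}, "N": {"T": 0.2, "N": 0.8}}
emit = {"T": {"battery": 0.5, "price": 0.49, "sucks": 0.01},
        "N": {"sucks": 0.6, "stupid": 0.4}}
labels = viterbi(["battery", "sucks", "price"], states, start, trans, emit)
```

Runs of consecutive words with the same label form the segments that are later counted per facet and sentiment.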
19
Experiment Setup
• Training data for sentiment models (diversified topics, downloaded from Opinmind)

Topic          # Pos   # Neg     Topic         # Pos   # Neg
laptops        346     142       people        441     475
movies         396     398       banks         292     229
universities   464     414       insurances    354     297
airlines       283     400       nba teams     262     191
cities         500     500       cars          399     334

• Test dataset: created by querying Google blog search and crawling from original sites (ad hoc)

Dataset         # docs   Time Period      Query Term
iPod            2988     01/06 ~ 11/06    ipod
Da Vinci Code   1000     01/06 ~ 10/06    da+vinci+code
20
Results: General Sentiment Models
• Sentiment models trained from a diversified topic mixture vs. single topics

Pos-Cities   Neg-Cities   Pos-Mix     Neg-Mix
beautiful    hate         love        suck
love         suck         awesome     hate
awesome      people       good        stupid
amaze        traffic      miss        ass
live         drive        amaze       fuck
good         fuck         pretty      horrible
night        stink        job         shitty
nice         move         god         crappy
time         weather      yeah        terrible
air          city         bless       people
greatest     transport    excellent   evil

[Figure: KL divergence between the learned P and N models and an unseen topic, plotted against the number of topics mixed in the training data]
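The comparison in the figure relies on KL divergence between word distributions. A minimal sketch of that computation follows; the direction KL(p || q) and the small additive smoothing `eps` are assumptions, since the slide does not state how zero probabilities were handled.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two word distributions given as {word: prob}.

    Both distributions are smoothed with `eps` over the union vocabulary
    and renormalized, so words missing from one side do not yield log(0).
    """
    vocab = set(p) | set(q)

    def smooth(dist):
        raw = {w: dist.get(w, 0.0) + eps for w in vocab}
        z = sum(raw.values())
        return {w: v / z for w, v in raw.items()}

    ps, qs = smooth(p), smooth(q)
    return sum(ps[w] * math.log(ps[w] / qs[w]) for w in vocab)
```

A lower value means the learned sentiment model is closer to the sentiment model of the held-out topic.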
21
Results: Facets and Topic Models (I)
• Facets for iPod:

No Prior                                   | With Prior
Battery, nano   Marketing   Ads, spam      | Nano     Battery
battery         apple       free           | nano     battery
shuffle         microsoft   sign           | color    shuffle
charge          market      offer          | thin     charge
nano            zune        freepay        | hold     usb
dock            device      complete       | model    hour
itune           company     virus          | 4gb      mini
usb             consumer    freeipod       | dock     life
hour            sale        trial          | inch     rechargable
22
Results: Facets and Topic Models (II)
• Facets for the Da Vinci Code:

No Prior                             | With Prior
Story     Book        Background     | Movie     Religion
landon    author      jesus          | movie     religion
secret    idea        mary           | hank      belief
murder    holy        gospel         | tom       cardinal
louvre    court       magdalene      | film      fashion
thrill    brown       testament      | watch     conflict
clue      blood       gnostic        | howard    metaphor
neveu     copyright   constantine    | ron       complaint
curator   publish     bible          | actor     communism
23
Results: Faceted Opinions (the Da Vinci Code)

Facet 1: Movie
  Neutral:
  • ... Ron Howards selection of Tom Hanks to play Robert Langdon.
  • Directed by: Ron Howard Writing credits: Akiva Goldsman ...
  • After watching the movie I went online and some research on ...
  Positive:
  • Tom Hanks stars in the movie, who can be mad at that?
  • Tom Hanks, who is my favorite movie star act the leading role.
  • Anybody is interested in it?
  Negative:
  • But the movie might get delayed, and even killed off if he loses.
  • protesting ... will lose your faith by ... watching the movie.
  • ... so sick of people making such a big deal about a FICTION book and movie.

Facet 2: Book
  Neutral:
  • I remembered when i first read the book, I finished the book in two days.
  • I’m reading “Da Vinci Code” now.
  • …
  Positive:
  • Awesome book.
  • So still a good book to past time.
  Negative:
  • ... so sick of people making such a big deal about a FICTION book and movie.
  • This controversy book cause lots conflict in west society.
24
Results: Comparison with Opinmind
• Faceted opinions from TSM

Facet: iPod Nano
  Thumbs Up:
  • (sweat) iPod Nano ok so ...
  • Ipod Nano is a cool design, ...
  Thumbs Down:
  • WHAT IS THIS SHIT??!!
  • ipod nanos are TOO small!!!!

Facet: Battery
  Thumbs Up:
  • the battery is one serious example of excellent relibability
  Thumbs Down:
  • Poor battery life ...
  • ...iPod’s battery completely died

Facet: iPod Video
  Thumbs Up:
  • My new VIDEO ipod arrived!!!
  • Oh yeah! New iPod video
  Thumbs Down:
  • fake video ipod
  • Watch video podcasts ...

• Opinions from Opinmind (no facets)

  Thumbs Up:
  • I love my iPod, I love my G5...
  • I love my little black 60GB iPod
  • I LOVE MY iPOD
  • I love my iPod.
  • - I love my iPod.
  • ... iPod video looks SO awesome
  Thumbs Down:
  • I hate ipod.
  • Stupid ipod out of batteries...
  • “ hate ipod ” = 489..
  • my iPod looked uglier...surface...
  • i hate my ipod.
  • ... microsoft ... the iPod sucks
25
Results: Sentiment Dynamics
• Facet: the book “The Da Vinci Code” (bursts during the movie, Pos > Neg)
• Facet: the impact on religious beliefs (bursts during the movie, Neg > Pos)
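Dynamics curves like these come from counting tagged segments per time period. The sketch below assumes each segment has already been labeled with a period, a facet, and a sentiment; the tuple layout, the function name, and the sample data are hypothetical.

```python
from collections import Counter

def sentiment_dynamics(segments):
    """Count tagged segments per time period.

    segments: list of (period, facet, sentiment) tuples, one per segment
    Returns (periods, series), where periods is the sorted list of periods
    and series maps (facet, sentiment) to a list of counts aligned with it.
    """
    counts = Counter(segments)
    periods = sorted({p for p, _, _ in segments})
    keys = sorted({(f, s) for _, f, s in segments})
    series = {(f, s): [counts[(p, f, s)] for p in periods] for f, s in keys}
    return periods, series

# Hypothetical tagged segments for one facet.
segments = [("2006-05", "book", "P"), ("2006-05", "book", "P"),
            ("2006-05", "book", "N"), ("2006-06", "book", "P")]
periods, series = sentiment_dynamics(segments)
```

Each list in `series` can be plotted against `periods` to give one curve per facet and sentiment.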
26
Summary and Future Work
• Algorithm: A new way to model the mixture of topics and sentiments
• Application: A new way to summarize faceted opinions, and track their dynamics
• Future Work:
  – Beyond unigram language models?
  – Better segmentation of sentiments and topics?
  – Adapting existing facet structures?
  – Develop an end-user application for opinion analysis