Tutorial 13 (explicit ugc + sentiment analysis)

20
Mining Explicit User Generated Content: Sentiment Analysis Kira Radinsky Slides based on material from: Bing Liu (WWW-2008 tutorial)

description

Part of the Search Engine course given in the Technion (2011)

Transcript of Tutorial 13 (explicit ugc + sentiment analysis)

Page 1: Tutorial 13 (explicit ugc + sentiment analysis)

Mining Explicit User Generated Content:Sentiment Analysis

Kira Radinsky

Slides based on material from: Bing Liu (WWW-2008 tutorial)

Page 2: Tutorial 13 (explicit ugc + sentiment analysis)

2

Introduction

• Two main types of textual information. – Facts and Opinions

• Most current text information processing methods (e.g., web search, text mining) work with factual information.

• Sentiment analysis/opinion mining– computational study of opinions, sentiments and

emotions expressed in text.

– huge volumes of opinionated text on the web

Page 3: Tutorial 13 (explicit ugc + sentiment analysis)

3

User Generated Media

Word-of-mouth on the Web

– User-generated media: One can express opinions on anything in reviews, forums, discussion groups, blogs ...

– Opinions of global scale: No longer limited to:• Individuals: one’s circle of friends• Businesses: Small scale surveys, tiny focus

groups, etc.

Page 4: Tutorial 13 (explicit ugc + sentiment analysis)

An Example Review

• “I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”

• What do we see?

– Opinions, targets of opinions, and opinion holders

4

Page 5: Tutorial 13 (explicit ugc + sentiment analysis)

5

Target Object

• Definition (object): An object o is a product, person, event, organization, or topic. o is represented as

– A hierarchy of components, sub-components, and so on. – Each node represents a component and is associated with a

set of attributes of the component.

• An opinion can be expressed on any node or attribute of the node (also called features)

* Liu, Web Data Mining book, 2006

Page 6: Tutorial 13 (explicit ugc + sentiment analysis)

What is an Opinion?

An opinion : (oj, fjk, soijkl, hi, tl),

• oj is a target object.

• fjk is a feature of the object oj.

• hi is an opinion holder.

• tl is the time when the opinion is expressed.

• soijkl is the sentiment value of the opinion of the opinion holder hi on feature fjk of object oj

at time tl. soijkl is +ve, -ve, or neu, or a more granular rating.

6

Page 7: Tutorial 13 (explicit ugc + sentiment analysis)

Sentiment Analysis approaches

• Document level sentiment classification

– Unsupervised review classification (Turney, ACL-02)

– Sentiment classification using machine learning methods (Pang et al, EMNLP-02)

• Sentence level sentiment analysis

– Using learnt patterns (Rilloff and Wiebe, EMNLP-03)

• Feature-based opinion mining and summarization

– Next slides

Page 8: Tutorial 13 (explicit ugc + sentiment analysis)

8

Feature-Based Sentiment Analysis

• Objective: Discovering all quintuples

(oj, fjk, soijkl, hi, tl)

• Sentiment classification at both document and sentence (or clause) levels are not enough,

– they do not tell what people like and/or dislike

– A positive opinion on an object does not mean that the opinion holder likes everything.

– An negative opinion on an object does not mean that the opinion holder dislikes everything.

Page 9: Tutorial 13 (explicit ugc + sentiment analysis)

9

Feature-Based Opinion Summary

“I bought an iPhone a few days ago. It was such a nice phone. The touch screen was really cool. The voice quality was clear too. Although the battery life was not long, that is ok for me. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, and wanted me to return it to the shop. …”

Feature Based Summary:

Feature1: Touch screenPositive: 212• The touch screen was really cool. • The touch screen was so easy to use

and can do amazing things. …Negative: 6• The screen is easily scratched.• I have a lot of difficulty in removing

finger marks from the touch screen. … Feature2: battery life…

*Hu & Liu, KDD-2004

Page 10: Tutorial 13 (explicit ugc + sentiment analysis)

10

Visual Comparison

Summary of

reviews of

Cell Phone 1

Voice Screen Size WeightBattery

+

_

Comparison of

reviews of

Cell Phone 1

Cell Phone 2

_

+

* Liu et al. WWW-2005

Page 11: Tutorial 13 (explicit ugc + sentiment analysis)

Bing feature-based opinion summary

11

Page 12: Tutorial 13 (explicit ugc + sentiment analysis)

Sentiment Analysis is Hard!

• “This past Saturday, I bought a Nokia phone and my girlfriend bought a Motorola phone with Bluetooth. We called each other when we got home. The voice on my phone was not so clear, worse than my previous phone. The battery life was long. My girlfriend was quite happy with her phone. I wanted a phone with good sound quality. So my purchase was a real disappointment. I returned the phone yesterday.”

12

Page 13: Tutorial 13 (explicit ugc + sentiment analysis)

Not Just ONE Problem

• (oj, fjk, soijkl, hi, tl),

– oj - a target object: Named Entity Extraction (more)

– fjk - a feature of oj: Information Extraction

– soijkl is sentiment: Sentiment determination

– hi is an opinion holder: Information/Data Extraction

– tl is the time: Data Extraction

• Co-reference resolution• Relation extraction• Synonym match (voice = sound quality) …

• None of them is a solved problem!

13

Page 14: Tutorial 13 (explicit ugc + sentiment analysis)

Easier and Harder Problems

• Reviews are easier.

– Objects/entities are given (almost), and little noise

• Forum discussions and blogs are harder.

– Objects are not given, and a large amount of noise

• Determining sentiments seems to be easier.

• Determining objects and their corresponding features is harder.

• Combining them is even harder. 14

Page 15: Tutorial 13 (explicit ugc + sentiment analysis)

15

Two Main Types of Opinions

• Direct Opinions: direct sentiment expressions on some target objects, e.g., products, events, topics, persons.– E.g., “the picture quality of this camera is great.”

• Comparative Opinions: Comparisons expressing similarities or differences of more than one object. Usually stating an ordering or preference. – E.g., “car x is cheaper than car y.”

Page 16: Tutorial 13 (explicit ugc + sentiment analysis)

16

Comparative Opinions

• Gradable

– Non-Equal Gradable: Relations of the type greater or less than

• Ex: “optics of camera A is better than that of camera B”

– Equative: Relations of the type equal to

• Ex: “camera A and camera B both come in 7MP”

– Superlative: Relations of the type greater or less than all others

• Ex: “camera A is the cheapest camera available in market”

* Jindal and Liu, AAAI 2006

Page 17: Tutorial 13 (explicit ugc + sentiment analysis)

Mining Comparative Opinions (Jinal and Liu, SIGIR-06)

Given a collection of evaluative texts

Task 1: Identify comparative sentences.

Task 2: Categorize different types of

comparative sentences.

Task 2: Extract comparative relations from the

sentences.

Page 18: Tutorial 13 (explicit ugc + sentiment analysis)

Identify comparative sentences

Keyword strategy• An observation: It is easy to find a small set of

keywords that covers almost all comparative sentences, i.e., with a very high recall and a reasonable precision

• Compiled a list of 83 keywords used in comparative sentences, which includes:

– Words with POS tags of JJR, JJS, RBR, RBS

• POS tags are used as keyword instead of individual words.

• Exceptions: more, less, most and least

– Other indicative words like beat, exceed, ahead, etc

– Phrases like in the lead on par with etc

Page 19: Tutorial 13 (explicit ugc + sentiment analysis)

2-step learning strategy

• Step1: Extract sentences which contain at

least a keyword (recall = 98%, precision =32% on our data set for gradables)

• Step2: Use the naïve Bayes (NB) classifier to classify sentences into two classes: comparative and non-comparative, and use features like:

– Use words within radius r of a keyword to form a sequence (words are replaced with POS tags)

– Use different minimum supports for different keywords (multiple minimum supports)

Page 20: Tutorial 13 (explicit ugc + sentiment analysis)

Mining Comparative Opinions

1. (Bos and Nissim 2006) proposes a method to extract items from superlative sentences. It does not study sentiments either.

2. (Fiszman et al 2007) tried to identify which entity has more of a certain property in a comparative sentence.

3. (Ding and Liu 2008) studies sentiment analysis of comparatives, i.e., identifying which entity is preferred.

20