Fypca5

Mining User’s Opinions in Hotel TEY JUN HONG U095074X



Content

Background

Formulating the problem

Data Mining Process

Techniques

Analysis

Extraction of patterns

Automatic Means

Little Human Interaction

What is Data Mining?

The Web

http://www

Identify Potential Hotels

Predict what ASPECTS customers like

Sales and Margin

Sentiment Analysis

User’s Opinions in Hotel

Some Limitations of machines

Unable to read like a human

Cannot detect sarcasm

Sentiments are expressed differently across topics and domains

Polarity analysis

Facts vs. Opinions

Some machine limitation examples

“The service is as good as none”. The negation is not obvious to a machine.

“Swimming pool is big enough to swim with comfort”, “There is a big crowd at the counter complaining”. The polarity of a word such as “big” might change with context.

“The room is warmer than the lobby”. Comparisons are hard to classify

Project

Sentiment Analysis

Prediction of sentence polarity

Classification of polarity for sentiment lexicon

Detection of relations

Data Mining Process

Cleaning The “Dirty” Reviews

Frequent problem: data inconsistencies

Duplicate data

Spelling errors: not simply trimmed from the data

Foreign accents and characters

Singular / Plural conversion

Punctuation removal / replacement

Noise and incomplete data

Naming conventions misused: same name but different meanings
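
A few of the cleaning steps listed above (duplicates, foreign accents, punctuation, singular / plural conversion) could be sketched as follows. This is only a minimal illustration, not the project's actual cleaning code: the function names are hypothetical, and the plural rule is a deliberately naive assumption that a real system would replace with a proper lemmatizer.

```python
import re
import unicodedata


def strip_accents(text):
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_singular(word):
    # Assumption: a very rough plural -> singular rule, for illustration only.
    # It will mangle words such as "this" or "always".
    if len(word) > 3 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def clean_review(text):
    text = strip_accents(text).lower()
    # Replace punctuation with spaces so that neighbouring words stay separated.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(naive_singular(word) for word in text.split())


def clean_reviews(reviews):
    # Clean every review, then drop empty results and duplicates,
    # keeping the order in which reviews were first seen.
    seen = set()
    cleaned = []
    for review in reviews:
        result = clean_review(review)
        if result and result not in seen:
            seen.add(result)
            cleaned.append(result)
    return cleaned
```

Calling clean_reviews on the raw review texts returns one cleaned string per distinct review. Spelling correction and the handling of ambiguous names are left out of this sketch.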

Data Preprocessing

Part Of Speech Tags

Data Preprocessing

Polarity tagging using sentiment lexicon

The Word: BEST

Part of Speech Tag: ADJ

Sentiment Lexicon Tag: +VE

Occurrence: HIGH
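
The lexicon lookup in the example above might look like the sketch below. The lexicon entries, the "UNKNOWN" marker, and the choice to look up only words tagged ADJ are assumptions made for illustration; the lexicon and tag set actually used in the project are not shown in the slides.

```python
# Toy lexicon with hypothetical entries, not the lexicon used in the project.
SENTIMENT_LEXICON = {
    "best": "+VE",
    "good": "+VE",
    "clean": "+VE",
    "dirty": "-VE",
    "rude": "-VE",
    "big": "NEUTRAL",
}


def tag_polarity(tagged_words, lexicon=SENTIMENT_LEXICON):
    """Attach a polarity tag to each (word, pos_tag) pair.

    Assumption: only adjectives ("ADJ") are looked up. Adjectives that are
    missing from the lexicon are marked "UNKNOWN"; other words get None.
    """
    result = []
    for word, pos in tagged_words:
        if pos == "ADJ":
            polarity = lexicon.get(word.lower(), "UNKNOWN")
        else:
            polarity = None
        result.append((word, pos, polarity))
    return result


def lexicon_coverage(tagged):
    """Fraction of adjectives that were found in the lexicon."""
    adjectives = [polarity for _, pos, polarity in tagged if pos == "ADJ"]
    if not adjectives:
        return 0.0
    found = sum(1 for polarity in adjectives if polarity != "UNKNOWN")
    return found / len(adjectives)
```

A coverage figure of this kind is one way to measure how many sentiment words a lexicon misses on a given set of reviews.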

Findings

Part of Speech (POS) Tagging using the Brill Tagger - NO PROBLEM

- 95% accuracy when POS-tagging words after data cleaning

Findings

Polarity tagging using sentiment lexicon – BIG PROBLEM

- 40% of sentiment words are not found in the sentiment lexicon

- 10% of sentiment words with a positive or negative polarity are listed in the neutral section of the sentiment lexicon

Problems

Sentiment lexicon not comprehensive

Domain Independent Sentiment Words

Domain Dependent Sentiment Words

Solutions

Rule Based Mining

Relation Based Mining

Rule Based Mining

Relation Based Mining

Analysis - Bayesian

To determine polarity of sentiments

P(X | Y) = P(X) P(Y | X) / P(Y)

Probability that a sentiment is positive or negative, given its contents

P(sentiment | sentence) = P(sentiment) P(sentence | sentence's sentiment) / P(sentence), i.e. P(sentiment | sentence) = P(sentiment) P(sentence | sentiment) / P(sentence)
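
One common way to apply this formula is a naive Bayes classifier, sketched below. The slides only state Bayes' rule, so the word-independence assumption, the add-one smoothing, the class name, and the tiny training set are all assumptions made for illustration. P(sentence) is the same for every sentiment, so it is dropped when comparing scores.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Hypothetical helper: word-level naive Bayes with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # sentences seen per sentiment
        self.word_counts = defaultdict(Counter)  # word counts per sentiment
        self.vocabulary = set()

    def train(self, labelled_sentences):
        for sentence, label in labelled_sentences:
            self.label_counts[label] += 1
            for word in sentence.lower().split():
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def log_scores(self, sentence):
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log P(sentiment)
            n_words = sum(self.word_counts[label].values())
            for word in sentence.lower().split():
                # log P(word | sentiment), smoothed so unseen words are not zero.
                numerator = self.word_counts[label][word] + 1
                score += math.log(numerator / (n_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


# Made-up training sentences, for illustration only.
TRAINING = [
    ("the room is clean and the staff is friendly", "positive"),
    ("great pool and friendly service", "positive"),
    ("the room is dirty and the staff is rude", "negative"),
    ("terrible service and dirty bathroom", "negative"),
]

classifier = NaiveBayesSentiment()
classifier.train(TRAINING)
```

After training, classifier.predict(sentence) returns the sentiment with the highest score for that sentence.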

Validation

• Precision = N (agree & found) / N (found)

• High precision means most of the sentiment words found by the system are correctly labeled

• Recall = N (agree & found) / N (agree)

• High recall means most of the correct sentiment words are found by the system
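
The two ratios above can be computed directly from two sets of items. In this sketch, "found" is taken to be the set of pairs produced by the system and "agreed" the set of pairs the human annotators agree on; that reading, the function name, and the example pairs are assumptions for illustration.

```python
def precision_recall(found, agreed):
    """Return (precision, recall) for two collections of items.

    found  -- items produced by the system
    agreed -- items the annotators agree are correct
    """
    found, agreed = set(found), set(agreed)
    both = found & agreed  # N (agree & found)
    precision = len(both) / len(found) if found else 0.0
    recall = len(both) / len(agreed) if agreed else 0.0
    return precision, recall


# Hypothetical (aspect, sentiment word) pairs.
system_pairs = {("room", "clean"), ("pool", "big"), ("staff", "rude")}
annotator_pairs = {("room", "clean"), ("staff", "rude"),
                   ("food", "tasty"), ("bed", "soft")}
precision, recall = precision_recall(system_pairs, annotator_pairs)
```

Here two of the three system pairs are also in the annotators' set, and two of the annotators' four pairs were found by the system.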

Validation Results


Out of the 350 aspect-unlabelled sentiment word pairs, 294 were found by the methods. Thus, the precision is about 84%.

Recall: 276 words were correctly labelled by the system, which is about 78%.

Application

Reviews Rating

Aspect Rating

Summary of reviews