Fypca5
Identify Potential Hotel
Predict what ASPECTS customers like
Sales and Margin
Sentiment Analysis
Users’ Opinions of Hotels
Some Limitations of machines
Unable to read like a human
Cannot detect sarcasm
Sentiments are expressed differently across topics and domains
Polarity analysis
Facts Vs Opinion
Some machine limitation examples
“The service is as good as none”. The negation is not obvious to a machine.
“Swimming pool is big enough to swim with comfort” vs. “There is a big crowd at the counter complaining”. The polarity of “big” changes with context.
“The room is warmer than the lobby”. Comparisons are hard to classify.
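The three examples above can be reproduced with a deliberately naive word-counting scorer. This is only a sketch: `TOY_LEXICON` and `naive_polarity` are hypothetical names, and the tiny lexicon is invented for illustration, not taken from the project.

```python
# Hypothetical toy lexicon, invented for this illustration only.
TOY_LEXICON = {"good": 1, "big": 1, "comfort": 1, "complaining": -1}


def naive_polarity(sentence):
    """Sum per-word polarities, ignoring negation, context and comparison."""
    words = sentence.lower().replace(".", " ").replace(",", " ").split()
    return sum(TOY_LEXICON.get(word, 0) for word in words)


# "good" is counted as positive even though the sentence is a complaint.
print(naive_polarity("The service is as good as none"))
# "big" (+1) and "complaining" (-1) cancel out, hiding the negative tone.
print(naive_polarity("There is a big crowd at the counter complaining"))
# No lexicon word matches, so the comparison gets no polarity at all.
print(naive_polarity("The room is warmer than the lobby"))
```

A scorer like this has no notion of negation, context or comparison, which is why each sentence is misjudged.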
Sentiment Analysis
Prediction of sentence polarity
Classification of polarity for sentiment lexicon
Detection of relations
Cleaning The “Dirty” Reviews
Frequent problem: data inconsistencies
Duplicate data
Spelling errors (not the same as trimming them from the data)
Foreign accents and characters
Singular / plural conversion
Punctuation removal / replacement
Noise and incomplete data
Misused naming conventions: same name but different meaning
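A few of the cleaning steps listed above (accent stripping, punctuation replacement, duplicate removal) can be sketched with the Python standard library. The function names `clean_review` and `deduplicate` are hypothetical, and the sketch does not cover spelling correction, singular/plural conversion or naming conflicts.

```python
import re
import unicodedata


def clean_review(text):
    """Normalise one review: strip accents, lowercase, drop punctuation."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Replace punctuation with spaces, then collapse repeated whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(reviews):
    """Keep the first copy of each review after cleaning; skip empty ones."""
    seen = set()
    unique = []
    for review in reviews:
        cleaned = clean_review(review)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


print(clean_review("Café was GREAT!!!"))
print(deduplicate(["Great pool.", "great pool", "   "]))
```

Comparing reviews after cleaning, rather than before, lets duplicates that differ only in case or punctuation be caught.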
Data Preprocessing
Polarity tagging using sentiment lexicon
Word: BEST
Part of Speech Tag: ADJ
Sentiment Lexicon Tag: +VE
Occurrence: HIGH
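As a rough sketch of how such a record could be produced, the snippet below looks up each already POS-tagged word in a lexicon and counts its occurrences. The lexicon contents, the `UNKNOWN` label and the function name `tag_polarity` are assumptions made for illustration; the POS tags are taken as given input rather than produced here.

```python
from collections import Counter

# Hypothetical two-entry lexicon; a real sentiment lexicon is far larger.
TOY_LEXICON = {"best": "+VE", "dirty": "-VE"}


def tag_polarity(tagged_words):
    """Build one record per (word, POS tag) pair, most frequent first.

    tagged_words is a list of (word, pos) pairs that were POS-tagged
    in an earlier step.
    """
    counts = Counter((word.lower(), pos) for word, pos in tagged_words)
    rows = []
    for (word, pos), occurrence in counts.most_common():
        rows.append({
            "word": word,
            "pos": pos,
            # Words missing from the lexicon get an explicit marker.
            "polarity": TOY_LEXICON.get(word, "UNKNOWN"),
            "occurrence": occurrence,
        })
    return rows


for row in tag_polarity([("BEST", "ADJ"), ("best", "ADJ"), ("hotel", "NOUN")]):
    print(row)
```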
Findings
Part of Speech Tagging (POS) using Brill Tagger - NO PROBLEM
- 95% accuracy of POS tagging after data cleaning
Findings
Polarity tagging using sentiment lexicon – BIG PROBLEM
- 40% of sentiment words are not found in the sentiment lexicon
- 10% of sentiment words that carry a positive or negative polarity are listed in the neutral section of the sentiment lexicon
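Figures of this kind can be computed by checking each sentiment word against the lexicon. The sketch below assumes the lexicon is a dictionary mapping a word to "positive", "negative" or "neutral"; that layout, the function name `lexicon_coverage` and the sample data are hypothetical and are not the project's data.

```python
def lexicon_coverage(sentiment_words, lexicon):
    """Return the share of words missing from, or neutral in, the lexicon."""
    total = len(sentiment_words)
    if total == 0:
        raise ValueError("sentiment_words must not be empty")
    missing = sum(1 for word in sentiment_words if word not in lexicon)
    neutral = sum(1 for word in sentiment_words if lexicon.get(word) == "neutral")
    return {"missing": missing / total, "neutral": neutral / total}


# Invented sample: 10 words, 4 absent from the lexicon, 1 listed as neutral.
sample_lexicon = {
    "clean": "positive", "friendly": "positive", "best": "positive",
    "dirty": "negative", "rude": "negative", "cheap": "neutral",
}
sample_words = ["clean", "friendly", "best", "dirty", "rude", "cheap",
                "spacious", "cramped", "smelly", "cosy"]
print(lexicon_coverage(sample_words, sample_lexicon))
```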
Problems
Sentiment lexicon not comprehensive
Domain Independent Sentiment Words
Domain Dependent Sentiment Words
Analysis - Bayesian
To determine polarity of sentiments
P(X | Y) = P(X) P(Y | X) / P(Y)
Probability that a sentiment is positive or negative, given its contents
P(sentiment | sentence) = P(sentiment)P(sentence | sentiment) / P(sentence)
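One common way to apply this formula is a naive Bayes classifier, which additionally assumes that the words of a sentence are independent given the sentiment. The slides do not say which variant was used, so the sketch below is an assumption: the add-one smoothing, the function names `train` and `posterior`, and the four training sentences are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and per-label words from (sentence, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in examples:
        label_counts[label] += 1
        for word in sentence.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def posterior(sentence, model):
    """Return P(sentiment | sentence) for every label seen in training."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        log_p = math.log(count / total)  # log P(sentiment)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in sentence.lower().split():
            # log P(word | sentiment) with add-one smoothing
            log_p += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = log_p
    # Dividing by P(sentence) amounts to normalising over the labels.
    top = max(log_scores.values())
    scaled = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(scaled.values())
    return {label: value / norm for label, value in scaled.items()}


examples = [
    ("great clean room", "positive"),
    ("friendly helpful staff", "positive"),
    ("dirty noisy room", "negative"),
    ("rude staff", "negative"),
]
model = train(examples)
print(posterior("clean friendly room", model))
```

Working in log space avoids numerical underflow on long sentences, and the final normalisation plays the role of the P(sentence) denominator.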
Validation
• Precision = N (agree & found) / N (found)
• High precision means most of the sentiment words found by the system are correctly labeled
• Recall = N (agree & found) / N (agree)
• High recall means most of the correct sentiment words are found by the system
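The two ratios can be computed directly from sets. In this sketch "found" is read as the set of sentiment words the system outputs and "agree" as the set the human annotators agree on; that reading, the function name `precision_recall` and the sample words are assumptions.

```python
def precision_recall(found, agreed):
    """Compute precision and recall from two collections of words."""
    found, agreed = set(found), set(agreed)
    hits = len(found & agreed)  # N (agree & found)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(agreed) if agreed else 0.0
    return precision, recall


# Invented example: the system finds 4 words, 2 of which annotators accept.
system_words = {"clean", "big", "noisy", "warm"}
annotator_words = {"clean", "noisy", "rude"}
precision, recall = precision_recall(system_words, annotator_words)
print(f"precision={precision:.2f} recall={recall:.2f}")
```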
Validation Results
Out of the 350 aspect-unlabelled sentiment word pairs,
294 are found by the method. Thus, the precision is about 84%.
Recall: 276 words are correctly labelled by the system, which is about 78%.