SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT


Page 1: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

BY SOWMYA KAMATH, ANUSHA BAGAL KOTHKAR, KUMARI POORNIMA, SHIVAM PANDEY AND ASHESH KHANDELWAL

Page 2: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Introduction

Approaches to Sentiment Analysis

Sentiment Analysis Applications

Current development in Sentiment Analysis

Conclusion

Future work

References

OVERVIEW

Page 3: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Sentiment analysis, also known as opinion mining, is the computational study of opinions, sentiments and emotions expressed in natural language for the purpose of decision making.

Sentiment analysis applies natural language processing techniques and computational linguistics to extract information about the sentiments expressed by authors and readers about a particular subject, thus helping users make sense of the huge volume of unstructured Web data.

For example, in our day-to-day lives we highly value the opinions of friends when deciding which brand to buy or which movie to watch.

INTRODUCTION

Page 4: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Two types of textual information on the web:
o Facts
o Opinions

Currently available search engines search for facts using machine-readable information.

In today’s web, a lot of opinionated text is available in various forms, for example as reviews, blogs, news articles, discussion groups and social networking sites.

Analyzing opinions is very important for making decisions.

Example: a new cell phone.

Sentiment analysis is currently a very significant trend in the area of natural language processing.

INTRODUCTION

Page 5: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Natural language processing is a branch of artificial intelligence concerned with processing and understanding human languages for machine use.

Sentiment analysis extracts opinions, sentiments, and emotions from text and analyses them.

Sentiment classification can be done at three levels:
o Document level
o Sentence level
o Feature level

INTRODUCTION

Page 6: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

A document can be classified into two classes, positive and negative, based on the overall sentiment expressed by its writer.

Classification can also be done based on four pairs of human emotions, namely:
1. “Joy-Sadness”
2. “Acceptance-Disgust”
3. “Anticipation-Surprise”
4. “Fear-Anger”

DOCUMENT LEVEL CLASSIFICATION

Page 7: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Sentence level sentiment analysis has two tasks, subjectivity classification and sentiment classification

Information in a sentence can be of two types, objective information and subjective information

Subjectivity classification involves identifying whether the sentence is subjective or objective

Sentiment classification then labels the subjective information as positive or negative.

For example, consider the following snippet of text: “I bought an iPhone a few days ago. It is a great phone.” The first sentence is objective, while the second is subjective and expresses a positive opinion.
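The two sentence-level tasks can be sketched as a small pipeline: first decide whether a sentence is subjective, then assign a polarity only to the subjective ones. The word lists and the function names (`is_subjective`, `polarity`, `analyze`) are illustrative assumptions, not part of any method described in these slides; a real system would use a trained classifier or a full lexicon.

```python
import re

# Toy word lists: illustrative assumptions only, not a real sentiment lexicon.
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "awful", "hate"}
SUBJECTIVE_CUES = POSITIVE | NEGATIVE


def tokenize(sentence):
    """Lowercase the sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_subjective(sentence):
    """Task 1 (subjectivity classification): any opinion cue makes it subjective."""
    return any(tok in SUBJECTIVE_CUES for tok in tokenize(sentence))


def polarity(sentence):
    """Task 2 (sentiment classification): compare positive and negative cue counts."""
    tokens = tokenize(sentence)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(text):
    """Split text into sentences and run both tasks on each sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    results = []
    for sentence in sentences:
        if is_subjective(sentence):
            results.append((sentence, "subjective", polarity(sentence)))
        else:
            results.append((sentence, "objective", None))
    return results


if __name__ == "__main__":
    for row in analyze("I bought an iPhone a few days ago. It is a great phone."):
        print(row)
```

On the iPhone snippet, this sketch labels the first sentence objective and the second subjective and positive.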

SENTENCE LEVEL ANALYSIS

Page 8: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Feature level classification comprises three main tasks:
o The first step is to identify and extract the features
o The next step is to determine whether the opinions on the features are positive, negative or neutral
o The final task is to group the feature synonyms

Document level and sentence level classification are not enough to capture every detail of the sentiments expressed in a document, as sentiments may be expressed with respect to different features.

For example, a phone may have a rating of 4 out of 5 for speed, 2 out of 5 for ease of use, 3 out of 5 for battery, etc.
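The three feature-level tasks can be illustrated with a short sketch. It assumes a hand-written synonym table (`FEATURE_SYNONYMS`) and opinion lexicon (`OPINION_WORDS`), both hypothetical, and it attributes every opinion word in a sentence to every feature mentioned in that sentence, which is a deliberate simplification.

```python
import re
from collections import defaultdict

# Hypothetical tables for illustration only.
# Task 3 (grouping synonyms): each surface word maps to one canonical feature.
FEATURE_SYNONYMS = {
    "battery": "battery",
    "charge": "battery",
    "speed": "speed",
    "performance": "speed",
    "screen": "display",
    "display": "display",
}
OPINION_WORDS = {"great": 1, "good": 1, "fast": 1, "poor": -1, "slow": -1, "bad": -1}


def label(score):
    """Map a numeric score to positive, negative or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def feature_opinions(review):
    """Return {canonical feature: opinion label} for a review text."""
    scores = defaultdict(int)
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        # Task 1: identify the features mentioned in this sentence.
        features = {FEATURE_SYNONYMS[t] for t in tokens if t in FEATURE_SYNONYMS}
        # Task 2: score the opinion words in the same sentence.
        sentence_score = sum(OPINION_WORDS.get(t, 0) for t in tokens)
        for feature in features:
            scores[feature] += sentence_score
    return {feature: label(score) for feature, score in scores.items()}


if __name__ == "__main__":
    text = "The screen is great. The display looks good. The battery is poor. The speed is average."
    print(feature_opinions(text))
```

Because "screen" and "display" share a canonical feature, their opinions are pooled, which is the point of the synonym-grouping step.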

FEATURE LEVEL EXTRACTION

Page 9: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Sentiment analysis classifies the opinions into positive and negative categories

Knowing the reasons behind a sentiment classification gives better insight.

These reasons are called the sentiment topics associated with the sentiment.

The proposed method collects web content and extracts snippets from it. Snippets are keywords such as brand names.

A sentiment score is then calculated for each snippet, and the snippets are classified into different categories based on this score to create a sentiment taxonomy.

Topics related to each category are then identified

APPROACHES TO SENTIMENT ANALYSIS

Page 10: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Hogenboom et al. proposed a method that considers the negation scope and strength of a word when classifying whether the word has a positive or negative effect on the sentence.

For example, consider the two sentences “I am happy with your performance” and “I am not that happy with your performance”.

The first sentence expresses a positive emotion.

If we consider only the negative keyword “not”, then the second sentence would be equivalent to “I am not happy with your performance”, which is not correct.

Taking the scope and strength of negative keywords into account when deciding their effect gives better results.

The proposed approach uses two algorithms; the first is used to calculate a score for each word.

In the second algorithm, the sentence score is calculated using the word sense and word score with respect to each negative keyword.

If the calculated sentence score is less than zero, the sentence is assigned to the negative class.
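The idea of negation scope and strength can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of Hogenboom et al.: the word scores, the fixed look-back window `NEGATION_SCOPE`, and the `SOFTENERS` table (words that weaken a preceding negation) are all invented for the example.

```python
# Illustrative assumptions: none of these values come from the cited paper.
WORD_SCORES = {"happy": 1.0, "great": 1.0, "bad": -1.0, "poor": -1.0}
NEGATION_KEYWORDS = {"not", "never", "no"}
# Words between the negation and its target that reduce the negation strength,
# so that "not that happy" is milder than "not happy".
SOFTENERS = {"that": 0.5, "very": 0.5}
NEGATION_SCOPE = 3  # how many preceding tokens are searched for a negation keyword


def sentence_score(sentence):
    """Sum word scores, flipping and scaling words that fall in a negation scope."""
    tokens = sentence.lower().replace(".", "").split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in WORD_SCORES:
            continue
        score = WORD_SCORES[token]
        window = tokens[max(0, i - NEGATION_SCOPE):i]
        for j, word in enumerate(window):
            if word in NEGATION_KEYWORDS:
                strength = 1.0
                # Softeners between the negation and the target weaken it.
                for between in window[j + 1:]:
                    strength *= SOFTENERS.get(between, 1.0)
                score *= -strength
        total += score
    return total


def classify(sentence):
    """A score below zero means the negative class, as on the slide."""
    score = sentence_score(sentence)
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    for s in ("I am happy with your performance",
              "I am not happy with your performance",
              "I am not that happy with your performance"):
        print(s, sentence_score(s), classify(s))
```

With these toy values the three sentences score 1.0, -1.0 and -0.5, so "not that happy" is classified as negative but less strongly than "not happy".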

APPROACHES TO SENTIMENT ANALYSIS

Page 11: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Methods to analyze sentiment include machine learning, statistical methods, building a knowledge base and identifying keywords

To recognize affective information from text, sentence level analysis is required.

Shaikh et al. developed a tool called SenseNet that assigns numerical valence values to words and outputs a sense value for each sentence.

The input paragraph is divided into a set of sentences and each sentence is further divided into triplets.

Valence values are assigned to the words in the triplet.

These triplets are then processed to calculate the sentence level sentiment valence

APPROACHES TO SENTIMENT ANALYSIS

Page 12: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

An overall view about a document does not reveal the sentiments about all aspects of a topic. For example, a person might be happy with the camera, music, games in his cell phone but its battery life may be a problem.

Mapping the sentiment to the correct topic is quite a challenge

The Sentiment Analyzer algorithm presented by Nasukawa et al. extracts the features related to a topic, and then extracts the sentiment of each sentiment-bearing phrase.

It then associates the topic, feature and sentiment with the document.

CLASSIFICATION BASED APPROACH

Page 13: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

An approach to classify news video stories and rank them has been presented by Chunxi Liu et al.

In their approach, the stories were divided into two classes: positive and negative.

The algorithm forms two clusters - one containing positive adjectives and other containing negative ones. A graph based semi-supervised learning approach has been used for this purpose

Similarity between words is calculated to find the sentiment words. The selected sentiment words are used as features for classification

For the visual part, an Affinity Propagation clustering approach is used to determine the ranking of the videos. A linking matrix is used to check similarity between videos. Both text and visual information are combined to rank the video

CLASSIFICATION BASED APPROACH

Page 14: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

CLUSTER FORMED USING CLUSTER ALGORITHM

Page 15: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

A Support Vector Machine (SVM) was used as the classifier algorithm

The other models used for comparison are the Naïve Bayes classifier (NB), passive-aggressive classifier (PA), bigram (BI), word (WD), metadata (MT), affix similarity (AS), word emotion (WE) and Cui’s combined word n-grams (CN)

The highest accuracy was achieved when the models SVM, BI, WD, MT, AS and WE were used together

CLASSIFICATION BASED APPROACH

Page 16: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Zhang et al. used a method in which, based on a keyword entered by the user, a sentiment graph is plotted from the sentiment vectors of the articles containing that keyword.

The sentiment graph gives an idea about inclination of articles towards various sentiments.

Machine Learning in Document Level Classification is used to carry out sentiment analysis.

Supervised methods can also be used:
o Support vector machine (SVM) - classifying reviews
o Naïve Bayes method - co-occurrence of each word
o Maximum entropy classifier - weights
o Entropy method
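As an illustration of one of the supervised methods listed above, the sketch below implements a small multinomial Naïve Bayes classifier with add-one (Laplace) smoothing in plain Python. The class name `NaiveBayesSentiment` and the four training reviews are made up for the example; the slides do not specify an implementation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label in self.doc_counts:
            # log prior + sum of smoothed log likelihoods
            logp = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


if __name__ == "__main__":
    # Hypothetical toy training set.
    docs = ["great phone love it", "excellent camera great battery",
            "terrible phone hate it", "poor battery bad screen"]
    labels = ["positive", "positive", "negative", "negative"]
    model = NaiveBayesSentiment().fit(docs, labels)
    print(model.predict("great camera"))
    print(model.predict("bad screen"))
```

The smoothing term keeps words that were never seen with a given label from driving its probability to zero, which matters for short reviews.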

CLASSIFICATION BASED APPROACH

Page 17: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

A lack of conscious awareness of a website’s sentiment bias may result in blind acceptance of the reported information.

Given a topic, Zhang et al. proposed a system that extracts relevant subtopics and presents the sentiment difference between them.

The system analyses sentiment in four dimensions, which is closer to human emotion than the conventional positive-negative scale, and detects sentiment bias. Articles are crawled and part-of-speech tagging is performed on them.

The weight of each word extracted from an article is calculated using

SENTIMENT ANALYSIS APPLICATIONS

Page 18: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

where N(w, Pi) is the number of times that word w appears in article Pi, N(Pi) is the number of words extracted from Pi, N is the number of all collected news articles, and N(w) is the number of articles in which word w appears. A sentiment dictionary is then constructed, which contains each word and its sentiment value. The sentiment value consists of a scale value and a weight value for each of the four dimensions.
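The weighting formula itself is not reproduced in this transcript. The four quantities defined above are the ingredients of a standard TF-IDF weight, so the sketch below assumes the form weight(w, Pi) = (N(w, Pi) / N(Pi)) * log(N / N(w)). That form is an assumption, and the system of Zhang et al. may use a different variant. The function name `tfidf_weights` is hypothetical.

```python
import math


def tfidf_weights(articles):
    """Compute an assumed TF-IDF weight for every word in every article.

    articles: list of token lists, one list per article Pi.
    Returns a list of {word: weight} dicts, one per article.
    """
    n_articles = len(articles)  # N: number of collected articles
    doc_freq = {}               # N(w): number of articles containing w
    for tokens in articles:
        for word in set(tokens):
            doc_freq[word] = doc_freq.get(word, 0) + 1

    weights = []
    for tokens in articles:
        n_words = len(tokens)   # N(Pi): number of words extracted from Pi
        article_weights = {}
        for word in set(tokens):
            tf = tokens.count(word) / n_words            # N(w, Pi) / N(Pi)
            idf = math.log(n_articles / doc_freq[word])  # log(N / N(w))
            article_weights[word] = tf * idf
        weights.append(article_weights)
    return weights


if __name__ == "__main__":
    articles = [["tax", "cut", "tax"], ["tax", "rise"]]
    for article_weights in tfidf_weights(articles):
        print(article_weights)
```

Under this assumed form, a word that appears in every article ("tax" in the example) gets weight zero, while words specific to one article are weighted up.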

SENTIMENT ANALYSIS APPLICATIONS

Page 19: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

The sentiment value is calculated using probability functions for each article. For the year Y edition of a particular newspaper, let df(Y, e) be the number of articles that include any word in the set e of original sentiment words in Table 1, and let df(Y, e&w) be the number of articles that include both the target word w and any word in e.

Next, the interior division ratio and the scale value are calculated using

SENTIMENT ANALYSIS APPLICATIONS

Page 20: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

A word may appear in a number of editions, and a different number of times in each edition. To account for this, a weight factor is calculated using

A sentiment value Oe(P) of article P on dimension e is calculated as follows

SENTIMENT ANALYSIS APPLICATIONS

Page 21: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

ORIGINAL SENTIMENT WORDS FOR THE FOUR DIMENSIONS

Page 22: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Celikyilmaz et al. considered that Twitter messages are of two types: polar and non-polar (neutral).

They present a probabilistic-model-based sentiment analysis approach for Twitter messages. Their technique analyzes the sentiment of polar text. As Twitter messages are human generated, their meaning is sometimes very difficult to interpret correctly, even for humans, and they may contain a lot of noise in the form of slang, shorthand, etc. The proposed method first performs text normalization, followed by pronunciation-based clustering.

For example, “4get” is the same as “forget”. Then, polarity lexicon extraction is done using a mixture model. The authors state that this analysis can be further improved by interpreting the similarity distance between words, for example treating “love”, “lovwww”, “loveee” and “luv” as the one entity “love”.
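The normalization step can be illustrated with a small sketch. It does not reproduce the pronunciation-based clustering of Celikyilmaz et al.; instead it swaps in three simpler techniques: a hand-written shorthand table, collapsing of elongated letter runs, and string-similarity matching with Python's standard `difflib`. The tables `SHORTHAND` and `VOCABULARY` and the 0.7 cutoff are assumptions chosen for the example.

```python
import difflib
import re

# Hypothetical lookup tables for illustration.
SHORTHAND = {"4get": "forget", "luv": "love"}
VOCABULARY = ["love", "forget", "happy"]


def normalize(token):
    """Map a noisy token to a vocabulary word where possible."""
    token = token.lower()
    # 1. Known shorthand and slang.
    if token in SHORTHAND:
        return SHORTHAND[token]
    # 2. Collapse runs of three or more identical characters ("loveee" -> "love").
    #    Collapsing to a single character is a simplification.
    token = re.sub(r"(.)\1{2,}", r"\1", token)
    if token in VOCABULARY:
        return token
    # 3. Fall back to the closest vocabulary word by string similarity.
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.7)
    return matches[0] if matches else token


if __name__ == "__main__":
    for raw in ("4get", "love", "lovwww", "loveee", "luv", "phone"):
        print(raw, "->", normalize(raw))
```

With these tables, "4get" maps to "forget", all four spellings of "love" map to "love", and an unrelated word such as "phone" is returned unchanged.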

CURRENT DEVELOPMENT IN SENTIMENT ANALYSIS

Page 23: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Analyzing e-learning blogs and reviews can help in providing better services to users and in improving the teaching-learning process.

Jansen et al. analyzed about 150,000 Twitter messages. The results showed that 19% of the messages mentioned a brand name and about 20% of those expressed sentiment about the brand; among these, about 50% were positive and 33% were negative.

CURRENT DEVELOPMENT IN SENTIMENT ANALYSIS

Page 24: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Extensive research has been carried out in the field of sentiment analysis: text sentiment classifiers, affect analysis, automatic survey analysis, opinion extraction, and recommender systems.

This paper presents the different approaches available for analyzing sentiment at different levels.

Based on the needs of the data to be analyzed, a particular approach can be chosen.

For example, to analyze reviews of a mobile phone, feature-level sentiment analysis can be carried out. This helps in knowing users’ opinions with respect to the various features.

CONCLUSION

Page 25: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

Applying data mining techniques on e-learning reviews and studying e-learning blogs are some of the challenges faced in improving the accuracy of the proposed system further.

Sentiment analysis of Twitter messages can help in making financial, marketing and political decisions. People use tweets to express their opinions.

They plan to design and develop a system for detecting and visualizing sentiment bias in online articles

The proposed system will be able to dynamically summarize the sentiment for different subtopics and for different websites.

They plan to construct a model which can automatically calculate credibility scores for articles based on sentiment difference between subtopics and between websites.

FUTURE WORK

Page 26: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

B. Pang and L. Lee, “Opinion Mining and Sentiment Analysis,” Foundations and Trends in Information Retrieval, 2(1-2), 2008.

A. Hogenboom, P. van Iterson, B. Heerschop, F. Frasincar, and U. Kaymak, “Determining negation scope and strength in sentiment analysis,” 2011 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2589-2594, 9-12 Oct. 2011.

C. Liu, L. Su, Q. Huang, and S. Jiang, “News video story sentiment classification and ranking,” 2011 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, 11-15 July 2011.

M. Hajmohammadi, R. Ibrahim, and Z. Ali Othman, “Opinion Mining and Sentiment Analysis: A Survey,” International Journal of Computers & Technology, 2, June 2012.

REFERENCES

Page 27: SENTIMENT ANALYSIS BASED APPROACHES FOR UNDERSTANDING USER CONTEXT IN WEB CONTENT

THANK YOU