Sentiment Analysis


Sentiment Analysis Complete Tutorial


  • Sentiment Analysis: An Overview of Concepts and Selected Techniques

  • Terms
      - Sentiment: a thought, view, or attitude, especially one based mainly on emotion instead of reason
      - Sentiment analysis (aka opinion mining): use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from typically unstructured text

  • Motivation
      - Consumer information: product reviews
      - Marketing: consumer attitudes; trends
      - Politics: politicians want to know voters' views; voters want to know politicians' stances and who else supports them
      - Social: find like-minded individuals or communities

  • Problem
      - Which features to use? Words (unigrams); phrases/n-grams; sentences
      - How to interpret features for sentiment detection? Bag of words (IR); annotated lexicons (WordNet, SentiWordNet); syntactic patterns; paragraph structure

  • Challenges
      - Harder than topical classification, for which bag-of-words features perform well
      - Must consider other features due to:
          - Subtlety of sentiment expression: irony; expression of sentiment using neutral words
          - Domain/context dependence: words/phrases can mean different things in different contexts and domains
          - Effect of syntax on semantics

  • Approaches
      - Machine learning
          - Naïve Bayes
          - Maximum Entropy Classifier
          - SVM
          - Markov Blanket Classifier: accounts for conditional feature dependencies; allowed reduction of discriminating features from thousands of words to about 20 (movie review domain)
      - Unsupervised methods: use lexicons

    Assume pairwise independent features

  • LingPipe Polarity Classifier: first eliminate objective sentences, then use the remaining sentences to classify document polarity (reduce noise)

  • LingPipe Polarity Classifier
      - Uses unigram features extracted from movie review data
      - Assumes that adjacent sentences are likely to have similar subjective-objective (SO) polarity
      - Uses a min-cut algorithm to efficiently extract subjective sentences

  • LingPipe Polarity Classifier: [Figure: graph for classifying three items]

  • LingPipe Polarity Classifier
      - As accurate as the baseline, but uses only 22% of the content in the test data (on average)
      - The metrics suggest properties of movie review structure

  • SentiWordNet
      - Based on WordNet synsets (http://wordnet.princeton.edu/)
      - Ternary classifier: positive, negative, and neutral scores for each synset
      - Provides a means of gauging sentiment for a text

  • SentiWordNet: Construction
      - Created training sets of synsets, Lp and Ln
          - Start with a small number of synsets with fundamentally positive or negative semantics, e.g., "nice" and "nasty"
          - Use WordNet relations, e.g., direct antonymy, similarity, derived-from, to expand Lp and Ln over K iterations
          - Lo (objective) is the set of synsets not in Lp or Ln
      - Trained classifiers on the training sets: Rocchio and SVM
          - Four values of K, each used with both learners, give eight classifiers with different precision/recall characteristics
          - As K increases, precision (P) decreases and recall (R) increases

  • SentiWordNet: Results
      - 24.6% synsets with Objective

  • SentiWordNet: How to use it
      - Use scores to select features (+/-), e.g., Zhang and Zhang (2006) used words in the corpus with a subjectivity score of 0.5 or greater
      - Combine positive/negative/objective scores to calculate a document-level score, e.g., Devitt and Ahmad (2007) conflated polarity scores with a WordNet-based graph representation of documents to create predictive metrics

  • References
      - http://www.answers.com/sentiment, 9/22/08.
      - B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), pp. 79-86, 2002.
      - A. Esuli and F. Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In Proc. LREC 2006, 5th Conf. on Language Resources and Evaluation, 2006.
      - E. Zhang and Y. Zhang. UCSC on TREC 2006 Blog Opinion Mining. TREC 2006 Blog Track, Opinion Retrieval Task.
      - A. Devitt and K. Ahmad. Sentiment polarity identification in financial news: A cohesion-based approach. ACL 2007.
      - B. Pang and L. Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proc. 42nd Annual Meeting of the Association for Computational Linguistics, p. 271, July 21-26, 2004.

    1. Subjective vs. objective information

    2. Essentially the same as other information retrieval tasks, but with some additional challenges as we will see

    Review information from blogs, newsgroups, etc.

    Consumer attitudes towards the company's products and competitors' products

    Politics: can form the basis of policy decisions

    Lead-in: these problems are similar to other IR tasks. We have a body of text and need to know how to classify it.

    Granularity: most research has used unigrams (single words); some research shows that k-length n-grams work best
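Here is a minimal Python sketch of the unigram vs. n-gram choice. It turns a text into a bag of features containing every n-gram up to a chosen length. The regex tokenizer, the function names (`tokenize`, `ngram_features`), and the `n_max` parameter are illustrative assumptions, not part of the original slides.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_features(text, n_max=2):
    """Return a Counter of all n-grams of length 1..n_max (bag-of-n-grams features)."""
    tokens = tokenize(text)
    feats = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of length n across the token list.
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


feats = ngram_features("Not a good movie, not good at all.")
print(feats["good"], feats["not good"])  # 2 1
```

With `n_max=1` this reduces to plain unigram features. A bigram such as "not good" is exactly the kind of feature a unigram model cannot see, which is one argument for longer n-grams.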

    WordNet: contains a large lexicon with relationships (synonymy, antonymy, etc.)

    Syntactic patterns: indirect negation; setup/contradiction

    "[it] avoids all cliches and predictability found in Hollywood movies": "avoids" reverses the polarity of "cliches" and "predictability"
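The polarity reversal in the "avoids" example can be approximated with a simple negation-scope heuristic. Every token after a negating word gets a `NOT_` prefix until the next punctuation mark, so "NOT_cliches" becomes a separate feature from "cliches". Pang et al. (2002) used a tagging scheme of this kind. The trigger list below (including "avoids") and the function name `mark_negation` are illustrative assumptions.

```python
import re

# Hypothetical trigger list; "avoids" is included only to mirror the indirect-negation
# example in the notes, not as a standard choice.
NEGATORS = {"not", "no", "never", "isn't", "doesn't", "didn't", "can't", "avoids"}
PUNCT = re.compile(r"[.,;:!?]")


def mark_negation(text):
    """Prefix NOT_ to every token after a negator, up to the next punctuation mark."""
    out = []
    negating = False
    for tok in re.findall(r"[\w']+|[.,;:!?]", text.lower()):
        if PUNCT.fullmatch(tok):
            negating = False  # punctuation closes the negation scope
            out.append(tok)
        elif tok in NEGATORS:
            negating = True   # open a new negation scope
            out.append(tok)
        else:
            out.append("NOT_" + tok if negating else tok)
    return out


print(mark_negation("It avoids all cliches and predictability."))
# ['it', 'avoids', 'NOT_all', 'NOT_cliches', 'NOT_and', 'NOT_predictability', '.']
```

A fixed trigger list cannot capture every "setup/contradiction" pattern, such as the thwarted-expectation review. The heuristic only handles local, explicit reversals.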

    Thwarted expectation: "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can't hold up."

    "Unpredictable": good for a movie plot, bad for car steering

    Machine learning
      Strengths: performs fairly well within a given domain with sufficient training data
      Weaknesses: tends to overfit the training data within a given domain, so learning is hard to transfer to other domains; needs training data
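As a concrete instance of the supervised approach, here is a minimal multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. It shows why training data is required. It also shows how the independence assumption turns classification into a sum of per-word log-probabilities. The four training "reviews" are invented toy data, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train multinomial Naive Bayes from (tokens, label) pairs."""
    doc_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}
    log_lik = {}
    for c in doc_counts:
        total = sum(word_counts[c].values())
        # Add-one smoothing keeps words unseen in class c from zeroing its probability.
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
                      for w in vocab}
    return log_prior, log_lik


def classify_nb(model, tokens):
    """Pick the class with the highest log P(c) + sum of log P(w | c)."""
    log_prior, log_lik = model
    scores = {}
    for c in log_prior:
        # Independence assumption: per-word log-probabilities simply add up.
        # Words outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
    return max(scores, key=scores.get)


train = [
    ("a great moving film".split(), "pos"),
    ("great acting and a superb plot".split(), "pos"),
    ("a dull boring film".split(), "neg"),
    ("boring plot and wooden acting".split(), "neg"),
]
model = train_nb(train)
print(classify_nb(model, "great plot".split()))   # pos
print(classify_nb(model, "dull boring".split()))  # neg
```

The weakness listed above shows up directly here. The model only knows the vocabulary and word-class associations present in its training data, so it transfers poorly to a domain with different vocabulary.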

    Unsupervised
      Strengths: domain independent (prior polarity); may aid machine learning techniques
      Weaknesses: when used alone, does not perform as well as machine learning within a given domain
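An unsupervised, lexicon-based scorer needs no training data. The sketch below uses SentiWordNet-style (positive, negative) scores per word. It keeps only words whose subjectivity is at least 0.5, echoing the Zhang and Zhang (2006) threshold, and sums positive minus negative. Three parts of this are assumptions:
- Defining subjectivity as positive + negative (i.e. 1 minus the objective score).
- The lexicon values, which are invented for illustration and are not real SentiWordNet scores.
- The simple summation, which is not the graph-based method of Devitt and Ahmad (2007).

```python
# Toy lexicon: word -> (positive, negative). Invented values, NOT real SentiWordNet
# scores; the objective score is implicitly 1 - positive - negative.
LEXICON = {
    "good": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "unpredictable": (0.25, 0.375),
    "movie": (0.0, 0.0),
    "plot": (0.0, 0.0),
}


def subjectivity(word):
    """Assumed definition: positive + negative (equivalently 1 - objective)."""
    pos, neg = LEXICON.get(word, (0.0, 0.0))
    return pos + neg


def document_polarity(tokens, threshold=0.5):
    """Sum (positive - negative) over sufficiently subjective words; return (score, label)."""
    score = 0.0
    for w in tokens:
        if subjectivity(w) >= threshold:
            pos, neg = LEXICON.get(w, (0.0, 0.0))
            score += pos - neg
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return score, label


print(document_polarity("a good movie with a bad plot".split()))  # (0.125, 'positive')
print(document_polarity("an unpredictable plot".split()))         # (-0.125, 'negative')
```

The "unpredictable" entry also shows the domain-dependence problem from the notes. A single prior polarity cannot be right for both movie plots and car steering.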

    Document with three sentences: Y, M, and N are the nodes in the graph

    Assign weights for each node's (sentence's) preference for being in each of the two classes (subjective or objective)

    Assign weights for each node's (sentence's) preference for being in the same class as adjacent nodes.

    Also shows the performance of different classifiers.

    WordNet: a lexical resource developed at Princeton
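The three-sentence graph (Y, M, N) with its two kinds of weights can be solved with any max-flow/min-cut algorithm. The sketch below is a from-scratch Edmonds-Karp implementation, not LingPipe's API. It sets up the graph as follows:
- The source side stands for "subjective" and the sink side for "objective".
- Each sentence gets an edge from the source and one to the sink, weighted by its individual preferences.
- Adjacent sentences get association edges.

The weights are hypothetical and scaled to integers to avoid floating-point rounding in the residual graph.

```python
from collections import defaultdict, deque

SOURCE, SINK = "<subjective>", "<objective>"


def min_cut_subjective(ind_subj, ind_obj, assoc):
    """Return (cut_cost, set of sentences on the subjective side of a minimum cut).

    ind_subj[x]: preference of x for the subjective class (paid if x ends up objective)
    ind_obj[x]:  preference of x for the objective class (paid if x ends up subjective)
    assoc[(x, y)]: penalty for putting adjacent sentences x and y in different classes
    """
    cap = defaultdict(lambda: defaultdict(int))
    for x in ind_subj:
        cap[SOURCE][x] += ind_subj[x]
        cap[x][SINK] += ind_obj[x]
    for (x, y), w in assoc.items():
        cap[x][y] += w  # an undirected association edge is modeled
        cap[y][x] += w  # as two opposite directed edges

    def reachable():
        """BFS over edges with remaining capacity; returns {node: parent}."""
        parent = {SOURCE: None}
        queue = deque([SOURCE])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if SINK not in parent:
            break  # no augmenting path left: the flow is maximal
        path, v = [], SINK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= bottleneck  # use up forward capacity
            cap[b][a] += bottleneck  # add residual (reverse) capacity
        flow += bottleneck
    # Max-flow = min-cut: nodes still reachable from the source form the subjective side.
    return flow, set(reachable()) - {SOURCE}


# Hypothetical weights for the three-sentence example (scaled by 10 to stay integral).
ind_subj = {"Y": 8, "M": 5, "N": 1}
ind_obj = {"Y": 2, "M": 5, "N": 9}
assoc = {("Y", "M"): 10, ("Y", "N"): 1, ("M", "N"): 2}
print(min_cut_subjective(ind_subj, ind_obj, assoc))
```

With these weights the cheapest partition costs 11: Y and M go to the subjective side and N to the objective side. M is undecided on its own (5 vs. 5), but its strong association with Y pulls it into the same class, which is the point of modeling adjacency.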

    A synset represents a distinct semantic concept; it contains a set of synonymous words
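The SentiWordNet construction steps can be sketched on a toy relation graph: seed sets Lp and Ln grow through WordNet relations over K iterations, and Lo is everything else. The sketch makes several assumptions:
- The synsets and relations below are hypothetical stand-ins for WordNet data.
- "similar" and "derived-from" preserve polarity, while antonymy reverses it.
- A synset reached with both polarities is left in both sets; the sketch does not resolve the conflict.

```python
# Hypothetical toy relation graph standing in for WordNet: synset -> [(relation, synset)].
RELATIONS = {
    "nice": [("similar", "pleasant"), ("antonym", "nasty")],
    "nasty": [("similar", "awful"), ("antonym", "nice")],
    "pleasant": [("similar", "agreeable"), ("antonym", "unpleasant")],
    "awful": [("similar", "dreadful")],
    "agreeable": [],
    "unpleasant": [],
    "dreadful": [],
    "table": [],
}
PRESERVES = {"similar", "derived-from"}  # assumed to keep polarity
FLIPS = {"antonym"}                      # assumed to reverse polarity


def expand(seed_pos, seed_neg, k):
    """Grow Lp and Ln for k iterations; Lo is every other synset in the graph."""
    lp, ln = set(seed_pos), set(seed_neg)
    for _ in range(k):
        new_p, new_n = set(), set()
        # Positive synsets pass polarity on unchanged via PRESERVES and flipped via FLIPS;
        # negative synsets do the same with the roles of the two sets swapped.
        for members, same, opposite in ((lp, new_p, new_n), (ln, new_n, new_p)):
            for s in members:
                for rel, t in RELATIONS.get(s, []):
                    if rel in PRESERVES:
                        same.add(t)
                    elif rel in FLIPS:
                        opposite.add(t)
        lp |= new_p
        ln |= new_n
    lo = set(RELATIONS) - lp - ln
    return lp, ln, lo


for k in (1, 2):
    lp, ln, lo = expand({"nice"}, {"nasty"}, k)
    print(k, sorted(lp), sorted(ln), sorted(lo))
```

Going from K=1 to K=2 enlarges Lp and Ln. This is presumably the mechanism behind the observation that larger K trades precision for recall: more synsets get labeled, but each extra hop from the seeds is a weaker signal.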
