Text Sentiment analysis


Transcript of Text Sentiment analysis

Slide 1

Deep Learning for Sentiment Analysis
Presenter: Hung D. Phan, Institution of Information Technology

Slide 2

Outline

1. Introduction

2. Sentiment analysis approaches

    3. Overview of deep learning for applications.

    4. Deep learning for sentiment detection.

    5. Future research direction

Slide 3

1. Introduction

Each sentence and paragraph contains its own sentiment feature.

For example, with the sentences:

"This is a good movie." → positive comment.

"This movie contains bad words, bad characters and unrelated scenes." → negative comment.

Slide 4

1. Introduction

Slide 5

1. Introduction

Purpose of sentiment detection: classify the comment.

Extract relationships between sentences in a paragraph:

Judgment and evaluation

Emotional state

Intended emotional communication

Slide 6

Outline

1. Introduction

2. Sentiment analysis approaches

    3. Overview of deep learning for applications.

    4. Deep learning for sentiment detection.

    5. Future research direction

Slide 7

2. Sentiment analysis approaches

Issues:

Classifying the polarity of a given text at the document, sentence, or feature/aspect level.

Beyond polarity, sentiment classification looks at emotional states: angry, happy, sad, etc.

Early work on polarity detection:

Peter D. Turney [1]: the classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs.

Bo Pang and Lillian Lee [2]: exploiting class relationships for sentiment categorization with respect to rating scales.

Benjamin Snyder and Regina Barzilay [3]: focus on restaurant reviews, analyzing specific aspects of each restaurant.

Slide 8

Peter D. Turney [1]

Purpose: classification of film reviews.

Provides a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down).

The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs.

In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word "excellent" minus the mutual information between the given phrase and the word "poor".

Slide 9

Peter D. Turney [1]

1) Identify phrases in the input text that contain adjectives or adverbs (using part-of-speech tagging).

2) Estimate the semantic orientation of each extracted phrase (using Pointwise Mutual Information (PMI) and Information Retrieval (IR)).

3) Assign the given review to a class, recommended or not recommended.

Slide 10

PMI-IR method

The Pointwise Mutual Information (PMI) between two words, word1 and word2, is defined as follows (Church & Hanks, 1989):

$$\mathrm{PMI}(word_1, word_2) = \log_2\!\left(\frac{p(word_1 \wedge word_2)}{p(word_1)\,p(word_2)}\right)$$

The Semantic Orientation (SO) of a phrase is calculated here as follows:

$$\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"})$$

The SO is estimated from hits (the number of documents matching a query):

$$\mathrm{SO}(phrase) = \log_2\!\left(\frac{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"excellent"}) \cdot \mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"poor"}) \cdot \mathrm{hits}(\text{"excellent"})}\right)$$
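To make the computation concrete, here is a minimal Python sketch of the hit-count estimate; the 0.01 smoothing constant follows Turney's setup, but the hit counts themselves are invented for illustration:

```python
from math import log2

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """Hit-count estimate of Semantic Orientation (Turney, 2002).

    The hits_* arguments are assumed to come from a search engine's
    NEAR-query document counts; smoothing avoids division by zero
    for rare phrases.
    """
    return log2(((hits_near_excellent + smoothing) * (hits_poor + smoothing)) /
                ((hits_near_poor + smoothing) * (hits_excellent + smoothing)))

# A review is classified by the average SO of its extracted phrases.
phrase_sos = [semantic_orientation(2, 1, 1000, 800),
              semantic_orientation(0, 3, 1000, 800)]
avg = sum(phrase_sos) / len(phrase_sos)
print("recommended" if avg > 0 else "not recommended")
```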

Slide 11

Peter D. Turney [1]

Slide 12

Peter D. Turney [1]

Disadvantage: the average SO tends to err on the side of guessing that a review is not recommended when it is actually recommended.

Slide 13

Bo Pang and Lillian Lee [2]

Determine the author's evaluation with respect to a multi-point scale (one to five stars).

2 main steps:

Evaluating human performance at the task.

Applying a meta-algorithm, based on a metric-labeling formulation of the problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels.

Slide 14

Bo Pang and Lillian Lee [2]

The idea of metric labeling is provided by Jon Kleinberg and Éva Tardos ([28]).

Extract the cost of the labeling, which represents the error in labelling. The total cost has the standard metric-labeling form (assignment costs plus distance-weighted separation costs):

$$\sum_{x} c\big(x, l(x)\big) + \sum_{(u,v) \in E} w_{uv}\, d\big(l(u), l(v)\big)$$

Metric labeling: minimize the cost.

Slide 15

Bo Pang and Lillian Lee [2]

Explicitly incorporate label similarity information (for instance, "one star" is closer to "two stars" than to "four stars") by thinking of the task as one of metric labeling (Kleinberg and Tardos, 2002), where label relations are encoded via a distance metric.

To detect the similarity between items and labels, three algorithms have been researched, based on Support Vector Machines:

1. One-vs-all

2. Regression

3. Metric labeling

Consider what item similarity measure to apply, proposing one based on positive-sentence percentage.

Slide 16

Bo Pang and Lillian Lee [2]: One-vs-all

Each training point belongs to one of N different classes. The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs [5].

(i) Solve K different binary problems: classify "class k" versus "the rest", for k = 1, ..., K.

(ii) Assign a test sample to the class giving the largest fk(x) (most positive) value, where fk(x) is the solution from the kth problem.

Purpose: classify reviews into output labels (score rank) and evaluate the accuracy.
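As an illustration of one-vs-all, here is a hedged scikit-learn sketch; the toy reviews and star labels are invented, and LinearSVC merely stands in for the SVM setup Pang and Lee used:

```python
# One binary SVM per class; a test point gets the class whose
# decision value f_k(x) is largest.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

reviews = ["a wonderful, moving film", "dull plot and flat acting",
           "decent but forgettable", "an absolute masterpiece"]
stars = [3, 0, 1, 3]  # illustrative labels on a 4-point scale (0..3)

X = TfidfVectorizer().fit_transform(reviews)
clf = OneVsRestClassifier(LinearSVC()).fit(X, stars)
print(clf.predict(X[:2]))  # predicted star ranks for the first two reviews
```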

Slide 17

Slide 18

Bo Pang and Lillian Lee [2]: Regression

The idea is to find the hyperplane that best fits the training data, where training points whose labels are within distance epsilon of the hyperplane incur no loss.

The preference for a label l is the negative of the distance between l and the value predicted for x by the fitted hyperplane function.

Koppel and Schler (2005) found that applying linear regression to classify documents (in a different corpus than ours) with respect to a three-point rating scale provided greater accuracy than OVA SVMs and other algorithms.

Slide 19

Bo Pang and Lillian Lee [2]: Metric labeling

Let d be a distance metric on labels, and let nn_k(x) denote the k nearest neighbors of item x according to some item-similarity function.

Then it is quite natural to pose the problem as finding a mapping of instances x to labels l_x (respecting the original labels of the training instances) that minimizes

$$\sum_{x \in \mathrm{test}} \Big[ \pi(x, l_x) + \alpha \sum_{y \in \mathrm{nn}_k(x)} f\big(d(l_x, l_y)\big) \Big]$$

where π(x, l_x) is derived from the initial classifier's preference for giving x the label l_x, f is a monotonically increasing function, and α trades off the two terms.
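A minimal sketch of this objective, assuming an item-cost matrix `pref`, neighbor lists `nn`, and the absolute-difference label metric; all names and numbers are illustrative:

```python
# Brute-force metric labeling on a tiny example: the total cost is the
# per-item cost plus alpha times the label distances between neighbors.
from itertools import product
import numpy as np

def labeling_cost(labels, pref, nn, alpha=0.5):
    item_cost = sum(pref[i, l] for i, l in enumerate(labels))
    smooth_cost = sum(abs(labels[i] - labels[j])
                      for i in range(len(labels)) for j in nn[i])
    return item_cost + alpha * smooth_cost

pref = np.array([[0.1, 0.9, 0.8],   # classifier cost of each label per item
                 [0.7, 0.2, 0.6],
                 [0.8, 0.5, 0.1]])
nn = [[1], [0, 2], [1]]             # nearest neighbors by item similarity

# Exhaustive search is feasible only for tiny instances like this one.
best = min(product(range(3), repeat=3),
           key=lambda ls: labeling_cost(ls, pref, nn))
print(best)
```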

Slide 20

Bo Pang and Lillian Lee [2]

To detect the similarity between items, a traditional measure uses an overlap-based measure such as the cosine between term-frequency-based document vectors.

Ratings can be determined by the positive-sentence percentage (PSP) of a text, i.e., the number of positive sentences divided by the number of subjective sentences.

Slide 21

Benjamin Snyder and Regina Barzilay [3]

Input: in a restaurant review, such opinions may include food, ambience and service.

Algorithm: the Good Grief algorithm jointly learns ranking models for individual aspects by modeling the dependencies between assigned ranks.

Analyzing meta-relations between opinions, such as agreement and contrast.

Models the dependencies between different labels via the agreement relation.

Slide 22

Benjamin Snyder and Regina Barzilay [3]

The m-aspect ranking model contains m+1 components ((w[1], b[1]), ..., (w[m], b[m]), a). The first m components are individual ranking models, one per aspect; the final component is the agreement model.

Predict a joint rank for the m aspects which satisfies the individual ranking models as well as the agreement model.

The decoder then predicts the m ranks which minimize the overall grief.

Slide 23

Benjamin Snyder and Regina Barzilay [3]

Slide 24

2. Sentiment analysis approaches

Objects to analyze:

Text content (adjectives, adverbs).

The accuracy of the review.

Multiple features/aspects.

Methods:

Extensions of Support Vector Machines.

Unsupervised learning.

Disadvantage: the order of words is ignored and important information is lost.

Slide 25

Outline

1. Introduction

    2. Sentiment analysis approaches

    3. Overview of deep learning for applications.

    4. Deep learning for sentiment detection.

    5. Future research direction

Slide 26

3. Overview of deep learning for applications

Deep learning is a set of algorithms in machine learning that attempt to learn in multiple levels of representation, corresponding to different levels of abstraction. It typically uses artificial neural networks. [11]

Deep learning applications:

Handwriting recognition.

Speech processing.

Slide 27

Neural network

Artificial neural networks are models inspired by animal central nervous systems (in particular the brain) that are capable of machine learning and pattern recognition. They are usually presented as systems of interconnected "neurons" that can compute values from inputs by feeding information through the network.

Main components:

Input, output.

Weights.

Activation function.

Slide 28

The simplest model: the Perceptron

Output: $y = f(\mathbf{w} \cdot \mathbf{x} + b)$, where $f$ is a threshold (step) activation.

Learning: after each training example $(\mathbf{x}, t)$, update the weights with

$$\mathbf{w} \leftarrow \mathbf{w} + \eta\,(t - y)\,\mathbf{x}$$
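A minimal sketch of the perceptron learning rule (the standard algorithm, not code from the slides), here learning the logical AND function:

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=20):
    w = np.zeros(X.shape[1] + 1)            # last entry is the bias
    Xb = np.hstack([X, np.ones((len(X), 1))])
    for _ in range(epochs):
        for x, target in zip(Xb, t):
            y = 1 if w @ x > 0 else 0       # step activation
            w += eta * (target - y) * x     # perceptron learning rule
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
w = train_perceptron(X, np.array([0, 0, 0, 1]))
print([(1 if w @ np.append(x, 1) > 0 else 0) for x in X])  # [0, 0, 0, 1]
```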

Slide 29

Activation function

A common choice is the sigmoid (logistic) function $\sigma(z) = 1/(1 + e^{-z})$.

This is similar to the behavior of the linear perceptron in neural networks (see http://en.wikipedia.org/wiki/Linear_perceptron).

However, it is a nonlinear function, which allows such networks to compute nontrivial problems using only a small number of nodes.
Slide 30

Types of Artificial Neural Network:

The feed-forward neural network was the first and arguably most simple type of artificial neural network devised. In this network the information moves in only one direction, forwards: from the input nodes, data goes through the hidden nodes (if any) and to the output nodes.

Recurrent neural networks (RNNs) are models with bi-directional data flow. While a feed-forward network propagates data linearly from input to output, RNNs also propagate data from later processing stages to earlier stages, and can be used as general sequence processors.

Slide 31

The Boltzmann machine

A Boltzmann machine is a network of units with an "energy" defined for the network. It also has binary units, but unlike Hopfield nets, Boltzmann machine units are stochastic. The global energy E in a Boltzmann machine is identical in form to that of a Hopfield network:

$$E = -\Big(\sum_{i<j} w_{ij}\, s_i\, s_j + \sum_i \theta_i\, s_i\Big)$$

Problems:

The time the machine must be run in order to collect equilibrium statistics grows exponentially with the machine's size, and with the magnitude of the connection strengths.

Connection strengths are more plastic when the units being connected have activation probabilities intermediate between zero and one, leading to a so-called variance trap. The net effect is noise which causes the connection strengths to random walk until the activities saturate.

Slide 32

Restricted Boltzmann Machines (RBM)

Boltzmann Machines (BMs) are a particular form of log-linear Markov Random Field (MRF), i.e., one for which the energy function is linear in its free parameters.

Restriction: no intra-layer connections between hidden-hidden and between visible-visible units.

The energy function E(v,h) of an RBM is defined as:

$$E(v, h) = -b^{\top} v - c^{\top} h - h^{\top} W v$$
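A small NumPy check of this energy function, with random parameters standing in for a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(size=(n_hidden, n_visible))  # hidden-visible weights
b = rng.normal(size=n_visible)              # visible biases
c = rng.normal(size=n_hidden)               # hidden biases

def rbm_energy(v, h):
    # E(v,h) = -b'v - c'h - h'Wv
    return -b @ v - c @ h - h @ W @ v

v = rng.integers(0, 2, n_visible)  # binary visible configuration
h = rng.integers(0, 2, n_hidden)   # binary hidden configuration
print(rbm_energy(v, h))
```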

Slide 33

Deep learning steps

Two main steps:

1. Pre-train one layer at a time, treating each layer in turn as an unsupervised restricted Boltzmann machine (RBM).

2. Fine-tune using supervised backpropagation.

The resulting model is called a deep belief network, and may be built from other building blocks than RBMs.

Slide 34

Deep belief network training

1. Train the first layer as an RBM that models the raw input x = h(0) as its visible layer.

2. Use that first layer to obtain a representation of the input that will be used as data for the second layer. Two common solutions exist: this representation can be chosen as the mean activations p(h(1) = 1 | h(0)) or as samples of p(h(1) | h(0)).

3. Train the second layer as an RBM, taking the transformed data (samples or mean activations) as training examples (for the visible layer of that RBM).

4. Iterate (2 and 3) for the desired number of layers, each time propagating upward either samples or mean values.

5. Fine-tune all the parameters of this deep architecture with respect to a proxy for the DBN log-likelihood, or with respect to a supervised training criterion (after adding extra learning machinery to convert the learned representation into supervised predictions, e.g. a linear classifier).
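A schematic sketch of steps 1-4 with a toy CD-1 RBM; the class and function names are illustrative, and the contrastive-divergence update is deliberately minimal:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

class RBM:
    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(0, 0.01, (n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible bias
        self.c = np.zeros(n_hidden)    # hidden bias

    def mean_hidden(self, v):          # p(h = 1 | v)
        return sigmoid(v @ self.W + self.c)

    def fit(self, v0, lr=0.1, epochs=5):
        for _ in range(epochs):
            ph0 = self.mean_hidden(v0)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)    # sample h
            v1 = sigmoid(h0 @ self.W.T + self.b)                # reconstruct
            ph1 = self.mean_hidden(v1)
            self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)  # CD-1 update
            self.b += lr * (v0 - v1).mean(axis=0)
            self.c += lr * (ph0 - ph1).mean(axis=0)

def pretrain_dbn(x, layer_sizes):
    """Greedy layer-wise pre-training: each layer's mean activations
    become the next layer's 'visible' data (steps 1-4 above)."""
    rbms, data = [], x
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden)
        rbm.fit(data)
        data = rbm.mean_hidden(data)
        rbms.append(rbm)
    return rbms  # then fine-tune the whole stack (step 5)

x = rng.integers(0, 2, (100, 20)).astype(float)
stack = pretrain_dbn(x, [16, 8])
```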

Slide 35

3.2. Deep learning application

Handwriting recognition:

The MNIST dataset consists of handwritten digit images and is divided into 60,000 examples for the training set and 10,000 examples for testing.

In Dan Claudiu Ciresan and Ueli Meier [15]:

Multi-layer perceptron (MLP).

Train 5 MLPs with 2 to 9 hidden layers and varying numbers of hidden units. Mostly but not always, the number of hidden units per layer decreases towards the output layer.

Slide 36

3.2. Deep learning application

In [15]:

Slide 37

3.2. Deep learning application

Speech recognition:

In Geoffrey Hinton [17], deep neural networks are used to make acoustic models for speech recognition.

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input.

To evaluate the fit: use a feed-forward neural network.

Input: frames of coefficients.

Output: posterior probabilities over HMM states.

Slide 38

Outline

1. Introduction

    2. Sentiment analysis approaches

    3. Overview of deep learning for applications.

    4. Deep learning for sentiment detection.

    5. Future research direction

Slide 39

4. Deep learning for sentiment analysis

General approach: use a semantic word space.

Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way.

Solution: the Sentiment Treebank, with 215,154 phrases in the parse trees of 11,855 sentences.

Recursive Neural Tensor Network: predict the compositional semantic effects present in the new corpus.

Slide 40

4. Deep learning for sentiment analysis

Example of the Recursive Neural Tensor Network accurately predicting five sentiment classes, from very negative to very positive (- -, -, 0, +, + +), at every node of a parse tree, capturing the negation in the sentence.

Slide 41

Recursive Neural Tensor Network (RNTN)

Represent a phrase through word vectors and a parse tree, and then compute vectors for higher nodes in the tree using the same tensor-based composition function.

Related research areas:

Semantic Vector Spaces.

Compositionality in Vector Spaces.

Logical Form.

Deep Learning.

Sentiment analysis.

Slide 42

Semantic Vector Spaces

The dominant approach in semantic vector spaces uses distributional similarities of single words.

Variants of this idea use more complex frequencies, such as how often a word appears in a certain syntactic context (Pado and Lapata, 2007; Erk and Pado, 2008).

To overcome this, the neural word vector approach (Bengio, 2003) has been implemented.

Slide 43

Compositionality in Vector Spaces

Compositionality algorithms and related datasets capture two-word compositions: Mitchell and Lapata (2010) [24] use two-word phrases and analyze similarities computed by vector addition, multiplication and others.

Some related models:

Holographic reduced representations (Plate, 1995 [21]).

Compositional matrix space model (Rudolph and Giesbrecht, 2010).

Slide 44

Compositionality in Vector Spaces

Compositional matrix space model:

Assigns ordinal sentiment scores to phrases.

Accounts for critical interactions among the words in each sentiment-bearing phrase.

The score of a phrase is computed from the product of the matrices of its words, where W_k denotes the matrix representing the k-th word of the phrase.

Slide 45

Compositionality in Vector Spaces

Compositional matrix space model (continued):

Slide 46

Compositionality in Vector Spaces

With the Stanford system:

Recursive neural networks (RNN).

Matrix-vector RNNs.

New algorithm: Recursive Neural Tensor Network (RNTN).

Slide 47

Recursive Neural Model

Translate the input text to vectors.

Compute parent vectors in a bottom-up fashion using different types of compositionality functions g.

Example for the trigram "not very good", with word vectors a ("not"), b ("very"), c ("good"); in the slide's figure the leaves are labeled 0, 0, + and the root is -:

p1 = g(b, c)

p2 = g(p1, a)

Slide 48

Recursive Neural Network

The parent of two children vectors b and c is computed as:

$$p_1 = f\!\left(W \begin{bmatrix} b \\ c \end{bmatrix}\right), \qquad p_2 = f\!\left(W \begin{bmatrix} a \\ p_1 \end{bmatrix}\right)$$

f: the tanh function, a standard element-wise nonlinearity.

Compute the label value of a node vector a by a softmax classifier:

$$y^{a} = \mathrm{softmax}(W_s\, a)$$
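A minimal sketch of this composition for the trigram example, with random W and Ws standing in for trained parameters (dimensions are illustrative):

```python
import numpy as np

d, n_classes = 4, 5
rng = np.random.default_rng(1)
W = rng.normal(0, 0.1, (d, 2 * d))        # composition matrix
Ws = rng.normal(0, 0.1, (n_classes, d))   # sentiment classifier

def compose(left, right):
    return np.tanh(W @ np.concatenate([left, right]))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

a, b, c = rng.normal(size=(3, d))  # vectors for "not", "very", "good"
p1 = compose(b, c)                 # "very good"
p2 = compose(p1, a)                # "not very good"
print(softmax(Ws @ p2))            # distribution over 5 sentiment classes
```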

Slide 49

Slide 50

Recursive Neural Tensor Network (RNTN)

Provide an interaction that allows the model to have greater interactions between the input vectors.

RNTN: the main idea is to use the same, tensor-based composition function for all nodes.

A single layer tensor composition:

$$p = f\!\left(\begin{bmatrix} b \\ c \end{bmatrix}^{\top} V^{[1:d]} \begin{bmatrix} b \\ c \end{bmatrix} + W \begin{bmatrix} b \\ c \end{bmatrix}\right)$$

where $V^{[1:d]}$ is the tensor defining multiple bilinear forms.
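A minimal sketch of the tensor composition, again with random parameters; each slice V[k] contributes one bilinear form to the k-th coordinate of the parent vector:

```python
import numpy as np

d = 4
rng = np.random.default_rng(2)
V = rng.normal(0, 0.1, (d, 2 * d, 2 * d))  # tensor with d slices
W = rng.normal(0, 0.1, (d, 2 * d))         # standard RNN matrix

def rntn_compose(b, c):
    bc = np.concatenate([b, c])
    bilinear = np.array([bc @ V[k] @ bc for k in range(d)])  # [b;c]'V[k][b;c]
    return np.tanh(bilinear + W @ bc)

b, c = rng.normal(size=(2, d))
print(rntn_compose(b, c))
```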

Slide 51

Recursive Neural Tensor Network (RNTN)

Slide 52

Tensor Backprop through Structure

The error as a function of the RNTN parameters θ = (V, W, Ws, L) for a sentence is:

$$E(\theta) = \sum_{i} \sum_{j} t_{j}^{i} \log y_{j}^{i} + \lambda \lVert \theta \rVert^{2}$$

The full derivative for slice V[k] for this trigram tree is then the sum at each node:

$$\frac{\partial E}{\partial V^{[k]}} = \delta^{p_2,k} \begin{bmatrix} a \\ p_1 \end{bmatrix} \begin{bmatrix} a \\ p_1 \end{bmatrix}^{\top} + \delta^{p_1,k} \begin{bmatrix} b \\ c \end{bmatrix} \begin{bmatrix} b \\ c \end{bmatrix}^{\top}$$

Slide 53

Recursive Neural Tensor Network (RNTN)

Slide 54

Stanford Sentiment analysis source code

Libraries are available in Java, C# and Python.

Extract from input text:

POS, NER: CRF tagging.

Parsed sentiment tree.

Online demo:

nlp.stanford.edu:8080/sentiment/rntnDemo.html

http://nlp.stanford.edu/sentiment/treebank.html
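For instance, the Stanza package is one Python route to Stanford-style sentiment; this is an assumption on tooling, since the slides do not name a specific library, and Stanza's sentiment processor is a reimplementation trained on the Sentiment Treebank rather than the original Java RNTN:

```python
import stanza

stanza.download("en")  # fetch English models once
nlp = stanza.Pipeline(lang="en", processors="tokenize,sentiment")

doc = nlp("Stanford University is located in California. "
          "It is a great university, founded in 1891.")
for sentence in doc.sentences:
    # sentiment: 0 = negative, 1 = neutral, 2 = positive
    print(sentence.text, "->", sentence.sentiment)
```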
Slide 55

Stanford Sentiment analysis source code

Input text: "Stanford University is located in California. It is a great university, founded in 1891."

Example token annotation (word, character offsets, POS tag, NER tag):

Stanford  0  8  NNP  ORGANIZATION

Parse tree of the second sentence (truncated in the transcript):

(ROOT (S (NP (PRP It)) (VP ... (JJ great) (NN university)) (, ,) (VP (VBN founded ... (CD 1891)))))) (. .)))

Slide 56

Stanford Sentiment analysis source code

Slide 57

Outline

1. Introduction

    2. Sentiment analysis approaches

    3. Overview of deep learning for applications.

    4. Deep learning for sentiment detection.

    5. Future research direction

Slide 58

5. Future research direction

Overview of deep learning in sentiment detection.

Other sentiment analysis research:

Sentiment Treebank.

Paragraph positive/negative detection.

Research in Vietnamese: the Vietnamese Treebank (VLSP).

Word and phrase processing.

Slide 59

THANK YOU FOR YOUR ATTENTION