CS 424P/ LINGUIST 287 Extracting Social Meaning and Sentiment



LSA.303 Introduction to Computational Linguistics

Dan Jurafsky
Lecture 6: Emotion

CS 424P / LINGUIST 287: Extracting Social Meaning and Sentiment

In the last 20 years: a huge body of research on emotion. Just one quick pointer: Ekman's basic emotions.


Ekman's 6 basic emotions: surprise, happiness, anger, fear, disgust, sadness

Ekman and his colleagues did 4 studies to add support for their hypothesis that certain basic emotions (fear, surprise, sadness, happiness, anger, disgust) are universally recognized across cultures. This supports the theory that emotions are evolutionarily adaptive and unlearned.

Study #1: Ekman, Friesen, and Tomkins showed facial expressions of emotion to observers in 5 different countries (Argentina, US, Brazil, Chile, and Japan) and asked the observers to label each expression. Participants from all five countries showed widespread agreement on the emotion each picture depicted.

Study #2: Ekman, Sorenson, and Friesen conducted a similar study with preliterate tribes of New Guinea (subjects selected a story that best described the facial expression). The tribesmen correctly labeled the emotions even though they had no prior experience with print media.

Study #3: Ekman and colleagues asked the tribesmen to show on their faces what they would look like if they experienced the different emotions. They took photos and showed them to Americans who had never seen a tribesman, asking them to label the emotion. The Americans correctly labeled the emotions of the tribesmen.

Study #4: Ekman and Friesen conducted a study in the US and Japan, asking subjects to view highly stressful stimuli while their facial reactions were secretly videotaped. Both groups of subjects showed the same types of facial expressions at the same points in time, and these expressions corresponded to the expressions considered universal in the judgment research.

(Photos of facial expressions: disgust, anger, fear, happiness, surprise, sadness)

Slide from Harinder Aujla

Dimensional approach (Russell, 1980, 2003): emotions are points in a two-dimensional space defined by valence (displeasure vs. pleasure) and arousal (low vs. high):
- High arousal, displeasure (e.g., anger); high arousal, high pleasure (e.g., excitement)
- Low arousal, displeasure (e.g., sadness); low arousal, high pleasure (e.g., relaxation)

(Circumplex figure with valence and arousal axes; image from Russell 1997. Slide from Julia Braverman)

Engagement; learning (memory, problem-solving, attention); motivation

Distinctive vs. dimensional approach to emotion:
- Distinctive: emotions are units; a limited number of basic emotions; basic emotions are innate and universal. Methodological advantage: useful in analyzing traits of personality.
- Dimensional: emotions are dimensions; a limited number of labels but an unlimited number of emotions; emotions are culturally learned. Methodological advantage: easier to obtain reliable measures.

Slide from Julia Braverman

Four Theoretical Approaches to Emotion: 1. Darwinian (natural selection)
Darwin (1872), The Expression of the Emotions in Man and Animals; Ekman, Izard, Plutchik.
Function: emotions evolved to help humans survive; the same in everyone and similar in related species; similar displays for the "Big 6+" basic emotions (happiness, sadness, fear, disgust, anger, surprise); similar understanding of emotion across cultures.

Extended from Julia Hirschberg's slides discussing Cornelius 2000
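To make the dimensional view concrete, here is a small illustrative Python sketch that stores emotion terms as points in valence-arousal space and maps an arbitrary (valence, arousal) point to the nearest term. The coordinates are invented placeholders for illustration, not Russell's published values.

```python
import math

# Illustrative (made-up) valence/arousal coordinates on a -1..1 scale;
# Russell's circumplex places terms in this 2-D space, but these exact
# numbers are placeholders, not values from Russell (1980, 2003).
EMOTION_COORDS = {
    "anger":      (-0.7,  0.8),   # displeasure, high arousal
    "excitement": ( 0.7,  0.8),   # pleasure, high arousal
    "sadness":    (-0.7, -0.6),   # displeasure, low arousal
    "relaxation": ( 0.7, -0.6),   # pleasure, low arousal
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Return the labeled term closest (Euclidean) to a (valence, arousal) point."""
    return min(
        EMOTION_COORDS,
        key=lambda e: math.dist((valence, arousal), EMOTION_COORDS[e]),
    )

print(nearest_emotion(0.5, 0.9))    # -> "excitement"
print(nearest_emotion(-0.4, -0.3))  # -> "sadness"
```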

The particulars of fear may differ, but "the brain systems involved in mediating the function are the same in different species" (LeDoux, 1996)

Four Theoretical Approaches to Emotion: 2. Jamesian: emotion is experience
William James (1884), "What is an emotion?": emotion is the perception of bodily changes. We feel sorry because we cry, afraid because we tremble; "our feeling of the changes as they occur IS the emotion." The body makes automatic responses to the environment that help us survive, and our experience of these responses constitutes emotion. Thus each emotion is accompanied by a unique pattern of bodily responses.
Stepper and Strack 1993: emotions follow facial expressions or posture.
Botox studies:
Havas, D. A., Glenberg, A. M., Gutowski, K. A., Lucarelli, M. J., & Davidson, R. J. (2010). Cosmetic use of botulinum toxin-A affects processing of emotional language. Psychological Science, 21, 895-900.
Hennenlotter, A., Dresel, C., Castrop, F., Ceballos-Baumann, A. O., Wohlschläger, A. M., & Haslinger, B. (2008). The link between facial feedback and neural activity within central circuitries of emotion: New insights from botulinum toxin-induced denervation of frown muscles. Cerebral Cortex, June 17.

Extended from Julia Hirschberg's slides discussing Cornelius 2000

Four Theoretical Approaches to Emotion: 3. Cognitive: Appraisal
An emotion is produced by appraising (extracting) particular elements of the situation (Scherer).
Fear: produced by the appraisal of an event or situation as obstructive to one's central needs and goals, requiring urgent action, being difficult to control through human agency, and lacking sufficient power or coping potential to deal with the situation.
Anger differs in that it entails a much higher evaluation of controllability and available coping potential.
Smith and Ellsworth (1985): guilt comes from appraising a situation as unpleasant, as being one's own responsibility, but as requiring little effort.

Adapted from Cornelius 2000

Speisman et al. 1964: all participants watched a subincision (circumcision) video, but with different soundtracks:
- No sound
- Trauma narrative: emphasized pain
- Denial narrative: emphasized the joyful ceremony
- Scientific narrative: detached tone
Measured heart rate and self-reported stress. Who was most stressed?
Most stressed: trauma narrative. Next most stressed: no sound. Least stressed: denial narrative and scientific narrative.

Four Theoretical Approaches to Emotion: 4. Social Constructivism
Emotions are cultural products (Averill). This explains gender and social group differences. Anger is elicited by the appraisal that one has been wronged intentionally and unjustifiably by another person; it is based on a moral judgment. I don't get angry if you yank my arm accidentally, or if you are a doctor and do it to reset a bone, but only if you do it on purpose.
Adapted from Cornelius 2000

Scherer's typology of affective states
- Emotion: a relatively brief episode of synchronized response of all or most organismic subsystems to the evaluation of an external or internal event as being of major significance (angry, sad, joyful, fearful, ashamed, proud, desperate)
- Mood: a diffuse affect state, most pronounced as a change in subjective feeling, of low intensity but relatively long duration, often without apparent cause (cheerful, gloomy, irritable, listless, depressed, buoyant)
- Interpersonal stance: an affective stance taken toward another person in a specific interaction, coloring the interpersonal exchange in that situation (distant, cold, warm, supportive, contemptuous)
- Attitudes: relatively enduring, affectively colored beliefs, preferences, and predispositions towards objects or persons (liking, loving, hating, valuing, desiring)
- Personality traits: emotionally laden, stable personality dispositions and behavior tendencies, typical for a person (nervous, anxious, reckless, morose, hostile, envious, jealous)

Scherer's typology

Why Emotion Detection from Speech or Text?
- Detecting frustration of callers to a help line
- Detecting stress in drivers or pilots
- Detecting interest, certainty, confusion in on-line tutors (pacing/positive feedback)
- Lie detection
- Hot spots in meeting browsers
- Synthesis/generation: on-line literacy tutors in the children's storybook domain; computer games

Hard Questions in Emotion Recognition
- How do we know what emotional speech is? Acted speech vs. natural (hand-labeled) corpora
- What can we classify? Distinguish among multiple classic emotions, or distinguish valence (is it positive or negative?) and activation (how strongly is it felt? e.g., sad vs. despair)
- What features best predict emotions?
- What techniques are best to use in classification?
Slide from Julia Hirschberg

Major Problems for Classification: Different Valence / Different Activation

Slide from Julia Hirschberg

It is useful to note where direct-modeling features fall short: mean F0 successfully differentiates between emotions with different valence, so long as they also have different degrees of activation.

But: Different Valence / Same Activation

Slide from Julia Hirschberg

When emotions of different valence, such as happy and angry, have the same activation, simple pitch features are less successful.

Accuracy of facial versus vocal cues to emotion (Scherer 2001)

Background: The Brunswikian Lens Model is used in several fields to study how observers correctly and incorrectly use objective cues to perceive physical or social reality.
(Diagram: physical or social environment -> cues -> observer (organism))
- Cues have a probabilistic (uncertain) relation to the actual objects
- A (same) cue can signal several objects in the environment
- Cues are (often) redundant

Slide from Tanja Baenziger

Scherer, K. R. (1978). Personality inference from voice quality: The loud voice of extroversion. European Journal of Social Psychology, 8, 467-487.

Slide from Tanja Baenziger

Emotional communication (Brunswikian lens): expressed emotion (encoder) -> cues -> emotional attribution (decoder).
Example: expressed anger? -> vocal cues, facial cues, gestures, other cues (loud voice, high pitch, frown, clenched fists, shaking) -> perception of anger?

Important issues:
- To be measured, cues must be identified a priori
- Inconsistencies on both sides (individual differences, broad categories)
- Cue utilization could be different on the left and the right side (e.g., important cues not used)
Slide from Tanja Baenziger

Implications for HMI (if matching is low): the cues relate both to the expressed emotion and to the perceived emotion, and the two relations need not match.
- Recognition: automatic recognition system developers should focus on the relation of the cues to the expressed emotion (important for automatic recognition)
- Generation: conversational agent developers should focus on the relation of the cues to the perceived emotion (important for ECAs)
Slide from Tanja Baenziger

Extroversion in the Brunswikian Lens
Simulated jury discussions in German and English; speakers had detailed personality tests. The extroversion personality type was accurately identified by naive listeners from voice alone, but emotional stability was not: listeners chose resonant, warm, low-pitched voices, but these don't correlate with actual emotional stability.
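One way to read the lens model quantitatively is to correlate each measured cue separately with the encoder's expressed state and with the decoder's attributed state, then compare the two cue-utilization profiles. The sketch below (Python/NumPy) does this on toy data; the cue names and numbers are invented for illustration, not taken from Scherer's or Bänziger's studies.

```python
import numpy as np

# Toy data: 6 utterances, each with measured cues, the emotion intensity the
# speaker reported expressing, and the intensity listeners attributed.
# All numbers are invented for illustration.
cues = {
    "loudness": np.array([0.9, 0.2, 0.7, 0.4, 0.8, 0.1]),
    "mean_f0":  np.array([0.8, 0.3, 0.6, 0.5, 0.9, 0.2]),
    "jitter":   np.array([0.1, 0.6, 0.2, 0.5, 0.3, 0.4]),
}
expressed = np.array([0.9, 0.1, 0.8, 0.3, 0.9, 0.2])   # encoder side
perceived = np.array([0.7, 0.2, 0.9, 0.4, 0.8, 0.1])   # decoder side

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

# Cue utilization on each side of the lens: how strongly each cue relates
# to the expressed vs. the perceived emotion.
for name, values in cues.items():
    print(f"{name:9s} expressed r={corr(values, expressed):+.2f} "
          f"perceived r={corr(values, perceived):+.2f}")

# "Matching" (accuracy of the communication): expressed vs. perceived.
print(f"expressed vs. perceived r={corr(expressed, perceived):+.2f}")
```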

Data and tasks for Emotion Detection
- Scripted speech: acted emotions, often using 6 emotions; controls for words, focus on acoustic/prosodic differences. Features: F0/pitch, energy, speaking rate
- Spontaneous speech: more natural, harder to control
- Dialogue. Kinds of emotion focused on: frustration, annoyance, certainty/uncertainty, activation/hot spots

Four quick case studies
- Acted speech: LDC's EPSaT
- Annoyance and frustration in natural speech: Ang et al. on annoyance and frustration
- Natural speech: AT&T's How May I Help You?
- Uncertainty in natural speech: Liscombe et al.'s ITSPOKE

Example 1: Acted speech; Emotional Prosody Speech and Transcripts corpus (EPSaT)
Recordings from LDC: http://www.ldc.upenn.edu/Catalog/LDC2002S28.html
8 actors read short dates and numbers in 15 emotional styles

Slides from Jackson Liscombe

EPSaT examples: happy, sad, angry, confident, frustrated, friendly, interested, anxious, bored, encouraging

Detecting EPSaT Emotions (Liscombe et al. 2003)
Ratings collected by Julia Hirschberg and Jennifer Venditti at Columbia University

Liscombe et al. features: automatic acoustic-prosodic [Davitz, 1964] [Huttar, 1968]
Global characterization: pitch, loudness, speaking rate

Global Pitch Statistics (charts; slides from Jackson Liscombe)
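As a rough illustration of such global pitch statistics, the sketch below extracts per-utterance F0 summary features with librosa's pYIN tracker plus an RMS loudness proxy. This is only a hedged approximation of the kind of features Liscombe et al. used, not their actual pipeline; the file name and F0 search range are assumptions.

```python
import numpy as np
import librosa

def global_pitch_stats(wav_path: str) -> dict:
    """Per-utterance global F0 and loudness statistics (rough sketch)."""
    y, sr = librosa.load(wav_path, sr=None)

    # Frame-level F0 with pYIN; the 50-500 Hz search range is an assumption.
    f0, voiced_flag, _ = librosa.pyin(y, fmin=50.0, fmax=500.0, sr=sr)
    voiced_f0 = f0[voiced_flag & ~np.isnan(f0)]

    # RMS energy as a crude loudness stand-in.
    rms = librosa.feature.rms(y=y)[0]

    return {
        "f0_mean": float(np.mean(voiced_f0)),
        "f0_min": float(np.min(voiced_f0)),
        "f0_max": float(np.max(voiced_f0)),
        "f0_range": float(np.max(voiced_f0) - np.min(voiced_f0)),
        "f0_std": float(np.std(voiced_f0)),
        "rms_mean": float(np.mean(rms)),
        "duration_s": float(len(y) / sr),
    }

# Example (hypothetical file): print(global_pitch_stats("utterance.wav"))
```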

Liscombe et al. features: automatic acoustic-prosodic [Davitz, 1964] [Huttar, 1968]; ToBI contours [Mozziconacci & Hermes, 1999]; spectral tilt [Banse & Scherer, 1996] [Ang et al., 2002]

Slide from Jackson Liscombe

Liscombe et al. Experiment
- RIPPER, 90/10 split
- Binary classification for each emotion
- Results: 62% average baseline, 75% average accuracy
- Acoustic-prosodic features for activation; /H-L%/ for negative, /L-L%/ for positive
- Spectral tilt for valence?
Slide from Jackson Liscombe

Example 2: Ang et al. 2002
Ang, Shriberg, Stolcke (2002), "Prosody-based automatic detection of annoyance and frustration in human-computer dialog"
- DARPA Communicator Project travel planning data: NIST June 2000 collection (392 dialogs, 7,515 utterances); CMU 1/2001-8/2001 data (205 dialogs, 5,619 utterances); CU 11/1999-6/2001 data (240 dialogs, 8,765 utterances)
- Considers contributions of prosody, language model, and speaking style
- Questions: How frequent are annoyance and frustration in Communicator dialogs? How reliably can humans label them? How well can machines detect them? What prosodic or other features are useful?
Slide from Shriberg, Ang, Stolcke

Data Annotation
- 5 undergrads with different backgrounds (emotion should be judged by the average Joe); labeling jointly funded by SRI and ICSI
- Each dialog labeled by 2+ people independently in a 1st pass (July-Sept 2001), after calibration
- 2nd consensus pass for all disagreements, by two of the same labelers (Oct-Nov 2001)
- Used the customized Rochester Dialog Annotation Tool (DAT), which produces SGML output
Slide from Shriberg, Ang, Stolcke

Data Labeling
- Emotion: neutral, annoyed, frustrated, tired/disappointed, amused/surprised, no-speech/NA
- Speaking style: hyperarticulation, perceived pausing between words or syllables, raised voice
- Repeats and corrections: repeat/rephrase, repeat/rephrase with correction, correction only
- Miscellaneous useful events: self-talk, noise, non-native speaker, speaker switches, etc.
Slide from Shriberg, Ang, Stolcke

Emotion Samples (audio examples)
- Neutral: "July 30", "Yes"
- Disappointed/tired: "No"
- Amused/surprised: "No"
- Annoyed: "Yes", "Late morning" (HYP)
- Frustrated: "Yes", "No", "No, I am" (HYP), "There is no Manila..."
Slide from Shriberg, Ang, Stolcke

Emotion Class Distribution

Slide from Shriberg, Ang, Stolcke

To get enough data, we grouped annoyed and frustrated versus everything else (with speech).

Prosodic Model
- Used CART-style decision trees as classifiers
- Downsampled to equal class priors (due to the low rate of frustration, and to normalize across sites)
- Automatically extracted prosodic features based on recognizer word alignments
- Used automatic feature-subset selection to avoid the problem of the greedy tree algorithm
- Used 3/4 of the data for train, 1/4 for test, with no call overlap
Slide from Shriberg, Ang, Stolcke

Prosodic Features
- Duration and speaking rate features: duration of phones, vowels, syllables; normalized by phone/vowel means in training data; normalized by speaker (all utterances, or first 5 only); speaking rate (vowels/time)
- Pause features: duration and count of utterance-internal pauses at various threshold durations; ratio of speech frames to total utterance-internal frames
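A hedged sketch of the pause features just described, computed from a recognizer-style word alignment (a list of word start/end times in seconds). The threshold values and the field layout are assumptions for illustration, not the exact feature definitions used by Ang et al.

```python
def pause_features(word_alignment, thresholds=(0.05, 0.1, 0.25, 0.5)):
    """word_alignment: list of (word, start_s, end_s), sorted by time.

    Returns counts/durations of utterance-internal pauses at several
    duration thresholds, plus the ratio of speech frames to total frames.
    (Threshold values are illustrative assumptions.)
    """
    gaps = []
    for (_, _, prev_end), (_, next_start, _) in zip(word_alignment, word_alignment[1:]):
        gap = next_start - prev_end
        if gap > 0:
            gaps.append(gap)

    utt_start = word_alignment[0][1]
    utt_end = word_alignment[-1][2]
    total = utt_end - utt_start
    speech = sum(end - start for _, start, end in word_alignment)

    feats = {"speech_ratio": speech / total if total > 0 else 0.0}
    for t in thresholds:
        long_gaps = [g for g in gaps if g >= t]
        feats[f"pause_count_{t}"] = len(long_gaps)
        feats[f"pause_dur_{t}"] = sum(long_gaps)
    return feats

# Hypothetical alignment: (word, start, end) in seconds.
print(pause_features([("i", 0.00, 0.12), ("want", 0.30, 0.55), ("boston", 0.60, 1.10)]))
```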

Slide from Shriberg, Ang, Stolcke

Prosodic Features (cont.)
- Pitch features: F0-fitting approach developed at SRI (Sönmez); an LTM model of F0 estimates the speaker's F0 range
- Many features to capture pitch range, contour shape and size, slopes, and locations of interest; normalized using LTM parameters by speaker, using all utterances in a call, or only the first 5 utterances
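A minimal sketch of speaker-based normalization in the spirit described above: raw F0 features are z-normalized using statistics pooled over either all of a speaker's utterances in the call or only the first five. Plain NumPy with invented values; it does not reimplement the SRI LTM fitting.

```python
import numpy as np

def speaker_norm(utterance_f0_means, first_n=None):
    """Z-normalize per-utterance F0 means by speaker-level statistics.

    utterance_f0_means: per-utterance mean F0 values for one speaker, in
    dialog order. If first_n is given, the normalization statistics are
    estimated from only the first first_n utterances (e.g., 5), mimicking
    the "first 5 utts" variant; otherwise all utterances in the call are used.
    """
    x = np.asarray(utterance_f0_means, dtype=float)
    ref = x[:first_n] if first_n else x
    mu, sigma = ref.mean(), ref.std()
    return (x - mu) / (sigma if sigma > 0 else 1.0)

f0_means = [182.0, 175.0, 190.0, 230.0, 178.0, 260.0]   # invented values (Hz)
print(speaker_norm(f0_means))              # normalize using the whole call
print(speaker_norm(f0_means, first_n=5))   # normalize using only the first 5 utts
```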

Slide from Shriberg, Ang, Stolcke
(Figure: log F0 over time with LTM fitting)

Features (cont.)
- Spectral tilt features: average of the 1st cepstral coefficient; average slope of a linear fit to the magnitude spectrum; difference in log energies between high and low bands; extracted from the longest normalized vowel region
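A rough sketch of two of the spectral-tilt measures named above: the slope of a linear fit to the log-magnitude spectrum, and the difference in log energy between a low and a high band. The band edges and frame handling are assumptions; Ang et al. additionally extracted these from the longest normalized vowel, which is not shown here.

```python
import numpy as np

def spectral_tilt(frame, sr, low_band=(0, 1000), high_band=(1000, 4000)):
    """Spectral tilt measures for one audio frame (1-D numpy array).

    Returns the slope of a linear fit to the log-magnitude spectrum and the
    difference in log energy between a low and a high band. The band edges
    (in Hz) are illustrative assumptions.
    """
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    log_mag = np.log(spec + 1e-10)

    # Average slope of a linear fit to the log-magnitude spectrum.
    slope = np.polyfit(freqs, log_mag, deg=1)[0]

    def band_energy(lo, hi):
        mask = (freqs >= lo) & (freqs < hi)
        return np.sum(spec[mask] ** 2) + 1e-10

    tilt_db = 10.0 * np.log10(band_energy(*low_band) / band_energy(*high_band))
    return {"logmag_slope": float(slope), "low_minus_high_db": float(tilt_db)}

# Example on a synthetic 25 ms frame at 16 kHz:
sr = 16000
t = np.arange(int(0.025 * sr)) / sr
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 2500 * t)
print(spectral_tilt(frame, sr))
```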

Other (nonprosodic) features
- Position of the utterance in the dialog
- Whether the utterance is a repeat or correction
- To check correlations: hand-coded style features, including hyperarticulation
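Below is a hedged scikit-learn sketch of the modeling setup described earlier: downsample to equal class priors, split roughly 3/4 train and 1/4 test, and fit a CART-style decision tree. The feature matrix, random seed, and the use of scikit-learn (rather than the original CART software) are assumptions; the no-call-overlap constraint and feature-subset selection are omitted for brevity.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# X: one row of prosodic features per utterance; y: 1 = annoyed/frustrated, 0 = else.
# Random data here stands in for real features (invented for illustration).
X = rng.normal(size=(2000, 12))
y = (rng.random(2000) < 0.1).astype(int)   # frustration is rare, as in the corpus

# Downsample the majority class to equal class priors.
pos = np.flatnonzero(y == 1)
neg = rng.choice(np.flatnonzero(y == 0), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])
X_bal, y_bal = X[idx], y[idx]

# Roughly 3/4 train, 1/4 test (the original also enforced no call overlap).
X_tr, X_te, y_tr, y_te = train_test_split(X_bal, y_bal, test_size=0.25, random_state=0)

tree = DecisionTreeClassifier(max_depth=5, random_state=0)   # CART-style classifier
tree.fit(X_tr, y_tr)
print("accuracy on balanced test set:", tree.score(X_te, y_te))
```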

Slide from Shriberg, Ang, Stolcke

Language Model Features
- Train a 3-gram LM on the data from each class
- The LM used word classes (AIRLINE, CITY, etc.) from the SRI Communicator recognizer
- Given a test utterance, choose the class that has the highest LM likelihood (assumes equal priors)
- In the prosodic decision tree, use the sign of the likelihood difference as an input feature
- Finer-grained LM scores cause overtraining
Slide from Shriberg, Ang, Stolcke

Results: Human and Machine (chart comparing human and machine accuracy with the baseline; slide from Shriberg, Ang, Stolcke)
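A minimal hand-rolled sketch of the Language Model Features idea above: train an add-one-smoothed trigram model per emotion class and use the sign of the log-likelihood difference as a feature. The tiny training texts and the add-one smoothing are illustrative assumptions (the original used class-based trigrams from the SRI recognizer).

```python
import math
from collections import Counter

def ngrams(tokens, n=3):
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

class TrigramLM:
    """Add-one-smoothed trigram LM: a toy stand-in for the per-class LMs."""

    def __init__(self, sentences):
        self.tri, self.bi, self.vocab = Counter(), Counter(), set()
        for sent in sentences:
            for g in ngrams(sent):
                self.tri[g] += 1
                self.bi[g[:2]] += 1
                self.vocab.update(g)

    def loglik(self, tokens):
        v = len(self.vocab) + 1
        return sum(math.log((self.tri[g] + 1) / (self.bi[g[:2]] + v))
                   for g in ngrams(tokens))

# Toy per-class training data (invented); CITY stands in for a recognizer word class.
frustrated = [["no", "i", "said", "CITY"], ["that", "is", "wrong"], ["no", "no", "CITY"]]
neutral = [["i", "want", "to", "fly", "to", "CITY"], ["yes", "please"], ["CITY", "please"]]
lm_frus, lm_neut = TrigramLM(frustrated), TrigramLM(neutral)

utt = ["no", "that", "is", "wrong"]
diff = lm_frus.loglik(utt) - lm_neut.loglik(utt)
print("LM feature (sign of log-likelihood difference):", 1 if diff > 0 else -1)
```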

Results (cont.)
- Human-human labels agree 72%; it is a complex decision task: inherent continuum, speaker differences, relative vs. absolute judgements
- Human labels agree 84% with the consensus (biased)
- The tree model agrees 76% with the consensus, better than the original labelers with each other
- The prosodic model makes use of a dialog state feature, but without it, it's still better than human-human agreement
- Language model features alone are not good predictors (the dialog feature alone is better)
Slide from Shriberg, Ang, Stolcke

Predictors of Annoyed/Frustrated
- Prosodic pitch features: high maximum fitted F0 in the longest normalized vowel; high speaker-normalized (1st 5 utts) ratio of F0 rises to falls; maximum F0 close to the speaker's estimated F0 topline; minimum fitted F0 late in the utterance (no question intonation)
- Prosodic duration and speaking rate features: long maximum phone-normalized phone duration; long maximum phone- and speaker-normalized (1st 5 utts) vowel; low syllable rate (slower speech)
- Other: the utterance is a repeat, rephrase, or explicit correction; the utterance is after the 5th-7th in the dialog

Slide from Shriberg, Ang, Stolcke

Effect of Class Definition

Slide from Shriberg, Ang, Stolcke

For less ambiguous tokens, or more extreme tokens, performance is significantly better than baseline.

Ang et al. '02 Conclusions
- Emotion labeling is a complex decision task
- Cases that labelers independently agree on are classified with high accuracy
- Extreme emotion (e.g., frustration) is classified even more accurately
- Classifiers rely heavily on prosodic features, particularly duration and stylized pitch
- Speaker normalizations help
- Two nonprosodic features are important: utterance position and repeat/correction

- The language model is an imperfect surrogate feature for the underlying important feature, repeat/correction

Slide from Shriberg, Ang, Stolcke

Example 3: How May I Help You (HMIHY)
Giuseppe Riccardi, Dilek Hakkani-Tür, AT&T Labs; Liscombe, Riccardi, Hakkani-Tür (2004)
- Each turn in 20,000 turns (5,690 dialogues) annotated for 7 emotions by one person: positive/neutral, somewhat frustrated, very frustrated, somewhat angry, very angry, somewhat other negative, very other negative
- The distribution was so skewed (73.1% labeled positive/neutral) that the classes were collapsed to negative/nonnegative
- The task is hard! On a subset of 627 turns labeled by 2 people, kappa was .32 (full label set) and .42 (reduced set)!
Slide from Jackson Liscombe

User Emotion Distribution (chart)
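The kappa figures above are chance-corrected agreement. As a quick reference, here is how Cohen's kappa between two labelers can be computed with scikit-learn; the example labels are invented, not the HMIHY annotations.

```python
from sklearn.metrics import cohen_kappa_score

# Invented example: two labelers' judgments on ten turns (reduced label set).
labeler_1 = ["pos", "pos", "neg", "pos", "neg", "pos", "pos", "neg", "pos", "pos"]
labeler_2 = ["pos", "neg", "neg", "pos", "pos", "pos", "pos", "neg", "pos", "neg"]

# Cohen's kappa corrects raw agreement for the agreement expected by chance.
print(cohen_kappa_score(labeler_1, labeler_2))
```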

Slide from Jackson Liscombe

Emotion Labels
- Original set: Positive/Neutral, Somewhat Frustrated, Very Frustrated, Somewhat Angry, Very Angry, Other Somewhat Negative, Very Negative
- Reduced set: Positive, Negative

Slide from Jackson Liscombe

Example dialogue (call 20020221/0221080552atf1536):
- Agent: I'm sorry, I didn't understand that. How may I help you? / User: "please an operator please [ .hst ] may I speak-" [Positive/Neutral]
- Agent: I'm sorry we're having trouble. Can we try one more time? Just briefly tell me how I may help you. / User: "may I speak to an assistant please" [Somewhat Frustrated]
- Agent: Are you calling from your home phone? / User: "yes" [Somewhat Angry]

Pitch/Energy/Speaking Rate: Change

Pitch/Energy/Speaking Rate: Change (call 20020221/0221080552atf1536; slide from Jackson Liscombe)

Slide from Jackson Liscombe

Example dialogue (call 20020317/0317220739atf9051):
- Agent: Hello, I am an AT&T automated assistant. You can speak naturally to me. How may I help you? / User: "I need to find out about a number that I don't recognize" [Positive/Neutral]
- Agent: Would you like to look up a number you don't recognize on your bill? / User: "yes I would" [Positive/Neutral]
- Agent: Are you calling from your home phone? / User: "yes I am" [Positive/Neutral]

Pitch/Energy/Speaking Rate: No Change

(call 20020317/0317220739atf9051; slide from Jackson Liscombe)

HMIHY Features
- Automatic acoustic-prosodic
- Contextual [Cauldwell, 2000]
- Transcriptive [Schröder, 2003] [Brennan, 1995]
- Pragmatic [Ang et al., 2002] [Lee & Narayanan, 2005]
Slide from Jackson Liscombe

Lexical Features
- Language model (n-grams)
- Examples of words significantly correlated with negative user state (p