Identifying Emotion Holder and Topic from Bengali Emotional Sentences. Dipankar Das and Sivaji Bandyopadhyay, Department of Computer Science & Engineering, Jadavpur University, Kolkata-700032, India. ICON 2010, 9th December, 2010.

  • Identifying Emotion Holder and Topic from Bengali Emotional Sentences

    Dipankar Das and Sivaji Bandyopadhyay, Department of Computer Science & Engineering, Jadavpur University, Kolkata-700032, India

    ICON 2010, 9th December, 2010

  • Outline

    - Introduction
    - Resource
    - Baseline System
    - Syntactic System
    - Error Analysis
    - Results
    - Conclusion

  • Introduction (1/7)

    Opinion mining and sentiment analysis have so far been attempted from more focused perspectives rather than at the level of fine-grained emotions ... (Quan and Ren, 2009)

  • Introduction (2/7)

    Emotion is an aspect of a person's mental state of being, normally based in or tied to the person's internal (physical) and external (social) sensory feeling (Zhang et al., 2008)

    Emotions, of course, are not linguistic things. However, the most convenient access that we have to them is through language (Strapparava and Valitutti, 2004)

  • Introduction (3/7)

    Natural language text contains emotions of a reader or writer with respect to some subject, event or topic

    (Rashed) (apnar) (kabitata) (pore) (khub) (sontushto) (hoyechilo): "Rashed became very pleased by reading your poem"

  • Introduction (4/7)

    Natural language text contains emotions of a reader or writer with respect to some subject, event or topic

    (Rashed) (apnar) (kabitata) (pore) (khub) (sontushto) (hoyechilo): "Rashed became very pleased by reading your poem"

    Emotional Expression
    - Subjective evaluative expression (word / phrase) (Wiebe et al., 2005)
    - Ekman's (1993) six universal basic emotions (happiness, sadness, anger, disgust, fear and surprise)

  • Introduction (5/7)

    Natural language text contains emotions of a reader or writer with respect to some subject, event or topic

    (Rashed) (apnar) (kabitata) (pore) (khub) (sontushto) (hoyechilo): "Rashed became very pleased by reading your poem"

    Emotion Holder
    - Person / organization that expresses the emotion (Wiebe et al., 2005)
    - In the case of blogs / reviews, the writer / author of the post
    - Nested source (Wiebe et al., 2005)

  • Introduction (6/7)

    Natural language text contains emotions of a reader or writer with respect to some subject, event or topic

    (Rashed) (apnar) (kabitata) (pore) (khub) (sontushto) (hoyechilo): "Rashed became very pleased by reading your poem"

    Emotion Topic
    - Primary subject of the emotion as intended by its holder (Stoyanov and Cardie, 2008)
    - Real world object / event / abstract entity

  • Introduction (7/7)

    Analyses require some basic resources

    An emotion-annotated corpus is one of the primary ones to start with

    Non-native English speakers support the growing use of the Internet (http://www.internetworldstats.com/stats.htm), which raises the demand for linguistic resources in languages other than English

    Bengali is the sixth most popular language in the world (http://www.ethnologue.com/ethno_docs/distribution.asp?by=size), the second in India and the national language of Bangladesh

    Bengali is a less computerized and resource-constrained language

    Manual preparation of an emotion-annotated corpus in Bengali

  • Resource: Emotion Corpus

    Random collection of 123 blog posts from a Bengali web blog archive (www.amarblog.com)

    Total of 12,149 sentences (comics, politics, sports and short stories)

    Three annotators

    Open source graphical tool (http://gate.ac.uk/gate/doc/releases.html)

    Items for annotation
    - Emotional Expression, Emotion Holder, Emotion Topic
    - Sentential emotion from Ekman's (1993) six classes (happiness, sadness, anger, disgust, fear and surprise)
    - Sentential intensity (low, general and high)

  • Resource: Snapshot during Annotation (1)

  • Resource: Snapshot after Annotation (2)

  • Resource: Agreement

    Emotion Holder
    - Cohen's kappa (κ) (Cohen, 1960) [0.75 ~ 0.81 w.r.t. all emotion classes]
    - Inter-Annotator Agreement (IAA) [0.73 ~ 0.82 w.r.t. all emotion classes]: IAA = |X ∩ Y| / |X ∪ Y|, where X and Y are the two sets of emotion holders selected by two annotators
    - Moderate to high for a single emotion holder
    - Lower for multiple emotion holders

    Emotion Topic
    - Measure of Agreement on Set-valued Items (MASI) (Passonneau, 2006) [0.75 ~ 0.82 w.r.t. all emotion classes]
    - agr metric (Wiebe et al., 2005) [0.73 ~ 0.83 w.r.t. all emotion classes]
    - High in sentences containing a single emotion topic
    - Lower when selecting boundaries / textual spans
    - Lower when selecting the emotion topic from other relevant topics
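The agreement measures named on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' evaluation script: the function names and sample annotations are hypothetical, the IAA is read as the set-overlap ratio given on the slide, and MASI is written in its similarity form (Jaccard overlap scaled by a monotonicity weight of 1, 2/3, 1/3 or 0).

```python
from collections import Counter


def jaccard_iaa(x: set, y: set) -> float:
    """IAA = |X ∩ Y| / |X ∪ Y| for the holder sets chosen by two annotators."""
    if not x and not y:
        return 1.0  # assumption: two empty selections count as full agreement
    return len(x & y) / len(x | y)


def masi(x: set, y: set) -> float:
    """MASI similarity: Jaccard overlap scaled by a monotonicity weight."""
    if x == y:
        weight = 1.0
    elif x <= y or y <= x:  # one set is a subset of the other
        weight = 2 / 3
    elif x & y:  # overlapping, but neither contains the other
        weight = 1 / 3
    else:  # disjoint sets
        weight = 0.0
    return jaccard_iaa(x, y) * weight


def cohen_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations for one sentence and for four tokens.
print(jaccard_iaa({"writer", "Rashed"}, {"Rashed"}))
print(masi({"writer", "Rashed"}, {"Rashed"}))
print(cohen_kappa(["holder", "holder", "other", "other"],
                  ["holder", "other", "other", "other"]))
```

Set-based measures suit holders and topics because an annotator may select several items per sentence, which a single categorical label cannot express.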

  • More Details

    D. Das and S. Bandyopadhyay. 2010. Labeling Emotion in Bengali Blog Corpus - A Fine Grained Tagging at Sentence Level. In the 8th Workshop on Asian Language Resources (ALR8), 23rd International Conference on Computational Linguistics (COLING 2010), pp. 47-55, August 21-22, Beijing, China

  • Baseline System

    Consider only 1,100 sentences of the emotion corpus (600 development and 500 test sentences)

    Pass the emotional blog sentences through an open source Bengali shallow parser (http://ltrc.iiit.ac.in/showfile.php?filename=downloads/shallow_parser.php)

    Group Lexical Similarity Patterns based on part-of-speech (POS)

    Consider the Subject-Verb based Lexical Similarity Pattern

  • Baseline System: holder

    Subject (Start)
    - Starting words tagged as Named Entities (NEs) (e.g. NNP, NNPC, NNC, NN, PRP)

    Verb (Stop)
    - Reached at a verb POS tag (e.g. VBZ, VM)

    Bengali is a free word order language
    - Append words (if any) between Subject (Start) and Verb (Stop) into Common_Portion
    - Append words (if any) after the Verb (Stop) into Common_Portion

    Consider POS level Lexical Similarity Pattern - [ ]
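A minimal sketch of the Subject (Start) / Verb (Stop) / Common_Portion split described on this slide is given below. It is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, the input is assumed to be a list of (word, POS) pairs from the shallow parser, and the subject is assumed to be at most one head word (NNP, NN or PRP) preceded by any compound parts (NNPC, NNC).

```python
COMPOUND_TAGS = {"NNPC", "NNC"}    # parts of a compound that lead into a head word
HEAD_TAGS = {"NNP", "NN", "PRP"}   # tags that may close the subject
VERB_TAGS = {"VBZ", "VM"}


def split_pattern(tagged):
    """Split (word, POS) pairs into Subject, Verb and Common_Portion."""
    i = 0
    # Subject (Start): optional compound parts followed by one head word.
    while i < len(tagged) and tagged[i][1] in COMPOUND_TAGS:
        i += 1
    if i < len(tagged) and tagged[i][1] in HEAD_TAGS:
        i += 1
    else:
        i = 0  # no subject found at the start of the sentence
    subject = [word for word, _ in tagged[:i]]

    verb, common = None, []
    for word, pos in tagged[i:]:
        if verb is None and pos in VERB_TAGS:
            verb = word  # Verb (Stop): the first verb reached
        else:
            common.append(word)  # words between Subject and Verb, or after the Verb
    return subject, verb, common


# Transliterated example with tags following the deck's [NNP NNP-ke VM] structure.
sentence = [("Ram", "NNP"), ("Sitake", "NNP"), ("bhalobase", "VM")]
print(split_pattern(sentence))  # (['Ram'], 'bhalobase', ['Sitake'])
```

Because Bengali allows free word order, everything that is neither the subject nor the stopping verb is pooled into Common_Portion instead of being assigned a fixed position.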

  • Baseline System: holder

    Preliminary investigation on 600 development sentences

    Hints in bloggers' comments
    - Each comment starts with a username → Subject → emotion holder
    - No emotion holder → writer (default)

    Identified holders are not linked to their corresponding emotional expressions

    Assign a chunk as an emotional expression if the chunk contains at least one emotion word (e.g. koutuk "comic") that is found in the Bengali WordNet Affect lists (Das and Bandyopadhyay, 2010b) ((JJP JJ ))

    Words present in the immediate neighboring chunk of an emotional expression and containing a Subject → candidate emotion holder
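The two rules on this slide (a chunk is an emotional expression if it contains an affect word; subject-like words in the neighboring chunks become candidate holders, with the writer as default) could look roughly like the following. The three-word lexicon is a hypothetical stand-in for the Bengali WordNet Affect lists, and the chunk labels and function names are assumptions made for the example.

```python
# Hypothetical stand-in for the Bengali WordNet Affect lists (transliterated).
AFFECT_WORDS = {"koutuk", "sontushto", "sukh"}
SUBJECT_TAGS = {"NNP", "NNPC", "NNC", "NN", "PRP"}


def emotional_chunks(chunks):
    """Indices of chunks that contain at least one affect word."""
    return [
        i for i, (_, tokens) in enumerate(chunks)
        if any(word in AFFECT_WORDS for word, _ in tokens)
    ]


def candidate_holders(chunks):
    """Subject-like words in chunks adjacent to an emotional expression."""
    holders = []
    for i in emotional_chunks(chunks):
        for j in (i - 1, i + 1):  # immediate neighbors only
            if 0 <= j < len(chunks):
                for word, pos in chunks[j][1]:
                    if pos in SUBJECT_TAGS and word not in holders:
                        holders.append(word)
    return holders or ["writer"]  # default holder when none is found


# Each chunk is (chunk_label, [(word, POS), ...]); labels and tags are illustrative.
chunks = [
    ("NP", [("Rashed", "NNP")]),
    ("JJP", [("sontushto", "JJ")]),
    ("VGF", [("hoyechilo", "VM")]),
]
print(candidate_holders(chunks))  # ['Rashed']
```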

  • Baseline System: topic

    Remove the emotional expression and the emotion holder from the sentence

    The remaining textual span contains one or more potential emotion topics
    - Words with POS tags of NNP, NNC or NN in Common_Portion
    - Words in the immediate neighboring chunks of the Verb or of emotional expressions

    Emotion holders and topic spans are identified at chunk level

    Entities are full word strings / phrases (no question of partial match or head match (Lu, 2010))
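As a rough illustration of the first topic rule, the sketch below drops the holder and emotional-expression words and keeps the remaining noun-tagged words as topic candidates. It covers only the POS-based rule, not the neighboring-chunk rule, and the function name and sample tags are hypothetical.

```python
TOPIC_TAGS = {"NNP", "NNC", "NN"}


def topic_candidates(tagged, holders, expression_words):
    """Noun-tagged words left after removing holders and emotional expressions."""
    excluded = set(holders) | set(expression_words)
    return [word for word, pos in tagged if pos in TOPIC_TAGS and word not in excluded]


# Transliterated "Ram loves Sita" example; the tags are illustrative.
sentence = [("Ram", "NNP"), ("Sitake", "NNP"), ("bhalobase", "VM")]
print(topic_candidates(sentence, holders=["Ram"], expression_words=["bhalobase"]))  # ['Sitake']
```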

  • Baseline System: observations & results

    Lexical Similarity Patterns exist mostly in
    - Simple active sentences containing a single emotion holder and/or topic
    - Sentences containing single-word emotion topics rather than multiword ones

    Problems arise in complex or compound sentences containing nested emotion holders and/or multiple emotion topics

    Average F-scores of 53.85% and 50.02% for emotion holder and emotion topic, respectively, on 500 gold standard test sentences

    Turn our focus towards developing a Syntactic System
    - Handle passive sentences
    - Handle multiple emotion holders and potential emotion topics

  • Syntactic System (1/7)

    Capture Thematic Roles for the emotion holder and emotion topic

    Identify Thematic Roles from Syntactic Argument Structures or Subcategorization Frames

    Subcategorization Frame
    - A statement of what types of syntactic arguments a verb (or adjective) takes, such as objects, infinitives, that-clauses, participial clauses and subcategorized prepositional phrases (Manning, 1993)

  • Syntactic System (2/7)

    Subcategorization Lexicon
    - English VerbNet (Kipper-Schuler, 2005) associates the semantics of a verb with its syntactic frames and combines traditional lexical semantic information, such as thematic roles and semantic predicates, with selectional restrictions
    - A Bengali Verb Subcategorization Lexicon is being developed

    S. Banerjee, D. Das and S. Bandyopadhyay. 2009. Bengali Verb Subcategorization Frame Acquisition - A Baseline Model. Asian Language Resource Workshop (ALR7), Association of Computational Linguistics (ACL) and International Joint Conference on Natural Language Processing (IJCNLP), pp. 76-83, Suntec, Singapore

    S. Banerjee, D. Das and S. Bandyopadhyay. 2010. Classification of Verbs Towards Developing a Bengali Verb Subcategorization Lexicon. In the proceedings of the 5th Global WordNet Conference (GWC-2010), pp. 76-83, Mumbai

  • Syntactic System (3/7)

    Identify the Bengali verb

    Acquire the chunk level Syntactic Argument Structure with respect to the Bengali verb from shallow parsed sentences
    - A rule based phrasal-head extraction module extracts the head parts of the phrases from the Syntactic Argument Structure

    Identify the equivalent English verbs

    Extract Subcategorization Frames for the equivalent English verbs using VerbNet; these form the initial set of valid Subcategorization Frames for that Bengali verb

  • Syntactic System (4/7)

    Match the acquired Syntactic Argument Structure with all of the extracted Subcategorization Frames

    If any match occurs, the Thematic Roles of the English Subcategorization Frame are mapped onto the equivalent slots of the Bengali Syntactic Argument Structure

    Thematic Roles of VerbNet
    - Emotion holder (Experiencer, Agent, Actor, Beneficiary [passive])
    - Emotion topic (Topic, Theme, Event)

  • Syntactic System (5/7)

    Example: Emotion Holder: <writer, Ram>, Emotion Topic: <Sita>

    (Ram) (Sitake) (bhalobase): "Ram loves Sita."

    Bengali Verb: bhalobasa "love"

    Bengali Syntactic Argument Structure: [NNP NNP-ke VM]

    English VerbNet Subcategorization Frame: [

  • Syntactic System (6/7)

    Example: Emotion Holder: <writer, , Rashed, Ram>, Emotion Topic: <sukh>

    (Rashed) (anubhob) (korechilo) (je) (Ramer) (sukh) (antohin): "Rashed felt that Ram's pleasure is endless."

    Bengali Verb: anubhab kara "feel"

    Bengali Acquired Argument Structure: [NNP VM DET-je S]

    English Extracted VerbNet Frame Syntax: [

  • Syntactic System (7/7)

    Acquire the Syntactic Argument Structure for the Bengali verb anubhab kara "feel"

    The Syntactic Argument Structure contains a sentential complement S introduced by je, which has a DET type POS

    One of the extracted VerbNet Frames also contains a "that" type sentential complement for the equivalent English verb feel

    As the acquired Syntactic Argument Structure matches the extracted VerbNet Frame syntax
    - The emotion holder related Thematic Role (e.g. Experiencer) associated with the VerbNet Frame is mapped to the equivalent phrase (Rashed) of the acquired Syntactic Argument Structure
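The frame matching and role mapping steps of the syntactic system can be sketched as follows. Several parts are assumptions made only for illustration: the frame syntax is truncated in this transcript, so the single frame below is a hypothetical stand-in rather than an actual VerbNet entry; the tag-to-slot table is invented; and, since Bengali is verb-final, matching is done on slot counts with roles assigned in order of appearance within each slot category.

```python
from collections import Counter, defaultdict

HOLDER_ROLES = {"Experiencer", "Agent", "Actor", "Beneficiary"}
TOPIC_ROLES = {"Topic", "Theme", "Event"}

# Hypothetical mapping from Bengali argument tags to frame slot categories.
TAG_TO_SLOT = {"NNP": "NP", "NNP-ke": "NP", "NN": "NP", "PRP": "NP", "VM": "V"}


def map_roles(arguments, frames):
    """Return (holders, topics) from the first frame whose slots match.

    arguments: [(phrase, tag), ...] acquired for the Bengali verb
    frames:    [[(slot, role), ...], ...] extracted for the English verb
    """
    slots = [TAG_TO_SLOT.get(tag) for _, tag in arguments]
    for frame in frames:
        # Order-insensitive match on slot categories (assumption, see lead-in).
        if Counter(slots) != Counter(slot for slot, _ in frame):
            continue
        roles_by_slot = defaultdict(list)
        for slot, role in frame:
            roles_by_slot[slot].append(role)
        holders, topics = [], []
        for (phrase, _), slot in zip(arguments, slots):
            role = roles_by_slot[slot].pop(0)  # next unused role for this slot type
            if role in HOLDER_ROLES:
                holders.append(phrase)
            elif role in TOPIC_ROLES:
                topics.append(phrase)
        return holders, topics
    return [], []


# Illustrative frame for the English verb "love"; not copied from VerbNet.
frames = [[("NP", "Experiencer"), ("V", None), ("NP", "Theme")]]
arguments = [("Ram", "NNP"), ("Sita", "NNP-ke"), ("bhalobase", "VM")]
print(map_roles(arguments, frames))  # (['Ram'], ['Sita'])
```

The default holder (the writer) would be added on top of whatever this mapping returns, as in the deck's example where the holders are the writer and Ram.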

  • Error Analysis: Case 1. Appositive Use

    Explicit constraints identify the single prominent emotion holder directly involved with the emotional expression

    Implicit constraints identify all direct and indirect nested sources as emotion holders

    Implicit emotion holder: Ram in the case of "Ram's pleasure"

  • Error Analysis: Case 2. Co-reference with Emotional Expression

    Only the emotion holders and topics that are referred to by their corresponding emotional expressions are needed

    Solution (Baseline System)
    - Extract words from shallow chunks that contain at least one emotion word matched in the Bengali WordNet Affect lists (Das and Bandyopadhyay, 2010b)
    - Words present in the immediate neighboring chunk of an emotional expression and containing a Subject → candidate emotion holder

  • Error Analysis: Case 3. Multiple Holders and Topics (1)

    Complex or compound sentences contain more than one clause

    Each clause may contain emotional expression(s), associated holder(s) and topic(s)

    Multiple emotional expressions
    - Multiple emotion holders and emotion topics associated with each of the emotional expressions
    - Common Rhetoric Similarity

  • Error Analysis: Case 3. Multiple Holders and Topics (2)

    Solution with the help of Sentential Rhetorical Structure (Mann and Thompson, 1988)

    {I enjoyed the summer vacation} [because I had a golden chance to play cricket in that period]

    Basic rhetorical constituents: locus, {nucleus} and [satellite]
    - The primary goal of the writer is termed the {nucleus}
    - The other part, which provides supplementary material, is termed the [satellite]
    - The locus is the main effective part of the {nucleus} or [satellite]

  • Error Analysis: Case 3. Multiple Holders and Topics (3)

    Assumptions
    - The locus occurs as an emotional expression (word / phrase)
    - A word found in the Bengali WordNet Affect lists (Das and Bandyopadhyay, 2010b) is treated as the locus
    - The locus is present in the nucleus or the satellite

    Identification of nucleus and satellite: clues are useful if explicitly specified in the text
    - Punctuation markers (,) (!) (?)
    - Frequently used discourse markers or causal keywords (jehetu "as", karon "because", mane "means")
    - Causal verbs (ghotay "caused") from a manually prepared seed list
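A simplified sketch of marker-based nucleus and satellite identification is shown below. It handles only the case where a causal keyword appears in the middle of the sentence, so that the text before it is taken as the nucleus and the rest as the satellite; punctuation clues and causal verbs are left out. The function name is hypothetical, and the English word "because" is added to the marker set purely so that the deck's English example can be run.

```python
# Transliterated markers from the slide; "because" is added only for the English demo.
CAUSAL_MARKERS = {"jehetu", "karon", "mane", "because"}


def split_nucleus_satellite(tokens):
    """Split tokens at the first sentence-internal causal marker.

    Returns (nucleus, satellite); the satellite is empty if no marker is found.
    """
    for i, token in enumerate(tokens):
        if i > 0 and token.lower() in CAUSAL_MARKERS:
            return tokens[:i], tokens[i:]
    return tokens, []


text = "I enjoyed the summer vacation because I had a golden chance to play cricket in that period"
nucleus, satellite = split_nucleus_satellite(text.split())
print(" ".join(nucleus))    # I enjoyed the summer vacation
print(" ".join(satellite))  # because I had a golden chance to play cricket in that period
```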

  • Error Analysis: Case 3. Multiple Holders and Topics (4)

    The topic of an opinion depends on the context in which its associated opinion expression occurs (Stoyanov and Cardie, 2008)

    If any word of the emotional expression co-occurs with any word of the nucleus or satellite in the same chunk, the chunk is tagged Common Rhetoric Similarity; otherwise, Distinctive Rhetoric Similarity

    Only the chunks identified by the syntactic system as emotion holder or emotion topic and tagged as Common Rhetoric Similarity are considered

    Identify all possible emotion holders and emotion topics from all clauses

  • Error Analysis: Case 4. Overlapping Topic Spans

    Topics may consist of multi-word strings

    Problem of distinguishing the emotion topic span from other non-emotional topic spans

    Solution
    - The chunks identified by the syntactic system as emotion topic and tagged as Common Rhetoric Similarity

  • Error Analysis: Case 5. Anaphoric Presence of Emotion Holders

    A special default phrasal pattern in blogs for emotion holders: [ ]

    (Gedu ChaCha bole) (Rashed bolechen) (Sayan bolechhe)

    Solution
    - If a pronoun is present with an emotional expression in a chunk, consider its preceding Named Entity (NE)
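The pronoun rule above can be sketched as a backward search for the nearest named entity. This is a minimal illustration, not a co-reference resolver: the function name, the sample sentence and its POS tags are hypothetical, and compound names are assumed to be tagged as NNPC parts followed by an NNP head.

```python
def preceding_named_entity(tagged, pronoun_index):
    """Return the closest named entity before the pronoun, or None.

    tagged: [(word, POS), ...]; compound parts (NNPC) are joined to their NNP head.
    """
    j = pronoun_index - 1
    while j >= 0 and tagged[j][1] != "NNP":
        j -= 1
    if j < 0:
        return None  # no named entity precedes the pronoun
    start = j
    while start > 0 and tagged[start - 1][1] == "NNPC":
        start -= 1
    return " ".join(word for word, _ in tagged[start:j + 1])


# Hypothetical tagged comment: a speaker pattern followed by a pronoun ("se").
tagged = [
    ("Gedu", "NNPC"), ("ChaCha", "NNP"), ("bole", "VM"),
    ("se", "PRP"), ("khub", "INTF"), ("sontushto", "JJ"),
]
print(preceding_named_entity(tagged, pronoun_index=3))  # Gedu ChaCha
```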

  • Results (1/2)

    Table 1: F-scores (in %) of the syntactic system on the development set

    Cases                                   Single Holder   Multiple Holder   Single Topic   Multiple Topic
    Before Error Analysis                   57.83           53.76             54.62          53.96
    Case 1                                  60.78           56.11             54.62          53.96
    Case 1+Case 2                           62.44           59.12             57.90          55.25
    Case 1+Case 2+Case 3                    66.82           62.53             61.48          57.96
    Case 1+Case 2+Case 3+Case 4             66.82           62.53             65.77          60.58
    Case 1+Case 2+Case 3+Case 4+Case 5      69.16           66.24             65.77          60.58

  • Results (2/2)

    Table 2: F-scores (in %) of the Baseline and Syntactic systems on the test set

    Categories   Baseline (Single)   Baseline (Multiple)   Syntactic (Single)   Syntactic (Multiple)
    Holder       55.67               52.17                 68.03                64.21
    Topic        52.97               49.01                 64.04                59.34
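To make the comparison in Table 2 easier to read, the short sketch below computes the absolute F-score gain of the Syntactic system over the Baseline for each category. The numbers are copied from Table 2; the dictionary layout and function name are only illustrative.

```python
# F-scores (in %) from Table 2, stored as (single, multiple) pairs.
table2 = {
    "Holder": {"Baseline": (55.67, 52.17), "Syntactic": (68.03, 64.21)},
    "Topic": {"Baseline": (52.97, 49.01), "Syntactic": (64.04, 59.34)},
}


def absolute_gains(table):
    """Syntactic minus Baseline F-score, in percentage points, per category."""
    gains = {}
    for category, systems in table.items():
        base_single, base_multi = systems["Baseline"]
        syn_single, syn_multi = systems["Syntactic"]
        gains[category] = (
            round(syn_single - base_single, 2),
            round(syn_multi - base_multi, 2),
        )
    return gains


for category, (single, multiple) in absolute_gains(table2).items():
    print(f"{category}: +{single} (single), +{multiple} (multiple)")
```

On these figures the gain is a little over 12 points for holders and between 10 and 11 points for topics.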

  • Conclusion and Future Tasks

    Incorporation of
    - Dependency parsing
    - Co-reference techniques
    - Supervised techniques

    Identification of
    - Metaphors
    - Discourse in natural language texts

    Tracking emotion from the perspectives of single as well as multiple users on the basis of time and topic

  • References

    Das, D. and S. Bandyopadhyay. 2010b. Developing Bengali WordNet Affect for Analyzing Emotion. In the proceedings of the 23rd International Conference on the Computer Processing of Oriental Languages (ICCPOL-2010), pp. 35-40, California, USA.

    Das, D. and S. Bandyopadhyay. 2010c. Labeling Emotion in Bengali Blog Corpus - A Fine Grained Tagging at Sentence Level. In the 8th Workshop on Asian Language Resources (ALR8), COLING-2010, pp. 47-55, August 21-22, Beijing, China.

    Ekman, P. 1993. Facial expression and emotion. American Psychologist, 48(4), pp. 384-392.

    Hu, J., C. Guan, M. Wang, and F. Lin. 2006. Model of Emotional Holder. In Shi, Z.-Z., Sadananda, R. (eds.) PRIMA 2006. LNCS (LNAI), vol. 4088, pp. 534-539. Springer, Heidelberg.

    Kipper-Schuler, K. 2005. VerbNet: A broad-coverage, comprehensive verb lexicon. Ph.D. thesis, Computer and Information Science Dept., University of Pennsylvania, Philadelphia, PA.

    Lu, B. 2010. Identifying Opinion Holders and Targets with Dependency Parser in Chinese News Texts. Student Research Workshop, NAACL-HLT, pp. 46-51, California.

    Mann, W. C. and S. Thompson. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. TEXT, vol. 8, pp. 243-281.

    Manning, C. D. 1993. Automatic Acquisition of a Large Subcategorization Dictionary from Corpora. In 31st Meeting of the ACL, Columbus, Ohio, pp. 235-242.

    Stoyanov, V. and C. Cardie. 2008a. Annotating topics of opinions. LREC.

    Stoyanov, V. and C. Cardie. 2008b. Topic Identification for Fine-Grained Opinion Analysis. Coling 2008, pp. 817-824.

  • Thank You