Dan Jurafsky Lecture 4: Sarcasm, Alzheimers, +Distributional Semantics Computational Extraction of...

  • Slide 1

Dan Jurafsky
Lecture 4: Sarcasm, Alzheimers, +Distributional Semantics
Computational Extraction of Social and Interactional Meaning
SSLST, Summer 2011

  • Slide 2

  • Slide 3

Why Sarcasm Detection
Sarcasm causes problems in sentiment analysis
One task: Summarization of Reviews
Identify features (size/weight, zoom, battery life, pic quality)
Identify sentiment and polarity of sentiment for each feature (great battery life, insufficient zoom, distortion close to boundaries)
Average the sentiment for each feature
Perfect size, fits great in your pocket + got to love this pocket size camera, you just need a porter to carry it for you = ?!
slide adapted from Tsur, Davidov, Rappoport 2010

  • Slide 4

What is sarcasm
Sarcasm is verbal irony that expresses negative and critical attitudes toward persons or events (Kreuz and Glucksberg, 1989)
a form of irony that attacks a person or belief through harsh and bitter remarks that often mean the opposite of what they say
The activity of saying or writing the opposite of what you mean in a way intended to make someone else feel stupid or show them that you are angry.

  • Slide 5

Examples from Tsur et al.
Great for insomniacs. (book)
Just read the book. (book/movie review)
thank you Janet Jackson for yet another year of Super Bowl classic rock!
Great idea, now try again with a real product development team. (e-reader)
make sure to keep the purchase receipt (smart phone)
slide adapted from Tsur, Davidov, Rappoport 2010

  • Slide 6

Examples
Wow GPRS data speeds are blazing fast.
@UserName That must suck.
I can't express how much I love shopping on black Friday.
@UserName that's what I love about Miami. Attention to detail in preserving historic landmarks of the past.
[I] Love The Cover (book)
Negative positive Overexaggeration don't judge a book by its cover

  • Slide 7

Twitter #sarcasm issues
Problems: (Davidov et al 2010)
Used infrequently
Used in non-sarcastic cases, e.g.
to clarify a previous tweet (it was #Sarcasm)
Used when sarcasm is otherwise ambiguous (prosody surrogate?)
biased towards the most difficult cases
GIMW11 argues that the non-sarcastic cases are easily filtered by only using ones with #sarcasm at the end

  • Slide 8

Davidov, Tsur, Rappoport 2010 Data
Twitter: 5.9M tweets, unconstrained context
Amazon: 66k reviews, known product context
Books (fiction, non fiction, children)
Electronics (mp3 players, digital cameras, mobile phones, GPS devices, ...)
Mechanical Turk annotation
κ = 0.34 on Amazon, κ = 0.41 on Twitter

  • Slide 9

Amazon review of Kindle e-reader
slide adapted from Tsur, Davidov, Rappoport 2010

  • Slide 10

Twitter
140 characters
Free text
Lacking context
Hashtags: #hashtag
URL addresses
References to other users: @user
slide adapted from Tsur, Davidov, Rappoport 2010

  • Slide 11

Star Sentiment Baseline (Amazon)
Saying or writing the opposite of what you mean...
Identify unhappy reviewers (1-2 stars)
Identify extremely-positive sentiment words (Best, exciting, top, great, ...)
Classify these sentences as sarcastic
slide adapted from Tsur, Davidov, Rappoport 2010

  • Slide 12

SASI: Semi-supervised Algorithm for Sarcasm Identification
Small seed of sarcastic-tagged sentences.
Tags 1,...,5:
1: not sarcastic at all
5: clearly sarcastic
slide adapted from Tsur, Davidov, Rappoport 2010

  • Slide 13

SASI:
Extract features from all training sentences.
Represent training sentences in a feature vector space.
Features:
Pattern based features
Punctuation based features
Given a new sentence: use weighted-kNN to classify it.
Majority vote (over k>0)
slide adapted from Tsur, Davidov, Rappoport 2010

  • Slide 14

Hashtag classifier
#sarcasm hashtag
Not very common
Use this tag as a label for supervised learning.
slide adapted from Tsur, Davidov, Rappoport 2010

  • Slide 15

Preprocessing
[author], [title], [product], [company]
[url], [usr], [hashtag]
Silly me, the Kindle and the Sony eBook can't read these protected formats. Great!
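The placeholder substitution listed on this slide could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the regular expressions and the `ENTITY_TAGS` lookup table are assumptions, since the slides do not say how product and company names were recognized.

```python
import re

# Hypothetical lookup table (an assumption for illustration): the slides do
# not describe how product and company names are identified.
ENTITY_TAGS = {
    "Sony": "[company]",
    "eBook": "[product]",
}


def preprocess(text, entity_tags=ENTITY_TAGS):
    """Replace URLs, user mentions, hashtags, and listed names with tags."""
    text = re.sub(r"https?://\S+", "[url]", text)   # URLs
    text = re.sub(r"@\w+", "[usr]", text)           # references to other users
    text = re.sub(r"#\w+", "[hashtag]", text)       # hashtags
    for name, tag in entity_tags.items():
        # Whole-word match so that substrings of longer words are not replaced.
        text = re.sub(r"\b" + re.escape(name) + r"\b", tag, text)
    return text


sentence = ("Silly me, the Kindle and the Sony eBook can't read "
            "these protected formats. Great!")
print(preprocess(sentence))
```

With the assumed table, "Kindle" is left untouched because it is not listed, while "Sony eBook" becomes "[company] [product]".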
Silly me, the Kindle and the [company] [product] can't read these protected formats. Great!
slide adapted from Tsur, Davidov, Rappoport 2010

  • Slide 16

Pattern based features
Davidov & Rappoport 2006, 2008
High Frequency Words (>0.0001)
Content Words (