Question Identification on Twitter
Baichuan Li, Xiance Si, Michael R. Lyu, Irwin King, and Edward Y. Chang
04/21/23 1
Agenda
• Background
• Two-phase Classification
• Experiments
• Conclusion
Background
Two Challenges
• 140 characters
• Special features
Two-phase Classification
• Interrogative Tweet Detection
  – Tweets which contain question sentences
• Qweet Extraction
  – Interrogative tweets which require some information or help and thus need to be answered
Tweets -> Interrogative Tweet Detection -> Interrogative Tweets -> Qweet Extraction -> Qweets
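The two-phase flow can be sketched as a simple composition of two filters. This is only an illustration under stated assumptions: the function name extract_qweets and both toy classifiers below are hypothetical stand-ins, not the classifiers used in the paper.

```python
from typing import Callable, Iterable, List

# A classifier here is any function mapping a tweet to True/False.
Classifier = Callable[[str], bool]


def extract_qweets(
    tweets: Iterable[str],
    detect_interrogative: Classifier,
    is_qweet: Classifier,
) -> List[str]:
    """Run phase 1 (interrogative detection), then phase 2 (qweet extraction)."""
    # Phase 1: keep only tweets that contain a question sentence.
    interrogative_tweets = [t for t in tweets if detect_interrogative(t)]
    # Phase 2: keep only interrogative tweets that actually ask for information or help.
    return [t for t in interrogative_tweets if is_qweet(t)]


# Toy stand-ins for the two phases (assumptions, for demonstration only).
def toy_detector(tweet: str) -> bool:
    return "?" in tweet


def toy_qweet_filter(tweet: str) -> bool:
    return "anyone" in tweet.lower()


sample = [
    "Anyone know a good dentist?",
    "Why is Monday a thing?",
    "Coffee time.",
]
qweets = extract_qweets(sample, toy_detector, toy_qweet_filter)
```

Keeping the two phases as separate, pluggable functions lets the second phase run only on the tweets that survive the first.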
Interrogative Tweet Detection
• Rule-based Approach
  – Question marks
  – 5W1H words and Refined 5W1H words
  – Heuristic Rules (Efron and Winget, 2010)
• Learning-based Approach
  – Frequent question patterns mining (Pei et al., 2001) + One-class SVM (Schölkopf et al., 2001)
  – Over 850,000 QA pairs in community question answering (CQA) portals were used
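A minimal sketch of the two simplest rule-based checks, question marks and 5W1H words, might look as follows. The word list and function names are assumptions made for illustration. The slides do not give the exact rules, and the heuristic rules and the learning-based approach are not covered here.

```python
import re

# The six classic 5W1H words. The slides do not list them, so this set is an assumption.
FIVE_W_ONE_H = {"who", "what", "when", "where", "why", "how"}


def has_question_mark(tweet: str) -> bool:
    """Rule 1: the tweet contains a question mark."""
    return "?" in tweet


def has_5w1h_word(tweet: str) -> bool:
    """Rule 2: the tweet contains any 5W1H word as a whole token."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(token in FIVE_W_ONE_H for token in tokens)


def is_interrogative(tweet: str) -> bool:
    """Flag a tweet as interrogative if either rule fires."""
    return has_question_mark(tweet) or has_5w1h_word(tweet)
```

Note that the plain 5W1H rule also fires on statements such as "I love what you did", which is the kind of false positive that refined 5W1H words are meant to reduce.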
Qweet Extraction
• Types of Interrogative Tweets
Qweet Extraction
• Feature Extraction
Experiments
• Data Set
Results: Interrogative Tweet Detection
• Heuristics
  – H1: 5W1H words must appear at the beginning of a sentence
  – H2: Add auxiliary words to the original 5W1H words
    • “what” -> “what is” and “what are”
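The two heuristics can be sketched as below, assuming H1 applies to 5W1H words. Only the refinements "what is" and "what are" come from the slides. The sentence splitter, the word list, and the function names are hypothetical simplifications.

```python
import re

FIVE_W_ONE_H = ("who", "what", "when", "where", "why", "how")  # assumed word list
REFINED_5W1H = ("what is", "what are")  # the only refinements shown on the slide


def split_sentences(tweet: str) -> list:
    """Naive sentence splitter on ., ! and ? (an assumption, not the paper's method)."""
    parts = re.split(r"[.!?]+", tweet.lower())
    return [part.strip() for part in parts if part.strip()]


def matches_h1(tweet: str) -> bool:
    """H1: some sentence starts with a 5W1H word."""
    for sentence in split_sentences(tweet):
        tokens = re.findall(r"[a-z']+", sentence)
        if tokens and tokens[0] in FIVE_W_ONE_H:
            return True
    return False


def matches_h2(tweet: str) -> bool:
    """H2: the tweet contains a refined pattern such as "what is"."""
    text = " ".join(tweet.lower().split())  # collapse repeated whitespace
    return any(
        re.search(r"\b" + re.escape(pattern) + r"\b", text)
        for pattern in REFINED_5W1H
    )
```

Both heuristics reject "I know what you mean", which the unrefined 5W1H rule would accept.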
Results: Qweet Extraction
• Context features are of great importance in distinguishing qweets from non-qweets
• Tweet-specific features also help in qweet identification
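As a rough illustration of what tweet-specific features could look like, the sketch below counts a few Twitter conventions. The slides do not enumerate the features used in the experiments, so every feature name here is an assumption, and context features are not shown.

```python
import re


def tweet_specific_features(tweet: str) -> dict:
    """Extract a few illustrative Twitter-specific signals (all assumed, not from the paper)."""
    return {
        "num_mentions": len(re.findall(r"@\w+", tweet)),    # @username references
        "num_hashtags": len(re.findall(r"#\w+", tweet)),    # #topic tags
        "has_url": bool(re.search(r"https?://\S+", tweet)),  # embedded link
        "is_retweet": tweet.startswith("RT "),               # old-style retweet marker
        "length": len(tweet),                                # characters used
    }


features = tweet_specific_features("RT @bob: where can I buy this? #help http://t.co/x")
```

A feature dictionary like this could be fed to any standard classifier alongside other feature groups.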
Conclusion
• First attempt at discovering questions in tweets automatically
• Two-phase classification
  – Interrogative Tweet Detection
  – Qweet Extraction
• Limitations and future work
  – Tweets containing rhetorical questions and complicated self-ask-self-answer sentences
  – Real-time clustering (Ahmed et al., 2011)
  – Question analysis and classification
Thank You!
Q&A