Question-Answering on Yahoo!Answers: Preliminary Results Rong Tang Sheila Denn OCLC/ALISE LIS...
Question-Answering on Yahoo!Answers: Preliminary Results
Rong Tang
Sheila Denn
OCLC/ALISE LIS Research Grant Presentation
ALISE 2009
January 23, 2009

Background
Yahoo!Answers
  Social Q&A
  25+ pre-defined categories
  Users post questions, answer questions, rate answers, provide comments
  One best answer chosen by the asker or through vote
  Users may provide comments

Our Research Project
Funded by OCLC/ALISE Grant Program and Simmons College President’s Fund for Research
Project Staff:
  Rong Tang (PI)
  Sheila Denn (Co-PI)
  Sam Kalat (technology consultant, programmer)
  Laura Saunders (Research Assistant)
The project wiki page documents the relevant literature and project progression, with extensive meeting notes on coding decisions

Research Questions
Are existing question taxonomies (such as those in Graesser et al. (1994) and Freed (1994)) valid in a social Q&A environment?
What are the relationships between the linguistic characteristics, functional properties, and subject content of the questions and the kinds of responses that they receive?
What are the characteristics of answers that are chosen as “best” answers?
What is the role of the social function vs. the information function in social Q&A?
What are the implications of the above for provision of library and information services?

Previous Research
Question classification
  Wh- questions (Robinson & Rackstraw, 1972)
  Conceptual question categories (Lehnert, 1978)
  Content-based question categories (Graesser, et al., 1994)
  Reference question classification (Pomerantz, 2005)
  Questions in Dynamic Semantics (Aloni, Butler, & Dekker, 2007)
Answer classification
  Much less research here than with question classification
  Answer selection rules (Lehnert, 1978)
  Criteria based on Yahoo!Answers comments (Kim et al., 2007)

Previous Research (cont.)
Formal studies of Online Q&A
  Answerers: “specialists” vs. “synthesists” (Gazan, 2006)
  Questioners: “seekers” vs. “sloths” (Gazan, 2007)
Question purpose (Graesser, et al., 1994)
  Filling knowledge gaps
  Establishing and monitoring common ground
  Coordinating social action
  Directing the conversation and controlling attention

Research Plan
Data collection and sampling
  Gathered a stratified random sample of 3,000 question-answer sets, including any comments
  Stratified by 25 top-level categories assigned by Yahoo!Answers
Data coding
  Content analysis at multiple levels
    Syntactic
    Semantic
    Pragmatic
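
The sampling step above can be sketched as follows. This is only an illustration of stratified random sampling, not the project's actual collection script: the record layout, the `stratified_sample` helper, the synthetic population, and the equal allocation of 120 sets per category (3,000 / 25) are all assumptions, since the slides do not say how the sample was divided among the strata.

```python
import random
from collections import defaultdict


def stratified_sample(records, per_stratum, key="category", seed=0):
    """Draw up to `per_stratum` records at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec[key]].append(rec)

    sample = []
    for stratum in sorted(by_stratum):
        pool = by_stratum[stratum]
        k = min(per_stratum, len(pool))  # small strata contribute all they have
        sample.extend(rng.sample(pool, k))
    return sample


# Hypothetical population: 25 categories with 500 question-answer sets each.
population = [
    {"id": f"c{c:02d}-q{q}", "category": f"category_{c:02d}"}
    for c in range(25)
    for q in range(500)
]

# Assumed equal allocation: 120 sets per category gives 3,000 in total.
sample = stratified_sample(population, per_stratum=120)
```

Sampling within each category separately guarantees that low-traffic categories are represented, which a simple random draw over all questions would not.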

Research Plan (cont.)
Data Analysis
Descriptive statistics will be produced for:
  Frequency of answers provided per question
  Average length of time to first answer
  Distribution of subject categories
  Distribution of question and answer types
  Distribution of chosen answer types
Correlation analysis will be performed for:
  Linguistic characteristics of questions and answers
  Functional categories of questions and answers
  Subject categories of questions and answers
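
Two of the descriptive statistics listed above, answers per question and time to first answer, can be computed per category along these lines. This is a minimal sketch under stated assumptions: the record fields (`category`, `asked`, `answers`), the timestamp format, and the four sample records are hypothetical and are not the project's data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format

# Hypothetical records: one per question, with the times its answers arrived.
records = [
    {"category": "pets", "asked": "2008-10-01 09:00",
     "answers": ["2008-10-01 09:12", "2008-10-01 10:30"]},
    {"category": "pets", "asked": "2008-10-02 14:00",
     "answers": ["2008-10-02 14:48"]},
    {"category": "travel", "asked": "2008-10-03 08:00",
     "answers": ["2008-10-03 11:00", "2008-10-03 12:00", "2008-10-04 08:00"]},
    {"category": "travel", "asked": "2008-10-05 08:00", "answers": []},
]


def minutes_to_first_answer(rec):
    """Minutes between posting and the earliest answer, or None if unanswered."""
    if not rec["answers"]:
        return None
    asked = datetime.strptime(rec["asked"], FMT)
    first = min(datetime.strptime(t, FMT) for t in rec["answers"])
    return (first - asked).total_seconds() / 60


def summarize(records):
    """Average answer count and average wait for a first answer, per category."""
    counts = defaultdict(list)
    waits = defaultdict(list)
    for rec in records:
        counts[rec["category"]].append(len(rec["answers"]))
        wait = minutes_to_first_answer(rec)
        if wait is not None:  # unanswered questions have no wait time
            waits[rec["category"]].append(wait)
    return {
        cat: {
            "avg_answers": mean(counts[cat]),
            "avg_minutes_to_first": mean(waits[cat]) if waits[cat] else None,
        }
        for cat in counts
    }


summary = summarize(records)
```

Unanswered questions count toward the answers-per-question average (as zero) but are left out of the time-to-first-answer average; whether the project handled them this way is not stated in the slides.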

Progress to Date
Sample has been collected
Preliminary coding has begun
  Syntactic coding of questions is complete
    Wh- questions
    Inversion questions
    Other questions
    Multiparts
    Double coding
  Syntactic coding of question descriptions is complete
    Number of questions included in description text
    Type of questions

Data Coding
Two coders perform coding individually, then go over the coding to reach consensus on the final coding of each question
Use of informal language presents a challenge for coding
  Is it a question if it doesn’t include a question mark? Is it a question simply because it has a question mark at the end?
  Should “WTF” be coded as a “what” question or an “other” question? Or not at all?
  Coding multiparts of a question, e.g., “Why do husbands feel they have to lie to other women about being married, and when the other woman finds out?”
  Double coding questions such as “Is there anywhere you can listen to citizen band radio online?”
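
To make the syntactic categories concrete, here is a rough first-word heuristic for Wh-, inversion, and other questions. It is not the coding scheme the two coders applied: the word lists and the `code_question` helper are assumptions, and the sketch assigns a single code, so the multipart and double-coding cases raised above would still need human judgment.

```python
import re

# Assumed word lists; the project's actual coding manual is not given here.
WH_WORDS = {"what", "why", "how", "who", "whom", "whose",
            "which", "when", "where"}
AUXILIARIES = {"is", "are", "was", "were", "am", "do", "does", "did",
               "can", "could", "will", "would", "shall", "should",
               "may", "might", "must", "has", "have", "had"}


def code_question(text):
    """Return a single syntactic code based only on the first word."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return "other"
    first = words[0]
    if first in WH_WORDS:
        return "wh:" + first      # e.g. "wh:why"
    if first in AUXILIARIES:
        return "inversion"        # auxiliary verb precedes the subject
    return "other"                # includes informal openers such as "WTF"


examples = [
    "Why do husbands feel they have to lie to other women about being married?",
    "Is there anywhere you can listen to citizen band radio online?",
    "WTF",
]
codes = [code_question(q) for q in examples]
```

A heuristic like this could serve as a pre-sort before manual coding, with informal or ambiguous items routed to the coders for consensus.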

Number of Answers Per Question
Average Number of Answers per Question by Category (axis: 0 to 9)
  pregancyparenting      8.2
  diningout              7.86
  politicsgov            7.14
  beautystyle            6.98
  socialscience          6.92
  environment            6.72
  familyrelationships    6.46
  pets                   6.37
  societyculture         6.28
  fooddrink              6.18
  newsevents             6.08
  sports                 5.79
  entertainmentmusic     5.51
  artshumanities         4.78
  health                 3.84
  educationreference     3.76
  homegarden             3.68
  travel                 3.68
  gamrecreation          3.65
  carstransportation     3.61
  consumerelectronic     3.28
  sciencemath            3.15
  computerinternet       2.98
  businessfinance        2.89
  localbusiness          2.63

Length to Receive 1st Answer
Average length (min.) to receive first answer (axis: 0 to 1800)
  familyrelationships    10.8
  pregancyparenting      41.78
  fooddrink              59.86
  beautystyle            74.83
  socialscience          87.4
  homegarden             90.52
  sciencemath            157.47
  health                 163.28
  newsevents             163.67
  societyculture         171.22
  artshumanities         182.07
  environment            197.31
  sports                 277.68
  pets                   286.07
  politicsgov            302.37
  carstransportation     326.75
  diningout              346.44
  computerinternet       370.91
  educationreference     402.5
  travel                 411
  consumerelectronic     463
  gamrecreation          485.2
  businessfinance        635
  entertainmentmusic     660.77
  localbusiness          1635.04

Wh-question frequency
“What” Questions
[Figure: Average Number of What Questions By Category; value axis 0 to 50]

Wh-question frequency
“Why” Questions
[Figure: Average Number of Why Questions by Category; value axis 0 to 18; category axis in order: homegarden, localbusiness, beautystyle, consumerelectronic, diningout, entertainmentmusic, fooddrink, travel, businessfinance, educationreference, health, computerinternet, gamrecreation, carstransportation, pregancyparenting, artshumanities, familyrelationships, pets, sciencemath, sports, environment, politicsgov, newsevents, socialscience, societyculture]

Wh-question frequency
“How” Questions
[Figure: Average Number of How Questions by Category; value axis 0 to 40]

Wh-question frequency
“Inversion” Questions
[Figure: Average Number of Inversion Question By Category; value axis 0 to 60]

Next Steps
Start semantic and pragmatic analysis of questions
Start answer analysis
Start comment coding
Explore the association and features of Q and A and C
Develop a conceptual and analytical model for social Q&A