Date: 2013/12/04 Author: Gideon Dror , Yoelle Maarek , Avihai Mejer , Idan Szpektor Source : ...
From Query to Question in One Click: Suggesting Synthetic Questions to Searchers
Date: 2013/12/04
Author: Gideon Dror, Yoelle Maarek, Avihai Mejer, Idan Szpektor
Source: WWW’13
Advisor: Jia-ling Koh
Speaker: Chen-Yu Huang
Outline
• Introduction
• Query-to-question recommendation approach
  • Template extraction
  • Generating question candidates
  • Question feature representation
  • Learning-to-rank model
  • Diversifying suggestions
• Experiment
• Conclusion
Introduction
• In Web search, users may remain unsatisfied for several reasons:
  • The search engine may not be effective enough
  • The query might not reflect the user’s intent
• A common solution is for users to turn to Community Question Answering (CQA) sites and ask other users to generate the missing content.
Introduction
• Propose generating synthetic questions automatically from the searcher’s query.
• Searchers can then post a question directly on a CQA service in one click.
• Example query: Italy rent villa
Introduction
• Specific requirements on the quality of the generated questions:
  • The displayed questions have to be grammatically correct and natural enough to be posted
  • The list of displayed questions should not include near-duplicates
  • The list of displayed questions should be as diverse as possible
Query-to-question recommendation approach
• Template extraction
• Generating question candidates
• Question feature representation
• Learning-to-rank model
• Diversifying suggestions
Query-to-question recommendation approach
• Builds on the approach of Zhao et al.
Template extraction
• Follow the approach of Zhao et al. and apply the template extraction algorithm to a query/question dataset
• Built a dataset of 28 million (query, question) pairs in which the questions originate from Yahoo! Answers, keeping only the title as the question’s text (English queries consisting of 3-5 terms)
Template extraction
• To ensure the quality of the extracted templates, a simple heuristic is used: keep only questions whose first word belongs to a manually defined white list (what, who, …, should, can, is, …)
• Templates are extracted by substituting variables for the query terms found in the question
Template extraction
• Encode the query term positions as part of the variable name, so that templates differing only in term order stay distinct:
  • Is T1 taller than T2?
  • Is T2 taller than T1?
• Keep only templates that are associated with at least 10 different queries
• Result: 40,000 templates
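The extraction steps on these slides can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the tiny white list, whitespace tokenization, and the requirement that every query term appear in the question are all assumptions.

```python
from collections import defaultdict

# Hypothetical white list; the slides name only a few of the allowed first words.
WHITE_LIST = {"what", "who", "should", "can", "is"}

def extract_template(query, question):
    """Return a template for a (query, question) pair, or None if rejected."""
    q_terms = query.lower().split()
    tokens = question.rstrip("?").split()
    # Heuristic from the slides: the first word must be on the white list.
    if not tokens or tokens[0].lower() not in WHITE_LIST:
        return None
    # Name each variable after the 1-based position of the term in the query.
    var_of = {term: f"T{i}" for i, term in enumerate(q_terms, start=1)}
    out = [var_of.get(tok.lower(), tok) for tok in tokens]
    # Assumption: reject the pair unless every query term occurs in the question.
    if not all(v in out for v in var_of.values()):
        return None
    return " ".join(out) + "?"

def build_templates(pairs, min_queries=10):
    """Keep templates that are supported by at least min_queries distinct queries."""
    queries_by_template = defaultdict(set)
    for query, question in pairs:
        template = extract_template(query, question)
        if template is not None:
            queries_by_template[template].add(query)
    return {t for t, qs in queries_by_template.items() if len(qs) >= min_queries}
```

For instance, the pair ("john mary", "Is John taller than Mary?") yields "Is T1 taller than T2?", while the query "mary john" paired with the same question yields "Is T2 taller than T1?".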
Generating question candidates
• Use the approach of Zhao et al. to select relevant templates
• Given a new query q, fetch all similar queries to produce a set {qi}. For each qi, fetch its templates tj
• Specifically, two queries are considered similar if they have the same number of terms and share at least one term in the same query position
• Example, for the query “fix old car”:
  • buy old records: similar (same length, “old” in the same position)
  • old car repair: not similar (same length, but no term in the same position)
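The similarity test and the candidate generation step might look like the sketch below. The `templates_by_query` dictionary is a hypothetical stand-in for the index built during template extraction, and the `T1`, `T2`, … variable syntax follows the examples on the earlier slide.

```python
import re

def are_similar(q1, q2):
    """Same number of terms and at least one shared term in the same position."""
    a, b = q1.lower().split(), q2.lower().split()
    return len(a) == len(b) and any(x == y for x, y in zip(a, b))

def generate_candidates(query, templates_by_query):
    """Instantiate the templates of every similar known query with the new query's terms."""
    terms = query.split()
    candidates = set()
    for known_query, templates in templates_by_query.items():
        if not are_similar(query, known_query):
            continue
        for template in templates:
            # Replace each variable Tn with the n-th term of the new query.
            question = re.sub(
                r"\bT(\d+)\b",
                lambda m: terms[int(m.group(1)) - 1],
                template,
            )
            candidates.add(question)
    return candidates
```

Because similar queries have the same number of terms, every variable index in a fetched template refers to an existing position in the new query.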
Question feature representation
• Baseline features
  a. Likelihood score for the query to instantiate the template behind the generated question
  b. A trigram language model score for the question
• Combine the two features
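As a rough illustration of baseline feature (b) and of combining the two features, the sketch below trains a tiny trigram language model and interpolates its score with a template likelihood. The add-one smoothing, the linear interpolation, and the choice of which term the weight `lam` applies to are assumptions; the transcript does not preserve the actual combination formula.

```python
import math
from collections import Counter

def train_trigram_lm(sentences):
    """Count trigrams and their bigram contexts over whitespace-tokenized sentences."""
    tri, bi, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        toks = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(toks)
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            bi[tuple(toks[i - 2:i])] += 1
    return tri, bi, len(vocab)

def trigram_log_prob(sentence, model):
    """Log-probability of a sentence under the model, with add-one smoothing (assumed)."""
    tri, bi, vocab_size = model
    toks = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = 0.0
    for i in range(2, len(toks)):
        num = tri[tuple(toks[i - 2:i + 1])] + 1
        den = bi[tuple(toks[i - 2:i])] + vocab_size
        log_prob += math.log(num / den)
    return log_prob

def baseline_score(template_log_likelihood, lm_log_prob, lam=0.2):
    """Assumed linear interpolation of the two baseline features."""
    return lam * template_log_likelihood + (1 - lam) * lm_log_prob
```

A question seen in training, such as "should i fix my old car", scores higher under this model than a scrambled version of the same words.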
Question feature representation
• Reranking features
  a. Question POS language models: compute five-gram language model scores for the question’s POS tags and for its coarse-POS tags
     Example: Should(NNP) I(PRP) fix(VB) my(PRP$) old(JJ) car(NN)
  b. Query POS sequences: generate binary features representing all the position-dependent unigram, bigram and trigram subsequences of the query tag sequence
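Feature (b) can be illustrated with a short sketch that enumerates position-dependent n-grams over a query's POS tags. The feature-name format (`pos<i>:<tags>`) is hypothetical, and the tag sequence is assumed to come from an external POS tagger.

```python
def query_pos_features(tags, max_n=3):
    """Return the set of active binary features for a query's POS tag sequence.

    Each feature encodes the start position and the unigram, bigram or trigram
    of tags beginning at that position.
    """
    features = set()
    for n in range(1, max_n + 1):
        for i in range(len(tags) - n + 1):
            features.add(f"pos{i}:" + "_".join(tags[i:i + n]))
    return features
```

For the query "fix old car" tagged as VB JJ NN, this yields three unigram features, two bigram features and one trigram feature.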
Question feature representation
• Reranking features
  c. Dependency relations
  d. Parse tree score: the score provided by the parser for the generated parse tree
  e. Template word order
Learning-to-rank model
• Use a linear model to score each question vector in the candidate pool: the score is the inner product of the weight vector and the question’s feature vector
  • μ: the model weight vector, trained using the averaged variant of the Passive-Aggressive online learning algorithm
  • the feature vector of question Q
• For training and evaluating the ranking model, use a subset of the query/question pairs dataset
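A minimal sketch of linear scoring with an averaged Passive-Aggressive update is given below. The pairwise formulation (a preferred question should outscore a worse one by a margin of 1), the sparse dictionary representation, and the use of three passes over the data are assumptions; the slides name the algorithm but not its exact setup.

```python
def dot(weights, features):
    """Inner product of a sparse weight vector and a sparse feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def train_averaged_pa(pairs, epochs=3):
    """Train on (better_features, worse_features) pairs and return averaged weights."""
    w, w_sum, steps = {}, {}, 0
    for _ in range(epochs):
        for better, worse in pairs:
            # Difference vector between the preferred and the worse question.
            diff = dict(better)
            for f, v in worse.items():
                diff[f] = diff.get(f, 0.0) - v
            loss = max(0.0, 1.0 - dot(w, diff))
            norm_sq = sum(v * v for v in diff.values())
            if loss > 0.0 and norm_sq > 0.0:
                # Passive-Aggressive step: smallest change that fixes the margin.
                tau = loss / norm_sq
                for f, v in diff.items():
                    w[f] = w.get(f, 0.0) + tau * v
            # Accumulate the weights after every example for averaging.
            for f, v in w.items():
                w_sum[f] = w_sum.get(f, 0.0) + v
            steps += 1
    return {f: v / steps for f, v in w_sum.items()} if steps else {}
```

Candidates can then be ranked by `dot(weights, features)` in descending order.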
Diversifying suggestions
• As queries are typically underspecified, they can easily represent different intents.
• In general, the top-5 recommended questions were found to represent only 2-3 different intents on average.
• Integrate diversification into the algorithm as a post-processing filtering step.
Diversifying suggestions
• First-word filter: impose a different question form, represented by a different interrogative word
  • Each suggested question must start with a different word
• Edit-distance filter: eliminate redundancy
  • Two questions are considered redundant if the edit distance between them is lower than a threshold k (k=3)
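The two filters can be sketched as a single pass over the ranked list, as below. Measuring edit distance over tokens rather than characters, applying both filters together, and the `top_n` cutoff are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def diversify(ranked, top_n=3, k=3, use_first_word=True, use_edit_distance=True):
    """Walk the ranked list and keep questions that pass the enabled filters."""
    kept = []
    for question in ranked:
        toks = question.lower().split()
        if not toks:
            continue
        kept_toks = [other.lower().split() for other in kept]
        # First-word filter: skip if an already kept question starts the same way.
        if use_first_word and any(toks[0] == o[0] for o in kept_toks):
            continue
        # Edit-distance filter: skip if closer than k edits to a kept question.
        if use_edit_distance and any(edit_distance(toks, o) < k for o in kept_toks):
            continue
        kept.append(question)
        if len(kept) == top_n:
            break
    return kept
```

Because the list is processed in rank order, the highest-ranked question of each redundant group is the one that survives.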
Ranking Evaluation
• Experimental setup
  • Feature in the scoring function: λ = 0.2
  • Top-k question candidates: k = 100
  • Ranking model training procedure: T = 3
Ranking Evaluation
• Manual evaluation
  • Randomly sampled 1,000 queries
  • For each query, collected the top three question suggestions
  • Two metrics: relevance and grammar
Ranking Evaluation
• Automatic evaluation
  • 147,000 queries
Ranking Evaluation
• Manual evaluation
  • Most incorrect questions are due to grammatical mistakes
  • The rare relevance mistakes are more subtle and harder to mitigate
Ranking Evaluation
• Feature analysis
Diversification Evaluation
• Filter comparison
  • 140,000 queries
  • Measured how many questions were filtered out
• Manual evaluation
  • 50 queries, top-3 questions
Diversification Evaluation
• Automatic evaluation
  • A suggested question is considered to match the target question if it is similar enough to it
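One way such an automatic check could be computed is sketched below. The slides do not say how "similar enough" is measured, so the token-level `difflib.SequenceMatcher` ratio, the 0.8 threshold, and the top-3 cutoff are purely illustrative assumptions.

```python
from difflib import SequenceMatcher

def is_match(suggested, target, threshold=0.8):
    """Assumed similarity test: token-sequence ratio at or above a threshold."""
    a, b = suggested.lower().split(), target.lower().split()
    return SequenceMatcher(None, a, b).ratio() >= threshold

def match_rate(suggestions_by_query, target_by_query, top_n=3):
    """Fraction of queries whose top_n suggestions contain a match for the target."""
    if not target_by_query:
        return 0.0
    hits = 0
    for query, target in target_by_query.items():
        top = suggestions_by_query.get(query, [])[:top_n]
        if any(is_match(s, target) for s in top):
            hits += 1
    return hits / len(target_by_query)
```

Comparing this rate with and without the diversification filters would show how the filters affect the chance of surfacing the target question.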
Conclusion
• Introduce a new model for synthetic question generation that pays special attention to the eventual grammatical correctness and appeal of the suggestion list.
• Use a learning-to-rank framework that leverages millions of features covering both relevance and correctness aspects.
• Increase the overall appeal of the suggested list by introducing a novel diversification component.
• Future work: intend to develop a few use cases and associated treatments, and to investigate the reaction of users in an extensive online experiment.