Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics

Introduction

• Create a system that learns to generate query rewrites from a large amount of user query logs.

• Use query expansion in Web search to evaluate the rewritten queries.

• For a given set of randomly selected queries, n-best rewrites are produced.

• From the changes introduced by the rewrites, expansion terms are extracted and added as alternate forms.

Example

• For a query like "herbs for chronic constipation", the query terms are combined with the AND operator, and expansion terms are added with the OR operator. In this context, remedies, medicine, or supplement are appropriate expansion terms for herbs, but spices is not.

• For "herbs for mexican cooking", only spices is a good alternative.
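
The AND/OR combination described above can be sketched as follows. This is a minimal illustration, not the query syntax of any particular search engine; the function name `build_expanded_query` is hypothetical, and dropping the stopword "for" from the query terms is an assumption made here for readability.

```python
def build_expanded_query(terms, expansions):
    """Join query terms with AND; attach each term's expansion terms with OR."""
    clauses = []
    for term in terms:
        alternates = expansions.get(term, [])
        if alternates:
            # A term with alternates becomes a parenthesized OR group.
            clauses.append("(" + " OR ".join([term] + alternates) + ")")
        else:
            clauses.append(term)
    return " AND ".join(clauses)


# Assumption: the stopword "for" is left out of the boolean query.
query = build_expanded_query(
    ["herbs", "chronic", "constipation"],
    {"herbs": ["remedies", "medicine", "supplement"]},
)
print(query)
# (herbs OR remedies OR medicine OR supplement) AND chronic AND constipation
```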

Goal

• Use the translation model and the language model to expand query terms in context.

• The translation model proposes expansion candidates.

• The query language model performs a selection in the context of the surrounding query terms.

• SMT is readily applicable to this task: apply it to large parallel data with queries on the source side and snippets of clicked search results on the target side.

• Snippets introduce noise since they are not complete sentences.

• TREC Data.

Review: Query Expansion by Q-D Term Correlation

• A session links query terms with a document:

• Aggregating clicks over sessions reflects the preferences of multiple users (a probability distribution of document words given query words, estimated from counts over clicked documents D across sessions):

• This formula considers the query as a cohesive unit:
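
A minimal sketch of the word-level estimate described in the second bullet (aggregating clicks over sessions), under stated assumptions: the probability of a document word given a query word is taken as a mixture over clicked documents, P(w_d | w_q) = sum over D of P(w_d | D) * P(D | w_q), with P(w_d | D) set to relative frequency in the document. The decomposition, the function name, and the toy data are assumptions, and the query-level combination is not covered here.

```python
from collections import Counter, defaultdict


def term_correlation(sessions, doc_words):
    """Estimate P(w_d | w_q) from click sessions.

    sessions:  list of (query_terms, clicked_doc_id) pairs
    doc_words: dict mapping doc_id -> list of words in that document
    """
    # clicks[w_q][D] = number of sessions where a query containing w_q led to a click on D
    clicks = defaultdict(Counter)
    for query_terms, doc_id in sessions:
        for w_q in set(query_terms):
            clicks[w_q][doc_id] += 1

    prob = defaultdict(lambda: defaultdict(float))
    for w_q, doc_counts in clicks.items():
        total_clicks = sum(doc_counts.values())
        for doc_id, n in doc_counts.items():
            p_doc_given_q = n / total_clicks            # P(D | w_q)
            words = doc_words[doc_id]
            for w_d, c in Counter(words).items():
                p_word_given_doc = c / len(words)        # P(w_d | D), assumed relative frequency
                prob[w_q][w_d] += p_word_given_doc * p_doc_given_q
    return prob


# Hypothetical toy data: two clicks on d1, one on d2, all for queries containing "herbs".
sessions = [(["herbs"], "d1"), (["herbs"], "d1"), (["herbs"], "d2")]
doc_words = {"d1": ["remedies", "herbs"], "d2": ["spices", "herbs"]}
correlation = term_correlation(sessions, doc_words)
print(dict(correlation["herbs"]))
```

Because each P(w_d | D) and P(D | w_q) is normalized, the resulting values for a given query word sum to one over document words.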

Review: Machine Translation 1/2

• Linear Model for SMT:

• Find the English string e that is a translation of the foreign string f using a linear combination of feature functions hm(e,f) and weights lambda:

• Word Alignment:

• The translation model and the alignment model for source language string f and target string e are related via a hidden variable describing an alignment mapping from source position j to target position aj:
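
In the usual notation for log-linear SMT, the two relations described on this slide can be written roughly as below. The number of features M, the source length J, and the exact layout are assumptions taken from the standard formulation, not recovered from the slide itself.

```latex
% Linear model: choose the translation with the highest weighted feature score
\hat{e} = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)

% Word alignment as a hidden variable a = a_1 \ldots a_J,
% where a_j is the target position aligned to source position j
P(f \mid e) = \sum_{a} P(f, a \mid e)
```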

Review: Machine Translation 2/2

• “Sentence Aligned” parallel training data are prepared by pairing user queries with snippets of clicked search results for the respective queries.

• Phrase Extraction:

• Maximum-likelihood estimation of sentence aligned strings:

• Alignment with highest probability:
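
One simple instance of maximum-likelihood estimation with a hidden alignment is EM training of a word translation table in the style of IBM Model 1, followed by picking the highest-probability alignment. This is a stand-in chosen for brevity and is not necessarily the alignment model the authors used; it omits the NULL word and phrase extraction, and the function names and toy query-snippet pairs are hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """EM estimation of word translation probabilities t(f | e).

    pairs: list of (source_words, target_words), e.g. (query, snippet).
    Returns a dict-like table keyed by (f, e).
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned to anything
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm  # posterior that f aligns to e
                    count[(f, e)] += frac
                    total[e] += frac
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]  # M-step: renormalize per target word
    return t


def viterbi_alignment(fs, es, t):
    """For each source position j, return the target position a_j maximizing t(f_j | e_a_j)."""
    return [
        max(range(len(es)), key=lambda i: t[(fs[j], es[i])])
        for j in range(len(fs))
    ]


# Hypothetical toy data: queries on the source side, snippet fragments on the target side.
pairs = [
    (["cheap", "flights"], ["low", "airfare"]),
    (["cheap", "hotels"], ["low", "lodging"]),
    (["luxury", "hotels"], ["upscale", "lodging"]),
]
table = train_model1(pairs)
print(viterbi_alignment(["cheap", "flights"], ["low", "airfare"], table))
```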

Language Model

• n-gram language modeling, with smoothing for sparse-data problems.

• The ultimate task is to pick appropriate phrase translations in the context of the original query for query expansion.
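
A minimal sketch of how a query language model could select among expansion candidates in context. It assumes a trigram model with add-one smoothing, which is a deliberately crude smoothing choice for illustration and not necessarily the one used in the talk; the function names and the toy query log are hypothetical.

```python
import math
from collections import Counter


def train_trigram_lm(queries):
    """Collect trigram and context counts from tokenized queries."""
    trigrams, contexts = Counter(), Counter()
    vocab = {"</s>"}
    for q in queries:
        vocab.update(q)
        tokens = ["<s>", "<s>"] + q + ["</s>"]
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            trigrams[(a, b, c)] += 1
            contexts[(a, b)] += 1
    return trigrams, contexts, len(vocab)


def log_prob(tokens, lm):
    """Add-one smoothed trigram log-probability of a tokenized query."""
    trigrams, contexts, vocab_size = lm
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    return sum(
        math.log((trigrams[(a, b, c)] + 1) / (contexts[(a, b)] + vocab_size))
        for a, b, c in zip(padded, padded[1:], padded[2:])
    )


def select_expansion(query, position, candidates, lm):
    """Pick the candidate that scores best when substituted at `position`."""
    def score(candidate):
        return log_prob(query[:position] + [candidate] + query[position + 1:], lm)
    return max(candidates, key=score)


# Hypothetical toy query log.
lm = train_trigram_lm([
    ["spices", "for", "mexican", "cooking"],
    ["remedies", "for", "chronic", "constipation"],
    ["remedies", "for", "constipation"],
])
print(select_expansion(["herbs", "for", "mexican", "cooking"], 0, ["remedies", "spices"], lm))
print(select_expansion(["herbs", "for", "chronic", "constipation"], 0, ["remedies", "spices"], lm))
```

The same candidate list yields different choices for the two queries, which is the point of scoring in the context of the surrounding query terms.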

Data

• Training data for the translation model and the correlation-based model consist of pairs of queries and snippets for clicked results taken from query logs.

• 3 billion query-snippet pairs, from which a phrase table of 700 million query-snippet phrase translations is extracted.

• Trigram language model trained on English queries in user logs.

• N-gram cutoffs at a minimum frequency of 4.

• Queries had an average length of 2.6 words.

• Snippets had an average length of 8.3 words.

Query Expansion

• Use Google, the SMT-based system, the correlation-based system, and the correlation-based system with the language model as a filter.

• Expansion terms:

• 150,000 randomly extracted 3+ word queries are rewritten by each of the systems.

• For each system, expansion terms are extracted from the 5-best rewrites and stored in a table that maps source phrases to target phrases in the context of the full query.
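
The extraction step above can be sketched by diffing each rewrite against the original query and recording the replaced spans. This is only one plausible way to do it, using Python's standard `difflib`; the table layout (keyed by source phrase plus full query) and the function name are assumptions.

```python
from difflib import SequenceMatcher


def extract_expansions(query, rewrites):
    """Map (source phrase, full query) to the target phrases that replace it.

    query:    list of tokens of the original query
    rewrites: list of token lists, e.g. the 5-best rewrites from one system
    """
    table = {}
    full_query = " ".join(query)
    for rewrite in rewrites:
        matcher = SequenceMatcher(a=query, b=rewrite, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "replace":
                continue  # only substitutions yield expansion terms in this sketch
            source = " ".join(query[i1:i2])
            target = " ".join(rewrite[j1:j2])
            targets = table.setdefault((source, full_query), [])
            if target not in targets:
                targets.append(target)
    return table


# Hypothetical rewrites for one query.
query = ["herbs", "for", "chronic", "constipation"]
rewrites = [
    ["remedies", "for", "chronic", "constipation"],
    ["herbs", "for", "chronic", "constipation"],
    ["medicine", "for", "chronic", "constipation"],
]
print(extract_expansions(query, rewrites))
```

Keying the table on the full query keeps the same source phrase from receiving expansions that were only proposed in a different context.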

Evaluation 1/2

• 3 independent raters are presented with queries and the 10-best search results from two systems. Ratings use a 7-point Likert scale.

Evaluation 2/2

Conclusion

• The SMT model is flexible enough to capture the peculiarities of query-snippet translation.

• Hope to apply SMT to query suggestions.