1 Cross-Lingual Query Suggestion Using Query Logs of Different Languages SIGIR 07.
1
Cross-Lingual Query Suggestion Using
Query Logs of Different Languages
SIGIR 07
2
Abstract
• Query suggestion
– To suggest relevant queries for a given query
– To help users better specify their information needs
• Cross-Lingual Query Suggestion (CLQS):
– For a query in one language, we suggest similar or relevant queries in other languages.
• cross-lingual keyword bidding (Search Engine)
• cross-language information retrieval (CLIR)
3
Introduction
• CLQS vs. Cross-Lingual Query Expansion
– CLQS suggests full queries formulated by users in another language.
• The users of search engines
– have similar interests in the same period of time
– issue queries on similar topics in different languages
• Key point
– How to learn a similarity measure between two queries
– MLQS (monolingual query suggestion): term co-occurrence based MI and χ²
4
Estimating Cross-Lingual Query similarity
• Discriminative Model for Estimating Cross-Lingual Query Similarity
• Monolingual Query Similarity Measure Based on Click-through Information
• Features Used for Learning Cross-Lingual Query Similarity Measure
– Bilingual Dictionary
– Parallel Corpora
– Online Mining for Related Queries
– Monolingual Query Suggestion
• Estimating Cross-lingual Query Similarity
5
Discriminative Model for Estimating Cross-Lingual Query Similarity – 1/2
– qf : a source language query
– qe : a target language query
– simML : Monolingual query similarity
– simCL : Cross-lingual query similarity
– Tqf : translation of qf in the target language
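The notation above suggests how a training target can be obtained without hand-labelled cross-lingual pairs: when a translation Tqf of the source query is available, the cross-lingual similarity of (qf, qe) can be taken as the monolingual similarity between that translation and qe. The equation below is a reading of the listed definitions, not a formula preserved in this transcript.

```latex
\mathrm{sim}_{CL}(q_f, q_e) = \mathrm{sim}_{ML}(T_{q_f}, q_e)
```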
6
Discriminative Model for Estimating Cross-Lingual Query Similarity – 2/2
• Learning: LIBSVM regression algorithm
– f : feature functions
– φ : mapping from the feature space onto the kernel space
– w : weight vector in the kernel space
• Relevance labels
– relevant vs. irrelevant
– strongly relevant, weakly relevant or irrelevant
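With the three symbols listed on this slide (f, the mapping φ, and w), the learned similarity can be written as a linear function in the kernel space. This is a hedged reconstruction from those symbols; the bias term that support vector regression normally carries is left out for brevity.

```latex
\mathrm{sim}_{CL}(q_f, q_e) = w \cdot \phi\bigl(f(q_f, q_e)\bigr)
```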
8
Monolingual Query Similarity Measure Based on Click-through Information
• click-through information in query logs [26]
• KN(x) : number of keywords in a query x
• RD(x) : number of clicked URLs for a query x
• α = 0.4 , β = 0.6
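A minimal sketch of how a similarity of this kind could be computed, assuming it is a weighted sum of two overlap ratios: shared keywords (weight 0.4) and shared clicked URLs (weight 0.6). The function name, the set-based representation and the example data are illustrative assumptions, not the authors' code.

```python
def sim_ml(kw_p, kw_q, urls_p, urls_q, alpha=0.4, beta=0.6):
    """Monolingual similarity of two queries p and q (assumed form).

    kw_*   : set of keywords of each query      -> KN(x) = len(kw_x)
    urls_* : set of clicked URLs of each query  -> RD(x) = len(urls_x)
    """
    kn_max = max(len(kw_p), len(kw_q))
    rd_max = max(len(urls_p), len(urls_q))
    # Share of common keywords and of commonly clicked URLs.
    keyword_part = len(kw_p & kw_q) / kn_max if kn_max else 0.0
    click_part = len(urls_p & urls_q) / rd_max if rd_max else 0.0
    return alpha * keyword_part + beta * click_part


# Hypothetical example: one shared keyword out of three, one shared URL out of two.
score = sim_ml({"cheap", "flights"}, {"cheap", "airline", "tickets"},
               {"a.example", "b.example"}, {"b.example", "c.example"})
```

Two queries with no words in common can still score up to 0.6 if users clicked the same URLs for both, which is the point of using click-through data.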
10
1. Bilingual Dictionary – 1/2
– 120,000 unique entries (built in-house)
– Given an input query qf={wf1,wf2,…,wfn} (in the source language)
– By bilingual dictionary D: D(wfi)={ti1,ti2,…,tim}
– C(x,y) is the number of queries in the log containing both x and y.
– C(x) is the number of queries in the log containing x.
– N is the total number of queries in the log
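The counts C(x,y), C(x) and N are the ingredients of a co-occurrence statistic between candidate translations. The sketch below assumes the statistic is mutual information with probabilities estimated as C/N; the function name is hypothetical.

```python
import math


def mutual_information(c_xy, c_x, c_y, n):
    """MI(x, y) = P(x, y) * log(P(x, y) / (P(x) * P(y))).

    Probabilities are estimated from query-log counts:
    P(x, y) = C(x, y) / N and P(x) = C(x) / N.
    """
    if c_xy == 0:
        return 0.0  # terms never co-occur in the log
    p_xy, p_x, p_y = c_xy / n, c_x / n, c_y / n
    return p_xy * math.log(p_xy / (p_x * p_y))


# Hypothetical counts: two terms that always appear together in 10 of 100 queries.
cohesion = mutual_information(c_xy=10, c_x=10, c_y=10, n=100)
```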
11
1. Bilingual Dictionary – 2/2
– The set of top-4 query translations is denoted as S(Tqf)
– For each T ∈ S(Tqf)
• Retrieve all queries containing T in the target language and assign Sdict(T) as their value
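One way to picture the selection of the top-4 translations is sketched below. It assumes that Sdict(T) is the sum of pairwise cohesion scores between the translations chosen for each source word, and it enumerates all combinations exhaustively; both choices are assumptions made for illustration, and all names are hypothetical.

```python
from itertools import combinations, product


def top_translations(candidates, cohesion, k=4):
    """Rank candidate query translations.

    candidates : one list of dictionary translations per source word
    cohesion   : function (t1, t2) -> score, e.g. mutual information
    Returns the k best (score, translation) pairs, best first.
    """
    scored = []
    for choice in product(*candidates):  # pick one translation per source word
        score = sum(cohesion(a, b) for a, b in combinations(choice, 2))
        scored.append((score, choice))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical cohesion table for a two-word query.
toy = {frozenset(("bank", "money")): 0.5,
       frozenset(("shore", "river")): 0.3,
       frozenset(("bank", "river")): 0.1}
best = top_translations([["bank", "shore"], ["money", "river"]],
                        lambda a, b: toy.get(frozenset((a, b)), 0.0), k=2)
```

Exhaustive enumeration grows quickly with query length, so a practical system would likely prune or search greedily.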
12
2. Parallel Corpora
– Given a pair of queries
• qf : in the source language
• qe : in the target language
– Bi-directional translation score:
• IBM Model 1 & GIZA++ tool
• P(yj|xi) is the word-to-word translation probability
– The top 10 queries {qe} with the highest translation scores with qf are retrieved from the query log
3. Online Mining for Related Queries – 1/3
• OOV (out-of-vocabulary terms) is a major knowledge bottleneck for query translation and CLIR
• Assumption:
– If a query in the target language co-occurs with the source query in many web pages, they are probably semantically related
– But the mined results contain a large amount of noise
14
3. Online Mining for Related Queries – 2/3
– Frequency in the snippets
• For example:
– Given a query q=abc in the source language
– By dictionary: a={a1,a2,a3}, b={b1,b2} and c={c1}
– Web query: q ∧ (a1 ∨ a2 ∨ a3) ∧ (b1 ∨ b2) ∧ (c1) in the target language
– From 700 snippets, the 10 most frequent target queries are kept
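The example above can be sketched in two small steps: build the boolean web query from the dictionary translations, then count which known target-language queries appear most often in the returned snippets. The AND/OR syntax, the substring matching and the function names are assumptions; real search engines differ in query syntax.

```python
from collections import Counter


def build_web_query(source_query, translations):
    """Return q AND (a1 OR a2 ...) AND (b1 OR ...) as a string.

    translations: one list of dictionary translations per source word.
    """
    groups = ["(" + " OR ".join(alts) + ")" for alts in translations if alts]
    return " AND ".join(['"' + source_query + '"'] + groups)


def most_frequent_queries(snippets, log_queries, k=10):
    """Return the k target-language log queries found in the most snippets."""
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for query in log_queries:
            if query.lower() in text:
                counts[query] += 1
    return [query for query, _ in counts.most_common(k)]


web_query = build_web_query("a b c", [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]])
mined = most_frequent_queries(
    ["cheap flights to Paris", "book cheap flights", "Paris hotels"],
    ["cheap flights", "paris hotels", "car rental"])
```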
15
3. Online Mining for Related Queries – 3/3
– Any query qe mined from the web is associated with a feature, the Co-Occurrence Double-Check (CODC) measure SCODC(qf,qe)
16
4. Monolingual Query Suggestion
• Q0 : candidate queries (in target language)
– For each target query qe,
• SQML(qe) : monolingual source query
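A hedged sketch of this step: for a target query qe, pick the member of Q0 that is most similar to it under the monolingual measure and treat that member as its monolingual source query. The argmax reading is an assumption, and the word-overlap similarity below is only a stand-in for the click-through based measure; all names are hypothetical.

```python
def monolingual_source_query(qe, q0, sim_ml):
    """Return (SQML(qe), similarity): the candidate in Q0 most similar to qe."""
    best = max(q0, key=lambda cand: sim_ml(qe, cand))
    return best, sim_ml(qe, best)


def word_overlap(p, q):
    """Stand-in similarity: Jaccard overlap of the query words."""
    a, b = set(p.split()), set(q.split())
    return len(a & b) / len(a | b) if a | b else 0.0


source, value = monolingual_source_query(
    "cheap air tickets",
    ["cheap flights", "air tickets", "hotel deals"],
    word_overlap)
```

The similarity value returned alongside the source query is what would serve as the feature for qe.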
18
Estimating Cross-lingual Query Similarity
• Four categories of features are used to learn the cross-lingual query similarity.
• Cross-lingual query similarity score
– Learning: LIBSVM regression algorithm
• f : feature functions
• φ : mapping from the feature space onto the kernel space
• w : weight vector in the kernel space
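To make the combination of the four feature categories concrete, the sketch below assembles a feature vector for one (qf, qe) pair and scores it with a plain weighted sum. It deliberately takes the kernel mapping as the identity, so it is a linear stand-in for the trained LIBSVM regressor, not a reimplementation of it; the feature names, weights and values are hypothetical.

```python
FEATURES = ("dictionary", "parallel_corpora", "web_mining", "monolingual_suggestion")


def feature_vector(scores):
    """scores: dict feature name -> value; features that did not fire default to 0."""
    return [scores.get(name, 0.0) for name in FEATURES]


def sim_cl(scores, weights, bias=0.0):
    """Linear stand-in for w . phi(f(qf, qe)) with phi taken as the identity."""
    return bias + sum(w * x for w, x in zip(weights, feature_vector(scores)))


# Hypothetical pair found by the dictionary and by web mining only.
value = sim_cl({"dictionary": 0.5, "web_mining": 1.0}, [0.2, 0.3, 0.1, 0.4])
```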
19
Performance Evaluation – Log Data
• Data resources: MSN Search Engine
– French (source language) vs. English (target language)
– A one-month English query log
• 7 million unique English queries
• Occurrence frequency more than 5
• 5,000 French queries
– 4,171 queries have their translations in the English queries
• 70% for training (learning the LIBSVM weights)
• 10% development data
• 20% testing
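The 70/10/20 partition of the 4,171 matched queries can be sketched as follows. Shuffling before splitting and the fixed seed are assumptions; the slide does not say how the partition was drawn.

```python
import random


def split_pairs(pairs, train=0.7, dev=0.1, seed=0):
    """Shuffle and split into (train, dev, test); test takes the remainder."""
    items = list(pairs)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_dev = int(len(items) * dev)
    return (items[:n_train],
            items[n_train:n_train + n_dev],
            items[n_train + n_dev:])


# Integer ids stand in for the 4,171 French queries with English translations.
train_set, dev_set, test_set = split_pairs(range(4171))
```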
20
Performance Evaluation - CLIR
• Data resources:
– TREC6 CLIR data (AP88-90 newswire, 750MB)
– 25 short French-English query pairs (CL1-CL25)
• average length: 3.3 words
• with matches in the web query logs used for training CLQS
[Figure: CLIR with CLQS. Labels: Source Language, Target Language, qf, CLQS, {qe}, BM25, CLIR]
21
• CLQS
22
23
• CLIR
24
Conclusion
• Cross-lingual query suggestion
• Query Logs
• French to English
• TREC6 French to English CLIR task
– CLQS demonstrates high quality