Advisor: Koh Jia-Ling Nonhlanhla Shongwe 2010-09-28 EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT...
Advisor: Koh Jia-Ling Nonhlanhla Shongwe
2010-09-28
EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH
WANG, H., LIANG, Y., FU, L., XUE, G., YU, Y.
SIGIR’09
Preview
  Introduction
  AdSearch
    Bid phrase clustering
    Index structure for efficient ad search
    Query processing
  Experimental evaluation
  Conclusion
Introduction
The Web has become an important venue for advertising, e.g. Google, Yahoo
There are mainly two kinds of advertising channels:
  Contextual advertising
  Sponsored advertising
Ranking: derived from relevance to the user query or the page content
Introduction cont’d
Ads are characterized by bid phrases: keywords the advertisers choose for their ads
Syntactic approaches suffer from low recall
Example
  Query: “job training”
  Ad: career college
  The ad has no syntactic match with the query, so it is not proposed
Introduction cont’d
The problem is made even worse by:
  The short length of ads
  The sparsity of the bid phrases
The paper proposes an efficient ad search solution that tackles these issues with query expansion
AdSearch Overview
AdSearch cont’d
Bid phrase clustering
  Bipartite graph construction for bid phrases and ads
  Agglomerative iterative clustering
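Bipartite Graph Construction for Bid Phrases and Ads
Bid phrases: A, B, C
Ads: Ad0, Ad1, Ad2, Ad3, Ad4
1. B = {A, B, C} (the set of bid phrases)
2. A = {Ad0, Ad1, Ad2, Ad3, Ad4} (the set of ads)
3. Bid phrase vertices of G = {vba, vbb, vbc}
4. Ad vertices of G = {va0, va1, va2, va3, va4}
Corpus data C
  A = {Ad0, Ad3}
  B = {Ad1, Ad2, Ad3}
  C = {Ad2, Ad3, Ad4}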
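The construction above can be sketched in a few lines of Python. This is only an illustration built on the slide's corpus data; the function name build_bipartite and the dictionary-of-sets representation are assumptions, not the paper's data structures.

```python
from collections import defaultdict

# Corpus data from the slide: each bid phrase maps to the ads that use it.
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def build_bipartite(corpus):
    """Return both sides of the bipartite graph plus its edge list.

    Hypothetical helper: one vertex per bid phrase, one vertex per ad,
    and an edge whenever an ad is associated with a bid phrase.
    """
    phrase_to_ads = {phrase: set(ads) for phrase, ads in corpus.items()}
    ad_to_phrases = defaultdict(set)
    for phrase, ads in corpus.items():
        for ad in ads:
            ad_to_phrases[ad].add(phrase)
    edges = sorted((phrase, ad) for phrase, ads in corpus.items() for ad in ads)
    return phrase_to_ads, dict(ad_to_phrases), edges

phrase_to_ads, ad_to_phrases, edges = build_bipartite(corpus)
```

Keeping both directions of the mapping makes it cheap to compare bid phrases by their ads and ads by their bid phrases, which is what the clustering step needs.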
Agglomerative Iterative Clustering
Jaccard similarity: |X ∩ Y| / |X ∪ Y|
Corpus data C
  A = {Ad0, Ad3}
  B = {Ad1, Ad2, Ad3}
  C = {Ad2, Ad3, Ad4}
(A, B) = 1/4
(B, C) = 2/4
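As a quick check of the numbers on this slide, the Jaccard similarity of two bid phrases can be computed over their ad sets. The function name jaccard is an illustrative choice.

```python
def jaccard(x, y):
    """Jaccard similarity |x & y| / |x | y| of two sets (0.0 if both are empty)."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 0.0
    return len(x & y) / len(union)

# Corpus data from the slide.
A = {"Ad0", "Ad3"}
B = {"Ad1", "Ad2", "Ad3"}
C = {"Ad2", "Ad3", "Ad4"}

sim_ab = jaccard(A, B)  # one shared ad out of four distinct ads
sim_bc = jaccard(B, C)  # two shared ads out of four distinct ads
```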
Agglomerative Iterative Clustering cont’d
Corpus data C
  A = {Ad0, Ad3}
  B = {Ad1, Ad2, Ad3}
  C = {Ad2, Ad3, Ad4}
Bipartite graph
  Bid phrases: A, B, C
  Ads: Ad0, Ad1, Ad2, Ad3, Ad4
Similarity between bid phrases
  (A, B) = 0.25
  (A, C) = 0.25
  (B, C) = 0.5
Bid phrases of each ad
  Ad0 = {A}
  Ad1 = {B}
  Ad2 = {B, C}
  Ad3 = {A, B, C}
  Ad4 = {C}
Similarity between ads
  (Ad0, Ad1) = 0       (Ad0, Ad2) = 0       (Ad0, Ad3) = 0.33    (Ad0, Ad4) = 0
  (Ad1, Ad2) = 0.5     (Ad1, Ad3) = 0.33    (Ad1, Ad4) = 0
  (Ad2, Ad3) = 0.67    (Ad2, Ad4) = 0.5
  (Ad3, Ad4) = 0.33
Merge order for ads: (Ad2, Ad3), (Ad2, Ad4), (Ad1, Ad2), (Ad0, Ad3)
Merge order for bid phrases: B with C, then A
[Figure: clustered graph with bid phrase groups A and B, C, and ad groups Ad0; Ad1, Ad4; Ad2, Ad3]
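The merge order for the bid phrases can be reproduced with a simplified, one-sided sketch: repeatedly merge the most similar pair of clusters until no pair reaches a threshold. This is not the paper's full procedure, which iterates over both sides of the bipartite graph; the function agglomerate and the threshold value 0.3 are assumptions made for illustration.

```python
from itertools import combinations

def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def agglomerate(item_sets, threshold):
    """Greedy agglomerative clustering (simplified, one side of the graph only).

    item_sets maps a name to a set of neighbours. Each cluster is a pair
    (member names, union of the members' neighbour sets).
    """
    clusters = [(frozenset([name]), set(s)) for name, s in sorted(item_sets.items())]
    while len(clusters) > 1:
        best_pair, best_sim = None, -1.0
        for i, j in combinations(range(len(clusters)), 2):
            sim = jaccard(clusters[i][1], clusters[j][1])
            if sim > best_sim:
                best_pair, best_sim = (i, j), sim
        if best_sim < threshold:
            break  # no remaining pair is similar enough to merge
        i, j = best_pair
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

# Hypothetical threshold: B and C (similarity 0.5) merge, A stays separate.
phrase_clusters = agglomerate(corpus, threshold=0.3)
# With threshold 0.0 the merging continues: B with C first, then A.
all_merged = agglomerate(corpus, threshold=0.0)
```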
AdSearch cont’d
• Index structure for efficient ad search
  Mapping clusters of bid phrases to index terms
  Block-based index structure
  Dictionaries
Mapping Clusters of Bid Phrases to Index Terms
[Figure: clusters containing the bid phrases B, A, C, D and E]
Block-based Index Structure
Traditional index: 3 inverted lists
  Each contains: index = bid phrase, list = ads
Block-based index: 1 inverted list
  Contains: index = 3 bid phrases, list = ads with their bid phrases
Query = B
Block-based Index Structure cont’d
Advantages over the traditional method
  Similar bid phrases and their corresponding ads are placed together
  Merge operations become fewer or can even be avoided
  Expanding phrase B with phrases A and C is not efficient in the traditional method
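To make the contrast concrete, here is a small sketch of both layouts over the slide's corpus data. The block layout used here, a single list of (ad, bid phrases) entries sorted by ad, is an assumed simplification of the paper's block format, and all function names are hypothetical.

```python
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

def build_traditional_index(corpus):
    """One inverted list per bid phrase: phrase -> sorted ads."""
    return {phrase: sorted(ads) for phrase, ads in corpus.items()}

def build_block(corpus, phrases):
    """One inverted list for a group of similar phrases: sorted (ad, phrases) entries."""
    by_ad = {}
    for phrase in phrases:
        for ad in corpus[phrase]:
            by_ad.setdefault(ad, []).append(phrase)
    return [(ad, sorted(ps)) for ad, ps in sorted(by_ad.items())]

def expand_traditional(index, phrases):
    """Query expansion on the traditional index: read and merge one list per phrase."""
    ads = set()
    for phrase in phrases:
        ads.update(index[phrase])
    return sorted(ads)

def expand_block(block, phrases):
    """Query expansion on a block: a single scan, no merge across lists."""
    wanted = set(phrases)
    return [ad for ad, ps in block if wanted.intersection(ps)]

traditional = build_traditional_index(corpus)
block = build_block(corpus, ["A", "B", "C"])

# Expanding query B with A and C touches three lists in the traditional
# layout, but only one list in the block layout.
ads_traditional = expand_traditional(traditional, ["B", "A", "C"])
ads_block = expand_block(block, ["B", "A", "C"])
```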
Dictionaries
Dictionary D is used to:
  Record the mapping from each bid phrase to its corresponding artificial word
  Locate the block corresponding to a bid phrase
Bid phrase    Artificial word (path)
A             6:0
B             6_5:1
C             6_5:2
Cluster path    Number of distinct ads
Dictionaries cont’d
Dictionary C (counter dictionary) is used to record the number of distinct ads per cluster
Corpus data C
  A = {Ad0, Ad3}
  B = {Ad1, Ad2, Ad3}
  C = {Ad2, Ad3, Ad4}
Cluster path    Distinct ads
6               |{Ad0, Ad3}| = 2
6_5             |{Ad1, Ad2, Ad3, Ad4}| = 4
Entries: (6, 2), (6_5, 4)
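The counter dictionary can be derived from dictionary D and the corpus data, as the sketch below shows. It assumes that the part of an artificial word before the colon is the cluster path; the meaning of the number after the colon is not spelled out in the transcript and is ignored here. The helper names are hypothetical.

```python
corpus = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}

# Dictionary D from the slide: bid phrase -> artificial word.
dictionary_d = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}

def cluster_path(artificial_word):
    """Assumption: the cluster path is the text before the colon."""
    return artificial_word.split(":")[0]

def build_counter_dictionary(dictionary_d, corpus):
    """Count the distinct ads reachable from the bid phrases of each cluster path."""
    ads_per_cluster = {}
    for phrase, word in dictionary_d.items():
        ads_per_cluster.setdefault(cluster_path(word), set()).update(corpus[phrase])
    return {path: len(ads) for path, ads in ads_per_cluster.items()}

dictionary_c = build_counter_dictionary(dictionary_d, corpus)
```

Ad2 and Ad3 appear under both B and C but are counted once for cluster path 6_5, which is why the count is 4 rather than 6.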
AdSearch cont’d
• Query processing
  Finding related bid phrases with corresponding ads
  Ranking top-k relevant ads
Finding Related Bid Phrases with Corresponding Ads
The process to find related bid phrases
  Input: user queries
  Look up dictionary D to get the corresponding artificial words
  Find the minimum clusters that contain enough ads
Query: ABD
Bid phrase    Artificial word (path)
A             6:0
B             6_5:1
C             6_5:2
Cluster path    Distinct ads
6               |{Ad0, Ad3}| = 2
6_5             |{Ad1, Ad2, Ad3, Ad4}| = 4
e.g. for the top 2 ads, M = 1.5 * 2 = 3
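The lookup steps can be sketched as follows. The sketch assumes the query "ABD" consists of the three bid phrases A, B and D, that M is 1.5 times k rounded up, and that a cluster holds enough ads when its distinct-ad count is at least M. How the system moves on to a larger cluster when the count is too small is not shown in the transcript, so the sketch only reports whether each phrase's own cluster suffices.

```python
import math

# Dictionaries from the slide.
dictionary_d = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}
dictionary_c = {"6": 2, "6_5": 4}

def required_ads(k, factor=1.5):
    """M from the slide's example: 1.5 * k (rounding up is an assumption)."""
    return math.ceil(factor * k)

def lookup(query_phrases, k):
    """Map each known query phrase to (cluster path, cluster has at least M ads)."""
    m = required_ads(k)
    result = {}
    for phrase in query_phrases:
        word = dictionary_d.get(phrase)
        if word is None:
            continue  # phrase is not a known bid phrase (D in this example)
        path = word.split(":")[0]
        result[phrase] = (path, dictionary_c[path] >= m)
    return m, result

m, found = lookup(["A", "B", "D"], k=2)
```

For k = 2, cluster 6 (2 distinct ads) falls short of M = 3, while cluster 6_5 (4 distinct ads) is large enough.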
Finding Related Bid Phrases with Corresponding Ads
The process to find related bid phrases
  Return the clusters; those containing at least one bid phrase are stored in one group
  Perform a multi-way merge operation to get the final results
Ad             Ad1    Ad2     Ad3        Ad4
Bid phrases    A      B, C    A, B, C    C
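A multi-way merge over sorted posting lists can be sketched with Python's standard library. The posting lists below come from the corpus data of the clustering slides; their representation as sorted (ad, bid phrase) tuples and the function name multiway_merge are assumptions for illustration.

```python
import heapq
from itertools import groupby

def multiway_merge(posting_lists):
    """Merge sorted lists of (ad, phrase) entries into one entry per ad."""
    merged = heapq.merge(*posting_lists)  # inputs must already be sorted
    return [
        (ad, sorted({phrase for _, phrase in group}))
        for ad, group in groupby(merged, key=lambda entry: entry[0])
    ]

# One sorted posting list per bid phrase.
list_a = [("Ad0", "A"), ("Ad3", "A")]
list_b = [("Ad1", "B"), ("Ad2", "B"), ("Ad3", "B")]
list_c = [("Ad2", "C"), ("Ad3", "C"), ("Ad4", "C")]

result = multiway_merge([list_a, list_b, list_c])
```

Because heapq.merge consumes the inputs lazily and in order, each posting list is read only once.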
Ranking Top-k Relevant Ads
A procedure expands the user query with related bid phrases and gets a list of ads
To get the top k, use a scoring function, where:
  Q: the query
  B(x): the set of bid phrases related to x
  Similarity between x and y
  tfidf(y, ad): term frequency and inverse document frequency
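The scoring formula itself did not survive in this transcript. One form that is consistent with the symbols listed above, assumed here purely for illustration, sums similarity(x, y) * tfidf(y, ad) over every query phrase x in Q and every related phrase y in B(x). The toy similarity and tfidf values below are also hypothetical.

```python
import heapq

def score(query, ad, related, sim, tfidf):
    """Assumed scoring form: sum of sim(x, y) * tfidf(y, ad) for x in Q, y in B(x)."""
    return sum(sim(x, y) * tfidf(y, ad) for x in query for y in related(x))

def top_k(query, ads, k, related, sim, tfidf):
    """Return the k highest-scoring ads, best first."""
    return heapq.nlargest(k, ads, key=lambda ad: score(query, ad, related, sim, tfidf))

# Toy data: bid phrases of each ad, and similarities of B to the other phrases.
ad_phrases = {
    "Ad0": {"A"}, "Ad1": {"B"}, "Ad2": {"B", "C"},
    "Ad3": {"A", "B", "C"}, "Ad4": {"C"},
}
similarity = {("B", "A"): 0.25, ("B", "B"): 1.0, ("B", "C"): 0.5}

def related(x):
    return ["A", "B", "C"]  # hypothetical B(x): every phrase in the toy data

def sim(x, y):
    return similarity[(x, y)]

def tfidf(y, ad):
    return 1.0 if y in ad_phrases[ad] else 0.0  # stand-in for a real tf-idf weight

best = top_k(["B"], sorted(ad_phrases), 2, related, sim, tfidf)
```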
Experimental evaluation
Both Chinese and English data sets are used
Experimental evaluation cont’d
Name / Description
CQS1 (Chinese) or EQS1 (English)
  100 randomly sampled bid phrases, each associated with few distinct ads
CQS2 (Chinese) or EQS2 (English)
  100 selected pairs of bid phrases; each pair could return ads associated with both bid phrases in it
CQS3 (Chinese) or EQS3 (English)
  Constructed similarly, with queries composed of 3 to 4 bid phrases
CQF (Chinese Frequent Query Set) and EQF (English Frequent Query Set)
  Built from 100 popular bid phrases
Experimental evaluation cont’d
Evaluation of the clustering step
Experimental evaluation cont’d
Efficiency evaluation
  AdSearch was implemented with fixed and unfixed block sizes
  The block size is defined as the fraction of all distinct ads that one block holds
  AdSearch(0.001): the fraction 0.001 determines the number of distinct ads in each block
  For example, for the Chinese data set: 524,868 * 0.001 ≈ 525 distinct ads per block
  Inv: performs query expansion on top of the traditional inverted index
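The block size arithmetic for the Chinese data set can be checked directly. Rounding up to a whole number of ads is an assumption; the slide only shows the result 525.

```python
import math

def ads_per_block(total_distinct_ads, fraction):
    """Number of distinct ads in one block for a given block-size fraction."""
    return math.ceil(total_distinct_ads * fraction)

chinese_block = ads_per_block(524_868, 0.001)
```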
Experimental evaluation cont’d
Effectiveness evaluation
• 50 queries were randomly selected
• 10 people were invited to evaluate the ads returned by AdSearch and Baidu
Experimental evaluation cont’d
Effectiveness evaluation
Conclusion
Introduced the AdSearch system, which consists of:
  Bid phrase clustering
    Constructs a bipartite graph of bid phrases and ads
    Uses agglomerative iterative clustering to cluster similar bid phrases and ads
  Index structure for efficient ad search
    Uses a block-based index structure to index all ads and bid phrases
    Uses the dictionary to record the mappings between bid phrases and ads
  Query processing
    Explained how ads are retrieved and ranked to get the top-k results
THANK YOU
Introduction cont’d
[Figure: for the query Q = “job training”, the set of all docs, the relevant docs (R), the relevant ads, and the relevant ads in the ads set (Ra)]