Statistic Models for Web/Sponsored Search Click Log Analysis
The Chinese University of Hong Kong
Some slides are revised from Mr Guo Fan’s tutorial at CIKM 2009.
Index
• Background.
• A Simple Click Model.
  – Dependent click model [WSDM09].
• Advanced Design.
  – Five extension directions.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model (BBM) [Liu09].
  – Click chain model (CCM) [Guo09].
• Course Project.
Scenario: Web Search
User Click Log
[Figure: click counts for the results at positions 1 to 5; the extracted values read 36, 23, 18, 11, 36.]
Eye-tracking User Study
• Users are biased toward examining the top results.
Position-bias Identification
• Higher positions receive more user attention (eye fixation) and clicks than lower positions.
• This is true even in the extreme setting where the order of positions is reversed.
• “Clicks are informative but biased”.
[Figure, two panels: "Normal Position" and "Reversed Impression"; vertical axis: percentage.] [Joachims07]
Answer to Previous Example
• Result 5 is more relevant than Result 1.
• This is because Result 5 has less opportunity to be examined.
[Figure: the click counts per position from the User Click Log slide, repeated.]
Click Model Motivation
• Model the user's click behavior in an interpretable manner and estimate the pure relevance of a query-document/ad pair regardless of bias.
  – Position bias is the main problem.
  – Other kinds of bias:
    • Influence among documents/ads
    • Attractiveness bias
    • Search intent bias
    • …
• Intuition for the pure relevance of a query-document/ad pair.
  – When the query is submitted to the search engine and only a single document/ad is shown, what is the click-through rate of this query-document/ad pair?
Examination Hypothesis [Richardson07]
• A document must be examined before it can be clicked.
• The probability of a click, conditioned on examination, depends on the pure relevance of the query-document/ad pair.
• The click probability can be decomposed.
  – Global component: the examination probability, which reflects the position bias.
  – Local component (pure relevance): the click probability of the (query, URL) pair conditioned on being examined.
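The decomposition can be made concrete with a small sketch. The function names and the numbers are illustrative assumptions, not values from the slides; the point is only that the same observed click-through rate implies a higher relevance when the position is examined less often.

```python
def click_probability(exam_prob: float, relevance: float) -> float:
    """p(C=1) = p(E=1) * p(C=1 | E=1) under the examination hypothesis."""
    return exam_prob * relevance


def pure_relevance(ctr: float, exam_prob: float) -> float:
    """Invert the decomposition: relevance = CTR / examination probability."""
    if exam_prob <= 0:
        raise ValueError("examination probability must be positive")
    return ctr / exam_prob


# Hypothetical numbers: two results with the same observed CTR of 0.36,
# one always examined, the other examined only 40% of the time.
top = pure_relevance(0.36, exam_prob=1.0)
low = pure_relevance(0.36, exam_prob=0.4)
```

Here `low` comes out larger than `top`, which mirrors the argument that a lower-ranked result with the same number of clicks is the more relevant one.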
Click Models
• Key tasks.
  – How to design the user examination behavior?
  – How to estimate the relevance of a query-doc/ad pair?
• Desired properties.
  – Effective: aware of position bias and other biases, and addresses them properly.
  – Scalable: linear complexity in both time and space, easy to parallelize.
  – Incremental: the model can be updated flexibly as new data arrive.
From this slide onward, “relevance” means “pure relevance”.
Importance of Understanding Logs
• Better matching of queries and documents/ads.
• All the participants would benefit.
  – Users: better relevance.
  – Search engines: more revenue from advertisers and more users.
  – Advertisers: more return on investment (ROI).
[Figure: advertiser, user, and publisher linked by a better match.]
Growth of Web Users
Growth of Web Revenue
Index
• Background.
• A Simple Click Model.
  – Dependent Click Model [WSDM09].
• Advanced Design.
• Advanced Estimation.
• Projects.
Notations
– E_i: binary r.v. for the examination event at position i;
– C_i: binary r.v. for the click event at position i;
– r_i = p(C_i = 1 | E_i = 1): relevance of the query-document pair at position i.
Click Model Design
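\[
\begin{aligned}
p(E_1 = 1) &= 1 \\
p(C_i = 1 \mid E_i = 0) &= 0 \\
p(C_i = 1 \mid E_i = 1) &= r_i \\
p(E_{i+1} = 1 \mid E_i = 0) &= 0 \\
p(E_{i+1} = 1 \mid C_i = 0, E_i = 1) &= 1 \\
p(E_{i+1} = 1 \mid C_i = 1, E_i = 1) &= \lambda_i
\end{aligned}
\]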
Dependent Click Model (DCM) [GUO09]
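The generative story of DCM can be sketched as a short simulation. This is a minimal illustration under the model's stated assumptions; the function name and the list-based inputs are hypothetical choices, not part of the original slides.

```python
import random


def simulate_dcm_session(relevance, lam, rng=None):
    """Sample one click vector from DCM.

    relevance[i] is r for the document at position i+1, and lam[i] is the
    probability of examining the next position after a click at position i+1.
    Both lists are hypothetical inputs.
    """
    rng = rng or random.Random()
    clicks = []
    examined = True  # p(E_1 = 1) = 1
    for r_i, lam_i in zip(relevance, lam):
        if not examined:
            clicks.append(0)  # no examination means no click
            continue
        clicked = rng.random() < r_i  # p(C_i = 1 | E_i = 1) = r_i
        clicks.append(int(clicked))
        if clicked:
            # After a click the user continues with probability lambda_i.
            examined = rng.random() < lam_i
        # After a non-click the user always examines the next position.
    return clicks
```

With `relevance=[1, 1, 1]` and `lam=[0, 0, 0]` the user clicks the first result and stops, so only the first entry of the click vector is 1.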
Parameters in DCM
• r = p(C=1|E=1) is a local parameter.
  – It models the relevance of a query-document/ad pair.
  – The position bias is modeled separately by p(E=1).
• λ is a global parameter.
  – It models p(E_{i+1}=1 | C_i=1, E_i=1).
Parameter estimation: maximum log-likelihood method.
Estimation of r: Step 1
• Define l as the last clicked position.
• When there is no click, l is the last position.

Query: cikm (last click at position 5, so l = 5)
Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         1
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         0
5    www.cikm.com/...     1
6    Ir.iit.edu/cikm2004  0

Query: cikm (no click, so l = 6)
Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         0
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         0
5    www.cikm.com/...     0
6    Ir.iit.edu/cikm2004  0
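A minimal sketch of this definition, assuming a session is given as a list of 0/1 click indicators ordered by position (the function name is hypothetical):

```python
def last_click_position(clicks):
    """Return l: the last clicked position (1-based), or the last position if there is no click."""
    for pos in range(len(clicks), 0, -1):
        if clicks[pos - 1] == 1:
            return pos
    return len(clicks)


# The two example sessions for the query "cikm".
l_with_clicks = last_click_position([0, 1, 0, 0, 1, 0])
l_without_clicks = last_click_position([0, 0, 0, 0, 0, 0])
```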
Estimation of r: Step 2
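• Log-likelihood of a query session.
\[
\begin{aligned}
L_{DCM} ={}& \sum_{i=1}^{l-1} \bigl( C_i (\log r_i + \log \lambda_i) + (1 - C_i) \log(1 - r_i) \bigr) \\
& + C_l \log r_l + (1 - C_l) \log(1 - r_l) \\
& + \log\Bigl(1 - \lambda_l + \lambda_l \prod_{j=l+1}^{M} (1 - r_j)\Bigr) \\
\geq{}& \sum_{i=1}^{l} \bigl( C_i \log r_i + (1 - C_i) \log(1 - r_i) \bigr) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
\end{aligned}
\]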
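The relation between the session log-likelihood of DCM and its lower bound can be checked numerically. The sketch below assumes per-position lists `r` and `lam` with values strictly between 0 and 1; the function names are hypothetical. The bound holds because the last term of the exact expression is at least log(1 − λ_l).

```python
import math


def last_click(clicks):
    """l: last clicked position (1-based), or the last position if there is no click."""
    clicked = [i + 1 for i, c in enumerate(clicks) if c]
    return clicked[-1] if clicked else len(clicks)


def dcm_log_likelihood(clicks, r, lam):
    """Exact DCM log-likelihood of one session."""
    M, l = len(clicks), last_click(clicks)
    ll = 0.0
    for i in range(l - 1):  # positions 1 .. l-1
        if clicks[i]:
            ll += math.log(r[i]) + math.log(lam[i])
        else:
            ll += math.log(1 - r[i])
    last = l - 1  # zero-based index of position l
    ll += math.log(r[last]) if clicks[last] else math.log(1 - r[last])
    tail = math.prod(1 - r[j] for j in range(l, M))  # no clicks after l
    ll += math.log(1 - lam[last] + lam[last] * tail)
    return ll


def dcm_lower_bound(clicks, r, lam):
    """Lower bound obtained by replacing the last term with log(1 - lambda_l)."""
    l = last_click(clicks)
    lb = sum(math.log(r[i]) if clicks[i] else math.log(1 - r[i]) for i in range(l))
    lb += sum(math.log(lam[i]) for i in range(l - 1) if clicks[i])
    lb += math.log(1 - lam[l - 1])
    return lb
```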
Estimation of r: Step 3
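• Maximizing the lower bound of the log-likelihood, summed over all sessions, gives
\[
L_{DCM} \geq \sum_{i=1}^{l} \bigl( C_i \log r_i + (1 - C_i) \log(1 - r_i) \bigr) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
\]
\[
\frac{\partial L^{All}_{DCM}}{\partial r} = \frac{M}{r} - \frac{N}{1 - r} = 0
\quad\Rightarrow\quad
r = \frac{M}{M + N} = \frac{\#\text{clicks}}{\#\text{impressions before or on position } l}
\]
Suppose the current pair has occurred in different sessions. In M sessions, it occurs before or on position l and is clicked; in N sessions, it occurs before or on position l and is not clicked.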
Estimation of λ
• For a specific λ_i, maximizing the lower bound of the log-likelihood, summed over all sessions, gives
\[
L_{DCM} \geq \sum_{i=1}^{l} \bigl( C_i \log r_i + (1 - C_i) \log(1 - r_i) \bigr) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
\]
\[
\frac{\partial L^{All}_{DCM}}{\partial \lambda_i} = \frac{B}{\lambda_i} - \frac{C}{1 - \lambda_i} = 0
\quad\Rightarrow\quad
\lambda_i = \frac{B}{B + C} = 1 - \frac{\#\text{query sessions when last clicked position} = i}{\#\text{query sessions when position } i \text{ is clicked}}
\]
Suppose there are A sessions in total. In B sessions, position l is greater than position i and a click occurs at position i. In C sessions, position l equals position i. The remaining A−B−C sessions cover all other cases.
Property Verification
• Effective.
• Scalable and incremental.
\[
r = \frac{\#\text{clicks}}{\#\text{impressions before or on position } l}
\qquad
\lambda_i = 1 - \frac{\#\text{query sessions when last clicked position} = i}{\#\text{query sessions when position } i \text{ is clicked}}
\]
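Because both estimators are ratios of counts, they can be computed in a single pass over the log and updated as new sessions arrive. The sketch below is one way to do that under the assumption that all sessions belong to the same query and are given as `(docs, clicks)` pairs; the function name and data layout are hypothetical.

```python
from collections import defaultdict


def estimate_dcm(sessions):
    """Estimate DCM parameters from (docs, clicks) sessions of one query.

    r[doc]  = #clicks / #impressions at or before the last clicked position l
    lam[i]  = 1 - #sessions whose last click is at i / #sessions with a click at i
    """
    clicks_of = defaultdict(int)
    impressions_of = defaultdict(int)
    clicked_at = defaultdict(int)
    last_click_at = defaultdict(int)
    for docs, clicks in sessions:
        clicked_positions = [i + 1 for i, c in enumerate(clicks) if c]
        l = clicked_positions[-1] if clicked_positions else len(clicks)
        for pos in range(1, l + 1):  # only positions at or before l count
            doc = docs[pos - 1]
            impressions_of[doc] += 1
            clicks_of[doc] += clicks[pos - 1]
        for pos in clicked_positions:
            clicked_at[pos] += 1
        if clicked_positions:
            last_click_at[l] += 1
    r = {d: clicks_of[d] / impressions_of[d] for d in impressions_of}
    lam = {i: 1 - last_click_at[i] / clicked_at[i] for i in clicked_at}
    return r, lam


# Hypothetical log of three sessions.
sessions = [
    (["a", "b", "c"], [1, 0, 1]),
    (["a", "b", "c"], [1, 0, 0]),
    (["b", "a", "c"], [0, 0, 0]),
]
r, lam = estimate_dcm(sessions)
```

Keeping the four counters instead of the ratios is what makes the update incremental: a new session only increments counts.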
Evaluation Criteria for DCM
• Log-likelihood.
  – Given the document impressions in the test set,
  – compute the probability of recovering the entire click vector,
  – averaged over the query sessions.
Experimental Result for DCM
Some Other Evaluations
• Log-likelihood.
  – http://en.wikipedia.org/wiki/Likelihood_function#Log-likelihood
• Perplexity.
  – http://en.wikipedia.org/wiki/Perplexity
• Root mean square error (RMSE).
  – http://en.wikipedia.org/wiki/Root-mean-square_deviation
• Area under ROC curve.
  – http://en.wikipedia.org/wiki/Receiver_operating_characteristic
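Two of these metrics are easy to sketch for binary click data. The code below assumes each test impression comes with an observed click (0 or 1) and a predicted click probability strictly between 0 and 1; the function names are hypothetical, and the perplexity uses the common base-2 definition.

```python
import math


def perplexity(clicks, predicted):
    """2 ** (-(1/N) * sum of log2-probabilities assigned to the observed outcomes)."""
    total = 0.0
    for c, q in zip(clicks, predicted):
        total += math.log2(q if c else 1 - q)
    return 2 ** (-total / len(clicks))


def rmse(clicks, predicted):
    """Root mean square error between observed clicks and predicted probabilities."""
    squared = sum((c - q) ** 2 for c, q in zip(clicks, predicted))
    return math.sqrt(squared / len(clicks))
```

A model that always predicts 0.5 has perplexity 2; lower values indicate better predictions.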
Index
• Background.
• A Simple Click Model.
• Advanced Design.
  – Five extension directions.
• Advanced Estimation.
• Project.
1 Dependency from Previous Docs/Ads
• Does position 4 have the same chance of being examined in the following two cases?
• Intuitively, it has less chance in the left case, since the user may find the desired URL at position 2 and end the session.

Left case. Query: cikm
Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         1
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         0
5    www.cikm.com/...     0
6    Ir.iit.edu/cikm2004  0

Right case. Query: cikm
Pos  URL                  Click
1    cikm2008.org         0
2    www.cikm.org         0
3    www.fc.ul.pt/cikm    0
4    cikmconf.org         1
5    www.cikm.com/...     0
6    Ir.iit.edu/cikm2004  0
Solution: Click Chain Model [Guo09]
• The chance of being examined depends on the relevance of the previous documents/ads.
• Other similar work includes [Dupret08][Liu09].
2 Perceived vs. Actual Relevance
• After a doc/ad is clicked, its actual relevance, judged from the landing page, might differ from the relevance the user perceived.
[Figure: the query "Pizza" with Ad1 and Ad2, before and after examination.]
Solution: Dynamic Bayesian Network [Chapelle09]
• For each ad, two kinds of relevance are defined: perceived relevance r and actual relevance s. s influences the examination probability of the later docs/ads.
3 Aggregate vs. Instance Relevance
• Users might have different intents for the same query.
• The click event could indicate the intent.
[Figure: the query "Canon" with Ad1 and Ad2, shown for aggregate search (e.g., learn the parameters) and instance search (e.g., buy a camera).]
Solution: Joint Relevance Examination Model [Srikant10]
• Add a correction factor, which is determined by the click events of other docs/ads.
• Other similar work includes [Hu11].
4 Competing Influence in Docs/Ads
• When a doc/ad co-occurs with a highly relevant doc/ad, its perceived relevance decreases.
Solution: Temporal Click Model [Xu10]
• The docs/ads compete for the priority of being examined.
5 Incorporating Features
• Feature example: dwell time.
Solution: Post-Clicked Click Model [Zhong 10]
• Incorporates features to determine the relevance.
• Other similar work includes [Zhu 10].
Index
• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model.
  – Click chain model.
• Project.
Limitation of Maximum Log-likelihood
• It does not fit the scalable and incremental properties well.
  – A closed-form solution is hard to obtain when the model is complex.
  – Even in DCM, as shown on this slide, we need to approximate with a lower bound for easy calculation.
• No prior information can be used, even though the data are sparse.
Log-likelihood of DCM
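\[
\begin{aligned}
L_{DCM} ={}& \sum_{i=1}^{l-1} \bigl( C_i (\log r_i + \log \lambda_i) + (1 - C_i) \log(1 - r_i) \bigr) \\
& + C_l \log r_l + (1 - C_l) \log(1 - r_l) \\
& + \log\Bigl(1 - \lambda_l + \lambda_l \prod_{j=l+1}^{M} (1 - r_j)\Bigr) \\
\geq{}& \sum_{i=1}^{l} \bigl( C_i \log r_i + (1 - C_i) \log(1 - r_i) \bigr) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
\end{aligned}
\]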
A Coin-Toss Example for Bayesian Framework
• Scenario: estimate the probability of tossing a head from the following five training samples.
• The probability is a variable X = x.
• Each training sample is denoted by C_i, e.g., C_1 = 1, C_4 = 0.
• According to the Bayesian rule, we have
\[
p(X = x \mid C_{1:5}) = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{p(C_{1:5})} = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{\int_x p(C_{1:5} \mid X = x)\, p(X = x)\, dx}
\]
Bayesian Estimation of Coin-tossing
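[Graphical model: X is the parent of C1, C2, C3, C4, C5.]
Bayesian rule:
\[
p(X = x \mid C_{1:5}) = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{p(C_{1:5})} = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{\int_x p(C_{1:5} \mid X = x)\, p(X = x)\, dx}
\]
Uniform prior:
\[
p(x) = 1
\]
Independent sampling:
\[
p(C_{1:5} \mid X = x) = \prod_{i=1}^{5} p(C_i \mid X = x) = \prod_{i=1}^{5} x^{C_i} (1 - x)^{1 - C_i}
\]
Distribution:
\[
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} x^{C_i} (1 - x)^{1 - C_i}
\]
Estimation:
\[
E(X \mid C_{1:5})
\]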
Density Function Update of Coin-tossing
[Figure: density function (not normalized), updated from prior to posterior after each sample.]
x^1(1-x)^0 → x^2(1-x)^0 → x^3(1-x)^0 → x^3(1-x)^1 → x^4(1-x)^1
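The update sequence above can be reproduced by tracking only two exponents. The sketch assumes the five samples are 1, 1, 1, 0, 1, which is inferred from the exponent sequence on this slide rather than stated explicitly; the function names are hypothetical. With a uniform prior, the density proportional to x^a (1 − x)^b is a Beta(a+1, b+1) distribution, whose mean is (a+1)/(a+b+2).

```python
from fractions import Fraction


def posterior_exponents(samples):
    """Track (a, b) in the unnormalized posterior x^a (1 - x)^b after each sample."""
    a = b = 0  # uniform prior p(x) = 1
    history = []
    for c in samples:
        if c == 1:
            a += 1
        else:
            b += 1
        history.append((a, b))
    return history


def posterior_mean(a, b):
    """Mean of the density proportional to x^a (1 - x)^b on [0, 1]."""
    return Fraction(a + 1, a + b + 2)


samples = [1, 1, 1, 0, 1]  # assumed from the exponent sequence on the slide
history = posterior_exponents(samples)
estimate = posterior_mean(*history[-1])
```

Under these assumptions the estimate E(X | C_{1:5}) is 5/7, slightly below the raw frequency 4/5 because the uniform prior pulls it toward 1/2.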
Click Data Scenario
[Figure: a query with documents labeled a to g.]
Bayesian rule:
\[
p(X = x \mid C_{1:5}) = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{\int_x p(C_{1:5} \mid X = x)\, p(X = x)\, dx}
\]
Uniform prior:
\[
p(x) = 1
\]
Independent sampling:
\[
p(C_{1:5} \mid X = x) = \prod_{i=1}^{5} p(C_i \mid X = x)
\]
Distribution:
\[
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} p(C_i \mid X = x)
\]
Factor Trick
• If the factors of p(C|X) are arbitrary, a distinct factor of the posterior must be stored for each training sample, which is space-consuming.
• However, if the factors of p(C|X) come from a small discrete set, only the exponents need to be stored.
Distribution:
\[
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} p(C_i \mid X = x)
\]
Updating Example
Density function (not normalized): exponent of each factor after each update, starting from the prior.

Factor      Step 1  Step 2  Step 3  Step 4  Step 5
x           1       1       2       3       3
(1-x)       0       1       1       1       1
(1-0.6x)    0       0       0       1       1
(1+0.3x)    1       1       2       2       2
(1-0.5x)    0       0       0       0       1
(1-0.2x)    0       0       0       0       0
…
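The bookkeeping behind the factor trick can be sketched as a counter of exponents. This is a simplified illustration, not the slide's exact update rule: it assumes every click contributes a factor x and every non-click contributes a factor (1 − βx) with β taken from a small discrete set. The class and method names are hypothetical, and the mean is approximated with a midpoint rule.

```python
from collections import Counter


class FactorPosterior:
    """Unnormalized density x^a * prod_k (1 - beta_k x)^(n_k), stored as exponents only."""

    def __init__(self):
        self.x_exponent = 0
        self.exponents = Counter()  # beta_k -> n_k

    def observe(self, clicked, beta):
        """Update the exponents for one sample; beta is ignored for clicks."""
        if clicked:
            self.x_exponent += 1
        else:
            self.exponents[beta] += 1

    def density(self, x):
        value = x ** self.x_exponent
        for beta, n in self.exponents.items():
            value *= (1 - beta * x) ** n
        return value

    def mean(self, grid=10_000):
        """Posterior mean by midpoint-rule integration on [0, 1]."""
        xs = [(k + 0.5) / grid for k in range(grid)]
        weights = [self.density(x) for x in xs]
        return sum(x * w for x, w in zip(xs, weights)) / sum(weights)
```

However many samples are observed, the storage stays bounded by the number of distinct β values plus one.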
How to realize the factor trick?
• Set a global parameter for all cases.
  – Bayesian browsing model (BBM) [Liu09].
• Assume all other docs/ads follow the same distribution and integrate them out.
  – Click chain model (CCM) [Guo09].
In the following two examples, we are only concerned with estimating r using the Bayesian framework. The other parameters are all estimated by maximizing the log-likelihood, as shown for DCM. Please refer to the original papers for details.
Index
• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model.
  – Click chain model.
• Project.
BBM Variable Definition
• For a specific query session, let
  – r_i be the relevance variable at position i.
  – E_i be the binary examination variable at position i.
  – C_i be the binary click variable at position i.
  – n_i be the last click position before position i.
  – d_i be the distance between position i and its previous clicked position.
Small Discrete Set of Beta
• Suppose M = 3 for simplicity of illustration.
• There are only 6 values of β_{n,d}:
  (n=0, d=1), (n=0, d=2), (n=0, d=3), (n=1, d=1), (n=1, d=2), (n=2, d=1)
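The size of this discrete set can be checked by enumeration. The sketch below assumes the constraint is n ≥ 0, d ≥ 1, and n + d ≤ M, which reproduces the six pairs listed for M = 3; the function name is hypothetical.

```python
def beta_index_pairs(M):
    """All (n, d) with n >= 0, d >= 1 and n + d <= M."""
    return [(n, d) for n in range(M) for d in range(1, M - n + 1)]


pairs = beta_index_pairs(3)
```

In general there are M(M+1)/2 such pairs, so the number of distinct factors depends only on the number of positions shown, not on the size of the click log.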
Estimation Algorithms
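\[
p(r \mid C) \propto p(r)\, r^{N_1} \prod_{n \ge 0,\ d > 0,\ n + d \le M} (1 - \beta_{n,d}\, r)^{N_{2,n,d}}
\]
– N_1: how many times the doc/ad was clicked.
– N_{2,n,d}: how many times the doc/ad was not clicked with the probability β_{n,d}.
\[
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} p(C_i \mid X = x)
= p(x) \prod_{i=1}^{5} \bigl( p(E_a = 1)\, x \bigr)^{C_i} \bigl( 1 - p(E_a = 1)\, x \bigr)^{1 - C_i}
\]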
Toy Example Step 1
• Only the top M = 3 positions are shown; there are 3 query sessions and 4 distinct URLs.
[Figure: the URLs impressed at positions 1 to 3 in Query Sessions 1, 2, and 3.]
Toy Example Step 2
• Initialize M(M+1)/2 + 1 counts for each URL.

URL  Clicks  n=0,d=1  n=0,d=2  n=0,d=3  n=1,d=1  n=1,d=2  n=2,d=1
4    0       0        0        0        0        0        0
Toy Example Step 3
• Update counts for URL 4.
  – If not impressed, do nothing;
  – If clicked, increment “clicks” by 1;
  – Otherwise, locate the right n and d to increment.

URL  Clicks  n=0,d=1  n=0,d=2  n=0,d=3  n=1,d=1  n=1,d=2  n=2,d=1
4    0       0        0        0        0        0        0
Toy Example Step 4
• Update counts for URL 4.
  – If not impressed, do nothing;
  – If clicked, increment “clicks” by 1;
  – Otherwise, locate the right n and d to increment.

URL  Clicks  n=0,d=1  n=0,d=2  n=0,d=3  n=1,d=1  n=1,d=2  n=2,d=1
4    0       0        0        0        0        0        1
Toy Example Step 5
• Update counts for URL 4.
  – If not impressed, do nothing;
  – If clicked, increment “clicks” by 1;
  – Otherwise, locate the right n and d to increment.

URL  Clicks  n=0,d=1  n=0,d=2  n=0,d=3  n=1,d=1  n=1,d=2  n=2,d=1
4    1       0        0        0        0        0        1
Toy Example Step 6
• The posterior for URL 4.
• Interpretation:
  – The larger the probability of examination, the stronger the penalty for a non-click.

URL  Clicks  n=0,d=1  n=0,d=2  n=0,d=3  n=1,d=1  n=1,d=2  n=2,d=1
4    1       0        0        0        0        0        1
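The counting steps of the toy example can be sketched as follows. The two sessions are hypothetical stand-ins, chosen only so that URL 4 ends with one click and one non-click at (n=2, d=1) as in the final table; they are not the sessions from the original figure. The β value and the function names are likewise assumptions, and the posterior uses a uniform prior.

```python
from collections import defaultdict


def bbm_counts(sessions):
    """sessions: list of (urls, clicks), one entry per position.

    Returns counts[url] with a "clicks" entry and one entry per (n, d) pair.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for urls, clicks in sessions:
        last_click = 0  # n = 0 means no click so far in this session
        for pos, (url, clicked) in enumerate(zip(urls, clicks), start=1):
            if clicked:
                counts[url]["clicks"] += 1
                last_click = pos
            else:
                # Non-click: n is the last click position, d the distance to it.
                counts[url][(last_click, pos - last_click)] += 1
    return counts


def bbm_posterior(counts_for_url, beta, r):
    """Unnormalized posterior at r: r^clicks * prod (1 - beta[n, d] * r)^count."""
    value = r ** counts_for_url.get("clicks", 0)
    for key, n in counts_for_url.items():
        if key != "clicks":
            value *= (1 - beta[key] * r) ** n
    return value


# Hypothetical sessions: URL 4 is skipped at position 3 after a click at
# position 2, and clicked at position 1 in another session.
sessions = [([1, 3, 4], [0, 1, 0]), ([4, 1, 2], [1, 0, 0])]
counts = bbm_counts(sessions)
density = bbm_posterior(counts[4], {(2, 1): 0.5}, r=0.5)
```

With these inputs the unnormalized posterior of URL 4 is r (1 − β_{2,1} r): a larger β_{2,1} shrinks the density more at high r, which is the penalty for a non-click described in the interpretation.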
Algorithm Complexities
• Let
• Initializing and updating the counts:
  – Time: linear in the size of the click log.
  – Space: almost constant storage required.
Index
• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model.
  – Click chain model.
• Project.
User Behavior Description
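[Flowchart of user behavior:]
• Examine the document, then decide whether to click.
  – No click (probability 1 − R_i): see the next doc with probability α_1, otherwise done.
  – Click (probability R_i): see the next doc with probability α_2(1 − R_i) + α_3 R_i, otherwise done.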
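The branching in this flowchart can be written as a small function. The sketch assumes the three continuation parameters of CCM are named alpha1, alpha2, and alpha3, following the usual presentation of the model in [Guo09]; the function name and the example values are hypothetical.

```python
def ccm_continue_probability(clicked, relevance, alpha1, alpha2, alpha3):
    """Probability of examining the next document after examining the current one.

    Without a click the user continues with probability alpha1. After a click
    the probability interpolates between alpha2 (irrelevant document) and
    alpha3 (fully relevant document).
    """
    if not clicked:
        return alpha1
    return alpha2 * (1 - relevance) + alpha3 * relevance


# Hypothetical parameter values, for illustration only.
after_skip = ccm_continue_probability(False, 0.8, alpha1=0.9, alpha2=0.6, alpha3=0.2)
after_click = ccm_continue_probability(True, 0.8, alpha1=0.9, alpha2=0.6, alpha3=0.2)
```

With alpha3 smaller than alpha2, a click on a highly relevant document makes the user less likely to keep browsing.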
Estimation Algorithms
• By assuming the other docs/ads in a session follow the same distribution and integrating them out, the factors of p(C|R) can be drawn from a small discrete set.
\[
p(R_j \mid C^{1:N}) \propto p(R_j) \prod_{n=1}^{N} P(C^{n} \mid R_j)
\]
Five Cases
• The current doc/ad may occur in five different cases.
• Each case has its own factor for p(C|R_i).
Case 1
\[
P(C_i \mid R_i) = P(C_i = 0 \mid E_i = 1, R_i) = 1 - R_i
\]
• The doc/ad must have been examined.
• The other R can be seen as constants.
Case 2
Case 3
All Cases
• By assuming the other docs/ads in a session follow the same distribution and integrating them out, the factors of p(C|R) can be drawn from a small discrete set.
\[
p(R_j \mid C^{1:N}) \propto p(R_j) \prod_{n=1}^{N} P(C^{n} \mid R_j)
\]
Index
• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
• Project.
Description
• Fake dataset.
• Format.
  – queryId
  – ad1Id, click
  – ad2Id, click
  – ad3Id, click
• Evaluation metric: ROC.
• Baseline.
  – Average (Avg).
• Current competitive method.
  – Simplified CCM (SCCM).
• Task.
  – Implement another advanced click model.
  – Compare the result with Avg and SCCM.
  – Analyze the reasons for the improvement.
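As a starting point for the project, the evaluation can be sketched as follows. The code assumes that the Avg baseline scores each ad by its average click-through rate in the training data, which is a guess at what "Average" means here, and that ROC is summarized by the area under the curve. All function names and the sample records are hypothetical.

```python
from collections import defaultdict


def average_ctr(records):
    """records: iterable of (ad_id, click). Returns ad_id -> clicks / impressions."""
    clicks = defaultdict(int)
    shows = defaultdict(int)
    for ad_id, click in records:
        shows[ad_id] += 1
        clicks[ad_id] += click
    return {ad: clicks[ad] / shows[ad] for ad in shows}


def roc_auc(labels, scores):
    """AUC: probability that a clicked item outscores a non-clicked one (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one click and one non-click")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical training and test records of the form (ad_id, click).
train = [("a", 1), ("a", 0), ("b", 0), ("b", 0)]
test = [("a", 1), ("b", 0), ("a", 0), ("b", 0)]
ctr = average_ctr(train)
auc = roc_auc([c for _, c in test], [ctr[ad] for ad, _ in test])
```

The pairwise AUC above is quadratic in the number of test records, which is fine for a small dataset; a rank-based formula would be the usual choice for a large one.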
End