Ghislain Fourny
Information Retrieval
9. Probabilistic Information Retrieval
2
What we have seen so far
3
Boolean retrieval
lawyer AND Penang AND NOT silver
Input: set of documents
Output: subset of documents
query
4
Ranked retrieval
lawyer Penang silver
2
1
3
Input: set of documents
Output: ranked subset of documents
query
4
5
Probability theory
6
Universe
Ω
7
Elementary event (outcome)
ω ∈ Ω
8
Probability distribution
(diagram: outcomes in Ω with probabilities 0.3, 0.2, 0.1, 0.4, 0.1)
p : Ω → [0, 1], ω ↦ p(ω)
Σ_{ω∈Ω} p(ω) = 1
9
Event
(diagram: Ω with event E; outcome probabilities 0.3, 0.2, 0.1, 0.4, 0.1)
p(E) = Σ_{ω∈E} p(ω) ∈ [0, 1]
E ⊂ Ω
10
Complement
(diagram: Ω with E and its complement)
E ⊂ Ω
p(Eᶜ) = 1 − p(E)
11
Odds
(diagram: Ω with event E)
E ⊂ Ω
O_p(E) = p(E) / p(Eᶜ)
12
Intersection
(diagram: Ω with events E and F)
E ∩ F
13
Union
(diagram: Ω with events E and F)
E ∪ F
P(E ∪ F) = P(E) + P(F) − P(E ∩ F)
14
Disjoint events
(diagram: Ω with disjoint events E and F)
E ∪ F
P(E ∩ F) = 0
P(E ∪ F) = P(E) + P(F)
15
Partition rule
(diagram: Ω with events E and F)
p(E) = p(E ∩ F) + p(E ∩ Fᶜ)
16
Conditional probability
(diagram: Ω with events E and F)
p(E|F) = p(E ∩ F) / p(F)
18
Chain rule
(diagram: Ω with events E and F)
p(E ∩ F) = p(E|F) × p(F)
19
Independence
(diagram: Ω with events E and F)
p(E ∩ F) = p(E) × p(F)
P(E|F) = P(E)
20
Puzzle
When it rains, I forget my umbrella 5% of the time.
It rains every other day.
Last year, I had my umbrella on 271 days.
21
Puzzle
When it rains, I forget my umbrella 5% of the time.
It rains every other day.
Last year, I had my umbrella on 271 days.
Today I have my umbrella. Is it raining?
22
Visual representation
23
Visual representation
24
Visual representation
25
Probability that I don't forget my umbrella if it rains
26
Likelihood that it is raining if I took my umbrella
27
Visual representation
+
28
Bayes Formula – using the chain rule
P(rain and umbrella)
29
Bayes Formula – using the chain rule
P(rain and umbrella) = P(umbrella | rain) × P(rain)
30
Bayes Formula – using the chain rule
P(rain and umbrella) = P(umbrella | rain) × P(rain) = P(rain | umbrella) × P(umbrella)
31
Bayes Formula
P(rain | umbrella) = P(umbrella | rain) × P(rain) / P(umbrella)
32
Visual representation
33
Bayes Formula
P(rain | umbrella) = 0.95 × P(rain) / P(umbrella)
34
Bayes Formula
P(rain | umbrella) = 0.95 × 0.5 / P(umbrella)
35
Bayes Formula
P(rain | umbrella) = 0.95 × 0.5 / (271/365)
36
Bayes Formula
P(rain | umbrella) ≈ 64%
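The arithmetic above can be reproduced in a few lines of Python (a sketch; the numbers are exactly those of the puzzle):

```python
# Umbrella puzzle via Bayes' rule:
# P(rain | umbrella) = P(umbrella | rain) * P(rain) / P(umbrella)
p_umbrella_given_rain = 0.95   # I forget my umbrella on only 5% of rainy days
p_rain = 0.5                   # it rains every other day
p_umbrella = 271 / 365         # I had my umbrella on 271 of 365 days

p_rain_given_umbrella = p_umbrella_given_rain * p_rain / p_umbrella
print(round(p_rain_given_umbrella, 2))  # → 0.64
```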
37
Visual representation (at scale)
38
Visual representation (at scale)
I had my umbrella 271 of 365 days.
39
Visual representation (at scale)
I only forget my umbrella 5% of the rainy days.
40
Visual representation (at scale)
It rains 50% of the time
41
Visual representation (at scale)
But it rains 64% of the time on the days I have my umbrella.
42
The magic
× P(rain)  (this factor is the prior)
43
Prior
× P(rain)  (prior)
44
Adding new information
P(rain | umbrella) = P(umbrella | rain) × P(rain) / P(umbrella)
45
Posterior
P(rain | umbrella) = …  (the left-hand side is the posterior)
46
The magic
P(rain | umbrella) = P(umbrella | rain) × P(rain) / P(umbrella)
posterior on the left; prior: P(rain)
47
Bayes' rule
(diagram: Ω with events E and F)
p(E|F) = P(F|E) × P(E) / P(F)
48
Bayes' rule
p(E|F) = [P(F|E) / P(F)] × P(E)
posterior: p(E|F); prior: P(E)
(diagram: Ω with events E and F)
49
Random variable
(diagram: Ω with outcome probabilities 0.3, 0.2, 0.1, 0.4, 0.1)
X : Ω → S, ω ↦ X(ω)
50
Random variable
(diagram: Ω mapped into a target set S)
X : Ω → S, ω ↦ X(ω)
51
Random variable
(diagram: Ω mapped into S)
p_X : S → [0, 1], x ↦ Σ_{ω | X(ω)=x} p(ω)
52
Random variable
(diagram: Ω mapped into S; the example values attach to pictures lost in extraction)
p_X(·) = 0.5, p_X(·) = 0.1, p_X(·) = 0.5
53
In practice...
p_X(·) = 0.5, p_X(·) = 0.1, p_X(·) = 0.5
54
Alternate notation
P(X = ·) = 0.1, P(X = ·) = 0.5, P(X = ·) = 0.5
55
Don't do that!
P(·) = 0.5
No go! (name the random variable: write P(X = ·) instead)
56
Joint probabilities
(diagram: Ω)
p_XY(x, y) = Σ_{ω | X(ω)=x ∧ Y(ω)=y} p(ω)
57
Conditional probabilities
(diagram: Ω)
p_{X|Y}(x, y) = p_XY(x, y) / p_Y(y)
58
Conditional probabilities
(diagram: Ω)
P(X = x | Y = y) = P(X = x ∧ Y = y) / P(Y = y)
59
Don't do that!
P(x|y) = P(x, y) / P(y)
No go! (shorthand that omits the random variables)
60
Probability model for document retrieval
61
Query as a random variable
(diagram: the random variable Q maps an outcome ω ∈ Ω to a query q)
62
Document as a random variable
(diagram: the random variable D maps an outcome ω ∈ Ω to a document d)
63
Relevance as a random variable
(diagram: the Boolean random variable R maps an outcome ω ∈ Ω to r = 0 or 1)
64
Probability ranking principle
Probability that, for a query q and a document d, d is relevant to query q:
P(R = 1 | D = d ∧ Q = q)
65
Probability ranking principle
Probability that, for a query q and a document d, d is relevant to query q:
P(R = 1 | D = d ∧ Q = q)
(diagram: branch on D = d and Q = q into R = 1 / R = 0)
66
Ideal world
P(R = 1 | D = d ∧ Q = q)
P(R = 1 | D = e ∧ Q = q)
P(R = 1 | D = f ∧ Q = q)
P(R = 1 | D = g ∧ Q = q)
67
Probability ranking principle
P(R = 1 | D = d ∧ Q = q)
P(R = 1 | D = e ∧ Q = q)
P(R = 1 | D = f ∧ Q = q)
P(R = 1 | D = g ∧ Q = q)
68
Probability ranking principle
Sort:
P(R = 1 | D = d ∧ Q = q)
P(R = 1 | D = e ∧ Q = q)
P(R = 1 | D = f ∧ Q = q)
P(R = 1 | D = g ∧ Q = q)
69
"Boolean retrieval"
P(R = 1 | D = d ∧ Q = q) > P(R = 0 | D = d ∧ Q = q)
70
"Boolean retrieval"
(diagram: two trees branching on D = d, Q = q into R = 1 / R = 0)
71
"Boolean retrieval"
P(R = 1 | D = d ∧ Q = q) > 1/2
72
Retrieval costs
Sort by increasing
C0 × P(R = 0 | D = d ∧ Q = q) − C1 × P(R = 1 | D = d ∧ Q = q)
73
Retrieval costs
Sort by increasing
C0 × P(R = 0 | D = d ∧ Q = q) − C1 × P(R = 1 | D = d ∧ Q = q)

              Return   Not return
Relevant        0         -C1
Not relevant   -C0         0

74
Retrieval costs
Cost of returning an irrelevant document: C0
75
Retrieval costs
Cost of not returning a relevant document: C1
76
With identical costs (C0 = C1 = C)...
Sort by increasing
C × (1 − P(R = 1 | D = d ∧ Q = q)) − C × P(R = 1 | D = d ∧ Q = q)
77
With identical costs...
Sort by increasing
1 − 2 × P(R = 1 | D = d ∧ Q = q)
78
With identical costs...
Sort by decreasing
P(R = 1 | D = d ∧ Q = q)
We "fall back" to the previous method
79
Binary independence model
80
Model and abstraction
Document as a list of words (with duplicates)
Simplification:
Document as a set of words
Document as a vector of Booleans
(0 1 0 1 0 1 1 1 0 0 1 1 0 1 0 1 0 1 0 1 0)
81
Documents and queries as vectors
d = (d1, d2, d3, …)ᵀ    q = (q1, q2, q3, …)ᵀ
82
One "document" random variable per term
(diagram: D_ETH maps ω ∈ Ω to d_ETH, a Boolean)
83
One "query" random variable per term
(diagram: Q_information maps ω ∈ Ω to q_information, a Boolean)
84
So we can now write things like...
P(Dk = dk | R = 1 ∧ Q = q)
85
So we can now write things like...
P(Dk = dk | R = 1 ∧ Q = q)
Probability that the document contains term k, knowing that the query is q and that the document is relevant.
86
Naive Bayes Assumption
Terms occur "independently" in documents
P(D = d) = ∏_{k=1}^{M} P(Dk = dk)
87
Naive Bayes Assumption
P(D = d) = ∏_{k=1}^{M} P(Dk = dk)
Terms occur "independently" in documents
88
Going back to what we want to rank...
Sort:
P(R = 1 | D = d ∧ Q = q)
P(R = 1 | D = e ∧ Q = q)
P(R = 1 | D = f ∧ Q = q)
P(R = 1 | D = g ∧ Q = q)
89
Bayes' formula
P(R = 1 | D = d) = P(D = d | R = 1) × P(R = 1) / P(D = d)
90
Condition on a query q...
P(R = 1 | D = d ∧ Q = q) = P(D = d | R = 1 ∧ Q = q) × P(R = 1 | Q = q) / P(D = d | Q = q)
91
Condition on a query q...
That's a lot to evaluate! Can we get rid of some of it?
P(R = 1 | D = d ∧ Q = q) = P(D = d | R = 1 ∧ Q = q) × P(R = 1 | Q = q) / P(D = d | Q = q)
94
Condition on a query q...
P(R = 1 | D = d ∧ Q = q) = P(D = d | R = 1 ∧ Q = q) × P(R = 1 | Q = q) / P(D = d | Q = q)
P(R = 0 | D = d ∧ Q = q) = P(D = d | R = 0 ∧ Q = q) × P(R = 0 | Q = q) / P(D = d | Q = q)
95
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = P(R = 1 | D = d ∧ Q = q) / P(R = 0 | D = d ∧ Q = q)
96
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = [P(D = d | R = 1 ∧ Q = q) × P(R = 1 | Q = q)] / [P(D = d | R = 0 ∧ Q = q) × P(R = 0 | Q = q)]
99
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = [P(D = d | R = 1 ∧ Q = q) / P(D = d | R = 0 ∧ Q = q)] × [P(R = 1 | Q = q) / P(R = 0 | Q = q)]
These are odds!
101
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = [P(D = d | R = 1 ∧ Q = q) / P(D = d | R = 0 ∧ Q = q)] × O(R = 1 | Q = q)
103
Condition on a query q...
We can use our independence assumption here:
O(R = 1 | D = d ∧ Q = q) = [P(D = d | R = 1 ∧ Q = q) / P(D = d | R = 0 ∧ Q = q)] × O(R = 1 | Q = q)
104
Naive Bayes Assumption
Terms occur "independently" in documents
P(D = d) = ∏_{k=1}^{M} P(Dk = dk)
105
Naive Bayes Assumption
Terms occur "independently" in documents, even conditioned on relevance and a given query
P(D = d | R = 1 ∧ Q = q) = ∏_{k=1}^{M} P(Dk = dk | R = 1 ∧ Q = q)
106
Condition on a query q...
We can use our independence assumption here:
O(R = 1 | D = d ∧ Q = q) = [P(D = d | R = 1 ∧ Q = q) / P(D = d | R = 0 ∧ Q = q)] × O(R = 1 | Q = q)
107
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = [∏_{k=1}^{M} P(Dk = dk | R = 1 ∧ Q = q) / ∏_{k=1}^{M} P(Dk = dk | R = 0 ∧ Q = q)] × O(R = 1 | Q = q)
108
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = ∏_{k=1}^{M} [P(Dk = dk | R = 1 ∧ Q = q) / P(Dk = dk | R = 0 ∧ Q = q)] × O(R = 1 | Q = q)
109
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = ∏_{k=1}^{M} [P(Dk = dk | R = 1 ∧ Q = q) / P(Dk = dk | R = 0 ∧ Q = q)] × O(R = 1 | Q = q)
This is Boolean! (each dk is 0 or 1)
110
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = ∏_{k | dk=0} [P(Dk = 0 | R = 1 ∧ Q = q) / P(Dk = 0 | R = 0 ∧ Q = q)] × ∏_{k | dk=1} [P(Dk = 1 | R = 1 ∧ Q = q) / P(Dk = 1 | R = 0 ∧ Q = q)] × O(R = 1 | Q = q)
111
Condition on a query q...
We call P(Dk = 1 | R = 1 ∧ Q = q) "pk" and P(Dk = 1 | R = 0 ∧ Q = q) "uk":

              kth term present   kth term absent
Relevant            pk                1-pk
Not relevant        uk                1-uk

114
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = ∏_{k | dk=1} (pk / uk) × ∏_{k | dk=0} [(1 − pk) / (1 − uk)] × O(R = 1 | Q = q)
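To see the product in action, here is a sketch with invented values for pk, uk, the document vector, and the prior odds (none of these numbers come from the slides; all terms are assumed to be query terms):

```python
# Binary independence model odds update (illustrative values only):
# p[k] = P(term k present | relevant), u[k] = P(term k present | non-relevant)
p = [0.8, 0.6, 0.5]
u = [0.3, 0.4, 0.5]
d = [1, 0, 1]          # which of the three query terms the document contains
prior_odds = 0.1       # O(R = 1 | Q = q), also invented

odds = prior_odds
for pk, uk, dk in zip(p, u, d):
    if dk == 1:
        odds *= pk / uk                  # term present: factor pk / uk
    else:
        odds *= (1 - pk) / (1 - uk)      # term absent: factor (1-pk)/(1-uk)
print(odds)
```

Note how the third term, with pk = uk, contributes a factor of 1 and does not change the odds.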
115
O(R = 1 | D = d ∧ Q = q) = ∏_{k | dk=1} (pk / uk) × ∏_{k | dk=0} [(1 − pk) / (1 − uk)] × O(R = 1 | Q = q)
Condition on a query q...
We can limit the product to terms in q if...
116
O(R = 1 | D = d ∧ Q = q) = ∏_{k | dk=1} (pk / uk) × ∏_{k | dk=0} [(1 − pk) / (1 − uk)] × O(R = 1 | Q = q)
Condition on a query q...
We can limit the product to terms in q if...
∀k ∈ [1, M], qk = 0 ⟹ pk = uk
117
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = ∏_{k | dk=1 ∧ qk=1} (pk / uk) × ∏_{k | dk=0 ∧ qk=1} [(1 − pk) / (1 − uk)] × O(R = 1 | Q = q)
118
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = ∏_{k | dk=1 ∧ qk=1} (pk / uk) × ∏_{k | dk=0 ∧ qk=1} [(1 − pk) / (1 − uk)] × O(R = 1 | Q = q)
The first product runs over the query terms found in d.
119
Condition on a query q...
Divide and multiply with ∏_{k | dk=1 ∧ qk=1} (1 − pk) / (1 − uk), so that the second product runs over all query terms:
120
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = ∏_{k | dk=1 ∧ qk=1} [pk × (1 − uk)] / [uk × (1 − pk)] × ∏_{k | qk=1} [(1 − pk) / (1 − uk)] × O(R = 1 | Q = q)
121
Condition on a query q...
O(R = 1 | D = d ∧ Q = q) = ∏_{k | dk=1 ∧ qk=1} [pk × (1 − uk)] / [uk × (1 − pk)] × ∏_{k | qk=1} [(1 − pk) / (1 − uk)] × O(R = 1 | Q = q)
The last two factors do not depend on d! The first product is what we care about.
125
Condition on a query q...
Take the log of the d-dependent factor:
O(R = 1 | D = d ∧ Q = q) = ∏_{k | dk=1 ∧ qk=1} [pk × (1 − uk)] / [uk × (1 − pk)] × ∏_{k | qk=1} [(1 − pk) / (1 − uk)] × O(R = 1 | Q = q)
126
Odds ratio
RSVd = log ∏_{k | dk=1 ∧ qk=1} [(pk / (1 − pk)) / (uk / (1 − uk))]
127
Retrieval Status Value
RSVd = Σ_{k | dk=1 ∧ qk=1} [log(pk / (1 − pk)) − log(uk / (1 − uk))]
128
Retrieval Status Value
RSVd = Σ_{k | dk=1 ∧ qk=1} [log(pk / (1 − pk)) − log(uk / (1 − uk))]
log(pk / (1 − pk)): odds of containing term k in relevant documents
129
Retrieval Status Value
RSVd = Σ_{k | dk=1 ∧ qk=1} [log(pk / (1 − pk)) − log(uk / (1 − uk))]
log(uk / (1 − uk)): odds of containing term k in non-relevant documents
130
Retrieval Status Value

              kth term present   kth term absent
Relevant            pk                1-pk
Not relevant        uk                1-uk

RSVd = Σ_{k | dk=1 ∧ qk=1} [log(pk / (1 − pk)) − log(uk / (1 − uk))]
131
RSV as a sum of weights
RSVd = Σ_{k | dk=1 ∧ qk=1} ck
132
RSV as a scalar product
RSVd = c⃗ · s⃗  (if we overwrite c with zeros outside of the query support)
133
Retrieval Status Value
RSVd = c⃗ · s⃗  (if we overwrite c with zeros outside of the query support)
Weights!
134
Reminder: Zone queries
information (3): 1.body; 4.title, 4.body; 5.body, 5.abstract
Score of a document: g⃗ · s⃗
135
Estimating the weights (in theory)
136
Contingency table

              kth term present   kth term absent
Relevant            pk                1-pk
Not relevant        uk                1-uk

137
Contingency table (filling in the counts, step by step)

              kth term present    kth term absent    Total
Relevant            s                 S - s            S
Not relevant     dft - s         N - dft - S + s     N - S
Total              dft               N - dft           N

145
Contingency table
ck = log(pk / (1 − pk)) + log((1 − uk) / uk)
Odds for pk: from the "Relevant" row. Odds for uk: from the "Not relevant" row.
146
Contingency table
ck = log[(s / (S − s)) / ((dft − s) / (N − dft − S + s))]
147
With smoothing (add 1/2 to every cell)
ck = log[((s + 1/2) / (S − s + 1/2)) / ((dft − s + 1/2) / (N − dft − S + s + 1/2))]
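A sketch of the smoothed weight with invented counts (N, S, dft, and s below are not from the slides):

```python
import math

# Smoothed weight c_k from a hypothetical contingency table:
# N documents in total, S of them relevant; df_t documents contain
# term k, s of which are relevant. All counts are invented.
N, S, dft, s = 1000, 10, 100, 8

odds_p = (s + 0.5) / (S - s + 0.5)                  # "Relevant" row
odds_u = (dft - s + 0.5) / (N - dft - S + s + 0.5)  # "Not relevant" row
ck = math.log(odds_p / odds_u)
print(round(ck, 2))
```

A positive ck means the term occurs relatively more often in relevant documents than in non-relevant ones.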
148
Estimating the weights (in practice)
149
Retrieval Status Value
RSVd = Σ_{k | dk=1 ∧ qk=1} [log(pk / (1 − pk)) − log(uk / (1 − uk))]
150
Retrieval Status Value
RSVd = Σ_{k | dk=1 ∧ qk=1} [log(pk / (1 − pk)) − log(uk / (1 − uk))]
log(pk / (1 − pk)): odds of containing term k in relevant documents
151
Retrieval Status Value
Croft and Harper suggest that the odds pk / (1 − pk) are 1.
RSVd = Σ_{k | dk=1 ∧ qk=1} [log(pk / (1 − pk)) − log(uk / (1 − uk))]
152
Retrieval Status Value
RSVd = Σ_{k | dk=1 ∧ qk=1} [log(pk / (1 − pk)) − log(uk / (1 − uk))]
log(uk / (1 − uk)): odds of containing term k in non-relevant documents
153
Retrieval Status Value
Statistics-based estimate of the odds of containing term k in non-relevant documents: dft / (N − dft)
154
Retrieval Status Value
RSVd = Σ_{k | dk=1 ∧ qk=1} [log 1 − log(dft / (N − dft))]
155
Retrieval Status Value
RSVd = Σ_{k | dk=1 ∧ qk=1} [log 1 + log((N − dft) / dft)]
156
Retrieval Status Value
RSVd ≈ Σ_{k | dk=1 ∧ qk=1} log(N / dft)  (approximation)
157
Retrieval Status Value
RSVd ≈ Σ_{k | dk=1 ∧ qk=1} log(N / dft)
This is the inverse document frequency!
158
Retrieval Status Value
RSVd ≈ Σ_{k | dk=1 ∧ qk=1} log(N / dft)
This justifies idf weighting in the Vector Space Model!
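A sketch of the resulting idf-style scoring, with an invented collection size and invented document frequencies:

```python
import math

# Under the simplifications above, RSV_d reduces to summing
# log(N / df_t) over the query terms that occur in d.
N = 1_000_000                                   # collection size (invented)
df = {"information": 50_000, "retrieval": 5_000}  # document frequencies (invented)

rsv = sum(math.log(N / df[t]) for t in ["information", "retrieval"])
print(round(rsv, 3))  # the rarer term contributes the larger weight
```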
159
Relevance feedback
160
Relevance feedback
Input query
161
Relevance feedback
Input query
Execute query
162
Relevance feedback
Input query
Display results
Execute query
163
Relevance feedback
Input query
Display results
Mark relevant documents
Execute query
164
Relevance feedback
Input query
Display results
Mark relevant documents
Execute query
Update (posteriors)
165
166
Information need vs. query
ETH Zurich
167
Information need vs. query
ETH Zurich
Alice is searching for a higher-education institution
168
Information need vs. query
ETH Zurich
Alice is searching for a higher-education institution
Bob is searching for Ethereum cryptocurrency in Zurich, Kansas
169
Information need vs. query
ETH Zurich
Alice is searching for a higher-education institution
Bob is searching for Ethereum cryptocurrency in Zurich, Kansas
Carlos is searching for the extended trading hours on insurance stocks.
170
Language Models
171
Finite State Automaton (FSA)
(diagram: five-state automaton over the alphabet {a, b})
172
Finite State Automaton (FSA) – Transition table
State  a  b
1      5  2
2      3  5
3      4  5
4      4  5
5      5  5
(diagram: the same automaton)
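The transition table can be simulated directly; this sketch assumes, consistently with the runs shown on the following slides, that state 1 is the start state and state 4 the only accepting state:

```python
# Transition table of the automaton: delta[state][symbol] -> state
delta = {
    1: {"a": 5, "b": 2},
    2: {"a": 3, "b": 5},
    3: {"a": 4, "b": 5},
    4: {"a": 4, "b": 5},
    5: {"a": 5, "b": 5},
}
accepting = {4}  # assumed accepting state

def accepts(word, start=1):
    state = start
    for symbol in word:
        state = delta[state][symbol]
    return state in accepting

print(accepts("baaaa"))  # → True
print(accepts("baba"))   # → False
```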
173
Finite State Automaton (FSA)
b a astart
ab b b
a,b baaaa
a
174
Finite State Automaton (FSA)
b a astart
ab b b
a,b |baaaa
a
175
Finite State Automaton (FSA)
b a astart
ab b b
a,b b|aaaa
a
176
Finite State Automaton (FSA)
b a astart
ab b b
a,b ba|aaa
a
177
Finite State Automaton (FSA)
b a astart
ab b b
a,b baa|aa
a
178
Finite State Automaton (FSA)
b a astart
ab b b
a,b baaa|a
a
179
Finite State Automaton (FSA)
b a astart
ab b b
a,b baaaa|
accept
a
180
Finite State Automaton (FSA)
b a astart
ab b b
a,b baba
a
181
Finite State Automaton (FSA)
b a astart
ab b b
a,b |baba
a
182
Finite State Automaton (FSA)
b a astart
ab b b
a,b b|aba
a
183
Finite State Automaton (FSA)
b a astart
ab b b
a,b ba|ba
a
184
Finite State Automaton (FSA)
b a astart
ab b b
a,b bab|a
a
185
Finite State Automaton (FSA)
b a astart
ab b b
a,b baba|reject
a
186
Finite State Automaton - formally
(diagram: states q0, q1, q2, q3, q4 with transitions labeled a and b; Q is the set of states, S the input alphabet, F the set of accepting states)
δ(q1, a) = q2
187
Language model
baabaaabaaaa...
Language space L
188
Language model
baabaaabaaaa...
Language space Ω
189
Language model
baabaaabaaaa...
Language space Ω
p : Ω → [0, 1], ω ↦ p(ω)
Σ_{ω∈Ω} p(ω) = 1
190
Language model
baabaaabaaaa...
Language space L
p : L → [0, 1], s ↦ p(s)
Σ_{s∈L} p(s) = 1
191
Finite State Automaton (FSA)
(diagram: the automaton over {a, b})
How do we turn this into a generator?
192
Finite State Automaton (FSA)
(diagram: the automaton as a generator; each transition carries probability 1/2, and two stop transitions carry probability 1/4)
193
Finite State Automaton (FSA)
(the same generator diagram)
194
Finite State Automaton (FSA)
(the same generator diagram)
195
Finite State Automaton (FSA)
(generating, so far: b)
196
Finite State Automaton (FSA)
(generating, so far: ba)
197
Finite State Automaton (FSA)
(generating, so far: baa)
198
Finite State Automaton (FSA)
(generating, so far: baaa)
199
Finite State Automaton (FSA)
(generating: baaa, then stop)
The probability of this generation is 1/64.
200
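The 1/64 can be checked by multiplying one probability per step; this sketch assumes the generation path for baaa uses four transitions of probability 1/2 followed by a stop of probability 1/4, matching the slides:

```python
# Probability that the generator emits b, a, a, a and then stops:
transition_probs = [0.5, 0.5, 0.5, 0.5]  # emit b, a, a, a
p_stop = 0.25                            # final stop transition

p = p_stop
for t in transition_probs:
    p *= t
print(p)  # → 0.015625, i.e. 1/64
```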
Generating a document at random
Random document
L : Ω → T*
201
Generating a document at random
Random document (list of words model!)
L : Ω → T*
202
Generating a document at random
Random document
L : Ω → T*
Random length
‖L‖ : Ω → ℕ
203
Chain rule (generating a document at random)
P(L = (t1, t2, t3)) = ?
204
Chain rule (generating a document at random)
P(L = (t1, t2, t3)) = P(L1 = t1) · P(L2 = t2 | L1 = t1) · P(L3 = t3 | L2 = t2 ∧ L1 = t1) · P(‖L‖ = 3 | L3 = t3 ∧ L2 = t2 ∧ L1 = t1)
205
Unigram language model (term independence)
P(L = (t1, t2, t3)) = P(L1 = t1) · P(L2 = t2) · P(L3 = t3) · p_stop
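A sketch of the unigram formula, using the term probabilities of the example automaton that follows (ETH 0.1, Zürich 0.3, information 0.5, retrieval 0.1) and an assumed stop probability of 0.2:

```python
# Unigram document model: independent per-term probabilities plus a
# stop probability. p_stop = 0.2 is an assumption made here.
model = {"ETH": 0.1, "Zürich": 0.3, "information": 0.5, "retrieval": 0.1}
p_stop = 0.2

def p_doc(terms):
    # P(L = (t1, ..., tn)) = P(t1) * ... * P(tn) * p_stop
    p = p_stop
    for t in terms:
        p *= model[t]
    return p

print(p_doc(["information", "retrieval"]))  # 0.5 * 0.1 * 0.2 = 0.01
```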
206
Corresponding automaton (unigram model)
ETH 0.1, Zürich 0.3, information 0.5, retrieval 0.1
207
Corresponding automaton (unigram model)
ETH 0.1, Zürich 0.3, information 0.5, retrieval 0.1
We can build such a language model from any document!
Use term frequencies!
208
How do we get rid of the order?
Random document (list of words model)
L : Ω → T*
209
How do we get rid of the order?
Random document (list of words model)
L : Ω → T*
Random document (bag of words model)
210
How do we get rid of the order?
Random document (list of words model)
L : Ω → T*
Random document (bag of words model)
D : Ω → ℕ^W
211
How do we get rid of the order?
Random document (list of words model)
L : Ω → T*
Random document (bag of words model)
D : Ω → ℕ^W
P(D = d) = Σ_{l matching bag d} P(L = l)
212
How do we get rid of the order?
P(D = d) = Σ_{l matching bag d} P(L = l)
(This is actually a multinomial distribution: see combinatorics)
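The number of orderings being summed over is a multinomial coefficient; a sketch with an invented bag:

```python
from math import factorial

# P(D = d) sums P(L = l) over every ordering l matching the bag d.
# Under a unigram model each ordering is equally likely, so the
# number of orderings is n! / (c1! * c2! * ...). The bag is invented.
bag = {"information": 2, "retrieval": 1}

n = sum(bag.values())
orderings = factorial(n)
for count in bag.values():
    orderings //= factorial(count)
print(orderings)  # → 3, i.e. 3! / (2! * 1!)
```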
213
We generate a model for every document
214
Now, let's get back to information retrieval
Enter a query q.
215
Now, let's get back to information retrieval
Enter a query q.
Thought experiment: imagine that:
• we picked a random document and built its model
216
Now, let's get back to information retrieval
Enter a query q.
Thought experiment: imagine that:
• we picked a random document and built its model
• we used this model to generate a new document
217
Now, let's get back to information retrieval
Enter a query q.
Thought experiment: imagine that:
• we picked a random document and built its model
• we used this model to generate a new document
• that document turns out to be q
218
Now, let's get back to information retrieval
Enter a query q.
Thought experiment: imagine that:
• we picked a random document and built its model
• we used this model to generate a new document
• that document turns out to be q
What document is the most likely to have been picked and to have generated q?
219
Bayesian model
P(D = d | Q = q) = P(Q = q | D = d) · P(D = d) / P(Q = q)
220
Bayesian model
P(D = d | Q = q) = P(Q = q | D = d) · P(D = d) / P(Q = q)
We need to sort by this (the left-hand side).
221
Bayesian model
P(D = d | Q = q) = P(Q = q | D = d) · P(D = d) / P(Q = q)
We can ignore P(Q = q): it is constant across documents.
222
Bayesian model
P(D = d | Q = q) = P(Q = q | D = d) · P(D = d) / P(Q = q)
We can also ignore P(D = d): it is uniform.
223
Bayesian model
P(D = d | Q = q) = P(Q = q | D = d) · P(D = d) / P(Q = q)
P(Q = q | D = d) is just the probability of q under the model built from d!
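Putting it together, query-likelihood ranking can be sketched as follows (the document models and the query below are invented; the uniform prior and the constant P(Q = q) are dropped, as argued above):

```python
import math

# Rank documents by P(Q = q | D = d) under each document's unigram
# model. Working in log space avoids numerical underflow.
models = {
    "d1": {"eth": 0.02, "zurich": 0.03},
    "d2": {"eth": 0.001, "zurich": 0.001},
}
query = ["eth", "zurich"]

def log_likelihood(model, q):
    # log P(Q = q | D = d) under a unigram model (stop probability omitted)
    return sum(math.log(model[t]) for t in q)

ranked = sorted(models, key=lambda d: log_likelihood(models[d], query),
                reverse=True)
print(ranked)  # → ['d1', 'd2']
```

d1 wins because both query terms are more probable under its model.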