Information Models for Ad Hoc Information Retrieval, SIGIR 2010
Information-Based Models for Ad Hoc IR
Stephane Clinchant 1,2 Eric Gaussier 2
1 Xerox Research Centre Europe
2 Laboratoire d’Informatique de Grenoble, Univ. Grenoble 1
SIGIR’10, 20 July 2010
S.Clinchant E.Gaussier (XRCE-LIG) Information-Based Models for Ad Hoc IR SIGIR’10, 20 July 2010 1 / 33
Overview
Information Models: Normalization, Probability Distribution, RSV
Heuristic Constraints: Condition 1, Condition 2, Condition 3, Condition 4
Burstiness Phenomenon: Property of Prob. Distributions
Informative Content
Use Shannon’s information to weigh words in documents
[Figure: −log P(X) plotted against P(X)]
Inf(x) = -\log P(x \mid \Theta_C) = informative content
Informative content measures the deviation from an average behavior:
- Observation by Harter (70): specialty words deviate from a Poisson distribution
- Informative content is core to Divergence From Randomness (DFR) models
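As a minimal illustration of Shannon information as a word weight, the sketch below computes -log P for two probabilities. The function name and the probability values are hypothetical and chosen only to show that a surprising observation carries more information than an expected one.

```python
import math

def information(p: float) -> float:
    """Shannon information -log P(x) of an event with probability p (in nats)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p)

# Hypothetical probabilities, used only for illustration.
common = information(0.5)    # an expected observation carries little information
rare = information(0.001)    # a surprising observation carries much more
print(common, rare)
```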
Information-based Model
Main idea:
1 Discrete term frequencies x are renormalized into continuous values t(x), to account for differing document lengths
2 For each term w, the values t(x) are assumed to follow a distribution P with parameter λ_w on the corpus, i.e. Tf_w | λ_w ∼ P
3 Queries and documents are compared with a surprise measure, a mean information:
RSV(q, d) = \sum_{w \in q} -x^q_w \log P(Tf_w > t(x^d_w) \mid \lambda_w)
Outline
1 Model Properties
  - Retrieval Heuristics
  - Burstiness Phenomenon
2 Two Power-Law Instances
  - log-logistic model
  - smoothed power-law model
3 Experiments
4 Extension to PRF
Notations
x^d_w: frequency of word w in document d; x^q_w: frequency of word w in the query
t^d_w: normalized term frequency
Tf_w: random variable for the frequency of word w
l_d: length of document d
idf_w: corpus parameter for word w
θ: model parameter
Most (ad hoc) IR models can be written as:
RSV(q, d) = \sum_{w \in q} f(x^q_w) \, h(x^d_w, l_d, idf_w, \theta)
⇒ What do we know about h?
Condition 1
Docs with more occurrences of query terms get higher scores than docs with fewer occurrences
\forall (l, idf, \theta), \quad \frac{\partial h(x, l, idf, \theta)}{\partial x} > 0    (h increases with x)
[Figure: h(x) against x; a "good" h is increasing, a "bad" h is decreasing]
Condition 2
The increase in the retrieval score should be smaller for larger term frequencies. Ex: going from 2→4 occurrences should add more than going from 50→52
\forall (l, idf, \theta), \quad \frac{\partial^2 h(x, l, idf, \theta)}{\partial x^2} < 0    (h concave)
[Figure: h(x) against x; for a "good" (concave) h the difference of scores decreases, for a "bad" (convex) h it increases]
Condition 3
Longer documents, when compared to shorter ones with exactly the same number of occurrences of query terms, should be penalized (they are likely to cover additional topics)
\forall (x, idf, \theta), \quad \frac{\partial h(x, l, idf, \theta)}{\partial l} < 0    (h decreases with l)
Condition 4: IDF Effect
It is important to downweight terms occurring in many documents
\forall (x, l, \theta), \quad \frac{\partial h(x, l, idf, \theta)}{\partial idf} > 0    (IDF effect)
[Figure: h(x) against x for IDF=10 and IDF=5; IDF effect: h(x, IDF=10) > h(x, IDF=5)]
Heuristic Constraints
Condition 1: h increases with x
Condition 2: h is concave
Condition 3: h decreases with l
Condition 4: h increases with idf (IDF Effect)
Additional conditions in the paper
⇒ Analytical Reformulation of TFC1, TFC2, LNC1 and TDC:
Fang et al, A Formal Study of Information Retrieval Heuristics, SIGIR’04
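The four conditions can be checked numerically for a candidate h with finite differences. The sketch below is an illustration only: the candidate `h` assumes a log-logistic form with λ = exp(−idf) and a DFR-style length normalization, and `bad_h`, the grid values, `c` and `avg_l` are hypothetical. A grid check of this kind can reveal violations but does not prove that a condition holds everywhere.

```python
import math

def h(x, l, idf, c=1.0, avg_l=100.0):
    """Candidate weighting function (assumption: log-logistic form with
    lambda = exp(-idf) and DFR-style length normalization)."""
    t = x * math.log(1.0 + c * avg_l / l)
    lam = math.exp(-idf)
    return math.log(1.0 + t / lam)

def bad_h(x, l, idf):
    """Counter-example: convex in x, so it violates Condition 2."""
    return x * x * idf / l

def check_conditions(func, xs, ls, idfs, eps=1e-2):
    """Finite-difference check of Conditions 1-4 on a grid of points."""
    ok = {1: True, 2: True, 3: True, 4: True}
    for x in xs:
        for l in ls:
            for idf in idfs:
                base = func(x, l, idf)
                up, down = func(x + eps, l, idf), func(x - eps, l, idf)
                ok[1] &= up > base                     # dh/dx > 0
                ok[2] &= up - 2 * base + down < 0      # d2h/dx2 < 0
                ok[3] &= func(x, l + eps, idf) < base  # dh/dl < 0
                ok[4] &= func(x, l, idf + eps) > base  # dh/didf > 0
    return ok

grid = dict(xs=[1, 2, 5, 10, 50], ls=[50, 100, 1000], idfs=[1.0, 2.0, 5.0])
good_result = check_conditions(h, **grid)
bad_result = check_conditions(bad_h, **grid)
print(good_result, bad_result)
```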
Burstiness Phenomenon
We now turn to word frequency distributions:
Church and Gale [1] showed that a 2-Poisson model yields a poor fit to word frequencies
A possible explanation: words tend to appear in bursts, i.e. burstiness
Once a word appears in a document, it is much more likely to appear again
Recent work on the Dirichlet Compound Multinomial
⇒ Which distributions can account for burstiness?
[1] Church and Gale, Poisson Mixtures
Burstiness Property of Probability Distributions
Definition
A distribution P is bursty iff, for all ε > 0, the function g_ε defined by:
g_\varepsilon(x) = P(X \ge x + \varepsilon \mid X \ge x)
is a strictly increasing function of x
Interpretation: it becomes easier to generate more occurrences
g_ε(x) strictly increasing ⇐⇒ Δ = log g_ε(x) strictly increasing
⇐⇒ Δ = log P(X ≥ x + ε) − log P(X ≥ x) strictly increasing
As Δ < 0, the absolute value of the successive differences Δ decreases
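The definition can be probed numerically by evaluating g_ε on a grid. The sketch below is a grid check under stated assumptions, not a proof: the survival functions, the value λ = 0.1, the grid and ε = 1 are illustrative choices. Both distributions are continuous, so P(X ≥ x) and P(X > x) coincide.

```python
import math

def loglogistic_sf(x, lam=0.1):
    """Survival function P(X > x) = lam / (x + lam)."""
    return lam / (x + lam)

def gaussian_sf(x, mean=5.0, std=1.0):
    """Survival function of a Gaussian, via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2.0)))

def is_bursty_on_grid(sf, xs, eps=1.0):
    """True if g_eps(x) = P(X > x + eps) / P(X > x) strictly increases over xs."""
    g = [sf(x + eps) / sf(x) for x in xs]
    return all(a < b for a, b in zip(g, g[1:]))

xs = [0.5 * i for i in range(31)]  # grid on [0, 15]
print(is_bursty_on_grid(loglogistic_sf, xs))  # heavy tail: g_eps increases
print(is_bursty_on_grid(gaussian_sf, xs))     # light tail: g_eps decreases
```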
Geometric Interpretation of Burstiness
[Figure: log P(X > x) against x. Δ = log P(X > x + ε) − log P(X > x) increases; as Δ < 0, its absolute value decreases]
Gaussian(mean=5,std=1) is not bursty
[Figure: log P(X > x) against x for the Gaussian]
Information Models & Heuristic Constraints
Models defined by:
RSV(q, d) = \sum_{w \in q} x^q_w \overbrace{\left(-\log P(Tf_w > t^d_w \mid \lambda_w)\right)}^{\text{function } h}    (1)
Condition 1: h increasing with x ✓
Condition 3: h penalizes long documents ✓
Condition 2: h concave
Theorem
If the distribution P is bursty, then the information model defined with P is concave
The IDF effect and 2 additional conditions depend on the choice of P
Characterization of Information Models
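1 Normalisation of Frequencies
  Increasing in x, decreasing in l
  ex: DFR normalization t^d_w = x^d_w \log\left(1 + c \, \frac{avg\_l}{l_d}\right)
2 Probability Distribution
  Continuous and bursty. Support = [0, +∞)
3 Retrieval Function
  RSV(q, d) = \sum_{w \in q} -x^q_w \log P(Tf_w > t^d_w \mid \lambda_w)
            = \sum_{w \in q \cap d} -x^q_w \log P(Tf_w > t^d_w \mid \lambda_w)
  (a term absent from d has t^d_w = 0, hence P(Tf_w > 0) = 1 and a zero contribution)
  \lambda_w = \frac{F_w}{N} \text{ or } \frac{N_w}{N}
where:
- F_w: frequency of w in the corpus
- N_w: document frequency of w
- N: number of documents in the collection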
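The collection statistics and the normalization step can be sketched on a toy corpus. Everything in the block below (the documents, the function names, the default c = 1) is hypothetical and serves only to make F_w, N_w, N, λ_w and the DFR normalization concrete.

```python
import math
from collections import Counter

# Toy corpus (hypothetical), one token list per document.
docs = [
    ["retrieval", "model", "retrieval", "information"],
    ["information", "theory"],
    ["retrieval", "heuristics", "constraints", "model", "model", "model"],
]

N = len(docs)                                   # number of documents
avg_l = sum(len(d) for d in docs) / N           # average document length
F = Counter(w for d in docs for w in d)         # F_w: corpus frequency
Nw = Counter(w for d in docs for w in set(d))   # N_w: document frequency

def lambda_w(w, use_doc_freq=True):
    """lambda_w = N_w / N or F_w / N, the two options named on the slide."""
    return (Nw[w] if use_doc_freq else F[w]) / N

def dfr_normalize(x, l_d, c=1.0):
    """t = x * log(1 + c * avg_l / l_d): increasing in x, decreasing in l_d."""
    return x * math.log(1.0 + c * avg_l / l_d)

print(lambda_w("retrieval"), lambda_w("model", use_doc_freq=False))
print(dfr_normalize(2, 2), dfr_normalize(2, 6))
```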
Two Power-law Instances
The log-logistic and smoothed power law models
Log-Logistic Model
Log-Logistic distribution
P(Tf_w > t^d_w \mid \lambda_w) = \frac{\lambda_w}{t^d_w + \lambda_w}
The LGD model is defined by
1 DFR Normalization with parameter c
2 Tf_w \sim LogLogistic(\lambda_w = \frac{N_w}{N})
3 Ranking Model (as before):
RSV(q, d) = \sum_{w \in q \cap d} x^q_w \left[-\log P(Tf_w > t^d_w)\right]
Meets all conditions for all parameter values
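The three steps of the LGD model can be combined into a short scoring routine. This is a minimal sketch, not the authors' implementation: the function name, the dictionary-based inputs and all counts in the usage lines are hypothetical.

```python
import math

def lgd_score(query_tf, doc_tf, doc_len, avg_len, doc_freq, n_docs, c=1.0):
    """RSV(q, d) = sum over w in q and d of x_q(w) * [-log(lambda_w / (t + lambda_w))]."""
    score = 0.0
    for w, xq in query_tf.items():
        xd = doc_tf.get(w, 0)
        if xd == 0:
            continue  # only terms shared by the query and the document contribute
        t = xd * math.log(1.0 + c * avg_len / doc_len)  # DFR normalization
        lam = doc_freq[w] / n_docs                      # lambda_w = N_w / N
        score += xq * math.log((t + lam) / lam)         # equals -log(lam / (t + lam))
    return score

# Hypothetical statistics: "bursty" is rare in the collection, "model" is common.
query = {"bursty": 1, "model": 1}
doc_freq = {"bursty": 10, "model": 500}
doc_a = {"bursty": 3, "model": 1}
doc_b = {"bursty": 1, "model": 3}
score_a = lgd_score(query, doc_a, 100, 100, doc_freq, 1000)
score_b = lgd_score(query, doc_b, 100, 100, doc_freq, 1000)
print(score_a, score_b)
```

In this toy setting the document with more occurrences of the rare term scores higher, which is the IDF effect, and the same counts in a longer document score lower.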
Smoothed Power Law SPL
Distribution on [0, +∞) with parameter 0 < λ_w < 1:
P(Tf_w > t^d_w \mid \lambda_w) = \frac{\lambda_w^{\frac{t^d_w}{t^d_w + 1}} - \lambda_w}{1 - \lambda_w}
IR Model:
1 DFR Normalization with parameter c
2 Tf_w \sim SPL(\lambda_w = \frac{N_w}{N})
3 Ranking Model (as before):
RSV(q, d) = \sum_{w \in q \cap d} x^q_w \left[-\log P(Tf_w > t^d_w)\right]
Meets all conditions
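The SPL survival function and the resulting term weight can be written down directly from the formula above. The function names and the sample values of t and λ are hypothetical; the checks only spot-test a few points rather than establish the conditions in general.

```python
import math

def spl_sf(t, lam):
    """P(Tf > t) = (lam ** (t / (t + 1)) - lam) / (1 - lam), for 0 < lam < 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must be in (0, 1)")
    return (lam ** (t / (t + 1.0)) - lam) / (1.0 - lam)

def spl_weight(t, lam):
    """h = -log P(Tf > t): information brought by a normalized frequency t."""
    return -math.log(spl_sf(t, lam))

weights = [spl_weight(t, 0.1) for t in (0.0, 1.0, 2.0, 3.0)]
gains = [b - a for a, b in zip(weights, weights[1:])]
print(weights)  # increasing in t (Condition 1)
print(gains)    # shrinking increments (Condition 2, concavity)
```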
Experiments
Comparison with language models, BM25, DFR models
Corpora: ROBUST, TREC-3, CLEF03, GIRT, with short (-t) and long (-d) queries
6 query sets: ROB-d, ROB-t, T3-t, GIRT, CLEF-d, CLEF-t
Methodology:
1 Divide each collection into 10 training/test splits
2 Learn the best parameter (µ, c, k1) to optimize MAP or P10 on the training set
3 Measure MAP or P10 on the 10 splits and test the difference with a t-test.
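The methodology can be sketched as follows. This is an illustration under several assumptions that the slides do not state: splits are random halves of the query set, the same seed gives both models the same splits so that scores are paired, and the per-query average precision values are synthetic random numbers rather than real results.

```python
import math
import random
import statistics

def tune_and_test(ap_by_param, n_splits=10, seed=0):
    """ap_by_param: {parameter value: [average precision per query]}.
    For each random train/test split, pick the parameter with the best
    training MAP and record the test MAP obtained with it."""
    rng = random.Random(seed)
    n_queries = len(next(iter(ap_by_param.values())))
    test_maps = []
    for _ in range(n_splits):
        ids = list(range(n_queries))
        rng.shuffle(ids)
        train, test = ids[: n_queries // 2], ids[n_queries // 2:]
        best = max(ap_by_param,
                   key=lambda p: statistics.mean(ap_by_param[p][i] for i in train))
        test_maps.append(statistics.mean(ap_by_param[best][i] for i in test))
    return test_maps

def paired_t(a, b):
    """Paired t statistic for two equally long lists of per-split scores."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))

# Synthetic per-query scores for two models and three parameter values each.
data_rng = random.Random(42)
model_a = {c: [data_rng.random() for _ in range(20)] for c in (0.5, 1.0, 2.0)}
model_b = {mu: [data_rng.random() for _ in range(20)] for mu in (500, 1000, 2000)}
maps_a = tune_and_test(model_a)
maps_b = tune_and_test(model_b)
t_stat = paired_t(maps_a, maps_b)
print(t_stat)
```

The statistic would then be compared against a t distribution with n_splits − 1 degrees of freedom.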
Comparison with Dirichlet Smoothing
Table: LGD and SPL versus LM-Dirichlet after 10 splits; bold indicates significant difference

MAP   ROB-d  ROB-t  GIR    T3-t   CL-t   CL-d
DIR   27.1   25.1   41.1   25.6   36.2   48.5
LGD   27.4   25.0   42.1   24.8   36.8   49.7

P10   ROB-d  ROB-t  GIR    T3-t   CL-t   CL-d
DIR   45.6   43.3   68.6   54.0   28.4   33.8
LGD   46.2   43.5   69.0   54.3   28.6   34.5

MAP   ROB-d  ROB-t  GIR    T3-t   CL-t   CL-d
DIR   26.7   25.0   40.9   27.1   36.2   50.2
SPL   25.6   24.9   42.1   26.8   36.4   46.9

P10   ROB-d  ROB-t  GIR    T3-t   CL-t   CL-d
DIR   45.2   43.8   68.2   52.8   27.3   32.8
SPL   46.6   44.7   70.8   55.3   27.1   32.9
Comparison with DFR models
Table: LGD and SPL versus PL2 after 10 splits; bold indicates significant difference

MAP   ROB-d  ROB-t  GIR    T3-t   CL-t   CL-d
PL2   26.2   24.8   40.6   24.9   36.0   47.2
LGD   27.3   24.7   40.5   24.0   36.2   47.5

P10   ROB-d  ROB-t  GIR    T3-t   CL-t   CL-d
PL2   46.4   44.1   68.2   55.0   28.7   33.1
LGD   46.6   43.2   66.7   53.9   28.5   33.7

MAP   ROB-d  ROB-t  GIR    T3-t   CL-t   CL-d
PL2   26.3   25.2   42.8   25.8   37.3   45.7
SPL   26.3   25.2   42.7   25.3   37.4   44.1

P10   ROB-d  ROB-t  GIR    T3-t   CL-t   CL-d
PL2   46.0   45.2   69.3   54.8   26.2   32.7
SPL   47.0   45.2   69.8   55.4   25.9   32.9
Extension to Pseudo Relevance Feedback
Mean information of the top retrieved documents
Info_R(w) = \frac{1}{|R|} \sum_{d \in R} -\log P(Tf_w > t^d_w ; \lambda_w)
Query Update:
x^{q2}_w = \frac{x^q_w}{\max_w x^q_w} + \beta \, \frac{Info_R(w)}{\max_w Info_R(w)}
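The feedback step can be sketched in two small functions. This is an illustration under assumptions: the log-logistic survival function is used for Info_R, every candidate term is kept (no cut-off on the number of added terms), and the terms and values in the usage lines are hypothetical.

```python
import math

def info_r(norm_tfs, lam):
    """Mean information of a term over the feedback set R.
    norm_tfs: normalized frequencies t, one per document in R (0 if absent).
    Assumes the log-logistic model, where -log P(Tf > t) = log((t + lam) / lam)."""
    return sum(math.log((t + lam) / lam) for t in norm_tfs) / len(norm_tfs)

def update_query(query_tf, info, beta=1.0):
    """x_q2(w) = x_q(w) / max x_q + beta * Info_R(w) / max Info_R."""
    max_q = max(query_tf.values())
    max_info = max(info.values())  # assumes at least one term with positive information
    terms = set(query_tf) | set(info)
    return {w: query_tf.get(w, 0) / max_q + beta * info.get(w, 0.0) / max_info
            for w in terms}

# Hypothetical query and feedback information values.
query = {"jaguar": 1}
info = {"jaguar": 4.0, "car": 2.0, "speed": 1.0}
new_query = update_query(query, info, beta=0.5)
print(new_query)
```

Both parts of the update are scaled by their maximum, so β directly controls how much weight the feedback terms receive relative to the original query terms.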
Comparison with other PRF Models
Mixture Model (Zhai)
R comes from a mixture of a relevant topic model θ_w and the corpus language model (multinomial distribution)
Query Update:
p(w \mid q2) = \alpha \, p(w \mid q) + (1 - \alpha) \, \theta_w
Bo2 Model (Amati)
Documents in R are merged together. A geometric probability model measures the informative content of a word
Query Update:
x^{q2}_w = \frac{x^q_w}{\max_w x^q_w} + \beta \, \frac{Info_{Bo2}(w)}{\max_w Info_{Bo2}(w)}
Pseudo Relevance Feedback Experiments
1 Divide each collection into 10 training/test splits
2 Learn the best interpolation weight (β, α) to optimize MAP on the training set
3 Measure MAP on the 10 splits and test the difference with a t-test
4 Change |R| and the number of terms TC added to the queries
5 Repeat
Table: MAP, bold indicates best performance, ∗ significant difference over LM and Bo2 models

Model    |R|  TC  ROB-t  GIRT   TREC3-t  CLEF-t
LM+MIX   5    5   27.5   44.4   30.7     36.6
INL+Bo2  5    5   26.5   42.0   30.6     37.6
LGD      5    5   28.3∗  44.3   32.9∗    37.6
LM+MIX   5    10  28.3   45.7∗  33.6     37.4
INL+Bo2  5    10  27.5   42.7   32.6     37.5
LGD      5    10  29.4∗  44.9   35.0∗    40.2∗
LM+MIX   10   10  28.4   45.5   31.8     37.6
INL+Bo2  10   10  27.2   43.0   32.3     37.4
LGD      10   10  30.0∗  46.8∗  35.5∗    38.9
LM+MIX   10   20  29.0   46.2   33.7     38.2
INL+Bo2  10   20  27.7   43.5   33.8     37.7
LGD      10   20  30.3∗  47.6∗  37.4∗    38.6
Table: Mean average precision (MAP) of PRF experiments; bold indicates best performance, ∗ significant difference over LM and Bo2 models

Model  |R|  TC  ROB-t  GIR    T3-t   CL-t
LGD    5    5   28.3∗  44.3   32.9∗  37.6
SPL    5    5   28.9∗  45.6∗  32.9∗  39.0∗
LGD    5    10  29.4∗  44.9   35.0∗  40.2∗
SPL    5    10  29.6∗  47.0∗  34.6∗  39.5∗
LGD    10   10  30.0∗  46.8∗  35.5∗  38.9
SPL    10   10  30.0∗  48.9∗  33.8∗  39.1∗
LGD    10   20  30.3∗  47.6∗  37.4∗  38.6
SPL    10   20  29.9∗  50.2∗  34.3   39.7∗
LGD    20   20  29.5∗  48.9∗  37.2∗  41.0∗
SPL    20   20  28.8   50.3∗  33.9   39.0∗
Conclusion
Can we design IR models compatible with empirical evidence?
⇒ Proposal: Information Models modelling burstiness (better fit to data)
Analytical Characterization of Retrieval Constraints
Definition of Burstiness for Probability distributions
Information-Based Models compliant with Retrieval Constraints
  - Bursty Distribution ⇒ Concave Model
Extension to PRF
The Log-logistic and Smoothed Power Law Models
  - Similar/Better Performance to LM and DFR without PRF, better with PRF
Questions ?
Relation with DFR
DFR Models are defined by:
RSV(q, d) = \sum_{w \in q \cap d} -x^q_w \, Inf_2(t^d_w) \log P(t^d_w)
We can show that:
Inf_2 makes DFR models concave (condition 2)
Without Inf_2, DFR models perform poorly
Discrete laws are applied to continuous values
2 notions of information (not homogeneous)
⇒ Information Models use continuous laws and a single concept of information