Intro & Background TF IDF IDF and LM P(t) P(q|d) and P(d |q): Conjunctive P(q|d) and P(d |q): Disjunctive Summary Appendix
TF-IDF Uncovered
Thomas Roelleke, Jun Wang, Hengzhi Wu, Hany Azzam
Queen Mary University of London
Yahoo Labs Barcelona, January 19th 2012
1 Intro & Background
2 TF
3 IDF
4 IDF and LM P(t)
5 P(q|d) and P(d |q): Conjunctive
6 P(q|d) and P(d |q): Disjunctive
7 Summary
8 Appendix
Experiments & Theory
                           TREC-2        TREC-3        TREC-8        WT2G          Blog06
                           MAP   P@10    MAP   P@10    MAP   P@10    MAP   P@10    MAP   P@10
LM Dir, µ=2000             18.02 41.20   22.87 48.20   21.48 40.00   29.85 46.20   29.21 60.80
LM JM, λ=0.7               14.70 32.40   20.80 40.20   21.81 39.40   23.11 33.80   21.04 45.60
TF(b=0.25,k1=1.2) · IDF    18.90 42.20   25.00 50.00   22.39 40.60   31.76 48.20   30.46 63.80
TF(TF=1) · IDF             09.19 17.00   11.53 22.00   11.20 09.40   14.00 15.20   05.51 11.80
TF(TF=tf_d) · IDF          02.78 06.20   03.98 05.20   04.34 07.80   07.96 13.00   22.37 48.20
BM25(b=0.25,k1=1.2)        18.90 42.80   25.05 50.20   22.30 40.20   31.41 49.20   30.27 63.40
See also: http://barcelona.research.yahoo.net/dokuwiki/doku.php?id=baselines
         TREC3   TREC8A  TREC8B  WT2G
         MAP     MAP     MAP     MAP
BM25     20.64   24.39   32.33   32.33
Tfidf    26.15
LM-JM    24.96
LM-Dir   30.87
Credits to Hany Azzam
What is our IR-driven mathematical framework (tool box) for investigating theoretically, and fully understanding, why which model is better when?
The world according to Binomial/Poisson Prob and Independent Events
Definition (Binomial Probability)
P_Binomial,N,p_t(n_t) := (N choose n_t) · p_t^n_t · (1−p_t)^(N−n_t)   (1)

P(4 sunny days in a week (N=7)) ≈ 0.2734 for p_sunny = 45/90
P(4 "sunny" in d (dl=500)) ≈ 0.00157 for p_sunny = 1,000/1,000,000
Definition (Poisson Probability)
P_Poisson,λ_t(n_t) := (λ_t^n_t / n_t!) · e^(−λ_t)   (2)
Definition (Independent Events)
P(e_1, ..., e_n | h) = ∏_i P(e_i | h)   (3)
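The sunny-day numbers above can be reproduced directly; a minimal sketch of Eqs. (1) and (2), where the helper names `binom_pmf` and `poisson_pmf` are my own:

```python
from math import comb, exp, factorial

def binom_pmf(n_t, N, p_t):
    """Binomial probability (Eq. 1): n_t successes in N trials."""
    return comb(N, n_t) * p_t ** n_t * (1 - p_t) ** (N - n_t)

def poisson_pmf(n_t, lam):
    """Poisson probability (Eq. 2) with rate lam."""
    return lam ** n_t / factorial(n_t) * exp(-lam)

# 4 sunny days in a week (N=7), p_sunny = 45/90
print(binom_pmf(4, 7, 45 / 90))               # ≈ 0.2734
# 4 occurrences of "sunny" in a document of dl=500 locations,
# p_sunny = 1,000/1,000,000
print(binom_pmf(4, 500, 1_000 / 1_000_000))   # ≈ 0.00157
# Poisson approximation with lam = 500 * 0.001 = 0.5
print(poisson_pmf(4, 0.5))                    # ≈ 0.00158
```

The Poisson value with λ = N·p closely matches the binomial value, which is the approximation the next slides build on.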
The world according to TF-IDF
TF-IDF
RSV_TF-IDF(d,q,c) := ∑_t TF(t,d) · TF(t,q) · IDF(t,c)   (4)

TF "normalisation"

TF(t,d) := tf_d / (tf_d + k1 · (b · dl/avgdl + (1−b)))   Semantics?   (5)

IDF "normalisation"

pidf(t,c) := idf(t,c) / maxidf,   0 ≤ pidf ≤ 1   Semantics?   (6)
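Equations (4)-(6) can be combined into a small scorer. A minimal sketch, assuming toy corpus statistics (the `df` counts, `N_D`, `dl`, and `avgdl` values below are made up), and using the normalised pidf of Eq. (6) in place of a raw IDF:

```python
from math import log

def tf_bm25(tf, dl, avgdl, k1=1.2, b=0.25):
    """BM25-style TF normalisation (Eq. 5)."""
    return tf / (tf + k1 * (b * dl / avgdl + (1 - b)))

def pidf(df, N_D):
    """Normalised IDF (Eq. 6): idf(t,c) / maxidf(c), bounded by [0,1]."""
    idf = -log(df / N_D)
    maxidf = log(N_D)  # idf of a term occurring in exactly one document
    return idf / maxidf

def rsv_tfidf(doc_tf, query_tf, df, N_D, dl, avgdl):
    """RSV_TF-IDF (Eq. 4): sum over query terms of TF(t,d)*TF(t,q)*IDF(t,c)."""
    score = 0.0
    for t, qtf in query_tf.items():
        if t in doc_tf:
            score += tf_bm25(doc_tf[t], dl, avgdl) * qtf * pidf(df[t], N_D)
    return score

# Toy statistics (illustrative only)
score = rsv_tfidf(doc_tf={"sailing": 2, "boats": 1},
                  query_tf={"sailing": 1, "boats": 1},
                  df={"sailing": 1_000, "boats": 5_000},
                  N_D=1_000_000, dl=500, avgdl=1_000)
print(round(score, 4))
```

The raw query term frequency stands in for TF(t,q) here; for short queries it is typically 1.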
TF Variants
TF(t,d) :=
  tf_d                                          total tf count
  tf_d / dl                                     P_sum(t|d)
  tf_d / maxtf_d                                P_max(t|d)
  tf_d / (tf_d + K)                             parameter K ∝ pivdl
  tf_d / (tf_d + k1·(b·dl/avgdl + (1−b)))       K set in BM25-like way
  b + (1−b) · tf_d/dl                           lifted tf; e.g. b=0.5
  tf_d / K                                      "pivoted" tf
TF Variants: Graphical Illustration
[Two plots over n_L(t,d) ∈ [0,20]. Left: the linear normalisations tf_d/dl and tf_d/maxtf_d, i.e. the curves x/20 and x/40 (for maxtf_d = 20, 40) and x/100 and x/1000 (for N_L(d) = 100, 1000). Right: the saturating normalisation tf_d/(tf_d+K), i.e. x/(x+K) for K = 1, 2, 5, 10.]
IDF Variants
IDF(t,c) =
  − log( df(t,c) / N_D )                          i.e. − log P_D(t|c)
  − log( (df(t,c)+0.5) / (N_D+1) )                Laplace-like correction
  − log( df(t,c) / (N_D − df(t,c)) )              BIR/BM25
  − log( (df(t,c)+1) / (N_D − df(t,c)+0.5) )      RSJ/BM25
[Plot: idf(t,c) = −log P_D(t|c) over P_D(t|c) ∈ [0,1], falling from 5 towards 0.]
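The four variants differ only slightly for a rare term; a quick sketch with toy values df = 1,000 and N_D = 10^6 (the fractions are transcribed from the list above, and the dictionary keys are my own labels):

```python
from math import log

def idf_variants(df, N_D):
    """The four IDF variants above, for a term in df of N_D documents."""
    return {
        "plain":   -log(df / N_D),                     # -log P_D(t|c)
        "laplace": -log((df + 0.5) / (N_D + 1)),       # Laplace-like correction
        "bir":     -log(df / (N_D - df)),              # BIR/BM25
        "rsj":     -log((df + 1) / (N_D - df + 0.5)),  # RSJ/BM25
    }

for name, value in idf_variants(df=1_000, N_D=1_000_000).items():
    print(f"{name:8s} {value:.4f}")
```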
TF-IDF and LM and Probability/Information Theory
P(q|d)
P(q|d,c) = ∏_t P(t|d,c)^TF(t,q)

log P(q|d,c) = ∑_t TF(t,q) · log(λ · P(t|d) + (1−λ) · P(t|c))

TF-IDF and LM

P(q|d): semantics of LM.
P(d|q): ??? Semantics of TF-IDF???
Motivation
Before we engage with math to assign semantics to TF and IDF, the question is:
Why should we care?
Motivation continued
What people say (common beliefs):
- "We used STANDARD TF-IDF ..."
- "LM is P(q|d) - good. TF-IDF is HEURISTIC - bad."
- "LM and BM25 are the main baselines; TF-IDF is out ..."
- "It's clear why TF-IDF works; not clear why LM works."
What we would like to know (research challenges):
1 Can we improve (the retrieval quality of) existing models, or have we reached a ceiling?
2 Are there other models out there? One model per decade? VSM/TF-IDF mid 60s, probabilistic retrieval (BIR/RSJ weight) mid 70s, LSI and BM25 80s/90s, LM late 90s, FooBar 2010+ ???
Penrose: Shadows of the Mind
Roger Penrose describes in the opening of his book "Shadows of the Mind" a scene where dad and daughter enter a cave.
- "Dad, that boulder at the entrance, if it comes down, we are locked in."
- "Well, it stood there the last 10,000 years, so it won't fall down just now."
- "Dad, will it fall down one day?"
- "Yes."
- "So it is more likely to fall down with every day it did not fall down?"

P(boulder falls) ?=? n(boulder fell)/N
P(boulder falls) ?=? 1 − n(boulder stood)/N
P(x) ?=? n(x)/N
Independent and Dependent Events
independent events
P(information ∧ theory ∧ theory) = P(information) · P(theory)^2

... how about ...

multiple occurrences of the same term: dependent events

P(information ∧ theory ∧ theory) = P(information) · P(theory)^(2 · 2/(2+1))

At roulette, you observe 1 × black followed by 17 × red. Where do you place your tokens?
Math: Pythagorean triplets and Fermat's last theorem: What can IR researchers learn from them?
Pythagorean (a,b,c) triplets
(3, 4, 5), (5, 12, 13), (7, 24, 25), ...
a^2 + b^2 = c^2,  e.g. 9 + 16 = 25

Fermat's last theorem

There are no three positive integers a, b, c such that a^n + b^n = c^n for n > 2.

How long did it take to prove the theorem?

math4physics: physics-inspired math, math-inspired physics. math4IR: ???
Do we IR researchers have the "away-time" to engage with math4IR?
TF-IDF: Back to the Basics
Definition
TF-IDF retrieval status value RSVTF-IDF:
RSV_TF-IDF(d,q,c) := ∑_t w_TF-IDF(t,d,q,c)   (7)

Inserting the TF-IDF term weight yields the decomposed form:

RSV_TF-IDF(d,q,c) = ∑_t TF(t,d) · TF(t,q) · IDF(t,c)   (8)
TF: What is the probabilistic semantics of BM25 TF?
What is the probabilistic semantics of

Definition (BM25 TF)

TF_BM25(t,d) := tf_d / (tf_d + K_d)   (9)

K_d := k1 · (b · dl/avgdl + (1−b))   (10)

where pivdl := dl/avgdl.
Semi-subsumed Events: Prob Semantics for BM25 TF
independent ... semi-subsumed ... subsumed
Credits to Hengzhi Wu

Example

For the two events e_1 and e_2, the combined probabilities are:

0.3^2 = 0.09                  independent
0.3^(2 · 2/(2+1)) ≈ 0.2008    semi-subsumed
0.3^1 = 0.3                   subsumed
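These three numbers can be verified directly; a minimal check, with the semi-subsumed exponent written as 2·tf/(tf+1) for tf=2 (i.e. 2·TF_BM25 with K=1):

```python
p, tf = 0.3, 2

exp_independent = tf          # exponent 2
exp_semi = 2 * tf / (tf + 1)  # exponent 4/3, i.e. 2 * TF_BM25 with K=1
exp_subsumed = 1              # exponent 1

print(p ** exp_independent)      # 0.09... independent
print(round(p ** exp_semi, 4))   # 0.2008  semi-subsumed
print(p ** exp_subsumed)         # 0.3     subsumed
```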
Independence-Subsumption Triangle
independent semi-subsumed subsumed
1    1^(2/2)
2    2^1   2^(3/2)   2^2
3    3^1   3^(4/2)   3^3
4    4^1   4^2   4^(5/2)   4^3   4^4
5    5^1   5^2   5^(6/2)   5^4   5^5
...  ...
n    n^1   n^2   n^3   ...   n^((n+1)/2)   ...   n^(n−2)   n^(n−1)   n^n
Note: Gaussian sum 1+2+ ...+n = n · (n +1)/2.
The story: as a school kid, Gauss was given a "time-consuming" task by his teacher: add the numbers 1 to 100. Gauss answered within a minute: 5050. The famous formula: (1+100) + (2+99) + ... + (50+51) = 50 × 101.
Independence-Subsumption Triangle
The Independence-Subsumption Triangle embeds the BM25 TF into probability theory.

P(theory ∧ theory) = P(theory)^(2·TF_BM25) = P(theory)^(2 · 2/(2+1)) = P(theory)^1.33

              ind        semi-sub    sub
exponent      2          1.33        1
P = 0.001     0.000001   ≈ 0.0001    0.001
IDF: What is the probabilistic semantics of IDF?
What is the probabilistic semantics of

Definition (Probability of being informative (probabilistic idf))

maxidf(c) := − log(1/N_D(c)) = log N_D(c)   (11)

P(t informs|c) := pidf(t,c) := idf(t,c) / maxidf(c)   (12)
Understanding IDF: On Theoretical Arguments
Definition

BIR term weight w_BIR:

w_BIR(t,r,r̄) := log( (P(t|r)/P(t|r̄)) · (P(t̄|r̄)/P(t̄|r)) )   (13)

A simplified form considers term presence only:

w_BIR,F1(t,r,r̄) := log( P(t|r) / P(t|r̄) )   (14)

w_BIR(t,r,r̄) = log( P(t|r)/(1−P(t|r)) ) − log( P(t|r̄)/(1−P(t|r̄)) ) ≈ − log( n_t/(N−n_t) ) ≈ IDF(t,c)

Here, the log is a mathematical transformation; no information-theoretic or probabilistic meaning is associated with IDF.

See also: [Croft and Harper, 1979], "prob models without relevance information"
PIDF: Proof via Euler’s number/convergence
Proof: Probability of Being Informative

Euler's number/convergence:

lim_{N→∞} (1 − λ/N)^N = e^(−λ)   (15)

with λ := idf(t,c), N := maxidf(c).

Theorem (Occurrence-Informativeness Theorem)

The probability that a term t occurs is equal to the probability that the term is not informative in maxidf trials:

P(t occurs|c) = (1 − P(t informs|c))^maxidf(c)   (16)

Moreover, for the probability of being not informative:

1 − P(t informs|c) = log n_D(t,c) / log N_D(c)   (17)
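The convergence in Eq. (15) can be checked numerically; a small sketch with λ set to idf(t,c) = −log(1,000/1,000,000):

```python
from math import exp, log

lam = -log(1_000 / 1_000_000)  # idf(t,c) for df=1,000, N_D=10^6 (≈ 6.9078)

# Eq. 15: (1 - lam/N)^N converges to e^(-lam) as N grows
for N in (100, 1_000, 100_000):
    print(N, (1 - lam / N) ** N)
print("limit e^-lam:", exp(-lam))  # = 1,000/1,000,000 = P(t occurs|c)
```

For finite N the approximation is rough, which is worth keeping in mind when N is played by maxidf(c) as in the theorem.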
Does this help to estimate P(boulder falls)?
Event Spaces & Poisson Bridge: Definition & Example
Definition
Poisson Bridge: Let x be a set of documents (e.g. the collection, the set of relevant documents, the set of retrieved documents).

avgtf(t,x) · P_D(t|x) = λ(t,x) = avgdl(x) · P_L(t|x)   (18)

Example

Poisson bridge: For a collection, let a term t ("sailing") occur in n_L(t,toy) = 2,000 of N_L(toy) = 10^9 locations, and in n_D(t,toy) = 1,000 of N_D(toy) = 10^6 documents. The Poisson bridge is:

(2,000/1,000) · (1,000/10^6) = 2,000/10^6 = (10^9/10^6) · (2,000/10^9)
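The example can be replayed in a few lines; both sides of the Poisson bridge (Eq. 18) yield the same λ:

```python
# Toy statistics from the example: term "sailing" in collection "toy"
n_L, N_L = 2_000, 10**9  # term locations, total locations
n_D, N_D = 1_000, 10**6  # term documents, total documents

avgtf = n_L / n_D        # average within-document frequency: 2.0
P_D = n_D / N_D          # document-based term probability
avgdl = N_L / N_D        # average document length: 1,000
P_L = n_L / N_L          # location-based term probability

# Poisson bridge (Eq. 18): both sides give the same lambda(t, toy)
print(avgtf * P_D)   # ≈ 0.002
print(avgdl * P_L)   # ≈ 0.002
```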
Note: Which averages are “useful"?
Credits to Theodora Tsikrika and Gabriella Kazai, “Notation in General Matrix Framework"
TF-IDF and LM: P(q|d) and P(d |q): Conjunctive
LM semantics: conventional
P(q|d,c) = ∏_t P(t|d,c)   Semantics for LM   (19)

Conventional mixture: P(t|d,c) = λ · P(t|d) + (1−λ) · P(t)

TF-IDF semantics: non-conventional

P(d|q,c) = ∏_t P(t|q,c)   Semantics for TF-IDF   (20)

"Extreme" mixture: for t ∈ q, P(t|q,c) = P(t|q); otherwise, P(t|q,c) = P(t|c).
TF-IDF and LM: P(q|d) and P(d |q): Disjunctive
Total Probability
P(q|d) = ∑_t P(q|t) · P(t|d)   (21)

P(d,q) = ∑_t P(d|t) · P(q|t) · P(t)   (22)

P(d,q) / (P(d) · P(q)) = ∑_t P(t|d) · P(t|q) · (1/P(t))   (23)

Relationship between total prob and TF-IDF??? And LM???
Option PIN's (probabilistic inference networks, [Turtle and Croft, 1990]):

∑_t (P(q|t) / ∑_t′ P(q|t′)) · P(t|d) ∝ ∑_t (IDF(t) / ∑_t′ IDF(t′)) · TF(t,d)   (24)
Integral-based Interpretation of TF-IDF
Indefinite and Definite Integral

∫ (1/x) dx = log x   (25)

∫_{P(t)}^{1} (1/x) dx = log 1 − log P(t) = − log P(t)   (26)
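Equation (26) can be checked by simple numeric quadrature; `integrate_inv` is a local midpoint-rule helper, not a library function:

```python
from math import log

def integrate_inv(a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of 1/x over [a, b]."""
    h = (b - a) / steps
    return sum(h / (a + (i + 0.5) * h) for i in range(steps))

p_t = 0.1
print(integrate_inv(p_t, 1.0))   # ≈ 2.302585
print(-log(p_t))                 # = 2.302585... = -log P(t), per Eq. 26
```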
[Two plots over P_D(t|c) ∈ [0,1]. Left: the document-query independence measure DQI(t,x) = 100/2 · 2/100 · 1/5 · 1/x (average term in average document), with TF-IDF(t,0.1) = ∫_{0.1}^{1} DQI(t,x) dx = 1/5 · (−log 0.1). Right: TF-IDF(t,x) for good, average, and poor terms (t_good, t_avg, t_poor); the derivative of TF-IDF is the DQI, with DQI(t,x) = 1 for the reference curve.]
Credits to Jun Wang
Summary
- TF
  - BM25 TF corresponds to semi-subsumed events
  - this relationship opens up pathways to a new, IR-driven, probability theory, applicable in contexts beyond IR
- IDF
  - Poisson bridge: relates P_D(t|c) (IDF) and P_L(t|c) (LM): pathways to relate IDF/BIR to LM
  - the normalisation pidf = idf(t)/maxidf is sound
- P(q|d)/P(q) and P(d|q)/P(d): conjunctive
  - symmetric relationship between LM and TF-IDF
  - positions IR models; clarifies the P(q|d) vs P(r|d,q) issue
- P(q|d)/P(q) and P(d|q)/P(d): disjunctive
  - ∫ (1/x) dx: relationship between total probability and TF-IDF
- TF-IDF uncovered: TF-IDF is not heuristic anymore.
Outlook
- A unifying framework to derive all models from?
- A formal framework to prove ranking equivalences/differences?
- A "new" model?
- "New" math (probability theory) inspired by IR results but applicable in other domains?
Background: References
- IDF: deviation from Poisson, [Church and Gale, 1995]
- Information-theoretic explanation of TF-IDF, [Aizawa, 2003]
- Understanding IDF, [Robertson, 2004]
- Event Spaces, [Robertson, 2005]
- On Event Spaces and Rank Equivalences, [Luk, 2008]
- A Probabilistic Justification for TF-IDF, [Hiemstra, 2000]
- Understanding Relationships between Models, [Aly and Demeester, 2011]
- DFR, [Amati and van Rijsbergen, 2002]
- TF-IDF Uncovered, [Roelleke and Wang, 2008]
- Semi-subsumed Events: A Probabilistic Semantics of BM25 TF, [Wu and Roelleke, 2009]
- Probability of Being Informative, [Roelleke, 2003]
- Axiomatic Approach to IR Models, [Fang and Zhai, 2005]
- Bayesian extension to the language model for ad hoc information retrieval, [Zaragoza et al., 2003], "integral over model parameters"
Concepts beyond TF and IDF
Binomial Prob → Poisson Prob → 2-Poisson Prob

- 2-Poisson is the motivation for BM25 TF: [Robertson and Walker, 1994]

Event Spaces:

- {0,1}: BIR (and TF-IDF?)
- {0,1,2,...}: Poisson (and TF-IDF?)
- {t1, t2, ...}: LM (and TF-IDF?)

Document-Query-(In)dependence: DQI = P(d,q) / (P(d) · P(q))

Burstiness (avgtf): Given avgdl = 100. Given d.

- t1: occurs in 1,000 locations, 500 docs. avgtf = 2. tf_d = 2. Term is "average".
- t2: occurs in 1,000 locations, 999 docs. avgtf ≈ 1. tf_d = 2. Term is "good"; however: IDF(t2) < IDF(t1).
tflifted : b = 0.5
[Plot over n_L(t,d) ∈ [0,20]: the lifted tf b + (1−b)·x/c with b = 0.5, for c = 20, 40 (tfmax: n_L(tmax,d) = 20, 40) and c = 100, 1000 (tfsum: N_L(d) = 100, 1000).]
tfpivoted : b = 0.7, avgdl = 200
[Plot over n_L(t,d) ∈ [0,20]: the pivoted tf n_L(t,d) / (b · N_L(d)/avgdl + (1−b)) with b = 0.7 and avgdl = 200, for N_L(d) = 100, 200, 1000.]
BM25 TF: Pivoted
TF pivoted (tf′_d in the SIGIR BM25 tutorial by Hugo/Stephen)

Definition (TF pivoted)

TF_piv(t,d) := tf_d / K_d   (27)

Move from BM25 TF to semi-subsumed in probability theory:

2 · TF_BM25 = 2 · TF_piv / (TF_piv + 1)

P(theory ∧ theory) = P(theory)^(2·TF_BM25)
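The move from the BM25 TF to the pivoted form rests on the algebraic identity tf/(tf+K) = (tf/K)/((tf/K)+1), which a short sketch confirms:

```python
def tf_bm25(tf, K):
    """BM25 TF: tf / (tf + K)."""
    return tf / (tf + K)

def tf_piv(tf, K):
    """Pivoted TF (Eq. 27): tf / K."""
    return tf / K

# tf/(tf+K) == (tf/K) / (tf/K + 1) for any K > 0
for tf in (1, 2, 5):
    for K in (0.5, 1.05, 2.0):
        piv = tf_piv(tf, K)
        assert abs(tf_bm25(tf, K) - piv / (piv + 1)) < 1e-12
print("identity holds")
```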
References
Aizawa, A. (2003).
An information-theoretic perspective of tf-idf measures.
Information Processing and Management, 39:45–65.
Aly, R. and Demeester, T. (2011).
Towards a better understanding of the relationship between probabilistic models in ir.
In Advances in Information Retrieval Theory: Third International Conference, ICTIR 2011, Bertinoro, Italy, September 12-14, 2011, Proceedings, volume 6931, pages 164–175. Springer-Verlag New York Inc.
Amati, G. and van Rijsbergen, C. J. (2002).
Probabilistic models of information retrieval based on measuring the divergence from randomness.
ACM Transaction on Information Systems (TOIS), 20(4):357–389.
Church, K. and Gale, W. (1995).
Inverse document frequency (idf): A measure of deviation from Poisson.
In Proceedings of the Third Workshop on Very Large Corpora, pages 121–130.
Croft, W. and Harper, D. (1979).
Using probabilistic models of document retrieval without relevance information.
Journal of Documentation, 35:285–295.
Fang, H. and Zhai, C. (2005).
An exploration of axiomatic approaches to information retrieval.
In SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 480–487, New York, NY, USA. ACM.
References
Hiemstra, D. (2000).
A probabilistic justification for using tf.idf term weighting in information retrieval.
International Journal on Digital Libraries, 3(2):131–139.
Luk, R. W. P. (2008).
On event space and rank equivalence between probabilistic retrieval models.
Inf. Retr., 11(6):539–561.
Robertson, S. (2004).
Understanding inverse document frequency: On theoretical arguments for idf.
Journal of Documentation, 60:503–520.
Robertson, S. (2005).
On event spaces and probabilistic models in information retrieval.
Information Retrieval Journal, 8(2):319–329.
Robertson, S. E. and Walker, S. (1994).
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval.
In Croft, W. B. and van Rijsbergen, C. J., editors, Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 232–241, London, et al. Springer-Verlag.
Roelleke, T. (2003).
A frequency-based and a Poisson-based probability of being informative.
In ACM SIGIR, pages 227–234, Toronto, Canada.
References
Roelleke, T. and Wang, J. (2008).
TF-IDF uncovered: A study of theories and probabilities.
In ACM SIGIR, pages 435–442, Singapore.
Turtle, H. and Croft, W. B. (1990).
Inference networks for document retrieval.
In Vidick, J.-L., editor, Proceedings of the 13th International Conference on Research and Development in Information Retrieval, pages 1–24, New York. ACM.
Wu, H. and Roelleke, T. (2009).
Semi-subsumed events: A probabilistic semantics for the BM25 term frequency quantification.
In ICTIR (International Conference on Theory in Information Retrieval). Springer.
Zaragoza, H., Hiemstra, D., and Tipping, M. (2003).
Bayesian extension to the language model for ad hoc information retrieval.
In SIGIR ’03: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, pages 4–9, New York, NY, USA. ACM Press.