A Markov Random Field Model for Term Dependencies
Donald Metzler, W. Bruce Croft
Presented by Chia-Hao Lee
Outline
• Introduction
• Model
  – Overview
  – Variants
  – Potential Functions
  – Training
• Experimental Results
• Conclusions
Introduction
• There is a rich history of statistical models for information retrieval, including the binary independence model (BIM), language modeling, the inference network model, and so on.
• It is well known that dependencies exist between terms in a collection of text.
• For example, within a SIGIR proceedings, occurrences of certain pairs of terms are correlated, such as information and retrieval.
Introduction
• Unfortunately, estimating statistical models for general term dependencies is infeasible, due to data sparsity.
• For this reason, most retrieval models assume some form of independence exists between terms.
• Most work on modeling term dependencies in the past has focused on phrases/proximity or term co-occurrences. Most of these models only consider dependencies between pairs of terms.
• Several recent studies have examined term dependence models for the language modeling framework.
Model
• Markov random fields (MRF), also called undirected graphical models, are commonly used in the statistical machine learning domain to succinctly model joint distributions.
• We use MRFs to model the joint distribution $P_\Lambda(Q, D)$ over queries Q and documents D, parameterized by Λ.
Model
• A Markov random field is constructed from a graph G.
• The nodes in the graph represent random variables, and the edges define the independence semantics between the random variables.
• In this model, we assume G consists of query nodes $q_i$ and a document node D, such as the graphs in the figure.

$$P_\Lambda(Q, D) = \frac{1}{Z_\Lambda} \prod_{c \in C(G)} \phi(c; \Lambda)$$

$Q = q_1, \ldots, q_n$
$C(G)$ : the set of cliques in G
$\phi(\cdot; \Lambda)$ : a non-negative potential function over clique configurations, parameterized by Λ
$Z_\Lambda = \sum_{Q, D} \prod_{c \in C(G)} \phi(c; \Lambda)$ : normalizes the distribution
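The factorization on this slide can be illustrated with a minimal Python sketch. The binary variables, the two cliques, and the `agree_potential` function are illustrative assumptions, not part of the paper; real query and document spaces are far too large to normalize by enumeration, which is one reason ranking avoids computing the normalizer.

```python
from itertools import product
import math

def joint_distribution(variables, cliques, potential):
    """Return {configuration: probability} for a tiny discrete MRF.

    variables: list of variable names, each taking values 0 or 1 (assumption).
    cliques:   list of tuples of variable names.
    potential: function (clique, values) -> non-negative float.
    """
    unnormalized = {}
    for config in product([0, 1], repeat=len(variables)):
        assignment = dict(zip(variables, config))
        weight = 1.0
        for clique in cliques:
            weight *= potential(clique, tuple(assignment[v] for v in clique))
        unnormalized[config] = weight
    z = sum(unnormalized.values())  # the normalizer Z
    return {config: w / z for config, w in unnormalized.items()}

def agree_potential(clique, values):
    # Hypothetical potential: exp(1) when all variables in the clique agree, exp(0) otherwise.
    return math.exp(1.0 if len(set(values)) == 1 else 0.0)

probs = joint_distribution(["q1", "q2", "D"], [("q1", "D"), ("q2", "D")], agree_potential)
```

Configurations in which both query variables agree with D receive the largest probability, because both clique potentials take their larger value.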
Model
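• For ranking purposes we compute the posterior:

$$P_\Lambda(D \mid Q) = \frac{P_\Lambda(Q, D)}{P_\Lambda(Q)} \stackrel{rank}{=} \log P_\Lambda(Q, D) - \log P_\Lambda(Q) \stackrel{rank}{=} \sum_{c \in C(G)} \log \phi(c; \Lambda)$$

• As noted above, all potential functions must be non-negative, and are most commonly parameterized as:

$$\phi(c; \Lambda) = \exp[\lambda_c f(c)]$$

$f(c)$ : real-valued feature function over clique values
$\lambda_c$ : the weight given to that particular feature function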
Model
• Substituting this back into the ranking function, we end up with the following:

$$P_\Lambda(D \mid Q) \stackrel{rank}{=} \sum_{c \in C(G)} \lambda_c f(c) \qquad (1)$$

• To utilize the model, the following steps must be taken for each query Q:
– Construct a graph representing the query term dependencies to model
– Define a set of potential functions over the cliques of this graph
– Rank documents in descending order of $P_\Lambda(D \mid Q)$
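The three steps can be sketched in Python as a generic ranking loop. This is only an illustration: the clique representation (a kind label plus a tuple of terms), the `toy_feature` function, and the weight dictionary are hypothetical stand-ins for whatever graph and feature functions are actually chosen.

```python
import math

def rank_documents(documents, cliques, feature, weights):
    """Sort document ids by the weighted sum of clique features, highest first.

    documents: {doc_id: list of tokens}
    cliques:   list of (kind, terms) pairs; the document node is implicit.
    feature:   function (kind, terms, tokens) -> real-valued feature f(c).
    weights:   {kind: lambda}, one weight per clique kind (assumption).
    """
    def score(doc_id):
        tokens = documents[doc_id]
        return sum(weights[kind] * feature(kind, terms, tokens)
                   for kind, terms in cliques)
    return sorted(documents, key=score, reverse=True)

def toy_feature(kind, terms, tokens):
    # Hypothetical feature: log(1 + number of occurrences of the single term).
    return math.log(1 + tokens.count(terms[0]))

documents = {
    "d1": "train station security".split(),
    "d2": "train schedule".split(),
}
cliques = [("T", ("train",)), ("T", ("security",))]
ranking = rank_documents(documents, cliques, toy_feature, {"T": 1.0})
```

Here `d1` ranks above `d2` because it matches both query terms; the normalizer never needs to be computed, since it is the same for every document.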
Model
• We now describe and analyze three variants of the MRF model, each with different underlying dependence assumptions.
– Full independence (FI)
– Sequential dependence (SD)
– Full dependence (FD)
Model
• The full independence variant makes the assumption that query terms $q_i$ are independent given some document D.
• The likelihood of query term $q_i$ occurring is not affected by the occurrence of any other query term, or more succinctly, $P(q_i \mid D, q_{j \neq i}) = P(q_i \mid D)$.
• The sequential dependence variant assumes a dependence between neighboring query terms.
• Formally, this assumption states that $P(q_i \mid D, q_j) = P(q_i \mid D)$ only for nodes $q_j$ that are not adjacent to $q_i$.
Model
• In the full dependence variant, we assume all query terms are in some way dependent on each other.
• Graphically, a query of length n translates into the complete graph $K_{n+1}$, which includes edges from each query node to the document node D.
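To make the three variants concrete, the following Python sketch enumerates the cliques that contain the document node D under each assumption. It is one reading of the graphs described here (the figure itself is not reproduced in this transcript); the function name and the variant labels passed as strings are hypothetical.

```python
from itertools import combinations

def query_cliques(terms, variant):
    """Return (single, contiguous, non_contiguous) groups of query terms.

    Every returned group implicitly also contains the document node D.
    variant: "FI", "SD", or "FD" (labels assumed for this sketch).
    """
    n = len(terms)
    single = [(t,) for t in terms]
    if variant == "FI":
        # No edges between query nodes: only {q_i, D} cliques.
        return single, [], []
    if variant == "SD":
        # Only neighboring query nodes are connected: {q_i, q_{i+1}, D}.
        pairs = [(terms[i], terms[i + 1]) for i in range(n - 1)]
        return single, pairs, []
    if variant == "FD":
        # Complete graph: every subset of two or more query terms forms a clique with D.
        contiguous, non_contiguous = [], []
        for k in range(2, n + 1):
            for idx in combinations(range(n), k):
                group = tuple(terms[i] for i in idx)
                if idx[-1] - idx[0] == k - 1:
                    contiguous.append(group)
                else:
                    non_contiguous.append(group)
        return single, contiguous, non_contiguous
    raise ValueError("variant must be 'FI', 'SD', or 'FD'")

query = ["train", "station", "security"]
fi = query_cliques(query, "FI")
sd = query_cliques(query, "SD")
fd = query_cliques(query, "FD")
```

The number of cliques grows exponentially with query length under full dependence, which is why the sketch is only practical for short queries.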
Model
• The potential functions φ play a very important role in how accurate our approximation of the true joint distribution is.
• For example: consider a document D on the topic of information retrieval. Using the sequential dependence variant, we would expect $\phi(\text{information}, \text{retrieval}, D) > \phi(\text{information}, \text{assurance}, D)$, as the terms information and retrieval are much more “compatible” with the topicality of document D than the terms information and assurance.
Model
• Since documents are ranked by Equation 1, it is also important that the potential functions can be computed efficiently.
• Based on these criteria and previous research on phrases and term dependence, we focus on three types of potential functions.
• These potential functions attempt to abstract the idea of term co-occurrence.
Model
• Since potentials are over cliques in the graph, we now proceed to enumerate all of the possible ways graph cliques are formed in our model and how potential functions are defined for each.
• The simplest type of clique that can appear in our graph is a 2-clique consisting of an edge between a query term $q_i$ and the document D.
Model
• In keeping with simple-to-compute measures, we define this potential as:

$$\phi_T(c) = \lambda_T \log P(q_i \mid D) = \lambda_T \log\left[(1 - \alpha_D)\frac{tf_{q_i, D}}{|D|} + \alpha_D \frac{cf_{q_i}}{|C|}\right]$$

$P(q_i \mid D)$ : a smoothed language modeling estimate
$tf_{w,D}$ : the number of times term w occurs in document D
$cf_w$ : the number of times term w occurs in the entire collection
$|D|$ : total number of terms in the document D
$|C|$ : the length of the collection
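As a small worked sketch of this feature, the Python function below computes the log of the smoothed estimate for a single term. Passing the smoothing weight as a fixed `alpha` argument is an assumption made for brevity (on the slide it is document-dependent), and the toy counts are invented for illustration.

```python
import math

def term_feature(term, doc_tokens, collection_counts, collection_len, alpha):
    """log[(1 - alpha) * tf / |D| + alpha * cf / |C|] for one query term.

    alpha is treated as a fixed smoothing weight here (assumption).
    """
    tf = doc_tokens.count(term)            # occurrences in the document
    cf = collection_counts.get(term, 0)    # occurrences in the collection
    p = (1 - alpha) * tf / len(doc_tokens) + alpha * cf / collection_len
    # A term unseen in both document and collection has probability 0.
    return math.log(p) if p > 0 else float("-inf")

doc = ["information", "retrieval", "models"]
counts = {"information": 10, "retrieval": 5}   # hypothetical collection counts
value = term_feature("information", doc, counts, 1000, alpha=0.5)
missing = term_feature("assurance", doc, counts, 1000, alpha=0.5)
```

Smoothing with the collection frequency keeps the feature finite for query terms that are absent from the document but present elsewhere in the collection.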
Model
• Next, we consider cliques that contain two or more query terms.
• For example: in the query train station security measures, if any of the sub-phrases train station, train station security, station security measures, or security measures appears in a document, then there is strong evidence in favor of relevance.
Model
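• Therefore, for every clique that contains a contiguous set of two or more terms $q_i, \ldots, q_{i+k}$ and the document node D, we apply the following “ordered” potential function:

$$\phi_O(c) = \lambda_O \log P(\#1(q_i, \ldots, q_{i+k}) \mid D) = \lambda_O \log\left[(1 - \alpha_D)\frac{tf_{\#1(q_i \ldots q_{i+k}), D}}{|D|} + \alpha_D \frac{cf_{\#1(q_i \ldots q_{i+k})}}{|C|}\right]$$

$tf_{\#1(q_i \ldots q_{i+k}), D}$ : the number of times the exact phrase $q_i, \ldots, q_{i+k}$ occurs in document D
$cf_{\#1(q_i \ldots q_{i+k})}$ : the number of times the exact phrase $q_i, \ldots, q_{i+k}$ occurs in the entire collection
$|D|$ : total number of terms in the document D
$|C|$ : the length of the collection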
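The exact-phrase count used by the ordered potential can be sketched in a few lines of Python. The function name is hypothetical, and the sketch counts every starting position at which the phrase matches, which is one straightforward reading of "the number of times the exact phrase occurs".

```python
def ordered_count(phrase, doc_tokens):
    """Count positions where the tokens of `phrase` occur contiguously, in order."""
    phrase = tuple(phrase)
    k = len(phrase)
    return sum(
        1
        for i in range(len(doc_tokens) - k + 1)
        if tuple(doc_tokens[i:i + k]) == phrase
    )

doc = "train station security at the train station".split()
train_station = ordered_count(("train", "station"), doc)
station_security = ordered_count(("station", "security"), doc)
security_train = ordered_count(("security", "train"), doc)
```

The resulting count would take the place of the term frequency in the smoothed estimate, with the collection-wide phrase count used for smoothing.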
Model
• Although the occurrence of contiguous sets of query terms provides strong evidence of relevance, the occurrence of non-contiguous sets of query terms can also provide valuable evidence.
• In the previous example, documents containing the terms train and security within some short proximity of one another also provide additional evidence towards relevance.
Model
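• For our purposes, we construct an “unordered” potential function over cliques that consist of sets of two or more query terms $q_i, \ldots, q_j$ and the document node D. Such potential functions have the following form:

$$\phi_U(c) = \lambda_U \log P(\#uwN(q_i, \ldots, q_j) \mid D) = \lambda_U \log\left[(1 - \alpha_D)\frac{tf_{\#uwN(q_i \ldots q_j), D}}{|D|} + \alpha_D \frac{cf_{\#uwN(q_i \ldots q_j)}}{|C|}\right]$$

$tf_{\#uwN(q_i \ldots q_j), D}$ : the number of times the terms $q_i, \ldots, q_j$ appear, ordered or unordered, within a window of N terms in document D
$cf_{\#uwN(q_i \ldots q_j)}$ : the number of times the terms $q_i, \ldots, q_j$ appear, ordered or unordered, within a window of N terms in the entire collection
$|D|$ : total number of terms in the document D
$|C|$ : the length of the collection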
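A simple way to count unordered window matches is sketched below in Python. The counting convention is an assumption chosen for clarity: a match is counted at each position where a query term starts a window of N tokens that contains every term in the set. Retrieval engines may count overlapping matches differently, so the numbers should be read as illustrative only.

```python
def unordered_window_count(terms, doc_tokens, window):
    """Count windows of `window` tokens that start at a query term and contain all terms."""
    needed = set(terms)
    count = 0
    for i, token in enumerate(doc_tokens):
        # Only consider windows that begin at one of the query terms.
        if token in needed and needed <= set(doc_tokens[i:i + window]):
            count += 1
    return count

doc = "security checks at the train station improve train security".split()
near = unordered_window_count(("train", "security"), doc, window=4)
adjacent = unordered_window_count(("train", "station"), doc, window=2)
```

In this toy document, train and security co-occur within four tokens only once (at the end), even though both terms appear earlier at a greater distance.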
Model
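• Using these potential functions, we derive the following specific ranking function:

$$P_\Lambda(D \mid Q) \stackrel{rank}{=} \sum_{c \in C(G)} \lambda_c f(c) = \sum_{c \in T} \lambda_T f_T(c) + \sum_{c \in O} \lambda_O f_O(c) + \sum_{c \in O \cup U} \lambda_U f_U(c)$$

Here T is the set of cliques containing one query term and D, O is the set of cliques containing D and a contiguous set of two or more query terms, and U is the set of cliques containing D and a non-contiguous set of two or more query terms.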
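The grouping of the sum by clique type can be checked with a short Python sketch. The dictionary layout, the feature values, and the three weights are all made up for illustration; the point is only that contiguous cliques contribute to both the ordered and the unordered sums, while non-contiguous cliques contribute to the unordered sum alone.

```python
def mrf_score(cliques, lambda_t, lambda_o, lambda_u):
    """Combine per-clique feature values with one weight per feature type.

    cliques: list of dicts with a "set" label ("T", "O", or "U") and the
             feature values that apply to that clique (hypothetical layout).
    """
    term_sum = sum(c["f_T"] for c in cliques if c["set"] == "T")
    ordered_sum = sum(c["f_O"] for c in cliques if c["set"] == "O")
    # The unordered feature is summed over both contiguous and non-contiguous cliques.
    unordered_sum = sum(c["f_U"] for c in cliques if c["set"] in ("O", "U"))
    return lambda_t * term_sum + lambda_o * ordered_sum + lambda_u * unordered_sum

cliques = [
    {"set": "T", "f_T": -2.0},
    {"set": "T", "f_T": -3.0},
    {"set": "O", "f_O": -4.0, "f_U": -1.5},
    {"set": "U", "f_U": -2.5},
]
# Arbitrary example weights, not values reported in the paper.
score = mrf_score(cliques, lambda_t=0.8, lambda_o=0.1, lambda_u=0.1)
```

With these numbers the score is 0.8 × (−5.0) + 0.1 × (−4.0) + 0.1 × (−4.0) = −4.8. Tying the weights by type leaves only three free parameters regardless of query length.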
Experimental Results
• We make use of the Associated Press and Wall Street Journal sub-collections of TREC, which are small homogeneous collections, and two web collections, WT10g and GOV2, which are considerably larger and less homogeneous.
Experimental Results
• Full independence
Experimental Results
• Sequential dependence
Experimental Results
• Full dependence
Conclusions
• In this paper, we develop a general term dependence model that can make use of arbitrary text features.
• Three variants of the model are described, each of which captures different dependencies between query terms.
Markov Random Fields
• Let $X_1, \ldots, X_n$ be random variables taking values in some finite set S, and let $G = (N, E)$ be a finite graph such that $N = \{1, \ldots, n\}$, whose elements will sometimes be called sites.
• For a set $A \subset N$, let $\partial A$ denote its neighbor (or boundary) set: all elements in $N \setminus A$ that have a neighbor in A. For $i \in N$, let $\partial i = \partial\{i\}$.
• The random variables are said to define a Markov random field if, for any vector $x \in S^N$:

$$\Pr(X_i = x_i \mid X_j = x_j,\ j \in N \setminus i) = \Pr(X_i = x_i \mid X_j = x_j,\ j \in \partial i)$$
Potentials
• A potential is a function indexed by subsets of N on the space $S^N$. We will write potentials as $V_A(w)$ for $A \subset N$, $w \in S^N$.
• Given a full set of potentials, the energy of a configuration w will be defined as:

$$U(w) = -\sum_{A \subset N} V_A(w)$$

• Using the energy, we can define a probability measure, P, from a set of potentials by:

$$P(w) = \frac{\exp(-U(w))}{Z}, \qquad Z = \sum_{w \in S^N} \exp(-U(w))$$
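The link between potentials and the Markov property can be checked numerically on a toy example. The Python sketch below builds the measure for a three-site chain 0 - 1 - 2 with binary states and pairwise potentials on the two edges; the sign convention (energy equal to minus the sum of potentials, probability proportional to exp of minus the energy), the chain graph, and the potential values are all assumptions of this sketch.

```python
import math
from itertools import product

STATES = (0, 1)

def gibbs_measure(n_sites, potentials):
    """Return {configuration: probability} from a list of potential functions V_A(w)."""
    configs = list(product(STATES, repeat=n_sites))
    # Assumed convention: U(w) = -sum_A V_A(w) and P(w) = exp(-U(w)) / Z.
    energy = {w: -sum(v(w) for v in potentials) for w in configs}
    z = sum(math.exp(-u) for u in energy.values())
    return {w: math.exp(-energy[w]) / z for w in configs}

def conditional_x0(p, x0, x1, x2):
    """Pr(X_0 = x0 | X_1 = x1, X_2 = x2)."""
    return p[(x0, x1, x2)] / sum(p[(s, x1, x2)] for s in STATES)

# Chain 0 - 1 - 2: one potential per edge (hypothetical values).
potentials = [
    lambda w: 1.0 if w[0] == w[1] else 0.0,
    lambda w: 0.5 if w[1] == w[2] else 0.0,
]
p = gibbs_measure(3, potentials)

# Site 0 has site 1 as its only neighbor, so its conditional should not depend on X_2.
given_x2_is_0 = conditional_x0(p, 1, 1, 0)
given_x2_is_1 = conditional_x0(p, 1, 1, 1)
```

Both conditionals come out equal, which is the Markov property for site 0: once its neighbor is fixed, the remaining site carries no further information.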