A Lattice-Based Approach to Query-By-Example
Spoken Document Retrieval
Tee Kiah Chia† Khe Chai Sim‡
Haizhou Li‡ Hwee Tou Ng†
†National University of Singapore  ‡Institute for Infocomm Research, Singapore
2
Outline
- Introduction
- Related work
- Main contributions of our work
- Retrieval methods
- Experimental setup
- Experimental results
- Conclusions and future work
3
Intro: Query-by-Example SDR
- Query by example
  - Given: a document collection, and a query which is itself a full-fledged document – a query exemplar
  - Task: find documents in the collection on a similar topic as the query
- Spoken document retrieval (SDR): information retrieval on speech recordings
- Query-by-example SDR: the query-by-example task where documents & queries are in speech form
4
Intro: Query-by-Example SDR
- One way: obtain the 1-best automatic transcript of each spoken document & query
- Problems, and how to solve them:
  - Speech recognition errors, especially for noisy, spontaneous speech → work with multiple recognition hypotheses
  - Lots of non-content words in query exemplars → stop word removal
5
Intro: Lattices
- Lattice – connected directed acyclic graph (James & Young, 1994; James, 1995)
  - Each edge labeled with a term hypothesis and probabilities
  - Each path gives a hypothesized sequence of terms for the utterance, and its probability
[Figure: word lattice between <s> and </s> over hypothesized words including "and", "it's", "my son's", "mentor", "nice", "tender", "to"]
7
Intro: Our proposal
- Statistical lattice-based query by example
  - Estimate statistical models for documents and query exemplars – from expected word counts
  - Use negative KL divergence as document–query relevance
8
Outline
- Introduction
- Related work
- Main contributions of our work
- Retrieval methods
- Experimental setup
- Experimental results
- Conclusions and future work
9
Related work: Lattice-based speech processing
- Lattices of spoken documents for word-spotting and SDR
  - James & Young (1994); James (1995); Jones et al. (1996): phone lattices
    - Phone: basic unit of speech, e.g. /æ/, /t/, /p/, …
  - Siegler (1999); Chelba & Acero (2005); Mamou et al. (2006); Chia et al. (2007): word lattices
  - Saraclar & Sproat (2004); Hori et al. (2007): combine word & phone lattices for word-spotting
- Lattices of spoken queries for IR
  - Colineau & Halber (1999)
10
Related work: Information retrieval
- Statistical language modeling approach to IR
  - Song & Croft (1999)
  - Lafferty & Zhai (2001): ranking by query likelihood = ranking by negative KL divergence
- Stop word removal
  - Sinka & Corne (2003); Carvalho et al. (2007): effect of different stop word lists in NLP tasks
11
Related work: Information retrieval
- Query by example
  - Chen et al. (2004): newswire articles (text) for queries, broadcasts (speech) for documents
  - He et al. (2003); Lo & Gauvain (2002, 2003): tracking task in Topic Detection and Tracking (TDT)
  - All using only 1-best transcripts
12
Outline
- Introduction
- Related work
- Main contributions of our work
- Retrieval methods
- Experimental setup
- Experimental results
- Conclusions and future work
13
Main contributions of our work
- Extend the use of statistical models in lattice-based SDR (Chia et al., 2007) to query by example
- Can also be considered as extensions of:
  - Chen et al. (2004)'s query by example with text queries
  - Lafferty & Zhai (2001)'s statistical IR as KL divergence computation
- Study the effect of stop word removal in query-by-example SDR
14
Outline
- Introduction
- Related work
- Main contributions of our work
- Retrieval methods
- Experimental setup
- Experimental results
- Conclusions and future work
15
Retrieval methods
- Statistical, using 1-best transcripts
  - Motivated by Song & Croft (1999) and Chen et al. (2004)
- Statistical, using lattices
  - Our proposed method
16
Retrieval methods: Statistical with 1-best
- Suppose we have
  - a spoken document d, a document collection C, and a query exemplar q
- Compute the relevance of d to q, Rel(d, q)
  - Define Rel(d, q) = log Pr(q | d)
  - Under uniform Pr(d), ranking by this is equivalent to ranking by Pr(d | q)
- Let
  - q's 1-best transcript be q1 q2 … qK
  - c(w; q) be the count of word w in q's transcript
- Then
  Rel(d, q) = Σ_{1≤i≤K} log Pr(qi | d) = Σ_w c(w; q) log Pr(w | d)
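As a concrete illustration (not the authors' code), the 1-best relevance score above can be computed from the query's word counts and an already-smoothed document model; the toy vocabulary and probabilities below are invented:

```python
import math
from collections import Counter

def relevance(query_words, doc_word_prob):
    """Rel(d, q) = sum_w c(w; q) * log Pr(w | d).

    `doc_word_prob` maps each word w to a (smoothed) Pr(w | d);
    the names here are illustrative, not from the paper's code.
    """
    counts = Counter(query_words)  # c(w; q)
    return sum(c * math.log(doc_word_prob[w]) for w, c in counts.items())

# Toy smoothed document model over a tiny vocabulary (sums to 1)
doc_model = {"sports": 0.4, "tv": 0.3, "watch": 0.2, "cook": 0.1}
q = ["sports", "tv", "sports"]  # 1-best transcript q1 q2 q3
rel = relevance(q, doc_model)   # = 2*log(0.4) + log(0.3)
```

Since the score is a sum of log probabilities, every query word must receive nonzero probability under the document model, which is exactly why the smoothing on the next slides is needed.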
17
Retrieval methods: Statistical with 1-best
- Equivalent to the negative KL divergence of the document model from the query model (Lafferty & Zhai, 2001):
  −DKL(q ‖ d) = Σ_w Pr(w | q) log Pr(w | d) + Hq
             = log Pr(q | d) / K + Hq
             = Rel(d, q) / K + Hq
  where Hq is the entropy of the query model, and K and Hq do not depend on d
18
Retrieval methods: Statistical with 1-best
- Building a unigram model to get Pr(qi | d)
- Use two-stage smoothing (Zhai & Lafferty, 2004): a combination of Jelinek-Mercer and Bayesian (Dirichlet) smoothing
  Pr(w | d) = (1 − λ) · (c(w; d) + μ Pr(w | U)) / (|d| + μ) + λ Pr(w | U)
  where
  - w is a word, e.g. a query word
  - c(w; d) is the no. of times w occurs in d
  - U is a background language model
  - λ ∈ (0, 1) is set according to the nature of the queries
  - μ is set using the estimation algorithm of Zhai & Lafferty (2004)
- Thus we can compute Rel(d, q) = log Pr(q | d)
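A minimal sketch of the two-stage smoothing formula, with illustrative values for λ and μ (in the paper μ is set by Zhai & Lafferty's estimation algorithm, not hand-picked):

```python
def two_stage_prob(w, doc_counts, doc_len, bg_prob, lam=0.7, mu=1000.0):
    """Two-stage smoothed Pr(w | d) (Zhai & Lafferty, 2004):
    Dirichlet (Bayesian) smoothing with prior mass mu, followed by
    Jelinek-Mercer interpolation with the background model U.
    """
    # Dirichlet-smoothed estimate: (c(w; d) + mu*Pr(w|U)) / (|d| + mu)
    dirichlet = (doc_counts.get(w, 0) + mu * bg_prob[w]) / (doc_len + mu)
    # Jelinek-Mercer interpolation with weight lam on the background
    return (1 - lam) * dirichlet + lam * bg_prob[w]
```

Because the background probability Pr(w | U) is mixed in twice, every vocabulary word gets nonzero probability, so log Pr(w | d) is always defined.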
19
Retrieval methods: Statistical with lattices
- Our proposed method: obtain lattices
  - Start with a speech segment's acoustic observations o
  - Generate a lattice with acoustic probabilities — Pr(o | t) for each possible transcript t
  - Rescore with an n-gram language model — gives P̃r(t | o), where
    P̃r(t, o) = Pr(t) (Pr(o | t) e^(ρ|t|))^(1/ω)   (Chelba & Acero, 2005)
[Figure: acoustic observations o = o1 o2 o3 → lattice with acoustic probabilities (edge labels such as w1/Pr(o1|w1), w2/Pr(o2 o3|w2), w4/Pr(o3|w4)) → rescore → rescored lattice (edge labels such as w1/P̃r(w1, o1, <s>), w2/P̃r(w2 </s>, o2 o3, w2), w4/P̃r(w4 </s>, o3, w3))]
20
Retrieval methods: Statistical with lattices
- Prune lattices
  - Remove a path if its log probability is worse than the best path's by more than a threshold
  - 2 thresholds: Θdoc for documents, Θqry for query exemplars
- Find expectations of
  - word counts E[c(w; d)]
  - document lengths E[|d|]
- If d contains speech segments o(1), …, o(M), then (Hatch, 2005):
  E[c(w; d)] = Σ_{1≤j≤M} Σ_t c(w; t) P̃r(t | o(j))
  E[|d|] = Σ_{1≤j≤M} Σ_t |t| P̃r(t | o(j))
- Similarly for q
[Figure: pruned lattice for o = o1 o2 o3 with edge probabilities p1, …, p5 (p1 = P̃r(w1, o1, <s>)), and the resulting expected counts:]
  Word | Expected count
  w2   | 2 p2 p5 / (p1 p3 p4 + p2 p5)
  w3   | p1 p3 p4 / (p1 p3 p4 + p2 p5)
  w4   | 2 p1 p3 p4 / (p1 p3 p4 + p2 p5)
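The expected-count formulas can be sketched as follows. For clarity this illustration enumerates a toy lattice's paths explicitly, whereas a real system would run a forward-backward pass over lattice edges (Hatch, 2005); the path probabilities are invented:

```python
from collections import Counter

def expected_counts(paths):
    """Expected word counts E[c(w; d)] and expected length E[|d|]
    for one speech segment.

    `paths` is a list of (word_sequence, unnormalized_probability)
    pairs enumerating the (pruned) lattice's paths.
    """
    z = sum(p for _, p in paths)  # normalizer over surviving paths
    exp = Counter()
    for words, p in paths:
        for w, c in Counter(words).items():
            exp[w] += c * p / z   # c(w; t) * P~r(t | o)
    exp_len = sum(len(words) * p / z for words, p in paths)  # E[|d|]
    return dict(exp), exp_len

# Toy two-path lattice: one path w1 w3 w4 with probability p1*p3*p4,
# one path w2 w2 with probability p2*p5 (all values made up)
p1, p2, p3, p4, p5 = 0.9, 0.4, 0.8, 0.7, 0.5
counts, length = expected_counts([(["w1", "w3", "w4"], p1 * p3 * p4),
                                  (["w2", "w2"], p2 * p5)])
```

Note how a word occurring twice on a path contributes twice its path posterior, matching the 2·p terms in the table above.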
21
Retrieval methods: Statistical with lattices
- Build a unigram model of d
  - With expected counts
  - Again, use two-stage smoothing (Zhai & Lafferty, 2004):
    Pr(w | d) = (1 − λ) · (E[c(w; d)] + μ Pr(w | U)) / (E[|d|] + μ) + λ Pr(w | U)
- Build a unigram model of q
  - Unsmoothed: Pr(w | q) = E[c(w; q)] / E[|q|]
- Compute relevance as negative KL divergence (Lafferty & Zhai, 2001):
  Rel_lat(d, q) = Σ_w Pr(w | q) log Pr(w | d)
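Putting the pieces together, a sketch of the lattice-based relevance score; the query's expected counts and the two document models below are invented for illustration:

```python
import math

def rel_lat(query_model, doc_model):
    """Rel_lat(d, q) = sum_w Pr(w | q) * log Pr(w | d).
    Ranking by this equals ranking by negative KL divergence,
    since the query-entropy term is constant across documents."""
    return sum(pq * math.log(doc_model[w]) for w, pq in query_model.items())

# Unsmoothed query model from expected counts: Pr(w|q) = E[c(w;q)] / E[|q|]
exp_counts = {"sports": 1.6, "tv": 0.4}   # E[c(w; q)]
exp_len = 2.0                             # E[|q|]
q_model = {w: c / exp_len for w, c in exp_counts.items()}

# Two already-smoothed document models; the one favoring "sports"
# should score higher against this query
d1 = {"sports": 0.5, "tv": 0.3, "cook": 0.2}
d2 = {"sports": 0.1, "tv": 0.3, "cook": 0.6}
assert rel_lat(q_model, d1) > rel_lat(q_model, d2)
```

Only the document model needs smoothing: a zero in Pr(w | q) simply drops that term from the sum, while a zero in Pr(w | d) would make the logarithm undefined.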
22
Outline
- Introduction
- Related work
- Main contributions of our work
- Retrieval methods
- Experimental setup
- Experimental results
- Conclusions and future work
23
Experimental setup: Task
- Corpus: Fisher English Training corpus from LDC
  - 11,699 telephone calls, total 1,920 hours, ≈ 109 MB of text
  - Each call initiated by one of 40 topics
  - 6,605 calls for training the ASR engine
- Queries
  - 40 exemplars – 32 test, 8 development – for the 40 topics
- Document collection
  - 5,054 calls
  - Unit of retrieval ("document"): a call
- Ground truth relevance judgements
  - d relevant to q iff d and q are on the same topic
- Example of a topic spec.:
  ENG01. Professional sports on TV. Do either of you have a favorite TV sport? How many hours per week do you spend watching it and other sporting events on TV?
24
Experimental setup: Details
- Lattices
  - Generated by HTK (Young et al., 2006)
    - Large-vocabulary triphone-based continuous speech recognizer
  - Rescored with a trigram language model
- 1-best transcripts
  - Decoded from the rescored lattices
  - Word error rate: 48.1%
- Words stemmed with the Porter stemmer
- Other tools used
  - AT&T FSM (Mohri et al., 1998), SRILM (Stolcke, 2002), CMU Lemur toolkit
25
Experimental setup: Retrieval
- Smoothing parameter
  - λ = 0.7 — good for verbose queries (Zhai & Lafferty, 2004)
- Lattice pruning thresholds Θdoc and Θqry
  - Vary on development queries, use the best value on test queries
- Stop word removal: used
  - no stopping
  - stopping with a 319-word list from U. of Glasgow (gla)
  - stopping with the 571-word list used in the SMART system (smart)
26
Experimental setup: Retrieval
- Retrieval performed using
  - 1-best transcripts of exemplars & documents (1-best → 1-best)
  - exemplar 1-best, document lattices (1-best → Lat)
  - exemplar lattices, document 1-best (Lat → 1-best)
  - lattice counts of exemplars and documents (Lat → Lat): our proposed method
- Also tried
  - reference transcripts of exemplars & documents (Ref → Ref)
  - original Fisher topic specs. for queries (Top → Ref, Top → 1-best, Top → Lat)
27
Experimental setup: Evaluation
- Evaluation method
  - Results ranked by relevance score, compared with ground truth relevance judgements
- Evaluation measure: mean average precision (MAP)
  MAP = (1/L) Σ_{1≤i≤L} (1/Ri) Σ_{1≤j≤Ri} j / r_{i,j}
  where
  - L = no. of queries
  - Ri = no. of documents relevant to the ith query
  - r_{i,j} = position of the jth relevant document in the ranked list output for the ith query
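A small sketch of the MAP computation implied by the definitions above; the rank positions in the example are hypothetical:

```python
def mean_average_precision(rel_positions):
    """MAP over L queries.

    rel_positions[i] is the sorted list of ranked-list positions
    r_{i,1} < r_{i,2} < ... of the R_i relevant documents for query i.
    """
    ap_sum = 0.0
    for positions in rel_positions:
        ri = len(positions)  # R_i
        # Precision at the j-th relevant document is j / r_{i,j}
        ap_sum += sum((j + 1) / r for j, r in enumerate(positions)) / ri
    return ap_sum / len(rel_positions)  # average over L queries

# One query whose two relevant documents appear at ranks 1 and 3:
# AP = (1/1 + 2/3) / 2 = 5/6
map_score = mean_average_precision([[1, 3]])
```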
28
Outline
- Introduction
- Related work
- Main contributions of our work
- Retrieval methods
- Experimental setup
- Experimental results
- Conclusions and future work
29
Experimental results: MAP without stop word removal
[Chart: MAP of test queries — original Fisher topic specs. (no stopping) vs. exemplars (no stopping)]
  Top → 1-best: 0.7613   Top → Lat: 0.7723   Top → Ref: 0.8149   Ref → Ref: 0.7468
  1-best → 1-best: 0.6958   1-best → Lat: 0.7009   Lat → 1-best: 0.7023   Lat → Lat: 0.7079
- Statistical significance testing — 1-tailed t-test, Wilcoxon test
  - Lat → Lat vs. 1-best → 1-best: improvement significant at the 99.95% level
- However, original topic specs. are still better – the nature of exemplars presents difficulties for retrieval
30
Experimental results: MAP with stop word removal
[Chart: MAP of test queries for exemplars, by stop word list]
  Method          | no stopping | gla    | smart
  Ref → Ref       | 0.7468      | 0.7630 | 0.7781
  1-best → 1-best | 0.6958      | 0.7193 | 0.7406
  1-best → Lat    | 0.7009      | 0.7283 | 0.7499
  Lat → 1-best    | 0.7023      | 0.7285 | 0.7487
  Lat → Lat       | 0.7079      | 0.7364 | 0.7569
- Statistical significance testing
  - With the gla stop list: Lat → Lat better than 1-best → 1-best at the 99.99% level
  - With the smart stop list: better at the 99.95% level
31
Outline
- Introduction
- Related work
- Main contributions of our work
- Retrieval methods
- Experimental setup
- Experimental results
- Conclusions and future work
32
Conclusions and future work
- Conclusions
  - Presented a method for query-by-example SDR using lattices, under a statistical retrieval model
  - Significant improvement over using 1-best transcripts
  - Improvement is consistent, even with stop word removal
- Future work
  - Extend to other speech processing tasks, e.g. spoken document classification
33
Thank you!