Pei-Ning Chen, NTNU CSIE SLP Lab

Effects of Query Expansion for Spoken Document Passage Retrieval Tomoyosi Akiba, Koichiro Honda INTERSPEECH 2011 Pei-Ning Chen NTNU CSIE SLP Lab


Page 1: Pei-Ning Chen NTNU CSIE SLP Lab

Effects of Query Expansion for Spoken Document Passage Retrieval

Tomoyosi Akiba, Koichiro Honda
INTERSPEECH 2011

Pei-Ning Chen
NTNU CSIE SLP Lab

Page 2

Outline

• Introduction
• Passage Retrieval for Spoken Document
• Query Expansion for SDR
• Experiments
• Conclusions

Page 3

Introduction

• Because confirming the content of a spoken document requires playing back its audio data, browsing speech data is much more difficult and time-consuming than browsing textual data.

• The authors apply relevance models, a query expansion method, to the spoken document passage retrieval task. They adapt the original relevance model to passage retrieval and also extend it to benefit from massive collections of Web documents for query expansion.

Page 4

Retrieval Methods for Passage Retrieval

• Using the Neighboring Context to Index the Passage
  – Passages from the same lecture may be related to each other in the passage retrieval task, whereas the target documents are considered to be independent of each other in a conventional document retrieval task.

• Penalizing Neighboring Retrieval Results
  – When context indexing is applied, neighboring passages are liable to be retrieved at the same time because they share the same indexing words.
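
The two ideas on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `window` size, and the multiplicative `penalty` factor are hypothetical, passages are assumed to be token lists in lecture order, and scores are assumed to be positive.

```python
from collections import Counter


def context_index(passages, window=1):
    """Index each passage with its own terms plus those of its neighbours.

    `passages` is a list of token lists in lecture order (assumption).
    `window` is the number of neighbours taken on each side (assumption).
    """
    indexed = []
    for i in range(len(passages)):
        lo = max(0, i - window)
        hi = min(len(passages), i + window + 1)
        bag = Counter()
        for j in range(lo, hi):
            bag.update(passages[j])
        indexed.append(bag)
    return indexed


def penalize_neighbors(ranked, penalty=0.5, window=1):
    """Scale down a passage's score if a neighbour is already ranked above it.

    `ranked` is a list of (passage_position, score) pairs sorted by
    descending score, with positions taken from a single lecture.
    The multiplicative penalty is an illustrative choice only.
    """
    seen = []
    rescored = []
    for pid, score in ranked:
        if any(abs(pid - other) <= window for other in seen):
            score *= penalty
        seen.append(pid)
        rescored.append((pid, score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

With `window=1`, passage 4 is pushed below passage 10 when passage 3 has already been retrieved with a higher score, which is the behaviour the penalty is meant to produce.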

Page 5

Query Expansion for SDR

• Relevance Models

• Extending Relevance Models to Context Indexing

• Extending Relevance Models using Web
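
The slide's equations did not survive in this transcript. As a rough illustration of the general relevance-model idea, the sketch below estimates P(w|R) as a query-likelihood-weighted mixture of smoothed document language models. It assumes a uniform document prior and Jelinek-Mercer smoothing with a hypothetical weight `lam`; it is not necessarily the variant used in the paper, and the function names are illustrative.

```python
from collections import Counter


def doc_language_model(doc, collection_lm, lam=0.5):
    """Return a smoothed unigram model P(w|D) as a function of w."""
    counts = Counter(doc)
    length = len(doc)

    def prob(word):
        ml = counts[word] / length if length else 0.0
        return lam * ml + (1.0 - lam) * collection_lm.get(word, 0.0)

    return prob


def relevance_model(query, docs, lam=0.5):
    """Estimate P(w|R) over the vocabulary of `docs` (lists of tokens)."""
    term_counts = Counter(w for doc in docs for w in doc)
    total = sum(term_counts.values())
    collection_lm = {w: c / total for w, c in term_counts.items()}

    models, weights = [], []
    for doc in docs:
        prob = doc_language_model(doc, collection_lm, lam)
        likelihood = 1.0
        for q in query:
            likelihood *= prob(q)  # query likelihood P(Q|D)
        models.append(prob)
        weights.append(likelihood)

    norm = sum(weights)
    if norm == 0.0:
        return {}
    # Mixture of document models, weighted by normalised query likelihood.
    return {
        w: sum(wt * prob(w) for wt, prob in zip(weights, models)) / norm
        for w in collection_lm
    }
```

Terms with the highest P(w|R) would then be candidates for expanding the query.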

Page 6

• Linear interpolation:
  – the two models are linearly interpolated:

• Document weighting:
  – the Web model is used to weight the target documents:
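
The linear interpolation variant can be illustrated with a minimal sketch. The formula from the slide is missing here, so the code assumes the generic form `lam * P_target(w) + (1 - lam) * P_web(w)`; the weight `lam`, the dictionary representation, and the function names are hypothetical.

```python
def interpolate_models(target_model, web_model, lam=0.5):
    """Linearly interpolate two term distributions given as dicts.

    `lam` is the weight on the target-collection model (assumed form).
    """
    vocab = set(target_model) | set(web_model)
    return {
        w: lam * target_model.get(w, 0.0) + (1.0 - lam) * web_model.get(w, 0.0)
        for w in vocab
    }


def top_expansion_terms(model, k=10):
    """Return the k most probable terms as expansion candidates."""
    ranked = sorted(model.items(), key=lambda item: item[1], reverse=True)
    return [w for w, _ in ranked[:k]]
```

If both inputs are proper distributions, the interpolated model also sums to one, so it can be used directly wherever a single relevance model is expected.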

Page 7

Experiments

Page 8

Experiments

Page 9

Conclusions

• They applied relevance models to the spoken document passage retrieval task.
• They also extended the relevance model to take advantage of a massive collection of Web documents for query expansion.
• To improve the performance of their Web extension of relevance models, filtering out noisy Web documents might be necessary.
• In future work, the authors plan to apply Web document filtering methods to select only the documents most related to the target documents.

Page 10

Speech Indexing Using Semantic Context Inference

Chien-Lin Huang, Bin Ma, Haizhou Li and Chung-Hsien Wu
INTERSPEECH 2011

Page 11

Outline

• Introduction
• Semantic Context Inference
• Experiments
• Conclusions

Page 12

Introduction

• The indexing techniques of text-based information retrieval have been widely adopted in spoken document retrieval.

• However, due to imperfect speech recognition results, out-of-vocabulary words, and ambiguity in homophones and word tokenization, conventional text-based indexing techniques are not always appropriate for spoken document retrieval.

Page 13

Semantic Context Inference (SCI)

• They proposed the semantic context inference representation, which finds the semantic relations between terms and uses them to suggest semantic term expansion for speech indexing.

Page 14

Semantic relation matrix

• A spoken document database comprises an accumulation of spoken documents, from which the document-by-term matrix is constructed.

Page 15

SCI for indexing

• By summing up all the semantic inference vectors for the spoken document d, we finally obtain the semantic context inference vector.
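
The summation described above can be sketched as follows. The slides do not preserve how the semantic relation matrix is actually computed, so this example assumes, purely for illustration, that the relation between two terms is the cosine similarity of their columns in the document-by-term matrix. The function names and the count-weighted sum are likewise assumptions.

```python
import math


def semantic_relation_matrix(doc_term):
    """Build a term-by-term relation matrix from a document-by-term matrix.

    `doc_term` is a list of rows (documents), each a list of term counts.
    Cosine similarity between term columns is an assumed relation measure.
    """
    n_terms = len(doc_term[0])
    cols = [[row[t] for row in doc_term] for t in range(n_terms)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in cols]
    rel = [[0.0] * n_terms for _ in range(n_terms)]
    for i in range(n_terms):
        for j in range(n_terms):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                rel[i][j] = dot / (norms[i] * norms[j])
    return rel


def sci_vector(doc_counts, rel):
    """Sum the inference vectors of every term occurring in a document.

    The inference vector of term t is taken to be row t of `rel`,
    scaled by how often t occurs in the document (assumption).
    """
    n_terms = len(rel)
    vec = [0.0] * n_terms
    for t, count in enumerate(doc_counts):
        if count:
            for j in range(n_terms):
                vec[j] += count * rel[t][j]
    return vec
```

The resulting vector gives non-zero weight to terms that never appear in the recognized transcript but co-occur with terms that do, which is how such an index can compensate for recognition errors.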

Page 16

Retrieval model

• For spoken document retrieval, the authors adopt the vector space model, which has been widely used in information retrieval because it offers highly efficient retrieval with a feature-vector representation of each document.
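
A minimal sketch of vector space retrieval is given below, assuming cosine similarity as the matching function (the slide does not state which similarity measure the authors use). The function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted by descending similarity to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

Any document representation with a fixed dimensionality, including a re-weighted indexing vector, can be passed in as `doc_vecs` without changing the ranking code.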

Page 17

Experiments

• To measure both the accuracy of the retrieved documents and the ranking positions of the relevant documents, they evaluate with mean average precision (MAP).
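
For reference, mean average precision can be computed as in the sketch below. The function names and the input layout (a ranked list plus a set of relevant document IDs per query) are illustrative assumptions.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant IDs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Because precision is accumulated at the rank of each relevant document, MAP rewards systems that place relevant documents near the top of the list.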

Page 18

Page 19

Conclusions

• The proposed semantic context inference explores latent semantic information and adds semantically related terms to the speech index. The semantic context inference vector can be regarded as a re-weighted indexing vector, which acts as a form of query expansion to overcome speech recognition errors.