DIVINES – Speech Rec. and Intrinsic Variation W.S.May 20, 2006 Richard Rose DIVINES SRIV Workshop...


DIVINES – Speech Rec. and Intrinsic Variation W.S. May 20, 2006 Richard Rose

DIVINES SRIV Workshop

The Influence of Word Detection Variability on IR Performance in Automatic Audio Indexing of Course Lectures

Saturday May 20, 2006

Richard Rose (1), Renato Rispoli (1), and Jon Arrowood (2)

(1) McGill University, Dept of ECE, Montreal, QC Canada

(2) Nexidia Inc., Atlanta, GA USA


Indexing Audio Lectures

• Existing multimedia resources have the potential to make recorded University lectures and seminars accessible online to a wider audience

• It is important that the audio lectures be searchable …

• … but, human annotation of large corpora is expensive

• Automatic Speech Recognition (ASR) based tools can be used to facilitate search of the un-transcribed audio material


An Audio Search Tool for Course Lectures

Text Query Term

Retrieved Segments from Lecture Audio Files

• Click to listen to audio segment

Synchronized Presentation Slides

User Interface Developed by Nexidia


Audio Indexing of Lectures - Motivation

• Goal – Provide Disabled and Non-Disabled Students and Scholars Access to a Large Collection (thousands of hours) of Audio Lectures and Seminars

• Multimedia – Permit Synchronization and Interpretation of audio with Lecture Slides and Video Content

• Challenges – Large variability in dialect, speaking style, recording conditions, and task domain


Issues in Audio Indexing

• Acoustic
– Extraction of query terms from audio
– Must be extremely fast during search (>>1,000 x real-time)

• Information Retrieval (IR)
– Definition of relevance measure
– Score query against hypothesized audio segment

• Task Domain
– Definition of the notion of relevance
– When does relevant segment begin and end?

• Evaluation Metrics
– Acoustic: ASR word error rate, Keyword detection performance
– IR: Precision / Recall of relevant segments
– Task Domain: Increase in Productivity for the target user community
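As an illustration of the IR metric named above, the following minimal sketch computes precision and recall for a single query. The function name and the segment identifiers are hypothetical and are not taken from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: segment ids returned by the search (hypothetical ids)
    relevant:  segment ids labeled relevant by the domain experts
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: fraction of retrieved segments that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant segments that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    p, r = precision_recall(["s3", "s7", "s9", "s12"], ["s7", "s12", "s20"])
    print(f"precision={p:.2f} recall={r:.2f}")
```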


Audio Indexing Task Domains

• Several techniques have been applied to indexing of spoken audio in several task domains:

• [Rose, 1991]:
– Task: Topic Spotting from Conversational Speech
– Method: Keyword spotting

• [Foote et al, 1997]:
– Task: Retrieval of multimedia mail messages (Video mail browser)
– Method: Phone lattice based open vocabulary indexing

• [Garofolo, 2000]:
– Task: Spoken Document Retrieval (SDR) from Broadcast News
– Method: Large vocabulary continuous speech recognition (LVCSR)

• Course Lectures:
– How to define a topic of interest?
– How to segment a continuous lecture by topic?
– How to define query terms and extract them from audio?


A Preliminary Study of Audio Indexing

• Phone Lattice-Based Search Engine
– Off-line Lattice Generation (50 x real-time):
  • Obtain phonetic lattice from utterance
– Search (100,000 x real-time):
  • Submit text based keyword queries,
  • Obtain phonetic expansion,
  • Find best match in phone lattice
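The search steps listed above (text query, phonetic expansion, best match) can be illustrated with a toy sketch. It is a deliberate simplification and not the Nexidia engine: it matches against a single 1-best phone string instead of a lattice, and it scores by edit distance instead of acoustic posteriors. The lexicon entry, the phone string, and the function names are all hypothetical.

```python
# Hypothetical pronunciation lexicon (ARPAbet-style phones).
LEXICON = {"dielectric": "d ay l eh k t r ih k".split()}


def best_match(query_phones, audio_phones):
    """Approximate substring match by dynamic programming.

    Returns (edit_distance, end_index) of the best-matching region,
    where the match may start anywhere in audio_phones.
    """
    # Row 0 is all zeros: a match is free to begin at any position.
    prev = [0] * (len(audio_phones) + 1)
    for i, q in enumerate(query_phones, 1):
        cur = [i] + [0] * len(audio_phones)
        for j, p in enumerate(audio_phones, 1):
            cur[j] = min(prev[j - 1] + (q != p),  # substitution or match
                         prev[j] + 1,             # query phone not found
                         cur[j - 1] + 1)          # extra phone in audio
        prev = cur
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


if __name__ == "__main__":
    audio = "dh ax d ay l eh k t r ih k k aa n s t ax n t".split()
    print(best_match(LEXICON["dielectric"], audio))
```

A lower distance means a closer match; a real system would return a ranked list of candidate regions with scores rather than a single best region.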


Evaluating Information Retrieval Performance

• Database – Twelve hours of lectures from McGill ECE Photonics Course (Prof. Andrew Kirk)

• Domain Experts – Course TAs

• Target Domain – Example questions taken from course material …
– Sample question: “Explain the modal properties of a conducting waveguide from the point of view of destructive and constructive interference”

• Relevance Labeling
– Domain experts identify lecture segments that are relevant to question
– A lecture segment is the audio that overlaps a given lecture slide
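Since a lecture segment is defined as the audio overlapping one slide, a detected keyword can be assigned to a segment by looking up its time stamp among the slide start times. The sketch below assumes the slide change times are available as a sorted list of seconds; that list, the function name, and the example values are hypothetical.

```python
from bisect import bisect_right


def segment_index(slide_starts, t):
    """Return the index of the slide segment that contains time t.

    slide_starts: sorted start times (seconds) of each slide's audio
    t:            time stamp (seconds) of a hypothesized keyword
    """
    if not slide_starts or t < slide_starts[0]:
        raise ValueError("time stamp precedes the first slide")
    # bisect_right counts the slides that started at or before t.
    return bisect_right(slide_starts, t) - 1


if __name__ == "__main__":
    starts = [0.0, 60.0, 150.0]  # hypothetical slide change times
    print(segment_index(starts, 75.2))
```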


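Relevance Measure

• Given an audio segment $k$ of length $t_k$ seconds,

• For a query $Q_l$ containing $N_l$ query terms,

• Obtain $N^o_{ikl}$ hypothesized occurrences for term $W_i$ with acoustic posterior scores $S_{i,1}, S_{i,2}, \ldots, S_{i,N^o_{ikl}}$

• Combine weighted posterior scores to obtain a measure of relevance for segment $k$ w.r.t. query $Q_l$:

\[
I_{k,l} = \frac{1}{t_k R_s} \sum_{i=1}^{N_l} \frac{1}{1 - FOM_i} \sum_{j=1}^{N^o_{ikl}} S_{i,j}
\]

where $FOM_i$ is the average Figure of Merit for term $W_i$ and $R_s$ is the speaking rate in words/sec.

[Figure: audio segment $k$ showing the hypothesized occurrences of term $i$ and the acoustic scores $S_{i,j}$ for query term $i$]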


Relevance Measure - Normalization

• There are two normalization components:
– Acoustic Confidence Normalization: $F_i = (1 - FOM_i)$
  • Function of the average Figure of Merit observed for query term $W_i$
  • FOM: Average of the detection prob. over a range of false alarm rates
– Document Length Normalization: $t_k R_s$
  • Estimate of the number of words in audio segment $k$
  • Relies on estimate of speaking rate: $R_s = 2.1$ words/sec.

• Relevance Measure:

\[
I_{k,l} = \frac{1}{t_k R_s} \sum_{i=1}^{N_l} \frac{1}{1 - FOM_i} \sum_{j=1}^{N^o_{ikl}} S_{i,j}
\]
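A minimal sketch of how such a relevance score might be computed is given below. It assumes the measure sums the posterior scores of each query term, divides by the acoustic confidence term (1 - FOM_i), and divides the total by the estimated word count t_k * R_s. Placing (1 - FOM_i) in the denominator is an assumption read from the slide's description of it as a normalization component. The function name and the example scores are hypothetical.

```python
def relevance(term_scores, term_fom, segment_seconds, speaking_rate=2.1):
    """Relevance of one audio segment to one query (assumed form).

    term_scores:     {term: [posterior scores of hypothesized occurrences]}
    term_fom:        {term: average Figure of Merit as a fraction in [0, 1)}
    segment_seconds: segment length t_k in seconds
    speaking_rate:   R_s in words/sec (2.1 on the slide)
    """
    est_words = segment_seconds * speaking_rate  # document length estimate
    if est_words <= 0:
        raise ValueError("segment length and speaking rate must be positive")
    total = 0.0
    for term, scores in term_scores.items():
        # Assumed acoustic confidence normalization: divide by (1 - FOM_i).
        total += sum(scores) / (1.0 - term_fom[term])
    return total / est_words


if __name__ == "__main__":
    scores = {"waveguide": [0.5, 0.5], "modal": [0.8]}  # hypothetical
    fom = {"waveguide": 0.5, "modal": 0.6}              # hypothetical
    print(relevance(scores, fom, segment_seconds=10.0))
```

Dividing by the estimated word count keeps long segments from scoring higher only because they contain more speech.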


Acoustic Variability

• Impact of length of phonetic baseform on word detection performance

• Effect of word length on detection performance

[Figure: Prob. of Detection by word duration in phones]

Figure of Merit vs. Baseform Length:

Baseform Phones    FOM (%)
5 or less          50.96
6-8                61.26
9 or more          73.03

• Figure of Merit (FOM): Average $P_d$ over the range from 0 to 10 false alarms per keyword per hour
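The FOM definition above can be sketched as the area under the stepwise detection curve between 0 and 10 false alarms per keyword per hour, divided by 10. The version below is one common way to compute such an average and is not necessarily the exact procedure used in the study; the function name and the example detections are hypothetical.

```python
def figure_of_merit(detections, n_true, hours, max_fa_rate=10.0):
    """Average detection probability over 0..max_fa_rate FA/KW/HR.

    detections: list of (score, is_hit) putative hits for one keyword
    n_true:     number of true occurrences of the keyword in the audio
    hours:      duration of the searched audio in hours
    """
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    hits, false_alarms = 0, 0
    area, prev_rate = 0.0, 0.0
    for _, is_hit in ordered:
        if is_hit:
            hits += 1
            continue
        false_alarms += 1
        rate = min(false_alarms / hours, max_fa_rate)
        # P_d reached so far holds until this false alarm rate.
        area += (hits / n_true) * (rate - prev_rate)
        prev_rate = rate
        if rate >= max_fa_rate:
            break
    # The final P_d holds for the rest of the range.
    area += (hits / n_true) * (max_fa_rate - prev_rate)
    return area / max_fa_rate


if __name__ == "__main__":
    dets = [(0.9, True), (0.8, False), (0.7, True)]  # hypothetical
    print(figure_of_merit(dets, n_true=2, hours=1.0))
```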


Acoustic Variability

• Impact of accuracy of phonetic baseforms on word spotting performance

• Word pronunciation: comparison of 2 phonetic expansions of the word “dielectric”

d iy l eh k t r ih k

d ay l eh k t r ih k

[Figure: Probability of Detection for "dielectric" with Different Phonetic Spellings. Pd (0 to 0.6) plotted against False Alarms per Keyword per Hour (FA/KW/HR, 0 to 10), with one curve labeled "dielectric" and one labeled "dyelectric"]


IR Performance

• Define a relevance metric based on normalized frequency of occurrence of keywords chosen by domain experts

• Rank segments of messages based on relevance metric

• Plot Results …

% queries with at least one relevant document in top R ranks:

Rank (R)    Text      Speech
5           75%       58.33%
10          83.33%    66.67%
15          83.33%    75%
20          91.67%    83.33%
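The quantity tabulated above, the percentage of queries with at least one relevant document in the top R ranks, can be computed as in the following sketch. The data structures, the function name, and the toy queries are hypothetical; they only illustrate the metric and do not reproduce the reported numbers.

```python
def success_at_rank(ranked, relevant, r):
    """Percent of queries with >= 1 relevant document in the top r ranks.

    ranked:   {query: [document ids, best first]}
    relevant: {query: set of document ids labeled relevant}
    """
    if not ranked:
        raise ValueError("no queries given")
    hits = sum(1 for q, docs in ranked.items()
               if set(docs[:r]) & relevant[q])
    return 100.0 * hits / len(ranked)


if __name__ == "__main__":
    ranked = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}  # hypothetical
    relevant = {"q1": {"c"}, "q2": {"z"}}                    # hypothetical
    for r in (2, 3):
        print(r, success_at_rank(ranked, relevant, r))
```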