Post on 19-Feb-2018

An Efficient Distance Measure for Comparing Phoneme Sequences

Baruchi Har-Lev
Speech Recognition Day, 8/6/2010

Multi-Stage LVCSR

[Block diagram] Input Speech -> Phoneme Decoder -> Recognized Phoneme Sequence -> Vocabulary Reduction -> Reduced Vocabulary -> CSR -> Textual Output

Blocks inside Vocabulary Reduction: Hypothesis Creation, Search Space Creation, Distance Measure, Probable Word Selection
Additional input: Lexicon

Distance measure: O(n^2)

Overview: The Problem

[Figure: search grid. Rows: lexicon words Word1 ... Word100k (a generic row is Wordi). Columns: recognized phoneme sequence Ph1 Ph2 ... Phk ... PhM.]

Example
Lexicon size = 100k
Phoneme sequence length = 250
Grid size = 25M
Avg. hypothesis length = 7
Number of operations > 1,000,000,000
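
The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes that every grid cell costs one Levenshtein table of about 7 x 7 entries (hypothesis length times a lexicon word of similar length). That per-cell cost model is an assumption made here for illustration, not something the slide states.

```python
# Back-of-the-envelope operation count for the exhaustive search.
lexicon_size = 100_000     # words in the lexicon (from the slide)
sequence_length = 250      # recognized phonemes in the utterance (from the slide)
avg_hyp_length = 7         # average hypothesis length in phonemes (from the slide)

# One grid cell per (lexicon word, position in the phoneme sequence) pair.
grid_size = lexicon_size * sequence_length

# Assumption: one O(n^2) Levenshtein table of about 7 x 7 entries per grid cell.
ops_per_cell = avg_hyp_length ** 2
total_ops = grid_size * ops_per_cell

print(f"grid size  = {grid_size:,}")
print(f"operations = {total_ops:,}")
```

Under this assumption the grid has 25M cells and the total passes one billion operations, which agrees with the orders of magnitude quoted on the slide.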

Overview: The Goal

Reduce the computational complexity of the distance measure while keeping the performance loss to a minimum

Test Database

Parameter                    Macrophone   VoiceMail
# of Utterances              3140         2485
Utterance Length             3.5 sec      20+ sec
# of Words in Utterance      8-9          70-80
# of Phonemes in Utterance   40           250

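Levenshtein Distance

Insertion, Deletion, Substitution
Equal weight probabilities
O(NM) ≈ O(n^2)

[Figure: one step of the alignment grid between the reference phoneme string and the tested phoneme string. Labels: Ph_{i-1}, Ph_i, Ph_{j-1}, Ph_j, and the three moves Sub(Ph_i, Ph_j), Ins(Ph_j), Del(Ph_i).]

$D(Ph_i, Ph_j) = \min\{\mathrm{Sub}, \mathrm{Del}, \mathrm{Ins}\}$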
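
The recurrence on this slide is the standard dynamic-programming edit distance. A minimal sketch with unit costs for all three operations (the "equal weight" case) follows; the function name and the ARPAbet-style phoneme labels in the example are chosen here for illustration.

```python
from typing import Sequence


def levenshtein(ref: Sequence[str], test: Sequence[str]) -> int:
    """Unit-cost edit distance between two phoneme sequences, O(N*M) time."""
    n, m = len(ref), len(test)
    # Row for the empty reference prefix: j insertions reach test[:j].
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m  # i deletions reach the empty test prefix
        for j in range(1, m + 1):
            sub = prev[j - 1] + (ref[i - 1] != test[j - 1])  # substitution or match
            dele = prev[j] + 1                               # deletion
            ins = curr[j - 1] + 1                            # insertion
            curr[j] = min(sub, dele, ins)
        prev = curr
    return prev[m]


# "about" vs "above": two substitutions.
print(levenshtein(["ax", "b", "aw1", "t"], ["ax", "b", "ah1", "v"]))
```

Only two rows of the table are kept at a time, so memory is O(M) while time stays O(NM).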

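Levenshtein Distance in the Exhaustive Process

[Figure: full search grid. Rows: lexicon words W1 ... W100k. Columns: the recognized phoneme sequence (Y T N ...). Each grid cell is a Levenshtein table, starting at 0, between a lexicon word (X1 ... XN) and a hypothesis from the sequence (Y1 ... YM).]

Number of D(W_i, Hyp_j) computations: Grid Size, each O(n^2)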

Results

Distance Method        Creation Time [sec]   Coverage [%]
Levenshtein Distance   8                     85%

DB           Mean Sequence Size [Phones]   Lexicon Size   Reduced Vocabulary Size
Macrophone   41                            100k           50k

A-Priori Distance Differences

Choose a canonical word: Word0
Compute distances (Levenshtein)
  A-priori distance computation
  Real time computation
Sum of results

D(W_i, Hyp) is obtained by combining two terms:
  D(W_i, Word0): a-priori
  D(Word0, Hyp): one time in RT

A-Priori Distance Differences: Full Search

O(Grid Size)

[Figure: two columns joined by "+".
  A-priori: W1 ... W100k, each with its stored D(W_i, Word0).
  RT: Hyp1 ... HypN, each with its D(Hyp_i, Word0).]

O(#Hyp)   O(1)
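
The idea of replacing a full Levenshtein computation per (word, hypothesis) pair with one stored distance and one run-time distance can be sketched as follows. The slides do not spell out how the two terms are combined, so this sketch does not reproduce the speaker's formula. It computes the two bounds that the triangle inequality guarantees for any metric, which unit-cost Levenshtein is. The toy lexicon, the choice of "about" as Word0, and all function names are assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (pa != pb), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def precompute(lexicon: Dict[str, Sequence[str]], word0: Sequence[str]) -> Dict[str, int]:
    """A-priori stage: D(W_i, Word0) for every lexicon word, computed once offline."""
    return {word: levenshtein(phones, word0) for word, phones in lexicon.items()}


def bounds(d_word: int, d_hyp: int) -> Tuple[int, int]:
    """Triangle-inequality bounds on D(W_i, Hyp) from the two distances to Word0."""
    return abs(d_word - d_hyp), d_word + d_hyp


# Hypothetical toy lexicon (phoneme labels are illustrative).
LEXICON = {
    "a": ["ax"],
    "about": ["ax", "b", "aw1", "t"],
    "above": ["ax", "b", "ah1", "v"],
    "chain": ["ch", "ey1", "n"],
    "charm": ["ch", "aa1", "r", "m"],
}
WORD0 = LEXICON["about"]               # assumed choice of the canonical word
APRIORI = precompute(LEXICON, WORD0)   # offline table

hyp = ["ax", "b", "ah1", "t"]
d_hyp = levenshtein(hyp, WORD0)        # the only Levenshtein call per hypothesis at run time
for word, d_word in APRIORI.items():
    low, high = bounds(d_word, d_hyp)  # O(1) per lexicon word
    print(word, low, high)
```

The bounds can be loose, since words at very different true distances from the hypothesis may share the same distance to Word0. That looseness is one plausible reason for a coverage drop with this kind of shortcut.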

Results

Distance Method        Creation Time [sec]   Coverage [%]
Levenshtein Distance   8                     85%
A-priori Differences   0.32                  53%

DB           Mean Sequence Size [Phones]   Lexicon Size   Reduced Vocabulary Size
Macrophone   41                            100k           50k

96% Time Reduction, 32% Coverage Loss

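Tree Distance

Advantages:
  Reduces number of hypotheses
  “Breaks” the distance measure
Disadvantages:
  Complex interface

[Figure: lexicon stored as a phoneme tree. Each node shows its phoneme, its level, and the word that ends there ({} = none).]

0 (root)
  ax   L1  “a”
    b    L2  {}
      aw1  L3  {}
        t    L4  “about”
      ah1  L3  {}
        v    L4  “above”
  Ch   L1  {}
    ey1  L2  {}
      n    L3  “chain”
    aa1  L2  {}
      r    L3  {}
        m    L4  “charm”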
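
The tree on this slide stores the lexicon by shared phoneme prefixes. One common way to combine such a tree with edit distance is to carry one row of the Levenshtein table down each branch, so that a shared prefix is processed only once. The sketch below shows that general technique. It is not confirmed to be the speaker's implementation, and the class and function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


class Node:
    """One phoneme of the lexicon tree."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.word: Optional[str] = None  # word ending at this node, if any


def build_tree(lexicon: Dict[str, Sequence[str]]) -> Node:
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.children.setdefault(ph, Node())
        node.word = word
    return root


def tree_distances(root: Node, hyp: Sequence[str]) -> Dict[str, int]:
    """Unit-cost Levenshtein distance from hyp to every word stored in the tree."""
    results: Dict[str, int] = {}
    first_row = list(range(len(hyp) + 1))  # row for the empty lexicon prefix

    def visit(node: Node, ph: str, prev_row: List[int]) -> None:
        # Extend the parent's row by one lexicon phoneme.
        row = [prev_row[0] + 1]
        for j in range(1, len(hyp) + 1):
            sub = prev_row[j - 1] + (hyp[j - 1] != ph)
            row.append(min(sub, prev_row[j] + 1, row[j - 1] + 1))
        if node.word is not None:
            results[node.word] = row[-1]
        for child_ph, child in node.children.items():
            visit(child, child_ph, row)

    for ph, child in root.children.items():
        visit(child, ph, first_row)
    return results


# The five words from the slide's tree (phoneme labels as shown there).
LEXICON = {
    "a": ["ax"],
    "about": ["ax", "b", "aw1", "t"],
    "above": ["ax", "b", "ah1", "v"],
    "chain": ["Ch", "ey1", "n"],
    "charm": ["Ch", "aa1", "r", "m"],
}
tree = build_tree(LEXICON)
print(tree_distances(tree, ["ax", "b", "ah1", "t"]))
```

The saving depends on how much prefix sharing the lexicon offers. With short words (about 7 phonemes on average) the shared part is small, which would be consistent with the lack of improvement reported for this method.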

Results

Distance Method        Creation Time [sec]   Coverage [%]
Levenshtein Distance   8                     85%
A-priori Differences   0.32                  53%
Tree Distance          8                     85%

DB           Mean Sequence Size [Phones]   Lexicon Size   Reduced Vocabulary Size
Macrophone   41                            100k           50k

No Improvement

A Glimpse at Different Disciplines

Stringology (computer science)
  O(n^1.4 N^1.2)
  Good compression rate
  Avg. length (lexicon word) ≈ 7 phones
  Bad compression rate
Biology
  BLAST
Computer network

Diagonal Distance

Length(Hyp.) = Length(Word)
Basic comparisons
Uses substitutions only
Does not use:
  Insertion
  Deletion

[Figure: alignment grid, starting at 0, of a lexicon word (X1 ... XN) against a hypothesis from the sequence (Y1 ... YM).]
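
With equal lengths and substitutions only, the distance reduces to a position-by-position comparison along the diagonal of the alignment grid, which is O(n) instead of O(n^2). A minimal sketch follows. The sliding-window helper is an assumption about how hypotheses of matching length might be taken from the recognized sequence, and both function names are hypothetical.

```python
from typing import Optional, Sequence


def diagonal_distance(word: Sequence[str], hyp: Sequence[str]) -> int:
    """Number of mismatching positions; insertions and deletions are not modeled."""
    if len(word) != len(hyp):
        raise ValueError("diagonal distance needs sequences of equal length")
    return sum(w != h for w, h in zip(word, hyp))


def best_window(word: Sequence[str], sequence: Sequence[str]) -> Optional[int]:
    """Smallest diagonal distance between word and any same-length window of sequence.

    Assumption: every contiguous window of len(word) phonemes is a hypothesis.
    Returns None when the sequence is shorter than the word.
    """
    k = len(word)
    return min(
        (diagonal_distance(word, sequence[s:s + k]) for s in range(len(sequence) - k + 1)),
        default=None,
    )


print(diagonal_distance(["ax", "b", "aw1", "t"], ["ax", "b", "ah1", "v"]))
print(best_window(["b", "ah1", "v"], ["ax", "b", "ah1", "v", "t"]))
```

The price of the speed-up is that a single inserted or deleted phoneme shifts the rest of the word off the diagonal, so every later position can count as a mismatch.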

Results

Distance Method        Creation Time [sec]   Coverage [%]
Levenshtein Distance   8                     85%
A-priori Differences   0.32                  53%
Tree Distance          8                     85%
Diagonal Distance      0.9                   83%

DB           Mean Sequence Size [Phones]   Lexicon Size   Reduced Vocabulary Size
Macrophone   41                            100k           50k

90% Time Reduction, 2% Coverage Loss

Weighted Distance

Additional data: phoneme engine characteristics
  Confusion matrix
  Posterior probabilities
Larger distinction between phones
Disadvantage
  Sensitivity to phoneme engine characteristics

[Figure: phoneme confusion matrix, Ph1 ... PhN by Ph1 ... PhN, indexed by i and j, with additional Ins and Del entries.]
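
One way to use a confusion matrix in the diagonal comparison is to replace the 0/1 substitution cost with a cost that shrinks as the phoneme engine becomes more likely to confuse the two phonemes. The sketch below uses cost = 1 - P(recognized | spoken). That weighting rule, the probability values, and all names are assumptions made for illustration; the slides do not give the actual weights.

```python
from typing import Dict, Sequence

# Hypothetical confusion probabilities P(recognized | spoken); the values are made up.
CONFUSION: Dict[str, Dict[str, float]] = {
    "aw1": {"aw1": 0.70, "ah1": 0.25, "aa1": 0.05},
}


def sub_cost(spoken: str, recognized: str, confusion: Dict[str, Dict[str, float]]) -> float:
    """Substitution cost in [0, 1]; lower when the engine often confuses the pair."""
    row = confusion.get(spoken)
    if row is None:
        # No statistics for this phoneme: fall back to the plain 0/1 cost.
        return 0.0 if spoken == recognized else 1.0
    return 1.0 - row.get(recognized, 0.0)


def weighted_diagonal_distance(
    word: Sequence[str], hyp: Sequence[str], confusion: Dict[str, Dict[str, float]]
) -> float:
    """Diagonal (substitution-only) distance with confusion-based weights."""
    if len(word) != len(hyp):
        raise ValueError("weighted diagonal distance needs sequences of equal length")
    return sum(sub_cost(w, h, confusion) for w, h in zip(word, hyp))


about = ["ax", "b", "aw1", "t"]
likely = ["ax", "b", "ah1", "t"]    # aw1 -> ah1 is a frequent confusion in the table
unlikely = ["ax", "b", "ey1", "t"]  # aw1 -> ey1 never occurs in the table
print(weighted_diagonal_distance(about, likely, CONFUSION))
print(weighted_diagonal_distance(about, unlikely, CONFUSION))
```

The unweighted diagonal distance scores both hypotheses as 1, whereas the weighted version ranks the acoustically plausible confusion as closer. The weights come from one particular phoneme engine, so they have to be re-estimated when the engine changes.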

Results

Distance Method              Creation Time [sec]   Coverage [%]
Levenshtein Distance         8                     85%
A-priori Differences         0.32                  53%
Tree Distance                8                     85%
Diagonal Distance            0.9                   83%
Diagonal Weighted Distance   0.9                   90%

DB           Mean Sequence Size [Phones]   Lexicon Size   Reduced Vocabulary Size
Macrophone   41                            100k           50k

90% Time Reduction, 5% Coverage Improvement
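
The headline figures under the results tables can be recomputed from the table rows, taking the Levenshtein row (8 sec, 85%) as the baseline. The short sketch below does that; the helper names are hypothetical. Coverage changes are expressed in percentage points.

```python
from typing import Dict, Tuple

BASELINE_TIME = 8.0      # Levenshtein creation time [sec]
BASELINE_COVERAGE = 85   # Levenshtein coverage [%]

# (creation time [sec], coverage [%]) as listed in the results tables.
METHODS: Dict[str, Tuple[float, int]] = {
    "A-priori Differences": (0.32, 53),
    "Tree Distance": (8.0, 85),
    "Diagonal Distance": (0.9, 83),
    "Diagonal Weighted Distance": (0.9, 90),
}


def time_reduction(seconds: float) -> float:
    """Relative creation-time saving versus the baseline, in percent."""
    return 100.0 * (1.0 - seconds / BASELINE_TIME)


def coverage_change(coverage: int) -> int:
    """Coverage difference versus the baseline, in percentage points."""
    return coverage - BASELINE_COVERAGE


for name, (seconds, coverage) in METHODS.items():
    print(f"{name:28s} {time_reduction(seconds):6.2f}% faster, "
          f"{coverage_change(coverage):+d} points coverage")
```

For the two diagonal methods the exact saving is 88.75%, which the slides round to 90%.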

Summary

Vocabulary reduction
  Problems and goals
Levenshtein distance method
  Very complex in exhaustive search
Improving the distance method
  Best method: weighted diagonal distance method
Next step