An Efficient Distance Measure for Comparing Phoneme Sequences
Baruchi Har-Lev
Speech Recognition Day - 8/6/2010
Multi-Stage LVCSR
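[Figure: block diagram of the multi-stage LVCSR system. Block labels: Input Speech, Phoneme Decoder, Recognized Phoneme Sequence, Vocabulary Reduction, Lexicon, Reduced Vocabulary, CSR, Textual Output. Vocabulary reduction stages: Search Space Creation, Hypothesis Creation, Distance Measure, Probable Word Selection. The distance measure against the lexicon is $O(n^2)$.]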
Overview: The Problem
[Figure: search grid with the lexicon words (Word1 ... Word100k) on one axis and the recognized phoneme sequence (Ph1 ... Phk ... PhM) on the other; one lexicon word, Wordi, is highlighted.]
Example
Lexicon size = 100k
Phoneme sequence length = 250
Grid size = 25M
Avg. hypothesis length = 7
Number of operations > 1,000,000,000
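The numbers in the example can be checked with a few lines of arithmetic. The sketch below assumes that every grid cell costs one full Levenshtein table between two strings of about 7 phonemes (roughly 7 x 7 elementary operations); the slide gives only the totals, so the per-cell cost is an assumption.

```python
# Back-of-the-envelope operation count for the exhaustive search.
lexicon_size = 100_000       # words in the lexicon (from the slide)
sequence_length = 250        # phonemes in the recognized sequence (from the slide)
avg_hypothesis_length = 7    # phonemes per hypothesis (from the slide)

# One grid cell per (lexicon word, position in the phoneme sequence).
grid_size = lexicon_size * sequence_length

# Assumption: each cell fills a Levenshtein table of about 7 x 7 entries.
ops_per_cell = avg_hypothesis_length ** 2
total_ops = grid_size * ops_per_cell

print(f"grid size:  {grid_size:,}")   # 25,000,000
print(f"operations: {total_ops:,}")   # 1,225,000,000
```

Under that assumption the count lands just above the one billion operations quoted on the slide.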
Overview: The Goal
Reduce the computational complexity of the distance measure while keeping the loss in performance to a minimum.
Test Database
Parameter                     Macrophone   VoiceMail
# of Utterances               3140         2485
Utterance Length              3.5 sec      20+ sec
# of Words in Utterance       8-9          70-80
# of Phonemes in Utterance    40           250
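Levenshtein Distance
Insertion, Deletion, Substitution
Equal weight probabilities
$O(NM) \approx O(n^2)$
[Figure: one cell of the alignment grid between the reference phoneme string and the tested phoneme string, with indices $Ph_{i-1}, Ph_i$ and $Ph_{j-1}, Ph_j$. The cell is reached by $\mathrm{Sub}(Ph_i, Ph_j)$, $\mathrm{Del}(Ph_i)$ or $\mathrm{Ins}(Ph_j)$.]
$D(Ph_i, Ph_j) = \min\{\mathrm{Sub}(Ph_i, Ph_j),\ \mathrm{Del}(Ph_i),\ \mathrm{Ins}(Ph_j)\}$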
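The recurrence above is the standard dynamic-programming form of the Levenshtein distance. A minimal Python sketch with unit (equal) weights follows; the function name and the phoneme spellings in the example are illustrative and not taken from the author's implementation.

```python
from typing import Sequence


def levenshtein(ref: Sequence[str], test: Sequence[str]) -> int:
    """Unit-cost edit distance between two phoneme sequences."""
    # prev[j] is the distance between the reference prefix processed so far
    # and test[:j]; it starts as the cost of inserting j tested phonemes.
    prev = list(range(len(test) + 1))
    for i, ref_ph in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference phonemes
        for j, test_ph in enumerate(test, start=1):
            sub = prev[j - 1] + (ref_ph != test_ph)  # substitution or match
            dele = prev[j] + 1                       # deletion of ref_ph
            ins = curr[j - 1] + 1                    # insertion of test_ph
            curr.append(min(sub, dele, ins))
        prev = curr
    return prev[-1]


about = ["ax", "b", "aw1", "t"]
above = ["ax", "b", "ah1", "v"]
print(levenshtein(about, above))  # 2
```

Both loops run over the full lengths N and M of the two strings, which is where the O(NM) cost comes from.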
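Levenshtein Distance in the Exhaustive Process
[Figure: Levenshtein alignment grid with axes X1 ... XN and Y1 ... YM, labelled Lexicon Word and Hypothesis from Sequence. It is repeated over the full search grid of lexicon words W1 ... W100k against the recognized phoneme sequence (Y T N ...).]
Number of $D(W_i, Hyp_j)$ computations: Grid Size, each $O(n^2)$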
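The exhaustive process can be sketched as a double loop over lexicon words and positions in the recognized sequence. How the hypotheses are cut from the sequence is not stated on the slide; the sketch assumes a window that starts at each position and has the same length as the word being scored. The function names and the toy lexicon are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance, as in the recurrence above."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (x != y), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def exhaustive_search(
    sequence: Sequence[str], lexicon: Dict[str, List[str]]
) -> List[Tuple[int, str, int]]:
    """Score every (word, start position) cell of the full search grid."""
    scores = []
    for word, phones in lexicon.items():
        for start in range(len(sequence)):
            # Assumption: the hypothesis is the window of the sequence that
            # starts here and is as long as the lexicon word.
            hyp = sequence[start:start + len(phones)]
            scores.append((levenshtein(phones, hyp), word, start))
    return sorted(scores)  # best (smallest distance) first


lexicon = {"about": ["ax", "b", "aw1", "t"], "chain": ["ch", "ey1", "n"]}
sequence = ["ax", "b", "aw1", "t", "ch", "ey1", "n"]
scores = exhaustive_search(sequence, lexicon)
print(len(scores), scores[:2])
```

The number of scored cells equals lexicon size times sequence length, and each cell pays the full quadratic Levenshtein cost.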
Results
Distance Method          Creation Time [sec]   Coverage [%]
Levenshtein Distance     8                     85%

DB           Mean Sequence Size [Phones]   Lexicon Size   Reduced Vocabulary Size
Macrophone   41                            100k           50k
A-Priori Distance Differences
Choose a canonical word - Word0
Compute distances (Levenshtein)
A-priori distance computation
Real time computation
Sum of results
$D(W_i, Hyp)$ is obtained from $D(W_i, Word_0)$, computed a-priori, and $D(Word_0, Hyp)$, computed one time in real time (RT).
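A minimal sketch of the idea follows. The exact way the two distances are combined did not survive in the slide text; the sketch assumes the absolute difference of the two distances, which is the triangle-inequality lower bound on the true Levenshtein distance. The function names, the toy lexicon and the choice of Word0 are hypothetical.

```python
from typing import Dict, List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (x != y), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def precompute(lexicon: Dict[str, List[str]], word0: Sequence[str]) -> Dict[str, int]:
    """A-priori step: distance of every lexicon word to the canonical word."""
    return {word: levenshtein(phones, word0) for word, phones in lexicon.items()}


def estimate(apriori: Dict[str, int], hyp: Sequence[str], word0: Sequence[str]) -> Dict[str, int]:
    """Real-time step: one Levenshtein call, then O(1) work per word."""
    d_hyp = levenshtein(hyp, word0)
    # Assumption: combine the two distances by absolute difference.
    return {word: abs(d - d_hyp) for word, d in apriori.items()}


lexicon = {
    "about": ["ax", "b", "aw1", "t"],
    "above": ["ax", "b", "ah1", "v"],
    "chain": ["ch", "ey1", "n"],
    "charm": ["ch", "aa1", "r", "m"],
}
word0 = ["ax"]                      # hypothetical canonical word
apriori = precompute(lexicon, word0)
hyp = ["ax", "b", "aw1", "t"]
estimates = estimate(apriori, hyp, word0)
print(estimates)
```

In this toy example "about", "above" and "chain" all receive the same estimate even though their true distances to the hypothesis differ, which illustrates how such a shortcut can cost coverage.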
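A-Priori Distance Differences Full Search
[Figure: the full search split into two parts. A-priori: $D(W_i, Word_0)$ for every lexicon word W1 ... W100k. RT: $D(Hyp_i, Word_0)$ for every hypothesis Hyp1 ... HypN. The two parts meet at a "+" node. Complexity labels: O(Grid Size), O(#Hyp), O(1).]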
Results
Distance Method          Creation Time [sec]   Coverage [%]
Levenshtein Distance     8                     85%
A-priori Differences     0.32                  53%
96% Time Reduction, 32% Coverage Loss
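Tree Distance
Advantages:
Reduces number of hypotheses
"Breaks" the distance measure
Disadvantages:
Complex interface
[Figure: lexicon phoneme tree. Each node holds a phoneme and its level (L1-L4); a node that ends a word carries that word, otherwise {}.]
0 (root)
  ax (L1, "a")
    b (L2, {})
      aw1 (L3, {})
        t (L4, "about")
      ah1 (L3, {})
        v (L4, "above")
  Ch (L1, {})
    ey1 (L2, {})
      n (L3, "chain")
    aa1 (L2, {})
      r (L3, {})
        m (L4, "charm")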
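The slide does not say how the distance is evaluated on the tree. One common approach, assumed here, is to store the lexicon as a phoneme tree and compute one Levenshtein row per node, so that words sharing a prefix share the rows of that prefix. The class and function names are illustrative.

```python
from typing import Dict, List, Optional, Sequence


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.word: Optional[str] = None  # set when a lexicon word ends here


def build_tree(lexicon: Dict[str, List[str]]) -> Node:
    """Insert every pronunciation into a phoneme tree (trie)."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.children.setdefault(ph, Node())
        node.word = word
    return root


def tree_distances(root: Node, hyp: Sequence[str]) -> Dict[str, int]:
    """Levenshtein distance from hyp to every word, one DP row per node."""
    results: Dict[str, int] = {}
    first_row = list(range(len(hyp) + 1))
    stack = [(child, ph, first_row) for ph, child in root.children.items()]
    while stack:
        node, ph, prev = stack.pop()
        row = [prev[0] + 1]
        for j, hyp_ph in enumerate(hyp, start=1):
            row.append(min(prev[j - 1] + (ph != hyp_ph),  # substitution or match
                           prev[j] + 1,                   # deletion
                           row[j - 1] + 1))               # insertion
        if node.word is not None:
            results[node.word] = row[-1]
        # Children reuse this row, so a shared prefix is computed only once.
        stack.extend((child, p, row) for p, child in node.children.items())
    return results


lexicon = {
    "a": ["ax"],
    "about": ["ax", "b", "aw1", "t"],
    "above": ["ax", "b", "ah1", "v"],
    "chain": ["ch", "ey1", "n"],
    "charm": ["ch", "aa1", "r", "m"],
}
distances = tree_distances(build_tree(lexicon), ["ax", "b", "aw1", "t"])
print(distances)
```

Because the rows are ordinary Levenshtein rows, this sketch returns exactly the same distances as the plain algorithm; only repeated work on shared prefixes is saved.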
Results
Distance Method          Creation Time [sec]   Coverage [%]
Levenshtein Distance     8                     85%
A-priori Differences     0.32                  53%
Tree Distance            8                     85%
No Improvement
A Glimpse at Different Disciplines
Stringology - Computer science
$O(n^{1.4} N^{1.2})$
Good compression rate
Avg. length (Lexicon Word) ≈ 7 Phones
Bad compression rate
Biology
BLAST
Computer network
Diagonal Distance
Length(Hyp.) = Length(Word)
Basic comparisons
Uses substitutions only
Does not use:
Insertion
Deletion
[Figure: alignment grid with axes X1 ... XN and Y1 ... YM, labelled Lexicon Word and Hypothesis from Sequence.]
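With equal lengths and substitutions only, the distance reduces to a position-by-position comparison along the diagonal of the grid. A minimal sketch follows; the function name is illustrative.

```python
from typing import Sequence


def diagonal_distance(word: Sequence[str], hyp: Sequence[str]) -> int:
    """Count mismatching positions; no insertions or deletions are considered."""
    if len(word) != len(hyp):
        raise ValueError("diagonal distance needs sequences of equal length")
    return sum(w != h for w, h in zip(word, hyp))


about = ["ax", "b", "aw1", "t"]
above = ["ax", "b", "ah1", "v"]
print(diagonal_distance(about, above))  # 2
```

The cost is linear in the word length instead of quadratic, at the price of missing alignments that need an insertion or a deletion.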
Results
Distance Method          Creation Time [sec]   Coverage [%]
Levenshtein Distance     8                     85%
A-priori Differences     0.32                  53%
Tree Distance            8                     85%
Diagonal Distance        0.9                   83%
90% Time Reduction, 2% Coverage Loss
Weighted Distance
Additional data - Phoneme engine characteristics
Confusion matrix
A posteriori probabilities
Larger distinction between phones
Disadvantage
Sensitivity to phoneme engine characteristics
[Figure: phoneme confusion matrix indexed by Ph1 ... PhN on both axes (row i, column j), with additional Ins and Del entries.]
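A weighted variant of the diagonal distance can be sketched by replacing the 0/1 mismatch cost with a cost taken from the confusion matrix. The slide does not give the cost function; the sketch assumes the negative log of P(recognized phoneme | true phoneme), with a small floor for pairs that never occur. The matrix values, the floor and the function name are all hypothetical.

```python
import math
from typing import Dict, Sequence


def weighted_diagonal_distance(
    word: Sequence[str],
    hyp: Sequence[str],
    confusion: Dict[str, Dict[str, float]],
    floor: float = 1e-6,
) -> float:
    """Sum of -log P(recognized | true) along the diagonal (assumed cost)."""
    if len(word) != len(hyp):
        raise ValueError("diagonal distance needs sequences of equal length")
    total = 0.0
    for true_ph, rec_ph in zip(word, hyp):
        # Unseen pairs fall back to the floor probability.
        p = confusion.get(true_ph, {}).get(rec_ph, floor)
        total += -math.log(max(p, floor))
    return total


# Hypothetical confusion probabilities: confusion[true][recognized].
confusion = {
    "ax": {"ax": 0.9},
    "b": {"b": 0.9},
    "aw1": {"aw1": 0.7, "ah1": 0.2},
    "t": {"t": 0.9},
}
about = ["ax", "b", "aw1", "t"]
exact = weighted_diagonal_distance(about, ["ax", "b", "aw1", "t"], confusion)
close = weighted_diagonal_distance(about, ["ax", "b", "ah1", "t"], confusion)
far = weighted_diagonal_distance(about, ["ax", "b", "r", "t"], confusion)
print(exact, close, far)
```

A frequently confused pair such as aw1/ah1 is penalized less than a pair the engine never confuses, which is the "larger distinction between phones" the slide mentions. The same dependence on the matrix is also the stated disadvantage: the costs are only as good as the phoneme engine statistics behind them.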
Results
Distance Method              Creation Time [sec]   Coverage [%]
Levenshtein Distance         8                     85%
A-priori Differences         0.32                  53%
Tree Distance                8                     85%
Diagonal Distance            0.9                   83%
Diagonal Weighted Distance   0.9                   90%
90% Time Reduction, 5% Coverage Improvement
Summary
Vocabulary reduction
Problems and Goals
Levenshtein distance method
Very complex in exhaustive search
Improving the distance method
Best method - Weighted diagonal distance method
Next step