Recognition of spoken and spelled proper names
Reporter : CHEN, TZAN HWEI
Authors : Michael Meyer, Hermann Hild
2
Outline
Introduction
Experiments
Summary
3
Introduction
The recognition of increasingly large sets of spoken names is difficult: very large recognition vocabularies contain many easily confused words or even homophones.
This paper compares the performance of proper name recognition when a name is spoken only, spelled only, or both spoken and spelled.
4
Introduction (cont)
In what contexts do people speak and spell names?
Table 1: Three scenarios for speaking and spelling a proper name
5
Experiments
Speech data: a database of about 2800 German last names, spoken by 57 different speakers according to scenario 2.
Recorded with a close-talking microphone at a sampling rate of 16 kHz.
The boundaries between all spoken and spelled names were identified in order to conduct the scenario 1 experiments.
6
Experiments (cont)
The pronunciation dictionary covers about half of the 2800 names of our speech data.
The set of 1337 spoken and spelled names is used in all the experiments described below.
For the experiments, we use an MS-TDNN as a specialized letter recognizer and the LVCSR of the JANUS system as a spoken name recognizer.
7
Experiments (cont)
JANUS: 60.0% names correct were achieved on the test set of the 1337 spoken last names.
Recognizing the spelled names with JANUS achieved 93.3% names correct.
The MS-TDNN achieved 96.5% correct spelled names on the test set.
8
Experiments (cont)
Small list: we assume that the list of names to be recognized is small enough that every name can be explicitly represented in the dictionary.
How can we combine the different information provided by the spoken and spelled names?
9
Experiments (cont)
After all, the pronunciations of the spelled letters approximate the sounds of the letters in the fluently spoken word, e.g. “TOM” versus “T-O-M”.
Exceptions are letters with unusual pronunciations and letter combinations that define their own pronunciation, such as “sch” and “ch”.
In the following, we simply combine the two representations on the basis of their acoustic scores only.
10
Experiments (cont)
Scenario 1
Let $Y_L(i)$ be the score of a spelled name $i$ found in the N-best list of the MS-TDNN letter recognizer, and $Y_F(i)$ be the score of the same name in the N-best list of fluently spoken names as found by JANUS.
The position of name $i$ in the combined N-best list is determined by its new score
$Y_{FL}(i) = \lambda \, Y_L(i) + (1 - \lambda) \, Y_F(i)$    (1)
With the factor $\lambda$ close to 1, only an insignificant improvement of 0.5% absolute was observed.
However, if the N-best list for fluently spoken names is computed only for those names which were already found in the N-best list of the spelled names, the result is a recognition rate of 97.7% names correct at $\lambda = 0.96$.
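The weighted combination of the two N-best lists in scenario 1 can be sketched roughly as follows. This is only an illustration: the function name `combine_nbest`, the dictionary representation of an N-best list, the example scores, and the fallback for a name missing from one list are all assumptions, not details taken from the paper. The combined score is taken to be `lam * Y_L + (1 - lam) * Y_F`, with `Y_L` the spelled-name score and `Y_F` the spoken-name score.

```python
def combine_nbest(spelled, spoken, lam=0.96, restrict_to_spelled=True):
    """Rank names by lam * Y_L + (1 - lam) * Y_F.

    spelled, spoken: dicts mapping name -> score (higher is better).
    restrict_to_spelled: if True, only names already present in the
    spelled N-best list are considered.
    """
    if restrict_to_spelled:
        names = set(spelled)
    else:
        names = set(spelled) | set(spoken)

    # Assumption: a name missing from one list gets that list's worst score.
    worst_spelled = min(spelled.values())
    worst_spoken = min(spoken.values())

    combined = {}
    for name in names:
        y_l = spelled.get(name, worst_spelled)
        y_f = spoken.get(name, worst_spoken)
        combined[name] = lam * y_l + (1.0 - lam) * y_f

    # Best (highest) combined score first.
    return sorted(combined, key=combined.get, reverse=True)


# Made-up log-scores, for illustration only.
spelled_nbest = {"MEIER": -10.0, "MEYER": -10.5, "MAIER": -14.0}
spoken_nbest = {"MEYER": -20.0, "MEIER": -40.0, "BAYER": -22.0}

ranking = combine_nbest(spelled_nbest, spoken_nbest)
print(ranking)
```

In this toy example the spelled list alone would rank "MEIER" first, while the small weight on the spoken score is enough to move "MEYER" to the top.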
11
Experiments (cont)
Scenario 2
With 86.1% correct, the recognition on the entire utterance is worse than on the spelled part alone.
It is possible to adopt an approach similar to that of scenario 1.
The boundary of the first-best hypothesis was used for the weighting of all hypotheses, resulting in a recognition rate of 89.1%.
Incorporating the MS-TDNN letter recognizer results in a recognition rate of 95.8%.
12
Experiments (cont)
Figure 1: % names correct for a weighted combination of the N-best lists of spoken and spelled names (scenarios 1 and 2)
13
Experiments (cont)
Large lists
If the number of names exceeds the recognizer’s maximum vocabulary size, a different approach has to be taken.
A two-step approach is employed.
A coarse recognition run is used to get a reduced list of name candidates.
These candidates are then processed in a second run, in which all the previously described techniques for small word lists can be applied.
14
Experiments (cont)
In the case of scenario 1, the list of candidates can be easily reduced if only the spelled names are considered in the first pass.
For scenario 2, JANUS’s recognition vocabulary contains only phonemes and letters.
15
Experiments (cont)
For scenario 2, a special language model is employed.
A list of the most similar names can be retrieved, and then used in another JANUS recognition run.
The letter segments are then re-recognized with the MS-TDNN
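The candidate-reduction step of the two-step approach might look something like the sketch below. The slides do not say how "most similar" is measured, so plain string similarity from Python's standard `difflib` module is used here as a stand-in; the function name `reduce_candidates`, the name list, and the thresholds are hypothetical.

```python
import difflib


def reduce_candidates(letter_hypothesis, name_list, max_candidates=2, cutoff=0.6):
    """Return the names most similar to a first-pass letter hypothesis.

    Assumption: similarity is difflib's sequence-matching ratio, not the
    measure used in the original system.
    """
    return difflib.get_close_matches(
        letter_hypothesis, name_list, n=max_candidates, cutoff=cutoff
    )


# Stand-in for a name list too large for the recognizer's vocabulary.
names = ["MEYER", "MAIER", "SCHMIDT", "MUELLER"]

# "MEIER" plays the role of a (possibly wrong) first-pass result.
candidates = reduce_candidates("MEIER", names)
print(candidates)
```

The reduced list would then serve as the small vocabulary for the second recognition run.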
16
Experiments (cont)
Table 3: Summary of results for the separated and combined recognition of fluently spoken and spelled last names
17
Summary
By combining the N-best lists of both the spoken and spelled recognition, the overall performance can be improved.
An input of either L or FL can be distinguished with almost 99% accuracy, resulting in 95.5% names correct without knowing a priori whether L or FL was spoken.
Caller Identification from Spelled-Out Personal Data Over the Telephone
Reporter : CHEN, TZAN HWEI
19
Outline
Introduction
The personal identification algorithm
Tests and results
Conclusions
20
Introduction
The problem of automatically identifying the caller in a telephone conversation from the information spoken in the call is extremely difficult.
The identification must take place despite rather substantial speech recognition errors that may be made by the machine.
21
Introduction (cont)
We can find a solution to the problem if we make two assumptions.
We assume that there is a database of records containing personal information about the caller, which can serve as a reference during the identification process.
We ask the caller to spell the personal identifying items so that the spoken vocabulary is small and we can look for correlations with the items in the database.
22
The personal identification algorithm
Fig 1. The algorithm of personal identification from spelled tokens
23
The personal identification algorithm (cont)
A Bayesian computation starts with an estimate, for each record in $S$, of the probability that the record represents the identity of the caller.
It uses the acquired information $T$ and updates each record’s probability that it corresponds to the current caller’s identity.
The incremental computation is
$P(\text{record} \mid T) = \dfrac{P(T \mid \text{record}) \, P(\text{record})}{P(T)}$
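One way to read the denominator and the word "incremental" is spelled out below. This is a sketch under two assumptions that the slide does not state: that the caller's record is among the records in the set $S$, and that successive items of acquired information $T_1, T_2$ are conditionally independent given the record.

```latex
% Denominator by the law of total probability over the records in S
P(T) = \sum_{r \in S} P(T \mid r)\, P(r)

% Incremental use: the posterior after T_1 serves as the prior for T_2
P(\text{record} \mid T_1, T_2)
  = \frac{P(T_2 \mid \text{record})\, P(\text{record} \mid T_1)}
         {\sum_{r \in S} P(T_2 \mid r)\, P(r \mid T_1)}
```

Under these assumptions the probabilities over $S$ sum to one after every update.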
24
The personal identification algorithm (cont)
Bayesian update of probabilities
Let $S$ be the set of records that have been cached.
Let $f_i$ be the value returned by the ASR for the $i$-th field recognized, and let $f_{ij}$ be the $j$-th character of $f_i$.
Let $t_i$ be the value of the $i$-th field of the cached record $t$, and let $t_{ij}$ be the $j$-th character of $t_i$.
For each field $f_i$ that has been recognized
  For each character $f_{ij}$ in $f_i$
    For each record $t$ in $S$
      $P'(t) = \dfrac{CM(f_{ij}, t_{ij}) \, P(t)}{P(f_{ij})}$
The update can be derived as follows:
$P'(t) = P(t_{ij} \mid f_{ij}) = \dfrac{P(f_{ij} \mid t_{ij}) \, P(t_{ij})}{P(f_{ij})} = \dfrac{CM(f_{ij}, t_{ij}) \, P(t)}{P(f_{ij})}$
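The character-by-character update loop on this slide could be sketched as below. Several things here are assumptions rather than details from the source: `CM` is treated as a character confusion matrix giving P(recognized character | true character), the toy confusion values are invented, P(f_ij) is computed by summing over the cached records, and the names `update_record_probabilities` and `toy_confusion` are hypothetical.

```python
def toy_confusion(f_char, t_char):
    """Invented stand-in for CM(f_ij, t_ij): P(ASR returns f_char | t_char spoken)."""
    return 0.9 if f_char == t_char else 0.004


def update_record_probabilities(records, recognized, confusion, prior=None):
    """Apply the per-character Bayesian update to every cached record.

    records:    list of dicts mapping field name -> true field value
    recognized: dict mapping field name -> value returned by the ASR
    confusion:  function (f_char, t_char) -> likelihood
    """
    n = len(records)
    probs = list(prior) if prior is not None else [1.0 / n] * n

    for field, f_value in recognized.items():        # each recognized field f_i
        for j, f_char in enumerate(f_value):         # each character f_ij
            likelihoods = []
            for t in records:                        # each record t in S
                t_value = t[field]
                # Assumption: a record value that is too short never matches.
                t_char = t_value[j] if j < len(t_value) else ""
                likelihoods.append(confusion(f_char, t_char))
            # P(f_ij), assumed to be the total probability over cached records.
            evidence = sum(l * p for l, p in zip(likelihoods, probs))
            if evidence == 0.0:
                continue  # no record can explain this character; skip it
            probs = [l * p / evidence for l, p in zip(likelihoods, probs)]
    return probs


cached = [{"last": "MEYE"}, {"last": "MEIE"}, {"last": "BAUE"}]
posterior = update_record_probabilities(cached, {"last": "MEIE"}, toy_confusion)
best = max(range(len(cached)), key=lambda k: posterior[k])
print(cached[best], posterior)
```

Because a single mismatched character only scales a record's probability down instead of eliminating it, a record can still win when the ASR gets some characters wrong.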
25
Tests and results
The system was tested using a database of one million records that was constructed by using random combinations of 4,375 female first names, 1,129 male first names, and 88,799 last names.
The account numbers were generated so that the values for the last four digits of the number occurred with equal frequency throughout the database.
The city, postal code, and phone number fields were generated to correspond to locations in the U.K.
26
Tests and results (cont)
Our test involved identifying 300 different records in the database.
If the system was unable to make an identification of the target record after asking the user for all of the information, the caller was asked to make a second attempt using the same information.
If the system failed to produce a result after the second attempt, the call was terminated at that point.
27
Tests and results (cont)
For each telephone call, the users were asked eight questions:
Enter your ID, using your telephone keypad, followed by the pound key. If you make a mistake, press the star key.
You entered (the value entered in (1)). If this is correct, press ‘1’. If it is not, press ‘2’.
Please say the first four letters of your last name.
Please say the first four letters of your first name.
28
Tests and results (cont)
Please say the last four digits of your card number.
Could you please spell the city currently listed on your account?
Please say your phone number.
Please say the postal code currently listed on your account.
29
Tests and results (cont)
Fig 2. Summary of results from 300 calls.
30
Tests and results (cont)
Fig 3. Rate of ASR character misrecognition by field
31
Tests and results (cont)
Fig 4. Rate of misrecognition of field by ASR (misrecognition = at least one error made in spelled field value)
32
Tests and results (cont)
Fig 5. Average cumulative number of records examined by system
33
Conclusions
The method tolerates high misrecognition rates.
The method can be used with off-the-shelf components; it doesn’t require a specialized ASR.
A possible extension is to allow the personal identification information to be spoken instead of spelled.
34
Flowchart: verification of a record Rv (decision branches are labeled Yes / No)
Record Rv that will be verified
Request T from caller
Collect subset near T
Add subset to S
Update the risk for each record in S
Rm <- min risk in S
Decision: Rm == Rv?
Decision: Risk(Rv) < Risk(reject)?
Decision: Another T?
Select another T
Outcomes: Accept! / Reject! / operator